CN1010810B - Computer architecture supporting forth language

Info

Publication number: CN1010810B
Application number: CN89102661A
Authority: CN
Inventors: 刘大力
Original assignee: Beijing Daxing Doosi Software Co ltd
Current assignee: Nansi Science and Technology Development Co., Ltd., Beijing
Priority date: 1989-03-04
Filing date: 1989-03-04
Publication date: 1990-12-12
Also published as: CN1035571A

Abstract

The present invention relates to a computer architecture that supports FORTH, a fourth-generation high-level computer language, and that adopts RISC, CISC and WISC techniques. The architecture comprises an original stack management part, an arithmetic part, a shift register part, a control part, and so on; both the combination of these parts and the design of each individual part embody the overall concept of the invention. Compared with a traditional computer, the computer of the present invention offers faster operation, a simpler structure, and easier and more flexible programming.

Description

Computer architecture supporting Forth language

The present invention relates to a computer architecture and, more particularly, to a computer architecture that can directly support the fourth-generation high-level computer language FORTH.

Present-day computer systems are all equipped with abundant software, and traditional high-level languages (for example FORTRAN, COBOL, PASCAL and BASIC) are widely used. However, the capabilities of such languages are limited, and secondary development of the language itself generally has to be done by the manufacturer; it cannot be done from within the language. Moreover, although writing programs in such languages is convenient and standardized, there are too many interfaces between the language and the machine, so compilation speed and efficiency are low. Because the machine is opaque to the programmer, improvements in program quality and run-time efficiency are also severely restricted. For example, to use FORTRAN, one first uses text-editing software at the operating system interface to edit a FORTRAN source file. The source file is then compiled in two passes at the FORTRAN compiler interface to produce the corresponding object file. Once compilation succeeds, one returns to the operating system interface and uses the system linker to link and locate the object file, producing an executable file that can be run directly at the operating system interface. When a FORTRAN program needs to call an assembly language routine, an assembly object file must be generated in a similar way, using the system text editor and the assembler, and then linked and located together with the FORTRAN object file. Such a compilation process is clearly very cumbersome and involves many interface switches, which greatly reduces compilation speed and efficiency and makes the process hard for beginners to master.

In recent years, people have increasingly sought high-level software tools that have a simple interface, good structure and high compilation efficiency, and that can be extended from within (self secondary development). This has led to the fourth-generation high-level programming language FORTH, which has clear advantages.

FORTH differs from other high-level languages. Once a FORTH system has been generated, no interface switching is needed: all work, from editing and compiling to execution, can be carried out at the single interface of the FORTH system. This is because the FORTH system contains the main functions of an operating system's resident module and has its own text editor. The FORTH system also includes a fully structured, single-pass FORTH assembler, so assembly routines can be compiled and run directly inside the FORTH system. FORTH has many other advantages over other high-level languages: for example, it is fully structured, it is self-extensible, it is a transparent language, and it produces the shortest object code. Because of these advantages, FORTH is being used more and more widely around the world, especially in the United States.

However, present-day computers are all traditional von Neumann computers. Such a computer has its own machine instruction set; its CPU can only decode and execute this machine instruction set, and software must be developed on top of it. There is therefore a need for a next-generation computer that can run the high-level language FORTH directly.

At present, traditional computer architectures mainly adopt the complex instruction set structure, i.e. the CISC (Complex Instruction Set Computer) structure. To execute instructions faster while keeping the structure simple, the circuitry reduced and the cost low, a new computer architecture has appeared: the RISC (Reduced Instruction Set Computer) structure. A RISC computer executes one instruction per clock cycle, and its instructions use a fixed format. A RISC computer emphasizes a load/store design, in which memory is accessed by instructions that load from and store to memory. In addition, a RISC computer uses hardwired instruction control rather than microcode, to guarantee fast, single-cycle opcodes. At the end of 1987, Koopman of the United States proposed an interactive computer structure that combines the RISC structure with elements of the CISC structure to produce a more interesting, well laid out, flexible and fast machine; this structure was named the WISC (Writable Instruction Set Computer) structure.

For references on the high-level language FORTH and on the CISC, RISC and WISC architectures, see the book "The Fourth-Generation High-Level Computer Language FORTH" by Liu Dali, Li Xiaozhun and Zhang Hanyu, published by the People's Posts and Telecommunications Publishing House in February 1988; the article "Reduced Instruction Set Computer" in issue 1, 1985, of "Communications of the ACM"; and the article "The WISC Concept" by Phil Koopman in the April 1987 issue of "Byte" magazine.

The object of the invention is to provide a novel computer architecture that serves as a hardware environment supporting the high-level computer language FORTH. The basic operations performed by this computer express the semantics of the FORTH primitive words very efficiently, so that the FORTH primitive words are in effect the "assembly instructions" of the computer. The architecture is oriented directly toward a fourth-generation language and is designed by combining CISC, RISC and WISC techniques with macro techniques; its technical specifications, such as clock frequency, operation speed and addressing capability, all reached the advanced international level of the late 1980s.

The computer architecture of the present invention comprises: a first storage device that holds the input data of an operation, its intermediate results and its final result; a second storage device that holds the breakpoint address of a subroutine call and restores it when the call returns; a main storage device that holds the computer's system software, instructions and user programs; a first storage device management device that performs write and read operations on the first storage device according to the micro-operation control signals produced by instruction decoding; a second storage device management device that performs write and read operations on the second storage device according to the micro-operation control signals produced by instruction decoding; a main storage device management device that manages the addresses used to access the main storage device; a main storage device data port that receives instructions or data read from the main storage device and prepares data for write operations to the main storage device; a combinational logic decoding control device that decodes the instructions fetched from the main storage device and produces a plurality of micro-operation signals; an arithmetic unit composed of a multiplexer and an arithmetic logic unit that performs arithmetic and logic operations; and a shift register device comprising at least a first register and a second register.

Fig. 1 is an overall block diagram of the computer architecture of the present invention;

Fig. 2 is a schematic diagram illustrating the converging and diverging data paths of the present invention;

Fig. 3 is a schematic diagram detailing the parameter stack management component of the present invention;

Fig. 4 is a schematic diagram detailing the return stack management component of the present invention;

Fig. 5 is a schematic diagram detailing the shift register component of the present invention;

Fig. 6 is a schematic diagram detailing the instruction repeat control component of the present invention;

Fig. 7 is a schematic diagram detailing the arithmetic unit component of the present invention;

Fig. 8 is a schematic diagram detailing the main memory address management component of the present invention;

Fig. 9 is a schematic diagram detailing the main memory data port component of the present invention;

Fig. 10 is a schematic diagram detailing the serial/parallel I/O port component of the present invention;

Fig. 11 is a schematic diagram detailing the parallel I/O port component of the present invention;

Fig. 12 is a schematic diagram detailing the square root hardware circuit of the present invention;

Fig. 13 is a schematic diagram illustrating how the computer of the present invention completes an instruction;

Fig. 14 is a schematic diagram illustrating the dual latch principle of the present invention;

Fig. 15 is a schematic diagram illustrating the dual latch principle of the TOP and NXT registers in the computer of the present invention;

Fig. 16 is a schematic diagram of the clock signal waveforms in the computer of the present invention;

Fig. 17 is a schematic diagram of the dual latch structure in the computer of the present invention.

An embodiment of the present invention is described below with reference to the accompanying drawings, together with the design concept of the invention. It should be noted that the details given here do not limit the scope of the invention; they serve only to explain the design. Those skilled in the art may make various changes and modifications on this basis without departing from the scope of the invention. The scope of protection of this patent application should therefore be determined by the appended claims.

Referring to Fig. 1, to support FORTH, the two stacks of FORTH, namely the parameter stack and the return stack, are implemented as two independent storage areas: the parameter stack storage area 1 and the return stack storage area 2 in Fig. 1. The capacity of the parameter stack storage area may be 1K × 16 bits, and that of the return stack storage area 1K × 20 bits. Correspondingly, CPU 3 contains a management component for each stack: the parameter stack management component FSD and the return stack management component FRD. The parameter stack holds the input data of an operation, its intermediate results and its final result. Writing a value to the parameter stack constitutes a push; reading a value from it constitutes a pop. In FORTH, the return stack saves the breakpoint address of a word (subroutine) call and restores it when the call returns; under certain conditions it can also hold intermediate results of an operation. The return stack is 20 bits wide so that it can hold 20-bit addresses, while the parameter stack is 16 bits wide. The main function of the stack management components FSD and FRD is to perform write (push) and read (pop) operations on the stack storage areas according to the micro-operation control signals produced by instruction decoding.

In the architecture of the present invention, the capacity of the main storage area 4 may be 2^20 = 1M words, used to store the system software and user programs. The computer continually fetches instructions from this storage area and executes them. Before an instruction in a program can be executed, CPU 3 must supply the address of that instruction or data item in the main storage area in order to access it. The main function of the main memory address management component FPCA is to form the address used to access the main storage area. The main memory data port component FD receives instructions or data read from the main storage area and prepares the data for write operations to it. If the item fetched from the main storage area is an instruction, it is passed to the combinational logic decoding control component FCC, which produces the various micro-operation signals (switching signals) that carry out the operation specified by the instruction.

The arithmetic unit component FALU consists of three parts. The first is the multiplexer MUX-Y (here a 24-to-1 multiplexer), which selects one of the user-accessible registers inside the CPU as an operand. The second is the arithmetic logic unit ALU, which can perform 9 arithmetic and logic operations and the operations of 5 FORTH words. The third is the square root hardware circuit SQR, which implements a square root algorithm in hardware and can compute the square root of a 16-bit unsigned binary number in 8 clock cycles.
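The text does not say which algorithm the SQR circuit implements. As an illustration only, the following Python sketch uses the classic digit-by-digit (restoring) method, which is an assumption here: it decides one result bit per iteration, so a 16-bit input needs exactly 8 iterations, matching the 8 clock cycles stated above. The function name isqrt16 is hypothetical.

```python
def isqrt16(n):
    """Integer square root of a 16-bit unsigned value, one result bit per step.

    Assumed digit-by-digit method; not taken from the patent's SQR circuit.
    """
    if not 0 <= n <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    root = 0
    remainder = 0
    for step in range(8):                 # 8 steps, one per result bit
        shift = 14 - 2 * step
        # Bring down the next two most significant bits of the input.
        remainder = (remainder << 2) | ((n >> shift) & 0b11)
        trial = (root << 2) | 1           # 4 * root + 1
        root <<= 1
        if remainder >= trial:            # the next result bit is 1
            remainder -= trial
            root |= 1
    return root
```

For example, isqrt16(16) walks through eight two-bit groups of the input and returns 4; each loop iteration corresponds to what a hardware implementation could do in one clock period.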

CPU 3 also contains an instruction repeat control component FREPT. This component contains a register REPT; when this register is active, it causes the CPU to execute an instruction repeatedly.

CPU 3 also contains a shift register component FTN, which consists mainly of two registers: the TOP register and the NXT register. The result of every ALU operation must be delivered to the FTN component. The FTN component also contains two barrel shifters, which perform shift operations on the ALU result. In addition, the FTN component can deliver the result to any of the registers. The main role of the TOP and NXT registers is to serve as the top and second elements of the parameter stack in the FORTH sense.

The CPU 3 structure also provides two I/O ports: the 16-bit parallel I/O port FP and the 8-bit serial/parallel I/O port FDS/P. Besides parallel I/O, the FDS/P port can perform serial I/O when its control register is set accordingly.

Registers are the most heavily used basic components in the CPU. The relationships among the registers in the structure of the present invention are shown in Fig. 2, which reflects the data paths of the invention.

The data paths in the structure of the present invention are designed to converge on and diverge from the ALU. Any one of a number of registers can be selected to take part in an ALU operation; the result is sent directly to the TOP register and from there to the other registers.

The multiplexer MUX-Y in Fig. 2 selects a register under the control of a 5-bit field of the instruction format. The relationship between the registers in Fig. 2 and the components in Fig. 1 is as follows.

The return stack management component FRD contains the RD and RPL registers;

The parameter stack management component FSD contains the SD and SPL registers;

The shift register component FTN contains the TOP, NXT, TH and NH registers;

The instruction repeat control component FREPT contains the REPT and RT registers;

The arithmetic unit component FALU contains the SR and MD registers;

The main memory address management component FPCA contains the PC register;

The main memory data port FD contains the D and IL registers;

The 8-bit serial/parallel I/O port FDS/P contains the DS/P, XBS, XSP and XIO registers;

The 16-bit parallel I/O port FP contains the P, BBS and BIO registers.

The functions of these 21 registers are explained later. Fig. 2 reflects most, but not all, of the data paths of the CPU of the present invention: it shows only the data paths around the ALU. Register-to-register data paths are not shown in Fig. 2, although they are partly reflected in the paths from the TOP register to the other registers. Some registers have multiplexers at their inputs; these are provided for data transfer between registers and are controlled by the micro-operation signals of the instruction decoding control component FCC.

The functions of the components and the data paths of the structure of the present invention have been introduced above. With reference to Fig. 1, Fig. 2 and Fig. 13, the CPU completes an instruction as follows:

In the previous instruction cycle, an instruction fetched from the main storage area was placed in the instruction register IL of the main memory data port FD and preprocessed in the instruction decoding control component FCC;

On entering the current instruction cycle, the FCC component releases the preprocessed micro-operation control signals;

The micro-operation control signals cause each component to take the corresponding action, completing the operation specified by the instruction.

While this instruction is being completed, the next instruction is fetched from the main storage area and preprocessed, ready for the next instruction cycle. For example, the computer architecture of the present invention can complete the operations specified by the four FORTH words R@ SWAP - 2/ in a single cycle. With reference to Fig. 1 and Fig. 2, the process is as follows: the content of the return stack top register RT is delivered to the ALU through the multiplexer MUX-Y; the ALU performs the operation Y - TOP (i.e. RT - TOP); the ALU result is sent to the TOP register of the shift register component FTN, and the 2/ operation is performed by the barrel shifter associated with the TOP register.
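The equivalence between the four-word sequence and the single fused data-path operation can be checked with a small software model. The Python sketch below is illustrative only: the function names are hypothetical, cells are assumed to be 16 bits wide, and it is assumed that only the low 16 bits of the 20-bit RT register reach the ALU.

```python
MASK16 = 0xFFFF  # parameter stack cells are 16 bits wide


def shift_right_arithmetic(cell):
    """FORTH 2/ on a 16-bit cell: shift right one bit, keeping the sign bit."""
    return ((cell >> 1) | (cell & 0x8000)) & MASK16


def run_word_by_word(param_stack, return_top):
    """Execute R@ SWAP - 2/ as four separate stack operations."""
    stack = list(param_stack)
    stack.append(return_top & MASK16)                  # R@   ( t -- t rt )
    stack[-1], stack[-2] = stack[-2], stack[-1]        # SWAP ( t rt -- rt t )
    t = stack.pop()
    n = stack.pop()
    stack.append((n - t) & MASK16)                     # -    ( rt t -- rt-t )
    stack.append(shift_right_arithmetic(stack.pop()))  # 2/
    return stack


def run_fused(param_stack, return_top):
    """Model of the single-cycle path: Y = RT, ALU = Y - TOP, then shift."""
    stack = list(param_stack)
    alu_result = ((return_top & MASK16) - stack[-1]) & MASK16
    stack[-1] = shift_right_arithmetic(alu_result)     # barrel shifter into TOP
    return stack
```

Both functions leave the stack in the same state. The fused version never changes the stack depth, which is why no access to the stack storage area is needed for this instruction.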

In FORTH, every operation involves the parameter stack and the return stack. Specifically, all arithmetic and logic operations, memory access operations (i.e. ! and @), parameter preparation for subroutine (word) calls, result passing and so on are performed on the parameter stack; the saving and restoring of the breakpoint address when a subroutine (word) is called are performed on the return stack, where the intermediate results of some operations can also be kept. However, an ordinary stack has an inherent drawback: one end is fixed and the other floats, so there is only one point of operation, which makes operating on data on the stack inefficient. For example, the FORTH word "+" adds the top and second elements of the parameter stack, and the stack state changes as follows:

( s n t → s (n+t) )

This operation is usually carried out as follows: fetch the top-of-stack value as one ALU input; fetch the second value as the other ALU input; perform the addition; push the result. The operation takes 4 cycles (assuming that each read or write of the stack storage area completes in a single cycle).

Solving this problem is clearly the key to a stack-oriented machine. In the structure of the present invention, the above operation not only completes in a single cycle, but can also be combined with several other non-ALU FORTH instructions and still complete in a single cycle. In other words, the semantics of FORTH words can be expressed efficiently.

In the present invention, the design of the parameter stack and return stack management components FSD and FRD and of the shift register component FTN solves the above problem; the structure of these components is described in detail later.

The operation of every FORTH word presupposes that its data are ready on the stack, and the result of the current word prepares the data for the next word; the addressing mode here is therefore zero-address. Memory access is performed by a FORTH word according to an address on the stack and does not fall within the notion of an addressing mode. The initial address or data can be obtained by an instruction that fetches a datum from memory, at the address given by the current PC, and pushes it onto the stack. In the architecture of the present invention, the main memory address management component FPCA reflects this need, as shown in Fig. 8. Its structure is described in detail later.

Increasing the degree of parallel execution among the components inside the CPU is a common goal of computer architecture design. Like traditional computers, the structure of the present invention has instruction prefetch and preprocessing functions. It also has new design features not used in ordinary computers, which are described below.

The TOP and NXT registers in the architecture of the present invention use a dual latch design, whose principle is shown in Fig. 14. With a dual latch, the input and output of a register can take place simultaneously, or in either order (input first then output, or output first then input), and, by controlling the switching signals of the two latches, a register can be written destructively or non-destructively. The SD, RT, RD, RPL, SPL, REPT and PC registers of the present invention also use this structure.

Fig. 15 shows the dual latch structure of the TOP and NXT registers in the structure of the present invention. When the ALU output enters the TOP register, the previous value of TOP can be sent to NXT at the same time, and the previous content of the NXT register can simultaneously be sent elsewhere. This operation is controlled by the signals TL and NL, and it is one of the main design features that make the stack of the present invention operate efficiently.
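The simultaneous transfer described for TOP and NXT can be illustrated with a two-phase software model. The Python sketch below is a simplification under stated assumptions: each register is modelled as an input-side latch followed by an output-side latch, the two are never open together, and the class and method names (DualLatchRegister, capture, release) are hypothetical rather than taken from Fig. 14 or Fig. 15. The real TL and NL control signals are not modelled.

```python
class DualLatchRegister:
    """A register built from two latches that are never open together."""

    def __init__(self, value=0):
        self.first = value    # input-side latch
        self.second = value   # output-side latch, visible to other components

    @property
    def output(self):
        return self.second

    def capture(self, value):
        """Phase 1: open only the input-side latch."""
        self.first = value

    def release(self):
        """Phase 2: open only the output-side latch."""
        self.second = self.first


def load_alu_result(top, nxt, alu_result):
    """ALU -> TOP, old TOP -> NXT, old NXT -> elsewhere, in one cycle."""
    outgoing = nxt.output          # old NXT value, free to go elsewhere
    top.capture(alu_result)        # phase 1: both registers sample inputs
    nxt.capture(top.output)        # top.output is still the old TOP value
    top.release()                  # phase 2: new values become visible
    nxt.release()
    return outgoing
```

Because NXT samples the output-side latch of TOP while TOP is only loading its input-side latch, the old TOP value is not lost, which is the behaviour the text attributes to the dual latch.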

In contrast to the timing control of a traditional computer, the present invention adopts a timing-free control scheme.

The combinational logic control component of a traditional computer contains three parts: the instruction decoder, the timing generator and the combinational logic decoding circuit. The combinational logic decoding circuit takes the timing signals of the timing generator, combines them with the requirements of the current instruction and the state feedback control signals, and produces a series of micro-operation control signals that control the components so as to carry out the operations specified by the instruction. In such a machine, an instruction comprises several CPU cycles and a CPU cycle comprises several clock pulses. The combinational logic decoding circuit places micro-operation control signals that act in parallel in the same clock pulse, and those that must act in sequence in different clock pulses.

The combinational decoding logic in the architecture of the present invention needs no such timing generator. Instead, the single clock pulse is delayed and decoded to produce two secondary clock control signals; these two clock signals control the micro-operations that must occur in sequence, and the data paths and components are designed so that most operations complete without any timing sequence. Although some internal micro-operations still unavoidably occur one after another, sequencing within one clock cycle is a different notion from that of a timing generator. In this sense, since dependent micro-operations complete within a single clock cycle, timing-free control can be said to have been achieved.

Fig. 16 shows the waveform relationship between the clock pulse CLK and the two secondary clock pulses CP₁ and CP₂. The main function of CP₁ and CP₂ is to control the instants at which the dual latches switch, so that the two latches of a register are never open at the same time, guaranteeing reliable reading and writing of the register.

The architecture of the present invention implements the FORTH program control and structuring instructions: IF, ELSE, DO ... +LOOP, BEGIN, UNTIL, REPEAT, WHILE and FOR ... NEXT. This greatly increases the execution speed of FORTH.

The structure of each component in the architecture of the present invention is explained below with reference to the accompanying drawings.

FORTH operates most frequently on the top, second and third elements of the parameter stack, so the present invention provides three registers inside the CPU: the TOP register, the NXT register and the SD register. Operations on the top, second and third elements of the parameter stack thus become register operations, which greatly improves the efficiency with which the computer executes FORTH. Of course, the CPU is not limited to three such registers; four, five or in general n registers could be provided. Three is used here only as an illustration.

These registers use the dual latch structure. When a value is pushed into the top-of-stack register TOP, the previous value of TOP is transferred to the second register NXT, the previous value of NXT is transferred to the SD register, and the previous value of the SD register is transferred to the parameter stack storage area, which completes the push. Popping a value from the parameter stack proceeds in exactly the reverse order.

Fig. 3 and Fig. 4 show the structure of the two stack management components FSD and FRD respectively; the relationship among the registers TOP, NXT and SD can also be seen there.

The parts shown in Fig. 3 are as follows:

WES: the read/write signal line of the parameter stack storage area; it also controls the tri-state gate SD(PAD);

SDBUS: the 16-bit bidirectional data bus (external bus) of the parameter stack storage area; all pushed and popped data pass over this bus;

SDA: an internal 16-bit transmission line over which data read from the parameter stack storage area are delivered to the dual latch register SD;

SDB: an internal 16-bit transmission line over which data to be written to the parameter stack are delivered to the SDBUS bus;

SPBUS: the 10-bit address bus (external bus) of the parameter stack storage area; it determines which cell of the stack storage area is read or written;

SPEN: the external address enable line; it controls whether the content of SP is sent to SPBUS;

SP(PAD): a tri-state gate;

SP: the internal stack storage area address bus; its value is determined by the output of the multiplexer MUX-SPL;

TD: the output of register TOP or D, i.e. the output of the multiplexer MUX-TD in Fig. 5;

SPL: the parameter stack pointer register, also of dual latch structure; LN1SPL and LN2SPL are its control signals, and its output is sent to the multiplexer MUX-Y of Fig. 7;

SPAS: the add-1/subtract-1 unit for the parameter stack pointer SP; CSPAS is its control signal;

NXT: the 16-bit output signal of register NXT; see Fig. 5;

SD: the SD register, of dual latch structure; LN1SD and LN2SD are its control signals, and its output is sent to the multiplexer MUX-N in Fig. 5.

As can be seen from Fig. 3, the multiplexer MUX-SD, under the control of the micro-operation control signal MSD, selects either TD or NXT as the data to be written to the parameter stack. The multiplexer MUX-SPL selects one of AS, SPL and TD as the address for a parameter stack write or read. AS is the content of SPL plus 1 or minus 1; the default output of MUX-SPL is AS, with the value SPL minus 1; and TD is the low 10 bits of the TOP register output.

There are two choices for the data written to the stack storage area, selected by the multiplexer MUX-SD: the output of the TOP or D register, or the output of the NXT register. A value read from the stack storage area is always sent to the SD register. The output of the SD register is one of the inputs of the second-element register NXT of the parameter stack, and the content of SD can also be selected by the multiplexer MUX-Y to take part in an ALU operation (see Fig. 7).

Read and write operations on the parameter stack proceed as follows:

TOP, NXT and SD hold the top, second and third elements of the stack respectively, and the content of the actual top cell of the stack storage area is identical to the SD register. A push or pop therefore actually involves transfers among the three registers, while the SD register is kept consistent with the actual top cell of the stack storage area.

If a new value V_X is to be pushed (see Fig. 3):

V_X is delivered to TOP; at the same time the previous value of TOP is delivered to NXT; at the same time the previous value of NXT is delivered through MUX-SD to SDB and, when WES is low (tri-state gate conducting), to the SDBUS bus, and also, via the SDBUS and SDA buses, to the SD register.

The write address, AS, is gated by MUX-SPL onto SPBUS, and the write to the stack completes when WES is active. After the operation, the SP value is loaded into the SPL register to update the stack pointer, i.e. AS = SPL - 1 is delivered to SPL.

If the content V_X of the TOP register is to be popped, the stack operation proceeds as follows (see Fig. 3):

The value V_X is taken from TOP; at the same time the NXT value is sent to TOP and the SD content is sent to the NXT register; at the same time CSPAS = 1 makes AS = SPL + 1, and MUX-SPL selects AS, which is delivered through SP to SPBUS as the address of the stack read. When WES is 1 the read is enabled; the data are delivered to SDBUS and passed back to the SD register. After the read completes, the value of SP is loaded into the register SPL, which completes the pointer update SPL + 1 → SPL.
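The push and pop sequences above can be summarized in a behavioural model. The Python sketch below is an illustration under assumptions: the class name ParameterStack and its attribute names are hypothetical, the stack memory is the 1K × 16-bit area mentioned earlier, addresses wrap modulo 1024, and the transfers that the hardware performs simultaneously are written here as sequential assignments on saved old values. Bus signals such as WES and SPEN are not modelled.

```python
class ParameterStack:
    """TOP, NXT and SD registers in front of a 1K x 16-bit stack memory."""

    DEPTH = 1024  # 10-bit stack address

    def __init__(self):
        self.memory = [0] * self.DEPTH
        self.spl = 0                      # stack pointer register SPL
        self.top = self.nxt = self.sd = 0

    def push(self, value):
        address = (self.spl - 1) % self.DEPTH   # AS = SPL - 1
        old_top, old_nxt = self.top, self.nxt
        self.top = value & 0xFFFF               # new value -> TOP
        self.nxt = old_top                      # old TOP -> NXT
        self.memory[address] = old_nxt          # old NXT -> stack memory
        self.sd = old_nxt                       # ... and -> SD at the same time
        self.spl = address                      # AS -> SPL

    def pop(self):
        address = (self.spl + 1) % self.DEPTH   # AS = SPL + 1
        value = self.top                        # take the value out of TOP
        self.top = self.nxt                     # NXT -> TOP
        self.nxt = self.sd                      # SD -> NXT
        self.sd = self.memory[address]          # stack memory -> SD
        self.spl = address                      # AS -> SPL
        return value
```

After every push or pop in this model, sd equals memory[spl], which corresponds to the statement that the SD register is kept identical to the actual top cell of the stack storage area. Note that TOP and NXT are updated without waiting for the memory access, which is the point made in the paragraphs that follow.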

In managing the parameter stack, a push must free the TOP register as soon as possible, so the content of the NXT register must be written to the stack storage area as soon as possible. The default address selected by the multiplexer MUX-SPL is therefore made to be exactly the address of the cell into which the content of the NXT register will be written on a push. This is why the default state of MUX-SPL is set to SPL - 1 (SPL - 1 points to the free cell of the stack storage area).

By contrast, on a pop the operation that fetches the datum S1 from the stack storage area need not be particularly fast. All that is required is to deliver the content of the TOP register and update the contents of the TOP and NXT registers as soon as possible, and since these are register-to-register operations the requirement is met naturally. It is then only necessary for the S1 value to reach the SD register within the current instruction cycle. The SPL + 1 operation can thus be completed without haste, after which MUX-SPL gates the address and the read of the stack storage area is performed.

As can be seen, the delay of operations involving off-chip paths does not affect the speed of parameter stack operations here, and the speed required of the stack memory is not high (the same access bandwidth as main memory suffices). In terms of instruction execution, stack management here is fast and parallel, and resolves the conflict that off-chip paths supply data far more slowly than the CPU processes them.

Referring to Fig. 4, the return stack management component FRD is described as follows:

Two registers are provided inside the CPU as the top and second-from-top of the return stack. They are the RT register (20 bits) and the RD register (20 bits) respectively. The RT register is located in the instruction repeat control component FREPT shown in Fig. 6, but it could also be located in the return stack management component FRD. The return stack memory is 20 bits wide, in order to hold the 20-bit addresses needed for an addressing capability of 1M words. Fig. 4 also shows a 4-bit extension register ID; it has no substantial connection with the stack management component FRD and is only reserved for extending the external memory space.

The role of the return stack is to save and restore the breakpoint address on a word call: the value of the program counter PC is pushed onto the return stack to save it, and the return stack top value is popped and sent to the program counter PC to restore it.
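By way of illustration only, the save and restore of the breakpoint address may be sketched as below. The class and method names are hypothetical, a Python list stands in for the RT and RD registers and the return stack memory, and it is assumed that PC already holds the address at which execution should resume when the call is made.

```python
ADDR_MASK = 0xFFFFF  # 20-bit addresses, i.e. 1M words


class ReturnStackModel:
    """Toy model of word call and return through the return stack."""

    def __init__(self, pc=0):
        self.pc = pc & ADDR_MASK
        self.return_stack = []          # last element plays the role of RT

    def call(self, target):
        # Push the PC value (the breakpoint address), then jump to the word.
        self.return_stack.append(self.pc)
        self.pc = target & ADDR_MASK

    def ret(self):
        # Pop the return stack top value and send it to PC.
        self.pc = self.return_stack.pop()
```

Nested word calls simply push further breakpoint addresses, and returns unwind them in the reverse order.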

The data path between the return stack top register RT and the program counter PC is shown in the main memory address management component FPCA of Fig. 8. RT(L) and RT(H) are the low 16 bits and the high 4 bits of the 20-bit register respectively. How the output of the PC register is sent to the RT register can be seen in Fig. 6: PC(H) and PC(L) are fed to multiplexers MUX-RTH and MUX-RT respectively.

Fig. 5 shows the shift register component FTN of the structure of the present invention, which includes part of the control circuitry. The main role of the FTN component is to serve as the parameter stack top register and second-from-top register with shift capability. The parts of Fig. 5 are described in detail below.

TN is a dual-latch register with shift capability; its structure is shown in Fig. 17. LN1 and LN2 are the switching signals of the two latch stages. A barrel shifter is placed between the two latch stages; it can shift right by two bits, shift right by one bit, not shift, or shift left by one bit. Data entering the TOP register is passed to the second-stage latch through the barrel shifter, either shifted or unshifted.
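By way of illustration only, the four shifter settings may be sketched as follows. The function name and mode strings are hypothetical, a 16-bit word is assumed, and the shifts are assumed to be logical with zero fill; the text does not state what is shifted into the vacated bits.

```python
WORD_MASK = 0xFFFF  # 16-bit data word (assumed width of the shifted value)


def barrel_shift(value, mode):
    """Apply one of the four shifter settings to a 16-bit value."""
    value &= WORD_MASK
    if mode == "right2":
        return value >> 2
    if mode == "right1":
        return value >> 1
    if mode == "none":
        return value
    if mode == "left1":
        return (value << 1) & WORD_MASK   # the bit shifted out of bit 15 is dropped
    raise ValueError("unknown shift mode: " + mode)
```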

MUX-T and MUX-N are two multiplexers. The data written to the TOP register and to the NXT register each come from four sources; these two multiplexers select one source for each of the two registers.

MUX-TD is a multiplexer that selects between the output of the TOP register and the output of the D register (located in the main memory data port component FD). As can be seen from Fig. 1 and Fig. 2, the TOP register is the divergence point of the data paths in the architecture of the present invention, and the D register is the data register of the main memory data port; through the TD signal path, data fetched from main memory and data from the top of the parameter stack can be sent to each register.

TH and NH are 4-bit registers. When registers TOP and NXT are used to handle a 20-bit address value, TH and NH hold the high 4 bits of that 20-bit address.

MUX-TH and MUX-NH are the input multiplexers of the TH and NH registers respectively.

MUX-HAS and MUX-AS are the output multiplexers of the TH, NH registers and of the TOP, NXT registers respectively; they select whether (TH, TOP) or (NH, NXT) enters the 20-bit incrementer/decrementer TNAS.

The incrementer/decrementer TNAS adds 1 to or subtracts 1 from the parameter stack top or second-from-top item, an operation that occurs frequently in FORTH.
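By way of illustration only, the pairing of a 4-bit high register with a 16-bit low register, and the add-1/subtract-1 step applied to the resulting 20-bit value, may be sketched as follows. The function names are hypothetical, and wrap-around modulo 2^20 is an assumption; the text does not say how overflow is handled.

```python
def join20(high4, low16):
    """Combine (TH, TOP) or (NH, NXT) into one 20-bit address."""
    return ((high4 & 0xF) << 16) | (low16 & 0xFFFF)


def split20(addr):
    """Split a 20-bit address back into its high 4 and low 16 bits."""
    return (addr >> 16) & 0xF, addr & 0xFFFF


def tnas(high4, low16, delta):
    """Add 1 or subtract 1 (delta = +1 or -1) across the full 20 bits."""
    if delta not in (1, -1):
        raise ValueError("delta must be +1 or -1")
    return split20((join20(high4, low16) + delta) & 0xFFFFF)
```

The point of routing the pair through a 20-bit unit is that a carry or borrow out of the low 16 bits reaches the high 4 bits, as the first two assertions of a quick check such as `tnas(0x0, 0xFFFF, +1)` and `tnas(0x1, 0x0000, -1)` would show.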

The group of signal lines at the bottom of Fig. 5 are control signals sent from the decode control component FCC or feedback signals sent to FCC. The input signal lines at the top of the figure are as follows:

SD is the output signal line of the third parameter stack register SD; it can be fed to the NXT dual-latch register through multiplexer MUX-N;

D and ALU are the outputs of the data register D and of the arithmetic logic unit ALU respectively; either can be written to the parameter stack top register TOP as selected by multiplexer MUX-T;

IL(0-7) is the low 8 bits of the 16-bit instruction format. When the immediate carried in the low 8 bits of the instruction format is the high 4 bits of a 20-bit address, these high 4 address bits can be written through this path to the TH or NH register.

Among the signal lines in Fig. 6: NCC and EQN are status flags of the ALU result, NCC being the complement of the carry bit and EQN the complement of the "equal" state; DRX and DTX are signal lines, where DRX feeds the serial input signal received from outside by the serial/parallel I/O port FDS/P into the most significant bit of the TOP register, and DTX sends the least significant bit of the TOP register to the I/O port FDS/P for serial output; TOP and NXT are the output signals of the parameter stack top and second-from-top, where TOP is sent to the ALU input of the FALU component and to the FPCA component, and NXT is sent to three destinations, namely multiplexers MUX-Y and MUX-SD and the FPCA component; HCLK is an acceleration clock supplied from outside the CPU, which can speed up the multiplication, division, square-root and squaring algorithms.

The FTN component is one of the important components of the present invention. Its main functions are:

1. Serving as the top and second-from-top of the parameter stack, enabling flexible stack operations;

2. Serving as the divergence point of the data paths of the computer architecture;

3. Serving as a shift register: data written to the TOP and NXT registers can be shifted right by two bits, right by one bit, or left by one bit;

4. Serving as the serial/parallel and parallel/serial conversion circuit of the serial/parallel I/O port during serial input and output, enabling serial communication;

5. Serving as the two shift registers for multiplication and division and as the registers for the square-root operation;

6. Holding main memory addresses: the outputs of TOP and NXT are sent to the main memory address management component FPCA, so that main memory can be accessed with TOP or NXT as the address;

7. Apart from the CP1 and CP2 signals, which control the two latch stages of the TOP and NXT dual-latch registers and must occur in sequence, the other control signals sent from the FCC control component have no timing order, so the FTN component can operate in parallel with the other components.
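By way of illustration only, the serial/parallel conversion role of the TOP register may be sketched as a right shift in which the DRX bit enters the most significant bit and the least significant bit leaves on DTX, as the signal description above indicates. The function names are hypothetical, and the 16-bit word length and the least-significant-bit-first order on the line are assumptions.

```python
def serial_shift(top, drx_bit):
    """One shift step: DRX enters bit 15 of TOP, bit 0 of TOP leaves on DTX."""
    dtx_bit = top & 1
    top = ((top & 0xFFFF) >> 1) | ((drx_bit & 1) << 15)
    return top, dtx_bit


def receive_word(bits):
    """Serial-to-parallel: shift in 16 bits, least significant bit first."""
    top = 0
    for bit in bits:
        top, _ = serial_shift(top, bit)
    return top


def transmit_word(word):
    """Parallel-to-serial: shift out 16 bits, least significant bit first."""
    bits = []
    top = word & 0xFFFF
    for _ in range(16):
        top, bit = serial_shift(top, 0)
        bits.append(bit)
    return bits
```

Under these assumptions a word sent with `transmit_word` is recovered unchanged by `receive_word`.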

Fig. 6 shows the instruction repeat component FREPT of the structure of the present invention. This component is fairly simple: it consists of a 16-bit instruction repeat control register, a 20-bit return stack top register (RT, RTH) and an incrementer/decrementer. The RT register serves as the return stack top register; its output can be written to the return stack second-from-top register and to the program counter PC, and the input sources of the RT register are the return stack second-from-top register RD and the program counter PC. These signals can be seen at the inputs of MUX-RTH and MUX-RT in Fig. 6.

The instruction repeat control register REPT is a 16-bit dual latch. Its most significant bit is brought out separately as an input signal to the instruction decode control component FCC; when this bit is active (equal to zero), FCC generates the control signals needed to repeat an instruction. Each time the current instruction is executed, the incrementer/decrementer REPTIAS subtracts 1 from the value of the REPT register, until REPT=FFFFH.
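By way of illustration only, the repeat mechanism may be read as the loop below. This is one interpretation and an assumption: the instruction is executed while the most significant bit of REPT is zero, and REPT is decremented after each execution, so that counting stops when REPT wraps from 0000H to FFFFH. The function name is hypothetical.

```python
def repeat_count(rept_initial):
    """Count how often an instruction executes for a given initial REPT value."""
    rept = rept_initial & 0xFFFF
    executions = 0
    while (rept & 0x8000) == 0:        # most significant bit zero: repeat is active
        executions += 1                # the current instruction executes once
        rept = (rept - 1) & 0xFFFF     # REPT is decremented by 1
    return executions
```

Under this reading, an initial value of n below 8000H yields n+1 executions, so a value of 7 would give eight executions.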

Fig. 7 shows the arithmetic component FALU of the architecture of the present invention, which consists of four parts:

MUX-Y is a multiplexer; it can select among 24 input signals the one that takes part in the ALU operation;

ALU is the arithmetic logic unit. It performs 9 arithmetic and logic operations, namely T+Y, T+Y_O, T-Y, T-Y_O, Y-T, Y-T_C, AND, OR and NOT; 5 FORTH words, namely NIPDUP, SWAP, NIPOVER, DROPDUP and DROPOVER; and two further operations (ALU=T and ALU=T);

COM is a comparator; it compares the value of the RD register with the output value of the ALU and produces a status signal GEQ, which is sent to the FCC component to alter the control of certain operations;

MSR is the hard-wired square-root algorithm; implementing the algorithm in hardware greatly increases the speed of the square-root operation.

Using the FTN and ALU components, the structure of the present invention can perform multiplication, division and square-root operations on two 16-bit numbers. Multiplication and division take 16 and 17 clock cycles respectively, and the square-root operation takes only 8 clock cycles.

The hardware design principle of the square-root operation is shown in Fig. 12. In the figure, TOP, NXT and SR are dual-latch registers; TOP holds the radicand, NXT holds the root-extraction operand, and SR has an initial value of zero and a final value equal to the square root of the data in the TOP register. The circuit of Fig. 12 mainly implements the square-root algorithm shown in Fig. 18; it completes one pass of the process of Fig. 18 in a single cycle and, under control of the REPT register, loops 8 times, so that the square-root operation is completed in 8 clock cycles.
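By way of illustration only, one well-known method that yields the square root of a 16-bit number in exactly eight identical steps is the digit-by-digit (restoring) binary square root sketched below. Figs. 12 and 18 are not reproduced here, so this sketch is a stand-in and is not asserted to be the algorithm of Fig. 18; the function and variable names are hypothetical.

```python
def isqrt16(n):
    """Integer square root of a 16-bit value in eight identical steps."""
    n &= 0xFFFF
    root = 0       # partial root, gains one bit per step
    rem = 0        # partial remainder
    for _ in range(8):
        # Bring down the next two radicand bits, most significant pair first.
        rem = (rem << 2) | ((n >> 14) & 0x3)
        n = (n << 2) & 0xFFFF
        trial = (root << 2) | 1        # the value 4*root + 1
        root <<= 1
        if rem >= trial:               # the trial subtraction succeeds
            rem -= trial
            root |= 1                  # so the new root bit is 1
    return root
```

Each pass consumes two bits of the radicand and produces one bit of the root, which is why a 16-bit radicand needs eight passes; a single repeated step of this kind fits the scheme of repeating one ALU instruction under control of REPT.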

The distinctive feature of the square-root hardware circuit is that it uses the instruction repeat control register REPT to perform the square-root operation by repeating a single ALU instruction. The square-root algorithm can thus reuse existing hardware resources such as the ALU and the TOP and NXT registers, which saves hardware and makes it feasible to implement a square-root hardware circuit within a single CPU.

The multiplication and division algorithms in the architecture of the present invention, and the design of their hardware circuits, are broadly the same as in ordinary computers. The difference is that ordinary designs use a dedicated adder for multiplication and division, rather than the ALU as the present invention does. Because the present invention can use the instruction repeat control register to repeat a single ALU instruction to perform multiplication and division, it saves hardware.
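By way of illustration only, a 16-step multiplication built from one repeated add-and-shift step may be sketched as follows. This is the conventional shift-and-add method, consistent with the statement that the design resembles that of ordinary computers; the assignment of the accumulator and multiplier to particular registers is an assumption, and the names are hypothetical.

```python
def multiply16(a, b):
    """Unsigned 16 x 16 -> 32-bit product by repeating one step 16 times."""
    a &= 0xFFFF
    acc = 0            # high half of the running product
    mq = b & 0xFFFF    # multiplier, gradually replaced by the low half of the product
    for _ in range(16):
        if mq & 1:                     # low multiplier bit set: add the multiplicand
            acc += a                   # may temporarily need 17 bits
        # Shift the (acc, mq) pair right by one bit as a single 33-bit value.
        mq = (mq >> 1) | ((acc & 1) << 15)
        acc >>= 1
    return (acc << 16) | mq
```

The loop body is the same in every pass, so it corresponds to one instruction repeated sixteen times, in line with the 16 clock cycles stated for multiplication.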

Fig. 8 shows the structure of the main memory address management component FPCA. FPCA consists of three parts: the program counter PC (20 bits); the address offset adder PCAADD; and the multiplexers MUX-AL and MUX-AH.

ABUS in Fig. 8 is the main memory address bus (20 bits); A(PAD) is a tri-state gate; A is the 20-bit address line, connected to ABUS through the tri-state gate; A(H) is the high 4 bits of the 20-bit address line; A(L) is the low 16 bits of the 20-bit address line; IL(0-14) is the immediate carried in the low 15 bits of the instruction format; TOP20 is the output signal of the TOP and TH registers; NXT20 is the output signal of the NXT and NH registers; RT20 is the output signal of the return stack top register.

Through multiplexers MUX-AH and MUX-AL, the FPCA component selects one of eight address sources as the address for accessing main memory; at the same time the address offset adder PCAADD updates the pointer, which is then stored in the PC register. The distinctive feature of this component is that it departs from the traditional design in which the program counter PC supplies the address directly to the address bus.

Fig. 9 shows the data port component FD of the architecture of the present invention. It consists of two registers and a multiplexer: the D register holds data read from main memory or to be written to main memory, and the IL register holds instructions fetched from main memory; multiplexer MUX-DIO selects one of 4 data sources to send to the main memory data bus dbus. WED in the figure is the read/write control signal.

Fig. 10 shows the 8-bit serial/parallel I/O port component FDS/P of the architecture of the present invention. It supports parallel and serial I/O communication as well as bit-mask control. In Fig. 10, XSP is an 8-bit register that sets the FDS/P port to serial or parallel operation; XIO is an 8-bit register that sets the FDS/P port to the input state or the output state; XBS is an 8-bit register that masks any subset or all of the 8 I/O lines of the FDS/P port; XBSDR is the bit-mask decoder driver, which decodes bits 5 to 7 of the instruction to override the bit-mask state set in the XBS register; DS/P contains a data register and the decoding circuitry for the XBS, XIO and XSP signals, implementing the control functions of XBS, XIO and XSP.

Fig. 11 shows the parallel I/O port FP of the structure of the present invention. It is a 16-bit port with a very simple structure, consisting of three parts: a bit-mask register BBS, which has the same function as XBS in the FDS/P port; an input/output direction register BIO, which has the same function as XIO in the FDS/P port; and a component P, which contains a data register and the decode control circuitry for BBS and BIO, implementing the I/O direction control and bit-mask functions of BBS and BIO.
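By way of illustration only, the combined effect of a direction register and a bit-mask register on a port write may be sketched as follows. The bit polarities are assumptions, since the text does not define them: a 1 in the direction value is taken to mean "output" and a 1 in the mask value is taken to mean "line masked". The function name is hypothetical.

```python
def port_write(pins, data, direction, mask, width=16):
    """Return the new line levels after writing data to a masked I/O port."""
    all_bits = (1 << width) - 1
    # Only lines configured as outputs and not masked are driven by the write.
    driven = direction & ~mask & all_bits
    return (pins & ~driven & all_bits) | (data & driven)
```

With `width=16` the sketch corresponds to the FP port with BIO and BBS; with `width=8` the same logic would apply to the parallel mode of the FDS/P port with XIO and XBS.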

Claims

1. A computer architecture designed using macro techniques to directly support a high-level computer language, comprising:

A first storing apparatus, used to hold the data prepared for an operation, the intermediate results of the operation and the final result of the completed operation, and operated as a stack;

A management device of the first storing apparatus, which performs write and read operations on the first storing apparatus according to the micro-operation control signals produced by instruction decoding;

A second storing apparatus, used to hold the breakpoint addresses of subroutine calls and certain necessary data, and operated as a stack;

A management device of the second storing apparatus, which performs write and read operations on the second storing apparatus according to the micro-operation control signals produced by instruction decoding;

A main memory storage apparatus, used to hold the system software, instructions and user programs of the computer;

A management device of the main memory storage apparatus, used to form the addresses for accessing the main memory storage apparatus;

A combinational logic decoding control device, which decodes the instructions fetched from the main memory storage apparatus and produces a plurality of micro-operation signals;

The computer architecture being characterized in that it further comprises:

A main memory storage apparatus data port, used to receive instructions or data read from the main memory storage apparatus and to prepare data for write operations to the main memory storage apparatus;

An arithmetic unit, comprising a first multiplexer (MUX-Y) and an arithmetic logical operation device that performs multiple arithmetic and logic operations, the multiplexer selecting the content of one of a plurality of registers in the first storing apparatus management device, the second storing apparatus management device, the main memory storage apparatus management device and the main memory storage apparatus data port to enter the arithmetic logical operation device and take part in the operation;

A shift register device, comprising at least a first register (TOP) and a second register (NXT) connected to it, wherein the first register receives the operation result from the arithmetic logical operation device; the contents of the first register and the second register serve respectively as the last item and the second-to-last item of the content of the first storing apparatus; the outputs of the first register and the second register are connected to the input of the arithmetic logical operation device so as to take part in operations; and the first register, as the divergence point of the data paths in the architecture, is connected to a plurality of registers in the first storing apparatus management device, the second storing apparatus management device, the main memory storage apparatus management device, the main memory storage apparatus data port and the arithmetic unit.

2. The computer architecture as claimed in claim 1, further comprising a third register (SD) connected to the second register and to the first storing apparatus respectively, wherein the content of the third register serves as the third-to-last item of the content of the first storing apparatus and corresponds to the last item actually held in the first storing apparatus; content written from the second register to the first storing apparatus and content read from the first storing apparatus both pass through the third register; and the third register is also connected to the arithmetic logical operation device so that it can take part in operations.

3. The computer architecture as claimed in claim 2, wherein the first storing apparatus management device further comprises a second multiplexer (MUX-SD), which selects one of the output values of the first register and the second register as the content written to the first storing apparatus and to the third register, so that the first storing apparatus has two data sources, namely the first register and the second register, and the content of the third register is guaranteed to be identical to the last item actually held in the first storing apparatus.

4. The computer architecture as claimed in claim 3, wherein each of the first register, the second register and the third register is formed of a dual latch, each latch of which receives a control signal from the combinational logic decoding control device, so that input to and output from the first register, the second register and the third register can take place simultaneously.