CN1664775A

CN1664775A - Data bypass technology in a digital signal processor

Info

Publication number: CN1664775A
Application number: CN 200410016756
Authority: CN
Inventors: 陈晓毅; 刘鹏; 姚庆栋; 李东晓; 俞国军
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2004-03-03
Filing date: 2004-03-03
Publication date: 2005-09-07
Anticipated expiration: 2024-03-03
Also published as: CN100514281C

Abstract

This invention relates to microprocessors and computer systems and provides a memory-oriented digital signal processor (DSP) architecture, in particular a new data bypass technique. Its circuit implements six ways of data forwarding: four ways select in parallel, by priority, among eleven data sources, and two ways select among three data sources. The invention reduces hazard-induced pipeline stalls and improves real-time processing capability. In the new six-stage pipeline structure, the DSP data bypass technique applies parallel processing to the four critical ways, whereas the conventional approach needs ten data selectors per way to select the data serially.

Description

Data bypass technology in a digital signal processor

Technical field

The present invention relates to microprocessors and computer systems. More specifically, it relates to a digital signal processor (DSP) architecture, and in particular to data bypass technology in a memory-oriented digital signal processor.

Background technology

With the development of modern microelectronics and growing demand in practical applications, memory-oriented digital signal processors have become increasingly popular. Their most distinctive feature is that two data items can be fetched simultaneously from the on-chip data memory in one clock cycle for logic, arithmetic, and other operations. Because electronic equipment is trending toward personalization and customization, digital signal processors must pursue ever higher operating speeds; under given process constraints, a multi-stage pipeline structure has become one means of overcoming the clock bottleneck. The working principle of the data bypass circuit (BPU) in a digital signal processor is illustrated below, taking a 6-stage pipeline structure as an example.

The working principle of the data bypass circuit in the pipeline of the digital signal processor architecture is described with reference to Fig. 1.

Stage 1, instruction fetch (IF stage): program counter 101 supplies a virtual address S1 to on-chip instruction memory 102, which outputs a 32-bit instruction S2 from the corresponding address. Depending on the case, this instruction may be a register-oriented instruction (basic instruction) or a memory-oriented DSP instruction (DSP instruction). A basic instruction vector contains the general-purpose register addresses (rs, rt), the destination address (rd), and the operation control code (op). A DSP instruction vector contains the addresses of the destination register (rd), two address registers (ARm, ARn), and two index registers (IR0, IR1), together with the addressing mode code (mode) and the operation control code (op).

Stage 2, instruction decode (ID stage): the instruction vector S2 is delayed one clock cycle by interface register 103 and then passed from that register's output 103a to instruction decoder 105. Decoder 105 decodes the incoming instruction vector and outputs control code S3 to the control module, register access addresses S5 to the register file, the current read-register addresses S6 to the data bypass circuit, and the control and data signals S4 needed by later stages to interface register 108. According to the access addresses S5, the register file outputs the corresponding data S7 to data bypass circuit 107; signal S7 comprises six 32-bit values: rs, rt, ARm, ARn, IR0, and IR1. Based on the dependency and priority relationships, the data bypass circuit selects the six corresponding values S8 to S13 from the signal set S22, S23, S27, S31, S33, S7 and sends them to interface register 108 to be latched.

Stage 3, address computation (DA stage): the two address logic units (DALU) 109 and 110 perform the operations specified by operation control codes S20 and S21 on operands S10a, S12a, S11a, and S13a, and use the results S22 and S23 to access the two on-chip data memories 111 and 112. At the same time, these values are sent to interface register 113 to be latched and are fed back to data bypass circuit 107 as bypass data sources. The address registers are updated according to the addressing mode code, and their updated values S35 and S36 are also latched into 113.

Stage 4, memory access (DM stage): according to addresses S22 and S23, the on-chip data memories output the two corresponding data items S25 and S26 after a certain access time, and these are latched into interface register 114 on the clock. Control signal S24 is also latched into 114.

Stage 5, execute (EX stage): the data logic unit (ALU) 115 performs the arithmetic operation specified by operation control code S15 on operands S17 and S18, and the result S16 is input to interface register 116.

Stage 6, write-back (WB stage): the aggregate signal S33 held in interface register 116, comprising the data logic operation result, the two address registers, the two index registers, and the memory content read from one of the dual-port on-chip data memories, is written to register file 106.

The operation of the above 6-stage pipeline structure can be observed in Table 1 below.

Table 1

Cycle	Instruction fetch	Decode	Address computation	Memory access	Execute	Write-back
0	Instruction 1
1	Instruction 2	Instruction 1
2	Instruction 3	Instruction 2	Instruction 1
3	Instruction 4	Instruction 3	Instruction 2	Instruction 1
4	Instruction 5	Instruction 4	Instruction 3	Instruction 2	Instruction 1
5	Instruction 6	Instruction 5	Instruction 4	Instruction 3	Instruction 2	Instruction 1
6	Instruction 7	Instruction 6	Instruction 5	Instruction 4	Instruction 3	Instruction 2

In cycle 0, instruction 1 is in the instruction fetch stage and is fetched from on-chip instruction memory 102. In cycle 1, instruction 1 moves to the decode stage, and the register values are read using the register addresses specified in instruction 1; meanwhile, instruction 2 enters the instruction fetch stage. In cycle 2, instruction 1 enters the address computation stage, and the values of the address and index registers are supplied to address logic units 109 and 110, which operate on them according to the control code; instruction 2 moves into the decode stage, and instruction 3 enters the instruction fetch stage. In cycle 3, instruction 1 enters the memory access stage, and the read results S25 and S26 are latched into interface register 114 to await the next operation; the following instructions each advance one stage, and the new instruction 4 enters the instruction fetch stage. In cycle 4, instruction 1 enters the execute stage, where the two data items S17 and S18 are operated on according to control code S15; the following instructions each advance one stage, and the new instruction 5 enters the instruction fetch stage. Finally, in cycle 5, instruction 1 enters the write-back stage, and the result of data logic unit 115, the address registers, the index registers, and the memory content read from one of the dual-port on-chip data memories are written to the register file. The following instructions each advance one stage, and the new instruction 6 enters the instruction fetch stage.
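
The cycle-by-cycle walk above can be reproduced with a small model. The following is a minimal sketch in Python, assuming an ideal pipeline with no stalls; the function name `pipeline_table` and the `STAGES` list are illustrative and do not come from the patent.

```python
STAGES = ["IF", "ID", "DA", "DM", "EX", "WB"]  # the six stages named in the text

def pipeline_table(num_cycles):
    """Return, per cycle, the instruction number in each stage (None = empty).

    Assumes an ideal pipeline: instruction k (1-based) enters IF in cycle k-1
    and advances exactly one stage per cycle, with no stalls.
    """
    table = []
    for cycle in range(num_cycles):
        row = []
        for depth in range(len(STAGES)):
            instr = cycle - depth + 1  # instruction occupying this stage
            row.append(instr if instr >= 1 else None)
        table.append(row)
    return table

if __name__ == "__main__":
    print("Cycle", *STAGES, sep="\t")
    for cycle, row in enumerate(pipeline_table(7)):
        cells = ["-" if i is None else f"I{i}" for i in row]
        print(cycle, *cells, sep="\t")
```

Running the script prints one row per cycle in the same column order as Table 1, which makes it easy to see that instruction 1 reaches the write-back stage only in cycle 5.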

As the pipeline chart in Table 1 shows, the result of instruction 1 is written to register file 106 only after the operands of instructions 2, 3, 4, and 5 have already been read. If instruction 2, 3, 4, or 5 needs the result of instruction 1 when reading operands or executing, that result must be forwarded in advance to bypass circuit 107 in the ID stage; otherwise a data hazard occurs and the pipeline must stall and wait. For example, if instruction 2 uses the data logic unit result of instruction 1, the pipeline must stall until cycle 5, when instruction 1 writes its result and the result is bypassed to module 107.

Consider the following program fragment:

ADD ARm, B, C    # B+C; the result is stored in address register ARm

SUB E, [ARm], D  # [ARm]-D; the result is stored in register E. [ARm] denotes an access to on-chip data memory 111 at the address held in ARm.

Instruction N1

Instruction N2

The execution order of the above program fragment can be represented by the pipeline chart in Table 2 below.

As Table 2 shows, the result of B+C is to be written to address register ARm, but the SUB instruction needs ARm to access on-chip data memory 111. The pipeline must therefore stall and wait until the ARm value is being written, at which point the value is simultaneously bypassed to the ID stage; otherwise the expected result cannot be obtained.

In reduced instruction set (RISC) processors, the pipeline structure differs considerably from that of a digital signal processor: the execute stage of a RISC processor usually precedes memory access, and the general structure is fetch, decode, execute, memory access, and write-back. Moreover, RISC processors are register-oriented, and their two operands can only be two registers, so their data bypass circuits are much simpler than those in digital signal processors. A digital signal processor has address registers, index registers, base registers, and so on, may use 4 register values at the same time, and has a deeper pipeline in order to raise the operating clock; the design of its data bypass circuit must therefore also meet timing-delay requirements. To reflect the real-time processing characteristic, pipeline stalls should be reduced as far as possible. In summary, the design of the data bypass circuit plays a key role in a digital signal processor.

At present, the data bypass of common digital signal processor pipeline structures uses serial data address conflict detection and serial data selection.

Table 2

Cycle	Instruction fetch	Decode	Address computation	Memory access	Execute	Write-back
0	ADD ARm, B, C
1	SUB E, [ARm], D	ADD ARm, B, C
2	Instruction N1	SUB E, [ARm], D	ADD ARm, B, C
3	Instruction N1	SUB E, [ARm], D	NOP	ADD ARm, B, C
4	Instruction N1	SUB E, [ARm], D	NOP	NOP	ADD ARm, B, C
5	Instruction N1	SUB E, [ARm], D	NOP	NOP	NOP	ADD ARm, B, C
6	Instruction N2	Instruction N1	SUB E, [ARm], D	NOP	NOP	NOP
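
The number of bubbles in Table 2 can be counted with a short sketch. It assumes that a dependent instruction waits in the ID stage until its producer reaches the stage whose value is fed to the ID-stage bypass circuit; the function `stall_cycles` and its parameters are hypothetical names introduced only for this illustration.

```python
STAGES = ["IF", "ID", "DA", "DM", "EX", "WB"]

def stall_cycles(forward_stage, distance=1):
    """Bubbles a dependent instruction spends waiting in the ID stage.

    forward_stage: stage of the producer whose value is forwarded to the
                   ID-stage bypass circuit (for example "WB" in Table 2).
    distance:      how many instructions after the producer the consumer
                   is issued (1 = immediately following).
    Assumption: without a stall, the consumer is in ID while the producer
    is `distance` stages past ID.
    """
    producer_stage_index = STAGES.index("ID") + distance
    wait = STAGES.index(forward_stage) - producer_stage_index
    return max(wait, 0)

if __name__ == "__main__":
    print(stall_cycles("WB"))  # forwarding only from write-back, as in Table 2
    print(stall_cycles("DA"))  # forwarding already from address computation
```

Under these assumptions, forwarding only from the write-back stage costs the three NOP bubbles visible in Table 2, while forwarding from the address computation stage costs none for a back-to-back pair.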

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art and to provide a digital signal processor (DSP) architecture, in particular data bypass technology in a memory-oriented digital signal processor.

To solve the above technical problem, the present invention is implemented through the following technical solutions:

The present invention proposes a new data bypass technique. Its circuit implements 6 ways of data forwarding, of which 4 ways perform prioritized parallel data selection over 11 data sources and 2 ways perform prioritized parallel data selection over 3 data sources.

The digital signal processor pipeline in the present invention uses six pipeline stages: instruction fetch, decode, address computation, memory access, execute, and write-back.

The data bypass technique of the present invention consists of two parts. One part is parallel data address conflict detection, comprising ten 5-bit address comparators; the other is prioritized data selection, which selects among the 11 data sources by priority according to the results of the parallel data address conflict detection.

In the data bypass technique of the present invention, 10 of the 11 data sources of the 4 ways are shared. They are: the address register update values from the address computation stage that have not passed through a latch; the address register latched values from the memory access stage; the address register latched values from the execute stage; the address register latched values from the write-back stage; the latched data logic result; and the memory content read from one of the dual-port on-chip data memories. The unshared data source is the register value that each way reads from the register file. Data from a stage nearer the front of the pipeline has higher priority, and the address register update values of the address computation stage have the highest priority. In the 2 ways, 2 of the 3 data sources are shared: the latched data logic result from the write-back stage and the memory content read from one of the dual-port on-chip data memories. The unshared source is the register value that each way reads from the register file; data from a stage nearer the front of the pipeline has higher priority.

The data bypass technique of the present invention has 8 address registers, which occupy positions 8 to 15 of the register file of thirty-two 32-bit registers and can also be used as general-purpose registers, that is, they are multiplexed. The index register addresses are fixed at positions 24 and 25 of the 32 and are likewise multiplexed with general-purpose registers.

The parallel data address conflict detection circuit of the present invention comprises the first to fourteenth comparators (CMP1, CMP2, ..., CMP14). The prioritized data selection circuit comprises the first to fifth NOT gates (NO1, NO2, NO3, NO4, NO5), the first to eighteenth AND gates (AND1, AND2, ..., AND18), the first to eighth NOR gates (NOR1, NOR2, ..., NOR8), the first to eighth OR gates (OR1, OR2, ..., OR8), and one data selector (MUX1).

Compared with the prior art, the beneficial effects of the present invention are:

Hazard-induced pipeline stalls are reduced, delay is lowered, and the processor clock frequency is raised, thereby improving real-time processing capability. The data bypass technique designed for the 6-stage pipeline digital signal processor of the present invention applies parallel processing to all 4 critical ways, whereas the usual approach requires each way to perform serial data selection with 10 data selectors.

Description of drawings

Fig. 1 is a schematic diagram of the working principle of the 6-stage pipeline structure of the present invention.

Fig. 2 is a schematic diagram of the working principle of the data bypass circuit in the digital signal processor pipeline of one embodiment of the present invention.

Fig. 3 is a schematic diagram of the working principle of the data selection circuit that selects among 11 data sources by priority in the data bypass technique of the present invention.

Embodiment

The technical solution of the present invention is described in detail below with reference to specific embodiments:

The present invention proposes a new data bypass technique for use in a digital signal processor pipeline. It implements 6 ways of data forwarding, of which 4 ways perform prioritized parallel data bypass over 11 data sources and 2 ways perform prioritized parallel data bypass over 3 data sources. The 4 ways forward the data values of the rs, rt, ARm, and ARn registers respectively; the 2 ways forward the IR0 and IR1 register data.

The prioritized parallel data bypass circuit in the data bypass technique of the present invention consists of two parts: parallel data address conflict detection and prioritized data selection. Parallel data address conflict detection comprises ten 5-bit address comparators. The first 4 form one group and compare the first 4 address pairs in parallel; the next 4 form another group and compare the next 4 address pairs in parallel; the last two comparators compare the remaining 2 address pairs in parallel. Prioritized data selection chooses among the 11 data sources by priority according to the results of the parallel data address conflict detection.
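
The conflict detection step can be modelled behaviourally. The sketch below is an assumption-laden illustration in Python, not the patent's circuit: the name `detect_conflicts`, the list-based interface, and the polarity of the write enable (True meaning the stage really writes its register) are all choices made for this example.

```python
ALL_ONES = 0xFFFFFFFF  # 32-bit all-ones mask used as a "match" result

def detect_conflicts(read_addr, stage_addrs, write_enables):
    """Model ten 5-bit address comparators evaluated side by side.

    read_addr:     5-bit register address decoded in the ID stage.
    stage_addrs:   ten 5-bit destination addresses fed back from later stages.
    write_enables: ten booleans; True means that stage writes its register
                   (this polarity is an assumption of the sketch).
    Returns ten 32-bit masks: all ones on a match, zero otherwise.
    """
    assert len(stage_addrs) == len(write_enables) == 10
    return [
        ALL_ONES if enabled and (addr & 0x1F) == (read_addr & 0x1F) else 0
        for addr, enabled in zip(stage_addrs, write_enables)
    ]
```

Each list element is computed independently of the others, which mirrors the point of the text: no comparison has to wait for the outcome of another.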

In the data bypass circuit of the present invention, 10 of the 11 data sources of the 4 ways are shared. They are: the updated values of ARm and ARn from the DA stage (not yet latched); the latched ARm and ARn values from the DM stage; the latched ARm and ARn values from the EX stage; the latched ARm and ARn values from the WB stage; the latched data logic result; and the memory content read from one of the dual-port on-chip data memories. The unshared source is the register value that each way reads from the register file. Data from a stage nearer the front of the pipeline has higher priority, so the updated ARm and ARn values of the DA stage have the highest priority. In the 2 ways, 2 of the 3 data sources are shared: the latched data logic result from the WB stage and the memory content read from one of the dual-port on-chip data memories. The unshared source is the register value that each way reads from the register file. Data from a stage nearer the front of the pipeline has higher priority.

There are 8 address registers in total in the present invention, occupying positions 8 to 15 of the register file of thirty-two 32-bit registers; they can also serve as general-purpose registers (rs, rt), that is, they are multiplexed. The index register addresses are fixed at positions 24 and 25 of the 32, and they too are multiplexed with rs and rt.
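
The register file layout described above can be captured in a few lines. This is only a sketch: the helper `register_roles` is a hypothetical name, and treating every one of the 32 registers as usable in the general-purpose role is an assumption, since the text states this only for the address and index registers.

```python
ADDRESS_REGS = range(8, 16)  # positions 8..15 double as address registers
INDEX_REGS = (24, 25)        # positions 24 and 25 double as index registers

def register_roles(index):
    """Return the roles that register `index` can play in the 32-entry file."""
    if not 0 <= index < 32:
        raise ValueError("register index must be in 0..31")
    roles = ["general"]  # assumed: any register may be named as rs or rt
    if index in ADDRESS_REGS:
        roles.append("address")
    if index in INDEX_REGS:
        roles.append("index")
    return roles
```

Because one physical register can be named in more than one role, the bypass circuit has to compare plain 5-bit register addresses rather than role-specific names.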

For example, suppose decoding shows that the current instruction reads the address register at position 14 of the 32-register file, and the parallel data address conflict detection circuit finds that it matches the 1st and 5th of the 11 data sources; that is, comparators 1 and 5 output 32-bit all-ones and the others output all-zeros. The prioritized data selection circuit then selects the 1st data source, because its priority is higher than that of the 5th, and the 5th value is ignored.
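
The outcome of this example can be checked with a behavioural model of prioritized selection. The sketch assumes the ten forwarded sources are listed in descending priority; `priority_select` is an illustrative name, and the Python loop only models the result, whereas the circuit described in the patent evaluates all candidates in parallel.

```python
def priority_select(matches, sources, register_value):
    """Pick the highest-priority matching source, else the register-file value.

    matches: ten booleans, index 0 having the highest priority (assumed order).
    sources: ten forwarded values in the same order.
    register_value: the eleventh source, read from the register file.
    """
    for hit, value in zip(matches, sources):
        if hit:
            return value
    return register_value

if __name__ == "__main__":
    hits = [i in (0, 4) for i in range(10)]  # sources 1 and 5 match
    values = [100 + i for i in range(10)]
    print(priority_select(hits, values, register_value=7))
```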

The technical solution of the present invention can be explained with Fig. 2 and Fig. 3; Fig. 2 is the overall diagram, and Fig. 3 details the component Prl_sel in Fig. 2. Referring first to Fig. 2, the input signals of the data bypass circuit include four decoded 5-bit register addresses from the ID stage, namely m1 (ID_Reg_Addr1[4:0]), m3 (ID_Reg_Addr2[4:0]), m5 (ID_Reg_ARm[4:0]), and m7 (ID_Reg_ARn[4:0]), and six 32-bit register values read from the register file, namely m2 (RF_Rd_Data1[31:0]), m4 (RF_Rd_Data2[31:0]), m6 (RF_Rd_ARm[31:0]), m8 (RF_Rd_ARn[31:0]), m9 (RF_Rd_IR0[31:0]), and m10 (RF_Rd_IR1[31:0]). The remaining signals come from the other stages, as follows:

From the DA stage: the addresses of ARm and ARn, their write enable signals, and the updated data, namely m35 (DA_ARm[4:0]), m36 (DA_ARm_wr), m37 (DA_ARm_din[31:0]), m38 (DA_ARn[4:0]), m39 (DA_ARn_wr), and m40 (DA_ARn_din[31:0]). A write enable signal, when active, indicates that the register is a destination register of the instruction.

From the DM stage: the addresses of ARm and ARn, their write enable signals, and the latched data, namely m11 (DA_DM_ARm[4:0]), m12 (DA_DM_ARm_wr), m13 (DA_DM_ARm_din[31:0]), m14 (DA_DM_ARn[4:0]), m15 (DA_DM_ARn_wr), and m16 (DA_DM_ARn_din[31:0]).

From the EX stage: the addresses of ARm and ARn, their write enable signals, and the latched data, namely m17 (DM_EX_ARm[4:0]), m18 (DM_EX_ARm_wr), m19 (DM_EX_ARm_din[31:0]), m20 (DM_EX_ARn[4:0]), m21 (DM_EX_ARn_wr), and m22 (DM_EX_ARn_din[31:0]).

From the WB stage: the latched data logic result, its register address, and its write enable signal, namely m25 (EX_WB_Dest1_din[31:0]), m23 (EX_WB_Dest1[4:0]), and m24 (EX_WB_Dest1_wr); the memory content read from one of the dual-port on-chip data memories, the register address to which it is to be written, and its write enable signal, namely m28 (EX_WB_Dest2_din[31:0]), m26 (EX_WB_Dest2[4:0]), and m27 (EX_WB_Dest2_wr); and the addresses of ARm and ARn, their write enable signals, and the latched data, namely m29 (EX_WB_ARm[4:0]), m30 (EX_WB_ARm_wr), m31 (EX_WB_ARm_din[31:0]), m32 (EX_WB_ARn[4:0]), m33 (EX_WB_ARn_wr), and m34 (EX_WB_ARn_din[31:0]).

Refer now to Fig. 3, which details modules 201, 202, 203, and 204 of Fig. 2; its function is parallel data conflict detection and prioritized data selection. Its input signals are the compared address n1; the 10 comparison addresses n2, n5, n8, n11, n14, n17, n20, n23, n26, n29; the 10 comparator enable signals n3, n6, n9, n12, n15, n18, n21, n24, n27, n30; and the 11 selectable data sources n4, n7, n10, n13, n16, n19, n22, n25, n28, n31, n32.

From the above signal definitions and Figs. 2 and 3, it can be seen that the rs, rt, ARm, and ARn data bypasses all instantiate the module Prl_sel (see Fig. 3). Their structures are very similar and differ only in the address being compared, so the bypass principle is explained using rs.

The address m1 of the rs register is output by decoding circuit 105 and fed to the Prl_sel module as signal n1. The value of the rs register is read from register file 106 at that address and fed to the Prl_sel module as signal n32. The other signals n2 to n31 correspond to m11 to m40; they are the data, addresses, and write enable signals from each stage.

Referring to Fig. 3, the compared address n1 is fed to all 10 comparators, and the other operand of each comparator is the address fed back from one of the stages. The write enable signal controls whether the comparator is enabled: if the enable signal is "0", the comparator operates normally; if it is "1", the comparator is disabled and always outputs "0". When a comparator finds its inputs equal, it outputs a 32-bit all-"1" signal; otherwise it outputs 32-bit all-"0". The 10 comparators can thus output the 10 comparison results cc1 to cc10 in parallel. The 10 results are divided into 3 groups: cc1, cc2, cc3, cc4 form the 1st group; cc5, cc6, cc7, cc8 form the 2nd group; cc9, cc10 form the 3rd group.

Consider first the 1st group. Within this group, result cc1 has the highest priority and cc4 the lowest. cc1 is ANDed with the input data n4 of its comparator to produce sc1. Suppose addresses n1 and n2 are equal; then cc1 is 32'hFFFFFFFF, and ANDing it with n4 yields n4, so sc1 = n4. At the same time, through NOT gate NO1 and NOR gates NOR1 and NOR2, cc1 forces the outputs sc2, sc3, and sc4 of AND gates AND2, AND3, and AND4 to all "0", so the results of comparators CMP2, CMP3, and CMP4 are ignored. OR gate OR1 then outputs tc1 equal to n4. This implements a prioritized 4-to-1 selection.

Similarly, if addresses n1 and n5 are found equal, cc2 is 32'hFFFFFFFF. For cc2 to take effect, cc1 must be 32'h00000000, since otherwise the higher priority of cc1 would cause the result of cc2 to be ignored. Through AND gate AND1, cc1 makes output sc1 all "0", and through NOR gates NOR1 and NOR2, cc2 makes outputs sc3 and sc4 all "0". The two inputs of AND gate AND2, the inverse of cc1 and cc2 itself, are both all "1", so sc2 outputs the value of n7, and OR gate OR1 outputs this group's selection result tc1 equal to n7, which again implements the data selection. The other comparison results obey the same logic.
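
The gate behaviour described for the first group of four comparators can be written with 32-bit bitwise operations. This is a minimal sketch under stated assumptions: the exact wiring of the NOT, NOR, AND, and OR gates is inferred from the prose rather than taken from Fig. 3, and `group_select` is a hypothetical name.

```python
MASK = 0xFFFFFFFF  # 32-bit all-ones

def group_select(cc, data):
    """One-hot prioritized select for a group of four comparators.

    cc:   four 32-bit comparator results (MASK on match, 0 otherwise);
          cc[0] has the highest priority.
    data: the four 32-bit candidate values paired with cc.
    Returns (tc, g): the selected value and the group's match flag.
    """
    not_cc1 = ~cc[0] & MASK                    # NOT gate on cc1
    nor_12 = ~(cc[0] | cc[1]) & MASK           # NOR of cc1, cc2
    nor_123 = ~(cc[0] | cc[1] | cc[2]) & MASK  # NOR of cc1, cc2, cc3
    sc1 = cc[0] & data[0]
    sc2 = not_cc1 & cc[1] & data[1]
    sc3 = nor_12 & cc[2] & data[2]
    sc4 = nor_123 & cc[3] & data[3]
    tc = sc1 | sc2 | sc3 | sc4                 # at most one term is non-zero
    g = cc[0] | cc[1] | cc[2] | cc[3]          # all ones if anything matched
    return tc, g
```

Because every sc term is masked by the inverse of all higher-priority matches, at most one term survives, so a plain OR is enough to merge them without a chain of multiplexers.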

The 2nd group works like the 1st. The two groups have the same structure and are peers; neither encodes a priority over the other internally. Their relative priority is implemented by data selector MUX1. If any of the conditions cc1, cc2, cc3, cc4 holds, their OR result g1 is all "1", and Dout1 selects the result tc1 of the 1st group. If g1 is "0", Dout1 selects the result tc2 of the 2nd group regardless of g2. This gives the 1st group priority over the 2nd. The combined match condition cfr of the 1st and 2nd groups is the OR of g1 and g2: cfr holds if any of cc1 to cc8 holds.
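
The role of MUX1 can be sketched in a few lines of Python. The function name `mux_groups` and the tuple return value are illustrative; the signal names tc1, tc2, g1, g2, Dout1, and cfr follow the text.

```python
MASK = 0xFFFFFFFF  # 32-bit all-ones

def mux_groups(tc1, g1, tc2, g2):
    """Combine two group results: the first group wins whenever it matched.

    tc1, tc2: values selected inside the first and second group.
    g1, g2:   32-bit match flags (MASK if any comparator in the group matched).
    Returns (dout1, cfr).
    """
    dout1 = tc1 if g1 == MASK else tc2  # MUX1: g2 is not consulted
    cfr = (g1 | g2) & MASK              # set when any of cc1..cc8 matched
    return dout1, cfr
```

When neither group matched, dout1 carries no meaningful value, which is why cfr is passed along as a separate validity flag.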

The 3rd group has only two comparators, CMP9 and CMP10, which produce the two results cc9 and cc10. As described earlier, these two conditions have the highest priority, but they depend on the updated values from address logic units 109 and 110, so their arrival delay is larger. While result Dout1 is being computed, results sc9 and sc10 are still being computed as well, and Dout1 is usually available before sc9 and sc10. Dout1 can therefore be treated as the 3rd data input of the 3rd group, with cfr as the 3rd comparison result. Under this assumption, the 3rd group works like the first two. If none of the conditions cc1 to cc10 holds, the instruction has no data to bypass for this register, so the value n32 read from the register file is selected and output as the final result.
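
The final selection step can be summarised behaviourally as follows. The sketch assumes that cc9 outranks cc10, which the text does not state, and uses the hypothetical name `final_select`; only the ordering "DA-stage updates, then Dout1, then the register file" is taken from the description.

```python
MASK = 0xFFFFFFFF  # 32-bit all-ones

def final_select(cc9, d9, cc10, d10, dout1, cfr, reg_value):
    """Last selection step: DA-stage updates, then Dout1, then register file.

    cc9, cc10: 32-bit match masks of the two DA-stage comparators.
    d9, d10:   the DA-stage update values paired with them.
    dout1/cfr: result and match flag of the first two groups.
    reg_value: n32, the value read from the register file.
    Assumption: cc9 is checked before cc10.
    """
    if cc9 == MASK:
        return d9
    if cc10 == MASK:
        return d10
    if cfr == MASK:
        return dout1
    return reg_value
```

Treating Dout1 as just another input of this last step lets the late-arriving DA-stage values join the decision at the end, instead of delaying the comparisons that are ready earlier.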

The bypass circuit for IR0 and IR1 is simpler than that for rs, rt, ARm, and ARn. One reason is that it has fewer data sources to select from, only 3, so the forwarding of these two registers can never become the critical path of the circuit. Taking IR0 as an example, since there are only 3 data sources, only 2 comparators, CMP11 and CMP12, are needed. The working principle is similar to that of the Prl_sel module, only simpler.
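
A three-source bypass of this kind is small enough to model directly. In the sketch below, the function `bypass_index_register` and its tuple arguments are hypothetical, and checking the data logic result before the memory value is an assumption, since the text does not spell out the order of the two forwarded sources.

```python
def bypass_index_register(ir_addr, reg_value, wb_dest1, wb_dest2):
    """Three-source bypass for one index register (IR0 or IR1).

    ir_addr:   register-file address of the index register (24 or 25).
    reg_value: value read from the register file.
    wb_dest1:  (addr, write_enable, value) of the latched data logic result.
    wb_dest2:  (addr, write_enable, value) of the latched memory read.
    Assumption: wb_dest1 is checked before wb_dest2.
    """
    for addr, write_enable, value in (wb_dest1, wb_dest2):
        if write_enable and addr == ir_addr:
            return value
    return reg_value
```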

As described above, compared with the background art, the present invention has the following beneficial effects: circuit delay is reduced, giving higher efficiency; the operating clock of a digital signal processor using the invention is raised accordingly; and the DSP characteristics are better realized.

The following are two application examples of the present invention.

Example 1: consider the following 2 instructions:

ADD A, B, C

SUB D, A, E

The first instruction performs B+C and stores the result in register A. The second instruction performs A-E and stores the result in register D. Because source register A of the second instruction uses the result of the first instruction, there is a data dependency between the two instructions. When the second instruction is in the decode stage, the control module determines the data conflict control information S35 and stalls the IF and ID stages of the pipeline while the other stages continue to execute. When the first instruction reaches the WB stage, the parallel data conflict detection in the data bypass circuit compares the address values: comparator CMP5 finds that the read address and the write address are identical, both being register A, so its output cc5 becomes all "1" and the prioritized data selection circuit outputs the value of register A. That is, output S8 of the data bypass circuit is the value of register A. The data bypass on this way is thus accomplished, and the pipeline then resumes execution.
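
The dependency in Example 1 can be found mechanically from the instruction text. The following sketch assumes the three-operand form `OP dst, src1, src2` used in the example; `parse` and `raw_dependences` are illustrative helpers, and the model ignores the case where an intervening instruction overwrites the same register.

```python
def parse(instr):
    """Split 'OP dst, src1, src2' into (op, dst, [src1, src2])."""
    op, operands = instr.split(None, 1)
    dst, *srcs = [field.strip() for field in operands.split(",")]
    return op, dst, srcs

def raw_dependences(program):
    """List (producer_index, consumer_index, register) read-after-write pairs."""
    parsed = [parse(line) for line in program]
    deps = []
    for j, (_, _, srcs) in enumerate(parsed):
        for i in range(j):
            dst = parsed[i][1]
            if dst in srcs:
                deps.append((i, j, dst))
    return deps

if __name__ == "__main__":
    print(raw_dependences(["ADD A, B, C", "SUB D, A, E"]))
```

For the two instructions of Example 1 the only pair reported is instruction 0 producing register A for instruction 1, which is the match that comparator CMP5 detects in hardware.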

Example 2: consider the following 3 instructions:

ADD ARm, B, C

SUB D, *ARm++(IR0), *ARn++(IR1)

ADD E, *ARm+(4), F

The first instruction is an addition that computes B+C and stores the result in address register ARm. The second instruction is more complex: it is a subtraction whose two operands are the memory contents at the addresses given by ARm+IR0 and ARn+IR1. At the same time, the ARm register is updated to ARm+IR0 and the ARn register to ARn+IR1. The result of the subtraction is stored in register D. The third instruction is again an addition: one operand is the memory content at address ARm+4, and the other comes from register F. In this example, the first and second instructions have a data dependency on ARm, because an operand of the second instruction depends on the result of the first. The second and third instructions also have a data dependency: the second instruction updates address register ARm and must change the corresponding value in register file 106, so ARm there is both a source and a destination operand, while the third instruction uses ARm as an address register for memory addressing, making it a source operand.
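
The chain of dependencies on ARm is easier to follow when the architectural effect of the three instructions is written out. The sketch below follows the wording of the text, in which the memory address used by the second instruction is the updated register value; the function `run_example`, the dictionary-based register file, and the dictionary-based memory are assumptions made for illustration only.

```python
def run_example(regs, mem):
    """Architectural effect of the three instructions in Example 2.

    regs: dict holding B, C, F, ARn, IR0, IR1 (ARm, D, E are written here).
    mem:  dict mapping address -> value, standing in for on-chip data memory.
    """
    # ADD ARm, B, C
    regs["ARm"] = regs["B"] + regs["C"]
    # SUB D, *ARm++(IR0), *ARn++(IR1): both address registers are updated
    regs["ARm"] += regs["IR0"]
    regs["ARn"] += regs["IR1"]
    regs["D"] = mem[regs["ARm"]] - mem[regs["ARn"]]
    # ADD E, *ARm+(4), F: ARm is only read here
    regs["E"] = mem[regs["ARm"] + 4] + regs["F"]
    return regs
```

Each of the three steps reads the ARm value left by the step before it, which is why the second and third instructions both need a forwarded copy of ARm rather than the stale value in the register file.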

When the pipeline brings the second instruction to the decoding circuit, the control circuit detects the data conflict and stalls the IF and ID stages while the other stages continue, until the first instruction reaches the write-back stage. At that point, comparator CMP5 in the third-way bypass module 203 of the data bypass circuit detects that the two addresses n1 (ID_Reg_ARm[4:0]) and n14 (EX_WB_Dest1[4:0]) are equal, while all other comparisons are unequal; module 203 therefore selects and outputs the value n16, which is the value of destination register ARm of the first instruction, accomplishing the data bypass. The pipeline then resumes: the second instruction proceeds to the DA stage and the third instruction enters the ID stage. Now comparator CMP10 in the third-way bypass module 203 detects that the addresses n1 (ID_Reg_ARm[4:0]) and n14 (DA_ARm[4:0]) are equal, both being ARm, while all other comparisons are unequal; module 203 therefore selects and outputs the value n28, the updated value of address register ARm in the second instruction. The data bypass is thus accomplished, and an unnecessary pipeline stall in DSP processing is avoided.

Finally, it should be noted that the above are only specific embodiments of the present invention. Clearly, the invention is not limited to the above embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or conceive from the disclosure of the present invention shall be considered within the scope of protection of the present invention.

Claims

1. A data bypass technique applied in a digital signal processor pipeline, characterized in that it implements 6 ways of data forwarding, of which 4 ways perform prioritized parallel data bypass over 11 data sources and 2 ways perform prioritized parallel data bypass over 3 data sources.

2. The data bypass technique of claim 1, characterized in that the digital signal processor pipeline uses six pipeline stages: instruction fetch, decode, address computation, memory access, execute, and write-back.

3. The data bypass technique of claim 1, characterized in that it consists of two parts: one part is parallel data address conflict detection, comprising ten 5-bit address comparators; the other part is prioritized data selection, which selects among the 11 data sources by priority according to the results of the parallel data address conflict detection.

4. The data bypass technique of claim 1, characterized in that 10 of the 11 data sources of said 4 ways are shared, namely: the address register update values from the address computation stage that have not passed through a latch; the address register latched values from the memory access stage; the address register latched values from the execute stage; the address register latched values from the write-back stage; the latched data logic result; and the memory content read from one of the dual-port on-chip data memories; the unshared data source is the register value that each way reads from the register file; data from a stage nearer the front of the pipeline has higher priority, and the address register update values of the address computation stage have the highest priority; 2 of the 3 data sources of said 2 ways are shared, namely: the latched data logic result from the write-back stage and the memory content read from one of the dual-port on-chip data memories; the unshared source is the register value that each way reads from the register file, and data from a stage nearer the front of the pipeline has higher priority.

5. The data bypass technique of claim 4, characterized in that there are 8 address registers, which occupy positions 8 to 15 of the register file of thirty-two 32-bit registers and can also be used as general-purpose registers, that is, they are multiplexed; the index register addresses are fixed at positions 24 and 25 of the 32 and are likewise multiplexed with general-purpose registers.

6. The data bypass technique of claim 3, characterized in that said parallel data address conflict detection circuit comprises the first to fourteenth comparators (CMP1, CMP2, ..., CMP14), and the prioritized data selection circuit comprises the first to fifth NOT gates (NO1, NO2, NO3, NO4, NO5), the first to eighteenth AND gates (AND1, AND2, ..., AND18), the first to eighth NOR gates (NOR1, NOR2, ..., NOR8), the first to eighth OR gates (OR1, OR2, ..., OR8), and one data selector (MUX1).