WO2002019098A1 - Method and apparatus for a unified RISC/DSP pipeline controller for use with reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) control instructions - Google Patents

Method and apparatus for a unified RISC/DSP pipeline controller for use with reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) control instructions

Info

Publication number
WO2002019098A1
WO2002019098A1 PCT/US2001/025890 US0125890W WO0219098A1 WO 2002019098 A1 WO2002019098 A1 WO 2002019098A1 US 0125890 W US0125890 W US 0125890W WO 0219098 A1 WO0219098 A1 WO 0219098A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
dsp
risc
control
signal processing
Prior art date
Application number
PCT/US2001/025890
Other languages
English (en)
Inventor
Kumar Ganapathy
Ruban Kanapathipillai
Original Assignee
Intel Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US09/652,593 external-priority patent/US6832306B1/en
Application filed by Intel Corporation filed Critical Intel Corporation
Priority to AU2001285065A priority Critical patent/AU2001285065A1/en
Publication of WO2002019098A1 publication Critical patent/WO2002019098A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30149 Instruction analysis, e.g. decoding, instruction word fields of variable length instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • G06F9/381 Loop buffering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines

Definitions

  • This invention relates generally to digital signal processing devices. More particularly, the invention relates to instruction execution within digital signal processors.
  • DSP Single chip digital signal processing devices
  • DSPs generally are distinguished from general purpose microprocessors in that DSPs typically support accelerated arithmetic operations by including a dedicated multiplier and accumulator (MAC) for performing multiplication of digital numbers.
  • the instruction set for a typical DSP device usually includes a MAC instruction for performing multiplication of new operands and addition with a prior accumulated value stored within an accumulator register.
  • a MAC instruction is typically the only instruction provided in prior art digital signal processors where two DSP operations, multiply followed by add, are performed by the execution of one instruction. However, when performing signal processing functions on data it is often desirable to perform other DSP operations in varying combinations.
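  • The multiply-accumulate behaviour described above can be illustrated with a minimal sketch. The function names `mac` and `dot_product` are hypothetical and chosen only for illustration; the code models the arithmetic, not the hardware or its timing.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: multiply the new operands, then add
    the product to the previously accumulated value."""
    return acc + a * b


def dot_product(xs, ys):
    """Accumulate a sum of products, one MAC step per pair of operands."""
    acc = 0  # plays the role of the accumulator register
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

  • For example, `dot_product([1, 2, 3], [4, 5, 6])` performs three MAC steps and accumulates 4 + 10 + 18.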
  • DSPs digital filtering.
  • a DSP is typically programmed with instructions to implement some filter function in the digital or time domain.
  • the equation Yn may be evaluated by using a software program. However, in some applications, it is necessary that the equation be evaluated as fast as possible.
  • One way to do this is to perform the computations using hardware components such as a DSP device programmed to compute the equation Yn.
  • the multiple DSPs operate in parallel to speed the computation process.
  • the multiplication of terms is spread across the multipliers of the DSPs equally for simultaneous computations of terms.
  • the adding of terms is similarly spread equally across the adders of the DSPs for simultaneous computations.
  • the order of processing terms is unimportant since the combination is associative. If the processing order of the terms is altered, it has no effect on the final result expected in a vectorized processing of a function.
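  • The idea of spreading terms across parallel units can be sketched as follows. The filter equation itself is not reproduced in this text, so the sketch assumes a plain sum of products (an FIR-style output); the function names, the round-robin assignment, and the default of four units are illustrative assumptions. Integer operands are used so that regrouping the terms gives exactly the same result; with floating-point values the regrouped sum could differ by rounding.

```python
def filter_output_serial(coeffs, samples):
    """Evaluate the assumed sum of products term by term, in order."""
    return sum(h * x for h, x in zip(coeffs, samples))


def filter_output_parallel(coeffs, samples, units=4):
    """Spread the terms round-robin over `units` partial sums, as if each
    unit had its own multiplier and adder, then combine the partial sums."""
    partial = [0] * units
    for i, (h, x) in enumerate(zip(coeffs, samples)):
        partial[i % units] += h * x  # unit i % units handles this term
    return sum(partial)
```

  • Because integer addition is associative and commutative, both functions return the same value regardless of how the terms are distributed over the units.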
  • a MAC operation would require a multiply instruction and an add instruction to perform both multiplication and addition. To perform these two instructions would require two processing cycles. Additionally, a program written for the typical microprocessor would require a larger program memory in order to store the extra instructions necessary to perform the MAC operation.
  • If a DSP operation other than a MAC DSP instruction needs to be performed, the operation requires separate arithmetic instructions programmed into program memory. These separate arithmetic instructions in prior art DSPs similarly require increased program memory space and processing cycles to perform the operation when compared to a single MAC instruction. It is desirable to reduce the number of processing cycles when performing DSP operations. It is desirable to reduce program memory requirements as well.
  • DSPs are often programmed in a loop to continuously perform accelerated arithmetic functions including a MAC instruction using different operands. Often, multiple arithmetic instructions are programmed in a loop to operate on the same data set. The same arithmetic instruction is often executed over and over in a loop using different operands. Additionally, each time one instruction is completed, another instruction is fetched from the program stored in memory during a fetch cycle. Fetch cycles require one or more cycle times to access a memory before instruction execution occurs. Because circuits change state during a fetch cycle, power is consumed and thus it is desirable to reduce the number of fetch cycles. Typically, approximately twenty percent of power consumption may be utilized in the set up and clean up operations of a loop in order to execute DSP instructions.
  • the loop execution where signal processing of data is performed consumes approximately eighty percent of power consumption with a significant portion being due to instruction fetching. Additionally, because data sets that a DSP device processes are usually large, it is also desirable to speed instruction execution by avoiding frequent fetch cycles to memory.
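  • A rough count shows why fetch cycles inside a loop are worth avoiding. The sketch below assumes a hypothetical loop buffer that retains the loop body after it has been fetched once; the classification list mentions loop buffering, but the buffer model and function names here are assumptions made for illustration, not details taken from this text.

```python
def fetches_without_buffer(body_len, iterations):
    """Every instruction of the loop body is fetched from program memory
    on every iteration."""
    return body_len * iterations


def fetches_with_loop_buffer(body_len, iterations):
    """Assumed model: the body is fetched from program memory once and
    then replayed from a local buffer on later iterations."""
    return body_len if iterations > 0 else 0
```

  • Under these assumptions a four-instruction loop body executed 100 times needs 400 memory fetches without the buffer and 4 with it.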
  • the quality of service over a telephone system often relates to the processing speed of signals. That is particularly the case when a DSP is to provide voice processing, such as voice compression, voice decompression, and echo cancellation for multiple channels. More recently, processing speed has become even more important because of the desire to transmit voice aggregated with data in a packetized form for communication over packetized networks. Delays in processing the packetized voice signal tend to result in the degradation of signal quality on receiving ends.
  • the present invention includes an apparatus, method, instruction set architecture, and system as described in the claims.
  • Multiple application specific signal processors (ASSPs) having the instruction set architecture of the present invention are provided within gateways in communication systems to provide improved voice and data communication over a packetized network.
  • Each ASSP includes a serial interface, a buffer memory, and four core processors in order to simultaneously process multiple channels of voice or data.
  • Each core processor preferably includes a reduced instruction set computer (RISC) processor and four signal processing units (SPs).
  • RISC reduced instruction set computer
  • SPs signal processing units
  • Each SP includes multiple arithmetic blocks to simultaneously process multiple voice and data communication signal samples for communication over IP, ATM, Frame Relay or other packetized network.
  • the four signal processing units can execute the digital signal processing algorithms in parallel.
  • Each ASSP is flexible and can be programmed to perform many network functions or data/voice processing functions, including voice and data compression/decompression in telecommunications systems (such as CODECs), particularly packetized telecommunication networks, simply by altering the software program controlling the commands executed by the ASSP.
  • An instruction set architecture for the ASSP is tailored to digital signal processing applications including audio and speech processing such as compression/decompression and echo cancellation.
  • the instruction set architecture implemented with the ASSP is adapted to DSP algorithmic structures. This adaptation of the ISA of the present invention to DSP algorithmic structures balances the ease of implementation, processing efficiency, and programmability of DSP algorithms.
  • the instruction set architecture may be viewed as being two component parts, one (RISC ISA) corresponding to the RISC control unit and another (DSP ISA) to the DSP datapaths of the signal processing units.
  • the RISC ISA is a register based architecture including 16-registers within the register file, while the DSP ISA is a memory based architecture with efficient digital signal processing instructions.
  • the instruction word for the ASSP can be 20 bits, or can be expanded to 40 bits.
  • the 40-bit instruction word can be used to control two instructions to be executed in series or parallel, such as two RISC control instructions, extended DSP instructions, or two 20-bit DSP instructions.
  • the instruction set architecture of the ASSP has four distinct types of instructions to optimize the DSP operational mix. These are (1) a 20-bit DSP instruction that uses mode bits in control registers (i.e. mode registers), (2) a 40-bit DSP instruction having control extensions that can override mode registers, (3) a 20-bit dyadic DSP instruction, and (4) a 40-bit DSP instruction that extends the capabilities of a 20-bit dyadic DSP instruction by providing powerful bit manipulation.
  • All DSP instructions of the instruction set architecture of the ASSP are dyadic DSP instructions to execute two operations in one instruction with one cycle throughput.
  • a dyadic DSP instruction is a combination of two basic DSP operations in one instruction and includes a main DSP operation (MAIN OP) and a sub DSP operation (SUB OP).
  • MAIN OP main DSP operation
  • SUB OP sub DSP operation
  • the instruction set architecture of the present invention can be generalized to combining any pair of basic DSP operations to provide very powerful dyadic instruction combinations.
  • the DSP arithmetic instructions or operations in the preferred embodiment include a multiply instruction (MULT), an addition instruction (ADD), a minimize/maximize instruction (MIN/MAX) also referred to as an extrema instruction, and a no operation instruction (NOP), each having an associated operation code ("opcode").
  • MULT multiply instruction
  • ADD addition instruction
  • MIN/MAX minimize/maximize instruction
  • NOP no operation instruction
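  • The pairing of a main operation with a sub operation can be modelled with a small sketch. The operand routing is an assumption: here the main operation combines two operands and its result is fed, with a third operand, into the sub operation. The table `OPS`, the function `dyadic`, and the treatment of NOP as "pass the first operand through" are hypothetical names and choices, not the patent's encoding.

```python
# Basic operations named in the text; MIN and MAX are listed separately
# here although the text groups them as one extrema instruction.
OPS = {
    "MULT": lambda x, y: x * y,
    "ADD": lambda x, y: x + y,
    "MIN": min,
    "MAX": max,
    "NOP": lambda x, y: x,  # assumed: pass the first operand through
}


def dyadic(main_op, sub_op, a, b, c):
    """Apply the main operation to (a, b), then the sub operation to the
    intermediate result and c, as one combined instruction."""
    intermediate = OPS[main_op](a, b)
    return OPS[sub_op](intermediate, c)
```

  • With this routing, `dyadic("MULT", "ADD", a, b, acc)` behaves like a MAC, while pairs such as ADD followed by MAX cover combinations that a MAC-only instruction set would need separate instructions for.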
  • the present invention efficiently executes these dyadic DSP instructions by means of the instruction set architecture and the hardware architecture of the application specific signal processor.
  • the DSP instructions can process vector data or scalar data automatically using a single instruction and provide the appropriate vector or scalar output results.
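  • One way to picture a single instruction that accepts either vector or scalar operands is the sketch below. Representing vectors as Python lists and repeating a scalar across a vector operand are assumptions made for illustration; `apply_op` is a hypothetical helper, not part of the described instruction set.

```python
import operator


def apply_op(op, a, b):
    """Apply a two-operand operation to scalars, to vectors element by
    element, or to a vector and a scalar (the scalar is repeated)."""
    a_vec, b_vec = isinstance(a, list), isinstance(b, list)
    if not a_vec and not b_vec:
        return op(a, b)  # scalar in, scalar out
    n = len(a) if a_vec else len(b)
    av = a if a_vec else [a] * n
    bv = b if b_vec else [b] * n
    if len(av) != len(bv):
        raise ValueError("vector operands must have the same length")
    return [op(x, y) for x, y in zip(av, bv)]  # vector out


scalar_result = apply_op(operator.add, 2, 3)
vector_result = apply_op(operator.mul, [1, 2, 3], 2)
```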
  • a unified RISC/DSP pipeline controller controls the execution of both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions within each core processor of the ASSP.
  • the unified RISC/DSP pipeline controller is coupled to the program memory, the RISC control unit, and the four signal processing units (SPs).
  • the program memory stores both DSP instructions and RISC control instructions and the RISC control unit controls the flow of operands and results between the signal processing unit and the data memory.
  • the signal processing units execute the DSP instruction.
  • the unified RISC/DSP pipeline controller generates DSP control signals to control the execution of the DSP instruction by the signal processing units and RISC control signals to control the execution of the RISC control instruction by the RISC control unit.
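  • The routing role of the unified controller can be sketched as a simple dispatcher. The sketch assumes each instruction carries an explicit "RISC" or "DSP" tag and that a DSP instruction is issued to all four signal processing units; the `Instruction` class, the tuple format of the control signals, and the function `dispatch` are hypothetical and do not reflect the actual instruction encoding or pipeline stages.

```python
from dataclasses import dataclass

NUM_SPS = 4  # four signal processing units per core processor


@dataclass
class Instruction:
    kind: str  # assumed tag: "RISC" or "DSP"
    text: str  # placeholder for the instruction contents


def dispatch(program):
    """Walk one program memory holding both instruction kinds and produce
    separate control-signal lists for the RISC control unit and the SPs."""
    risc_signals, dsp_signals = [], []
    for pc, ins in enumerate(program):
        if ins.kind == "RISC":
            risc_signals.append((pc, ins.text))
        elif ins.kind == "DSP":
            # assumed: one control signal per signal processing unit
            dsp_signals.extend((pc, sp, ins.text) for sp in range(NUM_SPS))
        else:
            raise ValueError(f"unknown instruction kind: {ins.kind}")
    return risc_signals, dsp_signals
```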
  • Figure 1A is a block diagram of a system utilizing the present invention.
  • Figure 1B is a block diagram of a printed circuit board utilizing the present invention within the gateways of the system in Figure 1A.
  • FIG. 2 is a block diagram of an Application Specific Signal Processor (ASSP) according to one embodiment of the present invention.
  • ASSP Application Specific Signal Processor
  • Figure 3 is a block diagram of an instance of one of the core processors according to one embodiment of the present invention within an ASSP.
  • Figure 4 is a block diagram of the RISC processing unit within the core processors of Figure 3.
  • Figure 5A is a block diagram of an instance of a signal processing unit (SP) according to one embodiment of the present invention within a core processor of Figure 3.
  • SP signal processing unit
  • Figure 5B is a more detailed block diagram of Figure 5A illustrating the bus structure of the signal processing unit according to one embodiment of the present invention.
  • Figure 6A is an exemplary instruction sequence illustrating a program model for DSP algorithms employing an instruction set architecture (ISA) according to one embodiment of the present invention.
  • ISA instruction set architecture
  • Figure 6B is a chart illustrating a pair of bits that specify differing types of dyadic DSP instructions and RISC control instructions of the ISA according to one embodiment of the present invention.
  • Figure 6C lists a set of addressing instructions, and particularly shows a 6-bit operand specifier for the ISA, according to one embodiment of the present invention.
  • Figure 6D shows an exemplary memory address register according to one embodiment of the present invention.
  • Figure 6E illustrates an exemplary 5-bit operand specifier according to one embodiment of the invention.
  • Figure 6F is a chart illustrating the permutations of the dyadic DSP instructions according to one embodiment of the invention.
  • Figures 6G and 6H show a bitmap syntax for exemplary 20-bit non-extended DSP instructions and 40-bit extended DSP instructions according to one embodiment of the invention.
  • Figure 6I illustrates RISC control instructions for the ISA according to one embodiment of the present invention.
  • Figure 6J lists a set of extended RISC control instructions for the ISA according to one embodiment of the present invention.
  • Figure 6K lists a set of 40-bit DSP instructions for the ISA according to one embodiment of the present invention.
  • Figure 7 is a functional block diagram illustrating an exemplary architecture for a unified RISC/DSP pipeline controller according to one embodiment of the present invention.
  • Figure 8a is a diagram illustrating the operations occurring in different stages of the unified RISC/DSP pipeline controller according to one embodiment of the present invention.
  • Figure 8b is a diagram illustrating the timing of certain operations for the unified RISC/DSP pipeline controller of Figure 8a according to one embodiment of the present invention.
  • the system 100 includes a network 101 that is a packetized or packet-switched network, such as IP, ATM, or frame relay.
  • the network 101 allows the communication of voice/speech and data between endpoints in the system 100, using packets.
  • Data may be of any type including audio, video, email, and other generic forms of data.
  • the voice or data requires packetization when transceived across the network 101.
  • the system 100 includes gateways 104A, 104B, and 104C in order to packetize the information received for transmission across the network 101.
  • a gateway is a device for connecting multiple networks and devices that use different protocols. Voice and data information may be provided to a gateway 104 from a number of different sources in a variety of digital formats.
  • analog voice signals are transceived by a telephone 108 over the plain old telephone system (POTS) 107A and are coupled into a switch 106A of the public switched telephone network (PSTN).
  • POTS plain old telephone system
  • PSTN public switched telephone network
  • the analog signals from the POTS 107A are digitized and transceived to the gateway 104A by time division multiplexing (TDM) with each time slot representing a channel and one DS0 input to gateway 104A.
  • TDM time division multiplexing
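  • Time division multiplexing, with each time slot representing a channel, can be illustrated by separating an interleaved sample stream back into per-channel streams. The frame layout (one sample per channel per frame, channels in fixed order) and the function name `demux_tdm` are assumptions for illustration.

```python
def demux_tdm(stream, channels):
    """Split an interleaved TDM stream into one list per channel.
    Slot k of every frame is assumed to belong to channel k."""
    if channels <= 0 or len(stream) % channels != 0:
        raise ValueError("stream must contain a whole number of frames")
    return [stream[c::channels] for c in range(channels)]


# Two frames of three time slots each: [ch0, ch1, ch2, ch0, ch1, ch2]
per_channel = demux_tdm([1, 2, 3, 4, 5, 6], 3)
```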
  • digital voice signals are transceived at public branch exchanges (PBX) 112A and 112B that are coupled to multiple telephones, fax machines, or data modems.
  • Digital voice signals are transceived between PBX 112A and PBX 112B with gateways 104A and 104C, respectively.
  • Digital data signals may also be transceived directly between a digital modem 114 and a gateway 104A.
  • Digital modem 114 may be a Digital Subscriber Line (DSL) modem or a cable modem.
  • Data signals may also be coupled into system 100 by a wireless communication system by means of a mobile unit 118 transceiving digital signals or analog signals wirelessly to a base station 116.
  • Base station 116 converts analog signals into digital signals or directly passes the digital signals to gateway 104B.
  • Data may be transceived by means of modem signals over the plain old telephone system (POTS) 107B using a modem 110.
  • Modem signals communicated over POTS 107B are traditionally analog in nature and are coupled into a switch 106B of the public switched telephone network (PSTN).
  • analog signals from the POTS 107B are digitized and transceived to the gateway 104B by time division multiplexing (TDM) with each time slot representing a channel and one DS0 input to gateway 104B.
  • incoming signals are packetized for transmission across the network 101.
  • Signals received by the gateways 104A, 104B and 104C from the network 101 are depacketized and transcoded for distribution to the appropriate destination.
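  • The packetize and depacketize steps performed at the gateways can be sketched in a few lines. The packet format used here (a sequence number paired with a fixed-size payload of samples) is a hypothetical stand-in; the real gateways use network protocols that this sketch does not model, and transcoding is omitted.

```python
def packetize(samples, payload_size):
    """Cut a sample stream into (sequence_number, payload) packets."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    starts = range(0, len(samples), payload_size)
    return [(seq, samples[i:i + payload_size]) for seq, i in enumerate(starts)]


def depacketize(packets):
    """Reorder packets by sequence number and rebuild the sample stream."""
    stream = []
    for _, payload in sorted(packets, key=lambda p: p[0]):
        stream.extend(payload)
    return stream
```

  • Sorting by sequence number lets the receiver rebuild the stream even if packets arrive out of order, which a packet-switched network does not rule out.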
  • NIC 130 of a gateway 104 includes one or more application-specific signal processors (ASSPs) 150A-150N.
  • ASSPs application-specific signal processors
  • Line interface devices 131 of NIC 130 provide interfaces to various devices connected to the gateway, including the network 101. In interfacing to the network 101, the line interface devices packetize data for transmission out on the network 101 and depacketize data which is to be received by the ASSP devices. Line interface devices 131 process information received by the gateway on the receive bus 134 and provide it to the ASSP devices. Information from the ASSP devices 150 is communicated on the transmit bus 132 for transmission out of the gateway.
  • a traditional line interface device is a multi-channel serial interface or a UTOPIA device.
  • the NIC 130 couples to a gateway backplane/network interface bus 136 within the gateway 104.
  • Bridge logic 138 transceives information between bus 136 and NIC 130.
  • Bridge logic 138 transceives signals between the NIC 130 and the backplane/network interface bus 136 onto the host bus 139 for communication to either one or more of the ASSP devices 150A-150N, a host processor 140, or a host memory 142.
  • ASSP 150 optional local memory 145A through 145N
  • Digital data on the receive bus 134 and transmit bus 132 is preferably communicated in bit wide fashion. While internal memory within each ASSP may be sufficiently large to be used as a scratchpad memory, optional local memory 145 may be used by each of the ASSPs 150 if additional memory space is necessary. Each of the ASSPs 150 provides signal processing capability for the gateway. The type of signal processing provided is flexible because each ASSP may execute differing signal processing programs.
  • Typical signal processing and related voice packetization functions for an ASSP include (a) echo cancellation; (b) video, audio, and voice/speech compression/decompression (voice/speech coding and decoding); (c) delay handling (packets, frames); (d) loss handling; (e) connectivity (LAN and WAN); (f) security (encryption/decryption); (g) telephone connectivity; (h) protocol processing (reservation and transport protocols, RSVP, TCP/IP, RTP, UDP for IP, and AAL2, AAL1, AAL5 for ATM); (i) filtering; (j) silence suppression; (k) length handling (frames, packets); and other digital signal processing functions associated with the communication of voice and data over a communication system.
  • Each ASSP 150 can perform other functions in order to transmit voice and data to the various endpoints of the system 100 within a packet data stream over a packetized network.
  • In FIG. 2, a block diagram of the ASSP 150 is illustrated.
  • Each of the core processors 200A-200D is respectively coupled to a data memory 202A-202D and a program memory 204A-204D.
  • Each of the core processors 200A-200D communicates with outside channels through the multi-channel serial interface 206, the multi-channel memory movement engine 208, buffer memory 210, and data memory 202A-202D.
  • the ASSP 150 further includes an external memory interface 212 to couple to the external optional local memory 145.
  • the ASSP 150 includes an external host interface 214 for interfacing to the external host processor 140 of Figure 1B.
  • the ASSP 150 further includes a microcontroller 223 to perform process scheduling for the core processors 200A-200D and the coordination of the data movement within the ASSP as well as an interrupt controller 224 to assist in interrupt handling and the control of the ASSP 150.
  • Core processor 200 is the block diagram for each of the core processors 200A-200D.
  • Data memory 202 and program memory 204 refer to a respective instance of data memory 202A-202D and program memory 204A-204D.
  • the core processor 200 includes four signal processing units SP0 300A, SP1 300B, SP2 300C and SP3 300D.
  • the core processor 200 further includes a reduced instruction set computer (RISC) control unit 302 and a unified RISC/DSP pipeline controller 304.
  • the signal processing units 300A-300D perform the signal processing tasks on data while the RISC control unit 302 and the unified RISC/DSP pipeline controller 304 perform control tasks related to the signal processing functions performed by the SPs 300A-300D.
  • the control provided by the RISC control unit 302 is coupled with the SPs 300A-300D at the pipeline level to yield a tightly integrated core processor 200 that keeps the utilization of the signal processing units 300 at a very high level.
  • the signal processing units 300A-300D are each connected to data memory 202, to each other, and to the RISC 302, via data bus 203, for the exchange of data (e.g. operands).
  • the signal processing tasks are performed on the datapaths within the signal processing units 300A-300D.
  • the nature of the DSP algorithms is such that they are inherently vector operations on streams of data that have minimal temporal locality (data reuse).
  • a data cache with demand paging is not used because it would not function well and would degrade operational performance. Therefore, the signal processing units 300A-300D are allowed to access vector elements (the operands) directly from data memory 202 without the overhead of issuing a number of load and store instructions into memory, resulting in very efficient data processing.
  • the instruction set architecture of the present invention having a 20 bit instruction word which can be expanded to a 40 bit instruction word, achieves better efficiencies than VLIW architectures using 256-bits or higher instruction widths by adapting the ISA to DSP algorithmic structures.
  • the adapted ISA leads to very compact and low-power hardware that can scale to higher computational requirements.
  • the operands that the ASSP can accommodate are varied in data type and data size.
  • the data type may be real or complex, an integer value or a fractional value, with vectors having multiple elements of different sizes.
  • the data size in the preferred embodiment is 64 bits but larger data sizes can be accommodated with proper instruction coding.
  • RISC control unit 302 includes a data aligner and formatter 402, a memory address generator 404, three adders 406A-406C, an arithmetic logic unit (ALU) 408, a multiplier 410, a barrel shifter 412, and a register file 413.
  • the register file 413 points to a starting memory location from which memory address generator 404 can generate addresses into data memory 202.
  • the RISC control unit 302 is responsible for supplying addresses to data memory so that the proper data stream is fed to the signal processing units 300A-300D.
  • the RISC control unit 302 is a register to register organization with load and store instructions to move data to and from data memory 202.
  • Data memory addressing is performed by RISC control unit using a 32-bit register as a pointer that specifies the address, post-modification offset, and type and permute fields.
  • the type field allows a variety of natural DSP data to be supported as a "first class citizen" in the architecture. For instance, the complex type allows direct operations on complex data stored in memory removing a number of bookkeeping instructions. This is useful in supporting QAM demodulators in data modems very efficiently.
  • FIG. 5A a block diagram of a signal processing unit 300 is illustrated which represents an instance of the SPs 300A-300D.
  • Each of the signal processing units 300 includes a data typer and aligner 502, a first multiplier M1 504A, a compressor 506, a first adder A1 510A, a second adder A2 510B, an accumulator register 512, a third adder A3 510C, and a second multiplier M2 504B.
  • Adders 510A-510C are similar in structure and are generally referred to as adder 510.
  • Multipliers 504A and 504B are similar in structure and generally referred to as multiplier 504.
  • Each of the multipliers 504A and 504B has a multiplexer, 514A and 514B respectively, at its input stage to multiplex different inputs from different busses into the multipliers.
  • Each of the adders 510A, 510B, and 510C also has a multiplexer, 520A, 520B, and 520C respectively, at its input stage to multiplex different inputs from different busses into the adders.
  • These multiplexers and other control logic allow the adders, multipliers and other components within the signal processing units 300A-300C to be flexibly interconnected by proper selection of multiplexers.
  • multiplier M1 504A, compressor 506, adder A1 510A, adder A2 510B and accumulator 512 can receive inputs directly from external data buses through the data typer and aligner 502.
  • adder 510C and multiplier M2 504B receive inputs from the accumulator 512 or the outputs from the execution units multiplier M1 504A, compressor 506, adder A1 510A, and adder A2 510B.
  • Program memory 204 couples to the pipe control 304 that includes an instruction buffer that acts as a local loop cache.
  • the instruction buffer in the preferred embodiment has the capability of holding four instructions.
  • the instruction buffer of the unified RISC/DSP pipe controller 304 reduces the power consumed in accessing the main memories to fetch instructions during the execution of program loops.
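The effect of a four-entry loop buffer on program-memory traffic can be sketched with a small counting model. This is only an illustration: the function name, the FIFO replacement policy, and the use of instruction addresses as buffer tags are assumptions, not details given in the description.

```python
from collections import deque


def count_program_memory_fetches(loop_body, iterations, buffer_size=4):
    """Count program-memory fetches for a loop run `iterations` times.

    `loop_body` is a list of instruction addresses. The buffer keeps the
    most recently fetched instructions (simple FIFO, an assumption).
    """
    buffer = deque(maxlen=buffer_size)  # oldest entry is evicted when full
    fetches = 0
    for _ in range(iterations):
        for address in loop_body:
            if address in buffer:
                continue  # hit: served from the local loop buffer
            fetches += 1  # miss: instruction comes from program memory
            buffer.append(address)
    return fetches
```

In this model a loop of up to four instructions is fetched from program memory once and then replayed from the buffer, while a five-instruction loop misses on every access under the assumed FIFO policy.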
  • Output signals are coupled out of the signal processor 300 on the Z output bus 532 through the data typer and aligner 502.
  • Input signals are coupled into the signal processor 300 on the X input bus 531 and Y input bus 533 through the data typer and aligner 502.
  • the data typer and aligner 502 has a different data bus to couple to each of multiplier M1 504A, compressor 506, adder A1 510A, adder A2 510B, and accumulator register AR 512.
  • output data is coupled from the accumulator register AR 512 into the data typer and aligner 502.
  • Multiplier M1 504A has buses to couple its output into the inputs of the compressor 506, adder A1 510A, adder A2 510B, and the accumulator registers AR 512.
  • Compressor 506 has buses to couple its output into the inputs of adder A1 510A and adder A2 510B.
  • Adder A1 510A has a bus to couple its output into the accumulator registers 512.
  • Adder A2 510B has buses to couple its output into the accumulator registers 512.
  • Accumulator registers 512 have buses to couple their output into multiplier M2 504B, adder A3 510C, and data typer and aligner 502.
  • Adder A3 510C has buses to couple its output into the multiplier M2 504B and the accumulator registers 512.
  • Multiplier M2 504B has buses to couple its output into the inputs of the adder A3 510C and the accumulator registers AR 512.
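The output-to-input bus connections listed above can be written down as a small directed graph, which makes it easy to check which chains of functional units are possible. The sketch below is an illustration only: the node names (`IN`, `OUT`, `CMP`, and so on) and the helper `shortest_route` are hypothetical, and the data typer and aligner 502 is split into an input side and an output side for clarity.

```python
from collections import deque

# Output-to-input connections as enumerated in the bus list above.
DATAPATH = {
    "IN": ["M1", "CMP", "A1", "A2", "AR"],  # X/Y buses via data typer and aligner 502
    "M1": ["CMP", "A1", "A2", "AR"],        # multiplier M1 504A
    "CMP": ["A1", "A2"],                    # compressor 506
    "A1": ["AR"],                           # adder A1 510A
    "A2": ["AR"],                           # adder A2 510B
    "AR": ["M2", "A3", "OUT"],              # accumulator registers 512
    "A3": ["M2", "AR"],                     # adder A3 510C
    "M2": ["A3", "AR"],                     # multiplier M2 504B
    "OUT": [],                              # Z bus via data typer and aligner 502
}


def shortest_route(source, target):
    """Breadth-first search for the shortest chain of units, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in DATAPATH[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

For example, `shortest_route("M1", "M2")` goes through the accumulator, and there is no route from adder A3 back to multiplier M1 in this list of buses.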
  • the instruction set architecture of the ASSP 150 is tailored to digital signal processing applications including audio and speech processing such as compression/decompression and echo cancellation.
  • the instruction set architecture implemented with the ASSP 150 is adapted to DSP algorithmic structures.
  • the adaptation of the ISA of the present invention to DSP algorithmic structures is a balance between ease of implementation, processing efficiency, and programmability of DSP algorithms.
  • the ISA of the present invention provides for data movement operations, DSP/arithmetic/logical operations, program control operations (such as function calls/returns, unconditional/conditional jumps and branches), and system operations (such as privilege, interrupt/trap/hazard handling and memory management control).
  • an exemplary instruction sequence 600 is illustrated for a DSP algorithm program model employing the instruction set architecture of the present invention.
  • the instruction sequence 600 has an outer loop 601 and an inner loop 602. Because DSP algorithms tend to perform repetitive computations, instructions 605 within the inner loop 602 are executed more often than others.
  • Instructions 603 are typically parameter setup code to set the memory pointers, provide for the setup of the outer loop 601, and other 2x20 control instructions.
  • Instructions 607 are typically context save and function return instructions or other 2x20 control instructions. Instructions 603 and 607 are often considered overhead instructions that are typically infrequently executed.
  • Instructions 604 are typically to provide the setup for the inner loop 602, other control through 2x20 control instructions, dual loop setup, and offset extensions for pointer backup.
  • Instructions 606 typically provide tear down of the inner loop 602, other control through 2x20 control instructions, and combining of datapath results within the signal processing units.
  • Instructions 605 within the inner loop 602 typically provide inner loop execution of DSP operations, control of the four signal processing units 300 in a single instruction multiple data execution mode, memory access for operands, dyadic DSP operations, and other DSP functionality through the 20/40 bit DSP instructions of the ISA of the present invention. Because instructions 605 are so often repeated, significant improvement in operational efficiency may be had by providing the DSP instructions, including general dyadic instructions and dyadic DSP instructions, within the ISA of the present invention.
  • the instruction set architecture of the ASSP 150 can be viewed as being two component parts, one (RISC ISA) corresponding to the RISC control unit and another (DSP ISA) to the DSP datapaths of the signal processing units 300.
  • the RISC ISA is a register based architecture including sixteen registers within the register file 413, while the DSP ISA is a memory based architecture with efficient digital signal processing instructions.
  • the instruction word for the ASSP is typically 20 bits but can be expanded to 40-bits to control two RISC control instructions or DSP instructions to be executed in series or parallel, such as a RISC control instruction executed in parallel with a DSP instruction, or a 40 bit extended RISC control instruction or DSP instruction.
  • the instruction set architecture of the ASSP has four distinct types of instructions to optimize the DSP operational mix. These are (1) a 20-bit DSP instruction that uses mode bits in control registers (i.e. mode registers), (2) a 40-bit DSP instruction having control extensions that can override mode registers, (3) a 20-bit dyadic DSP instruction, and (4) a 40-bit DSP instruction that extends the capabilities of a 20-bit dyadic DSP instruction by providing powerful bit manipulation.
  • the third operand can be placed into one of the registers of the accumulator 512 or the RISC register file 413.
  • there are two subclasses of the 20-bit DSP instructions which are (1) A and B specified by a 4-bit specifier, and C and D by a 1-bit specifier and (2) A and C specified by a 4-bit specifier, and B and D by a 1 bit specifier.
  • Instructions for the ASSP are always fetched 40-bits at a time from program memory with bits 39 and 19 indicating the type of instruction. After fetching, the instruction is grouped into two sections of 20 bits each for execution of operations.
  • the two 20-bit sections are RISC control instructions that are executed simultaneously.
  • the two 20-bit sections are RISC control instructions that are executed serially.
  • This 40-bit extended DSP instruction extends the capabilities of a 20-bit dyadic DSP instruction: the first 20-bit section is a DSP instruction and the second 20-bit section extends the capabilities of the first DSP instruction and provides powerful bit manipulation instructions, i.e., it is a 40-bit DSP instruction that operates on the top row of functional units with extended capabilities.
  • the type of extension from the 20-bit instruction word falls into five categories:
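The fetch-and-group step (a 40-bit fetch split into two 20-bit sections, with bits 39 and 19 indicating the instruction type) can be sketched as follows. The function name is hypothetical, and the sketch deliberately does not interpret the two type bits, since the mapping from their values to instruction types is not spelled out in this excerpt.

```python
def split_instruction_word(word40):
    """Split a 40-bit fetch into two 20-bit sections plus the two type bits."""
    if not 0 <= word40 < (1 << 40):
        raise ValueError("expected a 40-bit value")
    upper = (word40 >> 20) & 0xFFFFF  # bits 39..20, first 20-bit section
    lower = word40 & 0xFFFFF          # bits 19..0, second 20-bit section
    type_bits = ((word40 >> 39) & 1, (word40 >> 19) & 1)  # (bit 39, bit 19)
    return upper, lower, type_bits
```

A decoder built on this would dispatch on `type_bits` to decide whether the two sections are executed in parallel, in series, or as one extended instruction.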
  • Offset extensions that can replace or extend the offsets specified in the address registers
  • the 40-bit DSP instructions with the 20 bit extensions allow a large immediate value (16 to 20 bits) to be specified in the instruction and powerful bit manipulation instructions.
  • the ISA of the ASSP 150 is fully predicated providing for execution prediction.
  • a 6-bit specifier is used in the DSP 40-bit extended instructions to access operands in memory and registers.
  • Figure 6C shows an exemplary 6-bit operand specifier according to one embodiment of the present invention.
  • the MSB Bit 5 indicates whether the access is a memory access or register access. In this embodiment, if Bit 5 is set to logical one, it denotes a memory access for an operand. If Bit 5 is set to a logical zero, it denotes a register access for an operand.
  • If Bit 5 is set to 1, the contents of a specified register (rX where X: 0-7) are used to obtain the effective memory address and post-modify the pointer field by one of two possible offsets specified in one of the specified rX registers.
  • Figure 6D shows an exemplary memory address register according to one embodiment of the present invention.
  • Bit 4 determines what register set has the contents of the desired operand. If Bit 4 is set to 1, the remaining specified bits control access to the general purpose file (r0-r15) within the register file 413. If Bit 4 is set to 0, then the remaining specified bits 3:0 control access to the general purpose register file (r0-r15) within the register file 413, the accumulator registers 512 of the signal processing units 300, or to execution unit registers.
  • the general purpose file holds data or memory addresses to allow RISC or DSP operand access. RISC instructions in general access only the GPR file. DSP instructions access memory using GPR as addresses.
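A minimal sketch of splitting the 6-bit operand specifier into its fields is shown below. Only the roles of Bit 5 and Bit 4 come from the description; the function name, the returned dictionary keys, and in particular the choice of the low three bits to name the address register r0-r7 are assumptions made for illustration.

```python
def decode_operand_specifier(spec):
    """Split a 6-bit operand specifier into memory/register fields."""
    if not 0 <= spec < 64:
        raise ValueError("expected a 6-bit value")
    if (spec >> 5) & 1:
        # Bit 5 = 1: memory access through an address register.
        # Assumption: the low three bits select rX (X = 0..7).
        return {"access": "memory", "address_register": spec & 0b111}
    # Bit 5 = 0: register access; Bit 4 selects the register set and
    # bits 3:0 index into it.
    return {
        "access": "register",
        "set_select": (spec >> 4) & 1,
        "index": spec & 0b1111,
    }
```

The sketch stops at field extraction and leaves the mapping from `set_select` to a concrete register set to the caller.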
  • the 20-bit DSP instruction words have 4-bit operand specifiers that can directly access data memory using 8 address registers (r0-r7) within the register file 413 of the RISC control unit 302.
  • the method of addressing by the 20 bit DSP instruction word is regular indirect with the address register specifying the pointer into memory, post-modification value, type of data accessed and permutation of the data needed to execute the algorithm efficiently.
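Register-indirect addressing with post-modification can be illustrated with a short sketch. The class and function names are hypothetical, field widths are not modeled, and the type and permute fields of the address register are omitted; only the pointer and post-modification value are shown.

```python
from dataclasses import dataclass


@dataclass
class AddressRegister:
    pointer: int      # current address into data memory
    post_modify: int  # offset added to the pointer after each access


def load_indirect(memory, reg):
    """Read memory at reg.pointer, then post-modify the pointer."""
    value = memory[reg.pointer]
    reg.pointer += reg.post_modify
    return value
```

Because the pointer advances as a side effect of the access, a stream of operands can be read without separate address-update instructions, which is the bookkeeping saving the text describes.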
  • Figure 6E illustrates an exemplary 5-bit operand specifier according to one embodiment of the invention that includes the 4-bit specifier for general data operands and special purpose registers (SPR).
  • the 5-bit operand specifier is used in RISC control instructions.
  • bit maps for operand specifiers to access registers and memory illustrated in FIGs. 6B-6E are only exemplary, and as should be appreciated by one skilled in the art, any number of bit map schemes, register schemes, etc., could be used to implement the present invention.
  • DSP INSTRUCTIONS: There are four major classes of DSP instructions for the ASSP 150. These are:
  • Multiply Controls the execution of the main multiplier connected to data buses from memory. Controls: Rounding, sign of multiply
  • Second operation Add, Sub, Min, Max in vector or scalar mode
  • All of the DSP instructions control the multipliers 504A-504B, adders 510A-510C, compressor 506 and the accumulator 512, the functional units of each signal processing unit 300A-300D.
  • the ASSP 150 can execute these DSP arithmetic operations in vector or scalar fashion. In scalar execution, a reduction or combining operation is performed on the vector results to yield a scalar result. It is common in DSP applications to perform scalar operations, which are efficiently performed by the ASSP 150.
  • Efficient DSP execution is improved by the hardware architecture of the present invention.
  • efficiency is improved in the manner that data is supplied to and from data memory 202, to and from the RISC 302, and to and from the four signal processing units (SPs) 300 themselves (e.g. the SPs can store data themselves within accumulator registers), to feed the four SPs 300 and the DSP functional units therein, via the data bus 203.
  • the data bus 203 is comprised of two buses, X bus 531 and Y bus 533, for X and Y source operands, and one Z bus 532 for a result write. All buses, including X bus 531, Y bus 533, and Z bus 532, are preferably 64 bits wide.
  • the buses are uni-directional to simplify the physical design and reduce transit times of data.
  • the parallel load field can only access registers within the register file 413 of the RISC control unit 302.
  • the four signal processing units 300A-300D in parallel provide four parallel MAC units (multiplier 504A, adder 510A, and accumulator 512) that can make simultaneous computations.
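The distinction between vector execution on four parallel MAC units and scalar execution (a reduction over the vector results) can be sketched as a dot product. The function names are hypothetical and plain Python integers stand in for the hardware data types.

```python
NUM_SPS = 4  # four signal processing units, per the description


def vector_mac(acc, x, y):
    """One multiply-accumulate step per lane: acc[i] + x[i] * y[i]."""
    return [a + xi * yi for a, xi, yi in zip(acc, x, y)]


def scalar_reduce(lanes):
    """Combine the per-lane vector results into a single scalar."""
    return sum(lanes)


def dot_product(xs, ys):
    """Accumulate four products per step, then reduce the four lanes."""
    if len(xs) != len(ys) or len(xs) % NUM_SPS != 0:
        raise ValueError("inputs must have equal length, a multiple of 4")
    acc = [0] * NUM_SPS
    for i in range(0, len(xs), NUM_SPS):
        acc = vector_mac(acc, xs[i:i + NUM_SPS], ys[i:i + NUM_SPS])
    return scalar_reduce(acc)
```

Each call to `vector_mac` corresponds to one step in which all four lanes compute simultaneously; the final `scalar_reduce` is the combining operation that turns the vector result into a scalar.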
  • DYADIC DSP INSTRUCTIONS: All DSP instructions of the instruction set architecture of the ASSP 150 are dyadic DSP instructions within the 20-bit or 40-bit instruction word.
  • a dyadic DSP instruction informs the ASSP in one instruction and one cycle to perform two operations.
  • FIG. 6F is a chart illustrating the permutations of the dyadic DSP instructions.
  • the dyadic DSP instruction 610 includes a main DSP operation 611 (MAIN OP) and a sub DSP operation 612 (SUB OP), a combination of two DSP instructions or operations in one dyadic instruction.
  • the instruction set architecture of the present invention can be generalized to combining any pair of basic DSP operations to provide very powerful dyadic instruction combinations.
  • Compound DSP operational instructions can provide uniform acceleration for a wide variety of DSP algorithms, not just multiply-accumulate intensive filters.
  • the DSP instructions or operations in the preferred embodiment include a multiply instruction (MULT), an addition instruction (ADD), a minimize/maximize instruction (MIN/MAX) also referred to as an extrema instruction, and a no operation instruction (NOP), each having an associated operation code ("opcode").
  • Any two DSP instructions can be combined together to form a dyadic DSP instruction.
  • the NOP instruction is used for the MAIN OP or SUB OP when a single DSP operation is desired to be executed by the dyadic DSP instruction.
  • There are variations of the general DSP instructions such as vector and scalar operations of multiplication or addition, positive or negative multiplication, and positive or negative addition (i.e. subtraction).
  • bitmap syntax for exemplary 20-bit non-extended and 40-bit extended DSP instructions is illustrated.
  • for the 20-bit non-extended DSP instruction the bitmap syntax is the twenty most significant bits of a forty-bit word, while for the 40-bit extended DSP instruction the bitmap syntax is an instruction word of forty bits.
  • Figures 6G and 6H taken together illustrate an exemplary 40-bit extended DSP instruction.
  • Figure 6G illustrates bitmap syntax for a 20-bit DSP instruction.
  • Figure 6H illustrates the bitmap syntax for the second 20-bit section of a 40-bit extended DSP instruction.
  • the three most significant bits (MSBs) indicate the MAIN OP instruction type, while the SUB OP is located near the end of the first 20-bit section at bits numbered 20 through 22.
  • the MAIN OP instruction codes are 000 for NOP, 101 for ADD, 110 for MIN/MAX, and 100 for MULT.
  • the SUB OP code for the given DSP instruction varies according to what MAIN OP code is selected.
  • the SUB OPs are 000 for NOP, 001 or 010 for ADD, 100 or 011 for a negative ADD or subtraction, 101 or 110 for MIN, and 111 for MAX.
  • the bitmap syntax for other MAIN OPs and SUB OPs can be seen in Figure 6G.
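The opcode fields quoted above can be pulled out of a 40-bit word with simple shifts and masks. The sketch below is an illustration under stated assumptions: the function and table names are hypothetical, the MAIN OP is taken from bits 39 through 37 and the SUB OP from bits 22 through 20, and the single SUB OP table quoted in the text is applied to every MAIN OP even though the description says the SUB OP encoding varies with the MAIN OP.

```python
# MAIN OP codes as listed in the text.
MAIN_OPS = {0b000: "NOP", 0b101: "ADD", 0b110: "MIN/MAX", 0b100: "MULT"}

# SUB OP codes for the one example encoding given in the text.
SUB_OPS = {
    0b000: "NOP",
    0b001: "ADD", 0b010: "ADD",
    0b100: "SUB", 0b011: "SUB",
    0b101: "MIN", 0b110: "MIN",
    0b111: "MAX",
}


def decode_dyadic(word40):
    """Return (MAIN OP, SUB OP) names for a 40-bit instruction word."""
    main_code = (word40 >> 37) & 0b111  # three most significant bits
    sub_code = (word40 >> 20) & 0b111   # bits 22..20
    return MAIN_OPS.get(main_code, "UNKNOWN"), SUB_OPS[sub_code]
```

A fuller decoder would select the SUB OP table according to the decoded MAIN OP, following the bitmap in Figure 6G.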
  • control extended dyadic DSP instruction i.e. the extended bits
  • control the signal processing unit to perform rounding, limiting, absolute value of inputs for SUB OP, or a global MIN/MAX operation with a register value.
  • bitmap syntax of the dyadic DSP instructions can be converted into text syntax for program coding.
  • its text syntax for multiplication or MULT is
  • vmul | vmuln refers to either positive vector multiplication or negative vector multiplication being selected as the MAIN OP.
  • ssub | smax refers to either vector add, vector subtract, vector maximum, scalar add, scalar subtraction, or scalar maximum being selected as the SUB OP.
  • da refers to selecting one of the registers within the accumulator for storage of results.
  • the field “sx” refers to selecting a register within the RISC register file 413 which points to a memory location in memory as one of the sources of operands.
  • the field “sa” refers to selecting the contents of a register within the accumulator as one of the sources of operands.
  • the field “sy” refers to selecting a register within the RISC register file 413 which points to a memory location in memory as another one of the sources of operands.
  • ps1)]” refers to pair selection of keyword PS0 or PS1 specifying which are the source-destination pairs of a parallel-store control register.
  • Figure 6I illustrates control instructions for the ISA according to one embodiment of the present invention.
  • Figure 6J illustrates a set of extended control instructions for the ISA according to one embodiment of the present invention.
  • Figure 6K illustrates a set of 40-bit DSP instructions for the ISA according to one embodiment of the present invention.
  • FIG. 7 is a functional block diagram illustrating an exemplary architecture for a unified RISC/DSP pipeline controller 304 according to one embodiment of the present invention.
  • the unified RISC/DSP pipeline controller 304 controls the execution of both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions within each core processor of the ASSP.
  • the unified RISC/DSP pipeline controller 304 is coupled to the program memory 204, the RISC control unit 302, and the four signal processing units (SPs) 300.
  • the unified pipeline controller 304 is coupled to the program memory 204 by the address bus 702 and the instruction bus 704.
  • the program memory 204 stores both DSP instructions and RISC control instructions.
  • the RISC 302 transmits a request along the instruction request bus 706 to the F0 Fetch control stage 708 of the unified pipeline controller 304 to fetch a new instruction.
  • F0 Fetch control stage 708 generates an address and transmits the address onto the address bus 702 to address a memory location of a new instruction in the program memory 204.
  • the instruction is then signaled onto the instruction bus 704 to the F0 Fetch control stage 708 of the unified pipeline controller 304.
  • the unified RISC/DSP pipeline controller 304 is coupled to the RISC control unit 302 via RISC control signal bus 710.
  • the unified pipeline controller 304 generates RISC control signals and transmits them onto the RISC control signal bus 710 to control the execution of the RISC control instruction by the RISC control unit 302.
  • the RISC control unit 302 controls the flow of operands and results between the signal processing units 300
  • the unified RISC/DSP pipeline controller 304 is coupled to the four signal processing units (SPs) 300A-300D via DSP control signal bus 712.
  • the unified pipeline controller 304 generates DSP control signals and transmits them onto the DSP control signal bus 712 to control the execution of the DSP instruction by the SPs 300A-300D.
  • the signal processing units execute the DSP instruction using multiple data inputs from the data memory 202, the RISC 302, and accumulator registers within the SPs, delivered to the SPs along data bus 203.
  • Figure 8a is a diagram illustrating the operations occurring in different stages of the unified RISC/DSP pipeline controller according to one embodiment of the present invention.
  • Figure 8b is a diagram illustrating the timing of certain operations for the unified RISC/DSP pipeline controller of Figure 8a according to one embodiment of the present invention.
  • the unified RISC/DSP pipeline controller 304 is capable of executing both RISC control instructions and DSP instructions.
  • the RISC control instruction is executed within a shared portion 802 of the unified pipeline controller 304 and the digital signal processing instruction is executed within the shared portion 802 of the unified pipeline and within a DSP portion 804 of the unified pipeline.
  • the unified pipeline controller 304 has a two-stage instruction fetch section including an F0 Fetch control stage 708 and an F1 Fetch control stage 808.
  • the RISC 302 transmits a request along the instruction request bus 706 to the F0 Fetch control stage 708 to fetch a new instruction.
  • the F0 Fetch control stage 708 generates an address and transmits the address onto the address bus 702 to address a memory location of a new instruction in the program memory 204.
  • the DSP or RISC control instruction is then signaled onto the instruction bus 704 to the F0 Fetch control stage 708 and is stored within pipeline register 711. As should be appreciated, all of the pipeline registers are clocked to sequentially move the instruction down the pipeline.
  • the fetched instruction undergoes further processing by the F1 Fetch control stage 808 and is stored within instruction pipeline register 713.
  • a 40-bit DSP or RISC control instruction has been read and latched into the instruction pipeline register 713.
  • the instruction can be stored within instruction register 715 for loop buffering of the instruction as will be discussed later.
  • a program counter (PC) is driven to memory.
  • the unified RISC/DSP pipeline controller 304 has a two-stage decoder section including a D0 decode stage 812 and a D1 decode stage 814 to decode DSP and RISC control instructions.
  • for a DSP instruction, upon the next clock cycle the DSP instruction is transmitted from the instruction pipeline register 713 to the D0 decode stage 812, where the DSP instruction is decoded and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the decoded DSP instruction is then stored in pipeline register 717.
  • the DSP instruction is transmitted from the pipeline register 717 to the D1 decode stage 814 where the DSP instruction is further decoded and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the decoded DSP instruction is then stored in pipeline register 719.
  • the D1 decode stage 814 also generates memory addresses for use by the SPs and can generate DSP control signals identifying which SPs should be used for DSP tasks. Also, a new program counter (PC) is driven to program memory 204.
  • the RISC control instruction is transmitted from the instruction pipeline register 713 to the D0 decode stage 812 where the RISC control instruction is decoded and RISC control signals are generated and transmitted via RISC control signal bus 710 to the RISC 302 to control the execution of the RISC control instruction by the RISC 302.
  • the decoded RISC control instruction is then stored in pipeline register 717.
  • the D0 decode stage 812 also decodes register specifiers for general purpose register (GPR) access and reads the GPRs of the register file 413 of the RISC 302.
  • the RISC control instruction is transmitted from the pipeline register 717 to the D1 decode stage 814 where the RISC control instruction is further decoded and RISC control signals are generated and transmitted via RISC control signal bus 710 to the RISC 302 to control the execution of the RISC control instruction by the RISC 302 and, particularly, to perform the RISC control operation.
  • the decoded RISC control instruction is then stored in pipeline register 719. Also, a new program counter (PC) is driven to program memory 204.
  • the unified RISC/DSP pipeline controller 304 has a two-stage memory access section including an M0 memory access stage 818 and an M1 memory access stage 820 to provide memory access for DSP and RISC control instructions.
  • for a DSP instruction, upon the next clock cycle the decoded DSP instruction is transmitted from the pipeline register 719 to the M0 memory stage 818, where the DSP instruction undergoes processing and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the DSP control signals provide memory access for the SPs by driving data addresses to data memory 202 for requesting data (e.g. operands) from data memory 202 for use by the SPs.
  • the processed DSP instruction is then stored in pipeline register 721.
  • the processed DSP instruction is transmitted from the pipeline register 721 to the M1 memory stage 820 where the DSP instruction undergoes processing and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the DSP control signals provide memory access for the SPs by driving previously addressed data (e.g. operands) back from data memory 202 to the SPs for use by the SPs for executing the DSP instruction.
  • the processed DSP instruction is then stored in pipeline register 723.
  • the decoded RISC control instruction is transmitted from the pipeline register 719 to the M0 memory stage 818 where the RISC control instruction undergoes processing and RISC control signals are generated and transmitted via RISC control signal bus 710 to the RISC 302 to control the execution of the RISC control instruction by the RISC 302.
  • GPR General Purpose Register
  • the processed RISC control instruction is transmitted from the pipeline register 721 to the M1 memory stage 820 where the RISC control instruction undergoes processing and RISC control signals are generated and transmitted via RISC control signal bus 710 to the RISC 302 to control the execution of the RISC control instruction by the RISC 302.
  • memory e.g. data memory 203
  • registers e.g. GPR
  • the unified RISC/DSP pipeline controller 304 has a three-stage execution section including an E0 execution stage 822, an E1 execution stage 824, and an E2 execution stage 826 to provide DSP control signals to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the three execution stages generally provide DSP control signals to the SPs 300 to control the functional units of each SP (e.g. multipliers, adders, and accumulators, etc.), previously discussed, to perform the DSP operations, such as multiply and add, etc., of the DSP instruction.
  • the processed DSP instruction is transmitted from the pipeline register 723 to the E0 execution stage 822 where the DSP instruction undergoes execution processing and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the DSP control signals control the execution of multiply, add, and min-max operations by the SPs.
  • the DSP control signals control the SPs to update the register file 413 of the RISC 302 with Load data from data memory 202.
  • the execution processed DSP instruction is then stored in pipeline register 725.
  • the execution processed DSP instruction is transmitted from the pipeline register 725 to the E1 execution stage 824 where the DSP instruction undergoes execution processing and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the DSP control signals control the execution of multiply, add, (and min-max) operations of the DSP instruction by the SPs.
  • the DSP control signals control the execution of accumulation of vector multiplies and the updating of flag registers by the SPs.
  • the execution processed DSP instruction is then stored in pipeline register 727.
  • the execution processed DSP instruction is transmitted from the pipeline register 727 to the E2 execution stage 826 where the DSP instruction undergoes execution processing and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • DSP control signals control the execution of multiply, min-max operations, and the updating of flag registers by the SPs.
  • the execution processed DSP instruction is then stored in pipeline register 729.
  • the unified RISC/DSP pipeline controller 304 has a last single WB Writeback stage 828 to write back data to data memory 202 after execution of the DSP instruction.
  • the execution processed DSP instruction is transmitted from the pipeline register 729 to the WB Writeback stage 828 where the DSP instruction undergoes processing and DSP control signals are generated and transmitted via DSP control signal bus 712 to the SPs 300 to control the execution of the DSP instruction by the SPs.
  • the DSP control signals control the SPs in writing back data to data memory 202 after execution of the DSP instruction.
  • DSP control signals are generated to control the SPs in driving data into data memory from a parallel store operation and in writing data into the data memory.
  • DSP control signals are generated to instruct the SPs to perform a last add stage for saturating adds and to update accumulators from the saturating add operation. This completes the control of the execution of the DSP instruction by the unified RISC/DSP pipeline controller 304.
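The stage sequence walked through above (F0, F1, D0, D1, M0, M1, E0, E1, E2, WB) can be laid out as a simple timing table. The sketch below is a simplified model: it assumes one instruction issued per clock with no stalls, and it assumes the shared portion 802 covers F0 through M1, which is a reading of the stage descriptions rather than an explicit statement in the text. All names are hypothetical.

```python
STAGES = ["F0", "F1", "D0", "D1", "M0", "M1", "E0", "E1", "E2", "WB"]
SHARED_STAGES = STAGES[:6]  # assumption: shared portion 802 = F0..M1


def stage_schedule(kinds):
    """Map each instruction to {stage: clock cycle}.

    `kinds` is a list of "DSP" or "RISC" strings in program order.
    Assumes one issue per cycle and no stalls.
    """
    schedule = []
    for issue_cycle, kind in enumerate(kinds):
        stages = STAGES if kind == "DSP" else SHARED_STAGES
        schedule.append(
            {stage: issue_cycle + depth for depth, stage in enumerate(stages)}
        )
    return schedule
```

In this model a RISC control instruction never enters the E0-E2 or WB stages, which matches the description of the DSP portion being idle while RISC control instructions execute.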
  • the hardware and power requirements are reduced for the application specific signal processor (ASSP) resulting in increased operational efficiency.
  • When RISC control instructions are being performed, the DSP portion 804 of the unified pipeline controller 304 and the SPs 300 are not utilized, resulting in power savings.
  • When DSP instructions are being performed, especially when many DSP instructions are looped, the RISC 302 is not utilized, resulting in power savings.
  • the unified RISC/DSP pipeline controller 304 melds together traditionally separate RISC and DSP pipelines in a seamlessly integrated way to provide fine-grained control and parallelism. Also, the pipeline is deep enough to allow clock scaling for future products.
  • the unified RISC/DSP pipeline controller 304 dramatically increases the efficiency of the execution of both DSP instructions and RISC control instructions by a signal processor.
  • the unified RISC/DSP pipeline controller 304 couples to the RISC control unit 302 and the program memory 204 to provide the control of the signal processing units 300 in a core processor 200.
  • the unified pipeline controller 304 includes an F0 fetch control stage 708, an F1 fetch control stage 808 and a D0 decoding stage 812 coupled as shown in Figure 7.
  • the F0 fetch control stage 708 in conjunction with the RISC control unit 302 generates addresses to fetch new instructions from the program memory 204.
  • F1 fetch control stage 808 receives the newly fetched instructions.
  • F1 fetch control stage 808 includes a loop buffer 750 to store and hold instructions for execution within a loop and an instruction register 715 coupled to the output of the loop buffer 750 to store the next instruction for decoding by the D0 decoding stage 812.
  • the output from the loop buffer 750 can be stored into the instruction register 715 to generate an output that is coupled into the D0 decoding stage 812.
  • the registers in the loop buffer 750 are additionally used for temporary storage of new instructions when an instruction stall in a later pipeline stage (not shown) causes the entire execution pipeline to stall for one or more clock cycles.
  • the loop buffer 750 stores and holds instructions that are executed during a loop such as instructions 604 and 606 for the outer loop 601 or instructions 605 for the inner loop 602.
  • each of the blocks 708, 808, and 812 in the unified pipeline controller 304 has control logic to control the instruction fetching and loop buffering for the signal processing units 300 of the core processor 200.
  • the RISC control unit 302 signals to the F0 fetch control stage 708 to fetch a new instruction.
  • F0 fetch control stage 708 generates an address on the address bus 702 coupled into the program memory 204 to address a memory location of a new instruction.
  • the instruction is signaled onto the instruction bus 704 from the program memory 204 and is coupled into the loop buffer 750 of the F1 fetch control stage 808.
  • the loop buffer 750 momentarily stores the instruction unless a loop is encountered which can be completely stored therein.
  • the loop buffer 750 is a first in first out (FIFO) type of buffer. That is, the first instruction stored in the FIFO represents the first instruction output which is executed. If a loop is not being executed, the instructions fall out of the loop buffer 750 and are overwritten by the next instruction. If the loop buffer 750 is operating in a loop, the instructions circulate within the loop buffer 750 from the first instruction within the loop (the "first loop instruction") to the last instruction within the loop (the "last loop instruction").
  • the depth N of the loop buffer 750 is coordinated with the design of the pipeline architecture of the signal processing units and the instruction set architecture. The deeper the loop buffer 750 (that is, the larger the value of N), the more complicated the pipeline and instruction set architecture.
  • the loop buffer 750 has a depth N of four to hold four dyadic DSP instructions of a loop.
  • Four dyadic DSP instructions are the equivalent of up to eight prior art DSP instructions, which satisfies a majority of DSP program loops while maintaining reasonable complexity in the pipeline architecture and the instruction set architecture.
  • the loop buffer 750 differs from a cache memory, which is associated with microprocessors.
  • the loop buffer stores instructions of a program loop ("looping instructions") in contrast to a cache memory that typically stores a quantity of program instructions regardless of their function or repetitive nature.
  • the loop buffer 750 continues to store instructions read from program memory 204 in a FIFO manner until receiving a loop buffer cycle (LBC) signal 755 indicating that one complete loop of instructions has been executed and stored in the loop buffer 750.
  • the loop buffer is used to repeatedly output each instruction stored therein in a circular fashion in order to repeat executing the instructions within the sequence of the loop.
  • the loop buffer cycle signal LBC 755 is generated by the control logic within the D0 decoding stage 812.
  • the loop buffer cycle signal LBC 755 couples to the F1 fetch control stage 808 and the F0 fetch control stage 708.
  • the LBC 755 signals to the F0 fetch control stage 708 that additional instructions need not be fetched while executing the loop. In response, the F0 fetch control stage remains idle such that power is conserved by avoiding the fetching of additional instructions.
  • the control logic within the F1 fetch control stage 808 causes the loop buffer 750 to circulate its instruction output provided to the D0 decoding stage 812 in response to the loop buffer cycle signal 755.
  • Upon completion of the loop, the loop buffer cycle signal 755 is deasserted and the loop buffer returns to processing standard instructions until another loop is to be processed.
  • the first loop instruction that starts the loop needs to be ascertained and the total number of instructions or the last loop instruction needs to be determined. Additionally, the number of instructions in the loop, that is the loop size, cannot exceed the depth N of the loop buffer 750. In order to disable the loop buffer cycle signal 755, the number of times the loop is to be repeated needs to be determined.
  • the first loop instruction that starts a loop can easily be determined from a loop control instruction that sets up the loop.
  • Loop control instructions can set up a single loop or one or more nested loops. In the preferred embodiment a single nested loop is used for simplicity.
  • the loop control instructions are LOOP and LOOPi of Figure 6I for a single loop and DLOOP and DLOOPi of Figure 6J for a nested loop or dual loops.
  • the LOOPi and DLOOPi instructions provide the loop values indirectly by pointing to registers that hold the appropriate values.
  • the loop control instruction indicates how many instructions away the first instruction of the loop begins in the instructions that follow. In the present invention, the number of instructions that follows is three or more.
  • the loop control instruction additionally provides the size (i.e., the number of instructions) of the loop.
  • the loop control instruction indicates how many instructions away the nested loop begins in the instructions that follow. If an entire nested loop cannot fit into the loop buffer, only the inner loops that do fit are stored in the loop buffer while they are being executed. While the nesting can be N loops, in the preferred embodiment, the nesting is two.
  • a loop status register is set up.
  • the loop status register includes a loop active flag, an outer loop size, an inner loop size, an outer loop count value, and an inner loop count value.
  • Control logic compares the value of the loop size from the loop status register with the depth N of the loop buffer 750. If the size of the loop is less than or equal to the depth N, when the last instruction of the loop has been executed for the first time (i.e., the first pass through the loop), the loop buffer cycle signal 755 can be asserted such that instructions are read from the loop buffer 750 thereafter and decoded by the D0 decoder 812.
  • the loop control instruction also includes information regarding the number of times a loop is to be repeated.
  • the control logic of the D0 decoder 812 includes a counter to count the number of times the loop of instructions has been executed. Upon the count value reaching a number representing the number of times the loop was to be repeated, the loop buffer cycle signal 755 is deasserted so that instructions are once again fetched from program memory 204 for execution.
  • One advantage of the present invention is that power consumption is reduced when executing instructions within loops.
  • While the present invention has been described in particular embodiments, it may be implemented in hardware, software, firmware or a combination thereof and utilized in systems, subsystems, components or subcomponents thereof.
  • When implemented in software, the elements of the present invention are essentially the code segments to perform the necessary tasks.
  • the program or code segments can be stored in a processor readable medium or transmitted by a computer data signal embodied in a carrier wave over a transmission medium or communication link.
  • the "processor readable medium" may include any medium that can store or transfer information.
  • Examples of the processor readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy diskette, a CD-ROM, an optical disk, a hard disk, a fiber optic medium, a radio frequency (RF) link, etc.
  • the computer data signal may include any signal that can propagate over a transmission medium such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc.
  • the code segments may be downloaded via computer networks such as the Internet, Intranet, etc. In any case, the present invention should not be construed as limited by such embodiments, but rather construed according to the claims that follow below.
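
The loop buffer behaviour described above (FIFO storage on the first pass, assertion of the loop buffer cycle signal once a loop of size no greater than the depth N has been executed once, and deassertion when the repeat count is reached) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: `LoopSetup`, `run`, the instruction names and the `depth` parameter are hypothetical, the model handles a single non-nested loop, and it does not model the three or more instructions that separate the loop control instruction from the first loop instruction.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LoopSetup:
    """Hypothetical stand-in for the values a loop control instruction supplies."""
    start: int  # index in program memory of the first loop instruction
    size: int   # number of instructions in the loop body
    count: int  # total number of passes through the loop body


def run(program, loop, depth=4):
    """Simulate fetching; return (executed instructions, program-memory fetches)."""
    buf = deque(maxlen=depth)  # FIFO: the oldest entry falls out when full
    trace, fetches = [], 0
    for pc, instr in enumerate(program):
        fetches += 1          # F0: address program memory and fetch
        buf.append(instr)     # F1: store the instruction in the loop buffer
        trace.append(instr)   # instruction goes on to decode and execution
        if pc != loop.start + loop.size - 1:
            continue          # not the last loop instruction of the first pass
        remaining = loop.count - 1
        if loop.size <= depth:
            # LBC asserted: replay the buffered body with no further fetches,
            # then deassert once the repeat count is reached.
            body = list(buf)[-loop.size:]
            trace.extend(body * remaining)
        else:
            # Loop does not fit: every further pass is fetched from memory.
            body = program[loop.start:loop.start + loop.size]
            trace.extend(body * remaining)
            fetches += loop.size * remaining
    return trace, fetches


program = ["loop_setup", "mac", "add", "store", "exit"]
loop = LoopSetup(start=1, size=3, count=4)
print(run(program, loop, depth=4)[1])  # fetches with a depth-4 loop buffer
print(run(program, loop, depth=2)[1])  # fetches when the loop does not fit
```

With these inputs the depth-4 run needs 5 program-memory fetches and the depth-2 run needs 14, while both execute the same 14 instructions. The difference in fetch count is the source of the power saving the description attributes to the loop buffer.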

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

A method and apparatus for a unified RISC/DSP pipeline controller to control the execution of RISC control instructions and DSP instructions for a signal processor. The unified RISC/DSP pipeline controller is coupled to a program memory (204), a RISC control unit (302), and at least one signal processing unit. The program memory stores the DSP and RISC instructions, and the RISC control unit controls the flow of operands and results between the signal processing unit and the data memory (202), which stores data. The signal processing unit executes the DSP instructions. The unified RISC/DSP pipeline controller provides DSP control signals to control the execution of the DSP instructions by the signal processing unit, and RISC control signals to control the execution of the RISC control instructions by the RISC control unit.
PCT/US2001/025890 2000-08-30 2001-08-16 Method and apparatus for a unified RISC/DSP pipeline controller for both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions WO2002019098A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2001285065A AU2001285065A1 (en) 2000-08-30 2001-08-16 Method and apparatus for a unified RISC/DSP pipeline controller for both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/652,593 US6832306B1 (en) 1999-10-25 2000-08-30 Method and apparatus for a unified RISC/DSP pipeline controller for both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions
US09/652,593 2000-08-30

Publications (1)

Publication Number Publication Date
WO2002019098A1 (fr) 2002-03-07

Family

ID=24617394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2001/025890 WO2002019098A1 (fr) 2000-08-30 2001-08-16 Method and apparatus for a unified RISC/DSP pipeline controller for both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions

Country Status (2)

Country Link
AU (1) AU2001285065A1 (fr)
WO (1) WO2002019098A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5638524A (en) * 1993-09-27 1997-06-10 Hitachi America, Ltd. Digital signal processor and method for executing DSP and RISC class instructions defining identical data processing or data transfer operations
US5838931A (en) * 1994-12-08 1998-11-17 Intel Corporation Method and apparatus for enabling a processor to access an external component through a private bus or a shared bus

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050039068A (ko) * 2003-10-23 2005-04-29 한국전자통신연구원 Video signal processing system having a dual processor of RISC and DSP
GB2458487A (en) * 2008-03-19 2009-09-23 Imagination Tech Ltd Processor with multiple execution pipeline paths
GB2458487B (en) * 2008-03-19 2011-01-19 Imagination Tech Ltd Pipeline processors
JP2011528817A (ja) * 2008-03-19 2011-11-24 イマジネイション テクノロジーズ リミテッド Pipeline processor
US8560813B2 (en) 2008-03-19 2013-10-15 Imagination Technologies Limited Multithreaded processor with fast and slow paths pipeline issuing instructions of differing complexity of different instruction set and avoiding collision
WO2013101147A1 (fr) * 2011-12-30 2013-07-04 Intel Corporation Configurable reduced instruction set core
TWI472911B (zh) * 2011-12-30 2015-02-11 Intel Corp Method and apparatus for a configurable reduced instruction set core, and non-transitory computer-readable medium
US11204768B2 (en) 2019-11-06 2021-12-21 Onnivation Llc Instruction length based parallel instruction demarcator
WO2023110065A1 (fr) * 2021-12-15 2023-06-22 Huawei Technologies Co., Ltd. Device and method for a user equipment gateway controller

Also Published As

Publication number Publication date
AU2001285065A1 (en) 2002-03-13

Similar Documents

Publication Publication Date Title
EP1257911B1 (fr) Method and apparatus for an instruction set architecture including dyadic digital signal processing instructions
EP1252567B1 (fr) Method and apparatus for loop buffering digital signal processing instructions
EP1323026B1 (fr) Digital signal processing (DSP) data types specified in terms of quantity, width, and simple/complex nature
US6832306B1 (en) Method and apparatus for a unified RISC/DSP pipeline controller for both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions
US6732203B2 (en) Selectively multiplexing memory coupling global bus data bits to narrower functional unit coupling local bus
US20100118852A1 (en) System and Method of Processing Data Using Scalar/Vector Instructions
US6408376B1 (en) Method and apparatus for instruction set architecture to perform primary and shadow digital signal processing sub-instructions simultaneously
WO2002019098A1 (fr) Method and apparatus for a unified RISC/DSP pipeline controller for both reduced instruction set computer (RISC) control instructions and digital signal processing (DSP) instructions
WO2024025864A1 (fr) Multiple instruction set architectures on a processing device
Kariya Evolution of DSPs
Kuo et al. Digital signal processor architectures and programming

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP