CN1135468C - Digital signal processing integrated circuit architecture - Google Patents

Digital signal processing integrated circuit architecture

Publication number
CN1135468C
Authority
CN
China
Prior art keywords
register
signal processing
digital signal
data
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB971981442A
Other languages
Chinese (zh)
Other versions
CN1231741A (en)
Inventor
D·V·雅格加
S·J·格拉斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Ltd
Original Assignee
Advanced Risc Machines Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Risc Machines Ltd
Publication of CN1231741A
Application granted
Publication of CN1135468C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Advance Control (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)
  • Executing Machine-Instructions (AREA)
  • Microcomputers (AREA)

Abstract

A digital signal processing system has a microprocessor unit 2 operating under control of microprocessor program instruction words which controls data transfer to and from a data storage device 8 and the supply and fetching of data to and from a digital signal processing unit 4.

Description

Method and apparatus for performing digital signal processing on signal data words in a storage device
The present invention relates to the field of digital signal processing. More particularly, it relates to integrated circuit architectures for use in digital signal processing.
Digital signal processing systems are characterized by the need to perform relatively complex arithmetic and logical operations on large amounts of data in order to produce a real-time output data stream. Common applications of digital signal processing include mobile telephones, which must convert in real time between analog audio signals and coded digital signals for transmission.
In view of the specialized and demanding requirements of digital signal processing applications, it is usual to provide application-specific integrated circuits with an architecture suited to the particular application.
As an example, a typical digital signal processing operation may require three input operand data words (typically 16 or 32 bits each), on which multiply and/or add operations are performed to produce one output data word. The input operands are located within a large volume of input data and are fetched from memory individually as they are needed. To provide sufficient bandwidth for these memory accesses, multiple physical memories and memory buses/ports are used.
Although this integrated circuit architecture allows the system to handle the large volumes of data involved, it has the disadvantage of requiring complex and expensive memory structures. The architecture also tends to be specific to each application, so that major hardware changes are needed for different end uses. Furthermore, keeping the memory buses continuously active consumes a large amount of power, which is a significant disadvantage, particularly in battery-powered mobile devices.
European published application EP-A-0,442,041 discloses a system having a DSP unit and a general purpose processor (GPP). The GPP fetches operands from main memory and stores them in a memory area shared with the DSP unit. The GPP then starts a DSP operation on the DSP unit and enters a wait state. When the DSP operation completes, the GPP resumes other operations.
Viewed from one aspect, the present invention provides a method of performing digital signal processing, using a digital signal processing apparatus, upon signal data words stored in a data storage device, said method comprising the steps of:
generating, with a microprocessor unit operating under control of microprocessor unit program instruction words, address words for addressing the storage locations that store said signal data words within said data storage device;
reading, under control of said microprocessor unit, said signal data words from said addressed storage locations within said data storage device;
supplying, under control of said microprocessor unit, said signal data words to a digital signal processing unit operating under control of digital signal processing unit program instruction words;
performing, with said digital signal processing unit operating under control of digital signal processing unit program instruction words, arithmetic logic operations including at least one of a convolution operation, a correlation operation and a transform operation upon said signal data words to generate result data words; and
fetching, with said microprocessor unit operating under control of microprocessor unit program instruction words, said result data words from said digital signal processing unit; characterized in that:
said digital signal processing unit performs said arithmetic logic operations in parallel with said supplying and fetching operations performed by said microprocessor unit.
The invention recognizes that the tasks of managing and driving memory accesses to the data storage device can be separated from digital signal processing operations such as convolution, correlation and transforms. The result is a system whose overall complexity is reduced by a simple memory structure, yet which can still handle the large volumes of data involved in digital signal processing and operate in real time.
The invention uses a microprocessor to generate the appropriate address words for accessing the data storage device, to read the data storage device and to supply the data words to the digital signal processing unit. The microprocessor is also responsible for fetching the result data words from the digital signal processing unit. In this way, the digital signal processing unit can operate independently of the mass data storage device to which it is coupled, decoupled from the work of data transfer and management. Furthermore, because a microprocessor can be programmed to control and manage memory operations in sophisticated ways, a single data storage device can hold data from multiple sources and for multiple applications.
In preferred embodiments the method further comprises: generating, under control of said microprocessor unit, address words for addressing storage locations for storing said result data words within said data storage device; and
writing, under control of said microprocessor unit, said result data words to said addressed storage locations for storing said result data words within said data storage device.
As well as controlling the transfer of signal data words from the data storage device to the digital signal processing unit, the microprocessor also controls, where necessary, the writing of the result data words produced by the digital signal processing unit back to the data storage device.
Viewed from another aspect, the present invention provides apparatus for performing digital signal processing upon signal data words stored in a data storage device, said apparatus comprising:
a microprocessor unit operating under control of microprocessor unit program instruction words to generate address words for addressing storage locations within said data storage device and to control the transfer of said signal data words between said apparatus for performing digital signal processing and said data storage device; and
a digital signal processing unit operating under control of digital signal processing unit program instruction words to perform arithmetic logic operations, including at least one of a convolution operation, a correlation operation and a transform operation, upon said signal data words fetched from said data storage device by said microprocessor unit, to generate result data words; characterized in that:
said microprocessor unit and said digital signal processing unit operate in parallel.
In preferred embodiments of the invention, said microprocessor unit is responsive to a multiple supply instruction word to supply a plurality of sequentially addressed signal data words to said digital signal processing unit.
The ability of the microprocessor to control burst-mode transfers allows more efficient use of the memory bus. The microprocessor's ability to respond in more sophisticated ways to the state of the overall system also allows these burst-mode transfers to be used to best effect.
Although the digital signal processing unit could accept one signal data word at a time, in preferred embodiments of the invention said digital signal processing unit includes a multi-word input buffer.
Providing a multi-word input buffer in the digital signal processing unit allows burst-mode transfers to be used between the microprocessor and the digital signal processing unit as well. This further improves data transfer efficiency within the system. It also improves the ability of the digital signal processing unit to operate independently of the microprocessor, since the unit can carry out digital signal processing operations on the buffered input signal data words without being interrupted by transfers from the data storage device.
Corresponding considerations apply on the output side of the digital signal processing unit.
Flexibility in the way the microprocessor can control the data storage device is improved in systems in which a multiplexed data and instruction bus links said data storage device and said digital signal processing apparatus, carrying said signal data words, said microprocessor unit program instruction words and said digital signal processing unit program instruction words to said digital signal processing apparatus.
In preferred embodiments of the invention, said digital signal processing unit includes a bank of digital signal processing unit registers for holding data words upon which arithmetic logic operations are to be performed, said digital signal processing unit program instruction words including register-specifying fields.
The use of a register bank in the digital signal processing unit, with the registers to be used for a particular operation specified in the digital signal processing unit program instruction words, gives great flexibility in the way the unit can operate. Furthermore, a signal data word can be loaded into a register in the digital signal processing unit and used multiple times before it is replaced by another signal data word. This reuse of signal data words within the digital signal processing unit reduces data traffic and eases the power consumption and bandwidth problems associated with existing systems.
In preferred embodiments of the invention, said input buffer stores, for each data word held in the input buffer, destination data identifying a destination digital signal processing unit register.
Providing destination data that identifies a digital signal processing unit register makes better use of the microprocessor's capabilities: the microprocessor does the work of identifying the destination of a particular signal data word within the digital signal processing unit register bank, relieving the digital signal processing unit of this task.
The digital signal processing unit could fetch new signal data words from the input buffer in a number of ways. However, in preferred embodiments of the invention, a digital signal processing unit program instruction word that reads a digital signal processing unit register includes a flag indicating that the data word stored in that register may be replaced by a data word stored in said input buffer having matching destination data.
By merely marking its own registers as requiring a refill operation, the digital signal processing unit is freed from handling data transfers between the input buffer and itself. Other circuitry can then be responsible for meeting the refill requirement, and the refill can take place at any time before the new data in the relevant register is needed.
To further improve the independence of the microprocessor unit and the digital signal processing unit, if said input buffer contains a plurality of data words with matching destination data, said digital signal processing unit register is refilled with the data word having matching destination data that was stored first in said input buffer.
For burst-mode transfers from the microprocessor to the input buffer, the destination data may be incremented for each word, optionally with a limit destination data value after which a wrap occurs.
The microprocessor unit and the digital signal processing unit may be interlocked so that if one is waiting for an operation to be completed by the other, the waiting unit stalls. This feature is further enhanced if the digital signal processing unit reduces its power consumption while stalled.
It will be appreciated that the microprocessor unit and the digital signal processing unit could be fabricated on separate integrated circuits, but there are significant benefits in area, speed, power consumption and cost if they are fabricated on the same integrated circuit.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
Fig. 1 illustrates the high-level configuration of a digital signal processing apparatus;
Fig. 2 illustrates the input buffer and register configuration of the coprocessor;
Fig. 3 illustrates the datapath through the coprocessor;
Fig. 4 illustrates a multiplexing circuit for reading the high-order or low-order bits from a register;
Fig. 5 is a block diagram illustrating the register remapping logic used by the coprocessor in preferred embodiments;
Fig. 6 illustrates the register remapping logic of Fig. 5 in more detail; and
Fig. 7 is a table illustrating a block filter algorithm.
The system described below is concerned with digital signal processing (DSP). DSP can take many forms, but may generally be regarded as processing that requires the high-speed (real-time) handling of large volumes of data. This data typically represents some analog physical signal. A good example of DSP is that used in digital mobile telephones, in which received radio signals must be decoded into an analog sound signal and an analog sound signal must be encoded for transmission (typically using convolution, transform and correlation operations). Another example is a disk drive controller, in which the signals recovered from the disk heads are processed to produce head tracking control.
In the above context, there follows a description of a digital signal processing system based upon a microprocessor core (in this example an ARM core from the range of microprocessors designed by Advanced RISC Machines Limited of Cambridge, United Kingdom) cooperating with a coprocessor. The interface between the microprocessor and the coprocessor, and the coprocessor architecture itself, are specifically configured to provide DSP functionality. The microprocessor core will be referred to as the ARM and the coprocessor as Piccolo. The ARM and Piccolo will typically be fabricated as a single integrated circuit that often includes other elements (e.g. on-chip DRAM, ROM, D/A and A/D converters) as part of an ASIC.
Piccolo is an ARM coprocessor, and so executes part of the ARM instruction set. The ARM coprocessor instructions allow the ARM to transfer data between Piccolo and memory (using Load Coprocessor, LDC, and Store Coprocessor, STC, instructions) and to transfer ARM registers to and from Piccolo (using Move To Coprocessor, MCR, and Move From Coprocessor, MRC, instructions). One way of viewing the cooperative interaction of the ARM and Piccolo is that the ARM acts as a powerful address generator for Piccolo data, leaving Piccolo free to perform DSP operations that require real-time handling of large volumes of data to produce corresponding real-time results.
Fig. 1 illustrates the ARM 2 and Piccolo 4, with the ARM 2 issuing control signals to Piccolo 4 to control the transfer of data words to and from Piccolo 4. An instruction cache 6 stores the Piccolo program instruction words required by Piccolo 4. A single DRAM memory 8 stores all the data and instruction words required by both the ARM 2 and Piccolo 4. The ARM 2 is responsible for addressing the memory 8 and controlling all data transfers. An arrangement with only a single memory 8 and one set of data and address buses is simpler and cheaper than the typical DSP approach, which requires multiple memories and buses with high bus bandwidth.
Piccolo executes a second instruction stream (the digital signal processing program instruction words) from the instruction cache 6, which controls the Piccolo datapath. These instructions include digital signal processing operations, such as multiply-accumulate, and control flow instructions, such as zero-overhead loop instructions. The instructions operate on data held in the Piccolo registers 10 (see Fig. 2), which was previously transferred from the memory 8 by the ARM 2. The instruction stream comes from the instruction cache 6; the instruction cache 6 drives the data bus as a full bus master. The small Piccolo instruction cache 6 is a direct-mapped cache of 4 lines with 16 words per line (64 instructions). In some implementations it may be worthwhile to make the instruction cache larger.
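As an illustration of the 4-line, 16-words-per-line direct-mapped organization described above, the following is a minimal sketch in Python. The text gives only the cache geometry; the word-address indexing scheme and the names `split_address` and `DirectMappedCache` are assumptions made for this example, not details taken from the patent.

```python
LINES = 4             # cache lines (from the text)
WORDS_PER_LINE = 16   # instruction words per line (from the text)


def split_address(word_addr):
    """Split an instruction word address into (tag, line, offset).

    Assumes the conventional direct-mapped layout: the low bits select
    the word within a line, the next bits select the line.
    """
    offset = word_addr % WORDS_PER_LINE
    line = (word_addr // WORDS_PER_LINE) % LINES
    tag = word_addr // (WORDS_PER_LINE * LINES)
    return tag, line, offset


class DirectMappedCache:
    """Hypothetical model that tracks only which tag each line holds."""

    def __init__(self):
        self.tags = [None] * LINES

    def access(self, word_addr):
        """Return True on a hit; on a miss, install the line and return False."""
        tag, line, _ = split_address(word_addr)
        hit = self.tags[line] == tag
        self.tags[line] = tag
        return hit
```

Under these assumptions a loop of up to 64 consecutive instructions fits entirely in the cache, while two addresses exactly 64 words apart compete for the same line.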
The two tasks thus run independently: the ARM loads the data and Piccolo processes it. This allows sustained single-cycle data processing on 16-bit data. Piccolo has a data input mechanism (shown in Fig. 2) that lets the ARM prefetch sequential data, loading it before Piccolo needs it. Piccolo can access the loaded data in any order, automatically refilling its registers as the old data is used for the last time (every instruction has, for each source operand, a bit indicating that the source register should be refilled). This input mechanism is termed the reorder buffer and comprises an input buffer 12. Every value loaded into Piccolo (via an LDC or MCR, see below) carries a tag Rn specifying the register for which the value is destined. The tag Rn is stored alongside the data word in the input buffer. When a register is accessed via the register selecting circuit 14 and the instruction specifies that the data register is to be refilled, the register is marked as empty by asserting a signal E. The register is then automatically refilled by the refill control circuit 16, using the oldest loaded value in the input buffer 12 that is destined for that register. The reorder buffer holds 8 tagged values. The input buffer 12 has a form similar to a FIFO, except that data words can be extracted from the middle of the queue, after which later-stored words are passed along to fill the gap. Accordingly, the data words furthest from the input are the oldest, and this is used to decide which data word should refill a register when the input buffer 12 holds two data words with the correct tag Rn.
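The refill rule described above (8 tagged entries, extraction from the middle of the queue, oldest matching entry first) can be sketched as follows. This is a simplified software model under stated assumptions, not the hardware design: the class name `ReorderBuffer` and its methods are hypothetical, and tags are modeled as register-name strings.

```python
class ReorderBuffer:
    """Toy model of the input reorder buffer: a FIFO that also allows
    removal of the oldest entry carrying a given destination tag."""

    CAPACITY = 8  # tagged values held, per the text

    def __init__(self):
        self.entries = []  # oldest first; each entry is (tag, value)

    def load(self, tag, value):
        """Append a tagged value; return False when full (the loader must wait)."""
        if len(self.entries) >= self.CAPACITY:
            return False
        self.entries.append((tag, value))
        return True

    def refill(self, tag):
        """Remove and return the oldest value destined for `tag`, or None."""
        for i, (entry_tag, value) in enumerate(self.entries):
            if entry_tag == tag:
                del self.entries[i]  # later entries move up to close the gap
                return value
        return None
```

For example, after loading values tagged X0, Y0, X0 in that order, two successive refills of X0 return the first and third values, and the Y0 entry is left at the head of the queue.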
Piccolo outputs data by storing it in an output buffer 18 (a FIFO), as shown in Fig. 3. Data is written to the FIFO sequentially and read out to the memory 8 by the ARM in the same order. The output buffer 18 holds 8 32-bit values.
Piccolo connects to the ARM via the coprocessor interface (the CP control signals of Fig. 1). When an ARM coprocessor instruction is executed, Piccolo can execute the instruction; make the ARM wait until Piccolo is ready before executing the instruction; or refuse to execute the instruction. In the last case the ARM takes an undefined instruction exception.
The most common coprocessor instructions that Piccolo executes are LDC and STC, which respectively load data words from and store data words to the memory 8 via the data bus, with the ARM generating all addresses. It is these instructions that load data into the reorder buffer and store data from the output buffer 18. Piccolo stalls the ARM on an LDC if there is not enough room in the input reorder buffer to load the data, and on an STC if there is insufficient data in the output buffer to store, i.e. the data the ARM is expecting is not in the output buffer 18. Piccolo also executes ARM/coprocessor register transfers to allow the ARM to access Piccolo's special registers.
Piccolo fetches its own instructions from memory to control the datapath shown in Fig. 3 and to transfer data from the reorder buffer to the registers and from the registers to the output buffer 18. The arithmetic logic unit of Piccolo that executes these instructions has a multiplier/adder circuit 20 that performs multiplies, adds, subtracts, multiply-accumulates, logical operations, shifts and rotates. The datapath also includes an accumulate/decumulate circuit 22 and a scale/saturate circuit 24.
Piccolo instructions are initially loaded from memory into the instruction cache 6, where Piccolo can access them without going back to main memory.
Piccolo cannot recover from memory aborts. Therefore, if Piccolo is used in a virtual memory system, all Piccolo data must be in physical memory throughout the Piccolo task. Given the real-time nature of Piccolo tasks, such as real-time DSP, this is not a significant limitation. If a memory abort occurs, Piccolo stops and sets a flag in the status register S2.
Fig. 3 shows the overall datapath functionality of Piccolo. The register bank 10 uses 3 read ports and 2 write ports. One write port (the L port) is used to refill registers from the reorder buffer. The output buffer 18 is updated directly from the ALU result bus 26; output from the output buffer 18 is under ARM program control. The ARM coprocessor interface performs LDC (Load Coprocessor) instructions into the reorder buffer and STC (Store Coprocessor) instructions from the output buffer 18, as well as MCR and MRC (Move ARM register to/from CP register) on the register bank 10.
The remaining register ports are used for the ALU. Two read ports (A and B) drive the inputs to the multiplier/adder circuit 20, and the C read port drives the input of the accumulator/decumulator circuit 22. The remaining write port W is used to return results to the register bank 10.
The multiplier 20 performs a 16 x 16 signed or unsigned multiply, with an optional 48-bit accumulate. The scaler unit 24 can provide an immediate arithmetic or logical shift right of 0 to 31 bits, followed by optional saturation. The shifter and logic unit 20 can perform either a shift or a logical operation each cycle.
Piccolo has 16 general purpose registers named D0-D15 or A0-A3, X0-X3, Y0-Y3, Z0-Z3. The first four registers (A0-A3) are intended as accumulators and are 48 bits wide, the extra 16 bits providing a guard against overflow during many successive calculations. The remaining registers are 32 bits wide.
Each of Piccolo's registers can be treated as containing two independent 16-bit values. Bits 0 to 15 contain the low half and bits 16 to 31 contain the high half. An instruction can specify a particular 16-bit half of each register as a source operand, or can specify the entire 32-bit register.
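The half-register view described above amounts to simple bit-field extraction. A minimal sketch, with hypothetical helper names (`low_half`, `high_half`, `to_signed16`) that do not come from the patent:

```python
def low_half(reg):
    """Bits 0 to 15 of a 32-bit register value."""
    return reg & 0xFFFF


def high_half(reg):
    """Bits 16 to 31 of a 32-bit register value."""
    return (reg >> 16) & 0xFFFF


def to_signed16(half):
    """Interpret a 16-bit field as a two's-complement signed value."""
    return half - 0x10000 if half & 0x8000 else half
```

With a register holding 0x80017FFF, the low half is 0x7FFF and the high half is 0x8001, which reads as -32767 when treated as a signed 16-bit operand.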
Piccolo also provides saturated arithmetic. Variants of the multiply, add and subtract instructions provide a saturated result if the result is greater than the size of the destination register. Where the destination register is a 48-bit accumulator, the value is saturated to 32 bits (i.e. there is no way to saturate a 48-bit value). There is no overflow detection on 48-bit registers. This is a reasonable restriction, since it would take at least 65536 multiply-accumulate instructions to cause an overflow.
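The saturation behavior and the 65536 figure can be checked with a short sketch. It assumes signed 32-bit saturation limits and, for the bound, the worst case of unsigned 16 x 16 products accumulated into 48 bits; the function name `saturate32` is hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturate32(value):
    """Clamp a wider result to the signed 32-bit range (assumed semantics)."""
    return max(INT32_MIN, min(INT32_MAX, value))


# Largest unsigned 16 x 16 product: (2**16 - 1)**2, which is below 2**32.
MAX_UNSIGNED_PRODUCT = (2**16 - 1) ** 2

# 16 guard bits on top of a 32-bit product leave room for 2**16 = 65536
# worst-case accumulations before a 48-bit accumulator can overflow.
ACCUMULATIONS_THAT_STILL_FIT = 65536
```

Since 65536 products of at most (2^16 - 1)^2 sum to less than 2^48, an overflow needs more than 65536 worst-case accumulations, consistent with the figure quoted in the text.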
Each Piccolo register is either marked as "empty" (the E flag, see Fig. 2) or contains a value (it is not possible for half of a register to be empty). Initially, all registers are marked as empty. On each cycle Piccolo uses the refill control circuit 16 to attempt to fill one of the empty registers with a value from the input reorder buffer. Alternatively, if a register is written with a value from the ALU, it is no longer marked as "empty". If a register is written from the ALU while a value is waiting in the reorder buffer to be placed in the same register, the result is undefined. Piccolo's execution unit stalls if an empty register is read.
The input reorder buffer (ROB) sits between the coprocessor interface and Piccolo's register bank. Data is loaded into the ROB with ARM coprocessor transfers. The ROB contains a number of 32-bit values, each with a tag indicating the Piccolo register for which the value is destined. The tag also indicates whether the data should be transferred to a whole 32-bit register or only to the bottom 16 bits of a 32-bit register. If the data is destined for a whole register, the bottom 16 bits of the entry are transferred to the bottom half of the target register and the top 16 bits are transferred to the top half of the register (sign-extended if the target register is a 48-bit accumulator). If the data is destined for just the bottom half of a register (a so-called "half register"), the bottom 16 bits are transferred first.
The register tag always refers to a physical destination register; no register remapping is performed (see below regarding register remapping).
On every cycle Piccolo attempts to transfer a data entry from the ROB to the register bank as follows:
- Each entry in the ROB is examined and its tag compared with the registers that are empty, to determine whether a transfer can be made from part or all of an entry to a register.
- From the set of entries that can make a transfer, the oldest entry is selected and its data transferred to the register bank.
- The tag of this entry is updated to mark the entry as empty. If only part of the entry was transferred, only the part transferred is marked empty.
For example, if the target register is completely empty and the selected ROB entry contains data destined for a full register, the whole 32 bits are transferred and the entry is marked empty. If the bottom half of the target register is empty and the ROB entry contains data destined for the bottom half of a register, the bottom 16 bits of the ROB entry are transferred to the bottom half of the target register and the bottom half of the ROB entry is marked as empty.
The high and low 16 bits of data in any entry can be transferred independently. If no entry contains data that can be transferred to the register bank, no transfer is made that cycle. The table below describes all possible combinations of target ROB entry state and target register state.
Target register Rn state (columns) against target ROB entry state (rows):
Target ROB entry state | Rn empty | Rn low half empty | Rn high half empty
Full register, both halves valid | Rn.h <- entry.h; Rn.l <- entry.l; entry marked empty | Rn.l <- entry.l; entry.l marked empty | Rn.h <- entry.h; entry.h marked empty
Full register, high half valid | Rn.h <- entry.h; entry marked empty | - | Rn.h <- entry.h; entry marked empty
Full register, low half valid | Rn.l <- entry.l; entry marked empty | Rn.l <- entry.l; entry marked empty | -
Half register, both halves valid | Rn.l <- entry.l; entry.l marked empty | Rn.l <- entry.l; entry.l marked empty | -
Half register, high half valid | Rn.l <- entry.h; entry marked empty | Rn.l <- entry.h; entry marked empty | -
To summarize, the two halves of a register may be refilled independently from the ROB. The data in the ROB is marked either as destined for a whole register or as two 16-bit values destined for the bottom half of a register.
Data is loaded into the ROB using ARM coprocessor instructions. How the data is marked in the ROB depends on which coprocessor instruction was used to perform the transfer. The following ARM instructions are available for filling the ROB with data:
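The per-cycle transfer rules summarized above can be expressed as a small model. This is a sketch under stated assumptions rather than the circuit itself: `None` stands for an empty register half or an already-transferred entry half, and the names `Entry`, `Reg`, `transfer` and `cycle` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    target: str           # destination register name
    full: bool            # True: full-register entry; False: half-register entry
    lo: Optional[int]     # low 16 bits, None once transferred
    hi: Optional[int]     # high 16 bits, None once transferred


@dataclass
class Reg:
    lo: Optional[int] = None   # None models an empty low half
    hi: Optional[int] = None   # None models an empty high half


def transfer(entry, reg):
    """Move whatever the table allows; return True if any half moved."""
    moved = False
    if entry.full:
        if entry.hi is not None and reg.hi is None:
            reg.hi, entry.hi, moved = entry.hi, None, True
        if entry.lo is not None and reg.lo is None:
            reg.lo, entry.lo, moved = entry.lo, None, True
    elif reg.lo is None:
        # Half-register entries feed only the low half, bottom 16 bits first.
        if entry.lo is not None:
            reg.lo, entry.lo, moved = entry.lo, None, True
        elif entry.hi is not None:
            reg.lo, entry.hi, moved = entry.hi, None, True
    return moved


def cycle(rob, regs):
    """One cycle: the oldest entry able to transfer does so (rob is oldest first)."""
    for i, entry in enumerate(rob):
        if transfer(entry, regs[entry.target]):
            if entry.lo is None and entry.hi is None:
                del rob[i]  # fully drained entries leave the buffer
            return True
    return False
```

A half-register entry therefore supplies its target's low half twice in succession, first from its bottom 16 bits and then from its top 16 bits, each time the low half is marked empty again.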
LDP{<cond>}<16/32> <dest>,[Rn]{!},#<size>
LDP{<cond>}<16/32>W <dest>,<wrap>,[Rn]{!},#<size>
LDP{<cond>}16U <bank>,[Rn]{!}
MPR{<cond>} <dest>,Rn
MRP{<cond>} <dest>,Rn
Provide following ARM instruction to be used to dispose ROB:
LDPA<bank?list>
First three bar is collected to be that LDC, MPR and MRP are collected and to be that it is the CDP instruction that MCR, LDPA are collected.
Above<dest〉represent Piccolo register (A0-Z3), Rn represents an ARM register,<size〉representative must be the fixed word joint number of 4 non-zero multiple, and<wrap〉represent constant (1,2,4,8).The field of having drawn together with { } is what select for use.Reset preface impact damper,<size for transmission can be met〉be at most 32.In many occasions, for fear of deadlock,<size〉will be less than this restriction.<16/32〉field indicates whether loaded data to be treated as 16 bit data and be indicated the specific action (face as follows) of ending (endian) that will take, or 32 bit data.
Annotate 1: in the text below, it instructs when quoting LDP or LDPW 16 and 32 modification.
Annotate 2: ' word ' is 32 pieces from storer, and it can comprise two 16 bit data items or one 32 bit data item.
The LDP instruction transmits the plurality of data item, and they are assigned to a full register.This instruction will be from storer address Rn loading<size/4 words, they are inserted among the ROB.The number of words that can transmit is subjected to following restriction:
-amount<size〉must be 4 non-zero multiple;
-<size〉must be less than or equal to the size (be 8 words, in the future version guarantee be no less than this) of the ROB of specific implementation in first version.
First data item that transmits is labeled as is assigned to<dest, second data item is assigned to<dest 〉+1 or the like (rapping around to A0) from Z3.If specified! , then after this with register Rn increment<size 〉.
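The tagging rule for LDP can be sketched as follows. This is only an illustration: the function name `ldp_tags` and the flat register ordering A0-A3, X0-X3, Y0-Y3, Z0-Z3 are assumptions made for the example, and the 32-byte limit reflects the 8-word ROB of the first version.

```python
# Hypothetical flat ordering of the Piccolo registers: A0-A3, X0-X3, Y0-Y3, Z0-Z3.
REGISTERS = [f"{bank}{n}" for bank in "AXYZ" for n in range(4)]


def ldp_tags(dest, size):
    """Return the full-register tag given to each word loaded by LDP."""
    if size <= 0 or size % 4 != 0:
        raise ValueError("<size> must be a non-zero multiple of 4")
    if size > 32:
        raise ValueError("<size> exceeds the 8-word ROB of the first version")
    start = REGISTERS.index(dest)
    # Successive words go to <dest>, <dest>+1, ... wrapping from Z3 back to A0.
    return [REGISTERS[(start + i) % len(REGISTERS)] for i in range(size // 4)]
```

For instance, `ldp_tags("Z2", 16)` tags four words as Z2, Z3, A0 and A1.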
If the LDP16 variant is used, an endianness-specific operation is performed on the two 16-bit half-words forming each 32-bit data item as they are returned from the memory system. See below for details of big-endian and little-endian support.
The LDPW instruction transfers a number of data items to a set of registers. The first data item transferred is tagged as destined for <dest>, the next for <dest>+1, and so on. When <wrap> transfers have occurred, the next item transferred is tagged as destined for <dest> again, and so on. The <wrap> quantity is specified in half-words.
For LDPW, the following restrictions apply:
- The quantity <size> must be a non-zero multiple of 4;
- <size> must be less than or equal to the size of the ROB for the particular implementation (8 words in the first version, and guaranteed to be no less than this in future versions);
- <dest> may be one of {A0, X0, Y0, Z0};
- <wrap> may be one of {2, 4, 8} half-words for LDP32W, and one of {1, 2, 4, 8} half-words for LDP16W;
- The quantity <size> must be greater than 2*<wrap>, otherwise no wrapping occurs and the LDP instruction should be used instead.
For example, the instruction
LDP32W X0,2,[R0]!,#8
will load two words into the ROB, marking them as destined for the full register X0. R0 will be incremented by 8. The instruction
LDP32W X0,4,[R0],#16
will load four words into the ROB, marking them as destined for X0, X1, X0, X1 (in that order). R0 is not affected.
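The two LDP32W examples above can be reproduced with a short sketch. The function name `ldp32w_tags` is hypothetical; the only rule it relies on is that <wrap> counts half-words, so a wrap of 4 spans two 32-bit registers.

```python
def ldp32w_tags(dest, wrap, size):
    """Return the register tag given to each word loaded by LDP32W."""
    if dest not in ("A0", "X0", "Y0", "Z0"):
        raise ValueError("<dest> must be one of A0, X0, Y0, Z0")
    if wrap not in (2, 4, 8):
        raise ValueError("<wrap> must be 2, 4 or 8 half-words for LDP32W")
    if size <= 0 or size % 4 != 0 or size > 32:
        raise ValueError("<size> must be a non-zero multiple of 4, at most 32")
    if size <= 2 * wrap:
        raise ValueError("no wrapping would occur; use LDP instead")
    bank = dest[0]
    registers_per_wrap = wrap // 2  # each 32-bit item occupies two half-words
    return [f"{bank}{i % registers_per_wrap}" for i in range(size // 4)]
```

`ldp32w_tags("X0", 4, 16)` yields X0, X1, X0, X1, matching the second example.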
For LDP16W, <wrap> may be specified as 1, 2, 4 or 8. A wrap of 1 causes all data to be tagged as destined for the bottom half of the destination register, <dest>.l. This is the 'half register' case.
For example, the instruction
LDP16W X0,1,[R0]!,#8
will load two words into the ROB, marking them as 16-bit data destined for X0.l. R0 will be incremented by 8. The instruction
LDP16W X0,4,[R0],#16
behaves like the LDP32W example, except that an endianness-specific operation is performed on the data as it is returned from memory.
All unused encodings of the LDP instruction are reserved for future expansion.
The LDP16U instruction is provided to support the efficient transfer of 16-bit data that is not word-aligned. LDP16U support is provided for registers D4 to D15 (the X, Y and Z banks). The LDP16U instruction transfers one 32-bit word of data (containing two 16-bit data items) from memory into Piccolo. Piccolo discards the bottom 16 bits of this data and stores the top 16 bits in a holding register. There is one holding register for each of the X, Y and Z banks. Once the holding register of a bank has been loaded, the behaviour of LDP{W} instructions changes if the data is destined for a register in that bank. The data loaded into the ROB is formed by concatenating the holding register with the bottom 16 bits of the data being transferred by the LDP instruction. The upper 16 bits of the data being transferred are put into the holding register:
entry <- data.l | holding_register
holding_register <- data.h
This mode of operation persists until it is turned off by an LDPA instruction. The holding register does not record the destination register tag or size. These characteristics are obtained from the instruction that provides the next value of data.l.
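A minimal sketch of the unaligned mode for a single bank, assuming little-endian operation, may help. The class and method names are hypothetical. Placing the held half-word in the bottom 16 bits of the ROB entry follows the statement elsewhere in the text that the holding register contents always end up in the bottom 16 bits of the word transferred into the reorder buffer.

```python
class UnalignedBank:
    """Models the holding register of one bank (X, Y or Z)."""

    def __init__(self):
        self.holding = None  # None means the bank is in aligned mode

    def ldp16u(self, word):
        # Discard the bottom 16 bits and keep the top 16 bits.
        self.holding = (word >> 16) & 0xFFFF

    def ldp(self, word):
        """Return the 32-bit value placed in the ROB for one loaded word."""
        if self.holding is None:
            return word & 0xFFFFFFFF
        entry = ((word & 0xFFFF) << 16) | self.holding
        self.holding = (word >> 16) & 0xFFFF
        return entry

    def ldpa(self):
        # Switch unaligned mode off; held data is discarded.
        self.holding = None
```

With memory words 0x1111AAAA, 0x33332222 and 0x55554444, an LDP16U followed by two LDP transfers delivers 0x22221111 and 0x44443333 to the ROB, that is, the 16-bit stream shifted by one half-word.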
Endianness-specific behaviour may always occur on the data returned by the memory system. There is no non-16-bit equivalent of LDP16U, since it is assumed that all 32-bit data items are word-aligned in memory.
The LDPA instruction is used to switch off the unaligned mode of operation started by an LDP16U instruction. The unaligned mode can be turned off independently on banks X, Y and Z. For example, the instruction
LDPA {X,Y}
will turn off the unaligned mode on banks X and Y. Data in the holding registers of these banks is discarded.
Executing an LDPA on a bank which is not in unaligned mode is allowed, and leaves that bank in aligned mode.
The MPR instruction places the contents of ARM register Rn into the ROB, destined for Piccolo register <dest>. The destination register <dest> may be any full register in the range A0-Z3. For example, the instruction
MPR X0,R3
will transfer the contents of R3 into the ROB, marking the data as destined for the full register X0.
No endianness-specific behaviour occurs on the data as it is transferred from ARM to Piccolo, since ARM is internally little-endian.
The MPRW instruction places the contents of ARM register Rn into the ROB, marking it as two 16-bit data items destined for the 16-bit Piccolo register <dest>.l. The restrictions on <dest> are the same as for the LDPW instructions (i.e. A0, X0, Y0, Z0). For example, the instruction
MPRW X0,R3
will transfer the contents of R3 into the ROB, marking the data as two 16-bit quantities destined for X0.l. Note that, as for the LDP16W case with a wrap of 1, only the bottom half of a 32-bit register can be targeted.
As with MPR, no endianness-specific operations are applied to the data.
LDP is encoded as (fields listed from bit 31 down to bit 0):
COND | 110 | P | U | N | W | 1 | Rn | DEST | PICCOLO1 | SIZE/4
where PICCOLO1 is Piccolo's first coprocessor number (currently 8). The N bit selects between LDP32 (1) and LDP16 (0).
LDPW is encoded as (fields listed from bit 31 down to bit 0):
COND | 110 | P | U | N | W | 1 | Rn | DEST | WRAP | PICCOLO2 | SIZE/4
where DEST is 0-3 for destination registers A0, X0, Y0, Z0 and WRAP is 0-3 for wrap values 1, 2, 4, 8. PICCOLO2 is Piccolo's second coprocessor number (currently 9). The N bit selects between LDP32 (1) and LDP16 (0).
LDP16U is encoded as (fields listed from bit 31 down to bit 0):
COND | 110 | P | U | 0 | W | 1 | Rn | DEST | 01 | PICCOLO2 | 00000001
where DEST is 1-3 for destination banks X, Y, Z.
LDPA is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0000 | 0000 | 0000 | PICCOLO1 | 000 | 0 | BANK
where BANK[3:0] is used to turn off the unaligned mode on a per-bank basis. If BANK[1] is set, unaligned mode on bank X is turned off. BANK[2] and BANK[3], if set, turn off unaligned mode on banks Y and Z respectively. Note that this is a CDP operation.
MPR is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0 | 1 | 0 | 0 | DEST | Rn | PICCOLO1 | 000 | 1 | 0000
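As an illustration, the MPR field row can be packed into a 32-bit instruction word. The sketch assumes the standard ARM MCR field positions (condition in bits 31:28, coprocessor number in bits 11:8), assumes that Piccolo registers are numbered A0-A3 = 0-3, X0-X3 = 4-7 and so on, and uses the ARM 'always' condition 0xE by default. The name `encode_mpr` is hypothetical.

```python
PICCOLO1 = 8  # first Piccolo coprocessor number, as stated in the text

# Assumed register numbering: A0-A3 = 0-3, X0-X3 = 4-7, Y0-Y3 = 8-11, Z0-Z3 = 12-15.
REGISTER_NUMBERS = {f"{bank}{n}": 4 * i + n
                    for i, bank in enumerate("AXYZ") for n in range(4)}


def encode_mpr(dest, rn, cond=0xE):
    """Pack MPR <dest>,Rn following the field row given in the text."""
    if not 0 <= rn <= 15:
        raise ValueError("Rn must be an ARM register number 0-15")
    return ((cond << 28)
            | (0b1110 << 24)                 # coprocessor register transfer
            | (0b0100 << 20)                 # the '0 1 0 0' field
            | (REGISTER_NUMBERS[dest] << 16)
            | (rn << 12)
            | (PICCOLO1 << 8)
            | (0b000 << 5)
            | (1 << 4))                      # low four bits are 0000
```

Under these assumptions, `encode_mpr("X0", 3)` gives 0xEE443810 for the instruction MPR X0,R3.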
MPRW is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0 | 1 | 0 | 0 | DEST | 00 | Rn | PICCOLO2 | 000 | 1 | 0000
where DEST is 1-3 for destination registers X0, Y0, Z0.
The output FIFO can hold up to eight 32-bit values. These are transferred from Piccolo using one of the following (ARM) opcodes:
STP{<cond>}<16/32> [Rn]{!},#<size>
MRP Rn
The first saves <size>/4 words from the output FIFO to the address given by ARM register Rn, indexing Rn if the ! is present. To prevent deadlock, <size> must not be greater than the size of the output FIFO (8 entries in this implementation). If the STP16 variant is used, endianness-specific behaviour may occur on the data returned from the memory system.
The MRP instruction removes one word from the output FIFO and places it in ARM register Rn. As with MPR, no endianness-specific operations are applied to the data.
The ARM encoding of STP is (fields listed from bit 31 down to bit 0):
COND | 110 | P | U | N | W | 0 | Rn | 0000 | PICCOLO1 | SIZE/4
where N selects between STP32 (1) and STP16 (0). For the definitions of the P, U and W bits, refer to an ARM data sheet.
The ARM encoding of MRP is (fields listed from bit 31 down to bit 0):
COND | 1110 | 0 | 1 | 0 | 1 | 0000 | Rn | PICCOLO1 | 000 | 1 | 0000
The Piccolo instruction set assumes little-endian operation internally. For example, when a 32-bit register is accessed as two 16-bit halves, the lower half is assumed to occupy bits 15 to 0. Piccolo may be operating in a system with big-endian memory or peripherals, and must therefore take care to load 16-bit packed data in the correct manner.
Piccolo, like the ARM (e.g. the ARM7 microprocessor produced by Advanced RISC Machines Limited of Cambridge, United Kingdom), has a 'BIGEND' configuration pin which the programmer can control, perhaps with a programmable peripheral. Piccolo uses this pin to configure the input reorder buffer and the output FIFO.
When the ARM loads packed 16-bit data into the reorder buffer, it must indicate this by using the 16-bit form of the LDP instruction. This information is combined with the state of the 'BIGEND' configuration input to place the data into the holding latches and the reorder buffer in the appropriate order. In particular, in big-endian mode the holding register stores the bottom 16 bits of the loaded word and is paired with the top 16 bits of the next load. The holding register contents always end up in the bottom 16 bits of the word transferred into the reorder buffer.
The output FIFO may contain either packed 16-bit or 32-bit data. The programmer must use the correct form of the STP instruction so that Piccolo can ensure that 16-bit data is provided on the correct halves of the data bus. When Piccolo is configured as big-endian, the top and bottom 16-bit halves are swapped when the 16-bit forms of STP are used.
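The half-word swap applied by the 16-bit forms of STP can be sketched in a few lines. The function name `stp16_bus_word` and its `bigend` parameter (standing in for the state of the 'BIGEND' pin) are hypothetical.

```python
def stp16_bus_word(fifo_word, bigend):
    """Return the 32-bit word driven onto the data bus by an STP16 transfer."""
    low = fifo_word & 0xFFFF
    high = (fifo_word >> 16) & 0xFFFF
    if bigend:
        # Big-endian configuration: swap the top and bottom 16-bit halves.
        return (low << 16) | high
    return (high << 16) | low
```

A 32-bit STP would pass the word through unchanged in either configuration, which is why the correct form of the instruction must be chosen for packed 16-bit data.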
Piccolo has 4 special registers which can only be accessed from the ARM. They are called S0-S2. They can only be accessed with MRC and MCR instructions. The opcodes are:
MPSR Sn,Rm
MRPS Rm,Sn
These opcodes transfer a 32-bit value between ARM register Rm and special register Sn. They are encoded in the ARM as coprocessor register transfers (fields listed from bit 31 down to bit 0):
COND | 1110 | 001 | L | Sn | Rm | PICCOLO | 000 | 1 | 0000
where L is 0 for MPSR and 1 for MRPS.
Register S0 contains the Piccolo unique ID and revision code. Its fields, from bit 31 down to bit 0, are:
Implementor | Architecture | Part number | Revision
Bits [3:0] contain the revision number of the processor.
Bits [15:4] contain a 3-digit part number in binary-coded decimal format: 0x500 for Piccolo.
Bits [23:16] contain the architecture version: 0x00 = version 1.
Bits [31:24] contain the ASCII code of the implementor's trademark: 0x41 = A = ARM Ltd.
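The S0 field layout can be unpacked with shifts and masks, as in the following sketch. The function name `decode_s0` and the dictionary keys are chosen for illustration only.

```python
def decode_s0(value):
    """Split a 32-bit S0 value into the four fields described in the text."""
    return {
        "implementor": chr((value >> 24) & 0xFF),      # ASCII code, bits [31:24]
        "architecture": (value >> 16) & 0xFF,          # bits [23:16]
        "part_number": f"{(value >> 4) & 0xFFF:03X}",  # three BCD digits, bits [15:4]
        "revision": value & 0xF,                       # bits [3:0]
    }
```

For example, a value of 0x41005001 decodes to implementor 'A', architecture 0 (version 1), part number 500 and revision 1.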
Register S1 is the Piccolo status register. Its fields, from bit 31 down to bit 0, are:
N | Z | C | V | SN | SZ | SC | SV | Reserved | D | A | H | B | U | E
Primary condition code flags (N, Z, C, V)
Secondary condition code flags (SN, SZ, SC, SV)
E bit: Piccolo has been disabled by the ARM and has halted.
U bit: Piccolo encountered an undefined instruction and has halted.
B bit: Piccolo encountered a breakpoint and has halted.
H bit: Piccolo encountered a halt instruction and has halted.
A bit: Piccolo suffered a memory abort (load, store or Piccolo instruction) and has halted.
D bit: Piccolo has detected a deadlock condition and has halted (see below).
Register S2 is the Piccolo program counter. Its fields, from bit 31 down to bit 0, are:
Program counter | 0 | 0
Writing to the program counter starts Piccolo executing a program at that address (leaving the halted state if it is halted). On reset the program counter is undefined, since Piccolo is always started by writing to the program counter.
During execution, Piccolo monitors the execution of instructions and the status of the coprocessor interface. A deadlock exists if it detects that:
- Piccolo has stalled waiting for either a register to be refilled or the output FIFO to have an available entry; and
- the coprocessor interface is busy-waiting because of insufficient space in the ROB or insufficient items in the output FIFO.
If both of these conditions are detected, Piccolo sets the D bit in its status register, halts, and rejects the ARM coprocessor instruction, causing the ARM to take the undefined instruction trap.
This detection of deadlock conditions allows a system to be constructed which can at least warn the programmer that the condition has occurred and report the exact point of failure, by reading the ARM and Piccolo program counters and registers. It should be stressed that deadlock can only occur due to an incorrect program or to another part of the system corrupting Piccolo's state. Deadlock cannot occur due to data starvation or 'overload'.
Several operations are available to control Piccolo from the ARM; these are provided by CDP instructions. These CDP instructions are accepted only when the ARM is in a privileged mode. If it is not, Piccolo rejects the CDP instruction, causing the ARM to take the undefined instruction trap. The following operations are available:
- Reset
- Enter state access mode
- Enable
- Disable
Piccolo may be reset in software using the PRESET instruction.
PRESET ; Clear Piccolo's state
This instruction is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0000 | 0000 | 0000 | PICCOLO1 | 000 | 0 | 0000
When this instruction is executed, the following occurs:
- All registers are marked as empty (ready for refill).
- The input ROB is cleared.
- The output FIFO is cleared.
- Loop counters are reset.
- Piccolo is put into the halted state (and the H bit of S2 is set).
Executing the PRESET instruction may take several cycles to complete (2-3 for this embodiment). While it is executing, subsequent ARM coprocessor instructions to be executed on Piccolo will be busy-waited.
In state access mode, Piccolo's state may be saved and restored using STC and LDC instructions (see below regarding accessing Piccolo state from the ARM). To enter state access mode, the PSTATE instruction must first be executed:
PSTATE ; Enter state access mode
This instruction is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0001 | 0000 | 0000 | PICCOLO1 | 000 | 0 | 0000
When executed, the PSTATE instruction will:
- Halt Piccolo (if it is not already halted), setting the E bit in Piccolo's status register.
- Configure Piccolo into its state access mode.
Executing the PSTATE instruction may take several cycles to complete, as Piccolo's instruction pipeline must drain before it can halt. While it is executing, subsequent ARM coprocessor instructions to be executed on Piccolo will be busy-waited.
The PENABLE and PDISABLE instructions are used for fast context switching. When Piccolo is disabled, only special registers 0 and 1 (the ID and status registers) are accessible, and only from a privileged mode. Access to any other state, or any access from user mode, will cause an ARM undefined instruction exception. Disabling Piccolo causes it to halt execution. When Piccolo has halted execution, it acknowledges the fact by setting the E bit in the status register.
Piccolo is enabled by executing the PENABLE instruction:
PENABLE ; Enable Piccolo
This instruction is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0010 | 0000 | 0000 | PICCOLO1 | 000 | 0 | 0000
Piccolo is disabled by executing the PDISABLE instruction:
PDISABLE ; Disable Piccolo
This instruction is encoded as (fields listed from bit 31 down to bit 0):
COND | 1110 | 0011 | 0000 | 0000 | PICCOLO1 | 000 | 0 | 0000
When this instruction is executed, the following occurs:
- Piccolo's instruction pipeline will drain.
- Piccolo will halt and the H bit in the status register will be set.
The Piccolo instruction cache holds the Piccolo instructions which control the Piccolo data path. If present, it is guaranteed to hold at least 64 instructions, starting on a 16-word boundary. The following ARM opcode assembles into an MCR. Its operation is to force the cache to fetch a line of (16) instructions starting at the specified address (which must be on a 16-word boundary). This fetch occurs even if the cache already holds data related to this address.
PMIR Rm
Piccolo must be halted before a PMIR can be performed.
The MCR encoding of this opcode is (fields listed from bit 31 down to bit 0):
COND | 1110 | 011 | L | 0000 | Rm | PICCOLO1 | 000 | 1 | 0000
This section discusses the Piccolo instruction set which controls the Piccolo data path. Each instruction is 32 bits long. The instructions are read from the Piccolo instruction cache.
Decoding the instruction set is quite straightforward. The top 6 bits (26 to 31) give a major opcode, with bits 22 to 25 providing a minor opcode for a few specific instructions. Bits shaded in grey are currently unused and reserved for expansion (they must contain the indicated value at present).
There are eleven major instruction classes. This does not fully correspond to the major opcode field in the instruction, for ease of decoding some sub-classes. The encodings are listed below with their fields running from bit 31 down to bit 0, each followed by the name of the instruction:
0 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2 : Standard data operation
1 | 000 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2 : Logical operation
1 | 001 | 0 | OP | F | SD | DEST | S1 | R1 | SRC1 | SRC2 : Conditional add/subtract
1 | 0011 : Undefined
1 | 010 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2_SHIFT : Shift
1 | 011 | 00 | F | SD | DEST | S1 | R1 | SRC1 | SRC2_SEL | COND : Select
1 | 011 | 01 : Undefined
1 | 011 | 1 | OP | F | SD | DEST | S1 | R1 | SRC1 | SRC2_SEL | COND : Parallel select
1 | 10 | 0 | OP | Sa | F | SD | DEST | A1 | R1 | SRC1 | SRC2_MULA : Multiply accumulate
1 | 10 | 1 | 0 : Undefined
1 | 10 | 1 | OP | 1 | F | SD | DEST | A1 | R1 | SRC1 | 0 | A0 | R2 | SRC2_REG | SCALE : Multiply double
1 | 110 : Undefined
1 | 11100 | F | SD | DEST | IMMEDIATE_15 | R2 : Move signed immediate
1 | 11101 : Undefined
1 | 11110 | 0 | RFIELD_4 | 0 | R1 | SRC1 | #INSTRUCTIONS_8 : Repeat
1 | 11110 | 1 | RFIELD_4 | #LOOPS_13 | #INSTRUCTIONS_8 : Repeat
1 | 11111 | 0 | OPC | REGISTER_LIST_16 | SCALE : Register list operations
1 | 11111 | 100 | IMMEDIATE_16 | COND : Branch
1 | 11111 | 101 | PARAMETERS_21 : Renaming parameter move
1 | 11111 | 11 | OP : Halt/Break
The format of each class of instruction is described in detail in the following sections. The source and destination operand fields are common to most instructions and are described in detail in separate sections, as is register re-mapping.
Most instructions require two source operands, Source 1 and Source 2. Some exceptions are the saturating absolute-value instructions.
The Source 1 (SRC1) operand has the following 7-bit format, occupying bits 18 down to 12:
Size | Refill | Register number | Hi/Lo
The elements of the field have the following meanings:
- Size: indicates the size of the operand to read (1 = 32-bit, 0 = 16-bit).
- Refill: specifies that the register should be marked as empty after being read and can be refilled from the ROB.
- Register number: encodes which of the 16 32-bit registers is to be read.
- Hi/Lo: for 16-bit reads, indicates which half of the 32-bit register to read. For 32-bit operands, when set it indicates that the two 16-bit halves should be interchanged.
Size | Hi/Lo | Portion of register accessed
0 | 0 | Low 16 bits
0 | 1 | High 16 bits
1 | 0 | Full 32 bits
1 | 1 | Full 32 bits, halves swapped
In the assembler, the register size is specified by adding a suffix to the register number: .l for the low 16 bits, .h for the high 16 bits, or .x for 32 bits with the upper and lower 16-bit halves interchanged.
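The Size and Hi/Lo selection can be modelled in a few lines. This is a sketch only: the function name `read_source` is hypothetical, and the sign extension of 16-bit reads to 32 bits follows the behaviour the text describes for values read onto the A or B bus.

```python
def read_source(register, size, hi_lo):
    """Return the 32-bit pattern placed on the data path for a source operand."""
    register &= 0xFFFFFFFF
    low, high = register & 0xFFFF, register >> 16
    if size == 0:
        half = high if hi_lo else low
        # 16-bit reads are sign-extended to 32 bits.
        return half | 0xFFFF0000 if half & 0x8000 else half
    if hi_lo:
        # 32-bit read with the two 16-bit halves interchanged (the .x suffix).
        return (low << 16) | high
    return register
```

With a register holding 0x80017FFF, the four combinations of Size and Hi/Lo give 0x00007FFF, 0xFFFF8001, 0x80017FFF and 0x7FFF8001 respectively.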
The general Source 2 (SRC2) operand has one of the following three 12-bit formats, occupying bits 11 down to 0:
0 | S2 | R2 | Register number | Hi/Lo | Scale
1 0 | ROT | IMMED_8
1 1 | IMMED_6 | Scale
Fig. 4 illustrates a multiplexer arrangement which, in response to the Hi/Lo bit and the Size bit, switches the appropriate half of the selected register onto the Piccolo data path. If the Size bit indicates 16 bits, a sign-extending circuit pads the high-order bits of the data path with 0s or 1s as appropriate.
The first encoding specifies the source as a register, the fields having the same encoding as the SRC1 specifier. The SCALE field specifies the scale to be applied to the result of the ALU.
SCALE [3:0] | Operation
0000 | ASR #0
0001 | ASR #1
0010 | ASR #2
0011 | ASR #3
0100 | ASR #4
0101 | Reserved
0110 | ASR #6
0111 | ASL #1
1000 | ASR #8
1001 | ASR #16
1010 | ASR #10
1011 | Reserved
1100 | ASR #12
1101 | ASR #13
1110 | ASR #14
1111 | ASR #15
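The scale table translates directly into a lookup, sketched below. The names `SCALE_TABLE` and `apply_scale` are hypothetical, and the ALU result is modelled as an unbounded Python integer rather than a 48-bit quantity.

```python
# Scale field value -> (operation, shift amount), transcribed from the table.
SCALE_TABLE = {
    0b0000: ("ASR", 0), 0b0001: ("ASR", 1), 0b0010: ("ASR", 2), 0b0011: ("ASR", 3),
    0b0100: ("ASR", 4), 0b0110: ("ASR", 6), 0b0111: ("ASL", 1), 0b1000: ("ASR", 8),
    0b1001: ("ASR", 16), 0b1010: ("ASR", 10), 0b1100: ("ASR", 12),
    0b1101: ("ASR", 13), 0b1110: ("ASR", 14), 0b1111: ("ASR", 15),
}


def apply_scale(value, scale):
    """Apply the scale selected by the 4-bit field to a signed ALU result."""
    if scale not in SCALE_TABLE:
        raise ValueError("reserved scale encoding")
    operation, amount = SCALE_TABLE[scale]
    # Python's >> on a negative int is an arithmetic shift, like ASR.
    return value << amount if operation == "ASL" else value >> amount
```

The two reserved encodings (0101 and 1011) are rejected rather than given a guessed meaning.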
The 8-bit immediate with rotate encoding allows the generation of a 32-bit immediate which is expressible as an 8-bit value and a 2-bit rotate. The following table shows the immediate values that can be generated from the 8-bit value XY:
ROT | Immediate
00 | 0x000000XY
01 | 0x0000XY00
10 | 0x00XY0000
11 | 0xXY000000
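The table amounts to placing the 8-bit value in one of the four byte positions of the word, which the following sketch expresses as a shift. The function name `rotated_immediate` is hypothetical.

```python
def rotated_immediate(rot, xy):
    """Expand a 2-bit rotate and an 8-bit value into a 32-bit immediate."""
    if not 0 <= rot <= 3:
        raise ValueError("ROT is a 2-bit field")
    if not 0 <= xy <= 0xFF:
        raise ValueError("IMMED_8 is an 8-bit field")
    # Each step of ROT moves the value up by one byte position.
    return xy << (8 * rot)
```

For example, `rotated_immediate(3, 0xAB)` gives 0xAB000000.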
The 6-bit immediate encoding allows the use of a 6-bit unsigned immediate (range 0 to 63), together with a scale applied to the output of the ALU.
The general Source 2 encoding is common to most instruction variants. There are some exceptions to this rule, which support a limited subset of the Source 2 encoding or modify it slightly:
- Select instructions.
- Shift instructions.
- Parallel operations.
- Multiply accumulate instructions.
- Multiply double instructions.
Select instructions only support an operand which is a register or a 6-bit unsigned immediate. The scale is not available, as these bits are used by the condition field of the instruction. The SRC2_SEL formats, occupying bits 11 down to 0, are:
0 | S2 | R2 | Register number | Hi/Lo | Cond
1 1 | IMMED_6 | Cond
Shift instructions only support an operand which is a 16-bit register or a 5-bit unsigned immediate between 1 and 31. No scale of the result is available. The SRC2_SHIFT formats, occupying bits 11 down to 0, are:
0 0 | R2 | Register number | Hi/Lo | 0 0 0 0
1 0 0 0 0 0 0 | IMMED_5
In the case of parallel operations, if a register is specified as the source of the operand, a 32-bit read must be performed. The immediate encoding has a slightly different meaning for parallel operations: it allows an immediate to be duplicated onto both 16-bit halves of a 32-bit operand. A slightly restricted range of scales is available for parallel operations. The SRC2_PARALLEL formats, occupying bits 11 down to 0, are:
0 1 | R2 | Register number | Hi/Lo | SCALE_PAR
1 0 | ROT | IMMED_8
1 1 | IMMED_6 | SCALE_PAR
If the 6-bit immediate is used, it is always duplicated onto both halves of the 32-bit quantity. If the 8-bit immediate is used, it is duplicated only if the rotate indicates that the 8-bit immediate should be rotated onto the top half of the 32-bit quantity.
ROT | Immediate
00 | 0x000000XY
01 | 0x0000XY00
10 | 0x00XY00XY
11 | 0xXY00XY00
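The duplication rule for parallel operations can be sketched as follows. Both function names are hypothetical; the 8-bit case is transcribed from the table and the 6-bit case from the statement that the value is always copied onto both halves.

```python
def parallel_immediate8(rot, xy):
    """Expand ROT and IMMED_8 for a parallel operation."""
    if not 0 <= rot <= 3 or not 0 <= xy <= 0xFF:
        raise ValueError("ROT is 2 bits and IMMED_8 is 8 bits")
    value = xy << (8 * (rot & 1))  # byte position within one 16-bit half
    if rot >= 2:
        # Rotates that reach the top half are duplicated onto both halves.
        return (value << 16) | value
    return value


def parallel_immediate6(imm):
    """A 6-bit immediate is always duplicated onto both 16-bit halves."""
    if not 0 <= imm <= 63:
        raise ValueError("IMMED_6 is a 6-bit field")
    return (imm << 16) | imm
```

For example, `parallel_immediate8(3, 0xAB)` gives 0xAB00AB00, while `parallel_immediate8(1, 0xAB)` stays at 0x0000AB00.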
No scale is available for parallel select operations; the scale field must be set to 0 for these instructions.
Multiply accumulate instructions do not allow an 8-bit rotated immediate to be specified. Bit 10 of the field is used to partly specify which accumulator to use. Source 2 is implied to be a 16-bit operand. The SRC2_MULA formats, occupying bits 11 down to 0, are:
0 | A0 | R2 | Register number | Hi/Lo | Scale
1 | A0 | IMMED_6 | Scale
Multiply double instructions do not allow the use of a constant. Only a 16-bit register can be specified. Bit 10 of the field is used to partly specify which accumulator to use. The SRC2_MULD format, occupying bits 11 down to 0, is:
0 | A0 | R2 | Register number | Hi/Lo | Scale
Some instructions always imply a 32-bit operation (e.g. ADDADD); in these cases the Size bit should be set to 1, with the Hi/Lo bit used to optionally swap the two 16-bit halves of the 32-bit operand. Some instructions always imply a 16-bit operation (e.g. MUL), and the Size bit should be set to 0. The Hi/Lo bit then selects which half of the register is used (it is assumed that the missing Size bit is clear). Multiply accumulate instructions allow independent specification of the source accumulator and destination registers. For these instructions the Size bits are used to indicate the source accumulator, and the Size bits are implied by the instruction type as 0.
When a 16-bit value is read (via the A or B bus), it is automatically sign-extended to a 32-bit quantity. If a 48-bit register is read (via the A or B bus), only the bottom 32 bits appear on the bus. Hence in all cases Source 1 and Source 2 are converted to 32-bit values. Only accumulate instructions using bus C can access the full 48 bits of an accumulator register.
If the refill bit is set, the register is marked as empty after use and will be refilled from the ROB by the usual refill mechanism (see the section on the ROB). Piccolo will not stall unless the register is used again as a source operand before the refill has taken place. The minimum number of cycles before the refilled data is valid (best case, with the data waiting at the head of the ROB) is either 1 or 2. Hence it is advisable not to use the refilled data in the instruction following the refill request. If use of the operand in the next two instructions can be avoided, it should be, since this prevents performance loss on deeper pipelined implementations.
In the assembler, the refill bit is specified by suffixing the register number with '^'. The section of the register marked as empty depends on the register operand. The two halves of each register may be marked for refill independently (for example, X0.l^ marks only the bottom half of X0 for refill, whereas X0^ marks the whole of X0 for refill). When the top 'half' (bits 47:16) of a 48-bit register is refilled, the 16 bits of data are written to bits 31:16 and sign-extended up to bit 47.
If an attempt is made to refill the same register twice (e.g. ADD X0, X0^, X0^), only one refill takes place. The assembler should only allow the syntax ADD X1, X0, X0^.
If a register read is attempted before that register has been refilled, Piccolo stalls waiting for the register to be refilled. If a register is marked for refill and is then updated before the refilled value is read, the result is unpredictable (for example, ADD X0, X0^, X1 is unpredictable, since it marks X0 for refill and then refills it by placing the sum of X0 and X1 into it).
The 4-bit scale field encodes fourteen scale types:
- ASR #0, 1, 2, 3, 4, 6, 8, 10
- ASR #12 to 16
- LSL #1
Parallel max/min instructions do not provide a scale, and therefore the 6-bit constant variant of Source 2 is unused (it is set to 0 by the assembler).
Register re-mapping is supported within a REPEAT instruction, allowing the REPEAT to access a moving 'window' of registers without unrolling the loop. This is described in more detail below.
The destination operand has the following 7-bit format, occupying bits 25 down to 19:
F | SD | HL | DEST
There are ten variants of this basic encoding:
No. | Assembler mnemonic | F | SD | HL | DEST
1 | Dx | 0 | 1 | 0 | Dx
2 | Dx^ | 1 | 1 | 0 | Dx
3 | Dx.l | 0 | 0 | 0 | Dx
4 | Dx.l^ | 1 | 0 | 0 | Dx
5 | Dx.h | 0 | 0 | 1 | Dx
6 | Dx.h^ | 1 | 0 | 1 | Dx
 | Undefined | 0 | 1 | 1 | 0000
7 | .l (no register writeback, 16 bits) | 1 | 1 | 1 | 0 0 00
8 | "" (no register writeback, 32 bits) | 1 | 1 | 1 | 0 1 00
9 | .l^ (output only, 16 bits) | 1 | 1 | 1 | 1 0 00
10 | ^ (output only, 32 bits) | 1 | 1 | 1 | 1 1 00
The register number (Dx) indicates which of the 16 registers is being addressed. The Hi/Lo bit and the Size bit work together to address each 32-bit register as a pair of 16-bit registers. The Size bit also defines how the appropriate flags, as defined in the instruction type, are set, irrespective of whether a result is written to the register bank and/or the output FIFO. This allows the construction of compares and similar instructions. The add-with-accumulate class of instruction must write its result back to a register.
The following table shows the behaviour of each encoding:
Encoding | Register write | FIFO write | V flag
1 | Write whole register | No write | 32-bit overflow
2 | Write whole register | Write 32 bits | 32-bit overflow
3 | Write low 16 bits to Dx.l | No write | 16-bit overflow
4 | Write low 16 bits to Dx.l | Write low 16 bits | 16-bit overflow
5 | Write low 16 bits to Dx.h | No write | 16-bit overflow
6 | Write low 16 bits to Dx.h | Write low 16 bits | 16-bit overflow
7 | No write | No write | 16-bit overflow
8 | No write | No write | 32-bit overflow
9 | No write | Write low 16 bits | 16-bit overflow
10 | No write | Write 32 bits | 32-bit overflow
In all cases, the result of any operation prior to writing back to a register or inserting into the output FIFO is a 48-bit quantity. There are two cases:
If the write is of 16 bits, the 48-bit quantity is reduced to a 16-bit quantity by selecting the bottom 16 bits [15:0]. If the instruction saturates, the value is saturated into the range -2^15 to 2^15-1. The 16-bit value is then written back to the indicated register and, if the Write FIFO bit is set, to the output FIFO. If it is written to the output FIFO, it is held until the next 16-bit value is written, when the two values are paired and placed into the output FIFO as a single 32-bit value.
For 32-bit writes, the 48-bit quantity is reduced to a 32-bit quantity by selecting the bottom 32 bits [31:0].
For both 32-bit and 48-bit writes, if the instruction saturates, the 48-bit value is converted to a 32-bit value in the range -2^31 to 2^31-1. Following the saturation:
- If writeback to an accumulator is performed, the full 48 bits are written.
- If writeback to a 32-bit register is performed, bits [31:0] are written.
- If writeback to the output FIFO is indicated, again bits [31:0] are written.
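The reduction of a result before writeback can be sketched as two helpers, one for the saturating case and one for plain selection of the bottom bits. The names `saturate` and `select_low_bits` are hypothetical, and the saturation limits used are the standard two's-complement range for the given width.

```python
def saturate(value, bits):
    """Clamp a signed result into the two's-complement range of the given width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def select_low_bits(value, bits):
    """Non-saturating case: keep only the bottom bits, e.g. [15:0] or [31:0]."""
    return value & ((1 << bits) - 1)
```

For instance, a result of 40000 written as a saturating 16-bit value becomes 32767, whereas without saturation only the bit pattern 0x9C40 is kept.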
The destination size is specified in the assembler by a .l or .h after the register number. If no register writeback is performed, the register number is unimportant, so the destination register is omitted to indicate no write to a register, or ^ is used to indicate a write only to the output FIFO. For example, SUB , X0, Y0 is equivalent to CMP X0, Y0, and ADD ^, X0, Y0 places the value of X0+Y0 into the output FIFO.
If there is no room in the output FIFO for a value, Piccolo stalls waiting for space to become available.
If a 16-bit value is written out, for example ADD X0.h^, X1, X2, the value is latched until a second 16-bit value is written. The two values are then combined and placed into the output FIFO as a 32-bit number. The first 16-bit value written always appears in the lower half of the 32-bit word. Data entered into the output FIFO is marked as either 16-bit or 32-bit data, to allow endianness to be corrected on big-endian systems.
If a 32-bit value is written between two 16-bit writes, the behaviour is undefined.
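The pairing of 16-bit writes can be modelled with a small class. This is a sketch under stated assumptions: the class `OutputFifo` and its methods are hypothetical, a stall is modelled as an exception, and the undefined case of a 32-bit write between two 16-bit writes is simply rejected.

```python
class OutputFifo:
    """Models the pairing of 16-bit values into 32-bit output FIFO entries."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = []    # list of (word, width) with width 16 or 32
        self.pending = None  # latched first 16-bit value, if any

    def _push(self, word, width):
        if len(self.entries) >= self.depth:
            raise RuntimeError("FIFO full: Piccolo would stall here")
        self.entries.append((word, width))

    def write16(self, value):
        value &= 0xFFFF
        if self.pending is None:
            self.pending = value  # hold until the second 16-bit write
            return
        # The first value written occupies the lower half of the word.
        self._push((value << 16) | self.pending, 16)
        self.pending = None

    def write32(self, value):
        if self.pending is not None:
            raise RuntimeError("32-bit write between two 16-bit writes is undefined")
        self._push(value & 0xFFFFFFFF, 32)
```

Each entry carries its width so that a later store could correct the half-word order on a big-endian system.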
Register remapping is supported in REPEAT instructions, allowing a REPEAT to access a moving 'window' of registers without unrolling the loop. This is described in more detail below.
In the preferred embodiment of the invention, the repeat instruction provides a mechanism for modifying the way in which register operands are specified within a loop. Under this mechanism, the register to be accessed is determined by a function of the register operand in the instruction and an offset into the register bank. The offset is changed in a programmable manner, preferably at the end of each instruction loop. The mechanism can operate independently on registers residing in the X, Y and Z banks. In the preferred embodiment, this facility is not available for registers in the A bank.
The notion of logical and physical registers can be used. The instruction operands are logical register references, and these are then mapped to physical register references identifying specific Piccolo registers 10. All operations, including refilling, operate on physical registers. Register remapping occurs only on the Piccolo instruction stream side: data loaded into Piccolo is always destined for a physical register, and no remapping is performed on it.
The remapping mechanism is discussed further with reference to Fig. 5, which is a block diagram showing some of the internal components of the Piccolo coprocessor 4. Data items retrieved from memory by the ARM core 2 are placed in the reorder buffer 12, and the Piccolo registers 10 are refilled from the reorder buffer 12 in the manner described earlier with reference to Fig. 2. Piccolo instructions stored in the cache 6 are passed to an instruction decoder 50 within Piccolo 4, where they are decoded before being passed to the Piccolo processor core 54. The Piccolo processor core 54 includes the multiplier/adder circuit 20, the accumulate/decumulate circuit 22 and the scale/saturate circuit 24 discussed earlier with reference to Fig. 3.
If the instruction decoder 50 is handling an instruction that forms part of an instruction loop identified by a repeat instruction, and the repeat instruction has indicated that remapping of a number of registers should take place, the register remapping logic 52 is employed to perform the necessary remapping. The register remapping logic 52 can be considered to be part of the instruction decoder 50, although it will be apparent to those skilled in the art that the register remapping logic 52 may be provided as an entity entirely separate from the instruction decoder 50.
An instruction typically includes one or more operands identifying registers that contain the data items required by the instruction. For example, a typical instruction may include two source operands and one destination operand, identifying two registers that contain the data items required by the instruction and a register into which the result of the instruction should be placed. The register remapping logic 52 receives the operands of an instruction from the instruction decoder 50, these operands identifying logical register references. Based on the logical register references, the register remapping logic determines whether or not remapping should be applied, and then applies a remapping to produce physical register references as required. If it is determined that remapping should not be applied, the logical register references are provided as the physical register references. The preferred manner in which the remapping is performed is discussed in detail later.
Each physical register reference output by the register remapping logic is passed to the Piccolo processor core 54, so that the processor core can then apply the instruction to the data item in the particular register 10 identified by the physical register reference.
The remapping mechanism of the preferred embodiment allows each register bank to be split into two sections, namely a section within which registers may be remapped, and a section in which registers retain their original register references without remapping. In the preferred embodiment, the remapped section starts at the bottom of the register bank being remapped.
The remapping mechanism employs a number of parameters, which are discussed in detail with reference to Fig. 6, a block diagram illustrating how the various parameters are used by the register remapping logic 52. It should be noted that these parameters are given values relative to a point within the bank being remapped, this point being, for example, the bottom of the bank.
The register remapping logic 52 can be considered as comprising two main logical blocks, namely the remap block 56 and the base update block 58. The register remapping logic 52 employs a base pointer that provides an offset value to be added to the logical register reference, this base pointer value being provided to the remap block 56 by the base update block 58.
A BASESTART signal can be used to define the initial value of the base pointer. This would typically be zero, although some other value may be specified. The BASESTART signal is passed to a multiplexer 60 within the base update block 58. During the first iteration of the instruction loop, the multiplexer 60 passes the BASESTART signal to the storage element 66, whereas for subsequent iterations of the loop the next base pointer value is supplied by the multiplexer 60 to the storage element 66.
The output of the storage element 66 is passed as the current base pointer value to the remap logic 56, and is also passed to one of the inputs of an adder 62 within the base update logic 58. The adder 62 also receives a BASEINC signal that provides a base increment value. The adder 62 is arranged to increment the current base pointer value supplied by the storage element 66 by the BASEINC value, and to pass the result to a modulo circuit 64.
The modulo circuit also receives a BASEWRAP value, and compares this value with the output base pointer signal from the adder 62. If the incremented base pointer value equals or exceeds the BASEWRAP value, the new base pointer is wrapped round to a new offset value. The output of the modulo circuit 64 is then the next base pointer value to be stored in the storage element 66. This output is provided to the multiplexer 60, and from there to the storage element 66.
However, this next base pointer value cannot be stored in the storage element 66 until a BASEUPDATE signal is received by the storage element 66 from the loop hardware managing the repeat instruction. The BASEUPDATE signal is produced periodically by the loop hardware, for example each time the instruction loop is to be repeated. When a BASEUPDATE signal is received by the storage element 66, the storage element overwrites the previous base pointer value with the next base pointer value provided by the multiplexer 60. In this manner, the base pointer value supplied to the remap logic 56 changes to the new base pointer value.
The physical register to be accessed inside the remapped section of a register bank is determined by adding the logical register reference contained within an operand of an instruction to the base pointer value provided by the base update logic 58. This addition is performed by an adder 68, and the output is passed to a modulo circuit 70. In the preferred embodiment, the modulo circuit 70 also receives a register wrap value, and if the output signal from the adder 68 (the sum of the logical register reference and the base pointer value) exceeds the register wrap value, the result wraps round to the bottom of the remapped region. The output of the modulo circuit 70 is then provided to a multiplexer 72.
A REGCOUNT value is provided to logic 74 within the remap block 56, identifying the number of registers within a bank that are to be remapped. The logic 74 compares this REGCOUNT value with the logical register reference and, depending on the result of the comparison, passes a control signal to the multiplexer 72. The multiplexer 72 receives as its two inputs the logical register reference and the output from the modulo circuit 70 (the remapped register reference). In the preferred embodiment of the present invention, if the logical register reference is less than the REGCOUNT value, the logic 74 instructs the multiplexer 72 to output the remapped register reference as the physical register reference. If, however, the logical register reference is greater than or equal to the REGCOUNT value, the logic 74 instructs the multiplexer 72 to output the logical register reference directly as the physical register reference.
As previously mentioned, in the preferred embodiment it is the REPEAT instruction that invokes the remapping mechanism. As will be discussed in more detail later, REPEAT instructions provide four zero-cycle loops in hardware. These hardware loops are illustrated in Fig. 5 as part of the instruction decoder 50. Each time the instruction decoder 50 requests an instruction from the cache 6, the cache returns that instruction to the instruction decoder, whereupon the instruction decoder determines whether the returned instruction is a REPEAT instruction. If so, one of the hardware loops is configured to handle that REPEAT instruction.
Each repeat instruction specifies the number of instructions in the loop and the number of times to go around the loop (which is either a constant or read from a Piccolo register). Two opcodes, REPEAT and NEXT, are provided for defining a hardware loop, the NEXT opcode being used merely as a delimiter and not being assembled as an instruction. The REPEAT goes at the start of the loop, and NEXT delimits the end of the loop, allowing the assembler to calculate the number of instructions in the loop body. In the preferred embodiment, the REPEAT instruction can include remapping parameters such as the REGCOUNT, BASEINC, BASEWRAP and REGWRAP parameters to be employed by the register remapping logic 52.
A number of registers can be provided to store the remapping parameters used by the register remapping logic. Within these registers, a number of sets of predefined remapping parameters can be provided, while some registers are left for the storage of user-defined remapping parameters. If the remapping parameters specified with the REPEAT instruction are equal to one of the sets of predefined remapping parameters, the appropriate REPEAT encoding is employed, this encoding causing a multiplexer or the like to provide the appropriate remapping parameters from the registers directly to the register remapping logic. If, on the other hand, the remapping parameters differ from every set of predefined remapping parameters, the assembler generates a Remapping Parameter Move instruction (RMOV), which allows the user-defined register remapping parameters to be configured, the RMOV instruction being followed by the REPEAT instruction. Preferably, the RMOV instruction places the user-defined remapping parameters in the registers set aside for storing them, and the multiplexer is then programmed to pass the contents of those registers to the register remapping logic.
In the preferred embodiment, the REGCOUNT, BASEINC, BASEWRAP and REGWRAP parameters take one of the values identified in the following table:
Parameter    Description
REGCOUNT     This identifies the number of 16-bit registers on which to perform remapping, and may take the values 0, 2, 4, 8. Registers below REGCOUNT are remapped; those above or equal to REGCOUNT are accessed directly.
BASEINC      This defines by how many 16-bit registers the base pointer is incremented at the end of each loop iteration. In the preferred embodiment it may take the values 1, 2 or 4, although it can take other values if desired, including negative values where appropriate.
BASEWRAP     This determines the ceiling of the base calculation. The base wrapping modulus may take the values 2, 4, 8.
REGWRAP      This determines the ceiling of the remap calculation. The register wrapping modulus may take the values 2, 4, 8. REGWRAP may be chosen to be equal to REGCOUNT.
Referring to Fig. 6, an example of how the various parameters are used by the remap block 56 is as follows (in this example, the logical and physical register values are relative to the particular bank):
if (Logical Register < REGCOUNT)
    Physical Register = (Logical Register + Base) MOD REGCOUNT
else
    Physical Register = Logical Register
end if
At the end of the loop, before the next iteration of the loop begins, the following update to the base pointer is performed by the base update logic 58:
Base = (Base + BASEINC) MOD BASEWRAP
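The two rules above can be modelled in a few lines of Python. This is a minimal sketch, assuming the pseudocode exactly as given (the remap wraps modulo REGCOUNT); the class name `RemapState` and its method names are hypothetical and are not part of the described hardware.

```python
from dataclasses import dataclass


@dataclass
class RemapState:
    """Hypothetical software model of the remap block and base update block."""
    regcount: int   # REGCOUNT: number of 16-bit registers being remapped
    baseinc: int    # BASEINC: base pointer increment per loop iteration
    basewrap: int   # BASEWRAP: modulus applied to the base pointer
    base: int = 0   # current base pointer (BASESTART assumed to be zero)

    def physical(self, logical):
        """Map a logical register reference to a physical one."""
        if logical < self.regcount:
            return (logical + self.base) % self.regcount
        return logical  # registers at or above REGCOUNT are accessed directly

    def update_base(self):
        """Base pointer update performed at the end of each loop iteration."""
        self.base = (self.base + self.baseinc) % self.basewrap


state = RemapState(regcount=4, baseinc=1, basewrap=4)
first_pass = [state.physical(r) for r in range(6)]
state.update_base()
second_pass = [state.physical(r) for r in range(6)]
```

With these parameters the first pass leaves every reference unchanged, and the second pass rotates the four remapped registers by one position while registers 4 and 5 are untouched.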
At the end of a remapping loop, register remapping is switched off and all registers are then accessed as physical registers. In the preferred embodiment, only one remapping REPEAT is active at any one time. Loops may still be nested, but only one loop may update the remapping variables at any particular time. If desired, however, remapping repeats could be nested.
To illustrate the benefits in code density achieved by employing the remapping mechanism according to the preferred embodiment of the present invention, a typical block filter algorithm is discussed below. The principles of the block filter algorithm are first discussed with reference to Fig. 7. As illustrated in Fig. 7, accumulator register A0 is arranged to accumulate the results of a number of multiplication operations, these being the multiplication of coefficient c0 by data item d0, the multiplication of coefficient c1 by data item d1, the multiplication of coefficient c2 by data item d2, and so on. Register A1 accumulates the results of a similar set of multiplication operations, but this time the set of coefficients has been shifted so that c0 is now multiplied by d1, c1 by d2, c2 by d3, and so on. Likewise, register A2 accumulates the results of multiplying the data values by the coefficient values shifted one further step to the right, so that c0 is multiplied by d2, c1 by d3, c2 by d4, and so on. This shift, multiply and accumulate process is then repeated, with the result being placed in register A3.
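As a reference for the arithmetic only, the block filter of Fig. 7 can be written as a short Python function. This is an illustrative sketch, not the Piccolo code: the function name `block_filter` is hypothetical, and the data list is assumed to hold at least as many items as there are coefficients plus the three extra items needed by A1 to A3.

```python
def block_filter(coeffs, data, n_acc=4):
    """Compute n_acc accumulators, where accumulator k sums c[i] * d[i + k]."""
    if len(data) < len(coeffs) + n_acc - 1:
        raise ValueError("not enough data items for the requested accumulators")
    acc = [0] * n_acc
    for k in range(n_acc):                 # A0, A1, A2, A3
        for i, c in enumerate(coeffs):     # c0, c1, c2, ...
            acc[k] += c * data[i + k]      # coefficient window shifted by k
    return acc


# Four coefficients need data items d0..d6, matching the first pass of the loop.
a0, a1, a2, a3 = block_filter([1, 2, 3, 4], [1, 0, 0, 0, 0, 0, 1])
```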
If register remapping according to the preferred embodiment of the present invention were not employed, the following instruction loop would be required to perform the block filter algorithm:
; start with 4 new data values
ZERO {A0-A3}                 ; zero the accumulators
REPEAT Z1                    ; Z1 = (number of coefficients / 4)
; do the next four coefficients, on the first time around:
; a0 += d0*c0 + d1*c1 + d2*c2 + d3*c3
; a1 += d1*c0 + d2*c1 + d3*c2 + d4*c3
; a2 += d2*c0 + d3*c1 + d4*c2 + d5*c3
; a3 += d3*c0 + d4*c1 + d5*c2 + d6*c3
MULA A0, X0.l^, Y0.l, A0     ; a0 += d0*c0, and load d4
MULA A1, X0.h, Y0.l, A1      ; a1 += d1*c0
MULA A2, X1.l, Y0.l, A2      ; a2 += d2*c0
MULA A3, X1.h, Y0.l^, A3     ; a3 += d3*c0, and load c4
MULA A0, X0.h^, Y0.h, A0     ; a0 += d1*c1, and load d5
MULA A1, X1.l, Y0.h, A1      ; a1 += d2*c1
MULA A2, X1.h, Y0.h, A2      ; a2 += d3*c1
MULA A3, X0.l, Y0.h^, A3     ; a3 += d4*c1, and load c5
MULA A0, X1.l^, Y1.l, A0     ; a0 += d2*c2, and load d6
MULA A1, X1.h, Y1.l, A1      ; a1 += d3*c2
MULA A2, X0.l, Y1.l, A2      ; a2 += d4*c2
MULA A3, X0.h, Y1.l^, A3     ; a3 += d5*c2, and load c6
MULA A0, X1.h^, Y1.h, A0     ; a0 += d3*c3, and load d7
MULA A1, X0.l, Y1.h, A1      ; a1 += d4*c3
MULA A2, X0.h, Y1.h, A2      ; a2 += d5*c3
MULA A3, X1.l, Y1.h^, A3     ; a3 += d6*c3, and load c7
NEXT
In this example, the data values are placed in the X bank of registers and the coefficient values are placed in the Y bank of registers. As a first step, the four accumulator registers A0, A1, A2 and A3 are set to zero. Once the accumulator registers have been reset, an instruction loop is entered, which is delimited by the REPEAT and NEXT instructions. The value Z1 identifies the number of times that the instruction loop should be repeated and, for the reasons discussed below, it is equal to the number of coefficients (c0, c1, c2, etc.) divided by 4.
The instruction loop comprises 16 multiply accumulate instructions (MULA) which, after the first iteration through the loop, leave the registers A0, A1, A2 and A3 holding the results of the calculations shown in the listing above between the REPEAT and the first MULA instruction. To illustrate how the multiply accumulate instructions operate, consider the first four MULA instructions. The first instruction multiplies the first, or lower, 16 bits of X bank register 0 by the lower 16 bits of Y bank register 0, and adds the result to the accumulator register A0. At the same time, the lower 16 bits of X bank register 0 are marked with a refill bit, indicating that this part of the register can now be refilled with a new data value. It is marked in this way because, as is apparent from Fig. 7, once data item d0 has been multiplied by coefficient c0 (represented by the first MULA instruction), d0 is no longer required for the rest of the block filter instructions and so can be replaced by a new data value.
The second MULA instruction then multiplies the second, or higher, 16 bits of X bank register 0 by the lower 16 bits of Y bank register 0 (this represents the multiplication d1 x c0 shown in Fig. 7). Similarly, the third and fourth MULA instructions represent the multiplications d2 x c0 and d3 x c0 respectively. As is apparent from Fig. 7, once these four calculations have been performed, coefficient c0 is no longer required, and so register Y0.l is marked with a refill bit to enable it to be overwritten with another coefficient (c4).
The next four MULA instructions represent the calculations d1 x c1, d2 x c1, d3 x c1 and d4 x c1 respectively. Once the calculation d1 x c1 has been performed, register X0.h is marked with a refill bit, since d1 is no longer required. Similarly, once all four instructions have been executed, register Y0.h is marked for refilling, since coefficient c1 is no longer needed. Likewise, the next four MULA instructions correspond to the calculations d2 x c2, d3 x c2, d4 x c2 and d5 x c2, and the final four instructions correspond to the calculations d3 x c3, d4 x c3, d5 x c3 and d6 x c3.
In the above-described example, since the registers are not remappable, each multiplication operation has to be written out explicitly, with the specific register required being designated in the operands. Once the sixteen MULA instructions have been executed, the instruction loop can be repeated for coefficients c4 to c7 and data items d4 to d10. Also, because the loop acts on four coefficient values per iteration, the number of coefficient values must be a multiple of four, and the computation Z1 = number of coefficients / 4 must be made.
By employing the remapping mechanism in accordance with the preferred embodiment of the present invention, the instruction loop can be reduced dramatically, so that it includes only 4 multiply accumulate instructions rather than the 16 multiply accumulate instructions otherwise required. Using the remapping mechanism, the code can be written as follows:
; start with 4 new data values
ZERO {A0-A3}                            ; zero the accumulators
REPEAT Z1, X++ n4 w4 r4, Y++ n4 w4 r4   ; Z1 = (number of coefficients)
; Remapping is applied to the X and Y banks.
; Four 16-bit registers in these banks are remapped.
; The base pointer for both banks is incremented by one on each iteration of the loop.
; The base pointer wraps when it reaches the fourth register in the bank.
MULA A0, X0.l^, Y0.l, A0     ; a0 += d0*c0, and load d4
MULA A1, X0.h, Y0.l, A1      ; a1 += d1*c0
MULA A2, X1.l, Y0.l, A2      ; a2 += d2*c0
MULA A3, X1.h, Y0.l^, A3     ; a3 += d3*c0, and load c4
NEXT                         ; go round the loop and advance the remapping
As before, the first step is to set the four accumulator registers A0-A3 to zero. The instruction loop is then entered, delimited by the REPEAT and NEXT opcodes. The REPEAT instruction has a number of parameters associated with it, which are as follows:
X++: indicates that BASEINC is '1' for the X bank of registers
n4: indicates that REGCOUNT is '4', and hence the first four X bank registers X0.l to X1.h are to be remapped
w4: indicates that BASEWRAP is '4' for the X bank of registers
r4: indicates that REGWRAP is '4' for the X bank of registers
Y++: indicates that BASEINC is '1' for the Y bank of registers
n4: indicates that REGCOUNT is '4', and hence the first four Y bank registers Y0.l to Y1.h are to be remapped
w4: indicates that BASEWRAP is '4' for the Y bank of registers
r4: indicates that REGWRAP is '4' for the Y bank of registers
It should also be noted that the value Z1 is now equal to the number of coefficients, rather than the number of coefficients / 4 as in the prior art example.
For the first iteration of the instruction loop, the base pointer value is zero, and so there is no remapping. On the next iteration of the loop, however, the base pointer value will be '1' for both the X and Y banks, and so the operands will be mapped as follows:
X0.l becomes X0.h
X0.h becomes X1.l
X1.l becomes X1.h
X1.h becomes X0.l (since BASEWRAP is '4')
Y0.l becomes Y0.h
Y0.h becomes Y1.l
Y1.l becomes Y1.h
Y1.h becomes Y0.l (since BASEWRAP is '4')
Hence, it can be seen that on the second iteration the four MULA instructions actually perform the calculations indicated by the fifth to eighth MULA instructions in the example discussed earlier that does not use the remapping of the present invention. Similarly, the third and fourth iterations through the loop perform the calculations formerly performed by the ninth to twelfth and the thirteenth to sixteenth MULA instructions of the prior art code.
Hence, it can be seen that the above code performs exactly the same block filter algorithm as the prior art code, but improves code density within the loop body by a factor of four, since only 4 instructions need to be provided rather than the 16 required by the prior art.
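The equivalence between the four-instruction remapped loop and the sixteen-instruction loop can be checked with a small Python model. This is a sketch under stated assumptions: the helper names `remap` and `expand` are hypothetical, the four 16-bit registers of a bank are numbered 0 to 3 in the order 0.l, 0.h, 1.l, 1.h, and the same base pointer sequence is used for both banks because both use BASEINC 1 and BASEWRAP 4.

```python
# 16-bit register names in a bank, indexed by register number 0..3.
NAMES = ["0.l", "0.h", "1.l", "1.h"]

# Loop body as (accumulator, logical X index, X refill, logical Y index, Y refill).
BODY = [
    ("A0", 0, True, 0, False),
    ("A1", 1, False, 0, False),
    ("A2", 2, False, 0, False),
    ("A3", 3, False, 0, True),
]


def remap(logical, base, regcount=4):
    """Remap a logical register number using the current base pointer."""
    return (logical + base) % regcount if logical < regcount else logical


def expand(iterations, baseinc=1, basewrap=4):
    """List the instructions effectively executed by the remapped loop."""
    lines = []
    base = 0  # base pointer starts at zero, so the first pass is unmapped
    for _ in range(iterations):
        for acc, x, x_refill, y, y_refill in BODY:
            xs = "X" + NAMES[remap(x, base)] + ("^" if x_refill else "")
            ys = "Y" + NAMES[remap(y, base)] + ("^" if y_refill else "")
            lines.append(f"MULA {acc}, {xs}, {ys}, {acc}")
        base = (base + baseinc) % basewrap  # update at the end of each pass
    return lines


unrolled = expand(4)
```

Printing `unrolled` gives sixteen lines whose operands match, instruction for instruction, the sixteen MULA instructions of the loop written without remapping.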
By employing the register remapping technique in accordance with the preferred embodiment of the present invention, the following benefits can be realised:
1. It improves code density;
2. It can, in certain situations, hide the latency between a register being marked as empty and that register being refilled by Piccolo's reorder buffer. This could otherwise be achieved by unrolling loops, at the cost of increased code size;
3. It enables a variable number of registers to be accessed: by varying the number of loop iterations performed, the number of registers accessed can be varied; and
4. It can ease algorithm development. For suitable algorithms, the programmer can produce a piece of code for the nth stage of the algorithm, and then use register remapping to apply the formula to a sliding set of data.
It will be apparent that certain changes can be made to the register remapping mechanism described above without departing from the scope of the present invention. For example, the bank of registers 10 may provide more physical registers than can be specified by the programmer in an instruction operand. Although these extra registers cannot be accessed directly, the register remapping mechanism can make them available. Consider, for example, the X bank of registers discussed earlier, which has four 32-bit registers available to the programmer, so that eight 16-bit registers can be specified by logical register references. The X bank of registers could actually consist of, for example, six 32-bit registers, in which case there would be four additional 16-bit registers not directly accessible to the programmer. These extra four registers can, however, be made available by the remapping mechanism, thereby providing additional registers for the storage of data items.
The following assembler syntax will be used:
>> denotes a logical shift right, or a shift left if the shift operand is negative (see <lscale> below).
->> denotes an arithmetic shift right, or a shift left if the shift operand is negative (see <scale> below).
ROR denotes a rotate right.
SAT(a) denotes the saturated value of a (saturated to 16 or 32 bits depending on the size of the destination register). Specifically, to saturate to 16 bits, any value greater than +0x7fff is replaced by +0x7fff and any value less than -0x8000 is replaced by -0x8000. Saturation to 32 bits is similar, with the limits +0x7fffffff and -0x80000000. If the destination register is 48 bits, the saturation is still at 32 bits.
Source operand 1 can take one of the following forms:
<src1> is shorthand for [Rn|Rn.l|Rn.h|Rn.x][^]. In other words, all 7 bits of the source specifier are valid, and the register is read as a 32-bit value (optionally swapped) or as a sign-extended 16-bit value. For an accumulator, only the bottom 32 bits are read. The ^ specifies register refill.
<src1_16> is shorthand for [Rn.l|Rn.h][^]. Only 16-bit values can be read.
<src1_32> is shorthand for [Rn|Rn.x][^]. Only a 32-bit value can be read, with the upper and lower halves optionally swapped.
Source operand 2 can take one of the following forms:
<src2> is shorthand for three options:
- a source register of the form [Rn|Rn.l|Rn.h|Rn.x][^], plus a scale (<scale>) of the final result.
- an optionally shifted 8-bit constant (<immed_8>), but no scale of the final result.
- a 6-bit constant (<immed_6>), plus a scale (<scale>) of the final result.
<src2_maxmin> is the same as <src2>, but a scale is not permitted.
<src2_shift> provides a limited subset of <src2> for shift instructions. See above for details.
<src2_par> is as for <src2_shift>.
For instructions which specify a third operand:
<acc> is shorthand for any of the four accumulator registers [A0|A1|A2|A3]. All 48 bits are read. No refill can be specified.
The destination register has the form:
<dest> is shorthand for [Rn|Rn.l|Rn.h|.l|][^]. With no "." extension, the full register is written (48 bits in the case of an accumulator). Where no write back to a register is required, the register used is unimportant. The assembler supports omitting the destination register to indicate that no write back is required, or ".l" to indicate that no write back is required but the flags should be set as though the result were a 16-bit quantity. ^ denotes that the value is written to the output FIFO.
<scale> represents a number of arithmetic scales. There are fourteen available scales:
ASR #0, 1, 2, 3, 4, 6, 8, 10
ASR #12 to 16
LSL #1
<immed_8> represents an unsigned 8-bit immediate value. This consists of a byte rotated left by 0, 8, 16 or 24 bits. Hence the values 0xYZ000000, 0x00YZ0000, 0x0000YZ00 and 0x000000YZ can be encoded for any YZ. The rotate is encoded as a 2-bit quantity.
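The rotated-byte immediate can be illustrated with a small Python encoder and decoder. This is a hedged sketch: the function names are hypothetical, and the assumption that the 2-bit rotate field value r means a left rotation by 8*r bits is made here for illustration only, since the text does not give the field encoding.

```python
def rotl32(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def decode_immed8(byte, rot):
    """Expand an 8-bit byte and 2-bit rotate field into a 32-bit constant."""
    return rotl32(byte & 0xFF, 8 * (rot & 3))


def encode_immed8(value):
    """Return (byte, rot) for an encodable 32-bit constant, else None."""
    value &= 0xFFFFFFFF
    for rot in range(4):
        byte = rotl32(value, 32 - 8 * rot) & 0xFF  # undo the left rotation
        if decode_immed8(byte, rot) == value:
            return byte, rot
    return None  # more than one non-zero byte: not representable


encoded = encode_immed8(0x0000AB00)
```

A constant such as 0x12345678 has more than one non-zero byte, so `encode_immed8` returns `None` for it.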
<immed_6> represents an unsigned 6-bit immediate value.
<PARAMS> is used to specify register remapping and has the following format:
<BANK><BASEINC> n<RENUMBER> w<BASEWRAP>
<BANK> can be [X|Y|Z]
<BASEINC> can be [++|+1|+2|+4]
<RENUMBER> can be [0|2|4|8]
<BASEWRAP> can be [2|4|8]
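The remapping parameter syntax can be parsed with a short regular expression, shown here in Python. This is an illustrative sketch only: the function name `parse_params` and the dictionary keys are hypothetical, and the optional trailing r field (REGWRAP) is an assumption based on the earlier example `X++ n4 w4 r4`, since the format line itself lists only the n and w fields.

```python
import re

# <BANK><BASEINC> n<RENUMBER> w<BASEWRAP>, with an optional r<REGWRAP> field.
_PARAMS = re.compile(
    r"^([XYZ])(\+\+|\+1|\+2|\+4)\s*n([0248])\s*w([248])(?:\s*r([248]))?$"
)


def parse_params(text):
    """Parse one remapping parameter group into a dictionary."""
    match = _PARAMS.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid remapping parameter group: {text!r}")
    bank, inc, count, wrap, regwrap = match.groups()
    return {
        "bank": bank,
        "baseinc": 1 if inc == "++" else int(inc),  # '++' means increment by 1
        "regcount": int(count),
        "basewrap": int(wrap),
        "regwrap": int(regwrap) if regwrap is not None else None,
    }


x_params = parse_params("X++ n4 w4 r4")
```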
The expression <cond> stands for any one of the following condition codes. Note that the encoding differs slightly from that of the ARM, since the unsigned LS and HI codes have been replaced by more useful signed overflow/underflow tests. The V and N flags are set differently on Piccolo than on the ARM, so the translation from condition testing to flag checking also differs from the ARM.
0000  EQ     Z=0         Last result was zero.
0001  NE     Z=1         Last result was non-zero.
0010  CS     C=1         Used after a shift/MAX operation.
0011  CC     C=0
0100  MI/LT  N=1         Last result was negative.
0101  PL/GE  N=0         Last result was positive.
0110  VS     V=1         Signed overflow/saturation on last result.
0111  VC     V=0         No overflow/saturation on last result.
1000  VP     V=1 & N=0   Overflow positive on last result.
1001  VN     V=1 & N=1   Overflow negative on last result.
1010  reserved
1011  reserved
1100  GT     N=0 & Z=0
1101  LE     N=1 | Z=1
1110  AL
1111  reserved
Since Piccolo deals with signed quantities, the unsigned LS and HI conditions have been dropped and replaced by VP and VN, which describe the direction of any overflow. Since the result of the ALU is 48 bits wide, MI and LT now perform the same function, as do PL and GE. This leaves 3 slots free for future expansion.
Unless otherwise indicated, all operations are signed.
The primary and secondary condition codes each consist of:
N - negative.
Z - zero.
C - carry/unsigned overflow.
V - signed overflow.
Arithmetic instructions can be divided into two types: parallel and 'full width'. The 'full width' instructions set only the primary flags, whereas the parallel operators set the primary and secondary flags based on the upper and lower 16-bit halves of the result.
The N, Z and V flags are calculated from the full ALU result, after the scale has been applied but before the result is written to the destination. An ASR always reduces the number of bits required to store the result, whereas an ASL increases it. To avoid this, Piccolo truncates the 48-bit result when an ASL scale is applied, which limits the number of bits over which zero detection and overflow checking must be carried out.
The N flag is calculated on the assumption that signed arithmetic is being performed. This is because, when overflow occurs, the most significant bit of the result is either the C flag or the N flag, depending on whether the input operands are signed or unsigned.
The V flag indicates whether any loss of precision occurs as a result of writing the result to the selected destination. If no write-back is selected, a 'size' is still implied, and the overflow flag is set correctly. Overflow can occur in the following cases:
- Writing to a 16-bit register when the result is not in the range -2^15 to 2^15-1.
- Writing to a 32-bit register when the result is not in the range -2^31 to 2^31-1.
Parallel add/subtract instructions set the N, Z and V flags independently on the upper and lower halves of the result.
When writing to an accumulator, the V flag is set as if writing to a 32-bit register. This allows saturating instructions to use accumulators as 32-bit registers.
The saturating absolute instruction (SABS) also sets the overflow flag if the absolute value of the input operand would not fit in the designated destination.
The Carry flag is set by add and subtract instructions and is used as a 'binary' flag by the MAX/MIN, SABS and CLB instructions. All other instructions, including multiply operations, preserve the Carry flag(s).
For add and subtract operations, the Carry is generated from either bit 31 or bit 15 of the result, depending on whether the destination is 32 or 16 bits wide.
The standard arithmetic instructions can be divided into a number of types, depending on how the flags are set:
In the case of Add and Subtract instructions, if the N bit is set, all flags are preserved. If the N bit is not set, the flags are updated as follows:
Z is set if the full 48-bit result was 0.
N is set if the full 48-bit result had bit 47 set (was negative).
V is set if either of the following conditions holds:
The destination register is 16 bits and the signed result will not fit into a 16-bit register (not in the range -2^15 <= x < 2^15).
The destination register is a 32/48-bit register and the signed result will not fit into 32 bits.
If <dest> is a 32 or 48-bit register, the C flag is set if there is a carry out of bit 31 when summing <src1> and <src2>, or if no borrow occurred from bit 31 when subtracting <src2> from <src1> (the same carry value as would be expected on the ARM). If <dest> is a 16-bit register, the C flag is set if there is a carry out of bit 15 of the sum.
The secondary flags (SZ, SN, SV, SC) are preserved.
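The flag rules for add and subtract instructions can be sketched in Python as follows. This is an illustrative model only, with hypothetical names; it assumes signed integer operands, applies no scale, and takes the carry out of bit 31 for 32/48-bit destinations and out of bit 15 for 16-bit destinations, in line with the carry rule for add and subtract operations stated earlier. A subtraction is modelled ARM-style as src1 + NOT(src2) + 1, so that C set means no borrow.

```python
def to_signed48(value):
    """Wrap an integer into the signed 48-bit range of the ALU result."""
    value &= (1 << 48) - 1
    return value - (1 << 48) if value >> 47 else value


def add_sub_flags(src1, src2, dest_bits, subtract=False):
    """Return the primary N, Z, V, C flags for an add or subtract."""
    if dest_bits not in (16, 32, 48):
        raise ValueError("destination must be 16, 32 or 48 bits")
    result = to_signed48(src1 - src2 if subtract else src1 + src2)

    # V: the signed result must fit in 16 bits, or in 32 bits for 32/48-bit
    # destinations (accumulators are treated like 32-bit registers here).
    fit = 16 if dest_bits == 16 else 32
    overflow = not (-(1 << (fit - 1)) <= result < (1 << (fit - 1)))

    # C: carry out of bit 15 or bit 31 of the unsigned sum.
    width = 16 if dest_bits == 16 else 32
    mask = (1 << width) - 1
    a = src1 & mask
    b = (~src2 if subtract else src2) & mask
    carry = bool((a + b + (1 if subtract else 0)) >> width)

    return {"N": result < 0, "Z": result == 0, "V": overflow, "C": carry}


flags = add_sub_flags(0x7FFF, 1, dest_bits=16)
```

In this example the sum 0x8000 does not fit a signed 16-bit register, so V is set, while N stays clear because the full 48-bit result is positive.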
In the case of instructions that either carry out a multiplication or accumulate from a 48-bit register:
Z is set if the full 48-bit result was 0.
N is set if the full 48-bit result had bit 47 set (was negative).
V is set if either (1) the destination register is 16 bits and the signed result will not fit into a 16-bit register (not in the range -2^15 <= x < 2^15), or (2) the destination register is a 32/48-bit register and the signed result will not fit into 32 bits.
C is preserved.
The secondary flags (SZ, SN, SV, SC) are preserved.
The other instructions, including logical operations, parallel adds and subtracts, max and min, shifts and so on, are discussed below.
The add and subtract instructions add or subtract two registers, scale the result, and then store it back to a register. The operands are treated as signed values. For the non-saturating variants, flag updating is optional and can be suppressed by appending an N to the end of the instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 OPC F S D DEST S1 R 1 SRC1 SRC2
The type of OPC designated order
Operation (OPC):
100N0 dest=(src1+src2)(->>scale)(,N)
110N0 dest=(src1-src2)(->>scale)(,N)
10001 dest=SAT((src1+src2)(->>scale))
11001 dest=SAT((src1-src2)(->>scale))
01110 dest=(src2-src1)(->>scale)
01111 dest=SAT((src2-src1)(->>scale))
101N0 dest=(src1+src2-Carry)(->>scale)(,N)
111N0 dest=(src1-src2+Carry-1)(->>scale)(,N)
Mnemonics:
100N0 ADD{N} <dest>,<src1>,<src2>{,<scale>}
110N0 SUB{N} <dest>,<src1>,<src2>{,<scale>}
10001 SADD <dest>,<src1>,<src2>{,<scale>}
11001 SSUB <dest>,<src1>,<src2>{,<scale>}
01110 RSB <dest>,<src1>,<src2>{,<scale>}
01111 SRSB <dest>,<src1>,<src2>{,<scale>}
101N0 ADC{N} <dest>,<src1>,<src2>{,<scale>}
111N0 SBC{N} <dest>,<src1>,<src2>{,<scale>}
The assembler supports the following opcodes:
CMP <src1>,<src2>
CMN <src1>,<src2>
CMP is a subtraction that sets the flags with register write disabled. CMN is an addition that sets the flags with register write disabled.
Flags: as discussed above.
Reason for inclusion:
ADC is useful for inserting the carry into the bottom of a register after a shift/MAX/MIN operation. It is also used to perform a 32/32-bit division, and it provides extended-precision addition. The N-bit variants of addition give finer control over the flags, in particular the carry. This allows a 32/32-bit division to be performed at 2 cycles per bit.
Saturating add and subtract are needed for G.729 and similar algorithms.
Incrementing/decrementing counters. RSB is useful for calculating shifts (x=32-x is a common operation). A saturating RSB is needed for saturating negation (used in G.729).
The add/subtract accumulate instructions perform addition and subtraction with accumulation and scaling/saturation. Unlike the multiply-accumulate instructions, the accumulator number cannot be specified independently of the destination register. The bottom two bits of the destination register give the number, acc, of the 48-bit accumulator to accumulate into. Hence ADDA X0,X1,X2,A0 and ADDA A3,X1,X2,A3 are valid, but ADDA X1,X1,X2,A0 is not. For this class of instruction the result must be written back to a register; the no-write-back encodings of the destination field are not allowed.
Encoding (fields from bit 31 down to bit 0):
0 | 0 | OPC | 1 | 0 | Sa | F | SD | DEST | S1 | R1 | SRC1 | SRC2
OPC specifies the type of instruction. In the following, acc is (DEST[1:0]). The Sa bit indicates saturation.
Operation (OPC):
0 dest={SAT}(acc+(src1+src2)){->>scale}
1 dest={SAT}(acc+(src1-src2)){->>scale}
Mnemonics:
0 {S}ADDA <dest>,<src1>,<src2>,<acc>{,<scale>}
1 {S}SUBA <dest>,<src1>,<src2>,<acc>{,<scale>}
An S prefix on the mnemonic indicates saturation.
Flags: see above.
Reason for inclusion:
The ADDA (add accumulate) instruction is useful for summing two words of an array of integers with an accumulator per cycle (for example, to find their average). The SUBA (subtract accumulate) instruction is useful for calculating a sum of differences (for correlation); it subtracts two separate values and adds the difference to a third register.
An addition with rounding can be performed by using a <dest> that differs from <acc>. For example, X0=(X1+X2+16384)>>15 can be done in one cycle by keeping 16384 in A0. Addition of a constant with rounding can be done with ADDA X0,X1,#16384,A0.
For a bit-exact implementation of the sum of ((a_i*b_j)>>k) (quite common in TrueSpeech), the standard Piccolo code is:
MUL t1,a_0,b_0,ASR#k
ADD ans,ans,t1
MUL t2,a_1,b_1,ASR#k
ADD ans,ans,t2
This code has two problems: it is too long, and the addition is not done to 48-bit precision, so guard bits cannot be used. A better solution is to use ADDA:
MUL t1,a_0,b_0,ASR#k
MUL t2,a_1,b_1,ASR#k
ADDA ans,t1,t2,ans
This gives a 25% speed improvement (three instructions instead of four) and retains 48-bit precision.
The parallel add/subtract instructions perform addition and subtraction on two signed 16-bit quantities held as a pair in a 32-bit register. The primary condition code flags are set from the result of the upper 16 bits, and the secondary flags are updated from the lower half. Only 32-bit registers can be specified as the sources of these instructions, although the values can be halfword swapped. Each half of each register is treated as a signed value. The calculation and scaling are done without loss of precision. Hence ADDADD X0,X1,X2,ASR#1 produces the correct averages in the upper and lower halves of X0. Optional saturation is provided for each instruction, for which the Sa bit must be set.
Encoding (fields from bit 31 down to bit 0):
0 | OPC | Sa | F | SD | DEST | S1 | R1 | SRC1 | SRC2
OPC defines the operation. Operation (OPC):
000 dest.h=(src1.h+src2.h)->>{scale},
dest.l=(src1.l+src2.l)->>{scale}
001 dest.h=(src1.h+src2.h)->>{scale},
dest.l=(src1.l-src2.l)->>{scale}
100 dest.h=(src1.h-src2.h)->>{scale},
dest.l=(src1.l+src2.l)->>{scale}
101 dest.h=(src1.h-src2.h)->>{scale},
dest.l=(src1.l-src2.l)->>{scale}
If the Sa bit is set, each sum/difference is saturated independently. Mnemonics:
000 {S}ADDADD <dest>,<src1_32>,<src2_32>{,<scale>}
001 {S}ADDSUB <dest>,<src1_32>,<src2_32>{,<scale>}
100 {S}SUBADD <dest>,<src1_32>,<src2_32>{,<scale>}
101 {S}SUBSUB <dest>,<src1_32>,<src2_32>{,<scale>}
An S prefix on the mnemonic indicates saturation. The assembler also supports
CMNCMN <dest>,<src1_32>,<src2_32>{,<scale>}
CMNCMP <dest>,<src1_32>,<src2_32>{,<scale>}
CMPCMN <dest>,<src1_32>,<src2_32>{,<scale>}
CMPCMP <dest>,<src1_32>,<src2_32>{,<scale>}
These are generated from the standard instructions with no write-back.
Flags:
C is set if there is a carry out of bit 15 when adding the two upper 16-bit halves.
Z is set if the sum of the upper 16-bit halves is 0.
N is set if the sum of the upper 16-bit halves is negative.
V is set if the signed 17-bit sum of the upper 16-bit halves does not fit into 16 bits (after scaling).
SZ, SN, SV and SC are set similarly for the lower 16-bit halves.
Reason for inclusion:
The parallel add and subtract instructions are useful for performing operations on complex numbers held in a single 32-bit register. They are used in the FFT (fast Fourier transform) kernel. They are also useful for simple addition/subtraction of vectors of 16-bit data, allowing two elements to be processed per cycle.
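The averaging behaviour of ADDADD with ASR#1 can be modelled with a short Python sketch. The helper names (sign16, pack, addadd) are hypothetical and only the non-saturating form is modelled; the point is that each 17-bit sum is shifted before being truncated back to 16 bits, so no precision is lost.

```python
def sign16(value):
    """Interpret the low 16 bits of value as a signed quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def pack(high, low):
    """Pack two signed 16-bit values into one 32-bit register image."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)

def addadd(src1, src2, asr=0):
    """Model of non-saturating ADDADD dest, src1, src2, ASR #asr."""
    # Sums are formed at full precision, then arithmetically shifted.
    high = (sign16(src1 >> 16) + sign16(src2 >> 16)) >> asr
    low = (sign16(src1) + sign16(src2)) >> asr
    return pack(high, low)

# Average two packed pairs: (30000, -4) and (30000, -6).
average = addadd(pack(30000, -4), pack(30000, -6), asr=1)
```

Note that 30000+30000 overflows a 16-bit half, yet the averaged upper half is still 30000 because the shift is applied to the 17-bit intermediate sum.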
The branch (conditional) instruction allows conditional changes in control flow. Piccolo takes three cycles to execute a taken branch.
Encoding (fields from bit 31 down to bit 0):
0 | 11111 | 100 | 000 | IMMEDIATE_16 | COND
Operation:
Branch by the offset if <cond> holds according to the primary flags.
The offset is a signed 16-bit number of words. At present the range of the offset is limited to -32768 to +32767 words.
The address calculation performed is
target address = branch instruction address + 4 + offset
Mnemonic:
B<cond> <destination_label>
Flags: unaffected.
Reason for inclusion:
Highly useful in most routines.
The conditional add or subtract instructions conditionally add src2 to src1 or subtract src2 from src1.
Encoding (fields from bit 31 down to bit 0):
1 | 0010 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2
OPC specifies the type of instruction.
Operation (OPC):
0 if (carry set) temp=src1-src2 else temp=src1+src2
dest=temp{->>scale}
1 if (carry set) temp=src1-src2 else temp=src1+src2
dest=temp{->>scale}, but if the scale is a shift left,
then the new value of the carry (from src1-src2 or src1+src2) is shifted into the bottom.
Mnemonics:
0 CAS <dest>,<src1>,<src2>{,<scale>}
1 CASC <dest>,<src1>,<src2>{,<scale>}
Flags: see above.
Reason for inclusion:
The conditional add or subtract instructions allow efficient division code to be constructed.
Example 1: Divide the 32-bit unsigned value in X0 by the 16-bit unsigned value in X1 (assuming X0<(X1<<16) and X1.h=0).
LSL X1, X1, #15 ; shift up the divisor
SUB X1, X1, #0 ; set the carry flag
REPEAT #16
CASC X0,X0,X1,LSL#1
NEXT
At the end of the loop, X0.l holds the quotient of the division. Depending on the value of the carry, the remainder can be recovered from X0.h.
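The loop in Example 1 is a non-restoring division, and its behaviour can be checked with a small Python model. This is a sketch under stated assumptions: the function names are hypothetical, registers are modelled as 32-bit unsigned values, the carry follows the ARM convention (set when a subtraction does not borrow), and the remainder correction applied when the final carry is clear is one reading of "depending on the value of the carry".

```python
MASK32 = 0xFFFFFFFF

def casc_step(x0, x1, carry):
    """One CASC X0, X0, X1, LSL #1 step on 32-bit register images."""
    if carry:
        temp = x0 - x1
        new_carry = 1 if x0 >= x1 else 0       # no borrow from bit 31
    else:
        temp = x0 + x1
        new_carry = 1 if temp > MASK32 else 0  # carry out of bit 31
    temp &= MASK32
    # Shift left by one and shift the new carry into the bottom bit.
    return ((temp << 1) | new_carry) & MASK32, new_carry

def divide_32_by_16(dividend, divisor):
    """Model of Example 1; requires dividend < (divisor << 16)."""
    assert 0 < divisor < (1 << 16) and 0 <= dividend < (divisor << 16)
    x1 = divisor << 15          # LSL X1, X1, #15
    x0, carry = dividend, 1     # SUB X1, X1, #0 sets the carry
    for _ in range(16):         # REPEAT #16
        x0, carry = casc_step(x0, x1, carry)
    quotient = x0 & 0xFFFF      # X0.l
    remainder = x0 >> 16        # X0.h
    if not carry:               # last trial subtraction went negative
        remainder = (remainder + divisor) & 0xFFFF
    return quotient, remainder
```

Under these assumptions divide_32_by_16(1000, 7) agrees with Python's divmod(1000, 7).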
Example 2: Divide the 32-bit positive value in X0 by the 32-bit positive value in X1, with early termination.
MOV X2, #0 ; clear the quotient
LOG Z0, X0 ; number of bits X0 can be shifted
LOG Z1, X1 ; number of bits X1 can be shifted
SUBS Z0, Z1, Z0 ; shift X1 up so the leading 1s match
BLT div_end ; X1>X0 so the answer is 0
LSL X1, X1, Z0 ; match the leading 1s
ADD Z0, Z0, #1 ; number of tests to perform
SUBS Z0, Z0, #0 ; set the carry
REPEAT Z0
CAS X0,X0,X1,LSL#1
ADCN X2,X2,X2
NEXT
div_end
At the end, X2 holds the quotient and the remainder can be recovered from X0.
The count leading bits instruction allows data to be normalized.
Encoding (fields from bit 31 down to bit 0):
011011 | F | SD | DEST | S1 | R1 | SRC1 | 101110000000
Operation:
dest is set to the number of places by which the value in src1 must be shifted left in order for bit 31 to differ from bit 30. This is a value in the range 0-30, except in the special cases where src1 is -1 or 0, in which case 31 is returned.
Mnemonic:
CLB <dest>,<src1>
Flags:
Z is set if the result is 0.
N is cleared.
C is set if src1 is either -1 or 0.
V is preserved.
Reason for inclusion:
A step needed for normalization.
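The CLB result described above can be expressed as a short Python function. The name clb and the 32-bit masking are assumptions of this sketch; it simply counts left shifts until bits 31 and 30 differ.

```python
def clb(src1):
    """Left shifts needed until bit 31 differs from bit 30 (31 for 0 or -1)."""
    value = src1 & 0xFFFFFFFF
    if value in (0, 0xFFFFFFFF):       # special cases: src1 is 0 or -1
        return 31
    count = 0
    while ((value >> 31) & 1) == ((value >> 30) & 1):
        value = (value << 1) & 0xFFFFFFFF
        count += 1
    return count
```

Shifting a value left by clb(value) places leaves it normalized, with the most significant non-sign bit in bit 30.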
Halt and breakpoint instructions are provided for stopping Piccolo execution.
Encoding (fields from bit 31 down to bit 0):
1 | 11111 | 11 | OP | 00000000000000000000000
OPC specifies the type of instruction.
Operation (OPC):
0 Piccolo execution is stopped and the Halt bit is set in the Piccolo status register.
1 Piccolo execution is stopped, the Break bit is set in the Piccolo status register, and the ARM is interrupted to report that a breakpoint has been reached.
Mnemonics:
0 HALT
1 BREAK
Flags: unaffected.
The logic instructions perform logical operations on 32- or 16-bit registers. The operands are treated as unsigned values.
Encoding (fields from bit 31 down to bit 0):
1 | 000 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2
OPC encodes the logical operation to be performed.
Operation (OPC):
00 dest=(src1&src2){->>scale}
01 dest=(src1|src2){->>scale}
10 dest=(src1&~src2){->>scale}
11 dest=(src1^src2){->>scale}
Mnemonics:
00 AND <dest>,<src1>,<src2>{,<scale>}
01 ORR <dest>,<src1>,<src2>{,<scale>}
10 BIC <dest>,<src1>,<src2>{,<scale>}
11 EOR <dest>,<src1>,<src2>{,<scale>}
The assembler supports the following opcodes:
TST <src1>,<src2>
TEQ <src1>,<src2>
TST is an AND with register write disabled. TEQ is an EOR with register write disabled.
Flags:
Z is set if the result is all zeros.
N, C, V are preserved.
SZ, SN, SC, SV are preserved.
Reason for inclusion:
Speech compression algorithms use packed bit fields to encode information. The bit-masking instructions assist in extracting/packing these fields.
The MAX and MIN operation instructions perform maximum and minimum operations.
Encoding (fields from bit 31 down to bit 0):
0 | 101 | OPC | 1 | F | SD | DEST | S1 | R1 | SRC1 | SRC2
OPC specifies the type of instruction.
Operation (OPC):
0 dest=(src1<=src2)?src1:src2
1 dest=(src1>src2)?src1:src2
Mnemonics:
0 MIN <dest>,<src1>,<src2>
1 MAX <dest>,<src1>,<src2>
Flags:
Z is set if the result is 0.
N is set if the result is negative.
C for MAX: set if src2>=src1 (dest=src1 case).
For MIN: set if src2>=src1 (dest=src2 case).
V is preserved.
Reason for inclusion:
To find the strength of a signal, many algorithms scan the samples to find the maximum/minimum of the absolute value of the samples. MAX and MIN are invaluable for this. The operands src1 and src2 can be swapped depending on whether the first or the last maximum in the signal is to be found.
MAX X0,X0,#0 converts X0 to a positive number, clipping from below.
MIN X0,X0,#255 clips from above. This is useful for graphics processing.
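The two clipping instructions above combine into the familiar clamp to the range 0..255. The following Python sketch (with the hypothetical name clip_pixel) mirrors the MAX followed by MIN sequence.

```python
def clip_pixel(x0):
    """Clamp a signed value to 0..255 using the MAX then MIN sequence."""
    x0 = max(x0, 0)      # MAX X0, X0, #0   clips from below
    x0 = min(x0, 255)    # MIN X0, X0, #255 clips from above
    return x0
```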
The parallel MAX and MIN operation instructions perform maximum and minimum operations on parallel 16-bit data.
Encoding (fields from bit 31 down to bit 0):
0 | 111 | OPC | 1 | F | SD | DEST | S1 | R1 | SRC1 | SRC2_PARALLEL
OPC specifies the type of instruction. Operation (OPC):
0 dest.l=(src1.l<=src2.l)?src1.l:src2.l
dest.h=(src1.h<=src2.h)?src1.h:src2.h
1 dest.l=(src1.l>src2.l)?src1.l:src2.l
dest.h=(src1.h>src2.h)?src1.h:src2.h
Mnemonics:
0 MINMIN <dest>,<src1>,<src2>
1 MAXMAX <dest>,<src1>,<src2>
Flags:
Z is set if the upper 16 bits of the result are 0.
N is set if the upper 16 bits of the result are negative.
C for MAX: set if src2.h>=src1.h (dest=src1 case).
For MIN: set if src2.h>=src1.h (dest=src2 case).
V is preserved.
SZ, SN, SC, SV are set similarly for the lower 16-bit halves.
Reason for inclusion:
As for the 32-bit MAX and MIN.
The move long immediate operation instruction allows a register to be set to any signed 16-bit, sign-extended value. Two of these instructions can set a 32-bit register to any value (by accessing the high and low halves in sequence). For moves between registers, see the select operations.
Encoding (fields from bit 31 down to bit 0):
1 | 11100 | F | SD | DEST | IMMEDIATE_15 | +/- | 000
Mnemonic:
MOV <dest>,#<imm_16>
The assembler uses this MOV instruction to provide a non-interlocking NOP (no operation); that is, NOP is equivalent to MOV ,#0.
Flags: unaffected.
Reason for inclusion:
Initializing registers/counters.
The multiply-accumulate operation instructions perform signed multiplication with accumulation or de-accumulation, scaling and saturation.
Encoding (fields from bit 31 down to bit 0):
1 | 10 | OPC | Sa | F | SD | DEST | A1 | R1 | SRC1 | SRC2_MULA
The OPC field specifies the type of instruction.
Operation (OPC):
00 dest=(acc+(src1*src2)){->>scale}
01 dest=(acc-(src1*src2)){->>scale}
In each case, the result is saturated before being written to the destination if the Sa bit is set.
Mnemonics:
00 {S}MULA <dest>,<src1_16>,<src2_16>,<acc>{,<scale>}
01 {S}MULS <dest>,<src1_16>,<src2_16>,<acc>{,<scale>}
An S prefix on the mnemonic indicates saturation.
Flags: see the section above.
Reason for inclusion:
A single-cycle sustained MULA is needed for FIR code. MULS is used in FFT butterflies. MULA is also useful for multiplication with rounding. For example, A0=(X0*X1+16384)>>15 can be done in one cycle by keeping 16384 in another accumulator (for example A1). A <dest> that differs from <acc> is also needed for the FFT kernel.
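The rounding multiply mentioned above can be illustrated with a minimal Python model of a non-saturating MULA. The function names are hypothetical, values are treated as unbounded Python integers, and the scale is assumed to be an arithmetic right shift.

```python
def mula(acc, src1, src2, asr=0):
    """Model of dest = (acc + src1*src2) ->> scale, without saturation."""
    return (acc + src1 * src2) >> asr

def round_mul_q15(x0, x1):
    """A0 = (X0*X1 + 16384) >> 15, with 16384 held in another accumulator."""
    rounding_constant = 16384          # assumed to be preloaded, e.g. in A1
    return mula(rounding_constant, x0, x1, asr=15)
```

With x0=3 and x1=16384 (0.5 in 1.15 format) the exact product is 1.5; the rounding constant makes the result 2, where a plain shift would give 1.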
The multiply double operation instructions perform signed multiplication, doubling the result before accumulation or de-accumulation, scaling and saturation.
Encoding (fields from bit 31 down to bit 0):
1 | 10 | 1 | OPC | 1 | F | SD | DEST | A1 | R1 | SRC1 | 0 | A0 | R2 | SRC2 | SCALE
OPC specifies the type of instruction.
Operation (OPC):
0 dest=SAT((acc+SAT(2*src1*src2)){->>scale})
1 dest=SAT((acc-SAT(2*src1*src2)){->>scale})
Mnemonics:
0 SMLDA <dest>,<src1_16>,<src2_16>,<acc>{,<scale>}
1 SMLDS <dest>,<src1_16>,<src2_16>,<acc>{,<scale>}
Flags: see the section above.
Reason for inclusion:
G.729 and other algorithms that use fractional arithmetic need the MLD instructions. Most DSPs provide a fractional mode that enables a left shift of one bit at the output of the multiplier, before accumulation or write-back. Supporting this as a specific instruction gives greater programming flexibility. The names of the equivalent G-series basic operations are:
L_msu=>SMLDS
L_mac=>SMLDA
These make use of the saturation of the multiplier when shifting left by one bit. If a sequence of fractional multiply-accumulates is needed without loss of precision, MULA can be used, with the sum maintained in 33.14 format. If necessary, a left shift and saturate can be used at the end to convert to 1.15 format.
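The correspondence with the G-series basic operations can be sketched in Python. This is a model under assumptions: saturation is taken to be to the signed 32-bit range, as in the ITU-T basic operators, the optional scale is omitted, and the function names are illustrative.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000

def sat32(value):
    """Saturate to the signed 32-bit range (assumed saturation width)."""
    return max(INT32_MIN, min(INT32_MAX, value))

def smlda(acc, src1, src2):
    """dest = SAT(acc + SAT(2*src1*src2)), the L_mac pattern."""
    return sat32(acc + sat32(2 * src1 * src2))

def smlds(acc, src1, src2):
    """dest = SAT(acc - SAT(2*src1*src2)), the L_msu pattern."""
    return sat32(acc - sat32(2 * src1 * src2))
```

The case that needs the inner saturation is -32768 * -32768: doubling the product gives 2^31, which does not fit in 32 signed bits and is clamped to 0x7FFFFFFF.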
The multiply operation instructions perform signed multiplication with optional scaling/saturation. The source registers (16-bit only) are treated as signed numbers.
Encoding (fields from bit 31 down to bit 0):
00011 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2
OPC specifies the type of instruction.
Operation (OPC):
0 dest=(src1*src2){->>scale}
1 dest=SAT((src1*src2){->>scale})
Mnemonics:
0 MUL <dest>,<src1_16>,<src2>{,<scale>}
1 SMUL <dest>,<src1_16>,<src2>{,<scale>}
Flags: see the section above.
Reason for inclusion:
Signed and saturating multiplication is needed by many processes.
Register list operations are used to perform actions on a set of registers. The EMPTY and ZERO instructions are provided for resetting a selection of registers before, or between, routines. The OUTPUT instructions are provided to store the contents of a list of registers to the output FIFO.
Encoding (fields from bit 31 down to bit 0):
1 | 11111 | 0 | OPC | 00 | REGISTER_LIST_16 | SCALE
OPC specifies the type of instruction.
Operation (OPC):
000 for (k=0; k<16; k++) if bit k of the register list is set, then register k is marked as empty.
001 for (k=0; k<16; k++) if bit k of the register list is set, then register k is set to contain 0.
010 Undefined
011 Undefined
100 for (k=0; k<16; k++) if bit k of the register list is set, then (register k ->> scale) is written to the output FIFO.
101 for (k=0; k<16; k++) if bit k of the register list is set, then (register k ->> scale) is written to the output FIFO and register k is marked as empty.
110 for (k=0; k<16; k++) if bit k of the register list is set, then SAT(register k ->> scale) is written to the output FIFO.
111 for (k=0; k<16; k++) if bit k of the register list is set, then SAT(register k ->> scale) is written to the output FIFO and register k is marked as empty.
Mnemonics:
000 EMPTY <register_list>
001 ZERO <register_list>
010 Unused
011 Unused
100 OUTPUT <register_list>{,<scale>}
101 OUTPUT <register_list>^{,<scale>}
110 SOUTPUT <register_list>{,<scale>}
111 SOUTPUT <register_list>^{,<scale>}
Flags: unaffected.
Examples:
EMPTY {A0,A1,X0-X3}
ZERO {Y0-Y3}
OUTPUT {X0-Y1}^
The assembler also supports the syntax
OUTPUT Rn
In this case it outputs one register using a MOV ^,Rn instruction.
The EMPTY instruction will stall until all registers to be emptied contain valid data (that is, are not empty).
Register list operations must not be used within remapping REPEAT loops.
The OUTPUT instruction can only specify up to eight registers to output.
Reason for inclusion:
After a routine has finished, the next routine expects all registers to be empty so that it can receive data from the ARM. An EMPTY instruction is needed to accomplish this. Before performing an FIR or other filter, all accumulators and partial results must be zeroed. The ZERO instruction helps with this. Both are designed to improve code density by replacing a series of single-register moves. The OUTPUT instruction is included to improve code density by replacing a series of MOV ^,Rn instructions.
A remapping parameter move instruction, RMOV, is provided to allow the user-defined register remapping parameters to be configured.
The instruction encoding is as follows (fields from bit 31 down to bit 0):
1 | 11111 | 101 | 00 | ZPARAMS | YPARAMS | XPARAMS
Each PARAMS field comprises the following entries (from bit 6 down to bit 0):
BASEWRAP | BASEINC | 0 | RENUMBER
The meaning of these entries is as follows:
Parameter: Description
RENUMBER: The number of 16-bit registers on which to perform remapping; may take the values 0, 2, 4, 8. Registers below RENUMBER are remapped; those above are accessed directly.
BASEINC: The amount by which the base pointer is incremented at the end of each loop. May take the values 1, 2 or 4.
BASEWRAP: The base wrapping modulus; may take the values 2, 4, 8.
Mnemonic:
RMOV <PARAMS>,[<PARAMS>]
The <PARAMS> field has the following format:
<PARAMS> ::= <BANK><BASEINC>n<RENUMBER>w<BASEWRAP>
<BANK> ::= [X|Y|Z]
<BASEINC> ::= [++|+1|+2|+4]
<RENUMBER> ::= [0|2|4|8]
<BASEWRAP> ::= [2|4|8]
If the RMOV instruction is used while remapping is active, the behaviour is UNPREDICTABLE.
Flags: unaffected.
The repeat instructions provide four zero-cycle loops in hardware. A REPEAT instruction defines a new hardware loop. Piccolo uses hardware loop 0 for the first REPEAT instruction, hardware loop 1 for a REPEAT instruction nested within the first, and so on. The REPEAT instruction does not need to specify which loop is being used. REPEAT loops must be strictly nested. If an attempt is made to nest loops to a depth greater than 4, the behaviour is unpredictable.
Each REPEAT instruction specifies the number of instructions in the loop (which immediately follows the REPEAT instruction) and the number of times to go around the loop (which is either a constant or read from a Piccolo register).
If the number of instructions in the loop is small (1 or 2), Piccolo may take extra cycles to set the loop up.
If the loop count is register-specified, a 32-bit access is implied (S1=1), though only the bottom 16 bits are significant and the number is considered to be unsigned. If the loop count is zero, the action of the loop is undefined. A copy of the loop count is taken, so the register can be immediately reused (or even refilled) without affecting the loop.
The REPEAT instruction provides a mechanism to modify the way in which register operands are specified within a loop. The details are described above.
Encoding of a REPEAT with a register-specified number of loops (fields from bit 31 down to bit 0):
1 | 11110 | 0 | RFIELD_4 | 00 | 0 | R1 | SRC1 | 0000 | #INSTRUCTIONS_8
Encoding of a REPEAT with a fixed number of loops (fields from bit 31 down to bit 0):
1 | 11110 | 1 | RFIELD_4 | #LOOPS_13 | #INSTRUCTIONS_8
The RFIELD operand specifies which of 16 remapping parameter configurations to use inside the loop.
RFIELD: Remapping operation
0: No remapping performed
1: User-defined remapping
2..15: Preset remapping configurations (TBD)
The assembler provides two opcodes, REPEAT and NEXT, for defining a hardware loop. REPEAT goes at the start of the loop and NEXT delimits the end of the loop, allowing the assembler to calculate the number of instructions in the loop body. For REPEAT it is only necessary to specify the number of loops, either as a constant or as a register. For example:
REPEAT X0
MULA A0,Y0.l,Z0.l,A0
MULA A0,Y0.h^,Z0.h^,A0
NEXT
This executes the two MULA instructions X0 times. Similarly,
REPEAT #10
MULA A0,X0^,Y0^,A0
NEXT
performs 10 multiply-accumulates.
The assembler supports the syntax:
REPEAT #iterations [,<PARAMS>]
to specify the remapping parameters to use for the REPEAT. If the required remapping parameters are equal to one of the predefined sets of parameters, the appropriate REPEAT encoding is used. If not, the assembler generates an RMOV to load the user-defined parameters, followed by a REPEAT instruction. See the section above for details of the RMOV instruction and the remapping parameter format.
If the number of iterations of a loop is 0, the action of REPEAT is UNPREDICTABLE.
If the number-of-instructions field is set to 0, the action of REPEAT is UNPREDICTABLE.
A loop consisting of only one instruction, where that instruction is a branch, has UNPREDICTABLE behaviour.
Branches from within the bounds of a REPEAT loop to outside the bounds of that loop are UNPREDICTABLE.
The saturating absolute instruction calculates the saturated absolute value of source 1.
Encoding (fields from bit 31 down to bit 0):
0 | 10011 | F | SD | DEST | S1 | R1 | SRC1 | 10000000000
Operation:
dest=SAT((src1>=0)?src1:-src1). The value is always saturated.
Mnemonic:
SABS <dest>,<src1>
Flags:
Z is set if the result is 0.
N is preserved.
C is set if src1<0 (dest=-src1 case).
V is set if saturation occurred.
Reason for inclusion:
Useful in many DSP applications.
The select operations (conditional moves) conditionally move either source 1 or source 2 into the destination register. A select is always equivalent to a move. There are also parallel operations for use after parallel adds/subtracts.
Note that, for implementation reasons, both source operands may be read; if either one is empty the instruction will stall, irrespective of whether that operand is strictly required.
Encoding (fields from bit 31 down to bit 0):
1 | 011 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2_SEL
OPC specifies the type of instruction.
Operation (OPC):
00 If <cond> holds for the primary flags then dest=src1 else dest=src2.
01 If <cond> holds for the primary flags then dest.h=src1.h else dest.h=src2.h.
If <cond> holds for the secondary flags then dest.l=src1.l else dest.l=src2.l.
10 If <cond> holds for the primary flags then dest.h=src1.h else dest.h=src2.h.
If <cond> fails for the secondary flags then dest.l=src1.l else dest.l=src2.l.
11 Reserved
Mnemonics:
00 SEL<cond> <dest>,<src1>,<src2>
01 SELTT<cond> <dest>,<src1>,<src2>
10 SELTF<cond> <dest>,<src1>,<src2>
11 Unused
If a register is marked for refill, it is unconditionally refilled. The assembler also provides the following mnemonics:
MOV<cond> <dest>,<src1>
SELFT<cond> <dest>,<src1>,<src2>
SELFF<cond> <dest>,<src1>,<src2>
MOV<cond> A,B is equivalent to SEL<cond> A,B,A. SELFT and SELFF are obtained by swapping src1 and src2 and using SELTF and SELTT.
Flags: all flags are preserved so that a sequence of selects may be performed.
Reason for inclusion:
Used for making simple decisions inline without having to resort to a branch. Used by Viterbi algorithms and when scanning a sample or vector for the largest element.
The shift operation instructions provide left and right logical shifts, right arithmetic shifts, and rotates by a specified amount. The shift amount is considered to be a signed integer between -128 and +127 taken from the bottom 8 bits of the register contents, or an immediate in the range +1 to +31. A shift by a negative amount causes a shift in the opposite direction by ABS(shift amount).
The input operands are sign-extended to 32 bits; the resulting 32-bit output is sign-extended to 48 bits before write-back so that a write to a 48-bit register behaves sensibly.
Encoding (fields from bit 31 down to bit 0):
1 | 010 | OPC | F | SD | DEST | S1 | R1 | SRC1 | SRC2_SEL
OPC specifies the type of instruction.
Operation (OPC):
00 dest=(src2>=0)?src1<<src2:src1>>-src2
01 dest=(src2>=0)?src1>>src2:src1<<-src2
10 dest=(src2>=0)?src1->>src2:src1<<-src2
11 dest=(src2>=0)?src1 ROR src2:src1 ROL -src2
Mnemonics:
00 ASL <dest>,<src1>,<src2_16>
01 LSR <dest>,<src1>,<src2_16>
10 ASR <dest>,<src1>,<src2_16>
11 ROR <dest>,<src1>,<src2_16>
Flags:
Z is set if the result is 0.
N is set if the result is negative.
V is preserved.
C is set to the value of the last bit shifted out (as on the ARM).
The behaviour of register-specified shifts is:
- LSL by 32 has result zero, and C is set to bit 0 of src1.
- LSL by more than 32 has result zero, and C is set to zero.
- LSR by 32 has result zero, and C is set to bit 31 of src1.
- LSR by more than 32 has result zero, and C is set to zero.
- ASR by 32 or more has result filled with bit 31 of src1, and C is set to bit 31 of src1.
- ROR by 32 has result equal to src1, and C is set to bit 31 of src1.
- ROR by n, where n is greater than 32, gives the same result and carry as ROR by n-32; therefore repeatedly subtract 32 from n until the amount is in the range 1 to 32, and see above.
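The ROR rules for register-specified amounts can be captured in a few lines of Python. This sketch covers positive amounts only; the function name ror32 is hypothetical, and the carry is taken to be the last bit rotated out, which for a rotate right is the bit that ends up in bit 31 of the result.

```python
def ror32(src1, amount):
    """Rotate right by a register-specified amount >= 1; returns (result, C)."""
    while amount > 32:                 # reduce to the range 1..32
        amount -= 32
    src1 &= 0xFFFFFFFF
    rotate = amount % 32               # rotating by 32 leaves the value unchanged
    result = ((src1 >> rotate) | (src1 << (32 - rotate))) & 0xFFFFFFFF
    carry = (result >> 31) & 1         # last bit rotated out lands in bit 31
    return result, carry
```

In this model ror32(value, 32) returns the value unchanged with C equal to bit 31 of src1, matching the ROR-by-32 rule in the list above.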
Reason for inclusion:
Multiplication/division by a power of 2. Bit and field extraction. Serial registers.
Undefined instructions are set out above in the instruction set listing. Their execution causes Piccolo to halt execution, set the U bit in the status register, and disable itself (as if the E bit in the control register had been cleared). This allows any future extensions of the instruction set to be trapped and optionally emulated on existing implementations.
Piccolo state is accessed from the ARM as follows. State access mode is used to observe/modify the state of Piccolo. This mechanism is provided for two purposes:
- context switching.
- debugging.
Piccolo is put into state access mode by executing the PSTATE instruction. This mode allows all Piccolo state to be saved and restored with a sequence of STC and LDC instructions. When state access mode is entered, the use of the Piccolo coprocessor ID PICCOLO1 is modified to allow the state of Piccolo to be accessed. There are 7 banks of Piccolo state. All the data within a particular bank can be loaded and stored with a single LDC or STC.
Bank 0: special registers.
- One 32-bit word containing the value of the Piccolo ID register (read only).
- One 32-bit word containing the state of the control register.
- One 32-bit word containing the state of the status register.
- One 32-bit word containing the state of the program counter.
Bank 1: general-purpose registers (GPR).
- 16 32-bit words containing the general-purpose register state.
Bank 2: accumulators.
- 4 32-bit words containing the upper 32 bits of the accumulator registers (note that duplication of the GPR state is necessary for restoration purposes; otherwise another write enable on the register bank would be implied).
Bank 3: register/Piccolo ROB/output FIFO status.
- One 32-bit word indicating which registers are marked for refill (2 bits for each 32-bit register).
- 8 32-bit words containing the state of the ROB tags (eight 7-bit items stored in bits 7 to 0).
- 3 32-bit words containing the state of the unaligned ROB latches (bits 17 to 0).
- One 32-bit word indicating which slots in the output shift register contain valid data (bit 4 indicates empty, bits 3 to 0 encode the number of used entries).
- One 32-bit word containing the state of the output FIFO holding latch (bits 17 to 0).
Bank 4: ROB input data.
- 8 32-bit data values.
Bank 5: output FIFO data.
- 8 32-bit data values.
Bank 6: loop hardware.
- 4 32-bit words containing the loop start addresses.
- 4 32-bit words containing the loop end addresses.
- 4 32-bit words containing the loop counts (bits 15 to 0).
- One 32-bit word containing the user-defined remapping parameters and other remapping state.
LDC instruction is used for loading the Piccolo state during in the conditional access pattern at Piccolo.Which group the indication of BANK field is loading.31?30?29?28?27?26?25?24 23 22 21 20?19?18?17?16 15?14?13?12 11?10?9?8 7?6?5?4?3?2?1?0
COND 110 P U 0 W 1 BASE BANK ?PICCOLO1 OFFSET
Following sequence loads all the Piccolo states from the address among the register R0.
LDP B0, [R0], #16! Special register
LDP B1, [R0], #64! Load general-purpose register
LDP B2, [R0], #16! Load totalizer
LDP B3, [R0], #56! Bit load registers/ROB/FIFO state
LDP B4, [R0], #32! Load the ROB data
LDP B5, [R0], #32! Load the output data fifo
LDP B6, [R0], #52! Loaded cycle hardware
STC instruction is used for the storage Piccolo state during in the conditional access pattern at Piccolo.The BANK field is specified and is being stored which group.31?30?29?28 27?26?25 24?23 22?21?20?19?18?17?16 15?14?13?12 11?10?9?8 7?6?5?4?3?2?1?0
COND 110 ?P ?U ?0 W ?0 BASE BANK PICCOLO1 OFFSET
The following sequence stores all of the Piccolo state to memory at the address held in register R0.
STP B0, [R0], #16!   ; save special registers
STP B1, [R0], #64!   ; save general-purpose registers
STP B2, [R0], #16!   ; save accumulators
STP B3, [R0], #56!   ; save register/ROB/FIFO status
STP B4, [R0], #32!   ; save ROB data
STP B5, [R0], #32!   ; save output FIFO data
STP B6, [R0], #52!   ; save loop hardware
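Because each transfer advances R0 by its immediate, the sequences above imply a contiguous save area. The following sketch, with hypothetical names, derives the byte offset of each bank from the initial value of R0 using only the immediates shown in the sequences.

```python
from itertools import accumulate

# Post-index immediates in bytes for banks B0..B6, taken from the sequences above.
BANK_BYTES = [16, 64, 16, 56, 32, 32, 52]


def bank_offsets(sizes=BANK_BYTES):
    """Byte offset of each bank relative to the initial R0 value."""
    return [0, *accumulate(sizes)][:-1]


def total_state_bytes(sizes=BANK_BYTES):
    """Total size of the saved state in bytes."""
    return sum(sizes)
```

On this reading the banks start at offsets 0, 16, 80, 96, 152, 184 and 216, and a full dump occupies 268 bytes (67 words), so a debugger buffer would need to be at least that large.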
Debug mode - Piccolo needs to respond to the same debug mechanisms that ARM supports, namely software debugging through Demon and Angel, and hardware debugging with Embedded ICE. The following mechanisms are available for debugging a Piccolo system:
- ARM instruction breakpoints.
- Data breakpoints (watchpoints).
- Piccolo instruction breakpoints.
- Piccolo software breakpoints.
ARM instruction and data breakpoints are handled by the ARM Embedded ICE module; Piccolo instruction breakpoints are handled by the Piccolo Embedded ICE module; Piccolo software breakpoints are handled by the Piccolo core.
The hardware breakpoint system can be configured so that both ARM and Piccolo are breakpointed.
Software breakpoints are handled by a Piccolo instruction (Halt or Break) that causes Piccolo to stop executing, enter debug mode (the B bit in the status register is set) and disable itself (as if it had been disabled with a PDISABLE instruction). The program counter remains valid, allowing the breakpoint address to be recovered. Piccolo executes no further instructions.
Piccolo can be single-stepped by setting one breakpoint after another on the Piccolo instruction stream.
Software debug - The basic facility provided by Piccolo is the ability to load and save all of its state to memory through coprocessor instructions while in state access mode. This allows a debugger to save all state to memory, read and/or update it, and restore it to Piccolo. The Piccolo store-state mechanism is non-destructive, i.e. a store-state operation does not corrupt any of Piccolo's internal state. This means that Piccolo can be restarted after dumping its state without first restoring that state.
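The save, update and restore cycle described above can be mimicked with a toy model. This is not the coprocessor interface itself: the class, its methods and the dictionary-of-banks representation are all hypothetical, chosen only to show what "non-destructive" means for a debugger.

```python
import copy


class ToyPiccolo:
    """Hypothetical stand-in for the coprocessor: state is a dict of bank name -> words."""

    def __init__(self, banks):
        self._banks = copy.deepcopy(banks)

    def store_state(self):
        # Non-destructive: return a copy and leave the internal state untouched.
        return copy.deepcopy(self._banks)

    def load_state(self, banks):
        # Replace the internal state with the image supplied by the debugger.
        self._banks = copy.deepcopy(banks)


def patch_word(dev, bank, index, value):
    """Debugger-style read-modify-write of a single 32-bit state word."""
    image = dev.store_state()
    image[bank][index] = value & 0xFFFFFFFF
    dev.load_state(image)
    return image
```

Two consecutive calls to `store_state()` return equal images in this model, which mirrors the statement that execution can resume after a dump without a restore.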
The mechanism for finding out the state of the Piccolo cache is still to be determined.
Hardware debug - Hardware debug is provided by a scan chain on Piccolo's coprocessor interface. Piccolo can then be placed in state access mode and its state examined or modified through this scan chain.
The Piccolo status register contains a single bit indicating that a breakpointed instruction has been executed. When a breakpointed instruction is executed, Piccolo sets the B bit in the status register and stops executing. To interrogate Piccolo, the debugger must enable Piccolo and place it in state access mode by writing to its control register before any subsequent accesses can take place.
Fig. 4 illustrates a multiplexer arrangement that responds to the Hi/Lo bit and the Size bit to switch the appropriate half of the selected register onto the Piccolo datapath. If the Size bit indicates 16 bits, a sign-extending circuit pads the high-order bits of the datapath with 0s or 1s as appropriate.
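The selection performed by the arrangement of Fig. 4 can be modelled in a few lines. The sketch below is an assumption-laden illustration: the function name and boolean parameters are hypothetical, and it assumes that the Hi/Lo bit is ignored for a 32-bit operand.

```python
def select_operand(reg: int, hi: bool, size16: bool) -> int:
    """Return the 32-bit value driven onto the datapath for a register read.

    reg    -- contents of the selected 32-bit register
    hi     -- Hi/Lo bit: True selects bits 31..16, False selects bits 15..0
    size16 -- Size bit: True for a 16-bit operand, False for the full 32 bits
    """
    reg &= 0xFFFFFFFF
    if not size16:
        return reg                      # full-width operand (Hi/Lo assumed ignored)
    half = (reg >> 16) & 0xFFFF if hi else reg & 0xFFFF
    if half & 0x8000:
        return half | 0xFFFF0000        # sign bit set: pad the high bits with 1s
    return half                         # sign bit clear: pad the high bits with 0s
```

For example, with a register holding 0x80017FFF, the high half reads as 0xFFFF8001 and the low half as 0x00007FFF.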

Claims (34)

1. A method of performing digital signal processing, using a digital signal processing apparatus, upon signal data words stored in a data storage device (8), said method comprising the steps of:
generating, with a microprocessor unit (2) operating under control of microprocessor unit program instruction words, address words for addressing storage locations storing said signal data words in said data storage device;
reading, under control of said microprocessor unit, said signal data words from said addressed storage locations of said data storage device storing said signal data words;
supplying, under control of said microprocessor unit, said signal data words to a digital signal processing unit (4) operating under control of digital signal processing unit program instruction words;
performing, with said digital signal processing unit operating under control of digital signal processing unit program instruction words, arithmetic logic operations including at least one of a convolution operation, a correlation operation and a transform operation upon said signal data words to generate result data words; and
fetching, with said microprocessor unit operating under control of microprocessor unit program instruction words, said result data words from said digital signal processing unit; characterised in that:
said digital signal processing unit performs said arithmetic logic operations in parallel with said supplying and fetching performed by said microprocessor unit.
2. A method according to claim 1, further comprising supplying, under control of said microprocessor, data words generated by said microprocessor to the digital signal processing unit operating under control of digital signal processing unit program instruction words.
3. A method according to claim 1, further comprising the steps of:
generating, under control of said microprocessor unit, address words for addressing storage locations for storing said result data words in said data storage device;
writing, under control of said microprocessor unit, said result data words to said addressed storage locations for storing said result data words in said data storage device.
4. A method according to any one of claims 1, 2 or 3, wherein said signal data words represent at least one input analog signal.
5. A method according to claim 4, wherein said at least one input analog signal is a continuously varying real-time input signal.
6. A method according to any one of claims 1, 2 or 3, wherein said result data words represent at least one output analog signal.
7. A method according to claim 6, wherein said at least one output signal is a continuously varying real-time output signal.
8. Apparatus for performing digital signal processing upon signal data words stored in a data storage device, said apparatus comprising:
a microprocessor unit operating under control of microprocessor unit program instruction words to generate address words for addressing storage locations within said data storage device, and to control transfer of said signal data words between said apparatus for performing digital signal processing and said data storage device; and
a digital signal processing unit operating under control of digital signal processing unit instruction words to perform arithmetic logic operations including at least one of a convolution operation, a correlation operation and a transform operation upon said signal data words fetched from said data storage device by said microprocessor unit, to generate result data words; characterised in that:
said microprocessor unit and said digital signal processing unit operate in parallel.
9. Apparatus according to claim 8, wherein said microprocessor unit writes said result data words to said data storage device.
10. Apparatus according to claim 8 or 9, wherein said signal data words represent at least one input analog signal.
11. Apparatus according to claim 10, wherein said at least one input analog signal is a continuously varying real-time input signal.
12. Apparatus according to claim 8 or 9, wherein said result data words represent at least one output analog signal.
13. Apparatus according to claim 12, wherein said at least one output signal is a continuously varying real-time output signal.
14. Apparatus according to claim 8, wherein said microprocessor unit is responsive to a supply-multiple instruction word to supply a plurality of sequentially addressed signal data words to said digital signal processing unit.
15. Apparatus according to claim 8, wherein said digital signal processing unit includes a multi-word input buffer (12).
16. Apparatus according to claim 8, wherein said microprocessor unit is responsive to a fetch-multiple instruction word to fetch a plurality of sequentially addressed result data words from said digital signal processing unit.
17. Apparatus according to claim 8, wherein said digital signal processing unit includes a multi-word output buffer (18).
18. Apparatus according to claim 8, wherein a multiplexed data and instruction bus links said data storage device and said digital signal processing apparatus for transferring said signal data words, said microprocessor unit program instruction words and said digital signal processing unit program instruction words to said digital signal processing apparatus.
19. Apparatus according to claim 8, wherein said digital signal processing unit includes a bank of digital signal processing unit registers (10) for holding data words upon which arithmetic logic operations are to be performed, said digital signal processing unit program instruction words including register specifying fields.
20. Apparatus according to claim 15, wherein said input buffer stores, for each data word stored in said input buffer, destination data identifying a destination digital signal processing unit register.
21. Apparatus according to claim 20, wherein digital signal processing unit program instruction words that read a digital signal processing unit register include a flag indicating that a data word stored in said digital signal processing unit register may be replaced by a data word stored in said input buffer having matching destination data.
22. Apparatus according to claim 21, wherein, if said input buffer contains a plurality of data words having matching destination data, then said digital signal processing unit register is refilled with the data word having matching destination data that was stored first in said input buffer.
23. Apparatus according to claim 14 or 20, wherein a supply-multiple instruction word specifies destination data for a first data word, said destination data being incremented for each subsequent data word stored in said input buffer as a result of said supply-multiple instruction.
24. Apparatus according to claim 23, wherein said supply-multiple instruction word also specifies a limit destination data value, said destination data being incremented for each subsequent data word until said limit destination value is reached, whereupon said destination data is reset to said destination data of said first data word before being further incremented.
25. Apparatus according to claim 8, wherein said microprocessor unit is stalled if said digital signal processing unit is unable to receive a data word supplied from said microprocessor unit.
26. Apparatus according to claim 8, wherein said microprocessor unit is stalled if said digital signal processing unit is unable to supply a data word to be fetched by said microprocessor unit.
27. Apparatus according to claim 15, wherein said digital signal processing unit is stalled if said digital signal processing unit attempts to read a data word not present in said input buffer.
28. Apparatus according to claim 17, wherein said digital signal processing unit is stalled if said digital signal processing unit attempts to write a data word to said output buffer when said output buffer is full.
29. Apparatus according to claim 27 or 28, wherein said digital signal processing unit enters a power saving mode if said digital signal processing unit is stalled.
30. Apparatus according to any one of claims 8 to 11 and 14 to 22, comprising a digital signal processing unit cache for storing digital signal processing unit instruction words.
31. Apparatus according to claim 30, wherein digital signal processing unit instructions may be prefetched into said digital signal processing unit cache in response to a prefetch instruction.
32. Apparatus according to claim 20, wherein said digital signal processing unit is responsive to an instruction that does at least one of the following:
(i) marks as empty; and
(ii) outputs the contents of a plurality of registers of said digital signal processing unit.
33. Apparatus according to claim 8, wherein said microprocessor and said digital signal processing unit form an integrated circuit.
34. Apparatus according to claim 19, wherein said plurality of digital signal processing unit registers includes at least one X-bit data register and at least one Y-bit accumulator register, where Y is greater than X.
CNB971981442A 1996-09-23 1997-08-22 Digital signal processing integrated circuit architecture Expired - Fee Related CN1135468C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9619833A GB2317468B (en) 1996-09-23 1996-09-23 Digital signal processing integrated circuit architecture
GB9619833.8 1996-09-23

Publications (2)

Publication Number Publication Date
CN1231741A CN1231741A (en) 1999-10-13
CN1135468C CN1135468C (en) 2004-01-21

Family

ID=10800366

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB971981442A Expired - Fee Related CN1135468C (en) 1996-09-23 1997-08-22 Digital signal processing integrated circuit architecture

Country Status (11)

Country Link
EP (1) EP0927393B1 (en)
JP (1) JP3756195B2 (en)
KR (1) KR100500890B1 (en)
CN (1) CN1135468C (en)
DE (1) DE69707486T2 (en)
GB (1) GB2317468B (en)
IL (1) IL128321A (en)
MY (1) MY115104A (en)
RU (1) RU2223535C2 (en)
TW (1) TW318915B (en)
WO (1) WO1998012629A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101236548B (en) * 2007-01-31 2010-08-25 财团法人工业技术研究院 Digital signal processor

Families Citing this family (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6556044B2 (en) 2001-09-18 2003-04-29 Altera Corporation Programmable logic device including multipliers and configurations thereof to reduce resource utilization
JP2000057122A (en) 1998-08-06 2000-02-25 Yamaha Corp Digital signal processor
JP3561506B2 (en) * 2001-05-10 2004-09-02 東京エレクトロンデバイス株式会社 Arithmetic system
US8620980B1 (en) 2005-09-27 2013-12-31 Altera Corporation Programmable device with specialized multiplier blocks
US8266199B2 (en) 2006-02-09 2012-09-11 Altera Corporation Specialized processing block for programmable logic device
US8041759B1 (en) 2006-02-09 2011-10-18 Altera Corporation Specialized processing block for programmable logic device
US8301681B1 (en) 2006-02-09 2012-10-30 Altera Corporation Specialized processing block for programmable logic device
US8266198B2 (en) 2006-02-09 2012-09-11 Altera Corporation Specialized processing block for programmable logic device
US7836117B1 (en) 2006-04-07 2010-11-16 Altera Corporation Specialized processing block for programmable logic device
US7822799B1 (en) 2006-06-26 2010-10-26 Altera Corporation Adder-rounder circuitry for specialized processing block in programmable logic device
US8386550B1 (en) 2006-09-20 2013-02-26 Altera Corporation Method for configuring a finite impulse response filter in a programmable logic device
US7930336B2 (en) 2006-12-05 2011-04-19 Altera Corporation Large multiplier for programmable logic device
US8386553B1 (en) 2006-12-05 2013-02-26 Altera Corporation Large multiplier for programmable logic device
US7814137B1 (en) 2007-01-09 2010-10-12 Altera Corporation Combined interpolation and decimation filter for programmable logic device
US7865541B1 (en) 2007-01-22 2011-01-04 Altera Corporation Configuring floating point operations in a programmable logic device
US8650231B1 (en) 2007-01-22 2014-02-11 Altera Corporation Configuring floating point operations in a programmable device
US8645450B1 (en) 2007-03-02 2014-02-04 Altera Corporation Multiplier-accumulator circuitry and methods
US7949699B1 (en) 2007-08-30 2011-05-24 Altera Corporation Implementation of decimation filter in integrated circuit device using ram-based data storage
US8959137B1 (en) 2008-02-20 2015-02-17 Altera Corporation Implementing large multipliers in a programmable integrated circuit device
US8244789B1 (en) 2008-03-14 2012-08-14 Altera Corporation Normalization of floating point operations in a programmable integrated circuit device
US8626815B1 (en) 2008-07-14 2014-01-07 Altera Corporation Configuring a programmable integrated circuit device to perform matrix multiplication
US8255448B1 (en) 2008-10-02 2012-08-28 Altera Corporation Implementing division in a programmable integrated circuit device
US8307023B1 (en) 2008-10-10 2012-11-06 Altera Corporation DSP block for implementing large multiplier on a programmable integrated circuit device
US8706790B1 (en) 2009-03-03 2014-04-22 Altera Corporation Implementing mixed-precision floating-point operations in a programmable integrated circuit device
US8549055B2 (en) 2009-03-03 2013-10-01 Altera Corporation Modular digital signal processing circuitry with optionally usable, dedicated connections between modules of the circuitry
US8645449B1 (en) 2009-03-03 2014-02-04 Altera Corporation Combined floating point adder and subtractor
US8805916B2 (en) 2009-03-03 2014-08-12 Altera Corporation Digital signal processing circuitry with redundancy and bidirectional data paths
US8468192B1 (en) 2009-03-03 2013-06-18 Altera Corporation Implementing multipliers in a programmable integrated circuit device
US8886696B1 (en) 2009-03-03 2014-11-11 Altera Corporation Digital signal processing circuitry with redundancy and ability to support larger multipliers
US8650236B1 (en) 2009-08-04 2014-02-11 Altera Corporation High-rate interpolation or decimation filter in integrated circuit device
US8396914B1 (en) 2009-09-11 2013-03-12 Altera Corporation Matrix decomposition in an integrated circuit device
US8412756B1 (en) 2009-09-11 2013-04-02 Altera Corporation Multi-operand floating point operations in a programmable integrated circuit device
US7948267B1 (en) 2010-02-09 2011-05-24 Altera Corporation Efficient rounding circuits and methods in configurable integrated circuit devices
US8539016B1 (en) 2010-02-09 2013-09-17 Altera Corporation QR decomposition in an integrated circuit device
US8601044B2 (en) 2010-03-02 2013-12-03 Altera Corporation Discrete Fourier Transform in an integrated circuit device
US8458243B1 (en) 2010-03-03 2013-06-04 Altera Corporation Digital signal processing circuit blocks with support for systolic finite-impulse-response digital filtering
US8484265B1 (en) 2010-03-04 2013-07-09 Altera Corporation Angular range reduction in an integrated circuit device
US8510354B1 (en) 2010-03-12 2013-08-13 Altera Corporation Calculation of trigonometric functions in an integrated circuit device
US8539014B2 (en) 2010-03-25 2013-09-17 Altera Corporation Solving linear matrices in an integrated circuit device
US8589463B2 (en) 2010-06-25 2013-11-19 Altera Corporation Calculation of trigonometric functions in an integrated circuit device
US8862650B2 (en) 2010-06-25 2014-10-14 Altera Corporation Calculation of trigonometric functions in an integrated circuit device
US8577951B1 (en) 2010-08-19 2013-11-05 Altera Corporation Matrix operations in an integrated circuit device
US8645451B2 (en) 2011-03-10 2014-02-04 Altera Corporation Double-clocked specialized processing block in an integrated circuit device
US9600278B1 (en) 2011-05-09 2017-03-21 Altera Corporation Programmable device using fixed and configurable logic to implement recursive trees
US8812576B1 (en) 2011-09-12 2014-08-19 Altera Corporation QR decomposition in an integrated circuit device
US8949298B1 (en) 2011-09-16 2015-02-03 Altera Corporation Computing floating-point polynomials in an integrated circuit device
US9053045B1 (en) 2011-09-16 2015-06-09 Altera Corporation Computing floating-point polynomials in an integrated circuit device
US8762443B1 (en) 2011-11-15 2014-06-24 Altera Corporation Matrix operations in an integrated circuit device
JP6115564B2 (en) * 2012-03-13 2017-04-19 日本電気株式会社 Data processing system, semiconductor integrated circuit and control method thereof
US8543634B1 (en) 2012-03-30 2013-09-24 Altera Corporation Specialized processing block for programmable integrated circuit device
US9098332B1 (en) 2012-06-01 2015-08-04 Altera Corporation Specialized processing block with fixed- and floating-point structures
US8996600B1 (en) 2012-08-03 2015-03-31 Altera Corporation Specialized processing block for implementing floating-point multiplier with subnormal operation support
US9207909B1 (en) 2012-11-26 2015-12-08 Altera Corporation Polynomial calculations optimized for programmable integrated circuit device structures
US9189200B1 (en) 2013-03-14 2015-11-17 Altera Corporation Multiple-precision processing block in a programmable integrated circuit device
US9348795B1 (en) 2013-07-03 2016-05-24 Altera Corporation Programmable device using fixed and configurable logic to implement floating-point rounding
US9379687B1 (en) 2014-01-14 2016-06-28 Altera Corporation Pipelined systolic finite impulse response filter
US9684488B2 (en) 2015-03-26 2017-06-20 Altera Corporation Combined adder and pre-adder for high-radix multiplier circuit
CN107193768B (en) * 2016-03-15 2021-06-29 厦门旌存半导体技术有限公司 Method and device for inquiring queue state
US10942706B2 (en) 2017-05-05 2021-03-09 Intel Corporation Implementation of floating-point trigonometric functions in an integrated circuit device
US10545516B2 (en) * 2017-08-02 2020-01-28 Schneider Electric Systems Usa, Inc. Industrial process control transmitter for determining solution concentration
FR3087907B1 (en) * 2018-10-24 2021-08-06 St Microelectronics Grenoble 2 MICROCONTROLLER INTENDED TO EXECUTE A PARAMETABLE PROCESSING
US11156664B2 (en) * 2018-10-31 2021-10-26 SK Hynix Inc. Scan chain techniques and method of using scan chain structure
CN110109704B (en) * 2019-05-05 2021-08-27 杭州中科微电子有限公司 Digital signal processing system
US11886377B2 (en) * 2019-09-10 2024-01-30 Cornami, Inc. Reconfigurable arithmetic engine circuit

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4896262A (en) * 1984-02-24 1990-01-23 Kabushiki Kaisha Meidensha Emulation device for converting magnetic disc memory mode signal from computer into semiconductor memory access mode signal for semiconductor memory
CA2003338A1 (en) * 1987-11-09 1990-06-09 Richard W. Cutts, Jr. Synchronization of fault-tolerant computer system having multiple processors
JP3128799B2 (en) * 1988-09-30 2001-01-29 株式会社日立製作所 Data processing device, data processing system, and outline font data generation method
EP0843254A3 (en) * 1990-01-18 1999-08-18 National Semiconductor Corporation Integrated digital signal processor/general purpose CPU with shared internal memory
US6230255B1 (en) * 1990-07-06 2001-05-08 Advanced Micro Devices, Inc. Communications processor for voice band telecommunications
JPH0683578A (en) * 1992-03-13 1994-03-25 Internatl Business Mach Corp <Ibm> Method for controlling processing system and data throughput
KR0165054B1 (en) * 1994-08-22 1999-01-15 정장호 Data stuffing device

Also Published As

Publication number Publication date
JP2001501330A (en) 2001-01-30
KR100500890B1 (en) 2005-07-14
CN1231741A (en) 1999-10-13
RU2223535C2 (en) 2004-02-10
KR20000048533A (en) 2000-07-25
JP3756195B2 (en) 2006-03-15
GB2317468A (en) 1998-03-25
WO1998012629A1 (en) 1998-03-26
TW318915B (en) 1997-11-01
IL128321A (en) 2003-05-29
EP0927393B1 (en) 2001-10-17
GB9619833D0 (en) 1996-11-06
EP0927393A1 (en) 1999-07-07
MY115104A (en) 2003-03-31
DE69707486T2 (en) 2002-06-27
DE69707486D1 (en) 2001-11-22
IL128321A0 (en) 2000-01-31
GB2317468B (en) 2001-01-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20040121

Termination date: 20160822