Embodiment
The present invention will be described herein in conjunction with an exemplary multithreaded processor and a corresponding processing system. It should be understood, however, that the invention does not require the particular multithreaded processor and processing system configurations of the illustrative embodiment, and is more generally suitable for use in any multithreaded processor or information processing system application in which improved processor performance is desired. In addition, although particularly well suited for use in convergence devices, the multithreaded processor of the invention can also be used in other types of devices.
As will be described in greater detail below, an illustrative embodiment of a multithreaded processor in accordance with the invention is capable of executing RISC-based control code, digital signal processor (DSP) code, Java code and network processing code. The processor includes a single instruction multiple data (SIMD) vector unit, a reduction unit, and long instruction word (LIW) compound instruction execution.
Fig. 1 shows a multithreaded processor 102 in accordance with the invention. The multithreaded processor 102 includes a multithreaded cache memory 110, a multithreaded data memory 112, an instruction decoder 116, a register file 118 and a memory management unit (MMU) 120. The multithreaded cache memory 110 is also referred to herein as the multithreaded cache.
The multithreaded cache 110 includes a plurality of thread caches 110-1, 110-2, ..., 110-N, where N generally denotes the number of threads supported by the multithreaded processor 102. In this particular example, N=4. Other values of N may of course be used, as will be readily apparent to those skilled in the art.
Each thread thus has a corresponding thread cache associated with it in the multithreaded cache 110. Similarly, the data memory 112 includes N distinct data memory instances, denoted data memories 112-1, 112-2, ..., 112-N as shown in the figure.
The processor 102 may implement token triggered multithreading, for example as described in the above-cited U.S. patent application Attorney Docket No. 1007-8, entitled "Method and Apparatus for Token Triggered Multithreading." Token triggered threading typically assigns a different token to each of a plurality of processor threads. For example, token triggered threading may use a token, in association with the current processor clock cycle, to identify the particular processor thread that will be permitted to issue an instruction for a subsequent clock cycle. Other types of threading may be used in addition or as an alternative.
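As a rough illustration of the token idea described above, the following Python sketch assumes the simplest possible arrangement: each of the N threads holds a fixed token, and the thread whose token equals the next cycle number modulo N is the one permitted to issue. The function name `next_issuing_thread`, the one-token-per-thread assignment and the modulo rule are assumptions made purely for illustration; the actual token scheme is the one described in the cited application.

```python
N_THREADS = 4  # N=4, as in the illustrative embodiment


def next_issuing_thread(tokens, clock_cycle):
    """Return the thread permitted to issue on the cycle after `clock_cycle`.

    `tokens` maps thread id -> token value. Assumption: the thread whose
    token equals (clock_cycle + 1) mod N is selected.
    """
    wanted = (clock_cycle + 1) % len(tokens)
    for thread_id, token in tokens.items():
        if token == wanted:
            return thread_id
    raise ValueError("no thread holds the required token")


# Hypothetical token assignment: thread i holds token i.
tokens = {thread_id: thread_id for thread_id in range(N_THREADS)}
schedule = [next_issuing_thread(tokens, cycle) for cycle in range(8)]
```

Under these assumptions the threads take turns in a fixed rotation, so no two threads are ever selected for the same cycle.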
Each of the thread caches in the multithreaded cache 110 may comprise a memory array having one or more sets of memory locations. A given thread cache may further comprise a thread identifier register for storing an associated thread identifier.
The multithreaded cache 110 interfaces, via the MMU 120, with a main memory (not shown) external to the processor 102. Like the cache 110, the MMU 120 includes a separate instance for each of the N threads supported by the processor. The MMU 120 ensures that the appropriate instructions from main memory are loaded into the multithreaded cache 110. The MMU 120, which may comprise or be coupled to a cache controller, may implement at least a portion of an address mapping technique, such as fully associative mapping, direct mapping or set-associative mapping. Illustrative set-associative mapping techniques suitable for use in conjunction with the present invention are described in commonly assigned U.S. Patent Application Nos. 10/161,774 and 10/161,874, both filed June 4, 2002, which are incorporated by reference herein.
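For readers unfamiliar with the mapping techniques named above, the following Python sketch shows the generic textbook way an address is split for a set-associative lookup. It is not taken from the cited applications; the line size, the number of sets and the function name `split_address` are hypothetical values chosen only for illustration.

```python
LINE_BYTES = 32  # hypothetical cache line size in bytes
NUM_SETS = 64    # hypothetical number of sets in one thread cache


def split_address(address):
    """Split a byte address into (tag, set_index, byte_offset).

    The set index selects one set of the memory array; the tag is then
    compared against every line stored in that set.
    """
    byte_offset = address % LINE_BYTES
    set_index = (address // LINE_BYTES) % NUM_SETS
    tag = address // (LINE_BYTES * NUM_SETS)
    return tag, set_index, byte_offset


tag, set_index, byte_offset = split_address(0x12345)
```

In this generic picture, direct mapping is the special case of one line per set, and fully associative mapping is the special case of a single set.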
The data memory 112 is also typically connected directly to the above-noted external main memory, although this connection is not explicitly shown in the figure. Also associated with the data memory 112 is a data buffer 130.
The above-cited U.S. patent application Attorney Docket No. 1007-5, entitled "Method and Apparatus for Thread-Based Memory Access in a Multithreaded Processor," describes techniques for thread-based access to the data memory 112 or to other memories associated with a multithreaded processor.
In general, the multithreaded cache 110 is used to store instructions to be executed by the multithreaded processor 102, while the data memory 112 stores the data operated on by those instructions. Instructions are fetched from the multithreaded cache 110 and decoded by the instruction decoder 116. Depending on the instruction type, the instruction decoder 116 may forward a given instruction or associated information to various other units within the processor, as will be described below.
The processor 102 also includes a set of auxiliary registers 132, which in this example comprise control registers (CR) 134, link registers (LR) 136 and counter registers (CTR) 138. The auxiliary registers aid program control flow by modifying the location from which instructions are fetched. As shown in the figure, this illustrative embodiment associates one instance of each of the auxiliary registers 134, 136 and 138 with each of the threads.
Other registers in the processor 102 include a branch register 140 and program counter (PC) registers 142. Like registers 134, 136 and 138, the program counter registers 142 include one instance for each of the threads. The branch register 140 receives instructions from the instruction decoder 116 and, in conjunction with the program counter registers 142, provides input to an adder block 144. Elements 140, 142 and 144 collectively comprise a branch unit of the processor 102. The branch unit controls the fetching of instructions in the instruction pipeline implemented by the processor.
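The interplay of the per-thread program counters and the adder block can be pictured with the short Python sketch below. It is only a behavioral model under stated assumptions: the 4-byte instruction size, the use of PC-relative branch offsets and the names `BranchUnit` and `next_fetch_address` are hypothetical and are not specified in the description.

```python
class BranchUnit:
    """Toy model of a branch unit with one program counter per thread."""

    def __init__(self, n_threads=4, instruction_bytes=4):
        self.pc = [0] * n_threads  # one PC instance per thread
        self.instruction_bytes = instruction_bytes  # assumed fixed size

    def next_fetch_address(self, thread_id, branch_offset=None):
        """Advance the given thread's PC and return the new fetch address.

        With no branch, the PC moves to the next sequential instruction.
        With a branch, the (assumed PC-relative) offset is added instead.
        """
        step = self.instruction_bytes if branch_offset is None else branch_offset
        self.pc[thread_id] += step  # the addition plays the role of the adder block
        return self.pc[thread_id]


unit = BranchUnit()
sequential = unit.next_fetch_address(0)               # thread 0 falls through
taken = unit.next_fetch_address(0, branch_offset=-4)  # thread 0 branches back
```

Because each thread has its own program counter instance, a branch taken by one thread leaves the fetch addresses of the other threads unchanged.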
The register file 118 provides temporary storage of integer results. Instructions forwarded from the instruction decoder 116 to an integer instruction queue (IQ) 150 are decoded, and the proper hardware thread unit is selected through the use of an offset unit 152, which is shown as including a separate instance for each of the threads. The offset unit 152 inserts explicit bits into register file addresses so that independent thread data is not corrupted. For a given thread, these explicit bits may comprise, for example, the corresponding thread identifier.
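One way to picture the effect of inserting thread bits into a register file address is the Python sketch below. The description says only that explicit bits, such as the thread identifier, are inserted; placing them in the high-order positions, the 5-bit register index and the name `physical_register_address` are assumptions made for illustration.

```python
REG_INDEX_BITS = 5  # hypothetical: 32 architectural registers per thread


def physical_register_address(thread_id, reg_index):
    """Combine a thread identifier with an architectural register index.

    Assumption: the thread identifier occupies the bits above the register
    index, so each thread addresses its own disjoint slice of the file.
    """
    if not 0 <= reg_index < (1 << REG_INDEX_BITS):
        raise ValueError("register index out of range")
    if thread_id < 0:
        raise ValueError("thread id must be non-negative")
    return (thread_id << REG_INDEX_BITS) | reg_index


# Register 3 of thread 0 and register 3 of thread 2 map to different entries.
addr_t0 = physical_register_address(0, 3)
addr_t2 = physical_register_address(2, 3)
```

With this layout two threads that name the same architectural register never touch the same physical entry, which is the property the offset unit is described as guaranteeing.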
As shown in the figure, the register file 118 is coupled to input registers RA and RB, the outputs of which are coupled to an adder block 154. The input registers RA and RB are used in implementing instruction pipelining. The output of the adder block 154 is coupled to the data memory 112.
In accordance with the invention, the register file 118, the integer instruction queue 150, the offset unit 152, the registers RA and RB and the adder block collectively comprise an exemplary integer unit.
The above-cited U.S. patent application Attorney Docket No. 1007-7, entitled "Method and Apparatus for Register File Port Reduction in a Multithreaded Processor," describes techniques for thread-based access to register files such as the register file 118.
The instruction types executable in the processor 102 include Branch, Load, Store, Integer and Vector/SIMD instruction types. If a given instruction does not specify a branch, load, store or integer operation, it is a vector/SIMD instruction. Other instruction types may also be used. The integer and vector/SIMD instruction types are examples of what are more generally referred to herein as integer and vector instruction types, respectively.
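The classification rule stated above, in which vector/SIMD is the default when none of the other four types applies, can be expressed in a few lines of Python. Representing an instruction by an operation-name string, and the names `InstructionType` and `classify`, are hypothetical choices made only for this sketch.

```python
from enum import Enum


class InstructionType(Enum):
    BRANCH = "branch"
    LOAD = "load"
    STORE = "store"
    INTEGER = "integer"
    VECTOR_SIMD = "vector/simd"


# The four operations that are recognized explicitly.
_EXPLICIT_TYPES = {
    "branch": InstructionType.BRANCH,
    "load": InstructionType.LOAD,
    "store": InstructionType.STORE,
    "integer": InstructionType.INTEGER,
}


def classify(operation):
    """Return the instruction type; anything not listed is vector/SIMD."""
    return _EXPLICIT_TYPES.get(operation.lower(), InstructionType.VECTOR_SIMD)
```

For example, `classify("Load")` yields `InstructionType.LOAD`, while an operation name that matches none of the four explicit types falls through to `InstructionType.VECTOR_SIMD`.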
A vector IQ 156 receives the vector/SIMD instructions forwarded from the instruction decoder 116. A corresponding offset unit 158, shown as including a separate instance for each of the threads, inserts the appropriate bits to ensure that independent thread data is not corrupted.
A vector unit 160 of the processor 102 is divided into N distinct parallel portions, and includes a vector file 162 that is similarly divided. The vector file 162 serves substantially the same purpose as the register file 118, except that the former operates on vector/SIMD instruction types.
The vector unit 160 illustratively comprises the vector instruction queue 156, the offset unit 158, the vector file 162, and the arithmetic and storage elements associated with them.
The operation of the vector unit 160 is as follows. A given vector/SIMD data block, encoded as either a fractional or an integer data type, is read from the vector file 162 and stored in architecturally visible registers VRABC. From there, the flow proceeds through multipliers (MPY) that multiply the vector/SIMD data in parallel. The results are stored in architecturally visible registers PABC. Adder units may then perform additional arithmetic operations, with the results stored in accumulator (ACC) registers. The data then passes through the reduction unit 164, where the results are summed in parallel but serial semantics are generated. Serial semantics means that, although the four saturating values are computed in parallel in the vector unit 160, the output is substantially the same as the result a serial computation would produce. Such output is also referred to herein as serial output. The resulting reduced sum is stored in a saturation register denoted SAT.
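The reason serial semantics matter for a saturating reduction can be seen in the following Python sketch, which models the serial reference result that a parallel reduction unit is expected to reproduce. It does not model the hardware itself. The 16-bit saturation range, the integer-only arithmetic and the function names are assumptions made for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # hypothetical 16-bit saturation range


def saturate(value):
    """Clamp a value to the assumed 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def serial_saturating_sum(values, accumulator=0):
    """Add values one at a time, saturating after every addition."""
    for value in values:
        accumulator = saturate(accumulator + value)
    return accumulator


def multiply_and_reduce(a, b, accumulator=0):
    """Multiply element-wise, then reduce with serial saturating semantics."""
    products = [x * y for x, y in zip(a, b)]  # performed in parallel in hardware
    return serial_saturating_sum(products, accumulator)


values = [30000, 10000, -20000]
serial_result = serial_saturating_sum(values)  # saturates after the first add
naive_result = saturate(sum(values))           # saturates only once, at the end
```

Here the intermediate sum 40000 saturates to 32767 before the last value is added, so the serial result differs from simply saturating the exact sum. A reduction unit that adds in parallel must account for such intermediate saturation if its output is to match the serial one.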
The reduction unit 164 and other portions of the vector unit 160 may also use techniques similar to those described in N. Yadav, M. Schulte and J. Glossner, "Parallel Saturating Fractional Arithmetic Units," Proceedings of the 9th Great Lakes Symposium on VLSI, pp. 172-179, Ann Arbor, Michigan, March 4-6, 1999, which is incorporated by reference herein.
Although the reduction unit 164 is shown as part of the vector unit 160 in this illustrative embodiment, it may alternatively be implemented as a separate unit.
The processor 102 preferably uses pipelined instruction processing. For example, the processor 102 may use an instruction pipeline in which each thread issues a single instruction per processor clock cycle. As another example, the instruction pipeline may be configured so that each thread issues multiple instructions per processor clock cycle. More particularly, with a sufficient number of threads and appropriate pipelining, each thread of the processor can issue a load instruction and a vector multiply instruction concurrently in a given processor clock cycle without stalling any of the threads.
Advantageously, the processor 102 shown in Fig. 1 can efficiently execute a variety of different types of code, including RISC-based control code, DSP code, Java code and network processing code. The processor 102 is therefore particularly well suited for implementation in a convergence device such as a 3GPP WCDMA mobile unit.
Fig. 2 shows an example of a processing system 200 in which the processor 102 may be implemented. The processing system 200 may be viewed, for example, as a convergence device or an element of one, such as the above-noted 3GPP WCDMA mobile unit.
More particularly, the processing system 200 in this embodiment is configured to support both WCDMA and Global System for Mobile Communications (GSM) wireless communication, while processing voice, data, audio, video and other information transmitted over a variety of different media.
The processing system 200 includes DSP hardware 202 and a microprocessor 204. The DSP hardware 202 is shown as including first and second instances, denoted 202-1 and 202-2. The DSP hardware is coupled to an associated internal memory 206. The microprocessor 204 is coupled to an associated internal memory 208. The memories 206 and 208 are referred to as "internal" because they are internal to the processing system 200, and both may represent portions of a common memory. The DSP hardware 202 and the microprocessor 204 may each also communicate with one or more external memories, which are not shown.
The DSP hardware 202 and the microprocessor 204 are preferably both implemented using a single multithreaded processor such as that shown in Fig. 1. Other arrangements, such as those based on multiple processors, may also be used.
The first instance 202-1 of the DSP hardware 202 illustratively includes a plurality of processing elements, including a GSM channel equalizer, GSM channel encoder, GSM burst builder, GSM channel decoder, GSM voice decoder, GSM voice encoder, GSM transmitter, encryption/decryption, timing control, WCDMA transmitter, filtering, gain and frequency control, WCDMA searcher, Rake receiver, channel encoder, WCDMA voice decoder, WCDMA voice encoder and channel decoder. Other elements include Windows Media Audio (WMA), Real Media, Joint Photographic Experts Group (JPEG/JPEG2000), Moving Picture Experts Group Layer 3 Audio (MP3), Advanced Audio Coding (AAC) and Musical Instrument Digital Interface (MIDI). The operation of these elements is well known in the art and is therefore not described in further detail herein.
The second instance 202-2 of the DSP hardware 202 may be configured in a similar manner, or may include other processing elements suitable for supporting other communication functions in the processing system 200.
The microprocessor 204 is shown as including a plurality of processing elements, including man-machine interface (MMI), Moving Picture Experts Group 4 (MPEG4), protocol stack, Short Message Service/Multimedia Messaging Service (SMS/MMS) and real-time operating system (OS) elements, as shown in the figure. Again, the operation of these elements is well known in the art.
The processing system 200 further includes a communication bus 210 coupled between the DSP hardware 202, the microprocessor 204 and system elements 212. Similarly, a communication bus 214 is coupled between the DSP hardware 202 and system elements 216.
The system elements 212 include a digital camera, video camera, universal serial bus (USB), universal asynchronous receiver/transmitter (UART), serial peripheral interface (SPI), inter-integrated circuit (I2C) interface, general-purpose I/O (GPIO), subscriber identity module/universal subscriber identity module (SIM/USIM), external memory I/O, keypad, liquid crystal display (LCD), interrupt controller and direct memory access (DMA) controller.
The system elements 216 include receiver I/O, transmitter I/O and Bluetooth I/O.
Other system elements shown in the figure include test input/output (I/O) 218, system clock and control 220, and power management 222.
The operation of the system elements 212, 216, 218, 220 and 222 is well known in the art, and these elements are therefore not further described herein.
As noted above, the functions associated with both the DSP hardware 202 and the microprocessor 204 may be executed on a single multithreaded processor such as the multithreaded processor 102. The multithreaded processor 102 may thus be used to execute the code associated with the system elements 212, 216, 218, 220 and 222 as well as the code associated with the DSP hardware 202 and the microprocessor 204.
The microprocessor 204 in the processing system 200 may be used to run code associated with higher-layer applications.
The processing elements associated with the DSP hardware 202 may be implemented using software compilation. Advantageously, software compilation allows high-level programming languages to be translated efficiently.
It should be understood that the invention does not require the particular multithreaded processor and processing system configurations shown in Figs. 1 and 2. As noted previously, the invention can be implemented in a wide variety of other multithreaded processor and processing system configurations.
It should also be appreciated that the particular arrangements shown in Figs. 1 and 2 are simplified for clarity of illustration, and that additional or alternative elements not explicitly shown may be included.
The above-described embodiments of the invention are thus intended to be illustrative only, and numerous alternative embodiments within the scope of the appended claims will be apparent to those skilled in the art.