CN100511278C

CN100511278C - Design and implementation method of multimedia extension instructions for stream input reading

Info

Publication number: CN100511278C
Application number: CNB2006101050662A
Authority: CN
Inventors: 梅魁志; 郑南宁; 吴奇; 李国辉; 张元林; 黄畅
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2006-08-29
Filing date: 2006-08-29
Publication date: 2009-07-08
Anticipated expiration: 2026-08-29
Also published as: CN1912925A

Abstract

This invention discloses a design and implementation method for multimedia extension instructions for stream input reading. The method designs four media extension instructions for reading stream data input in an audio/video decoder. The hardware structure comprises two 32-bit buffer registers, an address register Addr for reading the bitstream, a Flag register, a Left register and two arbitrary-bit shifters. The extension instructions are Bini (initialization), Bread (read), Bload (load) and Bpos (return current position). The invention also designs the hardware circuit that implements these stream data reading instructions and their pipeline partitioning in a processor, and gives their instruction encoding in a SPARC V8 processor. Experiments show that the instructions are 5 to 8 times as efficient as a hand-optimized implementation using SPARC V8 instructions.

Description

Design and implementation method of multimedia extension instructions for stream input reading

Technical field

The invention belongs to the field of processor design and applies to the design of multimedia processors. It relates specifically to a design and implementation method for instructions that read the input bitstream of coded data in audio/video decoding (stream input for short).

Background technology

Nearly all image and video compression standards, such as JPEG, JPEG2000, MPEG-1, MPEG-2, MPEG-4 and H.264, require the coding information to be decoded first. For example, the header and side information of MPEG-1 and MPEG-2 Audio Layer III (MP3) contain the Huffman tables, quantization factors and other information needed to decode the coded data, and the bitstreams carrying this coding information must be read in real time during decoding.

Fig. 1 is the flowchart of a program that implements the stream input function on a 32-bit RISC processor, reading in units of one byte. The program first checks whether the byte data buffer is empty and, if so, reads one byte from memory into the byte buffer register. It then checks whether the number of bits to read (Len) is less than the number of bits remaining in the byte buffer register (Left). If it is, Len bits are read and Left is updated. Otherwise, the bits remaining in the byte buffer register are read out first, and Len and Left are updated (Len -= Left, Left = 8). While Len is not less than 8, a whole byte is read directly and Len and Left are updated again (Len -= 8, Left = 8). Once Len is less than 8, Len bits are read from the byte buffer register and Left is updated. Finally, all the bits read are assembled into the return value.
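The byte-oriented flow described above can be sketched in C as follows. This is a minimal illustration of the described steps, not the code of Fig. 1: the names byte_reader, byte_reader_init and byte_reader_read are hypothetical, and the convention that Left = 8 marks a byte buffer that has not yet been loaded is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-oriented bit reader; all names are illustrative. */
struct byte_reader {
    const uint8_t *byte;  /* current byte in memory */
    uint8_t cache;        /* byte data buffer register */
    unsigned left;        /* unread bits in the current byte, 1..8 */
};

static void byte_reader_init(struct byte_reader *r, const uint8_t *data)
{
    r->byte = data;
    r->cache = 0;
    r->left = 8;          /* assumption: 8 means "buffer not yet loaded" */
}

/* Read len bits (0..32), most significant bit first, right-aligned. */
static uint32_t byte_reader_read(struct byte_reader *r, unsigned len)
{
    uint32_t value;

    assert(len <= 32);
    if (r->left == 8)                 /* buffer empty: fetch one byte */
        r->cache = *r->byte;

    if (len < r->left) {              /* request fits in the buffered byte */
        value = (r->cache & ((1u << r->left) - 1u)) >> (r->left - len);
        r->left -= len;
        return value;
    }

    /* Take what remains in the byte buffer: Len -= Left, Left = 8. */
    value = r->cache & ((1u << r->left) - 1u);
    len -= r->left;
    r->byte++;
    r->left = 8;

    while (len >= 8) {                /* whole bytes: Len -= 8 */
        value = (value << 8) | *r->byte++;
        len -= 8;
    }

    if (len > 0) {                    /* final partial byte */
        r->cache = *r->byte;
        value = (value << len) | (uint32_t)(r->cache >> (8 - len));
        r->left -= len;
    }
    return value;
}
```

Every call passes through several compares, masks and shifts, which is the per-read overhead that the extension instructions are intended to remove.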

The above program corresponds to 63 assembly instructions on LEON2, a general-purpose 32-bit RISC processor compatible with the SPARC V8 instruction set, and accounts for 5.5% of the total decoding time in MP3 decoding. Because stream input operations are common to all kinds of audio/video decoding, adding instructions dedicated to stream input reading to the instruction set of a general-purpose RISC processor would effectively improve its multimedia decoding performance. The applicant's search of the existing literature found no published design or hardware implementation structure and method for stream input reading instructions.

Summary of the invention

The object of the invention is to provide a design and implementation method of multimedia extension instructions for stream input reading. The method adds instructions dedicated to stream input reading to the instruction set of a general-purpose RISC processor and can effectively improve the processor's multimedia decoding performance.

To achieve this object, the invention adopts the following technical solution:

A design and implementation method of multimedia extension instructions for stream input reading, characterized in that the method designs four media extension instructions for reading stream data input in audio/video decoding, and that its hardware implementation structure comprises two 32-bit bitstream buffer registers, an address register Addr for reading the bitstream, a Flag register, a Left register and two arbitrary-bit shifters;

The bitstream read initialization (Bini) instruction sets the address register Addr used to read the bitstream, loads the initial data into the two 32-bit bitstream buffer registers, and sets the Flag register and the Left register. The two bitstream buffer registers are loaded alternately from the Data Cache, and the Flag determines which of them currently holds the leading part of the bitstream. The bitstream read (Bread) instruction reads any number of bits up to 32 from the bitstream buffer registers. The data load (Bload) instruction loads one 32-bit word from the Data Cache into one of the bitstream buffer registers. The current bitstream position (Bpos) instruction returns the values of the Left and Addr registers during bitstream reading;

Each time the required Len bits are read using the arbitrary-bit shift operations, Left = Left - Len. When Left is less than or equal to 0, 32 bits of bitstream data must be loaded into the emptied bitstream buffer register; at the same time the Flag register is inverted and Left = 32 + Left;

For Len <= Left, the read completes in one beat, using one 32-bit arbitrary left shifter and one 32-bit arbitrary right shifter. For Len > Left, the read completes in two beats, with the two 32-bit arbitrary left/right shifters shared between the beats. The few additions and subtractions involved do not need an adder; they are implemented with simple invert-and-add-1 logic and invert logic.

The invention designs the hardware circuit that implements these stream data reading instructions and their pipeline partitioning in a processor, and gives their concrete instruction encoding in a SPARC V8 based processor. Experiments show that the extension instructions are 5 to 8 times as efficient as a hand-optimized implementation using SPARC V8 processor instructions.

Description of drawings

Fig. 1: flowchart of the stream input reading routine in Libmad (an MP3 decoding program);

Fig. 2: update mechanism of the double buffer registers;

Fig. 3: data flow of the stream input reading circuit when Len <= Left;

Fig. 4: data flow of the stream input reading circuit when Len > Left;

Fig. 5: data flow in the circuit during data loading;

Fig. 6: bitstream reading implemented with existing SPARC V8 instructions.

The invention is described in further detail below with reference to the accompanying drawings and the embodiment provided by the inventors.

Embodiment

According to the technical solution above, the invention is in essence a design of processor extension instructions, together with a hardware implementation structure and method, for reading the coded data stream in audio/video decoding programs. Its hardware implementation structure comprises two 32-bit buffer registers, an address register Addr for reading the bitstream, a Flag register, a Left register and two arbitrary-bit shifters.

Assuming that each read fetches at most 32 bits, two 32-bit buffer registers, Buffer0 and Buffer1, are needed. The two registers are loaded alternately from the Data Cache, and the Flag determines which buffer register currently holds the leading part of the bitstream. Specifically, when Flag is 1, Buffer1 holds the leading part: a read takes bits from Buffer1 first and, if Buffer1 does not hold enough, continues from Buffer0. When Flag is 0, Buffer0 holds the leading part: a read takes bits from Buffer0 first and, if Buffer0 does not hold enough, continues from Buffer1. In addition, the Left register records how many bits of the buffer register holding the leading part have not yet been read. After each read, Left = Left - Len. When Left is less than or equal to 0, data must be loaded into the emptied bitstream buffer register, the Flag register is inverted, and Left = 32 + Left. The update mechanism of the double buffer registers is shown in Fig. 2. Following the update method of Fig. 2, the data flow in the circuit can be described for two cases, Len <= Left and Len > Left, shown in Fig. 3 and Fig. 4 respectively.
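The bookkeeping of Left and Flag described above can be traced with a small C model. It is only an illustration: the names buffer_state and account_read and the loads counter are hypothetical, and the buffer contents themselves are deliberately left out.

```c
#include <assert.h>

/* Illustrative model; the struct and function names are not from the patent. */
struct buffer_state {
    int left;   /* Left: unread bits in the buffer holding the leading part */
    int flag;   /* Flag: 1 if Buffer1 holds the leading part, 0 if Buffer0 */
    int loads;  /* number of 32-bit reloads performed so far */
};

/* Account for one read of len bits (1..32) followed by the reload check. */
static void account_read(struct buffer_state *s, int len)
{
    assert(len >= 1 && len <= 32);
    s->left -= len;            /* Left = Left - Len */
    if (s->left <= 0) {        /* leading buffer exhausted */
        s->loads++;            /* reload the emptied buffer */
        s->flag = !s->flag;    /* the other buffer now holds the leading part */
        s->left += 32;         /* Left = 32 + Left */
    }
}
```

Starting from Left = 32 and Flag = 1, two reads of 20 bits leave Left = 24 and Flag = 0 after one reload, because the second read takes 12 bits from Buffer1 and 8 bits from Buffer0.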

When Len <= Left, reading the bitstream takes only one clock cycle. In this cycle the computation and the transfer from the Buffer register to the out register are performed, and the Buffer is updated at the same time. When Len > Left, two clock cycles are needed. The out register returns the bits that were read, stored in its low-order bits, with all remaining bits zero. The value of the out register obtained in this way can be used directly in decoding. In a concrete implementation, the out register can be mapped to a general-purpose register of the processor. In addition, the required shifters can reuse the shifters already present in the general-purpose processor (without such reuse, the bitstream read operation needs two shifters). In the Len > Left case shown in Fig. 4, beat T2 can reuse a shifter used in beat T1.

The solution as a whole involves a few additions and subtractions, but these can be implemented with simple invert-and-add-1 logic and invert logic, so no adder is needed. When reading the bitstream, the shift count 32 - Len must be computed. Because Len lies in [1, 32], inverting the low 5 bits of Len and adding 1 gives 32 - Len. When loading data, Left = 32 + Left must be computed. Note that Left lies in [-31, 32] and is represented in 8-bit two's complement. If Left equals 0, Left is directly assigned 00100000. If Left is less than 0 (in which case its upper three bits must all be 1), inverting the upper three bits is sufficient.

1. Instruction design for stream input reading

The data flow of the designed instructions Bini, Bread, Bload and Bpos, which implement the stream input reading function, is described below. T1 and T2 denote two successive beats. Each Bread instruction should be followed by a Bload instruction.

Bini reg

Decode stage:

The value of reg is read according to the register number of reg in the instruction code.

Execute stage:

T1: the value of register reg is written to Addr;

T2: 64 bits are loaded from the Data Cache at address Addr into Buffer[1, 0], and the registers are updated as follows:

Addr = Addr + 8, Left = 32, Flag = 1.

Bread reg or imm_Len:

Decode stage:

When the operand giving the bitstream length (Len) uses immediate addressing, its value is taken from the instruction; when it uses register addressing, the value of the register is read according to the register number. Len is then subtracted from Left.

Execute stage:

Depending on the result of subtracting Len from Left, there are two cases:

1. When Len <= Left:

T1: Buffer1 or Buffer0 is selected according to Flag; this Buffer is logically shifted right by 32 - Len bits and the result is output to the out register. At the same time the Buffer value is sent to the other shifter and logically shifted left by Len bits, and the result updates this Buffer. Register update: Left = Left - Len.

2. When Len > Left:

T1: the Buffer holding the later part of the bitstream is selected according to Flag (taken to be Buffer0, as in Fig. 4). This Buffer is logically shifted right by Left bits, and the result is denoted tmp1. At the same time the Buffer value is sent to the other shifter and logically shifted left by Len - Left bits, and the result is denoted tmp2. Register update: Buffer0 = tmp2, Buffer1 = tmp1 OR Buffer1.

T2: Buffer1 is sent to a shifter, logically shifted right by 32 - Len bits, and output to the out register. At the same time, Left = Left - Len.

Bload:

Decode stage:

Left is examined. If Left is less than or equal to 0, the data load operation is performed in the execute stage; otherwise nothing is done in the execute stage.

Execute stage (Left less than or equal to 0):

T1: if Flag is 1, Buffer1 is selected; if Flag is 0, Buffer0 is selected. 32 bits are loaded from the Data Cache at address Addr into the selected Buffer. Addr = Addr + 4; Flag = ~Flag; Left = 32 + Left.

Bpos reg1, reg2:

Decode stage:

The register numbers of reg1 and reg2 are obtained from the instruction.

Execute stage:

T1: Left is written to reg1, and Addr is written to reg2.
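The behaviour of the four instructions can be modelled functionally in C as below. This is a sketch under stated assumptions, not the hardware circuit: the names stream_unit, shl, shr, bini, bread, bload and bpos are hypothetical, memory is modelled as an array of 32-bit words already in stream order (most significant bit first), the two beats of Bread are collapsed into one function call, and a shift by 32 is assumed to yield zero as a hardware shifter would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical functional model of the stream read unit. */
struct stream_unit {
    uint32_t buffer[2];    /* Buffer0 and Buffer1 */
    const uint32_t *addr;  /* Addr: next word to load */
    int left;              /* Left */
    int flag;              /* Flag: index of the buffer holding the leading part */
};

/* Shifts that return 0 for a count of 32 (undefined for plain C shifts). */
static uint32_t shl(uint32_t v, int n) { return n >= 32 ? 0 : v << n; }
static uint32_t shr(uint32_t v, int n) { return n >= 32 ? 0 : v >> n; }

static void bini(struct stream_unit *u, const uint32_t *reg)
{
    u->addr = reg;               /* T1: Addr = reg */
    u->buffer[1] = u->addr[0];   /* T2: load 64 bits into Buffer[1, 0] */
    u->buffer[0] = u->addr[1];
    u->addr += 2;                /* Addr = Addr + 8 bytes */
    u->left = 32;
    u->flag = 1;
}

static uint32_t bread(struct stream_unit *u, int len)
{
    int front = u->flag, back = !u->flag;
    uint32_t out;

    assert(len >= 1 && len <= 32);
    if (len <= u->left) {        /* one beat */
        out = shr(u->buffer[front], 32 - len);
        u->buffer[front] = shl(u->buffer[front], len);
    } else {                     /* two beats */
        uint32_t tmp1 = shr(u->buffer[back], u->left);
        uint32_t tmp2 = shl(u->buffer[back], len - u->left);
        u->buffer[back] = tmp2;
        u->buffer[front] |= tmp1;
        out = shr(u->buffer[front], 32 - len);
    }
    u->left -= len;              /* Left = Left - Len */
    return out;
}

static void bload(struct stream_unit *u)
{
    if (u->left > 0)
        return;                  /* nothing to do in the execute stage */
    u->buffer[u->flag] = *u->addr++;  /* refill the emptied buffer, Addr += 4 */
    u->flag = !u->flag;
    u->left += 32;               /* Left = 32 + Left */
}

static void bpos(const struct stream_unit *u, int *reg1, const uint32_t **reg2)
{
    *reg1 = u->left;
    *reg2 = u->addr;
}
```

In this model, as in the instruction description, each call to bread is expected to be followed by a call to bload, so that Left is back in [1, 32] before the next read.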

2. An implementation of the bitstream reading instructions

The function and data flow of the bitstream reading instructions were described above in general terms. Here a concrete example is given of implementing these special instructions on the SPARC architecture. Integrating the bitstream reading instructions into the SPARC architecture mainly involves the instruction encoding and the way these instructions interact with the values of the existing registers. The instruction encoding uses the third instruction format of SPARC V8 with op = 2. This format is as follows:

Op＝2

The concrete encodings are as follows. The registers inside the hardware circuit of the bitstream reading instructions (such as Left and Addr) are not encoded, because these instructions always use them implicitly. Because SPARC is a RISC architecture, instruction operands are mostly registers; the operand of the Bread instruction (the length) can be given either as an immediate or as a register. The result of the Bread instruction is always stored in the %o0 register by default, because in the SPARC architecture %o0 is the register that holds a function's return value.

bini Rs2

bread Rs2 or length

bpos Rs1, Rd

3. Efficiency analysis of the stream input reading instructions

Analysing the efficiency of the stream input reading instructions requires comparing the number of instructions needed to perform the same bitstream read before and after the extension instructions are added. Because bitstream reading is used mostly in decoding programs, and such programs are generally written in a high-level language, a high-level language model (in C) of bitstream reading is introduced here. It uses two functions to implement stream input reading:

void Bitinit(struct Bitptr *ptr, unsigned int *pword);

unsigned long Bitread(struct Bitptr *ptr, int Len);

where the structure Bitptr is defined as follows:

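struct Bitptr
{
    unsigned int *ptr_word;  /* address of the next word to load */
    unsigned int Cur_word;   /* current word data buffer */
    unsigned short Left;     /* unread bits in Cur_word */
};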

The parameter ptr identifies the current bit to be read. In the structure, the member ptr_word holds the memory address of the next word to be loaded into the data buffer. Cur_word is the current word data buffer: the bits to be read are taken from this buffer first and, if it does not hold enough, the next word is loaded into the data buffer according to ptr_word. Left records how many bits of the current word data buffer have not yet been read. The function Bitinit() uses the pword parameter to initialize the structure that ptr points to. The function Bitread() takes ptr and the number of bits to read (Len) and returns a 32-bit unsigned integer whose low Len bits (in left-to-right order) are the bits read; all remaining bits are zero.
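A possible C implementation of this two-function model is sketched below. It is an illustration, not the hand-written routine of Fig. 6. It assumes that unsigned int is 32 bits wide (as on SPARC V8), that Bitinit() loads the first word and leaves ptr_word pointing at the following word, and that a new word is loaded lazily at the start of a read; the helper low_bits is hypothetical.

```c
#include <assert.h>

struct Bitptr {
    unsigned int *ptr_word;  /* address of the next word to load */
    unsigned int Cur_word;   /* current word data buffer */
    unsigned short Left;     /* unread bits in Cur_word */
};

/* Keep the low n bits of word (n in 0..32) without shifting by 32. */
static unsigned int low_bits(unsigned int word, unsigned int n)
{
    return n >= 32 ? word : (word & ((1u << n) - 1u));
}

void Bitinit(struct Bitptr *ptr, unsigned int *pword)
{
    ptr->Cur_word = *pword;     /* assumption: first word is loaded here */
    ptr->ptr_word = pword + 1;  /* assumption: points at the next word */
    ptr->Left = 32;
}

unsigned long Bitread(struct Bitptr *ptr, int Len)
{
    unsigned int n = (unsigned int)Len;
    unsigned long value;

    assert(Len >= 1 && Len <= 32);
    if (ptr->Left == 0) {                  /* buffer used up: load lazily */
        ptr->Cur_word = *ptr->ptr_word++;
        ptr->Left = 32;
    }
    if (n <= ptr->Left) {                  /* enough bits in Cur_word */
        value = low_bits(ptr->Cur_word >> (ptr->Left - n), n);
        ptr->Left = (unsigned short)(ptr->Left - n);
        return value;
    }
    /* Take the remaining Left bits, then continue in the next word. */
    value = low_bits(ptr->Cur_word, ptr->Left);
    n -= ptr->Left;                        /* 1..31 bits still needed */
    ptr->Cur_word = *ptr->ptr_word++;
    value = (value << n) | (ptr->Cur_word >> (32u - n));
    ptr->Left = (unsigned short)(32u - n);
    return value;
}
```

Even in this compact form, every read involves a comparison, a mask, one or two shifts and a structure update in memory, which is the work that the Bread and Bload instructions perform in hardware.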

Without the bitstream reading instructions, the two functions above are implemented with SPARC V8 assembly instructions. The %o registers are used to pass parameters: %o0 passes the first parameter, %o1 the second, and so on, and %o0 also passes the return value. The last two instructions, retl and nop, return from the function. The SPARC V8 instructions used in the implementations below were not generated by a compiler but were written and optimized by hand, so they are essentially the fastest implementation of stream input reading achievable with existing SPARC V8 instructions.

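Bitinit:
st %o1, [%o0]      ! ptr->ptr_word = pword
ld [%o1], %o1      ! load the first word of the bitstream
st %o1, [%o0+4]    ! ptr->Cur_word = first word
set 32, %o1
sth %o1, [%o0+8]   ! ptr->Left = 32
retl               ! return from leaf routine
nop                ! delay slot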

Bitread: the hand-written, optimized instructions for this function and the corresponding flowchart are shown in Fig. 6.

With the stream input reading instructions, the assembly implementation of the two functions is as follows:

Bitinit:
Bini %o1
retl
nop

Bitread:
Bread %o1
Bload
retl
nop

Because stream input reading generally consists of one initialization followed by many reads, the gain in reading efficiency depends mainly on Bitread(). With the stream input reading extension instructions, Bitread() needs 4 instructions, whereas the two branches in Fig. 6 use 15 and 23 instructions respectively, so the stream input reading instructions are very efficient. Wherever a program needs to read the bitstream, a Bread instruction can also be inserted by hand in place of the call to Bitread. This removes the overhead of parameter passing and function return and is more efficient still, but it increases the programmer's workload. Using the Bread instruction to simplify the implementation of the existing stream input reading function is therefore a good compromise.

Claims

1. A design and implementation method of multimedia extension instructions for stream input reading, characterized in that the method designs four media extension instructions for reading stream data input in audio/video decoding, and that its hardware implementation structure comprises two 32-bit bitstream buffer registers, an address register Addr for reading the bitstream, a Flag register, a Left register and two arbitrary-bit shifters;