CN100511278C - Design and implementing method of multimedia expansion instruction of flow input read - Google Patents


Info

Publication number: CN100511278C
Authority: CN (China)
Prior art keywords: register, instruction, code stream, bit, len
Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CNB2006101050662A
Other languages: Chinese (zh)
Other versions: CN1912925A (en)
Inventors: 梅魁志, 郑南宁, 吴奇, 李国辉, 张元林, 黄畅
Current Assignee: Xian Jiaotong University (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Xian Jiaotong University
Priority date: 2006-08-29 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2006-08-29
Publication date: 2009-07-08
Application filed by Xian Jiaotong University
Priority to CNB2006101050662A
Publication of CN1912925A
Application granted
Publication of CN100511278C
Legal status: Expired - Fee Related

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This invention discloses a method for designing and implementing multimedia extension instructions that read stream input. The method defines four media extension instructions for reading the input data stream in an audio/video decoder. The hardware structure comprises two 32-bit buffer registers, an address register (Addr) for reading the code stream, a Flag register, a Left register and two arbitrary-bit shifters. The extension instructions are Bini, Bread, Bload and Bpos (current position return). The invention also designs the hardware circuit for these stream-reading instructions and their pipeline partitioning in a processor, and gives an instruction encoding for a SPARC V8 processor. Experiments show that the extension instructions are 5-8 times as efficient as a hand-optimized implementation using standard SPARC V8 instructions.

Description

Design and implementation method of multimedia extension instructions for stream input reading
Technical field
The invention belongs to the field of processor design and applies to the design of multimedia processors. Specifically, it relates to a design and implementation method for instructions that read the input stream of coded data (stream input, for short) during audio/video decoding.
Background art
In nearly all image and video compression standards, such as JPEG, JPEG2000, MPEG-1, MPEG-2, MPEG-4 and H.264, the coding information must be decoded first. For example, the header and side information of MPEG-1/MPEG-2 Audio Layer 3 (MP3) carry the information needed to decode the coded data, such as Huffman tables and quantization factors, and these code streams must be read in real time during decoding. Fig. 1 is the program flowchart for implementing the stream input function on a 32-bit RISC, with the byte as the unit of reading. The program first checks whether the byte data buffer is empty; if it is, one byte is read from memory into the byte buffer register. It then checks whether the length of the code stream to be read (Len) is less than the number of bits remaining in the byte buffer register (Left). If it is, Len bits are read and the remaining bit count is updated. Otherwise, the remaining bits in the byte buffer register are read out first, and Len and Left are updated (Len -= Left, Left = 8). The program then checks whether Len is greater than 8; if so, it reads a whole byte directly and updates Len (Len -= 8) and Left (Left = 8), repeating this check until Len is less than 8. When Len is less than 8, it reads Len bits from the byte buffer register and updates Left. Finally, all the bits read are placed in the return value.
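To make the Fig. 1 flow concrete, the sketch below models the byte-oriented reader in C. It is an illustrative reconstruction, not the Libmad source. The ByteReader type and the function names are hypothetical, bits are assumed to be consumed most-significant-bit first, and the final partial read is taken whenever Len is at most 8.

```c
#include <stdint.h>

/* Hypothetical byte-oriented bit reader following the Fig. 1 flowchart. */
typedef struct {
    const uint8_t *mem;  /* next byte to fetch from memory */
    uint8_t byte_buf;    /* byte data buffer register */
    int left;            /* unread bits remaining in byte_buf (0..8) */
} ByteReader;

static void byte_reader_init(ByteReader *r, const uint8_t *mem) {
    r->mem = mem;
    r->byte_buf = 0;
    r->left = 0;  /* buffer starts empty, so the first read loads a byte */
}

/* Take n bits (1 <= n <= left) from the top of the unread part of byte_buf. */
static uint32_t take_bits(ByteReader *r, int n) {
    uint32_t v = (uint32_t)(r->byte_buf >> (r->left - n)) & ((1u << n) - 1u);
    r->left -= n;
    return v;
}

/* Read len (1..32) bits MSB-first; result is right-aligned, other bits zero. */
static uint32_t byte_read_bits(ByteReader *r, int len) {
    uint32_t out;
    if (r->left == 0) {            /* byte buffer empty: load one byte */
        r->byte_buf = *r->mem++;
        r->left = 8;
    }
    if (len <= r->left)            /* enough bits already buffered */
        return take_bits(r, len);
    len -= r->left;                /* Len -= Left */
    out = take_bits(r, r->left);   /* drain the remaining buffered bits */
    while (len > 8) {              /* whole bytes are read directly */
        out = (out << 8) | *r->mem++;
        len -= 8;
    }
    r->byte_buf = *r->mem++;       /* final partial byte, Left = 8 */
    r->left = 8;
    return (out << len) | take_bits(r, len);
}
```

Each call branches several times and touches memory byte by byte, which illustrates why the text below reports 63 assembly instructions for this routine on a SPARC V8 core.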
On a 32-bit general-purpose RISC (LEON2) compatible with the SPARC V8 instruction set, the above procedure corresponds to 63 assembly instructions and accounts for 5.5% of the total decoding time in MP3 decoding. Because stream input operations are ubiquitous in audio/video decoding, adding instructions dedicated to stream input reading to the instruction set of a general-purpose RISC can effectively improve the processor's multimedia decoding performance. The applicant's search of the existing literature found no published design of stream input reading instructions or of a hardware implementation structure and method for them.
Summary of the invention
The object of the invention is to provide a design and implementation method for multimedia extension instructions for stream input reading. The method adds instructions dedicated to stream input reading to the instruction set of a general-purpose RISC, which effectively improves the processor's multimedia decoding performance.
To accomplish the above task, the present invention adopts the following technical solution:
A design and implementation method of multimedia extension instructions for stream input reading, characterized in that the method defines four media extension instructions for reading the input data stream in audio/video decoding, and the hardware implementation structure comprises two 32-bit bitstream buffer registers, an address register Addr for reading the code stream, a Flag register, a Left register and two arbitrary-bit shifters;
The code stream read initialization (Bini) instruction sets the address register Addr for reading the code stream, loads initial data into the two 32-bit bitstream buffer registers, and sets the Flag and Left registers. The two bitstream buffer registers are loaded alternately from the Data Cache, and the Flag register determines which buffer register currently holds the front of the code stream. The code stream read (Bread) instruction reads an arbitrary number of bits, up to 32, from the bitstream buffer registers; the data load (Bload) instruction loads a 32-bit word from the Data Cache into either bitstream buffer register; the current code stream position return (Bpos) instruction returns the values of the Left and Addr registers during code stream reading;
Each time an arbitrary-bit shift operation reads the required Len bits of the code stream, Left = Left - Len. When Left is less than or equal to 0, 32 bits of code stream data must be loaded into the empty bitstream buffer register; at the same time the Flag register is inverted and Left = 32 + Left;
For Len <= Left, the code stream read completes in one beat, using one 32-bit arbitrary left shifter and one 32-bit arbitrary right shifter; for Len > Left, the read completes in two beats, sharing the two 32-bit arbitrary left/right shifters between the beats. The additions and subtractions on the data need no adder; they are implemented with simple invert-and-add-1 and inversion logic.
The invention designs the hardware circuit implementation of these stream data reading instructions and their pipeline partitioning in a processor, and gives their concrete instruction encoding in a SPARC V8-based processor. Experimental results show that the extension instructions are 5-8 times as efficient as an implementation with hand-optimized SPARC V8 instructions.
Description of drawings
Fig. 1: flowchart of the stream input reading program in Libmad (an MP3 decoding program);
Fig. 2: update mechanism of the double buffer registers;
Fig. 3: data flow of the stream input reading circuit when Len <= Left;
Fig. 4: data flow of the stream input reading circuit when Len > Left;
Fig. 5: data flow in the circuit during data loading;
Fig. 6: code stream reading implemented with existing SPARC V8 instructions.
The present invention is described in further detail below with reference to the drawings and the embodiment provided by the inventors.
Embodiment
Based on the above technical solution, the invention is in effect a design, hardware implementation structure and method for processor extension instructions that read the decoded data stream in audio/video decoding programs. The hardware implementation structure comprises two 32-bit buffer registers, the address register Addr for reading the code stream, the Flag register, the Left register and two arbitrary-bit shifters.
Assuming that no more than 32 bits are read at a time, two 32-bit buffer registers, Buffer0 and Buffer1, are needed. The two registers are loaded alternately from the Data Cache, and the Flag register determines which buffer currently holds the front of the code stream. Specifically, when Flag is 1, Buffer1 holds the front of the code stream: a read takes bits from Buffer1 first and, if Buffer1 does not hold enough, continues from Buffer0. When Flag is 0, Buffer0 holds the front of the code stream: a read takes bits from Buffer0 first and, if Buffer0 does not hold enough, continues from Buffer1. In addition, the Left register records how many bits of the buffer holding the front of the code stream have not yet been read. After each read, Left = Left - Len; when Left is less than or equal to 0, data must be loaded into the empty buffer register, the Flag register is inverted, and Left = 32 + Left. Fig. 2 shows the update mechanism of the double buffer registers. Following the update method of Fig. 2, the data flow in the circuit can be described for two cases, Len <= Left and Len > Left, shown in Fig. 3 and Fig. 4 respectively.
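The Left/Flag bookkeeping of the double-buffer scheme can be sketched in C as follows. This is a hypothetical model that tracks only the counters, not the buffer contents; BufState, account_read and the refill counter are illustrative names, and Flag = 1 means Buffer1 holds the front of the stream, as described above.

```c
/* Hypothetical model of the Left/Flag bookkeeping of Fig. 2 (no data). */
typedef struct {
    int flag;   /* 1: Buffer1 holds the front of the stream; 0: Buffer0 */
    int left;   /* unread bits in the front buffer */
    int loads;  /* number of 32-bit refills performed so far */
} BufState;

/* Account for one read of len bits (1..32) and refill if needed. */
static void account_read(BufState *s, int len) {
    s->left -= len;          /* Left = Left - Len */
    if (s->left <= 0) {      /* front buffer exhausted */
        s->loads++;          /* load 32 bits into the now-empty buffer */
        s->flag = !s->flag;  /* the other buffer becomes the front */
        s->left += 32;       /* Left = 32 + Left */
    }
}
```

For example, starting from Left = 32 and Flag = 1, reading 20 bits leaves Left = 12; reading 15 more drives Left to -3, so one buffer is refilled, Flag becomes 0 and Left becomes 29, the number of bits still unread in Buffer0.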
When Len <= Left, reading the code stream takes only one clock cycle, during which the result is computed and transferred from the Buffer register to the out register, and the Buffer is updated at the same time. When Len > Left, two clock cycles are needed. The out register returns the bits read, right-aligned in the low-order bits with all other bits zero, so its value can be used directly in decoding. In a concrete implementation, the out register can be connected to a general-purpose register of the processor. In addition, the required shifters can reuse the shifter of the general-purpose processor; without such reuse, the code stream read operation needs two shifters. In the Len > Left case of Fig. 4, the shifters used in beat T1 can be reused in beat T2.
The whole solution involves some additions and subtractions, but these can be implemented with simple invert-and-add-1 and inversion logic, without an adder. When reading the code stream, the shift count 32 - Len must be computed; because Len lies in [1, 32], inverting the low 5 bits of Len and adding 1 (keeping the low 5 bits of the result) yields 32 - Len. When loading data, Left = 32 + Left must be computed. Left lies in [-31, 32] and is represented as an 8-bit two's complement number. If Left equals 0, Left is simply set to 00100000 (32); if Left is less than 0 (in which case its top three bits are necessarily 1), inverting the top three bits is sufficient.
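Both adder-free tricks can be checked in software. The sketch below is a C illustration, assuming Len is an unsigned value in [1, 32] and Left is held as an 8-bit two's complement byte; the function names are hypothetical.

```c
#include <stdint.h>

/* 32 - len for len in [1, 32]: invert, add 1, keep the low 5 bits.
   len = 32 has low 5 bits 00000 and correctly maps to 0. */
static unsigned shift_count(unsigned len) {
    return (~len + 1u) & 0x1Fu;
}

/* 32 + left for left in [-31, 0], with left held as an 8-bit
   two's complement value. */
static uint8_t refill_left(uint8_t left8) {
    if (left8 == 0)
        return 0x20;                    /* 00100000 = 32 */
    return (uint8_t)(left8 ^ 0xE0u);    /* 111xxxxx -> 000xxxxx */
}
```

For instance, Left = -1 is 11111111, and inverting the top three bits gives 00011111 = 31 = 32 + (-1). Only an incrementer and inverters are needed, which is why the circuit can omit a full adder on these paths.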
1. Instruction design for stream input reading
The data flow of the instructions Bini, Bread, Bload and Bpos, designed to implement the stream input reading function, is described below, where T1 and T2 denote the two beats. Each Bread instruction should be followed by a Bload instruction.
Bini reg
Decode stage:
Read the value of reg using the register number encoded in the instruction.
Execute stage:
T1: write the value of register reg into Addr;
T2: load 64 bits from the Data Cache at Addr into Buffer[1,0]; the registers are updated as follows:
Addr = Addr + 8, Left = 32, Flag = 1.
Bread reg or imm_Len:
Decode stage:
When the operand giving the code stream length (Len) is an immediate, its value is taken from the instruction; when it is a register, the value of that register is read using the register number. Len and Left are then subtracted.
Execute stage:
Depending on the result of subtracting Len and Left, there are two cases:
1. When Len <= Left:
T1: select Buffer1 or Buffer0 according to Flag, shift this Buffer logically right by 32 - Len bits and output the result to the out register. At the same time, send this Buffer value to the other shifter and shift it logically left by Len bits to update the Buffer. Register update: Left = Left - Len.
2. When Len > Left:
T1: according to Flag, select the Buffer that holds the following part of the code stream (assumed to be Buffer0, as in Fig. 4). Shift this Buffer logically right by Left bits, giving tmp1; at the same time send the Buffer value to the other shifter and shift it logically left by Len - Left bits, giving tmp2. Register update: Buffer0 = tmp2, Buffer1 = tmp1 OR Buffer1.
T2: send Buffer1 to the shifter, shift it logically right by 32 - Len bits and output the result to the out register. At the same time, Left = Left - Len.
Bload:
Decode stage:
Examine Left. If Left is less than or equal to 0, perform the data load in the execute stage; otherwise do nothing in the execute stage.
Execute stage (Left less than or equal to 0):
T1: if Flag is 1, select Buffer1; if Flag is 0, select Buffer0. Load 32 bits from the Data Cache at Addr into the selected buffer. Addr = Addr + 4; Flag = ~Flag; Left = 32 + Left.
bpos reg1, reg2:
Decode stage:
Obtain the register numbers of reg1 and reg2 from the instruction.
Execute stage:
T1: write Left into reg1 and Addr into reg2.
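The data flow of the four instructions can be summarized as a software model. The sketch below is a hypothetical C model, not the patented circuit. The StreamRegs type and function names are illustrative, memory is modelled as an array of 32-bit words read most-significant-bit first, Bini is assumed to place the first word in Buffer1 (consistent with Flag = 1), and the two beats of Bread are collapsed into one function call.

```c
#include <stdint.h>

/* Hypothetical software model of the stream-reading registers. */
typedef struct {
    uint32_t buffer[2];    /* Buffer0, Buffer1 */
    int flag;              /* index of the buffer holding the front of the stream */
    int left;              /* unread bits in the front buffer, range [-31, 32] */
    const uint32_t *addr;  /* Addr: next word to load (models the Data Cache address) */
} StreamRegs;

/* Logical shifts that, like a 32-bit hardware shifter, yield 0 for a count of 32. */
static uint32_t shr(uint32_t x, int n) { return n >= 32 ? 0 : x >> n; }
static uint32_t shl(uint32_t x, int n) { return n >= 32 ? 0 : x << n; }

/* Bini: set Addr, load 64 bits into Buffer[1,0], Left = 32, Flag = 1. */
static void bini(StreamRegs *r, const uint32_t *stream) {
    r->addr = stream;
    r->buffer[1] = r->addr[0];  /* assumption: first word goes to Buffer1 */
    r->buffer[0] = r->addr[1];
    r->addr += 2;               /* Addr = Addr + 8 bytes */
    r->left = 32;
    r->flag = 1;
}

/* Bread: return len (1..32) bits, right-aligned, other bits zero. */
static uint32_t bread(StreamRegs *r, int len) {
    int cur = r->flag, next = !r->flag;
    uint32_t out;
    if (len <= r->left) {                        /* one beat */
        out = shr(r->buffer[cur], 32 - len);
        r->buffer[cur] = shl(r->buffer[cur], len);
    } else {                                     /* two beats */
        uint32_t tmp1 = shr(r->buffer[next], r->left);        /* T1 */
        uint32_t tmp2 = shl(r->buffer[next], len - r->left);  /* T1 */
        r->buffer[next] = tmp2;
        r->buffer[cur] |= tmp1;
        out = shr(r->buffer[cur], 32 - len);                  /* T2 */
    }
    r->left -= len;
    return out;
}

/* Bload: if Left <= 0, refill the exhausted front buffer and swap roles. */
static void bload(StreamRegs *r) {
    if (r->left > 0)
        return;
    r->buffer[r->flag] = *r->addr++;  /* Addr = Addr + 4 bytes */
    r->flag = !r->flag;
    r->left += 32;
}

/* Bpos: report Left and Addr. */
static void bpos(const StreamRegs *r, int *left, const uint32_t **addr) {
    *left = r->left;
    *addr = r->addr;
}
```

The model keeps the invariant that the front buffer holds its Left unread bits left-aligned with zeros below, which is what lets the Len > Left case merge the two buffers with a single OR.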
2. An implementation of the code stream reading instructions
The function and data flow of the code stream reading instructions have been described in general terms above; here we give a concrete example of implementing these special instructions in the SPARC architecture. Incorporating the code stream reading instructions into the SPARC architecture mainly involves the instruction encoding and how the instructions interact with the values of existing registers. The instructions use the third encoding format of SPARC V8 (format 3) with op = 2, shown below:
Op = 2
[Figure: SPARC V8 format 3 instruction layout]
The specific encodings follow. The registers in the hardware circuit of the code stream reading instructions (such as Left and Addr) are not encoded, because these instructions always use them implicitly. Because SPARC is a RISC architecture, most instruction operands are registers; the operand of the Bread instruction (the length) can be given either as an immediate or as a register. The result of the Bread instruction is always stored in %o0 by default, because in the SPARC architecture %o0 is the register that holds the function return value.
bini Rs2
[Figure: encoding of bini]
bread Rs2 or length
[Figure: encoding of bread]
bpos Rs1, Rd
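The SPARC V8 format 3 layout referenced above can be illustrated with a small encoder. The field positions (op in bits 31-30, rd in 29-25, op3 in 24-19, rs1 in 18-14, the i bit in bit 13, then rs2 in bits 4-0 or a 13-bit signed immediate in bits 12-0) are those of SPARC V8. The op3 value chosen for the new instructions appears only in the lost figures, so OP3_BREAD_PLACEHOLDER below is a hypothetical placeholder, and putting the Bread operand in the rs2/immediate field follows the "bread Rs2 or length" syntax rather than a confirmed encoding.

```c
#include <stdint.h>

#define OP_FORMAT3 2u               /* op = 2, as stated for these instructions */
#define OP3_BREAD_PLACEHOLDER 0x2Du /* hypothetical op3, NOT taken from the patent */

/* Register form: i = 0, rs2 in bits 4..0 (bits 12..5 left zero). */
static uint32_t fmt3_reg(unsigned op3, unsigned rd, unsigned rs1, unsigned rs2) {
    return (OP_FORMAT3 << 30) | ((rd & 0x1Fu) << 25) | ((op3 & 0x3Fu) << 19) |
           ((rs1 & 0x1Fu) << 14) | (rs2 & 0x1Fu);
}

/* Immediate form: i = 1, 13-bit signed immediate in bits 12..0. */
static uint32_t fmt3_imm(unsigned op3, unsigned rd, unsigned rs1, int simm13) {
    return (OP_FORMAT3 << 30) | ((rd & 0x1Fu) << 25) | ((op3 & 0x3Fu) << 19) |
           ((rs1 & 0x1Fu) << 14) | (1u << 13) | ((uint32_t)simm13 & 0x1FFFu);
}
```

As a sanity check on the field layout, fmt3_reg(0x00, 8, 9, 10) produces 0x9002400A, the standard encoding of add %o1, %o2, %o0, and fmt3_imm(0x38, 0, 15, 8) produces 0x81C3E008, the standard encoding of retl. A register-form "bread %o1" would then be fmt3_reg(OP3_BREAD_PLACEHOLDER, 0, 0, 9), since %o1 is r9.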
3. Efficiency analysis of the stream input reading instructions
To analyze the efficiency of the stream input reading instructions, we compare the number of instructions needed for the same code stream read before and after the extension instructions are added. Because code stream reading is mostly used in decoding programs, which are generally written in a high-level language, a high-level language (C) model of code stream reading is introduced here, with two functions implementing stream input reading:
void Bitinit(struct Bitptr *ptr, unsigned int *pword);
unsigned long Bitread(struct Bitptr *ptr, int Len);
The structure Bitptr is defined as follows:
struct Bitptr {
    unsigned int *ptr_word;
    unsigned int Cur_word;
    unsigned short Left;
};
The parameter ptr points to the current bit to be read. In the structure, member ptr_word holds the memory address of the next word to be loaded into the data buffer; Cur_word is the current word data buffer, from which the bits to be read are taken first, and if it does not hold enough bits, the next word is loaded into the buffer using ptr_word; Left is the number of bits in the current word buffer that have not yet been read. The function Bitinit() initializes the structure pointed to by ptr from the pword parameter. The function Bitread() uses ptr and the number of bits to read (Len) to return a 32-bit unsigned integer whose low Len bits (in left-to-right order) are the code stream read, with all other bits zero.
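A C implementation consistent with this description may help when comparing against the assembly versions below. It is a sketch under stated assumptions: unsigned int is taken to be 32 bits, Bitinit is given a void return type, and because the hand-written Bitinit assembly stores pword itself in ptr_word, this sketch treats ptr_word as the address of the word currently in Cur_word and advances it before each load. low_mask is a hypothetical helper.

```c
#include <stdint.h>

struct Bitptr {
    unsigned int *ptr_word;  /* address of the word currently held in Cur_word */
    unsigned int Cur_word;   /* current word data buffer */
    unsigned short Left;     /* unread bits in Cur_word (its low-order bits) */
};

/* Mask of the low n bits, valid for n in [0, 32]. */
static uint32_t low_mask(int n) { return n >= 32 ? 0xFFFFFFFFu : (1u << n) - 1u; }

void Bitinit(struct Bitptr *ptr, unsigned int *pword) {
    ptr->ptr_word = pword;  /* mirrors "st %o1,[%o0]" */
    ptr->Cur_word = *pword; /* mirrors the ld/st pair */
    ptr->Left = 32;         /* mirrors "set 32" / "sth" */
}

/* Return Len (1..32) bits MSB-first, right-aligned, other bits zero. */
unsigned long Bitread(struct Bitptr *ptr, int Len) {
    int left = ptr->Left;
    uint64_t acc;  /* 64-bit so that a shift by 32 is well defined */
    if (Len <= left) {
        acc = (ptr->Cur_word >> (left - Len)) & low_mask(Len);
        ptr->Left = (unsigned short)(left - Len);
    } else {
        int rest = Len - left;                    /* bits still needed, 1..32 */
        acc = ptr->Cur_word & low_mask(left);     /* unread tail of this word */
        ptr->Cur_word = *++ptr->ptr_word;         /* load the next word */
        acc = (acc << rest) | (ptr->Cur_word >> (32 - rest));
        ptr->Left = (unsigned short)(32 - rest);
    }
    return (unsigned long)acc;
}
```

The two branches correspond to the two paths of Fig. 6; every call pays for loads and stores of the structure fields, which the dedicated registers of the extension instructions avoid.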
Without the code stream reading instructions, the two functions are implemented in SPARC V8 assembly. The %o registers pass parameters: %o0 passes the first parameter, %o1 the second, and so on, and %o0 also carries the return value. The last two instructions, retl and nop, return from the function (the nop fills the delay slot). The SPARC V8 code below was not generated by a compiler but hand-written and optimized, and is essentially the fastest stream input reading achievable with existing SPARC V8 instructions.
Bitinit:
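st %o1, [%o0]       ! ptr->ptr_word = pword
ld [%o1], %o1       ! load the first word
st %o1, [%o0+4]     ! ptr->Cur_word = *pword
set 32, %o1
sth %o1, [%o0+8]    ! ptr->Left = 32
retl                ! return to caller
nop                 ! delay slot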
Bitread: the hand-optimized assembly of this function and its flowchart are shown in Fig. 6.
With the stream input reading instructions, the assembly of the two functions becomes:
Bitinit:
Bini %o1
retl
nop
Bitread:
Bread %o1
Bload
retl
nop
Because stream input reading is generally initialized once and read many times, the efficiency gain depends mainly on Bitread(). With the extended stream input reading instructions, Bitread() needs 4 instructions, whereas the two branches of Fig. 6 use 15 and 23 instructions respectively, so the stream input reading instructions are very efficient. Wherever a program needs to read the code stream, Bread instructions can be inserted by hand in place of calls to Bitread; this eliminates the associated parameter passing and function return overhead and is even more efficient, but it increases the programmer's workload. Implementing the existing stream input reading function with the Bread instruction is therefore a good and simple approach.

Claims (1)

1. A design and implementation method of multimedia extension instructions for stream input reading, characterized in that the method defines four media extension instructions for reading the input data stream in audio/video decoding, and the hardware implementation structure comprises two 32-bit bitstream buffer registers, an address register Addr for reading the code stream, a Flag register, a Left register and two arbitrary-bit shifters;
The code stream read initialization (Bini) instruction sets the address register Addr for reading the code stream, loads initial data into the two 32-bit bitstream buffer registers, and sets the Flag and Left registers. The two bitstream buffer registers are loaded alternately from the Data Cache, and the Flag register determines which buffer register currently holds the front of the code stream. The code stream read (Bread) instruction reads an arbitrary number of bits, up to 32, from the bitstream buffer registers; the data load (Bload) instruction loads a 32-bit word from the Data Cache into either bitstream buffer register; the current code stream position return (Bpos) instruction returns the values of the Left and Addr registers during code stream reading;
Each time an arbitrary-bit shift operation reads the required Len bits of the code stream, Left = Left - Len. When Left is less than or equal to 0, 32 bits of code stream data must be loaded into the empty bitstream buffer register; at the same time the Flag register is inverted and Left = 32 + Left;
For Len <= Left, the code stream read completes in one beat, using one 32-bit arbitrary left shifter and one 32-bit arbitrary right shifter; for Len > Left, the read completes in two beats, sharing the two 32-bit arbitrary left/right shifters between the beats. The additions and subtractions on the data need no adder; they are implemented with simple invert-and-add-1 and inversion logic.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101050662A CN100511278C (en) 2006-08-29 2006-08-29 Design and implementing method of multimedia expansion instruction of flow input read

Publications (2)

Publication Number Publication Date
CN1912925A CN1912925A (en) 2007-02-14
CN100511278C true CN100511278C (en) 2009-07-08

Family

ID=37721846

Country Status (1)

Country Link
CN (1) CN100511278C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101090504B (en) * 2007-07-20 2010-06-23 清华大学 Coding decoding apparatus for video standard application
CN101640795B (en) * 2009-05-06 2011-05-18 南京龙渊微电子科技有限公司 Video decoding optimization method and device
CN101901187B (en) * 2010-07-09 2012-09-19 北京红旗胜利科技发展有限责任公司 Decoding program test method and system
CN102567556A (en) * 2010-12-27 2012-07-11 北京国睿中数科技股份有限公司 Verifying method and verifying device for debugging-oriented processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于SPARC结构的RISC系统设计技术 (RISC system design technology based on the SPARC architecture). 时晨, 于伦政. 微电子学与计算机 (Microelectronics & Computer), No. 11, 2002 *

Also Published As

Publication number Publication date
CN1912925A (en) 2007-02-14

Similar Documents

Publication Publication Date Title
US5892966A (en) Processor complex for executing multimedia functions
KR101851487B1 (en) Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
EP3776229A1 (en) Apparatuses, methods, and systems for remote memory access in a configurable spatial accelerator
JP2019032859A (en) Systems, apparatuses and methods for blending two source operands into single destination using writemask
TWI510921B (en) Cache coprocessing unit
JP2008530642A (en) Low latency mass parallel data processor
JP2001202245A (en) Microprocessor having improved type instruction set architecture
JP2017538213A (en) Method and apparatus for implementing and maintaining a stack of predicate values using stack synchronization instructions in an out-of-order hardware software co-design processor
CN102141905A (en) Processor system structure
JP2006313546A (en) Data processing system
KR20010075320A (en) Method for configuring configurable hardware blocks
US20130151822A1 (en) Efficient Enqueuing of Values in SIMD Engines with Permute Unit
CN100511278C (en) Design and implementing method of multimedia expansion instructionof flow input read
US20060168424A1 (en) Processing apparatus, processing method and compiler
Sias et al. Enhancing loop buffering of media and telecommunications applications using low-overhead predication
WO2016210023A1 (en) Decoding information about a group of instructions including a size of the group of instructions
CN111459550A (en) Microprocessor with highly advanced branch predictor
JP5989293B2 (en) Execution time selection of feedback connection in multiple instruction word processor
KR100472706B1 (en) Digital signal processor having a plurality of independent dedicated processors
CN101246435A (en) Processor instruction set supporting part statement function of higher order language
US9201657B2 (en) Lower power assembler
US8631173B2 (en) Semiconductor device
WO2021243490A1 (en) Processor, processing method, and related device
JP2004510248A (en) FIFO write / LIFO read trace buffer with software and hardware loop compression
CN110045989B (en) Dynamic switching type low-power-consumption processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090708

Termination date: 20120829