CN102088603A - Entropy coder for video coder and implementation method thereof

Info

Publication number: CN102088603A
Application number: CN2010106200287A
Authority: CN
Inventors: 陈奎林; 王新安; 胡子一; 张兴
Original assignee: Peking University Shenzhen Graduate School
Current assignee: Peking University Shenzhen Graduate School
Priority date: 2010-12-31
Filing date: 2010-12-31
Publication date: 2011-06-08
Anticipated expiration: 2030-12-31
Also published as: CN102088603B

Abstract

The invention discloses an entropy coder for a video coder and an implementation method thereof. The implementation method comprises the following steps: mapping each coding functional block of the entropy coder of the video coder, where each coding functional block is described as an algorithm in a high-level programming language, to a hardware logic description composed of operator units; and generating the hardware integrated circuit of the entropy coder from the hardware logic description composed of the operator units. The invention accelerates the integrated circuit design of the entropy coder.

Description

Entropy coder for a video encoder and implementation method thereof

Technical field

The present invention relates to an entropy coder for a video encoder and an implementation method thereof.

Background technology

Integrated circuit design methodology is the methodology that guides integrated circuit design. In current integrated circuit (IC) design, especially large-scale IC design, the usual design flow is as follows: a system engineer proposes a system function architecture according to the functional requirements; a software engineer describes this architecture in a high-level programming language (for example Java, C, C++ or Matlab), yielding a high-level-language algorithm description of the integrated circuit; an IC design engineer then uses a hardware description language (for example Verilog or VHDL) to synthesize the algorithm description into a netlist file, after which dedicated hardware synthesis tools perform placement and routing to produce the integrated circuit layout.

In the IC industry, design speed has become an important consideration for IC design methodology. When an IC manufacturer or integrated system supplier weighs the basic problems of commercial operation together (such as product volume, design/test cost, non-recurring engineering (NRE) expense, integration/reuse of user IP, reliability and maintainability, and production process complexity), shortening time to market, improving product performance and reducing product risk become the important considerations in choosing a design technique.

As IC manufacturing processes move to 45 nm and below, the gap by which IC design capability lags behind process technology continues to widen. ASIC (Application Specific Integrated Circuit) designs for data-flow algorithms, especially digital media designs such as the IC design of a video encoder, suffer from long design cycles, poor flexibility and poor scalability. They struggle to meet the demand for fast time to market and continually growing functionality, and therefore need to be improved.

Summary of the invention

The main technical problem to be solved by the present invention is to provide an entropy coder for a video encoder, and an implementation method thereof, that can accelerate the integrated circuit design of the entropy coder.

To solve the above technical problem, the present invention adopts the following technical solution:

An implementation method for an entropy coder of a video encoder, comprising:

Mapping each encoding functional block of the entropy coder of the video encoder, described as a high-level-language algorithm, to a hardware logic description composed of operator units; and generating the hardware integrated circuit of the entropy coder from the hardware logic description composed of the operator units.

In the method of one embodiment of the present invention, mapping each encoding functional block of the entropy coder of the video encoder, described as a high-level-language algorithm, to a hardware logic description composed of operator units comprises the following steps:

Program analysis step: read the high-level-language algorithm program of each encoding functional block of the entropy coder and, following the rules of that high-level language, identify in the program the execution objects and parameter objects to be mapped;

Data control flow graph generation step: map the identified execution objects and parameter objects to the corresponding nodes of the data control flow graph that describes the entropy coder algorithm;

Operator structure graph generation step: according to the function performed by each node of the data control flow graph, take at least one operator unit with the corresponding function from an operator unit library, and convert the data control flow graph into an operator space-time graph composed of operator units;

Timing constraint step: determine the overall timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, label each operator unit in the operator space-time graph with a time, and apply timing constraints to each level of the operator space-time graph;

Space-time graph compression step: according to the time labels, cluster and compress the operator space-time graph in the spatial dimension so that the overall algorithm execution time comes as close as possible to the overall timing constraint;

Lower-level hardware mapping step: map the clustered and compressed operator space-time graph to a hardware logic description composed of operator units.

In the method of one embodiment of the present invention, the operator units in the operator unit library comprise at least one of computing operators, storage operators, control operators, path operators and clock operators.

In the method of one embodiment of the present invention, the computing operator is a reconfigurable operator: a computing operator is configured with control bits and multiple operations, and the control bits select one of the operations. The path operator links storage operators and computing operators to form a data flow architecture, and its implementation forms include multiplexer, crossbar switch and bus. The control operator drives the movement of the data flow; its implementation forms include counter, state machine and microinstruction word, and its control outputs include the dynamic reconfiguration information of the computing operators, the configuration information of the storage operators and the configuration information of the path operators. The storage operator converts between different data storage formats and modes.

In the method of one embodiment of the present invention, the execution objects comprise operation instructions and/or control instructions, and the parameter objects comprise at least one of input data, output data and intermediate data. Operation instructions are mapped to computing operators, control instructions are mapped to control operators or path operators, and parameter objects are mapped to storage operators.

In the method of one embodiment of the present invention, the encoding functional blocks comprise a unified variable length coding functional block for encoding syntax parameters, a CAVLC functional block for encoding residual data, and a bitstream splicing functional block for splicing all the encoded bitstreams.

The present invention also provides an entropy coder for a video encoder, comprising a unified variable length coding functional block for encoding syntax parameters, a CAVLC functional block for encoding residual data, and a bitstream splicing functional block for splicing all the encoded bitstreams, each functional block being composed of operator units.

In the entropy coder of an embodiment of the present invention, the CAVLC functional block comprises, connected in sequence, a data buffering and reordering unit, a residual coefficient parsing unit, and a lookup-table or calculation encoding unit.

In the entropy coder of an embodiment of the present invention, the data buffering and reordering unit, the residual coefficient parsing unit, and the lookup-table or calculation encoding unit are arranged as an inter-stage pipeline.

In the entropy coder of an embodiment of the present invention, the residual coefficient coding operator function group of the CAVLC functional block is an operator function group that performs residual coefficient coding of 4*4 sub-blocks over multiple iterations.

The beneficial effect of the present invention is as follows: by designing operator units that express hardware logic, and mapping the high-level-language algorithm description of the entropy coder of the video encoder to a hardware logic description composed of operator units, the invention accelerates the integrated circuit design of the entropy coder.

Description of drawings

Fig. 1 shows the structure of a typical H.264 video encoder;

Fig. 2 shows several classes of operator structure provided by an embodiment of the present invention: Fig. 2.1 shows the computing operator, Fig. 2.2 the storage operator, Fig. 2.3 the path operator, and Fig. 2.4 the control operator;

Fig. 3.1 shows the general structure of the entropy coder, and Fig. 3.2 shows the operator function structure of the entropy coder;

Fig. 4.1 shows the overall operator structure of the entropy coder of an embodiment of the present invention, and Fig. 4.2 shows the operator function structure of the bs_write_ue function of the entropy coder;

Fig. 5 shows the overall flow of the entropy coder of an embodiment of the present invention;

Fig. 6 shows the multiple sub-loop CAVLC encoding of 4*4 blocks in an embodiment of the present invention;

Fig. 7 shows a schematic flowchart of CAVLC encoding in an embodiment of the present invention;

Fig. 8 shows the residual coefficient cache register format for the I_16*16 macroblock type in an embodiment of the present invention;

Fig. 9 shows the residual coefficient cache register format for the I_4*4 and P_16*16 macroblock types in an embodiment of the present invention;

Fig. 10 shows the residual coefficient storage format and coding order for the I_16*16 macroblock type in an embodiment of the present invention;

Fig. 11 shows the residual coefficient storage format and coding order for the I_4*4 and P_16*16 macroblock types in an embodiment of the present invention;

Fig. 12 shows the structure of the preliminary residual coefficient parsing unit of an embodiment of the present invention;

Fig. 13 shows the preliminary residual coefficient parsing process of an embodiment of the present invention;

Fig. 14.1, Fig. 14.2 and Fig. 14.3 show the zig-zag scanning of an embodiment of the present invention, where Fig. 14.1 is mode a, Fig. 14.2 is mode b and Fig. 14.3 is mode c;

Fig. 15 shows the three CAVLC coding modes of an embodiment of the present invention.

Embodiment

The present invention is described in further detail below through embodiments, with reference to the accompanying drawings.

I. Overview of the video encoder

Video coding is a key technology for the storage and transmission of digital media and is widely used in fields such as digital television, video communication and network streaming media. The present invention relates to an entropy coder for a video encoder, based on the operator concept, and an implementation method thereof. It is applicable to the entropy coding of DCT-transformed data for various kinds of image, video and audio in the signal processing field.

At present, in the field of video coding, H.264 is widely used. It is the digital video coding standard jointly developed by the Joint Video Team (JVT) formed by the International Telecommunication Union (ITU-T) and the International Organization for Standardization (ISO). The H.264 algorithm is ITU-T Recommendation H.264 and is also Part 10 of the MPEG-4 standard of ISO and the International Electrotechnical Commission (IEC). It uses a layered algorithm structure that can be divided conceptually into two layers: the video coding layer and the network abstraction layer. The video coding layer (VCL) is responsible for efficient representation of the video content, and the network abstraction layer (NAL) is responsible for packaging and transmitting the data in the manner the network requires. This two-layer structure facilitates the encapsulation of information and better priority control over it.

The structure of the H.264 video encoder is shown in Fig. 1. As Fig. 1 shows, the main sub-algorithms of the H.264 encoding process are inter prediction, intra prediction, transform and quantization, deblocking filtering and entropy coding. Each sub-algorithm is summarized below.

1. Inter prediction

Inter prediction exploits the correlation between video frames, that is, temporal correlation, to compress the image. In image transmission, moving images are the main concern. A moving image is a temporal sequence of successive frames separated by the frame period, and it has greater correlation in time than in space. For most images the detail changes very little between adjacent frames, so the frames are strongly correlated; inter-frame coding that exploits this correlation achieves a much higher compression ratio than intra-frame coding. H.264 inter prediction is a prediction mode that uses previously encoded video frames/fields and block-based motion compensation.

2. Intra prediction

Intra prediction is a new technique adopted by H.264: the neighbouring pixels above and to the left of the current block are used as reference pixels to predict the pixels of the current block. Intra-prediction coding reduces the spatial redundancy of the image and lowers the bit rate.

3. Transform and quantization

In image coding, transform coding and quantization are in principle two independent processes. In H.264, however, the multiplications of the two processes are merged into one and integer arithmetic is used, which reduces the computation of encoding and decoding. Quantization shortens the coded length of the image without reducing the perceived video quality by discarding information that is not needed to reconstruct the video. Its purpose is to remove the components that, after the transform, are unimportant to visual quality while keeping the visually important ones. The removed components cannot be recovered, so quantization is a lossy process.

4. Deblocking filtering

A block-based video coding system inevitably produces visible blocking artifacts at some block boundaries in the reconstructed image, particularly at low bit rates. Blocking artifacts arise because motion compensation is generally less accurate for block-boundary pixels than for pixels inside the block, and because the block transform also introduces discontinuities at block boundaries. The main steps of deblocking filtering are determining the filter strength, identifying false boundaries, and computing the filter.

5. Entropy coding

In video coding, entropy coding converts the series of symbols that represent the elements of a video sequence into a compressed bitstream for transmission or storage. The input symbols may include quantized transform coefficients, motion vectors, flags, headers and supplementary information.

The present invention relates to the entropy coder of an H.264 video encoder, so the description below concentrates on the H.264 entropy coder.

H.264 entropy coding supports two modes: CAVLC (context-based adaptive variable length coding) and CABAC (context-based adaptive binary arithmetic coding). CAVLC has relatively low coding complexity and coding efficiency but stronger error resilience; CABAC has relatively high coding complexity and coding efficiency but weaker error resilience. Accordingly, among the H.264 coding profiles aimed at different application fields, the Baseline profile (used for video conversation and the like) and the Extended profile (used for network video streaming and the like) adopt CAVLC, while the Main profile (used for video broadcasting and the like) adopts CABAC.

The entropy coding of the present invention mainly targets the CAVLC entropy coding of the Baseline profile. In H.264 entropy coding, the syntax elements to be encoded fall into two classes. One class is the residual data, which is CAVLC-coded; in the CAVLC algorithm, each symbol is coded with one of several context-adaptive VLC tables, and the table is selected according to statistics of the block content. The other class is referred to as syntax parameters and includes header information, macroblock coding mode information, quantization parameters, reference frame indices and so on. These syntax parameters are all coded with Exp-Golomb codes. Exp-Golomb coding is a variable-length code with a regular structure, and because its construction rule is uniform it is also called unified variable length coding (UVLC).
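As an illustration of the regular structure of Exp-Golomb codes, the following Python sketch builds the codeword for an unsigned value as a run of leading zeros followed by the binary form of the value plus one, and handles signed values by first mapping them to an unsigned code number. The function names are hypothetical and are not taken from X264 or from this document; the sketch returns bit strings for readability instead of writing into a bitstream.

```python
def exp_golomb_ue(value: int) -> str:
    """Return the unsigned Exp-Golomb codeword for `value` as a bit string."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb is defined for non-negative integers")
    code_num = value + 1
    info_bits = code_num.bit_length()      # number of bits in binary(value + 1)
    prefix = "0" * (info_bits - 1)         # one leading zero per bit after the first
    return prefix + format(code_num, "b")


def exp_golomb_se(value: int) -> str:
    """Signed variant: positive v maps to 2v - 1, non-positive v maps to -2v."""
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return exp_golomb_ue(code_num)


if __name__ == "__main__":
    for v in range(5):
        print(v, exp_golomb_ue(v))
```

Because every codeword has the same prefix-plus-suffix shape, the encoder needs no lookup table, which is what makes this code easy to express with a small number of shift and add operations.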

The H.264 standard specifies only the bitstream format and decoding algorithm of the video coding output and places no restriction on the encoding method. In practice, three open-source reference codecs are currently in use: JM, X264 and T264.

The JM reference model is the official reference software. It implements all H.264 features, but its program structure is lengthy. It considers only the introduction of new features to improve coding performance and ignores coding complexity, which is so high that it is unsuitable for practical use.

X264 is an encoder, compatible with the H.264 standard bitstream, developed jointly by a free online organization. It emphasizes practicality: compared with JM, it strives to reduce the computational complexity of encoding without noticeably reducing coding performance.

T264 is an H.264 codec developed jointly by a free Chinese video coding organization; its characteristics and starting point are similar to those of X264.

The present invention is based on X264: the high-level-language algorithm description of the entropy coder of the video encoder, in the form of the X264 code, is the starting point for the integrated circuit design of the entropy coder.

II. Overview of the operator design approach

Integrated circuit design has evolved from fully custom transistor design to gate-array design, typified by the 1.0 μm process node around 1984; to standard-cell design, typified by the 0.5 μm node around 1994; to IP (intellectual property core) based design, typified by the 0.25 μm node around 1999; and to timing-driven design, typified by the 0.13 μm node in 2004. Current IC design generally works at standard-cell granularity and describes the hardware logic in a hardware description language. A hardware description language, however, is not well suited to describing directly the program logic and execution flow of a high-level language. As a result, mapping from the high-level language to the underlying hardware takes a long time in current IC design and has become the bottleneck that limits IC design speed.

Since IC processes moved to 45 nm, the efficiency of IC design has lagged seriously behind the progress of the manufacturing process. Traditional ASIC design, with its long design cycle, high design cost, poor flexibility and poor scalability, struggles to meet the demand for fast time to market and continually growing functionality. Standard-cell-based design is not fast enough for large-scale designs, mainly because the granularity of a standard cell is too small: it can perform only operations such as addition, subtraction, multiplication and division, and cannot implement coarser-grained operations. Examples of such operations are completing a multi-input addition, a shift-then-add, an add-then-shift or |a-b| in a single operation, all of which are particularly common in video algorithms.

In view of this, a fast design method based on operators has been proposed. The basic idea of operator design is fast design: it seeks the best point of entry for achieving rapid integrated circuit design.

The operator design method aims to support the description of algorithms at the mathematical level and to support the automated translation of algorithms into an operator language and operator structure. Operators have the following characteristics:

1. Coarser granularity

Each operator has a definite function and is a symbolic representation of a functional operation. The operator system as a whole supports the implementation of the various functions in an algorithm description, so the designer can describe and verify an algorithm in terms of operators;

2. Support for algorithm implementation

A complete but limited set of operator types supports the description of computation, storage, control and data paths in the upper-level architecture description, and the operator-based description can be optimized and verified;

3. Faster design and verification

Once the designer has described the function of the algorithm with operators, the corresponding hardware design is complete as well. Because all operators are fully custom-designed in advance and verified as correct, design and verification time is shortened: the designer need only ensure that the operator-based description and the mapping process are correct, which greatly accelerates design and verification.

The operator proposed by the present invention, also called an operator unit, represents hardware logic of an integrated circuit. According to function, operators can be divided into classes such as computing operators, storage operators, path operators, control operators and clock operators. Together these form a complete operator system that can support the implementation of all algorithms. These basic operators are designed to be optimized for parameters such as area, power consumption and performance, and they make up the operator unit library. Operator units are further combined into operator functional blocks; operator functional blocks exchange data through shared-memory (MEM) links and can also pass control information through register transfers. Each class of operator is described below.

1. Computing operator

A computing operator (AU) is the basic unit that implements logical operations, arithmetic operations, or mixed logical and arithmetic operations. It supports operations such as arithmetic, logic and shifting; operators that implement these basic operations are called basic computing operators. For particular applications, the basic computing operators can be extended to support operations that are more complex but very common, for example subtraction, absolute value, averaging and fixed-coefficient filtering in media processing. Operators that implement these extended operations are called extended computing operators and are built from basic computing operators. Common operations performed by computing operators include addition, accumulation, four-input addition, two-input addition followed by clipping, M-bit shifting, and control-code decoding of input data.

Fig. 2.1 is a structural diagram of one computing operator, the ADDS operator, which comprises an ADD unit that performs addition and subtraction and a shift unit (<< / >>) that performs shift operations. A computing operator is reconfigurable: it can be configured with control bits and multiple operation functions, and the control bits select one of the operations. In other words, a computing operator comprises an arithmetic logic unit and an operation configuration register. The operation configuration register receives and stores operation configuration instructions, and different configuration instructions correspond to different arithmetic and logic operations. By setting the control bits (the configuration instruction held in the operation configuration register) according to the needs of the application, the computing operator can be made to perform different functions. The ADDS operator in the example above, for instance, can perform several different functions depending on the value of control bit X. Because a reconfigurable operator offers a rich set of functions and can be used in different scenarios, it reduces the number of operators that must be stored in the operator unit library. A reconfigurable operator can also be reconfigured dynamically by changing its control bits during execution. If the operation of an operator has been fixed by optimization, the control bits can be fixed as well (that is, hardened by customization).
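The reconfiguration idea can be sketched in software. The Python model below is a minimal illustration only: the source text does not list the functions selected by control bit X, so the four modes, their numeric encoding, the 16-bit width and the class name AddsOperator are all assumptions made for this example.

```python
class AddsOperator:
    """Toy model of a reconfigurable add/shift computing operator."""

    # Hypothetical control-bit encoding (not specified in the source text).
    ADD, SUB, ADD_SHL, ADD_SHR = 0, 1, 2, 3

    def __init__(self, control: int = 0, width: int = 16):
        self.width = width
        self.control = self.ADD
        self.configure(control)

    def configure(self, control: int) -> None:
        """Load a new value into the operation configuration register."""
        if control not in (self.ADD, self.SUB, self.ADD_SHL, self.ADD_SHR):
            raise ValueError(f"unsupported control value: {control}")
        self.control = control

    def compute(self, a: int, b: int, shift: int = 0) -> int:
        """Apply the selected operation and wrap the result to `width` bits."""
        if self.control == self.ADD:
            result = a + b
        elif self.control == self.SUB:
            result = a - b
        elif self.control == self.ADD_SHL:
            result = (a + b) << shift
        else:  # ADD_SHR
            result = (a + b) >> shift
        return result & ((1 << self.width) - 1)
```

Calling configure between two compute calls corresponds to dynamic reconfiguration; constructing the object once and never calling configure again corresponds to the fixed, customized case.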

2. Storage operator

Fig. 2.2 is a basic structural diagram of the storage operator (MU). The storage operator comprises a storage configuration register and a memory unit, and the memory unit comprises an address generation unit, a data memory, a data generation unit and a data output control unit. Through the data output control unit, the storage configuration register can configure the write and/or read mode of the storage operator's memory bank MEM (any storage medium, such as registers or RAM); it can also configure the working mode of the address generation unit associated with the memory bank. Input data are stored directly at the location given by the address that the address generation unit produces, and the required data are output from their stored location.

Different stages of an algorithm process data in different orders and arrangements, so the storage format and mode of the data differ from stage to stage. To meet this need, the storage operator can also convert between different data storage formats and modes. The storage operator supports the basic and commonly used storage formats and conversions of different applications, such as serial-to-parallel and parallel-to-serial conversion, queue structures and stack structures.
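A rough software analogue of such format conversion is sketched below in Python. Samples are written serially at consecutive addresses and read back in groups, in an order supplied by a configurable address sequence that stands in for the address generation unit. The class name, method names and the transposed read order used in the example are assumptions for illustration and do not come from the source text.

```python
from typing import List, Optional, Sequence


class StorageOperator:
    """Toy storage operator: serial write, configurable parallel read."""

    def __init__(self, size: int, read_order: Optional[Sequence[int]] = None):
        self.mem = [0] * size
        # Stand-in for the address generation unit: the read address sequence.
        self.read_order = list(read_order) if read_order is not None else list(range(size))
        if sorted(self.read_order) != list(range(size)):
            raise ValueError("read_order must visit every address exactly once")

    def write_serial(self, samples: Sequence[int]) -> None:
        """Store samples one per address, in arrival order."""
        if len(samples) != len(self.mem):
            raise ValueError("sample count must match memory size")
        for addr, sample in enumerate(samples):
            self.mem[addr] = sample

    def read_parallel(self, width: int) -> List[List[int]]:
        """Read the memory in `read_order`, `width` samples per output word."""
        ordered = [self.mem[addr] for addr in self.read_order]
        return [ordered[i:i + width] for i in range(0, len(ordered), width)]
```

With four samples and the read order [0, 2, 1, 3], the same stored data is delivered as the columns of a 2*2 block instead of its rows, which is the kind of reordering a later processing stage might require.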

3. Path operator

Fig. 2.3 is a general structural diagram of the path operator (LU). The path operator comprises a routing configuration register, together with an exchange switch and data registers (REG) that form the route selection unit. The routing configuration register is controlled by the control operator CU (see below); under that control, the exchange switch connects the different computing operators in the desired manner. The data registers temporarily hold the input and output data of the computing operators and storage operators. The role of the path operator is thus to link storage operators and computing operators into a data flow architecture. It has three main implementation forms: multiplexer, crossbar switch and bus.
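The crossbar form can be modelled in a few lines of Python. In this sketch the routing configuration register is a list that names, for each output port, the input port it is connected to. The class and method names are hypothetical, and the model ignores the data registers and timing.

```python
from typing import List, Sequence


class CrossbarPathOperator:
    """Toy crossbar: each output port is connected to one chosen input port."""

    def __init__(self, n_inputs: int, n_outputs: int):
        self.n_inputs = n_inputs
        self.route = [0] * n_outputs   # routing configuration register

    def configure(self, route: Sequence[int]) -> None:
        """Set the source input port for every output port."""
        if len(route) != len(self.route):
            raise ValueError("one source port is required per output port")
        if any(not 0 <= src < self.n_inputs for src in route):
            raise ValueError("source port out of range")
        self.route = list(route)

    def transfer(self, inputs: Sequence[object]) -> List[object]:
        """Forward the current input values to the output ports."""
        if len(inputs) != self.n_inputs:
            raise ValueError("unexpected number of input values")
        return [inputs[src] for src in self.route]
```

A multiplexer is the special case with a single output port, and the same input may feed several outputs at once.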

4. Control operator

Fig. 2.4 is a general structural diagram of the control operator (CU). The control operator mainly sends configuration information to the corresponding configuration registers, configuring the computing operators, storage operators and path operators to perform their intended functions. The role of the control operator is thus to keep the data flow moving correctly. It has three implementation forms: counter, state machine and microinstruction word. The control outputs include the dynamic reconfiguration information of the computing operators, the configuration information of the address generation unit (AGU) of the storage operators, and the configuration information of the path operators, through which the other operators are controlled. The microinstruction-word structure comprises a decoder, a program counter, an instruction memory, a pipeline control module and so on. The control operator sends configuration information to each functional unit by executing simple configuration instructions.
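The counter form of the control operator can be pictured as a sequencer that steps through a fixed schedule and emits one configuration word per cycle. The Python sketch below assumes a schedule of dictionaries whose keys (au, lu) are invented for the example; the source text does not define the format of the configuration information.

```python
from typing import Dict, List, Sequence


class CounterControlOperator:
    """Toy counter-based controller that replays a configuration schedule."""

    def __init__(self, schedule: Sequence[Dict[str, int]]):
        if not schedule:
            raise ValueError("schedule must contain at least one entry")
        self.schedule: List[Dict[str, int]] = [dict(entry) for entry in schedule]
        self.counter = 0

    def step(self) -> Dict[str, int]:
        """Return the configuration for this cycle and advance the counter."""
        config = self.schedule[self.counter]
        self.counter = (self.counter + 1) % len(self.schedule)
        return dict(config)
```

A state machine would replace the wrapping counter with data-dependent transitions, and a microinstruction-word controller would fetch the schedule entries from an instruction memory.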

5, clock class operator.

The clock class operator is to drive and timing part, be used to provide the clock signal of computing class operator, storage class operator, class of paths operator and control class operator, clock signal comprises the signal of control clock start-stop and control clock frequency, and clock signal can dispose according to the mode of expectation.

More than five class operators are the bases of realizing following execution mode, be understandable that, above-mentioned the operator that will be used for integrated circuit (IC) design according to function is divided into five big classes and not exclusive dividing mode, can also carries out the division of wide region more or thinner scope targetedly according to actual conditions.

The arthmetic statement that the operator design process of integrated circuit (IC) may be summarized to be senior language such as C, Matlab is converted into optimal design was described and carried out to operator on the operator structure process.Because operator designs in advance and verifies, thus the operator of algorithm describe can be fast or finish the design verification process automatically.For specific area,, can design some better expansion operator or operator IP that support the algorithm realization, thereby quicken to a greater degree and the optimal design process such as communication or medium.

The present invention applies operators to the implementation of an entropy coder for a video encoder. The implementation mainly comprises the following.

Mapping each coding functional block of the entropy coder of the video encoder, described as a high-level-language algorithm, into a hardware logic description composed of operator units mainly comprises the following steps:

1) Program analysis step: read the high-level-language algorithm program of each coding functional block of the entropy coder and, according to the rules of that language, identify the execution objects and parameter objects to be mapped. The main purpose of this step is algorithm decomposition, that is, dividing an algorithm described in a high-level language such as C, Java or Matlab into several sub-algorithm processes. Algorithm decomposition mainly analyzes the function call relationships, the structures and variables used by the functions, and the basic operations of the bottom-level functions, so that the operator classes described above can be mapped onto the algorithm. Execution objects mainly comprise operation instructions and/or control instructions; parameter objects comprise at least one of input data, output data and intermediate data. During operator mapping, operation instructions are mapped to computing-class operators, control instructions are mapped to control-class or path-class operators, and parameter objects are mapped to storage-class operators. Computing-class and storage-class operators are linked by path-class operators, the control-class operator provides control, and the clock-class operator provides the clock.

2) Data control flow graph generation step: map the identified execution objects and parameter objects to the corresponding nodes of a data control flow graph that describes the entropy coder algorithm. In the data control flow graph, operation instructions are mapped to processing blocks; control instructions are mapped to control flows that identify states, state transition conditions and state control signals; and input data, output data and intermediate data are mapped to storage nodes on the data flow.

3) Operator structure graph generation step: according to the processing performed by each node of the data control flow graph, take at least one operator unit with the corresponding function from the operator unit library, and convert the data control flow graph into an operator space-time graph composed of operator units. The operator space-time graph represents the algorithm after it has been fully unrolled into operators. Once fully unrolled, the algorithm executes with full parallelism, so it is fastest but also occupies the largest area; this is the unrolling of the space-time graph. In the operator space-time graph, each node of the data control flow graph is replaced by a combination of one or more operator units that performs the same function as that node. The data control flow graph can be unrolled in ways that include, but are not limited to, the following. If a data flow in the graph is a sequentially dependent data flow, it is unrolled as a pipeline. If a data flow has feedback and forms a loop, and there is data dependence between its data, the data flow with data dependence cannot be converted into a pipelined structure. If there is data dependence between data flows but no data dependence inside each of them, each such data flow is unrolled as a local pipeline. If there is no data dependence between the data flows in the graph, they are unrolled in parallel. In every case the result is converted into an operator space-time graph composed of operator units.

4) Timing constraint step: determine the total timing constraint from the user's specification requirements and the requirements of the target integrated circuit process, and label each operator unit in the operator space-time graph with a time. In addition, operator timing information can be extracted from the operator unit library and used to annotate the operator space-time graph, forming the objects of the timing constraint. The timing constraint is thereby broken down to each level of the operator space-time graph, so that each level can be constrained according to the characteristics of the data flow.

If the data flow structure is a parallel data flow, the total timing constraint is divided equally among the operator levels of the corresponding space-time graph, and the timing constraint of each operator level is divided equally among the operator units in that level. If the nodes in the data control flow are serial, the total of the basic timing units of the operator levels corresponding to those nodes is taken as the overall timing constraint, and each operator level is allocated a share equal to the timing of the computing operators on its longest computation path divided by the sum, over all operator levels, of the timing of the operator units on their longest computation paths.
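The two allocation rules above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patented tool: the function names `allocate_parallel` and `allocate_serial`, the use of integer time units, and the truncating division are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Parallel data flow: every operator level gets an equal share of the total
 * timing constraint (integer time units, remainder discarded). */
void allocate_parallel(long total, size_t levels, long *budget)
{
    assert(levels > 0);
    for (size_t i = 0; i < levels; i++)
        budget[i] = total / (long)levels;
}

/* Serial data flow: each level's share is proportional to the delay of its
 * longest computation path relative to the sum over all levels. */
void allocate_serial(long total, const long *longest_path, size_t levels,
                     long *budget)
{
    long sum = 0;
    for (size_t i = 0; i < levels; i++)
        sum += longest_path[i];
    assert(sum > 0);
    for (size_t i = 0; i < levels; i++)
        budget[i] = total * longest_path[i] / sum;
}
```

For example, with a total constraint of 100 time units and longest-path delays of 1, 3 and 6, the serial rule gives the three levels 10, 30 and 60 units.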

5) Space-time graph compression step: according to the time labels, perform spatial cluster compression on the operator space-time graph so that the overall algorithm execution time comes as close as possible to the total timing constraint. In this step, cluster compression is carried out in the operator space-time graph according to the time label of each sub-algorithm process. Reuse of storage-class and computing-class operators is increased by adding control-class and path-class operators, so that the completion time of each sub-algorithm process stays as close as possible to its time label while the number of computing-class and storage-class operators is reduced as far as possible.

If an algorithm description is laid out entirely in operators without any reuse, the algorithm completes in the shortest time, but at the cost of very large area and power consumption, which is not the aim of the design. The design goal is to minimize area and power consumption while meeting the user's timing requirements, so operators must be reused as much as possible. To do this, each operator is labeled with the execution time of its function, and operators are reused according to these execution times so that the overall algorithm execution time comes as close as possible to the required time; this is the time graph of the operators. The space graph defines operator area: when operators are clustered and mapped, clustering takes operator area into account, and operators with a smaller area are chosen for reuse. Both factors must be considered during operator clustering. The space-time graph is compressed, for example, as follows: find the computing-class operators with identical operation attributes and/or the storage-class operators with identical storage attributes in the operator space-time graph; then, according to the time labels, spatially merge the computing-class operators with identical operation attributes and/or the storage-class operators with identical storage attributes; then introduce control-class operators that generate the corresponding configuration instructions for the compressed computing-class and/or storage-class operators, thereby realizing reuse of the computing-class and/or storage-class operators.

Both the cluster compression step and the step of generating reconfigurable operator functional blocks can produce more than one result. When different functions call the same sub-function, the clustering results differ because the time constraints differ. The results therefore need to be optimized with respect to parameters such as time, area and power consumption. When the results are ranked by performance (execution time), the clustering result that only just satisfies the time constraint has the lowest hardware implementation cost. The space-time graph whose overall algorithm execution time is closest to the total timing constraint required by the integrated circuit algorithm is therefore selected as the optimized result of cluster compression.
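As a rough illustration of how merging operators with identical attributes reduces the operator count, the C sketch below treats each operator in the fully unrolled graph as a (kind, time step) pair and keeps, for every kind, only as many physical units as its busiest time step needs. The `OpInstance` type, the integer encoding of operator kinds and the array bounds are assumptions made for this example; the real flow also weighs area and power, which the sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_KINDS = 8, MAX_STEPS = 64 };

typedef struct {
    int kind;   /* operator attribute, e.g. 0 = shifter (hypothetical coding) */
    int step;   /* time label assigned in the timing constraint step */
} OpInstance;

/* Returns the number of physical operator units left after merging: for each
 * kind, one unit per operator that is active in that kind's busiest step. */
int units_after_compression(const OpInstance *ops, size_t n)
{
    int busy[MAX_KINDS][MAX_STEPS] = {{0}};
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        assert(ops[i].kind >= 0 && ops[i].kind < MAX_KINDS);
        assert(ops[i].step >= 0 && ops[i].step < MAX_STEPS);
        busy[ops[i].kind][ops[i].step]++;
    }
    for (int k = 0; k < MAX_KINDS; k++) {
        int peak = 0;
        for (int s = 0; s < MAX_STEPS; s++)
            if (busy[k][s] > peak)
                peak = busy[k][s];
        total += peak;
    }
    return total;
}
```

Four operators of kind 0 scheduled in four different time steps collapse into one unit, whereas two operators of kind 1 scheduled in the same step still need two units.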

6) Lower-level hardware mapping step: map the operator space-time graph after cluster compression to a hardware logic description composed of operator units.

In this embodiment, the bs_write_ue function from the x264 code implementing the H.264 standard is used as an example to illustrate the operator design flow of the entropy coder.

The bs_write_ue function performs unsigned Exp-Golomb coding. The parameter val is the value to be encoded; after Exp-Golomb coding it is written into the structure bs_t, starting from the most significant bit and proceeding to the least significant bit.

Function decomposition is performed first. Table 1 lists the inputs and outputs of the bs_write_ue function.
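To make the coding rule concrete, the following C sketch builds the unsigned Exp-Golomb codeword of a value as a string of bits, most significant bit first. It is an illustration only: the helper name `ue_codeword` and the string output are assumptions for this example and are not part of x264, which writes the bits directly into the bs_t stream.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Writes the unsigned Exp-Golomb codeword of val into out as '0'/'1'
 * characters, most significant bit first. out must hold at least 66 bytes.
 * Returns the codeword length in bits. */
int ue_codeword(uint32_t val, char *out)
{
    uint64_t code = (uint64_t)val + 1;   /* the value actually written */
    int bits = 0;                        /* number of significant bits in code */
    for (uint64_t t = code; t != 0; t >>= 1)
        bits++;

    int len = 2 * bits - 1;              /* (bits - 1) leading zeros, then code */
    assert(len <= 65);
    memset(out, '0', (size_t)(bits - 1));
    for (int i = 0; i < bits; i++)
        out[bits - 1 + i] = ((code >> (bits - 1 - i)) & 1) ? '1' : '0';
    out[len] = '\0';
    return len;
}
```

With this helper, the values 0, 1, 2 and 3 produce the codewords 1, 010, 011 and 00100.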

Signal name	Data type	Direction	Description
val	uint32_t	IN	Parameter value to be encoded
i_size	int	IN	Number of bits to be written
*s->p	uint8_t	IN	Target register to be written
p	uint8_t*	IN	bs_t structure: current position in the code stream
p_end	uint8_t*	IN	bs_t structure: end position of the code stream
i_left	int	IN	bs_t structure: number of bits still available in the target register
*s->p	uint8_t	OUT	Target register to be written
p	uint8_t*	OUT	bs_t structure: current position in the code stream
i_left	int	OUT	bs_t structure: number of bits still available in the target register

These input and output signals are identified as parameter objects. The operations in the function are also parsed to identify the execution objects. A function generally contains several kinds of statements, such as sequential statements, conditional (branch control) statements and loop statements. The bs_write_ue function code is as follows:

// Write loop: each iteration writes 1 bit.
while (i_count > 0)
{
    // Stop when the end of the code stream has been reached.
    if (s->p >= s->p_end)
    {
        break;
    }

    // Body of the write loop.
    i_count--;

    // Only 1 or 0 can be written. The condition shifts the bit to be written
    // down to the lowest position and ANDs it with 0x01.
    if ((i_bits >> i_count) & 0x01)
    {
        // Writing a 1: shift 1 to the target bit position and OR it into the target register.
        *s->p |= 1 << (s->i_left - 1);
    }
    else
    {
        // Writing a 0: shift 1 to the target bit position, invert it, and AND it with the target register.
        *s->p &= ~(1 << (s->i_left - 1));
    }

    // The bit has been written; update the remaining-bit count i_left.
    s->i_left--;

    // When i_left reaches 0 the current byte is full: advance p by 1 and reset i_left to 8.
    if (s->i_left == 0)
    {
        s->p++;
        s->i_left = 8;
    }
}

When the function is parsed, a loop statement can be unrolled according to its iteration count, giving as many copies of the loop body as there are iterations. Each loop body contains operation expressions, and the operation expressions share common parameter objects. By iterating the operation expressions of the loop bodies according to the relationships between the parameter objects, the operation of the original loop statement can be represented by a single new operation expression. The operation instructions in this new expression are mapped to processing blocks, and its parameter objects are mapped to storage nodes on the data flow. Loop instructions can be mapped to state machines, and branch control statements can be mapped to multiplexers (MUX).
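The idea of folding an unrolled loop into one expression can be illustrated with the bit-writing loop above. The C sketch below compares the bit-serial form with a single mask-and-merge expression for the simplified case in which all n bits fit into the current byte. The function names and the single-byte restriction are assumptions made for this example; they are not taken from x264 or from the patent's tool flow.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-serial form: one bit per iteration, as in the loop above.
 * Precondition: 1 <= n <= i_left <= 8, so the write stays inside one byte. */
uint8_t write_bits_loop(uint8_t byte, int i_left, int n, uint32_t bits)
{
    assert(n >= 1 && n <= i_left && i_left <= 8);
    while (n > 0) {
        n--;
        if ((bits >> n) & 0x01)
            byte |= (uint8_t)(1u << (i_left - 1));
        else
            byte &= (uint8_t)~(1u << (i_left - 1));
        i_left--;
    }
    return byte;
}

/* Unrolled form: the n iterations folded into one mask-and-merge expression. */
uint8_t write_bits_unrolled(uint8_t byte, int i_left, int n, uint32_t bits)
{
    assert(n >= 1 && n <= i_left && i_left <= 8);
    unsigned shift = (unsigned)(i_left - n);
    unsigned mask = ((1u << n) - 1u) << shift;
    return (uint8_t)((byte & ~mask) | ((bits << shift) & mask));
}

/* Returns 1 when both forms agree for every byte value, position and payload. */
int forms_agree(void)
{
    for (int byte = 0; byte < 256; byte++)
        for (int i_left = 1; i_left <= 8; i_left++)
            for (int n = 1; n <= i_left; n++)
                for (uint32_t bits = 0; bits < (1u << n); bits++)
                    if (write_bits_loop((uint8_t)byte, i_left, n, bits) !=
                        write_bits_unrolled((uint8_t)byte, i_left, n, bits))
                        return 0;
    return 1;
}
```

In hardware terms, the unrolled form corresponds to a shifter, a mask generator and an AND/OR stage, while the loop counter of the serial form becomes a small state machine.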

In the operator-based rapid design methodology, a space-time graph of configurable operators is obtained by mapping from the high-level-language algorithm description. The operators in the space-time graph are then spatially compressed by clustering: subject to the timing conditions, operators with similar functions and consistent control are clustered together, forming a preliminary operator-based ASIC architecture. Configurable parts of the operators that the target algorithm does not need are then removed, further optimizing parameters such as area, power consumption and performance, and finally configuration instructions are generated for the customized ASIC. This process avoids detailed study of the high-level-language algorithm, for example a C program; operators can be mapped directly from the attributes of the C algorithm, which greatly shortens design time.

Three, overview of the operator implementation

Referring to Fig. 1, entropy coding is the final step of the whole encoding process. It parses the reordered transform coefficients in reverse order, from back to front, and extracts the information about the residual coefficients: the number of non-zero residual coefficients (TotalCoeffs), the number of trailing coefficients (TrailingOnes), the number of zeros preceding the last non-zero coefficient (TotalZeros), the number of zeros preceding each non-zero coefficient (RunBefore), and the number of all zeros to the left of the corresponding non-zero coefficient (ZerosLeft). Once these parameters have been parsed they can be encoded, and the resulting code streams are finally combined and output in the corresponding order.
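A minimal C sketch of this backward parse is given below for three of the parameters. It assumes the usual CAVLC convention that TrailingOnes counts at most three consecutive coefficients of magnitude 1 at the high-frequency end; the `ResidualStats` type and the function name are hypothetical, and RunBefore and ZerosLeft are left out to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int total_coeffs;   /* TotalCoeffs: number of non-zero coefficients */
    int trailing_ones;  /* TrailingOnes: trailing +/-1 coefficients, at most 3 */
    int total_zeros;    /* TotalZeros: zeros before the last non-zero coefficient */
} ResidualStats;

/* coeff[0..n-1] holds one sub-block in zig-zag order; the scan runs backward. */
ResidualStats parse_residual(const int16_t *coeff, int n)
{
    ResidualStats st = {0, 0, 0};
    int still_trailing = 1;   /* cleared once a coefficient other than +/-1 is met */

    assert(n >= 1 && n <= 16);
    for (int i = n - 1; i >= 0; i--) {
        if (coeff[i] != 0) {
            st.total_coeffs++;
            if (still_trailing && (coeff[i] == 1 || coeff[i] == -1) &&
                st.trailing_ones < 3)
                st.trailing_ones++;
            else
                still_trailing = 0;
        } else if (st.total_coeffs > 0) {
            st.total_zeros++;   /* a zero that precedes an already seen non-zero */
        }
    }
    return st;
}
```

For the zig-zag sequence 0, 3, 0, 1, -1, -1, 0, 1 followed by zeros, the sketch reports five non-zero coefficients, three trailing ones and three zeros before the last non-zero coefficient.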

The coding functional blocks of the entropy coder comprise a universal variable-length coding (UVLC) functional block for coding syntax parameters, a context-adaptive variable-length coding (CAVLC) functional block for coding residual data, and a code stream splicing functional block that splices all the coded streams together. UVLC is mainly responsible for Exp-Golomb coding and some block codes; CAVLC mainly processes macroblock data, and internally consists mainly of look-up-table coding with some Exp-Golomb coding. These entropy coding functional blocks can be realized by aggregating a series of operator units, and the operator units can be invoked repeatedly to complete the whole entropy coding function. There are two Exp-Golomb coding functional blocks: signed and unsigned. There are three context-adaptive variable-length residual coefficient coding functional blocks: for sub-blocks with 16, 15 and 4 residual coefficients. There is one code stream splicing functional block, which splices and combines all the code streams.

The present invention mainly addresses entropy coding below the slice layer of the H.264 baseline profile. The coding mode is configurable, and the macroblock types intra 4×4, intra 16×16, inter 16×16 and P-SKIP are handled adaptively. The coding of these macroblock types is implemented by operator function groups composed of operator units, each completing the entropy coding for its macroblock type.

For CAVLC residual coding, the embodiment of the invention codes the residual coefficients of one 4×4 sub-block using a multi-stage pipeline to improve efficiency. The pipeline is divided into the following four stages:

1) Data buffering and reordering unit (pipeline stage 1).

2) Residual coefficient parsing unit (pipeline stage 2).

3) Look-up-table or computation coding unit (pipeline stage 3).

4) Code stream splicing unit (pipeline stage 4).

In the CAVLC entropy coder, a 4×4 sub-block contains 16, 15 or 4 valid transform coefficients. One set of hardware can serve all three cases by designing for the worst case of 16, that is, at most 16 residual coefficients are coded. All three modes are used in the general case, since several 4×4 or 2×2 sub-blocks must be processed in each macroblock. The same coding hardware is used for every macroblock type; only the number of coding iterations differs, and this is controlled by a counter.

In the embodiment of the invention, the CAVLC coding hardware codes the residual coefficients of one 4×4 sub-block at a time, with at most 16 residual coefficients; different zig-zag scans are used and the unused coefficient positions are set to zero. When there are 15 or 4 transform coefficients, the zig-zag scan masks off the highest 1 position or the highest 12 positions respectively, filling them with zeros. The advantage of this approach is that, since the CAVLC coding algorithm is the same for all coefficient counts, hardware resources are shared well: the same 4×4 coding hardware codes 16, 4 or 15 residual coefficients. The structure of its operator functional block is shown in Fig. 3.2.

The encoder has to code residual coefficients with 16, 4 or 15 valid transform values. Because zig-zag scanning always proceeds in reverse order from back to front, the residual coefficients can, for convenient fetching and scanning, be stored in advance in a memory or register file in an order and format that depend on the macroblock type. During coding, the residual data of one 4×4 sub-block are read out and first cached in 16 data registers, and each CAVLC coding pass works on the values in these 16 registers. The order and format of residual storage vary with the macroblock type.

The residual coefficient parsing unit receives 16 residual values from the data cache registers, one per clock cycle over 16 cycles, and performs a preliminary parse. A counter records the number of non-zero coefficients, and the non-zero coefficients are stored in order in a data register bank for use by later modules. The unit also records the position of each non-zero coefficient among the current 16 values and stores this sequence number in a data register bank for use by later modules.

The operator-based configurable entropy coding method of the invention, and the hardware-accelerated entropy coding device for the H.264 profile based on it, use hardware that is both regular and flexible. Adaptive entropy coding of different macroblock types is achieved through configuration, and different macroblock types are entropy coded without interfering with one another. This raises coding speed and also makes it easy to port and extend the coding modes.

In the CAVLC entropy coder, a single set of hardware codes the transform data of a 4×4 sub-block for every coefficient count, so hardware resources are shared across macroblock types. This saves encoder hardware area while remaining highly flexible, makes it easy to port the 4×4 residual coefficient coding to other standards, and further improves the efficiency of the hardware encoder.

Four, detailed description of the operator implementation

In the embodiment of the invention, the entropy coding function group is realized by aggregating a series of operator functional blocks, mainly the Exp-Golomb coding functional block (universal variable-length coding functional block), the context-adaptive variable-length residual coefficient coding functional block and the code stream splicing functional block. In the C code these functional blocks appear as functions, and the C code implements entropy coding by calling these main functions repeatedly.

Fig. 4.1 shows the mapping of the x264 C code to hardware, in which the x264_macroblock_write_cavlc function performs macroblock-level coding. The boxes on both sides are input and output signals; the memories they point to are not drawn. The hollow boxes in the middle are internal variables of the function. The function comprises the following logic:

L0 logic: performs the intermediate mapping for macroblock type coding

my_type_I16 = i_mb_i_offset + 1 + h->mb.i_intra16x16_pred_mode + h->mb.i_cbp_chroma * 4 + (h->mb.i_cbp_luma == 0 ? 0 : 12)

I4_cbp_to_golomb: performs the intermediate mapping for Golomb coding in I4 macroblocks

P4_cbp_to_golomb: performs the intermediate mapping for Golomb coding in P16 macroblocks

Mux logic: performs code stream splicing

bs_write_ue logic: unsigned Exp-Golomb coding

bs_write_se logic: signed Exp-Golomb coding

X264_residual_write_cavlc logic: context-adaptive variable-length residual coefficient coding

The logic blocks above are the operator functional blocks that make up the entropy coding function group; all of them are built from basic computing operators.

The structure of the operator functional block for the bs_write_ue function described above is shown in Fig. 4.2.
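The L0 expression listed above is a pure combinational mapping, which the following C sketch restates as a standalone function. The function name, the flattened argument list (in place of the fields of h->mb) and the argument ranges checked by the assertions are assumptions made for this example.

```c
#include <assert.h>

/* Intermediate macroblock-type value for an intra 16x16 macroblock, following
 * the L0 expression. The ranges checked below are assumed, not from the text. */
int mb_type_i16(int i_mb_i_offset, int pred_mode, int cbp_chroma, int cbp_luma)
{
    assert(pred_mode >= 0 && pred_mode <= 3);
    assert(cbp_chroma >= 0 && cbp_chroma <= 2);
    return i_mb_i_offset + 1 + pred_mode + cbp_chroma * 4 +
           (cbp_luma == 0 ? 0 : 12);
}
```

Expressed with operators, this logic needs only adders, a constant shift for the multiplication by 4, and a two-input multiplexer for the conditional term; the result is then passed to the bs_write_ue logic.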

As shown in Fig. 3.1, variable-length coding is divided into two independent units, UVLC and CAVLC. The UVLC unit is mainly responsible for Exp-Golomb coding and some block codes; the CAVLC unit processes macroblock data, and internally consists mainly of look-up-table coding with some Exp-Golomb coding. For UVLC, the parameter values arriving on the syntax parameter bus are first held in a cache register (Syntax element_cache) and are UVLC coded directly after syntax analysis, which mainly comprises computing the macroblock type, selecting the coding mode and so on. For CAVLC, the coding object is a 4×4 sub-block. Residual data arriving on the data bus are first buffered in a register (MBpixel_cache) in units of a fixed size (one 4×4 sub-block at a time) and are CAVLC coded after residual data analysis. This analysis scans the residual data of a 4×4 sub-block one by one to extract the information needed for coding: the non-zero coefficients, the trailing coefficients, the non-zero coefficient amplitudes, the total number of zeros preceding the last non-zero coefficient, and the number of zeros preceding each non-zero coefficient. Finally, a state machine combines the two code streams and outputs them.

H.264 uses variable-length entropy coding, which greatly reduces the size of the code stream. Macroblock header coding covers the macroblock type, the I_4×4 luma prediction modes, the chroma prediction mode, the coded block pattern (CBP), the quantization parameter difference (QP) and so on. These syntax elements, that is, the syntax parameters, are coded with Exp-Golomb coding (UVLC). The residual data are entropy coded with CAVLC: according to the mode, the residual data that have been through DCT and quantization are fetched from memory in a given format and coded, and the resulting code stream is stored in a designated region of memory.

The syntax elements to be coded are first loaded into the internal registers of the VLC, and a coding mode is selected according to the frame type and macroblock type. Taking entropy coding for the baseline profile as an example, there are four prediction modes: intra 4×4, intra 16×16, inter 16×16 and P-SKIP, and the syntax parameter coding process differs for each. Once the coding mode has been determined, the encoder reads the corresponding residual data from memory.

For the P-SKIP macroblock type, the register i-skip, which records the number of macroblocks of this type, is incremented by 1, and coding of the macroblock ends (in effect no coding is performed; the macroblock is only recorded).

For the intra 4×4 macroblock type, the syntax parameters are UVLC coded in turn: the macroblock type, the 16 luma prediction modes, the chroma prediction mode, the coded block pattern of the 4×4 macroblock, the quantization parameter QP (if the CBP value is 0, that is, all residual data are 0, the QP parameter need not be coded) and so on. The residual data are then CAVLC coded.

For the intra 16×16 macroblock type, the macroblock type, the chroma prediction mode and the quantization parameter QP are UVLC coded in turn, and the residual data are then CAVLC coded.

For the inter 16×16 macroblock type, the macroblock type, the coded block pattern of the 16×16 macroblock and the quantization parameter QP (if the CBP value is 0, that is, all residual coefficients are 0, the QP parameter need not be coded) are UVLC coded in turn, and the residual data are then CAVLC coded.

Each of the coding modes above performs macroblock coding below the slice layer, and each mode forms an independent coding unit. In the ASIC implementation, each unit has its own state machine to control the coding flow and its own code stream splicing functional block, which merges the UVLC and CAVLC code streams. A multiplexer finally selects which coded and merged code stream to output. Because CAVLC has many coding parameters and forms a unit of its own, the CAVLC unit also has a state machine and a code stream splicing functional block to assemble the CAVLC code stream.

As shown in Fig. 6, a macroblock has only one macroblock header, so the macroblock header parameters are UVLC coded only once. The residual data, by contrast, are coded per 4×4 or 2×2 sub-block (all of which share the 4×4 sub-block CAVLC coding hardware), so the residual code stream of a macroblock is spliced from 26 or 27 CAVLC passes. When the macroblock type is I16×16, i = 27 and CAVLC loops 27 times; when the macroblock type is I4×4 or P16×16, i = 26 and CAVLC loops 26 times.
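The loop counts of 26 and 27 can be reproduced with a short C sketch of the counter logic. The text states only the totals; the breakdown used below (16 luma blocks, 2 chroma DC blocks and 8 chroma AC blocks, plus one luma DC block for intra 16×16) is an assumption based on a 4:2:0 macroblock layout, and the enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum { MB_I4x4, MB_I16x16, MB_P16x16, MB_P_SKIP } MbType;

/* Number of CAVLC passes through the shared sub-block hardware per macroblock.
 * Assumed breakdown: 16 luma blocks, 2 chroma DC blocks, 8 chroma AC blocks,
 * plus one luma DC block for intra 16x16. */
int cavlc_passes(MbType type)
{
    const int luma = 16, chroma_dc = 2, chroma_ac = 8;
    switch (type) {
    case MB_I16x16: return 1 + luma + chroma_dc + chroma_ac;  /* 27 */
    case MB_I4x4:
    case MB_P16x16: return luma + chroma_dc + chroma_ac;      /* 26 */
    case MB_P_SKIP: return 0;                                 /* nothing is coded */
    }
    assert(!"unknown macroblock type");
    return -1;
}
```

In the hardware, this value would simply be the terminal count loaded into the loop counter when the macroblock type is known.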

For the residual coefficients of one 4×4 sub-block, the CAVLC coding hardware works as follows. After the sub-block is input, it is scanned in reverse zig-zag order, and the coefficients are examined one by one in serial fashion. When the sub-block scan is complete, coding of CoeffToken, TotalZeros and RunBefore can begin. The codes are obtained by table look-up: the coding tables are stored, for example, in ROM, the address into the coding table is computed, and the code value is read from the corresponding table at that address. Computation and table look-up are interchangeable: a complex look-up can be replaced by computation, and some complex computations can be replaced by look-up. Interaction between the two should be kept to a minimum, because the look-up period is fixed whereas computation is affected by the number of pipeline stages. The computation unit and the look-up unit can therefore be placed together in the design, which simplifies the state machine.

To exploit the speed of hardware processing, CAVLC codes the residual coefficients of one 4×4 sub-block using a multi-stage pipeline, as shown in Fig. 7. Following the coding flow, it is divided into the following four stages:

1) Data buffering and reordering unit (pipeline stage 1).

As shown in Figs. 8 and 9, after the entropy coder starts, it first reads the residual data of one 4×4 sub-block from the corresponding memory according to the macroblock type and places them in registers 1 to 16. The luma DC coefficients of I16×16 and the luma residual coefficients of I4×4 and P16×16 are stored directly in registers 1 to 16. For all macroblock types, the chroma DC coefficients are placed in registers 1 to 4 and registers 5 to 16 are set to zero. For all macroblock types, the valid luma AC and chroma AC residual coefficients are placed in registers 1 to 15 and register 16 is set to zero. Once the data have been read according to the coding order and the storage format of the residual coefficients in the registers, a multiplexer feeds the register values, from register 16 down to register 1, to the subsequent module for parsing.
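The register-loading rule of this stage can be sketched in C as follows. The function name and the array model of the register file (regs[0] for register 1 up to regs[15] for register 16) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Copies n_valid coefficients (16, 15 or 4) into registers 1..n_valid and
 * zero-fills the rest, so one 16-entry datapath serves all three cases. */
void load_registers(const int16_t *src, int n_valid, int16_t regs[16])
{
    assert(n_valid == 16 || n_valid == 15 || n_valid == 4);
    for (int i = 0; i < 16; i++)
        regs[i] = (i < n_valid) ? src[i] : 0;
}
```

Because the unused registers always hold zero, the later parsing stage needs no knowledge of how many coefficients were valid; it scans all 16 registers in every case.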

Because the encoder has to code residual coefficients with 16, 4 or 15 valid transform values, and because zig-zag scanning always proceeds in reverse order from back to front (the positions left by extracted DC coefficients can be set to zero), the residual coefficients can, for convenient fetching and scanning, be stored in advance in a memory or register file in an order and format that depend on the macroblock type. During coding, the residual data of one 4×4 sub-block are read out and first cached in 16 data registers, and each CAVLC coding pass works on the values in these 16 registers. The order and format of residual storage vary with the macroblock type.

Accordingly, residual data for the I_16×16 macroblock type can be stored in the format shown in Fig. 10. For the luma DC coefficients, the DC coefficient of each 4×4 sub-block in the macroblock is extracted, Hadamard transformed and stored at the start of the memory; when CAVLC starts, it codes these first and then CAVLC codes the sub-blocks one by one starting from sequence number 0. When each 4×4 sub-block is scanned for coding, the residual information is parsed by scanning in reverse order from data16 to data1. For the AC coefficients of each 4×4 sub-block, the data stored in data1 are irrelevant; when the block is fetched into the registers for coding, this value is set to zero (the value in the last register, which is output and scanned first, is set to zero), so the block can still be coded with the CAVLC algorithm without affecting the parsing of information between residual data. For the chroma DC coefficients, the DC coefficient of each 4×4 sub-block in the chroma blocks is extracted, Hadamard transformed and stored after the luma residual coefficients. Only 4 valid transformed DC coefficients are stored; when they are fetched into the registers for coding they are placed in the first 4 registers, and the remaining 12 registers are set to zero.

In like manner, can be for I_4*4 and P_16*16 macro block (mb) type by form storage as shown in figure 11.Compare with I_16*16, it does not have brightness DC coefficient Hadamard transform, and the conversion coefficient storage format of colourity is identical with I_16*16.

2) Residual coefficient parsing unit (second pipeline stage).

As shown in Figure 12, on initialization or reset the control logic in the residual coefficient parsing unit data_parse sets the nonzero coefficient counter tc_n to 0 and the coefficient sequence register num to 16. It then receives 16 residual data values from the data buffering and reordering unit, one per clock cycle over 16 cycles, and performs a preliminary parse on them. Whenever a value is nonzero, tc_n is incremented by 1, and its final value is output as the total number of nonzero coefficients, TotalCoeffs. At the same time, each nonzero coefficient passes through a gating switch and is stored in turn in the data register bank mem_d. The position of the nonzero coefficient among the current 16 values is also recorded: it is obtained by decrementing num from 16 by 1 for each value received, and this sequence number is stored in the data register bank mem_n for use by later modules.

For example, as shown in Figure 13, the first row is the 16 input transform coefficients, which are parsed in reverse order from right to left. The nonzero coefficients detected in turn are 1, -1, 6, 4, 3, 1, 2, their sequence numbers are 13, 11, 9, 7, 6, 3, 1, and the sequence contains 7 nonzero coefficients.
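The preliminary parse can be sketched in software as follows. This is only an illustrative Python model, not the hardware described here: the function name `preliminary_parse` is hypothetical, and the 16 input values are reconstructed from the nonzero coefficients and sequence numbers listed for Figure 13, with every other position assumed to be zero.

```python
def preliminary_parse(block):
    """Reverse-scan 16 coefficients, returning (TotalCoeffs, mem_d, mem_n).

    block[0] corresponds to data1 and block[15] to data16.
    """
    assert len(block) == 16
    tc_n = 0          # nonzero coefficient counter, reset to 0
    num = 16          # coefficient sequence register, reset to 16
    mem_d = []        # nonzero coefficient values, in detection order
    mem_n = []        # sequence numbers of those coefficients
    for value in reversed(block):   # data16 is examined first
        if value != 0:
            tc_n += 1
            mem_d.append(value)
            mem_n.append(num)
        num -= 1                    # one value consumed per cycle
    return tc_n, mem_d, mem_n


# Input rebuilt from the Figure 13 example (positions 1..16, zeros assumed elsewhere).
figure13_block = [2, 0, 1, 0, 0, 3, 4, 0, 6, 0, -1, 0, 1, 0, 0, 0]
total_coeffs, mem_d, mem_n = preliminary_parse(figure13_block)
print(total_coeffs, mem_d, mem_n)
```

The loop mirrors the 16-cycle behaviour: one value is consumed per iteration, and `num` always holds the sequence number of the value currently being examined.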

Note that counting the nonzero coefficients and storing them with their sequence numbers is only a preliminary parse of the residual data. Further parsing is needed to obtain the number of trailing ones (TrailingOnes) of the 4 × 4 residual block being coded, the number of zeros preceding the last nonzero coefficient (TotalZeros), the number of zeros immediately preceding each nonzero coefficient (RunBefore), and the total number of zeros to the left of each nonzero coefficient (ZerosLeft).
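As a rough illustration of this further parsing, the Python sketch below derives the four quantities from the stored nonzero values and sequence numbers. The function name `further_parse` and the list-based interface are assumptions made for the example; the limit of three trailing ones follows the usual CAVLC rule rather than anything stated in this text.

```python
def further_parse(mem_d, mem_n):
    """Derive TrailingOnes, TotalZeros, RunBefore and ZerosLeft.

    mem_d: nonzero coefficients in reverse-scan (detection) order.
    mem_n: their sequence numbers (1..16), in the same order.
    """
    # TrailingOnes: leading run of +/-1 values in detection order, capped at 3.
    trailing_ones = 0
    for value in mem_d:
        if abs(value) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    total_coeffs = len(mem_d)
    # Zeros located before the last (highest-numbered) nonzero coefficient.
    total_zeros = mem_n[0] - total_coeffs if mem_n else 0

    run_before = []   # zeros immediately before each nonzero coefficient
    zeros_left = []   # all zeros to the left of each nonzero coefficient
    remaining = total_zeros
    for i, pos in enumerate(mem_n):
        next_pos = mem_n[i + 1] if i + 1 < len(mem_n) else 0
        run = pos - next_pos - 1
        zeros_left.append(remaining)
        run_before.append(run)
        remaining -= run
    return trailing_ones, total_zeros, run_before, zeros_left


# Values taken from the Figure 13 example in the text.
print(further_parse([1, -1, 6, 4, 3, 1, 2], [13, 11, 9, 7, 6, 3, 1]))
```

For the Figure 13 values this yields two trailing ones (the 1 and -1 detected first) and six zeros before the last nonzero coefficient, which equals the sum of the individual runs.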

3) Look-up table or calculation coding unit (third pipeline stage). This stage covers look-up-table coding of TotalCoeffs, TotalZeros and the zero runs. The amplitudes (Levels) of the nonzero coefficients other than the trailing ones are coded by first calculating the prefix level_prefix and the suffix level_suffix and then combining the codes of these two parameters, while for TrailingOnes the signs can be coded directly. As previously mentioned,

4) Code stream splicing unit (fourth pipeline stage), which splices the codewords input from the third stage into the code stream in the corresponding order.
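The calculation of level_prefix and level_suffix in the third stage and the splicing in the fourth stage can be sketched together in Python. This is a simplified model and not the circuit of this invention: the level rule follows the commonly documented H.264 CAVLC procedure as understood here, escape codes with level_prefix above 15 are omitted, the table look-ups for TotalCoeffs, TotalZeros and the zero runs are left out, and the names `encode_levels`, `level_codewords` and `splice` are hypothetical.

```python
def encode_levels(levels, total_coeffs, trailing_ones):
    """Return (level_prefix, level_suffix, suffix_bits) for each level.

    levels: nonzero coefficients after the trailing ones, in reverse-scan order.
    """
    suffix_length = 1 if (total_coeffs > 10 and trailing_ones < 3) else 0
    out = []
    for i, level in enumerate(levels):
        level_code = 2 * level - 2 if level > 0 else -2 * level - 1
        if i == 0 and trailing_ones < 3:
            level_code -= 2          # first level cannot be +/-1 in this case
        if suffix_length == 0:
            if level_code < 14:
                prefix, suffix, bits = level_code, 0, 0
            elif level_code < 30:
                prefix, suffix, bits = 14, level_code - 14, 4
            else:
                prefix, suffix, bits = 15, level_code - 30, 12
        elif level_code < (15 << suffix_length):
            prefix = level_code >> suffix_length
            suffix = level_code & ((1 << suffix_length) - 1)
            bits = suffix_length
        else:
            prefix, suffix, bits = 15, level_code - (15 << suffix_length), 12
        out.append((prefix, suffix, bits))
        # Adapt the suffix length for the next level.
        if suffix_length == 0:
            suffix_length = 1
        if abs(level) > (3 << (suffix_length - 1)) and suffix_length < 6:
            suffix_length += 1
    return out


def level_codewords(parts):
    """Turn (prefix, suffix, bits) into (value, length) codewords:
    `prefix` zeros, a single 1, then `bits` suffix bits."""
    return [((1 << bits) | suffix, prefix + 1 + bits) for prefix, suffix, bits in parts]


def splice(codewords):
    """Concatenate (value, length) codewords in order; zero-pad to a byte boundary."""
    acc, nbits = 0, 0
    for value, length in codewords:
        acc = (acc << length) | (value & ((1 << length) - 1))
        nbits += length
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big"), nbits


# Levels from the Figure 13 example after its two trailing ones (1 and -1).
parts = encode_levels([6, 4, 3, 1, 2], total_coeffs=7, trailing_ones=2)
stream, nbits = splice(level_codewords(parts))
print(parts, stream.hex(), nbits)
```

In a complete coder the spliced stream would also carry the table-coded TotalCoeffs and TrailingOnes token, the trailing-one signs, TotalZeros and the zero runs, each appended by the same splicing step.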

A sub-4 × 4 block contains 16, 15 or 4 valid transform coefficients. For example, the DC coefficients DC_L after the Hadamard transform of the luma coefficients in I16 × 16 mode, the coefficients DC_AC_L after the luma DCT in I4 × 4 mode, and the coefficients DC_AC_L after the luma DCT in inter 16 × 16 mode each have 16 valid transform coefficients; the AC coefficients AC_L remaining after the Hadamard transform of the luma coefficients in I16 × 16 mode, and the AC coefficients AC_C in chroma 8 × 8 mode (intra & inter), have 15; and the DC coefficients DC_C after the Hadamard transform of the residual coefficients in chroma 8 × 8 mode (intra & inter) have 4.

In the present invention, these three cases can be handled by a single set of hardware by adjusting the positions of the residual coefficients and zeroing the unused ones. The 16-coefficient case is taken as the worst case, that is, at most 16 transform coefficients need to be coded. When there are 15 or 4 transform coefficients, the zig-zag scan masks off the upper 1 or upper 12 positions respectively and fills them with zeros. The advantage is that, with the CAVLC coding algorithm unchanged for all the transform coefficients, the parsing of the subsequent residual information is not affected, so hardware resources are shared without affecting the coding result.
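A minimal Python sketch of the zero-fill idea follows. It assumes that the valid coefficients are loaded into the lowest-numbered registers and that the remaining upper registers are cleared; the function names `load_registers` and `coeff_stats` are illustrative and do not come from this text. It checks that the appended zeros change neither the nonzero count nor the number of zeros before the last nonzero coefficient.

```python
def load_registers(valid):
    """Place 16, 15 or 4 valid coefficients in a 16-entry register image,
    zero-filling the upper 0, 1 or 12 entries."""
    assert len(valid) in (16, 15, 4)
    return list(valid) + [0] * (16 - len(valid))


def coeff_stats(block):
    """Return (TotalCoeffs, TotalZeros) for a block of any length."""
    positions = [i for i, v in enumerate(block, start=1) if v != 0]
    total_coeffs = len(positions)
    total_zeros = positions[-1] - total_coeffs if positions else 0
    return total_coeffs, total_zeros


# Hypothetical chroma DC block with 4 valid coefficients.
chroma_dc = [5, 0, -1, 0]
registers = load_registers(chroma_dc)
print(registers)
print(coeff_stats(chroma_dc), coeff_stats(registers))
```

Because the padding sits at the high end of the scan, the reverse scan passes over it before reaching any valid coefficient, so the same parsing logic serves all three block sizes.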

Depending on the sub-4 × 4 block type, a different zig-zag scan is used and the corresponding coefficients are set to zero (the values of the coefficients in the grey part are then irrelevant), so that the same set of 4 × 4 coding hardware can code blocks of 16, 4 and 15 residual coefficients.

As shown in Figure 14, the residual luma coefficients of I4 × 4 and P16 × 16 and the residual luma DC coefficients of I16 × 16 (that is, I4 × 4_L, P16 × 16_L and I16 × 16_L_DC) use the reverse scan of pattern a, scanning from 16 to 1; the residual chroma AC coefficients of I4 × 4, P16 × 16 and I16 × 16 (that is, I4 × 4_C_AC, P16 × 16_C_AC and I16 × 16_C_AC) use the reverse scan of pattern b, scanning from 16 to 1; and the residual chroma DC coefficients of I4 × 4, P16 × 16 and I16 × 16 (that is, I4 × 4_C_DC, P16 × 16_C_DC and I16 × 16_C_DC) use the reverse scan of pattern c, scanning from 16 to 1. The boxes in the figure that carry no number are set to zero.

CAVLC thus codes blocks of 16, 4 and 15 residual coefficients with the same set of 4 × 4 coding hardware. All three patterns may be used. In general, several sub-4 × 4 or 2 × 2 blocks have to be processed within one macroblock, and the number of coding loops differs between macroblock types.

As shown in Figure 15, an I16 × 16 macroblock requires, in turn, 1 pass for the DC_L_16 coefficients, 16 passes for the AC_L_15 coefficients, 2 passes for the DC_C_4 coefficients and 8 passes for the AC_C_15 coefficients, so CAVLC loops 27 times in total. An I4 × 4 or P16 × 16 macroblock requires, in turn, 16 passes for the L_16 coefficients, 2 passes for the DC_C_4 coefficients and 8 passes for the AC_C_15 coefficients, so CAVLC loops 26 times in total. Once the corresponding mode has been selected, a counter is incremented by 1 per pass, and coding of the macroblock is complete when the loop count is reached. When the corresponding CBP (coded block pattern, coded_block_pattern) value is 0, the counter is instead advanced by an integer multiple of 4.
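The loop counts can be checked with a small Python table. The dictionary layout and the function name `loop_count` are assumptions made for illustration; the pass counts themselves are those given for Figure 15, and the CBP-dependent skipping is not modelled.

```python
# Passes per coefficient group, in coding order, for each macroblock type.
PASSES = {
    "I16x16": [("DC_L_16", 1), ("AC_L_15", 16), ("DC_C_4", 2), ("AC_C_15", 8)],
    "I4x4":   [("L_16", 16), ("DC_C_4", 2), ("AC_C_15", 8)],
    "P16x16": [("L_16", 16), ("DC_C_4", 2), ("AC_C_15", 8)],
}


def loop_count(mb_type):
    """Total number of CAVLC passes needed for one macroblock of this type."""
    return sum(passes for _, passes in PASSES[mb_type])


for mb_type in PASSES:
    print(mb_type, loop_count(mb_type))
```

A counter that starts at zero and is incremented once per pass would therefore finish an I16 × 16 macroblock at 27 and an I4 × 4 or P16 × 16 macroblock at 26.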

As described above, the encoder is configured in advance so that it works in a different mode according to the macroblock type and completes the coding of one macroblock.

In summary, the present invention uses operator units to implement the entropy coder for a video encoder, which makes the design fast.

The above is a further detailed description of the present invention in connection with specific embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims

1. A method for implementing an entropy coder for a video encoder, characterized by comprising:

2. the method for claim 1 is characterized in that, each encoding function piece of the entropy coder of the video encoder of high level language arthmetic statement is mapped to the hardware logic that is made of the operator unit describes and comprise the steps:

3. The method of claim 2, characterized in that the operator units in the operator unit library comprise at least one of computation-class operators, storage-class operators, control-class operators, path-class operators and clock-class operators.

4. The method of claim 3, characterized in that the computation-class operator is a reconfigurable operator, each computation-class operator being configured with control bits and a plurality of operations, the control bits being used to select one of the operations; the path-class operator is used to link storage-class operators and computation-class operators to form a data flow architecture, and its implementations include multiplexers, crossbar switches and buses; the control-class operator is used to drive the flow of the data stream, its implementations include counters, state machines and microinstruction words, and its control outputs include dynamic reconfiguration information for the computation-class operators, configuration information for the storage-class operators and configuration information for the path-class operators; and the storage-class operator is used to convert between different data storage formats and computation formats.

5. The method of claim 3, characterized in that the execution objects comprise operation commands and/or control commands, and the parameter objects comprise at least one of input data, output data and intermediate data; the operation commands are mapped to computation-class operators, the control commands are mapped to control-class operators or path-class operators, and the parameter objects are mapped to storage-class operators.

6. the method for claim 1, it is characterized in that described encoding function piece comprises the unified Variable Length Code functional block of the grammer parameter that is used to encode, the code stream splicing functional block that is used for the CAVLC functional block of coded residual data and splices all encoding code streams.

7. An entropy coder for a video encoder, characterized by comprising a universal variable length coding functional block for coding syntax parameters, a CAVLC functional block for coding residual data, and a code stream splicing functional block for splicing all the coded streams, each of the functional blocks being composed of operator units.

8. The entropy coder of claim 7, characterized in that the context-adaptive variable length coding (CAVLC) functional block comprises a data buffering and reordering unit, a residual coefficient parsing unit, and a look-up table or calculation coding unit connected in sequence.

9. The entropy coder of claim 8, characterized in that the data buffering and reordering unit, the residual coefficient parsing unit and the look-up table or calculation coding unit are arranged as pipeline stages.

10. The entropy coder of claim 7, characterized in that the residual coefficient coding operator function group of the CAVLC functional block is an operator function group that implements residual coefficient coding of a secondary sub-4*4 block.