CN102088603B - Entropy coder for video coder and implementation method thereof - Google Patents


Info

Publication number
CN102088603B
Authority
CN
China
Prior art keywords
operator
class
data
computing
entropy coder
Prior art date
Legal status
Expired - Fee Related
Application number
CN2010106200287A
Other languages
Chinese (zh)
Other versions
CN102088603A (en)
Inventor
陈奎林
王新安
胡子一
张兴
Current Assignee
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School
Priority to CN2010106200287A
Publication of CN102088603A
Application granted
Publication of CN102088603B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses an entropy coder for a video encoder and an implementation method thereof. The implementation method comprises the following steps: mapping each coding functional block of the video encoder's entropy coder, each described as an algorithm in a high-level programming language, into a hardware logic description composed of operator units; and generating the hardware integrated circuit of the entropy coder from that hardware logic description. The invention accelerates the integrated circuit design of the entropy coder.

Description

Entropy coder for a video encoder and implementation method thereof
Technical field
The present invention relates to an entropy coder for a video encoder and an implementation method thereof.
Background art
Integrated circuit (IC) design methodology is the methodology that guides circuit design. In current IC design, especially large-scale IC design, the usual design flow is as follows: a system engineer proposes a system functional architecture according to the functional requirements; a software engineer describes that architecture in a high-level programming language (such as Java, C, C++ or Matlab), producing a high-level-language algorithm description of the IC; an IC design engineer then uses a hardware description language (such as Verilog or VHDL) to synthesize the algorithm description into a netlist file; finally, dedicated hardware synthesis tools perform placement and routing to obtain the IC layout.
In the IC industry, design speed has become an important consideration in design methodology. When an IC manufacturer or integrated-system supplier weighs the basic issues of commercial operation (such as product volume, design and test costs, non-recurring engineering (NRE) costs, integration and reuse of customer IP, reliability and maintainability, and production-process complexity), shortening time to market, improving product performance and reducing manufacturing risk become the key considerations in choosing a design technique.
As IC fabrication moves to 45 nm and below, the gap by which IC design capability lags behind process technology keeps widening. ASIC (Application Specific Integrated Circuit) design for dataflow algorithms, especially digital-media ASICs such as video encoder ICs, suffers from long design cycles, poor flexibility and poor scalability, and struggles to meet the demands of fast time to market and continual functional upgrades; the design process therefore needs to be improved.
Summary of the invention
The main technical problem to be solved by the present invention is to provide an entropy coder for a video encoder and an implementation method thereof that accelerate the IC design of the entropy coder.
To solve the above technical problem, the present invention adopts the following technical solution:
An implementation method of an entropy coder for a video encoder comprises:
mapping each coding functional block of the video encoder's entropy coder, described as an algorithm in a high-level language, into a hardware logic description composed of operator units; and generating the hardware integrated circuit of the entropy coder from the hardware logic description composed of the operator units.
In the method according to an embodiment of the present invention, mapping each coding functional block of the entropy coder, described as an algorithm in a high-level language, into a hardware logic description composed of operator units comprises the following steps:
Program analysis step: read the high-level-language algorithm program of each coding functional block of the entropy coder, and identify, according to the rules of that language, the execution objects and parameter objects to be mapped;
Data-control flow graph generation step: map the identified execution objects and parameter objects to corresponding nodes in a data-control flow graph describing the entropy coder algorithm;
Operator structure graph generation step: according to the function performed at each node of the data-control flow graph, fetch at least one operator unit with the corresponding function from the operator cell library, and convert the data-control flow graph into an operator space-time graph formed by operator units;
Timing constraint step: determine an overall timing constraint from the user specification and the requirements of the target IC process, label each operator unit in the operator space-time graph with a time, and apply timing constraints to each level of the operator space-time graph;
Space-time graph compression step: based on the time labels, cluster and compress the operator space-time graph spatially so that the overall algorithm execution time approaches the overall timing constraint;
Hardware mapping step: map the clustered and compressed operator space-time graph into a hardware logic description composed of operator units.
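The six steps above can be illustrated with a toy model. The Python sketch below is a hypothetical illustration, not the patented tool: the node kinds, the `OPERATOR_LIBRARY` table, and the reading of "spatial clustering" as letting same-class nodes that fire in different time steps share one physical operator unit are all assumptions. Time labels are assigned by as-soon-as-possible (ASAP) levelling of an acyclic graph, and the sketch only checks the overall timing constraint rather than tuning the schedule toward it.

```python
from collections import defaultdict

# Hypothetical operator cell library: node kind -> operator class.
OPERATOR_LIBRARY = {
    "add": "AU", "shift": "AU", "abs_diff": "AU",  # computing-class
    "load": "MU", "store": "MU",                   # storage-class
    "select": "LU",                                # path-class
    "branch": "CU",                                # control-class
}


def asap_levels(nodes, edges):
    """Time labeling: a node's level is 1 + the largest level of its predecessors."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    level = {}

    def visit(name):
        if name not in level:
            level[name] = 1 + max((visit(p) for p in preds[name]), default=0)
        return level[name]

    for name in nodes:
        visit(name)
    return level


def map_to_operators(nodes, edges, time_budget):
    """nodes: {name: kind}; edges: [(src, dst)]; time_budget: allowed levels."""
    level = asap_levels(nodes, edges)
    depth = max(level.values())
    if depth > time_budget:
        raise ValueError(f"critical path {depth} exceeds budget {time_budget}")
    # Spatial clustering (assumed): same-class nodes in different time steps
    # share one unit, so each class needs as many units as its busiest step.
    per_step = defaultdict(lambda: defaultdict(int))
    for name, kind in nodes.items():
        per_step[level[name]][OPERATOR_LIBRARY[kind]] += 1
    units = defaultdict(int)
    for counts in per_step.values():
        for cls, k in counts.items():
            units[cls] = max(units[cls], k)
    return level, dict(units)


# Toy graph: |a - b| followed by accumulation, as in a SAD-style kernel.
nodes = {"ld_a": "load", "ld_b": "load", "sad": "abs_diff", "acc": "add", "st": "store"}
edges = [("ld_a", "sad"), ("ld_b", "sad"), ("sad", "acc"), ("acc", "st")]
levels, units = map_to_operators(nodes, edges, time_budget=4)
print(levels["st"], units)  # 4 {'MU': 2, 'AU': 1}
```

Here the two loads run in the same time step and need two storage units, while the subtract-absolute and the accumulate run in different steps and can share one computing unit.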
In the method according to an embodiment of the present invention, the operator units in the operator cell library include at least one of computing-class operators, storage-class operators, control-class operators, path-class operators and clock-class operators.
In the method according to an embodiment of the present invention, the computing-class operator is a reconfigurable operator: one computing-class operator is configured with a control bit and multiple operations, and the control bit selects one of those operations. The path-class operator links storage-class operators and computing-class operators to form a dataflow architecture, and may be implemented as a multiplexer, a crossbar switch or a bus. The control-class operator drives the flow of data; it may be implemented as a counter, a state machine or a microinstruction word, and its output includes the dynamic reconfiguration information of computing-class operators, the configuration information of storage-class operators and the configuration information of path-class operators. The storage-class operator converts data between different storage formats and access modes.
In the method according to an embodiment of the present invention, the execution objects include operation instructions and/or control instructions, and the parameter objects include at least one of input data, output data and intermediate data. Operation instructions are mapped to computing-class operators, control instructions are mapped to control-class or path-class operators, and parameter objects are mapped to storage-class operators.
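The mapping rule above amounts to a lookup from object kind to operator class. In this minimal Python sketch the kind names are hypothetical, and the split of control instructions into "route" (mapped to a path-class operator) and "sequence" (mapped to a control-class operator) is an assumption: the text only says a control instruction maps to one or the other.

```python
from enum import Enum


class OperatorClass(Enum):
    AU = "computing-class"
    MU = "storage-class"
    LU = "path-class"
    CU = "control-class"


# Object kinds found during program analysis -> operator class.
# The "route"/"sequence" split of control instructions is an assumption.
MAPPING = {
    "operation": OperatorClass.AU,         # +, -, <<, |a-b|, ...
    "control.route": OperatorClass.LU,     # selecting which unit feeds which
    "control.sequence": OperatorClass.CU,  # loops, branches, state changes
    "input": OperatorClass.MU,
    "output": OperatorClass.MU,
    "intermediate": OperatorClass.MU,
}


def map_object(kind: str) -> OperatorClass:
    """Return the operator class a program object is mapped to."""
    try:
        return MAPPING[kind]
    except KeyError:
        raise ValueError(f"unknown object kind: {kind!r}") from None


print(map_object("operation").value)      # computing-class
print(map_object("control.route").value)  # path-class
```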
In the method according to an embodiment of the present invention, the coding functional blocks include a unified variable-length coding (UVLC) functional block for coding syntax parameters, a CAVLC functional block for coding residual data, and a bitstream concatenation functional block for concatenating all coded bitstreams.
The present invention also provides an entropy coder for a video encoder, comprising a UVLC functional block for coding syntax parameters, a CAVLC functional block for coding residual data, and a bitstream concatenation functional block for concatenating all coded bitstreams, each functional block being composed of operator units.
In the entropy coder according to an embodiment of the present invention, the CAVLC functional block comprises, connected in sequence, a data buffering and reordering unit, a residual coefficient parsing unit, and a table-lookup or computational coding unit.
In the entropy coder according to an embodiment of the present invention, the data buffering and reordering unit, the residual coefficient parsing unit and the table-lookup or computational coding unit are arranged as an inter-stage pipeline.
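The benefit of the inter-stage pipeline described above can be shown with an idealized schedule. This Python sketch assumes, hypothetically, that each of the three stages takes exactly one cycle per 4×4 block; real stage latencies would differ.

```python
def pipeline_schedule(num_blocks, stages=("reorder", "parse", "encode")):
    """Ideal pipeline: every stage takes one cycle per block (assumption).

    Returns {cycle: [(stage, block), ...]} listing what each stage works on.
    """
    schedule = {}
    for block in range(num_blocks):
        for offset, stage in enumerate(stages):
            schedule.setdefault(block + offset, []).append((stage, block))
    return schedule


sched = pipeline_schedule(16)  # e.g. the 16 luma 4x4 blocks of a macroblock
print(len(sched))              # 18 cycles instead of 16 * 3 = 48
print(sorted(sched[2]))        # [('encode', 0), ('parse', 1), ('reorder', 2)]
```

Once the pipeline is full, all three units work on different blocks in the same cycle, so N blocks take N + 2 cycles rather than 3N.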
In the entropy coder according to an embodiment of the present invention, the residual coefficient coding operator function group of the CAVLC functional block is an operator function group that codes the residual coefficients of one 4×4 sub-block at a time.
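Coding one 4×4 sub-block in CAVLC starts from a few statistics of its zig-zag-scanned coefficients. The Python sketch below computes them using the H.264 frame zig-zag order; it is an illustration of the standard quantities (TotalCoeff, TrailingOnes, total_zeros, run_before), not the patent's parsing unit, and it does not model the scan patterns a/b/c of Figs. 14.1 to 14.3.

```python
# H.264 zig-zag scan order for a 4x4 frame block (raster indices).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def cavlc_stats(block):
    """Return (TotalCoeff, TrailingOnes, total_zeros, run_before list)."""
    scan = [block[i // 4][i % 4] for i in ZIGZAG_4X4]
    nz = [i for i, c in enumerate(scan) if c != 0]
    total_coeff = len(nz)
    # Trailing ones: up to three +/-1 values at the high-frequency end.
    trailing_ones = 0
    for i in reversed(nz):
        if abs(scan[i]) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    # Zeros that precede the last non-zero coefficient in scan order.
    total_zeros = nz[-1] + 1 - total_coeff if nz else 0
    # Zeros immediately before each coefficient, highest frequency first
    # (the lowest-frequency coefficient's run is implied by the others;
    # an encoder also stops sending runs once no zeros are left).
    runs = [nz[k] - nz[k - 1] - 1 for k in range(len(nz) - 1, 0, -1)]
    return total_coeff, trailing_ones, total_zeros, runs


block = [[0, 3, -1, 0],
         [0, -1, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(cavlc_stats(block))  # (5, 3, 3, [1, 0, 0, 1])
```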
The beneficial effect of the present invention is that, by designing operator units that represent hardware logic and mapping the high-level-language algorithm description of the video encoder's entropy coder into a hardware logic description composed of those operator units, the IC design of the entropy coder is accelerated.
Brief description of the drawings
Fig. 1 shows the typical structure of an H.264 video encoder;
Fig. 2 shows several classes of operator structures provided by the embodiment of the present invention, where Fig. 2.1 shows a computing-class operator, Fig. 2.2 a storage-class operator, Fig. 2.3 a path-class operator, and Fig. 2.4 a control-class operator;
Fig. 3.1 shows the general structure of the entropy coder, and Fig. 3.2 shows the operator function structure of the entropy coder;
Fig. 4.1 shows the overall operator structure of the entropy coder of the embodiment of the present invention, and Fig. 4.2 shows the operator function structure of the bs_write_ue function of the entropy coder;
Fig. 5 shows the overall flow of the entropy coder of the embodiment of the present invention;
Fig. 6 shows 4×4 CAVLC coding performed over multiple sub-loop iterations in the embodiment of the present invention;
Fig. 7 is a schematic flow chart of CAVLC coding in the embodiment of the present invention;
Fig. 8 shows the residual coefficient cache register format for the I_16*16 macroblock type in the embodiment of the present invention;
Fig. 9 shows the residual coefficient cache register format for the I_4*4 and P_16*16 macroblock types in the embodiment of the present invention;
Fig. 10 shows the residual coefficient storage format and coding order for the I_16*16 macroblock type in the embodiment of the present invention;
Fig. 11 shows the residual coefficient storage format and coding order for the I_4*4 and P_16*16 macroblock types in the embodiment of the present invention;
Fig. 12 shows the structure of the residual coefficient preliminary analysis unit of the embodiment of the present invention;
Fig. 13 shows the residual coefficient preliminary analysis process of the embodiment of the present invention;
Figs. 14.1, 14.2 and 14.3 show the zig-zag scanning of the embodiment of the present invention, where Fig. 14.1 is pattern a, Fig. 14.2 is pattern b, and Fig. 14.3 is pattern c;
Fig. 15 shows the three CAVLC coding modes of the embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below through embodiments with reference to the accompanying drawings.
I. Overview of the video encoder
Video coding is a key technology for storing and transmitting digital media and is widely used in digital television, video communication, network streaming media and other fields. The present invention relates to an operator-based entropy coder for a video encoder and its implementation method, applicable to entropy coding of DCT-transformed image, video and audio data in the signal-processing field.
In video coding, H.264, a digital video coding standard developed by the Joint Video Team (JVT) of the International Telecommunication Union (ITU-T) and the International Organization for Standardization (ISO), is now widely used. H.264 is both ITU-T Recommendation H.264 and Part 10 of the MPEG-4 standard of ISO and the International Electrotechnical Commission (IEC). It adopts a layered structure that is conceptually divided into two layers: the Video Coding Layer (VCL), responsible for representing the video content efficiently, and the Network Abstraction Layer (NAL), responsible for packaging and transmitting the data in the manner the network requires. This VCL/NAL structure helps encapsulate information and prioritize it more effectively.
The structure of the H.264 video encoder is shown in Figs. 1 and 2. As Fig. 1 shows, the main sub-algorithms of H.264 encoding are inter prediction, intra prediction, transform and quantization, deblocking filtering, and entropy coding. Each is summarized below.
1. Inter prediction
Inter prediction exploits the correlation between video frames, i.e. temporal correlation, to compress images. In image transmission, moving images are the main focus. A moving image is a temporal sequence of successive frames spaced at the frame interval, and it is far more correlated in time than in space. Between most adjacent frames the change in detail is very small, so frames are strongly correlated; inter-frame coding that exploits this correlation achieves much higher compression ratios than intra-frame coding. H.264 inter prediction uses prediction from already-coded frames or fields together with block-based motion compensation.
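Block-based motion compensation relies on finding, for each block, the best-matching block in a reference frame. The Python sketch below shows exhaustive (full-search) block matching with a sum-of-absolute-differences (SAD) cost, the |a−b| accumulation that the text later cites as typical of video algorithms. It is a generic illustration; block size, search radius and the test frames are arbitrary choices, and H.264 encoders such as x264 use faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=4, radius=2):
    """Exhaustive block matching; returns (best_sad, dx, dy)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


ref = [[8 * y + x for x in range(8)] for y in range(8)]
# Current frame: reference content displaced by (dx, dy) = (2, 1), clamped at the edge.
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2))  # (0, 2, 1)
```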
2. Intra prediction
Intra prediction, a new technique adopted in H.264, predicts the pixels of the current block using the neighboring pixels above and to the left of the block as reference pixels. Intra predictive coding reduces the spatial redundancy of the image and lowers the bit rate.
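As an illustration of prediction from the top and left neighbors, this Python sketch implements three of the nine H.264 4×4 luma intra modes (vertical, horizontal and DC) and picks the one with the lowest SAD. It assumes both neighbor rows are available and ignores the diagonal modes and the standard's rate-distortion mode decision.

```python
def intra4x4_predict(top, left, mode):
    """top: 4 reconstructed pixels above; left: 4 pixels to the left."""
    if mode == "vertical":
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":
        dc = (sum(top) + sum(left) + 4) >> 3  # both neighbours assumed available
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def best_mode(block, top, left):
    """Pick the mode whose prediction has the smallest SAD against the block."""
    def cost(mode):
        pred = intra4x4_predict(top, left, mode)
        return sum(abs(block[y][x] - pred[y][x]) for y in range(4) for x in range(4))
    return min(("vertical", "horizontal", "dc"), key=cost)


top, left = [10, 20, 30, 40], [12, 12, 12, 12]
print(intra4x4_predict(top, left, "dc")[0])         # [19, 19, 19, 19]
print(best_mode([[10, 20, 30, 40]] * 4, top, left))  # vertical
```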
3. Transform and quantization
In image coding, transform coding and quantization are in principle two independent processes. In H.264, however, the multiplications of the two processes are merged into one, and integer arithmetic is used to reduce the computational load of encoding and decoding. Quantization shortens the coded length of the image without noticeably degrading video quality, discarding information that is unnecessary for reconstructing the video: its purpose is to remove transformed data components that matter little to visual quality while keeping the visually important ones. The removed components cannot be recovered, so quantization is a lossy process.
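The merging of transform scaling into quantization can be sketched as follows. The integer core transform matrix is the one defined by H.264 for 4×4 blocks; its entries are ±1 and ±2, so hardware needs only additions and shifts. The normalization that a true DCT would need is folded into the quantizer's multiplication factor MF. The MF values below (13107, 5243, 8066) are those tabulated in common H.264 descriptions for QP mod 6 = 0, and the intra rounding offset of 2^qbits/3 likewise; treat both as assumptions to check against the standard. This Python sketch is restricted to QP values that are multiples of 6.

```python
# H.264 4x4 forward integer core transform matrix.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

# Multiplication factors for QP % 6 == 0 (assumed from H.264 reference tables).
MF0 = {"a": 13107, "b": 5243, "other": 8066}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(block):
    """W = CF . X . CF^T; in hardware the +/-2 entries are 1-bit shifts."""
    return matmul(matmul(CF, block), transpose(CF))


def quantize(w, qp):
    """Scaling of the transform is folded into MF, so one multiply per coefficient."""
    if qp % 6 != 0:
        raise ValueError("this sketch only tabulates QP % 6 == 0")
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3  # rounding offset for intra blocks
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i % 2 == 0 and j % 2 == 0:
                mf = MF0["a"]
            elif i % 2 == 1 and j % 2 == 1:
                mf = MF0["b"]
            else:
                mf = MF0["other"]
            level = (abs(w[i][j]) * mf + f) >> qbits
            out[i][j] = level if w[i][j] >= 0 else -level
    return out


flat = [[1] * 4 for _ in range(4)]
print(core_transform(flat)[0])               # [16, 0, 0, 0]
print(quantize(core_transform(flat), 0)[0])  # [6, 0, 0, 0]
```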
4. Deblocking filtering
Block-based video coding systems inevitably produce noticeable blocking artifacts at some block boundaries of the reconstructed image, especially at low bit rates. These artifacts arise because motion compensation for pixels at block boundaries is generally less accurate than for pixels inside the block, and because the block transform introduces discontinuities at block boundaries. The main deblocking steps are: determining the filtering strength, distinguishing false (artifact) boundaries from real edges, and computing the filter.
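The false-boundary decision and the filter computation can be sketched for one line of 8-bit luma samples across an edge (p1, p0 | q0, q1). The decision and the delta formula follow the normal (bS < 4) luma filter of H.264; in the standard, alpha, beta and tc come from QP-indexed tables and the p1/q1 samples may also be modified, both of which this Python sketch omits by taking the thresholds as plain parameters.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def filter_luma_edge(p1, p0, q0, q1, bs, alpha, beta, tc):
    """Normal (bS < 4) filtering of the two samples nearest an edge."""
    if not 0 <= bs < 4:
        raise ValueError("the strong bS == 4 filter is not covered by this sketch")
    # A large step across the edge is treated as a real image edge, not an artifact.
    if bs == 0 or abs(p0 - q0) >= alpha or abs(p1 - p0) >= beta or abs(q1 - q0) >= beta:
        return p0, q0
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)


print(filter_luma_edge(60, 62, 70, 71, bs=2, alpha=15, beta=5, tc=3))  # (65, 67)
print(filter_luma_edge(60, 62, 82, 83, bs=2, alpha=15, beta=5, tc=3))  # (62, 82)
```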
5. Entropy coding
In video coding, entropy coding converts the series of syntax element symbols that represent the video sequence into a compressed bitstream for transmission or storage. The input symbols may include quantized transform coefficients, motion vectors, flags, headers and additional information.
The present invention concerns the entropy coder of an H.264 video encoder, so H.264 entropy coding is the main subject of the following description.
H.264 entropy coding supports two modes: CAVLC (context-adaptive variable-length coding) and CABAC (context-adaptive binary arithmetic coding). CAVLC has relatively low coding complexity and coding efficiency but stronger error resilience; CABAC has relatively high complexity and efficiency but weaker error resilience. Accordingly, among the H.264 profiles for different application fields, the Baseline profile (for video conversation, etc.) and the Extended profile (for Internet video streaming, etc.) use CAVLC, while the Main profile (for video broadcasting, etc.) additionally supports CABAC.
The entropy coding involved in the present invention is mainly the CAVLC entropy coding of the Baseline profile. In H.264 entropy coding, the syntax elements to be coded fall into two classes. The first class is residual data, coded with CAVLC: each symbol is coded with one of several context-adaptive VLC tables, and the table is selected according to statistics of the block content. The second class, referred to as syntax parameters, includes header information, macroblock coding mode information, quantization parameters, reference frame indices, etc.; all of these are coded with Exp-Golomb codes, variable-length codes with a regular structure. Because that structure is uniform, Exp-Golomb coding is also called unified variable-length coding (UVLC).
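The regular structure of Exp-Golomb codes is easy to see in code: a value k is written as M zeros followed by the binary form of k + 1, which has M + 1 bits. The Python sketch below implements the unsigned ue(v) and signed se(v) mappings from H.264 and a toy bit writer that concatenates the fields, as the bitstream concatenation block would. It is an illustration using character strings for readability; it is not x264's `bs_write_ue` (named in Fig. 4.2), whose implementation is not reproduced here, and `BitWriter` is a hypothetical helper.

```python
def ue(code_num: int) -> str:
    """Unsigned Exp-Golomb code ue(v): M zeros, then code_num + 1 in M + 1 bits."""
    if code_num < 0:
        raise ValueError("ue(v) needs a non-negative value")
    info = bin(code_num + 1)[2:]
    return "0" * (len(info) - 1) + info


def se(value: int) -> str:
    """Signed Exp-Golomb se(v): maps 1, -1, 2, -2, ... to code_num 1, 2, 3, 4, ..."""
    return ue(2 * value - 1 if value > 0 else -2 * value)


class BitWriter:
    """Hypothetical helper: concatenates coded fields into one bitstream string."""

    def __init__(self):
        self.bits = []

    def write(self, code: str):
        self.bits.append(code)

    def getvalue(self) -> str:
        return "".join(self.bits)


bw = BitWriter()
for v in (0, 1, 2, 3):
    bw.write(ue(v))  # "1", "010", "011", "00100"
bw.write(se(-1))     # code_num 2 -> "011"
print(bw.getvalue())  # 101001100100011
```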
Although the H.264 standard does not restrict the encoding method, it specifies the format of the coded bitstream and the decoding algorithm. In practice there are currently three open-source reference codec packages: JM, X264 and T264.
The JM reference model is the official reference software and implements all H.264 features, but its program structure is verbose: it concentrates on adding new features to improve coding performance and ignores coding complexity, which is consequently high, making it unsuitable for practical use.
X264 is an encoder producing H.264 standard-compliant bitstreams, developed jointly by an open online community. It emphasizes practicality: compared with JM, it strives to reduce the computational complexity of encoding without significantly reducing coding performance.
T264 is an H.264 codec developed jointly by a Chinese open video coding community; its goals are similar to those of X264.
The present invention takes X264 as its basis: the high-level-language algorithm description of the video encoder's entropy coder, in the form of X264 code, is the starting point for the IC design of the entropy coder.
II. Overview of the operator design philosophy
IC design has evolved from full-custom transistor design, to gate-array design represented by the 1.0 μm process node around 1984, to standard-cell design represented by the 0.5 μm node around 1994, to IP-core-based design represented by the 0.25 μm node around 1999, and to timing-driven design represented by the 0.13 μm node in 2004. Current IC design is generally based on standard-cell granularity, with the hardware logic described in a hardware description language. However, hardware description languages are not well suited to directly describing the program logic and execution flow of high-level-language functions, so mapping a high-level-language description down to hardware takes a long time and has become the bottleneck limiting IC design speed.
Since IC processes entered 45 nm, design efficiency has lagged seriously behind manufacturing progress. Traditional ASIC design, with its long design cycle, high cost, poor flexibility and poor scalability, struggles to meet the demands of fast time to market and continual functional upgrades. Large-scale design based on standard cells cannot deliver the required design speed, mainly because standard cells are too fine-grained: they perform only operations such as addition, subtraction, multiplication and division and cannot implement coarser-grained operations in a single step, for example a multi-input addition, shift-then-add, add-then-shift, or |a−b|, which are particularly common in video algorithms.
In view of this, a fast operator-based design method has been proposed. The basic idea of operator design is rapid design: it seeks the best partitioning point (granularity) so as to achieve rapid IC design.
The operator design method aims to support describing algorithms starting from their mathematics, and to support automatic translation of an algorithm into an operator language and an operator structure. Operators have the following characteristics:
1. Coarser granularity
Each operator has a definite function and serves as a symbolic representation of a functional operation. The operator system as a whole supports all the functions found in algorithm descriptions, so designers can describe and verify algorithms in terms of operators;
2. Support for algorithm implementation
A complete set of a limited number of operator kinds can describe the computation, storage, control and data paths of an upper-level description, and optimization and verification can be carried out on the operator-based description;
3. Faster design and verification
Once the designer has completed the functional description of the algorithm with operators, the corresponding hardware design is complete as well. Because all operators are pre-built full-custom units already verified to be correct, design and verification time is shortened: the designer only needs to ensure that the operator-based description and the mapping process are correct, which greatly accelerates design and verification.
The operators proposed by the present invention, also called operator units, represent the hardware logic of an integrated circuit. By function, operators can be divided into computing-class, storage-class, path-class, control-class and clock-class operators, among others; together they form a complete operator system that can support the implementation of any algorithm. These basic operators are designed to be optimized in area, power consumption, performance and other parameters, and together they form the operator cell library. Operator units can further be combined into operator functional blocks; the functional blocks exchange data through a shared-memory (MEM) chain and can also pass control information through registers. Each class of operator is explained below.
1. Computing-class operator
The computing-class operator (AU) is the basic unit that implements logic operations, arithmetic operations, or mixed logic-arithmetic operations, supporting functions such as arithmetic-logic operations and shifts; operators implementing these basic operations are called basic operation operators. They can be extended for different applications to support operations that are complex but frequently used, for example subtract-then-absolute-value, averaging and fixed-coefficient filtering in media processing. Operators implementing these extended operations are called extended operation operators and are built from basic operation operators. Common operations performed by computing-class operators include addition, accumulation, four-operand addition, adding two numbers and then shifting by M bits, clipping of input data, and control-code decoding.
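The relation between basic and extended operation operators can be sketched as function composition. In this Python sketch each `op_*` function stands for a hypothetical basic operator performing one step, and the extended operators are built only from them; the shift direction in `add_shift` (right shift) is an assumption, since the text does not specify it.

```python
# Basic operation operators: each performs a single arithmetic, logic or shift step.
def op_add(a, b):
    return a + b


def op_sub(a, b):
    return a - b


def op_shr(a, m):
    return a >> m


def op_select(cond, a, b):
    return a if cond else b


# Extended operation operators composed from the basic ones.
def abs_diff(a, b):      # subtract, then take the absolute value
    d = op_sub(a, b)
    return op_select(d < 0, op_sub(0, d), d)


def average(a, b):       # rounded mean: (a + b + 1) >> 1
    return op_shr(op_add(op_add(a, b), 1), 1)


def add_shift(a, b, m):  # add two numbers, then shift right by M bits (assumed)
    return op_shr(op_add(a, b), m)


def add4(a, b, c, d):    # four-operand addition as a two-level adder tree
    return op_add(op_add(a, b), op_add(c, d))


def clip(x, lo, hi):     # clamp input data to [lo, hi]
    return op_select(x < lo, lo, op_select(x > hi, hi, x))


print(abs_diff(3, 7), average(3, 4), add_shift(5, 3, 1), add4(1, 2, 3, 4), clip(300, 0, 255))
# 4 4 4 10 255
```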
Fig. 2.1 is a schematic structure of one computing-class operator, the ADDS operator, which contains an ADD unit for addition and subtraction and a << unit for shifting. Computing-class operators are reconfigurable: a computing-class operator can be configured with a control bit and multiple arithmetic functions, and the control bit selects one of them. In other words, a computing-class operator comprises an arithmetic logic unit and an operation configuration register; the register receives and stores operation configuration instructions, and different instructions select different arithmetic-logic operations. By setting the control bits (the operation configuration instruction held in the configuration register) according to the needs of different applications, the same operator can implement different functions. For example, the ADDS operator above implements several different functions depending on the value of its control bit X. Because such a reconfigurable operator can serve many scenarios, fewer operators need to be stored in the operator cell library. A reconfigurable operator can also be reconfigured dynamically during execution by changing its control bits. Conversely, if an operator's function has been optimized and fixed, its control bits can be hard-wired (solidified customization).
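A reconfigurable ADDS-style operator can be modeled as a function table indexed by the value in its configuration register. The encoding of control bit X below (0: a + b, 1: a − b, 2: (a + b) << s, 3: (a << s) + b) is hypothetical, as are the class and method names; the `frozen` flag models an operator whose control bits have been hard-wired.

```python
class AddsOperator:
    """Reconfigurable ADD + shift operator with a configuration register.

    Control-bit encoding (hypothetical):
      X = 0: a + b           X = 1: a - b
      X = 2: (a + b) << s    X = 3: (a << s) + b
    """

    FUNCTIONS = {
        0: lambda a, b, s: a + b,
        1: lambda a, b, s: a - b,
        2: lambda a, b, s: (a + b) << s,
        3: lambda a, b, s: (a << s) + b,
    }

    def __init__(self, x=0, frozen=False):
        if x not in self.FUNCTIONS:
            raise ValueError(f"unsupported control value {x}")
        self.config = x       # operation configuration register (control bit X)
        self.frozen = frozen  # True models a hard-wired ("solidified") operator

    def configure(self, x):
        """Dynamic reconfiguration: change X between uses."""
        if self.frozen:
            raise RuntimeError("control bits are fixed in this operator")
        if x not in self.FUNCTIONS:
            raise ValueError(f"unsupported control value {x}")
        self.config = x

    def __call__(self, a, b, s=0):
        return self.FUNCTIONS[self.config](a, b, s)


adds = AddsOperator()
results = []
for x in range(4):
    adds.configure(x)
    results.append(adds(5, 3, s=2))
print(results)  # [8, 2, 32, 23]
```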
2. Storage-class operator
Fig. 2.2 is a schematic of the basic structure of the storage-class operator (MU). It comprises a storage configuration register and a storage unit, and the storage unit comprises an address generation unit, a data memory, a data generation unit and a data output control unit. Through the data output control unit, the storage configuration register configures how the memory bank of the storage operator (MU) is written and/or read (the bank may be any storage medium, such as registers, RAM or other MEM), and it can also configure the working mode of the bank's address generation unit. Input data are stored directly at the location given by the address generation unit, and the required data are output from where they are stored.
Different stages of an algorithm process data in different orders and arrangements, so the storage format and access mode of the data differ from stage to stage; storage operators can also convert data between such formats and modes. Storage-class operators support the basic storage formats and conversions common across applications, such as serial-to-parallel conversion, parallel-to-serial conversion, queue structures and stack structures.
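Format conversion by a storage operator can come from its address generation unit alone: writing a 4×4 block in row-major order and reading it back in column-major order transposes it. This Python sketch models that, plus serial-to-parallel and parallel-to-serial conversion; the class, its two AGU modes and the helper names are hypothetical illustrations.

```python
class StorageOperator:
    """Memory bank plus a configurable address-generation unit (AGU)."""

    def __init__(self, rows=4, cols=4):
        self.rows, self.cols = rows, cols
        self.mem = [0] * (rows * cols)

    def addresses(self, mode):
        """AGU modes (illustrative): row-major or column-major traversal."""
        n = self.rows * self.cols
        if mode == "row":
            return list(range(n))
        if mode == "column":
            return [(i % self.rows) * self.cols + i // self.rows for i in range(n)]
        raise ValueError(f"unknown AGU mode: {mode}")

    def write(self, data, mode="row"):
        for addr, value in zip(self.addresses(mode), data):
            self.mem[addr] = value

    def read(self, mode="row"):
        return [self.mem[addr] for addr in self.addresses(mode)]


def serial_to_parallel(stream, width):
    if len(stream) % width:
        raise ValueError("stream length must be a multiple of width")
    return [tuple(stream[i:i + width]) for i in range(0, len(stream), width)]


def parallel_to_serial(words):
    return [value for word in words for value in word]


mu = StorageOperator()
mu.write(range(16))                           # row-major write
print(mu.read("column")[:8])                  # [0, 4, 8, 12, 1, 5, 9, 13]
print(serial_to_parallel(list(range(8)), 4))  # [(0, 1, 2, 3), (4, 5, 6, 7)]
```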
3. Path-class operator
Fig. 2.3 is a schematic of the general structure of the path-class operator (LU). It comprises a routing configuration register, the switches that form the route selection unit, and data registers (REG). The routing configuration register is controlled by the control-class operator CU (see below), and under that control the switches connect the different computing-class operators in the desired way. The data registers temporarily hold the input and output data of computing-class and storage-class operators. In short, the path-class operator connects storage-class and computing-class operators to form a dataflow architecture. It has three main implementation forms: multiplexer, crossbar switch and bus.
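A crossbar-form path operator can be modeled as a routing register that names, for each output port, the input port driving it; a multiplexer is the special case with a single output. In this Python sketch the class and port labels are hypothetical, and the data registers (REG) are not modeled.

```python
class CrossbarPath:
    """Path-class operator modelled as a crossbar with a routing register."""

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        self.route = [None] * num_outputs  # routing configuration register

    def configure(self, route):
        """route[k] = index of the input driving output k (None = unconnected)."""
        if len(route) != len(self.route):
            raise ValueError("one entry per output port is required")
        for src in route:
            if src is not None and not 0 <= src < self.num_inputs:
                raise ValueError(f"no such input port: {src}")
        self.route = list(route)

    def transfer(self, inputs):
        """Move one set of values from input ports to output ports."""
        return [None if src is None else inputs[src] for src in self.route]


xbar = CrossbarPath(num_inputs=3, num_outputs=2)
xbar.configure([2, 0])  # e.g. written by a control-class operator
print(xbar.transfer(["mu0", "au0", "au1"]))  # ['au1', 'mu0']
```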
4. Control-class operator
Fig. 2.4 is a schematic of the general structure of the control-class operator (CU). The control-class operator mainly sends configuration information to the corresponding configuration registers, configuring computing-class, storage-class and path-class operators to perform predetermined functions; in other words, it normally governs the flow of the data stream. It has three implementation forms: counter, state machine and microinstruction word. Its output includes dynamic reconfiguration information for computing-class operators, configuration information for the address generation unit (AGU) of storage-class operators, and configuration information for path-class operators, through which it controls the other operators. The microinstruction-word form comprises a decoder, a program counter, an instruction memory, a pipeline control module and so on. The control-class operator sends configuration information to each functional unit by executing simple configuration instructions.
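A control operator combining a loop counter with microinstruction words can be sketched as follows: each word carries the configuration fields for the other operators, a program counter steps through the words, and a loop counter repeats the program, for instance once per 4×4 block. The `ConfigWord` fields and their values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigWord:
    """One microinstruction: configuration fields for the other operators."""
    au_op: int       # control bit X for a reconfigurable computing operator
    mu_mode: str     # address-generation mode for a storage operator
    lu_route: tuple  # routing register contents for a path operator


def run_controller(program, iterations):
    """Step a program counter through the words; a loop counter repeats the
    program. Yields (cycle, word), one configuration word per cycle."""
    cycle = 0
    for _ in range(iterations):  # loop counter
        for word in program:     # program counter
            yield cycle, word
            cycle += 1


program = [
    ConfigWord(au_op=0, mu_mode="row", lu_route=(0, 1)),
    ConfigWord(au_op=2, mu_mode="column", lu_route=(1, 0)),
]
trace = list(run_controller(program, iterations=16))
print(len(trace), trace[-1][0], trace[-1][1].mu_mode)  # 32 31 column
```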
5, clock class operator.
The clock class operator is to drive and timing part, for the clock signal of computing class operator, storage class operator, class of paths operator and control class operator is provided, clock signal comprises the signal of controlling the clock start-stop and controlling clock frequency, and clock signal can configure according to the mode of expectation.
The five operator classes above form the basis of the embodiments below. It should be understood that dividing the operators used in integrated-circuit design into these five classes by function is not the only possible classification; depending on the actual situation, coarser or finer divisions can also be made.
The operator-based design process for an integrated circuit (IC) can be summarized as converting an algorithm described in a high-level language such as C or Matlab into an operator description, and then optimizing the design on the operator structure. Because the operators are designed and verified in advance, design verification of the operator description of an algorithm can be completed quickly or even automatically. For a specific domain, such as communications or media, extended operators or operator IP that better support the domain's algorithms can be designed, further accelerating design and optimization.
The present invention concerns implementing the entropy coder of a video encoder with operators. It mainly comprises:
mapping each coding functional block of the video encoder's entropy coder, described as a high-level-language algorithm, into a hardware logic description composed of operator units; and generating the hardware integrated circuit of the entropy coder from that hardware logic description.
The process of mapping each coding functional block of the entropy coder, described in a high-level language, into a hardware logic description composed of operator units mainly comprises the following steps:
1) Program analysis step: read the high-level-language program of each coding functional block of the entropy coder and, following the rules of that language, identify in it the execution objects and parameter objects to be mapped. The main purpose of this step is algorithm decomposition: an algorithm described in a high-level language such as C, Java or Matlab is divided into several sub-algorithm processes. Decomposition mainly analyzes function-call relationships, function structures and variables, and the basic operations of the lowest-level functions, so that the algorithm can be mapped onto the operator classes described above. Execution objects mainly comprise operation instructions and/or control instructions; parameter objects mainly comprise at least one of input data, output data and intermediate data. During operator mapping, operation instructions are mapped to computing-class operators, control instructions are mapped to control-class or path-class operators, and parameter objects are mapped to storage-class operators. Computing-class and storage-class operators are linked through path-class operators, the control-class operator provides control, and the clock-class operator provides the clock.
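The mapping rules of step 1) can be sketched as a small classification function. This is only an illustration: the enum names are hypothetical, and the flag that decides whether a control instruction becomes a control-class or a path-class operator is an assumption, since the text leaves that choice to the designer.

```c
/* Kinds of objects found during program analysis (illustrative names). */
typedef enum { OBJ_OPERATION, OBJ_CONTROL, OBJ_PARAMETER } ObjKind;

/* The five operator classes described above. */
typedef enum { OP_COMPUTE, OP_STORAGE, OP_PATH, OP_CONTROL, OP_CLOCK } OpClass;

/* Map an identified object to the operator class that will implement it.
   A control instruction becomes either a control-class operator (sequencing)
   or a path-class operator (routing); the choice is passed in as a flag,
   an assumption made only for this sketch. */
static OpClass map_object(ObjKind kind, int control_is_routing)
{
    switch (kind) {
    case OBJ_OPERATION: return OP_COMPUTE;
    case OBJ_CONTROL:   return control_is_routing ? OP_PATH : OP_CONTROL;
    case OBJ_PARAMETER: return OP_STORAGE;
    }
    return OP_CONTROL;  /* not reached for valid input */
}
```

Clock-class operators do not come from any program object; they are added to drive the other four classes.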
2) Data-control flow graph generation step: map the identified execution objects and parameter objects to the corresponding nodes of a data-control flow graph that describes the entropy-coding algorithm. In this graph, operation instructions become processing blocks; control instructions become the control flow that identifies states, state-transition conditions and state control signals; and input, output and intermediate data become storage nodes on the data flow.
3) Operator structure graph generation step: according to the function performed at each node of the data-control flow graph, take at least one operator unit with the corresponding function from the operator cell library, and convert the data-control flow graph into an operator space-time diagram composed of operator units. The operator space-time diagram represents the algorithm fully unrolled into operators. Once fully unrolled, the algorithm executes with maximum parallelism, at the highest speed and with the largest area; this is called space-time unrolling. In the space-time diagram, each node of the data-control flow graph is replaced by a combination of one or more operator units that performs the same function as that node. The data-control flow graph can be unrolled in, among others, the following ways: if a data flow in the graph is sequentially dependent, it is unrolled as a pipeline; if a data flow contains feedback and forms a loop, and there is data dependence between its data, it cannot be converted into a pipeline structure; if there is data dependence between data flows but none inside each of them, each internally independent data flow is unrolled as a local pipeline; and if there is no data dependence between the data flows, they are unrolled in parallel. The result is an operator space-time diagram composed of operator units.
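The unrolling rules of step 3) amount to a decision on the dependence properties of each data-flow region. The sketch below encodes them with a hypothetical FlowInfo summary; the order in which the rules are tested is an assumption, because the text lists the cases without stating a precedence.

```c
typedef enum {
    UNROLL_PIPELINE,        /* sequentially dependent flow */
    UNROLL_NONE,            /* feedback loop with dependent data: no pipeline */
    UNROLL_LOCAL_PIPELINE,  /* flows depend on each other, not internally */
    UNROLL_PARALLEL         /* independent flows */
} UnrollMode;

/* Hypothetical summary of one data-flow region of the data-control flow graph. */
typedef struct {
    int sequential;         /* order-dependent flow */
    int has_feedback_loop;  /* contains feedback and forms a loop */
    int inter_dependent;    /* data dependence between flows */
    int intra_dependent;    /* data dependence inside each flow */
} FlowInfo;

static UnrollMode choose_unroll(const FlowInfo *f)
{
    if (f->has_feedback_loop && f->inter_dependent)
        return UNROLL_NONE;             /* cannot become a pipeline */
    if (f->inter_dependent && !f->intra_dependent)
        return UNROLL_LOCAL_PIPELINE;   /* pipeline each independent flow */
    if (f->sequential)
        return UNROLL_PIPELINE;
    if (!f->inter_dependent)
        return UNROLL_PARALLEL;
    return UNROLL_NONE;
}
```

For example, `choose_unroll(&(FlowInfo){0, 0, 0, 0})` classifies a region of mutually independent flows as parallel.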
4) Timing constraint step: determine the total timing constraint from the user's specification and the requirements of the target IC process, and annotate each operator unit in the operator space-time diagram with its time. The operator timing information can be extracted from the operator cell library and used to annotate the space-time diagram, which then becomes the object of the timing constraint. Depending on the characteristics of the data flow, the timing constraint can be pushed down to each level of the space-time diagram, so that every level is constrained.
If the data flow is a parallel data flow, the total timing constraint is divided equally among the operator levels of the corresponding space-time diagram, and the constraint of each level is in turn applied to every operator unit in that level. For the serial nodes of the data-control flow, the operator levels corresponding to those nodes share the overall timing constraint (expressed in basic operator timing units): each level receives a share equal to the ratio of the delay of the computing operators on its longest arithmetic path to the sum of the longest-path delays over all levels.
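The proportional split for serial levels can be written directly. In this sketch the longest-path delays are assumed to come from the operator cell library in some common unit (for example ns); the function and parameter names are illustrative.

```c
#include <stddef.h>

/* Distribute total_budget over serial operator levels in proportion to the
   delay of each level's longest arithmetic path. If all delays are zero,
   fall back to an equal split (the rule used for parallel data flows). */
static void allocate_budget(const double *longest_path, size_t levels,
                            double total_budget, double *budget_out)
{
    double sum = 0.0;
    for (size_t i = 0; i < levels; i++)
        sum += longest_path[i];
    for (size_t i = 0; i < levels; i++)
        budget_out[i] = (sum > 0.0) ? total_budget * longest_path[i] / sum
                                    : total_budget / (double)levels;
}
```

With longest paths of 1, 2 and 1 units and a total budget of 8, the levels receive 2, 4 and 2.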
5) Space-time diagram compression step: according to the time labels, cluster and compress the operator space-time diagram spatially so that the overall algorithm execution time approaches the total timing constraint. Based on the time label of each sub-algorithm process, clustering compression is performed within the space-time diagram: by adding control-class and path-class operators, storage-class and computing-class operators are reused more, so that each sub-algorithm process finishes as close as possible to its labeled time while the number of computing-class and storage-class operators is reduced as far as possible.
If an algorithm description is fully tiled out with operators, with no reuse at all, the algorithm completes fastest, but at a very high cost in area and power, which is not a desirable design. The design goal is to reduce area and power as much as possible while meeting the user's timing requirement, which calls for reusing operators as much as possible. Each operator is therefore annotated with its execution time for the corresponding function, so that operators can be reused according to their execution times and the overall execution time approaches the actually required time; this is the operator time diagram. The space diagram defines operator area: when operators are clustered, clustering takes operator area into account, preferring reuse that yields a smaller area. Both factors must be considered during operator clustering. The space-time diagram can, for example, be compressed as follows: find computing-class operators with the same operational attribute and/or storage-class operators with the same storage attribute in the space-time diagram; according to the time labels, spatially merge the computing-class operators with the same operational attribute and/or the storage-class operators with the same storage attribute; then introduce control-class operators that generate configuration instructions for the merged computing-class and/or storage-class operators, thereby reusing them.
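One simple way to realize "merge same-attribute operators according to their time labels" is greedy interval sharing (the left-edge heuristic): operator instances with the same attribute whose time labels do not overlap are assigned to one physical operator. The sketch below is that substitute technique under stated assumptions (half-open integer time labels, at most 64 instances); it is not the patent's own clustering procedure and ignores area and power.

```c
#include <stdlib.h>

#define MAX_UNITS 64

/* One operator instance: its attribute (e.g. adder, comparator) and its
   time label as a half-open interval [start, end). */
typedef struct { int attr; int start; int end; } OpInst;

static int cmp_start(const void *a, const void *b)
{
    const OpInst *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sorts ops by start time, then reuses a physical unit of the same attribute
   whenever it is free again. unit_of[i] receives the unit of the i-th sorted
   instance. Returns the number of physical operators, or -1 if n is too big. */
static int cluster_ops(OpInst *ops, int n, int *unit_of)
{
    int unit_attr[MAX_UNITS], unit_free[MAX_UNITS], units = 0;
    if (n < 0 || n > MAX_UNITS)
        return -1;
    qsort(ops, (size_t)n, sizeof *ops, cmp_start);
    for (int i = 0; i < n; i++) {
        int u = 0;
        while (u < units && !(unit_attr[u] == ops[i].attr &&
                              unit_free[u] <= ops[i].start))
            u++;
        if (u == units)                     /* nothing reusable: new unit */
            unit_attr[units++] = ops[i].attr;
        unit_free[u] = ops[i].end;          /* busy until this label ends */
        unit_of[i] = u;
    }
    return units;
}
```

Three adders labeled [0,2), [1,3) and [2,4) plus one comparator need only three physical operators instead of four, because the first and third adders never overlap.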
Both the cluster compression step and the step of generating reconfigurable operator functional blocks can produce more than one result. The same sub-function, called by different functions, yields different clustering results because its time constraints differ. The results therefore need to be optimized with respect to time, area, power and other parameters. When they are ordered by performance (execution time), the clustering result that just meets the time constraint has the lowest hardware cost; the space-time diagram whose overall execution time is closest to the total timing constraint of the IC algorithm is therefore selected as the optimized result of cluster compression.
6) Hardware mapping step: map the cluster-compressed operator space-time diagram into a hardware logic description composed of operator units.
This embodiment uses the bs_write_ue function from X264, an implementation of the H.264 standard, as an example to illustrate the operator design flow of the entropy coder.
The bs_write_ue function performs unsigned Exp-Golomb coding. In this function, val is the value to be encoded; the Exp-Golomb codeword is written into the bs_t structure starting from the most significant bit and ending with the least significant bit.
First the function is decomposed. Table 1 lists the input/output relationships of the bs_write_ue function.
Signal name | Data type | Direction | Description
val | uint32_t | IN | value to be encoded
i_size | int | IN | number of bits to be written
*s->p | uint8_t | IN | target write register
p | uint8_t * | IN | bs_t member: current position in the bitstream
p_end | uint8_t * | IN | bs_t member: end position of the bitstream
i_left | int | IN | bs_t member: number of bits still available in the target register
*s->p | uint8_t | OUT | target write register
p | uint8_t * | OUT | bs_t member: current position in the bitstream
i_left | int | OUT | bs_t member: number of bits still available in the target register
These input and output signals are identified as parameter objects. In addition, the operations in the function are parsed to identify the execution objects. A function usually contains several kinds of statements: sequentially executed statements, conditional (branch-control) statements, loop statements, and so on. The code of the bs_write_ue function is as follows:
[Figures GDA0000051936370000111 and GDA0000051936370000121: source code of bs_write_ue (images not reproduced)]
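Because the function listing survives only as images, the following is a stand-in sketch of what unsigned Exp-Golomb (ue(v)) coding does, not the X264 source. BitWriter, put_bits and write_ue are hypothetical names; the bit writer replaces X264's bs_t and writes most significant bit first, as described above.

```c
#include <stdint.h>

/* Minimal MSB-first bit writer; a stand-in for bs_t, not its API. */
typedef struct {
    uint8_t buf[64];
    int     bitpos;                 /* number of bits written so far */
} BitWriter;

/* Write the low n bits of value (n <= 63), most significant bit first. */
static void put_bits(BitWriter *bw, int n, uint64_t value)
{
    for (int i = n - 1; i >= 0; i--) {
        int byte = bw->bitpos >> 3, shift = 7 - (bw->bitpos & 7);
        if (byte >= (int)sizeof bw->buf)
            return;                 /* buffer full: drop remaining bits */
        if (shift == 7)
            bw->buf[byte] = 0;      /* starting a new byte */
        bw->buf[byte] |= (uint8_t)(((value >> i) & 1u) << shift);
        bw->bitpos++;
    }
}

/* ue(v): with M = floor(log2(val + 1)), the codeword is (val + 1) written in
   2M + 1 bits, i.e. M zeros, a 1, then M information bits. H.264 limits val
   to at most 2^32 - 2; UINT32_MAX is rejected and 0 is returned. */
static int write_ue(BitWriter *bw, uint32_t val)
{
    if (val == UINT32_MAX)
        return 0;
    uint64_t code = (uint64_t)val + 1;
    int m = 0;
    while ((code >> (m + 1)) != 0)
        m++;
    put_bits(bw, 2 * m + 1, code);
    return 2 * m + 1;               /* codeword length in bits */
}
```

Writing 0, 3 and 1 in sequence produces the codewords 1, 00100 and 010, i.e. the nine bits 100100010.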
When the function is parsed, a loop statement can be unrolled according to its iteration count, yielding as many copies of the loop body as there are iterations. Each loop body contains operation expressions, and these expressions share parameter objects. By iterating the operation expressions of the successive loop bodies according to the relationships among the parameter objects, the computation of the original loop can be expressed as a single new operation expression. The operation instructions in this new expression are mapped to processing blocks, and its parameter objects are mapped to storage nodes on the data flow. A loop instruction can also be mapped to a state machine. A branch-control statement can be mapped to a multiplexer (MUX).
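The following toy example (not taken from bs_write_ue) illustrates both rules: a four-iteration loop containing an if/else is unrolled into one expression, each branch becomes a two-input multiplexer, and the chain of accumulations becomes an adder tree whose branches can be evaluated in parallel. The function names are hypothetical.

```c
#include <stdint.h>

/* Original loop with a branch, as it might appear in a C algorithm. */
static int abs_sum_loop(const int16_t c[4])
{
    int s = 0;
    for (int i = 0; i < 4; i++) {
        if (c[i] < 0)
            s -= c[i];
        else
            s += c[i];
    }
    return s;
}

/* Two-input multiplexer: the hardware counterpart of the if/else. */
static int mux2(int sel, int a, int b) { return sel ? a : b; }

/* Fully unrolled form: one mux per iteration, and the chained updates of s
   folded into a single expression (an adder tree). */
static int abs_sum_unrolled(const int16_t c[4])
{
    int m0 = mux2(c[0] < 0, -c[0], c[0]);
    int m1 = mux2(c[1] < 0, -c[1], c[1]);
    int m2 = mux2(c[2] < 0, -c[2], c[2]);
    int m3 = mux2(c[3] < 0, -c[3], c[3]);
    return (m0 + m1) + (m2 + m3);
}
```

Both versions return 9 for the input {3, -5, 0, -1}; only the unrolled one exposes the parallelism to the operator mapping.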
In the operator-based rapid design methodology, a space-time diagram of configurable operators is obtained from the high-level-language algorithm description by mapping. Clustering then compresses the operators spatially: provided the timing conditions are met, operators are clustered, following the principles of functional proximity and consistent control, into a preliminary operator-based ASIC architecture. Configurable components that the target algorithm does not need are then removed, further optimizing area, power, performance and other parameters, and finally configuration instructions are generated for the customized ASIC. This process avoids an in-depth study of the high-level-language algorithm (for example, a C program): operators can be mapped directly from the attributes of the C algorithm, which greatly shortens design time.
III. Overview of the operator implementation
Referring to Fig. 1, entropy coding is the final step of the whole encoding process. It parses the reordered transform coefficients in reverse order, from the end toward the front, and extracts the following information about the residual coefficients: the number of nonzero residual coefficients (TotalCoeffs), the number of trailing ones (TrailingOnes), the number of zeros preceding the last nonzero coefficient (TotalZeros), the number of zeros immediately preceding each nonzero coefficient (RunBefore), and the number of zeros remaining to the left of the corresponding nonzero coefficient (ZerosLeft). Once these parameters have been parsed, they can be encoded, and the resulting code streams are finally output in the corresponding order.
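The parameters listed above can be extracted in software as follows. This sketch assumes coef[0] is the lowest-frequency coefficient of a 4×4 block already in zig-zag order, and follows the usual H.264 definitions (at most three trailing ones; TotalZeros counts zeros below the last nonzero coefficient). CavlcParams and cavlc_parse are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int total_coeffs;    /* TotalCoeffs */
    int trailing_ones;   /* TrailingOnes, at most 3 */
    int total_zeros;     /* TotalZeros: zeros below the last nonzero coeff */
    int run_before[16];  /* RunBefore per nonzero coeff, high to low freq. */
    int zeros_left[16];  /* ZerosLeft in effect when each RunBefore is taken */
} CavlcParams;

static void cavlc_parse(const int16_t coef[16], CavlcParams *p)
{
    int last = -1, in_t1_run = 1;

    p->total_coeffs = 0;
    p->trailing_ones = 0;
    for (int i = 15; i >= 0; i--) {            /* reverse scan */
        if (coef[i] == 0)
            continue;
        if (last < 0)
            last = i;                          /* highest-frequency nonzero */
        if (in_t1_run && abs(coef[i]) == 1 && p->trailing_ones < 3)
            p->trailing_ones++;
        else
            in_t1_run = 0;                     /* trailing-ones run ended */
        p->total_coeffs++;
    }
    p->total_zeros = (last < 0) ? 0 : last + 1 - p->total_coeffs;

    int zeros_left = p->total_zeros, k = 0;
    for (int i = last; i >= 0; i--) {
        if (coef[i] == 0)
            continue;
        int run = 0;                           /* zeros directly below coef[i] */
        for (int j = i - 1; j >= 0 && coef[j] == 0; j--)
            run++;
        p->zeros_left[k] = zeros_left;
        p->run_before[k] = run;
        zeros_left -= run;
        k++;
    }
}
```

For the block {2,0,1,0,0,3,4,0,6,0,-1,0,1,0,0,0} this gives TotalCoeffs = 7, TrailingOnes = 2, TotalZeros = 6 and RunBefore = 1, 1, 1, 0, 2, 1, 0.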
The coding functional blocks of the entropy coder comprise a universal variable-length coding (UVLC) block for syntax elements, a context-adaptive variable-length coding (CAVLC) block for residual data, and a bitstream splicing block that concatenates all coded streams. UVLC is mainly responsible for Exp-Golomb coding and part of the block coding; CAVLC mainly processes macroblock data and internally consists mainly of table-lookup coding and some Exp-Golomb coding. These entropy-coding blocks can be built by aggregating a series of operator units, and an operator unit can be invoked repeatedly to complete the whole entropy-coding function. There are two Exp-Golomb blocks: a signed one and an unsigned one. There are three CAVLC residual-coefficient blocks, for sub-blocks with 16, 15 and 4 residual coefficients. There is one bitstream splicing block, which splices and combines all code streams.
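The signed Exp-Golomb block can reuse the unsigned coder, because H.264 defines se(v) as a mapping of the signed value onto a ue(v) code number. A minimal sketch of that mapping (the function name is illustrative):

```c
#include <stdint.h>

/* se(v) -> ue(v) code number: k > 0 maps to 2k - 1, k <= 0 maps to -2k,
   so 0, 1, -1, 2, -2, ... become 0, 1, 2, 3, 4, ...
   Valid for INT32_MIN < k <= INT32_MAX. */
static uint32_t se_to_ue(int32_t k)
{
    return k > 0 ? 2u * (uint32_t)k - 1u : 2u * (uint32_t)(-k);
}
```

The resulting code number is then written exactly as in unsigned Exp-Golomb coding, so in hardware the signed block only adds this mapping in front of the unsigned one.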
The invention mainly targets entropy coding below the slice layer for the H.264 Baseline profile. Through configurable coding modes it adaptively supports macroblocks of type intra 4×4, intra 16×16, inter 16×16 and P-SKIP. Each of these macroblock types is coded by an operator function group composed of operator units, which performs the entropy coding for that macroblock type.
For CAVLC residual coding of a 4×4 sub-block, the embodiment of the invention uses inter-stage pipelining to improve efficiency, with the following four stages:
1) Data buffering and reordering unit (first pipeline stage).
2) Residual-coefficient parsing unit (second pipeline stage).
3) Table-lookup or computation coding unit (third pipeline stage).
4) Bitstream splicing unit (fourth pipeline stage).
For the CAVLC entropy coder, a 4×4 sub-block contains 16, 15 or 4 valid transform coefficients. All three cases can be handled by one set of hardware by taking 16 as the worst case, so at most 16 residual coefficients are coded. All three cases normally occur, and several 4×4 or 2×2 sub-blocks must be processed per macroblock. The same coding hardware is used for every macroblock type; only the number of coding iterations differs, and it is controlled by a counter.
In the embodiment of the invention, the CAVLC coding hardware codes one 4×4 sub-block of residual coefficients at a time, with at most 16 coefficients. The other cases use a different zig-zag scan and set the unused coefficients to zero: when there are 15 or 4 transform coefficients, the zig-zag scan masks off the top 1 or the top 12 positions respectively, and these are filled with zeros. The advantage is that, with the CAVLC algorithm unchanged for all transform coefficients, hardware resources are shared well: the same 4×4 coding hardware codes 16, 15 and 4 residual coefficients. Its operator functional block structure is shown in Fig. 3.2.
The encoder must code residual blocks with 16, 4 or 15 valid transform coefficients. Because zig-zag scanning always proceeds in reverse, from the end toward the front, the residual coefficients can be stored in advance in memory or a register file in an order and format that depend on the macroblock type, which simplifies fetching and scanning. When a 4×4 sub-block is to be coded, its residual data is read out and first buffered in 16 data registers; each time all values in these 16 registers have been processed, one CAVLC coding pass is complete. The storage order and format of the residuals depend on the macroblock type.
The residual-coefficient parsing unit receives the 16 residual values from the data buffer registers, one per cycle over 16 cycles, and performs a preliminary analysis: a counter records the number of nonzero coefficients, and the nonzero coefficients are stored in order in a data register bank for use by the later modules. The unit also records the position of each nonzero coefficient among the current 16 values and stores that index in a data register bank for the later modules.
The operator-based configurable entropy-coding method of the invention, and the hardware-accelerated entropy coding device suitable for the H.264 class of standards, have hardware that is both regular and flexible. Configuration enables adaptive entropy coding of different macroblock types; different macroblock types can be entropy coded without interfering with one another, and coding modes are easy to port and extend while the coding speed is increased.
For the CAVLC entropy coder, a single set of hardware handles the 4×4 residual coding for different numbers of coefficients, so hardware is shared across macroblock types. This saves encoder hardware area while retaining high flexibility, eases porting 4×4 residual-coefficient coding to other standards, and further improves the efficiency of the hardware encoder.
IV. Detailed description of the operator implementation
In this embodiment, the entropy-coding function group is implemented by aggregating a series of operator functional blocks, mainly the Exp-Golomb coding block (universal variable-length coding block), the context-adaptive variable-length residual-coefficient coding block and the bitstream splicing block. In the C code these blocks appear as functions, and the C code implements entropy coding by calling these main functions repeatedly.
Fig. 4.1 shows the mapping from the X264 C code to hardware, in which the x264_macroblock_write_cavlc function performs macroblock-level coding. The boxes on both sides are input/output signals; the memories they point to are not drawn. The hollow boxes in the middle are variables internal to the function. It contains the following logic:
L0 logic: performs the intermediate mapping for macroblock-type coding
my_type_I16=i_mb_i_offset+1+h->mb.i_intra16x16_pred_mode+
h->mb.i_cbp_chroma*4+(h->mb.i_cbp_luma==0?0:12)
I4_cbp_to_golomb: performs the intermediate mapping of the CBP to its Exp-Golomb code number in I4 macroblocks
P4_cbp_to_golomb: performs the intermediate mapping of the CBP to its Exp-Golomb code number in P16 macroblocks
Mux logic: bitstream splicing logic
bs_write_ue logic: unsigned Exp-Golomb coding
bs_write_se logic: signed Exp-Golomb coding
x264_residual_write_cavlc logic: context-adaptive variable-length residual-coefficient coding
These logic blocks form the operator functional block group that makes up the entropy-coding function group, and all of them are built from basic operation operators.
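The L0 expression can be wrapped as a function to check it against the H.264 mb_type table. In H.264, i_mb_i_offset is 0 in an I slice and 5 in a P slice, where mb_type values 0 to 4 are the inter types; the function and parameter names below are illustrative.

```c
/* mb_type of an intra 16x16 macroblock, following the L0 expression:
   offset + 1 + prediction mode (0..3) + 4 * chroma CBP (0..2)
   + 12 if any luma AC coefficient is coded. */
static int mb_type_i16x16(int i_mb_i_offset, int pred_mode,
                          int cbp_chroma, int cbp_luma)
{
    return i_mb_i_offset + 1 + pred_mode + cbp_chroma * 4 +
           (cbp_luma == 0 ? 0 : 12);
}
```

For example, prediction mode 2 with chroma CBP 1 and luma CBP 15 in an I slice gives mb_type 19 (I_16x16_2_1_1); the resulting value is then coded with bs_write_ue.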
The operator functional block structure of the bs_write_ue function described above is shown in Fig. 4.2.
As shown in Fig. 3.1, variable-length coding is mainly divided into two independent units, UVLC and CAVLC. The UVLC unit is mainly responsible for Exp-Golomb coding and part of the block coding; the CAVLC unit processes macroblock data, internally consisting mainly of table-lookup coding and some Exp-Golomb coding. For UVLC, the parameter values arriving on the syntax-element bus are first stored in a cache register (Syntax element_cache) and, after syntax analysis, are UVLC coded directly; this syntax analysis mainly comprises computing the macroblock type and selecting the coding mode. For CAVLC, the coding unit is a 4×4 sub-block: residual data arriving on the data bus is buffered in a register (MBpixel_cache) one fixed-size unit (a 4×4 sub-block) at a time, analyzed, and then CAVLC coded. This analysis mainly means scanning the 4×4 residual data value by value to extract the information needed for coding (the nonzero coefficients, the trailing ones, the nonzero coefficient levels, the total number of zeros before the last nonzero coefficient, and the number of zeros before each nonzero coefficient). Finally, a state machine combines the two code streams for output.
H.264 uses a variable-length entropy coding structure, which greatly reduces the bitstream size. Macroblock header coding covers the macroblock type, the I_4×4 luma prediction modes, the chroma prediction mode, the coded block pattern (CBP), the quantization parameter difference (QP delta) and so on; these syntax elements are coded with Exp-Golomb codes (UVLC). Residual data, in contrast, is entropy coded with CAVLC: according to the mode, the corresponding residual data, stored in a given format after DCT and quantization, is fetched and coded, and the finished code stream is stored in a designated region of memory.
After the syntax elements to be coded are loaded, they are first placed in the VLC internal registers, and a coding mode is selected according to the frame type and the macroblock type. Taking Baseline-profile entropy coding as an example, there are four prediction modes (intra 4×4, intra 16×16, inter 16×16 and P-SKIP), and the syntax-element coding process differs for each. After determining which coding mode to use, the encoder reads the corresponding residual data from memory.
For a P-SKIP macroblock, the register i_skip that counts such macroblocks is incremented by 1, and coding of this macroblock is complete (in fact nothing is coded; only the count is recorded).
For an intra 4×4 macroblock, the following syntax elements are UVLC coded in order: the macroblock type, the 16 luma prediction modes, the chroma prediction mode, the CBP and the quantization parameter QP (if the CBP is 0, all residual data are 0 and QP need not be coded). The residual data is then CAVLC coded.
For an intra 16×16 macroblock, the macroblock type, the chroma prediction mode and QP are UVLC coded in order, and the residual data is then CAVLC coded.
For an inter 16×16 macroblock, the macroblock type, the CBP and QP (if the CBP is 0, all residual coefficients are 0 and QP need not be coded) are UVLC coded in order, and the residual data is then CAVLC coded.
Each of the coding modes above implements macroblock coding below the slice layer as an independent coding unit realized in ASIC. Each unit contains a state machine that controls the coding flow and a bitstream splicing block that combines the UVLC and CAVLC streams. A multiplexer finally selects the coded and merged stream for output. Because CAVLC has many coding parameters and forms a unit of its own, the CAVLC unit also contains its own state machine and bitstream splicing block to assemble the CAVLC stream.
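The per-mode control described above can be summarized as a dispatch on the macroblock type. The sketch below only counts the header syntax elements each mode sends to the UVLC path, as listed in the text (motion data for inter 16×16 is not mentioned there and is not counted); MbType, MbPlan and plan_macroblock are hypothetical names, and a real state machine would emit the elements one per state rather than count them.

```c
/* Macroblock types supported by the coder, as listed in the text. */
typedef enum { MB_P_SKIP, MB_I4x4, MB_I16x16, MB_P16x16 } MbType;

/* What the per-mode state machine does for one macroblock (hypothetical). */
typedef struct {
    int header_elems;   /* header syntax elements sent to the UVLC path */
    int run_cavlc;      /* 1 if residual CAVLC coding follows */
} MbPlan;

static MbPlan plan_macroblock(MbType t, int cbp, int *skip_count)
{
    MbPlan p = {0, 0};
    switch (t) {
    case MB_P_SKIP:
        (*skip_count)++;        /* only record the skip; nothing is coded */
        return p;
    case MB_I4x4:               /* mb_type, 16 luma modes, chroma mode, CBP */
        p.header_elems = 1 + 16 + 1 + 1;
        break;
    case MB_I16x16:             /* mb_type (carries the CBP), chroma mode */
        p.header_elems = 1 + 1;
        break;
    case MB_P16x16:             /* mb_type, CBP */
        p.header_elems = 1 + 1;
        break;
    }
    if (t == MB_I16x16 || cbp != 0)
        p.header_elems++;       /* QP: always for I16x16, else only if CBP != 0 */
    p.run_cavlc = 1;
    return p;
}
```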
As shown in Fig. 6, a macroblock has only one header, so the header parameters are UVLC coded once, whereas residual coding operates on 4×4 or 2×2 sub-blocks (all sharing the single 4×4 CAVLC hardware). The residual code stream of a macroblock is therefore spliced from 26 or 27 iterations of CAVLC coding: for an I16×16 macroblock, i = 27 and CAVLC runs 27 times; for an I4×4 or P16×16 macroblock, i = 26 and CAVLC runs 26 times.
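The iteration counts follow from the block structure of a 4:2:0 macroblock, assuming the breakdown below (16 luma 4×4 blocks, 2 chroma DC 2×2 blocks and 8 chroma AC 4×4 blocks, plus a luma DC block for I16×16). The function name is illustrative; in hardware this value would preload the loop counter.

```c
/* CAVLC passes per macroblock: 16 luma 4x4 blocks (AC-only for I16x16),
   2 chroma DC blocks and 8 chroma AC blocks; I16x16 adds one luma DC block. */
static int cavlc_passes(int is_i16x16)
{
    const int luma = 16, chroma_dc = 2, chroma_ac = 8;
    return (is_i16x16 ? 1 : 0) + luma + chroma_dc + chroma_ac;
}
```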
For a 4×4 sub-block, the CAVLC coding hardware works as follows. After the sub-block is input, it is scanned in reverse zig-zag order, testing the coefficients serially one at a time. Once the sub-block has been scanned, coding of CoeffToken, TotalZeros and RunBefore can begin. These codes are obtained by table lookup: the code tables are placed, for example, in ROM, the address into the table is computed, and the codeword is read from the table at that address. Computation and table lookup are interchangeable: a complex table lookup can be replaced by computation, and some complex computations can be replaced by table lookup. Interaction between the two should nevertheless be minimized, because the lookup latency is fixed whereas computation is affected by the number of pipeline stages; placing the computing units and lookup units together at design time therefore simplifies the state-machine design.
To take advantage of fast hardware processing, CAVLC coding of a 4×4 sub-block uses inter-stage pipelining to improve efficiency. As shown in Fig. 7, the coding flow is divided into the following four stages:
1) Data buffering and reordering unit (first pipeline stage).
As shown in Figs. 8 and 9, after the entropy coder starts, it first reads the residual data of one 4×4 sub-block from the memory corresponding to the macroblock type into registers 1-16. The I16×16 luma DC coefficients and the luma residual coefficients of I4×4 and P16×16 are stored directly in registers 1-16. The chroma DC coefficients of all macroblock types are placed in registers 1-4, and registers 5-16 are set to zero. For the luma AC and chroma AC coefficients of all macroblock types, the valid AC residual coefficients are placed in registers 1-15 and register 16 is set to zero. Once the data have been read out according to the coding order and the register storage format, a multiplexer feeds the register values, from register 16 down to register 1, to the next module for parsing.
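The zero-filling rule above is what lets one 16-coefficient datapath serve all three cases. A minimal sketch, assuming reg[0] models register 1 and src already holds the coefficients in coding order; load_block is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Load one block into the 16-entry buffer of the reorder stage.
   count = 16: luma DC of I16x16, luma of I4x4/P16x16 -> registers 1..16
   count = 15: luma or chroma AC -> registers 1..15, register 16 = 0
   count = 4:  chroma DC -> registers 1..4, registers 5..16 = 0 */
static void load_block(int16_t reg[16], const int16_t *src, int count)
{
    if (count < 0)
        count = 0;
    if (count > 16)
        count = 16;
    memset(reg, 0, 16 * sizeof reg[0]);              /* zero-fill first */
    memcpy(reg, src, (size_t)count * sizeof reg[0]); /* then valid coeffs */
}
```

The following stage then reads reg[15] down to reg[0], so the zero-filled registers are scanned first and only add zeros at the high-frequency end, which the CAVLC parameters are insensitive to.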
Because the encoder must code residual blocks with 16, 4 or 15 valid transform coefficients, and zig-zag scanning always proceeds in reverse order from the end toward the front (the position vacated by an extracted DC coefficient can be set to zero), the residual coefficients can be stored in advance in memory or a register file in an order and format that depend on the macroblock type, which simplifies fetching and scanning. When a 4×4 sub-block is to be coded, its residual data is read out and first buffered in 16 data registers; each time all values in these 16 registers have been processed, one CAVLC coding pass is complete. The storage order and format of the residuals vary with the macroblock type.
From above, know, the I_16*16 macro block (mb) type can be by the storage of form as shown in figure 10 residual error data, for brightness DC coefficient, now the DC coefficient in every sub-4*4 piece in a macro block is extracted and do after Hadamard transform the starting position of leaving memory in, at first CAVLC is encoded to it while starting, and then starts successively the CAVLC coding from sequence number 0 successively.When 4 * 4 coded scannings of each son, resolve residual information from the scanning of data16 to data1 backward successively.For the AC coefficient in each sub-4*4 piece, the data independence of data1 storage is critical, when it being got to register and carry out next code, its being composed is that zero (value in last register is zero by tax, at first it be output scanning), so still can be encoded and do not affected the information analysis between residual error data by the CAVLC algorithm.For brightness DC coefficient, DC coefficient in every sub-4*4 piece in chrominance block is extracted after doing Hadamard transform and leaves brightness residual coefficient back in, here 4 efficient transformation DC coefficients have only been stored, it is got when register carries out next code place it in before in 4 registers, the value in 12 registers of back is zero by tax.
In like manner, can be by form storage as shown in figure 11 for I_4*4 and P_16*16 macro block (mb) type.With I_16*16, compare, it does not have brightness DC coefficient Hadamard transform, and the conversion coefficient storage format of colourity is identical with I_16*16.
2) Residual coefficient parsing unit (second pipeline stage).
As shown in Figure 12, on reset the control logic of the residual coefficient parsing unit data_parse clears the nonzero-coefficient counter tc_n to 0 and presets the coefficient position register num to 16. The unit then receives the 16 residual values from the data buffering and reordering unit, one per cycle over 16 cycles, and performs a preliminary analysis of them. Each time a value is nonzero, tc_n is incremented by 1, and its final value is output as the total number of nonzero coefficients, TotalCoeffs. At the same time, the nonzero coefficients pass through a gating switch and are stored in order in the data register bank mem_d. The unit also records where each nonzero coefficient lies among the current 16 values. This position is obtained by decrementing num by 1 each cycle, starting from 16, and is stored in the data register bank mem_n for use by later modules.
For example, in Figure 13 the first row holds the 16 input transform values, which are parsed in reverse order from right to left. The nonzero coefficients detected are, in order, 1, -1, 6, 4, 3, 1, 2. Their positions are 13, 11, 9, 7, 6, 3, 1, so this sequence contains 7 nonzero coefficients.
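The preliminary parse performed by data_parse can be modelled in software as below. This is a behavioural sketch, not the patent's RTL. The register names `tc_n`, `num`, `mem_d` and `mem_n` come from the text. The Python function itself is an assumption, and so is the reconstruction of the Figure 13 input, which is zero everywhere except at the listed positions.

```python
def data_parse(regs):
    """Preliminary parse of one 4x4 block, mimicking the data_parse unit.

    regs: 16 register values, regs[k] holding data(k+1).
    Returns (TotalCoeffs, mem_d, mem_n): the nonzero count, the nonzero
    values in reverse-scan order, and their 1-based positions.
    """
    tc_n = 0   # nonzero-coefficient counter, cleared on reset
    num = 16   # position register, preset to 16 on reset
    mem_d, mem_n = [], []
    for _ in range(16):          # one coefficient per cycle
        value = regs[num - 1]
        if value != 0:
            tc_n += 1
            mem_d.append(value)  # gated into the value register bank
            mem_n.append(num)    # position recorded for later stages
        num -= 1
    return tc_n, mem_d, mem_n


# Figure 13 example, rebuilt from the nonzero values and positions in the text.
regs = [0] * 16
for pos, v in {13: 1, 11: -1, 9: 6, 7: 4, 6: 3, 3: 1, 1: 2}.items():
    regs[pos - 1] = v
total_coeffs, mem_d, mem_n = data_parse(regs)
```

On this input the sketch reproduces the values in the text: 7 nonzero coefficients, with values 1, -1, 6, 4, 3, 1, 2 at positions 13, 11, 9, 7, 6, 3, 1.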
Counting the nonzero coefficients and storing them with their positions is only a preliminary parse of the residual data. To code the 4×4 residual block, the unit must also determine four further quantities:
- the number of trailing ones (TrailingOnes);
- the number of zeros preceding the last nonzero coefficient (TotalZeros);
- the number of zeros immediately preceding each nonzero coefficient (RunBefore);
- the number of zeros remaining to the left of each nonzero coefficient (ZerosLeft).
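These further quantities can be derived from mem_d and mem_n alone. The following is a minimal sketch, assuming 1-based positions in reverse-scan order as in the example above. The function name is hypothetical.

```python
def derive_stats(mem_d, mem_n):
    """Derive TrailingOnes, TotalZeros, RunBefore and ZerosLeft.

    mem_d / mem_n: nonzero values and their 1-based positions in
    reverse-scan order (highest position first).
    """
    total_coeffs = len(mem_d)
    # Trailing ones: up to three consecutive +/-1 values at the high-frequency end.
    trailing_ones = 0
    for v in mem_d:
        if abs(v) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    # TotalZeros: zeros in front of the last (highest-position) nonzero coefficient.
    total_zeros = mem_n[0] - total_coeffs if total_coeffs else 0
    run_before, zeros_left = [], []
    left = total_zeros
    for i, pos in enumerate(mem_n):
        zeros_left.append(left)
        # Position of the next lower nonzero coefficient; 0 is a sentinel below data1.
        nxt = mem_n[i + 1] if i + 1 < total_coeffs else 0
        run = pos - nxt - 1
        run_before.append(run)
        left -= run
    return trailing_ones, total_zeros, run_before, zeros_left
```

For the Figure 13 example, the sketch gives:
- TrailingOnes = 2;
- TotalZeros = 6 (13 positions minus 7 nonzero coefficients);
- runs of 1, 1, 1, 0, 2, 1, 0.

In H.264, the run of the last (lowest-frequency) coefficient is not transmitted, and runs stop being coded once ZerosLeft reaches zero. The sketch still reports all runs for completeness.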
3) Table-lookup or computation coding unit (third pipeline stage). TotalCoeffs, TotalZeros and the zero runs are coded by table lookup. The levels (Levels) of the nonzero coefficients other than the trailing ones are coded in two steps: the unit first computes a prefix level_prefix and a suffix level_suffix, then combines them into a codeword. The TrailingOnes are coded directly as their signs.
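The level coding in the third stage can be sketched with the H.264 CAVLC rules for level_prefix, level_suffix and the adaptive suffixLength, inverted from the decoding process defined in the standard. This is a software model under that assumption, not the patent's hardware. It handles only level_prefix values up to 15 (no extended escape codes), and the function names are hypothetical.

```python
def encode_levels(levels, trailing_ones, total_coeffs):
    """Return (level_prefix, level_suffix, suffix_bits) for each non-T1 level.

    levels: nonzero values after the trailing ones, in reverse-scan order.
    Only level_prefix <= 15 is supported (no extended escape codes).
    """
    # Initial suffixLength as in H.264 CAVLC.
    suffix_length = 1 if total_coeffs > 10 and trailing_ones < 3 else 0
    out = []
    for i, level in enumerate(levels):
        # Map signed level to a non-negative levelCode: +1->0, -1->1, +2->2, -2->3, ...
        level_code = 2 * level - 2 if level > 0 else -2 * level - 1
        if i == 0 and trailing_ones < 3:
            level_code -= 2  # this level cannot be +/-1, so its code is shifted down
        if suffix_length == 0:
            if level_code < 14:
                prefix, suffix, bits = level_code, 0, 0
            elif level_code < 30:
                prefix, suffix, bits = 14, level_code - 14, 4
            else:
                prefix, suffix, bits = 15, level_code - 30, 12
        elif level_code < (15 << suffix_length):
            prefix = level_code >> suffix_length
            suffix = level_code & ((1 << suffix_length) - 1)
            bits = suffix_length
        else:
            prefix, suffix, bits = 15, level_code - (15 << suffix_length), 12
        if suffix >= (1 << 12):
            raise ValueError("level too large for level_prefix <= 15")
        out.append((prefix, suffix, bits))
        # Adapt suffixLength for the next level.
        if suffix_length == 0:
            suffix_length = 1
        if abs(level) > (3 << (suffix_length - 1)) and suffix_length < 6:
            suffix_length += 1
    return out


def level_bits(prefix, suffix, bits):
    """level_prefix is sent as `prefix` zeros and a one, then the suffix bits."""
    return "0" * prefix + "1" + (format(suffix, "0%db" % bits) if bits else "")


def trailing_one_signs(t1_values):
    """Each trailing one is sent as a single sign bit: 0 for +1, 1 for -1."""
    return "".join("1" if v < 0 else "0" for v in t1_values)
```

In the Figure 13 example, the levels after the two trailing ones are 6, 4, 3, 1, 2. The sketch codes them as (prefix, suffix) pairs (8, no suffix), (1, 2), (1, 0), (0, 0), (0, 2). Along the way, suffixLength grows from 0 to 2 after the first level.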
4) Bitstream concatenation unit (fourth pipeline stage): concatenates the codewords received from the third stage into the bitstream in the corresponding order.
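The concatenation stage amounts to appending variable-length codewords to a bit buffer and emitting whole bytes. The following is a minimal sketch, assuming MSB-first packing and zero padding of the final byte; the patent specifies neither. The class name is hypothetical.

```python
class BitWriter:
    """Append variable-length codewords MSB-first and emit packed bytes."""

    def __init__(self):
        self.acc = 0       # pending bits not yet written as a full byte
        self.nbits = 0     # number of pending bits
        self.out = bytearray()

    def put(self, value, nbits):
        """Append the low `nbits` bits of `value`, most significant bit first."""
        if not 0 <= value < (1 << nbits):
            raise ValueError("value does not fit in nbits")
        self.acc = (self.acc << nbits) | value
        self.nbits += nbits
        while self.nbits >= 8:
            self.nbits -= 8
            self.out.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1  # keep only the unwritten bits

    def put_bits(self, bitstring):
        """Append a codeword given as a string of '0'/'1' characters."""
        if bitstring:
            self.put(int(bitstring, 2), len(bitstring))

    def flush(self):
        """Zero-pad the last partial byte (an assumed padding rule) and return bytes."""
        if self.nbits:
            self.out.append((self.acc << (8 - self.nbits)) & 0xFF)
            self.acc, self.nbits = 0, 0
        return bytes(self.out)
```

Codewords of any length can be fed in one after another in the order the third stage produces them. Only whole bytes leave the buffer until `flush` is called.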
A 4×4 sub-block can contain 16, 15 or 4 valid transform coefficients:
- 16 coefficients: the luma DC coefficients DC_L after the Hadamard transform in I16×16 mode, the luma coefficients DC_AC_L after the DCT in I4×4 mode, and the luma coefficients DC_AC_L after the DCT in inter 16×16 mode;
- 15 coefficients: the luma AC coefficients AC_L that remain in I16×16 mode once the DC coefficients are removed for the Hadamard transform, and the AC coefficients AC_C in chroma 8×8 mode (intra and inter);
- 4 coefficients: the DC coefficients DC_C of the chroma residual after the Hadamard transform in chroma 8×8 mode (intra and inter).
In the invention, a single set of hardware handles all three cases: the residual coefficient positions are adjusted and the unused positions are set to zero. The 16-coefficient case is taken as the worst case, so at most 16 transform coefficients need to be coded. When a block has 15 or 4 transform coefficients, the highest 1 or highest 12 positions of the zig-zag scan, respectively, are masked and filled with zeros. The advantage of this approach is that all transform coefficients are still coded with the unchanged CAVLC algorithm. The parsing of subsequent residuals is therefore unaffected, and hardware resources are shared without reducing coding efficiency.
A different zig-zag scan is used for each type of 4×4 sub-block, and the corresponding coefficients are set to zero (the coefficient values in the grey areas of the figure are ignored). In this way the same set of 4×4 coding hardware can code blocks of 16, 4 or 15 residual coefficients.
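The zero-filling scheme can be checked in software. Appending zeros after the valid coefficients in scan order leaves the statistics that CAVLC derives from the block unchanged, because the added zeros all lie after the last nonzero coefficient. The following is a minimal sketch with hypothetical names, and it checks only TotalCoeffs and TotalZeros. In H.264, a 4-coefficient chroma DC block still selects its own coeff_token (nC = -1) and total_zeros tables. The shared hardware therefore still has to be configured with the block type it is coding.

```python
def unify_to_16(coeffs):
    """Place a block's valid coefficients (in forward zig-zag order) in 16 registers.

    For 15- and 4-coefficient blocks, the top 1 or 12 positions are masked to zero.
    """
    n = len(coeffs)
    if n not in (16, 15, 4):
        raise ValueError("expected 16, 15 or 4 valid coefficients")
    return list(coeffs) + [0] * (16 - n)


def basic_stats(regs):
    """TotalCoeffs and TotalZeros of a coefficient list in forward scan order."""
    nz = [i for i, v in enumerate(regs) if v != 0]
    if not nz:
        return 0, 0
    # Zeros in front of the last nonzero coefficient.
    return len(nz), nz[-1] + 1 - len(nz)
```

Because the padding only ever extends the tail of the scan, the reverse scan simply starts by skipping the masked positions.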
Figures 14.1, 14.2 and 14.3 show three reverse-scan patterns, each scanning from position 16 to position 1:
- pattern a: the residual luma coefficients of I4×4 and P16×16 and the residual luma DC coefficients of I16×16 (I4×4_L, P16×16_L and I16×16_L_DC);
- pattern b: the residual chroma AC coefficients of I4×4, P16×16 and I16×16 (I4×4_C_AC, P16×16_C_AC and I16×16_C_AC);
- pattern c: the residual chroma DC coefficients of I4×4, P16×16 and I16×16 (I4×4_C_DC, P16×16_C_DC and I16×16_C_DC).

The unnumbered boxes in the figures are set to zero.
CAVLC codes blocks of 16, 4 or 15 residual coefficients with the same set of 4×4 coding hardware. In general, all three patterns are used, and several 4×4 or 2×2 sub-blocks in a macroblock must be processed. The number of coding iterations therefore differs by macroblock type.
Figure 15 shows the number of CAVLC iterations each macroblock type requires:
- I16×16 macroblocks: 1 coding pass for the DC_L_16 coefficients, 16 for AC_L_15, 2 for DC_C_4 and 8 for AC_C_15, in turn, for a total of 27 iterations;
- I4×4 and P16×16 macroblocks: 16 passes for L_16, 2 for DC_C_4 and 8 for AC_C_15, in turn, for a total of 26 iterations.

After the corresponding mode is selected, a counter is incremented by 1 per pass, and coding of the macroblock is complete when the counter reaches the iteration count. When the corresponding CBP (coded block pattern, coded_block_pattern) value is 0, the counter is advanced by an integer multiple of 4.
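The per-macroblock pass counts from Figure 15 can be sketched as a small schedule walked by a counter. The schedule table reproduces the counts given in the text. How a zero CBP bit maps to skipping the four luma 4×4 blocks of one 8×8 block is an assumption made for illustration, and chroma skipping is omitted. All names are hypothetical.

```python
# Pass schedule per macroblock type, using the counts given for Figure 15.
SCHEDULES = {
    "I16x16": [("DC_L_16", 1), ("AC_L_15", 16), ("DC_C_4", 2), ("AC_C_15", 8)],
    "I4x4": [("L_16", 16), ("DC_C_4", 2), ("AC_C_15", 8)],
    "P16x16": [("L_16", 16), ("DC_C_4", 2), ("AC_C_15", 8)],
}
LUMA_4X4_CLASSES = ("L_16", "AC_L_15")


def total_passes(mb_type):
    """Number of CAVLC iterations needed for one macroblock of this type."""
    return sum(n for _, n in SCHEDULES[mb_type])


def coded_passes(mb_type, cbp_luma=0b1111):
    """Walk the schedule with a counter and return the passes actually coded.

    Assumption: luma 4x4 blocks are numbered in 8x8 order, so block i belongs
    to 8x8 block i // 4; a zero CBP bit skips that 8x8 block (counter += 4).
    """
    target = total_passes(mb_type)
    counter = 0
    coded = []
    for name, n in SCHEDULES[mb_type]:
        i = 0
        while i < n:
            if name in LUMA_4X4_CLASSES and not (cbp_luma >> (i // 4)) & 1:
                counter += 4  # skip all four 4x4 blocks of this 8x8 block
                i += 4
                continue
            coded.append((name, i))
            counter += 1
            i += 1
    assert counter == target  # the macroblock is done when the count is reached
    return coded
```

In this sketch the counter always reaches the same total, 27 or 26. Only the number of passes that do real work shrinks when CBP bits are zero.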
As described above, the encoder is configured in advance according to the macroblock type. It then operates in the corresponding mode and completes the coding of one macroblock.
In summary, the invention implements the entropy coder of a video encoder with operator units, which speeds up the design of the entropy coder.
The above is a further detailed description of the invention in connection with specific embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the inventive concept, and all of these should be regarded as falling within the scope of protection of the invention.

Claims (9)

1. An implementation method for an entropy coder of a video encoder, characterized in that it comprises:
Mapping each coding functional block of the entropy coder of the video encoder, described as an algorithm in a high-level programming language, to a hardware logic description composed of operator units, wherein the operator units comprise at least one of computing-class operators, storage-class operators, path-class operators, control-class operators and clock-class operators; the computing-class operator comprises an arithmetic logic unit and a computation configuration register, the computation configuration register being used to receive and store computation configuration instructions; the computing-class operator is a reconfigurable operator, which achieves dynamic reconfiguration by changing the computation configuration instruction during execution; the storage-class operator comprises a memory unit and a storage configuration register, the memory unit comprising an address generation unit, a data memory, a data generation unit and a data output control unit; the path-class operator comprises switches forming a routing unit, data registers and a routing configuration register; the control-class operator mainly sends configuration information to the corresponding configuration registers; and the clock-class operator provides clock signals to the computing-class, storage-class, path-class and control-class operators;
Generating the hardware integrated circuit of the entropy coder from the hardware logic description composed of the operator units, wherein the code values of the entropy coder are obtained by computation or by table lookup; the entropy coder is a CAVLC entropy coder; the coded residual data of one macroblock in the CAVLC entropy coder comprise a plurality of 4×4 or 2×2 sub-blocks, each of which has 16, 15 or 4 valid transform coefficients; and, by adjusting the positions of the valid transform coefficients or by setting coefficients to zero, the sub-blocks are made suitable for unified parsing and coding in the reverse zig-zag scan by the same set of 4×4 coding hardware.
2. The method of claim 1, characterized in that mapping each coding functional block of the entropy coder of the video encoder, described as an algorithm in a high-level programming language, to a hardware logic description composed of operator units comprises the following steps:
Program analysis step: reading the high-level-language algorithm program of each coding functional block of the entropy coder, and identifying from that program, according to the rules of the high-level language, the execution objects and parameter objects to be mapped;
Data-control flow graph generation step: mapping the identified execution objects and parameter objects to corresponding nodes in a data-control flow graph that describes the entropy coder algorithm;
Operator structure graph generation step: according to the function performed by each node of the data-control flow graph, retrieving at least one operator unit with the corresponding function from the operator cell library, and converting the data-control flow graph into an operator space-time graph composed of operator units;
Timing constraint step: determining the overall timing constraint according to the user's specification requirements and the target integrated-circuit process, labeling each operator unit in the operator space-time graph with a time, and applying timing constraints to each level of the operator space-time graph;
Space-time graph compression step: according to the time labels, performing cluster compression of the operator space-time graph in space, so that the overall execution time of the algorithm approaches the total timing constraint;
Hardware mapping step: mapping the cluster-compressed operator space-time graph to a hardware logic description composed of operator units.
3. The method of claim 1, characterized in that each computing-class operator is configured with a control bit and a plurality of operations, the control bit being used to select one of the operations.
4. The method of claim 2, characterized in that the execution objects comprise operation instructions and/or control instructions, and the parameter objects comprise at least one of input data, output data and intermediate data; the operation instructions are mapped to computing-class operators, the control instructions are mapped to control-class operators or path-class operators, and the parameter objects are mapped to storage-class operators.
5. The method of claim 1, characterized in that the coding functional blocks comprise a unified variable-length coding functional block for coding syntax parameters, a CAVLC functional block for coding residual data, and a bitstream concatenation functional block for concatenating all coded bitstreams.
6. An entropy coder for a video encoder implemented by the implementation method of claim 1, characterized in that it comprises a unified variable-length coding functional block for coding syntax parameters, a CAVLC functional block for coding residual data, and a bitstream concatenation functional block for concatenating all coded bitstreams, each functional block being composed of operator units, wherein the operator units comprise at least one of computing-class operators, storage-class operators, path-class operators, control-class operators and clock-class operators; the computing-class operator comprises an arithmetic logic unit and a computation configuration register, the computation configuration register being used to receive and store computation configuration instructions; the computing-class operator is a reconfigurable operator, which achieves dynamic reconfiguration by changing the computation configuration instruction during execution; the storage-class operator comprises a memory unit and a storage configuration register, the memory unit comprising an address generation unit, a data memory, a data generation unit and a data output control unit; the path-class operator comprises switches forming a routing unit, data registers and a routing configuration register; the control-class operator mainly sends configuration information to the corresponding configuration registers; the clock-class operator provides clock signals to the computing-class, storage-class, path-class and control-class operators; the code values of the entropy coder are obtained by computation or by table lookup; the entropy coder is a CAVLC entropy coder; the coded residual data of one macroblock in the CAVLC entropy coder comprise a plurality of 4×4 or 2×2 sub-blocks, each of which has 16, 15 or 4 valid transform coefficients; and, by adjusting the positions of the valid transform coefficients or by setting coefficients to zero, the sub-blocks are made suitable for unified parsing and coding in the reverse zig-zag scan by the same set of 4×4 coding hardware.
7. The entropy coder of claim 6, characterized in that the CAVLC (context-adaptive variable-length coding) functional block comprises a data buffering and reordering unit, a residual coefficient parsing unit, and a table-lookup or computation coding unit, connected in sequence.
8. The entropy coder of claim 7, characterized in that the data buffering and reordering unit, the residual coefficient parsing unit, and the table-lookup or computation coding unit are arranged as successive pipeline stages.
9. The entropy coder of claim 6, characterized in that the residual coefficient coding operator function group of the CAVLC functional block is an operator function group that performs the residual coefficient coding of one 4×4 sub-block at a time.
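Claims 2 and 4 describe mapping program objects to operator classes and then time-labeling the resulting graph. A hypothetical illustration of these two steps is sketched below; it is not the patented tool flow. The node kinds and their operator classes follow claim 4. The ASAP (as-soon-as-possible) level used as the time label is an assumption, and so is the grouping of same-level operators, which stands in for cluster compression.

```python
from collections import defaultdict

# Claim 4: operations -> computing class, controls -> control (or path) class,
# parameters -> storage class.
OPERATOR_CLASS = {
    "operation": "computing",
    "control": "control",
    "parameter": "storage",
}


def map_and_label(nodes, edges):
    """nodes: {name: kind}; edges: list of (src, dst) dependencies.

    Returns {name: (operator_class, time_label)}, where the time label is
    the node's ASAP level in the dependency graph (an assumed rule).
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    level = {}

    def asap(n):
        if n not in level:
            level[n] = 1 + max((asap(p) for p in preds[n]), default=-1)
        return level[n]

    return {n: (OPERATOR_CLASS[kind], asap(n)) for n, kind in nodes.items()}


def cluster_by_time(mapping):
    """Group operators sharing a time label (a stand-in for space compression)."""
    groups = defaultdict(list)
    for name, (_, t) in mapping.items():
        groups[t].append(name)
    return dict(groups)


# Hypothetical graph for the tc_n counter of the parsing unit.
nodes = {"coeff": "parameter", "is_nonzero": "operation", "inc": "operation",
         "tc_n": "parameter", "loop": "control"}
edges = [("coeff", "is_nonzero"), ("is_nonzero", "inc"), ("loop", "inc"), ("inc", "tc_n")]
mapping = map_and_label(nodes, edges)
```

In this toy graph, the coefficient input and the loop control share time label 0, so they fall into the same group and could share one time step.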

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010106200287A CN102088603B (en) 2010-12-31 2010-12-31 Entropy coder for video coder and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010106200287A CN102088603B (en) 2010-12-31 2010-12-31 Entropy coder for video coder and implementation method thereof

Publications (2)

Publication Number Publication Date
CN102088603A CN102088603A (en) 2011-06-08
CN102088603B true CN102088603B (en) 2013-12-04

Family

ID=44100178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010106200287A Expired - Fee Related CN102088603B (en) 2010-12-31 2010-12-31 Entropy coder for video coder and implementation method thereof

Country Status (1)

Country Link
CN (1) CN102088603B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AR092786A1 (en) * 2012-01-09 2015-05-06 Jang Min METHODS TO ELIMINATE BLOCK ARTIFACTS
EP2618570B1 (en) * 2012-01-20 2017-12-13 BlackBerry Limited Method and devices for context set selection
PL3515073T3 (en) * 2012-03-26 2021-03-08 Jvckenwood Corporation Image coding device, image coding method, and image coding program
CN103581681B (en) * 2012-07-26 2018-05-01 安凯(广州)微电子技术有限公司 A kind of video decoder and coding/decoding method
US9538175B2 (en) * 2012-09-26 2017-01-03 Qualcomm Incorporated Context derivation for context-adaptive, multi-level significance coding
MX342822B (en) 2013-01-08 2016-10-13 Dolby Int Ab Model based prediction in a critically sampled filterbank.
CN104702952B (en) * 2014-01-21 2018-02-27 杭州海康威视数字技术股份有限公司 A kind of image data compression method
CN108416431B (en) * 2018-01-19 2021-06-01 上海兆芯集成电路有限公司 Neural network microprocessor and macroinstruction processing method
CN111666077B (en) * 2020-04-13 2022-02-25 北京百度网讯科技有限公司 Operator processing method and device, electronic equipment and storage medium
CN111638874B (en) * 2020-05-27 2023-05-16 江西五十铃汽车有限公司 Matlab software-based automobile controller coding information writing and reading method
CN113824955B (en) * 2021-11-22 2022-02-18 杭州雄迈集成电路技术股份有限公司 Multi-channel video time-sharing multiplexing coding method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101707716A (en) * 2009-11-19 2010-05-12 清华大学 Video coder and coding method
CN101783953A (en) * 2010-03-08 2010-07-21 北京空间机电研究所 JPEG-LS run length coding hardware implementation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Xin'an (王新安). "Operator design method narrows the 'scissors gap' between IC design and manufacturing" (算子设计方法缩小IC设计与制造间的"剪刀差"). 《集成电路应用》 (Integrated Circuit Applications), July 2010, pp. 23-24, 26. *

Also Published As

Publication number Publication date
CN102088603A (en) 2011-06-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131204

Termination date: 20201231