CN102055981B - Deblocking filter for video coder and implementation method thereof - Google Patents

Deblocking filter for video coder and implementation method thereof

Info

Publication number
CN102055981B
Authority
CN
China
Prior art keywords
operator
function
class
data
filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010620016
Other languages
Chinese (zh)
Other versions
CN102055981A (en
Inventor
昌锋
王新安
胡子一
张兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN 201010620016 priority Critical patent/CN102055981B/en
Publication of CN102055981A publication Critical patent/CN102055981A/en
Application granted granted Critical
Publication of CN102055981B publication Critical patent/CN102055981B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a deblocking filter for a video coder and an implementation method thereof. The method comprises the following steps: mapping each filtering function of the deblocking filter of the video coder, described as an algorithm in a high-level programming language, into a hardware logic description composed of operator units; and generating a hardware integrated circuit of the deblocking filter from that hardware logic description. With the invention, a system engineer can rely on a complete operator cell library, sufficient to support the description of high-level language algorithms, to map a deblocking filtering algorithm described in a high-level language quickly onto the underlying hardware circuit, and thus complete the design of the deblocking filter integrated circuit quickly. Moreover, the deblocking filter can operate in a parallel pipeline with the other application-specific integrated circuit (ASIC) parts of the video coder, which speeds up the ASIC design of the video coder. The deblocking filter is suitable for different video coder structures and has good universality.

Description

Deblocking filter for a video encoder and implementation method thereof
Technical field
The present invention relates to digital video processing technology, and in particular to a deblocking filter for a video encoder and an implementation method thereof.
Background technology
Integrated circuit design methodology refers to the methods used to guide integrated circuit design. The usual flow for current integrated circuit (IC) design, especially large-scale IC design, is as follows. The system engineer proposes a system function architecture according to the functional requirements. The software engineer describes that architecture in a high-level programming language (for example Java, C, C++ or Matlab), which yields a high-level language algorithm description of the integrated circuit. The IC design engineer then uses a hardware description language (for example Verilog or VHDL) to synthesize the algorithm description into a netlist file, after which dedicated hardware synthesis tools perform placement and routing to obtain the integrated circuit layout.
When an integrated circuit manufacturer or integrated system supplier takes into account the basic problems of commercial operation (such as product volume, design and test cost, non-recurring engineering (NRE) cost, integration and reuse of user IP, reliability and maintainability, and complexity of the production process), shortening time to market, improving product performance and reducing manufacturing risk become important considerations when choosing a design technique. In particular, the life cycle of electronic products is currently less than one year, so designing new products as quickly as possible has become a problem that manufacturers must solve. The design speed of the integrated circuits in electronic products has therefore become a major consideration in IC design.
As integrated circuit fabrication processes move to 45 nm and below, the design of application-specific integrated circuits (ASICs) for data-flow algorithms, especially in digital media, faces problems such as long design cycles, poor flexibility and poor scalability; the IC design of a deblocking filter for a video encoder is one example. It is therefore difficult to meet the demand for fast time to market and continuously improving functionality. The underlying cause is that existing ASIC design is built on standard cell libraries, whose granularity is small. The path from the standard cell library at the bottom to the algorithm at the top is correspondingly complex, which greatly restricts ASIC design speed.
In the H.264 video standard, blocking artifacts usually appear in the image after inverse quantization and inverse transformation in the encoder. There are two causes. The most important one is the block-based discrete cosine transform (DCT) of the intra-frame and inter-frame prediction residuals: the quantization of the transform coefficients is relatively coarse, so the coefficients recovered by inverse quantization contain errors, which cause visual discontinuities at image block boundaries. The second cause is block-based motion-compensated prediction. With multiple reference frames, the reference frame from which each sub-block copies its motion-compensation reference data is chosen according to residual energy, so the reference data on either side of the boundary between two adjacent sub-blocks may come from different reference frames. In other words, the match of a motion-compensated block cannot be perfectly accurate, so data discontinuities arise at the boundaries of copied blocks. Boundary discontinuities that already exist in the reference frame can also be copied into the sub-block being compensated. These discontinuities accumulate across successive reconstructed frames and seriously affect image quality and compression efficiency. Although H.264 can use a smaller 4×4 transform size to reduce such boundary discontinuities, a deblocking filter is still needed to maximize coding performance. The deblocking filter is therefore an indispensable part of the video encoder, and its deblocking performance directly affects the subjective quality of the compressed image.
In the prior art, "Deblock filtering techniques for video coding according to multiple video standards" (patent No. 200680026470) discloses a deblock filtering technique for the H.264/MPEG-4 standards based on the quantization parameter. "Deblocking filtering method and device" (patent No. 200810060198) proposes splitting the deblocking filter module into sub-modules that carry out the deblocking filtering. "Deblocking filtering method, system and deblocking filter" (patent No. 200710125596.8) proposes an architecture that filters in macroblock row order. Judged on both improving the design speed of the deblocking filter and guaranteeing its performance, existing ASIC design for deblocking filters is slow, which to some extent restricts the development of high-definition video coding technology.
Existing ASIC design methods for deblocking filters therefore still need to be improved.
Summary of the invention
The technical problem to be solved by the present invention is to address the deficiencies of the prior art by providing a deblocking filter for a video encoder and an implementation method thereof that speed up the integrated circuit design of the deblocking filter.
To solve this technical problem, the present invention adopts the following technical solution:
An implementation method of a deblocking filter for a video encoder, comprising the following steps:
Mapping each filter function of the deblocking filter of the video encoder, described as a high-level language algorithm, into a hardware logic description composed of operator units; and generating the hardware integrated circuit of the deblocking filter from the hardware logic description composed of the operator units.
In one embodiment of the invention, mapping each filter function of the deblocking filter of the video encoder, described as a high-level language algorithm, into a hardware logic description composed of operator units comprises the following steps:
Program analysis step: read the high-level language algorithm program of each filter function of the deblocking filter, and identify the execution objects and parameter objects to be mapped from the program according to the rules of that high-level language;
Data control flow graph generation step: map the identified execution objects and parameter objects to the corresponding nodes of the data control flow graph that describes the deblocking filter algorithm;
Operator space-time diagram generation step: according to the function performed by each node in the data control flow graph, take at least one operator unit with the corresponding function from the operator cell library, and convert the data control flow graph into an operator space-time diagram composed of operator units; the operator units in the operator cell library comprise at least one of control class operators, computing class operators, storage class operators, path class operators and clock class operators;
Timing constraint step: determine the total timing constraint from the user's specification requirements and the requirements of the target integrated circuit process, label each operator unit in the operator space-time diagram with a time, and apply a timing constraint to each level of the operator space-time diagram;
Cluster compression step: according to the time labels, cluster and compress the operator space-time diagram in space, so that the overall execution time of the deblocking filtering algorithm approaches the total timing constraint;
Lower-layer hardware mapping step: map the operator space-time diagram after cluster compression into a hardware logic description composed of operator units.
In one embodiment of the invention, the filter functions comprise a luma vertical boundary filter function for filtering the luma vertical boundaries of the image, a luma horizontal boundary filter function for filtering the luma horizontal boundaries of the image, a chroma vertical boundary filter function for filtering the chroma vertical boundaries of the image, and a chroma horizontal boundary filter function for filtering the chroma horizontal boundaries of the image.
In one embodiment of the invention, the program analysis step comprises:
Identifying the execution objects and parameter objects to be mapped from the high-level language algorithm program of the deblocking filter, where the execution objects comprise operation commands and/or control commands, and the parameter objects comprise at least one of input data, output data and intermediate data;
Analyzing the function call relationships: identifying each function and the parameter objects involved in its execution, and determining whether the function calls sub-functions while running; if sub-functions are called, analyzing the call relationships of each sub-function, identifying the sub-function and the parameter objects involved in its execution, and determining whether the sub-function in turn calls other sub-functions while running.
In one embodiment of the invention, in the data control flow graph generation step, the operation commands are mapped to processing nodes; the control commands are mapped to a control flow that identifies state control, state transition conditions and state control signals; and the parameter objects are mapped to storage nodes on the data flow.
In one embodiment of the invention, the operator space-time diagram generation step comprises:
Unrolling the data control flow graph according to its data flow dependencies;
According to the function performed by each node after unrolling, taking at least one operator unit with the corresponding function from the pre-built operator cell library, where the processing nodes are mapped to computing class operators, the control flow is mapped to control class operators and/or path class operators, and the storage nodes are mapped to storage class operators.
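The node-to-operator mapping above can be sketched as a lookup table. This is a minimal illustration only: the `Node` type, the node-kind strings and the function name `map_nodes` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass

# Hypothetical node kinds of the data control flow graph.
PROCESSING, CONTROL, STORAGE = "processing", "control", "storage"

# Candidate operator classes per node kind, following the mapping in the text:
# processing node -> computing class; control flow -> control and/or path class;
# storage node -> storage class.
OPERATOR_CLASSES = {
    PROCESSING: ("computing",),
    CONTROL: ("control", "path"),
    STORAGE: ("storage",),
}


@dataclass(frozen=True)
class Node:
    name: str
    kind: str


def map_nodes(nodes):
    """Return {node name: tuple of candidate operator classes}."""
    return {n.name: OPERATOR_CLASSES[n.kind] for n in nodes}


graph = [Node("p0_plus_q0", PROCESSING), Node("bs_check", CONTROL), Node("pixel_buf", STORAGE)]
mapping = map_nodes(graph)
```

A real flow would then pick a concrete operator unit from the cell library for each candidate class; that selection is not modeled here.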
In one embodiment of the invention, the timing constraint step comprises applying a timing constraint to each level of the operator space-time diagram according to the data flow structure of the data control flow graph. If the data flow structure is a parallel data flow, the total timing constraint is divided equally among the operator levels of the corresponding space-time diagram, and the timing constraint of each operator level is divided equally among the operator units in that level. If the data flow in the data control flow graph is a serial data flow, the total timing constraint is distributed in proportion to the timing of the operator units mapped on the longest arithmetic path in each operator level, relative to the sum of those timings over all operator levels.
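A minimal sketch of the per-level budget split described above. It assumes each operator level is summarized by a single number, the delay of its longest arithmetic path; the function name `allocate_timing` and its inputs are hypothetical, and the further split among the operator units inside a level is omitted.

```python
def allocate_timing(total, level_delays, parallel):
    """Split a total timing budget over the operator levels.

    total        -- total timing constraint (any consistent time unit)
    level_delays -- longest-path delay of each operator level (assumed input)
    parallel     -- True for a parallel data flow, False for a serial one
    """
    n = len(level_delays)
    if parallel:
        # Parallel data flow: every level receives an equal share.
        return [total / n] * n
    # Serial data flow: share proportional to the level's longest-path delay.
    delay_sum = sum(level_delays)
    return [total * d / delay_sum for d in level_delays]


serial_budget = allocate_timing(100, [1, 3], parallel=False)
parallel_budget = allocate_timing(100, [1, 3], parallel=True)
```

With a budget of 100 and two levels whose longest paths take 1 and 3 units, the serial split gives the slower level three quarters of the budget, while the parallel split gives each level half.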
In one embodiment of the invention, the cluster compression step comprises:
A clustering step, for finding computing class operators with identical operation attributes and/or storage class operators with identical storage attributes in the operator space-time diagram;
A compression step, for merging, according to the time labels, computing class operators with identical operation attributes in space and/or merging storage class operators with identical storage attributes in space;
An operator reuse step, for introducing control class operators and generating the corresponding configuration instructions for the compressed computing class operators and/or storage class operators, so that the computing class operators and/or storage class operators are reused.
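The clustering and compression steps can be illustrated with a small greedy scheduler. This is a sketch under simplifying assumptions, not the patent's algorithm: each operator is reduced to a hypothetical `(name, attribute, time_label)` tuple, and two operators may share one physical unit only if they have the same attribute and different time labels.

```python
from collections import defaultdict


def cluster_compress(ops):
    """Merge same-attribute operators that are active at different times.

    ops -- iterable of (name, attribute, time_label) tuples (assumed format)
    Returns {attribute: [unit, ...]}, where each unit is a dict
    {time_label: operator name}. The dict doubles as a simple per-unit
    schedule, standing in for the configuration instructions of the text.
    """
    units = defaultdict(list)
    for name, attr, t in sorted(ops, key=lambda o: o[2]):
        for unit in units[attr]:
            if t not in unit:          # unit is idle at time t: reuse it
                unit[t] = name
                break
        else:                          # every unit is busy at time t
            units[attr].append({t: name})
    return dict(units)


ops = [("a0", "add", 0), ("a1", "add", 1), ("a2", "add", 1), ("s0", "shift", 2)]
compressed = cluster_compress(ops)
```

In this example four logical operators collapse onto three physical units: `a0` and `a1` share one adder because they run at different times, while `a2` needs a second adder because it runs at the same time as `a1`.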
The present invention also provides a deblocking filter for a video encoder. The deblocking filter receives image frame data that has undergone inverse transformation and quantization and performs deblocking filtering on it, and comprises:
A luma vertical boundary filter function block for filtering the luma vertical boundaries of the image, a luma horizontal boundary filter function block for filtering the luma horizontal boundaries of the image, a chroma vertical boundary filter function block for filtering the chroma vertical boundaries of the image, and a chroma horizontal boundary filter function block for filtering the chroma horizontal boundaries of the image. Each of these four function blocks includes a filter parameter calculation unit for calculating the filter parameters, a filter condition decision unit for evaluating the filter conditions, a strong filter calculation unit for calculating the strong filter, and a weak filter calculation unit for calculating the weak filter. Each of these units is composed of operator units.
Owing to the above technical solution, the present invention has the following beneficial effects:
Each filter function of the deblocking filter of the video encoder, described as a high-level language algorithm, is mapped into a hardware logic description composed of operator units, and the hardware integrated circuit of the deblocking filter is generated from that description. A system engineer can therefore rely on a complete operator cell library, sufficient to support the description of high-level language algorithms, to map a deblocking filtering algorithm described in a high-level language quickly onto the lower-layer hardware circuit, and thus complete the design of the deblocking filter integrated circuit quickly. In addition, the deblocking filter can operate in a parallel pipeline with the other ASIC parts of the video encoder, which speeds up the ASIC design of the video encoder. The deblocking filter is also applicable to different video encoder structures and has good versatility.
Description of drawings
Fig. 1 is the design flow diagram of existing integrated circuit;
Fig. 2 shows several classes of operator structures provided by an embodiment of the invention, where Fig. 2.1 shows a computing class operator, Fig. 2.2 shows a storage class operator, Fig. 2.3 shows a path class operator, and Fig. 2.4 shows a control class operator;
Fig. 3 is the data control flow graph mapped from the deblocking filter algorithm described in a high-level language in an embodiment of the invention;
Fig. 4 is a schematic diagram of the constants in the high-level language program of an embodiment of the invention;
Fig. 5 is the operator space-time diagram of the L0 processing function node of an embodiment of the invention;
Fig. 6 is the operator space-time diagram of the L1 processing function node of an embodiment of the invention;
Fig. 7 is the operator space-time diagram of the L2 processing function node of an embodiment of the invention;
Fig. 8 is the operator space-time diagram of the L3 processing function node of an embodiment of the invention;
Fig. 9 is the operator space-time diagram of the L4 processing function node of an embodiment of the invention;
Fig. 10 is a schematic diagram of the 4×4 block boundaries to be filtered in an embodiment of the invention;
Fig. 11 is a schematic diagram of the basic filtering structure of an embodiment of the invention;
Fig. 12 is a schematic diagram of the operators after cluster compression in an embodiment of the invention;
Fig. 13 is a schematic diagram after operator solidification in an embodiment of the invention.
Embodiment
The present invention is described in further detail below through embodiments, with reference to the accompanying drawings.
One. Overview of the video encoder
Video coding is a key technology for digital media storage and transmission and is widely used in digital television, video communication, network streaming media and other fields. The present invention relates to an implementation method for a deblocking filter based on the operator concept, applicable to deblocking filters for DCT-transformed data of various image, video and audio signals in the signal processing field.
In the field of video coding, H.264, the new digital video coding standard jointly developed by the Joint Video Team (JVT) of the International Telecommunication Union (ITU-T) and the International Organization for Standardization (ISO), is now widely used. The H.264 algorithm is ITU-T Recommendation H.264 and is also Part 10 of the MPEG-4 standard of ISO and the International Electrotechnical Commission (IEC). It uses a layered algorithm structure that is conceptually divided into two layers, the video coding layer and the network abstraction layer. The video coding layer (VCL) is responsible for representing video content efficiently, and the network abstraction layer (NAL) is responsible for packaging and transmitting the data in the manner appropriate to the network. Separating the video coding layer from the network abstraction layer helps with encapsulating information and with better priority control of information.
The main sub-algorithms of the H.264 encoding process are inter prediction, intra prediction, transform and quantization, deblocking filtering, and entropy coding. Each is summarized below.
1. Inter prediction
Inter prediction exploits the correlation between video frames, that is, temporal correlation, to compress the image. In image transmission, moving images are the main concern. A moving image is a temporal sequence of frames separated by the frame period, and it has greater correlation in time than in space. Details change very little between most adjacent frames, so there is strong correlation between frames; exploiting this correlation through inter-frame coding yields a much higher compression ratio than intra-frame coding. H.264 inter prediction is a prediction mode that uses already encoded video frames/fields and block-based motion compensation.
2. Intra prediction
Intra prediction is a new technique adopted by H.264: the pixels of the current block are predicted using the neighboring pixels above and to the left of the current block as reference pixels. Intra prediction coding reduces the spatial redundancy of the image and lowers the bit rate.
3. Transform and quantization
In image coding, transform coding and quantization are in principle two independent processes. In H.264, however, the multiplications of the two processes are merged into one, and integer arithmetic is used, which reduces the computation in encoding and decoding. Quantization reduces the coded length of the image without reducing the visual effect of the video, by discarding information that is unnecessary for reconstructing the video. Its purpose is to remove the transformed data components that are unimportant to visual quality and to keep the visually important ones. The removed components cannot be recovered, so quantization is a lossy process.
4. Deblocking filtering
When a block-based video coding system reconstructs an image, fairly visible blocking artifacts inevitably appear at some block boundaries, particularly at low bit rates. Blocking artifacts arise because the motion compensation accuracy for pixels at block boundaries is generally lower than for pixels inside a block, and because the block transform also introduces discontinuities at block boundaries. The main steps of deblocking filtering are determining the filter strength, identifying false boundaries, and computing the filter.
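The three steps named above can be illustrated on a single line of pixels across a block edge, with `p1, p0` on one side and `q0, q1` on the other. The sketch below follows the edge-filter arithmetic of the H.264 standard as commonly stated; this excerpt does not list the formulas, so treat them as an assumption here. The thresholds `alpha`, `beta` and `tc` normally come from quantization-parameter tables that are not reproduced, and the function names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def needs_filtering(p1, p0, q0, q1, alpha, beta):
    """Filter condition decision: small steps are treated as blocking artifacts,
    large steps are assumed to be real image edges and are left untouched."""
    return abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta


def weak_filter(p1, p0, q0, q1, tc):
    """Normal (weak) filter: move p0 and q0 toward each other by a clipped delta."""
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)


def strong_filter_3tap(p1, p0, q0, q1):
    """Simplest strong-filter variant (3-tap smoothing of p0 and q0 only)."""
    return (2 * p1 + p0 + q1 + 2) >> 2, (2 * q1 + q0 + p1 + 2) >> 2


edge = (60, 62, 70, 71)  # p1, p0, q0, q1 for 8-bit samples
decision = needs_filtering(*edge, alpha=20, beta=5)
weak = weak_filter(*edge, tc=2)
strong = strong_filter_3tap(*edge)
```

Every operation used here is an addition, subtraction, shift, comparison or clip, which is why such a filter maps naturally onto a small set of arithmetic building blocks.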
5. Entropy coding
In video coding, entropy coding converts the series of symbols that represent the elements of a video sequence into a compressed bitstream for transmission or storage. The input symbols may include quantized transform coefficients, motion vectors, flags, headers and additional information.
The present invention relates to the deblocking filter of an H.264 video encoder, so the description below concentrates on the H.264 deblocking filter.
The present invention is based on X264: it takes the high-level language algorithm description of the deblocking filter of the video encoder, in the form of X264 code, as the starting point for the integrated circuit design of the deblocking filter.
Two. Overview of the operator design approach
Looking back at the development of integrated circuit design methodology: when fabrication processes entered the 1 µm era, design methods using gate arrays as the basic unit appeared; when they entered the 0.5 µm era, design methods using standard cells as the basic unit appeared; and when they entered the 0.18 µm era, design methods using IP cores as the basic unit appeared. This shows that, on the one hand, IC design methodology develops along with IC fabrication processes, and on the other hand, the granularity of the basic unit used in design (gate, standard cell, IP core) keeps increasing. The appearance of each new basic unit has marked a revolutionary advance in IC design methods.
As shown in Fig. 1, in the prior art the design of an integrated circuit generally comprises two parts. The first part goes from the high-level language algorithm description to the RTL-level description. The second part goes from the RTL description to an implementation as a standard cell ASIC structure, a gate array (or other S-ASIC structure), or an FPGA structure. Many fairly mature tools already support the second part, and its implementation is largely efficient and fast. The first part, however, is still carried out mainly by engineers who manually convert the high-level language into an RTL-level description according to their own understanding. Current IC design generally works at standard cell granularity and describes the hardware logic of the integrated circuit in a hardware description language, yet hardware description languages are not well suited to directly describing the program logic and execution flow of a high-level language. As a result, mapping from the high-level language to the lower-layer hardware takes a long time and has become the bottleneck of IC design speed. Since fabrication processes moved to 45 nm and below, the speed of IC design has lagged behind the pace of fabrication process development. Improving design speed is therefore one of the most pressing problems in the IC design field.
It can therefore reasonably be predicted that, with the rapid progress of fabrication processes over the past decade, and especially now that they have reached the nanometer scale, basic units of larger granularity will appear and open a new phase of IC design that keeps up with the rapid development of fabrication processes.
In view of this, the present invention proposes an operator-based integrated circuit design method. An operator is a basic building block for integrated circuits with a granularity larger than a standard cell. An operator-based design method will speed up IC design and keep pace with progress in fabrication processes. The operator design method aims to support the description of algorithms mathematically, and to support automated translation from algorithms to an operator language and operator structures. The characteristics of operators are as follows:
1. Larger granularity
Each operator has a specific function and is a symbolic representation of a functional operation. The operator system as a whole can support the implementation of the various functions in an algorithm description, so the designer can describe and verify the algorithm in terms of operators;
2. Support for implementing algorithms
A complete but limited set of operator types can support the description of computation, storage, control and data paths in upper-layer descriptions, and designs can be optimized and verified based on the operator description;
3. Faster design and verification
Once the designer has completed the functional description of the algorithm using operators, the corresponding hardware design is completed at the same time. Because all operators are fully customized in advance and verified to be correct, design and verification time is shortened: the designer only needs to ensure that the operator-based description and the mapping process are correct, which greatly speeds up design and verification.
The operator proposed by the present invention, also called an operator unit, represents hardware logic of an integrated circuit. According to their functions, operators are divided into several classes: computing class operators, storage class operators, path class operators, control class operators and clock class operators. Together these operators form a complete operator system that can support the implementation of all algorithms. The basic operators are designed to be optimized in area, power consumption, performance and other parameters, and they make up the operator cell library. Operator units are further combined into operator function blocks. Operator function blocks exchange data through shared-MEM links and can also pass control information through register transfers.
The functions of the five classes of basic operators that make up the operator cell library are described in detail below.
1, computing class operator
Computing class operator (AU) is for the elementary cell that realizes logical operation, arithmetical operation or the hybrid operation of logical AND arithmetic, supports the realization of calculation functions such as arithmetical logic, displacement, realizes that the operator of these basic operations is called the basic operation operator.And, can expand on the basis of basic operation operator and support at different application, subtract each other in handling such as medium, take absolute value, average, complexity such as fixed coefficient filtering operation but the computing that is in daily use, the computing operator of realizing these extended arithmetics is called the extended arithmetic operator, and they are built by the basic operation operator.
Fig. 2.1 is a structural schematic of one computing class operator, the ADDS operator. It comprises an ADD unit for addition and subtraction and a shift unit (left/right shift) for shift operations. The computing class operator is reconfigurable: a single computing class operator can be given control bits and multiple operation functions, with the control bits selecting one of those functions. That is, the computing class operator comprises an arithmetic logic unit and a computation configuration register. The computation configuration register receives and stores computation configuration instructions, and different configuration instructions correspond to different arithmetic logic operations. According to the demands of different applications, configuring the control bits (the configuration instruction held in the computation configuration register) makes the computing operator implement different functions. For example, the ADDS operator above can implement several different functions depending on the value of control bit X. Because a reconfigurable operator offers rich functionality and can be used in different scenarios, it reduces the number of operators stored in the operator cell library. A reconfigurable operator can also be reconfigured dynamically by changing its control bits during execution. If the operation function of an operator has been optimized and fixed, the control bits can be fixed as well (i.e., hardened customization).
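The reconfigurable behaviour described for the ADDS operator can be pictured with a small behavioural model in C. This is only a sketch under stated assumptions: the layout of the configuration word (`adds_cfg_t`) and the function name `adds_eval` are hypothetical and do not come from the patent, which only states that control bits select among add/subtract and shift functions.

```c
#include <stdint.h>

/* Hypothetical configuration word for an ADDS-style operator.
 * The field layout is an assumption made for illustration only. */
typedef struct {
    unsigned subtract    : 1; /* 0: a + b, 1: a - b             */
    unsigned shift_right : 1; /* 0: shift left, 1: shift right  */
    unsigned shift_amt   : 4; /* number of bit positions, 0..15 */
} adds_cfg_t;

/* Behavioural model: the ADD unit followed by the shift unit.
 * Right-shifting a negative value is implementation-defined in C,
 * so this model is intended for non-negative intermediate sums. */
static int32_t adds_eval(adds_cfg_t cfg, int32_t a, int32_t b)
{
    int32_t sum = cfg.subtract ? a - b : a + b;
    if (cfg.shift_right)
        return sum >> cfg.shift_amt;
    return (int32_t)((uint32_t)sum << cfg.shift_amt);
}
```

With the configuration `{0, 1, 1}` the same operator acts as a two-input averager, and with `{1, 0, 2}` it computes a scaled difference, which is the kind of reuse the control bits are meant to enable.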
2. Storage class operator
Fig. 2.2 is a schematic of the basic structure of the storage class operator (MU). The storage class operator comprises a storage configuration register and a storage unit; the storage unit comprises an address generation unit, a data memory, a data generation unit, and a data output control unit. Through the data output control unit, the storage configuration register can configure the write and/or read mode of the storage operator's memory bank MEM (various storage media such as registers and RAM); it can also configure the working mode of the address generation unit that corresponds to the memory bank. Input data is stored directly at the location given by the address that the address generation unit generates, and the required data is output from its storage location.
Different stages of an algorithm process data in different orders and arrangements, so the storage format and storage mode of the data differ from stage to stage. To meet this need, storage operators can also convert data between storage formats and modes. The storage class operator supports the basic and commonly used storage formats and conversions found in different applications, such as serial-to-parallel conversion, parallel-to-serial conversion, queue structures, and stack structures.
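As an illustration of one format conversion a storage class operator may provide, the following C sketch models serial-to-parallel conversion. The width `S2P_WIDTH`, the type `s2p_t`, and the function `s2p_push` are hypothetical names chosen for this example; the write address stands in for the address generation unit.

```c
#include <stdint.h>
#include <stdbool.h>

#define S2P_WIDTH 4 /* hypothetical width of the parallel word */

typedef struct {
    uint8_t  lane[S2P_WIDTH]; /* data memory                           */
    unsigned addr;            /* write address (address generator state) */
} s2p_t;

/* Push one serial sample. Returns true when a full parallel word has
 * been collected, in which case it is copied to out[] and the write
 * address wraps back to zero. */
static bool s2p_push(s2p_t *s, uint8_t sample, uint8_t out[S2P_WIDTH])
{
    s->lane[s->addr++] = sample;
    if (s->addr < S2P_WIDTH)
        return false;
    for (unsigned i = 0; i < S2P_WIDTH; i++)
        out[i] = s->lane[i];
    s->addr = 0;
    return true;
}
```

A queue or stack structure would differ only in how the address is advanced on reads and writes.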
3. Path class operator
Fig. 2.3 is a schematic of the general structure of the path class operator (LU). The path class operator comprises a routing configuration register, a switch that forms the route selection unit, and data registers (REG). The routing configuration register is controlled by the control class operator CU (see below); under that control it drives the switch so that the different computing class operators are connected in the desired manner. The data registers temporarily hold the input and output data of the computing class operators and storage class operators. In other words, the role of the path class operator is to connect storage class operators and computing class operators to form a data flow structure. It has three main implementation forms: multiplexer, crossbar switch, and bus.
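The crossbar form of the path class operator can be sketched in C as follows. The port count `XBAR_PORTS` and the names `route_cfg_t` and `xbar_route` are assumptions for illustration; the routing configuration register is modelled as one source selector per output port.

```c
#include <stdint.h>

#define XBAR_PORTS 4 /* hypothetical number of input and output ports */

/* Routing configuration register: sel[o] names the input port that is
 * connected to output port o. */
typedef struct {
    uint8_t sel[XBAR_PORTS];
} route_cfg_t;

/* Crossbar model: every output is driven by the input chosen in cfg.
 * Selector values are reduced modulo the port count so that an
 * out-of-range setting cannot index past the input array. */
static void xbar_route(const route_cfg_t *cfg,
                       const int32_t in[XBAR_PORTS],
                       int32_t out[XBAR_PORTS])
{
    for (unsigned o = 0; o < XBAR_PORTS; o++)
        out[o] = in[cfg->sel[o] % XBAR_PORTS];
}
```

A multiplexer corresponds to the special case of a single output port, while a bus would let only one source drive all destinations at a time.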
4. Control class operator
Fig. 2.4 is a schematic of the general structure of the control class operator (CU). The control class operator mainly sends configuration information to the corresponding configuration registers, configuring the computing class, storage class, and path class operators to realize the intended function. In other words, the role of the control class operator is to control the flow of the data stream. It has three implementation forms: counter, state machine, and microinstruction word. Its control outputs include dynamic reconfiguration information for computing class operators, configuration information for the address generation unit (AGU) of storage class operators, and configuration information for path class operators, through which it controls the other operators. The microinstruction word structure comprises a decoder, a program counter, an instruction memory, a pipeline control module, and so on. The control class operator sends configuration information to each functional unit by executing simple configuration instructions.
5. Clock class operator
The clock class operator is the driving and timing component. It generates the clock signals for the control and processing operators, including signals that control clock start and stop and signals that control clock frequency.
The five classes of operators above are the basis of the embodiments that follow. It should be understood that dividing the operators used for integrated circuit (IC) design into five classes by function is not the only possible division; a coarser or finer division can be made according to actual needs.
The operator-based IC design process can be summarized as converting an algorithm described in a high-level language such as C or Matlab into a description built from operators, and then optimizing the operator structure. Because the operators are designed and verified in advance, the design verification of an operator-level description of an algorithm can be completed quickly or automatically. For a specific field such as communications or media, extended operators or operator IP that better support the algorithms can be designed, further accelerating and optimizing the design process.
The operator-unit-based implementation method of the deblocking filter for a video encoder proposed by the present invention is described in detail below.
The implementation method of the deblocking filter for a video encoder proposed by the present invention comprises: mapping each filter function of the deblocking filter of the video encoder, described as an algorithm in a high-level programming language, into a hardware logic description made up of operator units; and generating the hardware integrated circuit of the deblocking filter from that hardware logic description. The hardware logic description made up of operator units is the RTL-level description of the prior art, so those skilled in the art can use existing conversion tools to generate the hardware integrated circuit of the deblocking filter from it. This provides a mapping tool from a computer language to the underlying hardware circuit of an integrated circuit, standardizing the process of generating the underlying hardware from a high-level language such as C or MATLAB, and it is convenient and fast to apply. With this mapping tool, a system engineer can quickly map a deblocking filter algorithm described in a high-level language to the underlying hardware circuit.
In one embodiment, mapping each filter function of the deblocking filter of the video encoder, described as an algorithm in a high-level programming language, into the hardware logic description made up of operator units comprises the following steps: a program analysis step, a data control flow graph generation step, an operator space-time diagram generation step, a timing constraint step, a cluster compression step, and an underlying hardware mapping step. Each step is analyzed and explained in detail below.
1) Program analysis step: read the high-level language algorithm program of each filter function of the deblocking filter and, following the rules of that high-level language, identify the execution objects and parameter objects to be mapped from the program. Specifically:
First, the execution objects and parameter objects to be mapped are identified from the high-level language program of the deblocking filter. Execution objects include operation instructions and/or control instructions; parameter objects include at least one of input data, output data, and intermediate data. The function call relationships are then analyzed: the functions and the parameter objects involved in running them are identified, and it is determined whether each function calls subfunctions while running. If a subfunction is called, its call relationships are analyzed in turn: the subfunction and the parameter objects involved in running it are identified, and it is determined whether it calls further subfunctions. This continues until the parameter objects and operation instructions involved in running the lowest-level functions have been found. Operation instructions mainly describe the mathematical operations performed in the program, while control instructions ensure that the operation instructions run correctly; for example, a control instruction determines whether an operation instruction must be executed in a loop. The main purpose of this step is function decomposition, that is, dividing a function described in a high-level language such as C, Java, or Matlab into several subfunctions. Function decomposition mainly analyzes the function call relationships, the structures and variables of the functions, and the basic operations of the subfunctions, so that the operator classes described above can be used to map the function.
In an embodiment of the present invention, the high-level language description of the deblocking filter is the x264_frame_deblocking_filter function, which describes deblocking filtering in a C program. Accordingly, the language rules followed during program analysis are those of C, and the execution objects and parameter objects are identified from the x264_frame_deblocking_filter function.
In the C programming language, a main function usually calls several functions, and those functions can call subfunctions. Here the x264_frame_deblocking_filter function is defined as the main function, i.e., the filtering strength decision function. The functions called directly by the main function are referred to as functions, and the functions they call are referred to as subfunctions; a subfunction can of course call further subfunctions. Identifying the functions and subfunctions of the filtering strength decision function comprises the following. The x264_frame_deblocking_filter filtering strength decision function is analyzed to find the functions, the parameter objects involved in running them, and whether they call subfunctions while running. The functions called directly by the main function are the luma vertical boundary filter function, the luma horizontal boundary filter function, the chroma vertical boundary filter function, and the chroma horizontal boundary filter function. Analysis of their call relationships shows that each of these four functions calls four subfunctions: the filtering parameter calculation function, the filtering condition decision function, the strong filter function, and the weak filter function. Table 1 lists the input and output data among the parameter objects of the x264_frame_deblocking_filter function identified in the program analysis step; Table 2 lists the intermediate data and constants among those parameter objects.
Table 1. Input and output data of the x264_frame_deblocking_filter function
Variable name | Data type | Direction | Description
i_stride | int | IN | Image width
mb.type | int | IN | Prediction mode
p_fdec | uint8_t* | IN | Reconstructed frame before filtering
mb.non_zero_count | int | IN | Number of non-zero residual coefficients
mb.mv | int | IN | Motion vector
qp | int | IN | Quantization parameter
mvp[2] | int | IN | Predicted motion vector
h_fdec | uint8_t* | OUT | Reconstructed frame after filtering
Table 2. Intermediate data and constants of the x264_frame_deblocking_filter function
Variable name | Data type | Description
bs | int | Filtering strength
qpc | int | Quantization parameter for chroma block filtering
i_index_a | int | Lookup table index
alpha | int | False-boundary decision threshold 1
beta | uint8_t* | False-boundary decision threshold 2
tc | int | Clipping value for weak filtering
The functions and the subfunctions they call involve operation instructions and control instructions when the program runs; when a function calls a subfunction, the call follows the rules of the C language.
2) Data control flow graph generation step: map the identified execution objects and parameter objects to the corresponding nodes of a data control flow graph that describes the deblocking filter algorithm.
The execution objects and parameter objects of the x264_frame_deblocking_filter function were identified in the program analysis step. In this step, therefore, the high-level language algorithm can be mapped to the data control flow graph simply by applying the mapping rules of the data control flow graph.
The mapping rules of the data control flow graph are as follows: operation instructions are mapped to processing nodes; control instructions are mapped to control flow that identifies states, state transition conditions, and state control signals; and parameter objects are mapped to storage nodes on the data flow. In addition, following the order in which the deblocking filter algorithm executes in the C program, the relationships between the nodes and the storage characteristics of the storage nodes are marked in the data control flow graph.
Specifically, for the x264_frame_deblocking_filter function, the processing nodes mapped from the operation instructions are the L0, L1, L2, L3, and L4 processing nodes. The L0 processing node calculates the deblocking filter parameters and corresponds to the filtering parameter calculation function called in the deblocking filter algorithm described in the high-level language. The L1 processing node determines the boundary filtering strength. The L2 processing node determines false boundaries and corresponds to the filtering condition decision function. The L3 processing node calculates strong filtering and corresponds to the strong filter function. The L4 processing node calculates weak filtering and corresponds to the weak filter function.
Fig. 3 shows the data control flow graph mapped from the x264_frame_deblocking_filter function in this embodiment of the invention. DB-rf denotes the register that stores the image data before and after filtering. Taking a vertical boundary as an example, p0-p3 denote the pixels to be filtered on the left of the current filtering boundary: p0 is the pixel nearest the boundary, followed leftward by p1 to p3. Likewise, q0-q3 denote the pixels to be filtered on the right of the boundary: q0 is nearest the boundary, followed rightward by q1 to q3. The output values of the L0 processing node serve as input variables and decision conditions for the L2, L3, and L4 processing nodes. The output value of the L1 processing node selects whether the output is the strongly filtered result, the weakly filtered result, or the unfiltered pass-through result. The L2 processing node calculates intermediate variables from the image pixel characteristics so that a suitable filter can be selected between the L3 and L4 processing nodes. The L3 processing node produces the pixel values of the image after strong filtering, and the L4 processing node produces the pixel values after weak filtering. Because the boundaries of the blocks in an image have different characteristics, different filtering modes must be selected when deblocking the image.
It should be noted that the input variables of the L0 processing node include the quantization parameter QP, offsetA, and offsetB. offsetA represents the variable h->sh.i_alpha_c0_offset in the C program and is used to obtain indexA; offsetB represents the variable h->sh.i_beta_offset in the C program and is used to obtain indexB. offsetA and offsetB are also input variables, but for the baseline H.264 algorithm both are defined as 0, so they are not listed in Table 1; the C program of this embodiment, being a high-level description of the H.264 algorithm, does not need these two input variables. The high-level language description of the L0 processing node involves the fixed tables i_alpha_table and i_beta_table, whose values are shown in Fig. 4.
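The parameter calculation performed by the L0 processing node can be sketched in C as below. This is an illustrative model, not the x264 code: the names `l0_params_t` and `l0_filter_params` are hypothetical, the 0..51 index range is the H.264 quantization parameter range, and the lookup tables are passed in by the caller rather than reproduced here, since their values are given only in Fig. 4.

```c
#include <stdint.h>

/* Limit v to the closed range [lo, hi]. */
static int clip3(int lo, int hi, int v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

typedef struct {
    int index_a; /* index into the alpha table */
    int index_b; /* index into the beta table  */
    int alpha;   /* decision threshold 1       */
    int beta;    /* decision threshold 2       */
} l0_params_t;

/* Derive the table indices from qp and the two offsets, then look up
 * alpha and beta. Both tables must provide 52 entries (index 0..51). */
static l0_params_t l0_filter_params(int qp, int offset_a, int offset_b,
                                    const uint8_t alpha_table[52],
                                    const uint8_t beta_table[52])
{
    l0_params_t p;
    p.index_a = clip3(0, 51, qp + offset_a);
    p.index_b = clip3(0, 51, qp + offset_b);
    p.alpha   = alpha_table[p.index_a];
    p.beta    = beta_table[p.index_b];
    return p;
}
```

When both offsets are 0, as stated for the baseline case, the two indices coincide with the clipped quantization parameter.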
The input variables of the L2 processing node include α, β, and IndexA, where α is the inter-block boundary threshold and β is the intra-block boundary threshold. In general, the gradient of pixel values across a real boundary is larger than the gradient across a false boundary caused by quantization error. Therefore, a boundary whose pixel gradient exceeds the threshold is judged to be a real boundary; otherwise it is judged to be a false boundary. The L2 processing node makes this decision for the boundary between blocks. A boundary judged to be false is filtered; a boundary judged to be real is not filtered, because filtering a real boundary would introduce new error into the image.
Here α corresponds to alpha in Table 2, i.e., false-boundary decision threshold 1, and β corresponds to beta in Table 2, i.e., false-boundary decision threshold 2. IndexA is a variable in the C program and corresponds to i_index_a in Table 2, i.e., the lookup table index.
The inter-block false-boundary decision uses the following three conditions:
|p1 - p0| < β; |q1 - q0| < β; |p0 - q0| < α;
The intra-block false-boundary decision uses the following three conditions:
|p0 - q0| < (α >> 2) + 2; |p2 - p0| < β; |q2 - q0| < β;
Here p0, p1, p2, q0, q1, and q2 each denote the value of a pixel to be filtered near the boundary.
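The two groups of conditions above translate directly into C. The sketch below is an assumption-laden illustration: the function and type names are hypothetical, and because the text does not state how the three intra-block conditions are combined, the second function simply reports each of them separately.

```c
#include <stdlib.h>
#include <stdbool.h>

/* Inter-block decision: true when all three conditions hold, i.e. the
 * step across the edge is small enough to be treated as a false
 * boundary. */
static bool inter_block_false_boundary(int p1, int p0, int q0, int q1,
                                       int alpha, int beta)
{
    return abs(p1 - p0) < beta &&
           abs(q1 - q0) < beta &&
           abs(p0 - q0) < alpha;
}

/* Intra-block decision: the three conditions, reported individually. */
typedef struct {
    bool small_step; /* |p0 - q0| < (alpha >> 2) + 2 */
    bool p_side;     /* |p2 - p0| < beta             */
    bool q_side;     /* |q2 - q0| < beta             */
} intra_block_checks_t;

static intra_block_checks_t intra_block_checks(int p2, int p0, int q0,
                                               int q2, int alpha, int beta)
{
    intra_block_checks_t r;
    r.small_step = abs(p0 - q0) < (alpha >> 2) + 2;
    r.p_side     = abs(p2 - p0) < beta;
    r.q_side     = abs(q2 - q0) < beta;
    return r;
}
```

In hardware terms each comparison corresponds to one subtract-and-absolute-value operation followed by one comparison, and all six can be evaluated in parallel.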
The input variables of the L1 processing node are: is_intra, corresponding to mb.type in Table 1 and indicating the prediction mode; nc_zero, corresponding to mb.non_zero_count in Table 1 and giving the number of non-zero residual coefficients; and mvx and mvy, corresponding to mb.mv in Table 1 and giving the motion vector. The L1 processing node determines the boundary filtering strength; the decision rules are shown in Table 3.
Table 3. Filtering strength decision conditions
Block modes and conditions on the two sides of the boundary | Filtering strength (BS)
An image block is intra-predicted and the boundary is a macroblock boundary | 4
An image block is intra-predicted | 3
An image block has non-zero residual coefficients | 2
The motion vector difference between the image blocks is not less than one luma sample | 1
Other | 0
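The priority order of the filtering strength decision can be written as a short C function. This is a sketch, not the x264 implementation: `blk_info_t` and `boundary_strength` are hypothetical names, and motion vectors are assumed to be stored in quarter-sample units, so that one luma sample corresponds to a difference of 4.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Per-block information needed by the decision (hypothetical layout). */
typedef struct {
    bool is_intra;          /* block is intra-predicted                 */
    bool has_nonzero_coeff; /* block has non-zero residual coefficients */
    int  mvx, mvy;          /* motion vector, quarter-sample units
                               (an assumption of this sketch)           */
} blk_info_t;

/* Return the filtering strength (0..4) for the edge between blocks p
 * and q, testing the conditions from strongest to weakest. */
static int boundary_strength(const blk_info_t *p, const blk_info_t *q,
                             bool is_mb_edge)
{
    if (p->is_intra || q->is_intra)
        return is_mb_edge ? 4 : 3;
    if (p->has_nonzero_coeff || q->has_nonzero_coeff)
        return 2;
    if (abs(p->mvx - q->mvx) >= 4 || abs(p->mvy - q->mvy) >= 4)
        return 1;
    return 0;
}
```

Because the conditions are tested in strict priority order, the hardware equivalent is a small chain of comparators feeding a priority selector.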
It should be noted that the mapped L1 processing node determines the boundary filtering strength. Because the luma vertical boundary filter function, the luma horizontal boundary filter function, the chroma vertical boundary filter function, and the chroma horizontal boundary filter function all call the same four subfunctions, the filtering strength decisions for all four kinds of boundary can be mapped uniformly into the operator module of the L1 processing node.
3) Operator space-time diagram generation step: according to the function processing performed by each node in the data control flow graph, take at least one operator unit of the corresponding function from the operator cell library, and convert the data control flow graph into an operator space-time diagram composed of operator units.
When the data control flow graph is converted into an operator space-time diagram composed of operator units, the data control flow graph is first unrolled as far as possible according to its data flow dependencies, and the function processing to be performed by each node is then replaced with a combination of one or more operator units. The mapping rules from the data control flow graph to the operator space-time diagram are as follows: processing nodes are mapped to computing class operators, control flow is mapped to control class operators and/or path class operators, and storage nodes are mapped to storage class operators. The mapping is unrolled in parallel wherever possible. Parallel unrolling enlarges the design area of the integrated circuit, but it gives the fastest timing.
The ways of unrolling the data control flow graph include, but are not limited to, the following:
If the data flow in the data control flow graph is a sequentially dependent data flow structure, the sequentially dependent data flow is unrolled as a pipeline.
If the data flow in the data control flow graph contains feedback, i.e., the data flow forms a loop, the data flows that have data dependencies cannot be converted into a pipeline structure. In that case, if data dependencies exist between the data flows but not inside each of them, each data flow without internal data dependencies is unrolled as a local pipeline.
If there are no data dependencies between the data flows in the data control flow graph, the parallel data flows are unrolled in parallel and converted into an operator space-time diagram composed of operator units.
Parallel unrolling is preferred. Within a single processing node, parallel unrolling is used whenever the operator units in the node are not mutually dependent. Across different processing nodes, parallel unrolling is likewise used whenever the processing nodes are not mutually dependent. This increases the area of the ASIC, but it shortens the ASIC design time to the greatest extent.
In this way, every node in the data control flow graph can be mapped to an operator space-time diagram, as shown in Figs. 5-9. Fig. 5 shows the operator space-time diagram mapped from the L0 processing node. In Fig. 5, ADD denotes an addition operator, which belongs to the computing class. Computing class operators are reconfigurable: by configuring different instructions in the control bits according to the needs of different applications, the same computing operator can implement several different functions. Clip also belongs to the computing class; it truncates data, limiting an input variable to a given range by replacing values that exceed the range with the upper or lower limit, and thereby performs the clipping operation on the input data.
Fig. 6 shows the operator space-time diagram mapped from the L1 processing node, and Fig. 7 shows the one mapped from the L2 processing node. COMP denotes a comparison operator of the computing class; SAD calculates the residual energy, i.e., it subtracts two pixel values and takes the absolute value; ADDSM denotes a compound operation of addition and shifting.
Fig. 8 shows the operator space-time diagram mapped from the L3 processing node, in which ADDC denotes addition with carry. Fig. 9 shows the operator space-time diagram mapped from the L4 processing node, in which SADDM denotes a compound operation of shifting and addition.
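Behavioural models of three of the computing operators named in Figs. 5-9 can be sketched in C. The operator names come from the text, but the function signatures are hypothetical, and the exact form of ADDSM, taken here as an addition followed by a right shift, is an assumption: the text only says it combines addition and shifting.

```c
#include <stdint.h>

/* Clip: limit x to the closed range [lo, hi]. */
static int32_t op_clip(int32_t x, int32_t lo, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* SAD as described in the text: subtract two pixel values and take
 * the absolute value of the difference. */
static int32_t op_sad(int32_t a, int32_t b)
{
    return a > b ? a - b : b - a;
}

/* ADDSM, assumed form: add, then shift right by the given amount.
 * Intended for non-negative sums, such as sums of pixel values. */
static int32_t op_addsm(int32_t a, int32_t b, unsigned shift)
{
    return (a + b) >> shift;
}
```

For example, `op_addsm(p0, q0, 1)` averages two pixels, and `op_clip(x, 0, 255)` keeps a filtered result within the 8-bit pixel range.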
When the operator space-time diagram is generated, path class operators are used to represent the data dependencies identified in the data control flow graph. The main role of the path operators is to link storage operators and computing operators to form the operator space-time diagram. They have three implementation forms: multiplexer, crossbar switch, and bus.
4) Timing constraint step: determine the total timing constraint from the user's specification requirements and the requirements of the target integrated circuit process, annotate each operator unit in the operator space-time diagram with its time, and apply timing constraints to each level of the operator space-time diagram.
In the timing constraint step, working function by function, each operator unit in the operator space-time diagram of each function is annotated with its time, and timing constraints are applied to each level of the operator space-time diagram according to the data flow structure in the data control flow graph. If the data flow structure is a parallel data flow, the total timing constraint is divided equally among the operator levels of the corresponding space-time diagram, and the timing constraint of each operator level is divided equally among the operator units in that level. If the data flow in the data control flow graph is a serial data flow, the total timing constraint is allocated to each operator level in proportion to the time of the operator units on the longest operation path of that level, relative to the sum of the times of the operator units on the longest operation paths of all levels.
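The two allocation rules for the timing budget can be illustrated with the following C sketch. The function names are hypothetical and times are represented as plain `double` values in arbitrary units; the code shows only the arithmetic of the equal and proportional splits, not any particular tool.

```c
#include <stddef.h>

/* Parallel data flow: divide the total budget equally among n levels
 * (or among the n operator units within one level). */
static void split_equal(double total, size_t n, double out[])
{
    for (size_t i = 0; i < n; i++)
        out[i] = total / (double)n;
}

/* Serial data flow: give each level a share proportional to the delay
 * of its longest operation path, relative to the sum over all levels.
 * If every delay is zero, every share is set to zero. */
static void split_proportional(double total, const double path_delay[],
                               size_t n, double out[])
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += path_delay[i];
    for (size_t i = 0; i < n; i++)
        out[i] = sum > 0.0 ? total * path_delay[i] / sum : 0.0;
}
```

With a budget of 12 time units and longest-path delays of 1, 2, and 3, the proportional rule assigns 2, 4, and 6 units to the three levels.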
Fig. 10 is a schematic of the 4*4 block boundaries that need to be filtered. As Fig. 10 shows, the luma and chroma blocks together have 48 boundaries to be filtered, and each boundary can be further decomposed into the basic filtering structure shown in Fig. 11. The filtering strength decision and the strong and weak filtering can alternate in a pipelined manner; the former needs 3 clock cycles and the latter needs 8 clock cycles. The filtering parameter calculation needs 3 clock cycles. After timing annotation, the theoretical minimum number of cycles for completing the x264_frame_deblocking_filter function is 3 + 48*4 + 8 = 203 cycles.
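The cycle estimate can be reproduced with a one-line C helper. The function name and parameter names are hypothetical; in particular, the text gives the factor of 4 per boundary only as part of the formula, so calling it "cycles per boundary" is an interpretation made for this sketch.

```c
/* Pipelined cycle estimate: a start-up phase, a steady-state phase in
 * which each boundary costs a fixed number of cycles, and a final
 * drain phase for the last filter operation. */
static unsigned deblock_cycles(unsigned startup_cycles,
                               unsigned num_boundaries,
                               unsigned cycles_per_boundary,
                               unsigned drain_cycles)
{
    return startup_cycles + num_boundaries * cycles_per_boundary
           + drain_cycles;
}
```

Calling `deblock_cycles(3, 48, 4, 8)` returns 203, matching the figure stated in the text.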
5) Cluster compression step: on the basis of the timing annotation, compress the operator space-time diagram in space (i.e., in hardware resources or area) by clustering according to the time annotations, so that the overall execution time of the deblocking filter algorithm approaches the total timing constraint.
The cluster compression step comprises:
A clustering step, which finds the computing class operators with identical operation attributes and/or the storage class operators with identical storage attributes in the operator space-time diagram;
A compression step, which, according to the time annotations, merges in space the computing class operators with identical operation attributes and/or the storage class operators with identical storage attributes;
An operator reuse step, which introduces control class operators and generates the corresponding configuration instructions for the compressed computing class operators and/or storage class operators, thereby realizing reuse of the computing class operators and/or storage class operators.
The precondition of cluster compression is that the function realized by the operator space-time diagram must not change as a result of the compression. Specifically, this step performs cluster compression on the operator space-time diagram according to the time annotation of each operator unit, increasing the reuse of storage class and computing class operators by adding control class and path class operators; that is, it reduces the number of computing and storage class operators as far as possible.
If an algorithm description is laid out entirely in operators without any reuse, the algorithm completes fastest, but at the cost of very high area and power consumption, which is not what a design aims for. The design goal is to minimize area and power consumption while still meeting the user's timing requirements, so operators must be reused as much as possible. Each operator is therefore annotated with the execution time of its function, so that operators can be reused according to their execution times and the overall execution time of the algorithm approaches the required time; this is the time dimension of the operator diagram. The space dimension is the definition of operator area: when operators are clustered and mapped, clustering takes operator area into account, and operators with smaller area are selected for reuse. Both factors must be considered during operator clustering.
Cluster compression of the operator space-time diagram mainly follows these principles:
1, the computing class operator that operational attribute is identical in the operator space-time diagram is carried out the cluster compression.The precondition of cluster compression is that before and after the cluster compression, the function that the operator space-time diagram is realized can not change.For example, a plurality of parallel addition operators can be compressed into an addition operator in the operator space-time diagram.Certainly, in the process of compression, need realize the multiplexing of addition operator by the mode of introducing the control operator, with the number of addition operator in the minimizing operator space-time diagram, the while guarantees the compression back again and compresses the pre-operator space-time diagram and can realize identical functions.Another example for the different computing class operator of realizing difference in functionality in the front and back different clocks cycle, also can carry out the cluster compression; For example, need a computing operator of realizing addition function in first cycle, need realize the computing operator of a subtraction function at second period, in the process of carrying out the cluster compression, then can two computing operators be compressed into a computing operator by introducing clock class operator and control class operator.Under the control action of clock operator, change being controlled to be of computing operator by the control operator, make this computing operator in first clock cycle, realize addition function, in second clock cycle, realize subtraction function.
It can be seen that compressing the operator space-time diagram significantly reduces the number of operators and thus saves integrated circuit area. However, although clustering compression achieves operator reuse, it also increases the execution time of the deblocking filtering algorithm to some extent.
It will be understood that clustering compression of the computing class operators necessarily changes the storage class, control class, path class, and clock class operators as well, so these classes of operators can also be clustered and compressed as circumstances allow, to save further integrated circuit area.
2. During clustering compression, when a control operator is introduced, corresponding configuration instructions are generated for the different clock cycles. The configuration instructions make the generated operators work in the predetermined manner, so that the compressed operator space-time diagram realizes the same function as the diagram before compression.
3. The same operator space-time diagram has multiple possible clustering compression results. Therefore, the result whose deblocking filtering execution time after compression is closest to the total timing constraint time is chosen as the final compression result. This choice saves as much integrated circuit area as possible while the timing constraint is still met. The total timing constraint time is the maximum execution time of the deblocking filtering algorithm, calculated from the performance specification given by the user. Clustering compression of the operator space-time diagram thus reduces the area and power consumption of the deblocking filter integrated circuit, and the operators generated after compression have a certain regularity.
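The three principles above can be sketched as follows. This is only an illustration under stated assumptions: the opcode names, the tuple layout of a configuration instruction, and the candidate records are hypothetical, and "closest to the total timing constraint" is read here as the largest execution time that does not exceed the constraint.

```python
import operator

# Functions a single compressed computing operator can be configured to perform.
OPS = {"add": operator.add, "sub": operator.sub}


def run_shared_operator(config_instructions):
    """Simulate one computing operator reused across clock cycles.

    Each configuration instruction is (opcode, a, b) and is issued by the
    control operator in one clock cycle.
    """
    results = []
    for cycle, (opcode, a, b) in enumerate(config_instructions):
        results.append((cycle, OPS[opcode](a, b)))
    return results


def pick_compression(candidates, total_time_constraint):
    """Choose among candidate compression results.

    candidates: list of (name, exec_time, area). Returns the feasible
    candidate whose execution time is closest to the constraint, or None
    if no candidate meets it.
    """
    feasible = [c for c in candidates if c[1] <= total_time_constraint]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])
```

For instance, `run_shared_operator([("add", 3, 4), ("sub", 10, 6)])` returns `[(0, 7), (1, 4)]`: the same operator adds in cycle 0 and subtracts in cycle 1. Given the candidates `("tiled", 2, 80)`, `("half", 8, 40)`, and `("single", 16, 10)` and a constraint of 10, `pick_compression` selects `("half", 8, 40)`.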
6) Lower-layer hardware mapping step: the lower-layer hardware circuit of the deblocking filter is generated from the operator space-time diagram after clustering compression.
The generated operator space-time diagram is exactly an RTL-level description as used in existing integrated circuit design. Many mature prior-art tools convert an RTL-level description into lower-layer hardware, and the conversion is essentially efficient and fast. Therefore, with the present invention, a system engineer can quickly map the deblocking filter algorithm described in a high-level language to an RTL description and then use an existing conversion tool to map it to the concrete lower-layer hardware circuit. The invention provides a mapping tool from computer language to lower-layer integrated circuit hardware and standardizes the process of generating lower-layer hardware from a high-level language, which is convenient and fast to implement.
A first point to note: the present invention is explained using the C programming language describing the H.264 standard as an example, but the method is not limited to the H.264 standard or to the C programming language. For example, the high-level language may also be Java, the MATLAB M language, and so on; the algorithm describing the deblocking filter integrated circuit differs accordingly for different languages.
A second point to note: after the clustering compression step, some operators in the operator space-time diagram can be optimized further. One way is to customize certain operators by solidification. For example, Figure 12 shows a computing class operator after compression; because its logic unit is not used, the logic unit can be removed to obtain the operator structure shown in Figure 13, which further reduces the area of the operator. In this way the algorithm still executes correctly while the area and power consumption of the integrated circuit are optimized.
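Solidification customization can be pictured as pruning the functional units that no configuration instruction ever uses. The sketch below is a hedged illustration only: the unit names, the area figures, and the representation of an operator as a dictionary are assumptions and are not taken from Figures 12 and 13.

```python
def customize_operator(units, used_unit_names):
    """Return a copy of a compressed operator without its unused units.

    units: dict mapping unit name -> area (hypothetical representation).
    used_unit_names: names of the units that the configuration
    instructions actually exercise.
    """
    used = set(used_unit_names)
    return {name: area for name, area in units.items() if name in used}


def operator_area(units):
    """Total area of an operator as the sum of its unit areas."""
    return sum(units.values())
```

For an operator `{"adder": 4, "shifter": 2, "logic": 3}` whose instructions only use the adder and the shifter, the customized operator is `{"adder": 4, "shifter": 2}` and its area drops from 9 to 6.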
The present invention also provides a deblocking filter for a video encoder. The deblocking filter receives the luma and chroma data of an image and comprises: a luma vertical boundary filtering function block for luma vertical boundary filtering of the image; a luma horizontal boundary filtering function block for luma horizontal boundary filtering of the image; a chroma vertical boundary filtering function block for chroma vertical boundary filtering of the image; and a chroma horizontal boundary filtering function block for chroma horizontal boundary filtering of the image. Each of the four function blocks includes a filtering parameter calculation unit for calculating the filtering parameters, a filtering condition decision unit for judging the filtering condition, a strong filtering calculation unit for calculating strong filtering, and a weak filtering calculation unit for calculating weak filtering. Each unit is composed of operator units.
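The division of one function block into a condition decision unit, a strong filtering unit, and a weak filtering unit can be sketched in software as follows. This is a simplified illustration that loosely follows the H.264 edge filter and only updates the two samples nearest the boundary; it is not the patent's circuit. The thresholds `alpha`, `beta`, and the clipping bound `tc` are passed in directly as assumed inputs instead of being derived by a filtering parameter calculation unit, and 8-bit samples are assumed.

```python
def clip3(lo, hi, x):
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def needs_filtering(p1, p0, q0, q1, bs, alpha, beta):
    """Filtering condition decision for one line of samples across an edge."""
    return (bs > 0 and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta and abs(q1 - q0) < beta)


def weak_filter(p1, p0, q0, q1, tc):
    """Weak filtering (boundary strength below 4): clipped correction."""
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)


def strong_filter(p1, p0, q0, q1):
    """Strong filtering (boundary strength 4), simplified 3-tap form."""
    return (2 * p1 + p0 + q1 + 2) >> 2, (2 * q1 + q0 + p1 + 2) >> 2


def filter_edge_sample(p1, p0, q0, q1, bs, alpha, beta, tc):
    """Return the filtered (p0, q0) pair for one line across the edge."""
    if not needs_filtering(p1, p0, q0, q1, bs, alpha, beta):
        return p0, q0
    if bs == 4:
        return strong_filter(p1, p0, q0, q1)
    return weak_filter(p1, p0, q0, q1, tc)
```

With samples `p1 = p0 = 60` and `q0 = q1 = 100`, `alpha = 50`, `beta = 10`, and `tc = 2`, a boundary strength of 4 yields `(70, 90)` and a boundary strength of 2 yields `(62, 98)`; lowering `alpha` to 30 leaves the samples unchanged at `(60, 100)`.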
With the operator-based design method for the deblocking filter hardware integrated circuit proposed by the present invention, a system engineer can rely on an operator function library complete enough to support the description of high-level language algorithms, quickly map each functional module of the deblocking filter described by a high-level language algorithm to an RTL description, and then use existing tools to map the RTL description to the lower-layer hardware circuit of the deblocking filter, thereby completing a fast design of the lower-layer hardware integrated circuit of the deblocking filter. It can be seen that the invention provides a mapping tool from computer language to the lower-layer hardware circuit of the deblocking filter and standardizes the process of generating lower-layer hardware from a high-level language, which is convenient and fast to implement. Moreover, the time spent manually converting each functional module of the deblocking filter from a high-level language algorithm description into an RTL-level description is greatly reduced, which shortens the time needed to design the lower-layer integrated circuit of the deblocking filter, better meets customer needs, and to some extent promotes the development of high-definition video coding technology.
Moreover, this deblocking filter can operate in a parallel pipeline with the other application-specific integrated circuit (ASIC) parts of the video encoder, which speeds up the design of the video encoder ASIC. The deblocking filter is also suitable for different video encoder structures and has good versatility.
In addition, the present invention performs clustering compression on computing class operators with the same operational attribute and/or storage class operators with the same storage attribute in the time-annotated operator space-time diagram, and introduces control operators to reuse the compressed operators, which greatly reduces the area of the space-time diagram.
The above is a further detailed description of the present invention in conjunction with specific embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make several simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims (5)

1. An implementation method for a deblocking filter of a video encoder, characterized in that it comprises the following steps:
mapping each filtering function of the deblocking filter of the video encoder, described by a high-level programming language algorithm, to a hardware logic description composed of operator units; and generating the hardware integrated circuit of the deblocking filter from the hardware logic description composed of the operator units; wherein mapping each filtering function of the deblocking filter of the video encoder, described by the high-level programming language algorithm, to the hardware logic description composed of operator units comprises the following steps:
Program analysis step: reading the high-level programming language algorithm program of each filtering function of the deblocking filter, and identifying the execution objects and parameter objects to be mapped from the program according to the rules of the high-level programming language;
Data control flow graph generation step: mapping the identified execution objects and parameter objects to the corresponding nodes of a data control flow graph describing the deblocking filtering algorithm;
Operator space-time diagram generation step: according to the function processing of each node in the data control flow graph, taking at least one operator unit of the corresponding function from an operator cell library and converting the data control flow graph into an operator space-time diagram composed of operator units, the operator units in the operator cell library comprising at least one of control class operators, computing class operators, storage class operators, path class operators, and clock class operators;
Timing constraint step: determining a total timing constraint according to the user's specification requirements and the requirements of the target integrated circuit process, annotating each operator unit in the operator space-time diagram with its time, and applying timing constraints to each level of the operator space-time diagram;
Clustering compression step: performing spatial clustering compression on the operator space-time diagram according to the time annotations, such that the overall execution time of the deblocking filtering algorithm approaches the total timing constraint;
Lower-layer hardware mapping step: mapping the operator space-time diagram after clustering compression to the hardware logic description composed of operator units;
the program analysis step comprising:
identifying the execution objects and parameter objects to be mapped from the high-level programming language algorithm program of the deblocking filtering, the execution objects comprising operational commands and/or control commands, and the parameter objects comprising at least one of input data, output data, and intermediate data;
analyzing the function call relationships, identifying the functions and the parameter objects involved in running them, and determining whether a function calls a sub-function when it runs; if a sub-function is called, analyzing the call relationships of the sub-function, identifying the sub-function and the parameter objects involved in running it, and determining whether the sub-function in turn calls other sub-functions when it runs;
in the data control flow graph generation step, the operational commands are mapped to processing nodes, the control commands are mapped to control flow used to identify control states, state transition conditions, and state control signals, and the parameter objects are mapped to storage nodes on the data flow.
2. the method for claim 1, it is characterized in that described filter function comprises for the brightness vertical boundary filter function that image is carried out the filtering of brightness vertical boundary, for the brightness level boundaries filter function that image is carried out brightness level boundaries filtering, for the colourity vertical boundary filter function that image is carried out the filtering of colourity vertical boundary with for the colourity horizontal boundary filter function that image is carried out the filtering of colourity horizontal boundary.
3. The method of claim 2, characterized in that the operator space-time diagram generation step comprises:
expanding the data control flow graph according to its data flow dependencies;
according to the function processing performed by each node after expansion, taking at least one operator unit of the corresponding function from a pre-established operator cell library, mapping the processing nodes to computing class operators, mapping the control flow to control class operators and/or path class operators, and mapping the storage nodes to storage class operators.
4. The method of claim 3, characterized in that the timing constraint step comprises: applying timing constraints to each level of the operator space-time diagram according to the data flow architecture of the data control flow graph; if the data flow architecture is a parallel data flow, dividing the total timing constraint equally among the operator levels of the corresponding space-time diagram, and dividing the timing constraint of each operator level equally among the operator units in that level; if the data flow in the data control flow graph is a serial data flow, allocating the total timing constraint to each operator level in proportion to the timing of the operator units mapped by the longest computation path in that level, relative to the sum of the timing of the operator units mapped by the longest computation paths in all operator levels.
5. The method of claim 4, characterized in that the clustering compression step comprises:
a clustering step for finding, in the operator space-time diagram, computing class operators with the same operational attribute and/or storage class operators with the same storage attribute;
a compression step for spatially merging and compressing, according to the time annotations, the computing class operators with the same operational attribute and/or the storage class operators with the same storage attribute;
an operator reuse step for introducing control class operators and generating corresponding configuration instructions for the compressed computing class operators and/or storage class operators, so as to reuse the computing class operators and/or storage class operators.
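As an illustration of the timing allocation recited in claim 4, the following sketch distributes a total timing constraint over operator levels. It is one possible reading, not a definitive implementation: the function names, the list-based representation of levels, and the numeric values are assumptions.

```python
def allocate_parallel(total_constraint, units_per_level):
    """Parallel data flow: split the total constraint equally among levels,
    then split each level's share equally among its operator units.

    units_per_level: number of operator units in each level.
    Returns a list of (level_budget, per_unit_budget) pairs.
    """
    level_budget = total_constraint / len(units_per_level)
    return [(level_budget, level_budget / n) for n in units_per_level]


def allocate_serial(total_constraint, longest_path_delays):
    """Serial data flow: each level receives a share proportional to the
    delay of the operator units on its longest computation path.

    longest_path_delays: one delay value per operator level.
    """
    delay_sum = sum(longest_path_delays)
    return [total_constraint * d / delay_sum for d in longest_path_delays]
```

For a total constraint of 120 and three levels with 2, 4, and 5 units, the parallel rule gives `[(40.0, 20.0), (40.0, 10.0), (40.0, 8.0)]`. For a total constraint of 100 and longest-path delays of 1, 3, and 6, the serial rule gives `[10.0, 30.0, 60.0]`.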
CN 201010620016 2010-12-31 2010-12-31 Deblocking filter for video coder and implementation method thereof Active CN102055981B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010620016 CN102055981B (en) 2010-12-31 2010-12-31 Deblocking filter for video coder and implementation method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010620016 CN102055981B (en) 2010-12-31 2010-12-31 Deblocking filter for video coder and implementation method thereof

Publications (2)

Publication Number Publication Date
CN102055981A CN102055981A (en) 2011-05-11
CN102055981B true CN102055981B (en) 2013-07-03

Family

ID=43959845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010620016 Active CN102055981B (en) 2010-12-31 2010-12-31 Deblocking filter for video coder and implementation method thereof

Country Status (1)

Country Link
CN (1) CN102055981B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102055981B (en) * 2010-12-31 2013-07-03 Peking University Shenzhen Graduate School Deblocking filter for video coder and implementation method thereof
CN102223543B (en) * 2011-06-13 2013-09-04 四川虹微技术有限公司 Reference pixel read and storage system
KR20130050403A (en) * 2011-11-07 2013-05-16 오수미 Method for generating rrconstructed block in inter prediction mode
TWI667913B (en) * 2012-04-06 2019-08-01 日商新力股份有限公司 Image processing device, image processing method, computer program, and recording medium
CN103379327A (en) * 2012-04-24 2013-10-30 安凯(广州)微电子技术有限公司 Block effect removing filtering method
US9503753B2 (en) * 2012-09-24 2016-11-22 Qualcomm Incorporated Coded picture buffer arrival and nominal removal times in video coding
CN104113765A (en) * 2014-07-28 2014-10-22 北京大学深圳研究生院 Video coding and decoding method and device
WO2020211770A1 (en) * 2019-04-15 2020-10-22 Beijing Bytedance Network Technology Co., Ltd. Temporal prediction of parameters in non-linear adaptive loop filter
CN111800643A (en) * 2020-07-03 2020-10-20 北京博雅慧视智能技术研究院有限公司 Deblocking filter for video coding and filtering method thereof
CN113706368A (en) * 2021-10-29 2021-11-26 湖北亿咖通科技有限公司 Image data processing method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102055981A (en) * 2010-12-31 2011-05-11 Peking University Shenzhen Graduate School Deblocking filter for video coder and implementation method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6922492B2 (en) * 2002-12-27 2005-07-26 Motorola, Inc. Video deblocking method and apparatus

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
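Dai Peng, Wang Xin'an, et al. A novel reconfigurable operator based IC design methodology for multimedia processing. TENCON 2009 - 2009 IEEE Region 10 Conference. 2009, 1-5. *
Kuilin Chen et al. Operator design methodology and implementation for H.264 entropy encoder. Information Engineering and Computer Science (ICIECS), 2010 2nd International Conference on. 2010, 1-4. *
Ziyi Hu *
Wang Xin'an. The operator design method narrows the "scissors gap" between IC design and manufacturing. Integrated Circuit Applications. 2010, (No. 7), 23-24, 26.
The operator design method narrows the "scissors gap" between IC design and manufacturing; Wang Xin'an; Integrated Circuit Applications; 2010-07-10 (No. 7); full text *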

Also Published As

Publication number Publication date
CN102055981A (en) 2011-05-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant