CN108449603A - Multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform - Google Patents
- Publication number
- CN108449603A (application CN201810239375.1A)
- Authority
- CN
- China
- Prior art keywords
- decoding
- level
- thread
- adjacent
- ctu
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
- H04N19/124—Quantisation
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
- H04N19/96—Tree coding, e.g. quad-tree coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform. Exploiting the dependencies in HEVC data and using a multi-core processor as the processing platform, the invention divides the entire HEVC decoder, in accordance with the HEVC standard, into 5 task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset (SAO) module. A separate parallel method is designed for each decoding task module, and the dependencies between individual CTUs are used to build a CTU-based pipeline across the modules so that different decoding tasks run in parallel. A data-redundancy reduction mechanism is introduced: only the reference pixels that will actually be needed are placed in the buffer, which avoids excessive data operations, manages buffer storage effectively, and improves decoding efficiency. Compared with serial decoding, the multi-core parallel decoding algorithm used by the method substantially increases the parallel speedup while preserving decoded image quality.
Description
Technical field
The invention belongs to the field of digital video signal coding and decoding, and specifically relates to a multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform.
Background art
With the rapid growth of high-definition and ultra-high-definition video applications, and in order to improve compression performance and meet the transmission and storage requirements of massive amounts of video data, the Joint Collaborative Team on Video Coding (JCT-VC) formally released the new-generation international video coding standard HEVC (High Efficiency Video Coding) in April 2013. The main goal of HEVC is, building on H.264/AVC and at the same video quality, to double the compression efficiency for high-resolution video, i.e. to reduce the bit rate of the video stream by 50%, so as to better suit a variety of network environments, while also supporting multi-core parallel encoding and decoding.
The HEVC coding framework retains the hybrid coding framework of H.264/AVC and is a block-based hybrid coding scheme. Unlike earlier hybrid schemes, however, HEVC introduces significant improvements at almost every coding stage. HEVC partitions a coded frame into adjacent, non-overlapping square coding tree units (CTUs). A CTU can be further split, in quadtree form, into several square coding units (CUs), and a CU can in turn be decomposed into smaller prediction units (PUs) and transform units (TUs). On top of the H.264/AVC prediction modes, HEVC adds many directional prediction modes to remove spatial correlation within the picture; each PU supports 35 intra prediction modes. Besides an in-loop deblocking filter (DeBlocking Filter, DBF) similar to that of H.264/AVC, HEVC adds a new in-loop filtering tool, sample adaptive offset (SAO), to further reduce distortion.
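To make the CTU-to-CU quadtree concrete, the following minimal Python sketch recursively splits one CTU into CUs in z-scan order. It is an illustration under stated assumptions, not part of the patented method: the `should_split` callback is a hypothetical stand-in for the split flags a real decoder parses from the bitstream, and the 64×64 CTU and 8×8 minimum CU sizes are typical HEVC settings assumed here rather than values given in this document.

```python
from typing import Callable, List, Tuple

# (x, y, size) of a square coding unit, in luma samples
CU = Tuple[int, int, int]


def split_ctu(x: int, y: int, size: int, min_cu: int,
              should_split: Callable[[int, int, int], bool]) -> List[CU]:
    """Recursively partition a square CTU into CUs in quadtree (z-scan) order.

    `should_split` is a hypothetical stand-in for the parsed split flag.
    """
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        cus: List[CU] = []
        # Visit quadrants in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctu(x + dx, y + dy, half, min_cu, should_split))
        return cus
    return [(x, y, size)]


if __name__ == "__main__":
    # Split the 64x64 CTU once, then split only its top-left 32x32 quadrant again.
    rule = lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0)
    for cu in split_ctu(0, 0, 64, 8, rule):
        print(cu)
```

The recursion mirrors the quadtree directly; a decoder would replace the callback with its entropy-decoded flags and stop splitting at the minimum CU size.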
Compared with earlier video coding standards, HEVC faces a sharp increase in computational complexity, which directly affects its deployment and implementation. One effective way to increase processing speed and computing capability is to parallelize processing on a multi-core platform. Among current multi-core processors, the Tilera series is highly representative: it is a reconfigurable-array DSP with a mesh multi-core architecture that interconnects its many processor cores through iMesh, raising single-chip processing capability by tens to hundreds of times. Many domestic researchers have studied video coding standards on multi-core processors. In 2016, Fang Di of Nanjing University of Posts and Telecommunications, in the thesis "Research and Implementation of a Multi-level Parallel HEVC Decoding Method Based on the Tilera Multi-core Processor", divided the HEVC decoder into three task modules (entropy decoding, pixel decoding, and deblocking filtering) and designed CTU-row-based parallel methods for the latter two: a single core decodes one CTU row serially, while the CTU-row dependencies between task modules are used to parallelize the decoder. Also in 2016, Liu Peng of Southwest Jiaotong University, in the thesis "Parallel Optimization of an Embedded HEVC Decoder Based on Multi-core", studied a parallel deblocking method in which the decoder assigns a balanced number of CTU rows to each thread; each thread first filters the vertical boundaries of its CTU rows, and only after vertical filtering of the whole frame is complete do the threads process all horizontal boundaries in the CTU rows. These studies have limitations, however. First, they parallelize each task module at CTU-row granularity, with one thread decoding a CTU row serially, whereas the dependencies of the pixel decoding and deblocking modules exist at the level of individual CTUs rather than CTU rows; this increases latency to some extent and leaves multi-core resources underused, wasting cores. Second, their parallel deblocking completely separates vertical-boundary filtering from horizontal-boundary filtering without fully considering the dependencies of each boundary, so the two filtering operations cannot be combined for parallel execution and parallel efficiency is not effectively improved.
Summary of the invention
The technical problem solved by the present invention is that the prior art parallelizes each task module at CTU-row granularity without considering the dependencies of individual CTUs; moreover, its parallel deblocking completely separates the filtering of vertical and horizontal boundaries without fully considering the dependencies of each boundary, so parallel efficiency is not effectively improved.
To solve the above technical problems, the present invention provides a multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform, comprising:
Step 1: the main thread first initializes the entire HEVC decoder, reads the binary bitstream file, creates the HEVC decoder, and allocates memory;
Step 2: the bitstream of the current frame is extracted from the binary bitstream that was read, and a function is called to parse the syntax elements of the video bitstream; the resulting video parameters and global information are saved in the decoded-picture object structure;
Step 3: a preset number of threads is allocated according to the configuration parameters, and each thread is bound to a different core through the multi-core function library;
Step 4: one thread reads the video bitstream and entropy-decodes it; after entropy decoding, the resulting quantization parameter QP and residual data are stored in the frame buffer;
Step 5: after entropy decoding of the current CTU is complete, if pixel decoding and reconstruction of its left, top-left, top, and top-right neighboring CTUs are all complete, a thread is assigned to perform pixel decoding and reconstruction of the current CTU; when the thread has finished, it returns to the task queue and enters a waiting state;
Step 6: once pixel decoding and reconstruction of the current CTU has finished and its left neighboring CTU has completed vertical-boundary filtering, a thread is immediately assigned to filter the vertical boundaries of the current CTU; when that is done, the thread returns to the task queue and enters a waiting state;
Step 7: once the current CTU has completed vertical-boundary filtering and its left and top neighboring CTUs have also completed vertical-boundary filtering, a thread is assigned to filter its horizontal boundaries;
Step 8: if the SAO operations of the left, top-left, top, and top-right neighboring CTUs of the current CTU are all complete, a thread is assigned to perform the sample adaptive offset (SAO) operation on the current CTU;
Step 9: steps 5 to 8 are repeated, continuing into the next picture frame, until decoding of a frame of the video bitstream is complete;
Step 10: after a frame of the video bitstream has been decoded, check whether the entire video bitstream has been decoded; if so, release all resources and destroy the thread pool; if not, return to step 4.
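The readiness conditions in steps 4 to 8 can be written as predicates over a per-CTU record of completed stages. The Python sketch below is a hypothetical model, not the patented implementation: `Stage`, `CtuGrid`, and `run_greedy` are names introduced here, neighbors outside the picture are treated as satisfied, and requiring a CTU's own horizontal deblocking before its SAO is an assumption (step 8 names only the neighbors). The extra cross-row constraints described later for Fig. 9 are not modeled. `run_greedy` is a serial driver used only to check that the rules never deadlock.

```python
from enum import IntEnum
from typing import Dict, List, Set, Tuple


class Stage(IntEnum):
    ENTROPY = 0  # step 4: entropy decoding (sequential, raster order)
    RECON = 1    # step 5: pixel decoding and reconstruction
    DBF_V = 2    # step 6: vertical-boundary deblocking
    DBF_H = 3    # step 7: horizontal-boundary deblocking
    SAO = 4      # step 8: sample adaptive offset


class CtuGrid:
    """Tracks which stages each CTU of one frame has completed."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.done: Dict[Tuple[int, int], Set[Stage]] = {
            (r, c): set() for r in range(rows) for c in range(cols)}

    def finished(self, r: int, c: int, stage: Stage) -> bool:
        # A neighbor outside the picture imposes no dependency.
        if not (0 <= r < self.rows and 0 <= c < self.cols):
            return True
        return stage in self.done[(r, c)]

    def ready(self, r: int, c: int, stage: Stage) -> bool:
        """True if `stage` of CTU (r, c) may start now under steps 4-8."""
        f = self.finished
        if stage in self.done[(r, c)]:
            return False
        wavefront = ((r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1))
        if stage == Stage.ENTROPY:
            prev = (r, c - 1) if c > 0 else (r - 1, self.cols - 1)
            return f(*prev, Stage.ENTROPY)
        if stage == Stage.RECON:
            return f(r, c, Stage.ENTROPY) and all(f(y, x, Stage.RECON) for y, x in wavefront)
        if stage == Stage.DBF_V:
            return f(r, c, Stage.RECON) and f(r, c - 1, Stage.DBF_V)
        if stage == Stage.DBF_H:
            return (f(r, c, Stage.DBF_V) and f(r, c - 1, Stage.DBF_V)
                    and f(r - 1, c, Stage.DBF_V))
        # Stage.SAO: own DBF_H is an assumed extra condition.
        return f(r, c, Stage.DBF_H) and all(f(y, x, Stage.SAO) for y, x in wavefront)


def run_greedy(grid: CtuGrid) -> List[Tuple[int, int, Stage]]:
    """Serially execute any ready task until nothing is left; returns the execution order."""
    order: List[Tuple[int, int, Stage]] = []
    progress = True
    while progress:
        progress = False
        for r, c in sorted(grid.done):
            for stage in Stage:
                if grid.ready(r, c, stage):
                    grid.done[(r, c)].add(stage)
                    order.append((r, c, stage))
                    progress = True
    return order
```

Because every dependency points to an earlier CTU in raster order or to an earlier stage of the same CTU, the dependency graph is acyclic; a parallel scheduler can therefore hand any `ready` task to a free thread without risking deadlock.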
Beneficial effects of the present invention: the invention uses a multi-core processor as the processing platform and, in accordance with the HEVC standard, divides the entire HEVC decoder into 5 task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset module. A parallel method is designed separately for each decoding task module and buffer storage is managed effectively. At the same time, the dependencies of individual CTUs are used to build a CTU-based pipeline between the different decoding tasks across modules, and thread-pool technology dynamically allocates thread resources to decoding tasks, which raises multi-core resource utilization and allows the method to be ported to other multi-core processors. While preserving decoded image quality, the method decodes in parallel, on a multi-core processor platform, a single high-definition bitstream encoded without any parallel coding mode, completing the parallelization of the video decoder, improving the overall parallel speedup of the decoder, and raising the core utilization of the multi-core processor.
Description of the drawings
Fig. 1 is a flow chart of an embodiment of the method of the present invention;
Fig. 2 shows the HEVC decoder architecture implemented by the embodiment;
Fig. 3 is a schematic diagram of the CTUs that the pixel decoding and reconstruction module of the embodiment can process in parallel;
Fig. 4 shows the pixels that need to be stored in the buffer during pixel decoding in the embodiment;
Fig. 5 shows the reference pixels required when decoding the pixels of the current CTU in the embodiment;
Fig. 6 is a flow chart of boundary strength calculation during deblocking in the embodiment;
Fig. 7 is a schematic diagram of the boundary dependency analysis for parallel deblocking in the embodiment;
Fig. 8 shows the pixels that need to be stored in the buffer during the two stages of deblocking in the embodiment;
Fig. 9 is a schematic diagram of the CTU dependencies of each subtask module in the embodiment;
Fig. 10 is a schematic diagram of how the different subtask modules of the invention process each CTU;
Fig. 11 is the architecture of the multi-task-module pipelined parallel decoding of the invention;
Fig. 12 compares the average speedup of the invention with that of a WPP decoder algorithm for different video sequences at different QP values during multi-core parallel decoding.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings. The following embodiments only serve to illustrate the technical solution of the present invention more clearly and are not intended to limit its scope of protection.
The embodiment uses a Tilera GX36 multi-core processor, consisting of 36 tile cores, as the hardware test platform. Tilera multi-core processors come with a complete set of multi-core development tools, which makes developing multi-core parallel programs convenient.
Fig. 1 is the flow chart of the embodiment of the method; the embodiment proceeds as follows:
Step 1: the main thread first initializes the entire decoder, reads the binary bitstream file, creates the HEVC decoder, and allocates memory;
Step 2: the bitstream of the current frame is extracted from the binary bitstream that was read, and a function is called to parse the relevant units of the video bitstream, obtaining the video parameters encapsulated in it and global information such as prediction values, offset values, quantization parameters, and operating parameters; this information is then saved in the decoded-picture object structure.
Step 3: to reduce the overhead of frequently creating and terminating threads, a fixed number of threads is allocated once, according to the configuration parameters, when the decoder program starts; the number can be preset, and these threads persist from start to finish. Thread-pool (Thread Pool) technology is used, and each thread is bound to a different core through the multi-core function library, so that every thread can be fully scheduled and efficient multi-core parallel decoding is achieved. In addition, to reduce the cost of blocking and waking threads, a worker thread that is waiting for decoding information or for a CTU does not enter the blocked state. The decoder then enters its main loop;
Step 4: one thread is assigned to read the video bitstream and entropy-decode it, and the resulting quantization parameter QP and residual data are stored in the frame buffer after entropy decoding; once entropy decoding of a CTU is complete, all the information required for its reconstruction is available;
Step 5: once all the information required to reconstruct a CTU has been obtained, four threads are assigned to the CTU row to run the pipelined parallel operation across the different subtasks. The parallel pixel decoding and reconstruction method is used to decode the pixels of the CTUs: one core is assigned to each CTU whose dependencies are satisfied and performs the corresponding processing until pixel decoding of that CTU finishes. Only those pixels of the CTU that will be referenced later are put into the buffer, rather than all of its pixels, which further improves efficiency;
Step 6: after pixel decoding and reconstruction of a CTU in the picture frame has finished and its left neighboring CTU has completed vertical-boundary filtering, a thread is immediately assigned to filter the vertical boundaries of this CTU; likewise, only the pixels of the CTU that will be referenced later are put into the buffer;
Step 7: once the current CTU has completed vertical-boundary filtering and its left and top neighboring CTUs have also completed vertical-boundary filtering, a thread is assigned to filter its horizontal boundaries;
Step 8: the SAO operation is performed on CTUs whose dependencies are satisfied, completing the reconstruction of those CTUs. CTUs reconstructed in this way can quickly be referenced by other frames, enabling inter-frame parallel processing and further improving the utilization of multi-core resources;
Step 9: threads are assigned to perform pixel decoding and reconstruction of the CTUs in the next picture frame whose dependencies are satisfied, and the above steps are repeated;
Step 10: after decoding of a frame of the video bitstream is complete, check whether the entire video bitstream has been decoded; if so, release all resources and destroy the thread pool; if not, return to step 4.
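Steps 3 to 5 describe threads that are created once, kept for the whole run, and fed CTU tasks as soon as their dependencies are met. The Python sketch below illustrates that scheduling pattern for the pixel-reconstruction stage only. It is a hypothetical illustration, not the Tilera implementation: binding threads to cores through the multi-core library is omitted, the CTU decode itself is a placeholder comment, and workers block on a `queue.Queue` for simplicity, whereas step 3 keeps waiting workers out of the blocked state. The dependency counters (`pending`) and reverse edges (`waiters`) are names introduced here.

```python
import queue
import threading
from typing import Dict, List, Optional, Tuple

Ctu = Tuple[int, int]


def wavefront_recon(rows: int, cols: int, num_threads: int = 4) -> List[Ctu]:
    """Reconstruct a rows x cols CTU grid with a fixed pool of worker threads.

    A CTU is queued only once its left, top-left, top and top-right neighbors
    (those inside the picture) have finished. Returns the completion order.
    """
    def deps(r: int, c: int) -> List[Ctu]:
        cand = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
        return [(y, x) for y, x in cand if 0 <= y < rows and 0 <= x < cols]

    # pending[ctu]: unfinished dependencies; waiters[ctu]: CTUs that depend on ctu.
    pending: Dict[Ctu, int] = {}
    waiters: Dict[Ctu, List[Ctu]] = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            pending[(r, c)] = len(deps(r, c))
            for d in deps(r, c):
                waiters[d].append((r, c))

    tasks: "queue.Queue[Optional[Ctu]]" = queue.Queue()
    lock = threading.Lock()
    order: List[Ctu] = []

    def worker() -> None:
        while True:
            ctu = tasks.get()          # blocks; the patent's workers avoid blocking
            if ctu is None:            # sentinel: the pool is being destroyed
                return
            # ... pixel decoding and reconstruction of `ctu` would run here ...
            with lock:
                order.append(ctu)
                all_done = len(order) == rows * cols
                for w in waiters[ctu]:
                    pending[w] -= 1
                    if pending[w] == 0:
                        tasks.put(w)   # dependencies met: hand it to any free worker
            if all_done:
                for _ in range(num_threads):
                    tasks.put(None)

    # Threads are created once and reused for every CTU (thread-pool style).
    pool = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in pool:
        t.start()
    tasks.put((0, 0))                  # the top-left CTU has no dependencies
    for t in pool:
        t.join()
    return order
```

For example, `wavefront_recon(4, 6)` returns all 24 CTUs in an order in which every CTU follows its left, top-left, top, and top-right neighbors. The exact order can differ between runs because the threads race for tasks.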
Fig. 2 shows the HEVC decoder architecture implemented by the embodiment. The encoded binary bitstream is first entropy-decoded to obtain the quantized coefficients and control information; the quantized coefficients are then inverse-quantized and inverse-transformed to obtain the residual. Next, the decoder performs intra or inter prediction using the control information, combines the prediction with the restored residual, and applies the in-loop deblocking and sample adaptive offset filters to obtain the output picture.
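The step in Fig. 2 that combines the prediction with the restored residual is, per sample, an addition followed by clipping to the range allowed by the bit depth; in-loop filtering then operates on the result. The minimal sketch below shows only that combination step with plain lists. `reconstruct` is a hypothetical helper name, and the deblocking and SAO stages are not modeled.

```python
from typing import List

Block = List[List[int]]


def reconstruct(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add the decoded residual to the prediction and clip each sample to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```

Clipping matters because the residual can push samples outside the legal range, for instance 250 + 10 in an 8-bit picture clips to 255.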
The many new coding structures and coding tools introduced by HEVC all reflect its friendliness to parallel processing. The present invention divides the entire HEVC decoder into 5 task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset module. A parallel method is designed separately for each decoding task module and buffer storage is managed effectively; at the same time, the dependencies of individual CTUs are used to build a CTU-based pipeline between the different decoding tasks across modules, and thread-pool technology dynamically allocates thread resources to decoding tasks, which raises multi-core resource utilization and allows the method to be ported to other multi-core processors.
Fig. 3 is a schematic diagram of the CTUs that the pixel decoding and reconstruction module of the embodiment can process in parallel. To parallelize pixel decoding at CTU granularity, the dependencies of each CTU must be satisfied. Each CTU has data dependencies on its four neighboring CTUs to the left, top-left, top, and top-right; only after pixel decoding and reconstruction of these four CTUs is complete, i.e. once the CTU's dependencies are satisfied, can pixel decoding of the current CTU proceed. The invention uses the dynamic scheduling strategy of the thread pool to assign threads to multiple CTUs so that they perform pixel decoding and reconstruction concurrently; to preserve the decoding dependencies between CTUs, processing of each CTU row stays two CTUs ahead of the row below it.
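The two-CTU lead between consecutive rows follows directly from the left and top-right dependencies. The sketch below computes the earliest start slot of every CTU under an idealized model, assumed here and not taken from the patent, in which every CTU takes exactly one time slot and a free thread is always available. `wavefront_start_slots` is a hypothetical name.

```python
from typing import List


def wavefront_start_slots(rows: int, cols: int) -> List[List[int]]:
    """Earliest start slot of every CTU, assuming one slot per CTU and unlimited threads."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A CTU may start once its left, top-left, top and top-right neighbors have finished.
            deps = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
            finish = [start[y][x] + 1 for y, x in deps
                      if 0 <= y < rows and 0 <= x < cols]
            start[r][c] = max(finish, default=0)
    return start


if __name__ == "__main__":
    for row in wavefront_start_slots(3, 6):
        print(row)
```

For pictures at least two CTUs wide, CTU (r, c) starts in slot c + 2r: the left neighbor finishes in slot c + 2r, and so does the top-right neighbor of the previous row. For a 1920×1080 picture with 64×64 CTUs (30 columns by 17 rows), the last CTU starts in slot 29 + 2 × 16 = 61, so the frame needs 62 slots instead of 510 serial ones, an idealized upper bound of roughly 8.2× for this stage alone.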
Fig. 4 shows the pixels that need to be stored in the buffer during pixel decoding in the embodiment. With the data-redundancy reduction mechanism, the pixel decoding module does not need to put the pixels of an entire CTU into the buffer; only the boundary pixels of the top and left neighboring CTUs are put into the buffer, which avoids a large number of redundant operations.
Fig. 5 shows the reference pixels required when the current CTU of the embodiment performs pixel decoding. With the redundancy reduction method of Fig. 4, placing only part of the boundary pixels in the buffer still satisfies the dependencies of the current CTU, so decoding can proceed.
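Figs. 4 and 5 describe buffering only boundary pixels instead of whole CTUs. The sketch below shows one hypothetical way to do this: after a CTU is reconstructed, only its bottom row and right column are kept, and the cross-CTU reference samples for a later CTU are assembled from the above-left, above, above-right, and left neighbors (the neighbor set of Fig. 3). `BoundaryCache` and its methods are names introduced here; per-TU reference handling, below-left samples, and inter-prediction references are deliberately not modeled.

```python
from typing import Dict, List, Optional, Tuple

Samples = List[int]


class BoundaryCache:
    """Hypothetical buffer that keeps only the bottom row and right column of each reconstructed CTU."""

    def __init__(self) -> None:
        self.bottom_row: Dict[Tuple[int, int], Samples] = {}
        self.right_col: Dict[Tuple[int, int], Samples] = {}

    def store(self, r: int, c: int, ctu: List[Samples]) -> None:
        # Keep 2N boundary samples instead of the full N x N block.
        self.bottom_row[(r, c)] = list(ctu[-1])
        self.right_col[(r, c)] = [row[-1] for row in ctu]

    def references(self, r: int, c: int) -> Tuple[Optional[int], Samples, Samples]:
        """Cross-CTU reference samples for CTU (r, c): the top-left corner sample,
        the row above (above + above-right CTUs), and the column to the left.
        Neighbors outside the picture or not yet stored contribute nothing."""
        corner = self.bottom_row.get((r - 1, c - 1), [None])[-1]
        above = self.bottom_row.get((r - 1, c), []) + self.bottom_row.get((r - 1, c + 1), [])
        left = self.right_col.get((r, c - 1), [])
        return corner, above, left
```

With 64×64 CTUs, the samples read across a CTU boundary in this model total 1 + 128 + 64 = 193, compared with 4 × 4096 = 16384 if the four neighboring CTUs were buffered whole.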
Fig. 6 is the flow chart of boundary strength calculation during deblocking in the embodiment. The main difficulty of deblocking is deciding whether a particular block boundary should be filtered and determining its filtering strength. Overly strong filtering can over-smooth detailed image regions, while insufficient filtering leaves blocking artifacts that reduce subjective quality. Whether the pixels on both sides of a block boundary need filtering depends mainly on the characteristics of the differences between the reconstructed pixel values on either side: the decoder adaptively decides whether to filter and, if so, selects an appropriate filtering strength and depth.
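Fig. 6 itself is not reproduced here. As a hedged illustration of what a boundary-strength (Bs) step typically computes, the sketch below follows the commonly summarized HEVC luma rules: Bs = 2 when either side is intra-coded; Bs = 1 when the edge is a transform-unit edge and either side has non-zero coefficients, when the two sides use different reference pictures or a different number of motion vectors, or when a motion-vector component differs by 4 or more quarter-luma samples; otherwise Bs = 0. Bi-prediction is simplified: references and motion vectors are compared in stored order, whereas the full rule also considers the swapped pairing. `BlockInfo` and `boundary_strength` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]  # motion vector in quarter-luma-sample units


@dataclass
class BlockInfo:
    intra: bool
    has_coeffs: bool                 # non-zero transform coefficients on this side
    ref_pics: Tuple[int, ...] = ()   # reference picture ids, one per motion vector
    mvs: Tuple[MV, ...] = ()


def boundary_strength(p: BlockInfo, q: BlockInfo, is_tu_edge: bool = True) -> int:
    """Simplified HEVC-style boundary strength for the edge between blocks p and q."""
    if p.intra or q.intra:
        return 2
    if is_tu_edge and (p.has_coeffs or q.has_coeffs):
        return 1
    # Simplification: references are compared in stored order (no swapped bi-pred pairing).
    if p.ref_pics != q.ref_pics or len(p.mvs) != len(q.mvs):
        return 1
    for (px, py), (qx, qy) in zip(p.mvs, q.mvs):
        if abs(px - qx) >= 4 or abs(py - qy) >= 4:  # one full luma sample or more
            return 1
    return 0
```

In HEVC, luma edges are filtered only when Bs > 0, and chroma edges only when Bs = 2; the per-edge sample decisions described for Fig. 7 are made afterwards.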
Fig. 7 is a schematic diagram of the boundary dependency analysis for parallel deblocking in the embodiment. The filter on/off decision and the strong/weak decision operate on the 4 rows of a boundary segment and involve 6 pixels across the boundary in each row (3 on each side). Strong filtering can update the 3 pixels adjacent to each side of a vertical boundary, and weak filtering can update up to 2 pixels on each side. Given how these filtering operations update pixel values, any vertical boundary has no dependency on any other vertical boundary and their filtering operations do not affect one another, so vertical boundaries can be filtered in parallel. Horizontal boundaries are different: as shown in Fig. 7, the pixel region enclosed by the dotted line is the CTU-sized region that is actually filtered, effectively an adjusted CTU, so the horizontal boundaries of the current CTU can be filtered only after its left and top neighboring CTUs have completed vertical-boundary filtering.
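The independence of vertical boundaries can be checked mechanically. The sketch below assumes HEVC's 8-sample deblocking grid and a per-edge footprint in which the decisions and the strong filter read up to 4 samples on each side of the edge (p3..p0, q0..q3) and modify at most 3 (p2..q2). Along one pixel row, it tests whether any edge writes a sample that another edge reads or writes. The helper names are hypothetical.

```python
from typing import Set


def read_set(edge_x: int) -> Set[int]:
    # Columns p3..p0 and q0..q3; the edge lies between columns edge_x - 1 and edge_x.
    return set(range(edge_x - 4, edge_x + 4))


def write_set(edge_x: int) -> Set[int]:
    # The strong filter may modify p2..p0 and q0..q2.
    return set(range(edge_x - 3, edge_x + 3))


def edges_independent(width: int, grid: int = 8) -> bool:
    """True if no vertical edge on the grid touches samples another edge reads or writes."""
    edges = range(grid, width, grid)
    for a in edges:
        for b in edges:
            if a != b and write_set(a) & (read_set(b) | write_set(b)):
                return False
    return True
```

With an 8-sample grid the footprints never collide (the edge at x = 8 writes columns 5 to 10, while the edge at x = 16 reads columns 12 to 19), so every vertical edge in a row can be filtered concurrently. With a hypothetical 4-sample grid they would overlap.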
Fig. 8 shows the pixels of the current CTU that need to be placed in the buffer during the two stages of deblocking in the embodiment. These pixels are the reference pixels required when the next CTU is filtered. Storing only these depended-upon pixels reduces redundant data operations and improves decoding efficiency.
Fig. 9 is a schematic diagram of the dependencies between CTU units in each subtask module of the embodiment of the present method. Before the current CTU unit can undergo pixel decoding and reconstruction, its left, upper, and upper-right neighboring CTU units must have completed the corresponding operation; parallel processing must preserve this pixel dependency. In addition, as shown in Fig. 9, before deblocking filtering is applied to the k-th CTU unit of CTU row N, its left-adjacent and upper-adjacent CTU units must have completed deblocking filtering, and CTU row N+1 must have completed pixel decoding and reconstruction of its k-th CTU unit. Similarly, before sample adaptive offset (SAO) is applied to the k-th CTU unit of row N, CTU row N+1 must have completed deblocking filtering of its (k+1)-th CTU unit. Between two adjacent frames, however, threads sit idle while SAO is performed on the last CTU row of the current frame. Borrowing the idea of the overlapped wavefront (OWF) algorithm, the waiting threads therefore go directly to CTU units of the next frame whose inter-prediction dependencies are satisfied and perform pixel decoding and reconstruction on them. In short, when a decoding task module is divided into multiple subtask modules, the subtasks can run in parallel, but they do not start at the same time: adjacent subtask modules are offset by a delay of two CTU units, which guarantees that the decoding dependencies of every CTU unit are satisfied.
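The two-CTU offset between adjacent CTU rows follows directly from the reconstruction dependency on the left, upper-left, upper, and upper-right neighbours. The Python sketch below computes earliest start times under two illustrative assumptions that are not from the patent: unlimited threads and a uniform cost of one time unit per CTU. The function name is hypothetical.

```python
# Earliest start time of each CTU's reconstruction when a CTU depends on its
# left, upper-left, upper and upper-right neighbours. Assumptions: unlimited
# threads and one time unit per CTU (illustrative only).

def wavefront_start_times(rows, cols):
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):  # row-major order visits every prerequisite first
            neighbours = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
            finish = [start[nr][nc] + 1 for nr, nc in neighbours
                      if 0 <= nr < rows and 0 <= nc < cols]
            start[r][c] = max(finish, default=0)
    return start


if __name__ == "__main__":
    for row in wavefront_start_times(3, 6):
        print(row)
    # [0, 1, 2, 3, 4, 5]
    # [2, 3, 4, 5, 6, 7]
    # [4, 5, 6, 7, 8, 9]
```

Each row starts two time units after the row above, i.e. start(r, c) = c + 2r, which is the two-CTU delay between adjacent rows.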
Fig. 10 shows how the different subtask modules of the invention process each CTU unit. Pixel decoding on adjacent CTU rows is offset by two CTU units. As soon as a CTU unit finishes pixel decoding and reconstruction, a thread is assigned to filter its vertical boundaries, and once vertical-boundary filtering is finished, horizontal-boundary filtering follows immediately. When the dependencies of the SAO operation are satisfied, a thread is assigned to perform SAO on the CTU unit. Furthermore, to prevent most threads from idling when decoding of a frame is nearly complete, the method applies the OWF idea: threads waiting in the thread pool are used directly to perform pixel decoding and reconstruction on CTU units of the next frame whose inter-prediction dependencies are satisfied, which achieves parallel processing across adjacent frames.
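Under OWF, an idle thread may begin a CTU row of the next frame once every reference pixel it might need has passed all in-loop filters. The Python sketch below expresses that test at CTU-row granularity. It assumes, which this document does not state, a known bound `max_mv_y` on the vertical motion vector in integer pixels, plus 4 extra rows below for the 8-tap HEVC luma interpolation filter; the function names are hypothetical.

```python
# Sketch of an overlapped-wavefront (OWF) readiness test for the next frame.
# Assumptions (not from the patent): the vertical motion-vector magnitude is
# bounded by max_mv_y integer pixels, and the 8-tap HEVC luma interpolation
# filter needs INTERP_BELOW extra reference rows below the displaced block.

INTERP_BELOW = 4


def ref_rows_needed(row, ctu_size, max_mv_y, total_rows):
    """CTU rows of the reference frame that must be fully filtered (through SAO)
    before CTU row `row` of the next frame can be reconstructed."""
    lowest_y = (row + 1) * ctu_size - 1 + max_mv_y + INTERP_BELOW
    return min(total_rows, lowest_y // ctu_size + 1)


def can_start_next_frame_row(row, ref_rows_done, ctu_size, max_mv_y, total_rows):
    """True if an idle thread may start reconstructing this row of the next frame."""
    return ref_rows_done >= ref_rows_needed(row, ctu_size, max_mv_y, total_rows)
```

For a 64×64 CTU and a 60-pixel bound, row 0 of the next frame can start once two reference rows are finished, instead of waiting for the whole reference frame.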
Fig. 11 shows the multi-task-module pipelined parallel decoding architecture of the invention. The HEVC decoder is divided into five task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset (SAO) module. A parallel method is designed separately for each decoding task module, and the buffer storage space is managed efficiently. Between modules, the CTU unit is the unit of work, and the dependencies between CTU units are used to pipeline the different decoding tasks in parallel. A thread pool dynamically allocates threads to decoding tasks, which improves multi-core utilization, and the design can be ported to other multi-core processors.
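As a minimal illustration of pipelining the five task modules, the Python sketch below gives each module one thread and connects consecutive modules with FIFO queues, so different CTUs occupy different modules at the same time. It is a simplification: the patent allocates threads dynamically from a thread pool and parallelizes within modules, which this sketch does not model. All names are hypothetical, and Python threads show the structure rather than real multi-core speed-up.

```python
import queue
import threading

# The five task modules of the decoder, in pipeline order.
STAGES = ("read", "entropy", "reconstruct", "deblock", "sao")
_STOP = object()  # sentinel passed down the pipeline to shut each stage down


def stage_worker(name, inbox, outbox, trace, lock):
    """Take CTUs from inbox, do (placeholder) work, pass them downstream."""
    while True:
        ctu = inbox.get()
        if ctu is _STOP:
            outbox.put(_STOP)
            return
        with lock:
            trace.append((name, ctu))  # real decoding work would go here
        outbox.put(ctu)


def run_pipeline(ctu_ids):
    links = [queue.Queue() for _ in range(len(STAGES) + 1)]
    trace, lock = [], threading.Lock()
    workers = [threading.Thread(target=stage_worker,
                                args=(name, links[i], links[i + 1], trace, lock))
               for i, name in enumerate(STAGES)]
    for w in workers:
        w.start()
    for ctu in ctu_ids:
        links[0].put(ctu)
    links[0].put(_STOP)
    for w in workers:
        w.join()
    finished = []
    while True:
        ctu = links[-1].get()
        if ctu is _STOP:
            break
        finished.append(ctu)
    return finished, trace


if __name__ == "__main__":
    done, _ = run_pipeline(range(4))
    print(done)  # [0, 1, 2, 3]
```

Because every stage is a single thread reading a FIFO queue, CTUs leave the pipeline in the order they entered, and each CTU passes through the five modules in sequence.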
To verify the effect of the method of the present invention, the following verification experiment was carried out. Three video sequences, "BasketballDrive", "Cactus", and "Kimono", were decoded with the method at QP values of 22, 27, 32, and 37. The decoding algorithm of the invention was run both as multi-core parallel decoding and as single-core serial decoding on a Tilera multi-core processor. To better assess the performance of the multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform, the WPP (wavefront parallel processing) method was also included in the experiment, and the parallel speed-ups of the two methods were compared.
Fig. 12 shows the average speed-up for each video sequence at different QP values when the present invention performs multi-core parallel decoding on the multi-core processor, compared with the WPP decoding algorithm. The performance of the parallel program is expressed as the speed-up ratio, defined as follows:
Speed-up = T_serial / T_parallel, where T_serial is the single-core serial decoding time and T_parallel is the multi-core parallel decoding time.
Table 1 shows the parallel decoding speed-ups of the method of the present invention with different numbers of cores at different QP values.
Table 1: Experimental results
Analysis of the experimental data shows that, for the same test sequence, different QP values yield different decoding speed-ups: the smaller the QP used by the encoder, the slower the decoder's parallel decoding. Whereas the WPP parallel algorithm suffers from low thread utilization, the multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform designed here makes full use of the dependencies between CTU units, pipelines the decoding task modules to reduce decoder latency, and introduces a data-redundancy reduction mechanism to optimize the decoding algorithm, which greatly improves the decoder's parallel decoding efficiency. The speed-up comparison in Fig. 12 shows that, at every QP value, the speed-up of the proposed parallel decoding algorithm is higher than that of the WPP decoder.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art can make several improvements and modifications without departing from the technical principles of the invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the invention.
Claims (6)
1. A multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform, characterized by comprising the following steps:
Step 1: the main thread first initializes the entire HEVC decoder, reads the binary bitstream file, creates the HEVC decoder, and allocates memory;
Step 2: the bitstream of the current frame is extracted from the binary bitstream that has been read, and a function is called to parse the parameter-set units of the video bitstream; the resulting video parameters and global information are saved in the decoded-picture object structure;
Step 3: a preset number of threads is allocated according to the configuration parameters, and each thread is bound to a different core through a multi-core function library;
Step 4: one thread reads the video bitstream and performs entropy-decoding parsing on it; after entropy decoding ends, the resulting quantization parameter QP and residual data are stored in a frame buffer;
Step 5: after entropy decoding of the current CTU unit is complete, if pixel decoding and reconstruction of the left, upper-left, upper, and upper-right neighboring CTU units of the current CTU unit have all been completed, a thread is assigned to perform pixel decoding and reconstruction on the current CTU unit; when the thread has finished reconstructing the current CTU unit, it returns to the task queue and enters a waiting state;
Step 6: after pixel decoding and reconstruction of the current CTU unit ends, and once the left-adjacent CTU unit of the current CTU unit has completed vertical-boundary filtering, a thread is immediately assigned to filter the vertical boundaries of the current CTU unit; when processing is complete, the thread returns to the task queue and enters a waiting state;
Step 7: after the current CTU unit has completed vertical-boundary filtering, and its left-adjacent and upper-adjacent CTU units have also completed vertical-boundary filtering, a thread is assigned to filter its horizontal boundaries;
Step 8: if SAO of the left, upper-left, upper, and upper-right neighboring CTU units of the current CTU unit has all been completed, a thread is assigned to perform sample adaptive offset (SAO) on the current CTU unit;
Step 9: steps 5 to 8 are repeated for the image frame until decoding of one frame of the video bitstream is complete;
Step 10: after decoding of one frame of the video bitstream is complete, check whether the whole video bitstream has been decoded; if so, release all resources and destroy the thread pool; if not, return to step 4.
2. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that, if pixel decoding and reconstruction of the left, upper-left, upper, and upper-right neighboring CTU units of the current CTU unit have all been completed, 4 threads are assigned to process in parallel the pixel decoding and reconstruction of the current CTU unit, the vertical-boundary filtering of CTU units, the horizontal-boundary filtering of CTU units, and the SAO operation.
3. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that a data-redundancy reduction mechanism is introduced: during vertical-boundary filtering, only the pixels of a 4-pixel-wide vertical region of the CTU unit are placed in the cache.
4. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that, during horizontal-boundary filtering, only the pixels of a 4-pixel-high horizontal region of the CTU unit are placed in the cache.
5. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that a thread pool is used to bind each of a fixed number of threads to one core.
6. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that the HEVC decoder is divided into 5 task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset (SAO) module.
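Steps 5 to 8 of claim 1 amount to a dependency-driven scheduler: each per-CTU subtask is handed to a pool thread as soon as its prerequisites are done, and the thread returns to the pool afterwards. The Python sketch below models one frame under that reading, with 4 workers as in claim 2. It is a sketch, not the patented implementation: entropy decoding (step 4), core binding (claim 5), and the cross-frame OWF start are omitted; SAO waiting for its own CTU's horizontal filtering is an added assumption; the function names are hypothetical; and Python threads illustrate the scheduling rather than achieve real multi-core speed-up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

STAGES = ("recon", "vfilt", "hfilt", "sao")


def prerequisites(stage, r, c):
    """Tasks that must finish before (stage, r, c), following steps 5-8 of claim 1.
    SAO waiting for the CTU's own hfilt is an added assumption."""
    if stage == "recon":  # step 5: left, upper-left, upper, upper-right
        return [("recon", r, c - 1), ("recon", r - 1, c - 1),
                ("recon", r - 1, c), ("recon", r - 1, c + 1)]
    if stage == "vfilt":  # step 6: own reconstruction, left neighbour's vfilt
        return [("recon", r, c), ("vfilt", r, c - 1)]
    if stage == "hfilt":  # step 7: own, left and upper vfilt
        return [("vfilt", r, c), ("vfilt", r, c - 1), ("vfilt", r - 1, c)]
    # step 8: SAO of left, upper-left, upper and upper-right neighbours
    return [("hfilt", r, c), ("sao", r, c - 1), ("sao", r - 1, c - 1),
            ("sao", r - 1, c), ("sao", r - 1, c + 1)]


def decode_frame(rows, cols, threads=4):
    """Run every subtask of one frame on a pool of `threads` workers and
    return the order in which the subtasks completed."""
    def inside(task):
        return 0 <= task[1] < rows and 0 <= task[2] < cols

    tasks = [(s, r, c) for s in STAGES for r in range(rows) for c in range(cols)]
    if not tasks:
        return []
    waiting = {t: 0 for t in tasks}      # unfinished prerequisites per task
    dependents = {t: [] for t in tasks}  # tasks unblocked when t finishes
    for t in tasks:
        for p in filter(inside, prerequisites(*t)):
            waiting[t] += 1
            dependents[p].append(t)

    order, lock, finished = [], threading.Lock(), threading.Event()
    pool = ThreadPoolExecutor(max_workers=threads)

    def run(task):
        # The real CTU work (reconstruction, filtering, SAO) would go here.
        with lock:
            order.append(task)
            ready = []
            for d in dependents[task]:
                waiting[d] -= 1
                if waiting[d] == 0:
                    ready.append(d)
            if len(order) == len(tasks):
                finished.set()
        for d in ready:  # this worker then returns to the pool
            pool.submit(run, d)

    for t in [t for t in tasks if waiting[t] == 0]:
        pool.submit(run, t)
    finished.wait()
    pool.shutdown(wait=True)
    return order


if __name__ == "__main__":
    print(decode_frame(2, 3)[-1])  # ('sao', 1, 2)
```

A task is submitted only after all of its in-frame prerequisites have completed, so the completion order always respects the dependencies of steps 5 to 8, whatever interleaving the pool threads produce.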
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810239375.1A CN108449603B (en) | 2018-03-22 | 2018-03-22 | Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108449603A true CN108449603A (en) | 2018-08-24 |
CN108449603B CN108449603B (en) | 2019-11-22 |
Family
ID=63196565
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810239375.1A Active CN108449603B (en) | 2018-03-22 | 2018-03-22 | Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102098503A (en) * | 2009-12-14 | 2011-06-15 | 中兴通讯股份有限公司 | Method and device for decoding image in parallel by multi-core processor |
CN102625108A (en) * | 2012-03-30 | 2012-08-01 | 浙江大学 | Multi-core-processor-based H.264 decoding method |
CN103974081A (en) * | 2014-05-08 | 2014-08-06 | 杭州同尊信息技术有限公司 | HEVC coding method based on multi-core processor Tilera |
CN104539972A (en) * | 2014-12-08 | 2015-04-22 | 中安消技术有限公司 | Method and device for controlling video parallel decoding in multi-core processor |
CN105791829A (en) * | 2016-03-30 | 2016-07-20 | 南京邮电大学 | HEVC parallel intra-frame prediction method based on multi-core platform |
CN105992008A (en) * | 2016-03-30 | 2016-10-05 | 南京邮电大学 | Multilevel multitask parallel decoding algorithm on multicore processor platform |
Non-Patent Citations (5)
Title |
---|
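Damien de Saint Jorre et al.: "Exploring MPEG HEVC decoder parallelism for the efficient porting onto many-core platforms", IEEE Xplore * |
Liu Peng (刘鹏): "Parallel optimization and implementation of an HEVC decoder based on a multi-core embedded platform", China Master's Theses Full-text Database * |
Ye Changyi (叶昌益): "Research on BF561-based H.264 parallel encoding", Devices and Applications * |
Fang Di (方狄): "Research and implementation of an HEVC multi-level parallel decoding method based on the Tilera multi-core processor", China Master's Theses Full-text Database * |
Shu Jun (束骏): "Research and implementation of a parallel HEVC video encoding algorithm based on the Tilera multi-core processor", China Master's Theses Full-text Database * |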
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||