CN108449603A - Multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform - Google Patents

Info

Publication number
CN108449603A
Authority
CN
China
Prior art keywords
decoding
level
thread
adjacent
ctu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810239375.1A
Other languages
Chinese (zh)
Other versions
CN108449603B (en
Inventor
胡栋
韩峰
谷涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201810239375.1A
Publication of CN108449603A
Application granted
Publication of CN108449603B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/82 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform. Exploiting the dependencies in HEVC data and using a multi-core processor as the processing platform, the invention follows the HEVC standard in dividing the entire HEVC decoder into five task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset (SAO) module. A parallel method is designed for each decoding task module, and the dependencies among CTUs are used to realize CTU-based pipeline parallelism between the different decoding tasks. A data redundancy reduction mechanism is introduced: only some of the reference pixels are placed in the cache, which avoids excessive data operations, manages buffer space effectively, and improves decoding efficiency. Compared with serial decoding, the multi-core parallel decoding algorithm of this method greatly improves the parallel speedup ratio while preserving decoded image quality.

Description

Multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform
Technical field
The invention belongs to the field of digital video signal encoding and decoding, and in particular relates to a multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform.
Background art
With the sharp growth of high-definition and ultra-high-definition video applications, and in order to improve compression performance and meet the transmission and storage requirements of massive video data, the Joint Collaborative Team on Video Coding (JCT-VC) formally released the new-generation international video coding standard HEVC (High Efficiency Video Coding) in April 2013. The main goal of HEVC is, building on the H.264/AVC standard and at the same video picture quality, to double the compression efficiency for high-resolution video, that is, to reduce the bit rate of the video stream by 50%, so that it adapts better to a variety of network environments while also supporting multi-core parallel encoding and decoding.
The HEVC coding framework retains the hybrid coding framework of H.264/AVC and is a block-based hybrid coding scheme. Unlike earlier hybrid schemes, however, HEVC introduces important improvements in almost every coding stage. HEVC divides a coded frame into adjacent, non-overlapping square coding tree units (CTUs). A CTU can be further divided into square coding units (CUs) in quadtree form, and a CU can in turn be decomposed into smaller prediction units (PUs) and transform units (TUs). To remove spatial correlation in the image, HEVC adds many directional prediction modes to those of H.264/AVC, so that each PU supports 35 prediction modes. In addition to an in-loop deblocking filter (DBF) similar to that of H.264/AVC, HEVC adds a new in-loop filtering tool, sample adaptive offset (SAO), to reduce distortion.
Compared with earlier video coding standards, HEVC faces a sharp increase in computational complexity, which directly affects its operation and implementation. One effective way to raise processing speed and computing capability is parallel processing on a multi-core platform. Among current multi-core processors, the Tilera series is highly representative. As a reconfigurable-array DSP, it uses a mesh multi-core architecture, interconnects its many processor cores through the iMesh network, and raises single-chip processing capability by tens to hundreds of times. Many domestic scholars have studied video coding standards on multi-core processors. In 2016, Fang Di of Nanjing University of Posts and Telecommunications, in the paper "Research and implementation of a multi-level parallel HEVC decoding method based on the Tilera multi-core processor", divided the HEVC decoder into three task modules (entropy decoding, pixel decoding, and deblocking filtering), designed CTU-row-based parallel methods for the latter two modules in which a single core decodes one CTU row serially, and used the CTU-row dependencies between task modules to parallelize the decoder. Also in 2016, Liu Peng of Southwest Jiaotong University, in the paper "Parallel optimization of an HEVC decoder based on multi-core embedded systems", studied a parallel deblocking method in which the decoder assigns each thread a balanced number of CTU rows. Each thread first filters the vertical boundaries of the CTU rows it is responsible for and, once vertical boundary filtering of the whole frame is complete, the same threads process all horizontal boundaries in the CTU rows. These studies have limitations, however. When each task module is parallelized, the parallel granularity is the CTU row, and the decoder decodes a row of CTUs serially with a single thread. Yet the dependencies of the pixel decoding and deblocking filtering modules under parallel processing exist between individual CTUs rather than between CTU rows. This increases latency to some extent and leaves multi-core resources underused, wasting cores. In addition, the parallel deblocking method separates vertical and horizontal boundary filtering completely and does not fully consider the dependencies of each boundary. Because it fails to combine vertical and horizontal boundary filtering in parallel, parallel efficiency is not effectively improved.
Summary of the invention
The technical problem to be solved by the present invention is that, in the prior art, parallel processing of each task module uses the CTU row as the parallel granularity and does not consider the dependencies of individual CTUs, and that parallel deblocking filtering separates vertical and horizontal boundary filtering completely without fully considering the dependencies of each boundary, so parallel efficiency is not effectively improved.
To solve the above technical problem, the present invention provides a multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform, comprising:
Step 1: the main thread first initializes the entire HEVC decoder, reads the binary bitstream file, creates the HEVC decoder, and allocates memory;
Step 2: the bitstream of the current frame is extracted from the binary bitstream that was read, a function is called to parse the relevant units of the video bitstream, and the resulting video parameters and global information are saved in the decoded picture object structure;
Step 3: a preset number of threads is allocated according to the configuration parameters, and each thread is bound to a different core through the multi-core function library;
Step 4: one thread reads the video bitstream and performs entropy decoding and parsing on it; after entropy decoding, the resulting quantization parameter (QP) and residual data are stored in the frame buffer;
Step 5: after entropy decoding of the current CTU is complete, if pixel decoding and reconstruction of its left, upper-left, upper, and upper-right neighboring CTUs have all finished, a thread is assigned to perform pixel decoding and reconstruction on the current CTU; once the thread has finished pixel decoding and reconstruction of the current CTU, it returns to the task queue and enters the waiting state;
Step 6: after pixel decoding and reconstruction of the current CTU has finished and its left neighboring CTU has completed vertical boundary filtering, a thread is immediately assigned to filter the vertical boundaries of the current CTU; when processing is complete, the thread returns to the task queue and enters the waiting state;
Step 7: after the current CTU has completed vertical boundary filtering, and its left and upper neighboring CTUs have also completed vertical boundary filtering, a thread is assigned to filter its horizontal boundaries;
Step 8: if the sample adaptive offset (SAO) operations of the left, upper-left, upper, and upper-right neighboring CTUs of the current CTU have all completed, a thread is assigned to perform SAO on the current CTU;
Step 9: steps 5 to 8 are repeated for the next picture frame until decoding of that frame of the video bitstream is complete;
Step 10: after one frame of the video bitstream has been decoded, it is checked whether the whole video bitstream has been decoded; if so, all resources are released and the thread pool is destroyed; if not, the method returns to step 4.
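The dependency tests in steps 5 to 8 can be sketched as a single predicate over per-stage completion sets. This is a minimal Python sketch under stated assumptions, not the patented implementation: the names `Stage`, `RULES`, and `ready` are hypothetical, neighbors that fall outside the picture are assumed to be trivially satisfied, and the requirement that a CTU finish its own horizontal boundary filtering before SAO is an assumption that step 8 does not state explicitly.

```python
from enum import Enum


class Stage(Enum):
    ENTROPY = 0  # entropy decoding (step 4)
    RECON = 1    # pixel decoding and reconstruction (step 5)
    DBF_V = 2    # deblocking, vertical boundaries (step 6)
    DBF_H = 3    # deblocking, horizontal boundaries (step 7)
    SAO = 4      # sample adaptive offset (step 8)


# Neighbor positions as (row offset, column offset).
LEFT, UP_LEFT, UP, UP_RIGHT = (0, -1), (-1, -1), (-1, 0), (-1, 1)

# stage -> (stage this CTU must have finished,
#           stage the neighbors must have finished,
#           neighbor offsets to check)
RULES = {
    Stage.RECON: (Stage.ENTROPY, Stage.RECON, [LEFT, UP_LEFT, UP, UP_RIGHT]),
    Stage.DBF_V: (Stage.RECON, Stage.DBF_V, [LEFT]),
    Stage.DBF_H: (Stage.DBF_V, Stage.DBF_V, [LEFT, UP]),
    # Own-CTU prerequisite for SAO is an assumption (see lead-in).
    Stage.SAO: (Stage.DBF_H, Stage.SAO, [LEFT, UP_LEFT, UP, UP_RIGHT]),
}


def ready(done, stage, row, col, rows, cols):
    """Return True if CTU (row, col) may start `stage`.

    `done` maps each Stage to the set of (row, col) CTUs that finished it.
    """
    own_prior, neighbor_stage, offsets = RULES[stage]
    if (row, col) not in done[own_prior]:
        return False
    for d_row, d_col in offsets:
        r, c = row + d_row, col + d_col
        inside = 0 <= r < rows and 0 <= c < cols
        if inside and (r, c) not in done[neighbor_stage]:
            return False
    return True
```

A scheduler built on this predicate would hand any CTU for which `ready(...)` holds to an idle worker thread and add the CTU to the matching completion set when the worker finishes.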
Beneficial effects of the present invention: using a multi-core processor as the processing platform and following the HEVC standard, the invention divides the entire HEVC decoder into five task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset module. A parallel method is designed for each decoding task module, and buffer space is managed effectively. Between modules, the dependencies of the individual CTUs are used to realize CTU-based pipeline parallelism between the different decoding tasks. Thread pool technology dynamically allocates thread resources to decoding tasks, which improves multi-core resource utilization, and the method can be ported to other multi-core processors. While preserving decoded image quality, the method decodes in parallel, on a multi-core processor platform, a single high-definition bitstream that was encoded without any parallel coding tool. It thereby parallelizes the video decoder, improves the overall parallel speedup ratio of the decoder, and raises the core utilization of the multi-core processor.
Brief description of the drawings
Fig. 1 is the flowchart of the method embodiment of the present invention;
Fig. 2 shows the HEVC decoder architecture implemented by the method embodiment;
Fig. 3 is a schematic diagram of the CTUs that the pixel decoding and reconstruction module of the method embodiment can process in parallel;
Fig. 4 shows the pixels that must be stored in the buffer during pixel decoding in the method embodiment;
Fig. 5 shows the reference pixels required when the current CTU is pixel-decoded in the method embodiment;
Fig. 6 is the flowchart of boundary strength calculation during deblocking filtering in the method embodiment;
Fig. 7 is a schematic diagram of the boundary dependency analysis for parallel deblocking filtering in the method embodiment;
Fig. 8 shows the pixels that must be stored in the buffer during the two-stage deblocking filtering of the method embodiment;
Fig. 9 is a schematic diagram of the CTU dependencies of each subtask module in the method embodiment;
Fig. 10 is a schematic diagram of how the different subtask modules of the present invention process each CTU;
Fig. 11 is the architecture diagram of the multi-task-module pipeline parallel decoding of the present invention;
Fig. 12 compares the average speedup of the present invention with that of the WPP decoding algorithm for multi-core parallel decoding of different video sequences at different QPs.
Detailed description
The invention is further described below with reference to the accompanying drawings. The following embodiment serves only to illustrate the technical solution of the present invention more clearly and does not limit its scope of protection.
The present invention uses the Tilera GX36 multi-core processor, which consists of 36 Tile cores, as the hardware experiment platform. Tilera multi-core processors come with a complete set of multi-core development tools, which facilitates the development of multi-core parallel programs.
Fig. 1 is the flowchart of the method embodiment of the present invention. The specific steps are as follows:
Step 1: the main thread first initializes the entire decoder, reads the binary bitstream file, creates the HEVC decoder, and allocates memory;
Further, step 2: the bitstream of the current frame is extracted from the binary bitstream that was read, and a function is called to parse the relevant units of the video bitstream. This yields the video parameters and global information encapsulated in them, such as prediction values, offset values, quantization parameters, and operating parameters, which are then saved in the decoded picture object structure.
Further, step 3: to reduce the overhead of frequently creating and terminating threads, a fixed, presettable number of threads is allocated once, according to the configuration parameters, when the decoder program starts. These threads exist from start to finish. Thread pool technology is adopted, and each thread is bound to a different core through the multi-core function library, which ensures that every thread can be fully scheduled and achieves efficient multi-core parallel decoding. To reduce the cost of blocking and waking threads, worker threads do not enter the blocked state while waiting for decoding information or for CTUs. The main loop is then entered for processing;
Further, step 4: one thread is assigned to read the video bitstream and perform entropy decoding and parsing on it; after entropy decoding, the resulting quantization parameter (QP) and residual data are stored in the frame buffer. Once entropy decoding of a CTU is complete, all the information needed to reconstruct it is available;
Further, step 5: once all the information needed to reconstruct a CTU has been obtained, four threads are assigned to the CTU row to run the pipeline-parallel operation between the different subtasks. The pixel decoding and reconstruction parallel method is used to decode the pixels of the CTUs: a core is assigned to each CTU whose dependencies are satisfied, and it carries out the corresponding processing until pixel decoding of the CTU ends. Only the pixels of the CTU that will be referenced later are placed in the buffer, rather than all of its pixels, which further improves efficiency;
Further, step 6: after pixel decoding and reconstruction of a CTU in the picture frame has finished and its left neighboring CTU has completed vertical boundary filtering, a thread is immediately assigned to filter its vertical boundaries; likewise, only the pixels of the CTU that will be referenced later are placed in the buffer;
Further, step 7: after the current CTU has completed vertical boundary filtering, and its left and upper neighboring CTUs have also completed vertical boundary filtering, a thread is assigned to filter its horizontal boundaries;
Further, step 8: sample adaptive offset (SAO) is performed on CTUs whose dependencies are satisfied, which completes reconstruction of the CTU. A CTU rebuilt in this way can be referenced by other frames sooner, which facilitates inter-frame parallel processing and uses multi-core resources more effectively;
Further, step 9: for CTUs in the next picture frame whose dependencies are satisfied, threads are assigned to perform pixel decoding and reconstruction, and the above steps are repeated;
Further, step 10: after one frame of the video bitstream has been decoded, it is checked whether the whole video bitstream has been decoded; if so, all resources are released and the thread pool is destroyed; if not, the method returns to step 4.
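The fixed-size thread pool with per-core binding described in step 3 can be illustrated as follows. This is a minimal Python sketch, not the Tilera implementation: the text does not name the functions of the multi-core library, so the standard-library call `os.sched_setaffinity` (available on Linux, where passing 0 targets the calling thread) stands in for it, and the helper names `pin_to_core`, `worker`, and `run_pool` are hypothetical. For simplicity the workers block on a queue, whereas the text describes workers that do not enter the blocked state.

```python
import os
import queue
import threading


def pin_to_core(index):
    """Best-effort: bind the calling thread to one allowed core (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return  # platform without affinity support: run unpinned
    try:
        allowed = sorted(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {allowed[index % len(allowed)]})
    except OSError:
        pass  # affinity change refused: run unpinned


def worker(index, tasks, results):
    """Pin once, then serve tasks until a None stop marker arrives."""
    pin_to_core(index)
    while True:
        job = tasks.get()
        if job is None:
            return
        results.put(job())


def run_pool(num_threads, jobs):
    """Create the threads once, run all jobs, then shut the pool down."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, tasks, results))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one stop marker per worker
    for t in threads:
        t.join()
    return sorted(results.get() for _ in jobs)
```

In a decoder the jobs would be per-CTU stage tasks rather than plain callables, and the pool would live for the whole bitstream instead of being torn down after one batch.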
Fig. 2 shows the HEVC decoder architecture implemented by the method embodiment. The encoded binary bitstream is first entropy-decoded to obtain quantized coefficients and control information; the quantized coefficients are then inverse-quantized and inverse-transformed to obtain the residual information. Next, the decoder uses the control information to perform intra and inter prediction, combines the prediction with the recovered residual, and applies the in-loop deblocking and sample adaptive offset filters to obtain the output picture.
The many new coding structures and coding tools provided by HEVC all reflect its "friendliness" to parallel processing. The present invention divides the entire HEVC decoder into five task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset module. A parallel method is designed for each decoding task module, and buffer space is managed effectively. Between modules, the dependencies of the individual CTUs are used to realize CTU-based pipeline parallelism between the different decoding tasks. Thread pool technology dynamically allocates thread resources to decoding tasks, which improves multi-core resource utilization, and the method can be ported to other multi-core processors.
Fig. 3 is a schematic diagram of the CTUs that the pixel decoding and reconstruction module of the method embodiment can process in parallel. To realize CTU-based parallel pixel decoding, the dependencies of every CTU must be satisfied. Each CTU has data dependencies on its four neighboring CTUs to the left, upper left, top, and upper right; only after pixel decoding and reconstruction of these four CTUs is complete are the dependencies of the current CTU satisfied, so that it can be pixel-decoded. The present invention uses the dynamic scheduling strategy of the thread pool to assign multiple threads to multiple CTUs so that they are reconstructed simultaneously. To preserve the decoding dependencies of each CTU, the processing of each CTU row stays two CTUs ahead of the row below it.
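The two-CTU lead between consecutive rows follows directly from the four-neighbor dependency. The sketch below is an illustration under stated assumptions, not the embodiment's scheduler: it assumes every CTU takes one time slot and that enough threads are available, and the function name `earliest_start` is hypothetical.

```python
def earliest_start(rows, cols):
    """Earliest time slot in which each CTU can begin pixel reconstruction."""
    start = {}
    for r in range(rows):
        for c in range(cols):
            # Left, upper-left, upper, and upper-right neighbors.
            deps = [(r, c - 1), (r - 1, c - 1), (r - 1, c), (r - 1, c + 1)]
            # A CTU starts one slot after its latest in-picture neighbor.
            start[(r, c)] = max(
                (start[d] + 1 for d in deps if d in start), default=0
            )
    return start


schedule = earliest_start(rows=3, cols=5)
# With at least two columns, CTU (r, c) starts in slot c + 2 * r, so each
# row runs two CTUs behind the row above it.
total_slots = max(schedule.values()) + 1  # 9 slots instead of 15 serial ones
```

Under these assumptions a picture of `rows` by `cols` CTUs needs `cols + 2 * (rows - 1)` slots rather than `rows * cols`.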
Fig. 4 shows the pixels that must be stored in the buffer during pixel decoding in the method embodiment. With the data redundancy reduction mechanism, pixel decoding does not need to place the pixels of entire CTUs in the buffer; only the boundary pixels of the CTU above and the CTU to the left are placed in the buffer, which avoids a large number of redundant operations.
Fig. 5 shows the reference pixels required when the current CTU is pixel-decoded in the method embodiment. It can be seen that, with the redundancy reduction method of Fig. 4, placing only some boundary pixels in the buffer still satisfies the dependencies of the current CTU, and decoding can proceed.
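The saving from caching only boundary pixels can be made concrete with a small example. This is a hedged Python sketch rather than the embodiment's buffer layout: it assumes a 64×64 luma CTU, assumes that the samples later CTUs read are the bottom row and right column of each reconstructed CTU, and uses the hypothetical helper name `boundary_samples`.

```python
def boundary_samples(ctu):
    """Return the bottom row and right column of a reconstructed CTU."""
    bottom_row = list(ctu[-1])
    right_col = [row[-1] for row in ctu]
    return bottom_row, right_col


size = 64  # assumed CTU size
# Dummy reconstructed samples standing in for real decoder output.
ctu = [[(y * size + x) % 256 for x in range(size)] for y in range(size)]

bottom, right = boundary_samples(ctu)
stored = len(bottom) + len(right) - 1  # the corner sample is shared
full = size * size
# stored is 127 samples, against 4096 for the whole CTU.
```

The exact set of cached samples in the embodiment is the one drawn in Fig. 4 and Fig. 5; the sketch only shows why the cached fraction is small.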
Fig. 6 is the flowchart of boundary strength calculation during deblocking filtering in the method embodiment. The main difficulty of deblocking filtering is deciding whether to filter a particular block boundary and, if so, determining the filtering strength. Filtering that is too strong over-smooths detailed image regions, while filtering that is too weak leaves blocking artifacts that reduce subjective quality. Whether the pixels on the two sides of a block boundary need filtering depends mainly on the difference characteristics of the reconstructed pixel values on the two sides. The decision is made adaptively, and if filtering is needed, an appropriate filtering strength and filtering depth are chosen.
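The flowchart of Fig. 6 is not reproduced in the text, so the following Python sketch summarizes the boundary strength (Bs) rules as they are commonly described for the HEVC standard, in simplified form. The `Block` fields and the function name are hypothetical, only one motion vector per block is modeled, and the sketch should not be read as the flowchart of the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Block:
    intra: bool = False           # block is intra-coded
    nonzero_coeffs: bool = False  # transform block has non-zero coefficients
    ref_pic: int = 0              # reference picture used (simplified)
    mv: tuple = (0, 0)            # motion vector in quarter-sample units


def boundary_strength(p, q, is_transform_edge=True):
    """Return Bs (0, 1, or 2) for the edge between blocks p and q."""
    if p.intra or q.intra:
        return 2
    if is_transform_edge and (p.nonzero_coeffs or q.nonzero_coeffs):
        return 1
    if p.ref_pic != q.ref_pic:
        return 1
    # A difference of one integer sample equals 4 quarter-sample units.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0
```

An edge with Bs equal to 0 is left unfiltered; for the other edges the sample-level on/off and strong/weak decisions described above are then applied.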
Fig. 7 is a schematic diagram of the boundary dependency analysis for parallel deblocking filtering in the method embodiment. The filter on/off and strong/weak decisions involve four rows at the boundary and six pixels to its left and right. Strong filtering updates three adjacent pixel values on each of the left and right sides of a vertical boundary, and weak filtering updates two pixel values in each block adjacent to the boundary. Given how pixel values are updated during filtering, no vertical boundary depends on any other vertical boundary and the filtering operations do not affect one another, so vertical boundaries can be filtered in parallel. For horizontal boundaries, however, as shown in Fig. 7, the pixel region enclosed by the dashed line is the CTU-sized region that is actually filtered, which amounts to an adjusted CTU. The horizontal boundaries of the current CTU can therefore be filtered only after its left and upper neighboring CTUs have completed vertical boundary filtering.
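The independence of vertical boundaries can be checked numerically. The sketch below is illustrative only and rests on assumptions beyond the text: an 8-sample deblocking grid (taken from the HEVC standard), at most three samples modified on each side of an edge (from the description above), and a conservative assumption that the filter decisions read at most four samples on each side. The names are hypothetical.

```python
MAX_MODIFIED = 3  # strong filter updates up to 3 samples per side (from text)
MAX_READ = 4      # samples read per side for decisions (assumption)


def write_range(edge_x):
    """Columns a vertical edge at edge_x may modify."""
    return set(range(edge_x - MAX_MODIFIED, edge_x + MAX_MODIFIED))


def read_range(edge_x):
    """Columns a vertical edge at edge_x may read."""
    return set(range(edge_x - MAX_READ, edge_x + MAX_READ))


def edges_independent(width, spacing=8):
    """True if no edge writes a sample that another edge reads."""
    edges = range(spacing, width, spacing)
    return not any(
        write_range(a) & read_range(b)
        for a in edges
        for b in edges
        if a != b
    )
```

With the assumed 8-sample grid, `edges_independent(64)` holds, which matches the statement that vertical boundaries can be filtered in parallel; with a hypothetical 4-sample grid the write and read ranges of adjacent edges would overlap.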
Fig. 8 shows the pixels of the current CTU that must be placed in the buffer during the two-stage deblocking filtering operation in this embodiment of the method. These are the reference pixels that the next CTU needs for its own filtering. Storing only these depended-upon pixels reduces redundant data operations and improves decoding efficiency.
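A minimal sketch of this buffering idea follows. It assumes a 64x64 CTU held as a list of rows and keeps only a four-pixel-wide strip on the right edge and a four-pixel-high strip on the bottom edge, matching the four-pixel regions named in the claims. Which edges are kept, the function name, and the data layout are illustrative assumptions.

```python
def boundary_strips(ctu, width=4):
    """Return (right_strip, bottom_strip) copied from a reconstructed CTU.

    ctu is a list of rows of pixel values. Only these strips would be
    placed in the buffer for neighbouring CTUs to reference.
    """
    right_strip = [row[-width:] for row in ctu]        # last `width` columns
    bottom_strip = [row[:] for row in ctu[-width:]]    # last `width` rows
    return right_strip, bottom_strip

size = 64
ctu = [[y * size + x for x in range(size)] for y in range(size)]
right, bottom = boundary_strips(ctu)

# The 4x4 corner appears in both strips, so this counts copied values.
copied = len(right) * len(right[0]) + len(bottom) * len(bottom[0])
print(copied, "values copied instead of", size * size)
```

For a 64x64 CTU this copies 512 values rather than all 4096, which is the kind of saving the redundancy reduction aims at.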
Fig. 9 is a schematic of the CTU dependencies of each subtask module in this embodiment of the method. Before the current CTU can undergo pixel decoding and reconstruction, its three neighbouring CTUs (left, top, and upper right) must have completed the same operation, so that the pixel dependencies are satisfied when CTUs are processed in parallel. In addition, as Fig. 9 shows, deblocking filtering of the k-th CTU in CTU row N requires that its left-adjacent and upper-adjacent CTUs have completed deblocking filtering and that row N+1 has completed pixel decoding and reconstruction of its k-th CTU. Similarly, sample adaptive offset (SAO) processing of the k-th CTU in row N requires that row N+1 has completed deblocking filtering of its (k+1)-th CTU. Between two adjacent frames, threads would otherwise sit idle while SAO is applied to the last CTU row of the current frame. Borrowing the idea of the OWF algorithm, waiting threads instead go straight to pixel decoding and reconstruction of those CTUs in the next frame whose inter-prediction dependencies are already satisfied. Put simply, when a decoding task module is divided into several subtask modules, the subtask modules can run in parallel, but they do not start at the same time: adjacent subtask modules are separated by a delay of two CTUs, which ensures that the CTU decoding dependencies are satisfied.
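The three dependency rules can be written as simple readiness checks. The sketch below is one possible reading of the paragraph, not the patent's implementation: completed CTUs are tracked as sets of `(row, col)` pairs, neighbours outside the frame are treated as imposing no constraint, and the requirement that a CTU finish one stage before entering the next is an added assumption. The upper-left neighbour named in the claims is covered implicitly, because the top neighbour cannot be reconstructed before its own left neighbour.

```python
def _ok(done, r, c, rows, cols):
    """True if CTU (r, c) is finished or lies outside the frame."""
    if not (0 <= r < rows and 0 <= c < cols):
        return True
    return (r, c) in done

def can_reconstruct(rec, r, c, rows, cols):
    """Left, top and upper-right neighbours must be reconstructed."""
    neighbours = ((r, c - 1), (r - 1, c), (r - 1, c + 1))
    return all(_ok(rec, rr, cc, rows, cols) for rr, cc in neighbours)

def can_deblock(rec, dbf, r, c, rows, cols):
    """Left and top neighbours deblocked; CTU below already reconstructed."""
    return ((r, c) in rec                       # assumption: own stage order
            and _ok(dbf, r, c - 1, rows, cols)
            and _ok(dbf, r - 1, c, rows, cols)
            and _ok(rec, r + 1, c, rows, cols))

def can_sao(dbf, r, c, rows, cols):
    """CTU (r+1, c+1) must be deblocked before SAO on CTU (r, c)."""
    return (r, c) in dbf and _ok(dbf, r + 1, c + 1, rows, cols)

rows, cols = 3, 3
rec = {(0, 0), (0, 1)}
print(can_reconstruct(rec, 1, 0, rows, cols))   # top and upper right are done
print(can_reconstruct(rec, 1, 1, rows, cols))   # left (1, 0) and upper right (0, 2) not done
```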
Figure 10 shows how the different subtask modules of the invention process each CTU. Pixel decoding in adjacent CTU rows is offset by a delay of two CTUs. As soon as a CTU completes pixel decoding and reconstruction, a thread is assigned to it to filter its vertical boundaries; once vertical boundary filtering is complete, horizontal boundary filtering follows immediately. When the dependencies of the SAO operation are satisfied, a thread is assigned to apply SAO to the CTU. To prevent most threads from idling when decoding of a frame is nearly complete, the method adopts the idea of the OWF algorithm: threads waiting in the thread pool are used directly for pixel decoding and reconstruction of CTUs in the next frame whose inter-prediction dependencies are satisfied, which achieves parallel processing across adjacent frames.
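The two-CTU offset between rows follows directly from the left, top, and upper-right dependencies. The sketch below computes the earliest time slot in which each CTU could be reconstructed, under the simplifying assumptions that every CTU takes one slot and that enough threads are always available; the function name and the slot model are illustrative only.

```python
def earliest_start(rows, cols):
    """Earliest reconstruction slot per CTU with left/top/upper-right deps."""
    start = {}
    for r in range(rows):          # raster order guarantees deps come first
        for c in range(cols):
            deps = [(r, c - 1), (r - 1, c), (r - 1, c + 1)]
            ready = [start[d] + 1 for d in deps if d in start]
            start[(r, c)] = max(ready, default=0)
    return start

start = earliest_start(3, 5)
for r in range(3):
    print([start[(r, c)] for c in range(5)])
# [0, 1, 2, 3, 4]
# [2, 3, 4, 5, 6]
# [4, 5, 6, 7, 8]
```

Each row starts two slots after the row above it, so CTU `(r, c)` begins at slot `c + 2 * r` in this idealized model.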
Figure 11 shows the multi-task-module pipelined parallel decoding architecture of the invention. The entire HEVC decoder is divided into five task modules: bitstream reading, entropy decoding, pixel reconstruction, deblocking filtering, and sample adaptive offset. A parallel method is designed for each decoding task module, and the buffer space is managed efficiently. Between modules, the CTU is the basic unit, and the CTU dependencies are used to pipeline the different decoding tasks in parallel. A thread pool dynamically allocates thread resources to decoding tasks, which improves multi-core utilization and allows the design to be ported to other multi-core processors.
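One way to picture the thread-pool pipeline is a small dependency-driven scheduler. The sketch below is an assumption-laden illustration, not the patented architecture: it covers only the three CTU-level stages (reconstruction, deblocking, SAO), uses Python's standard `ThreadPoolExecutor` in place of the Tilera thread library, omits core binding, and takes a placeholder `work` callback instead of real decoding.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

STAGES = ("rec", "dbf", "sao")   # hypothetical stage labels

def deps(stage, r, c):
    """Tasks that must finish before (stage, r, c) may start."""
    if stage == "rec":
        return [("rec", r, c - 1), ("rec", r - 1, c), ("rec", r - 1, c + 1)]
    if stage == "dbf":
        return [("rec", r, c), ("dbf", r, c - 1),
                ("dbf", r - 1, c), ("rec", r + 1, c)]
    return [("dbf", r, c), ("dbf", r + 1, c + 1)]

def decode_frame(rows, cols, work, workers=4):
    """Run work(stage, r, c) for every CTU and stage, respecting deps."""
    tasks = {(s, r, c) for s in STAGES
             for r in range(rows) for c in range(cols)}
    done, running, order = set(), {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < len(tasks):
            pending = tasks - done - set(running.values())
            for t in sorted(pending):
                # Dependencies outside the frame are ignored.
                if all(d in done or d not in tasks for d in deps(*t)):
                    running[pool.submit(work, *t)] = t
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in finished:
                fut.result()                 # surface worker exceptions
                task = running.pop(fut)
                done.add(task)
                order.append(task)
    return order

order = decode_frame(3, 4, lambda stage, r, c: None)
print(len(order), "tasks completed; first:", order[0])
```

Idle workers always pick up whichever stage has a ready CTU, which is the behaviour the thread pool is meant to provide. Extending the task set to CTUs of the next frame would give the cross-frame overlap described for the OWF idea.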
To verify the effect of the method, the following experiment was carried out. Three video sequences, "BasketballDrive", "Cactus", and "Kimono", were decoded with the method at QP values of 22, 27, 32, and 37. The decoding algorithm of the invention was run on a Tilera multi-core processor both as multi-core parallel decoding and as single-core serial decoding. To better assess the performance of the multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform, the experiment also includes the WPP parallel method, and the parallel speedups of the two are compared.
Figure 12 shows the average speedup for the different video sequences at different QP values when decoded in parallel on the multi-core processor, compared with the WPP decoding algorithm. The performance of the parallel program is expressed by the speedup, as follows:
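A conventional definition of speedup, consistent with the single-core serial and multi-core parallel runs described for the experiment, is sketched below. The symbols $T_{\text{serial}}$ and $T_{\text{parallel}}$ are assumed names and are not taken from the original text.

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}
```

Here $T_{\text{serial}}$ would be the single-core serial decoding time and $T_{\text{parallel}}$ the multi-core parallel decoding time for the same sequence and QP.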
Table 1 lists the parallel decoding speedup results of the method for different numbers of cores and different QP values.
Table 1. Experimental results
Analysis of the experimental data shows that, for the same test sequence, different QP values yield different decoding speedups: the smaller the QP used at the encoder, the slower the parallel decoding. In contrast to the WPP parallel algorithm, whose thread utilization is low, the multi-level task-level and data-level parallel HEVC decoding method designed here makes full use of the CTU dependencies, pipelines the decoding task modules in parallel to reduce decoding delay, and introduces a data redundancy reduction mechanism to optimize the decoding algorithm, which greatly improves the parallel decoding efficiency of the decoder. The speedup comparison in Figure 12 shows that the parallel decoding algorithm of the invention achieves a higher speedup than the WPP decoder at every QP value.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims (6)

1. A multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform, characterized by comprising the following steps:
Step 1: The main thread first initializes the entire HEVC decoder, reads the binary bitstream file, creates the HEVC decoder, and allocates memory;
Step 2: The bitstream of the current frame is extracted from the binary bitstream that was read, and functions are called to parse the setting units of the video bitstream; the resulting video parameters and global information are saved in the decoded picture object structure;
Step 3: A preset number of threads is allocated according to the configuration parameters, and each thread is bound to a different core through the multi-core function library;
Step 4: One thread reads the video bitstream and performs entropy decoding and parsing on it; when entropy decoding finishes, the resulting quantization parameter QP and residual data are stored in the frame buffer;
Step 5: After entropy decoding of the current CTU is complete, if pixel decoding and reconstruction of the left, upper-left, upper, and upper-right adjacent CTUs of the current CTU have all completed, a thread is assigned to perform pixel decoding and reconstruction on the current CTU; once the thread completes pixel decoding and reconstruction of the current CTU, it returns to the task queue and enters the wait state;
Step 6: After pixel decoding and reconstruction of the current CTU ends, and once the left-adjacent CTU of the current CTU has completed vertical boundary filtering, a thread is immediately assigned to filter the vertical boundaries of the current CTU; after processing completes, the thread returns to the task queue and enters the wait state;
Step 7: After the current CTU completes vertical boundary filtering, and once the left-adjacent and upper-adjacent CTUs of the current CTU have completed vertical boundary filtering, a thread is assigned to filter the horizontal boundaries of the current CTU;
Step 8: If the sample adaptive offset operations of the left, upper-left, upper, and upper-right adjacent CTUs of the current CTU have all completed, a thread is assigned to perform the sample adaptive offset (SAO) operation on the current CTU;
Step 9: Steps 5 to 8 are repeated for the next image frame until decoding of one frame of the video bitstream is complete;
Step 10: After decoding of one frame of the video bitstream is complete, it is checked whether the entire video bitstream has been decoded; if so, all resources are released and the thread pool is destroyed; if not, the method returns to Step 4.
2. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that, if pixel decoding and reconstruction of the left, upper-left, upper, and upper-right adjacent CTUs of the current CTU have all completed, four threads are assigned to process in parallel the pixel decoding and reconstruction, the CTU vertical boundary filtering, the CTU horizontal boundary filtering, and the sample adaptive offset operation of the current CTU.
3. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that a data redundancy reduction mechanism is introduced, whereby, when vertical boundaries are filtered, only the pixels of the four-pixel-wide vertical region of the CTU are placed in the buffer space.
4. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that, when horizontal boundaries are filtered, only the pixels of the four-pixel-high horizontal region of the CTU are placed in the buffer space.
5. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that a thread pool is used and each of a fixed number of threads is bound to one core.
6. The multi-level task-level and data-level parallel HEVC decoding method based on a multi-core platform according to claim 1, characterized in that the entire HEVC decoder is divided into five task modules: a bitstream reading module, an entropy decoding module, a pixel reconstruction module, a deblocking filtering module, and a sample adaptive offset module.
CN201810239375.1A 2018-03-22 2018-03-22 Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level Active CN108449603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810239375.1A CN108449603B (en) 2018-03-22 2018-03-22 Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level

Publications (2)

Publication Number Publication Date
CN108449603A true CN108449603A (en) 2018-08-24
CN108449603B CN108449603B (en) 2019-11-22

Family

ID=63196565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810239375.1A Active CN108449603B (en) 2018-03-22 2018-03-22 Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level

Country Status (1)

Country Link
CN (1) CN108449603B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109495743A (en) * 2018-11-15 2019-03-19 上海电力学院 A kind of parallelization method for video coding based on isomery many places platform
CN110418145A (en) * 2019-07-26 2019-11-05 北京奇艺世纪科技有限公司 A kind of method for video coding, device, electronic equipment and storage medium
CN110446043A (en) * 2019-08-08 2019-11-12 南京邮电大学 A kind of HEVC fine grained parallel coding method based on multi-core platform
CN112218091A (en) * 2020-09-16 2021-01-12 博流智能科技(南京)有限公司 Intra-frame decoding method and intra-frame decoding module
CN112468821A (en) * 2020-10-27 2021-03-09 南京邮电大学 HEVC core module-based parallel decoding method, device and medium
CN113016180A (en) * 2018-11-12 2021-06-22 交互数字Vc控股公司 Virtual pipeline for video encoding and decoding
CN113660496A (en) * 2021-07-12 2021-11-16 珠海全志科技股份有限公司 Multi-core parallel-based video stream decoding method and device
CN113852814A (en) * 2021-07-19 2021-12-28 南京邮电大学 Parallel decoding method and device for fusing data level and task level and storage medium
WO2023029045A1 (en) * 2021-09-06 2023-03-09 Nvidia Corporation Parallel encoding of video frames without filtering dependency
US11871018B2 (en) 2021-09-02 2024-01-09 Nvidia Corporation Parallel processing of video frames during video encoding

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102098503A (en) * 2009-12-14 2011-06-15 中兴通讯股份有限公司 Method and device for decoding image in parallel by multi-core processor
CN102625108A (en) * 2012-03-30 2012-08-01 浙江大学 Multi-core-processor-based H.264 decoding method
CN103974081A (en) * 2014-05-08 2014-08-06 杭州同尊信息技术有限公司 HEVC coding method based on multi-core processor Tilera
CN104539972A (en) * 2014-12-08 2015-04-22 中安消技术有限公司 Method and device for controlling video parallel decoding in multi-core processor
CN105791829A (en) * 2016-03-30 2016-07-20 南京邮电大学 HEVC parallel intra-frame prediction method based on multi-core platform
CN105992008A (en) * 2016-03-30 2016-10-05 南京邮电大学 Multilevel multitask parallel decoding algorithm on multicore processor platform

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
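DAMIEN DE SAINT JORRE et al.: "EXPLORING MPEG HEVC DECODER PARALLELISM FOR THE EFFICIENT PORTING ONTO MANY-CORE PLATFORMS", IEEE EXPLORE *
刘鹏 (Liu Peng): "基于多核嵌入式HEVC解码器并行优化及实现" [Parallel optimization and implementation of a multi-core embedded HEVC decoder], 中国优秀硕士学位论文全文数据库 [China Master's Theses Full-text Database] *
叶昌益 (Ye Changyi): "基于BF561的H.264并行编码的研究" [Research on H.264 parallel encoding based on the BF561], 器件与应用 [Devices and Applications] *
方狄 (Fang Di): "基于Tilera多核处理器的HEVC多层次并行解码方法的研究与实现" [Research and implementation of a multi-level parallel HEVC decoding method on the Tilera multi-core processor], 中国优秀硕士学位论文全文数据库 [China Master's Theses Full-text Database] *
束骏 (Shu Jun): "基于Tilera多核处理器的HEVC视频编码并行算法的研究与实现" [Research and implementation of a parallel HEVC video encoding algorithm on the Tilera multi-core processor], 中国优秀硕士学位论文全文数据库 [China Master's Theses Full-text Database] *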

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113016180A (en) * 2018-11-12 2021-06-22 交互数字Vc控股公司 Virtual pipeline for video encoding and decoding
CN109495743A (en) * 2018-11-15 2019-03-19 上海电力学院 A kind of parallelization method for video coding based on isomery many places platform
CN109495743B (en) * 2018-11-15 2021-10-08 上海电力学院 Parallelization video coding method based on heterogeneous multiprocessing platform
CN110418145B (en) * 2019-07-26 2022-04-22 北京奇艺世纪科技有限公司 Video coding method and device, electronic equipment and storage medium
CN110418145A (en) * 2019-07-26 2019-11-05 北京奇艺世纪科技有限公司 A kind of method for video coding, device, electronic equipment and storage medium
CN110446043A (en) * 2019-08-08 2019-11-12 南京邮电大学 A kind of HEVC fine grained parallel coding method based on multi-core platform
CN112218091A (en) * 2020-09-16 2021-01-12 博流智能科技(南京)有限公司 Intra-frame decoding method and intra-frame decoding module
CN112468821A (en) * 2020-10-27 2021-03-09 南京邮电大学 HEVC core module-based parallel decoding method, device and medium
CN112468821B (en) * 2020-10-27 2023-02-10 南京邮电大学 HEVC core module-based parallel decoding method, device and medium
CN113660496A (en) * 2021-07-12 2021-11-16 珠海全志科技股份有限公司 Multi-core parallel-based video stream decoding method and device
CN113660496B (en) * 2021-07-12 2024-06-07 珠海全志科技股份有限公司 Video stream decoding method and device based on multi-core parallelism
CN113852814A (en) * 2021-07-19 2021-12-28 南京邮电大学 Parallel decoding method and device for fusing data level and task level and storage medium
CN113852814B (en) * 2021-07-19 2023-06-16 南京邮电大学 Parallel decoding method, device and storage medium for data level and task level fusion
US11871018B2 (en) 2021-09-02 2024-01-09 Nvidia Corporation Parallel processing of video frames during video encoding
WO2023029045A1 (en) * 2021-09-06 2023-03-09 Nvidia Corporation Parallel encoding of video frames without filtering dependency

Also Published As

Publication number Publication date
CN108449603B (en) 2019-11-22

Similar Documents

Publication Publication Date Title
CN108449603B (en) Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level
CN105992008B (en) A kind of multi-level multi-task parallel coding/decoding method in multi-core processor platform
CN105491377B (en) A kind of video decoded macroblock grade Method of Scheduling Parallel of computation complexity perception
CN110337002B (en) HEVC (high efficiency video coding) multi-level parallel decoding method on multi-core processor platform
CN101115201A (en) Video decoding method and device
CN107465929B (en) DVFS control method, system, processor and storage equipment based on HEVC
CN106210728A (en) Circuit, method and Video Decoder for video decoding
CN101115207B (en) Method and device for implementing interframe forecast based on relativity between future positions
CN112468821B (en) HEVC core module-based parallel decoding method, device and medium
CN105791829A (en) HEVC parallel intra-frame prediction method based on multi-core platform
CN105163126B (en) A kind of hardware coding/decoding method and device based on HEVC agreements
US20230047433A1 (en) Video decoding method, video encoding method, related devices, and storage medium
CN104521234B (en) Merge the method for processing video frequency and device for going block processes and sampling adaptive migration processing
CN106851298B (en) High-efficiency video coding method and device
CN108540797A (en) HEVC based on multi-core platform combines WPP coding methods within the frame/frames
CN101383971A (en) Intra-frame prediction processing method based on image encoding and decoding
CN104980764A (en) Parallel coding/decoding method, device and system based on complexity balance
CN111757109A (en) High-real-time parallel video coding and decoding method, system and storage medium
Gudumasu et al. Software-based versatile video coding decoder parallelization
CN109391816A (en) The method for parallel processing of HEVC medium entropy coding link is realized based on CPU+GPU heterogeneous platform
CN102595137B (en) Fast mode judging device and method based on image pixel block row/column pipelining
CN110446043A (en) A kind of HEVC fine grained parallel coding method based on multi-core platform
Jiang et al. Highly paralleled low-cost embedded HEVC video encoder on TI KeyStone multicore DSP
Jiang et al. GPU-based intra decompression for 8K real-time AVS3 decoder
CN104780377A (en) Parallel high efficiency video coding (HEVC) system and method based on distributed computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant