CN109391816A - Method for parallel processing of the entropy coding stage in HEVC based on a CPU+GPU heterogeneous platform


Publication number
CN109391816A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811258709.6A
Other languages
Chinese (zh)
Other versions
CN109391816B (en)
Inventor
郭成安
董菁鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201811258709.6A
Publication of CN109391816A
Application granted
Publication of CN109391816B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

An efficient parallel processing method for the entropy coding stage in HEVC, implemented on a CPU+GPU heterogeneous platform. When a video image sequence is encoded according to the HEVC standard, the final entropy coding stage of the current frame is processed in parallel with all the other stages (everything except final entropy coding) applied to the next frame. That is, on a CPU+GPU computing platform, the CPU performs the entropy coding of the current frame while the GPU performs prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and all other stages for the next frame, with the CPU and GPU computing simultaneously. With this scheme, the processing time of the shorter of the two (namely the final entropy coding stage) is saved, which significantly increases the overall computing speed of the HEVC encoder.

Description

Method for parallel processing of the entropy coding stage in HEVC based on a CPU+GPU heterogeneous platform
Technical field
The invention belongs to the technical field of digital video compression coding. It concerns how to process the entropy coding stage of the High Efficiency Video Coding standard (HEVC, also known as H.265 or HEVC/H.265) efficiently in parallel, with the aim of significantly improving the computational efficiency of an HEVC encoder.
Background art
With the rapid development of the Internet and information technology, multimedia technology plays an increasingly important role in social life. As an important information carrier, video is intuitive, definite, efficient and widespread, and is applied in every field of society. As demand for video resolution and clarity keeps growing, digital video has developed from an initial resolution of 352 × 240 to high definition (1920 × 1080) and on to ultra high definition (4K × 2K and above), and the volume of video data has grown substantially with it. The capacity of real channels and storage devices, however, is limited, so video data compression has become an indispensable key technology in the application and development of video.
As the latest-generation efficient video coding standard, HEVC was formally published in 2013 by the Video Coding Experts Group of the International Telecommunication Union (ITU-T VCEG) and the Moving Picture Experts Group of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC MPEG). HEVC incorporates the newest video coding techniques: compared with the previous-generation standard H.264/AVC, it saves about 50% of the bit rate at the same coding quality. This compression performance comes with a marked increase in the computational complexity of encoding (reported to be 2-4 times that of H.264/AVC), which poses a great challenge to real-time encoder implementations. Designing efficient, fast HEVC encoding algorithms has therefore become an important research topic in video data compression.
An HEVC encoder mainly comprises prediction (intra and inter), transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering (deblocking filtering and sample adaptive offset filtering), image reconstruction and entropy coding. For the stages other than entropy coding, efficient parallel algorithms have already been designed, and parallel computation using GPUs and multithreaded programming significantly improves their efficiency. Entropy coding is different. HEVC uses context-based adaptive binary arithmetic coding (CABAC), whose computation is inherently recursive: coding a datum requires the coding result of the preceding datum, so a datum can be coded only after its predecessor has been coded. The data must therefore be processed serially in order, which is unsuited to parallel processing and makes a large speed-up of this stage hard to achieve. Entropy coding is involved at two points in the HEVC encoding process. The first is rate-distortion optimization, where the bit rate of each CU block in the image must be computed to obtain its optimal coding parameters. The second is the final entropy coding of all data to be encoded of a full frame, performed once all optimal coding parameters have been determined, which produces the compressed bitstream of that frame.
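The serial dependency described above can be illustrated with a toy adaptive binary arithmetic coder. This is a minimal sketch, not the CABAC engine of the HEVC standard: the function name `toy_adaptive_encode` and the count-based probability model are assumptions made purely for illustration. It shows that the interval assigned to a symbol depends on the model state left behind by every earlier symbol.

```python
from fractions import Fraction


def toy_adaptive_encode(bits):
    """Toy adaptive binary arithmetic coder (hypothetical, NOT real CABAC).

    Returns the final coding interval as (low, width), using exact fractions.
    """
    low, width = Fraction(0), Fraction(1)
    zeros, ones = 1, 1  # adaptive counts play the role of the context state
    for b in bits:
        p0 = Fraction(zeros, zeros + ones)  # current estimate of P(bit == 0)
        if b == 0:
            width = width * p0              # keep the lower sub-interval
            zeros += 1
        else:
            low = low + width * p0          # move to the upper sub-interval
            width = width * (1 - p0)
            ones += 1
    return low, width


# The last bit is 1 in both inputs, but it narrows the interval by a
# different factor because the adaptive state depends on the earlier bits.
after_zeros = toy_adaptive_encode([0, 0, 1])
after_ones = toy_adaptive_encode([1, 1, 1])
print(after_zeros, after_ones)
```

Because each step reads and updates `low`, `width` and the counts, the loop cannot be split across independent workers without changing the result, which is the property the paragraph above attributes to CABAC.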
Experiments show that, on a seventh-generation i7 CPU (for example, Core™ i7-7700) encoding high-definition (1080p) video at a data compression ratio of 100-130, the final entropy coding stage of one full frame takes 14-17 milliseconds on average with the HEVC entropy coding algorithm. All the other stages (prediction, transform, quantization, inverse quantization, inverse transform, rate-distortion optimization, filtering and image reconstruction), processed in parallel on a single GPU card (such as a GTX-1080), complete in 32-36 milliseconds on average. For the encoder as a whole, entropy coding has thus become a bottleneck for real-time processing. Designing an efficient entropy coding scheme that substantially reduces the time spent in this stage is key to real-time operation of an HEVC encoder.
At present, research on improving the computational efficiency of HEVC encoders focuses mainly on algorithmic improvement and hardware acceleration. A master's thesis from 2014 (Zhao Yanan. Research on parallelization of the new-generation video coding standard HEVC [D]. Shanghai Jiao Tong University, 2014.) addresses the prediction stage using coarse-grained and fine-grained parallelism together, studying parallelization at two levels: the coding tree unit (CTU) level and inside the CTU. This improves encoder efficiency to some extent, but the fine-grained parallelism reaches only the CU level, so the degree of parallelism is small, and the entropy coding stage is not parallelized; the speed-up therefore does not meet real-time requirements. A 2014 paper in the Journal of Zhejiang University (Zhou Chengtao, Tian Xiang, Chen Yaowu. Fast selection algorithm for HEVC coding unit size [J]. Journal of Zhejiang University: Engineering Science, 2014, 48(8): 1451-1460.) proposes the concept of depth unicity: using the coding depths selected for adjacent units, it skips depths with a low probability of occurrence and so speeds up CU partitioning. The method raises coding speed by about 25%, but because some CU partition depths are skipped, the final coding quality of the encoder is affected. A 2016 paper in the Journal of Visual Communication and Image Representation (Tariq J, Kwong S, Yuan H. HEVC intra mode selection based on Rate Distortion (RD) cost and Sum of Absolute Difference (SAD) [J]. Journal of Visual Communication and Image Representation, 2016, 35: 112-119.) analyses the second-order relationship between rate-distortion (RD) cost and the sum of absolute differences (SAD) and derives an approximate formula for estimating the RD cost, which saves the time otherwise spent entropy coding each CU block when computing the RD cost. Because the code length of each CU block is only estimated, however, the RD computation carries an error, which costs some image coding quality. A 2018 paper (Cebrián-Márquez G, Galiano V, Migallón H, et al. Heterogeneous CPU plus GPU approaches for HEVC [J]. The Journal of Supercomputing, 2018, April 2: 1-12.) implements two schemes, slice-level parallelism and tile-level parallelism, which also improves encoder efficiency to a degree. Parallelism only at slice or tile level is limited, however, and cannot deliver a larger gain. None of the above methods parallelizes the final entropy coding of a full frame after the optimal coding parameters have been determined. As noted earlier, this stage alone takes 14-17 milliseconds on average per high-definition (1080p) frame. Saving the processing time of this stage is therefore vital to real-time operation of the whole HEVC encoder.
Summary of the invention
The present invention proposes a parallel processing method for an HEVC encoder suited to a CPU+GPU heterogeneous platform. The final entropy coding stage of a full frame runs concurrently with all the other stages (prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and so on), so as to significantly improve the computational efficiency of the HEVC encoder as a whole.
In HEVC, each current frame first passes through prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and entropy coding. In prediction, transform, quantization, filtering and image reconstruction, the data to be processed can be partitioned effectively, so efficient parallel algorithms can be designed and executed on the many-core structure of a GPU with multithreaded programming, which greatly improves the efficiency of these stages. The entropy coding process is different, especially the final entropy coding of a full frame. As described above, it uses CABAC, and this context-based adaptive binary arithmetic coding needs the coding result of the preceding datum when coding each datum. The data to be encoded in this stage must therefore be processed serially in order. If they were split into many small units and processed in parallel, the causal relationship between consecutive data would be broken and coding errors would result. So, unlike the other stages, the entropy coding stage cannot be sped up significantly by partitioning the data for parallel processing.
According to the HEVC standard, however, the final entropy coding of all data to be encoded of a given frame, although it must still code each datum serially in order internally, has no data dependency on the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering and image reconstruction of the next frame; the two can be computed independently. On this basis, the invention proposes the present "efficient parallel processing method for the entropy coding stage in HEVC based on a CPU+GPU heterogeneous platform". The main idea is to process the final entropy coding stage of the current frame in parallel with all the stages other than final entropy coding applied to the next frame. That is, the CPU performs the entropy coding of the current frame while the GPU performs prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and all other stages for the next frame, with the CPU and GPU computing simultaneously. With this scheme, the processing time of the shorter of the two (namely the final entropy coding stage) is saved, significantly increasing the overall computing speed of the HEVC encoder.
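The frame-level overlap described above can be sketched as a two-stage pipeline schedule. The helper below is hypothetical (the name `pipeline_schedule` and the slot numbering are illustrative assumptions, not part of the patent text); it only lists, for each time slot, which frame the GPU stages work on and which frame the CPU entropy coder works on.

```python
def pipeline_schedule(num_frames):
    """Return one (gpu_frame, cpu_entropy_frame) pair per time slot.

    None means the unit is idle in that slot. Slot k is assumed to cover
    the interval from T_k to T_(k+1).
    """
    slots = []
    for k in range(num_frames + 1):
        # GPU: prediction ... reconstruction of frame k (idle after the last frame)
        gpu_frame = k if k < num_frames else None
        # CPU: final entropy coding of the frame the GPU finished one slot earlier
        cpu_frame = k - 1 if k >= 1 else None
        slots.append((gpu_frame, cpu_frame))
    return slots


for k, (gpu, cpu) in enumerate(pipeline_schedule(4)):
    print(f"slot {k}: GPU frame = {gpu}, CPU entropy-coding frame = {cpu}")
```

For N frames the schedule has N + 1 slots: the CPU entropy coder is idle in the first slot and the GPU is idle in the last, and in every slot in between the two units work on consecutive frames at the same time.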
The technical solution by which the invention realizes parallel processing in the HEVC encoder is as follows:
(1) A CPU with two or more cores (such as an i7-7700 or better) and a GPU card (such as a GTX-1080 or better) form a CPU+GPU heterogeneous computing platform on which the HEVC encoder parallel processing method of the invention is implemented;
(2) As shown in the accompanying drawing, two threads are set up on the CPU (called the "CPU main thread" and the "CPU slave thread") and a set of multiple threads is set up on the GPU (called the "GPU multithreads"). The CPU main thread is responsible for controlling the computation flow of the whole encoder system, for scheduling the CPU slave thread and the GPU multithreads, and for data exchange between the CPU and the GPU. The CPU slave thread performs the final entropy coding of all data to be encoded of the previous frame. The GPU multithreads perform all stages of prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering and image reconstruction for the current frame. For the prediction, transform, quantization, inverse quantization, inverse transform, filtering and image reconstruction stages handled by the GPU multithreads, the parallel algorithms described in the following theses are used: 1) Du Junjie. Design and implementation of parallel algorithms for key techniques of HEVC intra prediction [D]. Master's thesis, Dalian University of Technology, 2015; 2) Zhang Weilong. Design and GPU-based implementation of parallel algorithms for key HEVC modules [D]. Master's thesis, Dalian University of Technology, 2016;
(3) For the entropy coding of each CU data block during rate-distortion optimization: this stage is combined with the prediction, transform and quantization of the current image, and in this scheme those stages are processed in parallel on the GPU by the GPU multithreads, with the CU data block as the unit of computation. In this process, therefore, the GPU multithreads code the data within each CU block with the original serial entropy coding algorithm while coding different CU blocks in parallel. This gives CU-level parallel entropy coding and improves the computational efficiency of the stage.
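Item (3) can be sketched as "serial inside a CU, parallel across CUs". The code below is an illustrative assumption, not the patent's implementation: a CPU thread pool stands in for the GPU threads, and `cu_bit_cost` is a hypothetical adaptive bit-cost estimate standing in for the serial entropy coder used during rate-distortion optimization.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cu_bit_cost(bins):
    """Serially estimate the coded size in bits of one CU's binary symbols.

    Hypothetical stand-in for the serial entropy coder: each symbol's cost
    depends on counts updated by the symbols before it, so the loop inside
    one CU stays sequential.
    """
    zeros, ones = 1, 1
    bits = 0.0
    for b in bins:
        p = (ones if b else zeros) / (zeros + ones)  # adaptive probability of b
        bits += -math.log2(p)                        # ideal code length of b
        if b:
            ones += 1
        else:
            zeros += 1
    return bits


def parallel_cu_costs(cus, workers=4):
    """Estimate every CU independently; different CUs share no coder state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cu_bit_cost, cus))


cus = [[0, 0, 1, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
print(parallel_cu_costs(cus))
```

The thread pool only illustrates the structure; CPython threads do not speed up CPU-bound loops, whereas in the scheme described above the per-CU work is distributed across GPU threads.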
The efficient parallel processing method proposed by the invention for the final entropy coding stage allows the HEVC encoder to save the time needed for final entropy coding of each frame, significantly improving overall system efficiency. Experiments show that, implemented on a current consumer-grade CPU+GPU platform (for example, a Core™ i7-7700 computer with one GTX-1080 GPU card), the proposed technique reaches a video coding rate of 27-31 frames per second on high-definition (1080p) video at a data compression ratio of 100-130, which meets the requirement for real-time processing (≥ 25 frames per second). If instead the original serial approach is used for the final entropy coding of each frame, the computation time of that stage (14-17 milliseconds) cannot be saved, and real-time compression coding of high-definition video cannot be achieved on this platform. For the entropy coding of each CU data block during rate-distortion optimization, the invention proposes a CU-level parallel entropy coding scheme, which further improves the efficiency of the rate-distortion optimization stage.
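The frame rates quoted above are consistent with a simple back-of-the-envelope model built from the stage times given in the background section (32-36 ms for the GPU stages, 14-17 ms for final entropy coding). The sketch below assumes, beyond what the source states, that transfer and synchronization overheads are negligible, so that a pipelined frame costs the longer of the two stage times and a serial frame costs their sum.

```python
def fps(ms_per_frame):
    """Frames per second for a given per-frame time in milliseconds."""
    return 1000.0 / ms_per_frame


gpu_ms = (32.0, 36.0)      # GPU stages per frame, best and worst case (from the text)
entropy_ms = (14.0, 17.0)  # final entropy coding per frame (from the text)

# Pipelined: CPU and GPU overlap, so the slower stage sets the frame period.
pipelined = [fps(max(g, e)) for g, e in zip(gpu_ms, entropy_ms)]
# Serial: the entropy coding time is added to every frame.
serial = [fps(g + e) for g, e in zip(gpu_ms, entropy_ms)]

print([round(x, 2) for x in pipelined])
print([round(x, 2) for x in serial])
```

Under these assumptions the pipelined rate falls in the quoted 27-31 frames per second range, while the serial rate stays below the 25 frames per second real-time threshold.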
Brief description of the drawings
Figure 1 is a flow diagram of the invention.
Detailed description of embodiments
Specific embodiments of the invention are described below with reference to the technical solution and the accompanying drawing.
Let the video image sequence to be encoded have length N (frames 0 to N-1). The implementation steps are as follows:
Step 1: As shown in the accompanying drawing, in the interval from time T0 to T1, the CPU main thread first transfers the image data of frame 0 of the video sequence to the GPU and issues a scheduling instruction to the GPU. The GPU multithreads are started and, using the "technical solution for parallel processing in the HEVC encoder" described above, perform in parallel the prediction (intra prediction here), transform, quantization, rate-distortion optimization, inverse quantization, inverse transform and image reconstruction of frame 0. During this interval the CPU main thread and the GPU exchange information many times: according to the execution status of the GPU multithreads reported by the GPU, the CPU main thread starts different GPU multithreads to complete the tasks of the different stages. When the GPU multithreads finish the reconstruction of frame 0, a task-end flag is returned to the CPU main thread; on receiving it, the CPU main thread transfers all data to be encoded of frame 0 generated by the GPU multithreads to the CPU. The CPU slave thread is idle during this interval;
Step 2: In the interval from time T1 to T2, the CPU main thread starts the CPU slave thread, which entropy codes all data to be encoded of frame 0. At the same time, the CPU main thread transfers the image data of the next frame (frame 1) to the GPU and starts the GPU multithreads, which perform in parallel the prediction (inter prediction), transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering and image reconstruction of frame 1. During this interval the CPU main thread and the GPU again exchange information many times: according to the execution status of the GPU multithreads reported by the GPU, the CPU main thread starts different GPU multithreads to complete the tasks of the different stages. The CPU main thread also exchanges information with the CPU slave thread and monitors the completion of the slave thread's task. When the GPU multithreads finish the reconstruction of frame 1, a task-end flag is returned to the CPU main thread; on receiving it, the CPU main thread transfers all data to be encoded of frame 1 generated by the GPU multithreads to the CPU;
Step 3: In the interval from time T2 to T3, the CPU main thread, the CPU slave thread and the GPU multithreads carry out exactly the same operations as in Step 2, except that in this step the CPU main thread transfers the image data of frame 2 to the GPU, the GPU multithreads process that frame, the CPU main thread starts the CPU slave thread to entropy code all data to be encoded of frame 1, and the CPU main thread finally transfers to the CPU all data to be encoded of frame 2 generated by the GPU multithreads in this interval;
Step 4: In the interval from time Tn to Tn+1 (n = 3, 4, ..., N-1), the CPU main thread, the CPU slave thread and the GPU multithreads carry out almost the same operations as in Step 2 or Step 3. The differences are that the CPU main thread transfers the image data of frame n to the GPU, the GPU multithreads process this newly transferred frame, the CPU main thread starts the CPU slave thread to entropy code all data to be encoded of frame n-1, and the CPU main thread finally transfers to the CPU all data to be encoded of frame n generated by the GPU multithreads in this interval. A further difference from the preceding steps is that the prediction applied to frame n by the GPU multithreads may be either inter prediction or intra prediction; which one is used is determined by a prediction selection algorithm specified in advance. In actual operation, the CPU main thread decides the prediction mode of the current frame according to the prediction selection algorithm and issues the corresponding scheduling instruction to the GPU, and the GPU multithreads then perform the corresponding intra or inter prediction on the current frame as instructed. All other operations are exactly the same as in Step 2 or Step 3;
Step 5: Repeat Step 4 until time TN. At TN, all processing of frame N-1 (the last frame) other than final entropy coding has been completed, and all data to be encoded of that frame have been transferred to the CPU;
Step 6: At time TN, the CPU main thread starts the CPU slave thread, which performs final entropy coding on all data to be encoded of frame N-1, producing the final compressed bitstream of the last frame. The encoding of the entire video image sequence is then complete.
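The six steps above can be sketched with two CPU threads and a hand-off queue. This is a minimal sketch under stated assumptions: `gpu_stages` and `entropy_code` are hypothetical placeholders for the GPU processing and the serial final entropy coding, and the GPU work is simulated by an ordinary function call in the main thread rather than an asynchronous GPU launch.

```python
import queue
import threading


def gpu_stages(frame):
    """Placeholder for the GPU multithreads (prediction ... reconstruction)."""
    return f"coeffs({frame})"


def entropy_code(data):
    """Placeholder for the serial final entropy coding done on the CPU."""
    return f"bitstream[{data}]"


def encode_sequence(frames):
    handoff = queue.Queue(maxsize=1)  # data to be encoded, passed frame by frame
    bitstreams = []

    def entropy_worker():
        # Second CPU thread: entropy codes frame n-1 while frame n is prepared.
        while True:
            data = handoff.get()
            if data is None:          # sentinel: no more frames
                break
            bitstreams.append(entropy_code(data))

    worker = threading.Thread(target=entropy_worker)
    worker.start()
    for frame in frames:              # main thread: schedules the "GPU" work
        handoff.put(gpu_stages(frame))
    handoff.put(None)                 # after the last frame, let the worker drain
    worker.join()                     # corresponds to the final step for frame N-1
    return bitstreams


print(encode_sequence(range(3)))
```

The single-slot queue keeps the entropy-coding thread at most one frame behind the producer, which mirrors the one-frame offset between the two units in the steps above; output order is preserved because a single consumer reads a FIFO queue.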

Claims (2)

1. A parallel processing method for implementing the entropy coding stage in HEVC on a CPU+GPU heterogeneous platform, characterized in that:
(1) a CPU+GPU heterogeneous computing platform is formed from a CPU and a GPU card having two or more cores;
(2) two threads are set up on the CPU, referred to as the "CPU main thread" and the "CPU slave thread"; more than one thread is set up on the GPU, referred to as the "GPU multi-threads"; the "CPU main thread" is responsible for controlling the computation flow of the entire encoder system, for scheduling the "CPU slave thread" and the "GPU multi-threads", and for the exchange of data and information between the CPU and the GPU; the "CPU slave thread" is responsible for the final entropy coding stage of all data to be encoded of the previous frame; and the "GPU multi-threads" are responsible for the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, and image reconstruction of the current frame;
(3) the entropy coding of each CU data block that is performed during rate-distortion optimization is, like the prediction, transform, quantization, and other stages, processed in parallel on the GPU by the "GPU multi-threads"; computation at this point is carried out in units of CU data blocks, so the "GPU multi-threads" still encode the data within each CU block using the original serial entropy coding algorithm, while different CU data blocks are encoded in parallel, thereby achieving CU-level parallel entropy coding and improving the computational efficiency of this stage.
2. The parallel processing method according to claim 1, characterized by the following steps:
Step 1: From time T_0 to time T_1, the CPU main thread on the CPU side first transfers frame 0 of the video sequence to the GPU side and issues a scheduling instruction to the GPU, starting the GPU multi-threads, which use the "technical solution for implementing parallel processing of the HEVC encoder" described above to carry out, in parallel, the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, image reconstruction, and related stages for frame 0; during this period the CPU main thread and the GPU side exchange information multiple times, and the CPU main thread, according to the task execution status of the GPU multi-threads reflected in the information returned by the GPU side, starts different GPU multi-threads to complete the processing tasks of the different stages; when the GPU multi-threads on the GPU side have completed the reconstruction of frame 0, they return a task-end flag to the CPU main thread, and after receiving the end flag the CPU main thread transfers all data to be encoded of frame 0 generated by the GPU multi-threads to the CPU; the CPU slave thread is idle during this period;
Step 2: From time T_1 to time T_2, the CPU main thread starts the CPU slave thread, which entropy-codes all data to be encoded of frame 0; at the same time the CPU main thread transfers the next frame (frame 1) to the GPU side and starts the GPU multi-threads on the GPU side to carry out, in parallel, the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction, and related stages for frame 1; during this period the CPU main thread and the GPU side again exchange information multiple times, and the CPU main thread, according to the task execution status of the GPU multi-threads reflected in the information returned by the GPU side, starts different GPU multi-threads to complete the processing tasks of the different stages; the CPU main thread and the CPU slave thread also exchange information during this period so that the main thread can keep track of the slave thread's task completion status; when the GPU multi-threads on the GPU side have completed the reconstruction of frame 1, they return a task-end flag to the CPU main thread, and after receiving the end flag the CPU main thread transfers all data to be encoded of frame 1 generated by the GPU multi-threads to the CPU;
Step 3: From time T_2 to time T_3, the CPU main thread on the CPU side, the CPU slave thread, and the GPU multi-threads on the GPU side each perform exactly the same operations as in Step 2, except that in this step the CPU main thread transfers frame 2 to the GPU side, the GPU multi-threads process that frame, the CPU main thread starts the CPU slave thread to entropy-code all data to be encoded of frame 1, and at the end of the period the CPU main thread transfers to the CPU all data to be encoded of frame 2 generated by the GPU multi-threads during this period;
Step 4: From time T_n to time T_{n+1} (n = 3, 4, ..., N-1), the CPU main thread on the CPU side, the CPU slave thread, and the GPU multi-threads on the GPU side perform almost the same operations as in Step 2 or Step 3, except that in this step the CPU main thread transfers frame n to the GPU side, the GPU multi-threads process this newly transferred frame, the CPU main thread starts the CPU slave thread to entropy-code all data to be encoded of frame n-1, and at the end of the period the CPU main thread transfers to the CPU all data to be encoded of frame n generated by the GPU multi-threads during this period; the prediction that the GPU multi-threads perform on frame n in this step may be either inter prediction or intra prediction, and which one is used is determined by a predefined prediction selection algorithm; in actual operation, the CPU main thread decides the prediction mode for the current frame according to the prediction selection algorithm and issues the corresponding scheduling instruction to the GPU, and the GPU multi-threads then perform the corresponding intra or inter prediction on the current frame as instructed; all other operations are identical to those in Step 2 or Step 3;
Step 5: Repeat Step 4 until time T_N; by time T_N, all processing operations on frame N-1 other than the final entropy coding stage have been completed, and all data to be encoded of that frame have been transferred to the CPU side;
Step 6: At time T_N, the CPU main thread notifies the CPU slave thread, which performs the final entropy coding on all data to be encoded of frame N-1 and produces the final compressed bitstream data of the last frame; this completes the encoding of the entire video image sequence.
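As an informal illustration of item (3) of claim 1, the sketch below keeps the coding inside each CU strictly serial and parallelizes only across CUs. Several things here are assumptions and not the patented implementation: a signed order-0 Exp-Golomb bit count stands in for HEVC's real entropy coder (CABAC), CPU threads from Python's `concurrent.futures` stand in for GPU threads, and the names `exp_golomb_bits`, `encode_cu_serial`, and `estimate_bits_per_cu` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def exp_golomb_bits(value):
    """Bit length of the signed order-0 Exp-Golomb code for one integer."""
    # Map signed values to non-negative code numbers: 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def encode_cu_serial(coefficients):
    """Walk one CU's coefficients in order and return its total bit cost."""
    bits = 0
    for coefficient in coefficients:  # serial inside the CU
        bits += exp_golomb_bits(coefficient)
    return bits


def estimate_bits_per_cu(cu_blocks, workers=4):
    """Cost each CU independently; worker threads stand in for GPU threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input CUs.
        return list(pool.map(encode_cu_serial, cu_blocks))


if __name__ == "__main__":
    cus = [[0, 0, 0], [1, -1], [2, 0, -2, 3]]
    print(estimate_bits_per_cu(cus))
```

Because no state is shared between CUs in this toy coder, the per-CU results are the same as a purely serial run; only the order of execution differs.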
CN201811258709.6A 2018-10-26 2018-10-26 Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform Active CN109391816B (en)

Publications (2)

Publication Number Publication Date
CN109391816A (en) 2019-02-26
CN109391816B (en) 2020-11-03

Family

ID=65426863

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110969672A (en) * 2019-11-14 2020-04-07 杭州飞步科技有限公司 Image compression method and device
CN112385225A (en) * 2019-09-02 2021-02-19 北京航迹科技有限公司 Method and system for improved image coding
CN114827614A (en) * 2022-04-18 2022-07-29 重庆邮电大学 Method for realizing LCEVC video coding optimization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297777A (en) * 2013-05-23 2013-09-11 广州高清视信数码科技股份有限公司 Method and device for increasing video encoding speed
CN104869398A (en) * 2015-05-21 2015-08-26 大连理工大学 Parallel method of realizing CABAC in HEVC based on CPU+GPU heterogeneous platform
US20170150181A1 (en) * 2015-11-20 2017-05-25 Nvidia Corporation Hybrid Parallel Decoder Techniques
CN107135392A (en) * 2017-04-21 2017-09-05 西安电子科技大学 HEVC motion search parallel methods based on asynchronous mode

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
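Biao Wang et al.: "Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU", 《Signal Processing: Image Communication》 *
张维龙: "Design of Parallel Algorithms for Key HEVC Modules and Their GPU-Based Implementation", 《China Masters' Theses Full-text Database》 *
马爱迪: "HEVC Parallel Decoder Based on a CPU+GPU Hybrid Platform", 《China Masters' Theses Full-text Database》 *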

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant