CN109391816A - Method for parallel processing of the entropy coding stage in HEVC based on a CPU+GPU heterogeneous platform - Google Patents
- Publication number: CN109391816A (application CN201811258709.6A)
- Authority: CN (China)
- Prior art keywords: gpu, cpu, thread, data, frame image
- Legal status: Granted
Classifications
- H04N19/124: Quantisation
- G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
- G06T1/20: Processor architectures; processor configuration, e.g. pipelining
- H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/159: Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/172: Adaptive coding in which the coding unit is an image region, the region being a picture, frame or field
- H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
- H04N19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Abstract
An efficient parallel processing method for the entropy coding stage of HEVC, based on a CPU+GPU heterogeneous platform. When a video image sequence is encoded according to the HEVC standard, the final entropy coding stage of the current frame is executed in parallel with all the stages of the next frame other than final entropy coding. That is, on a CPU+GPU computing platform, the entropy coding of the current frame is handled by the CPU, while prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and all other stages of the next frame are handled by the GPU, with the CPU and GPU computing simultaneously. With this parallel scheme, the processing time of the shorter of the two workloads (namely the final entropy coding stage) is hidden behind the longer one, which significantly increases the overall speed of the HEVC encoder.
Description
Technical field
The invention belongs to the field of digital video compression coding. It relates to how to achieve efficient parallel processing of the entropy coding stage in the High Efficiency Video Coding (HEVC, also known as H.265) standard, so as to significantly improve the computational efficiency of an HEVC encoder.
Background art
With the rapid development of the Internet and information technology, multimedia plays an increasingly important role in social life. Video is intuitive, accurate, efficient and widely accessible, and as an important carrier of information it is used throughout society. As demands for resolution and clarity keep rising, digital video has progressed from an initial resolution of 352 × 240 to high definition (1920 × 1080) and then to ultra-high definition (4K × 2K and above), and the volume of video data has grown accordingly. The capacity of real channels and storage devices is limited, however, so video data compression has become an indispensable key technology in the application and development of video.
HEVC, the latest-generation high-efficiency video coding standard, was formally published in 2013 by the Video Coding Experts Group of the International Telecommunication Union (ITU-T VCEG) together with the Moving Picture Experts Group (MPEG) of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC). HEVC incorporates the newest video coding techniques and, compared with the previous-generation standard H.264/AVC, saves about 50% of the bit rate at the same coding quality. While achieving this compression performance, however, HEVC also greatly increases the computational complexity of encoding (reportedly 2 to 4 times that of H.264/AVC), which makes real-time encoding a major challenge. Designing efficient, fast HEVC encoding algorithms has therefore become an important research topic in video data compression.
An HEVC encoder mainly comprises the following stages: prediction (intra prediction and inter prediction), transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering (deblocking filtering and sample adaptive offset filtering), image reconstruction and entropy coding. Efficient parallel algorithms have already been designed for every stage except entropy coding, and implementing them with GPUs and multithreaded programming markedly improves the efficiency of those stages. Entropy coding is different. HEVC uses Context-based Adaptive Binary Arithmetic Coding (CABAC), which is inherently recursive: encoding each data element requires the coding result of the preceding element, so an element can only be encoded after its predecessor has been fully encoded. The data must therefore be processed serially in order, which is ill-suited to parallel processing and makes it difficult to speed up this stage substantially. Two parts of the HEVC encoding process involve entropy coding calculations. The first is rate-distortion optimization, where the bit rate of every CU (coding unit) block in the image must be computed in order to find its optimal coding parameters. The second is the final entropy coding of all data to be encoded for the full frame, once all optimal coding parameters have been determined, which produces the compressed bitstream of that frame.
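The serial dependency can be written out for the core of a binary arithmetic coder. The recursion below is a simplified sketch in the style of CABAC's regular coding mode, not the normative HEVC procedure: bin b_i is coded with context model c_i whose state is s^(c_i), R and L are the range and low end of the coding interval, and p_LPS is the probability of the least probable symbol. The real HEVC engine replaces the product with a table lookup and adds renormalization and bypass bins, all omitted here.

```latex
\begin{aligned}
R^{\mathrm{LPS}}_i &\approx R_{i-1}\, p_{\mathrm{LPS}}\!\left(s^{(c_i)}\right) \\
(L_i,\, R_i) &=
\begin{cases}
\left(L_{i-1},\; R_{i-1} - R^{\mathrm{LPS}}_i\right), & b_i = \mathrm{MPS}\!\left(s^{(c_i)}\right) \\
\left(L_{i-1} + R_{i-1} - R^{\mathrm{LPS}}_i,\; R^{\mathrm{LPS}}_i\right), & b_i \neq \mathrm{MPS}\!\left(s^{(c_i)}\right)
\end{cases} \\
s^{(c_i)} &\leftarrow \mathrm{update}\!\left(s^{(c_i)},\, b_i\right)
\end{aligned}
```

Each (L_i, R_i) depends on (L_{i-1}, R_{i-1}), and each context state depends on every earlier bin coded with the same context, so bin i cannot be coded before bin i-1 is finished.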
Experiments show that, on a 7th-generation Intel Core i7 CPU (for example, the Core™ i7-7700), coding one full high-definition (1080p) frame at a compression ratio of 100 to 130 takes 14 to 17 ms on average for the final entropy coding stage alone when the HEVC entropy coding algorithm is used. By contrast, all the other stages (prediction, transform, quantization, inverse quantization, inverse transform, rate-distortion optimization, filtering, image reconstruction, etc.) can be completed in 32 to 36 ms on average when processed in parallel on a single GPU card (such as a GTX 1080). Run one after the other, the two parts take 46 to 53 ms per frame, i.e. only about 19 to 22 frames/s, below the 25 frames/s needed for real-time processing. For the HEVC encoder as a whole, the entropy coding stage has therefore become the bottleneck for real-time operation, and designing an efficient entropy coding scheme that substantially reduces the processing time of this stage is critical for real-time HEVC encoding.
Current research on improving the efficiency of HEVC encoders focuses mainly on algorithmic improvements and hardware acceleration. A 2014 master's thesis (Zhao Yanan. Research on parallelization of the new-generation video coding standard HEVC [D]. Shanghai: Shanghai Jiao Tong University, 2014.) applies both coarse-grained and fine-grained parallelism to the prediction stage, studying parallelization at the coding tree unit (CTU) level and inside each CTU. This improves encoder efficiency to some extent, but because the fine-grained parallelism only operates at the CU level the degree of parallelism is small, and the entropy coding stage is not parallelized at all, so the speed-up cannot meet real-time requirements. A 2014 paper in the Journal of Zhejiang University (Zhou Chengtao, Tian Xiang, Chen Yaowu. Fast selection algorithm for HEVC coding unit size [J]. Journal of Zhejiang University (Engineering Science), 2014, 48(8): 1451-1460.) proposes the concept of depth uniqueness: using the coding depths selected for neighbouring units, it skips depths with a low probability of occurrence and thereby speeds up CU partitioning. This raises coding speed by about 25%, but because some CU partition depths are skipped, the final coding quality of the HEVC encoder suffers. A 2016 article in the Journal of Visual Communication and Image Representation (Tariq J, Kwong S, Yuan H. HEVC intra mode selection based on Rate Distortion (RD) cost and Sum of Absolute Difference (SAD) [J]. Journal of Visual Communication and Image Representation, 2016, 35: 112-119.) analyzes the quadratic relationship between the rate-distortion cost (RD cost) and the sum of absolute differences (SAD) and derives an approximate formula for estimating the RD cost, which avoids the time spent entropy coding each CU block when computing RD costs. Because the code length of each CU block is only approximated, however, this introduces errors into the rate-distortion calculation and causes some loss of coding quality. A 2018 paper (Cebrián-Márquez G, Galiano V, Migallón H, et al. Heterogeneous CPU plus GPU approaches for HEVC [J]. The Journal of Supercomputing, 2018, April 2: 1-12.) implements two parallel schemes, one based on slices and one based on tiles, which also improve encoder efficiency to a degree. Parallelism at the slice or tile level alone is limited, however, and cannot deliver a larger-scale speed-up. None of the above methods parallelizes the final entropy coding of the full frame after the optimal coding parameters have been determined. As noted above, this stage takes 14 to 17 ms on average for one 1080p frame, so effectively saving its processing time is vital for real-time operation of the whole HEVC encoder.
Summary of the invention
The present invention proposes a parallel processing method for an HEVC encoder that is suited to a CPU+GPU heterogeneous platform. The final entropy coding stage of a full frame runs in parallel, and synchronously, with all the other stages (prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction, etc.), thereby significantly improving the overall computational efficiency of the HEVC encoder.
In HEVC, each current frame first passes through prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and entropy coding. In the prediction, transform, quantization, filtering and image reconstruction stages, the data to be processed can be partitioned effectively so that efficient parallel algorithms can be designed; these data can then be processed in parallel using the many-core architecture of a GPU and multithreaded programming, which greatly improves the efficiency of those stages. Entropy coding in HEVC, and in particular the final entropy coding of the full frame, is different. As described above, it uses CABAC, and this context-based adaptive binary arithmetic coding needs the coding result of the preceding element when coding each element. The data in this stage must therefore be processed serially in order; if they were split into many small units and processed in parallel, the causal dependency between consecutive elements would be broken and coding errors would result. Consequently, unlike the other stages, the entropy coding stage cannot be sped up substantially by partitioning its data for parallel processing.
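The causality argument can be illustrated with a toy adaptive model. The sketch below is not CABAC: it replaces CABAC's probability state machine with a simple count-based (Laplace) estimator, and the function name adaptive_probs is hypothetical. It shows that the probability used for each bin depends on all earlier bins, and that resetting the model at chunk boundaries, as independently coded parallel chunks would have to, changes the probabilities from the first boundary onward, so a decoder following the serial order would no longer agree with the encoder.

```python
from typing import List, Optional


def adaptive_probs(bits: List[int], reset_every: Optional[int] = None) -> List[float]:
    """Probability of a '1' assigned to each bin by a toy adaptive model.

    The model counts the 0s and 1s seen so far, so the probability used for
    bin i depends on bins 0..i-1. If reset_every is given, the counts are
    reset at every chunk boundary, mimicking independent parallel chunks.
    """
    probs: List[float] = []
    c0 = c1 = 0
    for i, b in enumerate(bits):
        if reset_every and i % reset_every == 0:
            c0 = c1 = 0  # chunk boundary: the history of earlier bins is lost
        probs.append((c1 + 1) / (c0 + c1 + 2))  # estimate BEFORE seeing bin i
        if b:
            c1 += 1
        else:
            c0 += 1
    return probs
```

For bits = [1, 1, 1, 0, 1, 1, 1, 1], the serial sequence and the chunked one (reset_every=4) agree on the first four bins and diverge at bin 4, where the serial model uses 4/6 but the reset model uses 1/2.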
According to the HEVC standard, however, although the final entropy coding of all data to be encoded for a given frame must still code each element serially in its original order, this computation has no data dependency on the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering and image reconstruction of the next frame, so the two can be computed independently. Based on this analysis, the present invention proposes the "efficient parallel processing method for the entropy coding stage of HEVC based on a CPU+GPU heterogeneous platform". Its main idea is to run the final entropy coding stage of the current frame in parallel with all the stages of the next frame other than final entropy coding. That is, the entropy coding of the current frame is handled by the CPU, while prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and all other stages of the next frame are handled by the GPU, with the CPU and GPU computing simultaneously. With this scheme, the processing time of the shorter of the two workloads (namely the final entropy coding stage) is hidden behind the longer one, which significantly increases the overall speed of the HEVC encoder.
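The benefit of overlapping the two workloads can be estimated with a simple timing model. This is an idealized sketch under stated assumptions: t_gpu is the per-frame time of all GPU stages, t_ec the per-frame time of final entropy coding on the CPU, both constant, and CPU-GPU transfers and scheduling overhead are ignored. The function names are hypothetical.

```python
def serial_time(n_frames: int, t_gpu: float, t_ec: float) -> float:
    """Total time when each frame's entropy coding waits for its GPU stages."""
    return n_frames * (t_gpu + t_ec)


def pipelined_time(n_frames: int, t_gpu: float, t_ec: float) -> float:
    """Total time when entropy coding of frame k-1 overlaps GPU work on frame k."""
    if n_frames == 0:
        return 0
    # First interval: the GPU works on frame 0 while the CPU slave thread is idle.
    # Each later interval lasts as long as the slower of the two workloads.
    # Last step: the CPU slave thread entropy-codes the final frame alone.
    return t_gpu + (n_frames - 1) * max(t_gpu, t_ec) + t_ec


def fps(n_frames: int, total_ms: float) -> float:
    """Average throughput in frames per second for a total time in milliseconds."""
    return n_frames * 1000.0 / total_ms
```

With the figures quoted in the background (t_gpu of 32 to 36 ms, t_ec of 14 to 17 ms) and 1000 frames, the model gives about 19 to 22 frames/s for serial processing and about 28 to 31 frames/s for pipelined processing. Because t_ec is smaller than t_gpu, the pipelined throughput approaches 1000 / t_gpu frames/s.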
The technical solution by which the present invention implements parallel processing in the HEVC encoder is as follows:
(1) A CPU with two or more cores (such as an i7-7700 or higher grade) and a GPU card (such as a GTX 1080 or higher grade) form a CPU+GPU heterogeneous computing platform on which the parallel HEVC encoding method of the present invention is implemented;
(2) As shown in Figure 1, two threads are created on the CPU (called the "CPU main thread" and the "CPU slave thread"), and a multithreaded workload (called the "GPU multithreading") is created on the GPU. The "CPU main thread" controls the overall processing flow of the encoder system, schedules the "CPU slave thread" and the "GPU multithreading", and manages the exchange of data between the CPU and GPU. The "CPU slave thread" performs the final entropy coding of all data to be encoded for the previous frame. The "GPU multithreading" performs prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and all other stages for the current frame. For the prediction, transform, quantization, inverse quantization, inverse transform, filtering and image reconstruction stages handled by the GPU multithreading, the parallel algorithms described in the following theses are used: 1) Du Junjie. Design and implementation of parallel algorithms for key techniques of HEVC intra prediction [D]. Master's thesis, Dalian University of Technology, 2015; 2) Zhang Weilong. Design and GPU-based implementation of parallel algorithms for key HEVC modules [D]. Master's thesis, Dalian University of Technology, 2016;
(3) Entropy coding of each CU data block during rate-distortion optimization is coupled with the prediction, transform, quantization and other processing of the current image. In this scheme those stages are processed in parallel on the GPU by the "GPU multithreading", with the CU data block as the unit of computation. Therefore, within this process, the "GPU multithreading" still codes the data inside each CU block with the original serial entropy coding algorithm, while different CU blocks are coded in parallel. This achieves CU-level parallel entropy coding and improves the efficiency of this stage.
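The division of work in items (1) to (3) can be sketched with Python's concurrent.futures as a stand-in for the real threads. Assumptions: gpu_stages and entropy_code are hypothetical placeholders (the real GPU stages would be kernels launched by the main thread, and the real entropy coder is CABAC); one single-worker pool plays the role of the GPU multithreading and another plays the CPU slave thread; and because of Python's global interpreter lock, this shows the control flow rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def gpu_stages(frame: int) -> Dict:
    """Hypothetical stand-in for prediction ... reconstruction on the GPU.

    Returns the frame's "data to be encoded" (here just four dummy symbols).
    """
    return {"frame": frame, "symbols": [frame % 256] * 4}


def entropy_code(data: Dict) -> bytes:
    """Hypothetical stand-in for serial CABAC on the CPU slave thread."""
    return bytes(data["symbols"])


def encode_sequence(frames: List[int]) -> List[bytes]:
    """CPU main thread: overlaps GPU work on frame k with entropy coding of frame k-1."""
    bitstreams: List[bytes] = []
    with ThreadPoolExecutor(max_workers=1) as gpu, \
            ThreadPoolExecutor(max_workers=1) as cpu_slave:
        pending = None  # data to be encoded from the previous frame
        for frame in frames:
            gpu_job = gpu.submit(gpu_stages, frame)  # dispatch frame k to the "GPU"
            if pending is not None:  # meanwhile, the slave codes frame k-1
                bitstreams.append(cpu_slave.submit(entropy_code, pending).result())
            pending = gpu_job.result()  # wait for the GPU "task end flag"
        if pending is not None:  # last frame: only entropy coding remains
            bitstreams.append(cpu_slave.submit(entropy_code, pending).result())
    return bitstreams
```

Each loop iteration corresponds to one interval of the embodiment: the main thread dispatches frame k to the GPU stand-in, hands the previous frame's data to the slave thread, and waits for both before moving on, so bitstreams come out in frame order.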
The efficient parallel processing method proposed by the present invention for the final entropy coding stage of an HEVC encoder saves the time otherwise spent on final entropy coding of each frame, substantially improving the overall efficiency of the system. Experiments show that when the proposed parallel HEVC encoding technique is implemented on a current consumer-grade CPU+GPU platform (for example, a computer with one Core™ i7-7700 plus one GTX 1080 GPU card), it achieves an encoding rate of 27 to 31 frames/s for high-definition (1080p) video at a compression ratio of 100 to 130, which meets the real-time requirement (≥ 25 frames/s). Conversely, if the proposed parallel entropy coding scheme is not used for the final entropy coding of each frame and the original serial approach is used instead, the computation time of this stage (14 to 17 ms) cannot be saved, and real-time compression of high-definition video cannot be achieved on this platform. In addition, for the entropy coding of each CU data block during rate-distortion optimization, the invention proposes a CU-level parallel entropy coding scheme, which further improves the efficiency of the rate-distortion optimization stage.
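CU-level parallel entropy coding in rate-distortion optimization can be sketched as follows: the bins inside each CU are still coded serially, but different CUs are processed concurrently. The sketch assumes, as is common for rate estimation during RDO, that every CU starts from its own copy of a shared context snapshot, so CUs do not depend on one another. The count-based model (in place of CABAC's state machine) and the names cu_rate_bits and cu_rates_parallel are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def cu_rate_bits(bins: List[int], ctx: Tuple[int, int]) -> float:
    """Estimated bits for one CU, coding its bins serially with an adaptive model.

    ctx is a read-only (count0, count1) snapshot; this CU adapts a private copy.
    """
    c0, c1 = ctx
    bits = 0.0
    for b in bins:
        p1 = (c1 + 1) / (c0 + c1 + 2)
        bits += -math.log2(p1 if b else 1.0 - p1)  # ideal code length of this bin
        if b:
            c1 += 1
        else:
            c0 += 1
    return bits


def cu_rates_parallel(cus: List[List[int]], ctx: Tuple[int, int],
                      workers: int = 4) -> List[float]:
    """Serial inside each CU, parallel across CUs (results keep CU order)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cu: cu_rate_bits(cu, ctx), cus))
```

Because each CU reads the same snapshot and never writes shared state, the parallel result is identical to coding the CUs one after another; on a GPU the same structure would map one CU to one thread or thread group.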
Description of the drawings
Figure 1 is a flow diagram of the invention.
Detailed embodiments
Specific embodiments of the present invention are described below with reference to the technical solution and the accompanying figure.
Let the video sequence to be encoded contain N frames (frames 0 to N-1). The implementation steps are as follows:
Step 1: As shown in Figure 1, during the interval from T0 to T1, the CPU main thread first transfers the image data of frame 0 of the video sequence to the GPU and issues a dispatch command to the GPU, starting the GPU multithreading, which uses the "technical solution for parallel HEVC encoding" described above to compute in parallel the prediction (here intra prediction), transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, image reconstruction and other stages for frame 0. During this period the CPU main thread and the GPU exchange information several times: based on the information returned by the GPU, which reflects the current execution status of the GPU multithreading, the CPU main thread launches different GPU multithreading tasks to complete the processing of different stages. When the GPU multithreading finishes the reconstruction of frame 0, it returns a task-completion flag to the CPU main thread; on receiving this flag, the CPU main thread transfers all data to be encoded for frame 0, generated by the GPU multithreading, to the CPU. During this period the CPU slave thread is idle;
Step 2: During the interval from T1 to T2, the CPU main thread starts the CPU slave thread, which entropy codes all data to be encoded for frame 0. The CPU main thread transfers the image data of the next frame (frame 1) to the GPU and at the same time starts the GPU multithreading to compute in parallel the prediction (inter prediction), transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and other stages for frame 1. During this period the CPU main thread again exchanges information with the GPU several times and, based on the execution status of the GPU multithreading reported by the GPU, launches different GPU multithreading tasks to complete the processing of different stages. The CPU main thread also exchanges information with the CPU slave thread to monitor the completion of its task. When the GPU multithreading finishes the reconstruction of frame 1, it returns a task-completion flag to the CPU main thread; on receiving this flag, the CPU main thread transfers all data to be encoded for frame 1, generated by the GPU multithreading, to the CPU;
Step 3: During the interval from T2 to T3, the CPU main thread, the CPU slave thread and the GPU multithreading perform exactly the same operations as in Step 2, except that the CPU main thread transfers the image data of frame 2 to the GPU, the GPU multithreading processes that frame, the CPU main thread starts the CPU slave thread to entropy code all data to be encoded for frame 1, and at the end of the interval the CPU main thread transfers to the CPU all data to be encoded for frame 2 generated by the GPU multithreading;
Step 4: During the interval from Tn to T(n+1) (n = 3, 4, ..., N-1), the CPU main thread, the CPU slave thread and the GPU multithreading perform essentially the same operations as in Step 2 or Step 3, except that the CPU main thread transfers the image data of frame n to the GPU, the GPU multithreading processes this newly transferred frame, the CPU main thread starts the CPU slave thread to entropy code all data to be encoded for frame n-1, and at the end of the interval the CPU main thread transfers to the CPU all data to be encoded for frame n generated by the GPU multithreading. A further difference from the previous steps is that the prediction performed by the GPU multithreading on frame n may be either inter prediction or intra prediction; which one is used is determined by a predefined prediction selection algorithm. In practice, the CPU main thread determines the prediction mode of the current frame according to this selection algorithm and issues the corresponding dispatch command to the GPU, after which the GPU multithreading performs intra or inter prediction on the current frame as instructed. All other operations are identical to Step 2 or Step 3;
Step 5: Step 4 is repeated until time TN. By TN, all processing of frame N-1 (the last frame) except the final entropy coding stage has been completed, and all data to be encoded for that frame have been transferred to the CPU;
Step 6: At time TN, the CPU main thread issues a command to the CPU slave thread, which performs the final entropy coding of all data to be encoded for frame N-1 and produces the compressed bitstream of the last frame. The encoding of the entire video image sequence is then complete.
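The six steps can be summarized as a per-interval schedule. In this sketch the function name schedule and the parameter intra_period are hypothetical: the embodiment only says that the prediction type for frames 3 onward comes from an unspecified predefined selection algorithm, so a fixed intra period is used here purely as a placeholder. Frame 0 is intra (Step 1) and frames 1 and 2 are inter (Steps 2 and 3), as stated above.

```python
from typing import List, Optional, Tuple


def schedule(n_frames: int, intra_period: Optional[int] = None) -> List[Tuple[str, str, str]]:
    """Per-interval work of the GPU multithreading and the CPU slave thread.

    Interval k runs from Tk to T(k+1); the extra final row is the work at TN.
    intra_period is a hypothetical stand-in for the predefined prediction
    selection algorithm: from frame 3 on, a frame whose index is a multiple
    of intra_period is intra coded, all others are inter coded.
    """
    plan: List[Tuple[str, str, str]] = []
    for k in range(n_frames):
        if k == 0:
            mode = "intra"  # Step 1
        elif k < 3 or intra_period is None or k % intra_period:
            mode = "inter"  # Steps 2-3, and the placeholder rule from Step 4 on
        else:
            mode = "intra"
        gpu = f"frame {k} ({mode})"
        cpu_slave = f"entropy-code frame {k - 1}" if k > 0 else "idle"
        plan.append((f"T{k}-T{k + 1}", gpu, cpu_slave))
    if n_frames:
        plan.append((f"T{n_frames}", "idle", f"entropy-code frame {n_frames - 1}"))  # Step 6
    return plan
```

For example, schedule(3) yields four rows: the GPU works on frames 0, 1 and 2 in the intervals T0-T1, T1-T2 and T2-T3 while the CPU slave thread is idle and then codes frames 0 and 1, and the final row at T3 has only the CPU slave thread coding frame 2.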
Claims (2)
1. A method for parallel processing of the entropy coding stage in HEVC based on a CPU+GPU heterogeneous platform, characterized in that:
(1) a CPU+GPU heterogeneous computing platform is formed from a CPU with two or more cores and a GPU card;
(2) two threads are created on the CPU, called the "CPU main thread" and the "CPU slave thread", and more than one thread is created on the GPU, called the "GPU multithreading"; the "CPU main thread" controls the processing flow of the entire encoder system and also handles the scheduling of the "CPU slave thread" and the "GPU multithreading" and the data exchange between the CPU and GPU; the "CPU slave thread" performs the final entropy coding of all data to be encoded for the previous frame, and the "GPU multithreading" performs prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering and image reconstruction for the current frame;
(3) for the entropy coding of each CU data block during rate-distortion optimization, the prediction, transform, quantization and related stages are processed in parallel on the GPU by the "GPU multithreading", with the CU data block as the unit of computation; in this process the "GPU multithreading" still codes the data in each CU block with the original serial entropy coding algorithm while coding different CU blocks in parallel, thereby achieving CU-level parallel entropy coding and improving the efficiency of this stage.
2. The method for parallel processing according to claim 1, characterized by the following steps:
Step 1: during the interval from T0 to T1, the CPU main thread first transfers the image data of frame 0 of the video sequence to the GPU and issues a dispatch command to the GPU, starting the GPU multithreading, which uses the "technical solution for parallel HEVC encoding" described above to compute in parallel the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, image reconstruction and other stages for frame 0; during this period the CPU main thread and the GPU exchange information several times, and the CPU main thread, based on the execution status of the GPU multithreading reflected in the information returned by the GPU, launches different GPU multithreading tasks to complete the processing of different stages; when the GPU multithreading finishes the reconstruction of frame 0, it returns a task-completion flag to the CPU main thread, and on receiving this flag the CPU main thread transfers all data to be encoded for frame 0, generated by the GPU multithreading, to the CPU; during this period the CPU slave thread is idle;
Step 2: during the interval from T_1 to T_2, the CPU main thread starts the CPU slave thread, which performs entropy coding on all of the data to be encoded of frame 0. At the same time, the CPU main thread transfers the image data of the next frame (frame 1) to the GPU side and starts the GPU multithreading on the GPU side to carry out the parallel computation of prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering, image reconstruction and related stages on frame 1. During this period, the CPU main thread again exchanges information with the GPU side multiple times, starting different GPU threads to complete the processing tasks of the different stages according to the task execution status of the GPU multithreading reflected in the information returned by the GPU side; the CPU main thread also exchanges information with the CPU slave thread to track the completion status of the slave thread's task. When the GPU multithreading on the GPU side has completed the reconstruction of frame 1, it returns a task-end flag to the CPU main thread; after receiving the end flag, the CPU main thread transfers all of the data to be encoded for frame 1 generated by the GPU multithreading to the CPU;
Step 3: during the interval from T_2 to T_3, the CPU main thread and the CPU slave thread on the CPU side and the GPU multithreading on the GPU side perform exactly the same operations as in Step 2, except that in this step the CPU main thread transfers the image data of frame 2 to the GPU side and the GPU multithreading processes that frame, the CPU main thread starts the CPU slave thread to entropy-code all of the data to be encoded of frame 1, and at the end of the interval the CPU main thread transfers all of the data to be encoded for frame 2 generated by the GPU multithreading during this period to the CPU;
Step 4: during the interval from T_n to T_{n+1} (n = 3, 4, ..., N-1), the CPU main thread and the CPU slave thread on the CPU side and the GPU multithreading on the GPU side still perform essentially the same operations as in Step 2 or Step 3, the difference being that in this step the CPU main thread transfers the image data of frame n to the GPU side, the GPU multithreading processes this newly transferred frame, the CPU main thread starts the CPU slave thread to entropy-code all of the data to be encoded of frame n-1, and at the end of the interval the CPU main thread transfers all of the data to be encoded for frame n generated by the GPU multithreading during this period to the CPU. In this step, the prediction performed by the GPU multithreading on frame n may be either inter prediction or intra prediction; which one is used is determined by a predefined prediction selection algorithm. In actual operation, the CPU main thread determines the prediction mode of the current frame according to the prediction selection algorithm and issues the corresponding scheduling instruction to the GPU, after which the GPU multithreading performs intra prediction or inter prediction on the current frame as instructed. All other operations are identical to those in Step 2 or Step 3;
Step 5: Step 4 is repeated until time T_N. At time T_N, all processing of frame N-1 except the final entropy coding stage has been completed, and all of the data to be encoded of that frame has been transferred to the CPU side;
Step 6: at time T_N, the CPU main thread notifies the CPU slave thread, which then performs the final entropy coding on all of the data to be encoded of frame N-1, producing the final compressed bitstream of the last frame image. This completes the encoding of the entire video image sequence.
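The control flow of Steps 1 to 6 amounts to a two-stage software pipeline, which can be sketched as follows. This is a minimal Python illustration under several assumptions that are not in the claims: the GPU work is replaced by a plain function `gpu_process` running on the calling (main) thread, so the task-end flag exchange collapses into its return; a single-worker `ThreadPoolExecutor` plays the CPU slave thread; the main thread waits for the slave at the end of every interval; and `choose_prediction` with an intra period of 8 is a hypothetical placeholder for the unspecified "predefined prediction selection algorithm". All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def choose_prediction(n: int, intra_period: int = 8) -> str:
    # Hypothetical prediction selection rule: intra every intra_period frames
    # (frame 0 included), inter prediction otherwise.
    return "intra" if n % intra_period == 0 else "inter"


def gpu_process(n: int, mode: str) -> Tuple[int, str]:
    # Stand-in for the GPU multithreading (prediction, transform, quantization,
    # RDO, inverse quantization/transform, filtering, reconstruction). Returns the
    # frame's "data to be encoded" that the main thread copies back to the CPU.
    return (n, mode)


def entropy_code(data: Tuple[int, str]) -> str:
    # Stand-in for the final entropy coding done by the CPU slave thread.
    n, mode = data
    return f"frame{n}:{mode}"


def encode_sequence(num_frames: int):
    """Main-thread control loop for Steps 1-6. Returns the bitstream pieces in
    frame order and, per interval k, a tuple (k, GPU frame, slave-thread frame)."""
    bitstream: List[str] = []
    schedule: List[Tuple[int, Optional[int], Optional[int]]] = []
    pending: Optional[Tuple[int, str]] = None  # previous frame's data to be encoded
    with ThreadPoolExecutor(max_workers=1) as slave:  # the single CPU slave thread
        for n in range(num_frames):
            # Start the slave on frame n-1; it stays idle in Step 1 (n == 0).
            job = slave.submit(entropy_code, pending) if pending is not None else None
            # Meanwhile the GPU stand-in processes frame n in the chosen mode.
            data = gpu_process(n, choose_prediction(n))
            if job is not None:
                bitstream.append(job.result())  # assumption: sync at end of interval
            schedule.append((n, n, None if pending is None else pending[0]))
            pending = data
        # Step 6: at T_N the slave entropy-codes the last frame, N-1.
        if pending is not None:
            bitstream.append(slave.submit(entropy_code, pending).result())
            schedule.append((num_frames, None, pending[0]))
    return bitstream, schedule


def pipelined_time(num_frames: int, t_gpu: float, t_ec: float) -> float:
    # Idealized makespan for num_frames >= 1: t_gpu alone in Step 1, N-1 overlapped
    # intervals, then t_ec alone in Step 6 (transfers folded into t_gpu).
    return t_gpu + (num_frames - 1) * max(t_gpu, t_ec) + t_ec


if __name__ == "__main__":
    bits, sched = encode_sequence(3)
    print(bits)   # ['frame0:intra', 'frame1:inter', 'frame2:inter']
    print(sched)  # [(0, 0, None), (1, 1, 0), (2, 2, 1), (3, None, 2)]
    print(pipelined_time(10, 30.0, 20.0))  # 320.0
```

The schedule reflects the overlap described in the steps: in every interval except the first, the GPU works on frame n while the slave thread codes frame n-1, and only the entropy coding of frame N-1 remains after T_N. Under the idealized model in `pipelined_time`, with hypothetical per-frame times of 30 ms for the GPU stages and 20 ms for entropy coding, ten frames take 30 + 9 x 30 + 20 = 320 ms, versus 10 x (30 + 20) = 500 ms for a fully serial encoder.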
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811258709.6A CN109391816B (en) | 2018-10-26 | 2018-10-26 | Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109391816A true CN109391816A (en) | 2019-02-26 |
CN109391816B CN109391816B (en) | 2020-11-03 |
Family ID: 65426863
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811258709.6A Active CN109391816B (en) | 2018-10-26 | 2018-10-26 | Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109391816B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110969672A (en) * | 2019-11-14 | 2020-04-07 | 杭州飞步科技有限公司 | Image compression method and device |
CN112385225A (en) * | 2019-09-02 | 2021-02-19 | 北京航迹科技有限公司 | Method and system for improved image coding |
CN114827614A (en) * | 2022-04-18 | 2022-07-29 | 重庆邮电大学 | Method for realizing LCEVC video coding optimization |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103297777A (en) * | 2013-05-23 | 2013-09-11 | 广州高清视信数码科技股份有限公司 | Method and device for increasing video encoding speed |
CN104869398A (en) * | 2015-05-21 | 2015-08-26 | 大连理工大学 | Parallel method of realizing CABAC in HEVC based on CPU+GPU heterogeneous platform |
US20170150181A1 (en) * | 2015-11-20 | 2017-05-25 | Nvidia Corporation | Hybrid Parallel Decoder Techniques |
CN107135392A (en) * | 2017-04-21 | 2017-09-05 | 西安电子科技大学 | HEVC motion search parallel methods based on asynchronous mode |
Non-Patent Citations (3)
Title |
---|
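BIAO WANG et al.: "Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU", Signal Processing: Image Communication * |
张维龙 (Zhang Weilong): "Design of parallel algorithms for key HEVC modules and their GPU-based implementation", China Master's Theses Full-text Database * |
马爱迪 (Ma Aidi): "HEVC parallel decoder based on a CPU+GPU hybrid platform", China Master's Theses Full-text Database * |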
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112385225A (en) * | 2019-09-02 | 2021-02-19 | 北京航迹科技有限公司 | Method and system for improved image coding |
CN110969672A (en) * | 2019-11-14 | 2020-04-07 | 杭州飞步科技有限公司 | Image compression method and device |
CN114827614A (en) * | 2022-04-18 | 2022-07-29 | 重庆邮电大学 | Method for realizing LCEVC video coding optimization |
CN114827614B (en) * | 2022-04-18 | 2024-03-22 | 重庆邮电大学 | Method for realizing LCEVC video coding optimization |
Also Published As
Publication number | Publication date |
---|---|
CN109391816B (en) | 2020-11-03 |
Similar Documents
Publication | Title |
---|---|
CN104967850B (en) | Method and apparatus for encoding and decoding images using large transform units |
CN104869398B (en) | Parallel method for implementing CABAC in HEVC based on a CPU+GPU heterogeneous platform |
CN106464894B (en) | Video processing method and device |
CN104935932B (en) | Image decoding apparatus |
CN101394560B (en) | Hybrid pipeline apparatus for video encoding |
CN106170092A (en) | Fast encoding method for lossless coding |
CN102065298B (en) | High-performance macroblock coding implementation method |
CN103548356B (en) | Image decoding method using skip mode and device using the same |
CN110087087A (en) | Method for early prediction-mode decision and early termination of block partitioning for VVC inter coding units |
CN105981383B (en) | Video processing method and device |
CN103096056B (en) | Matrix encoding method and apparatus, and decoding method and apparatus |
CN109391816A (en) | Parallel processing method for implementing the entropy coding stage of HEVC on a CPU+GPU heterogeneous platform |
CN107483947A (en) | Video encoding and decoding device and non-transitory computer-readable storage medium |
CN103327325A (en) | Fast adaptive intra prediction mode selection method based on the HEVC standard |
CN101895756A (en) | Method and system for coding, decoding and reconstructing video image blocks |
CN105681797A (en) | Prediction residual based DVC-HEVC (Distributed Video Coding-High Efficiency Video Coding) video transcoding method |
CN102196272B (en) | P frame encoding method and device |
CN104702959B (en) | Intra prediction method and system for video coding |
CN105100799B (en) | Method for reducing intra coding delay in HEVC encoders |
CN108965814A (en) | Hybrid video decoding and rendering method based on CUDA acceleration |
CN104980764A (en) | Parallel coding/decoding method, device and system based on complexity balance |
CN101841722B (en) | Detection method of a filtering boundary strength detection device |
CN102196253A (en) | Video coding method and device based on adaptive frame type selection |
CN102595137B (en) | Fast mode decision device and method based on row/column pipelining of image pixel blocks |
CN104780377B (en) | Parallel HEVC coding system and method based on a distributed computer system |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |