CN109391816B - Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform - Google Patents

Info

Publication number
CN109391816B
Authority
CN
China
Prior art keywords
gpu
cpu
frame
data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811258709.6A
Other languages
Chinese (zh)
Other versions
CN109391816A (en)
Inventor
郭成安
董菁鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201811258709.6A
Publication of CN109391816A
Application granted
Publication of CN109391816B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

An efficient parallel processing method for the entropy coding link in HEVC, based on a CPU + GPU heterogeneous platform. When a video image sequence is coded according to the HEVC standard, the final entropy coding link of the current frame image is processed in parallel with all other links of the next frame image. That is, on a CPU + GPU computing platform, the CPU performs the entropy coding of the current frame image while the GPU performs all other links of the next frame image, such as prediction, transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering and image reconstruction, with the CPU and the GPU computing simultaneously. With this parallel scheme, the processing time of the shorter of the two links (namely the final entropy coding link) is saved, so that the overall computing speed of the HEVC encoder is significantly improved.

Description

Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform
Technical Field
The invention belongs to the technical field of digital video compression coding and relates to a method for efficient parallel processing of the entropy coding link in the HEVC (High Efficiency Video Coding, also called H.265 or HEVC/H.265) standard, with the aim of significantly improving the computing efficiency of an HEVC encoder.
Background
With the rapid development of the internet and information technology, multimedia technology plays an increasingly important role in social life. Video is an important information carrier: it is intuitive, accurate, efficient and versatile, and is widely used in many fields. As demands on video resolution and clarity have grown, digital video has advanced from the original 352 × 240 resolution to high definition (1920 × 1080) and on to ultra-high definition (4K × 2K and above), and the volume of video data has increased substantially. Because the capacity of real channels and storage devices is limited, video data compression has become an indispensable key technology in the application and development of video technology.
As the latest-generation high efficiency video coding standard, HEVC was published in 2013 by the Video Coding Experts Group (VCEG) of the International Telecommunication Union (ITU-T) together with the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). HEVC incorporates the latest video coding techniques; compared with the previous-generation standard H.264/AVC, it reduces the bit rate by about 50% at the same coding quality. While achieving excellent compression performance, HEVC also significantly increases the computational complexity of encoding (statistically 2-4 times that of H.264/AVC), which poses a significant challenge to real-time encoder implementations. Designing an efficient and fast HEVC coding algorithm has therefore become an important research topic in the field of video data compression.
The HEVC encoder mainly comprises the following links: prediction (intra-frame and inter-frame), transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering (deblocking filtering and sample adaptive offset filtering), image reconstruction and entropy coding. For all links other than entropy coding, efficient parallel algorithms have been designed, and their computing efficiency has been significantly improved by parallel computing with a GPU and multi-thread programming. The entropy coding link, however, uses Context-based Adaptive Binary Arithmetic Coding (CABAC) in HEVC, whose calculation is recursive: coding the next datum requires the coding result of the previous one. Data can therefore only be coded serially, in order, and not in parallel, which makes it difficult to substantially increase the speed of this link. Entropy coding calculations occur in two places in the HEVC coding process. First, during rate distortion optimization, the bit-rate information of each CU (coding unit) data block in the image must be computed in order to obtain the optimal coding parameters of that CU block. Second, after all optimal coding parameters have been determined, a final entropy coding is applied to all data to be coded of the whole frame to generate the compressed bit stream of that frame.
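The serial dependency described above can be illustrated with a toy adaptive model. This is only a sketch and not the CABAC engine of the standard: the function name `adaptive_code_length`, the exponential update with `rate`, and the sample bin sequence are all assumptions made for illustration. It shows that the cost of each bin depends on the state left by all earlier bins, so coding two halves independently from a fresh state gives a different result than one serial pass.

```python
import math

def adaptive_code_length(bins, p_one=0.5, rate=0.05):
    """Ideal code length in bits of a bin sequence under an adaptive model.

    p_one is the running estimate of P(bin == 1). It is updated after every
    bin, so the cost of bin k depends on every bin coded before it.
    Returns (total_bits, final_p_one).
    """
    total_bits = 0.0
    for b in bins:
        p = p_one if b == 1 else 1.0 - p_one
        total_bits += -math.log2(p)
        # Context update: move the estimate toward the bin just coded.
        p_one += rate * (b - p_one)
    return total_bits, p_one

bins = [1] * 40 + [0] * 8 + [1] * 40
half = len(bins) // 2

# Serial coding: one pass, state carried across the whole sequence.
serial_bits, _ = adaptive_code_length(bins)

# Naive data split: each half restarts from the initial state.
split_bits = (adaptive_code_length(bins[:half])[0]
              + adaptive_code_length(bins[half:])[0])

# Splitting only matches the serial result if the second half is handed
# the state produced by the first half, which forces sequential execution.
first_bits, state = adaptive_code_length(bins[:half])
chained_bits = first_bits + adaptive_code_length(bins[half:], p_one=state)[0]
```

In this toy model `chained_bits` agrees with `serial_bits` while `split_bits` does not, which mirrors why the final entropy coding of a frame cannot simply be partitioned like the other links.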
Experimental tests show that, with a seventh-generation i7 CPU (e.g., Intel Core i7-7700) and a data compression ratio of 100-130 times for high-definition (1080P) video, the final entropy coding link of a whole frame takes 14-17 milliseconds on average with the HEVC entropy coding algorithm. All other links, such as prediction, transformation, quantization, inverse transformation, rate distortion optimization, filtering and image reconstruction, can be completed within 32-36 milliseconds using parallel processing on a GPU card (e.g., GTX-1080). For the HEVC encoder as a whole, the entropy coding link has therefore become a bottleneck for real-time processing. Designing an efficient entropy coding scheme that significantly reduces the processing time of this link is critical for real-time HEVC encoding.
Current research on improving the computational efficiency of HEVC encoders focuses mainly on algorithm improvement and hardware acceleration. A master's thesis published in 2014 (Zhao Asian, Parallelization research of the new-generation video coding standard HEVC [D]. Shanghai Jiao Tong University, 2014.) applies coarse-grained and fine-grained parallelism together to the prediction link and studies parallelization at the Coding Tree Unit (CTU) level and within the CTU. The method improves the computational efficiency of HEVC to some extent, but its fine-grained parallelism is limited to the CU level, the degree of parallelism is low, and the entropy coding link is not parallelized, so the speed-up does not meet real-time requirements. A 2014 paper in the Journal of Zhejiang University (Zhouqing, Tianxiang, Chendazuo. HEVC coding unit size rapid selection algorithm [J]. Journal of Zhejiang University (Engineering Science), 2014, 48(8):1451-1460.) proposes the concept of depth unicity and uses the coding depths selected for adjacent units to skip depths with a small probability of occurrence, thereby accelerating CU block partitioning. A 2016 paper in the Journal of Visual Communication and Image Representation (Tariq J, Kwong S, Yuan H. HEVC intra mode selection based on Rate Distortion (RD) cost and Sum of Absolute Difference (SAD) [J]. Journal of Visual Communication and Image Representation, 2016, 35:112-119.) analyzes the quadratic relationship between the rate distortion cost (RD cost) and the sum of absolute differences (SAD) and derives an approximate formula for estimating the RD cost, which removes the time needed to entropy code each CU block when computing the RD cost. However, because the method only estimates the code length of each CU block, it introduces errors into the rate distortion calculation and therefore causes some loss of image coding quality.
A paper published in 2018 (Cebrián-Márquez G, Galiano V, Migallón H, et al. Heterogeneous CPU plus GPU implementations for HEVC [J]. The Journal of Supercomputing, 2018, April 2:1-12.) implements a slice-level parallel scheme and a tile-level parallel scheme, which also improve the computing efficiency of the encoder to some extent. However, these schemes offer only slice-level or tile-level parallelism, so the degree of parallelism is limited and the computing efficiency cannot be improved on a larger scale. They also do not address parallel processing of the final entropy coding link of the whole frame image after the optimal coding parameters have been determined. As mentioned above, this link alone takes 14-17 ms on average to process one high-definition (1080P) frame. Effectively saving the processing time of this link is therefore crucial for real-time processing in the whole HEVC encoder.
Disclosure of Invention
The invention provides a parallel processing method of an HEVC (high efficiency video coding) encoder, which is suitable for being realized on a CPU (central processing unit) and GPU (graphics processing unit) heterogeneous platform.
In HEVC, each frame image undergoes prediction, transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering, image reconstruction and entropy coding. For the prediction, transformation, quantization, filtering and image reconstruction links, efficient parallel algorithms can be designed by suitably partitioning the data to be processed and processing the partitions in parallel with the multi-core structure of a GPU and multi-thread programming, which greatly improves the computing efficiency of these links. In the entropy coding of HEVC, however, and especially in the final entropy coding link of the whole frame image, coding uses the CABAC method described above. This context-based adaptive binary arithmetic coding needs the coding result of the previous datum when coding each datum, so the data in this link must be processed serially, in order. If the data were divided into many small units for parallel processing, the causal relationship between consecutive data would be destroyed and coding errors would result. The entropy coding link therefore cannot gain a significant speed-up through data partitioning as the other links do.
However, according to the HEVC standard, although the final entropy coding link of a frame must still code its data serially, in order, its calculation has no data dependency on the prediction, transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering and image reconstruction links of the next frame image, so the two can be computed independently. Based on this analysis, the invention provides an efficient parallel processing method for the entropy coding link in HEVC on a CPU + GPU heterogeneous platform. The main idea is to process the final entropy coding link of the current frame image in parallel with all other links of the next frame image: the CPU performs the entropy coding of the current frame image while the GPU performs the prediction, transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering and image reconstruction of the next frame image, with the CPU and the GPU computing simultaneously. With this parallel scheme, the processing time of the shorter of the two links (namely the final entropy coding link) is saved, so that the overall computing speed of the HEVC encoder is significantly improved.
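The saving described above can be made explicit with a simple timing model. As a sketch under assumptions that are not part of the source (constant per-frame times and negligible CPU-GPU transfer overhead), let $t_G$ be the per-frame time of all links executed on the GPU, $t_E$ the per-frame time of the final entropy coding on the CPU, and $N$ the number of frames:

```latex
\begin{aligned}
T_{\text{serial}} &= N\,(t_G + t_E),\\
T_{\text{parallel}} &= t_G + (N-1)\,\max(t_G, t_E) + t_E,\\
T_{\text{serial}} - T_{\text{parallel}} &= (N-1)\,\bigl(t_G + t_E - \max(t_G, t_E)\bigr) = (N-1)\,\min(t_G, t_E).
\end{aligned}
```

Under this model the time saved per overlapped frame is exactly the duration of the shorter link, which agrees with the statement that the processing time of the shorter link is saved.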
The technical scheme for realizing the parallel processing of the HEVC encoder is as follows:
(1) A CPU with two or more cores (e.g., i7-7700 or above) and a GPU card (e.g., GTX-1080 or above) form a CPU + GPU heterogeneous computing platform on which the HEVC encoder parallel processing method is implemented;
(2) As shown in the attached drawing, two threads (called the CPU main thread and the CPU slave thread) are set up on the CPU, and a set of threads (called GPU multithreading) is set up on the GPU. The CPU main thread is responsible for controlling the calculation flow of the whole encoder system, scheduling the CPU slave thread and GPU multithreading, and exchanging data between the CPU and the GPU. The CPU slave thread performs the final entropy coding link on all data to be coded of the previous frame image. GPU multithreading performs all the links of prediction, transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering and image reconstruction of the current frame image. The prediction, transformation, quantization, inverse transformation, filtering and image reconstruction links handled by GPU multithreading use the parallel algorithms set out in the following master's theses: 1) Design and implementation of parallel algorithms for key techniques of HEVC intra-frame prediction [D], master's thesis, Dalian University of Technology, 2015; 2) Zhang Weilong, HEVC key module parallel algorithm design and GPU-based implementation [D], master's thesis, Dalian University of Technology, 2016;
(3) The entropy coding of each CU data block during rate distortion optimization is coupled with the prediction, transformation and quantization links of the current image. In this scheme those links are processed in parallel by GPU multithreading with the CU data block as the unit of calculation, so each CU data block can be coded by a GPU thread with the original serial entropy coding algorithm while different CU data blocks are coded in parallel. This realizes CU-level parallel entropy coding and improves the computing efficiency of this link.
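Item (3) can be sketched as follows. This is a minimal illustration, not the GPU implementation: host threads from Python's `concurrent.futures` stand in for GPU threads, and `cu_bit_cost`, `rdo_bit_costs`, the adaptive update and the assumption that every CU starts from the same initial state are all hypothetical. The point is the structure: the loop inside one CU stays serial, while different CUs are handled concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def cu_bit_cost(cu_bins, p_one=0.5, rate=0.05):
    """Serially estimate the bit cost of one CU's bins with an adaptive model."""
    bits = 0.0
    for b in cu_bins:
        bits += -math.log2(p_one if b == 1 else 1.0 - p_one)
        p_one += rate * (b - p_one)  # state update keeps the inner loop serial
    return bits

def rdo_bit_costs(cus, workers=4):
    """Estimate all CUs concurrently; each CU is still coded serially inside."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so costs[i] belongs to cus[i].
        return list(pool.map(cu_bit_cost, cus))

# Three hypothetical CU data blocks, already binarized.
cus = [[1, 1, 0, 1], [0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0]]
costs = rdo_bit_costs(cus)
```

Because each CU is evaluated independently here, the parallel result equals the result of evaluating the CUs one after another.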
The efficient parallel processing method for the final entropy coding link in the HEVC encoder saves the processing time that the encoder would otherwise spend on the final entropy coding of each frame, so that the overall computing efficiency of the system is significantly improved. Experiments show that when the proposed HEVC encoder parallel processing technique is implemented on a current consumer-grade CPU + GPU computing platform (for example, an Intel Core i7-7700 computer fitted with a GTX-1080 GPU card), a video coding rate of 27-31 frames/second is obtained at a data compression ratio of 100-130 times for high-definition (1080P) video, which meets the requirement of real-time processing (at least 25 frames/second). If, instead, the final entropy coding of each frame uses the original serial method rather than the proposed parallel scheme, the calculation time of this link (14-17 ms) cannot be saved, and real-time coding of high-definition video cannot be achieved on this platform. For the entropy coding of each CU data block during rate distortion optimization, the invention further provides a CU-level parallel entropy coding scheme, which further improves the computing efficiency of the rate distortion optimization link.
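The reported frame rates can be checked against the per-link timings given in the Background (32-36 ms on the GPU, 14-17 ms for the final entropy coding). The sketch below is back-of-the-envelope arithmetic only; it assumes that the steady-state frame period equals the longer of the two links, pairs the fastest timings together and the slowest together, and ignores transfer overhead.

```python
def frames_per_second(ms_per_frame):
    """Convert a per-frame period in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

gpu_ms = (32.0, 36.0)      # all links except final entropy coding (GPU)
entropy_ms = (14.0, 17.0)  # final entropy coding of one frame (CPU)

# Serial method: every frame pays for both links one after the other.
serial_fps = tuple(frames_per_second(g + e)
                   for g, e in zip(gpu_ms, entropy_ms))

# Overlapped method: the frame period is set by the longer link.
overlap_fps = tuple(frames_per_second(max(g, e))
                    for g, e in zip(gpu_ms, entropy_ms))

print(serial_fps, overlap_fps)
```

Under these assumptions the serial method gives roughly 18.9 to 21.7 frames/second, below the 25 frames/second threshold, while the overlapped method gives roughly 27.8 to 31.3 frames/second, consistent with the 27-31 frames/second reported above.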
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
The following description and the accompanying drawings are used to explain specific embodiments of the present invention.
Let the video image sequence to be coded contain N frames (frames 0 to N-1). The implementation steps are as follows:
Step 1: As shown in the attached drawing, from time T0 to time T1, the CPU main thread first transfers the image data of frame 0 of the video sequence to the GPU side and issues a scheduling instruction to the GPU, and GPU multithreading is started to compute in parallel, using the aforementioned "technical scheme for realizing parallel processing of the HEVC encoder", the links of prediction (here, intra-frame prediction), transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation and image reconstruction for frame 0. During this period the CPU main thread and the GPU side exchange information many times; based on the task execution status of GPU multithreading reported by the GPU side, the CPU main thread starts different GPU multithreading to complete the processing tasks of the different links. When GPU multithreading completes the reconstruction task for frame 0, it returns a task-end flag to the CPU main thread; after receiving the flag, the CPU main thread transfers all data to be coded of frame 0 generated by GPU multithreading to the CPU. The CPU slave thread is idle during this period;
Step 2: From time T1 to time T2, the CPU main thread starts the CPU slave thread, which entropy codes all data to be coded of frame 0. At the same time, the CPU main thread transfers the image data of the next frame (frame 1) to the GPU side and starts GPU multithreading to compute in parallel the links of prediction (inter-frame prediction), transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering and image reconstruction for frame 1. During this period the CPU main thread and the GPU side exchange information many times; based on the task execution status of GPU multithreading reported by the GPU side, the CPU main thread starts different GPU multithreading to complete the processing tasks of the different links. The CPU main thread also exchanges information with the CPU slave thread to monitor the completion status of the slave thread's task. When GPU multithreading completes the reconstruction task for frame 1, it returns a task-end flag to the CPU main thread; after receiving the flag, the CPU main thread transfers all data to be coded of frame 1 generated by GPU multithreading to the CPU;
Step 3: From time T2 to time T3, the CPU main thread, the CPU slave thread and GPU multithreading perform the same operations as in step 2, except that in this step the CPU main thread transfers the image data of frame 2 to the GPU side, GPU multithreading likewise processes that frame, the CPU main thread starts the CPU slave thread to entropy code all data to be coded of frame 1, and the CPU main thread finally transfers all data to be coded of frame 2 generated by GPU multithreading in this period to the CPU;
Step 4: From time Tn to time Tn+1 (n = 3, 4, …, N-1), the operations performed by the CPU main thread, the CPU slave thread and GPU multithreading are almost the same as in step 2 or step 3, except that in this step the CPU main thread transfers the image data of frame n to the GPU side, GPU multithreading processes that frame, the CPU main thread starts the CPU slave thread to entropy code all data to be coded of frame n-1, and the CPU main thread finally transfers all data to be coded of frame n generated by GPU multithreading in this period to the CPU. A further difference from the preceding steps is that the prediction performed by GPU multithreading on frame n may be either inter-frame or intra-frame prediction; which one is used is determined by a prediction selection algorithm specified in advance. In actual operation, the CPU main thread determines the prediction mode of the current frame image according to the prediction selection algorithm and sends the corresponding scheduling instruction to the GPU, and GPU multithreading then performs the corresponding intra-frame or inter-frame prediction on the current frame image. All other operations are the same as in step 2 or step 3;
Step 5: Repeat step 4 until time TN. By time TN, all processing operations except the final entropy coding link have been completed for frame N-1 (the last frame), and all data to be coded of that frame have been transferred to the CPU side;
Step 6: At time TN, the CPU main thread notifies the CPU slave thread, which performs the final entropy coding on all data to be coded of frame N-1 to obtain the final compressed bit stream of the last frame image. The coding of the entire video image sequence is then complete.

Claims (1)

1. A parallel processing method for realizing entropy coding links in HEVC based on a CPU + GPU heterogeneous platform is characterized by comprising the following steps:
(1) a CPU + GPU heterogeneous computing platform is formed by two or more cores of CPUs and a GPU card;
(2) two threads are arranged on a CPU, and are respectively called as a 'CPU main thread' and a 'CPU slave thread'; setting a multithread on a GPU, wherein the multithread is called GPU multithreading; the CPU main thread is responsible for the calculation flow control of the whole encoder system, and also schedules the CPU slave thread and the GPU multithreading and exchanges data information between the CPU and the GPU; the 'CPU slave thread' is responsible for realizing the final entropy coding link of all the data to be coded of the previous frame of image, and the 'GPU multithreading' is responsible for realizing the prediction, transformation, quantization, rate distortion optimization, inverse quantization, inverse transformation, filtering and image reconstruction of the current frame of image;
(3) for the entropy coding link of each CU data block in the rate distortion optimization process, the prediction, transformation and quantization links are processed in parallel by a GPU (graphics processing unit) through GPU multithreading, the CU data blocks are used as units for calculation, each CU data block is still encoded by each GPU multithreading in the process according to the original serial entropy coding algorithm, and different CU data blocks are encoded in parallel, so that CU-level parallel entropy coding is realized, and the calculation efficiency of the link is improved;
the method specifically comprises the following steps:
step 1: from time T0 to time T1, the CPU main thread on the CPU side first transfers the image data of the 0th frame of the video sequence to the GPU side, sends a scheduling instruction to the GPU, and starts the GPU multithreads to perform, in parallel, the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform and image reconstruction links on the 0th frame image; during this period the CPU main thread and the GPU side exchange information many times, and the CPU main thread starts different GPU multithreads to complete the processing tasks of the different links according to the current task execution status on the GPU side, as reflected by the information the GPU side returns;
when the GPU multithreads on the GPU side complete the reconstruction task for the 0th frame image, a task-end flag is returned to the CPU main thread, and after receiving the end flag the CPU main thread transfers to the CPU all the data to be encoded of the 0th frame generated by the GPU multithreads; during this period the CPU slave thread is idle;
step 2: from time T1 to time T2, the CPU main thread starts the CPU slave thread, which entropy-encodes all the data to be encoded of the 0th frame; at the same time the CPU main thread transfers the image data of the next frame (the 1st frame) to the GPU side and starts the GPU multithreads on the GPU side to perform, in parallel, the prediction, transform, quantization, rate-distortion optimization, inverse quantization, inverse transform, filtering and image reconstruction links on the 1st frame image; during this period the CPU main thread and the GPU side exchange information many times, and the CPU main thread starts different GPU multithreads to complete the processing tasks of the different links according to the current task execution status on the GPU side, as reflected by the information the GPU side returns; the CPU main thread also exchanges information with the CPU slave thread to keep track of the slave thread's task completion status; when the GPU multithreads on the GPU side complete the reconstruction task for the 1st frame image, a task-end flag is returned to the CPU main thread, and after receiving the end flag the CPU main thread transfers to the CPU all the data to be encoded of the 1st frame generated by the GPU multithreads;
step 3: from time T2 to time T3, the CPU main thread and the CPU slave thread on the CPU side and the GPU multithreads on the GPU side perform exactly the same operations as in step 2, except that in this step the CPU main thread transfers the image data of the 2nd frame to the GPU side and the GPU multithreads process that frame, the CPU main thread starts the CPU slave thread to entropy-encode all the data to be encoded of the 1st frame, and finally all the data to be encoded of the 2nd frame image generated by the GPU multithreads during this period is transferred to the CPU;
step 4: from time Tn to time Tn+1 (n = 3, 4, …, N-1), the operations performed by the CPU main thread and the CPU slave thread on the CPU side and by the GPU multithreads on the GPU side are still almost the same as in step 2 or step 3, except that in this step the CPU main thread transfers the image data of the n-th frame to the GPU side and the GPU multithreads process that frame, the CPU main thread starts the CPU slave thread to entropy-encode all the data to be encoded of the (n-1)-th frame, and the CPU main thread finally transfers to the CPU all the data to be encoded of the n-th frame image generated by the GPU multithreads during this period; in this step the prediction performed by the GPU multithreads on the n-th frame image may be either inter prediction or intra prediction, and which one is used is determined by a prediction selection algorithm specified in advance; in actual operation, the CPU main thread determines the prediction mode of the current frame image according to the prediction selection algorithm and sends the corresponding scheduling instruction to the GPU, and the GPU multithreads then perform the corresponding intra or inter prediction on the current frame image according to that instruction; all other operations are exactly the same as in step 2 or step 3;
step 5: step 4 is repeated until time TN; by time TN, all processing operations except the final entropy coding link have been completed for the (N-1)-th frame image, and all the data to be encoded of that frame image has been transferred to the CPU side;
step 6: at time TN, the CPU main thread notifies the CPU slave thread, and the slave thread performs the final entropy coding on all the data to be encoded of the (N-1)-th frame to obtain the final compressed code stream data of the last frame image; the encoding process for the entire video image sequence is then complete.
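
Item (3) of the claim, CU-level parallel entropy coding during rate-distortion optimization, can be illustrated with a small sketch. Everything below is an assumption made for illustration: a signed order-0 Exp-Golomb bit count stands in for the actual HEVC entropy coder (CABAC), a CPU thread pool stands in for the GPU multithreads, and the function names are hypothetical. What it shows is the structure only: the coefficients inside one CU are walked serially, while different CUs are handled in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def exp_golomb_bits(value):
    """Bit length of the signed order-0 Exp-Golomb code for one integer."""
    # Map signed to unsigned: 0, 1, -1, 2, -2, ... -> 0, 1, 2, 3, 4, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    # An order-0 Exp-Golomb codeword has 2*floor(log2(code_num + 1)) + 1 bits.
    return 2 * (code_num + 1).bit_length() - 1


def encode_cu_serial(coefficients):
    """Walk one CU's coefficients in order, as a single thread would."""
    bits = 0
    for coefficient in coefficients:
        bits += exp_golomb_bits(coefficient)
    return bits


def estimate_bits_per_cu(cu_blocks, workers=4):
    """Run the serial per-CU coder on different CUs in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so result i belongs to CU i.
        return list(pool.map(encode_cu_serial, cu_blocks))
```

Keeping the per-CU coder serial means the entropy coding algorithm itself is unchanged, and the parallelism comes only from treating each CU data block as an independent work item.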
CN201811258709.6A 2018-10-26 2018-10-26 Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform Active CN109391816B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811258709.6A CN109391816B (en) 2018-10-26 2018-10-26 Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811258709.6A CN109391816B (en) 2018-10-26 2018-10-26 Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform

Publications (2)

Publication Number Publication Date
CN109391816A CN109391816A (en) 2019-02-26
CN109391816B true CN109391816B (en) 2020-11-03

Family

ID=65426863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811258709.6A Active CN109391816B (en) 2018-10-26 2018-10-26 Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform

Country Status (1)

Country Link
CN (1) CN109391816B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112385225B (en) * 2019-09-02 2023-07-25 北京航迹科技有限公司 Method and system for improving image coding
CN110969672A (en) * 2019-11-14 2020-04-07 杭州飞步科技有限公司 Image compression method and device
CN114827614B (en) * 2022-04-18 2024-03-22 重庆邮电大学 Method for realizing LCEVC video coding optimization

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297777A (en) * 2013-05-23 2013-09-11 广州高清视信数码科技股份有限公司 Method and device for increasing video encoding speed
CN104869398B (en) * 2015-05-21 2017-08-22 大连理工大学 Parallel method for realizing CABAC in HEVC based on a CPU+GPU heterogeneous platform
US10277921B2 (en) * 2015-11-20 2019-04-30 Nvidia Corporation Hybrid parallel decoder techniques
CN107135392B (en) * 2017-04-21 2019-12-10 西安电子科技大学 HEVC motion search parallel method based on asynchronous mode

Also Published As

Publication number Publication date
CN109391816A (en) 2019-02-26

Similar Documents

Publication Publication Date Title
CN115623200B (en) Neural network driven codec
CN105791823B (en) The method and apparatus of Video coding and decoded adaptive stencil matching prediction
CN109391816B (en) Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform
US8472527B2 (en) Hierarchical motion estimation using original frame for sub-sampled reference
CN105491377B (en) A kind of video decoded macroblock grade Method of Scheduling Parallel of computation complexity perception
CN101490968A (en) Parallel processing apparatus for video compression
US9380314B2 (en) Pixel retrieval for frame reconstruction
US9560350B2 (en) Intra/inter mode decision for predictive frame encoding
CN101014129B (en) Video data compression method
CN110062239B (en) Reference frame selection method and device for video coding
CN1981533A (en) Method and system for performing deblocking filtering
US9654791B1 (en) System and method for efficient multi-bitrate and multi-spatial resolution media encoding
CN110351552B (en) Fast coding method in video coding
CN102625108A (en) Multi-core-processor-based H.264 decoding method
CN101137062A (en) Video coding system dual-core cooperation encoding method with dual-core processor
CN102196253B (en) Video coding method and device based on frame type self-adaption selection
CN105100799B (en) A method of reducing intraframe coding time delay in HEVC encoders
CN102769753A (en) H.264 coder and coding method
CN101841722B (en) Detection method of detection device of filtering boundary strength
CN108965814A (en) A kind of video mix decoding rendering method based on CUDA acceleration technique
CN102595137B (en) Fast mode judging device and method based on image pixel block row/column pipelining
CN109151476A (en) A kind of reference frame generating method and device based on bi-directional predicted B frame image
Gudumasu et al. Software-based versatile video coding decoder parallelization
CN104053009A (en) Encoding method of monitoring video and device
CN104780377A (en) Parallel high efficiency video coding (HEVC) system and method based on distributed computer system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant