CN107231558B - Implementation method of a CUDA-based H.264 parallel encoder - Google Patents
Implementation method of a CUDA-based H.264 parallel encoder
- Publication number
- CN107231558B CN107231558B CN201710368717.5A CN201710368717A CN107231558B CN 107231558 B CN107231558 B CN 107231558B CN 201710368717 A CN201710368717 A CN 201710368717A CN 107231558 B CN107231558 B CN 107231558B
- Authority
- CN
- China
- Prior art keywords
- encoder
- variable
- gpu
- cuda
- global variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/157—Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention relates to a method for implementing a CUDA-based H.264 parallel encoder. The method comprises optimizing the overall encoder structure and parallelizing each functional module on CUDA. The overall structure optimization includes frame-level separation of the encoder's functional modules and task division between the CPU and GPU. At the module level, the GPU parallelizes four processes of the encoder's functional modules: inter prediction, intra prediction coding, entropy coding, and deblocking filtering, realizing the encoder's parallelization on CUDA through parallel-model design, storage-model design, and related aspects.
Description
[technical field]
The invention belongs to the field of video coding, and more particularly relates to a method for implementing a CUDA-based H.264 parallel encoder.
[background technique]
H.264/AVC, currently the most popular video coding standard, is widely adopted for its high image quality and high compression ratio. However, the improved image quality and coding efficiency greatly increase the computational complexity of H.264. Existing serial encoders on general-purpose processors cannot reach real-time high-definition coding performance, while dedicated hardware has high development costs, long development cycles, and poor portability, making it unsuitable for large-scale use. An efficient implementation method for the H.264 encoder is therefore needed.
[summary of the invention]
To solve the above problems in the prior art, the present invention proposes a method for implementing a CUDA-based H.264 parallel encoder.
The technical solution adopted by the present invention is as follows:
A method for implementing a CUDA-based H.264 parallel encoder, comprising the following steps:
(1) adjusting the H.264 encoder structure, including performing frame-level separation of the encoder's functional modules and dividing the encoder's tasks between the CPU and GPU;
(2) running each functional module of the encoder in parallel on CUDA, i.e., at the module level, parallelizing four processes of the H.264 encoder's functional modules: inter prediction, intra prediction coding, entropy coding, and deblocking filtering.
Further, the frame-level separation of functional modules comprises the following steps:
(1.1) according to the functionality of the encoder's core function, separating each functional unit in the core function into an independent loop body, so that each functional unit loops independently at the frame level;
(1.2) splitting the large data structures in the encoder into multiple simple data structures according to their life cycles, and localizing them according to their actual life cycles.
Further, step (1.2) specifically comprises:
classifying the large data structures into three types: local variables, pseudo-global variables, and true global variables;
(a) if a large data structure is a local variable, it is left unchanged;
(b) if a large data structure is a pseudo-global variable, splitting the pseudo-global variable into different variables according to its actual life cycles by renaming;
(c) if a large data structure is a true global variable, examining its data structure to determine whether any member variables are pseudo-global or local variables, and if so, separating those variables out of the true global variable and processing any separated pseudo-global variables as in step (b) above.
Further, the task division between CPU and GPU comprises:
(2.1) the CPU completes the input of the video file and preprocesses it;
(2.2) the CPU transfers the original frames and reference frames of the video file to the GPU, which performs the subsequent coding operations;
(2.3) the GPU performs inter prediction;
(2.4) the GPU performs intra prediction coding;
(2.5) the GPU performs parallelized entropy coding;
(2.6) the GPU performs deblocking filtering.
Further, the inter prediction uses the multi-resolution multi-window (MRMW) algorithm.
Further, during intra prediction coding, data are loaded in a read-once, process-many manner: each thread block loads into its shared memory the data needed by multiple macroblocks, and the CUDA kernel applies predictive coding to these data through one level of looping; after the data read in one batch have been processed, the reconstructed data are written back, and then new data are loaded for processing. The corresponding kernel is organized as a double loop: the outer loop variable controls the number of loads, and the inner loop variable controls the number of processing passes over each loaded batch.
Further, processing inside the kernel is performed in units of macroblocks, each macroblock comprising multiple sub-macroblocks, and intra prediction coding comprises three stages:
First stage: each sub-macroblock is assigned to one thread in the intra prediction thread block for intra prediction processing;
Second stage: one thread in the DCT thread block performs DCT processing on one row or column of pixels in a sub-macroblock;
Third stage: one thread in the quantization thread block quantizes one pixel.
Further, during parallelized entropy coding, each CUDA thread block processes 8 consecutive macroblocks, and each thread processes the entropy coding of one sub-macroblock.
Further, the deblocking filtering operates in units of frames and includes boundary strength calculation and filtering.
Further, the preprocessing includes separating the video into its YUV components and setting the encoder's basic parameters.
The beneficial effects of this method are that it improves the execution efficiency of the H.264 encoder, reducing the computational complexity of coding and increasing coding speed without degrading coding performance.
[Brief description of the drawings]
The drawings described here are provided for a further understanding of the invention and constitute a part of this application, but do not constitute an improper limitation of the invention. In the drawings:
Fig. 1 is a schematic diagram of the invention's division of the core function's loop body.
Fig. 2 is a schematic diagram of the invention's simplification and localization of data structures.
Fig. 3 is the invention's CPU-GPU task division diagram.
Fig. 4 is the invention's intra prediction coding storage model.
Fig. 5 is the invention's CAVLC coding stage CUDA parallel model.
Fig. 6 is a schematic diagram of the separation of the deblocking filtering function.
[Detailed description of the embodiments]
The present invention is described in detail below with reference to the drawings and specific embodiments; the illustrative examples and explanations therein serve only to explain the invention and are not to be taken as limiting it.
Based on the serial H.264 program x264 and an analysis of that program, the present invention proposes, in accordance with the CUDA architecture, a parallel H.264 encoder framework and a method for implementing the parallel H.264 encoder on CUDA. The method comprises the following two aspects:
(1) Overall structure optimization
The overall structure optimization adjusts the H.264 encoder structure and designs the framework of the CUDA-based H.264 parallel encoder. The adjustment and design mainly comprise two aspects: performing frame-level separation of the encoder's functional modules, and dividing tasks between the CPU and GPU.
(2) Parallelization of each functional module on CUDA
At the module level, four processes of the H.264 encoder's functional modules are parallelized: inter prediction, intra prediction coding, entropy coding, and deblocking filtering. The encoder's parallelization on CUDA is realized through parallel-model design, storage-model design, and related aspects.
These two aspects of the method are described in detail below.
Frame-level separation of functional modules:
The frame-level separation of functional modules proceeds as follows:
(1.1) Loosening the coupling between functions
In the H.264 encoder, the core function (the main function) is one large loop body. As shown in the upper part of Fig. 1, A is the main function; it contains all of the functions below it (D1', ..., D5, D6, ..., D7, E1, E2, E3, E4, E5) inside one big loop body, and each iteration of the main function executes all of these functions. This makes the loop body path long; developing a parallel program directly from it would make the function load too heavy.
Therefore, according to the functionality of the core function, the invention divides the entire loop of the core function into multiple relatively independent loop bodies, as shown in the lower part of Fig. 1, so that each function loops independently at the frame level. Each of D1', ..., D5, D6, ..., D7, E1, E2, E3, E4, E5 becomes an independent loop body: for example, the D1' function is one loop body, the D5 function is one loop body, the D7 function is one loop body, the E1 function is one loop body, and so on. In this way, each function independently focuses on one task and loops on its own; during the execution of each loop body, instruction locality is better and the cache miss count is lower.
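The split described above is ordinary loop fission applied at the frame level. A minimal sketch of the transformation (in Python, with generic stage functions standing in for D1', D5, etc.; the real encoder stages also carry cross-frame reference data, so this is an illustration under the assumption that each stage consumes only the previous stage's per-frame output):

```python
# Before: one big loop body -- every stage runs inside the per-frame loop,
# so the loop path is long and hard to offload as a whole.
def encode_monolithic(frames, stages):
    results = []
    for frame in frames:
        data = frame
        for stage in stages:          # all stages coupled in one loop body
            data = stage(data)
        results.append(data)
    return results

# After frame-level separation: each stage gets its own independent
# frame-level loop, so each short loop body can be offloaded
# (e.g. to a CUDA kernel) on its own.
def encode_separated(frames, stages):
    data = list(frames)
    for stage in stages:              # one independent frame-level loop per stage
        data = [stage(x) for x in data]
    return data
```

Both versions produce the same per-frame results; only the loop structure changes.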
(1.2) Simplifying and localizing the data structures in the H.264 encoder
Referring to Fig. 2, in order to reduce data transfer time, the invention splits the large data structures in the encoder into multiple simple data structures according to their life cycles and localizes them according to their actual life cycles. Specifically, the large data structures can be classified into three types: local variables, pseudo-global variables, and true global variables.
A local variable, such as variable A inside function 0 in Fig. 2, is left unchanged.
A pseudo-global variable such as B is declared as a global variable, but its scope of action can be split into multiple actual life cycles; by renaming, the pseudo-global variable is split into different variables according to its actual life cycles. As shown in Fig. 2, the values of pseudo-global variable B in function 0 and function 1 are unrelated, so B can be split into two life cycles: the pseudo-global variable is renamed B0 in function 1, and since variable B is not used in function 2, it need not be redefined there.
For a true global variable C, its data structure must be examined to determine whether any member variables are pseudo-global or local; if so, those variables are separated out of C, and any separated pseudo-global variables are processed as above. As shown in Fig. 2, the true global variable C can be split into a pseudo-global variable and a local variable; the scope of the pseudo-global variable is then restricted to function 0 and function 1, and the scope of the local variable C0 is restricted to function 2.
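The pseudo-global case can be illustrated with a small hypothetical sketch (the names function0, function1, function2 and variables A, B, B0 follow Fig. 2; this is not the patent's actual code):

```python
# Before localization, B would be declared global even though the value
# written in function0 is never read by function1 -- two unrelated life
# cycles, which makes B only a "pseudo-global" variable.
#
# After localization by renaming, each life cycle becomes its own local
# variable, so no state needs to persist in global storage (and, on CUDA,
# nothing needs to round-trip through global memory) between the functions.

def function0():
    B = 10          # first life cycle of the former global B
    A = B + 1       # A was already local: left unchanged
    return A

def function1():
    B0 = 20         # second life cycle, renamed B -> B0
    return B0 * 2

def function2():
    return 7        # B was never used here, so nothing is defined
```

After the split, each function is self-contained and can be compiled or offloaded independently.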
Task division between CPU and GPU
Referring to Fig. 3, which shows the invention's division of the H.264 encoder's functional modules between CPU and GPU, as well as the data movement between CPU and GPU:
(2.1) First, the CPU completes the input of the video file and preprocesses it, including separating the video into its YUV components and setting the encoder's basic parameters.
(2.2) The CPU transfers the original frames and reference frames to the GPU, which performs the subsequent coding operations.
The GPU processes frames by executing four modules, one frame at a time. The basic flow is: after the inter prediction of a frame finishes, the corresponding intra prediction coding is performed, then entropy coding is applied to the resulting transform coefficients, and so on; only after the entropy coding and deblocking filtering of the whole frame have finished are the result data passed back to the CPU.
(2.3) The GPU performs inter prediction.
Inter prediction is the most computationally demanding part of the H.264 encoder: conventional inter prediction accounts for about 70% of the computation of the entire encoder, and although it yields good image quality, it is complex. The invention performs inter prediction using the prior-art multi-resolution multi-window (MRMW) algorithm. Because the invention applies frame-level separation to the functional modules, using the MRMW algorithm can greatly reduce the inter prediction time relative to the prior art.
(2.4) The GPU performs intra prediction coding.
The degree of parallelism in intra prediction is not high: the maximum data volume each CUDA thread block can process at once is one macroblock (256 pixels), so the pressure on shared memory is small, but there are producer-consumer relations between adjacent macroblocks. To reduce the number of accesses to related data in global memory, the invention loads data in a read-once, process-many manner: each thread block loads into its shared memory the data needed by multiple macroblocks, and the CUDA kernel applies predictive coding to these data through one level of looping; after the data read in one batch have been processed, the reconstructed data are written back, and then new data are loaded for processing. The corresponding kernel is organized as a double loop: the outer loop variable controls the number of loads, and the inner loop variable controls the number of processing passes over each loaded batch.
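The read-once, process-many double loop can be sketched as a simplified host-side model (Python; the `BATCH` size of macroblocks per shared-memory load and the `predict` callback are assumed parameters, not values from the patent):

```python
BATCH = 4  # macroblocks loaded into "shared memory" per outer iteration (assumed)

def intra_code_frame(macroblocks, predict):
    """Model of the kernel's double loop.
    Outer loop: one iteration per shared-memory load.
    Inner loop: one iteration per macroblock in the loaded batch."""
    reconstructed = []
    n_loads = (len(macroblocks) + BATCH - 1) // BATCH
    for load in range(n_loads):                               # outer: number of loads
        shared = macroblocks[load * BATCH:(load + 1) * BATCH]  # stand-in for shared memory
        for mb in shared:                                     # inner: per-macroblock passes
            reconstructed.append(predict(mb))                 # predict + reconstruct
        # here the real kernel writes the reconstructed batch back to
        # global memory before loading the next strip of the frame
    return reconstructed
```

The point of the pattern is that each macroblock's data crosses the global-to-shared-memory boundary once, regardless of how many processing passes touch it.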
Referring to Fig. 4, which illustrates the storage model of intra prediction coding: the upper left of Fig. 4 shows an image frame composed of multiple macroblocks (MB). Each time frame data are read, one strip is read from the original image frame and stored in shared memory (as shown in the upper right of Fig. 4), and the strip is processed inside the kernel in units of macroblocks.
The middle and lower parts of Fig. 4 show the kernel's processing of one macroblock. The left part of Fig. 4 shows a 4x4 macroblock comprising sub-macroblocks 0 through 15, each sub-macroblock containing 4x4 pixels. Intra prediction coding comprises three stages:
First stage: as in the left and lower-left parts of Fig. 4, each sub-macroblock is assigned to one thread of the intra prediction thread block (prediction thread block) for intra prediction processing, requiring 16 threads in total (thread 0 to thread 15).
Second stage: as in the middle and lower-middle parts of Fig. 4, one thread of the DCT thread block performs DCT processing on one row or column of pixels in a sub-macroblock, requiring 64 threads in total (thread 0 to thread 63).
Third stage: as in the right and lower-right parts of Fig. 4, one thread of the quantization thread block (quant thread block) quantizes one pixel (in row-major order), requiring 256 threads in total (thread 0 to thread 255).
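The thread counts of the three stages follow directly from the macroblock geometry (a 16x16-pixel macroblock split into 16 sub-macroblocks of 4x4 pixels). A small sketch of the index arithmetic, reproducing the counts from Fig. 4 (the helper `quant_thread_for_pixel` is illustrative, not from the patent):

```python
MB_SIZE = 16                               # macroblock is 16x16 pixels
SUB_SIZE = 4                               # sub-macroblock is 4x4 pixels
SUBS_PER_MB = (MB_SIZE // SUB_SIZE) ** 2   # 16 sub-macroblocks per macroblock

# Stage 1: one thread per sub-macroblock.
prediction_threads = SUBS_PER_MB                 # 16 threads

# Stage 2: one thread per row-or-column of each sub-macroblock.
dct_threads = SUBS_PER_MB * SUB_SIZE             # 64 threads

# Stage 3: one thread per pixel of the macroblock.
quant_threads = MB_SIZE * MB_SIZE                # 256 threads

def quant_thread_for_pixel(x, y):
    """Row-major mapping of pixel (x, y) in the macroblock to a thread id."""
    return y * MB_SIZE + x
```

Each stage therefore widens the parallelism by a factor of four over the previous one.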
(2.5) The GPU performs parallelized entropy coding.
Referring to Fig. 5, the CUDA parallel model of the CAVLC coding stage, which shows the mapping between data and threads in the luma AC component entropy coding stage: each CUDA thread block processes 8 consecutive macroblocks, i.e., thread block B0 processes MB0 to MB7 in row 0, thread block B14 processes MB112 to MB119, and so on. Within a thread block, every 16 consecutive threads process the 16 sub-macroblocks of one macroblock. Fig. 5 contains 1020 thread blocks of 128 threads each, so the thread count reaches 130560; each thread processes the entropy coding of one sub-macroblock, realizing 130560-thread parallel entropy coding.
Although entropy coding is branch-intensive, the frame-level separation of the functional modules has separated the various components and eliminated some single paths, and the large-scale data parallelism realized by the many threads is enough to compensate for the impact of the branch operations.
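For a 1080p frame of 120x68 = 8160 macroblocks (an assumed geometry, consistent with Fig. 5's "B14 processes MB112 to MB119" implying 120 macroblocks per row), the macroblock-to-thread mapping and the total thread count work out as follows:

```python
MBS_PER_BLOCK = 8          # macroblocks per CUDA thread block
THREADS_PER_MB = 16        # one thread per 4x4 sub-macroblock
MB_COLS, MB_ROWS = 120, 68 # macroblock grid of a 1080p luma frame (assumed)

def thread_for_submb(mb_index, sub_index):
    """Map (macroblock index, sub-macroblock index) to
    (thread block index, thread index within the block)."""
    block = mb_index // MBS_PER_BLOCK
    thread = (mb_index % MBS_PER_BLOCK) * THREADS_PER_MB + sub_index
    return block, thread

total_mbs = MB_COLS * MB_ROWS                        # 8160 macroblocks
total_blocks = total_mbs // MBS_PER_BLOCK            # 1020 thread blocks
threads_per_block = MBS_PER_BLOCK * THREADS_PER_MB   # 128 threads per block
total_threads = total_blocks * threads_per_block     # 130560 threads
```

The mapping reproduces the figure's layout: MB112 lands at the start of block 14, and 1020 blocks of 128 threads give the 130560-thread total stated for the CAVLC stage.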
(2.6) The GPU performs deblocking filtering. As shown in Fig. 6, the deblocking filtering operates in units of frames and includes boundary strength calculation and filtering.
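The boundary strength (BS) calculation is defined by the H.264 standard, not by this patent; a simplified per-edge sketch of its decision cascade (single reference picture, quarter-pel motion vectors; the real standard also compares reference indices per list and handles field coding):

```python
def boundary_strength(p, q, edge_on_mb_boundary):
    """Simplified H.264 boundary strength for the edge between 4x4 blocks
    p and q. Each block is a dict with keys 'intra' (bool),
    'nonzero_coeffs' (bool), 'ref' (int), and 'mv' (tuple, quarter-pel
    units). Returns an integer BS in 0..4."""
    if p['intra'] or q['intra']:
        return 4 if edge_on_mb_boundary else 3      # intra: strongest filtering
    if p['nonzero_coeffs'] or q['nonzero_coeffs']:
        return 2                                     # residual present
    if (p['ref'] != q['ref']
            or abs(p['mv'][0] - q['mv'][0]) >= 4
            or abs(p['mv'][1] - q['mv'][1]) >= 4):
        return 1                                     # motion discontinuity
    return 0                                         # BS == 0: edge not filtered
```

Because BS for each edge depends only on the two adjacent blocks' coding data, the per-edge calculation is naturally data-parallel across a frame, which is what makes frame-level GPU deblocking practical.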
Through the above process, the invention realizes the parallelization of H.264 on CUDA at both the system level and the module level, reducing the computational complexity of coding and increasing coding speed without degrading coding performance.
The above describes only preferred embodiments of the invention; all equivalent changes or modifications made according to the structure, features, and principles described in the scope of the invention's patent claims are included within that scope.
Claims (6)
1. A method for implementing a CUDA-based H.264 parallel encoder, characterized in that the method comprises the following steps:
(1) adjusting the H.264 encoder structure, including performing frame-level separation of the encoder's functional modules and dividing the encoder's tasks between the CPU and GPU;
(2) running each functional module of the encoder in parallel on CUDA, i.e., at the module level, parallelizing four processes of the H.264 encoder's functional modules: inter prediction, intra prediction coding, entropy coding, and deblocking filtering;
wherein the frame-level separation of functional modules comprises the following steps:
(1.1) according to the functionality of the encoder's core function, separating each functional unit in the core function into an independent loop body, so that each functional unit loops independently at the frame level;
(1.2) splitting the large data structures in the encoder into multiple simple data structures according to their life cycles, and localizing them according to their actual life cycles;
wherein step (1.2) specifically comprises:
classifying the large data structures into three types: local variables, pseudo-global variables, and true global variables, a pseudo-global variable being a variable that is declared as a global variable but whose scope of action can be split into multiple actual life cycles;
(a) if a large data structure is a local variable, it is left unchanged;
(b) if a large data structure is a pseudo-global variable, splitting the pseudo-global variable into different variables according to its actual life cycles by renaming;
(c) if a large data structure is a true global variable, examining its data structure to determine whether any member variables are pseudo-global or local variables, and if so, separating those variables out of the true global variable and processing any separated pseudo-global variables as in step (b) above;
wherein during intra prediction coding, data are loaded in a read-once, process-many manner: each thread block loads into its shared memory the data needed by multiple macroblocks, and the CUDA kernel applies predictive coding to these data through one level of looping; after the data read in one batch have been processed, the reconstructed data are written back, and then new data are loaded for processing; the corresponding kernel is organized as a double loop, with the outer loop variable controlling the number of loads and the inner loop variable controlling the number of processing passes over each loaded batch;
wherein processing inside the kernel is performed in units of macroblocks, each macroblock comprising multiple sub-macroblocks, and intra prediction coding comprises three stages:
first stage: each sub-macroblock is assigned to one thread in the intra prediction thread block for intra prediction processing;
second stage: one thread in the DCT thread block performs DCT processing on one row or column of pixels in a sub-macroblock;
third stage: one thread in the quantization thread block quantizes one pixel.
2. The method according to claim 1, characterized in that the task division between CPU and GPU comprises:
(2.1) the CPU completes the input of the video file and preprocesses it;
(2.2) the CPU transfers the original frames and reference frames of the video file to the GPU, which performs the subsequent coding operations;
(2.3) the GPU performs inter prediction;
(2.4) the GPU performs intra prediction coding;
(2.5) the GPU performs parallelized entropy coding;
(2.6) the GPU performs deblocking filtering.
3. The method according to claim 2, characterized in that the inter prediction uses the multi-resolution multi-window (MRMW) algorithm.
4. The method according to any one of claims 2-3, characterized in that during parallelized entropy coding, each CUDA thread block processes 8 consecutive macroblocks, and each thread processes the entropy coding of one sub-macroblock.
5. The method according to any one of claims 2-3, characterized in that the deblocking filtering operates in units of frames and includes boundary strength calculation and filtering.
6. The method according to any one of claims 2-3, characterized in that the preprocessing includes separating the video into its YUV components and setting the encoder's basic parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710368717.5A CN107231558B (en) | 2017-05-23 | 2017-05-23 | A kind of implementation method of the H.264 parallel encoder based on CUDA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710368717.5A CN107231558B (en) | 2017-05-23 | 2017-05-23 | A kind of implementation method of the H.264 parallel encoder based on CUDA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107231558A CN107231558A (en) | 2017-10-03 |
CN107231558B true CN107231558B (en) | 2019-10-22 |
Family
ID=59933794
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710368717.5A Active CN107231558B (en) | 2017-05-23 | 2017-05-23 | A kind of implementation method of the H.264 parallel encoder based on CUDA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107231558B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108012156B (en) * | 2017-11-17 | 2020-09-25 | 深圳市华尊科技股份有限公司 | Video processing method and control platform |
WO2021042232A1 (en) * | 2019-09-02 | 2021-03-11 | Beijing Voyager Technology Co., Ltd. | Methods and systems for improved image encoding |
CN110677646B (en) * | 2019-09-24 | 2022-01-11 | 杭州当虹科技股份有限公司 | Intra-frame coding prediction method based on CPU + GPU hybrid coding |
CN114765684B (en) * | 2021-01-12 | 2023-05-09 | 四川大学 | JPEG parallel entropy coding method based on GPU |
CN115802055B (en) * | 2023-01-30 | 2023-06-20 | 孔像汽车科技(武汉)有限公司 | Image defogging processing method and device based on FPGA, chip and storage medium |
CN116483545B (en) * | 2023-06-19 | 2023-09-29 | 支付宝(杭州)信息技术有限公司 | Multitasking execution method, device and equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2192781A2 (en) * | 2008-11-28 | 2010-06-02 | Thomson Licensing | Method for video decoding supported by graphics processing unit |
CN102404561A (en) * | 2010-09-14 | 2012-04-04 | 盛乐信息技术(上海)有限公司 | Method for achieving moving picture experts group (MPEG) 4I frame encoding on compute unified device architecture (CUDA) |
CN104022756A (en) * | 2014-06-03 | 2014-09-03 | 西安电子科技大学 | Modified particle filter method based on GPU (Graphic Processing Unit) architecture |
CN105491377A (en) * | 2015-12-15 | 2016-04-13 | 华中科技大学 | Video decoding macro-block-grade parallel scheduling method for perceiving calculation complexity |
CN105956021A (en) * | 2016-04-22 | 2016-09-21 | 华中科技大学 | Automated task parallel method suitable for distributed machine learning and system thereof |
-
2017
- 2017-05-23 CN CN201710368717.5A patent/CN107231558B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2192781A2 (en) * | 2008-11-28 | 2010-06-02 | Thomson Licensing | Method for video decoding supported by graphics processing unit |
CN102404561A (en) * | 2010-09-14 | 2012-04-04 | 盛乐信息技术(上海)有限公司 | Method for achieving moving picture experts group (MPEG) 4I frame encoding on compute unified device architecture (CUDA) |
CN104022756A (en) * | 2014-06-03 | 2014-09-03 | 西安电子科技大学 | Modified particle filter method based on GPU (Graphic Processing Unit) architecture |
CN105491377A (en) * | 2015-12-15 | 2016-04-13 | 华中科技大学 | Video decoding macro-block-grade parallel scheduling method for perceiving calculation complexity |
CN105956021A (en) * | 2016-04-22 | 2016-09-21 | 华中科技大学 | Automated task parallel method suitable for distributed machine learning and system thereof |
Non-Patent Citations (2)
Title |
---|
A Parallel H.264 Encoder with CUDA: Mapping and Evaluation; Nan Wu, et al.; 2012 IEEE 18th International Conference on Parallel and Distributed Systems; 2013-01-17; pp. 276-281 *
High-Definition Video Coding Technology and Implementation Based on TMS320DM8168; Jiang Zhongbing, et al.; Journal of Data Acquisition and Processing; 2012-11-30; Vol. 27, No. 6; pp. 690-694 *
Also Published As
Publication number | Publication date |
---|---|
CN107231558A (en) | 2017-10-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107231558B (en) | A kind of implementation method of the H.264 parallel encoder based on CUDA | |
CN104869398B (en) | A kind of CABAC realized based on CPU+GPU heterogeneous platforms in HEVC parallel method | |
CN105491377B (en) | A kind of video decoded macroblock grade Method of Scheduling Parallel of computation complexity perception | |
CN102547296B (en) | Motion estimation accelerating circuit and motion estimation method as well as loop filtering accelerating circuit | |
CN100586180C (en) | Be used to carry out the method and system of de-blocking filter | |
CN101115207B (en) | Method and device for implementing interframe forecast based on relativity between future positions | |
CN108449603B (en) | Based on the multi-level task level of multi-core platform and the parallel HEVC coding/decoding method of data level | |
CN109495743A (en) | A kind of parallelization method for video coding based on isomery many places platform | |
CN105791829B (en) | A kind of parallel intra-frame prediction method of HEVC based on multi-core platform | |
CN101971633A (en) | A video coding system with reference frame compression | |
CN102625108B (en) | Multi-core-processor-based H.264 decoding method | |
CN105516728B (en) | A kind of parallel intra-frame prediction method of H.265/HEVC middle 8x8 sub-macroblock | |
CN102970531A (en) | Method for implementing near-lossless image compression encoder hardware based on joint photographic experts group lossless and near-lossless compression of continuous-tone still image (JPEG-LS) | |
CN103297777A (en) | Method and device for increasing video encoding speed | |
CN103747250A (en) | Method for 4*4 sub-macroblock parallel intraframe prediction in H.264/AVC | |
CN110337002A (en) | The multi-level efficient parallel decoding algorithm of one kind HEVC in multi-core processor platform | |
CN101635849B (en) | Loop filtering method and loop filter | |
CN1306826C (en) | Loop filter based on multistage parallel pipeline mode | |
CN107483948A (en) | Pixel macroblock processing method in a kind of webp compressions processing | |
CN109391816B (en) | Parallel processing method for realizing entropy coding link in HEVC (high efficiency video coding) based on CPU (Central processing Unit) and GPU (graphics processing Unit) heterogeneous platform | |
CN108540797A (en) | HEVC based on multi-core platform combines WPP coding methods within the frame/frames | |
CN111669580B (en) | Method, decoding end, encoding end and system for encoding and decoding | |
CN102196272A (en) | P frame encoding method and device | |
CN110446043A (en) | A kind of HEVC fine grained parallel coding method based on multi-core platform | |
CN104396246B (en) | Video compressing and encoding method and encoder |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
CB02 | Change of applicant information |
Address after: Rooms 307, 309 and 311, No. 959 Jiayuan Road, Yuanhe Street, Xiangcheng District, Suzhou City, Jiangsu Province Applicant after: Jiangsu Fire Interactive Technology Co., Ltd. Address before: No. 209 Zhuyuan Road, Suzhou High-tech Zone, Jiangsu Province, 215000 Applicant before: Jiangsu Fire Interactive Technology Co., Ltd. |
|
GR01 | Patent grant | ||
GR01 | Patent grant |