CN107231558B - Implementation method of a CUDA-based H.264 parallel encoder - Google Patents


Info

Publication number: CN107231558B (application number CN201710368717.5A)
Authority: CN (China)
Prior art keywords: encoder, variable, gpu, cuda, global variable
Legal status: Active (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Other versions: CN107231558A (in Chinese)
Inventor: 杨振
Current and original assignee: Jiangsu Fire Interactive Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Priority/filing date: 2017-05-23, priority to CN201710368717.5A (the priority date is an assumption, not a legal conclusion)
Publications: CN107231558A (application), CN107231558B (grant)

Classifications

CPC classifications (deduplicated; all under H04N19/00, methods or arrangements for coding, decoding, compressing or decompressing digital video signals):

    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region that is a block, e.g. a macroblock
    • H04N19/436 Implementation details or hardware specially adapted for video compression or decompression, using parallelised computational arrangements
    • H04N19/80 Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding


Abstract

The invention relates to an implementation method for a CUDA-based H.264 parallel encoder. The method comprises an optimization of the encoder's overall structure and the parallelization of each functional module on CUDA. The overall structure optimization includes frame-level separation of the encoder's functional modules and a division of tasks between the CPU and GPU. At the module level, the GPU parallelizes four processes of the encoder (inter prediction, intra predictive coding, entropy coding and deblocking filtering), realizing the encoder's parallelization on CUDA through the design of the parallel model, the storage model and so on.

Description

Implementation method of a CUDA-based H.264 parallel encoder
[technical field]
The invention belongs to the field of video coding, and in particular relates to an implementation method for a CUDA-based H.264 parallel encoder.
[background technique]
H.264/AVC is currently the most popular video coding standard and is widely adopted for its high image quality and high compression ratio. However, the improved image quality and coding efficiency come at the cost of greatly increased computational complexity. Existing serial encoders on general-purpose processors cannot reach the performance required for real-time high-definition encoding, while dedicated hardware has high development cost, long development cycles and poor generality, making it unsuitable for large-scale use. An efficient implementation method for the H.264 encoder is therefore needed.
[summary of the invention]
To solve the above problems in the prior art, the invention proposes an implementation method for a CUDA-based H.264 parallel encoder.
The technical solution adopted by the invention is as follows:
An implementation method for a CUDA-based H.264 parallel encoder, comprising the following steps:
(1) Adjust the H.264 encoder structure, including performing frame-level separation of the encoder's functional modules and dividing the encoder's tasks between the CPU and GPU;
(2) Run each functional module of the encoder in parallel on CUDA, i.e., at the module level, parallelize the four processes of the H.264 encoder: inter prediction, intra predictive coding, entropy coding and deblocking filtering.
Further, the frame-level separation of the functional modules includes the following steps:
(1.1) According to the functionality of the encoder's core function, separate each function within the core function into an independent loop body, so that each function iterates independently at the frame level;
(1.2) Divide the large data structures in the encoder into multiple simple data structures according to their life cycles, and localize them according to their actual life cycles.
Further, step 1.2 specifically includes:
Classifying each large data structure as one of three types: local variable, pseudo-global variable or true global variable;
(a) If the large data structure is a local variable, leave it unchanged;
(b) If the large data structure is a pseudo-global variable, split it by renaming into different variables according to its actual life cycles;
(c) If the large data structure is a true global variable, examine its data structure for member variables that are themselves pseudo-global or local; if any exist, separate them out of the true global variable and apply the processing of step (b) to the separated pseudo-global variables.
Further, the task division between CPU and GPU includes:
(2.1) The CPU completes the input of the video file and pre-processes it;
(2.2) The CPU transfers the original frames and reference frames of the video file to the GPU, which carries out the subsequent encoding operations;
(2.3) The GPU performs inter prediction;
(2.4) The GPU performs intra predictive coding;
(2.5) The GPU performs parallelized entropy coding;
(2.6) The GPU performs deblocking filtering.
Further, the inter prediction uses the multi-resolution multi-window (MRMW) algorithm.
Further, during intra predictive coding, data is loaded in a read-once, process-many manner: each thread block loads into its corresponding shared memory the data needed to process multiple macroblocks, and inside the CUDA kernel a loop performs predictive coding on these data; after the processing of one read batch finishes, the reconstructed data is written back, and new data is then loaded for processing. The corresponding kernel is organized as a double loop: the outer loop control variable corresponds to the number of loads, and the inner loop control variable corresponds to the number of data items processed per load.
Further, processing inside the kernel is per macroblock; each macroblock comprises multiple sub-macroblocks, and intra predictive coding includes three stages:
First stage: each sub-macroblock is handled by one thread of the intra-prediction thread block for intra-prediction processing;
Second stage: one thread of the DCT thread block performs DCT processing on one row or column of pixels within a sub-macroblock;
Third stage: one thread of the quantization thread block performs quantization on one pixel.
Further, during parallelized entropy coding, each CUDA thread block processes 8 consecutive macroblocks, and each thread processes the entropy coding of one sub-macroblock.
Further, the deblocking filtering operates per frame and includes the calculation of boundary strengths and the filtering itself.
Further, the pre-processing includes separating the video into its YUV components and setting the encoder's basic parameters.
The beneficial effect of this method is that it improves the execution efficiency of the H.264 encoder, reducing the computational complexity of encoding and increasing encoding speed without reducing coding performance.
[Brief description of the drawings]
The drawings described herein are intended to provide a further understanding of the invention and constitute part of this application, but do not constitute an improper limitation of the invention. In the drawings:
Fig. 1 is a schematic diagram of the division of the core function's loop body according to the invention.
Fig. 2 is a schematic diagram of the simplification and localization of data structures according to the invention.
Fig. 3 is the CPU-GPU task division diagram of the invention.
Fig. 4 is the storage model of intra predictive coding according to the invention.
Fig. 5 is the CUDA parallel model of the CAVLC coding stage according to the invention.
Fig. 6 is a schematic diagram of the separation of the deblocking filter function.
[Specific embodiments]
The invention is described in detail below with reference to the drawings and specific embodiments. The illustrative examples and descriptions therein are only used to explain the invention and do not limit it.
The invention is based on the serial H.264 program x264. Based on an analysis of this program, and in accordance with the CUDA architecture, the invention proposes a parallel H.264 encoder framework and a method for realizing the parallel H.264 encoder on CUDA. The method includes the following two aspects:
(1) Overall structure optimization
The overall structure optimization adjusts the H.264 encoder structure and designs the framework of the CUDA-based H.264 parallel encoder. The adjustment and design mainly comprise two aspects: frame-level separation of the encoder's functional modules, and task division between the CPU and GPU.
(2) Parallelization of each functional module on CUDA
At the module level, the four processes of the H.264 encoder (inter prediction, intra predictive coding, entropy coding and deblocking filtering) are parallelized; the parallelization of the encoder on CUDA is realized through the design of the parallel model, the storage model and so on.
The two aspects of the method are described in detail below.
Frame-level separation of the functional modules:
The specific steps of the frame-level separation are as follows:
(1.1) Loosening the function coupling
In the H.264 encoder, the core function (the main function) is one large loop body. As shown in the upper part of Fig. 1, A is the main function; it comprises all of the functions below it (D1', ..., D5, D6, ..., D7, E1, E2, E3, E4, E5) as a single large loop body, and each iteration of the main function executes all of these functions. This makes the loop body path long; if a parallel program were developed on it directly, the function load would be too heavy.
Therefore, according to the functionality of the core function, the invention divides the entire loop of the core function into multiple relatively independent loop bodies, as shown in the lower part of Fig. 1, so that each function iterates independently at the frame level. Each of D1', ..., D5, D6, ..., D7, E1, E2, E3, E4, E5 becomes an independent loop body: the D1' function is one loop body, the D5 function is one loop body, the D7 function is one loop body, the E1 function is one loop body, and so on. In this way each function focuses independently on one task and iterates independently; during the execution of each loop body the instruction locality is better and the cache miss count is lower.
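The restructuring above can be sketched in plain C++ (the stage names are hypothetical stand-ins for the encoder's functions; logging replaces the real work):

```cpp
#include <string>
#include <vector>

// Before separation: one long loop body runs every stage per iteration,
// so the instruction working set spans the whole encoder.
std::vector<std::string> runMonolithic(int frames) {
    std::vector<std::string> log;
    for (int f = 0; f < frames; ++f) {
        log.push_back("inter");
        log.push_back("intra");
        log.push_back("entropy");
        log.push_back("deblock");
    }
    return log;
}

// After frame-level separation: each stage gets its own loop over all
// frames, so each loop body is short and its instruction locality improves.
std::vector<std::string> runFrameLevelSplit(int frames) {
    std::vector<std::string> log;
    for (int f = 0; f < frames; ++f) log.push_back("inter");
    for (int f = 0; f < frames; ++f) log.push_back("intra");
    for (int f = 0; f < frames; ++f) log.push_back("entropy");
    for (int f = 0; f < frames; ++f) log.push_back("deblock");
    return log;
}
```

Both variants do the same total work; only the loop structure, and therefore the locality of each loop body, differs.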
(1.2) Simplifying and localizing the data structures in the H.264 encoder
Referring to Fig. 2: in order to reduce data transfer time, the invention divides the large data structures in the encoder into multiple simple data structures according to their life cycles, and localizes them according to their actual life cycles. Specifically, the large data structures can be classified into three types: local variables, pseudo-global variables and true global variables.
A local variable, such as variable A inside function 0 in Fig. 2, is left unchanged.
A pseudo-global variable B is a global variable whose scope can nevertheless be split into multiple actual life cycles; by renaming, it is divided into different variables according to those life cycles. As shown in Fig. 2, the values of the pseudo-global variable B used in function 0 and function 1 are unrelated to its use elsewhere, so B can be split into two life cycles: the pseudo-global variable is renamed B0 for function 0 and function 1, and since the old B is then a separate live range, it is simply redefined where function 2 needs it.
For a true global variable C, its data structure must be examined for member variables that are themselves pseudo-global or local; if any exist, they are separated out of C, and the separated pseudo-global variables are processed as above. As shown in Fig. 2, the true global variable C can be split into a pseudo-global variable and a local variable; the scope of the pseudo-global variable is then limited to function 0 and function 1, and the scope of the local variable C0 is limited to function 2.
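A minimal C++ sketch of the renaming step, with hypothetical variables and functions: a "pseudo-global" B whose two uses never exchange a value is split into a narrower variable B0 plus a local, so each live range is independent.

```cpp
// Before localization: one pseudo-global B shared by every function,
// even though the value written in func2 is unrelated to func0/func1.
namespace before_localization {
    int B;
    int func0() { B = 10; return B; }        // first live range begins here
    int func1() { return B + 1; }            // ...and ends here
    int func2() { B = 100; return B * 2; }   // unrelated redefinition of B
}

// After localization: the first live range becomes B0 (scope limited to
// func0/func1), and the second live range becomes a local inside func2.
namespace after_localization {
    int B0 = 0;
    int func0() { B0 = 10; return B0; }
    int func1() { return B0 + 1; }
    int func2() { int B = 100; return B * 2; }
}
```

The observable behavior is unchanged; what improves is that each variable now lives only where it is actually used, which is what allows the data to be staged locally (e.g., in GPU shared memory) instead of kept global.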
Task division between CPU and GPU
Fig. 3 shows the task division of the functional modules of the H.264 encoder between the CPU and GPU according to the invention, as well as the data movement between CPU and GPU.
(2.1) The CPU first completes the input of the video file and pre-processes it, including separating the video into its YUV components and setting the encoder's basic parameters.
(2.2) The CPU transfers the original frames and reference frames to the GPU, which carries out the subsequent encoding operations.
The GPU processes frames by executing the four modules on a per-frame basis. The basic flow is: after the inter prediction of a frame finishes, the corresponding intra predictive coding is performed; entropy coding is then applied to the resulting transform coefficients, and so on; after the entropy coding and deblocking filtering of the whole frame finish, the result data is passed back to the CPU.
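The per-frame flow above can be written down as an ordered trace (the step strings are paraphrases of the description, not API calls of any real library):

```cpp
#include <string>
#include <vector>

// Ordered CPU/GPU steps for encoding one frame, following the task
// division described in the text.
std::vector<std::string> encodeFrameTrace() {
    return {
        "cpu: read video, split YUV, set encoder parameters",
        "cpu->gpu: transfer original frame and reference frame",
        "gpu: inter prediction (MRMW)",
        "gpu: intra predictive coding",
        "gpu: parallelized entropy coding",
        "gpu: deblocking filtering",
        "gpu->cpu: pass result data back"
    };
}
```

Only the two endpoints of the trace touch the CPU; everything between the transfers stays on the GPU, which is what keeps host-device traffic down to one round trip per frame.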
(2.3) The GPU performs inter prediction.
Inter prediction is the most computationally demanding part of the H.264 encoder: conventional inter prediction accounts for about 70% of the computation of the entire encoder, and while it yields good image quality, it is complex. The invention performs inter prediction using the prior-art multi-resolution multi-window (MRMW) algorithm. Because the invention has performed frame-level separation of the functional modules, using the MRMW algorithm can greatly reduce the inter prediction time relative to the prior art.
(2.4) The GPU performs intra predictive coding.
The degree of parallelism of intra prediction is not high: the maximum amount of data a CUDA thread block can process at once is one macroblock (256 pixels), so the pressure on shared memory is small, but there are producer-consumer relationships between adjacent macroblocks. To reduce the number of accesses to related data in global memory, the invention loads data in a read-once, process-many manner: each thread block loads into its corresponding shared memory the data needed to process multiple macroblocks, and inside the CUDA kernel one loop level performs predictive coding on these data; after the processing of one read batch finishes, the reconstructed data is written back, and new data is loaded for processing. The corresponding kernel is organized as a double loop: the outer loop control variable corresponds to the number of loads, and the inner loop control variable corresponds to the number of data items processed per load.
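The double-loop organization can be sketched as host-side C++ (a stand-in for the kernel; the strip size is a hypothetical parameter, and a counter stands in for the actual predictive coding):

```cpp
#include <algorithm>

// Outer loop: one iteration per shared-memory load (one strip).
// Inner loop: one iteration per macroblock in the currently loaded strip.
int intraProcessFrame(int totalMBs, int mbsPerLoad) {
    int processed = 0;
    const int loads = (totalMBs + mbsPerLoad - 1) / mbsPerLoad; // outer trip count
    for (int l = 0; l < loads; ++l) {
        // (kernel would load one strip of macroblocks into shared memory here)
        const int inStrip = std::min(mbsPerLoad, totalMBs - l * mbsPerLoad);
        for (int m = 0; m < inStrip; ++m) {
            ++processed;  // stand-in for intra prediction + reconstruction
        }
        // (kernel would write the reconstructed data back here,
        //  then the next outer iteration reloads)
    }
    return processed;
}
```

The ceiling division in `loads` covers the last, possibly partial, strip, so every macroblock is processed exactly once regardless of whether the frame size is a multiple of the strip size.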
Referring to Fig. 4, which illustrates the storage model of intra predictive coding: the upper left of Fig. 4 shows an image frame composed of multiple macroblocks (MB). Each time frame data is read, one strip is read from the original image frame and stored in shared memory (as shown in the upper right of Fig. 4); inside the kernel, the strip is processed per macroblock.
The middle and lower parts of Fig. 4 show the kernel's processing of one macroblock. The left part of Fig. 4 shows a macroblock of 4x4 sub-macroblocks, comprising sub-macroblock 0 through sub-macroblock 15, each sub-macroblock containing 4x4 pixels. Intra predictive coding comprises three stages:
First stage: as shown in the left and lower-left parts of Fig. 4, each sub-macroblock is handled by one thread of the intra-prediction thread block (prediction thread block), requiring 16 threads in total (thread 0 through thread 15).
Second stage: as shown in the middle and lower-middle parts of Fig. 4, one thread of the DCT thread block performs DCT processing on one row or column of pixels within a sub-macroblock, requiring 64 threads in total (thread 0 through thread 63).
Third stage: as shown in the right and lower-right parts of Fig. 4, one thread of the quantization thread block (quant thread block) performs quantization on one pixel (in row-major order), requiring 256 threads in total (thread 0 through thread 255).
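The three thread budgets follow directly from the macroblock geometry (a 16x16 macroblock split into 16 sub-macroblocks of 4x4 pixels); a small C++ check of that arithmetic:

```cpp
// Thread counts for the three intra-coding stages of one 16x16 macroblock.
struct StageThreads { int predict; int dct; int quant; };

StageThreads threadsPerMacroblock() {
    const int subMBs   = 16;          // stage 1: one thread per 4x4 sub-macroblock
    const int dctLanes = subMBs * 4;  // stage 2: one thread per row/column of a sub-MB
    const int pixels   = 16 * 16;     // stage 3: one thread per pixel, row-major
    return StageThreads{subMBs, dctLanes, pixels};
}
```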
(2.5) The GPU performs parallelized entropy coding.
Fig. 5 shows the CUDA parallel model of the CAVLC coding stage, i.e., the mapping between data and threads in the luma AC component entropy coding stage. Each CUDA thread block processes 8 consecutive macroblocks: thread block B0 processes MB0 through MB7 in row 0, thread block B14 processes MB112 through MB119, and so on. Within a thread block, 16 consecutive threads process the 16 sub-macroblocks of one macroblock. In Fig. 5 there are 1020 thread blocks of 128 threads each, for a total of 130560 threads, each thread processing the entropy coding of one sub-macroblock, thus realizing 130560-thread parallel entropy coding. Although entropy coding is branch-intensive, the frame-level separation of the functional modules has separated the various components and eliminated some branch paths, and the large-scale data parallelism realized by the large number of threads is enough to compensate for the impact of the branch operations.
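The CAVLC data-to-thread mapping can be sketched as index arithmetic in C++ (the block/thread indices mimic CUDA's `blockIdx.x`/`threadIdx.x`):

```cpp
// Each block of 128 threads covers 8 consecutive macroblocks; 16
// consecutive threads handle the 16 sub-macroblocks of one macroblock.
struct SubMBRef { int macroblock; int subMacroblock; };

SubMBRef mapCavlcThread(int blockIdx, int threadIdx) {
    return SubMBRef{ blockIdx * 8 + threadIdx / 16, threadIdx % 16 };
}
```

With this mapping, block B0 covers MB0 through MB7 and block B14 covers MB112 through MB119, as in Fig. 5. Assuming a 1920x1088 frame (8160 macroblocks), 8160/8 = 1020 blocks of 128 threads give the 130560 sub-macroblock threads quoted above.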
(2.6) The GPU performs deblocking filtering. As shown in Fig. 6, the deblocking filtering operates per frame and includes the calculation of boundary strengths and the filtering itself.
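The boundary-strength step mentioned above can be illustrated with a simplified H.264-style decision for one edge between two neighbouring 4x4 blocks p and q. This is a sketch, not the patent's implementation: the standard's full rule also inspects reference pictures and exact motion-vector differences.

```cpp
// Simplified boundary-strength (Bs) decision for one 4x4 block edge.
int boundaryStrength(bool mbEdge, bool pIntra, bool qIntra,
                     bool pCoded, bool qCoded, bool mvDiffLarge) {
    if (pIntra || qIntra)
        return mbEdge ? 4 : 3;   // strongest filtering at intra macroblock edges
    if (pCoded || qCoded)
        return 2;                // either side has coded residual coefficients
    if (mvDiffLarge)
        return 1;                // significant motion discontinuity
    return 0;                    // Bs = 0: edge is not filtered
}
```

Because each edge's Bs depends only on the two adjacent blocks, the Bs computation for a whole frame is naturally data-parallel, which is why it maps well onto one GPU thread per edge.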
Through the above process, the invention realizes the parallelization of H.264 on CUDA at both the system level and the module level, reducing the computational complexity of encoding without reducing coding performance, and increasing encoding speed.
The above are only preferred embodiments of the invention; all equivalent changes or modifications made according to the structure, features and principles described in the patent application scope of the invention are included in the scope of the present patent application.

Claims (6)

1. An implementation method for a CUDA-based H.264 parallel encoder, characterized in that the method comprises the following steps:
(1) adjusting the H.264 encoder structure, including performing frame-level separation of the encoder's functional modules and dividing the encoder's tasks between the CPU and GPU;
(2) running each functional module of the encoder in parallel on CUDA, i.e., parallelizing, at the module level, the four processes of the H.264 encoder: inter prediction, intra predictive coding, entropy coding and deblocking filtering;
the frame-level separation of the functional modules comprising the following steps:
(1.1) according to the functionality of the encoder's core function, separating each function within the core function into an independent loop body, so that each function iterates independently at the frame level;
(1.2) dividing the large data structures in the encoder into multiple simple data structures according to their life cycles, and localizing them according to their actual life cycles;
step 1.2 specifically comprising:
classifying each large data structure as one of three types: local variable, pseudo-global variable or true global variable, a pseudo-global variable being a global variable whose scope can nevertheless be split into multiple actual life cycles;
(a) if the large data structure is a local variable, leaving it unchanged;
(b) if the large data structure is a pseudo-global variable, splitting it by renaming into different variables according to its actual life cycles;
(c) if the large data structure is a true global variable, examining its data structure for member variables that are pseudo-global or local and, if any exist, separating these variables out of the true global variable and applying the processing of step (b) to the separated pseudo-global variables;
wherein, during intra predictive coding, data is loaded in a read-once, process-many manner, i.e., each thread block loads into its corresponding shared memory the data needed to process multiple macroblocks, and inside the CUDA kernel one loop level performs predictive coding on these data; after the processing of one read batch finishes, the reconstructed data is written back and new data is loaded for processing; the corresponding kernel is organized as a double loop, the outer loop control variable corresponding to the number of loads and the inner loop control variable corresponding to the number of data items processed per load;
wherein processing inside the kernel is per macroblock, each macroblock comprising multiple sub-macroblocks, and intra predictive coding comprises three stages:
first stage: each sub-macroblock is handled by one thread of the intra-prediction thread block for intra-prediction processing;
second stage: one thread of the DCT thread block performs DCT processing on one row or column of pixels within a sub-macroblock;
third stage: one thread of the quantization thread block performs quantization on one pixel.
2. The method according to claim 1, characterized in that the task division between CPU and GPU comprises:
(2.1) the CPU completing the input of the video file and pre-processing it;
(2.2) the CPU transferring the original frames and reference frames of the video file to the GPU, which carries out the subsequent encoding operations;
(2.3) the GPU performing inter prediction;
(2.4) the GPU performing intra predictive coding;
(2.5) the GPU performing parallelized entropy coding;
(2.6) the GPU performing deblocking filtering.
3. The method according to claim 2, characterized in that the inter prediction uses a multi-resolution multi-window (MRMW) algorithm.
4. The method according to any one of claims 2-3, characterized in that during parallelized entropy coding each CUDA thread block processes 8 consecutive macroblocks, and each thread processes the entropy coding of one sub-macroblock.
5. The method according to any one of claims 2-3, characterized in that the deblocking filtering operates per frame and comprises the calculation of boundary strengths and the filtering.
6. The method according to any one of claims 2-3, characterized in that the pre-processing comprises separating the video into its YUV components and setting the encoder's basic parameters.
CN201710368717.5A 2017-05-23 2017-05-23 Implementation method of a CUDA-based H.264 parallel encoder Active CN107231558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710368717.5A CN107231558B (en) 2017-05-23 2017-05-23 Implementation method of a CUDA-based H.264 parallel encoder


Publications (2)

Publication Number Publication Date
CN107231558A CN107231558A (en) 2017-10-03
CN107231558B true CN107231558B (en) 2019-10-22

Family

ID=59933794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710368717.5A Active CN107231558B (en) 2017-05-23 2017-05-23 Implementation method of a CUDA-based H.264 parallel encoder

Country Status (1)

Country Link
CN (1) CN107231558B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108012156B (en) * 2017-11-17 2020-09-25 深圳市华尊科技股份有限公司 Video processing method and control platform
WO2021042232A1 (en) * 2019-09-02 2021-03-11 Beijing Voyager Technology Co., Ltd. Methods and systems for improved image encoding
CN110677646B (en) * 2019-09-24 2022-01-11 杭州当虹科技股份有限公司 Intra-frame coding prediction method based on CPU + GPU hybrid coding
CN114765684B (en) * 2021-01-12 2023-05-09 四川大学 JPEG parallel entropy coding method based on GPU
CN115802055B (en) * 2023-01-30 2023-06-20 孔像汽车科技(武汉)有限公司 Image defogging processing method and device based on FPGA, chip and storage medium
CN116483545B (en) * 2023-06-19 2023-09-29 支付宝(杭州)信息技术有限公司 Multitasking execution method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2192781A2 (en) * 2008-11-28 2010-06-02 Thomson Licensing Method for video decoding supported by graphics processing unit
CN102404561A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Method for achieving moving picture experts group (MPEG) 4I frame encoding on compute unified device architecture (CUDA)
CN104022756A (en) * 2014-06-03 2014-09-03 西安电子科技大学 Modified particle filter method based on GPU (Graphic Processing Unit) architecture
CN105491377A (en) * 2015-12-15 2016-04-13 华中科技大学 Video decoding macro-block-grade parallel scheduling method for perceiving calculation complexity
CN105956021A (en) * 2016-04-22 2016-09-21 华中科技大学 Automated task parallel method suitable for distributed machine learning and system thereof


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Parallel H.264 Encoder with CUDA: Mapping and Evaluation"; Nan Wu, et al.; 2012 IEEE 18th International Conference on Parallel and Distributed Systems; published 2013-01-17; pp. 276-281 *
"基于TMS320DM8168的高清视频编码技术与实现" (High-definition video coding technology and implementation based on TMS320DM8168); 姜忠兵, et al.; 《数据采集与处理》 (Journal of Data Acquisition and Processing); Nov. 2012; vol. 27, no. 6; pp. 690-694 *

Also Published As

Publication number Publication date
CN107231558A (en) 2017-10-03


Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Address after: Room 307, 309 and 311, Room 959, Jiayuan Road, Yuanhe Street, Xiangcheng District, Suzhou City, Jiangsu Province
    Address before: High tech Zone Suzhou city Jiangsu province 215000 Chuk Yuen Road No. 209
    Applicant (before and after): Jiangsu fire Interactive Technology Co., Ltd.
GR01: Patent grant