CN101860752A - Video code stream parallelization method for embedded multi-core system - Google Patents

Info

Publication number
CN101860752A
Authority
CN
China
Prior art keywords
node
data
parallel
data flow
flow diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010166248
Other languages
Chinese (zh)
Other versions
CN101860752B (en)
Inventor
徐志远
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN 201010166248
Publication of CN101860752A
Application granted
Publication of CN101860752B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a pipeline parallelization method for video coding on an embedded multi-core system, comprising the following steps: on the basis of a data flow graph representation of the video coding source program at the level of the basic data unit it processes, simulating the target video coding source program and extracting the computation workload of each node in the data flow graph; analyzing the dependences among the nodes of the data flow graph and the data dependences that arise when basic data units are processed after pipeline parallelization; and, according to the node workloads and inter-node dependences thus obtained, selecting a pipeline partitioning scheme that balances processor load and keeps inter-core communication low. Once the partitioning scheme is obtained, the task nodes are encapsulated according to the interface standard of the indicators, the encapsulated objects are statically mapped to the corresponding processor cores, and, in cooperation with the indicator on each processor core, the multi-core system executes the video coding source program in a pipeline-parallel manner.

Description

A pipeline parallelization method for video coding on embedded multi-core systems
Technical Field
The present invention relates to parallel programming for media applications, and in particular proposes a pipeline parallelization method for video coding programs on embedded multi-core systems.
Background Art
The advantages of the multi-processor system-on-chip (MPSoC: Multi-Processor System-on-Chip) in computing performance, power consumption, chip area, and real-time behavior have made it increasingly common in the embedded field. However, developing efficient parallel programs for an MPSoC with reasonable effort remains a challenge for both application developers and system designers.
Media video applications are a widely used and computationally intensive class of programs in the embedded field. A media program is a typical data stream processing program: source data arriving in sequence passes through several processing stages, and the result is then output. There are two strategies for parallelizing media video applications. (1) Data-parallel partitioning: source data that have no dependences on one another during processing are assigned to different processors, so that the cores execute in parallel. (2) Task-parallel partitioning: the processing of the basic unit, the macroblock, is divided into several stages, and each processor is responsible for only one stage. After a processor finishes its assigned stage for the current macroblock, it passes the result to the next processor, which then begins its computation. Several processors thus cooperate to process macroblocks in a pipelined fashion, achieving parallel speedup.
For video encoders, existing task-parallel partitioning methods have the following main limitations: (1) there is no theoretical basis for determining the task pipeline partitioning scheme; (2) the analysis of the task pipeline partitioning scheme does not explicitly address how the data dependences that arise when different source data are processed affect the task pipeline; (3) there is no simple and efficient mechanism for task synchronization and scheduling. The task-parallel method proposed by the present invention avoids these limitations: it balances processor load by changing how tasks are divided among pipeline stages, thereby improving resource utilization.
Summary of the Invention
The invention provides a task pipeline parallelization method for video encoders on embedded multi-core platforms. The method has two main parts: (1) the parallel partitioning of the original program; (2) the mapping of the resulting tasks/threads to processors and their parallel scheduling and execution.
The goals of parallel partitioning are to raise the speedup of the parallelized program, to expose the parallelism of the serial program, to schedule tasks so that processor waiting overhead is minimized, and to optimize the resource utilization of the parallel system.
The task pipeline parallelization method for video encoders on embedded multi-core platforms proposed by the present invention comprises the following steps:
(1) For the target video coding source program, obtain a coarse-grained data flow graph representation of the program at the level of the basic data unit on which pipeline parallelization is to be performed;
The basic data unit is chosen according to the processing flow of the original program. For a video coding program, the macroblock can be chosen as the basic data processing unit.
(2) Simulate the target video coding source program and extract the computation workload of each node in the data flow graph;
Before the target video coding source program is partitioned, target video sequences are selected and simulated dynamically, and the computation workload of each node in the data flow graph obtained in step (1) is recorded as the basis for determining the parallel partitioning scheme.
(3) Analyze the dependences among the nodes of the data flow graph and the data dependences that arise when basic data units are processed after pipeline parallelization;
The data dependences between nodes of the data flow graph are determined by dynamic simulation, and the volume of data that must be transferred between nodes because of these dependences is measured as the basis for estimating the additional data communication overhead of a pipeline-parallel scheme.
(4) Obtain a pipeline partitioning scheme according to the parallelism of the tasks across cores and the inter-core data communication overhead. If the scheme meets the requirements of the parallel system, proceed to step (5); otherwise, split or merge nodes of the original data flow graph and return to step (2);
The node workload statistics obtained in step (2) and the inter-node data traffic statistics obtained in step (3) are annotated on the data flow graph obtained in step (1), yielding a synchronous data flow graph representation of the original program. A parallel partitioning scheme is then selected according to the synchronous data flow graph and the number of processors on the target multi-core platform. The selection follows two guidelines: (1) the workloads of the pipeline stages should be as equal as possible, to limit the loss of parallel efficiency caused by unbalanced processor load; (2) according to the data dependences between nodes of the data flow graph, nodes that exchange large volumes of data should be assigned to the same processor, to reduce the overhead of inter-core data communication. Finally, the partitioning scheme with the highest overall parallel speedup and resource utilization is chosen.
(5) According to the partitioning scheme obtained in step (4), encapsulate the task nodes as objects conforming to the interface standard of the indicator, map the encapsulated objects statically to the corresponding processor cores, and, in cooperation with the indicator on each processor core, have the multi-core system execute the video coding source program in a pipeline-parallel manner;
Each encapsulated object (Object) comprises a behavior function and several input/output ports, and each input/output port corresponds to an input/output buffer. Each port has a flag bit that represents its state, and all the flags together form a semaphore entry (SemaEntry); the indicator manages objects by querying and updating semaphore entries. When all input and output ports of an object are ready, the indicator schedules the object to execute its behavior function, which takes data from the input buffers, processes them, and delivers the results to the output buffers. Data communication between different processors is carried out by the indicators.
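The object and indicator mechanism described in step (5) can be illustrated with a minimal single-core sketch. The class names `Port`, `TaskObject`, and `Indicator`, the buffer capacity of one, and the two toy behavior functions are assumptions made for illustration; the patent does not give a concrete interface.

```python
from collections import deque


class Port:
    """A port backed by a bounded FIFO buffer shared by producer and consumer."""

    def __init__(self, capacity=1):
        self.capacity = capacity
        self.buffer = deque()

    def has_data(self):
        return len(self.buffer) > 0

    def has_space(self):
        return len(self.buffer) < self.capacity


class TaskObject:
    """An encapsulated task: a behavior function plus input and output ports."""

    def __init__(self, name, behavior, inputs, outputs):
        self.name = name
        self.behavior = behavior  # must return one value per output port
        self.inputs = inputs
        self.outputs = outputs

    def sema_entry(self):
        # One flag per port: an input is ready when it holds data,
        # an output is ready when its buffer has free space.
        return ([p.has_data() for p in self.inputs]
                + [p.has_space() for p in self.outputs])


class Indicator:
    """Fires every object whose semaphore entry shows all ports ready."""

    def __init__(self, objects):
        self.objects = objects

    def step(self):
        fired = []
        for obj in self.objects:
            if all(obj.sema_entry()):
                args = [p.buffer.popleft() for p in obj.inputs]
                results = obj.behavior(*args)
                for port, value in zip(obj.outputs, results):
                    port.buffer.append(value)
                fired.append(obj.name)
        return fired


# Two chained objects: "double" feeds "inc" through the shared port `mid`.
src, mid, dst = Port(), Port(), Port()
double = TaskObject("double", lambda x: (2 * x,), [src], [mid])
inc = TaskObject("inc", lambda x: (x + 1,), [mid], [dst])
indicator = Indicator([inc, double])

src.buffer.append(3)
first = indicator.step()   # only "double" has all ports ready
second = indicator.step()  # now "inc" has input data and can fire
```

Sharing one `Port` between a producer's output and a consumer's input is a simplification: on a real multi-core platform the indicators on different cores would have to move the data between separate buffers.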
The present invention proposes a pipeline parallelization method for media applications and uses a video encoder as the example for quantitative parallelization analysis on a multi-core system. The implementation approach is generally applicable to programs of the data stream processing type.
Brief Description of the Drawings
Fig. 1 is a flowchart of an embodiment of the invention;
Fig. 2 is a coarse-grained data flow graph of an embodiment of the invention;
Fig. 3 is a schematic diagram of the relationship between the current macroblock and adjacent macroblocks in an embodiment of the invention;
Fig. 4 is a flowchart of MPEG-4 Simple Profile encoding in an embodiment of the invention;
Fig. 5 is the data flow graph for MPEG-4 encoding of a P-type macroblock in an embodiment of the invention;
Fig. 6 is a schematic diagram of the multi-core system platform structure of an embodiment of the invention;
Fig. 7 is a schematic diagram of the three-core parallel partitioning and mapping scheme of an embodiment of the invention;
Fig. 8 is a schematic diagram of the five-core parallel partitioning and mapping scheme of an embodiment of the invention;
Fig. 9 is a pipeline diagram of the three-core parallel scheme of an embodiment of the invention;
Fig. 10 is a pipeline diagram of the five-core parallel scheme of an embodiment of the invention.
Detailed Description
With reference to the accompanying drawings, an implementation of the multiprocessor pipeline parallelization method for media applications according to the invention is described below, using the task pipeline partitioning of an MPEG-4 Simple Profile encoder as the example.
The flow of the pipeline parallelization method for video coding programs on embedded multi-core systems proposed by the invention is shown in Fig. 1.
The parallelism of the serial program is analyzed on the basis of a data flow graph, as shown in Fig. 2. A typical data flow graph consists of nodes and directed arcs that represent the connections between them. A node becomes ready, and can be executed, as soon as all of its inputs are available, and the results it produces serve as inputs to its successor nodes. A node can be a program block or a function call. Given enough processors, all nodes that are in the ready state can be scheduled for execution at the same time, so the data flow model is inherently suited to exposing the parallelism of a program. The directed arcs of the data flow graph represent data dependences between nodes, and nodes that have a data dependence cannot execute simultaneously. Data dependences fall into two classes. (1) Data dependences that arise when the same source data is processed, which the invention calls forward dependences (FD: Forward Dependence); the black dots on the arcs in Fig. 2 denote this kind of dependence. (2) Data dependences that arise when different source data are processed, which the invention calls backward dependences (BD: Backward Dependence); M in Fig. 2 denotes this kind of dependence.
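The firing rule of the data flow model (a node may run once all of its inputs are available) can be sketched as follows. The function name `ready_waves` and the four-node graph are assumptions: Fig. 2 is not reproduced here, so the edges below are only a plausible example and not the patent's actual graph.

```python
def ready_waves(forward_deps):
    """Group nodes into waves that could execute concurrently.

    forward_deps maps each node to the set of nodes whose output it consumes
    while processing the same source data (its forward dependences).
    """
    done, waves = set(), []
    remaining = set(forward_deps)
    while remaining:
        # A node is ready when every producer it depends on has finished.
        wave = sorted(n for n in remaining if forward_deps[n] <= done)
        if not wave:
            raise ValueError("cycle in forward dependences")
        waves.append(wave)
        done.update(wave)
        remaining.difference_update(wave)
    return waves


# Assumed example graph: FA feeds FB and FC, which both feed FD.
deps = {"FA": set(), "FB": {"FA"}, "FC": {"FA"}, "FD": {"FB", "FC"}}
waves = ready_waves(deps)  # [['FA'], ['FB', 'FC'], ['FD']]
```

In this example FB and FC land in the same wave because neither consumes the other's output, which is the situation in which two nodes can run in parallel on the same source data.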
Conventional parallelism analysis considers the restrictions imposed by forward dependences. In Fig. 2, for example, nodes F_B and F_C have no forward dependence between them and can therefore execute in parallel, whereas nodes F_A and F_B have a producer-consumer data dependence d_AB when the same source data is processed and cannot execute in parallel. If F_A and F_B have no data dependence when processing different source data, however, they can be arranged as a pipeline: while F_A processes the current input data, F_B simultaneously processes the input data that F_A handled in the previous time step. This removes the restriction imposed by the forward dependence on the same source data and achieves task pipeline parallelism.
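The pipelining of two forward-dependent nodes can be made concrete with a small schedule generator. This is an idealized sketch: the function name `pipeline_schedule` is hypothetical, every stage is assumed to take exactly one time step, and no backward dependences are modeled.

```python
def pipeline_schedule(stages, num_items):
    """Return, for each time step, the item index each stage works on.

    Stage k processes item t - k at time step t; None means the stage is idle
    (pipeline fill at the start, drain at the end).
    """
    steps = num_items + len(stages) - 1
    return [
        {s: (t - k if 0 <= t - k < num_items else None)
         for k, s in enumerate(stages)}
        for t in range(steps)
    ]


schedule = pipeline_schedule(["FA", "FB"], 3)
# schedule[1] == {"FA": 1, "FB": 0}: FA handles item 1 while FB handles item 0
```

With two stages and three items the schedule has four time steps instead of the six that strictly serial execution of FA and FB would need.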
The task pipeline scheme should be chosen so as to minimize the synchronization waiting overhead caused by backward dependences and thus improve system performance. As shown in Fig. 2, if nodes F_B and F_D are in different pipeline stages, then because of the backward dependence M, F_B cannot start processing the next source data until F_D has finished processing the current source data. The pipeline stalls, and system performance drops.
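The cost of such a stall can be estimated with a small timing model. The sketch below is an assumption-laden illustration: the function name `makespan`, the three equal-length stages, and the encoding of a backward dependence as a `(producer_stage, consumer_stage)` pair are all hypothetical and not taken from the patent.

```python
def makespan(stage_times, num_items, backward=None):
    """Finish time of the last item in a linear pipeline.

    stage_times[s] is the processing time of stage s for one item.
    backward=(producer, consumer) means the consumer stage cannot start
    item i until the producer stage has finished item i - 1.
    """
    finish = [[0] * num_items for _ in stage_times]
    for i in range(num_items):
        for s, t in enumerate(stage_times):
            start = 0
            if s > 0:  # forward dependence on the previous stage, same item
                start = max(start, finish[s - 1][i])
            if i > 0:  # a stage handles one item at a time
                start = max(start, finish[s][i - 1])
                if backward is not None and s == backward[1]:
                    start = max(start, finish[backward[0]][i - 1])
            finish[s][i] = start + t
    return finish[-1][-1]


free_running = makespan([1, 1, 1], 4)                # 6 time units
stalled = makespan([1, 1, 1], 4, backward=(2, 0))    # 12 time units
harmless = makespan([1, 1, 1], 4, backward=(0, 2))   # 6 time units
```

In this toy model a backward dependence from the last stage to the first one serializes the pipeline completely, whereas a backward dependence whose producer stage precedes its consumer stage costs nothing, because the producer has always finished the previous item by the time the consumer needs it.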
To improve the resource utilization of the parallel system, it is necessary to measure the computation workload of each node in the data flow graph representation of the program, analyze the forward and backward dependences between nodes, and take factors such as data communication overhead into account, in order to find a program partitioning and task mapping scheme that balances the load across the processors and minimizes pipeline stall overhead.
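A simple way to picture this search is an exhaustive partition of a node chain into pipeline stages. The sketch below is not the patent's procedure: it assumes the data flow graph is a linear chain, uses brute force over all cut positions, ranks candidates by bottleneck load first and inter-stage traffic second, and uses made-up workload and traffic numbers. The function names are hypothetical.

```python
from itertools import combinations


def partition_chain(workloads, traffic, num_stages):
    """Split a linear chain of nodes into contiguous pipeline stages.

    workloads[i] is the computation workload of node i (e.g. instructions);
    traffic[i] is the data volume sent from node i to node i + 1.
    Returns (cuts, stage_loads, cut_traffic) for the partition with the
    smallest bottleneck load, breaking ties by the smallest cut traffic.
    """
    n = len(workloads)
    best = None
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0, *cuts, n)
        loads = [sum(workloads[a:b]) for a, b in zip(bounds, bounds[1:])]
        cut_traffic = sum(traffic[c - 1] for c in cuts)
        key = (max(loads), cut_traffic)
        if best is None or key < best[0]:
            best = (key, cuts, loads)
    (_, cut_traffic), cuts, loads = best
    return cuts, loads, cut_traffic


def pipeline_speedup(stage_loads):
    """Ideal steady-state speedup: total work divided by the bottleneck stage."""
    return sum(stage_loads) / max(stage_loads)


# Made-up numbers for five nodes mapped onto three processors.
cuts, loads, cut_traffic = partition_chain(
    [40, 30, 10, 10, 10], [5, 1, 8, 2], 3)
# cuts == (1, 2), loads == [40, 30, 30], cut_traffic == 6
```

Two partitions reach the same bottleneck load of 40 in this example; the one that cuts the low-traffic arcs wins, which mirrors the preference for keeping heavily communicating nodes on the same processor.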
In determining the parallelization scheme, the choice of task node granularity is critical. A finer task granularity makes it easier to balance processor load during task mapping and adapts more flexibly to different numbers of processor cores, but communication overhead, partitioning complexity, and code size all increase accordingly, because every task must be encapsulated as an object and each object needs an additional internal data communication management module.
The MPEG-4 Simple Profile encoding flow is shown in Fig. 4. In an MPEG-4 encoder, the basic data units of the original image, macroblocks (MB: MacroBlock), are input in scan order from top to bottom and left to right, and the compressed bitstream is produced after several processing tasks. To arrange the task pipeline sensibly when parallelizing the encoder, it is necessary to consider not only the data dependences between tasks when a single macroblock is encoded, but also the data dependences between adjacent macroblocks, that is, the backward data dependences.
As shown in Fig. 4, the encoding flows of I frames and P frames differ: I-frame macroblocks do not undergo the motion search performed for P-frame macroblocks. A partitioning scheme designed for P frames therefore leaves the pipeline stages considerably unbalanced when I-frame data are processed. In general, however, the coding pattern is IPPP...IPPP..., with anywhere from a dozen to several dozen P frames between two I frames, and the more P frames there are between two I frames, the smaller the effect of this imbalance. The pipeline parallelization is therefore designed mainly around load balance for P frames.
By simulating typical video sequences (foreman, news, mobile, bus), the share of the computation workload taken by each main module when the MPEG-4 encoder encodes a P-frame macroblock was measured, as shown in Table 1.
Table 1. Share of total computation workload taken by each main module when MPEG-4 encodes a P-frame macroblock
[Table 1 appears as an image (GSA00000114010700061) in the original publication.]
To achieve the highest possible compression ratio, the MPEG-4 encoder uses a variety of prediction techniques, which strengthens the data dependences between macroblocks. For both intra-coded frames (I frames) and inter-coded frames (P frames), the current macroblock has data dependences on macroblock A, macroblock B, and macroblock C, as shown in Fig. 3. In an I frame, the predicted value of the current macroblock's data is computed from macroblocks A, B, and C. In a P frame, the motion vector difference (MVD: Motion Vector Difference) that is finally encoded is obtained by subtracting the predicted vector from the current motion vector, and the predicted vector is computed from macroblocks A, C, and D. Loop filtering in both I frames and P frames requires the values of the left and upper macroblocks. Entropy coding must follow the macroblock scan order strictly; that is, entropy coding of the next macroblock cannot begin until entropy coding of the current macroblock has finished. Provided that all macroblocks of the previous row have been fully processed when the current macroblock is handled, the backward data dependences exist only between the current macroblock and the macroblock to its left. For nodes with such backward dependences, however, the pipeline stage of the producer precedes the pipeline stage of the consumer. In Fig. 4, for example, the coefficient prediction module needs the result of encoding the previous macroblock when it processes the current macroblock, and the macroblock coding module comes before the coefficient prediction module; by the time coefficient prediction is performed for the current macroblock, the encoding of the previous macroblock has already finished, so no stall is introduced into the task pipeline.
As Table 1 shows, the workload shares are unbalanced: the motion estimation module takes up most of the computation time. To balance processor load, nodes must be further split or merged, and the partitioning strategy must be determined according to the resources of the particular hardware platform. The motion estimation module can be divided into integer-pixel motion estimation and half-pixel motion estimation, and the half-pixel motion estimation module, which has the larger workload, can be further divided into one 16×16 block search and four 8×8 block searches. The data flow graph for MPEG-4 encoding of a P-frame macroblock is shown in Fig. 5. The number beside each node in Fig. 5 is the number of instructions the node executes, and the data traffic between nodes is listed in Table 2.
Table 2. Data traffic between nodes
The invention uses a heterogeneous multi-core system-on-chip, the multi-core RED platform, which consists of one reduced instruction set computer (RISC: Reduced Instruction Set Computer) processor and eight digital signal processors (DSPs: Digital Signal Processors), to carry out experiments with three-core and five-core parallel schemes. The structure of the RED platform is shown in Fig. 6. The task partitioning and mapping schemes for three cores and five cores are shown in Fig. 7 and Fig. 8.
The mapping scheme in Fig. 7 is a three-stage pipeline. The two objects on digital signal processor #1 are in fact in different pipeline stages, but once the pipeline has been running for some time they can be regarded as processing different source data within the same pipeline stage. The mapping scheme in Fig. 8 is a five-core, four-stage pipeline, in which the objects on digital signal processor #2 and digital signal processor #3 have no data dependence on each other, run in parallel, and belong to the same pipeline stage. The corresponding pipeline states are shown in Fig. 9 and Fig. 10, respectively.
The modules parallelized in the two schemes above account for about 96.3% of the program's total computation workload. The theoretical speedups of these modules under the two schemes are 2.92 and 4.28, respectively; by Amdahl's law, the theoretical speedups of the whole program are 2.72 and 3.82, respectively.
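The whole-program figures follow from Amdahl's law, S = 1 / ((1 - p) + p / s), where p is the parallelized share of the workload and s is the speedup of that share. The short check below uses only the numbers quoted in the text; the function name `amdahl_speedup` is chosen here for illustration.

```python
def amdahl_speedup(parallel_fraction, module_speedup):
    """Whole-program speedup when only part of the work is accelerated."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / module_speedup)


three_core = amdahl_speedup(0.963, 2.92)  # roughly 2.73
five_core = amdahl_speedup(0.963, 4.28)   # roughly 3.82
```

The three-core result evaluates to about 2.726, which agrees with the quoted 2.72 if that figure was truncated rather than rounded. The remaining 3.7% of serial work is what keeps both results below the module-level speedups of 2.92 and 4.28.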
Finally, it should be noted that the above are merely specific embodiments of the invention. The invention is obviously not limited to these embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can derive directly from, or associate with, the disclosure of the invention shall be considered to fall within the scope of protection of the invention.

Claims (7)

1. A pipeline parallelization method for video coding on an embedded multi-core system, characterized by comprising the following steps:
(1) for the target video coding source program, obtaining a coarse-grained data flow graph representation of the program at the level of the basic data unit on which pipeline parallelization is to be performed;
(2) simulating the target video coding source program and extracting the computation workload of each node in the data flow graph;
(3) analyzing the dependences among the nodes of the data flow graph and the data dependences that arise when basic data units are processed after pipeline parallelization;
(4) obtaining a pipeline partitioning scheme according to the parallelism of the tasks across cores and the inter-core data communication overhead; if the scheme meets the requirements of the parallel system, proceeding to step (5); otherwise, splitting or merging nodes of the original data flow graph and returning to step (2);
(5) according to the partitioning scheme obtained in step (4), encapsulating the task nodes as objects conforming to the interface standard of the indicator, mapping the encapsulated objects statically to the corresponding processor cores, and, in cooperation with the indicator on each processor core, having the multi-core system execute the video coding source program in a pipeline-parallel manner.
2. The method according to claim 1, characterized in that: in step (2), a target video sequence is selected, and the share of the computation workload taken by each node of the data flow graph while the video coding source program dynamically processes the target sequence is recorded as the basis for obtaining the task pipeline stage partitioning scheme.
3. The method according to claim 1, characterized in that: the dependences of the nodes of the data flow graph in step (3) comprise the data dependences that arise between nodes when the same basic data unit is processed and the data dependences that arise between nodes when different basic data units are processed.
4. The method according to claim 1, characterized in that the pipeline partitioning scheme is obtained as follows: the number of pipeline stages of the task pipeline and the computation workload of each stage are determined according to the number of processors on the target multi-core platform and the total computation workload with which the target video coding source program processes one basic data unit; nodes are then assigned to the corresponding pipeline stages according to the workload share of each node of the data flow graph and the data dependences between nodes.
5. The method according to claim 4, characterized in that the selection of the pipeline partitioning scheme follows two guidelines: (1) the workloads of the pipeline stages should be as equal as possible, to limit the loss of parallel efficiency caused by unbalanced processor load; (2) according to the data dependences between nodes of the data flow graph, nodes that exchange large volumes of data are assigned to the same processor, to reduce the overhead of inter-core data communication; finally, the partitioning scheme with the highest overall parallel speedup and resource utilization is chosen.
6. The method according to claim 1, characterized in that: in step (5), the encapsulated object comprises input/output ports and a behavior function, each input/output port corresponds to an input/output buffer, and the behavior function takes data from the input buffers, processes them, and delivers the results to the output buffers.
7. The method according to claim 1, characterized in that: in step (5), the indicator records the state information corresponding to each object port, the state information forms a semaphore entry, and the indicator performs object scheduling, management, and synchronization by querying and updating the values of semaphore entries.
CN 201010166248 2010-05-07 2010-05-07 Video code stream parallelization method for embedded multi-core system Expired - Fee Related CN101860752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010166248 CN101860752B (en) 2010-05-07 2010-05-07 Video code stream parallelization method for embedded multi-core system

Publications (2)

Publication Number Publication Date
CN101860752A true CN101860752A (en) 2010-10-13
CN101860752B CN101860752B (en) 2012-02-01

Family

ID=42946352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010166248 Expired - Fee Related CN101860752B (en) 2010-05-07 2010-05-07 Video code stream parallelization method for embedded multi-core system

Country Status (1)

Country Link
CN (1) CN101860752B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1833226A (en) * 2003-02-27 2006-09-13 皇家飞利浦电子股份有限公司 Data processing system having a plurality of processing elements, a method of controlling a data processing system having a plurality of processing elements
US20090106187A1 (en) * 2007-10-18 2009-04-23 Nec Corporation Information processing apparatus having process units operable in parallel
CN101515231A (en) * 2009-03-23 2009-08-26 浙江大学 Realization method for parallelization of single-threading program based on analysis of data flow

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《电路与系统学报》 20051231 汪斌,等 MPEG2解码系统集成芯片音频处理存储优化策略 第10卷, 第6期 2 *
《计算机工程与应用》 20091231 汪斌 基于LGDFG模型的系统级存储优化方法 , 2 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103748557A (en) * 2011-07-22 2014-04-23 三星电子株式会社 Simulation device and simulation method therefor
CN103748557B (en) * 2011-07-22 2017-05-31 三星电子株式会社 Emulator and its emulation mode
US10162913B2 (en) 2011-07-22 2018-12-25 Samsung Electronics Co., Ltd. Simulation device and simulation method therefor
CN103377035A (en) * 2012-04-12 2013-10-30 浙江大学 Pipeline parallelization method for coarse-grained streaming application
CN103064935B (en) * 2012-12-24 2016-05-18 深圳先进技术研究院 A kind of multi-medium data parallel processing system (PPS) and method
CN103064935A (en) * 2012-12-24 2013-04-24 深圳先进技术研究院 System and method for multimedia data parallel processing
CN103414767A (en) * 2013-07-30 2013-11-27 华南师范大学 Method and device for deploying application software on cloud computing platform
CN105335129B (en) * 2014-06-23 2019-03-29 联想(北京)有限公司 Information processing method and electronic equipment
CN105335129A (en) * 2014-06-23 2016-02-17 联想(北京)有限公司 Information processing method and electronic equipment
CN109213590A (en) * 2017-06-30 2019-01-15 华为技术有限公司 The method and apparatus of dispatch processor
CN109213590B (en) * 2017-06-30 2021-01-15 华为技术有限公司 Method and apparatus for scheduling processors
CN109543825A (en) * 2018-11-30 2019-03-29 上海寒武纪信息科技有限公司 Neural network model algorithm Compilation Method, device and Related product
CN109284815A (en) * 2018-11-30 2019-01-29 上海寒武纪信息科技有限公司 Neural network model algorithm Compilation Method, device and Related product
CN109284815B (en) * 2018-11-30 2020-11-24 安徽寒武纪信息科技有限公司 Neural network model algorithm compiling method and device and related products
CN111327921A (en) * 2018-12-17 2020-06-23 深圳市炜博科技有限公司 Video data processing method and device
CN116149732A (en) * 2023-02-15 2023-05-23 深圳大学 Hardware automatic execution method, system and product of data flow task
CN116149732B (en) * 2023-02-15 2023-10-27 深圳大学 Hardware automatic execution method, system and product of data flow task

Also Published As

Publication number Publication date
CN101860752B (en) 2012-02-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120201

Termination date: 20150507

EXPY Termination of patent right or utility model