CN101923492A - Method for executing dynamic allocation command on embedded heterogeneous multi-core - Google Patents

Method for executing dynamic allocation command on embedded heterogeneous multi-core

Info

Publication number
CN101923492A
Authority
CN
China
Prior art keywords
instruction
basic block
core
binary code
processing core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010251261
Other languages
Chinese (zh)
Other versions
CN101923492B (en)
Inventor
过敏意
娄林
伍倩
朱寅
沈耀
马曦
唐飞龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN 201010251261
Publication of CN101923492A
Application granted
Publication of CN101923492B
Status: Expired - Fee Related

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a method, in the technical field of computers, for executing dynamically allocated instructions on an embedded heterogeneous multi-core. The method comprises the following steps: partitioning a binary code program to obtain a number of basic blocks; performing selection for each basic block to obtain the target processing core that will execute it; translating each basic block for its target processing core to obtain the translated binary code on that core; and counting the execution frequency of each basic block, marking blocks whose execution frequency exceeds a threshold T as hot-spot basic blocks, and caching the translated binary code of the hot-spot blocks. The method dynamically distributes instructions to the heterogeneous cores for execution according to factors such as the processing power and load of the system's cores, thereby overcoming static scheduling's inability to allocate resources dynamically while also reducing the complexity of dynamic thread partitioning, and so further improves program execution efficiency on heterogeneous multi-cores.

Description

Method for executing dynamically allocated instructions on an embedded heterogeneous multi-core
Technical field
The present invention relates to a method in the field of computer technology, specifically a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core.
Background art
In recent years, as demand for computing power has grown, CPU processing power has had to improve further. Initially, CPU performance was raised mainly by increasing the core and front-side-bus frequencies and enlarging caches. However, raising the clock frequency poses serious power-consumption and heat-dissipation challenges, merely enlarging caches increases cost, and single-core processor development has reached a limit. To further improve CPU processing power, multi-core processor technology entered the market. Since 2006, companies such as Intel and AMD have released dual-core, quad-core, and other multi-core processors. With multi-core processors fully on the market, the multi-core era of computing has arrived.
Integrating multiple processor cores on a single chip and executing program code fully concurrently improves processing performance without raising the operating frequency, and also addresses the power-consumption problem; it has become the direction of future processor development. However, simply increasing the number of cores does not ideally improve CPU processing power: for parallel processing, what matters most is the parallelism of the program itself, and the serially executed portions of most programs still limit performance gains. It is generally believed that, in such homogeneous multi-core architectures, 4-8 cores may be the limit beyond which a multi-core processor can obtain a worthwhile performance boost.
To further improve CPU processing power, a heterogeneous processor architecture appeared, which integrates multiple heterogeneous processor cores on one chip. In such an architecture, different processor cores achieve higher processing efficiency for specific applications (considering factors such as performance, power consumption, technology, and cost): some cores excel at floating-point computation, for example, while others excel at signal processing. Fully exploiting the functional differences among heterogeneous cores can better satisfy the demands of different applications in different fields. The Cell chip, developed jointly by IBM, Sony, and Toshiba, is a model of such a heterogeneous processor: a multi-core processor with nine hardware cores. In the Cell chip, only one core is a standard IBM Power processor; the remaining eight are coprocessors custom-designed for image processing and used for floating-point computation. The main processor is chiefly responsible for distributing tasks, while the actual floating-point work is done by the coprocessors. Because the Cell coprocessors handle only floating-point tasks, their operation rules, and likewise the corresponding circuit logic, are very simple; as long as the clock frequency is high enough, Cell achieves remarkable floating-point performance.
At present, heterogeneous multi-core processors fall into three classes according to their instruction sets: cores with identical instruction sets but different hardware structures, cores with completely different instruction sets, and cores sharing part of an instruction set. For heterogeneous multi-core processors whose cores share a common base instruction set, the parts that differ are extended instructions designed to suit applications with different demands. To keep pace with the development of heterogeneous multi-core processors, more advanced compilation systems and runtime handling methods must be designed for them, including programming languages, compilation models, and runtime libraries. Work in this area has focused mainly on task-scheduling strategies and the optimization of thread-level parallelism.
Existing static task scheduling mainly takes two forms: compiling different code at compile time into binaries suited to different processing cores, or linking the runtime libraries of different processing cores at run time. Both methods assign tasks statically to specific processing cores, so the isomerism of the different cores must be considered when the program is written, and the programmer must fully understand the heterogeneous cores. Moreover, such methods do not consider issues such as load balancing and power consumption across the heterogeneous cores; improper task assignment can lengthen program execution time, and information such as load balance is only known at run time.
A search of the prior art shows that, to overcome this programming and runtime complexity, thread-level dynamic scheduling methods have appeared. These methods partition a task into threads of various granularities so that different threads can execute concurrently on different cores, thereby improving execution efficiency. However, such methods must consider data dependences, and in practice it is difficult to divide a serially executing program into parallel multi-threaded tasks. In addition, the data dependences among different threads require handling issues such as data synchronization and cache coherence, which can make thread scheduling more complex and cause program performance to decline further.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art and provide a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core. The present invention dynamically distributes instructions to the heterogeneous cores for execution according to factors such as the processing power and load of the system's cores, thereby remedying static scheduling's inability to allocate resources dynamically, reducing the complexity of dynamic thread partitioning, and further improving program execution efficiency on heterogeneous multi-cores.
The present invention is realized by the following technical solution, comprising the following steps:
Step 1: while the binary code program is being loaded, partition it into a number of basic blocks.
In this partitioning, the program between the i-th entry instruction and the (i+1)-th jump instruction forms the (i+1)-th basic block, where i ≥ 0 and an entry instruction is either the target of a jump instruction or the instruction immediately following one.
Step 2: perform selection for each basic block to obtain the target processing core that will execute it.
The selection comprises the following steps:
2.1) find, for the j-th instruction A in the i-th basic block, an instruction A* with the same function;
2.2) find all processing cores that can execute instruction A or instruction A*, and from these obtain the set I of processing cores that can execute every instruction in the i-th basic block;
2.3) using p = 1/N, obtain the performance with which each core in set I executes each instruction, where p is the core's execution performance for the instruction and N is the number of instruction cycles the core needs to execute it; summing these per-instruction performances gives each core's average performance for executing the i-th basic block;
2.4) using n = P/L, obtain each core's performance-to-load ratio, where n is the performance-to-load ratio, L is the core's current running load, and P is its average performance for the i-th basic block;
2.5) from the set I, select the core with the largest performance-to-load ratio as the target processing core for the i-th basic block.
Step 3: translate the basic block corresponding to the target processing core obtained in Step 2, producing the translated binary code on that target core.
The translation comprises the following steps:
3.1) if the cache already holds code translated for this target processing core, proceed directly to 3.3); otherwise perform 3.2) and then 3.3);
3.2) dynamically translate the basic block into the corresponding binary code for the target processing core;
3.3) if the binary code has a data dependence on an already-executed predecessor basic block, perform 3.4) and then 3.5); otherwise proceed directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtain the execution result from its processing core; if the predecessor was assigned to a different processing core, switch the system context;
3.5) execute the basic block's corresponding binary code on the target processing core.
Step 4: count the execution frequency of each basic block, mark blocks whose execution frequency exceeds a threshold T as hot-spot basic blocks, and cache the translated binary code of the hot-spot blocks in the cache.
Step 5: return to Step 2 and perform dynamic scheduling of the next basic block.
Compared with the prior art, the beneficial effects of the invention are: there is no need to fully understand the instruction characteristics of the heterogeneous cores, so the compiler gains platform transparency and need not code for, or compile different binaries for, different architectures. At the same time, on a heterogeneous multi-core system with a common base instruction set, the method performs dynamic instruction scheduling, distributing instructions to the heterogeneous cores for execution according to factors such as the processing power and load of the system's cores; this remedies static scheduling's inability to allocate resources dynamically, reduces the complexity of dynamic thread partitioning, and further improves program execution efficiency on heterogeneous multi-cores. Because the method needs no thread partitioning, it can parallelize maximally when there are no data dependences, reducing thread-synchronization overhead by roughly 15%.
Description of drawings
Fig. 1 is a flow diagram of the method of the embodiment.
Detailed description of the embodiments
The method of the present invention is further described below with reference to the accompanying drawing. This embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and concrete operating procedure are given, but the protection scope of the present invention is not limited to the following embodiment.
Embodiment
As shown in Fig. 1, this embodiment comprises the following steps:
Step 1: while the binary code program is being loaded, partition it into a number of basic blocks.
In this partitioning, the program between the i-th entry instruction and the (i+1)-th jump instruction forms the (i+1)-th basic block, where i ≥ 0 and an entry instruction is either the target of a jump instruction or the instruction immediately following one.
In this embodiment, the entry and exit instructions of the basic blocks are determined from the jump instructions of the binary program code. Each basic block is a sequence of sequentially executed instructions, and the instruction code from an entry instruction to the corresponding exit instruction forms one basic block. A block's entry instruction is its first instruction, i.e. the instruction after a jump or the instruction jumped to; its exit instruction is either the last instruction before another block's entry instruction or a jump instruction. The ordering relation among the different basic blocks is preserved.
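The partitioning rule above can be sketched in Python. This is an illustrative toy, not the patent's implementation: instructions are modeled as (mnemonic, jump-target-index) pairs rather than real binary code, and all names are hypothetical.

```python
# Toy sketch of basic-block partitioning. Each instruction is a
# (mnemonic, jump_target_or_None) tuple. Entry instructions are jump
# targets and the instructions immediately following a jump; a block
# runs from one entry instruction up to (not including) the next.
def partition_basic_blocks(instructions):
    n = len(instructions)
    entries = {0}                          # program start opens a block
    for idx, (op, target) in enumerate(instructions):
        if target is not None:             # this is a jump instruction
            entries.add(target)            # the jump target starts a block
            if idx + 1 < n:
                entries.add(idx + 1)       # the fall-through starts a block
    starts = sorted(entries)
    blocks = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else n
        blocks.append(instructions[start:end])
    return blocks
```

Each returned block ends either at a jump (its exit instruction) or just before the next block's entry, matching the entry/exit definition given above.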
Step 2: perform selection for each basic block to obtain the target processing core that will execute it.
The selection comprises the following steps:
2.1) find, for the j-th instruction A in the i-th basic block, an instruction A* with the same function;
2.2) find all processing cores that can execute instruction A or instruction A*, and from these obtain the set I of processing cores that can execute every instruction in the i-th basic block;
2.3) using p = 1/N, obtain the performance with which each core in set I executes each instruction, where p is the core's execution performance for the instruction and N is the number of instruction cycles the core needs to execute it; summing these per-instruction performances gives each core's average performance for executing the i-th basic block;
2.4) using n = P/L, obtain each core's performance-to-load ratio, where n is the performance-to-load ratio, L is the core's current running load, and P is its average performance for the i-th basic block;
2.5) from the set I, select the core with the largest performance-to-load ratio as the target processing core for the i-th basic block.
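A minimal sketch of the selection in steps 2.1)-2.5), under assumed inputs: for each candidate core we are given the cycle count N of every instruction in the block (with None meaning the core cannot execute that instruction, even via an equivalent A*, which excludes it from the set I), together with the core's current load L. The function and data names are illustrative, not from the patent.

```python
# Sketch of target-core selection: p = 1/N per instruction, summed to an
# average performance P per core, then ranked by the ratio n = P / L.
def select_target_core(cycle_tables, loads):
    # cycle_tables: {core: [N_1, N_2, ...]} cycles per instruction, None
    #               if the core cannot execute that instruction at all
    # loads:        {core: L} current running load of each core
    best_core, best_ratio = None, float("-inf")
    for core, cycles in cycle_tables.items():
        if any(n is None for n in cycles):
            continue                        # core is not in the set I
        P = sum(1.0 / n for n in cycles)    # step 2.3): p = 1/N, summed
        ratio = P / loads[core]             # step 2.4): n = P / L
        if ratio > best_ratio:              # step 2.5): largest ratio wins
            best_core, best_ratio = core, ratio
    return best_core
```

A lightly loaded core with good per-instruction cycle counts thus beats a nominally faster but busier core, which is the load-aware behavior the method aims for.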
Step 3: translate the basic block corresponding to the target processing core obtained in Step 2, producing the translated binary code on that target core.
The translation comprises the following steps:
3.1) if the cache already holds code translated for this target processing core, proceed directly to 3.3); otherwise perform 3.2) and then 3.3);
3.2) dynamically translate the basic block into the corresponding binary code for the target processing core;
3.3) if the binary code has a data dependence on an already-executed predecessor basic block, perform 3.4) and then 3.5); otherwise proceed directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtain the execution result from its processing core; if the predecessor was assigned to a different processing core, switch the system context;
3.5) execute the basic block's corresponding binary code on the target processing core.
In this embodiment, binary translation maps the original binary code to instructions of the target processing core. There are two cases: in one, an instruction maps to a similar instruction on the target core, with a one-to-one correspondence and the same function; in the other, the target core has one instruction, or a group of more advantageous instructions, functionally equivalent to a group of instructions in the original basic block, such as Intel's AVX (Advanced Vector Extensions). The translated instruction or instruction group offers higher execution efficiency or a better power ratio.
Step 4: count the execution frequency of each basic block, mark blocks whose execution frequency exceeds a threshold T as hot-spot basic blocks, and cache the translated binary code of the hot-spot blocks in the cache.
In this embodiment, frequently executed code can be fetched directly from the cache, which greatly reduces the overhead of translation mapping.
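Steps 3.1)-3.2) together with the hot-spot caching of Step 4 can be sketched as a small translation cache. The translate callable stands in for the dynamic binary translator, which the patent does not specify; the class, method, and parameter names are hypothetical.

```python
# Sketch of a translation cache keyed by (block_id, core). Blocks executed
# more than a threshold T are marked hot and their translated code is
# retained, so later executions skip retranslation (step 3.1).
class TranslationCache:
    def __init__(self, threshold_T, translate):
        self.T = threshold_T
        self.translate = translate
        self.counts = {}                   # execution frequency per block
        self.cache = {}                    # translations of hot-spot blocks
    def fetch(self, block_id, core, code):
        self.counts[block_id] = self.counts.get(block_id, 0) + 1
        key = (block_id, core)
        if key in self.cache:              # step 3.1): reuse cached code
            return self.cache[key]
        translated = self.translate(code, core)    # step 3.2): translate
        if self.counts[block_id] > self.T:         # step 4: hot-spot block
            self.cache[key] = translated
        return translated
```

Only blocks past the threshold occupy cache space, which matches the patent's aim of spending cache capacity on frequently executed code.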
Step 5: return to Step 2 and perform dynamic scheduling of the next basic block.
During instruction-level dynamic optimization, this embodiment organizes target-core selection, the translation mapping of binary instructions, and code execution on the processing cores into a pipeline, clearly reducing the switching overhead of dispatching instructions to the heterogeneous cores. Compared with dynamic thread-level scheduling, the method needs no thread partitioning and can parallelize maximally when there are no data dependences, reducing thread-synchronization overhead by roughly 15%. In addition, caching the translated target-core code further reduces overhead and improves system performance; running a video-compression routine on a chip with a DSP processing core achieved an average speedup of 1.8x.

Claims (4)

1. A method for executing dynamically allocated instructions on an embedded heterogeneous multi-core, characterized by comprising the following steps:
Step 1: while the binary code program is being loaded, partitioning it into a number of basic blocks;
Step 2: performing selection for each basic block to obtain the target processing core that will execute each basic block;
Step 3: translating the basic block corresponding to the target processing core obtained in Step 2 to obtain the translated binary code on that target core;
Step 4: counting the execution frequency of each basic block, marking blocks whose execution frequency exceeds a threshold T as hot-spot basic blocks, and caching the translated binary code of the hot-spot blocks in the cache;
Step 5: returning to Step 2 to perform dynamic scheduling of the next basic block.
2. the method for carrying out the dynamic assignment instruction on the embedded heterogeneous multi-core according to claim 1, it is characterized in that, division described in the first step is handled, be as i+1 fundamental block with the program between i bar entry instruction and the jump instruction of i+1 bar, wherein: 0≤i, entry instruction is the instruction that forwards to of jump instruction or immediately following the instruction after the jump instruction.
3. the method for carrying out the dynamic assignment instruction on the embedded heterogeneous multi-core according to claim 1 is characterized in that, the selection described in second step is handled, and may further comprise the steps:
2.1) obtain with i fundamental block in j bar instruction A have the instruction A of same function *
2.2) obtain all processing instruction A or instruct A *Process nuclear A, and then handled the process nuclear set I of every instruction in i the fundamental block simultaneously;
2.3) according to p=1/N, obtain the execution performance of every instruction of each process nuclear execution among the process nuclear set I, and each process nuclear is carried out the execution performance addition of every instruction, obtain each process nuclear in this set and carry out the average behavior of i fundamental block, wherein: p is the execution performance of processor execution command, and N is that this processor is carried out the required instruction cycles of this instruction;
2.4) according to n=P/L, obtain the performance load ratio of each process nuclear, wherein: n is the performance load ratio, and L is the running load of this process nuclear, and P is the average behavior that this process nuclear is carried out i fundamental block;
2.5) from process nuclear set I, select performance load than the target processing core of maximum process nuclear as i fundamental block.
4. the method for carrying out the dynamic assignment instruction on the embedded heterogeneous multi-core according to claim 1 is characterized in that, the Translation Processing described in the 3rd step may further comprise the steps:
3.1) when having the binary code translated to this target processing core in the high-speed cache, directly carry out 3.3); Otherwise, carry out 3.2) after carry out 3.3 again);
3.2) binary code of correspondence when this fundamental block is translated into this target processing core processing dynamically;
3.3) when there are data dependency in binary code and the preorder fundamental block carried out, carry out 3.4) and after carry out 3.5 again); Otherwise, directly carry out 3.5);
3.4) after the preorder fundamental block carry out to finish, obtain the execution result on the corresponding process nuclear, when the preorder fundamental block is assigned to when carrying out on other process nuclear, switched system context then;
3.5) binary code of this fundamental block correspondence of execution on this target processing core.
CN 201010251261 2010-08-11 2010-08-11 Method for executing dynamic allocation command on embedded heterogeneous multi-core Expired - Fee Related CN101923492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010251261 CN101923492B (en) 2010-08-11 2010-08-11 Method for executing dynamic allocation command on embedded heterogeneous multi-core


Publications (2)

Publication Number Publication Date
CN101923492A 2010-12-22
CN101923492B CN101923492B (en) 2013-05-01

Family

ID=43338446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010251261 Expired - Fee Related CN101923492B (en) 2010-08-11 2010-08-11 Method for executing dynamic allocation command on embedded heterogeneous multi-core

Country Status (1)

Country Link
CN (1) CN101923492B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101299194A (en) * 2008-06-26 2008-11-05 上海交通大学 Heterogeneous multi-core system thread-level dynamic dispatching method based on configurable processor
CN101299199A (en) * 2008-06-26 2008-11-05 上海交通大学 Heterogeneous multi-core system based on configurable processor and instruction set extension
CN101329638A (en) * 2007-06-18 2008-12-24 国际商业机器公司 Method and system for analyzing parallelism of program code
CN101387969A (en) * 2008-10-16 2009-03-18 上海交通大学 Dynamic binary translation method for cooperation design of software and hardware
US20090154572A1 (en) * 2007-12-17 2009-06-18 Samsung Electronics Co., Ltd. Method and apparatus for video decoding based on a multi-core processor

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11755099B2 (en) 2011-03-11 2023-09-12 Intel Corporation Dynamic core selection for heterogeneous multi-core systems
CN109960398A (en) * 2011-03-11 2019-07-02 英特尔公司 The dynamic core selection of system is felt concerned about for heterogeneous polynuclear
CN109960398B (en) * 2011-03-11 2023-11-07 英特尔公司 Dynamic core selection for heterogeneous multi-core systems
GB2506048A (en) * 2011-06-28 2014-03-19 Ibm Unified, adaptive RAS for hybrid systems
CN103562873B (en) * 2011-06-28 2016-10-26 国际商业机器公司 For processing the method and system of data in computer systems
US8788871B2 (en) 2011-06-28 2014-07-22 International Business Machines Corporation Unified, workload-optimized, adaptive RAS for hybrid systems
CN103562873A (en) * 2011-06-28 2014-02-05 国际商业机器公司 Unified, adaptive RAS for hybrid systems
GB2506048B (en) * 2011-06-28 2020-10-14 Ibm Unified, adaptive RAS for hybrid systems
WO2013001401A1 (en) * 2011-06-28 2013-01-03 International Business Machines Corporation Unified, adaptive ras for hybrid systems
US8799693B2 (en) 2011-09-20 2014-08-05 Qualcomm Incorporated Dynamic power optimization for computing devices
US9098309B2 (en) * 2011-09-23 2015-08-04 Qualcomm Incorporated Power consumption optimized translation of object code partitioned for hardware component based on identified operations
US20130080805A1 (en) * 2011-09-23 2013-03-28 Qualcomm Incorporated Dynamic partitioning for heterogeneous cores
CN104471557A (en) * 2012-06-18 2015-03-25 谷歌公司 Optimized execution of dynamic languages
CN104471557B (en) * 2012-06-18 2016-11-02 谷歌公司 The optimization of dynamic language performs
WO2014047828A1 (en) * 2012-09-27 2014-04-03 Intel Corporation Method and apparatus to schedule store instructions across atomic regions in binary translation
US9141362B2 (en) 2012-09-27 2015-09-22 Intel Corporation Method and apparatus to schedule store instructions across atomic regions in binary translation
US9405551B2 (en) 2013-03-12 2016-08-02 Intel Corporation Creating an isolated execution environment in a co-designed processor
GB2514221B (en) * 2013-03-12 2017-02-01 Intel Corp Creating an isolated execution environment in a co-designed processor
GB2514221A (en) * 2013-03-12 2014-11-19 Intel Corp Creating an isolated execution environment in a co-designed processor
CN103207772B (en) * 2013-04-07 2016-01-13 Beihang University Instruction prefetch content selection method for optimizing real-time task WCET
CN103207772A (en) * 2013-04-07 2013-07-17 Beihang University Instruction prefetching content selecting method for optimizing WCET (worst-case execution time) of real-time task
CN107250985A (en) * 2015-02-27 2017-10-13 Huawei Technologies Co., Ltd. System and method for heterogeneous computing Application Programming Interface (API)
CN107250985B (en) * 2015-02-27 2020-10-16 Huawei Technologies Co., Ltd. System and method for heterogeneous computing Application Programming Interface (API)
CN105242909A (en) * 2015-11-24 2016-01-13 Wuxi Jiangnan Institute of Computing Technology Method for many-core loop partitioning based on multi-version code generation
CN105242909B (en) * 2015-11-24 2017-08-11 Wuxi Jiangnan Institute of Computing Technology Many-core loop partitioning method based on multi-version code generation
CN106020922B (en) * 2016-05-30 2019-01-08 Hunan University of Science and Technology Instruction scheduling method for filling idle slots in execution packets with jump-target basic blocks
CN106020922A (en) * 2016-05-30 2016-10-12 Hunan University of Science and Technology Instruction scheduling method for filling idle slots in execution packets with jump-target basic blocks
CN107870818B (en) * 2017-10-19 2021-03-02 Rockchip Electronics Co., Ltd. Multi-core processor interrupt dynamic response method and storage medium
CN107870818A (en) * 2017-10-19 2018-04-03 Fuzhou Rockchip Electronics Co., Ltd. Multi-core processor interrupt dynamic response method and storage medium
CN108874727B (en) * 2018-05-29 2019-09-10 National University of Defense Technology GPDSP-oriented multi-core parallel computing implementation method
CN108874727A (en) * 2018-05-29 2018-11-23 National University of Defense Technology GPDSP-oriented multi-core parallel computing implementation method

Also Published As

Publication number Publication date
CN101923492B (en) 2013-05-01

Similar Documents

Publication Publication Date Title
CN101923492B (en) Method for executing dynamic allocation command on embedded heterogeneous multi-core
CN101299194B (en) Heterogeneous multi-core system thread-level dynamic dispatching method based on configurable processor
CN105487838A (en) Task-level parallel scheduling method and system for dynamically reconfigurable processor
CN102981807A (en) Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment
CN104106049A (en) Rasterization of compute shaders
CN108595258A Dynamic expansion method for GPGPU register files
Wang et al. Simultaneous multikernel: Fine-grained sharing of gpus
Chen et al. Guided region-based GPU scheduling: utilizing multi-thread parallelism to hide memory latency
CN102087609A Dynamic binary translation method on a multi-processor platform
CN105242909A Method for many-core loop partitioning based on multi-version code generation
Hu et al. The role of return value prediction in exploiting speculative method-level parallelism
Diaz et al. The supercodelet architecture
Huang et al. Minimizing energy consumption of embedded systems via optimal code layout
Li et al. Code layout optimization for defensiveness and politeness in shared cache
Zheng et al. Performance model for OpenMP parallelized loops
Sampaio et al. Divergence analysis with affine constraints
Mantripragada et al. A new framework for integrated global local scheduling
Tian et al. Optimizing gpu register usage: Extensions to openacc and compiler optimizations
Lima et al. HPSM: a programming framework for multi-cpu and multi-gpu systems
Georgakoudis et al. Fast dynamic binary rewriting to support thread migration in shared-isa asymmetric multicores
Li et al. Code motion for migration minimization in STT-RAM based hybrid cache
Ravi et al. Semi-automatic restructuring of offloadable tasks for many-core accelerators
Jakimovska et al. Modern processor architectures overview
Akram Managed language runtimes on heterogeneous hardware: Optimizations for performance, efficiency and lifetime improvement
Zhang et al. Binary translation to improve energy efficiency through post-pass register re-allocation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130501

Termination date: 20150811

EXPY Termination of patent right or utility model