CN101923492A - Method for executing dynamic allocation command on embedded heterogeneous multi-core - Google Patents
Method for executing dynamic allocation command on embedded heterogeneous multi-core

- Publication number: CN101923492A
- Application number: CN 201010251261
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classification: Devices For Executing Special Programs (AREA)
Abstract
The invention discloses a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor, in the technical field of computers. The method comprises the following steps: partitioning a binary code program into a number of basic blocks; selecting, for each basic block, the target processing core on which it will execute; translating each basic block for its target processing core to obtain translated binary code on that core; and counting the execution frequency of each basic block, marking any basic block whose execution frequency exceeds a threshold T as a hot-spot basic block, and caching the translated binary code of hot-spot basic blocks in a high-speed cache. The method dynamically allocates instructions to the heterogeneous cores for execution according to factors such as the processing capacity and load of the system's cores, thereby overcoming the inability of static scheduling to allocate resources dynamically while also reducing the complexity of dynamic thread partitioning, and so further improving the execution efficiency of programs on heterogeneous multi-core processors.
Description
Technical field
The present invention relates to a method in the field of computer technology, and specifically to a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor.
Background technology
In recent years, as demands on computing power have grown, the processing power of CPUs has had to keep improving. At first, CPU performance was raised mainly by increasing the core and front-side-bus frequencies and by enlarging caches. However, raising the clock frequency poses serious power-consumption and heat-dissipation problems, while merely enlarging caches increases cost, so single-core processor development reached a limit. To improve CPU processing power further, multi-core processor technology entered the market: since 2006, companies such as Intel and AMD have released dual-core, quad-core and other multi-core processors. With multi-core processors fully on the market, the multi-core era of computing has arrived.
Integrating multiple microprocessor cores on a single chip allows program code to be executed fully concurrently, improving performance without raising the operating frequency while also largely solving the power-consumption problem, and has become the direction of future processor development. However, simply increasing the number of cores does not improve CPU processing power ideally: what matters most for the parallel processing capability of a processor is the parallelization of the program itself, and the serially executed portions of most programs still limit the achievable performance gain. It is generally held that, in such homogeneous multi-core architectures, 4 to 8 cores may be the limit beyond which a multi-core processor obtains little further performance benefit.
To improve CPU processing power further, a heterogeneous processor architecture appeared, in which multiple heterogeneous processor cores are integrated on one chip. In this architecture, different processor cores achieve higher processing efficiency for specific applications (taking factors such as performance, power consumption, technology and cost into account): some cores excel at floating-point operations, while others excel at tasks such as signal processing. Fully exploiting the functional differences among heterogeneous cores can better satisfy the demands of different applications in different fields. The Cell chip jointly developed by IBM, Sony and Toshiba is a model of this kind of heterogeneous processor: it is a multi-core processor with 9 hardware cores. In the Cell chip only one core is a standard IBM Power processor; the other 8 cores are coprocessors custom-designed for image processing and used for floating-point operations. The main function of the primary processor is to distribute tasks, while the actual floating-point work is done by the coprocessors. Because the Cell coprocessors handle only floating-point tasks, the required operations, and hence the corresponding circuit logic, are very simple; as long as the clock frequency is high enough, Cell achieves remarkable floating-point performance.
At present, heterogeneous multi-core processors fall into three classes according to their instruction sets: cores with identical instruction sets but different hardware structures, cores with completely different instruction sets, and cores whose instruction sets partially overlap. For heterogeneous multi-core processors whose cores share a common instruction set, the parts that differ are extended instruction sets designed to handle applications with different demands. To keep pace with the development of heterogeneous multi-core processors, more advanced compilation systems and runtime handling methods must be designed for them, including programming languages, compilation models and runtime libraries. In this respect, the main work has concentrated on task-scheduling strategies and the exploitation of thread-level parallelism.
Existing static task scheduling mainly takes two forms: one compiles different code at compile time into binary code suited to different processing cores; the other links runtime libraries for different processing cores at run time. Both statically assign tasks to specific processing cores, so the heterogeneity of the cores must already be considered when the program is written, and the programmer must fully understand the heterogeneous cores. Moreover, these methods do not consider issues such as load balancing and power consumption across the heterogeneous cores; improper task assignment may lengthen program execution time, and information such as load balance only becomes known at run time.
A search of the prior art shows that, to overcome this programming complexity and these runtime problems, thread-level dynamic scheduling methods have appeared. Such methods partition a task into threads of various granularities so that different threads can execute concurrently on different cores, improving execution efficiency. However, they must consider data dependencies, and it is in practice difficult to partition a serially executed program into parallel multithreaded tasks. In addition, because of data dependencies between threads, issues such as data synchronization and cache coherence must also be handled, which may make thread scheduling considerably more complex and further degrade program performance.
Summary of the invention
The object of the present invention is to overcome the above deficiencies of the prior art by providing a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor. The invention dynamically allocates instructions to the heterogeneous cores for execution according to factors such as the processing capacity and load of the system's cores, thereby remedying the inability of static scheduling to allocate resources dynamically while also reducing the complexity of dynamic thread partitioning, and so further improving the execution efficiency of programs on heterogeneous multi-core processors.
The present invention is realized through the following technical solution, comprising the following steps:
In the first step, while the binary code program is being loaded, it is partitioned into a number of basic blocks.
In this partitioning, the program between the i-th entry instruction and the (i+1)-th jump instruction forms the (i+1)-th basic block, where i ≥ 0 and an entry instruction is either the target of a jump instruction or the instruction immediately following a jump instruction.
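The partitioning rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `Instr` is an assumed instruction record, and real binary decoding (opcode tables, address resolution) is out of scope.

```python
# Hypothetical sketch of basic-block partitioning: block leaders are the
# first instruction, jump targets, and instructions immediately after a
# jump; a jump instruction ends the current block.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instr:
    addr: int
    is_jump: bool = False
    target: Optional[int] = None  # jump target address, if any

def partition_basic_blocks(code: List[Instr]) -> List[List[Instr]]:
    # Collect the "entry instructions" described above as leader addresses.
    leaders = {code[0].addr}
    for i, ins in enumerate(code):
        if ins.is_jump:
            if ins.target is not None:
                leaders.add(ins.target)
            if i + 1 < len(code):
                leaders.add(code[i + 1].addr)
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in code:
        if ins.addr in leaders and current:
            blocks.append(current)   # a new leader closes the open block
            current = []
        current.append(ins)
        if ins.is_jump:              # a jump is an exit instruction
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks
```

Note that the ordering of blocks in the returned list preserves the partial order of the original instruction stream, as the method requires.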
In the second step, a selection is performed for each basic block to obtain the target processing core on which it will execute.
The selection comprises the following steps:
2.1) obtain the instructions A* that have the same function as the j-th instruction A of the i-th basic block;
2.2) obtain the processing cores that can execute instruction A or instruction A*, and from these derive the set I of processing cores that can execute every instruction of the i-th basic block;
2.3) according to p = 1/N, obtain the execution performance of each instruction on each processing core in the set I, and sum the per-instruction execution performances to obtain the average performance of each core in the set when executing the i-th basic block, where p is the execution performance of a processor executing an instruction and N is the number of instruction cycles that processor needs for that instruction;
2.4) according to n = P/L, obtain the performance/load ratio of each processing core, where n is the performance/load ratio, L is the core's current running load, and P is the core's average performance when executing the i-th basic block;
2.5) from the set I, select the processing core with the largest performance/load ratio as the target processing core of the i-th basic block.
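Steps 2.1)–2.5) can be sketched as follows. The cycle tables and load figures are invented for the example; a real system would obtain N from per-core ISA timing models and L from load monitors.

```python
# Illustrative sketch of the core-selection steps 2.1)-2.5).
def select_target_core(block, cycles_per_core, loads):
    """block: list of instruction names;
    cycles_per_core: {core: {instr: N}}, N = cycles that core needs;
    loads: {core: L}, the core's current running load."""
    # 2.1)-2.2) set I: cores that can execute every instruction of the
    # block (either instruction A itself or an equivalent A*)
    candidates = [c for c, tbl in cycles_per_core.items()
                  if all(ins in tbl for ins in block)]
    best_core, best_ratio = None, float("-inf")
    for core in candidates:
        # 2.3) per-instruction performance p = 1/N, summed over the block
        P = sum(1.0 / cycles_per_core[core][ins] for ins in block)
        # 2.4) performance/load ratio n = P / L
        n = P / loads[core]
        # 2.5) keep the core with the maximum ratio
        if n > best_ratio:
            best_core, best_ratio = core, n
    return best_core
```

For example, a block mixing `add` and `mul` would be steered to a core with a fast multiplier even if a lightly loaded general-purpose core executes the `add` faster.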
In the third step, the basic block corresponding to the target processing core obtained in the second step is translated, yielding translated binary code on that target processing core.
The translation comprises the following steps:
3.1) if the high-speed cache already holds binary code translated for this target processing core, go directly to 3.3); otherwise perform 3.2) and then 3.3);
3.2) dynamically translate the basic block into the corresponding binary code for this target processing core;
3.3) if the translated binary code has a data dependency on a predecessor basic block, perform 3.4) and then 3.5); otherwise go directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtain its execution result from the corresponding processing core; if the predecessor block was assigned to a different processing core, switch the system context;
3.5) execute the binary code corresponding to this basic block on the target processing core.
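The control flow of steps 3.1)–3.5) can be rendered schematically as below. `translate` and `run_on_core` are placeholders for a real dynamic binary translator and core dispatcher, and the cache is a plain dict keyed by (block, target core); none of this is prescribed by the patent text.

```python
# Schematic rendering of the translation steps 3.1)-3.5).
def execute_block(block_id, target_core, cache, translate, run_on_core,
                  depends_on_prior=False, wait_for_prior=None):
    key = (block_id, target_core)
    # 3.1) reuse a cached translation for this target core if present
    if key not in cache:
        # 3.2) otherwise translate the block dynamically
        cache[key] = translate(block_id, target_core)
    # 3.3)-3.4) if the block depends on a predecessor block, wait for
    # its result first (context switching when it ran on another core)
    if depends_on_prior and wait_for_prior is not None:
        wait_for_prior()
    # 3.5) run the translated code on the target core
    return run_on_core(target_core, cache[key])
```

Executing the same block on the same target core a second time hits the cache, so the translator runs only once per (block, core) pair.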
In the fourth step, the execution frequency of each basic block is counted; any basic block whose execution frequency exceeds the threshold T is marked as a hot-spot basic block, and the translated binary code of hot-spot basic blocks is cached in the high-speed cache.
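The hot-spot bookkeeping of the fourth step amounts to a counter per block plus a promotion threshold. A minimal sketch, with T as an assumed tuning parameter:

```python
# Minimal sketch of hot-spot detection: count executions per basic
# block and promote blocks whose frequency exceeds the threshold T.
from collections import Counter

class HotSpotTracker:
    def __init__(self, T: int):
        self.T = T
        self.freq = Counter()
        self.hot = set()

    def record(self, block_id) -> None:
        self.freq[block_id] += 1
        if self.freq[block_id] > self.T:
            # mark as a hot-spot block; its translated binary code
            # would now be kept in the high-speed cache
            self.hot.add(block_id)

    def is_hot(self, block_id) -> bool:
        return block_id in self.hot
```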
In the fifth step, the method returns to the second step to dynamically schedule the next basic block.
Compared with the prior art, the beneficial effects of the invention are as follows. The instruction characteristics of the heterogeneous cores need not be fully understood, so the compiler gains platform transparency: there is no need to code for, or compile different binaries for, different architectures. At the same time, in a heterogeneous multi-core system with a common instruction set, the method performs dynamic instruction scheduling, allocating instructions to the heterogeneous cores according to factors such as the processing capacity and load of the system's cores; this remedies the inability of static scheduling to allocate resources dynamically and reduces the complexity of dynamic thread partitioning, further improving the execution efficiency of programs on heterogeneous multi-core processors. The method needs no thread partitioning and can parallelize to the greatest extent where no data dependencies exist, and therefore reduces thread-synchronization overhead by about 15%.
Description of drawings
Fig. 1 is a schematic flow diagram of the method of the embodiment.
Detailed description
The method of the present invention is further described below with reference to the accompanying drawing. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation and concrete operating procedure are given, but the protection scope of the present invention is not limited to the following embodiment.
Embodiment
As shown in Fig. 1, the present embodiment comprises the following steps:
In the first step, while the binary code program is being loaded, it is partitioned into a number of basic blocks.
In this partitioning, the program between the i-th entry instruction and the (i+1)-th jump instruction forms the (i+1)-th basic block, where i ≥ 0 and an entry instruction is either the target of a jump instruction or the instruction immediately following a jump instruction.
In the present embodiment, the entry and exit instructions of basic blocks are determined from the jump instructions of the binary program code. Each basic block is a sequence of instructions executed in order: the instructions from an entry instruction up to an exit instruction form one basic block. The entry instruction of a basic block is its first instruction, i.e. the instruction after a jump instruction or the instruction jumped to; the exit instruction is the instruction preceding another basic block's entry instruction, or a jump instruction. The ordering relation between the basic blocks is preserved.
In the second step, a selection is performed for each basic block to obtain the target processing core on which it will execute.
The selection comprises the following steps:
2.1) obtain the instructions A* that have the same function as the j-th instruction A of the i-th basic block;
2.2) obtain the processing cores that can execute instruction A or instruction A*, and from these derive the set I of processing cores that can execute every instruction of the i-th basic block;
2.3) according to p = 1/N, obtain the execution performance of each instruction on each processing core in the set I, and sum the per-instruction execution performances to obtain the average performance of each core in the set when executing the i-th basic block, where p is the execution performance of a processor executing an instruction and N is the number of instruction cycles that processor needs for that instruction;
2.4) according to n = P/L, obtain the performance/load ratio of each processing core, where n is the performance/load ratio, L is the core's current running load, and P is the core's average performance when executing the i-th basic block;
2.5) from the set I, select the processing core with the largest performance/load ratio as the target processing core of the i-th basic block.
In the third step, the basic block corresponding to the target processing core obtained in the second step is translated, yielding translated binary code on that target processing core.
The translation comprises the following steps:
3.1) if the high-speed cache already holds binary code translated for this target processing core, go directly to 3.3); otherwise perform 3.2) and then 3.3);
3.2) dynamically translate the basic block into the corresponding binary code for this target processing core;
3.3) if the translated binary code has a data dependency on a predecessor basic block, perform 3.4) and then 3.5); otherwise go directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtain its execution result from the corresponding processing core; if the predecessor block was assigned to a different processing core, switch the system context;
3.5) execute the binary code corresponding to this basic block on the target processing core.
In the present embodiment, binary translation is used to map the original binary code to instructions of the target processing core. Two cases arise. In one, an instruction is mapped to a similar instruction of the target processing core, with which it has a one-to-one correspondence and which performs the same function. In the other, the target processing core has one instruction, or a group of more advantageous instructions, that is functionally equivalent to a group of instructions in the original basic block, such as Intel's AVX (Advanced Vector Extensions). The translated instruction or instruction group has higher execution efficiency or a better power-consumption ratio.
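The two mapping cases can be illustrated as follows. The mapping tables and instruction names are invented for the example; a real translator would derive them from the ISA descriptions of the source and target cores.

```python
# Hedged sketch of the two translation cases: a one-to-one mapping to a
# similar target instruction, or a mapping of an instruction group to an
# equivalent, more efficient target instruction group (e.g. a fused op,
# in the spirit of SIMD extensions like AVX).
ONE_TO_ONE = {"add32": "add", "ld32": "load"}      # similar instructions
GROUP_MAP = {("mul32", "add32"): ["fused_madd"]}   # equivalent group

def map_instructions(block):
    out, i = [], 0
    while i < len(block):
        pair = tuple(block[i:i + 2])
        if pair in GROUP_MAP:
            # equivalent, more advantageous instruction group
            out.extend(GROUP_MAP[pair])
            i += 2
        else:
            # one-to-one mapping; fall back to the original mnemonic
            out.append(ONE_TO_ONE.get(block[i], block[i]))
            i += 1
    return out
```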
In the fourth step, the execution frequency of each basic block is counted; any basic block whose execution frequency exceeds the threshold T is marked as a hot-spot basic block, and the translated binary code of hot-spot basic blocks is cached in the high-speed cache.
The present embodiment can fetch frequently executed code directly from the high-speed cache, greatly reducing the overhead of translation and mapping.
In the fifth step, the method returns to the second step to dynamically schedule the next basic block.
During instruction-level dynamic optimization, the present embodiment pipelines the selection of the target processing core, the translation and mapping of binary instructions, and the execution of code on the processing cores, markedly reducing the switching overhead of allocating instructions to the heterogeneous cores. Compared with dynamic thread-level scheduling, this method needs no thread partitioning and can parallelize to the greatest extent where no data dependencies exist, and therefore reduces thread-synchronization overhead by about 15%. In addition, caching the translated target-core code further reduces overhead and improves system performance: running a video-compression routine on a chip with a DSP processing core, an average speed-up of 1.8x was achieved.
Claims (4)
1. A method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor, characterized by comprising the following steps:
in the first step, while the binary code program is being loaded, partitioning it into a number of basic blocks;
in the second step, performing a selection for each basic block to obtain the target processing core on which it will execute;
in the third step, translating the basic block corresponding to the target processing core obtained in the second step, yielding translated binary code on that target processing core;
in the fourth step, counting the execution frequency of each basic block, marking any basic block whose execution frequency exceeds the threshold T as a hot-spot basic block, and caching the translated binary code of hot-spot basic blocks in the high-speed cache;
in the fifth step, returning to the second step to dynamically schedule the next basic block.
2. The method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor according to claim 1, characterized in that the partitioning in the first step takes the program between the i-th entry instruction and the (i+1)-th jump instruction as the (i+1)-th basic block, where i ≥ 0 and an entry instruction is either the target of a jump instruction or the instruction immediately following a jump instruction.
3. The method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor according to claim 1, characterized in that the selection in the second step comprises the following steps:
2.1) obtaining the instructions A* that have the same function as the j-th instruction A of the i-th basic block;
2.2) obtaining the processing cores that can execute instruction A or instruction A*, and from these deriving the set I of processing cores that can execute every instruction of the i-th basic block;
2.3) according to p = 1/N, obtaining the execution performance of each instruction on each processing core in the set I, and summing the per-instruction execution performances to obtain the average performance of each core in the set when executing the i-th basic block, where p is the execution performance of a processor executing an instruction and N is the number of instruction cycles that processor needs for that instruction;
2.4) according to n = P/L, obtaining the performance/load ratio of each processing core, where n is the performance/load ratio, L is the core's current running load, and P is the core's average performance when executing the i-th basic block;
2.5) from the set I, selecting the processing core with the largest performance/load ratio as the target processing core of the i-th basic block.
4. The method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor according to claim 1, characterized in that the translation in the third step comprises the following steps:
3.1) if the high-speed cache already holds binary code translated for this target processing core, going directly to 3.3); otherwise performing 3.2) and then 3.3);
3.2) dynamically translating the basic block into the corresponding binary code for this target processing core;
3.3) if the translated binary code has a data dependency on a predecessor basic block, performing 3.4) and then 3.5); otherwise going directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtaining its execution result from the corresponding processing core, and switching the system context if the predecessor block was assigned to a different processing core;
3.5) executing the binary code corresponding to this basic block on the target processing core.
Priority Applications (1)
- CN 201010251261 (granted as CN101923492B), filed 2010-08-11: Method for executing dynamic allocation command on embedded heterogeneous multi-core

Publications (2)
- CN101923492A, published 2010-12-22
- CN101923492B, granted 2013-05-01

Legal status: Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN101923492B (en) | 2013-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101923492B (en) | Method for executing dynamic allocation command on embedded heterogeneous multi-core | |
CN101299194B (en) | Heterogeneous multi-core system thread-level dynamic dispatching method based on configurable processor | |
CN105487838A (en) | Task-level parallel scheduling method and system for dynamically reconfigurable processor | |
CN102981807A (en) | Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment | |
CN104106049A (en) | Rasterization of compute shaders | |
CN108595258A (en) | Dynamic expansion method for GPGPU register files | |
Wang et al. | Simultaneous multikernel: Fine-grained sharing of GPUs | |
Chen et al. | Guided region-based GPU scheduling: utilizing multi-thread parallelism to hide memory latency | |
CN102087609A (en) | Dynamic binary translation method under multi-processor platform | |
CN105242909A (en) | Method for many-core loop partitioning based on multi-version code generation | |
Hu et al. | The role of return value prediction in exploiting speculative method-level parallelism | |
Diaz et al. | The supercodelet architecture | |
Huang et al. | Minimizing energy consumption of embedded systems via optimal code layout | |
Li et al. | Code layout optimization for defensiveness and politeness in shared cache | |
Zheng et al. | Performance model for OpenMP parallelized loops | |
Sampaio et al. | Divergence analysis with affine constraints | |
Mantripragada et al. | A new framework for integrated global local scheduling | |
Tian et al. | Optimizing GPU register usage: Extensions to OpenACC and compiler optimizations | |
Lima et al. | HPSM: A programming framework for multi-CPU and multi-GPU systems | |
Georgakoudis et al. | Fast dynamic binary rewriting to support thread migration in shared-ISA asymmetric multicores | |
Li et al. | Code motion for migration minimization in STT-RAM based hybrid cache | |
Ravi et al. | Semi-automatic restructuring of offloadable tasks for many-core accelerators | |
Jakimovska et al. | Modern processor architectures overview | |
Akram | Managed language runtimes on heterogeneous hardware: Optimizations for performance, efficiency and lifetime improvement | |
Zhang et al. | Binary translation to improve energy efficiency through post-pass register re-allocation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 2013-05-01 Termination date: 2015-08-11 |
EXPY | Termination of patent right or utility model |