CN101923492B - Method for executing dynamic allocation command on embedded heterogeneous multi-core - Google Patents

Method for executing dynamic allocation command on embedded heterogeneous multi-core Download PDF

Info

Publication number
CN101923492B
CN101923492B (application CN201010251261A)
Authority
CN
China
Prior art keywords
instruction
basic block
processing core
binary code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201010251261
Other languages
Chinese (zh)
Other versions
CN101923492A (en)
Inventor
过敏意
娄林
伍倩
朱寅
沈耀
马曦
唐飞龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN 201010251261 priority Critical patent/CN101923492B/en
Publication of CN101923492A publication Critical patent/CN101923492A/en
Application granted granted Critical
Publication of CN101923492B publication Critical patent/CN101923492B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a method, in the field of computer technology, for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor. The method comprises the following steps: partitioning a binary code program into a number of basic blocks; performing selection for each basic block to obtain the target processing core that will execute it; translating each basic block for its target processing core to obtain translated binary code for that core; and counting the execution frequency of each basic block, marking blocks whose frequency exceeds a threshold T as hot-spot basic blocks, and caching the translated binary code of hot-spot blocks in the high-speed cache. Because the method dispatches instructions to the heterogeneous cores dynamically, according to factors such as the processing capacity and load of each core, it overcomes the inability of static scheduling to allocate resources dynamically and also reduces the complexity of dynamic thread partitioning, thereby further improving program execution efficiency on heterogeneous multi-cores.

Description

Method for executing dynamic allocation command on embedded heterogeneous multi-core
Technical field
The present invention relates to a method in the field of computer technology, and specifically to a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor.
Background technology
In recent years, as the demand for computing power has grown, the processing power of CPUs has had to keep improving. At first, CPU performance was raised mainly by increasing the core clock and front-side-bus frequencies and by enlarging caches. However, raising the clock frequency poses serious power-consumption and heat-dissipation challenges, merely enlarging caches increases cost, and the development of single-core processors has reached a limit. To further improve CPU processing power, multi-core processor technology entered the market. Since 2006, companies such as Intel and AMD have released dual-core, quad-core, and other multi-core processors. With multi-core processors fully on the market, the multi-core era of computing has arrived.
Integrating multiple microprocessor cores on a single chip allows program code to execute fully in parallel; without raising the operating frequency, this not only improves processing performance but also addresses the power-consumption problem, making it the direction of future processor development. However, simply increasing the number of cores does not improve CPU processing power ideally: what matters most for strengthening a processor's parallel processing capability is the parallelism of the program itself, and the serial portions of most programs still limit performance gains. It is generally believed that, in such homogeneous multi-core architectures, 4 to 8 cores may be the limit beyond which adding cores yields little further performance benefit.
To further improve CPU processing power, a heterogeneous processor architecture appeared, integrating multiple dissimilar processor cores on one chip. In this architecture, different processor cores achieve higher processing efficiency for specific applications (taking into account factors such as performance, power consumption, process technology, and cost): some cores excel at floating-point computation, while others excel at tasks such as signal processing. Fully exploiting the functional differences among heterogeneous cores satisfies the demands of different applications in different fields more completely. The Cell chip, jointly developed by IBM, Sony, and Toshiba, is a model of such a heterogeneous processor: it is a multi-core processor with nine hardware cores. In the Cell chip only one core is a standard IBM Power processor; the remaining eight are coprocessors custom-designed for image processing and used for floating-point computation. The main function of the primary processor is to distribute tasks, while the actual floating-point work is done by the coprocessors. Because the Cell coprocessors handle only floating-point tasks, the required operations, and likewise the corresponding circuit logic, are very simple, so as long as the clock frequency is high enough, Cell achieves remarkable floating-point performance.
At present, heterogeneous multi-core processors fall mainly into three classes according to their instruction sets: cores with identical instruction sets but different hardware structures, cores with completely different instruction sets, and cores whose instruction sets partially overlap. For heterogeneous multi-core processors that share a common base instruction set, the distinct extension portions of each core's instruction set are designed to suit applications with different demands. To keep pace with the development of heterogeneous multi-core processors, more advanced compilation systems and run-time handling methods must be designed for them, including programming languages, compilation models, and runtime libraries. Work in this area currently focuses mainly on optimizing task-scheduling strategies and thread-level parallelism.
Existing static task scheduling has two main methods: one compiles different code sections at compile time into binary code suited to different processing cores; the other links the runtime libraries of the different processing cores at run time. Both methods statically assign tasks to specific processing cores for execution, so the heterogeneity of the cores must already be considered when the program is written, and the programmer must understand the heterogeneous cores thoroughly. In addition, these methods ignore issues such as load balancing and power consumption across the heterogeneous cores; improper task assignment may increase program execution time, and information such as the load balance is only known at run time.
A search of the prior art shows that, to overcome this complexity of programming and run time, thread-level dynamic scheduling methods have appeared. These methods partition a task into threads of varying granularity so that different threads can execute concurrently on different cores, improving execution efficiency. However, such methods must consider data dependencies, and it is in practice difficult to convert a serially executed program into parallel multithreaded tasks. Moreover, because of the data dependencies between different threads, data synchronization and cache-coherence issues must also be considered, which may make thread scheduling more complex and cause program performance to decline further.
Summary of the invention
The object of the present invention is to overcome the above deficiencies in the prior art by providing a method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor. The invention dynamically dispatches instructions to the heterogeneous cores for execution according to factors such as the processing capacity and load of each core in the system, thereby remedying the inability of static scheduling to allocate resources dynamically and also reducing the complexity of dynamic thread partitioning, which further improves program execution efficiency on heterogeneous multi-cores.
The present invention is achieved through the following technical solution, which comprises the following steps:
Step 1: while the binary code program is being loaded, partition it into a number of basic blocks.
The partitioning treats the code between the i-th entry instruction and the (i+1)-th jump instruction as the (i+1)-th basic block, where 0 ≤ i and an entry instruction is either the target of a jump instruction or the instruction immediately following a jump instruction.
Step 2: perform selection for each basic block to obtain the target processing core that will execute it.
The selection comprises the following steps:
2.1) for the j-th instruction A in the i-th basic block, obtain the instruction A* having the same function;
2.2) obtain every processing core that can execute instruction A or instruction A*, and thereby obtain the set I of processing cores that can execute every instruction in the i-th basic block;
2.3) according to p = 1/N, obtain the execution performance of each instruction on each processing core in set I, and sum each core's per-instruction execution performance to obtain that core's average performance for the i-th basic block, where p is the execution performance with which a processor executes an instruction and N is the number of instruction cycles that processor needs to execute it;
2.4) according to n = P/L, obtain each processing core's performance-to-load ratio, where n is the performance-to-load ratio, L is the core's running load, and P is the core's average performance for the i-th basic block;
2.5) from the set I, select the processing core with the largest performance-to-load ratio as the target processing core for the i-th basic block.
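The selection step above can be sketched in code. This is a minimal illustration, not the patent's implementation: the core names, instruction kinds, and cycle counts in `CYCLES` are hypothetical assumptions chosen only to exercise the formulas p = 1/N and n = P/L.

```python
# Hypothetical cycle counts N for each instruction kind on each core.
# A missing entry means the core cannot execute that instruction.
CYCLES = {
    "cpu": {"add": 1, "mul": 3, "fmul": 8},
    "dsp": {"add": 1, "mul": 2, "fmul": 2},
}

def average_performance(block, core):
    """Step 2.3: sum p = 1/N over the block's instructions.
    Returns None if the core cannot execute some instruction
    (i.e. the core is not in set I for this block)."""
    total = 0.0
    for instr in block:
        n = CYCLES[core].get(instr)
        if n is None:
            return None
        total += 1.0 / n
    return total

def select_target_core(block, loads):
    """Steps 2.4-2.5: pick the core in set I with the largest
    performance-to-load ratio n = P/L.
    `loads` maps core name -> current running load L (> 0)."""
    best_core, best_ratio = None, -1.0
    for core in CYCLES:
        p = average_performance(block, core)
        if p is None:
            continue
        ratio = p / loads[core]
        if ratio > best_ratio:
            best_core, best_ratio = core, ratio
    return best_core
```

With equal loads, a floating-point-heavy block goes to the hypothetical "dsp" core; once that core's load grows large enough, the same block is routed to the "cpu" core instead.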
Step 3: translate the basic block for the target processing core obtained in step 2, obtaining binary code translated for that core.
The translation comprises the following steps:
3.1) when the high-speed cache already holds binary code translated for this target processing core, go directly to 3.3); otherwise perform 3.2) and then 3.3);
3.2) dynamically translate the basic block into the corresponding binary code of the target processing core;
3.3) when the translated binary code has a data dependency on a predecessor basic block, perform 3.4) and then 3.5); otherwise go directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtain its execution result from the corresponding processing core; when the predecessor block was assigned to execute on a different processing core, switch the system context;
3.5) execute the binary code corresponding to this basic block on the target processing core.
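The translate-then-execute flow of steps 3.1-3.5 can be sketched as follows. This is an illustrative skeleton only: `translate` is a hypothetical stand-in for the real binary-translation machinery, and the "execution result" is modeled as a plain dictionary entry rather than a real context switch.

```python
# (block_id, core) -> translated code, i.e. the translation cache of step 3.1
translation_cache = {}

def translate(block, core):
    # Placeholder for step 3.2: tag each instruction with the target core.
    return [f"{core}:{instr}" for instr in block]

def run_block(block_id, block, core, depends_on=None, results=None):
    """Steps 3.1-3.5 for one basic block. `depends_on` names a predecessor
    block whose result must be available first (steps 3.3/3.4)."""
    results = results if results is not None else {}
    key = (block_id, core)
    if key not in translation_cache:          # 3.1: cache miss
        translation_cache[key] = translate(block, core)  # 3.2
    if depends_on is not None:                # 3.3: data dependency
        _ = results[depends_on]               # 3.4: consume predecessor result
    # 3.5: "execute" the translated code and record the result
    results[block_id] = f"ran {len(translation_cache[key])} instrs on {core}"
    return results[block_id]
```

A second call with the same `(block_id, core)` pair skips retranslation and reuses the cached code, which is the point of step 3.1.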
Step 4: count the execution frequency of each basic block, mark any basic block whose execution frequency exceeds a threshold T as a hot-spot basic block, and cache the translated binary code of hot-spot blocks in the high-speed cache.
Step 5: return to step 2 for dynamic scheduling of the next basic block.
Compared with the prior art, the beneficial effects of the invention are as follows: the instruction characteristics of the heterogeneous cores need not be fully understood, so the compiler gains platform transparency, and there is no need to write or compile different binary codes for different architectures. At the same time, in a heterogeneous multi-core system with a common base instruction set, the method performs dynamic instruction scheduling, dispatching instructions to the heterogeneous cores for execution according to factors such as the system's per-core processing capacity and load; this remedies the inability of static scheduling to allocate resources dynamically and also reduces the complexity of dynamic thread partitioning, further improving program execution efficiency on heterogeneous multi-cores. The method requires no thread partitioning and, in the absence of data dependencies, can parallelize execution to the greatest extent, thereby reducing thread-synchronization overhead by about 15%.
Description of drawings
Fig. 1 is a flow diagram of the method of the embodiment.
Embodiment
The method of the present invention is further described below with reference to the drawing. The embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and a concrete operating procedure, but the protection scope of the invention is not limited to the following embodiment.
Embodiment
As shown in Fig. 1, the embodiment comprises the following steps:
Step 1: while the binary code program is being loaded, partition it into a number of basic blocks.
The partitioning treats the code between the i-th entry instruction and the (i+1)-th jump instruction as the (i+1)-th basic block, where 0 ≤ i and an entry instruction is either the target of a jump instruction or the instruction immediately following a jump instruction.
In this embodiment, the jump instructions of the binary program code define the entry and exit instructions of the basic blocks. Each basic block is a group of sequentially executed instructions, and the code from an entry instruction to the corresponding exit instruction is divided into one basic block. The entry instruction of a block is its first instruction, i.e. the instruction after a jump instruction or the target of a jump; the exit instruction is the instruction immediately preceding another block's entry instruction, or a jump instruction. The ordering relation among the different basic blocks is preserved.
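The leader-based partitioning described above can be sketched as follows. This is a simplified illustration assuming the binary has already been decoded into an instruction list where the positions and targets of jump instructions are known; the representation is an assumption, not the patent's data structure.

```python
def split_basic_blocks(instrs, jump_targets, jump_indices):
    """Partition an instruction list into basic blocks.
    Entry instructions (leaders) are jump targets and the instructions
    immediately after a jump; each block runs from one leader up to,
    but not including, the next leader."""
    leaders = {0}                                   # program entry
    leaders |= set(jump_targets)                    # targets of jumps
    leaders |= {i + 1 for i in jump_indices if i + 1 < len(instrs)}
    order = sorted(leaders)
    blocks = []
    for k, start in enumerate(order):
        end = order[k + 1] if k + 1 < len(order) else len(instrs)
        blocks.append(instrs[start:end])            # block ends at its exit
    return blocks
```

For example, a five-instruction stream with one jump at index 2 targeting index 0 splits into two blocks: the first ends at the jump (its exit instruction), and the second starts at the instruction after the jump.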
Step 2: perform selection for each basic block to obtain the target processing core that will execute it.
The selection comprises the following steps:
2.1) for the j-th instruction A in the i-th basic block, obtain the instruction A* having the same function;
2.2) obtain every processing core that can execute instruction A or instruction A*, and thereby obtain the set I of processing cores that can execute every instruction in the i-th basic block;
2.3) according to p = 1/N, obtain the execution performance of each instruction on each processing core in set I, and sum each core's per-instruction execution performance to obtain that core's average performance for the i-th basic block, where p is the execution performance with which a processor executes an instruction and N is the number of instruction cycles that processor needs to execute it;
2.4) according to n = P/L, obtain each processing core's performance-to-load ratio, where n is the performance-to-load ratio, L is the core's running load, and P is the core's average performance for the i-th basic block;
2.5) from the set I, select the processing core with the largest performance-to-load ratio as the target processing core for the i-th basic block.
Step 3: translate the basic block for the target processing core obtained in step 2, obtaining binary code translated for that core.
The translation comprises the following steps:
3.1) when the high-speed cache already holds binary code translated for this target processing core, go directly to 3.3); otherwise perform 3.2) and then 3.3);
3.2) dynamically translate the basic block into the corresponding binary code of the target processing core;
3.3) when the translated binary code has a data dependency on a predecessor basic block, perform 3.4) and then 3.5); otherwise go directly to 3.5);
3.4) after the predecessor basic block finishes executing, obtain its execution result from the corresponding processing core; when the predecessor block was assigned to execute on a different processing core, switch the system context;
3.5) execute the binary code corresponding to this basic block on the target processing core.
In this embodiment, binary translation technology maps the original binary code to instructions of the target processing core, in two cases. In one case, the original instruction and the instruction it maps to on the target core are structurally similar; they have a one-to-one correspondence and perform the same function. In the other case, the target core offers one instruction, or a group of more advantageous instructions, functionally equivalent to a group of instructions in the original basic block, such as Intel's AVX (Advanced Vector Extensions). The translated instruction or instruction group has higher execution efficiency or a better power ratio.
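The two mapping cases above can be sketched with a small rewrite pass. All opcode names here (`tadd`, `tfma`, etc.) are hypothetical placeholders standing in for real source and target instruction sets.

```python
# Case 1: one-to-one mapping of structurally similar instructions.
ONE_TO_ONE = {"add": "tadd", "mul": "tmul", "fmul": "tfmul", "fadd": "tfadd"}

# Case 2: a group of source instructions replaced by one more advantageous
# target instruction (in the spirit of fused/vector extensions such as AVX).
GROUP_RULES = [(("fmul", "fadd"), "tfma")]

def map_block(block):
    """Translate a basic block, preferring fused group mappings."""
    out, i = [], 0
    while i < len(block):
        for pattern, fused in GROUP_RULES:
            if tuple(block[i:i + len(pattern)]) == pattern:
                out.append(fused)        # case 2: many-to-one
                i += len(pattern)
                break
        else:
            out.append(ONE_TO_ONE[block[i]])  # case 1: one-to-one
            i += 1
    return out
```

A multiply followed by an add collapses into the single hypothetical fused instruction, while other instructions translate one for one.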
Step 4: count the execution frequency of each basic block, mark any basic block whose execution frequency exceeds a threshold T as a hot-spot basic block, and cache the translated binary code of hot-spot blocks in the high-speed cache.
The embodiment can then fetch frequently executed code directly from the high-speed cache, greatly reducing the overhead of translation and mapping.
Step 5: return to step 2 for dynamic scheduling of the next basic block.
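The hot-spot bookkeeping of step 4 can be sketched as follows. The threshold value and the data structures are illustrative assumptions; the patent only specifies that blocks executed more often than a threshold T are cached.

```python
from collections import Counter

T = 3                    # hypothetical hot-spot threshold
exec_count = Counter()   # per-block execution frequency
hot_cache = {}           # block_id -> translated code kept in the fast cache

def record_execution(block_id, translated_code):
    """Count one execution of a block; once its frequency exceeds T,
    mark it as a hot spot and cache its translated code.
    Returns True if the block is now served from the hot cache."""
    exec_count[block_id] += 1
    if exec_count[block_id] > T and block_id not in hot_cache:
        hot_cache[block_id] = translated_code
    return block_id in hot_cache
```

The first T executions pay the normal path; from the (T+1)-th execution onward the translated code is available from the cache, which is where the reduction in translation-mapping overhead comes from.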
During instruction-level dynamic optimization, the embodiment forms a pipeline out of target-core selection, binary instruction translation and mapping, and execution of the code on the processing cores, which markedly reduces the switching overhead of dispatching instructions to the heterogeneous cores. Compared with dynamic thread-level scheduling methods, this method requires no thread partitioning and, in the absence of data dependencies, can parallelize execution to the greatest extent, thereby reducing thread-synchronization overhead by about 15%. In addition, caching the translated target-core code further reduces overhead and improves system performance; running a video-compression routine on a chip with a DSP processing core achieved an average speed-up of 1.8x.

Claims (2)

1. A method for executing dynamically allocated instructions on an embedded heterogeneous multi-core processor, characterized in that it comprises the following steps:
Step 1: while the binary code program is being loaded, partition it into a number of basic blocks;
Step 2: perform selection for each basic block to obtain the target processing core that will execute it, specifically comprising the following steps:
2.1) for the j-th instruction A in the i-th basic block, obtain the instruction A* having the same function;
2.2) obtain every processing core B that can execute instruction A or instruction A*, and thereby obtain the set I of processing cores that can execute every instruction in the i-th basic block;
2.3) according to p = 1/N, obtain the execution performance of each instruction on each processing core in set I, and sum each core's per-instruction execution performance to obtain that core's average performance for the i-th basic block, where p is the execution performance with which a processor executes an instruction and N is the number of instruction cycles that processor needs to execute it;
2.4) according to n = P/L, obtain each processing core's performance-to-load ratio, where n is the performance-to-load ratio, L is the core's running load, and P is the core's average performance for the i-th basic block;
2.5) from the set I, select the processing core with the largest performance-to-load ratio as the target processing core for the i-th basic block;
Step 3: translate the basic block for the target processing core obtained in step 2, obtaining binary code translated for that core;
Step 4: count the execution frequency of each executed basic block, mark any basic block whose execution frequency exceeds a threshold T as a hot-spot basic block, and cache the translated binary code of hot-spot blocks in the high-speed cache;
Step 5: return to step 2 for dynamic scheduling of the next basic block; the partitioning in step 1 treats the code between the i-th entry instruction and the (i+1)-th jump instruction as the (i+1)-th basic block, where 0 ≤ i and an entry instruction is either the target of a jump instruction or the instruction immediately following a jump instruction.
2. the method for executing dynamic allocation command on embedded heterogeneous multi-core according to claim 1 is characterized in that, the translation described in the 3rd step is processed, and may further comprise the steps:
3.1) when having the binary code translated to this target processing core in the high-speed cache, directly carry out 3.3); Otherwise, carry out 3.2) after carry out again 3.3);
3.2) corresponding binary code when this fundamental block is translated into dynamically this target processing core and processed;
3.3) when there are data dependency in binary code and the preorder fundamental block carried out, carry out 3.4) and after carry out again 3.5); Otherwise, directly carry out 3.5);
3.4) after the preorder fundamental block carry out to finish, processed accordingly the execution result on the nuclear, when the binary code of preorder fundamental block is assigned to when carrying out on other target processing core, switched system context then;
3.5) carry out binary code corresponding to this fundamental block in this target processing core.
CN 201010251261 2010-08-11 2010-08-11 Method for executing dynamic allocation command on embedded heterogeneous multi-core Expired - Fee Related CN101923492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010251261 CN101923492B (en) 2010-08-11 2010-08-11 Method for executing dynamic allocation command on embedded heterogeneous multi-core


Publications (2)

Publication Number Publication Date
CN101923492A CN101923492A (en) 2010-12-22
CN101923492B true CN101923492B (en) 2013-05-01

Family

ID=43338446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010251261 Expired - Fee Related CN101923492B (en) 2010-08-11 2010-08-11 Method for executing dynamic allocation command on embedded heterogeneous multi-core

Country Status (1)

Country Link
CN (1) CN101923492B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8683243B2 (en) 2011-03-11 2014-03-25 Intel Corporation Dynamic core selection for heterogeneous multi-core systems
US8788871B2 (en) * 2011-06-28 2014-07-22 International Business Machines Corporation Unified, workload-optimized, adaptive RAS for hybrid systems
US8799693B2 (en) 2011-09-20 2014-08-05 Qualcomm Incorporated Dynamic power optimization for computing devices
US9098309B2 (en) * 2011-09-23 2015-08-04 Qualcomm Incorporated Power consumption optimized translation of object code partitioned for hardware component based on identified operations
US9104449B2 (en) * 2012-06-18 2015-08-11 Google Inc. Optimized execution of dynamic languages
CN106170761B (en) 2012-09-27 2019-05-10 英特尔公司 Method and apparatus for dispatching store instruction across multiple atomic regions in Binary Conversion
US9405551B2 (en) * 2013-03-12 2016-08-02 Intel Corporation Creating an isolated execution environment in a co-designed processor
CN103207772B (en) * 2013-04-07 2016-01-13 北京航空航天大学 A kind of instruction prefetch content selection method optimizing real-time task WCET
WO2016134784A1 (en) * 2015-02-27 2016-09-01 Huawei Technologies Co., Ltd. Systems and methods for heterogeneous computing application programming interfaces (api)
CN105242909B (en) * 2015-11-24 2017-08-11 无锡江南计算技术研究所 A kind of many-core cyclic blocking method based on multi version code building
CN106020922B (en) * 2016-05-30 2019-01-08 湖南科技大学 The instruction dispatching method of idle beat is filled with the execution packet of jump target basic block
CN107870818B (en) * 2017-10-19 2021-03-02 瑞芯微电子股份有限公司 Multi-core processor interrupt dynamic response method and storage medium
CN108874727B (en) * 2018-05-29 2019-09-10 中国人民解放军国防科技大学 GPDSP-oriented multi-core parallel computing implementation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101299194A (en) * 2008-06-26 2008-11-05 上海交通大学 Heterogeneous multi-core system thread-level dynamic dispatching method based on configurable processor
CN101299199A (en) * 2008-06-26 2008-11-05 上海交通大学 Heterogeneous multi-core system based on configurable processor and instruction set extension
CN101329638A (en) * 2007-06-18 2008-12-24 国际商业机器公司 Method and system for analyzing parallelism of program code
CN101387969A (en) * 2008-10-16 2009-03-18 上海交通大学 Dynamic binary translation method for cooperation design of software and hardware
US20090154572A1 (en) * 2007-12-17 2009-06-18 Samsung Electronics Co., Ltd. Method and apparatus for video decoding based on a multi-core processor


Also Published As

Publication number Publication date
CN101923492A (en) 2010-12-22

Similar Documents

Publication Publication Date Title
CN101923492B (en) Method for executing dynamic allocation command on embedded heterogeneous multi-core
CN102981807B (en) Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment
US20140304490A1 (en) Information processing device and information processing method
CN101299194B (en) Heterogeneous multi-core system thread-level dynamic dispatching method based on configurable processor
US20140304491A1 (en) Processor system and accelerator
CN104106049A (en) Rasterization of compute shaders
CN108595258A (en) A kind of GPGPU register files dynamic expansion method
CN102193779A (en) MPSoC (multi-processor system-on-chip)-oriented multithread scheduling method
CN103279445A (en) Computing method and super-computing system for computing task
Wang et al. Simultaneous multikernel: Fine-grained sharing of gpus
CN102087609A (en) Dynamic binary translation method under multi-processor platform
Salamy et al. An effective solution to task scheduling and memory partitioning for multiprocessor system-on-chip
KR20170024898A (en) Scheme for dynamic controlling of processing device based on application characteristics
CN105242909A (en) Method for many-core circulation partitioning based on multi-version code generation
Diaz et al. The supercodelet architecture
Huang et al. Minimizing energy consumption of embedded systems via optimal code layout
Tian et al. Optimizing gpu register usage: Extensions to openacc and compiler optimizations
Mantripragada et al. A new framework for integrated global local scheduling
Lima et al. HPSM: a programming framework for multi-cpu and multi-gpu systems
CN110262884B (en) Running method for multi-program multi-data-stream partition parallel in core group based on Shenwei many-core processor
Li et al. Code motion for migration minimization in STT-RAM based hybrid cache
Georgakoudis et al. Fast dynamic binary rewriting to support thread migration in shared-isa asymmetric multicores
CN113448586A (en) Integration of automated compiler data flow optimization
Kaouane et al. SysCellC: Systemc on cell
Nikov et al. High-performance simultaneous multiprocessing for heterogeneous System-on-Chip

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130501

Termination date: 20150811

EXPY Termination of patent right or utility model