CN102929723B - Method for dividing parallel program segment based on heterogeneous multi-core processor - Google Patents


Info

Publication number
CN102929723B
Authority
CN
China
Prior art keywords
core
data
many
program
parallel
Legal status
Active
Application number
CN201210441326.9A
Other languages
Chinese (zh)
Other versions
CN102929723A (en)
Inventor
陈德训
房田文
吴宏
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
2012-11-06
Filing date
2012-11-06
Publication date
2015-07-08
2012-11-06: Application filed by Wuxi Jiangnan Computing Technology Institute
2012-11-06: Priority to CN201210441326.9A
2013-02-13: Publication of CN102929723A
Application granted
2015-07-08: Publication of CN102929723B
Legal status: Active


Abstract

The invention provides a method for dividing parallel program segments on a heterogeneous many-core processor. The first step is data dependence analysis on the data of an application problem, to determine whether there are basic program segments without data dependence. If such segments exist, the computation amount of each one is calculated. A first-level many-core division is then performed on these segments according to their computation amounts. The method addresses the difficulty of adapting general scientific-computing and engineering applications to polymorphic heterogeneous computer systems. It improves both parallel efficiency and load balance at the many-core level.

Description

Method for dividing parallel program segments based on a heterogeneous many-core processor
Technical field
The present invention relates to the field of computers, and in particular to a method for dividing parallel program segments based on a heterogeneous many-core processor.
Background art
In recent years, to improve system computing power, multi-core and many-core processors have gradually become the main building blocks of high-performance computers. At the same time, solving specific problems has become a direction of microprocessor design. Heterogeneous processors tailor their processor-core designs to the characteristics of problems in specific domains. They separate different types of operations in the workload and process them on different processor cores, to obtain high overall performance. Combining this architecture with homogeneous multi-core processor systems forms very large-scale polymorphic heterogeneous computing systems. Such systems offer strong computing power and a high energy-efficiency ratio, and are an important direction for solving major applications. However, their very large parallel scale and complex polymorphic architecture pose a huge challenge to traditional high-performance computing applications, and matching parallel implementation techniques are lacking. Research into multi-granularity methods for dividing basic program segments on heterogeneous many-core processors has therefore become a pressing technical problem for those skilled in the art.
Current parallel implementations for heterogeneous computer systems mostly follow a two-level model: MPI (Message Passing Interface) parallelism plus many-core parallelism. The MPI level provides coarse-grained, process-level parallelism. The many-core level mainly accelerates the core iterative part, that is, fine-grained many-core parallelization is applied only to the core loops. In implementing and optimizing this two-level hybrid model, the MPI level is mainly optimized by overlapping MPI communication with computation. The many-core level is tuned mainly through data layout optimization, data transfer optimization, and overlapping computation with memory access. The speedup achieved on a given problem depends closely on the problem's computational characteristics and on how the optimization techniques are implemented.
According to the literature reviewed, many-core parallel implementations so far have targeted only the core computations that consume most of the work, or the overall solution of certain simple problems. No comprehensive, in-depth solution exists for the numerical simulation of many complex practical problems. As a result, the MPI-level parallel scale of the main parallel applications on heterogeneous computer systems is currently on the order of 100,000 processes. Larger-scale, more complex parallel computations are very difficult to support, and the overall speedup on practical application problems is only moderate.
In addition, applying fine-grained many-core parallelism only to core loops ties parallel efficiency to the size of the problem actually being run. For example, let M be the grid dimension size of the actual problem and N the number of slave cores on the heterogeneous many-core processor, with the core loop parallelized across the slave cores. When M < N, the computing power of (N - M) slave cores goes unused. When M > N but M is not an integer multiple of N, the load balance across the slave cores is poor. Existing fine-grained many-core parallelization of core loops therefore struggles to make full use of the slave cores' computing power.
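The two failure cases above can be quantified with a short Python sketch. It assumes that every core-loop iteration costs the same and that iterations are dealt out statically and as evenly as possible. Under those assumptions the busiest core runs ceil(M/N) iterations, and efficiency is useful work divided by N times that. The function `loop_partition_stats` and the value N = 64 are illustrative choices, not taken from the patent.

```python
def loop_partition_stats(m: int, n: int) -> dict:
    """Spread m equal-cost core-loop iterations over n slave cores.

    Assumes equal iteration cost and an even static distribution, so the
    busiest core executes ceil(m / n) iterations.
    """
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive")
    busiest = (m + n - 1) // n        # ceil(m / n): iterations on the most loaded core
    idle = max(n - m, 0)              # cores that receive no iteration (the M < N case)
    # Useful work divided by the core capacity consumed until the busiest core finishes.
    efficiency = m / (n * busiest)
    return {"busiest": busiest, "idle": idle, "efficiency": efficiency}


if __name__ == "__main__":
    for m in (40, 64, 100, 128):  # hypothetical grid sizes, hypothetical N = 64
        s = loop_partition_stats(m, 64)
        print(f"M={m:4d}  idle={s['idle']:3d}  efficiency={s['efficiency']:.3f}")
```

With M = 40, 24 of the 64 cores sit idle. With M = 100, every core is busy but efficiency drops to 100/128, because half the cores finish after one iteration while the rest run two.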
More related content is disclosed in the Chinese patent application with publication number CN1783011A.
Summary of the invention
The technical problem addressed by the present invention is how to adapt general scientific-computing and engineering applications to polymorphic heterogeneous computer systems. A second aim is to improve parallel efficiency and load balance at the many-core level.
To solve the above problem, the present invention provides a method for dividing parallel program segments based on a heterogeneous many-core processor, comprising:
performing data dependence analysis on the data of an application problem to determine whether there are basic program segments without data dependence;
if basic program segments without data dependence exist, calculating the computation amount of each such segment, and performing a first-level many-core division on these segments according to the computation amounts.
Optionally, after the first-level many-core division, the method further comprises:
analyzing each basic program segment without data dependence and decomposing it into multiple computation loops;
performing data recurrence dependence analysis on the data in each computation loop to determine whether there are computation loops without data recurrence dependence;
if computation loops without data recurrence dependence exist, performing a second-level many-core division on those loops.
Optionally, the computation amount comprises a floating-point computation amount and a fixed-point computation amount.
Optionally, performing the first-level many-core division comprises performing first-level task division and load balancing at a first granularity.
Optionally, performing the first-level many-core division comprises performing first-level task division and load balancing at a first granularity; and
performing the second-level many-core division comprises performing second-level task division and load balancing at a second granularity.
Optionally, the second granularity is smaller than the first granularity.
Optionally, before the data dependence analysis is performed on the data of the application problem, the method further comprises:
performing computation analysis on the application problem;
based on the result of the computation analysis, performing MPI-level parallel task division on the application problem at a third granularity.
Optionally, the third granularity is larger than the first granularity.
Compared with the prior art, the technical solution of the present invention has the following advantages:
The present invention divides parallel program segments at multiple levels of fine granularity, which makes task division and load across the slave cores more balanced. The slave cores' computing power can thus be exploited more fully and good speedup is obtained. This addresses the problem of adapting general scientific-computing and engineering applications to polymorphic heterogeneous computer systems.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the method for dividing parallel program segments based on a heterogeneous many-core processor according to the present invention;
Fig. 2 is a schematic flowchart of a second embodiment of the method for dividing parallel program segments based on a heterogeneous many-core processor according to the present invention;
Fig. 3 is a schematic flowchart of a third embodiment of the method for dividing parallel program segments based on a heterogeneous many-core processor according to the present invention.
Detailed description of the embodiments
Many specific details are set forth in the following description so that the present invention can be fully understood. However, the invention can be implemented in many ways other than those described here. Those skilled in the art can make similar generalizations without departing from its substance, so the invention is not limited by the specific implementations disclosed below.
The present invention is also described in detail with reference to schematic diagrams. These diagrams are examples for ease of illustration and should not limit the scope of protection of the invention.
To solve the technical problem described in the background, the present invention provides a method for dividing parallel program segments based on a heterogeneous many-core processor. Fig. 1 is a schematic flowchart of the first embodiment. As shown in Fig. 1, this embodiment comprises the following steps:
In step S101, data dependence analysis is performed on the data of the application problem to determine whether there are basic program segments without data dependence. For example, if basic segment 1 is Y = F(X) and basic segment 2 is Z = F(Y), the two segments have a data dependence. They can only be computed serially and cannot be executed in parallel.
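The test in step S101 can be sketched by describing each basic segment only by the sets of variables it reads and writes. Two segments are treated as dependent when one writes a variable that the other reads or writes. This read/write-set model, the `Segment` type and the extra segment `seg3` are illustrative assumptions, not the patent's own analysis procedure.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    """A basic program segment summarised by the variables it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def depends(a: Segment, b: Segment) -> bool:
    """True if the two segments must keep their order (cannot run in parallel)."""
    return bool(
        (a.writes & b.reads)      # flow: b reads what a writes
        or (a.reads & b.writes)   # anti: b overwrites what a reads
        or (a.writes & b.writes)  # output: both write the same variable
    )


def independent_pairs(segments: list) -> list:
    """All pairs of segment names with no data dependence between them."""
    return [(a.name, b.name) for a, b in combinations(segments, 2) if not depends(a, b)]


seg1 = Segment("seg1", frozenset({"X"}), frozenset({"Y"}))  # Y = F(X)
seg2 = Segment("seg2", frozenset({"Y"}), frozenset({"Z"}))  # Z = F(Y)
seg3 = Segment("seg3", frozenset({"X"}), frozenset({"W"}))  # hypothetical W = G(X)
print(depends(seg1, seg2))                    # True: must run serially
print(independent_pairs([seg1, seg2, seg3]))  # [('seg1', 'seg3'), ('seg2', 'seg3')]
```

Two segments that only read the same input do not conflict: seg1 and seg3 both read X. That is why seg3 can run alongside either of the other two.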
In step S102, the method determines whether basic program segments without data dependence exist.
If none exist, no basic segment can be computed in parallel, so no parallel program segment division can be made, and the method ends.
If such segments exist, step S103 is performed to calculate the computation amount of each of them. The computation amount includes a floating-point computation amount and a fixed-point computation amount. In step S104, a first-level many-core division is then applied to these segments according to their computation amounts. This division consists of first-level task division and load balancing at the first granularity. Each basic segment is completed by one slave-core group, and the size of that group is determined by the segment's computation amount. The first-level division thus realizes fine-grained parallelism over basic program segments, that is, parallelism between slave-core groups, one group per basic segment.
The technical solution of the present invention is further described below with an example.
In this example, step S101 finds that a certain application problem has two basic program segments without data dependence, denoted module1 and module2. The total number of slave cores participating in the parallel computation is 100. Step S103 gives a computation amount of 2 for module1 and 3 for module2.
Step S104 then performs the first-level many-core division on module1 and module2 according to these amounts. Of the 100 slave cores, 40 are assigned to module1 and form the first slave-core group. The other 60 are assigned to module2 and form the second slave-core group.
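The 40/60 split follows from sizing each slave-core group in proportion to its segment's computation amount (2 : 3 of 100 cores). The Python sketch below assumes that proportional rule. The source does not say how fractional shares are rounded, so largest-remainder rounding is also an assumption here, and the name `split_cores` is hypothetical.

```python
def split_cores(amounts: dict, total_cores: int) -> dict:
    """Size one slave-core group per basic segment in proportion to its computation amount.

    Integer amounts keep the arithmetic exact. Cores left over after flooring
    go to the segments with the largest remainders (largest-remainder rounding).
    """
    total = sum(amounts.values())
    if total <= 0 or total_cores <= 0:
        raise ValueError("amounts and total_cores must be positive")
    alloc = {seg: a * total_cores // total for seg, a in amounts.items()}
    remainder = {seg: a * total_cores % total for seg, a in amounts.items()}
    leftover = total_cores - sum(alloc.values())  # always fewer than len(amounts)
    # sorted() is stable, so ties keep the input order.
    for seg in sorted(amounts, key=remainder.get, reverse=True)[:leftover]:
        alloc[seg] += 1
    return alloc


print(split_cores({"module1": 2, "module2": 3}, 100))  # {'module1': 40, 'module2': 60}
```

With three equal amounts and 100 cores, the sketch returns groups of 34, 33 and 33. A production scheduler would also have to guarantee at least one core to a segment whose share rounds down to zero, which this sketch does not do.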
Note that this example is meant only to illustrate the technical solution, so simple numbers of slave cores and program segments were chosen. Those skilled in the art will understand that in real large-scale parallel computing the number of slave cores may reach millions and programs are more complex. The present invention places no specific restriction on this.
Fig. 2 is a schematic flowchart of the second embodiment. It differs from the first embodiment as follows: after the parallel division by basic program segment, each basic segment is subdivided further, and a second-level many-core division is applied to the computation loops within it that have no data recurrence dependence.
As shown in Figure 2, this embodiment comprises the following steps:
In step S201, data dependence analysis is performed on the data of the application problem to determine whether there are basic program segments without data dependence.
In step S202, the method determines whether basic program segments without data dependence exist.
If none exist, the method ends.
If they exist, step S203 is performed to calculate the computation amount of each basic segment without data dependence.
In step S204, the first-level many-core division is performed on these basic segments according to their computation amounts.
In step S205, a basic program segment without data dependence is analyzed and decomposed into multiple computation loops.
In step S206, data recurrence dependence analysis is performed on the data in each computation loop to determine whether there are computation loops without data recurrence dependence. If a data variable in the loop satisfies $X_{i,j,k} = F(X_{i-1,j,k}, X_{i,j-1,k}, X_{i,j,k-1})$, the variable is considered to have a recurrence dependence; otherwise it has none.
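The recurrence criterion in step S206 can be checked mechanically when every array access in a loop body is the loop index plus a constant offset. The Python sketch below is a conservative simplification under that assumption. It also assumes the loop updates its output array in place and treats any read of that array at a nonzero offset as a recurrence. The function name and the second, recurrence-free loop are hypothetical.

```python
Offset = tuple  # (di, dj, dk): the access is array[i + di, j + dj, k + dk]


def has_recurrence(written: str, reads: list) -> bool:
    """Conservative recurrence check for a loop body `written[i,j,k] = F(reads...)`.

    Each read is (array_name, (di, dj, dk)). If the loop reads the array it is
    writing at any nonzero offset, one iteration depends on a value another
    iteration produces or overwrites, so the iterations cannot simply run in
    parallel across slave cores.
    """
    return any(name == written and off != (0, 0, 0) for name, off in reads)


# X[i,j,k] = F(X[i-1,j,k], X[i,j-1,k], X[i,j,k-1]): the recurrence from step S206
recurrent = [("X", (-1, 0, 0)), ("X", (0, -1, 0)), ("X", (0, 0, -1))]
# Y[i,j,k] = F(X[i-1,j,k], X[i+1,j,k]): hypothetical loop that only reads X
stencil = [("X", (-1, 0, 0)), ("X", (1, 0, 0))]

print(has_recurrence("X", recurrent))  # True
print(has_recurrence("Y", stencil))    # False
```

Real compilers use full dependence tests (for example distance and direction vectors) to handle coupled or non-constant subscripts. This check only covers the constant-offset case shown in the text.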
In step S207, the method determines whether the basic program segment contains computation loops without data recurrence dependence. If it does, step S208 applies a second-level many-core division to those loops, and step S209 follows. The second-level division consists of second-level task division and load balancing at the second granularity, which is smaller than the first granularity. The first-level division gives parallelism between slave-core groups, one group per basic segment. The second-level division gives parallelism between slave cores over the loops inside a basic segment. It is therefore finer-grained parallelism built on top of the first-level division.
If it does not, step S209 is performed directly to determine whether all basic program segments without data dependence have been processed. If so, the method ends; otherwise it returns to step S205 to analyze the next such segment.
This example continues with module1 and module2 from the previous example. In step S205, module1 is analyzed first and decomposed into two computation loops, denoted loop1 and loop2.
In step S206, data recurrence dependence analysis on loop1 and loop2 shows that neither has a recurrence dependence.
Because module1 contains loops without recurrence dependence (loop1 and loop2), step S208 applies a second-level many-core division to them. This division can be made according to the computation amounts of the loops. For example, module1 was assigned 40 slave cores in the previous example. These are subdivided further here: 30 slave cores execute loop1 and the other 10 execute loop2.
Next, because module2 is still unprocessed, step S205 is performed again: module2 is analyzed and decomposed into three computation loops, denoted loop1', loop2' and loop3'.
In step S206, data recurrence dependence analysis shows that loop1', loop2' and loop3' all have recurrence dependences. They must therefore be computed serially and cannot be executed in parallel, so no second-level many-core division is performed.
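The two examples can be combined into one two-level plan. Level 1 splits the cores by segment computation amount. Level 2 splits each group among its recurrence-free loops by loop computation amount. A segment whose loops all carry recurrences keeps its group undivided. This Python sketch rests on several assumptions:

- Both levels use proportional, largest-remainder splitting.
- The loop weights 3 and 1 for module1's loops are hypothetical; the source gives only the resulting 30/10 split.
- The handling of segments that mix parallel and recurrent loops is not covered by the source.

```python
def proportional_split(amounts: dict, cores: int) -> dict:
    """Largest-remainder split of `cores` in proportion to integer `amounts`."""
    total = sum(amounts.values())
    alloc = {k: a * cores // total for k, a in amounts.items()}
    rem = {k: a * cores % total for k, a in amounts.items()}
    for k in sorted(amounts, key=rem.get, reverse=True)[: cores - sum(alloc.values())]:
        alloc[k] += 1
    return alloc


def two_level_plan(segments: dict, total_cores: int) -> dict:
    """segments maps name -> (segment_amount, {loop: (loop_amount, has_recurrence)}).

    Level 1 (step S204): one slave-core group per segment, sized by segment amount.
    Level 2 (step S208): the group is split among recurrence-free loops by loop
    amount. If every loop has a recurrence (step S207 "no"), the group stays whole.
    """
    groups = proportional_split({s: amt for s, (amt, _) in segments.items()}, total_cores)
    plan = {}
    for seg, group in groups.items():
        loops = segments[seg][1]
        parallel = {name: a for name, (a, rec) in loops.items() if not rec}
        serial = [name for name, (_, rec) in loops.items() if rec]
        split = {}
        if parallel:
            # Assumption (not in the source): in a mixed segment each recurrence
            # loop keeps one core and runs serially; the rest are shared out.
            split = dict.fromkeys(serial, 1)
            split.update(proportional_split(parallel, group - len(serial)))
        plan[seg] = {"group": group, "loops": split}
    return plan


example = {
    "module1": (2, {"loop1": (3, False), "loop2": (1, False)}),  # loop weights hypothetical
    "module2": (3, {"loop1'": (1, True), "loop2'": (1, True), "loop3'": (1, True)}),
}
print(two_level_plan(example, 100))
# {'module1': {'group': 40, 'loops': {'loop1': 30, 'loop2': 10}},
#  'module2': {'group': 60, 'loops': {}}}
```

An empty `loops` entry means the second-level division was skipped, matching the treatment of module2 above.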
Fig. 3 is a schematic flowchart of the third embodiment. It differs from the second embodiment as follows: besides dividing the parallel basic segments at two different granularities, it adds a coarse-grained division at the MPI process level.
As shown in Figure 3, this embodiment comprises the following steps:
In step S301, computation analysis is performed on the application problem.
In step S302, based on the result of the computation analysis, MPI-level parallel task division is performed on the application problem at the third granularity. The MPI level is a coarse-grained division, mainly MPI process-level parallelism at the domain-decomposition level. The third granularity is larger than the first granularity, used for parallel division by basic segment. It is therefore also larger than the second granularity, used for parallel division by computation loop.
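The coarse MPI-level step can be illustrated with a plain block domain decomposition. This Python sketch assumes a one-dimensional split of the grid along a single axis into contiguous subdomains whose sizes differ by at most one cell; the patent says only "domain decomposition". In a real MPI program each process would obtain its own index with `MPI_Comm_rank` and keep only its own range. Here all ranges are computed in one process so the sketch runs without an MPI installation, and `block_decompose` is a hypothetical name.

```python
def block_decompose(n_cells: int, n_procs: int) -> list:
    """Split n_cells grid cells into n_procs contiguous subdomains.

    The first (n_cells % n_procs) processes get one extra cell, so subdomain
    sizes differ by at most one.
    """
    if n_cells < 0 or n_procs <= 0:
        raise ValueError("need n_cells >= 0 and n_procs > 0")
    base, extra = divmod(n_cells, n_procs)
    subdomains, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        subdomains.append(range(start, start + size))
        start += size
    return subdomains


for rank, cells in enumerate(block_decompose(10, 4)):
    print(rank, cells.start, cells.stop)  # 0 0 3 / 1 3 6 / 2 6 8 / 3 8 10
```

Each process's subdomain then becomes the input to the first-level and second-level many-core divisions described in the following steps.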
In step S303, data dependence analysis is performed on the data of the application problem to determine whether there are basic program segments without data dependence.
In step S304, the method determines whether basic program segments without data dependence exist. If none exist, the method ends.
If they exist, step S305 is performed to calculate the computation amount of each basic segment without data dependence.
In step S306, the first-level many-core division is performed on these basic segments according to their computation amounts.
In step S307, a basic program segment without data dependence is analyzed and decomposed into multiple computation loops.
In step S308, data recurrence dependence analysis is performed on the data in each computation loop to determine whether there are computation loops without data recurrence dependence.
In step S309, the method determines whether the basic program segment contains computation loops without data recurrence dependence. If it does, step S310 applies a second-level many-core division to those loops, and step S311 follows.
If it does not, step S311 is performed directly to determine whether all basic program segments without data dependence have been processed. If so, the method ends; otherwise it returns to step S307 to analyze the next such segment.
Those skilled in the art will understand the following trade-off. The multi-level fine-grained parallelism of the present invention can effectively improve slave-core speedup and exploit slave-core computing power more fully. However, it may also increase storage requirements and introduce a small amount of redundant computation. The operator therefore has to choose the levels and granularities of the division by balancing parallelism against storage space for the actual situation. The present invention places no specific restriction on this.
From the above description of the embodiments, those skilled in the art will clearly understand that part or all of the present invention can be implemented by software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied as a software product. This computer software product may include one or more machine-readable media storing machine-executable instructions. When executed by one or more machines, such as a computer, a computer network or other electronic devices, these instructions cause the machines to perform operations according to embodiments of the present invention. Machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROMs (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media suitable for storing machine-executable instructions.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations. Examples include personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present invention can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components and data structures that perform particular tasks or implement particular abstract data types. The application can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In such environments, program modules may be located in both local and remote computer storage media, including storage devices.
Building on the coarse-grained MPI-level division of the prior art, the present invention performs fine-grained division by basic program segment. It can then perform still finer-grained division over the core loops within each basic segment. This achieves multi-level fine-grained program segment division in large-scale heterogeneous parallel computing and improves load balance across slave cores, so their computing power is exploited more fully. The overall speedup of application problems is thereby improved, giving effective support for running general scientific-computing and engineering applications efficiently.
In addition to providing better parallelism for heterogeneous applications, the present invention summarizes general rules for many-core parallel implementation of general scientific-computing problems. It also identifies general optimization methods and techniques for improving many-core parallel efficiency. These offer reference and guidance for the basic software systems and parallel compilation software systems of large-scale heterogeneous computer systems.
Although the present invention is disclosed above through preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the invention, use the methods and technical content disclosed above to make possible changes and modifications to its technical solution. Therefore, any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the invention, without departing from its technical solution, falls within the protection scope of that solution.

Claims (7)

1. A method for dividing parallel program segments based on a heterogeneous many-core processor, characterized by comprising:
performing data dependence analysis on the data of an application problem to determine whether there are basic program segments without data dependence;
if basic program segments without data dependence exist, calculating the computation amount of each such segment, and performing a first-level many-core division on these segments according to the computation amounts;
after the first-level many-core division, analyzing each basic program segment without data dependence and decomposing it into multiple computation loops;
performing data recurrence dependence analysis on the data in each computation loop to determine whether there are computation loops without data recurrence dependence;
if computation loops without data recurrence dependence exist, performing a second-level many-core division on those loops.
2. The method for dividing parallel program segments based on a heterogeneous many-core processor according to claim 1, characterized in that:
the computation amount comprises a floating-point computation amount and a fixed-point computation amount.
3. The method for dividing parallel program segments based on a heterogeneous many-core processor according to claim 1, characterized in that:
performing the first-level many-core division comprises performing first-level task division and load balancing at a first granularity;
performing the second-level many-core division comprises performing second-level task division and load balancing at a second granularity.
4. The method for dividing parallel program segments based on a heterogeneous many-core processor according to claim 3, characterized in that:
the second granularity is smaller than the first granularity.
5. The method for dividing parallel program segments based on a heterogeneous many-core processor according to claim 1 or 2, characterized in that, before the data dependence analysis is performed on the data of the application problem, the method further comprises:
performing computation analysis on the application problem;
based on the result of the computation analysis, performing MPI-level parallel task division on the application problem at a third granularity.
6. The method for dividing parallel program segments based on a heterogeneous many-core processor according to claim 3 or 4, characterized in that, before the data dependence analysis is performed on the data of the application problem, the method further comprises:
performing computation analysis on the application problem;
based on the result of the computation analysis, performing MPI-level parallel task division on the application problem at a third granularity.
7. The method for dividing parallel program segments based on a heterogeneous many-core processor according to claim 6, characterized in that:
the third granularity is larger than the first granularity.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201210441326.9A (granted as CN102929723B) | 2012-11-06 | 2012-11-06 | Method for dividing parallel program segment based on heterogeneous multi-core processor


Publications (2)

Publication Number | Publication Date
CN102929723A | 2013-02-13
CN102929723B | 2015-07-08

Family

ID=47644530


Country Status (1)

Country Link
CN (1) CN102929723B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279391A (en) * 2013-06-09 2013-09-04 浪潮电子信息产业股份有限公司 Load balancing optimization method based on CPU (central processing unit) and MIC (many integrated core) framework processor cooperative computing
CN104834556B (en) * 2015-04-26 2018-06-22 西北工业大学 A kind of mapping method of polymorphic real-time task and polymorphic computing resource
CN105808358B (en) * 2016-03-29 2019-10-18 西安交通大学 A kind of data dependence thread packet mapping method for many-core system
CN108491263A (en) * 2018-03-02 2018-09-04 珠海市魅族科技有限公司 Data processing method, data processing equipment, terminal and readable storage medium storing program for executing
CN109828790B (en) * 2019-01-31 2020-10-20 上海赜睿信息科技有限公司 Data processing method and system based on Shenwei heterogeneous many-core processor
CN111782348A (en) * 2019-04-04 2020-10-16 北京沃东天骏信息技术有限公司 Application program processing method, device, system and computer readable storage medium
CN110222007B (en) * 2019-06-20 2023-11-24 山东省计算中心(国家超级计算济南中心) Acceleration operation method based on Shenwei many-core processor
CN112559032B (en) * 2019-09-25 2023-06-13 无锡江南计算技术研究所 Many-core program reconstruction method based on loop segments
CN113010112B (en) * 2021-03-09 2022-11-11 重庆邮电大学 Data distribution optimization method based on variable resistive memory

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101075199A (en) * 2006-05-18 2007-11-21 迈普(四川)通信技术有限公司 Method for scheduling multiple CPUs
CN101086710A (en) * 2006-06-06 2007-12-12 学校法人早稻田大学 Global compiler for controlling heterogeneous multiprocessor
CN101706741A (en) * 2009-12-11 2010-05-12 中国人民解放军国防科学技术大学 Method for partitioning dynamic tasks of CPU and GPU based on load balance
WO2011155046A1 (en) * 2010-06-10 2011-12-15 富士通株式会社 Multi-core processor system, control program, and method of control
CN102681902A (en) * 2012-05-15 2012-09-19 浙江大学 Load balancing method based on task distribution of multicore system
CN102707952A (en) * 2012-05-16 2012-10-03 上海大学 User description based programming design method on embedded heterogeneous multi-core processor

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN1910553A (en) * 2004-01-08 2007-02-07 皇家飞利浦电子股份有限公司 Method and apparatus for scheduling tasks in a multi-processor system based on memory requirements

Also Published As

Publication number Publication date
CN102929723A (en) 2013-02-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant