CN102929723A - Method for dividing parallel program segment based on heterogeneous multi-core processor - Google Patents


Publication number: CN102929723A
Authority: CN (China)
Legal status: Granted, active
Application number: CN2012104413269A
Other languages: Chinese (zh)
Other versions: CN102929723B
Inventors: 陈德训, 房田文, 吴宏
Current assignee: Wuxi Jiangnan Computing Technology Institute
Original assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2012-11-06
Filing date: 2012-11-06
Publication date: 2013-02-13
Events: application filed by Wuxi Jiangnan Computing Technology Institute; priority to CN201210441326.9A; publication of CN102929723A; application granted; publication of CN102929723B.


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a method for dividing parallel program segments based on a heterogeneous many-core processor. The method first performs data dependence analysis on the data of an application problem to determine whether there are basic program segments without data dependence. If such segments exist, the method calculates the computational workload of each one. According to these workloads, it then performs a first-level many-core division of the segments. The method addresses how well general scientific computing and engineering applications adapt to polymorphic heterogeneous computer systems, and it improves both parallel efficiency and load balance at the many-core level.

Description

Method for dividing parallel program segments based on a heterogeneous many-core processor
Technical field
The present invention relates to the field of computers, and in particular to a method for dividing parallel program segments based on a heterogeneous many-core processor.
Background art
In recent years, multi-core and many-core processors have gradually become the main building blocks of high-performance computers as a way to increase system computing power. At the same time, microprocessor design has moved toward solving specific problems. Heterogeneous processors use processor-core designs tailored to the characteristics of problems in particular domains. They separate the different types of operations in a typical workload and process each type on a different kind of core, so that high performance is achieved overall. Such processors, together with homogeneous multi-core processor systems, form very large-scale polymorphic heterogeneous computing systems. These systems offer strong computing power and a high energy-efficiency ratio, and they are one of the main directions for solving major applications. However, their very large parallel scale and complex polymorphic architecture pose great challenges to traditional high-performance computing applications, and matching parallel implementation techniques are lacking. A multi-granularity method for dividing parallel basic program segments on heterogeneous many-core processors has therefore become a technical problem that those skilled in the art urgently need to solve.
Current parallel implementations for heterogeneous computer systems are mostly based on a two-level parallel model: MPI (Message Passing Interface) parallelism plus many-core parallelism. The MPI level implements coarse-grained, process-level parallelism. The many-core level mainly accelerates the core iterative computations, which means it only applies fine-grained many-core parallelism to the core loops. Implementations and optimizations of this two-level hybrid MPI + many-core programming model tune the two levels differently. At the MPI level, the main optimization is overlapping MPI communication with computation. At the many-core level, performance is tuned mainly through data layout optimization, data transfer optimization, and overlapping computation with memory access. The speedup achieved for a specific problem depends closely on the problem's computational characteristics and on how the optimization techniques are implemented.
According to the literature currently available, many-core parallel implementations of applications target only two cases: the core computations that consume most of the workload, or the overall solution process of certain simple problems. There is no comprehensive, in-depth solution for the numerical simulation of many complex practical problems. As a result, the MPI-level parallel scale of the main parallel applications on heterogeneous computer systems is currently on the order of 100,000 processes. This makes it difficult to support larger and more complex parallel computing applications, and the overall speedup for practical application problems is modest.
In addition, when fine-grained many-core parallelism is applied only to the core loops, its parallel efficiency is limited by the size of the problem actually being run. For example, suppose the grid size of the actual problem along the parallelized dimension is M, and the heterogeneous many-core processor has N slave cores. If only the core loop is parallelized at fine granularity, then when M < N the computing power of (N - M) slave cores is left unused. When M > N and M is not an integer multiple of N, the load balance across the slave cores is poor. Existing techniques that parallelize only the core loops at fine granularity therefore struggle to fully exploit the computing power of the slave cores.
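The following Python sketch makes the M-versus-N argument above concrete. It assumes a simple static block schedule in which each slave core receives either ⌈M/N⌉ or ⌊M/N⌋ iterations. The function name and the efficiency metric (useful work divided by cores × longest per-core work) are illustrative choices, not taken from the patent.

```python
def loop_level_utilization(m: int, n: int) -> dict:
    """Distribute m iterations of a core loop over n slave cores with a static
    block schedule (an assumed schedule) and report idle cores and efficiency."""
    # The first m % n cores get one extra iteration.
    per_core = [m // n + (1 if c < m % n else 0) for c in range(n)]
    busy = sum(1 for iters in per_core if iters > 0)
    longest = max(per_core)
    # Parallel time is set by the most heavily loaded core.
    efficiency = m / (n * longest) if longest else 0.0
    return {"idle_cores": n - busy, "max_iters": longest, "efficiency": efficiency}


print(loop_level_utilization(60, 64))   # M < N: some cores are idle
print(loop_level_utilization(100, 64))  # M > N, not a multiple: imbalance
print(loop_level_utilization(128, 64))  # M a multiple of N: balanced
```

For M = 60 and N = 64, four cores stay idle and the efficiency is 0.9375. For M = 100, 36 cores do two iterations while the rest do one, so the efficiency drops to 0.78125.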
Further related content is disclosed in Chinese patent application publication No. CN1783011A.
Summary of the invention
The technical problem solved by the present invention is the poor adaptability of general scientific computing and engineering applications to polymorphic heterogeneous computer systems. The invention also aims to improve both parallel efficiency and load balance at the many-core level.
To solve this problem, the present invention provides a method for dividing parallel program segments based on a heterogeneous many-core processor, comprising:
performing data dependence analysis on the data of an application problem to determine whether there are basic program segments without data dependence;
if such basic program segments exist, calculating the computational workload of each basic program segment without data dependence, and performing a first-level many-core division of those segments according to the workloads.
Optionally, after the first-level many-core division, the method further comprises:
analyzing each basic program segment without data dependence and decomposing it into a plurality of computation loops;
performing recurrence dependence analysis on the data in each computation loop to determine whether there are computation loops without recurrence dependence;
if such computation loops exist, performing a second-level many-core division of the computation loops without recurrence dependence.
Optionally, the computational workload comprises a floating-point workload and a fixed-point workload.
Optionally, performing the first-level many-core division comprises performing first-level task division and load balancing at a first granularity.
Optionally, performing the first-level many-core division comprises performing first-level task division and load balancing at the first granularity;
and performing the second-level many-core division comprises performing second-level task division and load balancing at a second granularity.
Optionally, the second granularity is smaller than the first granularity.
Optionally, before the data dependence analysis is performed on the data of the application problem, the method further comprises:
performing computational analysis on the application problem;
based on the result of the computational analysis, performing MPI-level parallel task division of the application problem at a third granularity.
Optionally, the third granularity is larger than the first granularity.
Compared with the prior art, the technical solution of the present invention has the following advantages:
The present invention divides parallel program segments at multiple levels of fine granularity. This makes task division and load distribution across the slave cores more balanced, so the computing power of the slave cores is exploited more fully and a better speedup is obtained. The invention thereby addresses the poor adaptability of general scientific computing and engineering applications to polymorphic heterogeneous computer systems.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a first embodiment of the method for dividing parallel program segments based on a heterogeneous many-core processor according to the present invention;
Fig. 2 is a schematic flowchart of a second embodiment of the method for dividing parallel program segments based on a heterogeneous many-core processor according to the present invention;
Fig. 3 is a schematic flowchart of a third embodiment of the method for dividing parallel program segments based on a heterogeneous many-core processor according to the present invention.
Detailed description of embodiments
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The present invention is therefore not limited by the specific implementations disclosed below.
The present invention is also described in detail with reference to schematic diagrams. These diagrams are examples given for ease of explanation and should not limit the scope of protection of the present invention.
To solve the technical problem described in the background, the present invention provides a method for dividing parallel program segments based on a heterogeneous many-core processor. Fig. 1 is a schematic flowchart of the first embodiment of the method. As shown in Fig. 1, this embodiment comprises the following steps:
Step S101: perform data dependence analysis on the data of the application problem to determine whether there are basic program segments without data dependence. For example, suppose basic program segment 1 is Y = F(X) and basic program segment 2 is Z = F(Y). Segment 2 needs the result of segment 1, so the two segments are data-dependent. They can only be computed serially and cannot be executed in parallel.
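The dependence check in step S101 can be sketched in Python as follows. This is a hypothetical model, not the patent's analysis. Each basic program segment is reduced to the set of variables it reads and the set it writes. Two segments are treated as dependent if either one writes a variable that the other reads or writes. This conservative rule covers read-after-write, write-after-read and write-after-write; the patent's example shows only read-after-write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical summary of a basic program segment: variables read and written."""
    name: str
    reads: frozenset
    writes: frozenset


def conflicts(a: Segment, b: Segment) -> bool:
    """True if a and b are data-dependent in either order (RAW, WAR or WAW)."""
    return bool(a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def independent_segments(segments: list) -> list:
    """Segments that conflict with no other segment, i.e. candidates
    for parallel execution in the first-level many-core division."""
    return [s for s in segments
            if not any(conflicts(s, t) for t in segments if t is not s)]


seg1 = Segment("seg1", frozenset({"X"}), frozenset({"Y"}))  # Y = F(X)
seg2 = Segment("seg2", frozenset({"Y"}), frozenset({"Z"}))  # Z = F(Y)
seg3 = Segment("seg3", frozenset({"V"}), frozenset({"W"}))  # W = G(V), hypothetical
print([s.name for s in independent_segments([seg1, seg2, seg3])])
```

Here seg1 and seg2 conflict through Y, matching the Y = F(X), Z = F(Y) example above, so only the hypothetical seg3 is reported as independent.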
Step S102: determine whether basic program segments without data dependence exist.
If none exist, there are no basic program segments that can be computed in parallel. Parallel program segment division cannot be performed, and the method ends.
If they exist, that is, if some basic program segments can be computed in parallel, step S103 is executed to calculate the computational workload of each basic program segment without data dependence. The workload comprises a floating-point workload and a fixed-point workload. Step S104 then performs a first-level many-core division of these segments according to their workloads. This division consists of first-level task division and load balancing at a first granularity. Each basic program segment is completed on its own slave-core group, and the size of that group is determined by the segment's workload. The first-level division thus achieves fine-grained parallelism over basic program segments, with slave-core groups running in parallel, one group per segment.
The technical solution of the present invention is further described below with reference to an example.
In this example, step S101 determines that an application problem contains two basic program segments without data dependence, denoted module1 and module2. The description assumes a total of 100 slave cores participating in the parallel computation. Step S103 finds that the workload of module1 is 2 and the workload of module2 is 3.
Step S104 then performs the first-level many-core division of module1 and module2 according to their workloads. Of the 100 slave cores, 40 are assigned to module1 to form the first slave-core group, and 60 are assigned to module2 to form the second slave-core group. This split follows the 2 : 3 ratio of the workloads.
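Step S104 assigns slave cores in proportion to each segment's workload. The Python sketch below assumes integer workloads and uses largest-remainder rounding, because the patent does not say how fractional core counts are rounded. The function name is hypothetical.

```python
def allocate_cores(total_cores: int, workloads: dict) -> dict:
    """Split total_cores among basic program segments in proportion to their
    (integer) workloads. Fractional shares use largest-remainder rounding,
    which is an assumed rule."""
    total_work = sum(workloads.values())
    alloc, remainders = {}, {}
    for name, work in workloads.items():
        # Exact integer arithmetic: floor of the proportional share plus its remainder.
        alloc[name], remainders[name] = divmod(total_cores * work, total_work)
    leftover = total_cores - sum(alloc.values())
    # Give the cores lost to flooring to the segments with the largest remainders.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc


print(allocate_cores(100, {"module1": 2, "module2": 3}))  # {'module1': 40, 'module2': 60}
```

With the embodiment's figures this reproduces the 40/60 split. When shares are not whole numbers, the rounding rule guarantees that every core is still assigned.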
Note that this example only illustrates the technical solution, so it uses a small number of slave cores and program segments. Those skilled in the art will understand that in practical large-scale parallel computation the number of slave cores may reach millions and programs are far more complex. The present invention imposes no specific limitation on this.
Fig. 2 is a schematic flowchart of the second embodiment of the method. This embodiment builds on the parallel division by basic program segment used in the first embodiment. It further subdivides each basic program segment and performs a second-level many-core division of the core loops in the segment that have no recurrence dependence.
As shown in Fig. 2, this embodiment comprises the following steps:
Step S201: perform data dependence analysis on the data of the application problem to determine whether there are basic program segments without data dependence.
Step S202: determine whether basic program segments without data dependence exist.
If none exist, the method ends.
If they exist, step S203 is executed to calculate the computational workload of each basic program segment without data dependence.
Step S204: perform the first-level many-core division of the basic program segments without data dependence according to their workloads.
Step S205: analyze a basic program segment without data dependence and decompose it into a plurality of computation loops.
Step S206: perform recurrence dependence analysis on the data in each computation loop to determine whether there are computation loops without recurrence dependence. A data variable in a loop has recurrence dependence if it satisfies $X_{i,j,k} = F(X_{i-1,j,k}, X_{i,j-1,k}, X_{i,j,k-1})$, that is, if each element depends on neighboring elements computed in earlier iterations of the same loop. Otherwise the variable has no recurrence dependence.
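The recurrence test in step S206 can be approximated by comparing array subscripts. In the minimal Python sketch below, which is an assumption rather than the patent's analysis, a loop body is described by two things: the array it writes at (i, j, k), and the constant offsets at which it reads each array. Reading the written array at any non-zero offset is conservatively treated as a recurrence (loop-carried) dependence, as in the X[i-1, j, k] example.

```python
from typing import Dict, List, Tuple

# Constant subscript offset relative to the current iteration,
# e.g. (-1, 0, 0) for X[i-1, j, k].
Offset = Tuple[int, ...]


def has_recurrence(written: str, reads: Dict[str, List[Offset]]) -> bool:
    """Conservative test for a loop nest whose body assigns written[i, j, k].

    reads maps each array name to the offsets at which the body reads it.
    Reading the written array at a non-zero offset means one iteration uses
    a value that another iteration produces or overwrites, so the iterations
    cannot simply be split across slave cores."""
    return any(any(d != 0 for d in off) for off in reads.get(written, []))


# X[i,j,k] = F(X[i-1,j,k], X[i,j-1,k], X[i,j,k-1]): recurrence dependence.
print(has_recurrence("X", {"X": [(-1, 0, 0), (0, -1, 0), (0, 0, -1)]}))  # True
# Xnew[i,j,k] = F(X[i-1,j,k], X[i+1,j,k]): reads only the old array, no recurrence.
print(has_recurrence("Xnew", {"X": [(-1, 0, 0), (1, 0, 0)]}))  # False
```

A real compiler would also have to handle non-constant subscripts and aliasing. This sketch covers only the constant-offset stencil pattern shown in the formula.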
Step S207: determine whether the basic program segment contains core loops without recurrence dependence. If so, step S208 performs the second-level many-core division of the computation loops without recurrence dependence, and step S209 follows. The second-level division consists of second-level task division and load balancing at a second granularity, which is smaller than the first granularity. The first-level many-core division runs slave-core groups in parallel, one group per basic program segment. The second-level many-core division runs slave cores in parallel at the loop level within a basic program segment, adding finer-grained parallelism on top of the first-level division.
If not, step S209 is executed directly to determine whether every basic program segment without data dependence has been processed. If so, the method ends. Otherwise, the flow returns to step S205 to analyze the next basic program segment without data dependence.
This example continues with basic program segments module1 and module2 from the previous example. Step S205 analyzes the first basic program segment without data dependence, module1, and decomposes it into two computation loops, denoted loop1 and loop2.
Step S206 performs recurrence dependence analysis on the data of loop1 and loop2 and determines that neither loop has recurrence dependence.
Because module1 contains loops without recurrence dependence (loop1 and loop2), step S208 performs the second-level many-core division of loop1 and loop2. This division can follow the loops' workloads. In the previous example, module1 was assigned 40 slave cores. These cores are now subdivided so that 30 of them execute loop1 and the remaining 10 execute loop2.
Because module2 has not yet been processed, step S205 is executed again to analyze module2, which is decomposed into three computation loops, denoted loop1', loop2' and loop3'.
Step S206 performs recurrence dependence analysis on the data of loop1', loop2' and loop3' and determines that all three have recurrence dependence. They must therefore be computed serially and cannot be executed in parallel, so no second-level many-core division is performed on them.
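The second-level division in steps S205 to S209 can be sketched in Python as shown below. The patent gives only the resulting 30/10 split of module1's 40 cores, so the loop workloads of 3 and 1 used here are hypothetical. The function name and the convention of mapping recurrence-dependent loops to None (left undivided and run serially) are also assumptions.

```python
def second_level_division(group_cores: int, loops: list) -> dict:
    """Split one slave-core group among the computation loops of a basic segment.

    loops is a list of (name, integer_workload, has_recurrence) tuples.
    Loops without recurrence dependence share group_cores in proportion to
    workload. Loops with recurrence dependence map to None: they are not
    divided further and run serially."""
    plan, parallel = {}, []
    for name, work, recurrent in loops:
        if recurrent:
            plan[name] = None
        else:
            parallel.append((name, work))
    total = sum(work for _, work in parallel)
    remainders = {}
    for name, work in parallel:
        plan[name], remainders[name] = divmod(group_cores * work, total)
    leftover = group_cores - sum(plan[name] for name, _ in parallel)
    # Hand out cores lost to flooring, largest remainder first.
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        plan[name] += 1
    return plan


print(second_level_division(40, [("loop1", 3, False), ("loop2", 1, False)]))
# {'loop1': 30, 'loop2': 10}
print(second_level_division(60, [("loop1'", 1, True), ("loop2'", 1, True), ("loop3'", 1, True)]))
```

The first call reproduces module1's 30/10 split under the assumed 3 : 1 workloads. In the second call, every loop of module2 has recurrence dependence, so none of them is divided further.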
Fig. 3 is a schematic flowchart of the third embodiment of the method. Compared with the second embodiment, this embodiment keeps the division of parallel basic program segments at two different granularities and adds a coarse-grained division at the MPI process level.
As shown in Fig. 3, this embodiment comprises the following steps:
Step S301: perform computational analysis on the application problem.
Step S302: based on the result of the computational analysis, perform MPI-level parallel task division of the application problem at a third granularity. This MPI-level division is coarse-grained and consists mainly of process-level MPI parallelism obtained by domain (region) decomposition. The third granularity is larger than the first granularity used for division by basic program segment. It is therefore also larger than the second granularity used for division by computation loop.
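Step S302 divides the application at the MPI process level by domain decomposition. The Python sketch below shows a hypothetical one-dimensional block decomposition. In a real MPI code, each process would obtain its own rank from the MPI runtime (for example via `MPI_Comm_rank`). That call is omitted here so that the example runs without an MPI installation, and the function name is illustrative.

```python
def block_decompose(n_points: int, n_procs: int) -> list:
    """Return the half-open index range [start, stop) owned by each MPI rank
    when n_points grid points along one dimension are split into contiguous
    blocks. The first n_points % n_procs ranks get one extra point."""
    base, extra = divmod(n_points, n_procs)
    ranges, start = [], 0
    for rank in range(n_procs):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(block_decompose(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each rank's block is the coarse-grained unit. Inside that block, the first-level and second-level many-core divisions then operate on basic program segments and loops.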
Step S303: perform data dependence analysis on the data of the application problem to determine whether there are basic program segments without data dependence.
Step S304: determine whether basic program segments without data dependence exist. If none exist, the method ends.
If they exist, step S305 is executed to calculate the computational workload of each basic program segment without data dependence.
Step S306: perform the first-level many-core division of the basic program segments without data dependence according to their workloads.
Step S307: analyze a basic program segment without data dependence and decompose it into a plurality of computation loops.
Step S308: perform recurrence dependence analysis on the data in each computation loop to determine whether there are computation loops without recurrence dependence.
Step S309: determine whether the basic program segment contains core loops without recurrence dependence. If so, step S310 performs the second-level many-core division of the computation loops without recurrence dependence, and step S311 follows.
If not, step S311 is executed directly to determine whether every basic program segment without data dependence has been processed. If so, the method ends. Otherwise, the flow returns to step S307 to analyze the next basic program segment without data dependence.
Those skilled in the art will understand that the multi-level fine-grained parallelism of the present invention effectively improves slave-core speedup and exploits slave-core computing power more fully. However, it may also increase storage requirements and introduce a small amount of redundant computation. The practitioner should therefore choose the levels and granularities at which parallel program segments are divided, balancing the degree of parallelism against memory space for the case at hand. The present invention imposes no specific limitation on this.
From the above description of the embodiments, those skilled in the art can clearly understand that part or all of the present invention can be implemented by software together with a necessary general-purpose hardware platform. On this basis, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied as a software product. The computer software product may include one or more machine-readable media storing machine-executable instructions. When these instructions are executed by one or more machines, such as a computer, a computer network or other electronic devices, they cause the machines to perform operations according to embodiments of the present invention. Machine-readable media may include, but are not limited to, floppy disks, optical discs, CD-ROMs (compact disc read-only memory), magneto-optical disks, ROM (read-only memory), RAM (random access memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations. Examples include personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of these systems or devices.
The present invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components and data structures that perform particular tasks or implement particular abstract data types. The application can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In such environments, program modules may be located in both local and remote computer storage media, including storage devices.
The present invention builds on the coarse-grained MPI-level division of the prior art. It adds fine-grained division by basic program segment and can go further with finer-grained division within the core loops of each segment. This achieves multi-level fine-grained program segment division in large-scale heterogeneous parallel computing. As a result, slave-core load balance improves, slave-core computing power is exploited more fully, and the overall speedup of application problems increases, which helps general scientific computing and engineering applications run efficiently as a whole.
Beyond giving heterogeneous applications better parallelism, the present invention also summarizes general rules for the many-core parallel implementation of general scientific computing problems. It further provides general optimization methods and means for improving many-core parallel efficiency. These offer reference and guidance for the basic software systems and parallel compilation software systems of large-scale heterogeneous computer systems.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may make possible changes and modifications to the technical solution of the present invention, using the methods and technical content disclosed above, without departing from the spirit and scope of the invention. Any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, therefore falls within the scope of protection of the technical solution of the present invention.

Claims (8)

1. the concurrent program section division methods based on the isomery many-core processor is characterized in that, comprising:
The data of using problem are carried out data dependence analysis, to determine whether to exist countless program root segments according to correlativity;
If there is described countless program root segment according to correlativity, then calculate the calculated amount of each countless program root segment according to correlativity; According to described calculated amount, described countless program root segments according to correlativity are carried out the many nuclears of the first order divide.
2. the concurrent program section division methods based on the isomery many-core processor as claimed in claim 1 is characterized in that, also comprises after the many nuclears of the first order divide described carrying out:
Analyze each countless program root segment according to correlativity, described program root segment is decomposed into a plurality of computation cycles;
Data in each computation cycles are carried out data recurrence correlation analysis, to determine whether to exist countless computation cycles according to the recurrence correlativity;
If there is described countless computation cycles according to the recurrence correlativity, then described countless computation cycles according to the recurrence correlativity carried out the many nuclears in the second level and divide.
3. the concurrent program section division methods based on the isomery many-core processor as claimed in claim 1 is characterized in that:
Described calculated amount comprises: Floating-point Computation amount and fixed point calculation amount.
4. the concurrent program section division methods based on the isomery many-core processor as claimed in claim 1 is characterized in that:
The described many nuclears of the first order that carry out are divided and are comprised: the task division and the load balance that carry out the first order with the first granularity.
5. the concurrent program section division methods based on the isomery many-core processor as claimed in claim 2 is characterized in that:
The described many nuclears of the first order that carry out are divided and are comprised: the task division and the load balance that carry out the first order with the first granularity;
The described many nuclears in the second level that carry out are divided and are comprised: the task division and the load balance that carry out the second level with the second granularity.
6. the concurrent program section division methods based on the isomery many-core processor as claimed in claim 5 is characterized in that:
Described the second granularity is less than described the first granularity.
7. such as described any the concurrent program section division methods based on the isomery many-core processor of claim 1 to 6, it is characterized in that, before described data to the application problem are carried out data dependence analysis, also comprise:
Described application problem is carried out ANALYSIS OF CALCULATING;
Based on the result of described ANALYSIS OF CALCULATING, described application problem is carried out MPI level parallel task with the 3rd granularity divide.
8. The parallel program segment partitioning method based on a heterogeneous many-core processor as claimed in claim 7, characterized in that:
the third granularity is greater than the first granularity.
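Claims 7 and 8 add a coarser MPI-level partition above the two many-core levels. The sketch below simulates that partition without an MPI library. It makes two assumptions: the MPI level uses a contiguous block decomposition, and each lower granularity is obtained by ceiling division of the level above. The helper names and the machine sizes are hypothetical and illustrative only.

```python
from typing import List, Tuple


def mpi_blocks(n_items: int, n_ranks: int) -> List[Tuple[int, int]]:
    """Contiguous block decomposition of [0, n_items) over MPI ranks.
    The first `rem` ranks receive one extra item."""
    base, rem = divmod(n_items, n_ranks)
    blocks, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < rem else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def granularity_levels(n_items: int, n_ranks: int,
                       core_groups: int, cores_per_group: int) -> Tuple[int, int, int]:
    """Return (g3, g1, g2): MPI-level, first-level and second-level chunk sizes.
    Assumption: each level is a ceiling division of the level above it."""
    g3 = -(-n_items // n_ranks)     # items per MPI process (third granularity)
    g1 = -(-g3 // core_groups)      # items per core group (first granularity)
    g2 = -(-g1 // cores_per_group)  # items per core (second granularity)
    return g3, g1, g2


blocks = mpi_blocks(100_000, 4)
g3, g1, g2 = granularity_levels(100_000, 4, core_groups=4, cores_per_group=64)
assert g3 > g1 > g2  # third > first > second granularity (claims 6 and 8)
print(blocks, (g3, g1, g2))
```

The strict ordering holds only when there is more than one core group and more than one core per group. With a single core group, the first level simply inherits the MPI block.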
CN201210441326.9A 2012-11-06 2012-11-06 Method for dividing parallel program segment based on heterogeneous multi-core processor Active CN102929723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210441326.9A CN102929723B (en) 2012-11-06 2012-11-06 Method for dividing parallel program segment based on heterogeneous multi-core processor

Publications (2)

Publication Number Publication Date
CN102929723A true CN102929723A (en) 2013-02-13
CN102929723B CN102929723B (en) 2015-07-08

Family

ID=47644530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210441326.9A Active CN102929723B (en) 2012-11-06 2012-11-06 Method for dividing parallel program segment based on heterogeneous multi-core processor

Country Status (1)

Country Link
CN (1) CN102929723B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070124733A1 (en) * 2004-01-08 2007-05-31 Koninklijke Philips Electronics N.V. Resource management in a multi-processor system
CN101075199A (en) * 2006-05-18 2007-11-21 迈普(四川)通信技术有限公司 Method for scheduling multiple CPU
CN101086710A (en) * 2006-06-06 2007-12-12 学校法人早稻田大学 Global compiler for controlling heterogeneous multiprocessor
CN101706741A (en) * 2009-12-11 2010-05-12 中国人民解放军国防科学技术大学 Method for partitioning dynamic tasks of CPU and GPU based on load balance
WO2011155046A1 (en) * 2010-06-10 2011-12-15 富士通株式会社 Multi-core processor system, control program, and method of control
CN102681902A (en) * 2012-05-15 2012-09-19 浙江大学 Load balancing method based on task distribution of multicore system
CN102707952A (en) * 2012-05-16 2012-10-03 上海大学 User description based programming design method on embedded heterogeneous multi-core processor

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279391A (en) * 2013-06-09 2013-09-04 浪潮电子信息产业股份有限公司 Load balancing optimization method based on CPU (central processing unit) and MIC (many integrated core) framework processor cooperative computing
CN104834556A (en) * 2015-04-26 2015-08-12 西北工业大学 Mapping method for multimode real-time tasks and multimode computing resources
CN104834556B (en) * 2015-04-26 2018-06-22 西北工业大学 Mapping method for multimode real-time tasks and multimode computing resources
CN105808358A (en) * 2016-03-29 2016-07-27 西安交通大学 Data dependency thread group mapping method for many-core system
WO2019165745A1 (en) * 2018-03-02 2019-09-06 珠海市魅族科技有限公司 Data processing method, data processing device, terminal and readable storage medium
CN108491263A (en) * 2018-03-02 2018-09-04 珠海市魅族科技有限公司 Data processing method, data processing device, terminal and readable storage medium
CN109828790A (en) * 2019-01-31 2019-05-31 上海赜睿信息科技有限公司 Data processing method and system based on Shenwei heterogeneous many-core processor
CN111782348A (en) * 2019-04-04 2020-10-16 北京沃东天骏信息技术有限公司 Application program processing method, device, system and computer readable storage medium
CN110222007A (en) * 2019-06-20 2019-09-10 山东省计算中心(国家超级计算济南中心) Acceleration operation method based on Shenwei many-core processor
CN110222007B (en) * 2019-06-20 2023-11-24 山东省计算中心(国家超级计算济南中心) Acceleration operation method based on Shenwei many-core processor
CN112559032A (en) * 2019-09-25 2021-03-26 无锡江南计算技术研究所 Many-core program reconstruction method based on loop segment
CN112559032B (en) * 2019-09-25 2023-06-13 无锡江南计算技术研究所 Many-core program reconstruction method based on circulation segment
CN113010112A (en) * 2021-03-09 2021-06-22 重庆邮电大学 Data distribution optimization method based on variable resistive memory

Also Published As

Publication number Publication date
CN102929723B (en) 2015-07-08

Similar Documents

Publication Publication Date Title
CN102929723B (en) Method for dividing parallel program segment based on heterogeneous multi-core processor
Luo et al. An effective GPU implementation of breadth-first search
Yang et al. A peta-scalable CPU-GPU algorithm for global atmospheric simulations
Yue et al. Benchmark performance of global switching versus local switching for trajectory surface hopping molecular dynamics simulation: cis↔ trans azobenzene photoisomerization
Dziekonski et al. Generation of large finite‐element matrices on multiple graphics processors
Tan et al. Design and implementation of adaptive spmv library for multicore and many-core architecture
Liu et al. A programming framework for large scale numerical simulations on unstructured mesh
Ao et al. 26 pflops stencil computations for atmospheric modeling on sunway taihulight
CN103049384A (en) Automatic generating frame of multi-core-based multithread limit energy consumption testing source program
Wyrzykowski et al. Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators
Luitjens et al. Scalable parallel regridding algorithms for block‐structured adaptive mesh refinement
Rojek et al. Systematic adaptation of stencil‐based 3D MPDATA to GPU architectures
Yang et al. Improving parallel substructuring efficiency by using a multilevel approach
Bao et al. PWCET: power-aware worst case execution time analysis
Martínez del Amor et al. Parallel simulation of probabilistic P systems on multicore platforms
Shankar et al. GRaM-X: A new GPU-accelerated dynamical spacetime GRMHD code for Exascale computing with the Einstein Toolkit
Meister et al. A software concept for cache-efficient simulation on dynamically adaptive structured triangular grids
Wu et al. MIC acceleration of short-range molecular dynamics simulations
Cao et al. Flexible data redistribution in a task-based runtime system
Bruned et al. Use of efficient task allocation algorithm for parallel real-time EMT simulation
Bhatele et al. Applying graph partitioning methods in measurement-based dynamic load balancing
Yang et al. Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer
Liu et al. Development of a restricted additive Schwarz preconditioner for sparse linear systems on NVIDIA GPU
Chen et al. Efficient data and task co-scheduling for scientific workflow in geo-distributed datacenters
Voss et al. On predictable reconfigurable system design

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant