CN102890642A - Performance analysis method based on heterogeneous reconfigurable computing (HRC) of matching matrix - Google Patents


Info

Publication number
CN102890642A
Authority
CN
China
Prior art keywords
heterogeneous
task
reconfiguration
matrix
subtask
Legal status
Granted
Application number
CN2011104404354A
Other languages
Chinese (zh)
Other versions
CN102890642B (en)
Inventor
曾国荪
王伟
谭一鸣
Current Assignee
Tongji University
Shanghai Redneurons Co Ltd
Original Assignee
Tongji University
Shanghai Redneurons Co Ltd
Priority date
2011-12-23
Filing date
2011-12-23
Publication date
2013-01-23
Application filed by Tongji University and Shanghai Redneurons Co Ltd
Priority to CN201110440435.4A
Publication of CN102890642A
Application granted
Publication of CN102890642B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Debugging And Monitoring (AREA)
  • Stored Programmes (AREA)

Abstract

The invention relates to a performance analysis method for heterogeneous reconfigurable computing (HRC) based on a matching matrix. The method comprises the following steps: 1) establishing an HRC system model (HRCS); 2) establishing a heterogeneous reconfigurable task graph model (HR-DAG); 3) generating a heterogeneous matching matrix Ma; 4) generating a reconfiguration coupling matrix Co; and 5) computing the completion time of an application task with a scheduling algorithm and performing performance analysis. Compared with the prior art, the method describes the heterogeneous characteristics and the communication reconfiguration characteristics of the application task, and therefore expresses the computation and communication requirements of the application task more richly and accurately.

Description

Performance analysis method for heterogeneous reconfigurable computing based on a matching matrix
Technical field
The present invention relates to a performance analysis method for heterogeneous reconfigurable computing, and in particular to a performance analysis method for heterogeneous reconfigurable computing based on a matching matrix.
Background art
Traditional high-performance computing, whose main computing paradigm has been homogeneous computing, has begun to shift toward heterogeneous computing (HC); examples include "Roadrunner" and "Tianhe-1", the latter developed by China's National University of Defense Technology. At the same time, reconfigurable computing (RC) has been introduced into high-performance computing. Reconfigurable components improve computing flexibility and processing-element utilization through their configurability, and their spatially parallel execution raises computing efficiency while reducing power consumption. Heterogeneous & reconfigurable computing (HRC), which incorporates various heterogeneous acceleration components and reconfigurable components, is therefore a new trend in the development of high-performance computing. HRC has all the advantages of HC and RC, such as high efficiency, flexibility, a high performance-to-price ratio, low power consumption, stability, high fault tolerance and a short development cycle. However, whether an HRC system actually executes applications efficiently has to be determined through performance analysis.
Traditionally, performance analysis of high-performance computing relies on theoretical analysis, simulation and measurement [6-8]. Theoretical analysis builds a formal model of the parallel system; such methods fall into two classes, deterministic and probabilistic. In deterministic models all quantities are fixed, whereas probabilistic models contain uncertainty and random variables. Simulation is widely used in performance evaluation: it offers an effective way to predict the performance of computer systems that have not yet been built, and it can also be used to verify the correctness of theoretical analysis methods. Simulation techniques include emulation, Monte Carlo simulation, trace-driven simulation, execution-driven simulation and discrete-time simulation. Performance measurement comprises two techniques. The first is profiling, which collects basic performance information while the program runs; the results are usually shown to the user as soon as the program finishes and can show how execution time is distributed across different parts of the program code. The second is tracing, which maintains a log file recording the details of all program activity. Tracing usually produces large amounts of trace data, particularly for long-running programs, but it can be used to reconstruct the run-time behavior of an application and to evaluate the performance information provided by profiling. Tracing is therefore regarded as the more general performance measurement technique. These existing parallel-computing performance analysis methods are no longer applicable to HRC, mainly because of the following problems:
Given an application, which parallel computer architecture should be chosen? How does the application perform on a given architecture? Which performance metrics should be used to measure performance? How can performance data be obtained? Answering these questions is the goal and subject of performance evaluation.
For high-performance computing with flexible computing forms and multiple evaluation criteria, such as efficient matching of applications to resources, programmability, portability, stability, scalability and low power consumption, the scope of performance evaluation has broadened, and a series of new problems has arisen.
The problems in evaluating the performance of HRC computing systems are:
(1) Because heterogeneous reconfigurable computing is still at an early stage, there is not yet an off-the-shelf HRC system or computing model.
(2) No existing performance evaluation technique can analyze and predict the performance of HRC systems, and existing analysis tools are insufficient for analyzing them.
(3) It is unclear which performance metrics should be chosen to describe the performance of the system.
Summary of the invention
The purpose of the present invention is to overcome the above defects of the prior art by providing a performance analysis method for heterogeneous reconfigurable computing based on a matching matrix.
The purpose of the present invention can be achieved through the following technical solution:
A performance analysis method for heterogeneous reconfigurable computing based on a matching matrix, characterized by comprising the following steps:
1) establishing a heterogeneous reconfigurable computing system model HRCS;
2) establishing a heterogeneous reconfigurable task graph model HR-DAG;
3) generating a heterogeneous matching matrix $M_a$;
4) generating a reconfiguration coupling matrix $C_o$;
5) computing the completion time of the application task by means of a scheduling algorithm, thereby performing performance analysis.
The heterogeneous reconfigurable computing system model HRCS is:
HRCS $=(V_P, E_P)$, where the vertex set $V_P=\{p_1, p_2, \ldots, p_M\}$ is the set of processing elements in the system and the edge set $E_P=\{e_1, e_2, \ldots, e_L\}$ is the set of links between processing elements.
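As an illustration only, the HRCS tuple can be held in a small Python structure. The class name `HRCS`, the string names for processing elements (such as `cpu0`, `gpu0`, `fpga0`) and the choice to model links as undirected pairs are assumptions of this sketch; the source only defines $V_P$ and $E_P$ as sets.

```python
from dataclasses import dataclass, field


@dataclass
class HRCS:
    """System model HRCS = (V_P, E_P); a hypothetical in-memory representation."""

    # V_P = {p_1, ..., p_M}: processing elements, identified by (hypothetical) string names
    processing_elements: list[str]
    # E_P = {e_1, ..., e_L}: links, modelled here as undirected pairs of processing elements
    links: set[frozenset[str]] = field(default_factory=set)

    def add_link(self, a: str, b: str) -> None:
        """Add the link a-b, rejecting unknown endpoints and self-loops."""
        if a not in self.processing_elements or b not in self.processing_elements:
            raise ValueError(f"unknown processing element in link {a}-{b}")
        if a == b:
            raise ValueError("a link must join two distinct processing elements")
        # frozenset makes a-b and b-a the same link, so duplicates collapse
        self.links.add(frozenset((a, b)))

    def neighbours(self, p: str) -> set[str]:
        """Processing elements directly linked to p."""
        return {q for link in self.links if p in link for q in link if q != p}

    @property
    def M(self) -> int:
        """Number of processing elements |V_P|."""
        return len(self.processing_elements)

    @property
    def L(self) -> int:
        """Number of links |E_P|."""
        return len(self.links)
```

For example, a CPU linked to both a GPU and an FPGA gives $M = 3$ and $L = 2$; adding the GPU-CPU link a second time does not change $L$.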
The heterogeneous reconfigurable task graph model HR-DAG is:
HR-DAG $=(V_T, E_T, W, D, H, R)$, where the vertex set $V_T=\{t_1, t_2, \ldots, t_N\}$ is the set of subtasks; the edge set $E_T=\{e_1, e_2, \ldots, e_K\}$ is the set of partial-order relations between subtasks; $W$ is the set of subtask computation amounts; $D$ is the set of communication volumes between subtasks; $H$ is the set of heterogeneous characteristics of subtask execution; and $R$ is the set of reconfiguration characteristics of the communication between subtasks.
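A minimal sketch of the HR-DAG tuple follows, assuming subtasks are named by strings and edges by `(t_a, t_b)` pairs, so that the keys of `W` form $V_T$ and the keys of `D` form $E_T$. The `topological_order` helper (Kahn's algorithm) is not part of the source; it is added only to check that $E_T$ really is a partial order, i.e. that the graph is acyclic. The actual labels used for `H` and `R` are not specified in the source, so the string values are placeholders.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HRDAG:
    """Task graph model HR-DAG = (V_T, E_T, W, D, H, R); a hypothetical representation."""

    W: dict[str, float]              # w_i: computation amount of subtask t_i; keys form V_T
    D: dict[tuple[str, str], float]  # d_ab: communication volume on edge (t_a, t_b); keys form E_T
    H: dict[str, str]                # h_i: heterogeneous characteristic of subtask t_i
    R: dict[tuple[str, str], str]    # reconfiguration characteristic of each communication edge

    def __post_init__(self) -> None:
        # Every subtask needs an h_i and every edge an r; edges must join known subtasks.
        if set(self.H) != set(self.W):
            raise ValueError("H must give one characteristic per subtask")
        if set(self.R) != set(self.D):
            raise ValueError("R must give one characteristic per edge")
        for a, b in self.D:
            if a not in self.W or b not in self.W:
                raise ValueError(f"edge ({a}, {b}) refers to an unknown subtask")

    def topological_order(self) -> list[str]:
        """Order V_T consistently with E_T (Kahn's algorithm); raise if E_T has a cycle."""
        indegree = {t: 0 for t in self.W}
        successors: dict[str, list[str]] = {t: [] for t in self.W}
        for a, b in self.D:
            successors[a].append(b)
            indegree[b] += 1
        ready = deque(t for t in self.W if indegree[t] == 0)
        order = []
        while ready:
            t = ready.popleft()
            order.append(t)
            for s in successors[t]:
                indegree[s] -= 1
                if indegree[s] == 0:
                    ready.append(s)
        if len(order) != len(self.W):
            raise ValueError("E_T contains a cycle, so this is not a valid HR-DAG")
        return order
```

Keeping $W$, $D$, $H$ and $R$ as dictionaries keyed by the same task and edge identifiers makes it easy to check that every node and edge carries all the attributes the model requires.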
The heterogeneous matching matrix is $M_a=(v_{ij})_{N\times M}$, where $v_{ij}$ is the execution speed of application task $t_i$ on processing element $p_j$, $1\le i\le N$, $1\le j\le M$.
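The sketch below applies Formula 1 of the scheduling step, $T_{comp}(t_i)=w_i/v_{ij}$, to every task/processing-element pair of $M_a$ and picks, for each task, the element with the smallest computation time. The function names, the list-of-lists layout (row $i$ = task $t_i$, column $j$ = element $p_j$) and the convention that a speed of 0 means "cannot run on this element" are assumptions for illustration.

```python
import math


def computation_times(w: list[float], Ma: list[list[float]]) -> list[list[float]]:
    """Formula 1 for every task/PE pair: T_comp(t_i) on p_j = w_i / v_ij.

    Row i of Ma holds the speeds v_i1..v_iM of task t_i on p_1..p_M.
    A speed of 0 is treated (an assumption) as "t_i cannot run on p_j" and yields infinity.
    """
    if len(w) != len(Ma):
        raise ValueError("Ma must have exactly one row per task (N rows)")
    return [[wi / v if v > 0 else math.inf for v in row] for wi, row in zip(w, Ma)]


def best_processing_elements(w: list[float], Ma: list[list[float]]) -> list[int]:
    """For each task, the index j of the processing element with the smallest T_comp."""
    times = computation_times(w, Ma)
    return [min(range(len(row)), key=row.__getitem__) for row in times]
```

With $w = [40, 30]$ and $M_a = [[1, 4, 2], [3, 1, 6]]$, the computation times are $[[40, 10, 20], [10, 30, 5]]$, so $t_1$ is best matched to $p_2$ and $t_2$ to $p_3$ (indices 1 and 2 in zero-based Python). This per-task choice ignores contention; the scheduling step decides the final placement.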
The reconfiguration coupling matrix is $C_o=(c_{ij})_{K\times L}$, where $c_{ij}$ is the degree of coupling between the reconfiguration feature $r_i$ of the communication between tasks and the topology $t_j$, with $0\le c_{ij}\le 1$, $1\le i\le K$, $1\le j\le L$; $K$ is the number of edges in the HR task graph and $L$ is the number of topology types.
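In the same spirit, here is a hedged sketch of Formula 2 of the scheduling step, $T_{comm}(t_a,t_b)=d_{ab}/(B\cdot c_{ij})$, evaluated over $C_o$. It assumes row $k$ of `Co` belongs to the $k$-th communication edge (with volume `d[k]`) and column $j$ to the $j$-th topology type; treating a coupling degree of 0 as an infinitely slow topology is also an assumption.

```python
def communication_times(d: list[float], Co: list[list[float]], B: float) -> list[list[float]]:
    """Formula 2 for every edge/topology pair: T_comm = d_k / (B * c_kj).

    Row k of Co holds the coupling degrees of edge k with topology types 1..L;
    each entry must lie in [0, 1]. A coupling degree of 0 yields infinity.
    """
    if B <= 0:
        raise ValueError("bandwidth B must be positive")
    if len(d) != len(Co):
        raise ValueError("Co must have exactly one row per edge (K rows)")
    for row in Co:
        if any(not 0.0 <= c <= 1.0 for c in row):
            raise ValueError("coupling degrees must satisfy 0 <= c_ij <= 1")
    return [[dk / (B * c) if c > 0 else float("inf") for c in row] for dk, row in zip(d, Co)]


def best_topologies(d: list[float], Co: list[list[float]], B: float) -> list[int]:
    """For each edge, the index of the topology type with the smallest T_comm."""
    times = communication_times(d, Co, B)
    return [min(range(len(row)), key=row.__getitem__) for row in times]
```

Because $B$ is the same for every topology type, the fastest topology for an edge is simply the column with the largest coupling degree; for $d = [8, 4]$, $C_o = [[1.0, 0.5], [0.25, 1.0]]$ and $B = 2$ the times are $[[4, 8], [8, 2]]$.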
The completion time of the application task is computed by the scheduling algorithm as follows:
The heterogeneous matching matrix $M_a$ is used to map application tasks optimally onto processing elements, and the execution time of a task is obtained from Formula 1: $T_{comp}(t_i)=w_i/v_{ij}$, where $w_i$ is the computation amount of task $t_i$. The reconfiguration coupling matrix $C_o$ is used to guide the dynamic reconfiguration of the system topology, and the communication time between tasks is obtained from Formula 2: $T_{comm}(t_a,t_b)=d_{ab}/(B\cdot c_{ij})$, where $d_{ab}$ is the communication volume between tasks $t_a$ and $t_b$ and $B$ is the communication bandwidth of the system. These steps are repeated until all tasks have finished executing, at which point the completion time of the whole application task is obtained.
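The source does not spell out the scheduling algorithm itself. The sketch below substitutes a plain greedy earliest-finish-time list scheduler, a common heuristic, to show how Formulas 1 and 2 combine into a completion time; it is not the authors' algorithm. Assumptions: tasks are numbered 0..N-1, each edge uses the topology type with the largest coupling degree in its row of $C_o$, communication between tasks on the same processing element costs nothing, and link contention and reconfiguration overhead are ignored.

```python
import math
from collections import deque


def schedule(w, Ma, edges, Co, B):
    """Greedy earliest-finish-time list scheduling of an HR task graph (illustrative only).

    w:     N computation amounts w_i
    Ma:    N x M matching matrix of speeds v_ij
    edges: K tuples (a, b, d_ab): task a must finish before task b, with volume d_ab
    Co:    K x L coupling matrix; row k belongs to edges[k]
    B:     system communication bandwidth
    Returns (completion time, processing-element index per task, finish time per task).
    """
    n, m = len(w), len(Ma[0])
    preds = [[] for _ in range(n)]  # (predecessor, communication time) pairs
    succs = [[] for _ in range(n)]
    indegree = [0] * n
    for k, (a, b, d_ab) in enumerate(edges):
        c = max(Co[k])  # use the topology type with the highest coupling degree
        t_comm = d_ab / (B * c) if c > 0 else math.inf  # Formula 2
        preds[b].append((a, t_comm))
        succs[a].append(b)
        indegree[b] += 1

    ready = deque(i for i in range(n) if indegree[i] == 0)
    pe_free = [0.0] * m  # time at which each processing element becomes idle
    placement = [-1] * n
    finish = [0.0] * n
    done = 0
    while ready:
        i = ready.popleft()
        best_end, best_pe = math.inf, -1
        for j in range(m):
            if Ma[i][j] <= 0:
                continue  # task i cannot run on p_j
            start = pe_free[j]
            for a, t_comm in preds[i]:
                # data from a predecessor on the same element arrives without communication cost
                start = max(start, finish[a] + (0.0 if placement[a] == j else t_comm))
            end = start + w[i] / Ma[i][j]  # Formula 1
            if end < best_end:
                best_end, best_pe = end, j
        if best_pe < 0:
            raise ValueError(f"task {i} has no feasible processing element")
        finish[i], placement[i] = best_end, best_pe
        pe_free[best_pe] = best_end
        done += 1
        for s in succs[i]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if done != n:
        raise ValueError("the task graph contains a cycle")
    return max(finish), placement, finish
```

For example, with $w = [40, 30, 20]$, $M_a = [[1, 4], [3, 1], [2, 2]]$, edges `[(0, 1, 8), (0, 2, 4)]`, $C_o = [[1.0, 0.5], [0.5, 0.25]]$ and $B = 2$, the tasks are placed on elements `[1, 0, 1]` and the whole application completes at time 24.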
Compared with prior art, the present invention has the following advantages:
(1) It gives definitions of heterogeneous computing, reconfigurable computing and heterogeneous reconfigurable computing, and of the relationship among heterogeneity, reconfigurability and high-performance computing;
(2) It establishes an HRC architecture model and the HR-DAG graph model of heterogeneous reconfigurable application tasks. The latter extends the traditional DAG by adding descriptions of the heterogeneous characteristics and the communication reconfiguration characteristics of application tasks, thereby expressing the computation and communication requirements of application tasks more richly and accurately;
(4) It defines the heterogeneous matching matrix $M_a$, which describes how well different types of application tasks match the various processing elements in terms of execution performance;
(5) It defines the reconfiguration coupling matrix $C_o$, which characterizes the degree to which the communication links between processing elements satisfy the communication-pattern or topology requirements of the subtasks.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of the embodiments
The present invention is described in detail below in conjunction with the drawings and specific embodiments.
Embodiment
The heterogeneous reconfigurable computing system HRCS is a new type of high-performance computer system that may include general-purpose processors (CPUs), dedicated acceleration components (such as GPUs) and reconfigurable components (such as FPGAs). All processing elements in the system are connected by a programmable interconnection network, so that the topology of the network can be adapted to the communication requirements of the application task. To evaluate the performance of an HRCS, a system model of the HRCS must be established; based on this model, other performance metrics such as task execution time, speedup, scalability and price/performance ratio can be analyzed.
The specific steps are as follows:
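The source lists speedup among the metrics that can be derived from the model but gives no formula for it. The sketch below uses the conventional definition, speedup = baseline time / HRC completion time. Taking the baseline to be every task run one after another on a single chosen processing element (with no communication) is an assumption made here for illustration, as are the function names.

```python
def sequential_time(w: list[float], Ma: list[list[float]], j: int) -> float:
    """Hypothetical baseline: run every task sequentially on processing element p_j."""
    if any(row[j] <= 0 for row in Ma):
        raise ValueError("every task must be able to run on the baseline element")
    # Sum of Formula 1 times w_i / v_ij over all tasks on the single element j
    return sum(wi / row[j] for wi, row in zip(w, Ma))


def speedup(t_baseline: float, t_hrc: float) -> float:
    """Speedup of the HRC execution over the baseline execution."""
    if t_hrc <= 0:
        raise ValueError("execution time must be positive")
    return t_baseline / t_hrc
```

For example, with $w = [40, 30, 20]$ and $M_a = [[1, 4], [3, 1], [2, 2]]$, sequential execution on element 0 takes 60 time units; an HRC execution finishing in 24 units would give a speedup of 2.5.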
(1) Generate an HR task graph for the application program. Based on empirical data, obtain the computation amount of each subtask in the task graph and the communication volume between subtasks, and determine the processing element suited to each computation task and the interconnection network type suited to each communication task. A node in the task graph is expressed as a triple $(t_i/w_i/h_i)$, where $t_i$ is the $i$-th task, $w_i$ is the computation amount of task $t_i$, and $h_i$ is the heterogeneous characteristic of task $t_i$. An edge is likewise expressed as a triple $(e_j/d_j/r_j)$, where $e_j$ is the $j$-th communication edge, $d_j$ is the communication volume on edge $e_j$, and $r_j$ is the communication characteristic between the two tasks connected by $e_j$.
(2) From the heterogeneous matching matrix, generate the computation time of each computation task in the task graph on the different processing elements, and select a suitable processing element for each computation task.
(3) From the reconfiguration coupling matrix, generate the communication time of each communication task in the task graph on the different interconnection networks, and select a suitable network type for each communication task.
(4) Schedule the computation tasks and communication tasks in the task graph with the proposed matrix-based task execution time analysis algorithm: assign each task to its suitable processing element, and reconfigure the reconfigurable interconnection network at the times required by the communication tasks so that it suits their communication. Finally, the execution time of the whole HR task graph is obtained.
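Step (4) reconfigures the interconnection network according to the timing of the communication tasks. A minimal sketch of that bookkeeping follows, assuming one shared network that holds a single topology type at a time and ignoring the cost of reconfiguring it; the function name and the topology labels (`"mesh"`, `"ring"`) are hypothetical.

```python
def reconfiguration_points(comm_events, initial_topology=None):
    """List the moments at which the interconnection network must be reconfigured.

    comm_events: (edge name, start time, chosen topology type) for each communication task.
    Returns (time, edge, old topology, new topology) tuples in time order.
    """
    points = []
    current = initial_topology
    # Walk the communication tasks in start-time order; a reconfiguration is needed
    # whenever the next task wants a different topology from the current one.
    for edge, start, topology in sorted(comm_events, key=lambda e: e[1]):
        if topology != current:
            points.append((start, edge, current, topology))
            current = topology
    return points
```

For example, communication tasks `e1` and `e2` using a mesh at times 10 and 12 followed by `e3` using a ring at time 20 require a single reconfiguration at time 20 if the network starts as a mesh.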

Claims (6)

1. A performance analysis method for heterogeneous reconfigurable computing based on a matching matrix, characterized by comprising the following steps:
1) establishing a heterogeneous reconfigurable computing system model HRCS;
2) establishing a heterogeneous reconfigurable task graph model HR-DAG;
3) generating a heterogeneous matching matrix $M_a$;
4) generating a reconfiguration coupling matrix $C_o$;
5) computing the completion time of the application task by means of a scheduling algorithm, thereby performing performance analysis.
2. The performance analysis method for heterogeneous reconfigurable computing based on a matching matrix according to claim 1, characterized in that the heterogeneous reconfigurable computing system model HRCS is:
HRCS $=(V_P, E_P)$, where the vertex set $V_P=\{p_1, p_2, \ldots, p_M\}$ is the set of processing elements in the system and the edge set $E_P=\{e_1, e_2, \ldots, e_L\}$ is the set of links between processing elements.
3. The performance analysis method for heterogeneous reconfigurable computing based on a matching matrix according to claim 2, characterized in that the heterogeneous reconfigurable task graph model HR-DAG is:
HR-DAG $=(V_T, E_T, W, D, H, R)$, where the vertex set $V_T=\{t_1, t_2, \ldots, t_N\}$ is the set of subtasks; the edge set $E_T=\{e_1, e_2, \ldots, e_K\}$ is the set of partial-order relations between subtasks; $W$ is the set of subtask computation amounts; $D$ is the set of communication volumes between subtasks; $H$ is the set of heterogeneous characteristics of subtask execution; and $R$ is the set of reconfiguration characteristics of the communication between subtasks.
4. The performance analysis method for heterogeneous reconfigurable computing based on a matching matrix according to claim 3, characterized in that the heterogeneous matching matrix is $M_a=(v_{ij})_{N\times M}$, where $v_{ij}$ is the execution speed of application task $t_i$ on processing element $p_j$, $1\le i\le N$, $1\le j\le M$.
5. The performance analysis method for heterogeneous reconfigurable computing based on a matching matrix according to claim 4, characterized in that the reconfiguration coupling matrix is $C_o=(c_{ij})_{K\times L}$, where $c_{ij}$ is the degree of coupling between the reconfiguration feature $r_i$ of the communication between tasks and the topology $t_j$, with $0\le c_{ij}\le 1$, $1\le i\le K$, $1\le j\le L$; $K$ is the number of edges in the HR task graph and $L$ is the number of topology types.
6. The performance analysis method for heterogeneous reconfigurable computing based on a matching matrix according to claim 5, characterized in that computing the completion time of the application task by means of a scheduling algorithm specifically comprises:
using the heterogeneous matching matrix $M_a$ to map application tasks optimally onto processing elements, and obtaining the execution time of a task from Formula 1: $T_{comp}(t_i)=w_i/v_{ij}$, where $w_i$ is the computation amount of task $t_i$; using the reconfiguration coupling matrix $C_o$ to guide the dynamic reconfiguration of the system topology, and obtaining the communication time between tasks from Formula 2: $T_{comm}(t_a,t_b)=d_{ab}/(B\cdot c_{ij})$, where $d_{ab}$ is the communication volume between tasks $t_a$ and $t_b$ and $B$ is the communication bandwidth of the system; and repeating these steps until all tasks have finished executing, whereupon the completion time of the whole application task is obtained.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110440435.4A CN102890642B (en) 2011-12-23 2011-12-23 Performance analysis method based on heterogeneous reconfigurable computing (HRC) of matching matrix

Publications (2)

Publication Number Publication Date
CN102890642A 2013-01-23
CN102890642B 2014-10-22

Family

ID=47534150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110440435.4A Expired - Fee Related CN102890642B (en) 2011-12-23 2011-12-23 Performance analysis method based on heterogeneous reconfigurable computing (HRC) of matching matrix

Country Status (1)

Country Link
CN (1) CN102890642B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101754226A (en) * 2009-12-21 2010-06-23 西安电子科技大学 Terminal reconfiguration method in a cognitive radio network environment
CN101976204A (en) * 2010-10-14 2011-02-16 中国科学技术大学苏州研究院 Service-oriented heterogeneous multi-core computing platform and task scheduling method used by same

Non-Patent Citations (3)

Title
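姜燕 et al.: "Research on a parallel task scheduling algorithm based on an extended stochastic DAG", Computer Science (《计算机科学》), vol. 35, no. 07, 25 July 2008 (2008-07-25), pages 57 - 60 *
肖共萌: "DPDS: a processing resource scheduling algorithm", Computer Engineering and Applications (《计算机工程与应用》), vol. 44, no. 03, 21 January 2008 (2008-01-21), pages 128 - 132 *
郝水侠, 曾国荪, 谭一鸣: "A DAG-based heterogeneous reconfigurable task partitioning method", Journal of Tongji University (Natural Science) (《同济大学学报(自然科学版)》), vol. 39, no. 11, 30 November 2011 (2011-11-30), pages 1693 - 1698 *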

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN109408148A (en) * 2018-10-25 2019-03-01 北京计算机技术及应用研究所 Domestic computing platform and application acceleration method thereof
CN109408148B (en) * 2018-10-25 2021-06-08 北京计算机技术及应用研究所 Domestic computing platform and application acceleration method thereof
CN109522108A (en) * 2018-10-30 2019-03-26 西安交通大学 GPU task scheduling system and method based on kernel fusion

Also Published As

Publication number Publication date
CN102890642B (en) 2014-10-22

Similar Documents

Publication Publication Date Title
Bhimani et al. Fim: performance prediction for parallel computation in iterative data processing applications
Carrington et al. A performance prediction framework for scientific applications
Böhme et al. Scalable critical-path based performance analysis
Schulz et al. Interpreting performance data across intuitive domains
Acun et al. Preliminary evaluation of a parallel trace replay tool for hpc network simulations
Mantovani et al. Performance and energy consumption of HPC workloads on a cluster based on Arm ThunderX2 CPU
Banchelli et al. Performance study of HPC applications on an Arm-based cluster using a generic efficiency model
CN101799767B (en) Method for carrying out parallel simulation by repeatedly switching a plurality of operation modes of simulator
CN102890642B (en) Performance analysis method based on heterogeneous reconfigurable computing (HRC) of matching matrix
Booth et al. Phase detection with hidden markov models for dvfs on many-core processors
Fanfakh Predicting the performance of mpi applications over different grid architectures
Blem et al. Multicore model from abstract single core inputs
Nilakantan et al. Metrics for early-stage modeling of many-accelerator architectures
CN104750916A (en) Design resource integration system for designing virtual prototype of boiler
CN104090813A (en) Analysis modeling method for CPU (central processing unit) usage of virtual machines in cloud data center
Lee et al. TAUmon: scalable online performance data analysis in TAU
Liu et al. A systematic and realistic network-on-chip traffic modeling and generation technique for emerging many-core systems
Ding et al. An automatic performance model-based scheduling tool for coupled climate system models
Jorba et al. Application of parallel computing to the simulation of forest fire propagation
CN101281553A (en) Vehicle durability distributed emulation graticule applied system
Kuehn Performance and energy efficiency of parallel processing in data center environments
Yu et al. CantorSim: Simplifying acceleration of micro-architecture simulations
Schwambach et al. Estimating the potential speedup of computer vision applications on embedded multiprocessors
He et al. Efficient and precise profiling, modeling and management on power and performance for power constrained hpc systems
Allugundu et al. Acceleration of distance-to-default with hardware-software co-design

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141022

Termination date: 20171223