CN106469114B - Parallel computing performance detection system and method for communication testing - Google Patents


Info

Publication number
CN106469114B
Authority
CN
China
Prior art keywords
parallel
performance
function
serial
equivalent
Prior art date
Legal status
Active
Application number
CN201510508961.8A
Other languages
Chinese (zh)
Other versions
CN106469114A (en)
Inventor
王华俊
李凯
欧阳玉玲
Current Assignee
Fuzhou Internet of things open laboratory Co., Ltd.
Shanghai Research Center for Wireless Communications
Original Assignee
Fuzhou Internet Of Things Open Laboratory Co Ltd
Shanghai Research Center for Wireless Communications
Priority date
Filing date
Publication date
Application filed by Fuzhou Internet Of Things Open Laboratory Co Ltd and Shanghai Research Center for Wireless Communications
Priority to CN201510508961.8A
Publication of CN106469114A
Application granted
Publication of CN106469114B
Legal status: Active (current)
Anticipated expiration


Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a parallel computing performance detection system and method for communication testing. The system comprises: a serial application system, for obtaining the performance statistics file of a serial program; a parallel application system, for obtaining runtime performance data under a parallel equivalent model; and a parallel computing evaluation system, which calculates the detection result from the performance statistics file of the serial program and the runtime performance data under the parallel equivalent model obtained from the parallel application system. The invention addresses the lack of a simple, accurate method for evaluating the parallel performance of serial applications. It also addresses problems of prior-art evaluation models: they are complex, cannot reflect performance comprehensively, lack flexibility, and cannot be applied to different scenarios. The invention is simple, efficient, and accurate.

Description

A parallel computing performance detection system and method for communication testing
Technical field
The present invention relates to a computing performance detection system, and in particular to a parallel computing performance detection system for communication testing. It also relates to a parallel computing performance detection method implemented on that system. It belongs to the field of communication test technology.
Background art
With the rapid development of wireless technology, demand for high-performance computing is growing. In wireless 5G simulation requirement analysis, key time-consuming modules such as channel matrix computation, precoding, receivers, and resource scheduling have very large time overheads. Within one simulation cycle (1 ms) there can be millions of 1024-point FFT computations and millions of 32×32 matrix multiplications and inversions. The computational intensity of the simulation is hundreds of times that of existing simulation tasks. With the current simulation configuration (a single 64-core CPU server), running a 10-second simulation sample takes dozens of days of computation. This simulation efficiency cannot meet requirements, so high-performance computing techniques are needed.
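To give a sense of the workload's scale, a rough floating-point operation count can be worked out. This is a back-of-the-envelope sketch under assumptions that are not figures from the patent:

- "Millions" is taken as exactly 10^6 operations of each kind per 1 ms cycle.
- A 1024-point FFT is costed with the common 5·N·log2(N) estimate.
- A 32×32 matrix product is costed as a real-valued 2·n³.
- Matrix inversions are left out.

```python
import math

# Assumptions for illustration only (not figures from the patent):
# "millions" is taken as exactly 10**6 of each operation per 1 ms cycle,
# and matrix inversions are not counted.
OPS_PER_CYCLE = 10**6
CYCLES = 10_000  # a 10 s simulation sample at 1 ms per cycle


def fft_flops(n: int) -> int:
    """Common 5*n*log2(n) flop estimate for a radix-2 complex FFT of length n."""
    return 5 * n * int(math.log2(n))


def matmul_flops(n: int) -> int:
    """2*n**3 flops for a real-valued n x n matrix product."""
    return 2 * n**3


per_cycle = OPS_PER_CYCLE * (fft_flops(1024) + matmul_flops(32))
total = per_cycle * CYCLES
print(f"1024-point FFT: {fft_flops(1024)} flops, 32x32 matmul: {matmul_flops(32)} flops")
print(f"per 1 ms cycle: {per_cycle:.3e} flops; per 10 s sample: {total:.3e} flops")
```

Under these assumptions, a single 10 s sample is already about 1.2 × 10^15 flops before inversions are counted. Complex-valued channel matrices would raise the matrix-product cost by roughly four times.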
The current mainstream direction of high-performance computing is parallel computing based on multi-core processors. Parallel computing technology has developed rapidly and peak system performance has risen quickly, but the following problems remain:
(1) The parallel performance gain that applications obtain on parallel computing systems does not grow at the same rate as the system's peak performance. The main reasons are:
● The system is large and structurally complex, and has obvious overall performance bottlenecks. For example, insufficient software parallel-optimization design or inappropriate hardware selection directly affects the performance gain of the whole system;
● Application models and parallel algorithms cannot be mapped accurately onto the parallel architecture, so the system's potential performance cannot be exploited effectively;
● Scheduling software and hardware resources is difficult, and load balance is hard to achieve.
(2) System performance evaluation modeling
Parallel computing system architectures are highly complex, and different applications have different computational characteristics. An application's parallel performance gain is affected by many factors, and these factors interact with and constrain one another. Models for evaluating performance gain are therefore extremely complex. They are either hard to build accurately or, once built, too complex and too costly to use in real engineering practice.
(3) From a project-implementation perspective, a key factor in project schedule and cost is evaluating the performance gain of candidate parallel designs quickly and accurately at the early design stage, and then deciding among them. However, there is currently no simple, effective, easy-to-implement method for doing so, because:
Applications in different business fields have diverse computational characteristics. Mainstream performance benchmark programs follow a unified design specification and reflect only general low-level computing characteristics of the system, such as clock-frequency efficiency, bus bandwidth, memory access, and cache hit rate. They cannot directly capture the factors that determine an application's performance gain, such as its parallelizable granularity, memory allocation and access patterns, and bandwidth usage. Moreover, because different applications have different computational characteristics, even modeling these factors would not yield a general model. This separation between application and machine architecture makes evaluations of parallel designs inaccurate. It affects the choice of parallel computing architecture. It also causes the design to change repeatedly while the application is converted from serial to parallel, which directly raises project cost and delays the schedule.
There are currently several technical approaches to evaluating the performance of high-performance computing (mainly parallel computing).

The parallel feature extraction method proposed by Wong, A. et al. first analyzes the application and extracts the highly relevant modules, i.e. those whose total running time exceeds 1% of the total application time. It then assigns each module a weight equal to the number of times the module repeats, and generates executable feature code for the application from these modules. The feature code can be run directly on the target machine. The predicted runtime of the whole application is the sum over modules of each module's runtime multiplied by its weight.

The performance prediction method proposed by Gengbin Zheng et al. first uses slicing and miniaturization techniques to reduce the application's scale and to capture a data set characterizing it. It then uses machine learning to model each selected key SEB (a module in the algorithm set). Finally, it predicts the performance of the whole application in the BigSim simulation framework.

Susukita, R. et al. also proposed a performance prediction method that combines regression prediction models with program simulation. Based on the application's performance characteristics, the method simplifies the application code. After key parameters such as communication performance have been obtained, it builds a performance simulation program and predicts the parallel program's execution by simulating it on a specific simulation architecture. The method must run in a dedicated simulation environment such as BSIM, and it predicts the application's parallel performance on the target machine from the results of the simulated program on BSIM. It places high demands on the core computation modules being substituted: they must be computationally complex, and substituting them must not alter the program's control flow. This limits the method's applicability. In addition, substitution is done by rewriting specific code modules rather than by a general rewriting scheme, so when many modules need substitution, the analysis and rewriting workload is large.
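The module-weighting idea attributed to Wong, A. et al. can be sketched as follows. This is an illustrative reading of the one-sentence description above, not their implementation. The `Module` fields, module names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical per-module record; field names are illustrative."""
    name: str
    total_time: float   # total time the module takes in the original application (s)
    weight: int         # number of times the module is repeated
    target_time: float  # measured runtime of one execution on the target machine (s)


def select_modules(modules, app_time, threshold=0.01):
    """Keep modules whose total time exceeds threshold * total application time."""
    return [m for m in modules if m.total_time > threshold * app_time]


def predict_runtime(selected):
    """Predicted application runtime: each module's runtime times its weight, summed."""
    return sum(m.target_time * m.weight for m in selected)


modules = [
    Module("channel", total_time=50.0, weight=100, target_time=0.2),
    Module("logging", total_time=0.5, weight=10, target_time=0.01),
    Module("precoding", total_time=40.0, weight=1000, target_time=0.02),
]
kept = select_modules(modules, app_time=100.0)
print([m.name for m in kept], predict_runtime(kept))
```

The "logging" module falls under the 1% threshold and is dropped. The prediction therefore comes only from the two heavy modules.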
Existing parallel prediction methods thus mostly predict the parallel performance of large multi-core computers from a few typical applications or from unified benchmark programs. They do not focus on evaluating parallel schemes for a given serial program, let alone on comparing schemes and selecting the best one. Such evaluation is crucial when optimizing the performance of a serial program, and the prior art cannot meet this need.
Summary of the invention
The primary technical problem to be solved by the present invention is to provide a parallel computing performance detection system for communication testing.
Another technical problem to be solved by the present invention is to provide a parallel computing performance detection method for communication testing.
To achieve the above objectives, the present invention adopts the following technical solutions:
A parallel computing performance detection system for communication testing, comprising:
a serial application system, for obtaining the performance statistics file of a serial program;
a parallel application system, for obtaining runtime performance data under a parallel equivalent model;
a parallel computing evaluation system, which calculates the detection result from the performance statistics file of the serial program and the runtime performance data under the parallel equivalent model obtained from the parallel application system.
Preferably, the parallel computing evaluation system includes a data analysis module for obtaining integrated performance indicators from the performance statistics file of the serial program.
Preferably, the parallel computing evaluation system includes a module for establishing a serial equivalent model according to the integrated performance indicators.
Preferably, the parallel computing evaluation system includes a module for establishing a parallel equivalent model based on the serial equivalent model.
Preferably, the parallel computing evaluation system includes a performance evaluation module, which obtains the detection result from the runtime performance data of the parallel equivalent model.
A parallel computing performance detection method for communication testing, comprising the following steps:
performing a performance-equivalent mapping on the serial program to obtain a serial equivalent program;
obtaining a parallel equivalent program based on the serial equivalent program;
running the parallel equivalent program, obtaining performance data, and calculating the detection result.
Preferably, the performance-equivalent mapping is performed as follows: the serial program is run to obtain a performance statistics file, the serial program's performance characteristics are extracted from that file, and its integrated performance indicators are obtained.
Preferably, the performance-equivalent mapping is performed according to the integrated performance indicators, so that the serial program is mapped to the serial equivalent program.
Preferably, the serial equivalent model of the serial program is constructed so as to satisfy the following formula:
st03 = k10*st10 + k06*st06 + k07*st07 + k11*st11 + tc
In this formula:
● k06, k07, k10 and k11 are preset adjustment factors;
● k10*st10, k06*st06, k07*st07 and k11*st11 represent the function entry time, the global-data read time, the global-data write time and the function return time, respectively;
● tc is the execution time of the effective functional module, i.e. the computation time that remains after k10*st10, k06*st06, k07*st07 and k11*st11 are removed from the function's time.
Preferably, performing the performance-equivalent mapping on the serial program to obtain the serial equivalent program comprises the following steps:
1) running the serial program to be optimized to obtain its function-level performance analysis report;
2) according to the analysis report, generating a new function tree from the function tree of the parallelizable part, and merging the remaining non-parallelizable part into a single new function, thereby constructing an equivalent performance function tree;
3) outputting the integrated performance indicators according to the analysis report and the equivalent performance function tree;
4) generating the serial equivalent program according to the equivalent performance function tree and the integrated performance indicators.
Preferably, when the serial equivalent program is generated, the execution overhead of the effective functional modules of the parallelizable part is concentrated in a single function. That function's time is the total execution time of the parallelizable part's principal function, minus the time that each function in the parallelizable part spends on global-variable read/write operations alone.
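The time calculation above can be written out as a short sketch. It assumes the per-function global-read/write-only times are already known. It also uses the 1 ms atomic computing unit and the random choice of host function that the detailed description gives as examples. The function names and sample numbers are hypothetical.

```python
import random


def concentrated_effective_time(principal_total, global_rw_times):
    """Total time of the parallelizable principal function minus the time each
    parallelizable function spends on global-variable reads/writes only."""
    t_eff = principal_total - sum(global_rw_times.values())
    if t_eff < 0:
        raise ValueError("global read/write time exceeds the principal function's total time")
    return t_eff


def place_effective_work(principal_total, global_rw_times, atom_seconds=1e-3, rng=None):
    """Pick one parallelizable function at random to host the effective work and
    express that work as a count of fixed-duration atomic computing units."""
    rng = rng or random.Random()
    t_eff = concentrated_effective_time(principal_total, global_rw_times)
    host = rng.choice(sorted(global_rw_times))
    return host, round(t_eff / atom_seconds)


host, atoms = place_effective_work(10.0, {"f1": 1.5, "f2": 0.5, "g": 1.0}, rng=random.Random(42))
print(host, atoms)
```

Here 10.0 s minus 3.0 s of read/write-only time leaves 7.0 s of effective work. At 1 ms per unit, that is 7000 atomic units placed in one randomly chosen function.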
There is currently no simple, accurate method for evaluating the parallel performance of serial applications. The present invention provides such a method. It also solves several prior-art problems: evaluation models are complex, cannot reflect performance comprehensively, lack flexibility, and cannot be applied to different scenarios. Simulation experiments show that the technical solution is simple, efficient, and accurate. Because the solution is general, it can also be applied to parallel optimization projects in communications and other fields.
Brief description of the drawings
Fig. 1 is a schematic diagram of the composition of the parallel computing performance detection system for communication testing;
Fig. 2 is a flowchart of the parallel computing performance detection method for communication testing;
Fig. 3 is a schematic diagram of the basic mapping process of the equivalent running model;
Fig. 4 is a schematic diagram of the process of generating the equivalent performance function tree;
Fig. 5 is a flowchart of the method for constructing the equivalent running model;
Fig. 6 is a schematic diagram of the optimized equivalent performance function tree in an LTE system simulation platform;
Fig. 7 is a schematic diagram of parallelization using the parfor mechanism in the LTE system simulation platform;
Fig. 8 is a schematic diagram of parallelization using the MDCE mechanism in the LTE system simulation platform.
Detailed description of the embodiments
The technical content of the present invention is described in detail below with reference to the drawings and specific embodiments.
In the present invention, the parallel computing performance detection system for communication testing comprises three main parts: a parallel computing evaluation system, a serial application system, and a parallel application system. Its composition is shown in Fig. 1.
The serial application system includes a serial application module and a server/PC module. The serial application module outputs the serial program to the server/PC module, which analyzes the serial program and collects statistics on it to obtain a performance statistics file. This file contains statistics on each computing resource used while the serial program runs, and should include at least the following data:
(1) the total running time of each function, its total number of calls, and its percentage of the program's total running time;
(2) the memory, in bytes, occupied by the global variables accessed by each function;
(3) the memory, in bytes, occupied by each function's input parameters and return values;
(4) the time each function spends executing floating-point operations.
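For a program that logs its own statistics, a minimal Python sketch of such a run-resource log might use a decorator. The sketch records item (1) and approximates item (3) with `sys.getsizeof`, which is shallow and does not follow references inside containers. It does not cover items (2) and (4). The `STATS` layout, `profiled`, `report` and the sample function are hypothetical.

```python
import sys
import time
from collections import defaultdict
from functools import wraps

# Per-function statistics, keyed by function name (hypothetical log format).
STATS = defaultdict(lambda: {"calls": 0, "total_time": 0.0, "arg_bytes": 0, "ret_bytes": 0})


def profiled(func):
    """Record call count, inclusive wall time and shallow argument/return sizes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        entry = STATS[func.__name__]
        entry["calls"] += 1
        entry["total_time"] += elapsed
        # sys.getsizeof is shallow: it does not follow references inside containers.
        entry["arg_bytes"] += sum(sys.getsizeof(a) for a in args)
        entry["arg_bytes"] += sum(sys.getsizeof(v) for v in kwargs.values())
        entry["ret_bytes"] += sys.getsizeof(result)
        return result
    return wrapper


def report(program_total_time):
    """Add each function's percentage of the program's total running time."""
    return {
        name: {**entry, "percent": 100.0 * entry["total_time"] / program_total_time}
        for name, entry in STATS.items()
    }


@profiled
def square_all(values):
    return [v * v for v in values]


t0 = time.perf_counter()
for _ in range(3):
    square_all(list(range(1000)))
print(report(time.perf_counter() - t0)["square_all"])
```

Because the recorded times are inclusive, nested profiled functions are counted in both caller and callee. Their percentages can therefore sum to more than 100.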
The parallel computing evaluation system includes a data analysis module, a performance modeling module, and a performance evaluation module.
The data analysis module obtains integrated performance indicators from the performance statistics file or data produced by the serial application system. Examples include the performance report output by the MATLAB Profiler and log data recorded by statistics code in the application.
The performance modeling module obtains the integrated performance indicators from the data analysis module and uses a preset equivalent-model construction method to establish the serial equivalent model from them. It then establishes the parallel equivalent model from the serial equivalent model and a parallel optimization scheme. In this embodiment both models are established in the performance modeling module, but those skilled in the art will understand that establishing them in two separate modules is also feasible. In other words, the performance modeling module may be one module or two.
The performance evaluation module obtains the final evaluation result and report according to the parallel optimization target and the runtime performance data of the parallel equivalent model.
The parallel application system includes a parallel application module and a server/PC. The parallel application module receives the parallel equivalent model from the performance modeling module and runs it. The server/PC obtains the runtime performance data and feeds it back to the performance evaluation module.
The present invention converts the serial program into a serial equivalent model, then converts that into a parallel equivalent model and runs it in parallel. This keeps evaluation efficient and yields accurate performance parameters for the serial program in a short time. In simulation experiments, the error between the parallelization effect obtained by the present invention and that of the serial equivalent program was within 15%, indicating high accuracy.
The parallel computing performance detection method for communication testing is described in detail below with reference to Fig. 2.
Step 1: After running, the serial application system outputs performance statistics (the performance statistics file of the serial program) to the data analysis module. This file contains statistics on each computing resource used while the serial program runs, including the following data:
(1) the total running time of each function, its total number of calls, and its percentage of the program's total running time;
(2) the memory, in bytes, occupied by the global variables accessed by each function;
(3) the memory, in bytes, occupied by each function's input parameters and return values;
(4) the time each function spends executing floating-point operations.
Performance reports from analysis tools such as the MATLAB Profiler provide the basic statistics, and the program itself can supply further statistics by implementing run-resource statistics logging. Programs written in other languages, such as C and C++, can record and generate the corresponding statistics automatically by implementing such logging, or can rely on existing performance statistics tools.
Step 2: The data analysis module analyzes the serial program using the input performance statistics file and extracts its performance characteristics. It then outputs the serial program's integrated performance indicators, which are the key input for building the equivalent model. As shown in Table 1, the integrated performance indicators include the following:
Table 1: List of integrated performance indicators
Step 3: The performance modeling module applies the equivalent-model construction method to the integrated performance indicators output by the data analysis module. This performance-equivalent mapping turns the serial program into a serial equivalent program. The mapping preserves the performance characteristics of the original serial program while hiding the business logic of the specific application. Parallel performance can therefore be analyzed in a unified, streamlined equivalent serial computing domain, which greatly simplifies performance evaluation. The equivalent-model construction method is described in detail below.
Step 4: The performance modeling module converts the serial equivalent program into a parallel equivalent program according to the selected parallel optimization scheme. There are many possible parallel schemes, each with its own parallel mechanism, data structure design, and so on. Each scheme therefore yields a different parallel equivalent model and program.
Step 5: The parallel application system runs the parallel equivalent program provided by the performance modeling module.
Step 6: The parallel application system outputs performance statistics. Their content must meet the same requirements as in Step 1.
Step 7: The performance evaluation module uses the performance statistics and the parallel optimization target to analyze whether this parallel equivalent model meets the requirements. This completes the evaluation of one parallel optimization scheme.
Step 8: The performance evaluation module selects the next parallel optimization scheme and returns to Step 4, until every scheme to be evaluated has been assessed.
Step 9: The performance evaluation module summarizes the evaluation results of all parallel optimization schemes and outputs an evaluation report.
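Steps 4 to 9 form a loop over candidate schemes, which can be sketched as follows. The sketch assumes the parallel optimization target is a minimum speedup over the serial baseline. It also assumes each scheme is supplied as a worker count plus a callable that runs its parallel equivalent program and returns wall-clock time. In the example, the callables are stubs returning fixed, made-up times. `SchemeResult` and `evaluate_schemes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SchemeResult:
    name: str
    parallel_time: float
    speedup: float
    efficiency: float
    meets_target: bool


def evaluate_schemes(
    serial_time: float,
    schemes: Dict[str, Tuple[int, Callable[[], float]]],
    target_speedup: float,
) -> List[SchemeResult]:
    """Evaluate each candidate scheme and return a summary sorted by parallel time.

    schemes maps a scheme name to (worker count, run), where run() executes that
    scheme's parallel equivalent program and returns its wall-clock time.
    """
    results = []
    for name, (workers, run) in schemes.items():  # Step 8: loop over schemes
        parallel_time = run()                       # Steps 4-6: build, run, measure
        speedup = serial_time / parallel_time
        results.append(SchemeResult(                # Step 7: check against the target
            name, parallel_time, speedup, speedup / workers, speedup >= target_speedup))
    return sorted(results, key=lambda r: r.parallel_time)  # Step 9: summary


summary = evaluate_schemes(
    serial_time=100.0,
    schemes={"parfor": (8, lambda: 20.0), "MDCE": (16, lambda: 10.0)},
    target_speedup=6.0,
)
for r in summary:
    print(r)
```

With these stub timings, the 16-worker scheme reaches a speedup of 10 and passes. The 8-worker scheme reaches 5 and fails the assumed target of 6.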
Programs are designed in a modular way, and their basic building block is essentially always the function, so the basic unit of parallel design is the function. For the vast majority of functions that need to be parallelized, the number of executions N(i) is sufficiently large. The model can therefore be simplified according to formula (1): when analyzing the units to be parallelized, each function's average performance is modeled.
Here F(i, j) is the running time of function i on its j-th call, M is the total number of functions in the program, and N(i) is the number of times function i is executed during the whole program run. F(i, j), M and N(i) can all be obtained from the performance statistics file. When N(i) is sufficiently large, Khinchin's law of large numbers implies that the average running time of function i converges to the expected value μ(i) shown in formula (1).
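Formula (1) itself does not survive in this text. Judging from the definitions above, it is presumably the per-function sample mean and its limit. The following is a reconstruction, not the patent's verbatim formula:

$$\mu(i) = \lim_{N(i)\to\infty} \frac{1}{N(i)} \sum_{j=1}^{N(i)} F(i,j), \qquad i = 1, \dots, M$$

The sketch below estimates μ(i) from a hypothetical list of (function name, call time) records. It then shows the sample mean settling as N grows, using synthetic exponential call times with an assumed true mean of 5 ms. The function and record names are illustrative.

```python
import random
from collections import defaultdict
from statistics import mean


def mean_runtimes(records):
    """records: iterable of (function_name, runtime_of_one_call).

    Returns {name: (N(i), sample mean of F(i, j))}, the estimate of mu(i).
    """
    per_func = defaultdict(list)
    for name, runtime in records:
        per_func[name].append(runtime)
    return {name: (len(ts), mean(ts)) for name, ts in per_func.items()}


# Synthetic illustration (hypothetical distribution): exponential call times, true mean 5 ms.
rng = random.Random(0)
for n in (10, 1_000, 100_000):
    records = [("fft_block", rng.expovariate(1 / 0.005)) for _ in range(n)]
    count, mu_hat = mean_runtimes(records)["fft_block"]
    print(f"N = {count:>6}: sample mean = {mu_hat * 1e3:.3f} ms")
```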
Extracting a serial program's computational characteristics with the function as the basic unit is subject to two constraints. First, the characteristics must be evaluated function by function. Second, the evaluation must reflect the impact of parallelization, which keeps the mapping equivalent when moving from the serial equivalent model to the parallel equivalent model. Following the idea of principal component analysis, the present invention starts from the computational characteristics inherent in the serial simulation program. Working top-down, it captures those characteristics with a small number of integrated performance indicators.
When a function is converted from serial to parallel, its parallel running time consists of five parts:
● the function entry time, i.e. the overhead of entering the function, such as addressing the function, pushing its parameters onto the stack and jumping to the function address;
● the global-data read time, i.e. copying data from global memory into the function's own space;
● the execution time of the function's effective functional module, i.e. the effective execution time remaining after the overhead of calling, returning and global-data read/write operations is removed;
● the global-data write time, i.e. the overhead of assigning the function's local results to global memory;
● the function return time, i.e. the overhead of pushing the return value onto the function stack and jumping back to the main program.
The following performance indicators therefore need to be obtained:
● st01: total function execution time;
● st02: number of function calls;
● st03: average function execution time;
● st04: total number of global-variable reads by the function;
● st05: total number of global-variable writes;
● st06: average number of global-variable reads;
● st07: average number of global-variable writes;
● st08: number of input parameters;
● st09: number of return values;
● st10: input-parameter size in bytes;
● st11: return-value size in bytes.
These indicators satisfy the relationship shown in formula (2).
st03 = k10*st10 + k06*st06 + k07*st07 + k11*st11 + tc    (2)
In formula (2):
● k06, k07, k10 and k11 are preset adjustment factors;
● k10*st10, k06*st06, k07*st07 and k11*st11 represent the function entry time, global-data read time, global-data write time and function return time, respectively;
● tc is the execution time of the effective functional module, i.e. the computation time that remains after k10*st10, k06*st06, k07*st07 and k11*st11 are removed.
These parameters can be obtained by averaging long-term statistics collected with statistical tools or with instrumentation points in the program. Later, the equivalent model is built by constructing equivalent functions from the parameters in formula (2) and an equivalent function tree from the corresponding algorithm. This completes the equivalent model of the whole serial program, which can then easily be converted into a parallel equivalent model.
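Solving formula (2) for tc is straightforward once the indicators and adjustment factors are known. The sketch below assumes st10 and st11 are per-call byte counts. It derives st03, st06 and st07 as per-call averages of st01, st04 and st05. The adjustment factors in the example (1 ns per parameter or return byte, 20 µs per global access) and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionIndicators:
    """Raw per-function statistics; st10 and st11 are assumed to be per-call sizes."""
    st01: float  # total execution time of the function (s)
    st02: int    # number of calls
    st04: int    # total global-variable reads
    st05: int    # total global-variable writes
    st10: int    # input-parameter size in bytes
    st11: int    # return-value size in bytes

    @property
    def st03(self) -> float:  # average execution time per call
        return self.st01 / self.st02

    @property
    def st06(self) -> float:  # average global-variable reads per call
        return self.st04 / self.st02

    @property
    def st07(self) -> float:  # average global-variable writes per call
        return self.st05 / self.st02


def effective_time(ind, k06, k07, k10, k11):
    """Solve formula (2) for tc: st03 minus entry, read, write and return times."""
    overhead = k10 * ind.st10 + k06 * ind.st06 + k07 * ind.st07 + k11 * ind.st11
    tc = ind.st03 - overhead
    if tc < 0:
        raise ValueError("adjustment factors give more overhead than the measured average time")
    return tc


# Hypothetical factors: 1 ns per parameter/return byte, 20 us per global access.
ind = FunctionIndicators(st01=2.0, st02=1000, st04=5000, st05=2000, st10=4096, st11=1024)
tc = effective_time(ind, k06=2e-5, k07=2e-5, k10=1e-9, k11=1e-9)
print(f"st03 = {ind.st03 * 1e3:.3f} ms, tc = {tc * 1e3:.5f} ms")
```

The resulting tc is the per-call effective work that the equivalent function reproduces with atomic computing units.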
Because formula (2) models global-data write operations, it can accurately simulate the time the main task spends collecting data from parallel subtasks during parallel execution.
Parallel optimization design must take many factors into account. These include the algorithms and data structures of the program to be optimized, the parallel mechanism, the specifications of the hardware and software platform, and the maintainability and extensibility of the parallel code. When the program to be optimized is large and complex, two tasks become critical: choosing an overall parallel optimization solution, and estimating in advance the performance upper bound that solution can reach. Adapting a large program for parallelism takes a great deal of work, so choosing the wrong parallel scheme wastes considerable manpower and resources. A cost-effective method for evaluating parallel schemes is therefore needed. It should estimate the performance of various schemes in advance at very low cost and provide quantitative data for choosing among them.
As shown in Fig. 3, the equivalent running model is designed around the idea of performance equivalence. Two transformations move the serial program to be optimized from the serial computing domain into an equivalent parallel computing domain, where it is evaluated. The performance gain and cost of different parallel optimization designs are evaluated there, and the best parallel scheme that meets the performance optimization goal is selected.
The equivalent running model has two parts. The first is the "serial equivalent program", obtained by performing a performance-equivalent mapping on the "serial program" to be optimized. The serial equivalent program also serves as the performance evaluation baseline for parallel optimization. As a baseline, it must reproduce the runtime performance of the serial program equivalently, i.e. simulate that performance with simplified code and flow. Parallel optimization is then applied to the serial equivalent program, and the optimized result approximates the upper performance bound attainable by the parallel optimization scheme. The original serial program never has to be optimized in the serial computing domain, which greatly reduces the workload of parallel optimization design and evaluation. The second part is the "parallel equivalent program", obtained once a specific parallel optimization scheme has been chosen.
The transformation must simulate the serial program's performance with the simplest possible code and flow. Only runtime performance is considered and the real functionality is not simulated, so the performance-equivalent mapping can be completed at very low cost to generate the serial equivalent program. The approach is as follows:
1) Analyze the serial program to be optimized and identify which parts can be parallelized and which must execute serially. This step is optional; steps 2) to 7) can be applied to any serial program to be optimized.
2) Run the serial program to be optimized under a profiling tool such as the MATLAB profile tool to obtain a function-level performance analysis report. The report provides key information such as the serial execution overhead of each function, the call relationships between functions, the number of calls to each function, and the number of read/write operations on global variables.
3) Using the analysis report from step 2) (for example, the profile report), merge and prune the function tree of the parallelizable part into a new function tree with only two levels, and merge the remaining non-parallelizable part into a single new function. This yields the equivalent performance function tree, annotated with the number of calls to each function, the number of global-variable reads/writes in each function, and the total execution time of some functions, as shown in Figure 4.
4) Output the integrated performance indicators from the analysis report of step 2) and the equivalent performance function tree.
5) Generate the serial equivalent program from the equivalent performance function tree and the integrated performance indicators, so that it reproduces the serial program to be optimized in every dimension, including running time, stack memory overhead, and data transfer between functions. Specifically:
● Build the call relationships between the functions of the serial equivalent program according to the equivalent performance function tree;
● In each function of the parallelizable part, perform only read/write operations on global variables;
● Use some method (for example, an FFT on floating-point data) to build an atomic computing unit with a fixed running time (for example, 1 ms), which is used to simulate the program's execution overhead;
● Concentrate the execution overhead of the effective functional modules of the parallelizable part in one function, chosen at random from the parallelizable functions. Its time equals the total execution time of the principal function of the parallelizable part (for example, test1 in Figure 4) minus the execution time of each parallelizable function when it performs only its global-variable reads/writes; this computation is simulated with the atomic computing unit;
● Place the execution overhead of the effective functional modules of the serial part only in the serial function (for example, function S in Figure 4), again implemented with the atomic computing unit;
● Take the function call counts, the function call sequence, the number of global-variable read/write operations, and the other quantities needed to generate the serial equivalent program from the integrated performance indicators.
Note: the atomic computing unit is not invoked through a function call; its code is placed directly inside each function that needs it.
6) Select a parallel optimization mechanism such as parfor, MDCE, or MPI, apply parallel design optimization to the serial equivalent program to generate the parallel equivalent program, and run it on the specific hardware and software platform. Using the serial program to be optimized as the baseline, obtain performance evaluation data for the parallel scheme from the execution results of the parallelized equivalent program.
7) Evaluate the parallel design scheme against the optimization target using the performance evaluation data from step 6). If the scheme cannot meet the parallel performance optimization target, return to step 6) and select a new parallel optimization mechanism, until the program meets the optimization target.
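As a rough illustration of step 3), the Python sketch below collapses a profiled call tree into an equivalent performance function tree. It is a stand-in for the MATLAB workflow, not the patent's implementation: it assumes the profile report has already been converted into a tree of hypothetical `Func` nodes (call count, self time, global reads/writes), and the names `Func`, `build_equivalent_tree` and the merged serial function `S` are illustrative. The parallelizable subtree is flattened to two levels by summing each callee's statistics across call sites, and every other node is folded into `S`, so the total execution time of the tree is preserved.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    name: str
    calls: int = 1        # call count reported by the profiler
    self_ms: float = 0.0  # time spent in the function body itself
    reads: int = 0        # global-variable reads
    writes: int = 0       # global-variable writes
    children: list = field(default_factory=list)


def walk(node):
    """Pre-order traversal of a call tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def total_ms(node):
    """Inclusive time: own time plus that of all callees."""
    return node.self_ms + sum(total_ms(c) for c in node.children)


def build_equivalent_tree(root, par_root_name, serial_name="S"):
    """Collapse a profiled call tree into a hypothetical equivalent tree.

    The subtree rooted at par_root_name becomes two levels (principal function
    plus one child per distinct callee name); all other nodes, including the
    root's own body, are merged into one serial function.
    """
    par_root = next(n for n in walk(root) if n.name == par_root_name)
    par_ids = {id(n) for n in walk(par_root)}

    flat = {}
    for n in walk(par_root):
        if n is par_root:
            continue
        f = flat.setdefault(n.name, Func(n.name, calls=0))
        f.calls += n.calls      # sum statistics over every call site
        f.self_ms += n.self_ms
        f.reads += n.reads
        f.writes += n.writes
    par = Func(par_root.name, par_root.calls, par_root.self_ms,
               par_root.reads, par_root.writes, list(flat.values()))

    serial = Func(serial_name)
    for n in walk(root):
        if id(n) not in par_ids:
            serial.self_ms += n.self_ms
            serial.reads += n.reads
            serial.writes += n.writes
    # New root keeps only the call structure: parallel part + serial part.
    return Func(root.name, root.calls, 0.0, 0, 0, [par, serial])
```

Summing self times rather than inclusive times avoids double counting when a callee is reached through several paths, which is why the inclusive total of the new tree equals that of the original.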
The invention can be applied to moving an LTE system simulation platform from serial to parallel computing. It is also applicable to computing systems or platforms in other fields, for evaluating and predicting the performance of candidate parallel schemes before parallel optimization is implemented. The method can be embodied in a performance test and evaluation tool or a performance simulation system.
The following example uses the parallel optimization of an LTE system simulation platform written in MATLAB.
The LTE system simulation platform comprises multiple modules, including physical-layer channel configuration, interference calculation, data import, RRC-layer message processing, physical-layer message processing, MAC-layer scheduling, and simulation result display. The platform program is extremely complex and there are many possible parallel optimization techniques, so it is difficult to quickly determine the effect of parallelization and select the best optimization mechanism. The method of the invention makes it possible to identify, simply and efficiently, a parallel scheme that meets the optimization target. The implementation proceeds as follows:
1. Analysis of the LTE system simulation program shows that the physical-layer message processing module is the main time-consuming module of the platform and that there is no coupling between users within this module, so it can be parallelized; all other modules are treated as the non-parallelizable part.
2. Run the LTE system simulation program and generate the analysis report, obtaining information such as the serial execution overhead of each function and the call relationships between functions.
3. Based on the analysis report, merge and prune the function tree of the physical-layer message processing module and merge the remaining modules, generating the equivalent performance function tree shown in Figure 6, annotated with the call count of each function, the number of global-variable reads/writes in each function, and the total execution time of some functions.
4. Obtain the integrated performance indicators listed in Table 1 from the analysis report and the equivalent performance function tree.
5. Generate the serial equivalent program from steps 1 to 4:
● Establish the call relationships between the functions of the serial equivalent program according to the equivalent performance function tree of Figure 6.
● Each function of the physical-layer message processing module performs only read/write operations on global variables; the exact numbers of operations are given in Figure 6.
● Because this simulation platform operates mainly on floating-point numbers, an FFT can be used to build an atomic function with a running time of 1 ms.
● The execution overhead of the effective functional modules of all functions in the physical-layer message processing part is concentrated in the PHY_Message_Sector_DL_Process function. Its time equals the total execution time of the PHY_Message_Process function minus the execution time of each function of the physical-layer message processing module when it performs only its global-variable reads/writes; this overhead is implemented with the atomic function.
● In the Serial_Func function, the atomic function is executed repeatedly in a loop to simulate the execution overhead of the serial part.
6. Select two parallel optimization mechanisms, parfor and MDCE, to apply parallel design optimization to the serial equivalent program of the physical-layer message processing, generate the parallel equivalent programs, and run them on the MATLAB platform. Using the serial program of the LTE system simulation platform as the baseline, evaluate the performance gain.
Running the parallel equivalent programs of both mechanisms shows that parfor parallelizes more effectively, because the parfor mechanism has lower overhead. parfor and MDCE differ fundamentally. parfor can run parallel tasks only on a single server or PC, as shown in Figure 7; it implements function-level parallel processing and the optimization design is simple. MDCE supports a multi-machine distributed mode in addition to the local mode and can scale the application to multiple computers in a cluster, as shown in Figure 8; it implements task-level parallel processing and the optimization design is more complex.
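The atomic computing unit of step 5) and the 1 ms FFT-based atomic function of the LTE example can be sketched as follows. This is a hedged Python illustration, not the MATLAB code used in the patent: the radix-2 `fft`, the calibration routine `calibrate_unit`, and the helpers `overhead_budget_ms` and `hot_function` are assumptions introduced here. The sketch times a fixed-size FFT on the current host to estimate how many FFTs make up one unit of roughly `target_ms`, computes the budget that the hot function (for example PHY_Message_Sector_DL_Process) must reproduce as the principal function's total time minus the read/write-only time of each parallel-part function, and then burns that budget in whole units.

```python
import cmath
import time


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)] +
            [even[k] - tw[k] for k in range(n // 2)])


def calibrate_unit(target_ms=1.0, size=256, trials=20):
    """Estimate how many FFTs of `size` points take about target_ms here."""
    data = [complex(i % 7, 0) for i in range(size)]
    start = time.perf_counter()
    for _ in range(trials):
        fft(data)
    per_fft_ms = max((time.perf_counter() - start) * 1000 / trials, 1e-9)
    return max(1, round(target_ms / per_fft_ms))


def overhead_budget_ms(principal_total_ms, rw_only_ms):
    """Principal function's total time minus the read/write-only time of
    each parallel-part function; never negative."""
    return max(0.0, principal_total_ms - sum(rw_only_ms))


def hot_function(budget_ms, ffts_per_unit, unit_ms=1.0, size=256):
    """Burn roughly budget_ms in whole atomic units; returns units executed."""
    data = [complex(i % 7, 0) for i in range(size)]
    n_units = round(budget_ms / unit_ms)
    for _ in range(n_units):
        # One atomic unit: the loop is written inline here rather than
        # wrapped in its own helper, following the note in step 5).
        for _ in range(ffts_per_unit):
            fft(data)
    return n_units
```

Wall-clock calibration is approximate and varies with machine load, so in practice the unit would be recalibrated on the target platform before the equivalent programs are timed.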
The invention can be used to rapidly evaluate and predict the parallel computing performance gain (or speed-up ratio) that serial software can achieve under different parallel design schemes. Its main contributions are:
(1) A new method based on a dual-transform-domain equivalent statistical model: the target application is transformed successively from the serial computing domain into an equivalent serial computing domain and then an equivalent parallel computing domain, which greatly simplifies the work of evaluating and predicting parallel performance.
(2) Taking the computational characteristics of the LTE simulation model into account, principal component analysis and statistical methods are used to reduce the number of indicators needed to evaluate computing performance to a few integrated indicators, on which the equivalent computing model of the simulation application is built. This greatly simplifies the construction of the equivalent serial computing model.
(3) A parallel computing equivalent model is built from the equivalent serial computing model and the parallel design scheme, and is measured and evaluated on a real parallel computing platform. The parallel design scheme is iterated against the computing performance target until the preset optimization target is met. Because the computing model is concise and effective, each iteration completes quickly, which reduces the workload of parallel scheme design and evaluation.
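Item (2) mentions principal component analysis for reducing the number of performance indicators. The pure-Python sketch below shows one standard way to decide how many components to keep; it is an assumption about how such a reduction could be carried out, not the patent's procedure. Rows are observations (for example, per-function profile records) and columns are indicators (for example, st03, st06, st10). The indicators are standardized, their correlation matrix is diagonalized with the cyclic Jacobi eigenvalue method, and the smallest number of components whose cumulative explained-variance ratio reaches a threshold is returned; the default threshold of 0.85 is an arbitrary illustrative choice.

```python
import math


def standardize(rows):
    """Z-score each column (indicator) of a list of equal-length rows."""
    cols = list(zip(*rows))
    out = []
    for c in cols:
        m = sum(c) / len(c)
        sd = math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
        out.append([(v - m) / sd for v in c])
    return [list(r) for r in zip(*out)]


def covariance(rows):
    """Sample covariance of centred rows (a correlation matrix here)."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, descending."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (rotate columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rotate rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def components_needed(rows, threshold=0.85):
    """Smallest k whose cumulative explained-variance ratio reaches threshold."""
    eig = jacobi_eigenvalues(covariance(standardize(rows)))
    total = sum(eig)
    ratios = [e / total for e in eig]
    cum = 0.0
    for k, r in enumerate(ratios, start=1):
        cum += r
        if cum >= threshold:
            return k, ratios
    return len(ratios), ratios
```

Two perfectly correlated indicators collapse to a single component, while two uncorrelated indicators each explain half the variance and both are kept.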
There is currently no simple and accurate method for evaluating the parallel performance of serial applications. The invention provides such a method and addresses problems of the prior art, namely evaluation models that are complex, fail to reflect the evaluated performance comprehensively, lack flexibility, and cannot be applied to different scenarios. According to the inventors' simulation experiments, the technical solution is simple, efficient, and accurate, and trials in a parallel design optimization project for a MATLAB-based wireless LTE system simulation program confirmed these characteristics. The method is also broadly applicable: it can be used in parallel optimization projects for computer programs in communications and other fields, and in commercial performance test and evaluation tools or performance simulation systems.
The parallel computing performance detection system for communication testing and its method provided by the invention have been described in detail above. For those of ordinary skill in the art, any obvious change made without departing from the essential spirit of the invention will constitute an infringement of the patent right of the invention and will bear the corresponding legal liability.

Claims (9)

1. A parallel computing performance detection system for communication testing, characterized by comprising:
a serial application system, configured to run a serial program to be optimized and obtain its function-level performance analysis report and performance statistics file; to generate, according to the analysis report, a new function tree from the function tree of the parallelizable part and merge the remaining non-parallelizable part into a new function, thereby constructing an equivalent performance function tree; to output integrated performance indicators according to the analysis report and the equivalent performance function tree; and to generate a serial equivalent model according to the equivalent performance function tree and the integrated performance indicators;
a parallel application system, configured to obtain runtime performance data under a parallel equivalent model;
a parallel computing evaluation system, configured to calculate a detection result according to the performance statistics file of the serial program and the runtime performance data under the parallel equivalent model obtained from the parallel application system.
2. The parallel computing performance detection system according to claim 1, characterized in that:
the parallel computing evaluation system comprises a data analysis module configured to obtain integrated performance indicators from the performance statistics file of the serial program.
3. The parallel computing performance detection system according to claim 2, characterized in that:
the parallel computing evaluation system comprises a module for establishing a serial equivalent model according to the integrated performance indicators.
4. The parallel computing performance detection system according to claim 3, characterized in that:
the parallel computing evaluation system comprises a module for establishing a parallel equivalent model based on the serial equivalent model.
5. The parallel computing performance detection system according to claim 4, characterized in that:
the parallel computing evaluation system comprises a performance evaluation module that obtains a detection result according to the runtime performance data of the parallel equivalent model.
6. A parallel computing performance detection method for communication testing, characterized by comprising the following steps:
running a serial program to be optimized to obtain its function-level performance analysis report; according to the analysis report, generating a new function tree from the function tree of the parallelizable part and merging the remaining non-parallelizable part into a new function, thereby constructing an equivalent performance function tree; outputting integrated performance indicators according to the analysis report and the equivalent performance function tree; and generating a serial equivalent program according to the equivalent performance function tree and the integrated performance indicators;
obtaining a parallel equivalent program based on the serial equivalent program;
running the parallel equivalent program to obtain performance data, and calculating a detection result.
7. The parallel computing performance detection method according to claim 6, characterized in that:
the serial program is run to obtain a performance statistics file, the performance characteristics of the serial program are extracted from it, and the integrated performance indicators of the serial program are obtained.
8. The parallel computing performance detection method according to claim 6, characterized in that:
when the serial equivalent program of the serial program is generated, the following formula is satisfied:
st03 = k10*st10 + k06*st06 + k07*st07 + k11*st11 + tc
where st03 is the average execution time of the function, st10 is the number of bytes of the function's input parameters, st06 is the average number of global-variable reads by the function, st07 is the average number of global-variable writes by the function, and st11 is the number of bytes of the function's return value; k06, k07, k10 and k11 are configurable preset adjustment factors; k10*st10, k06*st06, k07*st07 and k11*st11 represent, respectively, the function entry time, the global-data read time, the global-data write time, and the function return time; and tc is the execution time of the effective functional module, i.e. the computation time remaining after k10*st10, k06*st06, k07*st07 and k11*st11 are subtracted from the function's execution time.
9. The parallel computing performance detection method according to claim 6, characterized in that:
during generation of the serial equivalent program, the execution overhead of the effective functional modules of the parallelizable part is concentrated in one function, whose time equals the total execution time of the principal function of the parallelizable part minus the execution time of each function of the parallelizable part when it performs only its global-variable read/write operations.
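The timing decomposition in claim 8 can be written as a short Python sketch. The `FuncStats` container and the helper names are hypothetical; the claim only states that k06, k07, k10 and k11 are preset adjustment factors, so here they are assumed to convert byte counts or access counts into milliseconds. Clamping tc at zero is an added safeguard against badly tuned factors and is not part of the claim.

```python
from dataclasses import dataclass


@dataclass
class FuncStats:
    st03: float  # average execution time of the function (ms)
    st06: float  # average number of global-variable reads
    st07: float  # average number of global-variable writes
    st10: float  # bytes of input parameters
    st11: float  # bytes of return value


def interface_time(s, k06, k07, k10, k11):
    """Entry + global read + global write + return time (assumed ms)."""
    return k10 * s.st10 + k06 * s.st06 + k07 * s.st07 + k11 * s.st11


def effective_time(s, k06, k07, k10, k11):
    """tc = st03 - (k10*st10 + k06*st06 + k07*st07 + k11*st11), clamped at 0."""
    return max(0.0, s.st03 - interface_time(s, k06, k07, k10, k11))


def modelled_time(s, k06, k07, k10, k11, tc):
    """Right-hand side of the claim 8 formula; equals st03 when tc is unclamped."""
    return interface_time(s, k06, k07, k10, k11) + tc
```

In a serial equivalent program, tc is the part that would be reproduced with atomic computing units, while the interface terms come from the actual parameter passing and global-variable accesses.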
CN201510508961.8A 2015-08-18 2015-08-18 A kind of Parallel Computing Performance detection system and its method towards communication test Active CN106469114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510508961.8A CN106469114B (en) 2015-08-18 2015-08-18 A kind of Parallel Computing Performance detection system and its method towards communication test

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510508961.8A CN106469114B (en) 2015-08-18 2015-08-18 A kind of Parallel Computing Performance detection system and its method towards communication test

Publications (2)

Publication Number Publication Date
CN106469114A CN106469114A (en) 2017-03-01
CN106469114B CN106469114B (en) 2019-06-04

Family

ID=58213780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510508961.8A Active CN106469114B (en) 2015-08-18 2015-08-18 A kind of Parallel Computing Performance detection system and its method towards communication test

Country Status (1)

Country Link
CN (1) CN106469114B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052370B (en) * 2017-10-09 2021-06-08 华南理工大学 Evaluation method for influence of shared memory on program execution time based on accompanying program group
CN112749724B (en) * 2019-10-31 2024-06-04 阿里巴巴集团控股有限公司 Method and equipment for training classifier and predicting application performance expansibility
CN111209042B (en) * 2020-01-06 2022-08-26 北京字节跳动网络技术有限公司 Method, device, medium and electronic equipment for establishing function stack
CN112685326A (en) * 2021-01-26 2021-04-20 政采云有限公司 Software testing method, system, equipment and readable storage medium
CN112784422B (en) * 2021-01-28 2022-10-25 华东师范大学 Fine-grained performance modeling method applied to parallel scientific computing program

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1932766A (en) * 2006-10-12 2007-03-21 上海交通大学 Semi-automatic parallel method of large serial program code quantity-oriented field
CN103080900A (en) * 2010-09-03 2013-05-01 西门子公司 Method for parallelizing automatic control programs and compiler
CN103246541A (en) * 2013-04-27 2013-08-14 中国人民解放军信息工程大学 Method for evaluating auto-parallelization and multistage parallelization cost
CN104035781A (en) * 2014-06-27 2014-09-10 北京航空航天大学 Method for quickly developing heterogeneous parallel program

Also Published As

Publication number Publication date
CN106469114A (en) 2017-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190419

Address after: 201210 4th Floor, Building 8, 100 Haike Road, Pudong New District, Shanghai

Applicant after: Shanghai Radio Communication Research Center

Applicant after: Fuzhou Internet of things open laboratory Co., Ltd.

Address before: 201210 4th Floor, Building 8, 100 Haike Road, Pudong New District, Shanghai

Applicant before: Shanghai Radio Communication Research Center

GR01 Patent grant