CN112882912A - Function running time measuring method applied to parallel scientific computation program - Google Patents

Publication number
CN112882912A
CN112882912A CN202110141179.2A CN202110141179A CN112882912A CN 112882912 A CN112882912 A CN 112882912A CN 202110141179 A CN202110141179 A CN 202110141179A CN 112882912 A CN112882912 A CN 112882912A
Authority
CN
China
Prior art keywords
function
timing
file
running time
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110141179.2A
Other languages
Chinese (zh)
Other versions
CN112882912B (en)
Inventor
刘垚
赵景元
薛巍
杨磊
焦鹏龙
张忆莲
苏巨亮
樊树伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Hengding Super Computing Center Co ltd
East China Normal University
Original Assignee
Wuxi Hengding Super Computing Center Co ltd
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Hengding Super Computing Center Co ltd and East China Normal University
Priority to CN202110141179.2A
Publication of CN112882912A
Application granted
Publication of CN112882912B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
    • G06F11/3404 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for parallel or distributed programming

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a function running time measuring method applied to parallel scientific computation programs, comprising the following steps: generating an intermediate file from the source files of the program to be measured through LLVM, scanning it, acquiring and numbering all user-defined function names, and writing them into a function name record file; inserting the function numbers and timing functions to generate a new intermediate file; generating an executable file of the program to be measured from the new intermediate file; at run time, having the timing functions record and determine the parent-child calling relationships and call counts among functions and record the running time of the functions; obtaining a timing result file after the executable file has run; post-processing the timing result file to generate a new timing result file; and restoring the function numbers in the new timing result file to function names. The method obtains accurate function running times and helps identify the performance bottlenecks of the program.

Description

Function running time measuring method applied to parallel scientific computation program
Technical Field
The invention relates to the field of function running time measurement, in particular to a function running time measurement method applied to a parallel scientific computation program.
Background
Scientific computing is an interdisciplinary field that uses computers to analyze and solve scientific problems, and it plays a very important role in many scientific disciplines and high-tech application fields. Actual scientific research often encounters large numbers of complex mathematical problems that are difficult to solve with ordinary calculation tools but easy to process with a computer. Many scientific computing applications and software packages have been developed whose main purpose is to simulate computational problems from the sciences, such as applications that predict earthquakes, tsunamis and other natural disasters by numerical simulation, and applications that study biomolecules and develop new compounds by molecular modeling. In addition, scientific computing is widely used in hydrodynamics, bioinformatics, chemometrics, geophysics and other fields.
Scientific computing is characterized by complex formulas, a large amount of computation and a wide range of numerical values, and it requires a computer system with strong computing capability, so it is usually run on a high-performance computing platform. A high-performance computing platform comprises many computing nodes that communicate through mechanisms such as MPI. Parallel techniques use multiple computing nodes simultaneously and thereby increase computing speed. A scientific computing program developed with parallel techniques is a parallel scientific computing program.
For example, NPB (NAS Parallel Benchmarks) is a set of parallel scientific computing programs used to evaluate the performance of supercomputers. The programs are derived from computational fluid dynamics applications and comprise 5 kernel programs and 3 pseudo-application programs, whose problem sizes are predefined and divided into different classes. The earth system model is another very representative parallel scientific computing program; it is mainly used to quantify the laws of earth change and the relationship between human activities and earth change. Among such models, CAS-ESM (the Chinese Academy of Sciences Earth System Model) is an advanced and widely used earth system model released by the Chinese Academy of Sciences. CAS-ESM uses a coupler to couple independent component models, such as an atmospheric circulation model, an ocean circulation model, a land surface process model, a sea ice model, a vegetation dynamics model, an aerosol and atmospheric chemistry model, an ocean biogeochemistry model and a land biogeochemistry model, into a complex system with a huge amount of computation.
However, even though the computing power of high-performance computing platforms has grown rapidly, the machine utilization achieved by parallel scientific computing programs has gradually declined, and the actual peak performance of a program may not even reach 20% of the machine's peak, which indicates that scientific computing applications often have considerable room for optimization.
An important indicator of program performance is the program's running time on the target computer. Running a program faster not only shortens the time to solution and saves cost, but also makes certain time-critical tasks feasible. Reducing the actual running time of a program is therefore a major concern. Accurately measuring the running time of the functions in a program supports performance modeling and the identification of performance bottlenecks, and thus helps improve program performance.
Function running time measurement mainly covers two parts: the running time of computation functions and the running time of communication functions. Several studies have proposed solutions to the problem of timing functions accurately. GNU gprof can time computation functions but cannot measure the running time of communication functions; HPCToolkit, designed by Adhianto et al., can time computation functions and communication functions separately, but its timing precision is not high; TAU (Tuning and Analysis Utilities), designed by Sameer Shende et al., achieves higher timing precision, but the accuracy of its timing results is not high.
Disclosure of Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is that existing function running time measurements have low accuracy and low precision. The invention provides a function running time measuring method applied to parallel scientific computation programs, which obtains accurate function running times and helps identify the performance bottlenecks of the program.
In order to achieve the above object, the present invention provides a function running time measuring method applied to a parallel scientific computation program, comprising the following steps:
generating an intermediate file from the source files of the program to be measured through LLVM, scanning the intermediate file, acquiring all user-defined function names, numbering them, and writing them into a function name record file on the hard disk; numbering the functions in advance, before the program runs, reduces the measurement overhead while the program is running;
for the intermediate file of the program generated by LLVM, inserting a function number and timing functions into each function through LLVM to generate a new intermediate file, wherein the timing functions use cycle-level timing, which guarantees high timing precision;
generating an executable file of the program to be measured from the new intermediate file through LLVM;
when the executable file of the program to be measured runs, recording and determining the parent-child calling relationships and call counts between functions through the timing start functions and timing end functions of the parent function (the caller) and the child function (the callee), calculating the running time of the functions, and storing the parent-child calling relationship, the call count, the function running time excluding the timing functions and the function running time including the timing functions in a function running state record table in memory; because the method distinguishes the running time excluding the timing functions from the running time including them, high timing accuracy is ensured;
after the executable file of the program to be measured has run, writing the function running state record table in memory into a timing result file on the hard disk;
according to the timing result file, subtracting from each function's running time excluding the timing functions the running times, including the timing functions, of all of its child functions, to obtain the running time of the function itself, and generating a new timing result file;
and, according to the function name record file, restoring the function numbers in the new timing result file to function names.
Further, generating the intermediate file from the source files of the program to be measured through LLVM, scanning the intermediate file, acquiring all user-defined function names, numbering the acquired names and writing them into the function name record file on the hard disk specifically includes:
compiling the source files into an intermediate file through LLVM, and writing an LLVM Pass to scan the Functions in the Module of the intermediate file one by one;
if a Function is a function declaration, acquiring its function name, numbering it and storing it in the function name record file, and otherwise not processing it, the function name record file recording the function number and the user-defined function name;
and repeating the scanning and recording steps until all Functions in the Module of the intermediate file have been scanned and recorded.
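The scan-and-number step can be illustrated with a minimal Python sketch. It is not the LLVM Pass itself: the function list is passed in directly, and the names `number_functions`, `write_name_record` and `is_user_defined`, as well as the one-record-per-line "number name" file layout, are assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterable


def number_functions(
    function_names: Iterable[str],
    is_user_defined: Callable[[str], bool],
) -> Dict[str, int]:
    """Assign consecutive numbers, in scan order, to the functions that qualify.

    `is_user_defined` is a hypothetical stand-in for the check the Pass
    performs on each Function of the Module.
    """
    numbering: Dict[str, int] = {}
    for name in function_names:
        if is_user_defined(name) and name not in numbering:
            numbering[name] = len(numbering)
    return numbering


def write_name_record(numbering: Dict[str, int], path: str) -> None:
    """Write one "<number> <name>" record per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as record_file:
        for name, number in numbering.items():
            record_file.write(f"{number} {name}\n")
```

Because the numbers are fixed before the program runs, the run-time timing functions only need to handle small integers rather than function name strings.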
Further, the function number of each function is inserted after the entry of that function; the timing functions include a timing start function and a timing end function, the timing start function being inserted after the entry of each function and the timing end function being inserted before the exit of each function.
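The placement rule can be shown on a toy instruction list. The following Python sketch does not operate on real LLVM IR: the instruction strings, the names `timing_start` and `timing_end`, and the choice to pass the function number as an argument of the start call are all assumptions made for illustration.

```python
from typing import List


def instrument(body: List[str], func_number: int) -> List[str]:
    """Return a copy of `body` with timing calls inserted.

    The start call (carrying the function number) goes right after the
    entry; an end call goes before every instruction that starts with "ret".
    """
    instrumented = [f"t0 = call timing_start({func_number})"]
    for instruction in body:
        if instruction.startswith("ret"):
            instrumented.append("call timing_end(t0)")
        instrumented.append(instruction)
    return instrumented
```

For example, `instrument(["x = add a, b", "ret x"], 764)` yields the start call, the original `add`, the end call, and then the `ret`.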
Further, the function running state record table records <function number, parent function number, call count, function running time excluding the timing functions, function running time including the timing functions>, and the data structure used to store the table in memory is a red-black tree, which reduces the measurement overhead while the program is running.
Further, the timing start function first records the current time and pushes it onto the stack of function running start times including the timing functions, pushes the function number onto the function number stack, and then records the current time again as the function running start time excluding the timing functions, which is also the return value of the timing start function.
Further, the timing end function first records the current time and subtracts from it the function running start time excluding the timing functions, to obtain the function running time excluding the timing functions; it then takes the 2 function numbers at the top of the function number stack, namely the number of the function itself and the number of its parent function; it records the current time again and subtracts from it the time at the top of the stack of function running start times including the timing functions, to obtain the function running time including the timing functions; finally, using the function number and the parent function number as the key, it looks up the function running state record table: if a corresponding entry exists, the call count in the entry is incremented by one and the function running time excluding the timing functions and the function running time including the timing functions are each accumulated; if not, a new entry is created and inserted into the function running state record table.
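The interplay of the two stacks and the record table can be sketched in Python as follows. This is an illustrative sketch under stated substitutions: `time.perf_counter_ns` stands in for the cycle-level timer, a plain dictionary stands in for the red-black tree, and the sentinel `NO_PARENT` for the outermost function is an assumption not found in the description.

```python
import time
from typing import Dict, List, Tuple

NO_PARENT = -1  # assumed sentinel parent number for the outermost function

outer_start_stack: List[int] = []  # start times that include timing overhead
func_number_stack: List[int] = []  # numbers of the currently active functions
# (function number, parent number) -> [call count, inner time, outer time]
record_table: Dict[Tuple[int, int], List[int]] = {}


def timing_start(func_number: int) -> int:
    """Push the outer start time and the number, return the inner start time."""
    outer_start_stack.append(time.perf_counter_ns())
    func_number_stack.append(func_number)
    return time.perf_counter_ns()


def timing_end(inner_start: int) -> None:
    """Close both intervals and accumulate them under (function, parent)."""
    inner_time = time.perf_counter_ns() - inner_start
    func_number = func_number_stack.pop()
    parent_number = func_number_stack[-1] if func_number_stack else NO_PARENT
    outer_time = time.perf_counter_ns() - outer_start_stack.pop()
    entry = record_table.setdefault((func_number, parent_number), [0, 0, 0])
    entry[0] += 1
    entry[1] += inner_time
    entry[2] += outer_time
```

The inner interval (excluding the timing functions) is nested inside the outer interval (including them), so a child's outer time lies entirely within its parent's inner time; this nesting is what the later subtraction step relies on.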
Technical effects
The function running time measuring method applied to parallel scientific computation programs records, based on LLVM, the running time of all user-defined functions, including user-defined MPI Wrapper functions, and achieves measurement with high precision, high accuracy and low overhead.
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
Drawings
FIG. 1 is a general flow chart of a function runtime measurement method applied to a parallel scientific computing program according to a preferred embodiment of the present invention;
FIG. 2 shows a new intermediate file generated by inserting a function number and timing functions into each function of an original LLVM intermediate file of the NPB, according to a preferred embodiment of the present invention;
FIG. 3 shows a new intermediate file generated by inserting a function number and timing functions into each function of an original LLVM intermediate file of CAS-ESM, according to a preferred embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular internal procedures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
For scientific computing applications that are implemented in parallel, the running time of the program includes not only computation time but also MPI communication time. To take over the system's MPI functions, the MPI Wrapper technique can be used to wrap them. According to the MPI standard, each MPI operation is available under two names, one beginning with MPI_ and one beginning with PMPI_; for example, both MPI_Send and PMPI_Send perform the same send operation. Exploiting this property, a user-defined MPI Wrapper function with the same name as the system MPI function is created to intercept calls to that function, and the corresponding PMPI function is then called inside the wrapper to perform the actual MPI communication. In this way the user-defined MPI Wrapper functions take over the system's MPI functions, so that they can be instrumented like other user-defined functions and the MPI functions can be timed.
LLVM is a collection of modular, reusable compiler and toolchain components; it provides support for compiler-related extensions and can serve as a compiler backend for multiple languages. Clang and Flang are the C/C++ and Fortran compiler front ends; together with LLVM they form a complete toolchain that supports a variety of operating systems and hardware architectures and can replace the GCC compilation system. Because LLVM is designed as a modular library architecture, it is easily embedded into various applications. Function instrumentation of a program can be implemented through LLVM.
Therefore, the present invention provides an LLVM-based function running time measuring method applied to parallel scientific computation programs, which measures the running time of the functions in a program by instrumenting the source program through LLVM. As shown in FIG. 1, the method includes the following steps:
Step 1: generating an intermediate file from the source files of the program to be measured through LLVM, scanning the intermediate file, acquiring all user-defined function names, numbering the acquired names, and writing them into a function name record file on the hard disk. Specifically, an intermediate file of the program is generated through LLVM, and an LLVM Pass is written to scan the Functions in the Module of the intermediate file one by one; if a Function is a function declaration, its function name is acquired, numbered and stored in the function name record file; otherwise it is not processed. After scanning is complete, the function name record file contains the names and numbers of all user-defined functions, including the user-defined MPI Wrapper functions.
Step 2: for the intermediate file of the program generated by LLVM, inserting a function number and timing functions into each function through LLVM to generate a new intermediate file. Specifically, for the intermediate file generated in step 1 and according to the function name record file, an LLVM Pass is written to scan the Functions in the Module of the intermediate file one by one; if a Function is a function declaration, the corresponding function number and the timing start function are inserted after the entry of the function and the timing end function is inserted before the exit of the function, generating a new intermediate file; otherwise the Function is not processed.
Step 3: generating an executable file of the program to be measured from the new intermediate file through LLVM. Specifically, the new intermediate file generated in step 2 is compiled by LLVM to generate the executable file of the program to be measured.
Step 4: when the executable file of the program to be measured runs, the timing start function records the function running start time including the timing functions, the function number, and the function running start time excluding the timing functions; the timing end function determines the parent-child calling relationship between functions (that is, the relationship between the caller and the callee), increments the call count, calculates the running time of the function, and stores the parent-child calling relationship, the call count, the function running time excluding the timing functions and the function running time including the timing functions in the function running state record table in memory. The data structure used to store the table in memory is a red-black tree, which reduces the measurement overhead while the program is running. Because the method distinguishes the function running time excluding the timing functions from the function running time including them, high timing accuracy is ensured.
Step 5: after the executable file of the program to be measured has run, writing the function running state record table in memory into a timing result file on the hard disk. Specifically, an LLVM Pass is written to instrument the MAIN function: a function that writes the in-memory results back to a file is inserted at the end of the MAIN function, so that when the program finishes running, the function running state record table in memory is written into the timing result file on the hard disk.
Step 6: traversing the records in the timing result file; according to the parent-child calling relationships in the timing result file, summing the function running times, including the timing functions, of all child functions of the current function; subtracting that sum from the current function's running time excluding the timing functions to obtain the running time of the function itself; and generating a new timing result file.
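Step 6 can be sketched in Python as below. The record layout follows the timing result file described above; the function name `self_times` is hypothetical, and summing a function's records over all of its parents before subtracting is an assumption about how functions with several callers are aggregated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (function number, parent number, call count, inner time, outer time)
Record = Tuple[int, int, int, int, int]


def self_times(records: Iterable[Record]) -> Dict[int, Tuple[int, int, int, int]]:
    """Map each function number to
    (accumulated child time, call count, total time, self time)."""
    child_time: Dict[int, int] = defaultdict(int)
    calls: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for func, parent, count, inner, outer in records:
        calls[func] += count
        total[func] += inner        # time excluding the timing functions
        child_time[parent] += outer  # charged to the parent, overhead included
    return {
        func: (
            child_time.get(func, 0),
            calls[func],
            total[func],
            total[func] - child_time.get(func, 0),
        )
        for func in total
    }
```

Charging each child's outer time (overhead included) to its parent means the overhead of the child's timing calls is removed from the parent's self time rather than being misattributed to it.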
Step 7: restoring the function numbers in the new timing result file generated in step 6 to function names, according to the correspondence between function names and function numbers in the function name record file generated in step 1.
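The restoring step amounts to a table lookup. A minimal Python sketch follows, assuming (hypothetically) that the function name record file holds one "number name" pair per line and that each result row starts with the function number; `parse_name_record` and `restore_names` are illustrative names.

```python
from typing import Dict, Iterable, List, Tuple


def parse_name_record(lines: Iterable[str]) -> Dict[int, str]:
    """Parse "<number> <name>" lines into a number -> name mapping."""
    mapping: Dict[int, str] = {}
    for line in lines:
        if line.strip():
            number, name = line.split(maxsplit=1)
            mapping[int(number)] = name.strip()
    return mapping


def restore_names(rows: Iterable[Tuple], names: Dict[int, str]) -> List[Tuple]:
    """Replace the leading function number of each row with its name."""
    return [(names[number], *rest) for number, *rest in rows]
```

Deferring the lookup to this final step keeps string handling out of the measured run entirely.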
The method obtains function running times by instrumenting the functions in the program; it can time the MPI communication functions that traditional timing tools cannot measure, and it obtains more precise and more accurate function running times than traditional timing tools.
The function runtime measurement method applied to the parallel scientific computation program of the present invention will be described in two specific embodiments.
Example 1:
Taking the SP benchmark program in NPB as an example, the function running time is measured with the present method, using problem class B and running with 9 processes. The specific steps are as follows:
1) An intermediate file of the program is generated by LLVM and scanned. A Pass that acquires function names is written and compiled into a dynamic link library, and the LLVM optimizer is used to have the Pass scan the Functions in the Module of the intermediate file one by one; if a Function is a function declaration, its function name is acquired, numbered and stored in the function name record file. The function name record file records <function number, user-defined function name>. After scanning is complete, the function name record file contains the names and numbers of all user-defined functions, including the user-defined MPI Wrapper functions. Part of the data is shown in Table 1. Taking the x_solve_ function as an example, its function number is 764.
TABLE 1
Function number Function name
0 MPI_Init
1 MPI_Finalize
2 MPI_Bsend
3 MPI_Bsend_init
…… ……
749 timer_read_
750 MAIN_
751 make_set_
752 initialize_
753 lhsinit_
754 exact_solution_
755 exact_rhs_
756 set_constants_
757 adi_
758 compute_buffer_size_
759 copy_faces_
760 compute_rhs_
761 lhsx_
762 lhsy_
763 lhsz_
764 x_solve_
…… ……
2) For the intermediate file, a Pass that performs the instrumentation is written and compiled into a dynamic link library, and the LLVM optimizer is used to have the Pass instrument the Functions in the Module of the intermediate file one by one: the corresponding function number and the timing start function are inserted after the entry of each function, and the timing end function is inserted before the exit of each function, generating a new intermediate file. Part of the instrumented code of x_solve_ is shown in FIG. 2.
3) An executable file of the program to be measured is generated by LLVM from the new intermediate file.
4) When the executable file of the program to be measured runs, the timing start function first records the current time and pushes it onto the stack of function running start times including the timing functions, pushes the function number onto the function number stack, and then records the current time again as the function running start time excluding the timing functions, which is also the return value of the timing start function. The timing end function records the current time and the parent-child calling relationship, calculates the call count and the running time of the function, and stores the parent-child calling relationship, the call count, the function running time excluding the timing functions and the function running time including the timing functions in the function running state record table in memory.
5) An LLVM Pass is written to instrument the MAIN function: a function that writes the in-memory results back to a file is inserted at the end of the MAIN function, so that after the executable file of the program to be measured has run, the function running state record table in memory is written into the timing result file on the hard disk. The timing result file records <function number, parent function number, call count, function running time excluding the timing functions, function running time including the timing functions>. Part of the data is shown in Table 2. Taking the x_solve_ function as an example, its function number is 764; function 764 is called by function 757, and function 764 calls functions 471, 481, 650, 761 and 765.
TABLE 2
[Table 2 is reproduced as an image in the original publication.]
6) According to the timing result file, the function running times, including the timing functions, of all child functions of a function are subtracted from that function's running time excluding the timing functions to obtain the running time of the function itself, and a new timing result file is generated. The new timing result file records <function number, accumulated running time of child functions, call count, total function running time, running time of the function itself>. Part of the data is shown in Table 3. Taking the x_solve_ function as an example, its function number is 764, the accumulated running time of its child functions is 4.46E+10 nanoseconds, the call count is 1002, the total running time is 5.21E+10 nanoseconds, and the running time of the function itself is 7.49E+09 nanoseconds.
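The subtraction in this step can be written as a formula and checked against the figures quoted for function 764. The symbols below are notation introduced here for illustration and do not appear in the original: $T_{\text{excl}}(f)$ is the running time of $f$ excluding the timing functions, and $T_{\text{incl}}(c)$ is the running time of a child $c$ including them.

```latex
T_{\text{self}}(f) = T_{\text{excl}}(f) - \sum_{c \,\in\, \text{children}(f)} T_{\text{incl}}(c)

% Function 764, using the rounded values quoted in the text (nanoseconds):
5.21 \times 10^{10} - 4.46 \times 10^{10} = 7.5 \times 10^{9} \approx 7.49 \times 10^{9}
```

The small gap between 7.5E+09 and the reported 7.49E+09 is consistent with the quoted values being rounded to three significant figures.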
TABLE 3
[Table 3 is reproduced as an image in the original publication.]
7) According to the function name record file generated in step 1, the function numbers in the new timing result file generated in step 6 are restored to function names. The resulting timing result file records <function name, accumulated running time of child functions, call count, total function running time, running time of the function itself>. Part of the data is shown in Table 4. Taking the x_solve_ function as an example, the function name is x_solve_, the accumulated running time of its child functions is 4.46E+10 nanoseconds, the call count is 1002, the total running time is 5.21E+10 nanoseconds, and the running time of the function itself is 7.49E+09 nanoseconds.
TABLE 4
[Table 4 is reproduced as an image in the original publication.]
Example 2:
Taking the atmospheric circulation model program famicc 5 in CAS-ESM as an example, the function running time is measured with the present method, using 64 processes and running for 5 model days. The specific steps are as follows:
1) An intermediate file of the program is generated by LLVM and scanned. A Pass that acquires function names is written and compiled into a dynamic link library, and the LLVM optimizer is used to have the Pass scan the Functions in the Module of the intermediate file one by one; if a Function is a function declaration, its function name is acquired, numbered and stored in the function name record file. The function name record file records <function number, user-defined function name>. After scanning is complete, the function name record file contains the names and numbers of all user-defined functions, including the user-defined MPI Wrapper functions. Part of the data is shown in Table 5. Taking the m_mpif90_type__ function as an example, its function number is 7.
TABLE 5
[Table 5 is reproduced as an image in the original publication.]
2) For the intermediate file, a Pass that performs the instrumentation is written and compiled into a dynamic link library, and the LLVM optimizer is used to have the Pass instrument the Functions in the Module of the intermediate file one by one: the corresponding function number and the timing start function are inserted after the entry of each function, and the timing end function is inserted before the exit of each function, generating a new intermediate file. Part of the instrumented code of m_mpif90_type__ is shown in FIG. 3.
3) An executable file of the program to be measured is generated by LLVM from the new intermediate file.
4) When the executable file of the program to be measured runs, the timing start function first records the current time and pushes it onto the stack of function running start times including the timing functions, pushes the function number onto the function number stack, and then records the current time again as the function running start time excluding the timing functions, which is also the return value of the timing start function. The timing end function records the current time and the parent-child calling relationship, calculates the call count and the running time of the function, and stores the parent-child calling relationship, the call count, the function running time excluding the timing functions and the function running time including the timing functions in the function running state record table in memory.
5) An LLVM Pass is written to instrument the MAIN function: a function that writes the in-memory results back to a file is inserted at the end of the MAIN function, so that after the executable file of the program to be measured has run, the function running state record table in memory is written into the timing result file on the hard disk. The timing result file records <function number, parent function number, call count, function running time excluding the timing functions, function running time including the timing functions>. Part of the data is shown in Table 6. Taking the m_mpif90_type__ function as an example, its function number is 7; function 7 is called by function 466, and function 7 calls function 885.
TABLE 6
Figure BDA0002928099380000162
Figure BDA0002928099380000171
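The timing start and end functions of step 4, together with the write-back of step 5, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a plain dictionary stands in for the red-black tree named in the claims, parent number 0 is used for a function with no caller, and the class `FunctionTimer`, the decorator `instrumented` (a stand-in for the inserted calls of step 2), and the text layout produced by `dump` are all hypothetical.

```python
import time


class FunctionTimer:
    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock
        self.time_stack = []    # entry times that include timer overhead
        self.number_stack = []  # numbers of the functions currently running
        # (function number, parent number) -> [calls, t_without, t_with]
        self.records = {}

    def start(self, func_number):
        """Timing start function: returns the start time excluding the timer."""
        self.time_stack.append(self.clock())
        self.number_stack.append(func_number)
        return self.clock()

    def end(self, start_without_timer):
        """Timing end function: updates the running state record table."""
        t_without = self.clock() - start_without_timer
        func = self.number_stack.pop()
        # Assumption: 0 denotes "no parent" for the outermost function.
        parent = self.number_stack[-1] if self.number_stack else 0
        t_with = self.clock() - self.time_stack.pop()
        entry = self.records.setdefault((func, parent), [0, 0, 0])
        entry[0] += 1
        entry[1] += t_without
        entry[2] += t_with

    def dump(self):
        """Render the table as text, one record per line, as in step 5."""
        lines = []
        for (func, parent), (calls, t_without, t_with) in sorted(self.records.items()):
            lines.append(f"{func} {parent} {calls} {t_without} {t_with}")
        return "\n".join(lines)


def instrumented(timer, func_number):
    """Wrap a Python function with start/end calls at entry and exit."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = timer.start(func_number)
            try:
                return fn(*args, **kwargs)
            finally:
                timer.end(start)
        return inner
    return wrap
```

Taking two clock readings in each of `start` and `end` is what separates the time that includes the timer's own bookkeeping from the time that excludes it; the outer pair brackets the stack operations, the inner pair brackets only the function body.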
6) According to the timing result file, for each function, subtract the running time (including the timing function) of all its sub-functions from the function's running time excluding the timing function to obtain the self running time of the function, and generate a new timing result file. The new timing result file records <function number, cumulative sub-function time, call count, total function running time, function self time>. Part of the data is shown in Table 7. Taking the m_mpif90_type__ function as an example, its function number is 7, the cumulative running time of its sub-functions is 6970 nanoseconds, the call count is 8, the total running time is 9260 nanoseconds, and the self running time is 2290 nanoseconds.
TABLE 7
Figure BDA0002928099380000172
Figure BDA0002928099380000181
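The subtraction in step 6 can be sketched as a short post-processing routine. The following Python is an illustration under assumptions: the function name `self_times` and the tuple layout are hypothetical, and summing the records of a function that is called from several parents is an assumed aggregation rule that the text does not spell out.

```python
from collections import defaultdict


def self_times(records):
    """Compute per-function self time from timing result records.

    records: iterable of (func, parent, calls, t_without_timer, t_with_timer).
    Returns {func: (sub_function_time, calls, total_time, self_time)}.
    """
    calls = defaultdict(int)
    total = defaultdict(int)     # running time excluding the timing function
    children = defaultdict(int)  # children's running time including the timer
    for func, parent, n, t_without, t_with in records:
        calls[func] += n
        total[func] += t_without
        # Seen from the parent, a child call costs its body plus its timer.
        children[parent] += t_with
    return {
        func: (children[func], calls[func], total[func], total[func] - children[func])
        for func in total
    }
```

For example, with records `(7, 466, 8, 9260, 9500)` and `(885, 7, 8, 6800, 6970)`, function 7 gets a sub-function time of 6970 ns, a total of 9260 ns, and a self time of 2290 ns, matching the figures quoted for m_mpif90_type__. The values 9500 and 6800 and the call count of function 885 are made up for the illustration, since the raw rows of Table 6 are not reproduced in the text.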
7) Restore the function numbers in the new timing result file generated in step 6 to function names according to the function name record file generated in step 1. The timing result file records <function name, cumulative sub-function time, call count, total function running time, function self time>. Part of the data is shown in Table 8. Taking the m_mpif90_type__ function as an example, the function name is m_mpif90_type__, the cumulative running time of its sub-functions is 6970 nanoseconds, the call count is 8, the total running time is 9260 nanoseconds, and the self running time is 2290 nanoseconds.
TABLE 8
Figure BDA0002928099380000182
Figure BDA0002928099380000191
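Restoring function numbers to names in step 7 amounts to a table lookup. The sketch below assumes, purely for illustration, that the function name record file holds one "number name" pair per line; the helper names `load_name_record` and `restore_names` are hypothetical.

```python
def load_name_record(text):
    """Parse the function name record file into {function number: name}."""
    names = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        number, name = line.split(maxsplit=1)
        names[int(number)] = name.strip()
    return names


def restore_names(rows, names):
    """Replace the leading function number of each row with its name.

    rows: iterable of (func_number, sub_time, calls, total_time, self_time).
    Numbers missing from the record file are kept visible as "<unknown N>".
    """
    return [
        (names.get(row[0], f"<unknown {row[0]}>"),) + tuple(row[1:])
        for row in rows
    ]
```

Applied to the row `(7, 6970, 8, 9260, 2290)` with a record file that maps 7 to m_mpif90_type__, this yields `("m_mpif90_type__", 6970, 8, 9260, 2290)`.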
The preferred embodiments of the invention have been described in detail above. It should be understood that those skilled in the art could devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that those skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art and according to the concept of the present invention shall fall within the scope of protection defined by the claims.

Claims (7)

1. A function running time measurement method applied to a parallel scientific computation program, characterized by comprising the following steps:
generating, through LLVM, an intermediate file for the source file of the program to be measured, scanning the intermediate file to obtain all user-defined function names, numbering the obtained user-defined function names, and writing them into a function name record file on a hard disk;
for the intermediate file of the program generated through LLVM, inserting a function number and timing functions into each function through LLVM to generate a new intermediate file;
generating, through LLVM, an executable file of the program to be measured from the new intermediate file;
when the executable file of the program to be measured runs, recording and determining the parent-child calling relationships and call counts among functions through the timing start functions and timing end functions of parent functions and child functions, calculating the running time of the functions, and storing the parent-child calling relationship, the call count, the function running time excluding the timing function, and the function running time including the timing function in a function running state record table in memory;
after the executable file of the program to be measured finishes running, writing the function running state record table in memory into a timing result file on the hard disk;
according to the timing result file, subtracting the running time, including the timing function, of all sub-functions of a function from the function's running time excluding the timing function to obtain the self running time of the function, and generating a new timing result file;
and restoring the function numbers in the new timing result file to function names according to the function name record file.
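2. The function running time measurement method applied to a parallel scientific computation program according to claim 1, wherein all the user-defined function names include user-defined MPI wrapper functions.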
3. The function running time measurement method applied to a parallel scientific computation program according to claim 2, wherein generating, through LLVM, an intermediate file for the source file of the program to be measured, scanning the intermediate file to obtain all user-defined function names, numbering the obtained user-defined function names, and writing them into a function name record file on a hard disk specifically comprises:
compiling the source file into an intermediate file through LLVM, and writing an LLVM Pass to scan the Functions in the Module of the intermediate file one by one;
if a Function is a function declaration, obtaining its function name, numbering it, and storing it in the function name record file, and otherwise performing no processing, wherein the function name record file records the function number and the user-defined function name;
and repeating the scanning and recording steps until the Functions in all Modules of the intermediate file have been scanned and recorded.
4. The function running time measurement method applied to a parallel scientific computation program according to claim 2, wherein the function number is configured so that a one-word function number is inserted after the entry of each function; and the timing functions include a timing start function and a timing end function, configured so that the timing start function is inserted after the entry of each function and the timing end function is inserted before the exit of each function.
5. The function running time measurement method applied to a parallel scientific computation program according to claim 2, wherein the function running state record table is configured to record <function number, parent function number, call count, function running time excluding the timing function, function running time including the timing function>, and the data structure storing the function running state record table in memory is a red-black tree.
6. The function running time measurement method applied to a parallel scientific computation program according to claim 4, wherein the timing start function first records the current time and pushes it onto the function running time stack that includes the timing function, pushes the function number onto the function number stack, and then records the current time again as the function running start time excluding the timing function, which is returned as the return value of the timing start function.
7. The function running time measurement method applied to a parallel scientific computation program according to claim 6, wherein the timing end function first records the current time and subtracts the function running start time excluding the timing function from it to obtain the function running time excluding the timing function; then takes the 2 function numbers at the top of the function number stack, namely the number of the current function and the number of its parent function; records the current time again and subtracts from it the time at the top of the function running time stack including the timing function to obtain the function running time including the timing function; and finally, using the function number and the parent function number as the primary key, searches the function running state record table for a corresponding entry; if one exists, increments the call count in the entry and accumulates the function running time excluding the timing function and the function running time including the timing function respectively; if not, creates a new entry and inserts it into the function running state record table.
CN202110141179.2A 2021-02-01 2021-02-01 Function runtime measurement method applied to parallel scientific computation program Active CN112882912B (en)

Publications (2)

Publication Number Publication Date
CN112882912A true CN112882912A (en) 2021-06-01
CN112882912B CN112882912B (en) 2022-10-25

Family

ID=76052518

Country Status (1)

Country Link
CN (1) CN112882912B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant