CN110377525B - Parallel program performance prediction system based on runtime characteristics and machine learning - Google Patents

Parallel program performance prediction system based on runtime characteristics and machine learning

Info

Publication number
CN110377525B
CN110377525B CN201910680598.6A CN201910680598A CN110377525B CN 110377525 B CN110377525 B CN 110377525B CN 201910680598 A CN201910680598 A CN 201910680598A CN 110377525 B CN110377525 B CN 110377525B
Authority
CN
China
Prior art keywords
program
basic block
parallel program
parallel
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910680598.6A
Other languages
Chinese (zh)
Other versions
CN110377525A (en
Inventor
张伟哲
何慧
王一名
郝萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201910680598.6A
Publication of CN110377525A
Application granted
Publication of CN110377525B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A parallel program performance prediction system based on runtime characteristics and machine learning, belonging to the technical field of parallel program performance prediction. The invention aims to solve the high overhead, long prediction time, and low accuracy of existing machine-learning-based parallel program performance prediction systems. The original program is given hybrid instrumentation, which reduces the number of basic-block counters, and is then pruned into a serial program that produces no output, which shortens its running time while preserving its execution flow. The basic block frequencies are thereby obtained accurately and quickly, preprocessed, and fed into a prediction model, which outputs the execution time of the large-scale parallel program. The model generated by the method generalizes well, accurately predicts the execution time of large-scale parallel programs, and has a low prediction cost.

Description

Parallel program performance prediction system based on runtime characteristics and machine learning
Technical Field
The invention relates to a parallel program performance prediction system based on runtime characteristics and machine learning, and belongs to the technical field of parallel program performance prediction.
Background
As high-performance computing systems grow rapidly in scale and complexity (number of nodes, storage, and so on), the cost to users of executing parallel applications on them also grows. Many parallel programs execute inefficiently on such systems and waste system resources, so the efficiency and scalability problems of high-performance systems and applications become increasingly prominent. It is therefore very important to predict the performance of a large-scale parallel program on a target system by running the program at small scale before executing it at large scale. In addition, optimizing the parallel program according to the prediction result can effectively reduce the execution cost and avoid wasting resources.
Prior-art document CN101650687B discloses a method for predicting the performance of large-scale parallel programs, which includes: collecting the communication sequences and sequential computation vectors of a parallel program; analyzing the computational similarity of the processes and selecting a representative process; recording the communication content of the representative process; replaying the representative process on a compute node of the target platform to obtain its sequential computation time, and using that time in place of the computation time of the other processes; acquiring the communication record of the parallel program; and automatically predicting the final program performance with a network simulator. With this method, an accurate performance prediction for a parallel program can be obtained using few hardware resources.
Existing machine-learning-based parallel program performance prediction systems suffer from high overhead, long prediction time, and low accuracy, and the prior art does not provide a parallel program performance prediction system that achieves the best compromise among overhead, prediction time, and accuracy.
Disclosure of Invention
The technical problem to be solved by the invention is as follows:
The invention aims to solve the high overhead, long prediction time, and low accuracy of machine-learning-based parallel program performance prediction systems.
The technical scheme adopted by the invention to solve this technical problem is as follows:
A parallel program performance prediction system based on runtime features and machine learning comprises a feature acquisition module, a performance modeling module, and a performance prediction module.
The feature acquisition module converts the parallel program under test into LLVM IR form, performs edge profiling instrumentation on it to generate an instrumented parallel program (an executable program), executes the instrumented parallel program with different input scales and process counts to produce the total running time, the process count, and the basic block frequencies, and preprocesses these three parameters.
The performance modeling module takes the preprocessed process count and basic block frequencies as input and the preprocessed execution time as output, performs machine learning, and obtains a performance prediction model.
The performance prediction module converts the parallel program under test into LLVM IR form, performs hybrid basic-block instrumentation on it, prunes the instrumented program to obtain an executable serial program, executes the serial program with input scales and process counts larger than those used in the feature acquisition module to produce the process count and basic block frequencies, and preprocesses them; the preprocessed process count and basic block frequencies are then used as input to the performance prediction model, which outputs the predicted execution time of the parallel program.
Further, the specific process of the edge profiling instrumentation algorithm is as follows.
Input: the LLVM IR of the parallel program.
Output: the IR after edge profiling instrumentation.
1) Create a counter array C in the parallel program under test and initialize it to zero.
2) For each edge e in the control flow graph corresponding to the LLVM IR of the parallel program, determine whether e is a critical edge. If so, insert a new basic block newbb between the source basic block and the target basic block of e, and add the code {C[index]++} before the terminator instruction of newbb. Otherwise, add the code {C[index]++} before the terminator instruction of the source basic block or the target basic block of e. This completes the instrumentation.
Further, the specific process of the hybrid instrumentation algorithm is as follows.
Input: the LLVM IR of the parallel program.
Output: the IR after hybrid instrumentation.
1) Obtain the set of basic blocks selected by the processing in the feature acquisition module.
2) Create a counter array C in the target program and initialize it to zero.
3) For each loop l containing a basic block selected in step 1) in the parallel program under test, determine whether l is a natural loop and whether the head block h of the back edge in the loop is dominated by the basic block. If so, perform the following steps:
create a preheader block p before the loop's header node;
obtain the values related to the loop trip count (LTC): %start, %end, %stride;
add, before the terminator instruction of p, the code that calculates the LTC of l:
[Formula image BDA0002144043940000021: LTC expression in terms of %start, %end, and %stride; not reproduced in this text]
this code executes whenever p is traversed; then add the code {C[index] += Γ} before the terminator instruction of p, where Γ denotes the LTC calculated above.
Otherwise, add the code {C[index]++} in the basic block.
Further, the specific process of the program pruning algorithm is as follows.
Input: the IR of the parallel program after hybrid instrumentation.
Output: the pruned IR.
1) First, delete the output-related code of the parallel program from the IR after hybrid instrumentation.
2) Then, delete the MPI function calls in the parallel program.
3) Finally, eliminate the dead code.
The invention has the following beneficial technical effects:
The method accurately predicts the performance of large-scale parallel programs. It can analyze program performance for users so that they can execute applications efficiently on a high-performance computing system, and it can also help manage and schedule jobs, choose scheduling strategies sensibly, reduce job waiting time, and evaluate resource needs to guide users in applying for resources. The invention therefore provides a parallel program performance prediction system whose generated model generalizes well, accurately predicts the execution time of large-scale parallel programs at low prediction cost, and has strong practical value.
In this system, the runtime characteristics are the basic block frequencies, and parallel program performance means the execution time of the program.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a block diagram of a parallel program performance prediction framework according to the present invention;
FIG. 2 shows predicted time versus actual time for 6 parallel programs with basic block frequency as the feature, where a) Sweep3D, b) LULESH, c) NPB SP, d) NPB BT, e) NPB LU, and f) NPB EP are the names of well-known parallel programs; the ordinate represents the execution time and the abscissa represents the sample number;
FIG. 3 is a box plot of the MAPE for the six parallel programs with basic block frequency as the feature; SVR, RF, and Ridge denote the three machine learning methods;
FIG. 4 is a graph comparing the errors of the three methods;
FIG. 5 is a graph comparing the prediction cost and the original execution cost of the six parallel programs.
Detailed Description
With reference to fig. 1 to 5, the implementation of the parallel program performance prediction system based on runtime features and machine learning according to the present invention is described as follows:
1 Parallel program performance prediction system
As shown in FIG. 1, the parallel program performance prediction system is divided into three main parts: feature acquisition, performance modeling, and performance prediction. The first part, program feature acquisition, obtains the training data features mainly by performing edge profiling on the small-scale parallel program. The second part, performance modeling, uses supervised machine learning regression algorithms to build performance models, continuously tunes their parameters, and selects the best performance prediction model by evaluation. The third part uses the model to predict the performance of the large-scale parallel program. Because the runtime basic block frequencies of the large-scale program must be obtained quickly as model input, the original program is given hybrid instrumentation, which reduces the number of basic-block counters, and is then pruned into a serial program that produces no output; this shortens the running time while preserving the execution flow, so the basic block frequencies are obtained accurately and quickly. The data are then preprocessed and fed into the prediction model, which outputs the execution time of the large-scale parallel program.
2 Acquisition of performance model features
First, the small-scale parallel program is converted into LLVM intermediate code using the front end of the LLVM compiler framework. A Pass implementing edge profiling instrumentation is then written and run, which instruments the program automatically. Next, the instrumented program is executed to generate a file containing the basic block frequencies. Finally, the file is read and the data are organized into a data set containing the process count and the basic block frequencies. The edge profiling instrumentation algorithm is specifically expressed as follows:
[Algorithm listing images BDA0002144043940000041 and BDA0002144043940000051: edge profiling instrumentation algorithm; not reproduced in this text]
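The edge profiling steps given in the disclosure can be sketched on a toy control flow graph. The following is a minimal illustration in Python, not the LLVM Pass itself: the dictionary-based CFG, the function name `place_edge_counters`, and the `.split` naming of new blocks are assumptions made for this example. It uses the common definition of a critical edge (the source has several successors and the target has several predecessors); for a non-critical edge it places the counter in the source block when the source has a single successor and in the target block otherwise.

```python
from collections import defaultdict


def place_edge_counters(cfg):
    """Decide where each edge counter goes.

    cfg maps a block name to the list of its successor names (no duplicate
    edges are assumed). Returns (new_cfg, placement), where placement maps each
    original edge (src, dst) to the block that holds its counter increment.
    """
    preds = defaultdict(list)
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].append(src)

    new_cfg = {block: list(succs) for block, succs in cfg.items()}
    placement = {}
    for src, succs in cfg.items():
        for dst in succs:
            critical = len(succs) > 1 and len(preds[dst]) > 1
            if critical:
                # Split the critical edge with a new block that holds the counter.
                newbb = f"{src}.{dst}.split"
                new_cfg[src] = [newbb if s == dst else s for s in new_cfg[src]]
                new_cfg[newbb] = [dst]
                placement[(src, dst)] = newbb
            elif len(succs) == 1:
                placement[(src, dst)] = src  # source always leaves via this edge
            else:
                placement[(src, dst)] = dst  # target is only entered via this edge
    return new_cfg, placement


if __name__ == "__main__":
    example = {"entry": ["a", "b"], "a": ["b"], "b": []}
    graph, where = place_edge_counters(example)
    print(graph)
    print(where)
```

In this toy graph only the edge from `entry` to `b` is critical, so it is the only edge that receives a new block.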
3 Performance modeling based on machine learning
First, feature preprocessing is performed: the data are normalized (mainly by non-linear normalization), and suitable features are selected by removing duplicate features, applying a variance threshold, and using the Pearson correlation coefficient. Performance modeling is then carried out with three machine learning algorithms: SVR (support vector regression), ridge regression, and RF (random forest). The data are divided into a training set, a test set, and a validation set; the model is fitted on the training set, its parameters are tuned on the test set, and it is evaluated on the validation set. Grid search is combined with k-fold cross-validation so that parameters are tuned while the model is evaluated and the best configuration is selected automatically. The mean absolute percentage error (MAPE) is used to evaluate the generalization ability of the model.
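The preprocessing and evaluation steps named above can be illustrated with a small standard-library sketch. It is not the patented implementation: the function names, the thresholds `min_var` and `min_abs_r`, and the round-robin fold assignment are assumptions chosen for the example, and the regressors themselves (SVR, ridge regression, RF) and the grid search are not implemented here.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(columns, target, min_var=1e-12, min_abs_r=0.5):
    """Drop duplicate columns, near-constant columns, and weakly correlated ones.

    columns maps a feature name to its list of values; thresholds are illustrative.
    """
    kept, seen = [], set()
    for name, col in columns.items():
        key = tuple(col)
        if key in seen:
            continue  # duplicate feature
        seen.add(key)
        mean = sum(col) / len(col)
        variance = sum((v - mean) ** 2 for v in col) / len(col)
        if variance <= min_var:
            continue  # variance threshold
        if abs(pearson(col, target)) < min_abs_r:
            continue  # Pearson correlation filter
        kept.append(name)
    return kept


def k_fold_indices(n, k):
    """Return k (train_indices, validation_indices) pairs over range(n)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]
```

A grid search would loop over candidate parameter settings, fit on each training split from `k_fold_indices`, and keep the setting with the lowest average `mape` on the held-out splits.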
4 Performance prediction of massively parallel programs
To predict the performance of a large-scale parallel program, its runtime characteristics must be obtained as input to the prediction model. Although edge profiling instrumentation has a small overhead when collecting features of a small-scale program, its overhead for a large-scale program is large, so the overhead of the instrumented large-scale program must be reduced. To reduce the overhead introduced by instrumentation, a hybrid program instrumentation algorithm is proposed. In addition, to reduce the cost of executing the large-scale program itself, a program pruning algorithm is also proposed.
The hybrid instrumentation algorithm combines dynamic instrumentation and static instrumentation. A loop trip count identification method is used to estimate the number of loop executions, so that the trip count is obtained directly at run time and no counter needs to be inserted in the loop to accumulate it. The loop induction variable is initialized to %start, the loop exit condition is %end, and the loop step is %stride. The loop trip count f is calculated as:
[Formula image BDA0002144043940000052: loop trip count f in terms of %start, %end, and %stride; not reproduced in this text]
A new basic block, called the preheader, is added before the loop header. The counter of the basic block in the header is moved to the preheader, and the formula for calculating the loop trip count is inserted into the preheader, so that no counter needs to be inserted in the loop. This approach further reduces the number of counter accesses and updates. However, not every basic block counter in a natural loop can be moved into the preheader. A method is given next for determining whether the frequency counter of a basic block in a natural loop containing branches can be moved into the loop's preheader block. The following definition is used to determine whether the counter of a basic block can be moved to the preheader node.
Definition 1: In a control flow graph with entry node b0, if every path from b0 to bj must pass through bi, node bi is said to dominate node bj, written bi >> bj. By definition, every node dominates itself, i.e., bi >> bi.
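Definition 1 can be computed with the standard iterative data-flow formulation of dominator sets. The sketch below is an illustration only: the dictionary-based CFG, the block names, and the functions `dominators` and `dominates` are assumptions for this example, and every node is assumed to be reachable from the entry node. Under one reading of the condition used by the hybrid algorithm, a block inside a loop qualifies for moving its counter when it dominates the block at the source of the back edge (named `latch` here), because it then runs on every iteration.

```python
def dominators(cfg, entry):
    """Return a dict mapping each node to the set of nodes that dominate it.

    cfg maps a node name to the list of its successors.
    """
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = ({n} | set.intersection(*pred_doms)) if pred_doms else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def dominates(dom, bi, bj):
    """True when bi >> bj in the notation of Definition 1."""
    return bi in dom[bj]


if __name__ == "__main__":
    # A natural loop whose body contains an if/else branch.
    loop_cfg = {
        "b0": ["header"],
        "header": ["body", "exit"],
        "body": ["then", "else"],
        "then": ["latch"],
        "else": ["latch"],
        "latch": ["header"],
        "exit": [],
    }
    d = dominators(loop_cfg, "b0")
    print(dominates(d, "body", "latch"), dominates(d, "then", "latch"))
```

In this example `body` dominates `latch` while `then` does not, so only the counter of `body` could be replaced by a single update outside the loop; `then` still needs a per-execution counter.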
The hybrid instrumentation algorithm is specifically expressed as follows:
[Algorithm listing image BDA0002144043940000061: hybrid instrumentation algorithm; not reproduced in this text]
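The effect of moving a counter into the preheader can be illustrated with a small sketch. The trip count formula in the patent is an image that is not reproduced in this text, so the expression below is an assumption: it is the usual ceiling-division trip count for a loop of the form `for (i = start; i < end; i += stride)` with a positive stride. All function names are hypothetical.

```python
def loop_trip_count(start, end, stride):
    """Trip count of `for (i = start; i < end; i += stride)`, assuming stride > 0."""
    if stride <= 0:
        raise ValueError("this sketch only handles positive strides")
    if end <= start:
        return 0
    return (end - start + stride - 1) // stride  # ceiling division


def run_with_per_iteration_counter(start, end, stride):
    """Baseline: the counter is incremented inside the loop body (C[index]++)."""
    counter = 0
    i = start
    while i < end:
        counter += 1
        i += stride
    return counter


def run_with_preheader_counter(start, end, stride):
    """Hybrid variant: the counter is updated once in the preheader (C[index] += LTC)."""
    counter = 0
    counter += loop_trip_count(start, end, stride)  # executed once, before the loop
    i = start
    while i < end:
        i += stride  # the loop body no longer touches the counter
    return counter
```

Both variants report the same frequency for the loop body, but the second performs one counter update per loop entry instead of one per iteration, which is the source of the reduced instrumentation overhead.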
The program pruning algorithm obtains the frequencies of the selected basic blocks without regard to the computation results. Initialization code and instrumentation-related code are therefore retained first, to ensure that the pruned program runs normally and records the basic block frequencies accurately, and then useless and output-related code in the IR is deleted. In addition, to generate a serial program, the calls to parallel (MPI) functions must be deleted. After the output-related code and the MPI function calls are deleted, much dead code appears that is not used by any other computation; it can be removed from the IR by dead code elimination. In this way the IR shrinks, producing a smaller executable that runs faster.
The program pruning algorithm is specifically expressed as follows:
[Algorithm listing image BDA0002144043940000071: program pruning algorithm; not reproduced in this text]
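The three pruning steps can be sketched on a toy straight-line IR. This is a simplified illustration rather than the patented algorithm: the instruction dictionaries, the `count` pseudo-instruction standing for a counter update, the set `OUTPUT_FUNCS`, and the function `prune` are all assumptions for this example, and the dead code elimination is a single backward pass that is only valid for straight-line code in which the remaining instructions have no other side effects.

```python
# Hypothetical list of output routines; a real tool would need a fuller list.
OUTPUT_FUNCS = {"printf", "fprintf", "puts", "fwrite"}


def prune(instrs):
    """Prune a straight-line instruction list.

    Each instruction is a dict with keys 'dst' (value name or None), 'op',
    and 'args' (list of strings; values start with '%'). For 'call', args[0]
    is the callee name.
    """
    # Steps 1 and 2: drop output-related calls and MPI calls.
    kept = [
        ins for ins in instrs
        if not (
            ins["op"] == "call"
            and (ins["args"][0] in OUTPUT_FUNCS or ins["args"][0].startswith("MPI_"))
        )
    ]

    # Step 3: dead code elimination. Counter updates and control flow are the
    # roots; anything they (transitively) need stays, everything else goes.
    live = set()
    result = []
    for ins in reversed(kept):
        is_root = ins["op"] in ("count", "br", "ret")
        if is_root or (ins["dst"] is not None and ins["dst"] in live):
            result.append(ins)
            live.update(a for a in ins["args"] if a.startswith("%"))
    result.reverse()
    return result


if __name__ == "__main__":
    program = [
        {"dst": "%n", "op": "load", "args": ["@n"]},
        {"dst": "%t", "op": "mul", "args": ["%n", "%n"]},
        {"dst": None, "op": "call", "args": ["MPI_Send", "%t"]},
        {"dst": None, "op": "call", "args": ["printf", "%t"]},
        {"dst": "%c", "op": "cmp", "args": ["%n", "0"]},
        {"dst": None, "op": "count", "args": ["C[0]"]},
        {"dst": None, "op": "br", "args": ["%c", "bb1", "bb2"]},
    ]
    for ins in prune(program):
        print(ins["op"], ins["args"])
```

Once the `MPI_Send` and `printf` calls are gone, the multiplication that fed them has no remaining user and is removed, while the load and comparison survive because the branch still depends on them.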
The technical effects of the present invention are explained below:
1 Prediction results
Table 1 compares two feature sets. The first is the conventional method (INPUT), whose features are the input parameters and the process count; the second is the method proposed by the invention (RUNTIME), whose features are the basic block frequencies and the process count. Table 1 shows that the method using basic block frequencies as features is significantly better than the method using input parameters. The MAPE of the proposed method is below 20%, and the average MAPE over the 6 parallel applications is 8.41%.
Table 1. Feature sets and MAPE for the parallel programs
[Table image BDA0002144043940000081: not reproduced in this text]
Table 2 shows the standard deviations of the prediction errors for the 6 parallel programs. It makes the dispersion of the prediction errors clear and thus allows the stability of the models to be analyzed. With input parameters as features, RF is the most stable; with basic block frequencies as features, SVR is more stable than RF. Overall, the SVR model with basic block frequencies as features is the most stable.
These results show that, compared with traditional machine learning methods that use only input-parameter features, automatic performance modeling based on runtime features builds better performance models and significantly improves prediction accuracy and stability.
Table 2. Standard deviation of the prediction errors of the parallel programs
[Table image BDA0002144043940000091: not reproduced in this text]
FIG. 2 compares the predicted time with the actual run time for the Sweep3D, LULESH, and NPB parallel applications, using the SVR, RF, and Ridge regression algorithms with basic block frequencies as features. In these plots, the test-set samples are sorted in increasing order of actual run time; the darkest points are the actual program execution times, and the lighter points are the times predicted by the machine learning models.
FIG. 3 is a box plot of the MAPE for the 6 parallel applications with basic block frequencies as features. A box plot is robust to outliers and shows the dispersion of the data accurately. These plots make clear that SVR has the smallest prediction error.
2 Comparative experiment
The method provided by the invention is compared with two other classical performance prediction models based on input parameters: the Barnes method and the Hoefler method. A comparison of the errors of the three methods is shown in FIG. 4.
Table 3. MAPE of the three methods
[Table images BDA0002144043940000092 and BDA0002144043940000101: not reproduced in this text]
3 Performance prediction overhead
When predicting the performance of a parallel application, only the corresponding pruned serial program needs to be executed to collect the basic block frequencies; the original parallel application is not executed. The generated data contain the frequencies of only a few basic blocks (6 in the invention), so the storage overhead is negligible. The evaluation therefore focuses on the execution overhead of the pruned serial program used for prediction. Computing resources on a supercomputer are charged in core-hours, so in this experiment the prediction cost is also expressed in core-hours.
Table 4 compares the core-hours consumed by the method of the invention when predicting the performance of the 6 selected applications with the core-hours consumed by executing the original parallel applications. The table shows that, for all 6 applications, the overhead of the proposed method is much lower than that of the original application. On average, the prediction cost is only 0.1219% of the original execution cost, which means the method can help HPC users predict the performance of parallel applications efficiently. This is because the pruned program is a standalone serial program that can be executed on a single node or a single core. In addition, reducing the number of inserted counters and eliminating much dead code optimizes the serial program and further improves its performance.
Table 4. Average overhead of the proposed method and of original execution
[Table image BDA0002144043940000102: not reproduced in this text]
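The cost comparison can be made concrete with a short calculation, assuming that cost is measured in core-hours (cores multiplied by wall-clock hours). The function names and the numbers in the usage example are hypothetical and are not taken from Table 4.

```python
def core_hours(cores, wall_seconds):
    """Cost of a run in core-hours: number of cores times wall-clock hours."""
    return cores * wall_seconds / 3600.0


def cost_ratio_percent(pred_cores, pred_seconds, orig_cores, orig_seconds):
    """Prediction cost as a percentage of the original execution cost."""
    return 100.0 * core_hours(pred_cores, pred_seconds) / core_hours(orig_cores, orig_seconds)


if __name__ == "__main__":
    # Hypothetical run: the pruned serial program takes 36 s on 1 core, while
    # the original parallel program takes 360 s on 1024 cores.
    print(cost_ratio_percent(1, 36, 1024, 360))
```

Because the pruned program runs on a single core, its cost stays small even when its wall-clock time is a noticeable fraction of the parallel run.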
FIG. 5 compares the prediction cost with the original cost for the 6 parallel programs. The test-set samples are sorted in increasing order of actual run time, and the y-axis is in core-hours; the line closer to the x-axis is the prediction cost, and the line farther from the x-axis is the original cost. These plots make clear that the prediction overhead is far smaller than the overhead of executing the original program.

Claims (3)

1. A parallel program performance prediction system based on runtime characteristics and machine learning, characterized in that the system comprises a feature acquisition module, a performance modeling module, and a performance prediction module, wherein
the feature acquisition module is used for converting a parallel program under test into LLVM IR form, performing edge profiling instrumentation on it to generate an instrumented parallel program, executing the instrumented parallel program with different input scales and process counts to produce the total running time, the process count, and the basic block frequencies, and preprocessing these three parameters;
the performance modeling module is used for performing machine learning with the preprocessed process count and basic block frequencies as input and the preprocessed execution time as output, and obtaining a performance prediction model after the machine learning;
the performance prediction module is used for converting the parallel program under test into LLVM IR form, performing hybrid basic-block instrumentation on it, pruning the instrumented program to obtain an executable serial program, executing the serial program with input scales and process counts larger than those used in the feature acquisition module to produce the process count and the basic block frequencies, and then preprocessing the process count and the basic block frequencies; and taking the preprocessed process count and basic block frequencies as the input of the performance prediction model to obtain, as output, the predicted execution time of the parallel program;
the specific process of the hybrid instrumentation algorithm is as follows:
input: the LLVM IR of the parallel program;
output: the IR after hybrid instrumentation;
1) obtaining the set of basic blocks selected by the processing in the feature acquisition module;
2) creating a counter array C in the target program and initializing it to zero;
3) for each loop l containing a basic block selected in step 1) in the parallel program under test, determining whether l is a natural loop and whether the head block h of the back edge in the loop is dominated by the basic block, and if so, performing the following steps:
creating a preheader block p before the loop's header node;
obtaining the values related to the loop trip count (LTC): %start, %end, %stride;
adding, before the terminator instruction of p, the code that calculates the LTC of l:
[Formula image FDA0003847160490000011: LTC expression in terms of %start, %end, and %stride; not reproduced in this text]
this code executing whenever p is traversed; and adding the code {C[index] += Γ} before the terminator instruction of p, where Γ denotes the LTC calculated above;
otherwise, adding the code {C[index]++} in the basic block.
2. The parallel program performance prediction system based on runtime characteristics and machine learning of claim 1, wherein the specific process of the edge profiling instrumentation algorithm is as follows:
input: the LLVM IR of the parallel program;
output: the IR after edge profiling instrumentation;
1) creating a counter array C in the parallel program under test and initializing it to zero;
2) for each edge e in the control flow graph corresponding to the LLVM IR of the parallel program, determining whether e is a critical edge, and if so, inserting a new basic block newbb between the source basic block and the target basic block of e and adding the code {C[index]++} before the terminator instruction of newbb; otherwise, adding the code {C[index]++} before the terminator instruction of the source basic block or the target basic block of e, thereby completing the instrumentation.
3. The parallel program performance prediction system based on runtime characteristics and machine learning of claim 2, wherein the specific process of the program pruning algorithm is as follows:
input: the IR of the parallel program after hybrid instrumentation;
output: the pruned IR;
1) first, deleting the output-related code of the parallel program from the IR after hybrid instrumentation;
2) then, deleting the MPI function calls in the parallel program;
3) finally, eliminating the dead code.
CN201910680598.6A 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning Active CN110377525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910680598.6A CN110377525B (en) 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910680598.6A CN110377525B (en) 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning

Publications (2)

Publication Number Publication Date
CN110377525A CN110377525A (en) 2019-10-25
CN110377525B true CN110377525B (en) 2022-11-15

Family

ID=68256290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910680598.6A Active CN110377525B (en) 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning

Country Status (1)

Country Link
CN (1) CN110377525B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522644B (en) * 2020-04-22 2023-04-07 中国科学技术大学 Method for predicting running time of parallel program based on historical running data
CN113553266A (en) * 2021-07-23 2021-10-26 湖南大学 Parallelism detection method, system, terminal and readable storage medium of serial program based on parallelism detection model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063373A (en) * 2011-01-06 2011-05-18 北京航空航天大学 Method for positioning performance problems of large-scale parallel program

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9547599B2 (en) * 2013-07-26 2017-01-17 Futurewei Technologies, Inc. System and method for predicting false sharing
CN105183650B (en) * 2015-09-11 2018-03-16 哈尔滨工业大学 Scientific program automatic performance Forecasting Methodology based on LLVM
CN105224452B (en) * 2015-09-11 2018-03-16 哈尔滨工业大学 A kind of prediction cost optimization method for scientific program static analysis performance
CN105183651B (en) * 2015-09-11 2018-03-16 哈尔滨工业大学 For the foreseeable viewpoint method for improving of program automaticity
US11120521B2 (en) * 2018-12-28 2021-09-14 Intel Corporation Techniques for graphics processing unit profiling using binary instrumentation

Also Published As

Publication number Publication date
CN110377525A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
CN110163429B (en) Short-term load prediction method based on similarity day optimization screening
CN107908536B (en) Performance evaluation method and system for GPU application in CPU-GPU heterogeneous environment
CN111782460A (en) Large-scale log data anomaly detection method and device and storage medium
CN110399182B (en) CUDA thread placement optimization method
CN110377525B (en) Parallel program performance prediction system based on runtime characteristics and machine learning
CN113420506B (en) Tunneling speed prediction model establishment method, tunneling speed prediction method and tunneling speed prediction device
CN110377519B (en) Performance capacity test method, device and equipment of big data system and storage medium
CN103645961B (en) The method for detecting abnormality of computation-intensive parallel task and system
CN106469114A (en) A kind of Parallel Computing Performance detecting system towards communication test and its method
CN113159441A (en) Prediction method and device for implementation condition of banking business project
CN117473440A (en) Power time sequence data prediction algorithm integrated management system and method
Li et al. Feature mining for machine learning based compilation optimization
CN110109811A (en) A kind of source tracing method towards GPU calculated performance problem
CN113656279B (en) Code odor detection method based on residual network and metric attention mechanism
CN111523685B (en) Method for reducing performance modeling overhead based on active learning
CN111522644B (en) Method for predicting running time of parallel program based on historical running data
CN111459838B (en) Software defect prediction method and system based on manifold alignment
CN112083929A (en) Performance-energy consumption collaborative optimization method and device for power constraint system
CN111898666A (en) Random forest algorithm and module population combined data variable selection method
Neill et al. Automated analysis of task-parallel execution behavior via artificial neural networks
Wu et al. A Highly Reliable Compilation Optimization Passes Sequence Generation Framework
CN117742683B (en) Software development operating system
CN110275773B (en) Paas resource recycling index system based on real data model fitting
CN112861951B (en) Image neural network parameter determining method and electronic equipment
Huang et al. Research on Fault Diagnosis of Smart Meters Based on Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant