CN111459633B - Irregular program-oriented self-adaptive thread partitioning method - Google Patents

Info

Publication number
CN111459633B
CN111459633B CN202010238885.4A CN202010238885A CN111459633B CN 111459633 B CN111459633 B CN 111459633B CN 202010238885 A CN202010238885 A CN 202010238885A CN 111459633 B CN111459633 B CN 111459633B
Authority
CN
China
Prior art keywords
program
thread
complexity
scheme
irregular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010238885.4A
Other languages
Chinese (zh)
Other versions
CN111459633A (en)
Inventor
李玉祥
张志勇
牛丹梅
张丽丽
赵长伟
荆军昌
邵东霞
徐艳艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Science and Technology
Original Assignee
Henan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Science and Technology
Priority to CN202010238885.4A
Publication of CN111459633A
Application granted
Publication of CN111459633B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

An adaptive thread partitioning method for irregular programs, in the technical field of computers, comprises: building a program complexity calculation model on a multi-core platform; establishing a set of candidate thread partitioning schemes based on classical thread partitioning methods; establishing a selection mechanism for thread partitioning schemes according to expert knowledge; and selecting the most suitable thread partitioning scheme for a program according to its context and complexity. The beneficial effects of the invention are as follows: the method can partition irregular programs of different types optimally, exploit their potential parallelism to the greatest extent, and improve program speedup; it resolves the mismatch between serial software and multi-core hardware and makes full use of both multi-core processor resources and legacy serial programs; and it promotes the parallelization of multi-core processors and software and the healthy, sound and rapid development of related industries such as high-performance computing and cloud computing, giving it good application prospects and practical value.

Description

Irregular program-oriented self-adaptive thread partitioning method
Technical Field
The invention belongs to the technical field of computers, and particularly relates to an irregular program-oriented adaptive thread partitioning method.
Background
With the arrival of the multi-core era, traditional parallel programming models and compilation techniques face new challenges. An effective way to run serial programs on multi-core processors is to parallelize them, which both modernizes legacy serial programs and makes reasonable use of increasingly abundant core resources. Traditional parallelization methods such as OpenMP, MPI, TBB, OpenCL and CUDA handle dependences conservatively: concurrent units (threads or processes) that depend on one another are serialized through synchronization or communication, so irregular programs parallelize poorly. Thread-Level Speculation (TLS), a speculative multithreading technique, allows concurrent units with possible data dependences to execute aggressively in parallel. It overcomes the inability of traditional parallelization methods to resolve ambiguous thread-level dependences and shows good prospects for parallelizing irregular programs. Thread partitioning is the step in which TLS inserts thread partitioning statements into a serial program. It is the core of speculative parallelization and directly affects speedup, so research on thread partitioning methods is urgent.
Existing thread partitioning methods are mainly heuristic rule-based, machine learning-based, and graph-based. The first determines a thread partitioning scheme from heuristic rules and thereby fixes where thread partitioning statements are inserted along the program's execution path. The latter two use machine learning to learn the thread partitioning knowledge contained in samples, predict a partitioning scheme from program features, and partition the program with that scheme. All of these methods generate a uniform thread partitioning scheme for programs of the same type under the guidance of partitioning rules or partitioning knowledge. For an unknown program, however, the complexity and execution state are hard to predict, so a uniform scheme can hardly guarantee the maximum performance improvement.
To understand the state of existing thread partitioning methods, existing papers and patents were searched, compared and analyzed, and the following closely related technical information was identified:
Technical scheme 1: during serial program partitioning, the Heuristic Rules-based Thread Partition Approach (HR-based) uses heuristic rules to determine, for all programs, the value ranges of parameters such as thread granularity, inter-thread data dependence and spawning distance, and thereby determines the positions of the partitioning marks (SP-CQIP points).
The paper "Min-cut program decomposition for thread-level speculation" partitions the program flow graph with a graph min-cut algorithm and uses heuristics to balance the costs of data dependence, performance overhead, load imbalance and similar factors, obtaining a performance improvement after partitioning.
The paper "A General Compiler Framework for Speculative Multithreading" uses heuristic rules to determine the granularity, priority and other properties of each thread produced by partitioning along the critical path of program execution.
The paper "A Speculative Multithreaded Processor Based on Precomputation Slices" uses heuristic rules to select candidate spawning pairs (SP-CQIP) in order to reduce the search space. During selection, spawning pairs whose contribution rate is smaller than the contribution threshold are discarded; the remaining criteria are that both points of a pair lie in the same procedure or loop body, that the pair length is smaller than the length threshold, that the probability of reaching the CQIP from the SP is larger than the probability threshold, and that the ratio of the precomputation slice (P-slice) length to the speculative thread size is smaller than the ratio threshold.
Technical scheme 2: the Machine Learning-based Thread Partition Approach (ML-based) uses machine learning to learn the thread partitioning knowledge in a sample set, predicts a partitioning scheme for a newly input program from its features, and uses that scheme to guide the partitioning of the program.
The paper "Speculative multithreading partitioning algorithm based on fuzzy clustering" searches the space of effective thread solutions with a clustering method to obtain a better thread partition.
The paper "A Novel Thread Partitioning Approach Based on Machine Learning for Speculative Multithreading" proposes a KNN-based thread partitioning method. The method has two main parts: generating a training sample set and extracting the partitioning knowledge it contains; and, for each unknown program, computing its similarity to every sample and selecting the k most similar samples to determine the program's thread partitioning scheme.
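The KNN selection step described above can be illustrated with a minimal sketch. It is not the cited paper's implementation: the feature vector (basic blocks, loop nesting depth, procedure calls), the sample values, the scheme labels and the use of unnormalized Euclidean distance are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(samples, query, k=3):
    """Return the scheme label most common among the k nearest samples."""
    # Rank samples by Euclidean distance between feature vectors.
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (feature vector, scheme label) pairs; the features stand for
# (basic blocks, loop nesting depth, procedure calls).
samples = [
    ((120, 3, 14), "scheme_A"),
    ((110, 3, 12), "scheme_A"),
    ((30, 1, 2), "scheme_B"),
    ((25, 0, 1), "scheme_B"),
    ((28, 1, 3), "scheme_B"),
]
predicted = knn_predict(samples, query=(115, 2, 13), k=3)
```

In practice the features would likely need normalization so that large-valued features such as the basic block count do not dominate the distance.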
A paper entitled "localization streaming parallel for multi-core adaptive processing" uses machine learning to partition stream programs on a mobile and automatic compiler, learning prior knowledge offline and predicting the partition structure of an unknown program.
The paper "Optimizing partition thresholds in speculative multithreading" extracts the five main parameters that influence thread partitioning and optimizes them with a layer-by-layer traversal method, obtaining a better thread partitioning scheme for a program.
The paper "A Parametric Model in Speculative Multithreading" uses linear regression to discover the relationship between thread partitioning parameters and speedup, and derives a partitioning scheme for irregular programs.
The paper "Using Artificial Neural Network for Predicting Thread Partitioning in Speculative Multithreading" uses an artificial neural network to learn thread partitioning knowledge and predict the partitioning scheme of an unknown program.
Technical scheme 3: the graph-based thread partitioning method uses the program's Weighted Control Flow Graph (WCFG) to capture program feature information comprehensively and partitions the different paths of the WCFG.
The paper "A Graph-Based Thread Partition Approach in Speculative Multithreading" proposes a graph-based thread partitioning method in which a weighted control flow graph formally represents an irregular program, machine learning is used to learn thread partitioning knowledge and predict the partitioning scheme of an unknown program, and the generated scheme is applied to every procedure of the program.
The paper "GbA: A graph-based thread partition approach in speculative multithreading" proposes a graph-based thread partitioning method in which an irregular program is formally represented by its Weighted Control Flow Graph (WCFG), and machine learning is used to learn thread partitioning knowledge and predict a thread partitioning scheme for an unknown program.
The paper "Improving Graph Partitioning for Modern Graphs and Architectures" partitions sparse, irregular graph data and presents the multithreaded graph partitioner mt-Metis. Experiments on 36 cores with 20 graphs from multiple domains verify the effectiveness of the method.
Technical schemes 1, 2 and 3 implement thread partitioning based on heuristic rules, machine learning and graphs, respectively. However, each of these schemes has drawbacks.
Technical scheme 1 uses heuristic rules to specify the upper and lower limits of thread granularity, inter-thread dependence, spawning distance and so on, and these limits then guide where the thread partitioning marks (SP-CQIP) are inserted. Compared with the other methods, this scheme is simple and easy to apply. However, one uniform partitioning rule is applied to all programs to be partitioned, so some programs do not obtain the best performance improvement after partitioning.
Technical scheme 2 uses machine learning to learn the thread partitioning knowledge in samples. The program to be partitioned is compared with the samples for similarity, and the partitioning scheme of the most similar sample guides its thread partitioning. Machine learning-based thread partitioning is intelligent and automatic, but compared with scheme 1 it has a granularity problem: its minimum unit of partitioning is the whole program rather than the procedure. Because thread partitioning is actually carried out procedure by procedure, some procedures in the program do not obtain the best performance improvement.
Technical scheme 3 proposes a graph-based thread partitioning method in which an irregular program is formally represented by a weighted control flow graph, and machine learning is used to learn thread partitioning knowledge and predict a thread partitioning scheme for an unknown program. The scheme can fully mine the feature information of the program, but it cannot partition the individual procedures of a program in a customized way, so program performance is not improved to the greatest extent.
The current state of research on thread partitioning for irregular programs, in China and abroad, can be summarized as follows: heuristic rule-based thread partitioning is simple and easy to apply; machine learning-based thread partitioning is intelligent and automatic; graph-based thread partitioning expresses the data and control information of a program more comprehensively. However, existing methods mostly adopt a uniform thread partitioning scheme for irregular programs of the same type and pay little attention to the complexity, execution state and other aspects of the program, which seriously reduces the efficiency of parallelizing serial programs.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an adaptive thread partitioning method for irregular programs on a multi-core platform. It addresses the problem that existing thread partitioning methods adopt a uniform thread partitioning scheme for irregular programs of the same type and pay little attention to the complexity, execution state and other aspects of the program, which seriously reduces the efficiency of parallelizing serial programs.
The technical scheme adopted by the invention for solving the technical problems is as follows: an irregular program-oriented adaptive thread partitioning method comprises the following steps:
Step one, establishing a complexity calculation model of the irregular program
1.1, constructing the control flow graph (CFG) of the program from a formal representation, with the basic block as the unit of analysis, and adding the feature values obtained by program analysis to the CFG as annotations to form a weighted control flow graph;
1.2, calculating, based on probability statistics and graph traversal, the complexity of each possible path on the weighted control flow graph, namely the sub-complexities;
1.3, combining the sub-complexities to obtain the overall complexity of the program.
Step two, constructing a candidate thread partition scheme set conforming to the program context
The candidate thread partitioning scheme set is constructed by combining program context with program features: an initial candidate set is built from the program features, and this initial set is then filtered according to the program's context values to produce the final candidate thread partitioning scheme set.
Step three, constructing a thread division scheme selection mechanism according with the program complexity
After the program complexity and the candidate thread partitioning scheme set have been obtained from step one and step two respectively, a set of scheme selection mapping rules of the form 'program complexity → thread partitioning scheme' is established according to expert knowledge; the most suitable thread partitioning scheme is then selected from the candidate set according to the mapping rules, the program complexity and the execution context.
The rule set stores the expert knowledge used for reasoning. In the rule set, the expert knowledge of the thread partitioning scheme selection mechanism is expressed as mapping rules, whose general form is IF < condition >, THEN < conclusion >.
The invention has the following beneficial effects: (1) The invention studies an adaptive thread partitioning method for irregular programs on a multi-core platform. It builds a program complexity calculation model, establishes a candidate partitioning scheme set based on classical thread partitioning methods, establishes a thread partitioning scheme selection mechanism according to expert knowledge, and selects the most suitable thread partitioning scheme according to context and program complexity. This enables optimal partitioning of programs, makes full use of multi-core resources and exploits the potential parallelism of irregular programs to the greatest extent.
(2) Using an adaptive mechanism built on a composite thread partitioning method, the invention adaptively selects the most suitable thread partitioning scheme according to program features and context. This resolves the conflict between multi-core platforms and serial program execution, improves the speedup obtained after parallelizing serial programs, and provides a new approach for multi-core processor design.
(3) The adaptive thread partitioning method provided by the invention effectively addresses the parallelization of irregular serial programs on multi-core platforms, advances parallel technology, and promotes the healthy, sound and rapid development of related industries such as high-performance computing and cloud computing, giving it good application prospects and practical value.
Drawings
FIG. 1 is a schematic overall flow chart of the adaptive thread partitioning method according to the present invention;
FIG. 2 is a flowchart of the program complexity calculation of the present invention;
FIG. 3 is a schematic flow chart of the present invention for constructing a candidate thread partition scheme set;
FIG. 4 is a flow diagram of a thread partitioning scheme selection mechanism.
Detailed Description
Specific embodiments (examples) of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention.
The overall scheme and flow of the adaptive thread partitioning method of the present invention are shown in FIG. 1. Taking an irregular serial program as input, the method focuses on three tasks: establishing the program complexity calculation model, generating candidate thread partitioning schemes, and selecting a partitioning scheme based on expert knowledge. It selects the thread partitioning scheme most suitable for the program, performs thread partitioning, and obtains the speedup and the program's execution results on the Prophet simulator.
(1) Establishment of complexity calculation model of irregular program
Many program features influence thread partitioning, such as data dependence, control dependence, number of branches, number of basic blocks, average number of dynamic instructions, nesting depth of loop structures and number of procedure calls. The values of these features reflect the complexity of the program (here, complexity is a quantitative measure of how complex the program is). Most existing thread partitioning methods do not fully consider the influence of program complexity on thread partitioning and only select some program features as input. As a result, different methods select inconsistent program features, and the generated thread partitioning schemes are not accurate enough.
The program complexity calculation model first uses a formal representation, with the basic block as the unit of analysis, to construct the control flow graph (CFG) of the program, and adds the feature values obtained through program analysis to the CFG as annotations to form a Weighted Control Flow Graph (WCFG). It then calculates the complexity of each possible path on the WCFG (the sub-complexities) based on probability statistics and graph traversal. Finally, it combines the sub-complexities to obtain the overall complexity of the program. FIG. 2 shows the flow of the program complexity calculation.
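The WCFG construction and path traversal described above might be sketched as follows. This is a hypothetical illustration rather than the patented implementation: the `BasicBlock` fields, the block names, the branch probabilities and the decision to skip back edges during traversal are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """One WCFG node; the annotation fields are illustrative assumptions."""
    name: str
    dynamic_instructions: int = 0
    is_loop_header: bool = False
    # successor name -> branch probability (e.g. obtained from profiling)
    successors: dict = field(default_factory=dict)


def enumerate_paths(wcfg, head, tail):
    """Return (path, probability) for every acyclic head-to-tail path."""
    results = []

    def walk(node, path, prob):
        if node == tail:
            results.append((path, prob))
            return
        for succ, p in wcfg[node].successors.items():
            if succ not in path:  # skip back edges so the traversal terminates
                walk(succ, path + [succ], prob * p)

    walk(head, [head], 1.0)
    return results


# A hypothetical four-block WCFG with one two-way branch.
wcfg = {
    "B0": BasicBlock("B0", 12, successors={"B1": 0.7, "B2": 0.3}),
    "B1": BasicBlock("B1", 40, is_loop_header=True, successors={"B3": 1.0}),
    "B2": BasicBlock("B2", 8, successors={"B3": 1.0}),
    "B3": BasicBlock("B3", 5),
}
paths = enumerate_paths(wcfg, "B0", "B3")
```

Each returned path carries the product of its branch probabilities, which a later step could use to weight that path's sub-complexity.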
In FIG. 2, P denotes the input irregular serial program, G(P) denotes its WCFG, F1 to Fn (n ∈ N) denote program features, f1() to fn() (n ∈ N) denote conversion functions, Comp1() to Compn() (n ∈ N) denote the complexities of the individual paths, and Comp denotes the total complexity of P. In the model, first, the unknown program P is formally represented and converted into its WCFG, G(P); second, the features of each possible path in G(P) (from the head node to the tail node) are extracted and denoted F1 to Fn (n ∈ N); third, the conversion functions f1() to fn() (n ∈ N) map feature values to complexity, for example: the complexity corresponding to x basic blocks is 0.01 × x, the complexity corresponding to y loops is 0.2 × y, and so on; then the complexities Comp1() to Compn() of the paths in G(P) are calculated separately; finally, the path complexities are aggregated to obtain the complexity of program P.
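A minimal sketch of the conversion and aggregation steps follows. Only the two coefficients (0.01 per basic block, 0.2 per loop) come from the text; the feature names, the probability-weighted sum used for aggregation and the cap at 1.0 are assumptions, since the text does not specify how path complexities are combined.

```python
# Conversion functions f_i: feature value -> complexity contribution.
# The 0.01 and 0.2 coefficients are the two examples given in the text;
# other features would need coefficients of their own.
CONVERSIONS = {
    "basic_blocks": lambda x: 0.01 * x,
    "loops": lambda y: 0.2 * y,
}


def path_complexity(features):
    """Comp_i: sum of the converted feature values of one path."""
    return sum(CONVERSIONS[name](value) for name, value in features.items())


def program_complexity(paths):
    """Comp: probability-weighted sum of path complexities, capped at 1.0.

    Both the weighting and the cap are assumptions of this sketch.
    """
    total = sum(prob * path_complexity(feats) for feats, prob in paths)
    return min(total, 1.0)


# Hypothetical per-path feature values with path execution probabilities.
paths = [
    ({"basic_blocks": 20, "loops": 2}, 0.7),  # critical path
    ({"basic_blocks": 10, "loops": 0}, 0.3),  # non-critical path
]
comp = program_complexity(paths)
```

With these hypothetical inputs the path complexities are about 0.6 and 0.1, so the weighted total is roughly 0.45.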
(2) Construction of a set of candidate thread partitioning schemes that conform to a program context
The candidate thread partitioning scheme set is constructed by combining program context with program features: an initial candidate set is built from the program features, and this initial set is then filtered according to the program's context values to produce the final candidate thread partitioning scheme set. FIG. 3 shows the construction process.
In FIG. 3, P denotes an irregular serial program, F1 to Fn denote program features, Formal(P) denotes the formal representation of P, M1 to Mn (n ∈ N) denote n classical thread partitioning methods, and Schem1 to Schemn denote n thread partitioning schemes.
The thread partitioning methods are numbered as follows: the heuristic rule-based method (M1) is numbered 1, the machine learning-based method (M2) is numbered 2, the method based on the critical path of the graph (M3) is numbered 3, the method based on all paths of the graph (M4) is numbered 4, the hybrid method (M5) is numbered 5, and so on. The paths are numbered as follows: the critical path is numbered 1, and the non-critical paths are numbered 2 to n (n ∈ N).
A thread partitioning scheme consists of a thread partitioning method number, a path number and the five main parameters that influence the partitioning result in a thread partitioning algorithm: the Upper Limit of Spawning Distance (ULoSD), the Lower Limit of Spawning Distance (LLoSD), the Data Dependence Count (DDC), the Upper Limit of Thread Granularity (ULoTG) and the Lower Limit of Thread Granularity (LLoTG). Introducing the context parameters δ1 to δn (n ∈ N) makes the thread partitioning method of the invention context-aware, so that the constructed candidate scheme set better captures changes in program state.
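The scheme layout described above (method number, path number and five parameters) could be represented as in the sketch below. The record fields follow the text; the default threshold values, the helper names and, in particular, the way context values are applied as a filter are hypothetical, since the text does not define how the context parameters act on the candidate set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionScheme:
    method: int  # 1 = heuristic, 2 = machine learning, 3 = graph critical path, ...
    path: int    # 1 = critical path, 2..n = non-critical paths
    ulosd: int   # upper limit of spawning distance
    llosd: int   # lower limit of spawning distance
    ddc: int     # data dependence count
    ulotg: int   # upper limit of thread granularity
    llotg: int   # lower limit of thread granularity


def initial_candidates(path_count, defaults):
    """One scheme per (method, path) pair, using per-method default thresholds."""
    return [
        PartitionScheme(method, path, *params)
        for method, params in defaults.items()
        for path in range(1, path_count + 1)
    ]


def filter_by_context(candidates, max_ddc, min_granularity):
    """Keep schemes compatible with two assumed context values."""
    return [
        s for s in candidates
        if s.ddc <= max_ddc and s.llotg >= min_granularity
    ]


# Hypothetical thresholds: (ULoSD, LLoSD, DDC, ULoTG, LLoTG) per method number.
defaults = {1: (50, 10, 5, 60, 8), 2: (80, 20, 9, 100, 15)}
candidates = initial_candidates(path_count=2, defaults=defaults)
final = filter_by_context(candidates, max_ddc=6, min_granularity=5)
```

Here the two context values stand in for the δ parameters; a real filter would use whatever context values the program analysis provides.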
(3) Construction of thread partitioning scheme selection mechanism conforming to program complexity
After the program complexity has been calculated and the candidate thread partitioning scheme set has been constructed in steps (1) and (2), a set of scheme selection mapping rules of the form 'program complexity → thread partitioning scheme' is established according to expert knowledge. The most suitable thread partitioning scheme is then selected from the candidate set according to the mapping rules, the program complexity and the execution context. FIG. 4 shows the flow of the thread partitioning scheme selection mechanism.
The rule set stores the expert knowledge used for reasoning. In the rule set, the expert knowledge of the thread partitioning scheme selection mechanism is expressed as production rules (also called mapping rules). A production rule separates the knowledge representation into two parts, a premise and a conclusion. The general form of an expert knowledge production rule is IF < condition >, THEN < conclusion >, for example:
(i) IF < complexity Comp ∈ [0.8, 1.0] >, THEN < select Schem1' >;
(ii) IF < complexity Comp ∈ [0.6, 0.8) >, THEN < select Schem2' >;
(iii) IF < complexity Comp ∈ [0.4, 0.6) >, THEN < select Schem3' >;
(iv) IF < complexity Comp ∈ [0.2, 0.4) >, THEN < select Schem4' >;
(v) IF < complexity Comp ∈ (0.0, 0.2) >, THEN < select Schem5' >.
Schem1' to Schem5' are partitioning schemes selected from the candidate thread partitioning scheme set generated in step (2), and are determined by the program complexity and the rules. The rules above are only some examples of the mapping rules that can be generated.
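The five example rules can be encoded directly as a small rule table. The sketch below is one possible encoding, not the patented implementation; the tuple layout and the function name `select_scheme` are assumptions, while the interval bounds and their open or closed ends are taken from rules (i) to (v).

```python
# Each rule: (lower bound, upper bound, lower closed?, upper closed?, scheme).
RULES = [
    (0.8, 1.0, True, True, "Schem1'"),
    (0.6, 0.8, True, False, "Schem2'"),
    (0.4, 0.6, True, False, "Schem3'"),
    (0.2, 0.4, True, False, "Schem4'"),
    (0.0, 0.2, False, False, "Schem5'"),
]


def select_scheme(comp):
    """Fire the first rule whose condition holds; return None if none matches."""
    for low, high, low_closed, high_closed, scheme in RULES:
        above = comp >= low if low_closed else comp > low
        below = comp <= high if high_closed else comp < high
        if above and below:
            return scheme
    return None
```

Because the interval in rule (v) is open at 0.0, a complexity of exactly 0 matches no rule, so a caller has to handle the `None` result.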
Using an adaptive mechanism, the invention provides an adaptive thread partitioning method for irregular programs. Its overall goal is to improve the speedup of irregular programs to the greatest extent, and it provides a much-needed thread partitioning method and related basic theory for the wide application and healthy development of emerging parallel technology.
(1) Maximizing program speedup
By studying the relationship between program features and thread partitioning schemes, a composite thread partitioning scheme is established. Guided by the adaptive mechanism and expert knowledge, the program can autonomously select and apply the most suitable partitioning scheme to obtain the maximum speedup.
(2) Exploring how program features affect speedup
By analyzing the factors that influence program parallelization, a program complexity model, a candidate thread partitioning scheme set and a partitioning scheme selection mechanism are established. These are used to explore how program features affect program speedup and to provide methodological support for parallelizing irregular programs on multi-core platforms.

Claims (2)

1. An adaptive thread partitioning method for irregular programs, characterized in that the method comprises the following steps:
step one, establishing a complexity calculation model of the irregular program
1.1, constructing the control flow graph (CFG) of the program from a formal representation, with the basic block as the unit of analysis, and adding the feature values obtained by program analysis to the CFG as annotations to form a weighted control flow graph;
1.2, calculating, based on probability statistics and graph traversal, the complexity of each possible path on the weighted control flow graph, namely the sub-complexities;
1.3, combining the sub-complexities to obtain the overall complexity of the program;
step two, constructing a candidate thread partitioning scheme set conforming to the program context
constructing the candidate thread partitioning scheme set by combining program context with program features, wherein an initial candidate set is built from the program features and is then filtered according to the program's context values to produce the final candidate thread partitioning scheme set; the program features are data dependence, control dependence, number of branches, number of basic blocks, average number of dynamic instructions, nesting depth of loop structures and number of procedure calls, and the values of these features reflect the complexity of the program;
step three, constructing a thread partitioning scheme selection mechanism conforming to the program complexity
after the program complexity and the candidate thread partitioning scheme set have been obtained from step one and step two respectively, establishing a set of scheme selection mapping rules of the form 'program complexity → thread partitioning scheme' according to expert knowledge; and selecting the most suitable thread partitioning scheme from the candidate set according to the mapping rules, the program complexity and the execution context.
2. The adaptive thread partitioning method for irregular programs according to claim 1, wherein: the rule set stores the expert knowledge used for reasoning; in the rule set, the expert knowledge of the thread partitioning scheme selection mechanism is expressed as mapping rules, whose general form is IF < condition >, THEN < conclusion >.
CN202010238885.4A 2020-03-30 2020-03-30 Irregular program-oriented self-adaptive thread partitioning method Active CN111459633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010238885.4A CN111459633B (en) 2020-03-30 2020-03-30 Irregular program-oriented self-adaptive thread partitioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010238885.4A CN111459633B (en) 2020-03-30 2020-03-30 Irregular program-oriented self-adaptive thread partitioning method

Publications (2)

Publication Number Publication Date
CN111459633A (en) 2020-07-28
CN111459633B (en) 2023-04-11

Family

ID=71683330

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202010238885.4A Active CN111459633B (en) 2020-03-30 2020-03-30 Irregular program-oriented self-adaptive thread partitioning method

Country Status (1)

Country Link
CN (1) CN111459633B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0905720D0 (en) * 2008-04-09 2009-05-20 Nvidia Corp Partitioning cuda code for execution by a general purpose processor
CN102968295A (en) * 2012-11-28 2013-03-13 上海大学 Speculation thread partitioning method based on weighting control flow diagram
CN105260166A (en) * 2015-10-15 2016-01-20 西安交通大学 Manual sample set generation method applied to machine learning thread partitioning
CN105373424A (en) * 2015-10-14 2016-03-02 西安交通大学 Speculative multithreading division method based on machine learning
CN110069347A (en) * 2019-04-29 2019-07-30 河南科技大学 A kind of thread dividing method of Kernel-based methods different degree

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Yuxiang Li.ProCTA: program characteristic‑based thread p.《The Journal of Supercomputing》.2019, *
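A speculative multithreading partitioning algorithm based on path optimization; Li Yuancheng (李远成) et al.; Journal of Software (《软件学报》); 2012-08-15 (No. 08); 20-34 *
A speculative multithreading partitioning algorithm based on fuzzy clustering; Li Yuancheng (李远成); Chinese Journal of Computers (《计算机学报》); 2014-03-31; Vol. 37 (No. 3); 580-592 *
Research on a thread partitioning method based on program characteristics; Ma Qiaomei (马巧梅); Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》); 2018-06-20 (No. 6); 872-885 *
Research on load-balanced parallel techniques for geospatial analysis on hybrid CPU/GPU architectures; Zhou Chen (周琛); CNKI Doctoral Dissertations Full-text Database (《CNKI博士学位论文全文库》); 2018-09-15; 1-179 *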

Also Published As

Publication number Publication date
CN111459633A (en) 2020-07-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant