CN111984418A - Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks - Google Patents
- Publication number
- CN111984418A (application CN202010880655.8A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F9/4806—Task transfer initiation or dispatching
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the field of parallel computing and discloses a method and device for automatically tuning the task granularity parameter of sparse matrix-vector multiplication (SpMV) parallel tasks. The method comprises: a prediction model construction step, in which a prediction model is built using a machine learning method; a statistical characteristic value acquisition step, in which the raw data file of the matrix is analyzed to obtain the statistical characteristic values of the matrix; an optimal task granularity parameter prediction step, in which the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that matrix as input; and a configuration step, in which the task granularity of the system at parallel run time is adjusted according to the prediction result. The device comprises a prediction model construction module, a statistical characteristic value acquisition module, an optimal task granularity parameter prediction module and a configuration module. By adaptively selecting the SpMV parallel task granularity for different input matrices, the invention improves the load balance and the overall computing performance of the parallel program.
Description
Technical Field
The invention relates to parallel program task allocation technology, and in particular to a method and device for automatically tuning the task granularity parameter of a sparse matrix-vector multiplication parallel program.
Background
In the fields of scientific computing and artificial intelligence, Sparse Matrix-Vector Multiplication (SpMV) is widely used as a basic operator, and its corresponding computation module is one of the most time-consuming modules in software in these fields. Unlike dense matrices, sparse matrices contain only a small number of non-zero elements; most elements are zero. These zeros do not affect the result of the computation, yet accessing and operating on them incurs extra overhead, resulting in low computational efficiency.
Therefore, researchers exploit the sparsity of the matrix and store only its non-zero elements in a compressed storage format, avoiding the processing of zero elements and reducing the storage and access overhead of the matrix. Common sparse matrix storage formats include Coordinate List (COO), Compressed Sparse Row (CSR), ELLPACK (ELL), Hybrid ELL + COO (HYB), and the like. Both the sparse structure of the matrix and the storage format used affect sparse matrix computation performance.
The non-zero elements of sparse matrices arising in practical applications are irregularly distributed, and the storage hierarchy of computer systems is relatively complex; together these pose great challenges for optimizing sparse matrix computation performance. Current optimization work proceeds mainly along two lines. On one hand, new sparse matrix storage formats and corresponding SpMV algorithms are introduced, reorganizing the layout of non-zero elements to make full use of the processor's cache and wide vector units, thereby accounting for both the characteristics of the sparse matrix and the structure of the underlying hardware system. On the other hand, parallelization methods divide the sparse matrix computation task and distribute it across a parallel computer system for concurrent execution. These methods introduce a large number of configuration parameters and create a huge optimization variable space, in which finding the optimal configuration parameters exhaustively is clearly infeasible. Therefore, studying automatic performance optimization methods to obtain optimal configuration parameters for sparse matrix-vector multiplication is of great significance.
In particular, since there is no dependency between the results of two adjacent rows in SpMV, the accumulation for each row can be regarded as an independent subtask, so SpMV is easy to parallelize. Assuming the processor has t threads, by default the m rows of the sparse matrix are divided into t subtasks, each thread being responsible for one subtask of size g = m/t (the task granularity). When the task granularity is g = m/(2·t), each thread is responsible for 2 tasks; when g = m/(4·t), each thread is responsible for 4 tasks; and so on: when g = m/(K·t), each thread is responsible for K tasks. Different task granularities yield different assignments of tasks to threads and affect the load balance among threads. As the number of processor threads and the matrix size grow, the number of possible task-to-thread assignments and task granularities increases sharply, producing a huge optimization space. Therefore, for a given sparse matrix data set and multi-core processor platform, a corresponding optimal task granularity prediction model must be constructed to balance the computation and memory-access loads among threads, so that the sparse matrix data set, the SpMV parallel computation task, and the hardware platform are optimally matched, the computing potential of the multi-core processor is exploited to the greatest extent, and the computational efficiency of the SpMV parallel task is improved.
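As a minimal sketch of the row-block partitioning described above (the helper names are ours, not the patent's, and real implementations would operate on a compressed matrix rather than a row count):

```python
def partition_rows(m, t, K):
    """Split m matrix rows into subtasks of granularity g = m // (K * t).

    Returns a list of (start_row, end_row) half-open ranges; the last
    range absorbs any remainder rows.
    """
    g = max(1, m // (K * t))          # task granularity
    return [(r, min(r + g, m)) for r in range(0, m, g)]

def assign_round_robin(tasks, t):
    """Assign subtasks to t threads cyclically (OpenMP static,g style)."""
    return {tid: tasks[tid::t] for tid in range(t)}

tasks = partition_rows(m=1000, t=4, K=2)   # g = 125, so 8 subtasks
threads = assign_round_robin(tasks, 4)     # each thread handles K = 2 subtasks
```

With K = 2 and 4 threads, the 1000 rows split into 8 blocks of 125 rows, and each thread receives 2 of them, matching the g = m/(K·t) rule above.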
Disclosure of Invention
The invention aims to: the task granularity parameter of the system during parallel operation is automatically adjusted by constructing a model, so that the load balance and the overall operation performance of the sparse matrix vector multiplication (SpMV) parallel program on the multi-core processor are improved.
The invention is realized by the following steps: a sparse matrix vector multiplication parallel task granularity parameter automatic tuning method based on machine learning comprises the following steps:
s1, a prediction model construction step: a prediction model is constructed by a machine learning method, namely a prediction model f: X → Y between the statistical characteristic value space X and the parallel task granularity optimal value space Y, wherein x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of the sparse matrix, xi denotes a statistical characteristic value with index i = 1, 2, …, n, and the task granularity is denoted by y; under the statistical characteristic vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
s2, a statistical characteristic value obtaining step, namely analyzing the matrix original data file to obtain a statistical characteristic value of the matrix; the statistical characteristic value is used for describing a sparse matrix and is composed of non-zero element distribution information of the matrix;
s3, an optimal task granularity parameter prediction step: for the sparse matrix to be computed, the obtained statistical characteristic values are input into the prediction model, which predicts the optimal parallel task granularity parameter value of the SpMV program for that matrix as input; at the initial start-up stage of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with this matrix as input;
and S4, a configuration step: the task granularity parameter of the system at parallel run time is adjusted according to the result of the optimal task granularity parameter prediction step. This step guides the assignment of SpMV parallel tasks to threads and improves the load balance and overall running efficiency of the parallel program.
The step of constructing the prediction model specifically comprises the following steps:
s11, selecting a matrix statistical characteristic;
and S12, generating training data: the training data comprise the statistical characteristic values of the sparse matrices and the optimal task granularity of each corresponding matrix; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV running performance is selected as the optimal task granularity value; the statistical characteristic values selected in S11 can be computed from the sparse matrix as read; before training, the matrix statistical characteristic values are standardized;
s13, training the model with a machine learning algorithm: a machine learning method is used to construct a prediction model that captures the relationship between the statistical characteristic values of a sparse matrix and the optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities serve as the input of the machine learning algorithm, and the optimal task granularity prediction model is obtained by training; the prediction model training algorithm may be a random forest, a neural network, a support vector machine, or the like.
The matrix statistical characteristics comprise sparse matrix row number, sparse matrix column number, sparse matrix non-zero element ratio, sparse matrix row non-zero element minimum number, sparse matrix row non-zero element maximum number, sparse matrix average each row non-zero element number and sparse matrix average each row non-zero element number standard deviation.
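The seven statistical characteristics listed above can be computed in one pass over the matrix; a minimal pure-Python sketch (the function name is ours, and a dense list-of-rows stands in for the compressed formats a real implementation would use):

```python
import statistics

def matrix_features(a):
    """Return the seven statistical features named in the text for
    matrix `a` (a list of equal-length rows): row count, column count,
    non-zero ratio, min/max/mean non-zeros per row, and the standard
    deviation of non-zeros per row."""
    m, n = len(a), len(a[0])
    nnz_per_row = [sum(1 for v in row if v != 0) for row in a]
    nnz = sum(nnz_per_row)
    return {
        "rows": m,
        "cols": n,
        "nnz_ratio": nnz / (m * n),
        "min_nnz_row": min(nnz_per_row),
        "max_nnz_row": max(nnz_per_row),
        "mean_nnz_row": nnz / m,
        "std_nnz_row": statistics.pstdev(nnz_per_row),
    }

f = matrix_features([[1, 0, 0], [2, 3, 0], [0, 0, 0]])
```

These seven numbers form the feature vector x = (x1, …, xn) that the prediction model consumes.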
A sparse matrix vector multiplication parallel task granularity parameter automatic tuning device based on machine learning comprises:
the prediction model construction module is used to construct a prediction model by a machine learning method, namely a prediction model f: X → Y between the statistical characteristic value space X and the parallel task granularity optimal value space Y, wherein x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of the sparse matrix, xi denotes a statistical characteristic value with index i = 1, 2, …, n, and the task granularity is denoted by y; under the statistical characteristic vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
the statistical characteristic value obtaining module is used for analyzing the matrix original data file and obtaining the statistical characteristic value of the matrix; the statistical characteristic value is used for describing a sparse matrix and is composed of non-zero element distribution information of the matrix;
the optimal task granularity parameter prediction module is used to input the obtained statistical characteristic values into the prediction model for the sparse matrix to be computed, predicting the optimal parallel task granularity parameter value of the SpMV program for that matrix as input; at the initial start-up stage of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with this matrix as input;
and the configuration module is used for adjusting the task granularity parameters of the system in parallel operation according to the prediction result. The module can guide the distribution process of the SpMV parallel tasks to the threads, and the load balance and the overall operation performance of the SpMV parallel programs are improved.
The prediction model building module specifically executes the following steps:
s11, selecting a matrix statistical characteristic;
and S12, generating training data: the training data comprise the statistical characteristic values of the sparse matrices and the optimal task granularity value of each corresponding matrix; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV running performance is selected as the optimal task granularity value; the statistical characteristic values of the sparse matrix can be computed from the matrix as read; before training, the matrix statistical characteristic values are standardized;
s13, training the model by using a machine learning algorithm: a machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the used prediction model training algorithm is a random forest model, a neural network or a support vector machine and the like.
The matrix statistical characteristics comprise sparse matrix row number, sparse matrix column number, sparse matrix non-zero element ratio, sparse matrix row non-zero element minimum number, sparse matrix row non-zero element maximum number, sparse matrix average each row non-zero element number and sparse matrix average each row non-zero element number standard deviation.
The invention has the beneficial effects that:
and a prediction model is established by using a machine learning method, in the initial stage of SpMV starting, the value is calculated according to the statistical characteristics of the input matrix, the optimal value of the task granularity is predicted, the task granularity parameter of the system in parallel operation is configured according to the prediction result, the load balance of the SpMV task on the multi-core processor is improved, and therefore better operation performance is obtained. The experimental results show that, relative to the default task granularity, the average performance improvement of about 35% can be obtained by using the task granularity value selected by the prediction model.
Drawings
FIG. 1 is a flow chart of the sparse matrix vector multiplication task granularity parameter automatic tuning method of the present invention;
FIG. 2 is a schematic diagram of a training process of the sparse matrix vector multiplication task granularity prediction model of the present invention;
FIG. 3 is a schematic diagram of the sparse matrix vector multiplication task granularity parameter automatic tuning device of the present invention.
Detailed Description
The technical solution of the present invention will be described in detail with reference to the following examples.
Example 1: a sparse matrix vector multiplication parallel task granularity parameter automatic tuning method.
Fig. 1 is a flowchart of the sparse matrix vector multiplication task granularity parameter automatic tuning method of the present invention, which includes: the method comprises the steps of prediction model construction, statistical characteristic value acquisition, optimal task granularity parameter prediction and configuration.
S1, a prediction model construction step: a prediction model is constructed by a machine learning method, namely a prediction model f: X → Y between the statistical characteristic value space X and the parallel task granularity optimal value space Y, wherein x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of the sparse matrix, xi denotes a statistical characteristic value, and the task granularity is denoted by y; under the statistical characteristic vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
s2, a statistical characteristic value obtaining step, namely analyzing the matrix original data file to obtain a statistical characteristic value of the matrix; the statistical characteristic value is used for describing a sparse matrix and is composed of non-zero element distribution information of the matrix;
s3, an optimal task granularity parameter prediction step: for the sparse matrix to be computed, the obtained statistical characteristic values are input into the prediction model, which predicts the optimal parallel task granularity parameter value of the SpMV program for that matrix as input; at the initial start-up stage of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with this matrix as input;
and step S4 is configured, and according to the prediction result, the task granularity parameter of the parallel operation system is adjusted. The step can guide the distribution process of the SpMV parallel tasks to the threads, and the load balance is improved.
The step of constructing the prediction model specifically comprises the following steps:
s11, selecting a matrix statistical characteristic; the matrix statistical characteristics comprise sparse matrix row number, sparse matrix column number, sparse matrix non-zero element proportion, sparse matrix row non-zero element minimum number, sparse matrix row non-zero element maximum number, sparse matrix average each row non-zero element number and sparse matrix average each row non-zero element number standard deviation;
in practical implementation, for the acquisition of the statistical characteristics of the matrix, a Python script program is used for processing the original data file of the matrix, and corresponding characteristic values are extracted or calculated.
S12, generating training data; the data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
in the actual implementation process, before training, in order to eliminate dimensional influence among indexes, accelerate the speed of solving the optimal solution by gradient descent and improve the accuracy of the model, the statistical characteristic value of the matrix is standardized by using a StandardScale () built-in function of Python. The function uses the formula (X-mean)/std to bring the eigenvalues around 0 with a variance of 1.
To generate the training data, the SpMV performance under different task granularities needs to be measured for each input data set. Taking a 64-thread hardware platform with an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz processor as an example, K takes the values 1, 2, 4, 6, 8, 10, 12, 14, 16 (9 values in total); the running performance of SpMV is tested under each value of K, and the task allocation granularity Kbest yielding the best performance is determined. In an embodiment of the present invention, 1989 sparse matrices with more than 1024 rows from the University of Florida sparse matrix collection were used as the training data set. For each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV running performance (Gflops) is selected as the optimal task granularity value.
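The exhaustive search over K can be sketched as follows; `measure_gflops` is a hypothetical benchmark hook standing in for an actual timed SpMV run, so the toy lambda below only illustrates the selection logic:

```python
K_CANDIDATES = (1, 2, 4, 6, 8, 10, 12, 14, 16)

def find_k_best(matrix, measure_gflops, candidates=K_CANDIDATES):
    """Benchmark SpMV once per candidate K and keep the granularity
    parameter with the highest measured throughput (Gflops)."""
    perf = {k: measure_gflops(matrix, k) for k in candidates}
    return max(perf, key=perf.get)

# Toy stand-in: pretend throughput peaks at K = 8 for this matrix.
k_best = find_k_best("demo_matrix", lambda m, k: -abs(k - 8))
```

One (feature vector, Kbest) pair per matrix then becomes one sample of the training set.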
In this embodiment, the statistical characteristic value of the input matrix and the optimal task allocation granularity together form a training data set. The number of samples of the data set is equal to the number of input matrices.
S13, training the model by using a machine learning algorithm; a machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the prediction model constructed in this embodiment is a random forest model, and other models may also be used, such as: neural networks, support vector machines, etc.
When training the model, considering that insufficient training data may affect the accuracy of the prediction model, the data are randomly divided into a training set (80%) and a test set (20%); the model is tuned on the training set by cross validation, and the performance of the trained model is evaluated on the test set. Model tuning uses k-fold cross validation, whose principle can be summarized as follows: the data set D is first divided into k mutually exclusive subsets of similar size, i.e., D = D1 ∪ D2 ∪ … ∪ Dk with Di ∩ Dj = ∅ for i ≠ j, where each subset Di keeps the data distribution as consistent as possible, i.e., is obtained from D by stratified sampling. Each time, the union of k-1 subsets is taken as the training set and the remaining subset as the validation set; this yields k training/validation splits, so k rounds of training and validation can be performed, and finally the mean of the k results is returned. This implementation adopts the most common 10-fold cross validation; cross validation of the model can be realized by calling the cross_val_score() function of scikit-learn to obtain the model's average accuracy. Finally, the trained model is evaluated on the 20% test set.
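A minimal sketch of the k-fold partition described above (plain round-robin splitting; it omits the stratified sampling the text mentions):

```python
def k_fold_splits(indices, k):
    """Partition `indices` into k mutually exclusive folds and yield
    (train, validation) index pairs, one per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, validation

splits = list(k_fold_splits(list(range(10)), k=5))
```

Each sample appears in exactly one validation fold, so averaging the k validation scores gives the cross-validated estimate that cross_val_score() reports.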
The trained model is invoked by SpMV in the form of a library. Specifically, at the initial stage of SpMV start-up, the matrix statistical characteristic values are computed from the raw input matrix as read. The matrix statistical characteristic values are then input into the optimal task granularity prediction model, and the optimal task allocation granularity Kpredict is computed. This parameter is passed to the OpenMP runtime system, where the task granularity is set to guide the assignment of SpMV tasks to threads.
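One way to hand the predicted granularity to the OpenMP runtime before the parallel region launches is via the schedule environment variable, which OpenMP honors for loops declared schedule(runtime); a hedged sketch (the function name and the conversion from Kpredict to a chunk size follow the g = m/(K·t) rule above, not a specific API of the patent):

```python
import os

def configure_spmv_granularity(m_rows, n_threads, k_predict):
    """Translate the predicted K into a chunk size g = m // (K * t) and
    export it as an OpenMP static schedule before launching SpMV."""
    chunk = max(1, m_rows // (k_predict * n_threads))
    os.environ["OMP_NUM_THREADS"] = str(n_threads)
    os.environ["OMP_SCHEDULE"] = f"static,{chunk}"  # read by schedule(runtime) loops
    return chunk

chunk = configure_spmv_granularity(m_rows=100000, n_threads=64, k_predict=8)
```

The environment variables must be set before the OpenMP runtime initializes; a library-level integration would instead call omp_set_schedule() from the SpMV host code.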
A statistical characteristic value obtaining step S2, which is to analyze the matrix raw data file to obtain a statistical characteristic value of the matrix, where the statistical characteristic value is mainly used to delineate a sparse matrix, and is composed of non-zero-element distribution information of the matrix, and includes: the number of rows of the sparse matrix, the number of columns of the sparse matrix, the non-zero element proportion of the sparse matrix, the minimum number of non-zero elements of the rows of the sparse matrix, the maximum number of non-zero elements of the rows of the sparse matrix, the average number of non-zero elements of each row of the sparse matrix and the standard deviation of the average number of non-zero elements of each row of the sparse matrix.
An optimal task granularity parameter prediction step S3: the statistical characteristic vector of the sparse matrix to be computed is calculated and input into the model constructed in S1, and the optimal task granularity value for that matrix is obtained by prediction; at the initial start-up stage of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with this matrix as input.
And step S4 is configured, task granularity parameters of the system during parallel operation are adjusted according to the prediction result, the distribution process of SpMV tasks to threads is guided, and the load balance and the overall calculation performance of the parallel program are improved.
FIG. 2 is a schematic diagram of a training process of the sparse matrix vector multiplication task granularity prediction model of the present invention. Firstly, reading a sparse matrix file; then measuring SpMV performance (Gflops) under each task granularity, and marking the corresponding task granularity when the optimal performance is obtained; analyzing the matrix file to obtain statistical characteristic values; and taking the matrix statistical characteristic value and the optimal task granularity value as the input of a machine learning algorithm to construct a prediction model.
Example 2: and the sparse matrix vector multiplication parallel task granularity parameter automatic tuning device.
FIG. 3 is a block diagram of the sparse matrix vector multiplication task granularity parameter automatic tuning device of the present invention, including: the system comprises a prediction model construction module, a statistical characteristic value acquisition module, an optimal task granularity parameter prediction module and a configuration module.
The prediction model construction module is used to construct a prediction model by a machine learning method, namely a prediction model f: X → Y between the statistical characteristic value space X and the parallel task granularity optimal value space Y, wherein x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of the sparse matrix, xi denotes a statistical characteristic value, and the task granularity is denoted by y; under the statistical characteristic vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
the statistical characteristic value obtaining module is used for analyzing the matrix original data file and obtaining the statistical characteristic value of the matrix; the statistical characteristic value is used for describing a sparse matrix and is composed of non-zero element distribution information of the matrix;
the optimal task granularity parameter prediction module is used for inputting the obtained statistical characteristic values into a prediction model for a sparse matrix to be calculated, and predicting the optimal parallel task granularity parameter values of the SpMV program when the matrix statistical characteristic values are used as input; at the initial starting stage of the SpMV program, acquiring a statistical characteristic value output vector x of an input sparse matrix, inputting the x into a prediction model, and outputting a result y0That is, the SpMV uses the matrix as the optimal value of the task granularity at the time of input;
and the configuration module is used for adjusting the task granularity parameters of the system in parallel operation according to the prediction result. The module can guide the distribution process of the SpMV parallel tasks to the threads, and the load balance and the overall operation performance of the SpMV parallel programs are improved.
The prediction model building module specifically executes the following steps:
S11: select matrix statistical features;
S12: generate training data: the training data comprise the statistical characteristic values of each sparse matrix and the optimal task granularity of that matrix; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value giving the highest SpMV running performance is selected as the optimal task granularity; the statistical characteristic values of a sparse matrix are computed from the matrix as it is read in; before training, the matrix statistical characteristic values are standardized;
S13: train the model with a machine learning algorithm: a machine learning method is used to construct a prediction model that captures the relation between the statistical characteristic values of a sparse matrix and the optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities are taken as the input of the machine learning algorithm, which is trained to yield the optimal task granularity prediction model; the training algorithm may be a random forest, a neural network, a support vector machine, or the like.
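Steps S12 and S13 can be sketched as follows. This is an illustrative outline, not the patent's implementation: the timing-based exhaustive search and z-score standardization follow S12, while a trivial nearest-neighbour regressor stands in for the random forest, neural network, or SVM named in S13; all identifiers (`best_granularity`, `standardize`, `NearestNeighborModel`) are hypothetical.

```python
import time

def best_granularity(run_spmv, candidates):
    """S12 search: time one SpMV run per candidate granularity and
    return the value with the shortest wall-clock time."""
    timings = {}
    for g in candidates:
        t0 = time.perf_counter()
        run_spmv(g)
        timings[g] = time.perf_counter() - t0
    return min(timings, key=timings.get)

def standardize(rows):
    """S12 preprocessing: column-wise z-score normalization of the
    feature matrix (list of feature vectors)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-12)
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

class NearestNeighborModel:
    """Stand-in learner for S13: predicts the optimal granularity of
    the closest training matrix in feature space. Any regressor (random
    forest, neural network, SVM) slots in behind the same fit/predict
    interface."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        d = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[d.index(min(d))]
```

In practice the search over candidates would average several timed runs per granularity to suppress timer noise before picking the minimum.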
The matrix statistical features comprise: the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in any row, the maximum number of non-zero elements in any row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
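All seven of these features depend only on where the non-zeros sit, not on their values, so they can be read off the CSR row-pointer array in one pass. A minimal sketch follows; the function name `matrix_features` and the plain-list CSR layout are assumptions for illustration, not part of the patent.

```python
def matrix_features(indptr, n_cols):
    """Compute the seven statistical features listed above from a CSR
    row-pointer array (indptr) and the column count."""
    n_rows = len(indptr) - 1
    per_row = [indptr[i + 1] - indptr[i] for i in range(n_rows)]
    nnz = indptr[-1]                       # total non-zero count
    mean = nnz / n_rows                    # average non-zeros per row
    std = (sum((c - mean) ** 2 for c in per_row) / n_rows) ** 0.5
    return {
        "rows": n_rows,
        "cols": n_cols,
        "nnz_ratio": nnz / (n_rows * n_cols),
        "min_row_nnz": min(per_row),
        "max_row_nnz": max(per_row),
        "mean_row_nnz": mean,
        "std_row_nnz": std,
    }
```

The cost is O(rows), negligible next to even a single SpMV, which is what makes start-up-time prediction practical.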
The present invention is not limited to the specific embodiments shown and described; those skilled in the art can make various modifications and changes without departing from the spirit and scope of the invention.
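The runtime flow described above (S2: extract features at start-up, S3: predict y0, S4: configure the parallel system) can be summarized in a few lines. This is a hypothetical sketch: `model`, the feature vector `x`, and the `apply_granularity` callback stand in for the modules described above.

```python
def tune_at_startup(model, x, apply_granularity):
    """At SpMV start-up: predict the optimal granularity y0 from the
    matrix's (already standardized) feature vector x, then hand it to
    the parallel runtime via the apply_granularity callback."""
    y0 = model.predict(x)    # S3: optimal task granularity prediction
    apply_granularity(y0)    # S4: configure the running parallel system
    return y0
```

The prediction replaces any per-matrix exhaustive search at run time; the search cost is paid once, offline, when building the training set.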
Claims (6)
1. An automatic tuning method for sparse matrix vector multiplication parallel task granularity parameters, characterized by specifically comprising the following steps:
S1: prediction model construction step
A machine learning method is used to construct a prediction model f: X → Y between the statistical characteristic value space X and the space Y of optimal parallel task granularity values, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of the sparse matrix, xi denotes the i-th statistical characteristic value, i = 1, 2, …, n, and y denotes the task granularity; given the statistical characteristic vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 at which the SpMV running performance is highest is the optimal parallel task granularity value, where i and n are positive integers;
S2: statistical characteristic value acquisition step
The raw matrix data file is parsed to obtain the statistical characteristic values of the matrix; the statistical characteristic values describe the sparse matrix and are derived from the distribution of its non-zero elements;
S3: optimal task granularity parameter prediction step
For the sparse matrix to be computed, the obtained statistical characteristic values are fed into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program with that matrix as input; at the initial start-up of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with this matrix as input;
S4: configuration step
The task granularity parameter of the running parallel system is adjusted according to the result of the optimal task granularity parameter prediction step.
2. The automatic tuning method for sparse matrix vector multiplication parallel task granularity parameters of claim 1, wherein the prediction model construction step specifically comprises:
S11: Selecting matrix statistical features
S12: Generating training data
The training data comprise the statistical characteristic values of each sparse matrix and the optimal task granularity of that matrix; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value giving the highest SpMV running performance is selected as the optimal task granularity; the statistical characteristic values of a sparse matrix are computed from the matrix as it is read in; before training, the matrix statistical characteristic values are standardized;
S13: Training the model with a machine learning algorithm
A machine learning method is used to construct a prediction model that captures the relation between the statistical characteristic values of a sparse matrix and the optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities are taken as the input of the machine learning algorithm, which is trained to yield the optimal task granularity prediction model; the prediction model is a random forest, a neural network, or a support vector machine.
3. The automatic tuning method for sparse matrix vector multiplication parallel task granularity parameters of claim 1 or 2, wherein the matrix statistical features comprise: the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in any row, the maximum number of non-zero elements in any row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
4. A machine-learning-based automatic tuning apparatus for sparse matrix vector multiplication parallel task granularity parameters, characterized by comprising:
a prediction model construction module, which uses a machine learning method to construct a prediction model f: X → Y between the statistical characteristic value space X and the space Y of optimal parallel task granularity values, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of the sparse matrix, xi denotes the i-th statistical characteristic value, i = 1, 2, …, n, and y denotes the task granularity; given the statistical characteristic vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 at which the SpMV running performance is highest is the optimal parallel task granularity value, where i and n are positive integers;
a statistical characteristic value acquisition module, which parses the raw matrix data file and obtains the statistical characteristic values of the matrix; the statistical characteristic values describe the sparse matrix and are derived from the distribution of its non-zero elements;
an optimal task granularity parameter prediction module, which, for the sparse matrix to be computed, feeds the obtained statistical characteristic values into the prediction model and predicts the optimal parallel task granularity parameter value of the SpMV program with that matrix as input; at the initial start-up of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with this matrix as input;
and a configuration module, which adjusts the task granularity parameter of the running parallel system according to the prediction result.
5. The machine-learning-based automatic tuning apparatus for sparse matrix vector multiplication parallel task granularity parameters of claim 4, wherein the prediction model construction module specifically executes the following steps:
S11: Selecting matrix statistical features
S12: Generating training data
The training data comprise the statistical characteristic values of each sparse matrix and the optimal task granularity of that matrix; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value giving the highest SpMV running performance is selected as the optimal task granularity; the statistical characteristic values of a sparse matrix are computed from the matrix as it is read in; before training, the matrix statistical characteristic values are standardized;
S13: Training the model with a machine learning algorithm
A machine learning method is used to construct a prediction model that captures the relation between the statistical characteristic values of a sparse matrix and the optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities are taken as the input of the machine learning algorithm, which is trained to yield the optimal task granularity prediction model; the prediction model training algorithm may be a random forest, a neural network, a support vector machine, or the like.
6. The machine-learning-based automatic tuning apparatus for sparse matrix vector multiplication parallel task granularity parameters of claim 4 or 5, wherein the matrix statistical features comprise: the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in any row, the maximum number of non-zero elements in any row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010701971 | 2020-07-20 | ||
CN2020107019714 | 2020-07-20 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111984418A true CN111984418A (en) | 2020-11-24 |
CN111984418B CN111984418B (en) | 2022-09-02 |
Family
ID=73441027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010880655.8A Active CN111984418B (en) | 2020-07-20 | 2020-08-27 | Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111984418B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104636273A (en) * | 2015-02-28 | 2015-05-20 | University of Science and Technology of China | Storage method of sparse matrix on SIMD multi-core processor with multi-level cache |
US20180189239A1 (en) * | 2016-12-31 | 2018-07-05 | Intel Corporation | Heterogeneous hardware accelerator architecture for processing sparse matrix data with skewed non-zero distributions |
CN109993683A (en) * | 2017-12-29 | 2019-07-09 | Intel Corporation | Machine learning sparse computation mechanism, arithmetic micro-architecture, and sparsity-aware training mechanism for arbitrary neural networks |
CN111428192A (en) * | 2020-03-19 | 2020-07-17 | Hunan University | Method and system for optimizing sparse matrix vector multiplication on high-performance computing architectures |
Non-Patent Citations (1)
Title |
---|
Zhang Kai: "Research on efficient matrix operation techniques on vector SIMD DSPs", Ph.D. dissertation in Engineering, National University of Defense Technology *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113360188A (en) * | 2021-05-18 | 2021-09-07 | China University of Petroleum (Beijing) | Parallel processing method and device for optimizing sparse matrix-vector multiplication |
CN113360188B (en) * | 2021-05-18 | 2023-10-31 | China University of Petroleum (Beijing) | Parallel processing method and device for optimizing sparse matrix-vector multiplication |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||