CN111984418A - Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks - Google Patents


Info

Publication number
CN111984418A
CN111984418A
Authority
CN
China
Prior art keywords
matrix
sparse matrix
statistical characteristic
task granularity
value
Prior art date
Legal status
Granted
Application number
CN202010880655.8A
Other languages
Chinese (zh)
Other versions
CN111984418B (en)
Inventor
方建滨
黄春
唐滔
彭林
张鹏
范小康
崔英博
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Publication of CN111984418A
Application granted
Publication of CN111984418B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the field of parallel computing and discloses a method and device for automatically tuning the parallel task granularity parameter of sparse matrix-vector multiplication (SpMV). The method comprises: a prediction model construction step, in which a prediction model is built using a machine learning method; a statistical characteristic value acquisition step, in which the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix; an optimal task granularity parameter prediction step, in which the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input matrix; and a configuration step, in which the task granularity of the system at run time is adjusted according to the prediction result. The device comprises a prediction model construction module, a statistical characteristic value acquisition module, an optimal task granularity parameter prediction module, and a configuration module. By adaptively selecting the SpMV parallel task granularity for different input matrices, the invention improves the load balance and overall computing performance of the parallel program.

Description

Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks
Technical Field
The invention relates to parallel program task allocation technology, and in particular to a method and device for automatically tuning the task granularity parameter of a sparse matrix-vector multiplication parallel program.
Background
In the fields of scientific computing and artificial intelligence, Sparse Matrix-Vector Multiplication (SpMV) is widely used as a basic operator, and the corresponding computation module is among the most time-consuming parts of software in these fields. Unlike dense matrices, sparse matrices contain only a small number of non-zero elements; most entries are zero. These zeros do not affect the computed result, yet accessing and operating on them incurs extra overhead, resulting in low computational efficiency.
Researchers therefore exploit the sparsity of the matrix and store only its non-zero elements in a compressed storage format, avoiding the processing of zero elements and reducing the storage and memory-access overhead of the matrix. Common sparse matrix storage formats include Coordinate List (COO), Compressed Sparse Row (CSR), ELLPACK (ELL), Hybrid ELL+COO (HYB), and others. Both the sparse structure of the matrix and the storage format used affect sparse matrix computation performance.
The non-zero element distribution of sparse matrices arising in practical applications is irregular, and the memory hierarchy of a computer system is relatively complex; together these pose a great challenge to performance optimization of sparse matrix computations. Current optimization work proceeds along two main lines. On one hand, new sparse matrix storage formats and corresponding SpMV algorithms are introduced, reorganizing the layout of non-zero elements to make full use of the processor's cache and wide vector units, thereby accounting for both the sparse structure of the matrix and the architecture of the underlying hardware. On the other hand, parallelization methods divide the sparse matrix computation into tasks distributed across a parallel computer system for concurrent execution. These methods introduce a large number of configuration parameters and create a huge optimization space; finding the optimal configuration parameters by exhaustive search is clearly infeasible in such a space. It is therefore of great significance to study automatic performance tuning methods for obtaining optimal configuration parameters for sparse matrix-vector multiplication.
In particular, since there is no dependency between the results of any two rows in SpMV, the accumulation for each row can be treated as an independent subtask, so SpMV is easy to parallelize. Suppose the processor has t threads. By default, the m rows of the sparse matrix are divided into t subtasks, each thread being responsible for one subtask of size g = m/t (the task granularity). When the task granularity is g = m/(2·t), each thread is responsible for 2 tasks; when g = m/(4·t), each thread is responsible for 4 tasks; and so on, so that when g = m/(K·t), each thread is responsible for K tasks. Different task granularities yield different assignments of tasks to threads and affect load balance among threads. As the number of processor threads and the size of the matrix grow, the number of possible task-to-thread assignments and task granularities increases sharply, producing a huge optimization space. Therefore, for a given sparse matrix data set and multi-core processor platform, an optimal task granularity prediction model is needed to balance the computation and memory-access loads among threads, so that the sparse matrix data set, the SpMV parallel computation tasks, and the hardware platform are optimally matched, the computing potential of the multi-core processor is exploited to the greatest extent, and the computational efficiency of the SpMV parallel tasks is improved.
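The granularity arithmetic above can be sketched in a few lines. The following Python snippet (an illustrative sketch, not the patent's implementation; the helper name chunk_tasks is an assumption) shows how a choice of K maps m rows onto K·t chunks of granularity g:

```python
def chunk_tasks(m, t, K):
    """Split m matrix rows into chunks of granularity g = m // (K * t).

    With t threads, each thread ends up responsible for K chunks,
    mirroring the task-granularity scheme described above.
    (Hypothetical helper for illustration only.)
    """
    g = max(m // (K * t), 1)                          # task granularity: rows per chunk
    chunks = [(lo, min(lo + g, m)) for lo in range(0, m, g)]
    return g, chunks

# Example: m = 1024 rows, t = 8 threads, K = 4 gives g = 32 and 32 chunks
g, chunks = chunk_tasks(1024, 8, 4)
```

Each `(lo, hi)` pair is a half-open row range handed to one task; raising K shrinks g and produces more, smaller tasks per thread.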
Disclosure of Invention
The aim of the invention is to automatically tune the task granularity parameter of the system at run time by constructing a model, thereby improving the load balance and overall performance of sparse matrix-vector multiplication (SpMV) parallel programs on multi-core processors.
The invention is realized as follows: a machine-learning-based method for automatically tuning the parallel task granularity parameter of sparse matrix-vector multiplication, comprising the following steps:
S1, prediction model construction step: using a machine learning method, a prediction model f: X → Y is constructed between the statistical characteristic value space X and the optimal parallel task granularity value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical feature vector of the sparse matrix, xi denotes a statistical characteristic value with index i = 1, 2, …, n, and y denotes the task granularity; given the statistical feature vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
S2, statistical characteristic value acquisition step: the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
S3, optimal task granularity parameter prediction step: for the sparse matrix to be computed, the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and S4, configuration step: the task granularity parameter of the system at run time is adjusted according to the result of the optimal task granularity parameter prediction step. This step guides the assignment of SpMV parallel tasks to threads and improves the load balance and overall efficiency of the parallel program.
The step of constructing the prediction model specifically comprises the following steps:
S11, selecting matrix statistical characteristics;
and S12, generating training data: the data required for training comprise the statistical characteristic values of each sparse matrix and the corresponding optimal task granularity; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV performance is selected as the optimal task granularity; the statistical characteristic values selected in S11 can be computed from the matrix as it is read; before training, the matrix statistical characteristic values are standardized;
S13, training the model with a machine learning algorithm: a machine learning method is used to build a prediction model capturing the relation between the statistical characteristic values of a sparse matrix and its optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities serve as input to the machine learning algorithm, which is trained to obtain the optimal task granularity prediction model; the training algorithm may be, for example, a random forest, a neural network, or a support vector machine.
The matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row.
A machine-learning-based device for automatically tuning the parallel task granularity parameter of sparse matrix-vector multiplication, comprising:
a prediction model construction module, which uses a machine learning method to construct a prediction model f: X → Y between the statistical characteristic value space X and the optimal parallel task granularity value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical feature vector of the sparse matrix, xi denotes a statistical characteristic value with index i = 1, 2, …, n, and y denotes the task granularity; given the statistical feature vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
a statistical characteristic value acquisition module, which analyzes the raw matrix data file to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
an optimal task granularity parameter prediction module, which for the sparse matrix to be computed inputs the obtained statistical characteristic values into the prediction model and predicts the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and a configuration module, which adjusts the task granularity parameter of the system at run time according to the prediction result. This module guides the assignment of SpMV parallel tasks to threads and improves the load balance and overall performance of SpMV parallel programs.
The prediction model building module specifically executes the following steps:
S11, selecting matrix statistical characteristics;
and S12, generating training data: the data required for training comprise the statistical characteristic values of each sparse matrix and the corresponding optimal task granularity value; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV performance is selected as the optimal task granularity; the statistical characteristic values of the sparse matrix can be computed from the matrix as it is read; before training, the matrix statistical characteristic values are standardized;
S13, training the model with a machine learning algorithm: a machine learning method is used to build a prediction model capturing the relation between the statistical characteristic values of a sparse matrix and its optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities serve as input to the machine learning algorithm, which is trained to obtain the optimal task granularity prediction model; the training algorithm may be, for example, a random forest, a neural network, or a support vector machine.
The matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row.
The invention has the beneficial effects that:
and a prediction model is established by using a machine learning method, in the initial stage of SpMV starting, the value is calculated according to the statistical characteristics of the input matrix, the optimal value of the task granularity is predicted, the task granularity parameter of the system in parallel operation is configured according to the prediction result, the load balance of the SpMV task on the multi-core processor is improved, and therefore better operation performance is obtained. The experimental results show that, relative to the default task granularity, the average performance improvement of about 35% can be obtained by using the task granularity value selected by the prediction model.
Drawings
FIG. 1 is a flow chart of the sparse matrix vector multiplication task granularity parameter automatic tuning method of the present invention;
FIG. 2 is a schematic diagram of a training process of the sparse matrix vector multiplication task granularity prediction model of the present invention;
FIG. 3 is a schematic diagram of the sparse matrix vector multiplication task granularity parameter automatic tuning device of the present invention.
Detailed Description
The technical solution of the present invention will be described in detail with reference to the following examples.
Example 1: a sparse matrix vector multiplication parallel task granularity parameter automatic tuning method.
Fig. 1 is a flowchart of the sparse matrix vector multiplication task granularity parameter automatic tuning method of the present invention, which includes: the method comprises the steps of prediction model construction, statistical characteristic value acquisition, optimal task granularity parameter prediction and configuration.
S1, a prediction model construction step, namely constructing a prediction model by using a machine learning method, and constructing a prediction model f between the statistical characteristic value space X and the parallel task granularity optimal value space Y: x → Y, wherein X (X) is used1,x2,…,xi,…xn) To represent the n-dimensional statistical eigenvectors x, x of the sparse matrixiRepresenting the value of statistical characteristic, using y to represent the task granularity, and using the statistical characteristic vector x (x)1,x2,…,xi,…xn) Under the condition, the parallel task granularity value y with the highest SpMV running performance0The granularity of the parallel task is optimally taken, wherein i and n are positive integers;
S2, statistical characteristic value acquisition step: the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
S3, optimal task granularity parameter prediction step: for the sparse matrix to be computed, the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and S4, configuration step: the task granularity parameter of the parallel runtime system is adjusted according to the prediction result. This step guides the assignment of SpMV parallel tasks to threads and improves load balance.
The step of constructing the prediction model specifically comprises the following steps:
S11, selecting matrix statistical characteristics; the matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row;
in practical implementation, for the acquisition of the statistical characteristics of the matrix, a Python script program is used for processing the original data file of the matrix, and corresponding characteristic values are extracted or calculated.
S12, generating training data; the data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
in the actual implementation process, before training, in order to eliminate dimensional influence among indexes, accelerate the speed of solving the optimal solution by gradient descent and improve the accuracy of the model, the statistical characteristic value of the matrix is standardized by using a StandardScale () built-in function of Python. The function uses the formula (X-mean)/std to bring the eigenvalues around 0 with a variance of 1.
To generate the training data, the SpMV performance at different task granularities must be measured for each input data set. Taking a 64-thread hardware platform with Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz processors as an example, K takes the values 1, 2, 4, 6, 8, 10, 12, 14, 16 (9 values in total); the SpMV running performance is tested under each value of K, and the task allocation granularity K_best achieving the best performance is recorded. In an embodiment of the invention, the 1989 sparse matrices with more than 1024 rows in the University of Florida sparse matrix collection were used as the training data set. For each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV performance (Gflops) is selected as the optimal task granularity.
In this embodiment, the statistical characteristic value of the input matrix and the optimal task allocation granularity together form a training data set. The number of samples of the data set is equal to the number of input matrices.
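The exhaustive search over K can be sketched as follows. This is a simplified stand-in: SciPy's serial SpMV is timed instead of the parallel OpenMP kernel, so K has no real effect on the measurement here; the loop structure and the Gflops metric (2 floating-point operations per non-zero) are what the sketch illustrates:

```python
import time

import numpy as np
from scipy.sparse import random as sparse_random

def spmv_gflops(A, x, reps=5):
    """Measure SpMV throughput in Gflop/s, counting 2 flops per non-zero."""
    start = time.perf_counter()
    for _ in range(reps):
        _ = A @ x
    elapsed = (time.perf_counter() - start) / reps
    return 2 * A.nnz / elapsed / 1e9

A = sparse_random(2048, 2048, density=0.01, format="csr", random_state=0)
x = np.ones(A.shape[1])
candidates = [1, 2, 4, 6, 8, 10, 12, 14, 16]   # the 9 values of K from the text
# In the real setup the parallel SpMV kernel is re-run with each granularity K;
# here every K sees the same serial kernel, so only the search shape is shown.
K_best = max(candidates, key=lambda K: spmv_gflops(A, x))
```

In the actual tuning run, `spmv_gflops` would invoke the OpenMP SpMV configured with granularity K, and the (feature vector, K_best) pair for each matrix becomes one training sample.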
S13, training the model by using a machine learning algorithm; a machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the prediction model constructed in this embodiment is a random forest model, and other models may also be used, such as: neural networks, support vector machines, etc.
When training the model, considering that insufficient training data may affect the accuracy of the prediction model, the data are randomly split into a training set (80%) and a test set (20%); the model is tuned on the training set by cross validation, and the performance of the trained model is evaluated on the test set. Model tuning uses k-fold cross validation, whose principle can be summarized as follows: the data set D is first divided into k mutually exclusive subsets of similar size, i.e.
D = D1 ∪ D2 ∪ … ∪ Dk, with Di ∩ Dj = ∅ for i ≠ j.
Each subset Di maintains a data distribution as consistent as possible with D, i.e. it is obtained from D by stratified sampling. Each time, the union of k-1 subsets is used as the training set and the remaining subset as the validation set; this yields k training/validation splits, so k rounds of training and validation can be performed, and finally the mean of the k results is returned. This implementation adopts the most common 10-fold cross validation; cross validation of the model can be performed by calling the cross_val_score() function available in Python (scikit-learn) to obtain the model's average accuracy. Finally, the trained model is evaluated on the 20% test set.
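A minimal sketch of this training and evaluation protocol, assuming scikit-learn; the synthetic features and labels are stand-ins for the real matrix feature vectors and optimal-K labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))                    # 7 statistical features per matrix
y = (X[:, 0] + X[:, 6] > 0).astype(int)          # synthetic stand-in for optimal-K labels

# 80% training / 20% held-out test split, as in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=10)   # 10-fold cross validation
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)                     # final evaluation on the 20%
```

Swapping `RandomForestClassifier` for a neural network or support vector machine, as the text allows, changes only the `model` line.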
The trained model is invoked by SpMV in the form of a library. Specifically, at the initial stage of SpMV start-up, the matrix statistical characteristic values are computed from the raw input matrix as it is read. These values are then fed to the optimal task granularity prediction model, which computes the optimal task allocation granularity K_predict. This parameter is passed to the OpenMP runtime system to set the task granularity and guide the assignment of SpMV tasks to threads.
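The runtime call sequence might look like the following sketch. This is hypothetical glue code: the helper name configure_granularity and the toy stand-in model are assumptions; in the patent the predicted K configures the OpenMP runtime, e.g. as the chunk size of the loop schedule:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the model trained in step S13
rng = np.random.default_rng(0)
model = RandomForestClassifier(random_state=0).fit(
    rng.normal(size=(50, 7)),                  # 7 statistical features
    rng.choice([1, 2, 4, 8], size=50))         # candidate K values as labels

def configure_granularity(model, features, m, t):
    """Predict K for one input matrix and derive the per-task chunk size
    (rows per task), which would then configure the OpenMP schedule.
    Hypothetical helper, not the patent's API."""
    K_pred = int(model.predict(np.asarray(features, float).reshape(1, -1))[0])
    chunk = max(m // (K_pred * t), 1)          # g = m / (K * t)
    return K_pred, chunk

K_pred, chunk = configure_granularity(model, rng.normal(size=7), m=4096, t=8)
```

In a C/OpenMP host, the returned chunk size would correspond to something like `#pragma omp parallel for schedule(dynamic, chunk)` around the row loop of the SpMV kernel.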
Statistical characteristic value acquisition step S2: the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix. These values characterize the sparse matrix and are composed of its non-zero element distribution information, comprising: the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row.
Optimal task granularity parameter prediction step S3: the statistical characteristic value vector of the sparse matrix to be computed is calculated and input into the model constructed in S1, which predicts the optimal task granularity value for that matrix; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input.
Configuration step S4: the task granularity parameter of the system at run time is adjusted according to the prediction result, guiding the assignment of SpMV tasks to threads and improving the load balance and overall computing performance of the parallel program.
FIG. 2 is a schematic diagram of a training process of the sparse matrix vector multiplication task granularity prediction model of the present invention. Firstly, reading a sparse matrix file; then measuring SpMV performance (Gflops) under each task granularity, and marking the corresponding task granularity when the optimal performance is obtained; analyzing the matrix file to obtain statistical characteristic values; and taking the matrix statistical characteristic value and the optimal task granularity value as the input of a machine learning algorithm to construct a prediction model.
Example 2: and the sparse matrix vector multiplication parallel task granularity parameter automatic tuning device.
FIG. 3 is a block diagram of the sparse matrix vector multiplication task granularity parameter automatic tuning device of the present invention, including: the system comprises a prediction model construction module, a statistical characteristic value acquisition module, an optimal task granularity parameter prediction module and a configuration module.
The prediction model construction module uses a machine learning method to construct a prediction model f: X → Y between the statistical characteristic value space X and the optimal parallel task granularity value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic value vector of the sparse matrix, xi denotes a statistical characteristic value, and y denotes the task granularity; given the statistical feature vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
the statistical characteristic value acquisition module analyzes the raw matrix data file to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
the optimal task granularity parameter prediction module, for the sparse matrix to be computed, inputs the obtained statistical characteristic values into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and the configuration module is used for adjusting the task granularity parameter of the system at parallel run time according to the prediction result. This module guides the distribution of SpMV parallel tasks to threads, improving the load balance and overall running performance of the SpMV parallel program.
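The predict-then-configure flow of the last two modules might look as follows. `StubModel` and the use of the OpenMP `OMP_SCHEDULE` variable as the granularity knob are hypothetical stand-ins, since the text leaves both the trained model and the concrete configuration mechanism abstract:

```python
import os

class StubModel:
    """Hypothetical stand-in for the trained prediction model."""
    def predict(self, X):
        return [256]  # pretend the model chose granularity 256

def configure_task_granularity(model, features):
    """Feed the matrix feature vector to the model at SpMV start-up and
    apply the predicted granularity as the runtime scheduling chunk size."""
    g = int(model.predict([features])[0])
    os.environ["OMP_SCHEDULE"] = f"dynamic,{g}"  # illustrative knob
    return g

# Feature order: rows, cols, nnz ratio, row-nnz min/max/mean/std.
g = configure_task_granularity(StubModel(), [1000, 800, 0.01, 1, 40, 8.0, 3.2])
```

Setting the schedule's chunk size to the predicted granularity is one way the module could steer the assignment of row blocks to threads.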
The prediction model building module specifically executes the following steps:
S11, selecting matrix statistical characteristics;
and S12, generating training data: the data required for training comprise the statistical characteristic values of each sparse matrix and the corresponding optimal task granularity; for each sparse matrix, the task granularity value space is traversed by exhaustive search and the value yielding the highest SpMV running performance is selected as the optimal task granularity; the statistical characteristic values of a sparse matrix are computed directly from the matrix as it is read; before training, the matrix statistical characteristic values are standardized;
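The exhaustive search in S12 can be sketched as follows. Modeling the task granularity as the row-chunk size of a blocked SpMV, and timing a single serial pass as the performance metric, are illustrative assumptions; the text does not prescribe an implementation language or timing method:

```python
import time
import numpy as np
from scipy.sparse import random as sparse_random

def spmv_time(A, x, granularity):
    """Time one SpMV pass with rows processed in chunks of `granularity`
    (a serial stand-in for the parallel task-granularity parameter)."""
    t0 = time.perf_counter()
    y = np.empty(A.shape[0])
    for start in range(0, A.shape[0], granularity):
        end = min(start + granularity, A.shape[0])
        y[start:end] = A[start:end] @ x  # CSR row-slice times dense vector
    return time.perf_counter() - t0

def best_granularity(A, x, candidates):
    """Traverse the granularity value space exhaustively and return the
    value with the lowest measured time (i.e. highest performance)."""
    return min(candidates, key=lambda g: spmv_time(A, x, g))

A = sparse_random(2000, 2000, density=0.01, format="csr", random_state=1)
x = np.ones(A.shape[1])
g_opt = best_granularity(A, x, [32, 64, 128, 256, 512])
```

In a real training run this search would be repeated over a corpus of sparse matrices, pairing each matrix's feature vector with its measured optimum.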
S13, training the model with a machine learning algorithm: a machine learning method is used to build a prediction model that captures the relation between the statistical characteristic values of a sparse matrix and the optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities are used as the input of the machine learning algorithm, which is trained to obtain the optimal task granularity prediction model; the training algorithm may be a random forest, a neural network, a support vector machine, or the like.
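A minimal sketch of S13 with scikit-learn, using a random forest (one of the model families named above) together with the standardization mentioned in S12. The feature matrix and granularity labels here are random stand-ins for real measured training data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 7))                       # 7 statistical features per matrix
y = rng.choice([64, 128, 256, 512], size=200)  # exhaustively-searched optima

# Standardize the features, then fit the random forest on
# (feature vector, optimal granularity) pairs.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=50, random_state=0))
model.fit(X, y)
pred = model.predict(X[:3])
```

Treating the finite set of candidate granularities as class labels makes this a classification task; a regression model over a continuous granularity range would be an equally valid reading of the text.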
The matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in a row, the maximum number of non-zero elements in a row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
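The seven characteristics listed above can all be computed in one pass over the matrix; the following sketch assumes SciPy's CSR format (an illustrative choice, not prescribed by the text):

```python
import numpy as np
from scipy.sparse import random as sparse_random

def matrix_features(A):
    """Compute the seven matrix statistical characteristics listed above
    for a SciPy sparse matrix A."""
    A = A.tocsr()
    n_rows, n_cols = A.shape
    nnz_per_row = np.diff(A.indptr)  # non-zero count of each row
    return {
        "rows": n_rows,
        "cols": n_cols,
        "nnz_ratio": A.nnz / (n_rows * n_cols),
        "row_nnz_min": int(nnz_per_row.min()),
        "row_nnz_max": int(nnz_per_row.max()),
        "row_nnz_mean": float(nnz_per_row.mean()),
        "row_nnz_std": float(nnz_per_row.std()),
    }

# Example on a random 1000 x 800 matrix with 1% density.
A = sparse_random(1000, 800, density=0.01, format="csr", random_state=0)
feats = matrix_features(A)
```

Because the CSR `indptr` array already encodes row boundaries, the per-row counts come from a single `diff`, so feature extraction adds negligible overhead at SpMV start-up.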
The present invention is not intended to be limited to the specific embodiments shown and described, and various modifications and changes can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (6)

1. A sparse matrix vector multiplication parallel task granularity parameter automatic tuning method, characterized by comprising the following steps:
s1: step of constructing prediction model
Using machinesThe learning method constructs a prediction model, and a prediction model f is constructed between a statistical characteristic value space X and a parallel task granularity optimal value space Y: x → Y, wherein X (X) is used1,x2,…,xi,…xn) To represent the n-dimensional statistical eigenvectors x, x of the sparse matrixiDenotes the value of statistical characteristic, the index i is 1,2, …, n, the task granularity is expressed by y, and the value is in the statistical characteristic vector x (x)1,x2,…,xi,…xn) Under the condition, the parallel task granularity value y with the highest SpMV running performance0The granularity of the parallel task is optimally taken, wherein i and n are positive integers;
s2: statistical characteristic value obtaining step
Analyzing the matrix original data file to obtain the statistical characteristic value of the matrix; the statistical characteristic value is used for describing a sparse matrix and is composed of non-zero element distribution information of the matrix;
s3: optimal task granularity parameter prediction step
For a sparse matrix to be calculated, inputting the obtained statistical characteristic values into a prediction model, and predicting the optimal parallel task granularity parameter values of the SpMV program when the matrix statistical characteristic values are used as input; at the initial starting stage of the SpMV program, acquiring a statistical characteristic value vector x of an input sparse matrix, inputting the x into a prediction model, and outputting a result y0That is, the SpMV uses the matrix as the optimal value of the task granularity at the time of input;
s4: step of configuration
And adjusting the task granularity parameters of the system in parallel operation according to the prediction result of the optimal task granularity parameter prediction step.
2. The sparse matrix vector multiplication parallel task granularity parameter automatic tuning method of claim 1, wherein the prediction model construction step specifically comprises,
s11: selecting matrix statistical features
S12: generating training data
The data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
s13: training models using machine learning algorithms
A machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the used prediction model is a random forest model, a neural network or a support vector machine.
3. The sparse matrix vector multiplication parallel task granularity parameter automatic tuning method of claim 1 or 2, characterized in that: the matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in a row, the maximum number of non-zero elements in a row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
4. A machine-learning-based sparse matrix vector multiplication parallel task granularity parameter automatic tuning device, characterized by comprising:
the prediction model building module is used for building a prediction model with a machine learning method: a model f: X → Y is constructed between the statistical characteristic value space X and the parallel task granularity optimal value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of a sparse matrix and xi denotes a statistical characteristic value with index i = 1, 2, …, n; the task granularity is denoted by y, and the parallel task granularity value y0 that yields the highest SpMV running performance under the statistical characteristic vector x = (x1, x2, …, xi, …, xn) is the optimal parallel task granularity, where i and n are positive integers;
the statistical characteristic value obtaining module is used for parsing the raw matrix data file and obtaining the statistical characteristic values of the matrix; these values describe the sparse matrix and are derived from the distribution of its non-zero elements;
the optimal task granularity parameter prediction module is used for feeding the obtained statistical characteristic values of a sparse matrix to be computed into the prediction model, which predicts the optimal parallel task granularity parameter value of the SpMV program for that matrix; at the initial start-up of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with that matrix as input;
and the configuration module is used for adjusting the task granularity parameters of the system in parallel operation according to the prediction result.
5. The machine-learning-based sparse matrix vector multiplication parallel task granularity parameter automatic tuning device of claim 4, wherein the prediction model construction module specifically executes the following steps:
s11: selecting matrix statistical features
S12: generating training data
The data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
s13: training models using machine learning algorithms
A machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the used prediction model training algorithm is a random forest model, a neural network or a support vector machine and the like.
6. The machine-learning-based sparse matrix vector multiplication parallel task granularity parameter auto-tuning apparatus of claim 4 or 5, characterized in that: the matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in a row, the maximum number of non-zero elements in a row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
CN202010880655.8A 2020-07-20 2020-08-27 Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks Active CN111984418B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010701971 2020-07-20
CN2020107019714 2020-07-20

Publications (2)

Publication Number Publication Date
CN111984418A true CN111984418A (en) 2020-11-24
CN111984418B CN111984418B (en) 2022-09-02

Family

ID=73441027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010880655.8A Active CN111984418B (en) 2020-07-20 2020-08-27 Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks

Country Status (1)

Country Link
CN (1) CN111984418B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636273A (en) * 2015-02-28 2015-05-20 University of Science and Technology of China Storage method of sparse matrix on SIMD multi-core processor with multi-level cache
US20180189239A1 (en) * 2016-12-31 2018-07-05 Intel Corporation Heterogeneous hardware accelerator architecture for processing sparse matrix data with skewed non-zero distributions
CN109993683A (en) * 2017-12-29 2019-07-09 Intel Corporation Machine learning sparse computation mechanism, arithmetic micro-architecture, and sparsity-aware training mechanism for arbitrary neural networks
CN111428192A (en) * 2020-03-19 2020-07-17 Hunan University Method and system for optimizing high performance computing architecture sparse matrix vector multiplication


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Kai: "Research on Efficient Matrix Computation Techniques on Vector SIMD DSPs", PhD dissertation in Engineering, National University of Defense Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360188A (en) * 2021-05-18 2021-09-07 China University of Petroleum (Beijing) Parallel processing method and device for optimizing sparse matrix-vector multiplication
CN113360188B (en) * 2021-05-18 2023-10-31 China University of Petroleum (Beijing) Parallel processing method and device for optimizing sparse matrix-vector multiplication

Also Published As

Publication number Publication date
CN111984418B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
Borlea et al. A unified form of fuzzy C-means and K-means algorithms and its partitional implementation
Khaleghzadeh et al. A novel data-partitioning algorithm for performance optimization of data-parallel applications on heterogeneous HPC platforms
Li et al. Performance analysis of GPU-based convolutional neural networks
Benatia et al. Sparse matrix format selection with multiclass SVM for SpMV on GPU
CN107644063B (en) Time sequence analysis method and system based on data parallelism
CN112101525A (en) Method, device and system for designing neural network through NAS
Daghero et al. Energy-efficient deep learning inference on edge devices
Liang et al. OMNI: A framework for integrating hardware and software optimizations for sparse CNNs
Neelima et al. Predicting an optimal sparse matrix format for SpMV computation on GPU
Benatia et al. Machine learning approach for the predicting performance of SpMV on GPU
CN111984418B (en) Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks
Moreno et al. Improving the performance and energy of non-dominated sorting for evolutionary multiobjective optimization on GPU/CPU platforms
Bai et al. Dnnabacus: Toward accurate computational cost prediction for deep neural networks
Daoudi et al. A Comparative study of parallel CPU/GPU implementations of the K-Means Algorithm
Ni et al. Online performance and power prediction for edge TPU via comprehensive characterization
CN115344386A (en) Method, device and equipment for predicting cloud simulation computing resources based on sequencing learning
He et al. HOME: A holistic GPU memory management framework for deep learning
CN112686342B (en) Training method, device and equipment of SVM (support vector machine) model and computer-readable storage medium
CN112083929B (en) Performance-energy consumption collaborative optimization method and device for power constraint system
Iskandar et al. Near-data-processing architectures performance estimation and ranking using machine learning predictors
Elafrou et al. A lightweight optimization selection method for Sparse Matrix-Vector Multiplication
Wang et al. A novel parallel algorithm for sparse tensor matrix chain multiplication via tcu-acceleration
Glavan et al. Cloud environment assessment using clustering techniques on microservices dataset
Song et al. DNN training acceleration via exploring GPGPU friendly sparsity
CN116341628B (en) Gradient sparsification method, system, equipment and storage medium for distributed training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant