CN111984418A - Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks - Google Patents


Info

Publication number
CN111984418A
CN111984418A
Authority
CN
China
Prior art keywords
matrix
sparse matrix
statistical characteristic
task granularity
value
Prior art date
Legal status
Granted
Application number
CN202010880655.8A
Other languages
Chinese (zh)
Other versions
CN111984418B (en)
Inventor
方建滨
黄春
唐滔
彭林
张鹏
范小康
崔英博
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Publication of CN111984418A
Application granted
Publication of CN111984418B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the field of parallel computing and discloses a method and device for automatically tuning the parallel task granularity parameter of sparse matrix-vector multiplication (SpMV). The method comprises: a prediction model construction step, in which a prediction model is built using a machine learning method; a statistical characteristic value acquisition step, in which the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix; an optimal task granularity parameter prediction step, in which the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input matrix; and a configuration step, in which the task granularity of the system at run time is adjusted according to the prediction result. The device comprises a prediction model construction module, a statistical characteristic value acquisition module, an optimal task granularity parameter prediction module, and a configuration module. By adaptively selecting the SpMV parallel task granularity for different input matrices, the invention improves the load balance and overall computing performance of the parallel program.

Description

Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks
Technical Field
The invention relates to parallel program task allocation technology, and in particular to a method and device for automatically tuning the task granularity parameter of a sparse matrix-vector multiplication parallel program.
Background
In the fields of scientific computing and artificial intelligence, Sparse Matrix-Vector Multiplication (SpMV) is widely used as a basic operator, and the corresponding computation module is among the most time-consuming parts of software in these fields. Unlike dense matrices, sparse matrices contain only a small number of non-zero elements; most entries are zero. These zeros do not affect the computed result, yet accessing and operating on them incurs extra overhead, resulting in low computational efficiency.
Researchers therefore exploit the sparsity of the matrix and store only its non-zero elements in a compressed storage format, avoiding the processing of zero elements and reducing the storage and memory-access overhead of the matrix. Common sparse matrix storage formats include Coordinate List (COO), Compressed Sparse Row (CSR), ELLPACK (ELL), Hybrid ELL+COO (HYB), and others. Both the sparse structure of the matrix and the storage format used affect sparse matrix computation performance.
The non-zero element distribution of sparse matrices arising in practical applications is irregular, and the memory hierarchy of a computer system is relatively complex; together these pose a great challenge to performance optimization of sparse matrix computations. Current optimization work proceeds along two main lines. On one hand, new sparse matrix storage formats and corresponding SpMV algorithms are introduced, reorganizing the layout of non-zero elements to make full use of the processor's cache and wide vector units, thereby accounting for both the sparse structure of the matrix and the architecture of the underlying hardware. On the other hand, parallelization methods divide the sparse matrix computation into tasks distributed across a parallel computer system for concurrent execution. These methods introduce a large number of configuration parameters and create a huge optimization space; finding the optimal configuration parameters by exhaustive search is clearly infeasible in such a space. It is therefore of great significance to study automatic performance tuning methods for obtaining optimal configuration parameters for sparse matrix-vector multiplication.
In particular, since there is no dependency between the results of any two rows in SpMV, the accumulation for each row can be treated as an independent subtask, so SpMV is easy to parallelize. Suppose the processor has t threads. By default, the m rows of the sparse matrix are divided into t subtasks, each thread being responsible for one subtask of size g = m/t (the task granularity). When the task granularity is g = m/(2·t), each thread is responsible for 2 tasks; when g = m/(4·t), each thread is responsible for 4 tasks; and so on, so that when g = m/(K·t), each thread is responsible for K tasks. Different task granularities yield different assignments of tasks to threads and affect load balance among threads. As the number of processor threads and the size of the matrix grow, the number of possible task-to-thread assignments and task granularities increases sharply, producing a huge optimization space. Therefore, for a given sparse matrix data set and multi-core processor platform, an optimal task granularity prediction model is needed to balance the computation and memory-access loads among threads, so that the sparse matrix data set, the SpMV parallel computation tasks, and the hardware platform are optimally matched, the computing potential of the multi-core processor is exploited to the greatest extent, and the computational efficiency of the SpMV parallel tasks is improved.
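The granularity arithmetic above can be sketched in a few lines. The following Python snippet (an illustrative sketch, not the patent's implementation; the helper name chunk_tasks is an assumption) shows how a choice of K maps m rows onto K·t chunks of granularity g:

```python
def chunk_tasks(m, t, K):
    """Split m matrix rows into chunks of granularity g = m // (K * t).

    With t threads, each thread ends up responsible for K chunks,
    mirroring the task-granularity scheme described above.
    (Hypothetical helper for illustration only.)
    """
    g = max(m // (K * t), 1)                          # task granularity: rows per chunk
    chunks = [(lo, min(lo + g, m)) for lo in range(0, m, g)]
    return g, chunks

# Example: m = 1024 rows, t = 8 threads, K = 4 gives g = 32 and 32 chunks
g, chunks = chunk_tasks(1024, 8, 4)
```

Each `(lo, hi)` pair is a half-open row range handed to one task; raising K shrinks g and produces more, smaller tasks per thread.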
Disclosure of Invention
The aim of the invention is to automatically tune the task granularity parameter of the system at run time by constructing a model, thereby improving the load balance and overall performance of sparse matrix-vector multiplication (SpMV) parallel programs on multi-core processors.
The invention is realized as follows: a machine-learning-based method for automatically tuning the parallel task granularity parameter of sparse matrix-vector multiplication, comprising the following steps:
S1, prediction model construction step: using a machine learning method, a prediction model f: X → Y is constructed between the statistical characteristic value space X and the optimal parallel task granularity value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical feature vector of the sparse matrix, xi denotes a statistical characteristic value with index i = 1, 2, …, n, and y denotes the task granularity; given the statistical feature vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
S2, statistical characteristic value acquisition step: the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
S3, optimal task granularity parameter prediction step: for the sparse matrix to be computed, the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and S4, configuration step: the task granularity parameter of the system at run time is adjusted according to the result of the optimal task granularity parameter prediction step. This step guides the assignment of SpMV parallel tasks to threads and improves the load balance and overall efficiency of the parallel program.
The step of constructing the prediction model specifically comprises the following steps:
S11, selecting matrix statistical characteristics;
and S12, generating training data: the data required for training comprise the statistical characteristic values of each sparse matrix and the corresponding optimal task granularity; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV performance is selected as the optimal task granularity; the statistical characteristic values selected in S11 can be computed from the matrix as it is read; before training, the matrix statistical characteristic values are standardized;
S13, training the model with a machine learning algorithm: a machine learning method is used to build a prediction model capturing the relation between the statistical characteristic values of a sparse matrix and its optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities serve as input to the machine learning algorithm, which is trained to obtain the optimal task granularity prediction model; the training algorithm may be, for example, a random forest, a neural network, or a support vector machine.
The matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row.
A machine-learning-based device for automatically tuning the parallel task granularity parameter of sparse matrix-vector multiplication, comprising:
a prediction model construction module, which uses a machine learning method to construct a prediction model f: X → Y between the statistical characteristic value space X and the optimal parallel task granularity value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical feature vector of the sparse matrix, xi denotes a statistical characteristic value with index i = 1, 2, …, n, and y denotes the task granularity; given the statistical feature vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
a statistical characteristic value acquisition module, which analyzes the raw matrix data file to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
an optimal task granularity parameter prediction module, which for the sparse matrix to be computed inputs the obtained statistical characteristic values into the prediction model and predicts the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and a configuration module, which adjusts the task granularity parameter of the system at run time according to the prediction result. This module guides the assignment of SpMV parallel tasks to threads and improves the load balance and overall performance of SpMV parallel programs.
The prediction model building module specifically executes the following steps:
S11, selecting matrix statistical characteristics;
and S12, generating training data: the data required for training comprise the statistical characteristic values of each sparse matrix and the corresponding optimal task granularity value; for each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV performance is selected as the optimal task granularity; the statistical characteristic values of the sparse matrix can be computed from the matrix as it is read; before training, the matrix statistical characteristic values are standardized;
S13, training the model with a machine learning algorithm: a machine learning method is used to build a prediction model capturing the relation between the statistical characteristic values of a sparse matrix and its optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities serve as input to the machine learning algorithm, which is trained to obtain the optimal task granularity prediction model; the training algorithm may be, for example, a random forest, a neural network, or a support vector machine.
The matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row.
The invention has the beneficial effects that:
and a prediction model is established by using a machine learning method, in the initial stage of SpMV starting, the value is calculated according to the statistical characteristics of the input matrix, the optimal value of the task granularity is predicted, the task granularity parameter of the system in parallel operation is configured according to the prediction result, the load balance of the SpMV task on the multi-core processor is improved, and therefore better operation performance is obtained. The experimental results show that, relative to the default task granularity, the average performance improvement of about 35% can be obtained by using the task granularity value selected by the prediction model.
Drawings
FIG. 1 is a flow chart of the sparse matrix vector multiplication task granularity parameter automatic tuning method of the present invention;
FIG. 2 is a schematic diagram of a training process of the sparse matrix vector multiplication task granularity prediction model of the present invention;
FIG. 3 is a schematic diagram of the sparse matrix vector multiplication task granularity parameter automatic tuning device of the present invention.
Detailed Description
The technical solution of the present invention will be described in detail with reference to the following examples.
Example 1: a sparse matrix vector multiplication parallel task granularity parameter automatic tuning method.
Fig. 1 is a flowchart of the sparse matrix vector multiplication task granularity parameter automatic tuning method of the present invention, which includes: the method comprises the steps of prediction model construction, statistical characteristic value acquisition, optimal task granularity parameter prediction and configuration.
S1, a prediction model construction step, namely constructing a prediction model by using a machine learning method, and constructing a prediction model f between the statistical characteristic value space X and the parallel task granularity optimal value space Y: x → Y, wherein X (X) is used1,x2,…,xi,…xn) To represent the n-dimensional statistical eigenvectors x, x of the sparse matrixiRepresenting the value of statistical characteristic, using y to represent the task granularity, and using the statistical characteristic vector x (x)1,x2,…,xi,…xn) Under the condition, the parallel task granularity value y with the highest SpMV running performance0The granularity of the parallel task is optimally taken, wherein i and n are positive integers;
S2, statistical characteristic value acquisition step: the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
S3, optimal task granularity parameter prediction step: for the sparse matrix to be computed, the obtained statistical characteristic values are input into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and S4, configuration step: the task granularity parameter of the parallel runtime system is adjusted according to the prediction result. This step guides the assignment of SpMV parallel tasks to threads and improves load balance.
The step of constructing the prediction model specifically comprises the following steps:
S11, selecting matrix statistical characteristics; the matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row;
in practical implementation, for the acquisition of the statistical characteristics of the matrix, a Python script program is used for processing the original data file of the matrix, and corresponding characteristic values are extracted or calculated.
S12, generating training data; the data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
in the actual implementation process, before training, in order to eliminate dimensional influence among indexes, accelerate the speed of solving the optimal solution by gradient descent and improve the accuracy of the model, the statistical characteristic value of the matrix is standardized by using a StandardScale () built-in function of Python. The function uses the formula (X-mean)/std to bring the eigenvalues around 0 with a variance of 1.
To generate the training data, the SpMV performance at different task granularities must be measured for each input data set. Taking a 64-thread hardware platform with Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz processors as an example, K takes the values 1, 2, 4, 6, 8, 10, 12, 14, 16 (9 values in total); the SpMV running performance is tested under each value of K, and the task allocation granularity K_best achieving the best performance is recorded. In an embodiment of the invention, the 1989 sparse matrices with more than 1024 rows in the University of Florida sparse matrix collection were used as the training data set. For each sparse matrix, the task granularity value space is traversed by exhaustive search, and the value yielding the highest SpMV performance (Gflops) is selected as the optimal task granularity.
In this embodiment, the statistical characteristic value of the input matrix and the optimal task allocation granularity together form a training data set. The number of samples of the data set is equal to the number of input matrices.
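The exhaustive search over K can be sketched as follows. This is a simplified stand-in: SciPy's serial SpMV is timed instead of the parallel OpenMP kernel, so K has no real effect on the measurement here; the loop structure and the Gflops metric (2 floating-point operations per non-zero) are what the sketch illustrates:

```python
import time

import numpy as np
from scipy.sparse import random as sparse_random

def spmv_gflops(A, x, reps=5):
    """Measure SpMV throughput in Gflop/s, counting 2 flops per non-zero."""
    start = time.perf_counter()
    for _ in range(reps):
        _ = A @ x
    elapsed = (time.perf_counter() - start) / reps
    return 2 * A.nnz / elapsed / 1e9

A = sparse_random(2048, 2048, density=0.01, format="csr", random_state=0)
x = np.ones(A.shape[1])
candidates = [1, 2, 4, 6, 8, 10, 12, 14, 16]   # the 9 values of K from the text
# In the real setup the parallel SpMV kernel is re-run with each granularity K;
# here every K sees the same serial kernel, so only the search shape is shown.
K_best = max(candidates, key=lambda K: spmv_gflops(A, x))
```

In the actual tuning run, `spmv_gflops` would invoke the OpenMP SpMV configured with granularity K, and the (feature vector, K_best) pair for each matrix becomes one training sample.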
S13, training the model by using a machine learning algorithm; a machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the prediction model constructed in this embodiment is a random forest model, and other models may also be used, such as: neural networks, support vector machines, etc.
When training the model, considering that insufficient training data may affect the accuracy of the prediction model, the data are randomly split into a training set (80%) and a test set (20%); the model is tuned on the training set by cross validation, and the performance of the trained model is evaluated on the test set. Model tuning uses k-fold cross validation, whose principle can be summarized as follows: the data set D is first divided into k mutually exclusive subsets of similar size, i.e.
D = D1 ∪ D2 ∪ … ∪ Dk, with Di ∩ Dj = ∅ for i ≠ j.
Each subset Di maintains a data distribution as consistent as possible with D, i.e. it is obtained from D by stratified sampling. Each time, the union of k-1 subsets is used as the training set and the remaining subset as the validation set; this yields k training/validation splits, so k rounds of training and validation can be performed, and finally the mean of the k results is returned. This implementation adopts the most common 10-fold cross validation; cross validation of the model can be performed by calling the cross_val_score() function available in Python (scikit-learn) to obtain the model's average accuracy. Finally, the trained model is evaluated on the 20% test set.
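A minimal sketch of this training and evaluation protocol, assuming scikit-learn; the synthetic features and labels are stand-ins for the real matrix feature vectors and optimal-K labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))                    # 7 statistical features per matrix
y = (X[:, 0] + X[:, 6] > 0).astype(int)          # synthetic stand-in for optimal-K labels

# 80% training / 20% held-out test split, as in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=10)   # 10-fold cross validation
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)                     # final evaluation on the 20%
```

Swapping `RandomForestClassifier` for a neural network or support vector machine, as the text allows, changes only the `model` line.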
The trained model is invoked by SpMV in the form of a library. Specifically, at the initial stage of SpMV start-up, the matrix statistical characteristic values are computed from the raw input matrix as it is read. These values are then fed to the optimal task granularity prediction model, which computes the optimal task allocation granularity K_predict. This parameter is passed to the OpenMP runtime system to set the task granularity and guide the assignment of SpMV tasks to threads.
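The runtime call sequence might look like the following sketch. This is hypothetical glue code: the helper name configure_granularity and the toy stand-in model are assumptions; in the patent the predicted K configures the OpenMP runtime, e.g. as the chunk size of the loop schedule:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the model trained in step S13
rng = np.random.default_rng(0)
model = RandomForestClassifier(random_state=0).fit(
    rng.normal(size=(50, 7)),                  # 7 statistical features
    rng.choice([1, 2, 4, 8], size=50))         # candidate K values as labels

def configure_granularity(model, features, m, t):
    """Predict K for one input matrix and derive the per-task chunk size
    (rows per task), which would then configure the OpenMP schedule.
    Hypothetical helper, not the patent's API."""
    K_pred = int(model.predict(np.asarray(features, float).reshape(1, -1))[0])
    chunk = max(m // (K_pred * t), 1)          # g = m / (K * t)
    return K_pred, chunk

K_pred, chunk = configure_granularity(model, rng.normal(size=7), m=4096, t=8)
```

In a C/OpenMP host, the returned chunk size would correspond to something like `#pragma omp parallel for schedule(dynamic, chunk)` around the row loop of the SpMV kernel.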
Statistical characteristic value acquisition step S2: the raw matrix data file is analyzed to obtain the statistical characteristic values of the matrix. These values characterize the sparse matrix and are composed of its non-zero element distribution information, comprising: the number of rows of the sparse matrix, the number of columns, the proportion of non-zero elements, the minimum number of non-zeros in a row, the maximum number of non-zeros in a row, the average number of non-zeros per row, and the standard deviation of the number of non-zeros per row.
Optimal task granularity parameter prediction step S3: the statistical characteristic value vector of the sparse matrix to be computed is calculated and input into the model constructed in S1, which predicts the optimal task granularity value for that matrix; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input.
Configuration step S4: the task granularity parameter of the system at run time is adjusted according to the prediction result, guiding the assignment of SpMV tasks to threads and improving the load balance and overall computing performance of the parallel program.
FIG. 2 is a schematic diagram of a training process of the sparse matrix vector multiplication task granularity prediction model of the present invention. Firstly, reading a sparse matrix file; then measuring SpMV performance (Gflops) under each task granularity, and marking the corresponding task granularity when the optimal performance is obtained; analyzing the matrix file to obtain statistical characteristic values; and taking the matrix statistical characteristic value and the optimal task granularity value as the input of a machine learning algorithm to construct a prediction model.
Example 2: and the sparse matrix vector multiplication parallel task granularity parameter automatic tuning device.
FIG. 3 is a block diagram of the sparse matrix vector multiplication task granularity parameter automatic tuning device of the present invention, including: the system comprises a prediction model construction module, a statistical characteristic value acquisition module, an optimal task granularity parameter prediction module and a configuration module.
The prediction model construction module uses a machine learning method to construct a prediction model f: X → Y between the statistical characteristic value space X and the optimal parallel task granularity value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic value vector of the sparse matrix, xi denotes a statistical characteristic value, and y denotes the task granularity; given the statistical feature vector x = (x1, x2, …, xi, …, xn), the parallel task granularity value y0 that yields the highest SpMV running performance is the optimal parallel task granularity value, where i and n are positive integers;
the statistical characteristic value acquisition module analyzes the raw matrix data file to obtain the statistical characteristic values of the matrix; the statistical characteristic values characterize the sparse matrix and are composed of the matrix's non-zero element distribution information;
the optimal task granularity parameter prediction module, for the sparse matrix to be computed, inputs the obtained statistical characteristic values into the prediction model to predict the optimal parallel task granularity parameter value of the SpMV program for that input; at the initial start-up stage of the SpMV program, the statistical characteristic value vector x of the input sparse matrix is obtained and fed to the prediction model, whose output y0 is the optimal task granularity value for SpMV with this matrix as input;
and the configuration module is used for adjusting the task granularity parameter of the system at parallel run time according to the prediction result. This module guides the distribution of SpMV parallel tasks to threads, improving the load balance and overall running performance of the SpMV parallel program.
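The predict-then-configure flow of the last two modules might look as follows. `StubModel` and the use of the OpenMP `OMP_SCHEDULE` variable as the granularity knob are hypothetical stand-ins, since the text leaves both the trained model and the concrete configuration mechanism abstract:

```python
import os

class StubModel:
    """Hypothetical stand-in for the trained prediction model."""
    def predict(self, X):
        return [256]  # pretend the model chose granularity 256

def configure_task_granularity(model, features):
    """Feed the matrix feature vector to the model at SpMV start-up and
    apply the predicted granularity as the runtime scheduling chunk size."""
    g = int(model.predict([features])[0])
    os.environ["OMP_SCHEDULE"] = f"dynamic,{g}"  # illustrative knob
    return g

# Feature order: rows, cols, nnz ratio, row-nnz min/max/mean/std.
g = configure_task_granularity(StubModel(), [1000, 800, 0.01, 1, 40, 8.0, 3.2])
```

Setting the schedule's chunk size to the predicted granularity is one way the module could steer the assignment of row blocks to threads.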
The prediction model building module specifically executes the following steps:
S11, selecting matrix statistical characteristics;
and S12, generating training data: the data required for training comprise the statistical characteristic values of each sparse matrix and the corresponding optimal task granularity; for each sparse matrix, the task granularity value space is traversed by exhaustive search and the value yielding the highest SpMV running performance is selected as the optimal task granularity; the statistical characteristic values of a sparse matrix are computed directly from the matrix as it is read; before training, the matrix statistical characteristic values are standardized;
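The exhaustive search in S12 can be sketched as follows. Modeling the task granularity as the row-chunk size of a blocked SpMV, and timing a single serial pass as the performance metric, are illustrative assumptions; the text does not prescribe an implementation language or timing method:

```python
import time
import numpy as np
from scipy.sparse import random as sparse_random

def spmv_time(A, x, granularity):
    """Time one SpMV pass with rows processed in chunks of `granularity`
    (a serial stand-in for the parallel task-granularity parameter)."""
    t0 = time.perf_counter()
    y = np.empty(A.shape[0])
    for start in range(0, A.shape[0], granularity):
        end = min(start + granularity, A.shape[0])
        y[start:end] = A[start:end] @ x  # CSR row-slice times dense vector
    return time.perf_counter() - t0

def best_granularity(A, x, candidates):
    """Traverse the granularity value space exhaustively and return the
    value with the lowest measured time (i.e. highest performance)."""
    return min(candidates, key=lambda g: spmv_time(A, x, g))

A = sparse_random(2000, 2000, density=0.01, format="csr", random_state=1)
x = np.ones(A.shape[1])
g_opt = best_granularity(A, x, [32, 64, 128, 256, 512])
```

In a real training run this search would be repeated over a corpus of sparse matrices, pairing each matrix's feature vector with its measured optimum.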
S13, training the model with a machine learning algorithm: a machine learning method is used to build a prediction model that captures the relation between the statistical characteristic values of a sparse matrix and the optimal task granularity value; the matrix statistical characteristic values and the optimal task granularities are used as the input of the machine learning algorithm, which is trained to obtain the optimal task granularity prediction model; the training algorithm may be a random forest, a neural network, a support vector machine, or the like.
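A minimal sketch of S13 with scikit-learn, using a random forest (one of the model families named above) together with the standardization mentioned in S12. The feature matrix and granularity labels here are random stand-ins for real measured training data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 7))                       # 7 statistical features per matrix
y = rng.choice([64, 128, 256, 512], size=200)  # exhaustively-searched optima

# Standardize the features, then fit the random forest on
# (feature vector, optimal granularity) pairs.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=50, random_state=0))
model.fit(X, y)
pred = model.predict(X[:3])
```

Treating the finite set of candidate granularities as class labels makes this a classification task; a regression model over a continuous granularity range would be an equally valid reading of the text.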
The matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in a row, the maximum number of non-zero elements in a row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
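The seven characteristics listed above can all be computed in one pass over the matrix; the following sketch assumes SciPy's CSR format (an illustrative choice, not prescribed by the text):

```python
import numpy as np
from scipy.sparse import random as sparse_random

def matrix_features(A):
    """Compute the seven matrix statistical characteristics listed above
    for a SciPy sparse matrix A."""
    A = A.tocsr()
    n_rows, n_cols = A.shape
    nnz_per_row = np.diff(A.indptr)  # non-zero count of each row
    return {
        "rows": n_rows,
        "cols": n_cols,
        "nnz_ratio": A.nnz / (n_rows * n_cols),
        "row_nnz_min": int(nnz_per_row.min()),
        "row_nnz_max": int(nnz_per_row.max()),
        "row_nnz_mean": float(nnz_per_row.mean()),
        "row_nnz_std": float(nnz_per_row.std()),
    }

# Example on a random 1000 x 800 matrix with 1% density.
A = sparse_random(1000, 800, density=0.01, format="csr", random_state=0)
feats = matrix_features(A)
```

Because the CSR `indptr` array already encodes row boundaries, the per-row counts come from a single `diff`, so feature extraction adds negligible overhead at SpMV start-up.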
The present invention is not intended to be limited to the specific embodiments shown and described, and various modifications and changes can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (6)

1. A sparse matrix vector multiplication parallel task granularity parameter automatic tuning method, characterized by comprising the following steps:
s1: step of constructing prediction model
Using machinesThe learning method constructs a prediction model, and a prediction model f is constructed between a statistical characteristic value space X and a parallel task granularity optimal value space Y: x → Y, wherein X (X) is used1,x2,…,xi,…xn) To represent the n-dimensional statistical eigenvectors x, x of the sparse matrixiDenotes the value of statistical characteristic, the index i is 1,2, …, n, the task granularity is expressed by y, and the value is in the statistical characteristic vector x (x)1,x2,…,xi,…xn) Under the condition, the parallel task granularity value y with the highest SpMV running performance0The granularity of the parallel task is optimally taken, wherein i and n are positive integers;
s2: statistical characteristic value obtaining step
Analyzing the matrix original data file to obtain the statistical characteristic value of the matrix; the statistical characteristic value is used for describing a sparse matrix and is composed of non-zero element distribution information of the matrix;
s3: optimal task granularity parameter prediction step
For a sparse matrix to be calculated, inputting the obtained statistical characteristic values into a prediction model, and predicting the optimal parallel task granularity parameter values of the SpMV program when the matrix statistical characteristic values are used as input; at the initial starting stage of the SpMV program, acquiring a statistical characteristic value vector x of an input sparse matrix, inputting the x into a prediction model, and outputting a result y0That is, the SpMV uses the matrix as the optimal value of the task granularity at the time of input;
s4: step of configuration
And adjusting the task granularity parameters of the system in parallel operation according to the prediction result of the optimal task granularity parameter prediction step.
2. The sparse matrix vector multiplication parallel task granularity parameter automatic tuning method of claim 1, wherein the prediction model construction step specifically comprises,
s11: selecting matrix statistical features
S12: generating training data
The data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
s13: training models using machine learning algorithms
A machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the used prediction model is a random forest model, a neural network or a support vector machine.
3. The sparse matrix vector multiplication parallel task granularity parameter automatic tuning method of claim 1 or 2, characterized in that: the matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in a row, the maximum number of non-zero elements in a row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
4. A machine-learning-based sparse matrix vector multiplication parallel task granularity parameter automatic tuning device, characterized by comprising:
the prediction model building module is used for building a prediction model with a machine learning method: a model f: X → Y is constructed between the statistical characteristic value space X and the parallel task granularity optimal value space Y, where x = (x1, x2, …, xi, …, xn) denotes the n-dimensional statistical characteristic vector of a sparse matrix and xi denotes a statistical characteristic value with index i = 1, 2, …, n; the task granularity is denoted by y, and the parallel task granularity value y0 that yields the highest SpMV running performance under the statistical characteristic vector x = (x1, x2, …, xi, …, xn) is the optimal parallel task granularity, where i and n are positive integers;
the statistical characteristic value obtaining module is used for parsing the raw matrix data file and obtaining the statistical characteristic values of the matrix; these values describe the sparse matrix and are derived from the distribution of its non-zero elements;
the optimal task granularity parameter prediction module is used for feeding the obtained statistical characteristic values of a sparse matrix to be computed into the prediction model, which predicts the optimal parallel task granularity parameter value of the SpMV program for that matrix; at the initial start-up of the SpMV program, the statistical characteristic vector x of the input sparse matrix is obtained and fed into the prediction model, and the output y0 is the optimal task granularity value for SpMV with that matrix as input;
and the configuration module is used for adjusting the task granularity parameters of the system in parallel operation according to the prediction result.
5. The machine-learning-based sparse matrix vector multiplication parallel task granularity parameter automatic tuning device of claim 4, wherein the prediction model construction module specifically executes the following steps:
s11: selecting matrix statistical features
S12: generating training data
The data required by training comprises statistical characteristic values of the sparse matrix and the optimal task granularity of the corresponding matrix; traversing a task granularity value space for each sparse matrix by using an exhaustive search method, and selecting a value which enables SpMV operation performance to be the highest as an optimal value of the task granularity; the statistical characteristic value of the sparse matrix can be obtained by calculation from the read sparse matrix; before training, carrying out standardization processing on the matrix statistical characteristic values;
s13: training models using machine learning algorithms
A machine learning method is used for constructing a prediction model for identifying the relation between the statistical characteristic value of the sparse matrix and the optimal task granularity value; taking the matrix statistical characteristic value and the optimal task granularity as the input of a machine learning algorithm, and training to obtain an optimal task granularity prediction model; the used prediction model training algorithm is a random forest model, a neural network or a support vector machine and the like.
6. The machine-learning-based sparse matrix vector multiplication parallel task granularity parameter auto-tuning apparatus of claim 4 or 5, characterized in that: the matrix statistical characteristics comprise the number of rows of the sparse matrix, the number of columns, the ratio of non-zero elements, the minimum number of non-zero elements in a row, the maximum number of non-zero elements in a row, the average number of non-zero elements per row, and the standard deviation of the number of non-zero elements per row.
CN202010880655.8A 2020-07-20 2020-08-27 Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks Active CN111984418B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010701971 2020-07-20
CN2020107019714 2020-07-20

Publications (2)

Publication Number Publication Date
CN111984418A true CN111984418A (en) 2020-11-24
CN111984418B CN111984418B (en) 2022-09-02

Family

ID=73441027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010880655.8A Active CN111984418B (en) 2020-07-20 2020-08-27 Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks

Country Status (1)

Country Link
CN (1) CN111984418B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636273A (en) * 2015-02-28 2015-05-20 University of Science and Technology of China Storage method of sparse matrix on SIMD multi-core processor with multi-level cache
US20180189239A1 (en) * 2016-12-31 2018-07-05 Intel Corporation Heterogeneous hardware accelerator architecture for processing sparse matrix data with skewed non-zero distributions
CN109993683A (en) * 2017-12-29 2019-07-09 Intel Corporation Machine learning sparse computation mechanism, arithmetic micro-architecture, and sparsity-aware training mechanism for arbitrary neural networks
CN111428192A (en) * 2020-03-19 2020-07-17 Hunan University Method and system for optimizing high performance computing architecture sparse matrix vector multiplication


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Kai: "Research on Efficient Matrix Computation Techniques on Vector SIMD DSPs", PhD dissertation in Engineering, National University of Defense Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113360188A (en) * 2021-05-18 2021-09-07 China University of Petroleum (Beijing) Parallel processing method and device for optimizing sparse matrix-vector multiplication
CN113360188B (en) * 2021-05-18 2023-10-31 China University of Petroleum (Beijing) Parallel processing method and device for optimizing sparse matrix-vector multiplication

Also Published As

Publication number Publication date
CN111984418B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
Borlea et al. A unified form of fuzzy C-means and K-means algorithms and its partitional implementation
Khaleghzadeh et al. A novel data-partitioning algorithm for performance optimization of data-parallel applications on heterogeneous HPC platforms
Li et al. Performance analysis of GPU-based convolutional neural networks
Benatia et al. Sparse matrix format selection with multiclass SVM for SpMV on GPU
CN107644063B (en) Time sequence analysis method and system based on data parallelism
CN112101525A (en) Method, device and system for designing neural network through NAS
Daghero et al. Energy-efficient deep learning inference on edge devices
Liang et al. OMNI: A framework for integrating hardware and software optimizations for sparse CNNs
Neelima et al. Predicting an optimal sparse matrix format for SpMV computation on GPU
Benatia et al. Machine learning approach for the predicting performance of SpMV on GPU
CN111984418B (en) Automatic adjusting and optimizing method and device for granularity parameters of sparse matrix vector multiplication parallel tasks
Moreno et al. Improving the performance and energy of non-dominated sorting for evolutionary multiobjective optimization on GPU/CPU platforms
Bai et al. Dnnabacus: Toward accurate computational cost prediction for deep neural networks
Daoudi et al. A Comparative study of parallel CPU/GPU implementations of the K-Means Algorithm
Ni et al. Online performance and power prediction for edge TPU via comprehensive characterization
CN115344386A (en) Method, device and equipment for predicting cloud simulation computing resources based on sequencing learning
He et al. HOME: A holistic GPU memory management framework for deep learning
CN112686342B (en) Training method, device and equipment of SVM (support vector machine) model and computer-readable storage medium
CN112083929B (en) Performance-energy consumption collaborative optimization method and device for power constraint system
Iskandar et al. Near-data-processing architectures performance estimation and ranking using machine learning predictors
Elafrou et al. A lightweight optimization selection method for Sparse Matrix-Vector Multiplication
Wang et al. A novel parallel algorithm for sparse tensor matrix chain multiplication via tcu-acceleration
Glavan et al. Cloud environment assessment using clustering techniques on microservices dataset
Song et al. DNN training acceleration via exploring GPGPU friendly sparsity
CN116341628B (en) Gradient sparsification method, system, equipment and storage medium for distributed training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant