CN112529157B - Sparse tensor storage format automatic selection method based on convolutional neural network - Google Patents
- Publication number: CN112529157B (application CN202011430624.9A)
- Authority
- CN
- China
- Prior art keywords
- tensor
- sparse
- matrix
- storage format
- dimension
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a sparse tensor storage format automatic selection method based on a convolutional neural network, comprising the following steps: 1) reducing the multidimensional tensor into two-dimensional matrices by two conversion methods, flattening (unfolding) and mapping; 2) scaling each matrix to a fixed size by a density representation or a histogram representation; 3) taking the fixed-size matrices as the input of a Convolutional Neural Network (CNN), whose structure is designed and customized for automatic selection of a sparse tensor storage format; 4) training the CNN by supervised learning to obtain a trained network model; 5) taking a new sparse tensor as the input of the network model and obtaining the optimal storage format of the tensor after forward propagation. The method exploits the strengths of CNNs on classification problems, combined with a feed-forward neural network (FFNN), to predict the optimal sparse tensor storage format. It effectively converts a sparse tensor into fixed-size matrix input while fully retaining the tensor's characteristics, and is applicable to automatic selection of the sparse format of a high-order tensor under arbitrary tensor calculations.
Description
Technical Field
The invention relates to the fields of convolutional neural networks, sparse tensor storage formats, tensor calculation and the like, in particular to an automatic sparse tensor storage format selection method based on the convolutional neural networks.
Background
Tensors generally represent high-dimensional data beyond two dimensions. Multidimensional tensors are widely applied in scientific computing, numerical analysis, machine learning, and other fields. Since real-world tensors are typically large and very sparse, many existing efforts optimize the performance of tensor computations based on computational patterns and operational dependencies. Although parallelization can significantly improve the performance of tensor computations, it is still limited by sparsity patterns and hardware features. Thus, prior work has proposed diverse sparse tensor formats that improve computational performance through storage layouts and algorithms co-designed for sparsity and hardware. However, because sparsity patterns are complex and hardware characteristics diverse, the optimal sparse storage format for tensor computation varies greatly. Determining the optimal storage format for tensor computation, across different sparse tensors and diverse hardware platforms, is therefore challenging.
The storage format selection of a sparse tensor can be seen as a classification problem. Such problems are well suited to deep learning techniques, which have proven their effectiveness in image classification and object detection. In particular, Convolutional Neural Networks (CNNs) have gained tremendous popularity in classification tasks due to their ability to capture the underlying features of input data without human intervention. However, such methods cannot be directly applied to tensor format selection due to the high dimensionality of the data to be processed. Although high-dimensional convolution has been proposed, it is not suitable for automatic selection of sparse tensor storage formats, for two main reasons. First, the irregularity of tensor data reduces the computational efficiency of convolution operations, which brings unacceptable training overhead; second, popular deep learning frameworks support at most three-dimensional convolution layers, which cannot meet the requirements of higher-dimensional tensors. Existing solutions only support automatic selection of the optimal sparse storage format for matrix operations, and can be summarized in the following two aspects:
(1) traditional machine learning method
Work in this direction extracts sparse features of the matrix as input parameters and trains classifiers or regression models to predict the best format for a given input. Commonly used classifiers include Decision Trees (DTs), multi-class Support Vector Machines (SVMs), rule sets, and the like. Commonly used regression models include linear regression models and tree-based regression models. The matrix sparsity features extracted by these methods are coarse and cannot effectively reflect the fine-grained spatial distribution of non-zeros, so the prediction accuracy for the optimal sparse format is limited.
(2) Deep learning method
Work in this direction converts irregular sparse matrices into fixed-size matrices by scaling methods. A fixed-size matrix can be used as the input of a convolutional neural network, and multiple matrices can be viewed as different channels of an image. In addition, the sparse features of the matrix can be fed into a feature layer of the neural network, supplementing the sparse spatial distribution of the matrix lost during the scaling process. These methods currently only support sparse matrices and do not support prediction of the optimal storage format for higher-order data. Since tensors typically store data in higher dimensions, they cannot be fed directly to a convolutional neural network, which typically processes no more than three dimensions (such as the width, height, and channels of image data).
In summary, existing solutions, whether based on the conventional machine learning method or the deep learning method, only support sparse matrices and do not support automatic selection of storage formats of higher-order tensors. The format selection of the sparse tensor presents special challenges to the convolutional neural network: 1) in order to be matched with a two-dimensional convolutional neural network, a high-dimensional tensor needs to be reduced into a matrix; 2) the matrix generated after reduction needs to be scaled to be network input with a fixed size under the condition that the sparse mode is not lost as much as possible; 3) the convolutional neural network needs to be redesigned to complement the sparse features lost during tensor conversion.
Disclosure of Invention
The invention solves the following problem: it overcomes the defects of the prior art, provides an automatic selection method for the sparse tensor storage format based on a convolutional neural network, and fully exploits the performance impact of the sparse storage format on tensor calculation. Specifically, the sparse tensor undergoes tensor conversion and feature extraction, and the generated sparse matrices and feature vectors are fed forward through the customized convolutional neural network to predict the optimal storage format of the sparse tensor.
The technical solution of the invention is a sparse tensor storage format automatic selection method based on a convolutional neural network, comprising the following steps:
Step 1: first collect a sparse matrix dataset produced by real-world applications. Then combine the row and column index values of sparse matrices into the higher or lower dimensions of a sparse tensor, finally generating a sparse tensor dataset of a predetermined order;
Step 2: store the sparse tensor dataset in various sparse tensor formats, perform tensor calculation on a predetermined hardware platform to obtain the execution time of each format, and convert the execution times into labels of the corresponding tensor data;
Step 3: reduce the sparse tensor data into a plurality of matrices along a certain dimension by the mapping method, wherein the dimension corresponds to the dimension along which tensor calculation is executed in step 2;
Step 4: reduce the sparse tensor data into a matrix along a certain dimension by the flattening method, wherein the dimension corresponds to the dimension along which tensor calculation is executed in step 2;
Step 5: scale the irregular-size matrices generated in steps 3 and 4 into fixed-size matrices through the density representation or the histogram representation;
step 6: normalizing the values of all elements in the fixed-size matrix to a range of [0,1] by dividing by the maximum value of the elements in the matrix;
Step 7: analyze the sparse tensor data and obtain a candidate feature set of the tensor, wherein the candidate feature set comprises global features and local features of the sparse tensor;
Step 8: form the feature set of each sparse tensor from the normalized matrices generated in step 6 and the candidate feature set generated in step 7, and arrange the feature sets and labels of all tensors into a list by numeric index, thereby forming the training set of the Convolutional Neural Network (CNN);
Step 9: take the training set as the input of the customized CNN and generate a trained network model after the training process;
Step 10: when selecting the optimal storage format for new input sparse tensor data, re-execute steps 3-7 and combine the normalized matrices of the sparse tensor and its candidate feature set into a prediction set for the customized CNN;
Step 11: input the prediction set into the trained network model, output for each sparse tensor storage format the probability of achieving the best performance, and select the format with the maximum probability as the storage format in which the tensor executes the tensor calculation;
step 12: repeating the steps 10-11 until the storage formats of all the sparse tensors to be predicted are selected;
Step 13: if automatic selection of the sparse tensor format is to be realized on a new hardware platform, there are two situations: first, the hardware architecture and software system are the same, in which case the trained model generated in step 9 is retained and step 12 is executed again; second, the hardware architecture and software system differ substantially, in which case the trained model generated in step 9 is discarded and steps 2-12 are executed again;
step 14: if the matrix scaling method needs to be replaced, re-executing the step 5, finely adjusting the network structure of the customized CNN in the step 9, and re-executing the steps 8-12 to automatically select the storage formats of all the sparse tensors to be predicted;
step 15: if the sparse tensor format is to be automatically selected based on other tensor calculation, the step 2 is executed again to obtain a new label of corresponding tensor data, and then the steps 8 to 12 are executed again;
step 16: and if the storage format of the sparse tensor of other orders needs to be automatically selected, fine-tuning the network structure of the customized CNN in the step 9, and re-executing the steps 1-12.
In step 1, the row and column index values of sparse matrices are combined into the higher or lower dimensions of the sparse tensor because openly available real sparse tensor data are too scarce to support CNN training. By contrast, a large amount of real sparse matrix data is available, and two or more sparse matrices can be randomly selected and combined to obtain sufficient sparse tensor data for network training.
In step 2, the execution time is converted into the label of the corresponding tensor data. The specific conversion method is to mark the sparse storage format with the shortest execution time as 1 and all other sparse storage formats as 0, thereby obtaining a bit label for the corresponding tensor.
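The time-to-label conversion described above can be sketched as follows (a minimal illustration; the format names and timings are hypothetical, not from the patent):

```python
import numpy as np

def times_to_label(times_by_format):
    """Convert measured execution times (one per candidate storage
    format) into a one-hot bit label: 1 for the fastest format,
    0 for all others."""
    formats = list(times_by_format)
    times = np.array([times_by_format[f] for f in formats])
    label = np.zeros(len(formats), dtype=int)
    label[times.argmin()] = 1   # shortest execution time wins
    return dict(zip(formats, label))

# Hypothetical timings: COO is fastest, so it gets the 1 bit.
label = times_to_label({"COO": 1.2, "HiCOO": 2.5, "CSF": 1.9})
```

The resulting bit vector is what the network is trained to reproduce for unseen tensors.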
In step 3, the sparse tensor data is reduced into a plurality of matrices along a certain dimension by the mapping method, which captures the vertical (cross-mode) distribution of non-zeros along the mode index. The specific method is as follows:
(1) assume the tensor X is third order and is mapped to the matrix A along the first dimension (Mode-1); specifically, the non-zero values of all slices (Slice) of the tensor along Mode-1 are mapped and accumulated onto the same slice, with every non-zero value regarded as 1 during accumulation. Defining $X \in \mathbb{R}^{I \times J \times K}$ and $A \in \mathbb{R}^{J \times K}$, where I, J, K are the dimensions of the tensor X in the three directions, the matrix A is calculated as $A_{j,k} = \sum_{i=1}^{I} \mathbb{1}[X_{i,j,k} \neq 0]$. The mapping matrices of the tensor X along other dimensions are obtained analogously;
(2) assume the tensor X is of order N and is mapped into several matrices along Mode-1; specifically, the non-zero values are mapped to the same slice with the remaining N-2 indices fixed each time, finally generating $\prod_{n=4}^{N} I_n$ matrices. Defining $X \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where $I_N$ represents the size of the N-th dimension, the formula for mapping the tensor X to the matrix $A^{(i_4,\ldots,i_N)}$ along Mode-1 is $A^{(i_4,\ldots,i_N)}_{i_2,i_3} = \sum_{i_1=1}^{I_1} \mathbb{1}[X_{i_1,i_2,\ldots,i_N} \neq 0]$. The other matrices generated by the tensor X along Mode-1 are obtained analogously.
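The Mode-1 mapping of a third-order tensor can be sketched in a few lines (a minimal numpy sketch; the COO input convention and function name are illustrative assumptions):

```python
import numpy as np

def map_mode1(coords, shape):
    """Map a sparse third-order tensor, given as a list of non-zero
    COO coordinates (i, j, k), to a J x K matrix along Mode-1: every
    non-zero is treated as 1 and accumulated over the first index,
    so A[j, k] counts the non-zeros in fiber (:, j, k)."""
    _, J, K = shape
    A = np.zeros((J, K))
    for (_, j, k) in coords:   # the stored value is ignored; each non-zero counts as 1
        A[j, k] += 1
    return A

coords = [(0, 1, 2), (3, 1, 2), (2, 0, 0)]   # non-zero positions of a 4x2x3 tensor
A = map_mode1(coords, (4, 2, 3))
# A[1, 2] == 2: two non-zeros share the slice indices (j=1, k=2)
```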
In step 4, the sparse tensor data is reduced into a matrix along a certain dimension by the flattening method, which captures the horizontal distribution of non-zeros along the mode index. The tensor is unfolded by flattening (matricization) as follows:
(1) let the tensor X be third order, with each element of X flattened to the corresponding matrix. Defining $X \in \mathbb{R}^{I \times J \times K}$ and $B \in \mathbb{R}^{I \times JK}$, the tensor X is flattened along Mode-1 into the matrix B with the formula $B_{i,\, k \cdot J + j} = X_{i,j,k}$ (indices counted from zero). The flattening matrices of the tensor X along other dimensions are obtained analogously;
(2) let the tensor X be of order N, with each element of X flattened to the corresponding matrix. Defining $X \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and $B \in \mathbb{R}^{I_n \times \prod_{m \neq n} I_m}$, the formula for flattening the tensor X into a matrix along Mode-n maps the tensor element $(i_1, i_2, \ldots, i_N)$ to the matrix element $(i_n, j)$ with $j = \sum_{k \neq n} i_k \prod_{m < k,\, m \neq n} I_m$ (indices counted from zero). The flattening matrices of the tensor X along other dimensions are obtained analogously.
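The Mode-1 flattening formula $B_{i,\, k \cdot J + j} = X_{i,j,k}$ can be sketched as follows (a dense array is used here purely for clarity; a real implementation would remap COO coordinates directly):

```python
import numpy as np

def flatten_mode1(X):
    """Flatten (unfold) a third-order tensor along Mode-1 into an
    I x (J*K) matrix: B[i, k*J + j] = X[i, j, k] (zero-based)."""
    I, J, K = X.shape
    B = np.zeros((I, J * K))
    for i, j, k in zip(*np.nonzero(X)):   # only non-zeros need moving
        B[i, k * J + j] = X[i, j, k]
    return B

X = np.zeros((2, 3, 4))
X[1, 2, 3] = 5.0
B = flatten_mode1(X)
# The single non-zero lands at row 1, column 3*3 + 2 = 11.
```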
In step 5, the irregular-size matrices generated in steps 3 and 4 are scaled to fixed-size matrices by the density representation or the histogram representation; both methods represent the coarse-grained features of the original matrix at an acceptable matrix size. The matrix scaling methods are as follows:
(1) the density representation captures the detailed density differences between different regions of the original matrix. For the density representation, the number of non-zero elements in each block of the original matrix is counted and filled into the scaled fixed-size matrix;
(2) the histogram representation further captures the distance between the elements and the diagonal of the original matrix, but loses some of the sparsity-distribution features. For the histogram representation, the row and column distances of each element of the original matrix from the diagonal are calculated and filled into scaled fixed-size row and column histograms.
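The density representation (combined with the [0, 1] normalization of step 6) can be sketched as follows (an illustrative sketch; the block-partitioning convention via ceiling division is an assumption):

```python
import numpy as np

def density_scale(M, size=4):
    """Scale an arbitrary-size matrix to a fixed size x size density
    matrix: partition M into a size x size grid of blocks, count the
    non-zeros per block, then normalize to [0, 1] by the maximum."""
    rows, cols = M.shape
    br = -(-rows // size)   # block height, i.e. ceil(rows / size)
    bc = -(-cols // size)   # block width
    D = np.zeros((size, size))
    for r, c in zip(*np.nonzero(M)):
        D[r // br, c // bc] += 1
    mx = D.max()
    return D / mx if mx > 0 else D

# An 8x8 identity scaled to 4x4, as in the FIG. 3 example sizes:
D = density_scale(np.eye(8))
# each diagonal 2x2 block holds 2 non-zeros, normalized to 1.0
```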
In step 7, the sparse tensor data are analyzed and candidate feature sets of the tensor are obtained, wherein the candidate feature sets comprise global features and local features of the sparse tensor. The candidate feature set complements the sparse features of the original tensor that were lost during the tensor conversion. In addition, the feature set of the sparse tensor affects the memory layout and computational characteristics under a particular sparse format. The features in the candidate feature set are classified as follows:
(1) global features include the dimension size of the tensor, the Number of Non-zeros (NNZ for short), sparsity, and features associated with NNZ at each index along a dimension. For example, the NNZs under each index are obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZs of all indexes are obtained by further calculation.
(2) Local features are associated with slices and fibers (Fiber), including the number of slices and fibers, the ratio of slices and fibers, the NNZ under each slice and Fiber, and the number of fibers under each slice. For example, the NNZ under each slice is obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZ of all slices are further calculated.
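A few of the global features listed above can be computed directly from a COO coordinate list, as in this illustrative subset (the function name and feature keys are assumptions, not from the patent):

```python
import numpy as np

def global_features(coords, shape):
    """Compute a subset of the global candidate features of a sparse
    tensor in COO form: dimension sizes, NNZ, sparsity, and the
    per-index NNZ statistics along Mode-1 (mean / max / min and mean
    absolute neighbor difference)."""
    coords = np.asarray(coords)
    nnz = len(coords)
    total = int(np.prod(shape))
    # NNZ under each Mode-1 index, obtained by accumulation
    nnz_per_index = np.bincount(coords[:, 0], minlength=shape[0])
    diffs = np.abs(np.diff(nnz_per_index))
    return {
        "dims": tuple(shape),
        "nnz": nnz,
        "sparsity": 1.0 - nnz / total,
        "nnz_idx_mean": nnz_per_index.mean(),
        "nnz_idx_max": int(nnz_per_index.max()),
        "nnz_idx_min": int(nnz_per_index.min()),
        "nnz_idx_adj_diff": diffs.mean() if len(diffs) else 0.0,
    }

feats = global_features([(0, 1, 2), (0, 0, 0), (2, 1, 1)], (3, 2, 3))
```

The local (slice and fiber) features follow the same accumulate-then-summarize pattern over slice and fiber indices.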
In step 9, the training set is used as the input of the customized CNN. The customized neural network (TnsNet) combines a CNN and a Feed-Forward Neural Network (FFNN) to better predict the optimal storage format for tensor calculation. The network structure of TnsNet is designed as follows:
(1) TnsNet uses network nesting, where the inner network (BaseNet) contains all convolutional and pooling layers. When adapting to other scenarios or changing the matrix scaling method, the network structure and hyper-parameters of BaseNet need not change; only the fully connected layers outside BaseNet need to change;
(2) TnsNet applies the two tensor reduction methods: the matrices generated after tensor conversion are each passed through BaseNet and output as vectors, which are then combined into the joint features of the tensor and flow into a fully connected layer;
(3) TnsNet introduces a feature layer: the candidate feature set of the sparse tensor serves as the input of the feature layer, is concatenated with the fully connected layer in (2), and flows into the next fully connected layer.
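Under stated assumptions (the layer widths, kernel sizes, a single BaseNet shared between the two reduction branches, 32 x 32 scaled inputs, 10 candidate features, and 5 candidate formats are all illustrative choices, not taken from the patent), the nested structure described in (1)-(3) might be sketched in PyTorch as:

```python
import torch
import torch.nn as nn

class BaseNet(nn.Module):
    """Inner network: all convolutional and pooling layers."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
    def forward(self, x):                 # x: (batch, 1, 32, 32)
        return self.conv(x).flatten(1)    # -> (batch, 32 * 8 * 8)

class TnsNet(nn.Module):
    """Outer network: BaseNet is applied to the mapped matrix and the
    flattened matrix; the resulting vectors are concatenated with the
    candidate feature set and fed through fully connected layers."""
    def __init__(self, n_features=10, n_formats=5):
        super().__init__()
        self.base = BaseNet()
        self.fc = nn.Sequential(
            nn.Linear(2 * 32 * 8 * 8 + n_features, 128), nn.ReLU(),
            nn.Linear(128, n_formats),
        )
    def forward(self, mapped, flattened, features):
        joint = torch.cat(
            [self.base(mapped), self.base(flattened), features], dim=1)
        # one probability per candidate storage format
        return torch.softmax(self.fc(joint), dim=1)

# Forward pass with dummy inputs: batch of 2 tensors, 32x32 scaled
# matrices from each reduction branch, 10 candidate features each.
net = TnsNet()
probs = net(torch.zeros(2, 1, 32, 32), torch.zeros(2, 1, 32, 32),
            torch.zeros(2, 10))
```

Only the fully connected layers after the concatenation would need to change when adapting the scaling method or tensor order, matching point (1).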
In step 13, if automatic selection of the sparse tensor format is to be implemented on a new hardware platform, different retraining methods are applied to the two cases. When the hardware architecture and software system are the same or similar, continued training is adopted, i.e., training continues from the trained model; when the hardware architecture and software system differ substantially, training from scratch is adopted, i.e., the trained model is discarded and the network model is trained from the beginning.
In step 14, if the matrix scaling method needs to be changed, step 5 is executed again and the network structure of the customized CNN in step 9 is fine-tuned. For example, when replacing the density representation with the histogram representation, each input matrix is replaced by a row matrix and a column matrix, which can be seen as different channels of an image. Each matrix is used as an input of BaseNet, and the output feature values are combined into a fully connected layer; after the row and column features are combined, the remaining network structure of TnsNet stays unchanged.
In step 16, if the storage format of sparse tensors of other orders is to be automatically selected, the network structure of the customized CNN in step 9 is fine-tuned. For tensor calculation of an N-order tensor along a certain dimension, flattening generates one matrix and mapping generates $\prod_{n=4}^{N} I_n$ matrices. The matrices generated by mapping are each used as an input of BaseNet, and the output feature values are combined into a fully connected layer. After the mapping matrices are combined, the remaining network structure of TnsNet stays unchanged.
Advantageous effects:
The method makes full use of the advantages of CNNs on classification problems, combined with a Feed-Forward Neural Network (FFNN), to predict the optimal sparse tensor storage format. In addition, the sparse tensor is effectively converted into fixed-size matrix input while fully retaining the tensor's characteristics. The method is applicable to automatic selection of the sparse format of a high-order tensor under arbitrary tensor calculations.
Drawings
FIG. 1 is a design summary of a proposed method of implementing the present invention;
FIG. 2(a) is a schematic diagram of the third order tensor proposed by the present invention;
- FIG. 2(b) is a slice of the tensor under Mode-1;
- FIG. 2(c) is the mapping of the tensor under Mode-1;
- FIG. 2(d) is the flattening of the tensor under Mode-1;
FIG. 3 is a schematic diagram of a matrix scaling method proposed by the present invention;
FIG. 4 is a schematic diagram of sparse tensor format contrast proposed by the present invention;
FIG. 5 is a schematic diagram of a structure of a customized convolutional neural network proposed by the present invention;
fig. 6 is a schematic structural diagram of a customized convolutional neural network represented by a histogram according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific examples described herein are intended to be illustrative only and are not intended to be limiting. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The design overview of the present invention is shown in FIG. 1, in which the gray parts are modules added beyond the existing dataset and sparse tensor storage formats, and TnsNet is the customized Convolutional Neural Network (CNN) implemented by the invention.
As shown in fig. 1: the method comprises the following specific implementation steps:
step 1: the method comprises the steps of collecting a sparse matrix data set of a real world (such as a social network, an image bitmap and the like), wherein the sparse matrix is widely applied to numerical analysis (such as solving partial differential equations) and practical problems (such as the social network, the image bitmap and the like), randomly selecting two or more sparse matrices, and combining the two or more sparse matrices to obtain enough sparse tensor data for network training. Combining the index values of the sparse matrix in rows and columns into the high dimension or the low dimension of the sparse tensor, and finally generating a sparse tensor data set of a specific order;
and 2, step: and storing the sparse tensor data set into various sparse tensor formats, performing tensor calculation on a specific hardware platform to obtain the execution time of the sparse tensor data set, and converting the execution time into a label of corresponding tensor data. The specific conversion method is that the sparse storage format with the shortest execution time is marked as 1, and other sparse storage formats are marked as 0, so that a bit label of a corresponding tensor is obtained;
Step 3: the sparse tensor data is reduced by the mapping method into a number of matrices along a certain dimension, corresponding to the dimension of the tensor calculation performed in step 2. The mapping of a third-order tensor in the first dimension (Mode-1) is shown in FIG. 2. The specific method is as follows:
(1) let the tensor X be third order and mapped to the matrix A along the first dimension (Mode-1); specifically, the non-zero values of all slices (Slice) of the tensor along Mode-1 are mapped and accumulated onto the same slice (every non-zero value is regarded as 1 during accumulation). Defining $X \in \mathbb{R}^{I \times J \times K}$ and $A \in \mathbb{R}^{J \times K}$, where I, J, K are the dimensions of the tensor X in the three directions, the matrix A is calculated as $A_{j,k} = \sum_{i=1}^{I} \mathbb{1}[X_{i,j,k} \neq 0]$. The mapping matrices of the tensor X along other dimensions can be obtained analogously;
(2) let the tensor X be of order N and mapped into several matrices along Mode-1; specifically, the non-zero values are mapped to the same slice with the remaining N-2 indices fixed each time, finally generating $\prod_{n=4}^{N} I_n$ matrices. Defining $X \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, where $I_N$ represents the size of the N-th dimension, the formula for mapping the tensor X to the matrix $A^{(i_4,\ldots,i_N)}$ along Mode-1 is $A^{(i_4,\ldots,i_N)}_{i_2,i_3} = \sum_{i_1=1}^{I_1} \mathbb{1}[X_{i_1,i_2,\ldots,i_N} \neq 0]$. The other matrices generated by the tensor X along Mode-1 can be obtained analogously.
Step 4: the sparse tensor data is reduced by the flattening method into a matrix along a certain dimension, corresponding to the dimension of the tensor calculation performed in step 2. The flattening of a third-order tensor in the first dimension (Mode-1) is shown in FIG. 2. The specific method is as follows:
(1) let the tensor X be third order, with each element of X flattened to the corresponding matrix. Defining $X \in \mathbb{R}^{I \times J \times K}$ and $B \in \mathbb{R}^{I \times JK}$, the tensor X is flattened along Mode-1 into the matrix B with the formula $B_{i,\, k \cdot J + j} = X_{i,j,k}$ (indices counted from zero). The flattening matrices of the tensor X along other dimensions can be obtained analogously;
(2) let the tensor X be of order N, with each element of X flattened to the corresponding matrix. Defining $X \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and $B \in \mathbb{R}^{I_n \times \prod_{m \neq n} I_m}$, the formula for flattening the tensor X into a matrix along Mode-n maps the tensor element $(i_1, i_2, \ldots, i_N)$ to the matrix element $(i_n, j)$ with $j = \sum_{k \neq n} i_k \prod_{m < k,\, m \neq n} I_m$ (indices counted from zero). The flattening matrices of the tensor X along other dimensions can be obtained analogously.
Step 5: the irregular-size matrices generated in steps 3 and 4 are scaled to fixed-size matrices by the density representation or the histogram representation. FIG. 3 shows an example of matrix scaling, where a matrix of size 8 x 8 is scaled to matrices of size 4 x 4. The specific method is as follows:
(1) the density representation captures the detailed density differences between different regions of the original matrix. For the density representation, the number of non-zero elements in each block of the original matrix is counted and filled into the scaled fixed-size matrix;
(2) the histogram representation further captures the distance between the elements and the diagonal of the original matrix, but loses some of the sparsity-distribution features. For the histogram representation, the row and column distances of each element of the original matrix from the diagonal are calculated and filled into scaled fixed-size row and column histograms.
Step 6: normalizing the values of all elements in the fixed-size matrix to a range of [0,1] by dividing by the maximum value of the elements in the matrix;
Step 7: the sparse tensor data is analyzed and the candidate feature set of the tensor is obtained, including global and local features of the sparse tensor. The candidate feature set complements the sparse features of the original tensor lost during tensor conversion. In addition, the feature set of a sparse tensor affects the memory layout and computational characteristics under a particular sparse format. The candidate feature set of the sparse tensor is shown in Table 2 and classified as follows:
(1) global features include the dimension size of the tensor, the Number of Non-zeros (NNZ for short), sparsity, and features associated with NNZ at each index along a dimension. For example, the NNZs under each index are obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZs of all indexes are obtained by further calculation.
(2) Local features are associated with slices (Slice) and fibers (Fiber), including the number of slices and fibers, the ratio of slices and fibers, the NNZ under each Slice and Fiber, and the number of fibers under each Slice. For example, the NNZs under each slice are obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZs of all the slices are further calculated.
Table 2 is a candidate feature set of the sparse tensor of the method proposed by the present invention;
Step 8: the normalized matrices generated in step 6 and the candidate feature set generated in step 7 form the feature set of each sparse tensor, and the feature sets and labels of all tensors are listed by numeric index, forming the training set of the Convolutional Neural Network (CNN);
Step 9: the training set is used as the input of the customized CNN, and a trained network model is generated after the training process. The customized neural network (TnsNet) combines a CNN and a Feed-Forward Neural Network (FFNN) to better predict the optimal storage format for tensor calculation. Storage formats of the sparse tensor include COO, F-COO, HiCOO, CSF, and HB-CSF, as shown in FIG. 4. Due to the complexity of the spatial sparsity of tensors, no single format suits all tensors. Furthermore, for the same tensor, performance under different formats may differ by orders of magnitude. The network structure of TnsNet is shown in FIG. 5 and is designed as follows:
(1) TnsNet uses network nesting, where the inner network (BaseNet) contains all convolutional and pooling layers. When adapting to other scenarios or changing the matrix scaling method, the network structure and hyper-parameters of BaseNet need not change; only the fully connected layers outside BaseNet need to change;
(2) TnsNet applies the two tensor reduction methods: the matrices generated after tensor conversion are each passed through BaseNet and output as vectors, which are then combined into the joint features of the tensor and flow into the fully connected layer;
(3) TnsNet introduces a feature layer: the candidate feature set of the sparse tensor serves as the input of the feature layer, is concatenated with the fully connected layer in (2), and flows into the next fully connected layer.
Step 10: when the optimal storage format is to be selected for newly input sparse tensor data, re-executing steps 3-7 and combining the normalized matrix and the candidate feature set of the sparse tensor into a prediction set for the customized CNN;
Step 11: inputting the prediction set into the trained network model, outputting for each sparse tensor storage format the probability of achieving the best performance, and selecting the sparse format with the maximum probability as the storage format under which the tensor computation is executed;
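The selection rule of step 11 reduces to an argmax over the predicted per-format probabilities. A minimal sketch, using the format names listed in step 9 in an assumed fixed order:

```python
# Candidate sparse tensor storage formats named in the description (step 9).
FORMATS = ["COO", "F-COO", "HiCOO", "CSF", "HB-CSF"]

def select_format(probabilities):
    """Return the storage format with the largest predicted probability
    of achieving the best tensor-computation performance (step 11)."""
    best = max(range(len(FORMATS)), key=lambda i: probabilities[i])
    return FORMATS[best]
```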
step 12: repeating the steps 10-11 until the storage formats of all the sparse tensors to be predicted are selected;
Step 13: if automatic selection of the sparse tensor format is to be realized on a new hardware platform, there are two situations:
(1) the hardware architecture and software system are the same: a continued-training method is adopted, i.e., training continues from the trained model. Specifically, keep the trained model generated in step 9 and execute step 12 again;
(2) the hardware architecture and software system differ in several respects: a training-from-scratch method is adopted, i.e., the trained model is discarded and the network model is trained from scratch. Specifically, do not keep the trained model generated in step 9 and execute steps 2-12 again.
Step 14: if the matrix scaling method needs to be replaced, re-execute step 5, fine-tune the network structure of the customized CNN in step 9, and re-execute steps 8-12 to automatically select the storage formats of all sparse tensors to be predicted. The network structure of TnsNet under the histogram representation is shown in FIG. 6. When the density representation is replaced by the histogram representation, each input matrix is replaced by a row matrix and a column matrix, which can be regarded as different channels of an image. Each matrix serves as an input of BaseNet, and the output feature values are merged into the fully connected layer. After the row and column features are merged, the remaining network structure of TnsNet stays unchanged.
Step 15: if the sparse tensor format is to be automatically selected for other tensor computations, re-execute step 2 to obtain new labels for the corresponding tensor data, and then re-execute steps 8-12;
step 16: and if the automatic selection of the storage format is to be realized for the sparse tensors of other orders, fine-tuning the network structure of the TnsNet in the step 9, and re-executing the steps 1-12. For the N-order tensor calculation based on a certain dimensionality, a matrix is generated after flattening, and a matrix is generated after mappingA matrix. And for a plurality of matrixes generated after mapping, the matrixes are respectively used as input of BaseNet, and output characteristic values are combined to a full connection layer. After the mapping matrixes are combined, the rest network structure of the TnsNet keeps unchanged.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited in scope to these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to be protected, provided such changes do not depart from the spirit and scope of the invention as defined and limited by the appended claims.
Claims (11)
1. A sparse tensor storage format automatic selection method based on a convolutional neural network is characterized by comprising the following steps of:
step 1: firstly, collecting a sparse matrix data set generated from image bitmaps in production and operation, then combining the row and column index values of the sparse matrices into the high or low dimensions of a sparse tensor, and finally generating a sparse tensor data set of a preset order;
step 2: storing the sparse tensor data set into various sparse tensor formats, performing tensor calculation on a preset hardware platform to obtain execution time of the sparse tensor data set, and converting the execution time into a label of corresponding tensor data;
step 3: reducing the sparse tensor data into a plurality of matrices along a certain dimension by a mapping method, wherein the dimension corresponds to the dimension along which the tensor calculation in step 2 is executed;
step 4: reducing the sparse tensor data into a matrix along a certain dimension by a flattening method, wherein the dimension corresponds to the dimension along which the tensor calculation in step 2 is executed;
step 5: scaling the irregular-size matrices generated in step 3 and step 4 into fixed-size matrices through density representation or histogram representation;
step 6: normalizing the values of all elements in each fixed-size matrix to the range [0,1] by dividing by the maximum element value of the matrix;
step 7: analyzing the sparse tensor data to obtain a candidate feature set of the tensor, wherein the candidate feature set comprises global features and local features of the sparse tensor;
step 8: forming the feature set of each sparse tensor from the normalization matrix generated in step 6 and the candidate feature set generated in step 7, and arranging the feature sets and labels of all tensors into a list according to their number indexes, so as to form a training set of a convolutional neural network (CNN);
step 9: taking the training set as the input of the customized CNN, and generating a trained network model after the training process;
step 10: when the optimal storage format is to be selected for newly input sparse tensor data, re-executing steps 3-7, and combining the normalized matrix and the candidate feature set of the sparse tensor into a prediction set for the customized CNN;
step 11: inputting the prediction set into the trained network model, outputting for each sparse tensor storage format the probability of achieving the best performance, and selecting the sparse format with the maximum probability as the storage format under which the tensor computation is executed;
step 12: repeating the steps 10-11 until the storage formats of all the sparse tensors to be predicted are selected;
step 13: if automatic selection of the sparse tensor format is to be realized on a new hardware platform, there are two situations: first, the hardware architecture and the software system are the same, in which case the trained model generated in step 9 is kept and step 12 is executed again; second, the hardware architecture and the software system differ in several respects, in which case the trained model generated in step 9 is not kept and steps 2-12 are executed again;
step 14: if the matrix scaling method needs to be replaced, re-executing the step 5, finely adjusting the network structure of the customized CNN in the step 9, and re-executing the steps 8-12 to automatically select the storage formats of all the sparse tensors to be predicted;
step 15: if the sparse tensor format is to be automatically selected based on other tensor calculation, the step 2 is executed again to obtain a new label of corresponding tensor data, and then the steps 8 to 12 are executed again;
step 16: if the storage format is to be automatically selected for sparse tensors of other orders, fine-tuning the network structure of the customized CNN in step 9, and re-executing steps 1-12.
2. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 1, the row and column index values of the sparse matrix are combined into the high or low dimensions of the sparse tensor; in addition, for real sparse matrix data obtained from image bitmaps in production and operation, two or more sparse matrices are randomly selected and combined to obtain a preset number of sparse tensor data for network training.
3. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 2, the execution time is converted into a label of corresponding tensor data, and the specific conversion method is to mark the sparse storage format with the shortest execution time as 1 and mark other sparse storage formats as 0, so as to obtain a bit label of the corresponding tensor.
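The conversion in claim 3 amounts to one-hot encoding the fastest format. A sketch, under the assumption that execution times are given per format in a fixed order:

```python
def times_to_label(exec_times):
    """Bit label per claim 3: the sparse storage format with the shortest
    execution time is marked 1, all other formats are marked 0."""
    best = min(range(len(exec_times)), key=lambda i: exec_times[i])
    return [1 if i == best else 0 for i in range(len(exec_times))]
```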
4. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 3, the sparse tensor data are reduced into a plurality of matrices along a certain dimension by a mapping method, wherein the mapping method embodies the vertical distribution of the modal indexes; the specific method is as follows:
(1) assume the tensor X is third-order and is mapped along Mode-1 to a matrix A; specifically, the non-zero values of all slices of the tensor along Mode-1 are accumulated into the same slice, every non-zero value being counted as 1 during accumulation; define X ∈ R^(I×J×K) and A ∈ R^(J×K), where I, J, K are the sizes of the tensor X in its three dimensions; the matrix A is calculated as A(j,k) = Σ_{i=1..I} 1[X(i,j,k) ≠ 0]; the mapping matrices of the tensor X along the other dimensions are obtained analogously;
(2) assume the tensor X is of order N and is mapped along Mode-1 into several matrices; specifically, non-zero values are each time mapped into the same slice determined by two free indexes, accumulating over the remaining N-2 indexes, so that one matrix is generated for each choice of two free modes among the N-1 remaining modes; define X ∈ R^(I_1×I_2×…×I_N), where I_N denotes the size of the N-th dimension; the formula for mapping the tensor X along Mode-1 to the matrix A ∈ R^(I_2×I_3) is A(i_2,i_3) = Σ 1[X(i_1,i_2,…,i_N) ≠ 0], the sum running over all accumulated indexes; the other matrices generated by mapping the tensor X along Mode-1 are obtained analogously.
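For the third-order case of item (1), the Mode-1 mapping can be sketched with numpy; this is an illustration, with every non-zero counted as 1 during accumulation:

```python
import numpy as np

def map_mode1(X):
    """Map a third-order tensor X in R^(I×J×K) along Mode-1 to A in R^(J×K):
    non-zeros are counted as 1 and accumulated over the first index, so
    A(j,k) is the number of non-zeros in the fiber X(:, j, k)."""
    return (X != 0).astype(int).sum(axis=0)
```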
5. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 4, the sparse tensor data are reduced into a matrix along a certain dimension by a flattening method, wherein the flattening method embodies the horizontal distribution of the modal indexes, flattening meaning that the tensor is unfolded by matricization; the specific method is as follows:
(1) assume the tensor X is third-order and each element of X is flattened into the corresponding matrix; define X ∈ R^(I×J×K) and B ∈ R^(I×JK); the formula for flattening the tensor X along Mode-1 into the matrix B is B(i, k×J + j) = X(i, j, k); flattening the tensor X into a matrix along the other dimensions is analogous;
(2) assume the tensor X is of order N and each element of X is flattened into the corresponding matrix; define X ∈ R^(I_1×I_2×…×I_N) and B ∈ R^(I_n×(I_1I_2…I_N/I_n)); the formula for flattening the tensor X along Mode-n into a matrix maps the tensor element (i_1, i_2, …, i_N) to the matrix element (i_n, j), where j = 1 + Σ_{k≠n} (i_k − 1)·J_k and J_k = Π_{m<k, m≠n} I_m; flattening the tensor X into a matrix along the other dimensions is analogous.
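The third-order formula of item (1), B(i, k×J + j) = X(i, j, k) with 0-based indices, can be sketched directly:

```python
import numpy as np

def flatten_mode1(X):
    """Flatten a third-order tensor X in R^(I×J×K) along Mode-1 into
    B in R^(I×JK) using B[i, k*J + j] = X[i, j, k] (0-based indices)."""
    I, J, K = X.shape
    B = np.zeros((I, J * K), dtype=X.dtype)
    for i in range(I):
        for j in range(J):
            for k in range(K):
                B[i, k * J + j] = X[i, j, k]
    return B
```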
6. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 5, scaling the irregular-size matrices generated in step 3 and step 4 into fixed-size matrices by density representation or histogram representation, where both methods use an acceptable matrix size to represent coarse-grained features of an original matrix, and the matrix scaling method is specifically as follows:
(1) the density representation captures fine-grained density differences between different regions of the original matrix; for the density representation, the number of non-zero elements in each block of the original matrix is counted and filled into the scaled fixed-size matrix;
(2) the histogram representation further captures the distance between each element of the original matrix and the diagonal, but loses part of the sparsity-distribution features; for the histogram representation, the row-wise and column-wise distances of each element of the original matrix from the diagonal are computed and filled into the scaled fixed-size row histogram and column histogram.
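The density representation of item (1) can be sketched as follows; the size×size block grid is an illustrative assumption, since the claim only requires that the scaled matrix have a fixed size:

```python
import numpy as np

def density_representation(M, size=4):
    """Scale an irregular-size matrix M to a fixed size×size matrix whose
    entry (a, b) is the non-zero count of the corresponding block of M."""
    rows, cols = M.shape
    r = np.linspace(0, rows, size + 1).astype(int)  # block row boundaries
    c = np.linspace(0, cols, size + 1).astype(int)  # block column boundaries
    D = np.zeros((size, size))
    for a in range(size):
        for b in range(size):
            D[a, b] = np.count_nonzero(M[r[a]:r[a + 1], c[b]:c[b + 1]])
    return D
```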
7. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 7, the sparse tensor data are analyzed to obtain a candidate feature set of the tensor, wherein the candidate feature set comprises global features and local features of the sparse tensor; the candidate feature set supplements the sparse features lost during the tensor conversion, and the feature set of a sparse tensor influences the memory layout and computational characteristics under a specific sparse format; the features in the candidate feature set are classified as follows:
(1) global features include the dimension sizes of the tensor, the number of non-zeros (NNZ), the sparsity, and NNZ-related features under each index along a dimension;
(2) local features are associated with slices and fibers, including the numbers of slices and fibers, the slice-to-fiber ratio, the NNZ of each slice and fiber, and the number of fibers in each slice.
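A few of the global features of item (1) can be computed directly; this is a sketch only (the feature list of Table 2 is richer than shown here), and the convention sparsity = NNZ / total elements is an assumption, since the claim does not fix it:

```python
import numpy as np

def global_features(X):
    """Global candidate features: dimension sizes, number of non-zeros
    (NNZ), and sparsity, here taken as NNZ divided by the total number
    of elements (an assumed convention)."""
    nnz = int(np.count_nonzero(X))
    return {"dims": X.shape, "nnz": nnz, "sparsity": nnz / X.size}
```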
8. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 9, the training set is used as the input of the customized CNN, and the customized neural network TnsNet combines the CNN with a feed-forward neural network (FFNN) to better predict the optimal storage format in tensor calculation; the network structure of TnsNet is designed as follows:
(1) TnsNet implements network nesting, wherein the inner network BaseNet comprises all convolutional and pooling layers; when the method is adapted to other scenarios or the matrix scaling method is changed, the network structure and hyper-parameters of BaseNet need not be changed, and only the fully connected layers outside BaseNet need to be modified;
(2) TnsNet applies two tensor reduction methods: the matrices generated by the tensor conversions are each passed through BaseNet and output as vectors, which are then merged into the joint feature of the tensor and flow into the fully connected layer;
(3) TnsNet introduces a feature layer: the candidate feature set of the sparse tensor serves as the input of the feature layer, is concatenated with the fully connected layer of item (2), and flows into the next fully connected layer.
9. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 13, if automatic selection of the sparse tensor format is to be realized on a new hardware platform, different retraining methods are applied in the two situations: when the hardware architecture and the software system are the same, a continued-training method is adopted, i.e., training continues from the trained model; when the hardware architecture and the software system differ in several respects, a training-from-scratch method is adopted, i.e., the trained model is discarded and the network model is trained from scratch.
10. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 14, if the matrix scaling method needs to be changed, step 5 is re-executed and the network structure of the customized CNN in step 9 is fine-tuned; when the density representation is changed into the histogram representation, each input matrix is replaced by a row matrix and a column matrix, which can be regarded as different channels of an image; each matrix is used as an input of BaseNet, and the output feature values are merged into the fully connected layer; after the row and column features are merged, the remaining network structure of TnsNet stays unchanged.
11. The method as claimed in claim 1, wherein in step 16, if the storage format is to be automatically selected for sparse tensors of other orders, the network structure of the customized CNN in step 9 is fine-tuned; for the computation of an N-order tensor along a certain dimension, flattening always generates one matrix, while mapping generates several matrices; the matrices generated by mapping are each used as an input of BaseNet, the output feature values are merged into the fully connected layer, and after the mapping matrices are merged, the remaining network structure of TnsNet stays unchanged.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011430624.9A CN112529157B (en) | 2020-12-09 | 2020-12-09 | Sparse tensor storage format automatic selection method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112529157A CN112529157A (en) | 2021-03-19 |
CN112529157B true CN112529157B (en) | 2022-07-01 |
Family
ID=74998606
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112686342B (en) * | 2021-03-12 | 2021-06-18 | 北京大学 | Training method, device and equipment of SVM (support vector machine) model and computer-readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110020724A (en) * | 2019-03-18 | 2019-07-16 | 浙江大学 | A kind of neural network column Sparse methods based on weight conspicuousness |
CN111625476A (en) * | 2019-02-28 | 2020-09-04 | 莫维迪乌斯有限公司 | Method and apparatus for storing and accessing multidimensional data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9471377B2 (en) * | 2013-11-13 | 2016-10-18 | Reservoir Labs, Inc. | Systems and methods for parallelizing and optimizing sparse tensor computations |
Non-Patent Citations (2)
Title |
---|
IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication; Zhen Xie et al.; ACM; 2019-06-28; pp. 94-105 *
Processing sparse tensors with the Tensor Toolbox (tensor toolbox 处理稀疏张量); https://www.codetd.com/article/4927895; 2019-01-15; pp. 1-6 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||