CN112529157B - Sparse tensor storage format automatic selection method based on convolutional neural network - Google Patents


Info

Publication number
CN112529157B
Authority
CN
China
Prior art keywords
tensor
sparse
matrix
storage format
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011430624.9A
Other languages
Chinese (zh)
Other versions
CN112529157A (en)
Inventor
杨海龙 (Yang Hailong)
孙庆骁 (Sun Qingxiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202011430624.9A priority Critical patent/CN112529157B/en
Publication of CN112529157A publication Critical patent/CN112529157A/en
Application granted granted Critical
Publication of CN112529157B publication Critical patent/CN112529157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a method for automatically selecting the storage format of a sparse tensor based on a convolutional neural network, which comprises the following steps: 1) reduce the multidimensional tensor to two-dimensional matrices by two conversion methods, flattening (unfolding) and mapping; 2) scale the matrices to a fixed size by a density representation or a histogram representation; 3) use the fixed-size matrices as the input of a convolutional neural network (CNN) whose structure is designed and customized for the automatic selection of sparse tensor storage formats; 4) train the CNN by supervised learning to obtain a trained network model; 5) use a new sparse tensor as the input of the network model and obtain the optimal storage format of the tensor after forward propagation. The method exploits the advantages of CNNs on classification problems, combined with a feed-forward neural network (FFNN), to predict the optimal sparse tensor storage format; it effectively converts the sparse tensor into fixed-size matrix inputs while fully retaining the tensor's characteristics, and is applicable to the automatic selection of sparse formats for high-order tensors under arbitrary tensor computations.

Description

Sparse tensor storage format automatic selection method based on convolutional neural network
Technical Field
The invention relates to the fields of convolutional neural networks, sparse tensor storage formats, and tensor computation, and in particular to a method for automatically selecting the sparse tensor storage format based on a convolutional neural network.
Background
Tensors generally represent high-dimensional data that exceeds two dimensions. Multidimensional tensors are widely used in scientific computing, numerical analysis, machine learning, and other fields. Since real-world tensors are typically large and very sparse, many existing efforts optimize the performance of tensor computations based on computational patterns and operational dependencies. Although parallelization can significantly improve the performance of tensor computations, it is still limited by sparsity patterns and hardware features. Prior work has therefore proposed diverse sparse tensor formats that improve computational performance through co-designed storage layouts and algorithms compatible with the sparsity and the hardware. However, because sparsity patterns are complex and hardware characteristics are diverse, the optimal sparse storage format for a tensor computation varies greatly. Determining the optimal storage format for tensor computation across different sparse tensors and diverse hardware platforms is therefore challenging.
The storage format selection of the sparse tensor can be viewed as a classification problem. This kind of problem is well suited to deep learning techniques, which have proven their effectiveness in image classification and object detection. In particular, convolutional neural networks (CNNs) have gained tremendous popularity in classification tasks due to their ability to capture the underlying features of input data without human intervention. However, such methods cannot be directly applied to tensor format selection because of the high dimensionality of the data to be processed. Although high-dimensional convolution has been proposed, it is not suitable for automatic selection of sparse tensor storage formats, mainly for two reasons. First, the irregularity of tensor data reduces the computational efficiency of convolution operations, which brings unacceptable training overhead; second, popular deep learning frameworks support at most three-dimensional convolution layers and cannot meet the requirements of higher-dimensional tensors. Existing solutions only support automatic selection of the optimal sparse storage format for matrix operations, and can be summarized in the following two categories:
(1) traditional machine learning method
Work in this direction extracts sparse features of the matrix as input parameters and trains classifiers or regression models to predict the best combination of input parameters. Commonly used classifiers include decision trees (DTs), multi-class support vector machines (SVMs), and rule sets; commonly used regression models include linear regression models and tree-based regression models. The matrix sparsity features extracted by these methods are coarse and cannot effectively reflect the fine-grained spatial distribution of non-zeros, which limits the prediction accuracy for the optimal sparse tensor format.
(2) Deep learning method
Work in this direction converts irregular sparse matrices into fixed-size matrices by scaling methods. A fixed-size matrix can be used as an input to a convolutional neural network, and multiple matrices can be viewed as different channels of an image. In addition, the sparse features of the matrix can be fed into a feature layer of the neural network, supplementing the sparse spatial distribution of the matrix that is lost during scaling. At present this approach only supports sparse matrices and does not support prediction of the optimal storage format for higher-order data. Since tensors typically store data in higher dimensions, they cannot be fed directly to a convolutional neural network, which typically processes no more than three dimensions (such as the width, height, and channels of image data).
In summary, existing solutions, whether based on conventional machine learning or on deep learning, only support sparse matrices and do not support automatic selection of storage formats for higher-order tensors. Format selection for sparse tensors poses particular challenges for convolutional neural networks: 1) to match a two-dimensional convolutional neural network, a high-dimensional tensor needs to be reduced to matrices; 2) the matrices generated after reduction need to be scaled to fixed-size network inputs while losing as little of the sparsity pattern as possible; 3) the convolutional neural network needs to be redesigned to complement the sparse features lost during tensor conversion.
Disclosure of Invention
The invention solves the above problems: it overcomes the shortcomings of the prior art, provides a method for automatically selecting the sparse tensor storage format based on a convolutional neural network, and fully exploits the performance impact of the sparse storage format on tensor computation. Specifically, the sparse tensor undergoes tensor conversion and feature extraction, and the generated sparse matrices and feature vectors are fed forward through the customized convolutional neural network to predict the optimal storage format of the sparse tensor.
The technical solution of the invention is a sparse tensor storage format automatic selection method based on a convolutional neural network, comprising the following steps:
step 1: first a sparse matrix dataset of physical entity production and operation is collected. Then, combining the index values of the sparse matrix in rows and columns into a high dimension or a low dimension of the sparse tensor, and finally generating a sparse tensor data set of a predetermined order;
step 2: storing the sparse tensor data set into various sparse tensor formats, performing tensor calculation on a preset hardware platform to obtain execution time of the sparse tensor data set, and converting the execution time into a label of corresponding tensor data;
and 3, step 3: reducing the sparse tensor data into a plurality of matrixes along a certain dimension by a mapping method, wherein the dimension corresponds to the dimension for executing tensor calculation in the step 2;
and 4, step 4: reducing the sparse tensor data into a matrix along a certain dimension by a flattening method, wherein the dimension corresponds to the dimension for executing tensor calculation in the step 2;
and 5: scaling the irregular-size matrix generated in step 3 and step 4 into a fixed-size matrix through density representation or histogram representation;
step 6: normalizing the values of all elements in the fixed-size matrix to a range of [0,1] by dividing by the maximum value of the elements in the matrix;
and 7: analyzing the sparse tensor data and obtaining a candidate feature set of the tensor, wherein the candidate feature set comprises global features and local features of the sparse tensor;
and 8: forming an feature set of each sparse tensor by the normalization matrix generated in the step 6 and the candidate feature set generated in the step 7, and arranging the feature sets and labels of all the tensors into a list according to a number index, so as to form a training set of a Convolutional Neural Network (CNN);
and step 9: taking the training set as the input of the customized CNN, and generating a trained network model after the training process;
step 10: when a new optimal storage format for inputting sparse tensor data is selected, re-executing the step 3-7, and combining the normalized matrix of the sparse tensor and the candidate feature set into a customized CNN prediction set;
step 11: inputting the prediction set into the trained network model, outputting the probability of obtaining the best performance for each sparse tensor storage format, and selecting the sparse format for obtaining the maximum probability as the storage format of the tensor execution tensor calculation;
step 12: repeating the steps 10-11 until the storage formats of all the sparse tensors to be predicted are selected;
step 13: if the automatic selection of the sparse tensor format is to be realized on a new hardware platform, there are two situations: the hardware architecture is the same as the software system, the trained model generated in the step 9 is reserved, and the step 12 is executed again; secondly, the hardware architecture and the software system have a plurality of differences, the trained model generated in the step 9 is not reserved, and the steps 2 to 12 are executed again;
step 14: if the matrix scaling method needs to be replaced, re-executing the step 5, finely adjusting the network structure of the customized CNN in the step 9, and re-executing the steps 8-12 to automatically select the storage formats of all the sparse tensors to be predicted;
step 15: if the sparse tensor format is to be automatically selected based on other tensor calculation, the step 2 is executed again to obtain a new label of corresponding tensor data, and then the steps 8 to 12 are executed again;
step 16: and if the storage format of the sparse tensor of other orders needs to be automatically selected, fine-tuning the network structure of the customized CNN in the step 9, and re-executing the steps 1-12.
In the step 1, the row and column index values of the sparse matrices are combined into higher or lower dimensions of the sparse tensor because the available open-source real sparse tensor data are too scarce to support CNN training. Real sparse matrix data, by contrast, are abundant in the real world, so two or more sparse matrices can be randomly selected and combined to obtain enough sparse tensor data for network training.
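As a concrete illustration of this construction, the sketch below (written in Python/NumPy for illustration; the function name, the random pairing strategy, and the synthetic values are our assumptions, not the patent's exact procedure) combines the row and column indices of two real sparse matrices into the coordinates of a synthetic third-order COO tensor:

```python
# Illustrative sketch: build a synthetic third-order sparse tensor in COO form from two
# real sparse matrices. The row/column indices of the first matrix give modes 1 and 2,
# and the row index of the second matrix supplies the third mode.
import numpy as np
import scipy.sparse as sp

def combine_to_tensor(m1: sp.coo_matrix, m2: sp.coo_matrix, nnz: int = 100_000, seed: int = 0):
    rng = np.random.default_rng(seed)
    p1 = rng.integers(0, m1.nnz, size=nnz)          # randomly pair non-zeros of m1 and m2
    p2 = rng.integers(0, m2.nnz, size=nnz)
    coords = np.stack([m1.row[p1], m1.col[p1], m2.row[p2]], axis=1)  # (i, j, k) indices
    coords = np.unique(coords, axis=0)               # drop duplicate coordinates
    vals = rng.random(len(coords))                   # synthetic non-zero values
    shape = (m1.shape[0], m1.shape[1], m2.shape[0])
    return coords, vals, shape
```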
In the step 2, the execution times are converted into labels of the corresponding tensor data; specifically, the sparse storage format with the shortest execution time is marked as 1 and all other sparse storage formats are marked as 0, yielding a bit label for the corresponding tensor.
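As a small illustration of this conversion (the array layout and function name are our assumptions), the timing measurements can be turned into one-hot bit labels as follows:

```python
# Sketch: convert measured execution times into one-hot ("bit") labels; the fastest
# storage format of each tensor is marked 1 and all other formats are marked 0.
import numpy as np

def times_to_labels(exec_times: np.ndarray) -> np.ndarray:
    # exec_times: array of shape (num_tensors, num_formats) holding execution times
    labels = np.zeros_like(exec_times, dtype=np.int64)
    labels[np.arange(exec_times.shape[0]), exec_times.argmin(axis=1)] = 1
    return labels
```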
In the step 3, the sparse tensor data are reduced to several matrices along a certain dimension by the mapping method, which reflects the vertical distribution of non-zeros over the non-mode indices. The specific method is as follows:
(1) Assume the tensor X is third order and is mapped to the matrix A along the first dimension (Mode-1); specifically, the non-zero values of all slices (Slice) of the tensor along Mode-1 are mapped and accumulated onto the same slice, with every non-zero counted as 1 during accumulation. Define X ∈ R^{I×J×K} and A ∈ R^{J×K}, where I, J, K are the sizes of the tensor X in its three dimensions. The matrix A is computed as

A_{(j,k)} = \sum_{i=1}^{I} \mathbb{1}\left[ X_{(i,j,k)} \neq 0 \right]

The mapping matrices of the tensor X along the other dimensions follow by analogy;
(2) Assume the tensor X is of order N and is mapped into several matrices along Mode-1; specifically, non-zero values are mapped onto the same slice with N-2 indices fixed each time, finally generating \prod_{n=4}^{N} I_n matrices (one for each combination of the indices i_4, …, i_N). Define X ∈ R^{I_1×I_2×⋯×I_N} and A ∈ R^{I_2×I_3}, where I_N denotes the size of the N-th dimension. The formula for mapping the tensor X to a matrix A along Mode-1 is

A^{(i_4,\dots,i_N)}_{(i_2,i_3)} = \sum_{i_1=1}^{I_1} \mathbb{1}\left[ X_{(i_1,i_2,\dots,i_N)} \neq 0 \right]

The other matrices generated by the tensor X along Mode-1 follow by analogy.
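A minimal sketch of the Mode-1 mapping for a third-order tensor stored in COO form is given below; the coordinate-array layout and the function name are illustrative assumptions, not the patented implementation:

```python
# Sketch of the Mode-1 mapping: all Mode-1 slices are accumulated onto one J x K matrix,
# with every non-zero counted as 1, i.e. A[j, k] = number of i with X[i, j, k] != 0.
import numpy as np

def map_mode1(coords: np.ndarray, shape: tuple) -> np.ndarray:
    # coords: (nnz, 3) integer indices (i, j, k) of the sparse tensor of the given shape
    _, J, K = shape
    A = np.zeros((J, K))
    np.add.at(A, (coords[:, 1], coords[:, 2]), 1.0)   # each non-zero contributes 1
    return A
```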
In the step 4, the sparse tensor data are reduced to a matrix along a certain dimension by the flattening method, which reflects the horizontal distribution of non-zeros along the mode index. The tensor is unfolded by flattening, i.e., matricization, as follows:
(1) Let the tensor X be third order and every element of X be flattened into the corresponding matrix. Define X ∈ R^{I×J×K} and B ∈ R^{I×JK}; the tensor X is then flattened along Mode-1 into the matrix B by

B_{(i,\, k \times J + j)} = X_{(i,j,k)}

The flattening matrices of the tensor X along the other dimensions follow by analogy;
(2) Let the tensor X be of order N and every element of X be flattened into the corresponding matrix. Define X ∈ R^{I_1×I_2×⋯×I_N} and B ∈ R^{I_n×(I_1⋯I_{n-1}I_{n+1}⋯I_N)}; the tensor X is flattened into a matrix along Mode-n by

B_{(i_n,\, j)} = X_{(i_1,i_2,\dots,i_N)}, \qquad j = \sum_{m=1,\, m \neq n}^{N} i_m \prod_{l=1,\, l \neq n}^{m-1} I_l

i.e., the tensor element (i_1, i_2, …, i_N) is mapped to the matrix element (i_n, j). The flattening matrices of the tensor X along the other dimensions follow by analogy.
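The Mode-n flattening can be sketched in the same COO setting; the code below follows the third-order formula B(i, k×J+j) = X(i,j,k) and generalizes it by linearizing the non-mode indices (an illustrative implementation, not the patent's code):

```python
# Sketch of Mode-n flattening (matricization) of a sparse COO tensor: index i_n becomes
# the row and the remaining indices are linearized into the column index.
import numpy as np

def flatten_mode_n(coords: np.ndarray, vals: np.ndarray, shape: tuple, n: int):
    N = len(shape)
    rows = coords[:, n]
    cols = np.zeros(len(coords), dtype=np.int64)
    stride = 1
    for d in [m for m in range(N) if m != n]:         # linearize the other N-1 indices
        cols += coords[:, d] * stride
        stride *= shape[d]
    return rows, cols, vals, (shape[n], stride)       # COO form of the unfolded matrix
```

For a third-order tensor and n = 0, the column index evaluates to j + k*J, matching the formula above.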
In step 5, the irregular-size matrices generated in steps 3 and 4 are scaled to fixed-size matrices by the density representation or the histogram representation; both methods represent the coarse-grained features of the original matrix at an acceptable matrix size. The matrix scaling methods are as follows:
(1) The density representation captures detailed density differences between different regions of the original matrix. For the density representation, the number of non-zero elements in each block of the original matrix is counted and filled into the corresponding cell of the scaled fixed-size matrix;
(2) The histogram representation further captures the distance between each element and the diagonal of the original matrix, at the cost of losing part of the sparsity-distribution features. For the histogram representation, the distance of each element of the original matrix to the diagonal is computed along rows and along columns, and the results are filled into scaled fixed-size row and column histograms.
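The density representation, for instance, can be sketched as follows (the block partitioning and the fixed size of 128 are illustrative assumptions):

```python
# Sketch of the density representation: the lowered m x n matrix is divided into
# fixed x fixed blocks, and each cell of the scaled matrix stores its block's non-zero count.
import numpy as np

def density_scale(rows: np.ndarray, cols: np.ndarray, orig_shape: tuple, fixed: int = 128) -> np.ndarray:
    m, n = orig_shape
    br = rows * fixed // m                            # block row of each non-zero
    bc = cols * fixed // n                            # block column of each non-zero
    D = np.zeros((fixed, fixed))
    np.add.at(D, (br, bc), 1.0)
    return D
```

The histogram representation would analogously accumulate, for each row and each column, counts of non-zeros bucketed by their distance to the diagonal.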
In step 7, the sparse tensor data are analyzed and candidate feature sets of the tensor are obtained, wherein the candidate feature sets comprise global features and local features of the sparse tensor. The candidate feature set complements the sparse features of the original tensor that were lost during the tensor conversion. In addition, the feature set of the sparse tensor affects the memory layout and computational characteristics under a particular sparse format. The features in the candidate feature set are classified as follows:
(1) global features include the dimension size of the tensor, the Number of Non-zeros (NNZ for short), sparsity, and features associated with NNZ at each index along a dimension. For example, the NNZs under each index are obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZs of all indexes are obtained by further calculation.
(2) Local features are associated with slices and fibers (Fiber), including the number of slices and fibers, the ratio of slices and fibers, the NNZ under each slice and Fiber, and the number of fibers under each slice. For example, the NNZ under each slice is obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZ of all slices are further calculated.
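A few of these global features can be computed as in the sketch below (illustrative only; the exact feature list follows the candidate feature set of Table 2 in the embodiment, and the "average neighbor difference" is interpreted here as the mean absolute difference of NNZ between adjacent indices):

```python
# Sketch of some global features of step 7: overall NNZ, sparsity, and statistics of the
# per-index NNZ along a chosen mode (mean, max, min, average neighbor difference).
import numpy as np

def global_features(coords: np.ndarray, shape: tuple, mode: int = 0) -> dict:
    nnz = len(coords)
    per_index = np.bincount(coords[:, mode], minlength=shape[mode])  # NNZ under each index
    neigh = np.abs(np.diff(per_index)).mean() if shape[mode] > 1 else 0.0
    return {
        "dims": shape,
        "nnz": nnz,
        "sparsity": nnz / float(np.prod(shape)),
        "idx_nnz_mean": per_index.mean(),
        "idx_nnz_max": int(per_index.max()),
        "idx_nnz_min": int(per_index.min()),
        "idx_nnz_adj_diff": neigh,
    }
```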
In step 9, the training set is used as the input of the customized CNN; the customized neural network (TnsNet) combines the CNN with a feed-forward neural network (FFNN) to better predict the optimal storage format in tensor computation. The network structure of TnsNet is designed as follows:
(1) TnsNet implements network nesting, where the inner network (BaseNet) contains all convolutional and pooling layers. When adapting to other scenarios or changing the matrix scaling method, the network structure and hyper-parameters of BaseNet do not need to be changed; only the fully connected layers outside BaseNet need to be adjusted;
(2) TnsNet applies two tensor reduction methods: the matrices generated by tensor conversion each pass through BaseNet and are output as vectors, which are then combined into the joint features of the tensor and flow into a fully connected layer;
(3) TnsNet introduces a feature layer: the candidate feature set of the sparse tensor is used as the input of the feature layer, concatenated with the fully connected layer in (2), and flows into the next fully connected layer.
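The following PyTorch sketch illustrates this structure under stated assumptions: the channel counts, hidden widths, 5-way output, and the sharing of one BaseNet between the two lowered matrices are our illustrative choices, not the patent's exact hyper-parameters:

```python
# Hedged sketch of a TnsNet-style network: an inner BaseNet (all conv/pooling layers),
# two lowered-matrix branches merged into joint features, and a feature layer
# concatenated before the final fully connected layers.
import torch
import torch.nn as nn

class BaseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())     # -> 32 * 4 * 4 = 512-dim vector

    def forward(self, x):
        return self.body(x)

class TnsNet(nn.Module):
    def __init__(self, num_features: int = 16, num_formats: int = 5):
        super().__init__()
        self.base = BaseNet()                          # applied to both lowered matrices
        self.fc_join = nn.Sequential(nn.Linear(2 * 512, 256), nn.ReLU())
        self.fc_out = nn.Sequential(                   # feature layer concatenated here
            nn.Linear(256 + num_features, 128), nn.ReLU(),
            nn.Linear(128, num_formats))

    def forward(self, flat_mat, map_mat, feats):
        joined = torch.cat([self.base(flat_mat), self.base(map_mat)], dim=1)
        joined = self.fc_join(joined)
        return self.fc_out(torch.cat([joined, feats], dim=1))  # one logit per storage format
```

During training, the logits would be fed to a cross-entropy loss against the bit labels of step 2.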
In step 13, if the automatic selection of the sparse tensor format is to be realized on a new hardware platform, different retraining methods are applied to the two cases. When the hardware architecture and software system are the same or similar, continued training is adopted, i.e., training continues based on the trained model; when the hardware architecture and software system differ considerably, training from scratch is adopted, i.e., the trained model is discarded and the network model is trained from scratch.
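A minimal sketch of these two retraining regimes (the checkpoint file name and helper function are hypothetical):

```python
# Sketch: continued training keeps the saved weights; training from scratch reinitializes.
import torch

def prepare_model_for_new_platform(model_cls, similar_platform: bool, ckpt: str = "tnsnet.pt"):
    model = model_cls()
    if similar_platform:                       # continued training on a similar platform
        model.load_state_dict(torch.load(ckpt))
    # otherwise the freshly initialized model is trained from scratch on the new labels
    return model
```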
In step 14, if the matrix scaling method needs to be changed, step 5 is executed again and the network structure of the customized CNN in step 9 is fine-tuned. For example, when the density representation is replaced with the histogram representation, each input matrix is replaced by a row matrix and a column matrix, which can be viewed as different channels of an image. In addition, each matrix is used as an input of BaseNet, and the output feature values are merged into a fully connected layer; after the row and column features are merged, the remaining network structure of TnsNet stays unchanged.
In step 16, if the storage format is to be automatically selected for sparse tensors of other orders, the network structure of the customized CNN in step 9 is fine-tuned. For an N-order tensor and a tensor computation based on a certain dimension, one matrix is always generated after flattening and \prod_{n=4}^{N} I_n matrices (one for each combination of the remaining N-3 indices) are generated after mapping. The several matrices generated by mapping are each used as inputs of BaseNet, and the output feature values are merged into a fully connected layer. After the mapping matrices are merged, the remaining network structure of TnsNet stays unchanged.
Beneficial effects:
The method fully exploits the advantages of CNNs on classification problems and combines them with a feed-forward neural network (FFNN) to predict the optimal sparse tensor storage format. In addition, the sparse tensor is effectively converted into fixed-size matrix inputs while fully retaining the tensor's characteristics. The method is applicable to automatic selection of the sparse format of high-order tensors under arbitrary tensor computations.
Drawings
FIG. 1 is a design summary of a proposed method of implementing the present invention;
FIG. 2(a) is a schematic diagram of the third order tensor proposed by the present invention;
FIG. 2(b) is a slice of the tensor along Mode-1;
FIG. 2(c) is the mapping of the tensor along Mode-1;
FIG. 2(d) is the flattening of the tensor along Mode-1;
FIG. 3 is a schematic diagram of a matrix scaling method proposed by the present invention;
FIG. 4 is a schematic diagram of sparse tensor format contrast proposed by the present invention;
FIG. 5 is a schematic diagram of a structure of a customized convolutional neural network proposed by the present invention;
fig. 6 is a schematic structural diagram of a customized convolutional neural network represented by a histogram according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific examples described herein are intended to be illustrative only and are not intended to be limiting. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The design summary of the present invention is shown in FIG. 1, where the gray parts are modules added beyond the existing data set and sparse tensor storage formats, and TnsNet is the customized convolutional neural network (CNN) implemented by the present invention.
As shown in fig. 1: the method comprises the following specific implementation steps:
step 1: the method comprises the steps of collecting a sparse matrix data set of a real world (such as a social network, an image bitmap and the like), wherein the sparse matrix is widely applied to numerical analysis (such as solving partial differential equations) and practical problems (such as the social network, the image bitmap and the like), randomly selecting two or more sparse matrices, and combining the two or more sparse matrices to obtain enough sparse tensor data for network training. Combining the index values of the sparse matrix in rows and columns into the high dimension or the low dimension of the sparse tensor, and finally generating a sparse tensor data set of a specific order;
Step 2: Store the sparse tensor data set in various sparse tensor formats, perform tensor computations on a specific hardware platform to obtain the execution times, and convert the execution times into labels of the corresponding tensor data. Specifically, the sparse storage format with the shortest execution time is marked as 1 and the other sparse storage formats are marked as 0, yielding a bit label of the corresponding tensor;
Step 3: The sparse tensor data are reduced by the mapping method to several matrices along a certain dimension, which corresponds to the dimension of the tensor computation performed in Step 2. The mapping of a third-order tensor along the first dimension (Mode-1) is shown in FIG. 2(c). The specific method is as follows:
(1) Let the tensor X be third order and mapped to the matrix A along the first dimension (Mode-1). Specifically, the non-zero values of all slices (Slice) of the tensor along Mode-1 are mapped and accumulated onto the same slice (every non-zero is counted as 1 during accumulation). Define X ∈ R^{I×J×K} and A ∈ R^{J×K}, where I, J, K are the sizes of the tensor X in its three dimensions. The matrix A is computed as

A_{(j,k)} = \sum_{i=1}^{I} \mathbb{1}\left[ X_{(i,j,k)} \neq 0 \right]

The mapping matrices of the tensor X along the other dimensions follow by analogy;
(2) Let the tensor X be of order N and mapped into several matrices along Mode-1. Specifically, non-zero values are mapped onto the same slice with N-2 indices fixed each time, finally generating \prod_{n=4}^{N} I_n matrices (one for each combination of the indices i_4, …, i_N). Define X ∈ R^{I_1×I_2×⋯×I_N} and A ∈ R^{I_2×I_3}, where I_N denotes the size of the N-th dimension. The formula for mapping the tensor X to a matrix A along Mode-1 is

A^{(i_4,\dots,i_N)}_{(i_2,i_3)} = \sum_{i_1=1}^{I_1} \mathbb{1}\left[ X_{(i_1,i_2,\dots,i_N)} \neq 0 \right]

The other matrices generated by the tensor X along Mode-1 follow by analogy.
Step 4: The sparse tensor data are reduced by the flattening method to a matrix along a certain dimension, which corresponds to the dimension of the tensor computation performed in Step 2. The flattening of a third-order tensor along the first dimension (Mode-1) is shown in FIG. 2(d). The specific method is as follows:
(1) Let the tensor X be third order and every element of X be flattened into the corresponding matrix. Define X ∈ R^{I×J×K} and B ∈ R^{I×JK}; the tensor X is then flattened along Mode-1 into the matrix B by

B_{(i,\, k \times J + j)} = X_{(i,j,k)}

The flattening matrices of the tensor X along the other dimensions follow by analogy;
(2) Let the tensor X be of order N and every element of X be flattened into the corresponding matrix. Define X ∈ R^{I_1×I_2×⋯×I_N} and B ∈ R^{I_n×(I_1⋯I_{n-1}I_{n+1}⋯I_N)}; the tensor X is flattened into a matrix along Mode-n by

B_{(i_n,\, j)} = X_{(i_1,i_2,\dots,i_N)}, \qquad j = \sum_{m=1,\, m \neq n}^{N} i_m \prod_{l=1,\, l \neq n}^{m-1} I_l

i.e., the tensor element (i_1, i_2, …, i_N) is mapped to the matrix element (i_n, j). The flattening matrices of the tensor X along the other dimensions follow by analogy.
Step 5: The irregular-size matrices generated in Steps 3 and 4 are scaled to fixed-size matrices by the density representation or the histogram representation. FIG. 3 shows an example of matrix scaling, where an 8 × 8 matrix is scaled to 4 × 4 matrices. The specific method is as follows:
(1) the density representation captures the detailed density differences between different regions of the original matrix. For density representation, calculating the number of non-zero elements in each block in an original matrix, and filling the non-zero elements into a matrix with a fixed size after scaling;
(2) the histogram representation further captures the distance between the elements and the diagonal in the original matrix, but some of the sparsely distributed features are lost. For histogram representation, distance information of each element in the original matrix in rows and columns compared to the diagonal is calculated and filled into scaled fixed-size row and column histograms.
Step 6: normalizing the values of all elements in the fixed-size matrix to a range of [0,1] by dividing by the maximum value of the elements in the matrix;
and 7: the sparse tensor data is analyzed and candidate feature sets of the tensor are obtained, including global and local features of the sparse tensor. The candidate feature set complements the sparse features of the original tensor that were lost during the tensor conversion. In addition, the feature set of the sparse tensor affects the memory layout and computational characteristics under a particular sparse format. The candidate feature set of the sparse tensor is shown in table 2, which is classified as follows:
(1) global features include the dimension size of the tensor, the Number of Non-zeros (NNZ for short), sparsity, and features associated with NNZ at each index along a dimension. For example, the NNZs under each index are obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZs of all indexes are obtained by further calculation.
(2) Local features are associated with slices (Slice) and fibers (Fiber), including the number of slices and fibers, the ratio of slices and fibers, the NNZ under each Slice and Fiber, and the number of fibers under each Slice. For example, the NNZs under each slice are obtained by accumulation along a certain dimension, and then the average value, the maximum value, the minimum value and the average neighbor difference of the NNZs of all the slices are further calculated.
Table 2 lists the candidate feature set of the sparse tensor used by the proposed method.
[Table 2: candidate feature set of the sparse tensor. Global features: dimension sizes, number of non-zeros (NNZ), sparsity, and per-index NNZ statistics (mean, maximum, minimum, average neighbor difference). Local features: numbers and ratio of slices and fibers, per-slice and per-fiber NNZ statistics, and the number of fibers under each slice.]
Step 8: Form the feature set of each sparse tensor from the normalized matrices generated in Step 6 and the candidate feature set generated in Step 7, and arrange the feature sets and labels of all tensors into a list indexed by number, forming the training set of the convolutional neural network (CNN);
Step 9: Use the training set as the input of the customized CNN and generate a trained network model after the training process. The customized neural network (TnsNet) combines the CNN with a feed-forward neural network (FFNN) to better predict the optimal storage format in tensor computation. The candidate storage formats of the sparse tensor include COO, F-COO, HiCOO, CSF, and HB-CSF, as shown in FIG. 4. Due to the complexity of the spatial sparsity of tensors, no single format is best for all tensors; furthermore, for the same tensor, performance under different formats can differ by orders of magnitude. The network structure of TnsNet is shown in FIG. 5 and is designed as follows:
(1) TnsNet implements network nesting, where the inner network (BaseNet) contains all convolutional and pooling layers. When adapting to other scenarios or changing the matrix scaling method, the network structure and hyper-parameters of BaseNet do not need to be changed; only the fully connected layers outside BaseNet need to be adjusted;
(2) TnsNet applies two tensor reduction methods: the matrices generated by tensor conversion each pass through BaseNet and are output as vectors, which are then combined into the joint features of the tensor and flow into a fully connected layer;
(3) TnsNet introduces a feature layer: the candidate feature set of the sparse tensor is used as the input of the feature layer, concatenated with the fully connected layer in (2), and flows into the next fully connected layer.
Step 10: when a new optimal storage format of the input sparse tensor data is selected, re-executing the step 3-7, and combining the normalized matrix of the sparse tensor and the candidate feature set into a customized prediction set of the CNN;
Step 11: Input the prediction set into the trained network model, which outputs, for each sparse tensor storage format, the probability of achieving the best performance; select the format with the maximum probability as the storage format for the tensor's computation;
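A sketch of this prediction step, assuming a TnsNet-style model as sketched earlier and the five candidate formats of FIG. 4 (the function name and format list are illustrative):

```python
# Sketch: forward-propagate the prediction set, turn the outputs into per-format
# probabilities, and pick the format with the maximum probability for each tensor.
import torch

FORMATS = ["COO", "F-COO", "HiCOO", "CSF", "HB-CSF"]

@torch.no_grad()
def predict_format(model, flat_mat, map_mat, feats):
    probs = torch.softmax(model(flat_mat, map_mat, feats), dim=1)
    return [FORMATS[i] for i in probs.argmax(dim=1).tolist()]
```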
step 12: repeating the steps 10-11 until the storage formats of all the sparse tensors to be predicted are selected;
step 13: if the automatic selection of the sparse tensor format is to be realized on a new hardware platform, there are two situations:
(1) The hardware architecture and software system are the same; in this case, continued training is adopted, i.e., training continues based on the trained model. Specifically, the trained model generated in Step 9 is kept and Step 12 is executed again;
(2) The hardware architecture and software system differ considerably; in this case, training from scratch is adopted, i.e., the trained model is discarded and the network model is trained from scratch. Specifically, the trained model generated in Step 9 is not kept, and Steps 2-12 are executed again.
Step 14: If the matrix scaling method needs to be replaced, re-execute Step 5, fine-tune the network structure of the customized CNN in Step 9, and re-execute Steps 8-12 to automatically select the storage formats of all sparse tensors to be predicted. The network structure of TnsNet using the histogram representation is shown in FIG. 6. When the density representation is replaced with the histogram representation, each input matrix is replaced by a row matrix and a column matrix, which can be viewed as different channels of an image. In addition, each matrix is used as an input of BaseNet, and the output feature values are merged into a fully connected layer. After the row and column features are merged, the remaining network structure of TnsNet stays unchanged.
Step 15: If the sparse tensor format is to be automatically selected for other tensor computations, Step 2 is executed again to obtain new labels for the corresponding tensor data, and then Steps 8-12 are executed again;
step 16: and if the automatic selection of the storage format is to be realized for the sparse tensors of other orders, fine-tuning the network structure of the TnsNet in the step 9, and re-executing the steps 1-12. For the N-order tensor calculation based on a certain dimensionality, a matrix is generated after flattening, and a matrix is generated after mapping
Figure BDA0002826529790000101
A matrix. And for a plurality of matrixes generated after mapping, the matrixes are respectively used as input of BaseNet, and output characteristic values are combined to a full connection layer. After the mapping matrixes are combined, the rest network structure of the TnsNet keeps unchanged.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions utilizing the inventive concepts set forth herein are intended to be protected, provided they do not depart from the spirit and scope of the present invention as defined and limited by the appended claims.

Claims (11)

1. A sparse tensor storage format automatic selection method based on a convolutional neural network is characterized by comprising the following steps of:
step 1: firstly, collecting a sparse matrix data set from production and daily-life applications such as image bitmaps; then combining the row and column index values of the sparse matrices into higher or lower dimensions of a sparse tensor; and finally generating a sparse tensor data set of a predetermined order;
step 2: storing the sparse tensor data set into various sparse tensor formats, performing tensor calculation on a preset hardware platform to obtain execution time of the sparse tensor data set, and converting the execution time into a label of corresponding tensor data;
step 3: reducing the sparse tensor data to several matrices along a certain dimension by the mapping method, where the dimension corresponds to the dimension along which the tensor computation of step 2 is performed;
step 4: reducing the sparse tensor data to a matrix along a certain dimension by the flattening method, where the dimension corresponds to the dimension along which the tensor computation of step 2 is performed;
step 5: scaling the irregular-size matrices generated in steps 3 and 4 to fixed-size matrices through the density representation or the histogram representation;
step 6: normalizing the values of all elements in the fixed-size matrices to the range [0,1] by dividing by the maximum element value of each matrix;
step 7: analyzing the sparse tensor data and obtaining the candidate feature set of the tensor, which comprises global features and local features of the sparse tensor;
step 8: forming the feature set of each sparse tensor from the normalized matrices generated in step 6 and the candidate feature set generated in step 7, and arranging the feature sets and labels of all tensors into a list indexed by number, thereby forming the training set of the convolutional neural network (CNN);
step 9: using the training set as the input of the customized CNN and generating a trained network model after the training process;
step 10: when the optimal storage format is to be selected for new input sparse tensor data, re-executing steps 3-7 and combining the normalized matrices and the candidate feature set of the sparse tensor into the prediction set of the customized CNN;
step 11: inputting the prediction set into the trained network model, which outputs, for each sparse tensor storage format, the probability of achieving the best performance, and selecting the format with the maximum probability as the storage format for the tensor's computation;
step 12: repeating steps 10-11 until the storage formats of all sparse tensors to be predicted have been selected;
step 13: if the automatic selection of the sparse tensor format is to be realized on a new hardware platform, there are two situations: first, the hardware architecture and software system are the same, in which case the trained model generated in step 9 is kept and step 12 is executed again; second, the hardware architecture and software system differ considerably, in which case the trained model generated in step 9 is not kept and steps 2-12 are executed again;
step 14: if the matrix scaling method needs to be replaced, re-executing step 5, fine-tuning the network structure of the customized CNN in step 9, and re-executing steps 8-12 to automatically select the storage formats of all sparse tensors to be predicted;
step 15: if the sparse tensor format is to be automatically selected for other tensor computations, re-executing step 2 to obtain new labels for the corresponding tensor data, and then re-executing steps 8-12;
step 16: if the storage format is to be automatically selected for sparse tensors of other orders, fine-tuning the network structure of the customized CNN in step 9 and re-executing steps 1-12.
2. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 1, the row and column index values of the sparse matrices are combined into higher or lower dimensions of the sparse tensor; in addition, for real sparse matrix data obtained from production and daily-life applications such as image bitmaps, two or more sparse matrices are randomly selected and combined to obtain a predetermined amount of sparse tensor data for network training.
3. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 2, the execution time is converted into a label of corresponding tensor data, and the specific conversion method is to mark the sparse storage format with the shortest execution time as 1 and mark other sparse storage formats as 0, so as to obtain a bit label of the corresponding tensor.
4. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 3, the sparse tensor data are reduced to several matrices along a certain dimension by the mapping method, which reflects the vertical distribution of non-zeros over the non-mode indices; the specific method is as follows:
(1) assume the tensor X is third order and is mapped to the matrix A along the first dimension Mode-1; specifically, the non-zero values of all slices (Slice) of the tensor along Mode-1 are mapped and accumulated onto the same slice, with every non-zero counted as 1 during accumulation; define X ∈ R^{I×J×K} and A ∈ R^{J×K}, where I, J, K are the sizes of the tensor X in its three dimensions; the matrix A is computed as

A_{(j,k)} = \sum_{i=1}^{I} \mathbb{1}\left[ X_{(i,j,k)} \neq 0 \right]

and the mapping matrices of the tensor X along the other dimensions follow by analogy;
(2) assume the tensor X is of order N and is mapped into several matrices along Mode-1; specifically, non-zero values are mapped onto the same slice with N-2 indices fixed each time, finally generating \prod_{n=4}^{N} I_n matrices (one for each combination of the indices i_4, …, i_N); define X ∈ R^{I_1×I_2×⋯×I_N} and A ∈ R^{I_2×I_3}, where I_N denotes the size of the N-th dimension; the formula for mapping the tensor X to a matrix A along Mode-1 is

A^{(i_4,\dots,i_N)}_{(i_2,i_3)} = \sum_{i_1=1}^{I_1} \mathbb{1}\left[ X_{(i_1,i_2,\dots,i_N)} \neq 0 \right]

and the other matrices generated by the tensor X along Mode-1 follow by analogy.
5. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 4, the sparse tensor data are reduced to a matrix along a certain dimension by the flattening method, which reflects the horizontal distribution of non-zeros along the mode index; flattening, i.e., matricization, unfolds the tensor as follows:
(1) assume the tensor X is third order and every element of X is flattened into the corresponding matrix; define X ∈ R^{I×J×K} and B ∈ R^{I×JK}; the tensor X is flattened along Mode-1 into the matrix B by

B_{(i,\, k \times J + j)} = X_{(i,j,k)}

and the flattening matrices of the tensor X along the other dimensions follow by analogy;
(2) assume the tensor X is of order N and every element of X is flattened into the corresponding matrix; define X ∈ R^{I_1×I_2×⋯×I_N} and B ∈ R^{I_n×(I_1⋯I_{n-1}I_{n+1}⋯I_N)}; the tensor X is flattened into a matrix along Mode-n by

B_{(i_n,\, j)} = X_{(i_1,i_2,\dots,i_N)}, \qquad j = \sum_{m=1,\, m \neq n}^{N} i_m \prod_{l=1,\, l \neq n}^{m-1} I_l

i.e., the tensor element (i_1, i_2, …, i_N) is mapped to the matrix element (i_n, j); the flattening matrices of the tensor X along the other dimensions follow by analogy.
6. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 5, scaling the irregular-size matrices generated in step 3 and step 4 into fixed-size matrices by density representation or histogram representation, where both methods use an acceptable matrix size to represent coarse-grained features of an original matrix, and the matrix scaling method is specifically as follows:
(1) the density representation captures the detailed density difference between different areas of the original matrix, and for the density representation, the number of non-zero elements in each block in the original matrix is calculated and is filled into the matrix with fixed size after scaling;
(2) the histogram representation further captures the distance between the element in the original matrix and the diagonal line, but part of the sparsely distributed features are lost, and for the histogram representation, the distance information of the diagonal line of each element in the original matrix in the rows and the columns is calculated and filled into the row histogram and the column histogram which are fixed in size after scaling.
7. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 7, the sparse tensor data are analyzed to obtain candidate feature sets of the tensor, wherein the candidate feature sets include global features and local features of the sparse tensor, the candidate feature sets supplement sparse features lost in the original tensor conversion process, in addition, the feature sets of the sparse tensor influence memory layout and calculation characteristics under a specific sparse format, and the features in the candidate feature sets are classified as follows:
(1) global features include the dimension size of the tensor, the Number of Non-zeros, NNZ for short, the sparsity, and features associated with NNZ under each index along a dimension;
(2) local features are associated with the slice and Fiber, including the number of slices and fibers, the slice to Fiber ratio, the NNZ under each slice and Fiber, and the number of fibers under each slice.
8. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 9, the training set is used as the input of the customized CNN, and the customized neural network TnsNet combines the CNN with a feed-forward neural network (FFNN) to better predict the optimal storage format in tensor computation; the network structure of TnsNet is designed as follows:
(1) TnsNet realizes network nesting, wherein an inner layer network BaseNet comprises all convolutional layers and pooling layers, and when the network is adapted to other scenes or a matrix scaling method is changed, the network structure and hyper-parameters of the BaseNet do not need to be changed, and only a full connection layer except the BaseNet needs to be changed;
(2) TnsNet applies two tensor reduction methods, and the matrixes generated after tensor conversion are output as vectors after BaseNet, then are combined into the combined characteristic of tensor, and flow into a full connection layer;
(3) TnsNet introduces a feature layer, and a candidate feature set of a sparse tensor is used as an input of the feature layer, is cascaded with the fully-connected layer in the step (2), and flows into the next fully-connected layer.
9. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in the step 13, if the automatic selection of the sparse tensor format is to be implemented on a new hardware platform, different retraining methods are applied to the two cases, and a continuous training method is adopted for the case that the hardware architecture and the software system are the same, that is, training is continued based on a trained model; for the situation that the hardware architecture and the software system have a plurality of differences, a method of training from the beginning is adopted, namely, a trained model is abandoned, and the network model is trained from the beginning directly.
10. The method for automatically selecting the sparse tensor storage format based on the convolutional neural network as claimed in claim 1, wherein: in step 14, if the method of scaling the matrix needs to be changed, re-executing step 5 and fine-tuning the network structure of the customized CNN in step 9 includes that, when the density representation is changed into the histogram representation, each input matrix is replaced by a row matrix and a column matrix, and the two matrices can be regarded as different channels of the image; in addition, each matrix is used as input of BaseNet, and output characteristic values are combined to a full connection layer; after the features of the rows and columns are combined, the remaining network structure of the TnsNet remains unchanged.
11. The method as claimed in claim 1, wherein in step 16, if the storage format is to be automatically selected for sparse tensors of other orders, the network structure of the customized CNN in step 9 is fine-tuned; for an N-order tensor and a tensor computation based on a certain dimension, one matrix is always generated after flattening and \prod_{n=4}^{N} I_n matrices (one for each combination of the remaining N-3 indices) are generated after mapping; the several matrices generated after mapping are each used as inputs of BaseNet, the output feature values are merged into a fully connected layer, and after the mapping matrices are merged, the remaining network structure of TnsNet stays unchanged.
CN202011430624.9A 2020-12-09 2020-12-09 Sparse tensor storage format automatic selection method based on convolutional neural network Active CN112529157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011430624.9A CN112529157B (en) 2020-12-09 2020-12-09 Sparse tensor storage format automatic selection method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011430624.9A CN112529157B (en) 2020-12-09 2020-12-09 Sparse tensor storage format automatic selection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN112529157A CN112529157A (en) 2021-03-19
CN112529157B (en) 2022-07-01

Family

ID=74998606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011430624.9A Active CN112529157B (en) 2020-12-09 2020-12-09 Sparse tensor storage format automatic selection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN112529157B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686342B (en) * 2021-03-12 2021-06-18 北京大学 Training method, device and equipment of SVM (support vector machine) model and computer-readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020724A (en) * 2019-03-18 2019-07-16 浙江大学 A kind of neural network column Sparse methods based on weight conspicuousness
CN111625476A (en) * 2019-02-28 2020-09-04 莫维迪乌斯有限公司 Method and apparatus for storing and accessing multidimensional data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9471377B2 (en) * 2013-11-13 2016-10-18 Reservoir Labs, Inc. Systems and methods for parallelizing and optimizing sparse tensor computations

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625476A (en) * 2019-02-28 2020-09-04 莫维迪乌斯有限公司 Method and apparatus for storing and accessing multidimensional data
CN110020724A (en) * 2019-03-18 2019-07-16 浙江大学 A kind of neural network column Sparse methods based on weight conspicuousness

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IA-SpGEMM: An Input-aware Auto-tuning Framework for Parallel Sparse Matrix-Matrix Multiplication; Zhen Xie et al.; ACM; 2019-06-28; pp. 94-105 *
Processing sparse tensors with the tensor toolbox (tensor toolbox 处理稀疏张量); Others; https://www.codetd.com/article/4927895; 2019-01-15; pp. 1-6 *

Also Published As

Publication number Publication date
CN112529157A (en) 2021-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant