Deep learning system and method based on discrete vectors
Technical Field
The invention relates to the field of deep learning, in particular to a deep learning system and method based on discrete vectors.
Background
As Deep Neural Networks (DNNs) become larger and more complex, mirroring features found in practical biological neural networks, discreteness is becoming the most critical dimension for DNN efficiency and scalability. A DNN model is typically represented as a Data Flow Graph (DFG), where each node in the DFG is an operator with one or more input and output vectors.
Model discretization involves introducing specific discretization patterns on the vectors: quantization, pruning, or a combination of both. With careful quantization and pruning, a DNN model can be compressed to a much smaller memory footprint without losing much accuracy. However, current deep learning systems have not been able to exploit this discreteness effectively: for various reasons, an increase in discreteness does not translate into a corresponding increase in efficiency. First, the computation of general discrete operations is still far from optimal.
For example, cuSPARSE, the CUDA library for sparse matrix operations, has proven to perform poorly on such workloads. Second, since DNN computation tends to involve multiple stages, discrete patterns may vary significantly between stages, which makes it difficult to develop discreteness-aware optimizations that deliver end-to-end gains. Finally, any effective discreteness-aware optimization may require additional support across the vertical stack, from deep learning frameworks, compilers, optimizers, operators, and kernels, down to hardware; inadequate software or hardware support at any one layer may result in inefficiency.
Discrete models are now very common, and various discrete modes are under study. Coarse-granularity discretization, including channel-granularity discretization and block discretization, prunes entire channels or sub-blocks of vectors and is tied to specific operators, whereas fine-granularity discretization allows any individual element of a vector to be pruned. Quantization algorithms represent models at different levels of precision, even with mixed precision within a single neural network layer or a single vector. Some studies have further combined pruning and quantization to achieve higher accuracy under stringent latency and memory constraints. Overall, pruning and quantization have proven effective in reducing the size and latency of certain deep learning models, sometimes by more than a factor of 10, without losing much accuracy.
Currently, the existing discrete model has the following defects:
1. the discretization of the model does not translate directly into performance benefits: current practice uses "proxy metrics" (e.g., FLOP count) to evaluate a discretization strategy, but such proxies for model inference latency are flawed and can lead to inaccurate conclusions. For example, when the weights of an operator are pruned by 50% with fine-grained discreteness, even though the FLOP count may theoretically be halved, the actual model latency may become even higher using the default discrete kernel. One reason is that current general-purpose discrete kernel implementations and their policy support are inadequate: a discrete kernel tends to set a threshold to decide whether a vector or a row/column is treated as discrete or dense. This coarse-grained discretization assumption imposes a limited upper bound on kernel optimization, while discrete encoding can only be used to reduce memory usage;
2. insufficient return on end-to-end performance: discrete algorithms typically focus on exploring the discreteness of certain DNN operators (e.g., convolution). However, when placed in an end-to-end deep learning model, the discrete pattern of the entire model may be affected by every operator in the model, which may introduce complex discrete patterns that are difficult to understand or optimize, resulting in diminished end-to-end returns from the discreteness. As shown in fig. 6, vector W2 exhibits a fine-grained discrete pattern (63% discreteness). Even without further complexity, the initial discrete pattern of W2 causes a chain effect: W2 propagates its discrete properties to downstream and upstream vectors, including W1, T2, T3, T4, T5, and W5. Since the second column of W2 is pruned, the second column of T3 is destined to be all zeros and is correspondingly pruned (e.g., T2 × W2 = T3). Similarly, when the third row of W2 is pruned, the third column of T2 is also pruned. Thus, if a deep learning compiler can take advantage of this spread of discreteness, end-to-end discreteness-aware optimization can be achieved;
3. additional support across the vertical stack: a model discretization strategy often has to work across the vertical stack, but because there is no common foundation on which to build optimization strategies throughout the computation, support is not uniform across different software and hardware implementations, so algorithm designers often have to implement their discretization algorithms manually end-to-end.
In general, the problems faced are that general discrete kernels are still far from optimal; current designs are difficult to match with discrete methods; local optimization metrics do not capture the global benefit; and support for discrete innovation is insufficient.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a deep learning system and a method based on discrete vectors, which are used for overcoming the technical problems existing in the prior related art.
For this purpose, the invention adopts the following specific technical scheme:
according to one aspect of the present invention, there is provided a discrete vector-based deep learning system including: the system comprises a discreteness attribute export module, a discreteness attribute propagation module, a code generation module and a code execution module;
the discrete attribute deriving module is used for deriving the discrete attribute of the initial vector in the DNN model;
the discrete attribute propagation module is used for acquiring the discrete attributes of all vectors in the DNN model;
the code generation module is used for converting the original execution plan into a new execution plan with a given discreteness mode and generating an end-to-end DNN code;
the code execution module is used for configuring the DNN model by using the generated end-to-end DNN code and acquiring trusted performance feedback.
Further, the discrete attribute for acquiring all vectors in the DNN model includes the following steps:
acquiring the discrete attribute of the initial vector;
propagating the discrete attribute of the initial vector downstream and upstream along the data flow graph of the DNN model at the granularity of operators;
the discrete properties of all vectors in the DNN model are derived.
Further, the method for converting an original execution plan into a new execution plan having a given discretization mode and generating end-to-end DNN code includes the steps of:
generating an original execution plan by using the DNN model, and converting the original execution plan into a new execution plan with given discreteness attribute;
compiling and transferring the new execution plan;
performing specialized code processing by utilizing the discrete attribute in the DNN model;
efficient end-to-end DNN code is generated based on the discrete attributes in the given discrete mode and the specialized code processing.
Further, the trusted performance feedback includes memory consumption and latency for the whole or a specific portion of the DNN model.
Further, the generating an original execution plan by using the DNN model, and converting the original execution plan into a new execution plan with given discretization attribute includes the following steps:
converting the weight vector into two different vectors;
introducing two operators to process different two vectors respectively;
having the two operators use hardware instructions matching their respective quantization schemes;
the original execution plan with one vector operation is converted into a new execution plan with given discrete type attributes.
Further, the compiling and transferring process for the new execution plan includes the following steps:
converting vectors with complex discrete patterns to a combination of simpler discrete patterns;
generating an active kernel for each simplified discrete pattern;
rewriting the execution plan of the operator to accommodate the new operator to calculate a newly transformed vector;
code generation is performed on the converted new execution plan, and the executed code is specialized with discreteness awareness.
Further, the specialized code processing using the discrete attribute in the DNN model includes the following steps:
the discrete attributes are used to specialize the code during code generation, discreteness-aware optimization is performed, and efficient custom kernel code is generated for a given discrete mode by utilizing the converted new execution plan and the discrete mode in the DNN model.
According to another aspect of the present invention, there is provided a discrete vector-based deep learning method, the method comprising the steps of:
s1, deriving a discrete attribute of an initial vector in a DNN model;
s2, acquiring the discrete attribute of all vectors in the DNN model based on the discrete attribute of the initial vector;
s3, converting the original execution plan into a new execution plan with a given discreteness mode based on the obtained discreteness attributes of all vectors in the DNN model, and generating an end-to-end DNN code;
s4, configuring a DNN model by using the generated end-to-end DNN code, and acquiring trusted performance feedback.
Further, the acquiring the discrete attribute of all vectors in the DNN model based on the discrete attribute of the initial vector includes the following steps:
s21, acquiring a discreteness attribute of an initial vector;
s22, propagating the discrete attribute of the initial vector downstream and upstream along the data flow graph of the DNN model at the granularity of operators;
s23, deducing the discrete attribute of all vectors in the DNN model.
Further, the converting the original execution plan into a new execution plan with a given discrete mode based on the discrete attribute of all vectors in the obtained DNN model, and generating the end-to-end DNN code includes the following steps:
s31, generating an original execution plan by using a DNN model, and converting the original execution plan into a new execution plan with given discreteness attribute;
s32, compiling and transmitting the new execution plan;
s33, performing specialized code processing by utilizing the discrete attribute in the DNN model;
s34, generating efficient end-to-end DNN code based on the discrete attributes in the given discrete mode and the specialized code processing.
The beneficial effects of the invention are as follows:
1. the invention enables specialization of code for each operator, specifying a specific set of conversion rules for different operators; these rules can decompose complex discrete attributes into a combination of simple attributes with known effective optimizations, and optimal decisions are then made throughout the stack by evaluating execution plans.
2. The system provided by the invention is highly customizable and extensible, not only can utilize the new method of discretization to determine new discretization attributes and modes, but also can provide new vector propagation rules of the discretization attributes, and incorporate new discrete sensing operators, kernels and hardware accelerators into the transformation rules.
3. The invention models the discreteness through a new abstraction, the discrete attribute vector, which extends the existing vector abstraction with discrete attributes. Given the initial discrete attributes, the system performs attribute propagation according to predefined propagation rules or domain-specific knowledge to infer the discrete attributes of all other vectors in the deep learning model; compared with the original discrete vector, discrete attribute propagation exposes more optimization opportunities across the whole model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a functional block diagram of a discrete vector based deep learning system according to an embodiment of the present invention;
FIG. 2 is a diagram of the main flow architecture of a discrete vector based deep learning system according to an embodiment of the present invention;
FIG. 3 is an exemplary diagram of discrete attribute vectors of a discrete vector-based deep learning system in accordance with an embodiment of the present invention;
FIG. 4 is a schematic representation of the dispersion attribute propagation of a discrete vector based deep learning system in accordance with an embodiment of the present invention;
FIG. 5 is a flow chart of compilation delivery in a discrete vector-based deep learning system in accordance with embodiments of the present invention;
fig. 6 is a diagram of the discrete nature of propagation vectors along a deep learning network for a discrete vector based deep learning system in accordance with an embodiment of the present invention.
In the figure:
1. a discreteness attribute derivation module; 2. a discrete attribute propagation module; 3. a code generation module; 4. and a code execution module.
Detailed Description
For the purpose of further illustrating the various embodiments, the present invention provides the accompanying drawings, which are a part of the disclosure of the present invention, and which are mainly used for illustrating the embodiments and for explaining the principles of the operation of the embodiments in conjunction with the description thereof, and with reference to these matters, it will be apparent to those skilled in the art to which the present invention pertains that other possible embodiments and advantages of the present invention may be practiced.
According to an embodiment of the invention, a deep learning system and a method based on discrete vectors are provided.
The present invention will now be further described with reference to the accompanying drawings and detailed description, as shown in fig. 1-2, of a discrete vector-based deep learning system according to an embodiment of the present invention, the discrete vector-based deep learning system comprising: a discrete attribute export module 1, a discrete attribute propagation module 2, a code generation module 3 and a code execution module 4;
the discrete attribute deriving module 1 is used for deriving the discrete attribute of the initial vector in the DNN model;
specifically, the discrete attribute vector is a simple yet powerful abstraction. An example is shown in fig. 3: in addition to the conventional vector, the discrete attribute vector carries a discrete attribute, an additional vector in which each element represents the discrete attribute of the corresponding element in the original vector. This allows the user to specify an arbitrary pattern of discreteness in a vector; in fig. 3, 4 represents a 4-bit integer, 8 represents an 8-bit integer, and 0 represents a pruned (blank) element.
For example, the lower-right element may be represented using 8 bits while the second row of the vector is pruned. By specifying at least some discrete attributes of the discrete attribute vector before running the system, the discrete pattern can be understood at compile time, thereby enabling further optimization.
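The abstraction described above can be sketched in Python. This is a minimal illustration under the conventions of fig. 3 (0 = pruned, 4 = 4-bit, 8 = 8-bit), not the patent's actual implementation; the name AttrTensor and its methods are hypothetical.

```python
import numpy as np

class AttrTensor:
    """A vector paired with a per-element discrete attribute.

    Attribute values follow the convention of fig. 3:
    0 = pruned (blank), 4 = 4-bit integer, 8 = 8-bit integer.
    """
    def __init__(self, values, attrs):
        self.values = np.asarray(values, dtype=np.float32)
        self.attrs = np.asarray(attrs, dtype=np.int8)
        assert self.values.shape == self.attrs.shape

    def pruned_mask(self):
        # Elements whose attribute is 0 carry no information.
        return self.attrs == 0

    def density(self):
        # Fraction of elements that survive pruning.
        return float(np.mean(self.attrs != 0))

# A 3x3 weight vector whose second row is pruned and whose
# remaining elements are kept at 8-bit precision.
w = AttrTensor(
    values=[[0.5, -1.2, 0.3], [0.0, 0.0, 0.0], [0.9, 0.1, -0.7]],
    attrs=[[8, 8, 8], [0, 0, 0], [8, 8, 8]],
)
```

Because the attribute array is declared before the system runs, a compiler inspecting `w.pruned_mask()` can see the discrete pattern at compile time rather than probing values at run time.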
The discrete attribute propagation module 2 is used for acquiring the discrete attributes of all vectors in the DNN model;
wherein, the discrete attribute for obtaining all vectors in the DNN model comprises the following steps:
acquiring the discrete attribute of the initial vector;
propagating downstream and upstream along the dataflow graph of the DNN model in granularity of the operators using initial vector-based discrete attributes;
deriving the discrete attribute of all vectors in the DNN model;
specifically, to maximize the discreteness of the model, the system performs discrete attribute propagation to derive the discrete attributes of all vectors in the DNN model. The discrete attribute is propagated bi-directionally: given an initial discrete attribute, it can be propagated downstream and upstream along the data flow graph of the DNN model at the granularity of operators.
The operator matrix multiplication as shown in fig. 4 is an example:
in (a), the discrete attribute shows that the third row and the second column of vector W2 are pruned; through discrete attribute propagation, the second column of the downstream output vector W3 is also pruned, and at the same time one column of W1 is pruned due to the discrete attribute of W2;
as can be seen from (b), the discrete properties of the downstream vector W3 can be propagated back to the upstream vector W2. Eventually, a single discrete property in one vector may affect the entire deep learning model, and such discrete properties can be used to optimize the code at the compilation stage. For example, the entire code computing the second column of W2 in fig. 4 may be deleted (dead code elimination).
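The bidirectional propagation rule for a matrix multiplication can be sketched as follows. This is an illustrative reconstruction of the rule implied by fig. 4 (T3 = T2 × W2, with boolean masks where True means "kept"), not code from the patent; all names are hypothetical.

```python
import numpy as np

def propagate_matmul(t2_attr, w2_attr):
    """Propagation rule for T3 = T2 x W2.

    Attribute arrays are boolean masks: True = element kept,
    False = pruned. Returns (updated t2_attr, derived t3_attr).
    """
    t2_attr = t2_attr.copy()
    m, k = t2_attr.shape
    k2, n = w2_attr.shape
    assert k == k2
    # Downstream: if column j of W2 is entirely pruned,
    # column j of T3 is destined to be all zeros.
    t3_cols_kept = w2_attr.any(axis=0)
    # Upstream: if row i of W2 is entirely pruned, column i of T2
    # never contributes to the product and can itself be pruned.
    w2_rows_kept = w2_attr.any(axis=1)
    t2_attr &= w2_rows_kept[None, :]
    t3_attr = np.ones((m, n), dtype=bool) & t3_cols_kept[None, :]
    # Rows of T3 vanish where the corresponding T2 row is all pruned.
    t3_attr &= t2_attr.any(axis=1)[:, None]
    return t2_attr, t3_attr

# W2 with its third row and second column pruned, as in fig. 4(a).
w2 = np.array([[True, False, True],
               [True, False, True],
               [False, False, False]])
t2 = np.ones((2, 3), dtype=bool)
t2_new, t3 = propagate_matmul(t2, w2)
```

Applying the rule, the second column of T3 and the third column of T2 come out pruned, matching the chain effect described for fig. 4 and fig. 6.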
The code generation module 3 is used for converting the original execution plan into a new execution plan with a given discreteness mode and generating an end-to-end DNN code;
wherein the method for converting an original execution plan into a new execution plan having a given discretization mode and generating end-to-end DNN code comprises the steps of:
generating an original execution plan by using the DNN model, and converting the original execution plan into a new execution plan with given discreteness attribute;
in particular, for example, one vector may be decoupled into two, each with different discreteness properties, so as to use different quantization schemes (e.g., 16-bit and 8-bit quantization). Accordingly, the decoupled vectors require the original operator to be rewritten as two new operators, each computed with different hardware instructions.
Wherein the generating an original execution plan using the DNN model, converting the original execution plan into a new execution plan having given discretization attributes, comprises the steps of:
converting the weight vector into two different vectors;
introducing two operators to process different two vectors respectively;
having the two operators use hardware instructions matching their respective quantization schemes;
converting an original execution plan with one vector operation into a new execution plan with given discrete attributes;
specifically, the weight vector W shown in fig. 5 represents a mixed-precision vector in which two coarse-granularity blocks are quantized using 8 bits and one fine-granularity element is quantized using 32 bits. W is converted into W1 and W2, and two operators are introduced to process W1 × I and W2 × I respectively, each using appropriate hardware instructions with the corresponding quantization scheme, so that the original execution plan with one vector operation is converted into a new execution plan that requires two vector operations.
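The plan transformation above can be sketched as follows. This is a simplified illustration of the fig. 5 rewrite, with the quantization simulated in floating point rather than executed with real 8-bit hardware instructions; function names and the scale parameter are hypothetical.

```python
import numpy as np

def split_by_precision(w, attrs):
    """Split a mixed-precision weight vector W into W1 (8-bit entries)
    and W2 (32-bit entries) so that W = W1 + W2 elementwise.
    `attrs` holds per-element bit widths (0 = pruned)."""
    w1 = np.where(attrs == 8, w, 0.0)   # coarse-granularity 8-bit blocks
    w2 = np.where(attrs == 32, w, 0.0)  # fine-granularity 32-bit elements
    return w1, w2

def quantize_8bit(x, scale):
    # Simulated 8-bit quantization: round to the nearest step of `scale`.
    return np.clip(np.round(x / scale), -128, 127) * scale

def mixed_matmul(w, attrs, i, scale=0.05):
    """Rewritten plan: two operators, each with its own quantization
    scheme, summed to reproduce the original single W x I operation."""
    w1, w2 = split_by_precision(w, attrs)
    y1 = quantize_8bit(w1, scale) @ i   # operator 1: 8-bit path
    y2 = w2 @ i                         # operator 2: full-precision path
    return y1 + y2

w = np.array([[0.1, 0.2], [0.3, 1.234]])
attrs = np.array([[8, 8], [8, 32]])
out = mixed_matmul(w, attrs, np.eye(2))
```

The single vector operation W × I thus becomes two vector operations, each free to target the hardware instruction set best suited to its precision.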
Compiling and transferring the new execution plan;
specifically, the converted new execution plan is run through a compilation pass to specialize the executed code with discreteness awareness, which enables the generation of highly customized code for the observed discrete patterns. For example, any pruned element in a vector allows the dead code of the corresponding computation involving that particular element to be eliminated.
The compiling and transferring process for the new execution plan comprises the following steps:
converting vectors with complex discrete patterns to a combination of simpler discrete patterns;
generating an active kernel for each simplified discrete pattern;
rewriting the execution plan of the operator to accommodate the new operator to calculate a newly transformed vector;
executing code generation on the converted new execution plan, and specializing the code with discreteness awareness;
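The first step of the compilation pass above, converting a vector with a complex discrete pattern into a combination of simpler patterns, can be sketched as follows. This is an illustrative decomposition into a block-aligned part (served by an efficient block kernel) plus a fine-grained residual; the block size and function name are hypothetical, not specified by the patent.

```python
import numpy as np

def decompose(mask, block=2):
    """Decompose a complex sparsity mask into a block-aligned part,
    for which an efficient block kernel is known to exist, plus a
    residual fine-grained part handled by an element-wise kernel."""
    h, w = mask.shape
    block_part = np.zeros_like(mask)
    for r in range(0, h, block):
        for c in range(0, w, block):
            tile = mask[r:r + block, c:c + block]
            if tile.all():            # fully dense tile -> block kernel
                block_part[r:r + block, c:c + block] = True
    residual = mask & ~block_part     # leftover scattered elements
    return block_part, residual

# A mask with one dense 2x2 block plus two scattered elements.
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 1, 0, 0]], dtype=bool)
block_part, residual = decompose(mask)
```

Each simplified pattern then gets its own kernel, and the operator's execution plan is rewritten to sum the two partial results, matching the remaining steps of the pass.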
performing specialized code processing by utilizing the discrete attribute in the DNN model;
wherein the specialized code processing using the discrete attribute in the DNN model comprises the steps of:
the discrete attributes are used to specialize the code during code generation, discreteness-aware optimization is performed, and efficient custom kernel code is generated for a given discrete mode by utilizing the converted new execution plan and the discrete mode in the DNN model.
Specifically, through the discrete attribute vector abstraction, it is possible to know exactly which elements' computations can be eliminated, or which vector operations may benefit from special hardware instructions, and to specialize the strategy accordingly. Discrete specialization is implemented at the level of code specialization, a strategy that specializes the kernel code of DNN operators with discreteness awareness. It can typically specialize the various discrete modes generated by pruning and quantization into efficient kernel implementations. It is assumed that the deep learning operators (e.g., MatMul and Conv2D) are implemented with an efficient tiling strategy that can be derived by a conventional DNN compiler.
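The idea of specializing a kernel to a compile-time discrete pattern, including dead code elimination of pruned elements, can be sketched as follows. This is a toy matrix-vector kernel, not the patent's tiled implementation; the closure-based "generated kernel" stands in for real emitted code.

```python
import numpy as np

def specialize_matvec(w, mask):
    """Generate a kernel specialized to a sparsity mask known at
    compile time: terms for pruned weights are eliminated entirely
    (dead code elimination), and the surviving weights are baked
    into the kernel as constants."""
    terms = [(r, c, w[r, c])
             for r in range(w.shape[0])
             for c in range(w.shape[1])
             if mask[r, c]]           # pruned entries never appear

    def kernel(x):
        y = np.zeros(w.shape[0])
        for r, c, val in terms:       # only non-pruned terms remain
            y[r] += val * x[c]
        return y

    return kernel

w = np.array([[1.0, 2.0], [3.0, 4.0]])
mask = np.array([[True, False], [True, True]])  # w[0,1] is pruned
kernel = specialize_matvec(w, mask)
```

Since the mask is fixed before generation, the pruned entry w[0,1] contributes no instructions at all to the specialized kernel, which is the compile-time benefit the abstraction exposes over run-time sparsity checks.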
Efficient end-to-end DNN codes are generated based on the processing of the discretionary attributes and specialized codes in the discretionary mode.
The code execution module 4 is configured to configure the DNN model by using the generated end-to-end DNN code, and obtain trusted performance feedback.
Specifically, with the generated end-to-end DNN code, the discretization algorithm designer can configure the DNN model to obtain trusted performance feedback, including memory consumption and latency for the whole or specific portions of the DNN model. Based on the feedback, the algorithm designer may further update the discrete properties in certain vectors and iteratively repeat the process.
According to another embodiment of the present invention, there is provided a discrete vector-based deep learning method including the steps of:
s1, deriving a discrete attribute of an initial vector in a DNN model;
s2, acquiring the discrete attribute of all vectors in the DNN model based on the discrete attribute of the initial vector;
wherein, based on the discrete attribute of the initial vector, acquiring the discrete attribute of all vectors in the DNN model comprises the following steps:
s21, acquiring a discreteness attribute of an initial vector;
s22, propagating the discrete attribute of the initial vector downstream and upstream along the data flow graph of the DNN model at the granularity of operators;
s23, deducing the discrete attribute of all vectors in the DNN model;
s3, converting the original execution plan into a new execution plan with a given discreteness mode based on the obtained discreteness attributes of all vectors in the DNN model, and generating an end-to-end DNN code;
wherein the converting the original execution plan into a new execution plan with a given discretization mode based on the discretization attribute of all vectors in the obtained DNN model, and generating the end-to-end DNN code comprises the following steps:
s31, generating an original execution plan by using a DNN model, and converting the original execution plan into a new execution plan with given discreteness attribute;
s32, compiling and transmitting the new execution plan;
s33, performing specialized code processing by utilizing the discrete attribute in the DNN model;
s34, generating efficient end-to-end DNN code based on the discrete attributes in the given discrete mode and the specialized code processing;
s4, configuring a DNN model by using the generated end-to-end DNN code, and acquiring trusted performance feedback.
In summary, with the above technical solution of the present invention, the invention can implement code specialization for each operator, specifying a specific set of conversion rules for different operators; these rules can decompose complex discrete attributes into a combination of simple attributes with known effective optimizations, after which optimal decisions are made throughout the stack by evaluating the execution plan. The system provided by the invention is highly customizable and extensible: it can not only determine new discretization attributes and modes arising from new discretization methods, but can also provide new vector propagation rules for the discrete attributes and incorporate new discreteness-aware operators, kernels, and hardware accelerators into the transformation rules. The invention models the discreteness through a new abstraction, the discrete attribute vector, which extends the existing vector abstraction with discrete attributes; given the initial discrete attributes, the system performs attribute propagation according to predefined propagation rules or domain-specific knowledge to infer the discrete attributes of all other vectors in the deep learning model, and compared with the original discrete vector, discrete attribute propagation exposes more optimization opportunities across the whole model.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.