WO2023207039A1 - Data processing method and apparatus, and device and storage medium - Google Patents


Info

Publication number
WO2023207039A1
Authority
WO
WIPO (PCT)
Prior art keywords
quantization
target
optional
strategy
feature
Prior art date
Application number
PCT/CN2022/132429
Other languages
French (fr)
Chinese (zh)
Inventor
王桂彬
丛士钧
贾铭
贾磊
Original Assignee
北京百度网讯科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023207039A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Definitions

  • This disclosure relates to the field of artificial intelligence technology and the field of deep learning technology, and can be applied to scenarios such as speech recognition, natural language processing, and information recommendation.
  • the present disclosure provides a data processing method, device, equipment and storage medium.
  • a data processing method including:
  • each row of feature elements of the feature matrix is divided into at least two feature element sub-segments, and each column of weight elements of the weight matrix is divided into at least two weight element sub-segments;
  • quantization processing is performed on at least two feature element sub-segments and at least two weight element sub-segments, and based on the quantization processing results, the output characteristics of the target quantization network layer are determined.
  • a data processing apparatus including:
  • a matrix acquisition module configured to acquire the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;
  • a matrix segmentation module configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the target segmentation coefficient of the target quantization network layer, and to divide each column of weight elements of the weight matrix into at least two weight element sub-segments;
  • a quantization processing module configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer;
  • a feature determination module is configured to determine the output features of the target quantization network layer based on the quantization processing results.
  • an electronic device including:
  • a memory communicatively connected to at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the above-mentioned data processing method.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the above-mentioned data processing method.
  • a computer program product including a computer program that implements the above-mentioned data processing method when executed by a processor.
  • Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure
  • Figure 2 is a flow chart of another data processing method provided by an embodiment of the present disclosure.
  • Figure 3 is a flow chart of another data processing method provided by an embodiment of the present disclosure.
  • Figure 4 is a flow chart of another data processing method provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of a feature determination module provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of another data processing device provided by an embodiment of the present disclosure.
  • Figure 8 is a schematic structural diagram of a target strategy determination module provided by an embodiment of the present disclosure.
  • Figure 9 is a schematic structural diagram of another target strategy determination module provided by an embodiment of the present disclosure.
  • Figure 10 is a block diagram of an electronic device that implements a data processing method provided by an embodiment of the present disclosure.
  • Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure. The embodiment of the present disclosure is suitable for quantizing the data calculation process of the target quantization network layer in a deep learning model, that is, for processing the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer to obtain the output features of the target quantization network layer.
  • the method may be executed by a data processing device, which may be implemented in software and/or hardware and may be integrated into an electronic device configured with a deep learning model.
  • the data processing method provided by this embodiment may include:
  • S101 Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.
  • the target quantization network layer may be a network layer that operates matrix multiplication operators in the deep learning model.
  • the matrix multiplication operators may include but are not limited to: fully connected operators and other derivative operators, such as transformer operators.
  • the feature matrix input to the target quantization network layer can be the input information of the target quantization network layer.
  • the feature matrix can be the input information of the deep learning model.
  • the feature matrix can be the output of the network layer located above the target quantization network layer in the deep learning model.
  • the weight matrix of the target quantization network layer can be an inherent network parameter of the layer, obtained during the network training stage, whose weight coefficients are used to weight the input features of this layer.
  • the number of columns of the feature matrix needs to be equal to the number of rows of the weight matrix. That is, if the size of the feature matrix is m*k, then the size of the weight matrix is k*n, where the values of m, k, and n are positive integers.
  • This embodiment can obtain the feature data input to the target quantization network layer as a feature matrix, and obtain the inherent weight parameters in the target quantization network layer as a weight matrix. If there are multiple input data to the target network layer, the input data whose number of columns is the same as the number of rows of the weight matrix can be selected as the feature matrix input to the target quantization network layer.
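The input-selection step described in this paragraph can be sketched as follows; all shapes, names, and values are illustrative assumptions, not from the disclosure:

```python
import numpy as np

# Illustrative sketch of S101: among several candidate inputs, pick the one
# whose number of columns equals the number of rows of the weight matrix.
# Shapes here assume m = 4, k = 8, n = 2.
W = np.random.randn(8, 2)                 # weight matrix, size k*n
candidates = [np.random.randn(4, 5),      # incompatible input
              np.random.randn(4, 8)]      # m*k input -> compatible
feature = next(x for x in candidates if x.shape[1] == W.shape[0])
assert feature.shape == (4, 8)
```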
  • the target segmentation coefficient may be one of the quantization configuration parameters required when quantizing the operation process of the target quantization network layer. It is used to characterize the number of matrix elements contained in each sub-segment after division. For example, if the target segmentation coefficient is C, each C consecutive elements in the matrix can be divided into a segment, that is, the number of matrix elements contained in each divided subsegment is C. Among them, the value of C is a positive integer.
  • the value of the target segmentation coefficient may be predetermined. For example, one may be selected from a variety of optional segmentation coefficients as the target segmentation coefficient through a large number of test analyses. It can also be set based on experience, etc., which is not limited.
  • the target segmentation coefficient can be set so that the feature elements of each row of the feature matrix and the weight elements of each column of the weight matrix are divided into equal parts. That is, if the number of columns of the feature matrix and the number of rows of the weight matrix are both k, the value of the target segmentation coefficient C should evenly divide k.
  • the matrix elements in the feature matrix are called feature elements, and each group of feature elements after division is regarded as a feature element sub-segment; the matrix elements in the weight matrix are called weight elements, and each group of weight elements after division is regarded as a weight element sub-segment.
  • according to the target segmentation coefficient C, each row of feature elements in the feature matrix can be divided into at least two segments, with C adjacent feature elements as a group, each segment serving as a feature element sub-segment; then, according to the target segmentation coefficient C, each column of weight elements in the weight matrix is divided into at least two segments, with C adjacent weight elements as a group, each segment serving as a weight element sub-segment.
  • for example, suppose the feature matrix is a 4*8 matrix I, the weight matrix is an 8*2 matrix W, and the target segmentation coefficient C is 4. Each row of the matrix I is divided based on the target segmentation coefficient C, yielding 8 feature element sub-segments: feature element sub-segment 1 (I11, I12, I13, I14), feature element sub-segment 2 (I15, I16, I17, I18), feature element sub-segment 3 (I21, I22, I23, I24), feature element sub-segment 4 (I25, I26, I27, I28), feature element sub-segment 5 (I31, I32, I33, I34), feature element sub-segment 6 (I35, I36, I37, I38), feature element sub-segment 7 (I41, I42, I43, I44) and feature element sub-segment 8 (I45, I46, I47, I48).
  • each column of the matrix W is likewise divided based on the target segmentation coefficient C, yielding 4 weight element sub-segments: weight element sub-segment 1 (W11, W21, W31, W41), weight element sub-segment 2 (W51, W61, W71, W81), weight element sub-segment 3 (W12, W22, W32, W42) and weight element sub-segment 4 (W52, W62, W72, W82).
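The segmentation step above can be sketched with the same example shapes (m = 4, k = 8, n = 2, C = 4); variable names are illustrative assumptions:

```python
import numpy as np

# Sketch of dividing rows/columns into sub-segments, assuming the target
# segmentation coefficient C evenly divides k (here k = 8, C = 4).
C = 4
I = np.arange(32, dtype=float).reshape(4, 8)   # feature matrix, m*k
W = np.arange(16, dtype=float).reshape(8, 2)   # weight matrix, k*n

# Each row of I splits into k//C feature element sub-segments of length C.
feat_segments = I.reshape(I.shape[0], -1, C)   # shape (m, k//C, C)
# Each column of W splits into k//C weight element sub-segments of length C.
wt_segments = W.T.reshape(W.shape[1], -1, C)   # shape (n, k//C, C)

# Row 1 of I yields sub-segments (I11..I14) and (I15..I18); column 1 of W
# yields sub-segments (W11..W41) and (W51..W81).
assert feat_segments.shape == (4, 2, 4)
assert wt_segments.shape == (2, 2, 4)
```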
  • S103 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.
  • the target quantization bit number may be another parameter among the quantization configuration parameters required when quantizing the operation process of the target quantization network layer. It is used to represent the degree of quantization of the matrix multiplication operator, that is, the smaller the value of the target number of quantization bits, the higher the degree of quantization.
  • the value of the target number of quantization bits in this embodiment is usually not greater than 4; for example, it can be 1 bit, 2 bits, or 4 bits.
  • the process of quantizing each feature element sub-segment and each weight element sub-segment based on the target number of quantization bits includes: determining the feature reference value of each feature element sub-segment based on the feature element values within it. For example, the feature element value with the largest absolute value within the feature element sub-segment can be used as the feature reference value of that sub-segment. Then, based on the feature reference value and the target quantization bit number of the target quantization network layer, the quantized value of each feature element within the feature element sub-segment is determined according to the following formula (1):
    I′i,p = round(Ii,p / absmax(Ii,s) × (2^(B−1) − 1))    (1)
  • I′ i,p is the quantized value of the feature element in the i-th row and p-th column of the feature matrix I;
  • I i,p is the feature element in the i-th row and p-th column of the feature matrix I;
  • absmax(I i,s ) is the feature reference value of the s-th feature element sub-segment in the i-th row of feature matrix I;
  • B is the target quantization bit number of the target quantization network layer.
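A minimal sketch of this per-sub-segment absmax quantization, assuming round-to-nearest as the rounding mode (the disclosure does not specify it):

```python
import numpy as np

def quantize_subsegment(seg, bits):
    """Quantize one sub-segment to signed integers of the given bit width,
    using the largest absolute value in the segment as the reference value.
    The rounding mode (numpy round-half-to-even) is an assumption."""
    ref = np.max(np.abs(seg))            # absmax reference value
    scale = (2 ** (bits - 1)) - 1        # e.g. 7 for 4-bit quantization
    if ref == 0:
        return np.zeros_like(seg, dtype=np.int32), 1.0
    q = np.round(seg / ref * scale)
    return q.astype(np.int32), ref

q, ref = quantize_subsegment(np.array([0.5, -1.0, 0.25, 0.75]), bits=4)
# ref is 1.0; each value maps to round(v * 7) in the int range [-7, 7]
```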
  • similarly, for each weight element sub-segment, the weight reference value of the sub-segment is determined based on the weight element values within it (for example, the weight element value with the largest absolute value), and, according to the weight reference value and the target quantization bit number, the quantized value of each weight element within the weight element sub-segment is determined according to the following formula (2):
    W′q,j = round(Wq,j / absmax(Ws,j) × (2^(B−1) − 1))    (2)
  • W′ q,j is the quantized value of the weight element in the qth row and jth column of the weight matrix W;
  • W q,j is the weight element of the qth row and jth column of the weight matrix W;
  • absmax(Ws,j) is the weight reference value of the s-th weight element sub-segment in the j-th column of the weight matrix W;
  • B is the target quantization bit number of the target quantization network layer.
  • variables i, p, s, j, and q in this embodiment are positive integers.
  • the process of converting each feature element or weight element into its corresponding quantized value is essentially a process of quantizing the feature element or weight element into a low-bit integer corresponding to the target quantization bit number.
  • then, the quantization processing results, namely the feature reference value of each feature element sub-segment and the quantized value of each feature element within it, as well as the weight reference value of each weight element sub-segment and the quantized value of each weight element within it, can be used to determine the output features of the target quantization network layer through low-bit matrix multiplication and inverse quantization.
  • the target quantization network layer of the above solution in this embodiment can be located in any deep learning model configured with a matrix multiplication operator, for example, it can be located in an image recognition model, a speech recognition model, or a text semantic parsing model, etc.
  • for example, the target quantization network layer can be deployed in a speech recognition model. In this case, the corresponding feature matrix is the speech feature obtained after a speech segment is processed by the feature extraction layer, and the output feature is used for semantic recognition processing of the speech segment.
  • in the data processing method provided by the embodiments of the present disclosure, each row of the feature matrix is divided into at least two feature element sub-segments and each column of the weight matrix is divided into at least two weight element sub-segments; quantization processing is then performed on the divided feature element sub-segments and weight element sub-segments according to the target quantization bit number, and the output features of the target quantization network layer are determined based on the processing results.
  • this solution divides each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization.
  • the target segmentation coefficient of the target quantization network layer may include a first coefficient and a second coefficient.
  • the method of segmenting the feature matrix and the weight matrix is: according to the first coefficient in the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments; according to the second coefficient in the target segmentation coefficient, each column of weight elements of the weight matrix is divided into at least two weight element sub-segments.
  • the method of dividing the feature matrix based on the first coefficient and the method of dividing the weight matrix based on the second coefficient are similar to the methods introduced in the above embodiments and will not be described again here.
  • This method can divide the weight matrix and the feature matrix into sub-segments based on different target segmentation coefficients, which improves the flexibility and diversity of the division rules, and improves the accuracy and flexibility of the subsequent matrix quantization and of determining the output matrix based on the quantization results.
  • Figure 2 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure explain in detail how to determine the output characteristics of the target quantization network layer based on the quantization processing results. As shown in Figure 2, the data processing method provided by this embodiment may include:
  • the number of columns of the feature matrix is equal to the number of rows of the weight matrix.
  • S203 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer.
  • S204 Determine each characteristic element sub-segment in the characteristic matrix and the corresponding weight element sub-segment in the weight matrix, and use the corresponding characteristic element sub-segments and weight element sub-segments as a set of associated sub-segment pairs.
  • each feature element sub-segment of the feature matrix has a corresponding weight element sub-segment in each column of the weight matrix. To determine the weight element sub-segment corresponding to the s-th feature element sub-segment of the i-th row in the feature matrix, the following operations are performed for each column of the weight matrix in turn: according to the positions, within the i-th row, of the feature elements of the feature element sub-segment, the weight element sub-segment in which the weight elements at the same positions in that column of the weight matrix are located is selected as the weight element sub-segment corresponding to the feature element sub-segment in that column of the weight matrix.
  • for example, suppose the feature matrix I is a 4*8 matrix and the weight matrix W is an 8*2 matrix, the feature matrix I is divided into sub-segments based on the first coefficient C1 in the target segmentation coefficient (for example, C1 is 2), and the weight matrix W is divided based on the second coefficient C2 in the target segmentation coefficient (for example, C2 is 4).
  • the values of C1 and C2 are positive integers, and they can be the same or different.
  • in this example, for feature element sub-segment 1 (I11, I12) of the first row of the feature matrix I, the weight elements at the same positions in the first column of the weight matrix W are W11 and W21; the weight element sub-segment 1 (W11, W21, W31, W41) in which these weight elements are located is used as the weight element sub-segment corresponding to feature element sub-segment 1 in the first column of the weight matrix W.
  • the weight element sub-segments corresponding to the characteristic element sub-segment 1 in other columns of the weight matrix W are determined in the same way, and will not be described again here.
  • in this example, the weight element sub-segment corresponding to feature element sub-segment 2 (I13, I14) in the first column of the weight matrix W is also weight element sub-segment 1 (W11, W21, W31, W41).
  • in the same manner, the corresponding relationship between each feature element sub-segment in each row of the feature matrix and each weight element sub-segment in each column of the weight matrix can be analyzed, and, for a given row of the feature matrix I and a given column of the weight matrix W, a feature element sub-segment and a weight element sub-segment having a corresponding relationship are regarded as a group of associated sub-segment pairs.
  • for example, when the feature matrix and the weight matrix are both divided with a segmentation coefficient of 4, the associated sub-segment pairs for the first row of I and the first column of W are: feature element sub-segment 1 (I11, I12, I13, I14) and weight element sub-segment 1 (W11, W21, W31, W41); feature element sub-segment 2 (I15, I16, I17, I18) and weight element sub-segment 2 (W51, W61, W71, W81).
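The formation of associated sub-segment pairs can be sketched as follows, assuming equal segmentation coefficients for both matrices; all names and shapes are illustrative:

```python
import numpy as np

# Sketch of forming associated sub-segment pairs for one row i of the feature
# matrix and one column j of the weight matrix, assuming C1 == C2 == C.
C = 4
I = np.arange(32, dtype=float).reshape(4, 8)   # feature matrix, 4*8
W = np.arange(16, dtype=float).reshape(8, 2)   # weight matrix, 8*2
i, j = 0, 0                                    # row of I and column of W considered

# The s-th feature sub-segment of row i pairs with the weight elements at the
# same positions in column j, i.e. the s-th weight sub-segment of that column.
pairs = []
for s in range(I.shape[1] // C):
    feat_seg = I[i, s * C:(s + 1) * C]
    wt_seg = W[s * C:(s + 1) * C, j]
    pairs.append((feat_seg, wt_seg))

# Two associated pairs, matching the example: (I11..I14, W11..W41) and
# (I15..I18, W51..W81).
assert len(pairs) == 2
```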
  • S205 Determine the output characteristics of the target quantization network layer based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
  • for each group of associated sub-segment pairs, the quantized values of the feature elements and the quantized values of the weight elements at corresponding positions are multiplied as low-bit products and then summed, and the product summation result is then multiplied by the feature reference value and the weight reference value to obtain the sub-inner product of the group of associated sub-segment pairs. Here, the position of a feature element within the feature element sub-segment corresponds to the position of a weight element within the weight element sub-segment.
  • the sub-inner product of each group of associated sub-segment pairs can be calculated through the following formula (3):
    Oi,s,j = absmax(Ii,s) × absmax(Ws,j) × (Σ, for t from (s−1)×C+1 to s×C, of I′i,t × W′t,j) / (2^(B−1) − 1)^2    (3)
  • O i,s,j is the sub-inner product of the associated sub-segment pair corresponding to the s-th feature element sub-segment in the i-th row in the feature matrix I and the s-th weight element sub-segment in the j-th column in the weight matrix W.
  • C is the target segmentation coefficient
  • I′ i,t is the quantized value of the feature element in the i-th row and t-th column in the feature matrix I
  • W′t,j is the quantized value of the weight element in the t-th row and j-th column of the weight matrix W.
  • absmax(Ii,s) is the feature reference value of the s-th feature element sub-segment in the i-th row of the feature matrix I; absmax(Ws,j) is the weight reference value of the s-th weight element sub-segment in the j-th column of the weight matrix W.
  • the value of t is a positive integer.
  • the output features of the target quantization network layer are then determined based on the sub-inner products of the groups of associated sub-segment pairs. This may be done by summing the sub-inner products of the groups of associated sub-segment pairs corresponding to the same row of the feature matrix and the same column of the weight matrix, obtaining the element value at the corresponding row and column position in the output feature:
    Oi,j = Σ (s from 1 to k/C) Oi,s,j    (4)
  • Oi,j is the element value of the i-th row and j-th column of the matrix in which the output feature is located; k is the total number of columns of the feature matrix (also the total number of rows of the weight matrix); Oi,s,j is the sub-inner product of the associated sub-segment pair corresponding to the s-th feature element sub-segment in the i-th row of the feature matrix I and the s-th weight element sub-segment in the j-th column of the weight matrix W.
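Putting the pieces together, a sketch of the whole per-sub-segment quantized matrix multiplication, assuming absmax round-to-nearest quantization and a (2^(B−1) − 1)^2 normalization during inverse quantization:

```python
import numpy as np

def subsegment_quantized_matmul(I, W, C, bits):
    """Sketch of per-sub-segment absmax quantization (formula (1)), low-bit
    sub-inner products with inverse quantization (formula (3)), and summation
    over the k/C sub-segments. The scale normalization by (2**(bits-1) - 1)**2
    is an assumption of this sketch."""
    m, k = I.shape
    k2, n = W.shape
    assert k == k2 and k % C == 0
    scale = (2 ** (bits - 1)) - 1
    O = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for s in range(k // C):
                fseg = I[i, s * C:(s + 1) * C]
                wseg = W[s * C:(s + 1) * C, j]
                fref = max(np.max(np.abs(fseg)), 1e-12)   # feature reference value
                wref = max(np.max(np.abs(wseg)), 1e-12)   # weight reference value
                fq = np.round(fseg / fref * scale)        # quantized feature elements
                wq = np.round(wseg / wref * scale)        # quantized weight elements
                # Sub-inner product, dequantized with both reference values,
                # accumulated over the sub-segments s.
                O[i, j] += fref * wref * np.dot(fq, wq) / scale ** 2
    return O

rng = np.random.default_rng(0)
I = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 2))
O = subsegment_quantized_matmul(I, W, C=4, bits=8)
# With 8-bit quantization the result closely tracks the float product I @ W.
```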
  • in the data processing method provided by the embodiments of the present disclosure, each row of the feature matrix is divided into at least two feature element sub-segments and each column of the weight matrix is divided into at least two weight element sub-segments; the divided feature element sub-segments and weight element sub-segments are quantized according to the target quantization bit number; corresponding feature element sub-segments and weight element sub-segments are determined as groups of associated sub-segment pairs; and the output features of the target quantization network layer are determined based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
  • before this solution determines the output features of the target quantization network layer based on the feature element sub-segments and the weight element sub-segments, it first determines the correspondence between the feature element sub-segments and the weight element sub-segments. Based on this correspondence, the output features can be determined more accurately and quickly, thereby ensuring the accuracy of the operation results of the target quantization network layer.
  • this embodiment uses the Tensor Core computing unit of the GPU developed by NVIDIA to determine the output characteristics of the target quantization network layer based on the quantization processing results.
  • the implementation method is as follows: after obtaining the quantization processing results of each feature element sub-segment and each weight element sub-segment based on the above embodiments, the quantization results are sequentially loaded into the cache space of the Tensor Core computing unit. Then, the quantization results of the feature element sub-segments contained in each group of associated sub-segment pairs (i.e., the feature reference value and the quantized values of the feature elements) and the quantization results of the weight element sub-segments (i.e., the weight reference value and the quantized values of the weight elements) in the cache space are used as the input of the Tensor Core computing unit.
  • the Tensor Core computing unit first performs low-bit multiplication calculations on the input quantization results: the quantized values of the feature elements and the quantized values of the weight elements at corresponding positions are multiplied and then summed to obtain a low-bit calculation result (for example, when the target number of quantization bits is 4, the low-bit calculation result is an integer result of type int32). Inverse quantization is then performed, that is, the low-bit calculation result is multiplied by the feature reference value and the weight reference value to obtain the sub-inner product of each group of associated sub-segment pairs, which is of single-precision floating-point type. Finally, the output features of the target quantization network layer are determined based on the sub-inner products of the groups of associated sub-segment pairs.
  • This embodiment provides an example of implementing the data processing method of this embodiment based on the Tensor Core computing unit of a GPU developed by NVIDIA, which provides a basis for subsequent chip customization (such as application-specific integrated circuit (ASIC) chips).
  • the above data processing method in this embodiment sequentially completes the process of converting floating point numbers into low-bit integers, low-bit matrix multiplication, and inverse quantization. Since the value of the weight matrix will not change during the entire calculation process, its quantization process can be completed offline, while the input feature matrix needs to be quantized online.
  • the size of the target segmentation coefficient C of the target quantization network layer directly affects the accuracy of the quantization process. Generally, the larger the target segmentation coefficient C, the lower the numerical accuracy of the quantized representation, and the accuracy of the final output feature decreases accordingly; the smaller the target segmentation coefficient C, the higher the numerical accuracy of the quantized representation, and the accuracy of the final output feature increases accordingly. At the same time, the target segmentation coefficient C affects the calculation efficiency, since a smaller C produces more sub-segments and therefore more reference values and inverse quantization operations.
  • the target segmentation coefficient C is the key to balancing model accuracy and speed, and the value selection needs to be customized according to the scene requirements.
  • Figure 3 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiment of the present disclosure explains how to determine the target quantization network layer, and the target segmentation coefficient and the target quantization bit number of the target quantization network layer. As shown in Figure 3, the data processing method provided by this embodiment may include:
  • the original model can be a deep learning model that needs to be quantized, which contains at least one network layer capable of being quantized, that is, an optional quantization network layer.
  • This optional quantized network layer contains matrix multiplication operators.
  • the optional quantization strategy is the strategy used to quantize the original model, which includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits.
  • different optional quantization strategies may contain different optional quantization network layers while the optional segmentation coefficients and optional quantization bit numbers corresponding to the different optional quantization network layers are the same; they may contain the same optional quantization network layer while the optional segmentation coefficients and/or optional quantization bit numbers corresponding to that layer are different; or they may contain different optional quantization network layers with different optional segmentation coefficients and optional quantization bit numbers, etc. There is no limitation on this.
  • an implementation method for determining the optional quantization strategies of the original model may be: first, determine the network layers containing matrix multiplication operators in the original model as the optional quantization network layers; then, based on experience, configure at least one optional segmentation coefficient and optional quantization bit number for each optional quantization network layer; and then use each optional quantization network layer together with its corresponding optional segmentation coefficient and optional quantization bit number in turn as an optional quantization strategy of the original model.
  • Another implementation method is: first, determine the network layers containing matrix multiplication operators in the original model as optional quantization network layers; then, for each optional quantization network layer, randomly select segmentation coefficients from a predetermined set of alternative segmentation coefficients and randomly select quantization bit numbers from a set of alternative quantization bit numbers, and randomly combine them with the optional quantization network layer to obtain multiple optional quantization strategies.
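The random-combination method can be sketched as follows; the layer names and candidate sets are illustrative assumptions, not from the disclosure:

```python
import random

# Sketch of building optional quantization strategies by randomly combining
# candidate segmentation coefficients and quantization bit numbers per layer.
layers = ["fc1", "transformer_block1"]      # layers containing matmul operators
candidate_coeffs = [16, 32, 64]             # alternative segmentation coefficients
candidate_bits = [1, 2, 4]                  # alternative quantization bit numbers

random.seed(0)
strategies = [
    {"layer": layer,
     "segment_coeff": random.choice(candidate_coeffs),
     "quant_bits": random.choice(candidate_bits)}
    for layer in layers
    for _ in range(3)                       # several random combinations per layer
]
assert len(strategies) == 6
```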
  • S302 Obtain the quantitative contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.
  • the quantification contribution information in this embodiment refers to the degree of contribution of the optional quantization strategy to the quantization effect of the original model, which may include: model accuracy information and compression volume information.
  • the model accuracy information is the accuracy value of the model after quantizing the original model based on the optional quantization strategy.
  • the compressed volume information is the compressed volume value of the model volume after quantizing the original model based on this optional quantization strategy compared to before quantization.
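As one illustration of how compression volume information could be derived from a quantization bit number, the following hypothetical metric counts the bytes saved by lowering the bit width of a layer's weights; the disclosure does not fix a particular formula:

```python
def compression_volume(num_params, bits, orig_bits=32):
    """Bytes saved by quantizing a layer's weights from orig_bits down to
    bits (an illustrative metric, not a formula from the disclosure)."""
    return num_params * (orig_bits - bits) / 8.0

# A 1M-parameter layer quantized from 32-bit floats to 8-bit integers
# saves 1_000_000 * 24 / 8 = 3_000_000 bytes.
saved = compression_volume(1_000_000, 8)
```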
  • When the original model is quantized based on an optional quantization strategy, the optional quantization network layer corresponding to that strategy is located in the original model, and the strategy's optional segmentation coefficient and optional quantization bit number are assigned as the quantization parameters of that layer. The verification data set of the original model is then input into the original model, and each network layer performs data processing based on its network parameters to obtain the corresponding output results. This embodiment mainly obtains the output results of the optional quantization network layer, that is, the test output features. The test output features are compared, by error analysis, with the real output features produced by the optional quantization network layer before quantization based on the optional segmentation coefficient and optional quantization bit number, yielding the model accuracy value in the quantization contribution information corresponding to the optional quantization strategy. The compression volume information in that quantization contribution information is then determined according to the optional quantization bit number in the optional quantization strategy.
  • The optional quantization network layer, once assigned the optional segmentation coefficient and optional quantization bit number, determines the test output features based on its input feature matrix and its own weight matrix.
  • The target quantization strategy may be determined from the optional quantization strategies by weighing the model accuracy information and the compression volume information together, and selecting as the target quantization strategy an optional quantization strategy with a relatively small loss in model accuracy and a relatively large compression volume.
  • One implementation is: first, based on the model accuracy information in the quantization contribution information, select the optional quantization strategies whose model accuracy loss lies within an acceptable range; then determine the compression volumes corresponding to this selected subset, and take at least one strategy with the largest compression volume as the target quantization strategy.
  • Another possible implementation is: according to the model accuracy information in the quantization contribution information, sort the multiple optional quantization strategies from high to low model accuracy; then, based on the compression volume information in the multiple pieces of quantization contribution information and the expected compression volume, determine the target quantization strategies from the optional quantization strategies in that order. For example, determine how many of the top-ranked optional quantization strategies are needed for their total compression volume to reach the expected compression volume, and take those optional quantization strategies as the target quantization strategies.
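The second implementation — ranking by accuracy and accumulating compression volume until the expected volume is reached — might be sketched as follows; the dictionary field names and example numbers are illustrative assumptions:

```python
def select_target_strategies(strategies, expected_volume):
    """Sort candidate strategies by model accuracy (high to low), then
    take top-ranked strategies until their summed compression volume
    reaches the expected compression volume."""
    ranked = sorted(strategies, key=lambda s: s["accuracy"], reverse=True)
    chosen, total = [], 0.0
    for s in ranked:
        if total >= expected_volume:
            break
        chosen.append(s)
        total += s["compression_volume"]
    return chosen, total

candidates = [
    {"layer": "fc1", "accuracy": 0.95, "compression_volume": 10.0},
    {"layer": "fc2", "accuracy": 0.93, "compression_volume": 12.0},
    {"layer": "attn", "accuracy": 0.90, "compression_volume": 8.0},
]
targets, total = select_target_strategies(candidates, expected_volume=20.0)
# fc1 alone (10.0) is below 20.0, so fc2 is added as well: total == 22.0
```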
  • subsequent data processing operations can be performed based on the corresponding target quantization network layer in each target quantization strategy and its corresponding target segmentation coefficient and target number of quantization bits.
  • the operation process of the target quantization network layer in the original model can be quantified, thereby achieving the effect of quantifying the original model.
  • the number of columns of the feature matrix is equal to the number of rows of the weight matrix.
  • S306 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.
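The segmented quantization of S306 can be illustrated with a small NumPy sketch: the shared dimension is split into `seg_coeff` sub-segments, each sub-segment of the feature rows and weight columns is quantized independently to the target bit number, and the de-quantized partial products are accumulated. This is an illustrative reconstruction under simple symmetric quantization, not the patented implementation:

```python
import numpy as np

def quantize(x, bits):
    """Symmetric quantization of one sub-segment to signed integers."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(x / scale).astype(np.int32), scale

def segmented_matmul(features, weights, seg_coeff, bits):
    """Split each feature row / weight column into seg_coeff sub-segments
    along the shared dimension, quantize each sub-segment independently,
    accumulate the integer partial products, and de-quantize."""
    k = features.shape[1]            # feature columns == weight rows
    assert k == weights.shape[0] and k % seg_coeff == 0
    seg_len = k // seg_coeff
    out = np.zeros((features.shape[0], weights.shape[1]))
    for s in range(seg_coeff):
        lo, hi = s * seg_len, (s + 1) * seg_len
        qf, sf = quantize(features[:, lo:hi], bits)
        qw, sw = quantize(weights[lo:hi, :], bits)
        out += (qf @ qw) * (sf * sw)  # de-quantized partial product
    return out
```

Because each sub-segment gets its own scale, outliers in one segment do not degrade the quantization resolution of the others, which is the motivation for segmenting before quantizing.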
  • the original model is controlled to perform data processing based on the optional quantization strategy, and the quantification contribution information of the optional quantization strategy is determined based on the processing results.
  • the quantitative contribution information is used to determine the target quantization strategy, and then the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed according to the target segmentation coefficient and the target quantization bit number in the target quantization strategy to obtain the output features.
  • This solution determines the final quantization strategy of the model based on the quantization contribution information of multiple optional quantization strategies, reducing the model volume while ensuring the accuracy of model quantization.
  • Figure 4 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure explain how to determine the target quantification strategy from the optional quantization strategies based on the quantification contribution information. As shown in Figure 4, the data processing method provided by this embodiment may include:
  • the optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits.
  • S402 Obtain the quantitative contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.
  • S403 Determine a newly selected quantification strategy from the optional quantification strategies based on the quantification contribution information corresponding to the optional quantification strategies.
  • When determining the target quantization strategies from the optional quantization strategies, they are determined through multiple rounds of screening: a part is screened out each time, and all optional quantization strategies screened out across the multiple rounds are taken as the target quantization strategies. The optional quantization strategies selected in the current round are referred to as newly selected quantization strategies, and those selected before the current round are referred to as historically selected quantization strategies.
  • One possible way to determine newly selected quantization strategies from the optional quantization strategies based on the corresponding quantization contribution information is to consider the model accuracy information and the compression volume information together, and each time select from the optional quantization strategies a preset number (such as 3) of strategies with smaller accuracy loss and larger compression volume as the newly selected quantization strategies.
  • Another implementation is to sort the optional quantization strategies according to their corresponding model accuracy information and compression volume information, and then, according to the sorting result and the compression volume information corresponding to the optional quantization strategies, determine the newly selected quantization strategies from the optional quantization strategies.
  • For example, the optional quantization strategies are sorted from high to low by their corresponding model accuracy; if the sum of the compression volume information corresponding to the top-ranked optional quantization strategies reaches the compression volume targeted by this round of screening, those optional quantization strategies become the newly selected quantization strategies of this round.
  • the values of L, R and R' are positive numbers;
  • the second method can be used to determine the newly selected quantization strategy.
  • This method can select a target quantization strategy that meets the requirements of quantization accuracy and quantization volume faster and more accurately.
  • S404 Determine the total compression volume of the newly selected strategy and the historically selected quantization strategy.
  • Based on the compression volume information, calculate the total compression volume achieved on the original model by the newly selected strategies and the historically selected strategies. For example, the total compression volume is obtained by summing the compression volume corresponding to the newly selected strategies and the compression volume corresponding to the historically selected strategies.
  • S405 Determine whether the total compression volume meets the quantification requirement. If the total compression volume does not meet the quantification requirement, execute S406. If the total compression volume reaches the quantification requirement, execute S409.
  • the quantization requirement may be a preset expected compression volume. This embodiment can determine whether the currently reached total compression volume has reached the expected compression volume, that is, whether it has met the quantification requirements, after each time a part of the newly selected strategies is determined. If the currently reached total compression volume has not reached the expected compression volume, it means that the quantification requirements have not been met, and the subsequent operation of S406 needs to be performed. If the total compression volume currently reached reaches the expected compression volume, it means that the quantification requirements have been met, and the subsequent operations of S409 need to be performed.
  • If the quantization requirement is not met, quantization parameters are assigned to the corresponding optional quantization network layers in the original model according to the newly selected quantization strategies, achieving a preliminary quantization of the original model. Training samples are then used to train the preliminarily quantized original model, which may include forward training and backward training, to obtain a preliminary quantization model.
  • S408 Take the optional quantization strategies other than the newly selected quantization strategies as the new optional quantization strategies, add the newly selected quantization strategies to the historically selected quantization strategies, take the preliminary quantization model as the original model, and return to the operation of S402.
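The batch-wise loop of S402–S409 can be summarized in a Python sketch; the callbacks and the `compression_volume` field are placeholders standing in for the operations the disclosure describes, not a definitive implementation:

```python
def iterative_quantization(original_model, strategies, expected_volume,
                           get_contribution, select_batch, quantize_and_train):
    """Select target quantization strategies in batches: pick a batch,
    check the accumulated compression volume, preliminarily quantize and
    retrain, and repeat until the quantization requirement is met."""
    history, total, model = [], 0.0, original_model
    while strategies and total < expected_volume:
        contrib = get_contribution(model, strategies)             # S402
        new_batch = select_batch(strategies, contrib)             # S403
        total += sum(s["compression_volume"] for s in new_batch)  # S404
        if total >= expected_volume:                              # S405 -> S409
            return history + new_batch, model
        model = quantize_and_train(model, new_batch)              # S406-S407
        history += new_batch                                      # S408
        strategies = [s for s in strategies if s not in new_batch]
    return history, model
```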
  • the number of columns of the feature matrix is equal to the number of rows of the weight matrix.
  • S412 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.
  • the original model is controlled to perform data processing based on the optional quantization strategy, and the quantification contribution information of the optional quantization strategy is determined based on the processing results.
  • The newly selected quantization strategies are determined from the optional quantization strategies in batches. If the total compression volume of the newly selected and historically selected quantization strategies does not meet the quantization requirement, the original model is quantized and trained based on the newly selected quantization strategies, and the operation of obtaining the quantization contribution information of the optional quantization strategies and its subsequent operations are re-executed, until the total compression volume of the newly selected and historically selected quantization strategies meets the quantization requirement. The newly selected and historically selected quantization strategies are then taken as the target quantization strategies, and the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed according to the target segmentation coefficient and the target quantization bit number in the target quantization strategy to obtain the output features.
  • This solution obtains the target quantization strategies in batches and, between batches, quantizes and trains the original model based on the newly selected quantization strategies, which greatly ensures the accuracy of the extracted target quantization strategies and thereby the accuracy of model quantization.
  • Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present disclosure.
  • The embodiment of the present disclosure is suitable for performing quantization processing on the data calculation process of the target quantization network layer in a deep learning model.
  • the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed to obtain the output features of the target network layer.
  • the device can be configured in an electronic device installed with a deep learning model and implemented using software and/or hardware.
  • the device can implement the data processing method of any embodiment of the present disclosure. As shown in Figure 5, the data processing device 500 includes:
  • The matrix acquisition module 501 is configured to acquire the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer, wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix. The matrix segmentation module 502 is configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and to divide each column of weight elements of the weight matrix into at least two weight element sub-segments, according to the target segmentation coefficient of the target quantization network layer. The quantization processing module 503 is configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer. The feature determination module 504 is configured to determine the output features of the target quantization network layer based on the quantization processing results.
  • Each row of feature elements of the feature matrix is divided into at least two feature element sub-segments and each column of weight elements of the weight matrix is divided into at least two weight element sub-segments; quantization processing is then performed on the divided feature element sub-segments and weight element sub-segments according to the target quantization bit number, and the output features of the target quantization network layer are determined based on the processing results.
  • this solution divides each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization.
  • the matrix segmentation module 502 is configured as:
  • each row of feature elements of the feature matrix is divided into at least two feature element sub-segments; according to the second coefficient in the target segmentation coefficient, each row of the weight matrix is A column of weight elements is divided into at least two weight element sub-segments; wherein the first coefficient and the second coefficient are in an integer multiple relationship.
  • the feature determination module 504 includes:
  • the sub-segment pair determining unit 610 is configured to determine each feature element sub-segment in the feature matrix and the corresponding weight element sub-segment in the weight matrix, and use the corresponding feature element sub-segments and weight element sub-segments as a group Associated sub-segment pairs; the feature calculation unit 620 is configured to determine the output features of the target quantization network layer based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each set of associated sub-segment pairs.
  • the ratio of the number of feature element sub-segments and weight element sub-segments contained in each group of associated sub-segment pairs is the same as the ratio of the segmentation coefficients that divide the weight matrix and the feature matrix.
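The pairing rule above — sub-segment counts per group in the same ratio as the segmentation coefficients, when those coefficients are in an integer-multiple relationship — might be expressed as follows (an illustrative sketch; index conventions are assumptions):

```python
def associate_subsegments(first_coeff, second_coeff):
    """Group feature sub-segment indices with weight sub-segment indices
    when the two segmentation coefficients are in an integer-multiple
    relationship; each group covers the same span of the shared dimension."""
    hi, lo = max(first_coeff, second_coeff), min(first_coeff, second_coeff)
    assert hi % lo == 0, "coefficients must be in an integer-multiple relationship"
    feat_per_group = first_coeff // lo   # >1 when the features are split finer
    wgt_per_group = second_coeff // lo   # >1 when the weights are split finer
    groups = []
    for g in range(lo):
        feat_idx = list(range(g * feat_per_group, (g + 1) * feat_per_group))
        wgt_idx = list(range(g * wgt_per_group, (g + 1) * wgt_per_group))
        groups.append((feat_idx, wgt_idx))
    return groups

# With 4 feature sub-segments and 2 weight sub-segments, each group pairs
# two feature sub-segments with one weight sub-segment (ratio 2:1).
```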
  • the feature determination module 504 is configured as:
  • the output characteristics of the target quantization network layer are determined based on the quantization processing results.
  • the feature matrix is the speech features obtained after the speech segments are processed by the feature extraction layer; the output features are used to perform semantic recognition processing on the speech segments.
  • the data processing device 500 also includes:
  • the optional strategy determination module 505 is configured to determine the optional quantization strategy of the original model; wherein the optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits;
  • the contribution information acquisition module 506 is configured to obtain the quantitative contribution information of the optional quantification strategy obtained by performing data processing based on the optional quantification strategy on the original model;
  • the target strategy determination module 507 is configured to obtain the quantitative contribution information from the optional quantification strategy based on the quantitative contribution information. Determine the target quantization strategy to obtain the target quantization network layer, the target segmentation coefficient and the target quantization bit number of the target quantization network layer.
  • the target policy determination module 507 includes:
  • the new strategy determination unit 710 is configured to determine a newly selected quantization strategy from the optional quantization strategies based on the quantification contribution information corresponding to the optional quantization strategy; the compression volume determination unit 720 is configured to determine the newly selected strategy and history The total compression volume of the selected quantization strategy; the target strategy determination unit 730 is configured to use the newly selected quantization strategy and the historically selected quantization strategy as the target quantization strategy when the total compression volume reaches the quantization requirement.
  • the quantitative contribution information includes: model accuracy information and compression volume information; a new strategy determination unit 710 is set to:
  • the target policy determination module 507 also includes:
  • The quantization training unit 740 is configured to, when the total compression volume does not meet the quantization requirement, perform preliminary quantization on the original model based on the newly selected quantization strategy and train the preliminarily quantized original model to obtain a preliminary quantization model. The historical quantization strategy update unit 750 is configured to add the newly selected quantization strategy to the historically selected quantization strategies. The loop operation unit 760 is configured to take the optional quantization strategies other than the newly selected quantization strategy as the new optional quantization strategies.
  • the above-mentioned products can execute the methods provided by any embodiment of the present disclosure, and have corresponding functional modules and effects for executing the methods.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product to implement the above data processing method.
  • FIG. 10 shows a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure.
  • Electronic device 600 is intended to represent many forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • The electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603.
  • In the RAM 603, various programs and data required for the operation of the electronic device 600 can also be stored.
  • Computing unit 601, ROM 602 and RAM 603 are connected to each other via bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604.
  • Multiple components of the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.
  • The computing unit 601 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • The computing unit 601 performs the various methods and processes described above, such as the data processing method.
  • the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608.
  • part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609.
  • When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the data processing method described above may be performed.
  • the computing unit 601 may be configured to perform the data processing method in any other suitable manner (eg, by means of firmware).
  • Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), application specific standard parts (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof.
  • Various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • Examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) configured to display information to the user; and a keyboard and pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
  • Other kinds of devices may also be configured to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN), blockchain network, and the Internet.
  • Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • The server can be a cloud server, also known as a cloud computing server or cloud host; it is a host product in the cloud computing service system that overcomes the disadvantages of difficult management and weak business scalability found in traditional physical host and virtual private server (VPS) services.
  • the server can also be a distributed system server or a server combined with a blockchain.
  • Artificial intelligence is the study of using computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and comprises both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
  • Cloud computing refers to a technical system that accesses, via a network, a flexible and scalable shared pool of physical or virtual resources, where the resources can include servers, operating systems, networks, software, applications, storage devices, etc., and can be deployed and managed on demand in a self-service manner.
  • Steps can be reordered, added, or removed using the various forms of the process shown above. For example, the multiple steps described in the present disclosure can be executed in parallel, sequentially, or in a different order; as long as the desired results of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed here.

Abstract

A data processing method and apparatus, and a device and a storage medium. The data processing method comprises: acquiring a feature matrix, which is input by a target quantization network layer, and a weight matrix of the target quantization network layer (S101), wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix; according to a target segmentation coefficient of the target quantization network layer, dividing each row of feature elements of the feature matrix into at least two feature element sub-segments, and dividing each column of weight elements of the weight matrix into at least two weight element sub-segments (S102); and performing quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the number of target quantization bits of the target quantization network layer, and determining an output feature of the target quantization network layer according to a quantization processing result (S103).

Description

Data processing method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202210463316.9, filed with the China Patent Office on April 28, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence technology and the field of deep learning technology, and is applicable to scenarios such as speech recognition, natural language processing, and information recommendation.
Background
With the development of artificial intelligence technology, deep learning is applied ever more widely in daily life. The pursuit of higher model accuracy has caused the complexity and parameter count of deep learning models to keep growing, which directly increases model size and slows model computation, and in turn raises the cost of deploying artificial intelligence technology. Improvement is therefore urgently needed.
Summary
The present disclosure provides a data processing method, apparatus, device, and storage medium.

According to an aspect of the present disclosure, a data processing method is provided, including:

obtaining a feature matrix input to a target quantization network layer and a weight matrix of the target quantization network layer, wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;

dividing, according to a target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix into at least two feature element sub-segments, and dividing each column of weight elements of the weight matrix into at least two weight element sub-segments; and

performing quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to a target number of quantization bits of the target quantization network layer, and determining an output feature of the target quantization network layer according to a quantization processing result.
According to an aspect of the present disclosure, a data processing apparatus is provided, including:

a matrix acquisition module, configured to acquire a feature matrix input to a target quantization network layer and a weight matrix of the target quantization network layer, wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;

a matrix segmentation module, configured to divide, according to a target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix into at least two feature element sub-segments, and to divide each column of weight elements of the weight matrix into at least two weight element sub-segments;

a quantization processing module, configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to a target number of quantization bits of the target quantization network layer; and

a feature determination module, configured to determine an output feature of the target quantization network layer according to a quantization processing result.
According to another aspect of the present disclosure, an electronic device is provided, including:

at least one processor; and

a memory communicatively connected to the at least one processor, wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the data processing method described above.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the data processing method described above.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the data processing method described above.
Brief Description of the Drawings
Figure 1 is a flowchart of a data processing method provided by an embodiment of the present disclosure;

Figure 2 is a flowchart of another data processing method provided by an embodiment of the present disclosure;

Figure 3 is a flowchart of another data processing method provided by an embodiment of the present disclosure;

Figure 4 is a flowchart of another data processing method provided by an embodiment of the present disclosure;

Figure 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present disclosure;

Figure 6 is a schematic structural diagram of a feature determination module provided by an embodiment of the present disclosure;

Figure 7 is a schematic structural diagram of another data processing apparatus provided by an embodiment of the present disclosure;

Figure 8 is a schematic structural diagram of a target strategy determination module provided by an embodiment of the present disclosure;

Figure 9 is a schematic structural diagram of another target strategy determination module provided by an embodiment of the present disclosure;

Figure 10 is a block diagram of an electronic device for implementing a data processing method provided by an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to facilitate understanding; they should be considered exemplary only. For the sake of clarity and conciseness, descriptions of well-known functions and structures, as well as functions and structures of low relevance to the embodiments described below, are omitted.
Figure 1 is a flowchart of a data processing method provided by an embodiment of the present disclosure. The embodiment is suitable for quantizing the data calculation process of a target quantization network layer in a deep learning model, that is, for processing the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer to obtain the output feature of that network layer. The method may be executed by a data processing apparatus, which may be implemented in software and/or hardware and may be integrated into an electronic device configured with a deep learning model. As shown in Figure 1, the data processing method provided by this embodiment may include:
S101: Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.
The target quantization network layer may be a network layer of the deep learning model that executes a matrix multiplication operator. The matrix multiplication operator may include, but is not limited to, a fully connected operator and other derived operators, such as a transformer operator.
The feature matrix input to the target quantization network layer may be the input information of that layer. For example, if the target quantization network layer is the first network layer of the deep learning model, the feature matrix may be the input of the deep learning model; if the target quantization network layer is not the first network layer, the feature matrix may be the output of the network layer preceding it. The weight matrix of the target quantization network layer consists of the inherent network parameters, obtained during network training, that characterize the weight coefficients applied to this layer's input features. Since the target quantization network layer corresponds to a matrix multiplication operator, the number of columns of the feature matrix must equal the number of rows of the weight matrix; that is, the feature matrix has size m*k and the weight matrix has size k*n, where m, k, and n are positive integers.
In this embodiment, the feature data input to the target quantization network layer may be obtained as the feature matrix, and the inherent weight parameters of the target quantization network layer may be obtained as the weight matrix. If the target network layer has multiple input data, the input data whose number of columns equals the number of rows of the weight matrix may be selected as the feature matrix input to the target quantization network layer.
S102: According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.
The target segmentation coefficient may be one of the quantization configuration parameters required for quantizing the operation of the target quantization network layer. It characterizes the number of matrix elements contained in each sub-segment after division. For example, if the target segmentation coefficient is C, every C consecutive elements of the matrix are divided into one segment; that is, each sub-segment after division contains C matrix elements, where C is a positive integer. The value of the target segmentation coefficient may be predetermined; for example, it may be selected from multiple optional segmentation coefficients through extensive test analysis, or set based on experience, which is not limited here. In this embodiment, the target segmentation coefficient may be chosen so that each row of feature elements of the feature matrix and each column of matrix elements of the weight matrix are divided into equal parts; that is, if the number of columns of the feature matrix and the number of rows of the weight matrix are both k, the value of the target segmentation coefficient C divides k evenly.
In this embodiment, the matrix elements of the feature matrix are called feature elements, and each group of feature elements after division is treated as one feature element sub-segment; the matrix elements of the weight matrix are called weight elements, and each group of weight elements after division is treated as one weight element sub-segment.
In this embodiment, according to the target segmentation coefficient C, each row of feature elements of the feature matrix may be divided into at least two segments, grouping every C adjacent feature elements together, with each segment serving as one feature element sub-segment; then, according to the target segmentation coefficient C, each column of weight elements of the weight matrix may be divided into at least two segments, grouping every C adjacent weight elements together, with each segment serving as one weight element sub-segment.
For example, suppose the feature matrix is the 4*8 matrix I:

    I = | I11 I12 I13 I14 I15 I16 I17 I18 |
        | I21 I22 I23 I24 I25 I26 I27 I28 |
        | I31 I32 I33 I34 I35 I36 I37 I38 |
        | I41 I42 I43 I44 I45 I46 I47 I48 |

the weight matrix is the 8*2 matrix W:

    W = | W11 W12 |
        | W21 W22 |
        | W31 W32 |
        | W41 W42 |
        | W51 W52 |
        | W61 W62 |
        | W71 W72 |
        | W81 W82 |

and the target segmentation coefficient C is 4. Dividing each row of matrix I based on the target segmentation coefficient C yields 8 feature element sub-segments: feature element sub-segment 1 (I11, I12, I13, I14), feature element sub-segment 2 (I15, I16, I17, I18), feature element sub-segment 3 (I21, I22, I23, I24), feature element sub-segment 4 (I25, I26, I27, I28), feature element sub-segment 5 (I31, I32, I33, I34), feature element sub-segment 6 (I35, I36, I37, I38), feature element sub-segment 7 (I41, I42, I43, I44), and feature element sub-segment 8 (I45, I46, I47, I48). Dividing each column of matrix W based on the target segmentation coefficient C yields 4 weight element sub-segments: weight element sub-segment 1 (W11, W21, W31, W41), weight element sub-segment 2 (W51, W61, W71, W81), weight element sub-segment 3 (W12, W22, W32, W42), and weight element sub-segment 4 (W52, W62, W72, W82).
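For illustration only, the division just described can be sketched in a few lines of code. The function names `split_rows` and `split_cols` and the symbolic element labels are assumptions for the sketch, not names from the disclosure:

```python
# Hypothetical sketch of dividing matrix rows/columns into sub-segments of
# length C, as described above. Names here are illustrative assumptions.

def split_rows(matrix, c):
    """Split each row of `matrix` into consecutive sub-segments of length c."""
    return [row[i:i + c] for row in matrix for i in range(0, len(row), c)]

def split_cols(matrix, c):
    """Split each column of `matrix` into consecutive sub-segments of length c."""
    cols = list(zip(*matrix))  # transpose so columns become rows
    return [list(col[i:i + c]) for col in cols for i in range(0, len(col), c)]

# A 4x8 feature matrix I and an 8x2 weight matrix W, as in the example.
I = [[f"I{r}{c}" for c in range(1, 9)] for r in range(1, 5)]
W = [[f"W{r}{c}" for c in range(1, 3)] for r in range(1, 9)]

feature_subsegments = split_rows(I, 4)  # 4 rows x (8 / 4) = 8 sub-segments
weight_subsegments = split_cols(W, 4)   # 2 columns x (8 / 4) = 4 sub-segments
```

With C = 4 this reproduces the counts in the example: 8 feature element sub-segments and 4 weight element sub-segments.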
S103: Perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target number of quantization bits of the target quantization network layer, and determine the output feature of the target quantization network layer based on the quantization processing result.
The target number of quantization bits may be another of the quantization configuration parameters required for quantizing the operation of the target quantization network layer. It characterizes the degree to which the matrix multiplication operator is quantized: the smaller its value, the higher the degree of quantization. For example, in this embodiment the target number of quantization bits is usually no greater than 4, taking values such as 1 bit, 2 bits, or 4 bits.
In this embodiment, the process of quantizing each feature element sub-segment and each weight element sub-segment according to the target number of quantization bits includes: determining a feature reference value of each feature element sub-segment from the feature element values within it (for example, taking the feature element value with the largest absolute value within the sub-segment as its feature reference value), and then, based on the feature reference value and the target number of quantization bits of the target quantization network layer, determining the quantized value of each feature element within the sub-segment according to the following formula (1):

    I′(i,p) = round( I(i,p) / absmax(I(i,s)) × (2^(B−1) − 1) )    (1)

where I′(i,p) is the quantized value of the feature element in row i, column p of the feature matrix I; I(i,p) is the feature element in row i, column p of the feature matrix I; absmax(I(i,s)) is the feature reference value of the s-th feature element sub-segment of row i of the feature matrix I; and B is the target number of quantization bits of the target quantization network layer.
Similarly, a weight reference value of each weight element sub-segment is determined from the weight element values within it, and, based on the weight reference value and the target number of quantization bits, the quantized value of each weight element within the sub-segment is determined according to the following formula (2):

    W′(q,j) = round( W(q,j) / absmax(W(s,j)) × (2^(B−1) − 1) )    (2)

where W′(q,j) is the quantized value of the weight element in row q, column j of the weight matrix W; W(q,j) is the weight element in row q, column j of the weight matrix W; absmax(W(s,j)) is the weight reference value of the s-th weight element sub-segment of column j of the weight matrix W; and B is the target number of quantization bits of the target quantization network layer.
The variables i, p, s, j, and q in this embodiment take positive integer values.
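For illustration only, the per-sub-segment quantization of formulas (1) and (2) can be sketched as follows. The function name `quantize_subsegment` and its zero-handling behavior are assumptions for the sketch, not part of the disclosure:

```python
# Minimal sketch of formulas (1)/(2): symmetric quantization of one
# sub-segment, scaled by the sub-segment's largest absolute value (absmax).
# Names and the absmax == 0 branch are illustrative assumptions.

def quantize_subsegment(values, b):
    """Quantize one sub-segment to signed integers representable in b bits."""
    absmax = max(abs(v) for v in values)
    if absmax == 0:
        return [0] * len(values), 0.0
    scale = (2 ** (b - 1) - 1) / absmax
    return [round(v * scale) for v in values], absmax

# One 4-element sub-segment quantized to B = 4 bits (range -7..7 after scaling).
quantized, absmax = quantize_subsegment([1.0, -2.4, 0.6, 4.0], b=4)
```

Here the element with the largest absolute value (4.0) maps to 2^(4−1) − 1 = 7, and the other elements are scaled proportionally and rounded.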
In this embodiment, converting each feature element or weight element into its corresponding quantized value is essentially the process of quantizing that element into a low-bit integer corresponding to the target number of quantization bits.
The quantization results obtained in this embodiment may first be saved in a compact format so that they can be retrieved later when computing the output feature. For example, if the target number of quantization bits is B=4, then since one byte is 8 bits, one byte can store the quantized values of two feature elements or of two weight elements. Each feature reference value and each weight reference value must also be saved, with each feature reference value occupying 4 bytes and each weight reference value likewise occupying 4 bytes.
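For illustration only, the "two 4-bit values per byte" compact format can be sketched with a pair of packing helpers. `pack4`/`unpack4` and the nibble layout (low nibble first) are assumptions for the sketch; the disclosure does not specify a byte layout:

```python
# Sketch of the compact storage idea for B = 4: two quantized values per byte.
# Helper names and the low-nibble-first layout are illustrative assumptions.

def pack4(values):
    """Pack pairs of 4-bit signed integers (range -8..7) into bytes."""
    if len(values) % 2:
        values = values + [0]  # pad odd-length input with a zero nibble
    return bytes((a & 0xF) | ((b & 0xF) << 4)
                 for a, b in zip(values[::2], values[1::2]))

def unpack4(packed, count):
    """Recover `count` 4-bit signed integers from packed bytes."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return out[:count]

# Four quantized values occupy two bytes instead of four (or sixteen as floats).
packed = pack4([2, -4, 1, 7])
```

Each 4-byte float reference value is stored alongside the packed integers so the original scale can be recovered at dequantization time.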
After each feature element sub-segment and each weight element sub-segment has been quantized as described above, the output feature of the target quantization network layer can be determined from the quantization processing results — that is, from the feature reference value of each feature element sub-segment and the quantized value of each feature element within it, together with the weight reference value of each weight element sub-segment and the quantized value of each weight element within it — through low-bit matrix multiplication followed by dequantization.
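For illustration only, the low-bit multiplication and dequantization step can be sketched end to end. Everything below (function name, loop structure, the choice to dequantize per associated sub-segment pair) is an assumption made for the sketch, not the disclosed implementation:

```python
# Hedged sketch of S103: quantize each sub-segment, take integer dot products
# segment by segment, then dequantize with the two reference values and
# accumulate. All names are illustrative assumptions.

def seg_quant_matmul(I, W, c, b):
    """Approximate I @ W using per-sub-segment symmetric b-bit quantization."""
    scale_max = 2 ** (b - 1) - 1
    m, k, n = len(I), len(I[0]), len(W[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for s in range(0, k, c):  # one associated sub-segment pair per step
                feat = I[i][s:s + c]
                wcol = [W[p][j] for p in range(s, s + c)]
                fmax = max(abs(v) for v in feat) or 1.0
                wmax = max(abs(v) for v in wcol) or 1.0
                qf = [round(v * scale_max / fmax) for v in feat]
                qw = [round(v * scale_max / wmax) for v in wcol]
                acc = sum(a * w for a, w in zip(qf, qw))  # integer dot product
                # Dequantize: undo both scale factors for this segment pair.
                out[i][j] += acc * fmax * wmax / (scale_max * scale_max)
    return out

I = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.75, -0.5, 1.0]]
W = [[1.0, -0.5], [0.5, 1.0], [-1.0, 0.25], [2.0, 0.5]]
approx = seg_quant_matmul(I, W, c=2, b=8)
exact = [[sum(I[i][p] * W[p][j] for p in range(4)) for j in range(2)]
         for i in range(2)]
```

With 8-bit quantization the segment-wise result stays close to the exact float product; lowering b trades accuracy for a smaller integer representation, which is the trade-off the segmentation coefficient is meant to soften.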
The target quantization network layer of the above solution may be located in any deep learning model configured with a matrix multiplication operator; for example, it may be located in an image recognition model, a speech recognition model, or a text semantic parsing model.
In this embodiment, the target quantization network layer may be deployed in a speech recognition model. In that case, the corresponding feature matrix is the speech feature obtained after a speech segment is processed by a feature extraction layer, and the output feature is used for semantic recognition of the speech segment.
In the solution of the embodiment of the present disclosure, after the feature matrix input to the target quantization network layer and the weight matrix of that layer are obtained, each row of the feature matrix and each column of the weight matrix are divided into at least two sub-segments based on the target segmentation coefficient; the divided feature element sub-segments and weight element sub-segments are then quantized according to the target number of quantization bits, and the output feature of the target quantization network layer is determined from the processing results. By introducing the target segmentation coefficient and dividing each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization, this solution achieves low-bit quantized matrix multiplication while ensuring its precision; that is, it can compress the model size and increase the model's running speed while preserving model accuracy as far as possible, thereby reducing the cost of deploying artificial intelligence technology.
In this embodiment, the target segmentation coefficient of the target quantization network layer may include a first coefficient and a second coefficient. Correspondingly, the feature matrix and the weight matrix are segmented as follows: each row of feature elements of the feature matrix is divided into at least two feature element sub-segments according to the first coefficient of the target segmentation coefficient, and each column of weight elements of the weight matrix is divided into at least two weight element sub-segments according to the second coefficient of the target segmentation coefficient. The first coefficient and the second coefficient may be the same or different; in either case, the two must be in an integer-multiple relationship, for example, first coefficient C1=4 and second coefficient C2=2. Dividing the feature matrix based on the first coefficient and dividing the weight matrix based on the second coefficient proceed in the same way as introduced in the above embodiments and are not repeated here. Allowing the weight matrix and the feature matrix to be divided into sub-segments based on different target segmentation coefficients increases the flexibility and diversity of the division rules, and improves the precision and flexibility of the subsequent matrix quantization and of determining the output matrix from the quantization results.
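For illustration only, the two constraints stated above — each coefficient must divide the shared dimension k evenly, and the two coefficients must be in an integer-multiple relationship — can be captured in a small validity check. The function name is an assumption for the sketch:

```python
# Sketch of the constraints on the first and second segmentation coefficients.
# `coefficients_valid` is an illustrative name, not from the disclosure.

def coefficients_valid(k, c1, c2):
    """Return True if c1 and c2 are usable segmentation coefficients for
    a shared dimension of size k (feature columns == weight rows == k)."""
    divides_k = k % c1 == 0 and k % c2 == 0          # equal-part division
    integer_multiple = c1 % c2 == 0 or c2 % c1 == 0  # integer-multiple relation
    return divides_k and integer_multiple
```

For example, with k = 8 the pair C1=4, C2=2 is valid, while C1=4, C2=3 is not (3 does not divide 8), and with k = 12 the pair C1=4, C2=6 is rejected because neither coefficient is an integer multiple of the other.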
Figure 2 is a flowchart of another data processing method provided by an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment explains in detail how to determine the output feature of the target quantization network layer from the quantization processing results. As shown in Figure 2, the data processing method provided by this embodiment may include:
S201: Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.
The number of columns of the feature matrix is equal to the number of rows of the weight matrix.
S202: According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.
S203: Perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target number of quantization bits of the target quantization network layer.
S204: Determine, for each feature element sub-segment of the feature matrix, the corresponding weight element sub-segment in the weight matrix, and treat each feature element sub-segment and weight element sub-segment having a correspondence as a set of associated sub-segment pairs.
In this embodiment, each feature element sub-segment of the feature matrix has a corresponding weight element sub-segment in each column of the weight matrix. Taking the determination of the weight element sub-segment corresponding to the s-th feature element sub-segment of row i of the feature matrix as an example, the following operation is performed for each column of the weight matrix in turn: according to the position in row i of each feature element of that feature element sub-segment, select the weight element sub-segment containing the weight elements at the same positions in that column of the weight matrix, as the weight element sub-segment corresponding to the feature element sub-segment in that column.
For example, take the feature matrix I (4 rows by 8 columns, with elements I11 through I48) and the weight matrix W (8 rows by 2 columns, with elements W11 through W82) from the earlier example, where the feature matrix I is divided into sub-segments based on the first coefficient C1 of the target segmentation coefficient, and the weight matrix W is divided based on the second coefficient C2 of the target segmentation coefficient. The values of C1 and C2 are positive integers and may be the same or different.
若C1=C2=4,则确定特征矩阵I中第1行的特征元素子段1(I 11,I 12,I 13,I 14)在权重矩阵W的第1列对应的权重元素子段时,由于特征元素子段1中的多 个特征元素位于特征矩阵I第1行的第1到4个元素位置,所以本实施例将权重矩阵W中第1列的第1到4个元素位置对应的权重元素(即W 11,W 21,W 31,W 41)所在的权重元素子段1(W 11,W 21,W 31,W 41),作为该特征元素子段1在权重矩阵W的第1列中对应的权重元素子段。特征元素子段1在权重矩阵W的其他列对应的权重元素子段的确定方式同理,在此不进行赘述。 If C1=C2=4, determine the characteristic element sub-segment 1 (I 11 , I 12 , I 13 , I 14 ) of the first row in the feature matrix I when it is the weight element sub-segment corresponding to the first column of the weight matrix W , since multiple characteristic elements in the characteristic element subsection 1 are located at the 1st to 4th element positions in the 1st row of the feature matrix I, this embodiment corresponds to the 1st to 4th element positions in the 1st column of the weight matrix W The weight element sub-segment 1 (W 11 , W 21 , W 31 , W 41 ) where the weight element (i.e. W 11 , W 21 , W 31 , W 41 ) is located, as the characteristic element sub-segment 1 in the weight matrix W The corresponding weight element subsegment in column 1. The weight element sub-segments corresponding to the characteristic element sub-segment 1 in other columns of the weight matrix W are determined in the same way, and will not be described again here.
若C1=4,C2=2,则确定特征矩阵I中第1行的特征元素子段1(I 11,I 12,I 13,I 14)在权重矩阵W的第1列对应的权重元素子段时,由于特征元素子段1中的多个特征元素位于特征矩阵I第1行的第1到4个元素位置,所以本实施例将权重矩阵W中第1列的第1到4个元素位置对应的权重元素(即W 11,W 21,W 31,W 41)所在的权重元素子段1(W 11,W 21)和权重元素子段2(W 31,W 41),作为该特征元素子段1在权重矩阵W的第1列中对应的权重元素子段。特征元素子段1在权重矩阵W的其他列对应的权重元素子段的确定方式同理,在此不进行赘述。 If C1=4, C2=2, determine the weight element subsection corresponding to the feature element sub-segment 1 (I 11 , I 12 , I 13 , I 14 ) in the first row of the feature matrix I in the first column of the weight matrix W segment, since multiple characteristic elements in the characteristic element sub-segment 1 are located at the 1st to 4th element positions in the 1st row of the feature matrix I, this embodiment places the 1st to 4th elements in the 1st column of the weight matrix W The weight element sub-segment 1 (W 11 , W 21 ) and weight element sub-segment 2 (W 31 , W 41 ) where the weight element corresponding to the position ( i.e. W 11 , W 21 , W 31 , W 41 ) is located is used as this feature The weight element sub-segment corresponding to element sub-segment 1 in the first column of the weight matrix W. The weight element sub-segments corresponding to the characteristic element sub-segment 1 in other columns of the weight matrix W are determined in the same way, and will not be described again here.
If C1 = 2 and C2 = 4, then when determining the weight element sub-segment in column 1 of the weight matrix W that corresponds to feature element sub-segment 1 (I₁₁, I₁₂) of row 1 of the feature matrix I: since the feature elements of feature element sub-segment 1 occupy element positions 1 and 2 of row 1 of I, this embodiment takes weight element sub-segment 1 (W₁₁, W₂₁, W₃₁, W₄₁), which contains the weight elements at element positions 1 and 2 of column 1 of W (i.e., W₁₁, W₂₁), as the weight element sub-segment corresponding to feature element sub-segment 1 in column 1 of W. Likewise, the weight element sub-segment corresponding to feature element sub-segment 2 (I₁₃, I₁₄) in column 1 of W is also weight element sub-segment 1 (W₁₁, W₂₁, W₃₁, W₄₁).
In this embodiment, after the feature element sub-segments and weight element sub-segments with corresponding relationships have been determined, for each row of the feature matrix I, the correspondence between each feature element sub-segment of that row and each weight element sub-segment of each column of the weight matrix can be analyzed, and, for that row of I and each column of the weight matrix, the feature element sub-segments and weight element sub-segments that correspond to each other are taken together as one group of associated sub-segment pairs.
For example, for the feature matrix I and the weight matrix W, if C1 = C2 = 4, there are two groups of associated sub-segment pairs between the first row of I and the first column of W: feature element sub-segment 1 (I₁₁, I₁₂, I₁₃, I₁₄) with weight element sub-segment 1 (W₁₁, W₂₁, W₃₁, W₄₁); and feature element sub-segment 2 (I₁₅, I₁₆, I₁₇, I₁₈) with weight element sub-segment 2 (W₅₁, W₆₁, W₇₁, W₈₁). There are likewise two groups of associated sub-segment pairs between the first row of I and the second column of W: feature element sub-segment 1 (I₁₁, I₁₂, I₁₃, I₁₄) with weight element sub-segment 3 (W₁₂, W₂₂, W₃₂, W₄₂); and feature element sub-segment 2 (I₁₅, I₁₆, I₁₇, I₁₈) with weight element sub-segment 4 (W₅₂, W₆₂, W₇₂, W₈₂).
In this embodiment, the ratio of the number of feature element sub-segments to the number of weight element sub-segments contained in each group of associated sub-segment pairs equals the ratio of the segmentation coefficients used to divide the weight matrix and the feature matrix. That is, for each group of associated sub-segment pairs: (number of feature element sub-segments it contains) / (number of weight element sub-segments it contains) = (first coefficient, used to divide the weight matrix) / (second coefficient, used to divide the feature matrix).
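As an illustration of the pairing rule above, the following sketch (a hypothetical helper, not part of the disclosure) enumerates, for one row of I and one column of W, the index groups that form each associated sub-segment pair given the two segmentation coefficients:

```python
# Illustrative sketch: enumerate associated sub-segment pairs for one row of
# the feature matrix I and one column of the weight matrix W.
# k  : shared dimension (number of columns of I == number of rows of W)
# c1 : segment length used to split each row of I (feature-side coefficient)
# c2 : segment length used to split each column of W (weight-side coefficient)
def associated_subsegment_pairs(k, c1, c2):
    pairs = []
    step = max(c1, c2)  # element span covered by one associated pair
    for start in range(0, k, step):
        # feature sub-segments and weight sub-segments covering the same span
        feat = [list(range(s, min(s + c1, k))) for s in range(start, start + step, c1)]
        wgt = [list(range(s, min(s + c2, k))) for s in range(start, start + step, c2)]
        pairs.append((feat, wgt))
    return pairs
```

With k = 8 and c1 = c2 = 4 this yields the one-to-one pairing of the example above; with c1 = 4, c2 = 2 each group pairs one feature sub-segment with two weight sub-segments, matching the stated ratio c2/c1.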
S205: Determine the output feature of the target quantization network layer according to the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
In this embodiment, for each group of associated sub-segment pairs, based on the feature base value and the quantized feature-element values of the feature element sub-segment, together with the weight base value and the quantized weight-element values of the weight element sub-segment, the quantized values of position-corresponding feature elements and weight elements can first be multiplied with low-bit multiplication and the products summed; the summed result is then multiplied by the feature base value and the weight base value to obtain the sub-inner product of that group of associated sub-segment pairs. Here, a feature element and a weight element correspond in position if the column index of the feature element equals the row index of the weight element.
For example, if the target segmentation coefficients used to divide the feature matrix I and the weight matrix W are equal, i.e., C1 = C2 = C, the sub-inner product of each group of associated sub-segment pairs can be computed with the following formula (3).
$$O_{i,s,j}=\mathrm{bsmax}(I_{i,s})\cdot\mathrm{bsmax}(W_{s,j})\cdot\sum_{t=(s-1)C+1}^{sC} I'_{i,t}\,W'_{t,j}\qquad(3)$$
Here, O_{i,s,j} is the sub-inner product of the associated sub-segment pair formed by the s-th feature element sub-segment in row i of the feature matrix I and the s-th weight element sub-segment in column j of the weight matrix W; C is the target segmentation coefficient; I'_{i,t} is the quantized value of the feature element in row i, column t of I; W'_{t,j} is the quantized value of the weight element in row t, column j of W; bsmax(I_{i,s}) is the feature base value of the s-th feature element sub-segment in row i of I; bsmax(W_{s,j}) is the weight base value of the s-th weight element sub-segment in column j of W; and t takes positive integer values.
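A minimal numeric sketch of formula (3) follows. The bsmax() base value here is an illustrative assumption (an absolute-maximum step for the given bit width, with non-zero sub-segments); the disclosure's exact definition of bsmax may differ:

```python
import numpy as np

# Assumed base value: abs-max of the sub-segment divided by the largest
# representable signed magnitude at `bits` bits (illustrative choice).
def bsmax(seg, bits):
    return np.abs(seg).max() / (2 ** (bits - 1) - 1)

def sub_inner_product(I, W, i, s, j, C, bits=4):
    """O_{i,s,j}: sub-inner product of the s-th associated pair (s is 1-based)."""
    cols = slice((s - 1) * C, s * C)          # element positions covered by segment s
    f_seg, w_seg = I[i, cols], W[cols, j]
    f_base, w_base = bsmax(f_seg, bits), bsmax(w_seg, bits)
    f_q = np.round(f_seg / f_base).astype(np.int32)  # low-bit feature values
    w_q = np.round(w_seg / w_base).astype(np.int32)  # low-bit weight values
    # low-bit multiply-accumulate, then dequantize with the two base values
    return f_base * w_base * int(np.dot(f_q, w_q))
```

For a sub-segment whose values quantize exactly, the result matches the full-precision inner product of that segment.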
After the sub-inner product of each group of associated sub-segment pairs has been determined in the above manner, the output feature of the target quantization network layer is determined from these sub-inner products. Specifically, the sub-inner products of the groups of associated sub-segment pairs that share the same row of the feature matrix and the same column of the weight matrix can be summed to obtain the element value of the output feature at that row and column position.
That is,

$$O_{i,j}=\sum_{s=1}^{k/C} O_{i,s,j}$$
Here, O_{i,j} is the element value in row i, column j of the matrix in which the output feature resides; k is the total number of columns of the feature matrix (which also equals the total number of rows of the weight matrix); and O_{i,s,j} is the sub-inner product of the associated sub-segment pair formed by the s-th feature element sub-segment in row i of the feature matrix I and the s-th weight element sub-segment in column j of the weight matrix W.
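Putting the two formulas together, a vectorized sketch of the whole segmented quantized multiplication (illustrative only; it reuses the assumed abs-max base values above, takes C1 = C2 = C dividing k evenly, and assumes every sub-segment is non-zero):

```python
import numpy as np

def quantized_matmul(I, W, C, bits=4):
    """O[i, j] as the sum over s of the k/C sub-inner products O_{i,s,j}."""
    n, k = I.shape
    m = W.shape[1]
    levels = 2 ** (bits - 1) - 1
    O = np.zeros((n, m))
    for s in range(k // C):                    # one associated pair per segment s
        cols = slice(s * C, (s + 1) * C)
        # per-row feature base values and per-column weight base values
        f_base = np.abs(I[:, cols]).max(axis=1, keepdims=True) / levels
        w_base = np.abs(W[cols, :]).max(axis=0, keepdims=True) / levels
        f_q = np.round(I[:, cols] / f_base)    # low-bit feature values
        w_q = np.round(W[cols, :] / w_base)    # low-bit weight values
        O += (f_q @ w_q) * (f_base * w_base)   # dequantize each sub-inner product
    return O
```

Each loop iteration performs one group's low-bit multiply-accumulate followed by the dequantization by the pair of base values, mirroring the per-pair computation of formula (3) and the summation above.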
In the solution of this embodiment of the present disclosure, after the feature matrix input to the target quantization network layer and the weight matrix of that layer are obtained, each row of the feature matrix is divided into at least two feature element sub-segments and each column of the weight matrix into at least two weight element sub-segments based on the target segmentation coefficient; the divided feature element sub-segments and weight element sub-segments are then quantized according to the target quantization bit number, the feature element sub-segments and weight element sub-segments with corresponding relationships are determined as groups of associated sub-segment pairs, and the output feature of the target quantization network layer is determined from the quantization processing results of the feature element sub-segments and weight element sub-segments in each group. When determining the output feature from the feature element sub-segments and weight element sub-segments, this solution first establishes the correspondence between them; based on this correspondence, the output feature can be determined more accurately and quickly, thereby ensuring the accuracy of the computation result of the target quantization network layer.
Since low-bit matrix multiplication is the computational core of the data processing method described above, and the graphics processing units (GPUs) developed by NVIDIA provide efficient support for low-bit multiplication, including int4 and int1 multiply operations, this embodiment, building on the above embodiments, uses the Tensor Core compute units of an NVIDIA GPU to determine the output feature of the target quantization network layer from the quantization processing results, as follows. After the quantization processing result of each feature element sub-segment and each weight element sub-segment is obtained as in the above embodiments, the quantization results are loaded in turn into the cache space of the Tensor Core compute unit; then, for each group of associated sub-segment pairs in the cache space, the quantization result of its feature element sub-segment (i.e., the feature base value and the quantized feature-element values) and the quantization result of its weight element sub-segment (i.e., the weight base value and the quantized weight-element values) are fed as input to the Tensor Core compute unit.
Based on the input quantization results, the Tensor Core compute unit first performs the low-bit multiplication, multiplying the quantized values of position-corresponding feature elements and weight elements and summing the products, to obtain a low-bit computation result (for example, when the target quantization bit number is 4, the low-bit computation result is an int32 integer); it then performs the dequantization computation, multiplying the low-bit computation result by the feature base value and the weight base value to obtain the sub-inner product of each group of associated sub-segment pairs, which is of single-precision floating-point type; finally, the output feature of the target quantization network layer is determined from the sub-inner products of the groups of associated sub-segment pairs.
This embodiment gives an example of implementing the data processing method of this embodiment on the Tensor Core compute units of an NVIDIA GPU, which provides a technical basis for subsequently applying this data processing algorithm on custom chips, such as application-specific integrated circuit (ASIC) chips, to quantize deep learning models.
The above data processing method of this embodiment completes, in sequence, the quantization of floating-point numbers to low-bit integers, the low-bit matrix multiplication, and the dequantization. Since the values of the weight matrix do not change during the computation, its quantization can be completed offline, whereas the input feature matrix must be quantized online. The target segmentation coefficient C of the target quantization network layer directly affects the precision of the quantization process: in general, the larger C is, the lower the numerical precision of the quantized representation and, correspondingly, the lower the precision of the final output feature; the smaller C is, the higher the numerical precision of the quantized representation and, correspondingly, the higher the precision of the final output feature. C also affects computational efficiency: generally, the larger its value, the fewer instructions are required and the shorter the computation time; conversely, the smaller its value, the longer the computation time. The target segmentation coefficient C is therefore the key to balancing model accuracy and speed, and its value should be chosen according to the requirements of the scenario.
Figure 3 is a flowchart of another data processing method provided by an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment of the present disclosure explains how to determine the target quantization network layer and its target segmentation coefficient and target quantization bit number. As shown in Figure 3, the data processing method provided by this embodiment may include:
S301: Determine the optional quantization strategies of the original model.
The original model may be a deep learning model that needs to be quantized and contains at least one network layer that can be quantized, i.e., an optional quantization network layer; such a layer contains a matrix multiplication operator. An optional quantization strategy is a strategy on which quantization of the original model is based, and it includes an optional quantization network layer together with the optional segmentation coefficient and optional quantization bit number of that layer. In this embodiment there are multiple optional quantization strategies; each contains one optional quantization network layer and one group of quantization configuration parameters for that layer, namely an optional segmentation coefficient and an optional quantization bit number. Different optional quantization strategies may contain different optional quantization network layers while the corresponding optional segmentation coefficients and optional quantization bit numbers are the same; they may contain the same optional quantization network layer but different optional segmentation coefficients and/or optional quantization bit numbers; or they may differ in the optional quantization network layer as well as in both the optional segmentation coefficient and the optional quantization bit number. This is not limited here.
One way this embodiment may determine the optional quantization strategies of the original model is as follows: first determine the network layers of the original model that contain a matrix multiplication operator as the optional quantization network layers; then, based on experience, configure at least one optional segmentation coefficient and optional quantization bit number for each optional quantization network layer; then take each optional quantization network layer, with each of its corresponding optional segmentation coefficients and optional quantization bit numbers, in turn, as one optional quantization strategy of the original model.
Another implementation is: first determine the network layers of the original model that contain a matrix multiplication operator as the optional quantization network layers; then, for each optional quantization network layer, randomly draw a segmentation coefficient from a predetermined set of candidate segmentation coefficients and a quantization bit number from a set of candidate quantization bit numbers, and randomly combine these with the optional quantization network layers to obtain multiple optional quantization strategies.
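The second approach can be sketched as follows; the layer names, candidate sets, and dictionary layout are illustrative assumptions, not values from the disclosure:

```python
import random

# Hypothetical candidate sets for the random combination.
CANDIDATE_C = [4, 8, 16, 32]     # candidate segmentation coefficients
CANDIDATE_BITS = [1, 2, 4, 8]    # candidate quantization bit numbers

def sample_strategies(matmul_layers, n_per_layer, seed=0):
    """Randomly combine each matmul-bearing layer with candidate parameters."""
    rng = random.Random(seed)
    strategies = []
    for layer in matmul_layers:
        for _ in range(n_per_layer):
            strategies.append({
                "layer": layer,                     # optional quantization network layer
                "C": rng.choice(CANDIDATE_C),       # optional segmentation coefficient
                "bits": rng.choice(CANDIDATE_BITS), # optional quantization bit number
            })
    return strategies
```

Each returned dictionary plays the role of one optional quantization strategy: a layer plus one group of quantization configuration parameters.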
S302: Obtain the quantization contribution information of each optional quantization strategy by having the original model perform data processing based on that optional quantization strategy.
The quantization contribution information of this embodiment refers to the degree to which an optional quantization strategy contributes to the quantization effect of the original model, and it may include model precision information and compression volume information. The model precision information is the precision value of the model after the original model is quantized based on the optional quantization strategy. The compression volume information is the volume by which the model shrinks, relative to before quantization, after the original model is quantized based on the optional quantization strategy.
In this embodiment, for each obtained optional quantization strategy, the original model may be quantized based on that strategy: the optional quantization network layer corresponding to the strategy is located in the original model, and the layer's optional segmentation coefficient and optional quantization bit number are assigned to the quantization parameters of that layer in the original model. The validation data set of the original model is then fed into the original model, and each network layer of the original model processes the data based on its network parameters to produce a corresponding output; what this embodiment mainly obtains is the output of the optional quantization network layer, i.e., the test output feature. Error analysis is performed between the test output feature and the real output feature that the optional quantization network layer produced before quantization based on the optional segmentation coefficient and optional quantization bit number, yielding the model precision value in the quantization contribution information of the optional quantization strategy. The compression volume information in the quantization contribution information of the strategy is then determined from the strategy's optional quantization bit number.
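The evaluation of one strategy's contribution can be sketched as below; the relative-error precision proxy and the bytes-saved compression formula (float32 baseline) are illustrative choices, not the disclosure's exact metrics:

```python
import numpy as np

def contribution_info(real_out, test_out, n_weights, bits):
    """Quantization contribution of one strategy (illustrative metrics).

    real_out : layer output before quantization, on validation data
    test_out : layer output with the strategy's C and bit number applied
    """
    # model precision information: 1 - mean relative error of the layer output
    err = np.abs(test_out - real_out).mean() / (np.abs(real_out).mean() + 1e-12)
    accuracy = 1.0 - err
    # compression volume information: bytes saved vs. 32-bit weights
    compression = n_weights * (32 - bits) / 8
    return {"accuracy": accuracy, "compression": compression}
```

A strategy with a lower bit number yields a larger compression term but typically a larger output error, which is exactly the trade-off the selection step below must weigh.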
In this embodiment, the way in which an optional quantization network layer that has been assigned an optional segmentation coefficient and an optional quantization bit number determines the test output feature from its input feature matrix and its own weight matrix can refer to the description in the above embodiments and is not repeated here.
S303: Determine a target quantization strategy from the optional quantization strategies according to the quantization contribution information, thereby obtaining the target quantization network layer and its target segmentation coefficient and target quantization bit number.
In this embodiment, one way to determine the target quantization strategy from the optional quantization strategies according to the quantization contribution information is to weigh the model precision information and the compression volume information together, selecting as target quantization strategies those optional quantization strategies with relatively small precision loss and relatively large compression volume. For example, one implementation is: first select, based on the model precision information in the quantization contribution information, the optional quantization strategies whose model precision loss is within an acceptable range; then examine the compression volumes of these selected strategies and take at least one of those with the largest compression volumes as the target quantization strategy.
Another implementation is: sort the multiple optional quantization strategies from high to low model precision according to the model precision information in the quantization contribution information, and then determine the target quantization strategies from the optional quantization strategies, in that order, according to the compression volume information in the quantization contribution information and the expected compression volume. For example, determine which of the top-ranked optional quantization strategies have compression volume information summing to the expected compression volume, and take those optional quantization strategies as the target quantization strategies.
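The second implementation amounts to taking the highest-precision prefix of the sorted strategies whose summed compression reaches the expected volume; a minimal sketch (the `accuracy`/`compression` keys are the illustrative layout used above):

```python
def select_target_strategies(candidates, expected_compression):
    """Sort by model precision (descending) and take the prefix whose summed
    compression volume reaches the expected compression volume."""
    ranked = sorted(candidates, key=lambda c: c["accuracy"], reverse=True)
    chosen, total = [], 0.0
    for cand in ranked:
        if total >= expected_compression:   # expected volume already reached
            break
        chosen.append(cand)
        total += cand["compression"]
    return chosen
```

Because selection proceeds in precision order, the strategies that are sacrificed last are those contributing the least precision, which is the intent of the sorting step.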
In this embodiment, after the target quantization strategies are determined, the subsequent data processing operations can be performed in turn based on the target quantization network layer of each target quantization strategy and its corresponding target segmentation coefficient and target quantization bit number. In this way, the computation process of the target quantization network layer in the original model is quantized, thereby achieving the effect of quantizing the original model.
S304: Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.
The number of columns of the feature matrix equals the number of rows of the weight matrix.
S305: According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.
S306: Quantize the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output feature of the target quantization network layer according to the quantization processing results.
In the solution of this embodiment of the present disclosure, after the optional quantization strategies of the original model are determined, the original model is made to perform data processing based on the optional quantization strategies, the quantization contribution information of each optional quantization strategy is determined from the processing results, and the target quantization strategy is determined from this quantization contribution information; the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are then processed according to the target segmentation coefficient and target quantization bit number in the target quantization strategy to obtain the output feature. This solution determines the final quantization strategy of the model based on the quantization contribution information of multiple optional quantization strategies, reducing the model volume while guaranteeing the model's quantization precision, and thereby improving the precision of model quantization.
Figure 4 is a flowchart of another data processing method provided by an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment of the present disclosure explains how to determine the target quantization strategy from the optional quantization strategies according to the quantization contribution information. As shown in Figure 4, the data processing method provided by this embodiment may include:
S401: Determine the optional quantization strategies of the original model.
An optional quantization strategy includes an optional quantization network layer together with the optional segmentation coefficient and optional quantization bit number of that layer.
S402: Obtain the quantization contribution information of each optional quantization strategy by having the original model perform data processing based on that optional quantization strategy.
S403: Determine newly selected quantization strategies from the optional quantization strategies according to the quantization contribution information corresponding to the optional quantization strategies.
In this embodiment, the target quantization strategies are determined from the optional quantization strategies through multiple rounds of screening, i.e., a subset is selected in each round and all the optional quantization strategies selected across the rounds are taken as the target quantization strategies. The optional quantization strategies selected in the current round are the newly selected quantization strategies, and those selected in rounds before the current one are the historically selected quantization strategies.
One way this embodiment may determine the newly selected quantization strategies from the optional quantization strategies according to their quantization contribution information is: combining the model precision information and the compression volume information, select in each round a preset number (e.g., three) of optional quantization strategies with relatively small precision loss and relatively large compression volume as the newly selected quantization strategies.
Another implementation is: sort the optional quantization strategies according to their model precision information and compression volume information, and determine the newly selected quantization strategies from the sorting result and the compression volume information of the optional quantization strategies. Based on the model volume L of the current original model and the expected compression volume R, compute this round's screening compression volume R', where R' = (L - R) / 2. Then sort the optional quantization strategies from high to low model precision, determine which of the top-ranked optional quantization strategies have compression volume information summing to this round's screening compression volume, and take those optional quantization strategies as this round's newly selected quantization strategies. The values of L, R, and R' are positive numbers.
本实施例可以采用第二种方式来确定新增选定量化策略,该方式能够更快且更精准的选出满足量化精度和量化体积要求的目标量化策略。In this embodiment, the second method can be used to determine the newly selected quantization strategy. This method can select a target quantization strategy that meets the requirements of quantization accuracy and quantization volume faster and more accurately.
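The second screening approach described above can be sketched as follows. This is a minimal illustration under assumed data structures (each candidate strategy is represented as a dict with `accuracy` and `volume` fields); the field names and the function name are placeholders, not identifiers from the disclosure.

```python
# Hypothetical sketch of the second selection approach: compute the screening
# compression volume R' = (L - R) / 2, rank candidates by model accuracy, and
# keep taking top-ranked candidates until their compression volumes reach R'.

def select_new_strategies(candidates, model_volume_l, expected_volume_r):
    """Pick this round's newly selected quantization strategies.

    candidates: list of dicts with 'accuracy' (model accuracy when the
    strategy is applied) and 'volume' (compression volume it contributes).
    """
    assert model_volume_l > 0 and expected_volume_r > 0
    # Screening compression volume for this round: R' = (L - R) / 2.
    screening_volume = (model_volume_l - expected_volume_r) / 2

    # Sort candidates by model accuracy, highest (smallest accuracy loss) first.
    ranked = sorted(candidates, key=lambda s: s["accuracy"], reverse=True)

    selected, accumulated = [], 0.0
    for strategy in ranked:
        if accumulated >= screening_volume:
            break
        selected.append(strategy)
        accumulated += strategy["volume"]
    return selected
```

With L = 100 and R = 40, the screening volume is 30, so candidates are taken in accuracy order until their volumes sum to at least 30.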
S404: Determine the total compression volume of the newly selected strategies and the historically selected quantization strategies.
In this embodiment, each time newly selected strategies are determined, the total compression volume that the newly selected strategies and the historically selected strategies achieve on the original model is computed from the compression volume information in the quantization contribution information of the newly selected strategies and of each previously determined historically selected strategy. For example, the total compression volume is obtained by summing the compression volume of the newly selected strategies and that of the historically selected strategies.
S405: Determine whether the total compression volume meets the quantization requirement; if it does not, execute S406; if it does, execute S409.
The quantization requirement may be a preset expected compression volume. In this embodiment, each time a batch of newly selected strategies is determined, it can be checked whether the total compression volume reached so far has met the expected compression volume, i.e., whether the quantization requirement is met. If the total compression volume reached so far has not met the expected compression volume, the quantization requirement is not met and the subsequent operation S406 is executed; if it has, the quantization requirement is met and the subsequent operation S409 is executed.
S406: If the total compression volume does not meet the quantization requirement, perform preliminary quantization on the original model based on the newly selected quantization strategies, and train the preliminarily quantized original model to obtain a preliminary quantized model.
If S405 determines that the total compression volume does not meet the quantization requirement, i.e., the expected compression volume has not been reached, quantization parameters are assigned to the corresponding optional quantization network layers of the original model according to the optional quantization network layers in the newly selected quantization strategies and their optional segmentation coefficients and optional quantization bit numbers, thereby preliminarily quantizing the original model. The preliminarily quantized original model is then trained with training samples, which may include, for example, both forward training and backward training, to obtain the preliminary quantized model.
S407: Add the newly selected quantization strategies to the historically selected quantization strategies.
S408: Take the optional quantization strategies other than the newly selected quantization strategies as the new optional quantization strategies, take the preliminary quantized model as the original model, and return to execute the operation of S402.
S409: If the total compression volume meets the quantization requirement, take the newly selected quantization strategies and the historically selected quantization strategies as the target quantization strategies, so as to obtain the target quantization network layers and their target segmentation coefficients and target quantization bit numbers.
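The S402-S409 loop described above can be summarized in a short Python skeleton. All three function arguments (`evaluate_contributions`, `select_new_strategies`, `quantize_and_train`) are hypothetical placeholders standing in for the operations described in the text, not APIs defined by the disclosure.

```python
# Illustrative skeleton of the batch-wise strategy search: each round measures
# the candidates' quantization contributions, selects a new batch, checks the
# accumulated compression volume, and (if the requirement is not yet met)
# preliminarily quantizes and retrains the model before the next round.

def search_target_strategies(model, optional, expected_total_volume,
                             evaluate_contributions, select_new_strategies,
                             quantize_and_train):
    selected_history = []
    total_volume = 0.0
    while optional:
        # S402/S403: contribution info per candidate, then this round's batch.
        contributions = evaluate_contributions(model, optional)
        new_batch = select_new_strategies(contributions)
        # S404: total compression volume of new plus historical strategies.
        total_volume += sum(s["volume"] for s in new_batch)
        selected_history.extend(new_batch)                      # S407
        optional = [s for s in optional if s not in new_batch]  # S408
        if total_volume >= expected_total_volume:               # S405 -> S409
            return selected_history, model
        # S406: preliminary quantization plus retraining before next round.
        model = quantize_and_train(model, new_batch)
    return selected_history, model
```

A usage example with trivial stubs: if each round selects the first remaining candidate and the expected total volume is 10, strategies are accumulated until their volumes sum past 10.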
S410: Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.
The number of columns of the feature matrix is equal to the number of rows of the weight matrix.
S411: According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.
S412: According to the target quantization bit number of the target quantization network layer, perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments, and determine the output features of the target quantization network layer according to the quantization processing result.
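As an illustration of S410 through S412, the following is a minimal numpy sketch of segmented low-bit matrix multiplication. It assumes symmetric per-sub-segment int8-style quantization and the same segmentation coefficient for both matrices; the disclosure's actual quantization scheme (scale selection, rounding, accumulation) may differ.

```python
# Minimal sketch: split the shared dimension k into sub-segments, quantize
# each feature/weight sub-segment with its own scale, multiply in integers,
# then dequantize and accumulate into the output features.
import numpy as np

def segmented_quant_matmul(x, w, num_segments=2, bits=8):
    k = x.shape[1]
    assert k == w.shape[0] and k % num_segments == 0
    qmax = 2 ** (bits - 1) - 1          # 127 for a quantization bit number of 8
    seg = k // num_segments
    out = np.zeros((x.shape[0], w.shape[1]), dtype=np.float64)
    for i in range(num_segments):
        xs = x[:, i * seg:(i + 1) * seg]   # feature element sub-segment (rows)
        ws = w[i * seg:(i + 1) * seg, :]   # weight element sub-segment (cols)
        # One scale per sub-segment keeps an outlier in one segment from
        # degrading the quantization resolution of the others.
        sx = float(np.abs(xs).max()) / qmax
        sw = float(np.abs(ws).max()) / qmax
        sx = sx if sx > 0 else 1.0
        sw = sw if sw > 0 else 1.0
        qx = np.round(xs / sx).astype(np.int32)
        qw = np.round(ws / sw).astype(np.int32)
        # Integer matmul per associated sub-segment pair, then dequantize
        # and accumulate into the output features.
        out += (qx @ qw) * (sx * sw)
    return out
```

With 8-bit quantization the result closely tracks the full-precision product `x @ w`, while each inner product runs on low-bit integers.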
In the solution of this embodiment of the present disclosure, after the optional quantization strategies of the original model are determined, the original model is controlled to perform data processing based on the optional quantization strategies, and the quantization contribution information of each optional quantization strategy is determined from the processing results. Based on the quantization contribution information of each optional quantization strategy, newly selected quantization strategies are determined from the optional quantization strategies in batches. If the total compression volume of the newly selected and historically selected quantization strategies does not meet the quantization requirement, the original model is quantized and trained based on the newly selected quantization strategies, and the process returns to re-determine the quantization contribution information of the optional quantization strategies and perform the subsequent operations, until the total compression volume of the newly selected and historically selected quantization strategies meets the quantization requirement; the newly selected and historically selected quantization strategies are then taken as the target quantization strategies. The feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are then processed according to the target segmentation coefficients and target quantization bit numbers in the target quantization strategies to obtain the output features. By acquiring the target quantization strategies in batches, and quantizing and training the original model based on the newly selected quantization strategies between batches, this solution greatly ensures the accuracy of the extracted target quantization strategies, and thus the precision of model quantization.
Figure 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present disclosure. This embodiment is applicable to quantizing the data computation process of a target quantization network layer in a deep learning model, and in particular to processing the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer to obtain the output features of that layer. The apparatus may be configured in an electronic device on which a deep learning model is installed, may be implemented in software and/or hardware, and can implement the data processing method of any embodiment of the present disclosure. As shown in Figure 5, the data processing apparatus 500 includes:
a matrix acquisition module 501, configured to obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer, where the number of columns of the feature matrix is equal to the number of rows of the weight matrix; a matrix segmentation module 502, configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments and divide each column of weight elements of the weight matrix into at least two weight element sub-segments according to the target segmentation coefficient of the target quantization network layer; a quantization processing module 503, configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer; and a feature determination module 504, configured to determine the output features of the target quantization network layer according to the quantization processing result.
In the solution of this embodiment of the present disclosure, after the feature matrix input to the target quantization network layer and the weight matrix of that layer are obtained, each row of the feature matrix and each column of the weight matrix are divided into at least two sub-segments based on the target segmentation coefficient; the divided feature element sub-segments and weight element sub-segments are then quantized according to the target quantization bit number, and the output features of the target quantization network layer are determined from the processing results. By introducing the target segmentation coefficient and dividing each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization, this solution achieves low-bit quantized matrix multiplication while ensuring its accuracy; that is, it can compress the model volume and increase the model's running speed while preserving model accuracy as much as possible, thereby reducing the cost of deploying artificial intelligence technology.
In an embodiment, the matrix segmentation module 502 is configured to:
divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to a first coefficient among the target segmentation coefficients of the target quantization network layer, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments according to a second coefficient among the target segmentation coefficients, where the first coefficient and the second coefficient are in an integer-multiple relationship.
As shown in Figure 6, in an embodiment, the feature determination module 504 includes:
a sub-segment pair determination unit 610, configured to determine, for each feature element sub-segment in the feature matrix, the corresponding weight element sub-segment in the weight matrix, and to take feature element sub-segments and weight element sub-segments having a correspondence as a group of associated sub-segment pairs; and a feature computation unit 620, configured to determine the output features of the target quantization network layer according to the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
In an embodiment, the ratio of the numbers of feature element sub-segments and weight element sub-segments contained in each group of associated sub-segment pairs is the same as the ratio of the segmentation coefficients used to divide the feature matrix and the weight matrix.
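The grouping described above can be illustrated with index ranges. The helper below is hypothetical; it assumes the first (feature) segmentation coefficient is an integer multiple of the second (weight) coefficient, and pairs each weight element sub-segment with the feature element sub-segments that cover the same span of the shared dimension.

```python
# Hypothetical illustration of associated sub-segment pairs: with a feature
# coefficient of 4 and a weight coefficient of 2 over a shared dimension k,
# each pair groups 2 feature sub-segments with 1 weight sub-segment, matching
# the 4:2 ratio of the segmentation coefficients.

def associate_subsegments(k, feature_coeff, weight_coeff):
    assert feature_coeff % weight_coeff == 0 and k % feature_coeff == 0
    f_len, w_len = k // feature_coeff, k // weight_coeff
    f_bounds = [(i * f_len, (i + 1) * f_len) for i in range(feature_coeff)]
    w_bounds = [(j * w_len, (j + 1) * w_len) for j in range(weight_coeff)]
    # A feature sub-segment pairs with the weight sub-segment covering the
    # same index range of the shared dimension.
    ratio = feature_coeff // weight_coeff
    return [(f_bounds[j * ratio:(j + 1) * ratio], w_bounds[j])
            for j in range(weight_coeff)]
```

For k = 8, feature coefficient 4 and weight coefficient 2, the first pair groups feature ranges (0, 2) and (2, 4) with weight range (0, 4).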
In an embodiment, the feature determination module 504 is configured to:
determine the output features of the target quantization network layer according to the quantization processing results through the Tensor Core computing units of a graphics processing unit (GPU).
In an embodiment, the feature matrix consists of speech features obtained after a speech segment is processed by a feature extraction layer, and the output features are used to perform semantic recognition on the speech segment.
As shown in Figure 7, in an embodiment, the data processing apparatus 500 further includes:
an optional strategy determination module 505, configured to determine the optional quantization strategies of the original model, where each optional quantization strategy includes an optional quantization network layer and the optional segmentation coefficient and optional quantization bit number of that layer; a contribution information acquisition module 506, configured to obtain the quantization contribution information of the optional quantization strategies, obtained by the original model performing data processing based on the optional quantization strategies; and a target strategy determination module 507, configured to determine the target quantization strategies from the optional quantization strategies according to the quantization contribution information, so as to obtain the target quantization network layers and their target segmentation coefficients and target quantization bit numbers.
As shown in Figure 8, in an embodiment, the target strategy determination module 507 includes:
a new strategy determination unit 710, configured to determine newly selected quantization strategies from the optional quantization strategies according to the quantization contribution information of the optional quantization strategies; a compression volume determination unit 720, configured to determine the total compression volume of the newly selected strategies and the historically selected quantization strategies; and a target strategy determination unit 730, configured to take the newly selected quantization strategies and the historically selected quantization strategies as the target quantization strategies when the total compression volume meets the quantization requirement.
In an embodiment, the quantization contribution information includes model accuracy information and compression volume information, and the new strategy determination unit 710 is configured to:
sort the optional quantization strategies according to their model accuracy information and compression volume information, and determine the newly selected quantization strategies from the optional quantization strategies according to the sorting result and the compression volume information of the optional quantization strategies.
As shown in Figure 9, in an embodiment, the target strategy determination module 507 further includes:
a quantization training unit 740, configured to, when the total compression volume does not meet the quantization requirement, perform preliminary quantization on the original model based on the newly selected quantization strategies and train the preliminarily quantized original model to obtain a preliminary quantized model; a historical quantization strategy update unit 750, configured to add the newly selected quantization strategies to the historically selected quantization strategies; and a loop operation unit 760, configured to take the optional quantization strategies other than the newly selected quantization strategies as the new optional quantization strategies, take the preliminary quantized model as the original model, and return to execute the operation of obtaining the quantization contribution information of the optional quantization strategies, obtained by the original model performing data processing based on the optional quantization strategies.
The above product can execute the method provided by any embodiment of the present disclosure, and has functional modules and effects corresponding to executing the method.
In the technical solution of the present disclosure, the acquisition, storage and application of the feature matrices, weight matrices, output features, speech segments and so on involved all comply with relevant laws and regulations, and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium and a computer program product to implement the above data processing method.
Figure 10 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device 600 is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. Electronic devices may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in Figure 10, the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the electronic device 600 can also be stored in the RAM 603. The computing unit 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Multiple components of the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays or speakers; the storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem or a wireless communication transceiver. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller and the like. The computing unit 601 executes the various methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly contained in a machine-readable medium such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the data processing method described above can be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the data processing method in any other appropriate manner (for example, by means of firmware).
The various implementations of the systems and techniques described above herein may be realized in a digital electronic circuit system, an integrated circuit system, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on a chip (SoC), a complex programmable logic device (CPLD), computer hardware, firmware, software, and/or combinations thereof. The various implementations may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, where the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input apparatus and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus and the at least one output apparatus.
Program codes for implementing the method of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program codes may be executed entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. Examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display apparatus configured to display information to the user (for example, a cathode-ray tube (CRT) or liquid-crystal display (LCD) monitor); and a keyboard and a pointing apparatus (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatuses may also be configured to provide interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback or haptic feedback), and input from the user may be received in any form (including acoustic input, voice input or haptic input).
The systems and techniques described herein may be implemented in a computing system including a back-end component (for example, a data server), a computing system including a middleware component (for example, an application server), a computing system including a front-end component (for example, a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system including any combination of such back-end, middleware or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network and the Internet.
A computing system may include a client and a server. The client and the server are usually far away from each other and generally interact through a communication network. The relationship between the client and the server arises by virtue of computer programs running on respective computers and having a client-server relationship to each other. The server may be a cloud server, also referred to as a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability present in traditional physical host and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
人工智能是研究使计算机来模拟人的一些思维过程和智能行为(如学习、推理、思考、规划等)的学科,既有硬件层面的技术也有软件层面的技术。人工智能硬件技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理等技术;人工智能软件技术主要包括计算机视觉技术、语音识别技术、自然语言处理技术及机器学习/深度学习技术、大数据处理技术、知识图谱技术等几大方向。Artificial intelligence is the study of using computers to simulate some human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.). It has both hardware-level technology and software-level technology. Artificial intelligence hardware technology generally includes sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and other technologies; artificial intelligence software technology mainly includes computer vision technology, speech recognition technology, natural language processing technology and machine learning/depth Learning technology, big data processing technology, knowledge graph technology and other major directions.
云计算(cloud computing)，指的是通过网络接入弹性可扩展的共享物理或虚拟资源池，资源可以包括服务器、操作系统、网络、软件、应用和存储设备等，并可以按需、自服务的方式对资源进行部署和管理的技术体系。通过云计算技术，可以为人工智能、区块链等技术应用、模型训练提供高效强大的数据处理能力。Cloud computing refers to a technical system that accesses an elastic and scalable pool of shared physical or virtual resources through a network, where the resources may include servers, operating systems, networks, software, applications, and storage devices, and that deploys and manages the resources in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications and model training in fields such as artificial intelligence and blockchain.
可以使用上面所示的多种形式的流程，重新排序、增加或删除步骤。例如，本公开中记载的多个步骤可以并行地执行也可以顺序地执行也可以不同的次序执行，只要能够实现本公开公开的技术方案所期望的结果，本文在此不进行限制。Steps may be reordered, added, or removed using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, no limitation is imposed herein.

Claims (23)

  1. 一种数据处理方法，包括：A data processing method, comprising:
    获取目标量化网络层输入的特征矩阵和所述目标量化网络层的权重矩阵；其中，所述特征矩阵的列数等于所述权重矩阵的行数；Obtaining a feature matrix input to a target quantization network layer and a weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;
    根据所述目标量化网络层的目标分段系数，将所述特征矩阵的每一行特征元素划分为至少两个特征元素子段，以及将所述权重矩阵的每一列权重元素划分为至少两个权重元素子段；Dividing, according to a target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix into at least two feature element sub-segments, and dividing each column of weight elements of the weight matrix into at least two weight element sub-segments;
    根据所述目标量化网络层的目标量化比特数，对所述至少两个特征元素子段和所述至少两个权重元素子段进行量化处理，并根据量化处理结果，确定所述目标量化网络层的输出特征。Performing, according to a target quantization bit number of the target quantization network layer, quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments, and determining output features of the target quantization network layer according to a quantization processing result.
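As an illustrative sketch only (not the patented implementation), the segmented quantization of claim 1 can be expressed in NumPy as follows. The symmetric per-sub-segment scaling, the function names, and the use of a single shared segmentation coefficient are assumptions made for clarity:

```python
import numpy as np

def quantize(x, bits, axis):
    """Symmetric quantization to signed `bits`-bit integers, with one
    scale per row (axis=1) or per column (axis=0) of the sub-segment."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    return np.round(x / scale).astype(np.int32), scale

def segmented_quantized_matmul(features, weights, seg_coeff, bits):
    """Split each feature row and each weight column into `seg_coeff`
    sub-segments along the shared inner dimension, quantize each
    sub-segment separately, and accumulate dequantized partial products."""
    n, k = features.shape
    assert weights.shape[0] == k and k % seg_coeff == 0
    seg_len = k // seg_coeff
    out = np.zeros((n, weights.shape[1]))
    for s in range(seg_coeff):
        lo, hi = s * seg_len, (s + 1) * seg_len
        q_f, s_f = quantize(features[:, lo:hi], bits, axis=1)  # per-row scale
        q_w, s_w = quantize(weights[lo:hi, :], bits, axis=0)   # per-column scale
        out += (q_f @ q_w) * (s_f * s_w)  # integer matmul, then dequantize
    return out
```

Because each sub-segment gets its own scale, outliers in one segment do not degrade the quantization resolution of the others, which is the intuition behind segmenting before quantizing.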
  2. 根据权利要求1所述的方法，其中，所述根据所述目标量化网络层的目标分段系数，将所述特征矩阵的每一行特征元素划分为至少两个特征元素子段，以及将所述权重矩阵的每一列权重元素划分为至少两个权重元素子段，包括：The method according to claim 1, wherein dividing each row of feature elements of the feature matrix into at least two feature element sub-segments and dividing each column of weight elements of the weight matrix into at least two weight element sub-segments according to the target segmentation coefficient of the target quantization network layer comprises:
    根据所述目标量化网络层的目标分段系数中的第一系数，将所述特征矩阵的每一行特征元素划分为至少两个特征元素子段；Dividing each row of feature elements of the feature matrix into at least two feature element sub-segments according to a first coefficient among the target segmentation coefficients of the target quantization network layer;
    根据所述目标分段系数中的第二系数，将所述权重矩阵的每一列权重元素划分为至少两个权重元素子段；Dividing each column of weight elements of the weight matrix into at least two weight element sub-segments according to a second coefficient among the target segmentation coefficients;
    其中，所述第一系数和所述第二系数成整数倍关系。wherein the first coefficient and the second coefficient are in an integer-multiple relationship.
  3. 根据权利要求1所述的方法，其中，所述根据量化处理结果，确定所述目标量化网络层的输出特征，包括：The method according to claim 1, wherein determining the output features of the target quantization network layer according to the quantization processing result comprises:
    确定所述特征矩阵中的每一特征元素子段，在所述权重矩阵中对应的权重元素子段，并将具有对应关系的特征元素子段和权重元素子段作为一组关联子段对；Determining, for each feature element sub-segment in the feature matrix, the corresponding weight element sub-segment in the weight matrix, and taking the feature element sub-segments and weight element sub-segments having a corresponding relationship as a group of associated sub-segment pairs;
    根据每组关联子段对中的特征元素子段和权重元素子段的量化处理结果，确定所述目标量化网络层的输出特征。Determining the output features of the target quantization network layer according to the quantization processing results of the feature element sub-segments and the weight element sub-segments in each group of associated sub-segment pairs.
  4. 根据权利要求3所述的方法，其中，所述每组关联子段对中包含的特征元素子段和权重元素子段的数量比值，与划分所述权重矩阵和所述特征矩阵的分段系数的比值相同。The method according to claim 3, wherein the ratio of the number of feature element sub-segments to the number of weight element sub-segments contained in each group of associated sub-segment pairs is the same as the ratio of the segmentation coefficients used to divide the weight matrix and the feature matrix.
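A hedged sketch of how claims 2-4 fit together: when the two segmentation coefficients are in an integer-multiple relationship, the sub-segments of the finer split group naturally with those of the coarser split, and the count ratio inside every group equals the ratio of the coefficients. The grouping rule below (consecutive index ranges) is an assumption for illustration:

```python
def associated_pairs(feat_coeff, weight_coeff):
    """Group feature/weight sub-segment indices into associated
    sub-segment pairs. Assumes one coefficient is an integer multiple
    of the other, so each group pairs `ratio` consecutive sub-segments
    of the finer split with one sub-segment of the coarser split."""
    big, small = max(feat_coeff, weight_coeff), min(feat_coeff, weight_coeff)
    assert big % small == 0, "coefficients must be in an integer-multiple relationship"
    ratio = big // small
    groups = []
    for g in range(small):
        fine = list(range(g * ratio, (g + 1) * ratio))
        if feat_coeff >= weight_coeff:
            groups.append((fine, [g]))   # several feature segs : one weight seg
        else:
            groups.append(([g], fine))   # one feature seg : several weight segs
    return groups
```

For example, with four feature sub-segments and two weight sub-segments, each weight sub-segment is associated with two consecutive feature sub-segments, preserving the 4:2 coefficient ratio inside every group.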
  5. 根据权利要求1-4中任一项所述的方法，其中，所述根据量化处理结果，确定所述目标量化网络层的输出特征，包括：The method according to any one of claims 1-4, wherein determining the output features of the target quantization network layer according to the quantization processing result comprises:
    通过图形处理器GPU的张量核Tensor Core计算单元，根据所述量化处理结果，确定所述目标量化网络层的输出特征。Determining, by a Tensor Core computing unit of a graphics processing unit (GPU), the output features of the target quantization network layer according to the quantization processing result.
  6. 根据权利要求1-5中任一项所述的方法，其中，所述特征矩阵是语音片段经特征提取层处理后得到的语音特征；所述输出特征用于对所述语音片段进行语义识别处理。The method according to any one of claims 1-5, wherein the feature matrix is a speech feature obtained after a speech segment is processed by a feature extraction layer, and the output features are used to perform semantic recognition processing on the speech segment.
  7. 根据权利要求1-6中任一项所述的方法,还包括:The method according to any one of claims 1-6, further comprising:
    确定原始模型的可选量化策略；其中，所述可选量化策略包括：可选量化网络层、所述可选量化网络层的可选分段系数和可选量化比特数；Determining optional quantization strategies of an original model; wherein an optional quantization strategy includes: an optional quantization network layer, and an optional segmentation coefficient and an optional quantization bit number of the optional quantization network layer;
    获取所述原始模型基于所述可选量化策略执行数据处理，得到的所述可选量化策略的量化贡献信息；Obtaining quantization contribution information of the optional quantization strategies, obtained by the original model performing data processing based on the optional quantization strategies;
    根据所述量化贡献信息，从所述可选量化策略中确定目标量化策略，以得到所述目标量化网络层、所述目标量化网络层的目标分段系数和所述目标量化网络层的目标量化比特数。Determining, according to the quantization contribution information, a target quantization strategy from the optional quantization strategies, so as to obtain the target quantization network layer, the target segmentation coefficient of the target quantization network layer, and the target quantization bit number of the target quantization network layer.
  8. 根据权利要求7所述的方法，其中，所述根据所述量化贡献信息，从可选量化策略中确定目标量化策略，包括：The method according to claim 7, wherein determining the target quantization strategy from the optional quantization strategies according to the quantization contribution information comprises:
    根据所述可选量化策略对应的量化贡献信息，从所述可选量化策略中确定新增选定量化策略；Determining a newly selected quantization strategy from the optional quantization strategies according to the quantization contribution information corresponding to the optional quantization strategies;
    确定所述新增选定策略和历史选定量化策略的总压缩体积；Determining a total compression volume of the newly selected quantization strategy and historically selected quantization strategies;
    在所述总压缩体积达到量化要求的情况下，将所述新增选定量化策略和所述历史选定量化策略作为所述目标量化策略。Taking, when the total compression volume reaches a quantization requirement, the newly selected quantization strategy and the historically selected quantization strategies as the target quantization strategy.
  9. 根据权利要求8所述的方法，其中，所述量化贡献信息包括模型精度信息和压缩体积信息；The method according to claim 8, wherein the quantization contribution information includes model accuracy information and compression volume information;
    所述根据所述可选量化策略对应的量化贡献信息，从所述可选量化策略中确定新增选定量化策略，包括：and determining the newly selected quantization strategy from the optional quantization strategies according to the quantization contribution information corresponding to the optional quantization strategies comprises:
    根据所述可选量化策略对应的所述模型精度信息和所述压缩体积信息，对所述可选量化策略进行排序；Sorting the optional quantization strategies according to the model accuracy information and the compression volume information corresponding to the optional quantization strategies;
    根据排序结果和所述可选量化策略对应的压缩体积信息，从所述可选量化策略中确定所述新增选定量化策略。Determining the newly selected quantization strategy from the optional quantization strategies according to a sorting result and the compression volume information corresponding to the optional quantization strategies.
  10. 根据权利要求8或9所述的方法,还包括:The method according to claim 8 or 9, further comprising:
    在所述总压缩体积未达到量化要求的情况下，基于所述新增选定量化策略对所述原始模型进行初步量化，并训练初步量化后的原始模型，得到初步量化模型；Performing, when the total compression volume does not reach the quantization requirement, preliminary quantization on the original model based on the newly selected quantization strategy, and training the preliminarily quantized original model to obtain a preliminary quantization model;
    将所述新增选定量化策略添加到所述历史选定量化策略；Adding the newly selected quantization strategy to the historically selected quantization strategies;
    将所述可选量化策略中除所述新增选定量化策略之外的可选量化策略作为新的可选量化策略，并将所述初步量化模型作为所述原始模型，返回执行获取所述原始模型基于所述可选量化策略执行数据处理，得到的所述可选量化策略的量化贡献信息的操作。Taking the optional quantization strategies other than the newly selected quantization strategy as new optional quantization strategies, taking the preliminary quantization model as the original model, and returning to the operation of obtaining the quantization contribution information of the optional quantization strategies obtained by the original model performing data processing based on the optional quantization strategies.
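The iterative strategy selection of claims 7-10 can be sketched as a greedy loop. In this illustrative sketch, `evaluate(strategy)` is assumed to return `(accuracy_loss, compressed_volume)` for a candidate applied to the current model; both this signature and the ranking rule (smallest accuracy loss first, largest volume as tie-break) are assumptions, not the patented scoring method:

```python
def select_target_strategies(candidates, evaluate, required_volume):
    """Greedily pick quantization strategies until the accumulated
    compression volume reaches the quantization requirement."""
    selected, total_volume = [], 0.0
    remaining = list(candidates)
    while remaining:
        # rank remaining candidates by their quantization contribution info
        scored = sorted(((evaluate(s), s) for s in remaining),
                        key=lambda t: (t[0][0], -t[0][1]))
        (loss, volume), best = scored[0]
        selected.append(best)
        total_volume += volume
        if total_volume >= required_volume:   # quantization requirement met
            break
        remaining.remove(best)
        # in the claimed method, the model would now be preliminarily
        # quantized with `best`, briefly retrained, and the remaining
        # candidates re-evaluated against the updated model
    return selected
```

Re-evaluating the survivors after each selection matters because quantizing one layer changes how much accuracy the remaining layers can afford to lose.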
  11. 一种数据处理装置，包括：A data processing apparatus, comprising:
    矩阵获取模块，设置为获取目标量化网络层输入的特征矩阵和所述目标量化网络层的权重矩阵；其中，所述特征矩阵的列数等于所述权重矩阵的行数；A matrix acquisition module, configured to obtain a feature matrix input to a target quantization network layer and a weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;
    矩阵分段模块，设置为根据所述目标量化网络层的目标分段系数，将所述特征矩阵的每一行特征元素划分为至少两个特征元素子段，以及将所述权重矩阵的每一列权重元素划分为至少两个权重元素子段；A matrix segmentation module, configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to a target segmentation coefficient of the target quantization network layer, and to divide each column of weight elements of the weight matrix into at least two weight element sub-segments;
    量化处理模块,设置为根据所述目标量化网络层的目标量化比特数,对所述至少两个特征元素子段和所述至少两个权重元素子段进行量化处理;A quantization processing module configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer;
    特征确定模块,设置为根据量化处理结果,确定所述目标量化网络层的输出特征。A feature determination module is configured to determine the output features of the target quantization network layer based on the quantization processing results.
  12. 根据权利要求11所述的装置,其中,所述矩阵分段模块,设置为:The device according to claim 11, wherein the matrix segmentation module is configured to:
    根据所述目标量化网络层的目标分段系数中的第一系数,将所述特征矩阵的每一行特征元素划分为至少两个特征元素子段;Divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the first coefficient among the target segmentation coefficients of the target quantization network layer;
    根据所述目标分段系数中的第二系数,将所述权重矩阵的每一列权重元素划分为至少两个权重元素子段;Divide each column weight element of the weight matrix into at least two weight element sub-segments according to the second coefficient in the target segment coefficient;
    其中,所述第一系数和所述第二系数成整数倍关系。Wherein, the first coefficient and the second coefficient are in an integer multiple relationship.
  13. 根据权利要求11所述的装置,其中,所述特征确定模块,包括:The device according to claim 11, wherein the feature determination module includes:
    子段对确定单元，设置为确定所述特征矩阵中的每一特征元素子段，在所述权重矩阵中对应的权重元素子段，并将具有对应关系的特征元素子段和权重元素子段作为一组关联子段对；A sub-segment pair determination unit, configured to determine, for each feature element sub-segment in the feature matrix, the corresponding weight element sub-segment in the weight matrix, and to take the feature element sub-segments and weight element sub-segments having a corresponding relationship as a group of associated sub-segment pairs;
    特征计算单元,设置为根据每组关联子段对中的特征元素子段和权重元素子段的量化处理结果,确定所述目标量化网络层的输出特征。The feature calculation unit is configured to determine the output features of the target quantization network layer based on the quantization processing results of the feature element sub-segments and the weight element sub-segments in each group of associated sub-segment pairs.
  14. 根据权利要求13所述的装置，其中，所述每组关联子段对中包含的特征元素子段和权重元素子段的数量比值，与划分所述权重矩阵和所述特征矩阵的分段系数的比值相同。The apparatus according to claim 13, wherein the ratio of the number of feature element sub-segments to the number of weight element sub-segments contained in each group of associated sub-segment pairs is the same as the ratio of the segmentation coefficients used to divide the weight matrix and the feature matrix.
  15. 根据权利要求11-14中任一项所述的装置,其中,所述特征确定模块,设置为:The device according to any one of claims 11-14, wherein the feature determination module is configured to:
    通过图形处理器GPU的张量核Tensor Core计算单元,根据所述量化处理结果,确定所述目标量化网络层的输出特征。Through the Tensor Core computing unit of the graphics processor GPU, the output characteristics of the target quantization network layer are determined based on the quantization processing results.
  16. 根据权利要求11-15中任一项所述的装置，其中，所述特征矩阵是语音片段经特征提取层处理后得到的语音特征；所述输出特征用于对所述语音片段进行语义识别处理。The apparatus according to any one of claims 11-15, wherein the feature matrix is a speech feature obtained after a speech segment is processed by a feature extraction layer, and the output features are used to perform semantic recognition processing on the speech segment.
  17. 根据权利要求11-16中任一项所述的装置,还包括:The device according to any one of claims 11-16, further comprising:
    可选策略确定模块，设置为确定原始模型的可选量化策略；其中，所述可选量化策略包括：可选量化网络层、所述可选量化网络层的可选分段系数和可选量化比特数；An optional strategy determination module, configured to determine optional quantization strategies of an original model; wherein an optional quantization strategy includes: an optional quantization network layer, and an optional segmentation coefficient and an optional quantization bit number of the optional quantization network layer;
    贡献信息获取模块，设置为获取所述原始模型基于所述可选量化策略执行数据处理，得到的所述可选量化策略的量化贡献信息；A contribution information acquisition module, configured to obtain quantization contribution information of the optional quantization strategies, obtained by the original model performing data processing based on the optional quantization strategies;
    目标策略确定模块，设置为根据所述量化贡献信息，从所述可选量化策略中确定目标量化策略，以得到所述目标量化网络层、所述目标量化网络层的目标分段系数和所述目标量化网络层的目标量化比特数。A target strategy determination module, configured to determine, according to the quantization contribution information, a target quantization strategy from the optional quantization strategies, so as to obtain the target quantization network layer, the target segmentation coefficient of the target quantization network layer, and the target quantization bit number of the target quantization network layer.
  18. 根据权利要求17所述的装置,其中,所述目标策略确定模块,包括:The device according to claim 17, wherein the target policy determination module includes:
    新增策略确定单元，设置为根据所述可选量化策略对应的量化贡献信息，从所述可选量化策略中确定新增选定量化策略；A new strategy determination unit, configured to determine a newly selected quantization strategy from the optional quantization strategies according to the quantization contribution information corresponding to the optional quantization strategies;
    压缩体积确定单元,设置为确定所述新增选定策略和历史选定量化策略的总压缩体积;A compression volume determination unit configured to determine the total compression volume of the newly selected strategy and the historically selected quantization strategy;
    目标策略确定单元,设置为在所述总压缩体积达到量化要求的情况下,将所述新增选定量化策略和所述历史选定量化策略作为所述目标量化策略。The target strategy determining unit is configured to use the newly selected quantization strategy and the historically selected quantization strategy as the target quantization strategy when the total compression volume reaches the quantization requirement.
  19. 根据权利要求18所述的装置，其中，所述量化贡献信息包括模型精度信息和压缩体积信息；The apparatus according to claim 18, wherein the quantization contribution information includes model accuracy information and compression volume information;
    所述新增策略确定单元,设置为:The new strategy determination unit is set to:
    根据所述可选量化策略对应的所述模型精度信息和所述压缩体积信息,对所述可选量化策略进行排序;Sort the optional quantization strategies according to the model accuracy information and the compression volume information corresponding to the optional quantization strategies;
    根据排序结果和所述可选量化策略对应的压缩体积信息,从所述可选量化策略中确定新增选定量化策略。According to the sorting result and the compression volume information corresponding to the optional quantization strategy, a newly selected quantization strategy is determined from the optional quantization strategies.
  20. 根据权利要求18或19所述的装置,所述目标策略确定模块,还包括:The device according to claim 18 or 19, the target policy determination module further includes:
    量化训练单元，设置为在所述总压缩体积未达到量化要求的情况下，基于所述新增选定量化策略对所述原始模型进行初步量化，并训练初步量化后的原始模型，得到初步量化模型；A quantization training unit, configured to perform, when the total compression volume does not reach the quantization requirement, preliminary quantization on the original model based on the newly selected quantization strategy, and to train the preliminarily quantized original model to obtain a preliminary quantization model;
    历史量化策略更新单元,设置为将所述新增选定量化策略添加到所述历史选定量化策略;A historical quantification strategy update unit configured to add the newly selected quantification strategy to the historical selected quantification strategy;
    循环操作单元，设置为将所述可选量化策略中除所述新增选定量化策略之外的可选量化策略作为新的可选量化策略，并将所述初步量化模型作为所述原始模型，返回执行获取所述原始模型基于所述可选量化策略执行数据处理，得到的所述可选量化策略的量化贡献信息的操作。A loop operation unit, configured to take the optional quantization strategies other than the newly selected quantization strategy as new optional quantization strategies, take the preliminary quantization model as the original model, and return to the operation of obtaining the quantization contribution information of the optional quantization strategies obtained by the original model performing data processing based on the optional quantization strategies.
  21. 一种电子设备,包括:An electronic device including:
    至少一个处理器;以及at least one processor; and
    与所述至少一个处理器通信连接的存储器;其中,a memory communicatively connected to the at least one processor; wherein,
    所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行权利要求1-10中任一项所述的数据处理方法。The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the data processing method according to any one of claims 1-10.
  22. 一种存储有计算机指令的非瞬时计算机可读存储介质,其中,所述计算机指令用于使所述计算机执行根据权利要求1-10中任一项所述的数据处理方法。A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the data processing method according to any one of claims 1-10.
  23. 一种计算机程序产品,包括计算机程序,所述计算机程序在被处理器执行时实现权利要求1-10中任一项所述的数据处理方法。A computer program product, including a computer program that implements the data processing method according to any one of claims 1-10 when executed by a processor.
PCT/CN2022/132429 2022-04-28 2022-11-17 Data processing method and apparatus, and device and storage medium WO2023207039A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210463316.9A CN114781650B (en) 2022-04-28 2022-04-28 Data processing method, device, equipment and storage medium
CN202210463316.9 2022-04-28

Publications (1)

Publication Number Publication Date
WO2023207039A1 true WO2023207039A1 (en) 2023-11-02

Family

ID=82434750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132429 WO2023207039A1 (en) 2022-04-28 2022-11-17 Data processing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN114781650B (en)
WO (1) WO2023207039A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312255A (en) * 2023-11-29 2023-12-29 湖南中斯信息科技有限公司 Electronic document splitting optimization management method and system

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114781650B (en) * 2022-04-28 2024-02-27 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN111598227A (en) * 2020-05-20 2020-08-28 字节跳动有限公司 Data processing method and device, electronic equipment and computer readable storage medium
WO2020190772A1 (en) * 2019-03-15 2020-09-24 Futurewei Technologies, Inc. Neural network model compression and optimization
CN112529189A (en) * 2020-11-10 2021-03-19 北京百度网讯科技有限公司 Model compression method and device, electronic equipment and storage medium
WO2021174370A1 (en) * 2020-03-05 2021-09-10 Huawei Technologies Co., Ltd. Method and system for splitting and bit-width assignment of deep learning models for inference on distributed systems
CN113408704A (en) * 2021-06-29 2021-09-17 深圳市商汤科技有限公司 Data processing method, device, equipment and computer readable storage medium
CN114781650A (en) * 2022-04-28 2022-07-22 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Family Cites Families (19)

Publication number Priority date Publication date Assignee Title
CN106796668B (en) * 2016-03-16 2019-06-14 香港应用科技研究院有限公司 Method and system for bit-depth reduction in artificial neural network
CN107944555B (en) * 2017-12-07 2021-09-17 广州方硅信息技术有限公司 Neural network compression and acceleration method, storage device and terminal
CN108133266B (en) * 2017-12-12 2021-07-09 北京信息科技大学 Neural network weight compression method based on non-uniform quantization and use method
WO2019127362A1 (en) * 2017-12-29 2019-07-04 清华大学 Neural network model block compression method, training method, computing device and system
CN108765247B (en) * 2018-05-15 2023-01-10 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and equipment
CN110874636B (en) * 2018-09-04 2023-06-30 杭州海康威视数字技术股份有限公司 Neural network model compression method and device and computer equipment
JP7266693B2 (en) * 2018-10-30 2023-04-28 グーグル エルエルシー Quantization of Trained Long-Short-Term Memory Neural Networks
KR102659494B1 (en) * 2019-01-21 2024-04-23 삼성전자주식회사 Electronic apparatus and control method thereof
KR102152374B1 (en) * 2019-02-25 2020-09-07 주식회사 딥엑스 Method and system for bit quantization of artificial neural network
CN110263913A (en) * 2019-05-23 2019-09-20 深圳先进技术研究院 A kind of deep neural network compression method and relevant device
CN110348562B (en) * 2019-06-19 2021-10-15 北京迈格威科技有限公司 Neural network quantization strategy determination method, image identification method and device
CN110288030B (en) * 2019-06-27 2023-04-07 重庆大学 Image identification method, device and equipment based on lightweight network model
CN110782003A (en) * 2019-09-20 2020-02-11 北京航空航天大学 Neural network compression method and system based on Hash learning
CN111222638B (en) * 2019-11-21 2023-05-12 湖南大学 Neural network-based network anomaly detection method and device
CN113762493A (en) * 2020-06-01 2021-12-07 阿里巴巴集团控股有限公司 Neural network model compression method and device, acceleration unit and computing system
CN111695695B (en) * 2020-06-09 2023-08-08 北京百度网讯科技有限公司 Quantitative analysis method and device for user decision behaviors
CN112669861B (en) * 2020-12-09 2023-04-07 北京百度网讯科技有限公司 Audio data processing method, device, equipment and storage medium
CN114005452A (en) * 2021-10-29 2022-02-01 北京百度网讯科技有限公司 Method and device for extracting voice features, electronic equipment and storage medium
CN114282670A (en) * 2022-01-14 2022-04-05 北京百度网讯科技有限公司 Neural network model compression method, device and storage medium

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
WO2020190772A1 (en) * 2019-03-15 2020-09-24 Futurewei Technologies, Inc. Neural network model compression and optimization
WO2021174370A1 (en) * 2020-03-05 2021-09-10 Huawei Technologies Co., Ltd. Method and system for splitting and bit-width assignment of deep learning models for inference on distributed systems
CN111598227A (en) * 2020-05-20 2020-08-28 字节跳动有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN112529189A (en) * 2020-11-10 2021-03-19 北京百度网讯科技有限公司 Model compression method and device, electronic equipment and storage medium
CN113408704A (en) * 2021-06-29 2021-09-17 深圳市商汤科技有限公司 Data processing method, device, equipment and computer readable storage medium
CN114781650A (en) * 2022-04-28 2022-07-22 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117312255A (en) * 2023-11-29 2023-12-29 湖南中斯信息科技有限公司 Electronic document splitting optimization management method and system
CN117312255B (en) * 2023-11-29 2024-02-20 湖南中斯信息科技有限公司 Electronic document splitting optimization management method and system

Also Published As

Publication number Publication date
CN114781650A (en) 2022-07-22
CN114781650B (en) 2024-02-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939854

Country of ref document: EP

Kind code of ref document: A1