WO2023207039A1 - Data processing method and apparatus, device, and storage medium


Info

Publication number
WO2023207039A1
Authority
WO
WIPO (PCT)
Prior art keywords
quantization
target
optional
strategy
feature
Prior art date
Application number
PCT/CN2022/132429
Other languages
English (en)
Chinese (zh)
Inventor
王桂彬
丛士钧
贾铭
贾磊
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023207039A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This disclosure relates to the field of artificial intelligence technology and the field of deep learning technology, and can be applied to scenarios such as speech recognition, natural language processing, and information recommendation.
  • the present disclosure provides a data processing method, apparatus, device, and storage medium.
  • a data processing method including:
  • the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are acquired; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;
  • according to the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments, and each column of weight elements of the weight matrix is divided into at least two weight element sub-segments;
  • according to the target quantization bit number of the target quantization network layer, quantization processing is performed on the at least two feature element sub-segments and the at least two weight element sub-segments, and the output features of the target quantization network layer are determined based on the quantization processing results.
  • a data processing apparatus including:
  • a matrix acquisition module configured to acquire the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;
  • a matrix segmentation module configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the target segmentation coefficient of the target quantization network layer, and to divide each column of weight elements of the weight matrix into at least two weight element sub-segments;
  • a quantization processing module configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer;
  • a feature determination module is configured to determine the output features of the target quantization network layer based on the quantization processing results.
  • an electronic device including:
  • at least one processor; and a memory communicatively connected to the at least one processor; wherein
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the above-mentioned data processing method.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the above-mentioned data processing method.
  • a computer program product including a computer program that implements the above-mentioned data processing method when executed by a processor.
  • Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure
  • Figure 2 is a flow chart of another data processing method provided by an embodiment of the present disclosure.
  • Figure 3 is a flow chart of another data processing method provided by an embodiment of the present disclosure.
  • Figure 4 is a flow chart of another data processing method provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of a feature determination module provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of another data processing device provided by an embodiment of the present disclosure.
  • Figure 8 is a schematic structural diagram of a target strategy determination module provided by an embodiment of the present disclosure.
  • Figure 9 is a schematic structural diagram of another target strategy determination module provided by an embodiment of the present disclosure.
  • Figure 10 is a block diagram of an electronic device that implements a data processing method provided by an embodiment of the present disclosure.
  • Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure. The embodiment of the present disclosure is suitable for quantizing the data calculation process of the target quantization network layer in a deep learning model, that is, for processing the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer to obtain the output features of the target quantization network layer.
  • the method may be executed by a data processing device, which may be implemented in software and/or hardware and may be integrated into an electronic device configured with a deep learning model.
  • the data processing method provided by this embodiment may include:
  • S101 Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.
  • the target quantization network layer may be a network layer in the deep learning model that contains matrix multiplication operators.
  • the matrix multiplication operators may include but are not limited to: fully connected operators and other derivative operators, such as transformer operators.
  • the feature matrix input to the target quantization network layer can be the input information that is input to the target quantization network layer.
  • the feature matrix can be the input information of the deep learning model.
  • the feature matrix can be the output of the network layer located above the target quantization network layer in the deep learning model.
  • the weight matrix of the target quantization network layer can be the inherent network parameters of the target quantization network layer obtained during the network training stage, that is, the weight coefficients used to weight the input features of this layer.
  • the number of columns of the feature matrix needs to be equal to the number of rows of the weight matrix. That is, the size of the feature matrix is: m*k, and the size of the weight matrix is k*n. Among them, the values of m, k, and n are positive integers.
  • This embodiment can obtain the feature data input to the target quantization network layer as a feature matrix, and obtain the inherent weight parameters in the target quantization network layer as a weight matrix. If there are multiple input data to the target network layer, the input data whose number of columns is the same as the number of rows of the weight matrix can be selected as the feature matrix input to the target quantization network layer.
  • the target segmentation coefficient may be one of the quantization configuration parameters required when quantizing the operation process of the target quantization network layer. It is used to characterize the number of matrix elements contained in each sub-segment after division. For example, if the target segmentation coefficient is C, each C consecutive elements in the matrix can be divided into a segment, that is, the number of matrix elements contained in each divided subsegment is C. Among them, the value of C is a positive integer.
  • the value of the target segmentation coefficient may be predetermined; for example, one may be selected from a variety of optional segmentation coefficients as the target segmentation coefficient through extensive testing and analysis, or it may be set based on experience, etc.; this is not limited here.
  • the target segmentation coefficient can be set so that the feature elements of each row of the feature matrix and the weight elements of each column of the weight matrix are divided into equal parts; that is, if the number of columns of the feature matrix and the number of rows of the weight matrix are both k, the value of the target segmentation coefficient C should evenly divide k.
  • the matrix elements in the feature matrix are called feature elements, and each group of feature elements after division is regarded as a feature element sub-segment; the matrix elements in the weight matrix are called weight elements, and each group of weight elements after division is regarded as a weight element sub-segment.
  • according to the target segmentation coefficient C, each row of feature elements in the feature matrix can be divided into at least two segments, with C adjacent feature elements as a group, each segment serving as a feature element sub-segment; then, according to the target segmentation coefficient C, each column of weight elements in the weight matrix is divided into at least two segments, with C adjacent weight elements as a group, each segment serving as a weight element sub-segment.
  • For example, suppose the feature matrix is the 4×8 matrix I = (Iip), the weight matrix is the 8×2 matrix W = (Wqj), and the target segmentation coefficient C is 4. Dividing each row of matrix I based on the target segmentation coefficient C yields 8 feature element sub-segments: feature element sub-segment 1 (I11, I12, I13, I14), feature element sub-segment 2 (I15, I16, I17, I18), feature element sub-segment 3 (I21, I22, I23, I24), feature element sub-segment 4 (I25, I26, I27, I28), feature element sub-segment 5 (I31, I32, I33, I34), feature element sub-segment 6 (I35, I36, I37, I38), feature element sub-segment 7 (I41, I42, I43, I44), and feature element sub-segment 8 (I45, I46, I47, I48).
  • Dividing each column of matrix W based on the target segmentation coefficient C yields 4 weight element sub-segments: weight element sub-segment 1 (W11, W21, W31, W41), weight element sub-segment 2 (W51, W61, W71, W81), weight element sub-segment 3 (W12, W22, W32, W42), and weight element sub-segment 4 (W52, W62, W72, W82).
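  • The segmentation step above can be illustrated with a short sketch. The following is a minimal NumPy sketch, not part of the disclosure: the function names are illustrative, and it assumes the segmentation coefficient evenly divides the shared dimension k, as described above.

```python
import numpy as np

def split_rows(I, C):
    """Split each row of I into consecutive sub-segments of C elements.

    Returns an array of shape (m, k // C, C); assumes C evenly divides k.
    """
    m, k = I.shape
    assert k % C == 0, "the segmentation coefficient must evenly divide k"
    return I.reshape(m, k // C, C)

def split_cols(W, C):
    """Split each column of W into consecutive sub-segments of C elements."""
    k, n = W.shape
    assert k % C == 0
    # Transpose so each column becomes a row, then segment as above.
    return W.T.reshape(n, k // C, C)

I = np.arange(32, dtype=np.float32).reshape(4, 8)  # m=4, k=8
W = np.arange(16, dtype=np.float32).reshape(8, 2)  # k=8, n=2
print(split_rows(I, 4).shape)  # (4, 2, 4): 8 feature element sub-segments
print(split_cols(W, 4).shape)  # (2, 2, 4): 4 weight element sub-segments
```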
  • S103 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.
  • the target quantization bit number may be another parameter among the quantization configuration parameters required when quantizing the operation process of the target quantization network layer. It is used to represent the degree of quantization of the matrix multiplication operator, that is, the smaller the value of the target number of quantization bits, the higher the degree of quantization.
  • the value of the target quantization bit number in this embodiment is usually not greater than 4; for example, the value can be 1 bit, 2 bits, or 4 bits.
  • the process of quantizing each feature element sub-segment and each weight element sub-segment based on the target quantization bit number includes: determining the feature reference value of each feature element sub-segment based on the feature element values within it. For example, the feature element value with the largest absolute value within the feature element sub-segment can be taken as the feature reference value of that sub-segment. Then, based on the feature reference value and the target quantization bit number of the target quantization network layer, the quantized value of each feature element within the feature element sub-segment is determined according to the following formula (1):
  • I′i,p = round(Ii,p × (2^(B-1) - 1) / absmax(Ii,s))  (1)
  • where I′i,p is the quantized value of the feature element in the i-th row and p-th column of the feature matrix I; Ii,p is the feature element in the i-th row and p-th column of the feature matrix I; absmax(Ii,s) is the feature reference value of the s-th feature element sub-segment in the i-th row of the feature matrix I; and B is the target quantization bit number of the target quantization network layer.
  • similarly, for each weight element sub-segment, the weight reference value of the sub-segment is determined (for example, the weight element value with the largest absolute value within it), and, according to the weight reference value and the target quantization bit number, the quantized value of each weight element within the sub-segment is determined according to the following formula (2):
  • W′q,j = round(Wq,j × (2^(B-1) - 1) / absmax(Ws,j))  (2)
  • where W′q,j is the quantized value of the weight element in the q-th row and j-th column of the weight matrix W; Wq,j is the weight element in the q-th row and j-th column of the weight matrix W; absmax(Ws,j) is the weight reference value of the s-th weight element sub-segment of the j-th column of the weight matrix W; and B is the target quantization bit number of the target quantization network layer.
  • variables i, p, s, j, and q in this embodiment are positive integers.
  • the process of converting each feature element or weight element into its corresponding quantized value is essentially a process of quantizing the feature element or weight element into a low-bit integer corresponding to the target quantization bit number.
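  • As an illustration of formulas (1) and (2), the following is a minimal NumPy sketch of per-sub-segment absmax quantization, assuming the reconstructed formulas above; the function name and the zero-segment guard are illustrative additions, not part of the disclosure.

```python
import numpy as np

def quantize_segments(segs, B):
    """Quantize each sub-segment against its own absmax reference value.

    segs: array of shape (..., C) as produced by the segmentation step.
    Returns (quantized integers, per-segment reference values), following
    q = round(x * (2**(B-1) - 1) / absmax) from formulas (1) and (2).
    """
    ref = np.abs(segs).max(axis=-1, keepdims=True)  # absmax per sub-segment
    ref = np.where(ref == 0, 1.0, ref)              # guard against all-zero segments
    scale = (2 ** (B - 1) - 1) / ref
    q = np.rint(segs * scale).astype(np.int32)      # low-bit integers
    return q, ref

segs = np.array([[0.5, -1.0, 0.25, 0.75]], dtype=np.float32)
q, ref = quantize_segments(segs, B=4)
print(q)    # [[ 4 -7  2  5]] within the 4-bit range [-7, 7]
print(ref)  # [[1.]]
```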
  • after quantization, the quantization processing results, that is, the feature reference value of each feature element sub-segment and the quantized value of each feature element within it, as well as the weight reference value of each weight element sub-segment and the quantized value of each weight element within it, can be used to determine the output features of the target quantization network layer through low-bit matrix multiplication and inverse quantization.
  • the target quantization network layer of the above solution in this embodiment can be located in any deep learning model configured with a matrix multiplication operator, for example, it can be located in an image recognition model, a speech recognition model, or a text semantic parsing model, etc.
  • for example, the target quantization network layer can be deployed in a speech recognition model. In this case, the corresponding feature matrix is the speech features obtained after a speech segment is processed by the feature extraction layer, and the output features are used for semantic recognition processing of the speech segment.
  • in the technical solution of this embodiment, each row of the feature matrix is divided into at least two feature element sub-segments and each column of the weight matrix is divided into at least two weight element sub-segments according to the target segmentation coefficient; the divided feature element sub-segments and weight element sub-segments are then quantized according to the target quantization bit number, and the output features of the target quantization network layer are determined based on the processing results.
  • by dividing each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization, this solution refines the quantization granularity and thus improves the accuracy of the quantization results.
  • the target segmentation coefficient of the target quantization network layer may include a first coefficient and a second coefficient.
  • in this case, the method of segmenting the feature matrix and the weight matrix is: according to the first coefficient in the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments; according to the second coefficient in the target segmentation coefficient, each column of weight elements of the weight matrix is divided into at least two weight element sub-segments.
  • the method of dividing the feature matrix based on the first coefficient and the method of dividing the weight matrix based on the second coefficient are similar to the methods introduced in the above embodiments and will not be described again here.
  • This method can divide the weight matrix and the feature matrix into sub-segments based on different target segmentation coefficients, which improves the flexibility and diversity of the division rules and thereby improves the accuracy and flexibility of the subsequent matrix quantization and of determining the output features based on the quantization results.
  • Figure 2 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure explain in detail how to determine the output characteristics of the target quantization network layer based on the quantization processing results. As shown in Figure 2, the data processing method provided by this embodiment may include:
  • the number of columns of the feature matrix is equal to the number of rows of the weight matrix.
  • S203 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer.
  • S204 Determine each characteristic element sub-segment in the characteristic matrix and the corresponding weight element sub-segment in the weight matrix, and use the corresponding characteristic element sub-segments and weight element sub-segments as a set of associated sub-segment pairs.
  • each feature element sub-segment of the feature matrix has a corresponding weight element sub-segment in each column of the weight matrix. To determine the weight element sub-segment corresponding to the s-th feature element sub-segment of the i-th row of the feature matrix, the following operation can be performed for each column of the weight matrix in turn: according to the positions of the feature elements of that sub-segment within the i-th row, select the weight element sub-segment containing the weight elements at the same positions in the column, as the weight element sub-segment corresponding to that feature element sub-segment in that column.
  • for example, suppose the feature matrix I is a 4×8 matrix and the weight matrix W is an 8×2 matrix, the feature matrix I is divided into sub-segments based on the first coefficient C1 in the target segmentation coefficient, and the weight matrix W is divided based on the second coefficient C2 in the target segmentation coefficient; the values of C1 and C2 are positive integers, and they can be the same or different.
  • taking C1 = 2 and C2 = 4 as an example, for the feature element sub-segment 1 (I11, I12) in the first row of the feature matrix I, the weight element sub-segment 1 (W11, W21, W31, W41) in the first column of the weight matrix W, which contains the weight elements at the corresponding positions (i.e., W11 and W21), is used as the weight element sub-segment corresponding to this feature element sub-segment in the first column of the weight matrix W; the weight element sub-segments corresponding to feature element sub-segment 1 in the other columns of the weight matrix W are determined in the same way, and will not be described again here.
  • similarly, the weight element sub-segment corresponding to the feature element sub-segment 2 (I13, I14) in the first column of the weight matrix W is also the weight element sub-segment 1 (W11, W21, W31, W41).
  • in this way, for each row of the feature matrix I, the correspondence between each feature element sub-segment in the row and each weight element sub-segment in each column of the weight matrix can be analyzed, and the feature element sub-segments and weight element sub-segments with a correspondence are regarded as a set of associated sub-segment pairs.
  • for example, for the first row of the feature matrix I and the first column of the weight matrix W in the earlier example with C = 4, the associated sub-segment pairs are: feature element sub-segment 1 (I11, I12, I13, I14) and weight element sub-segment 1 (W11, W21, W31, W41); feature element sub-segment 2 (I15, I16, I17, I18) and weight element sub-segment 2 (W51, W61, W71, W81).
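  • The correspondence rule above reduces to a small index calculation. The following sketch is illustrative only (the function name is assumed); it maps a 1-based feature element sub-segment index to the associated weight element sub-segment index when the second coefficient C2 is an integer multiple of the first coefficient C1.

```python
def weight_subsegment_for(s, C1, C2):
    """Index of the weight element sub-segment associated with the s-th
    feature element sub-segment (1-based), assuming C2 is an integer
    multiple of C1."""
    assert C2 % C1 == 0, "coefficients must be in an integer-multiple relationship"
    first_element = (s - 1) * C1   # 0-based offset of the segment's first element
    return first_element // C2 + 1 # 1-based weight sub-segment index

# With C1=2 and C2=4, feature sub-segments 1 and 2 (I11-I12 and I13-I14)
# both map to weight element sub-segment 1 (W11, W21, W31, W41):
print([weight_subsegment_for(s, 2, 4) for s in (1, 2, 3, 4)])  # [1, 1, 2, 2]
```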
  • S205 Determine the output characteristics of the target quantization network layer based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
  • for each group of associated sub-segment pairs, the quantized values of the feature elements and the quantized values of the weight elements at corresponding positions are multiplied as low-bit products and then summed, and the product summation result is then multiplied with the feature reference value and the weight reference value to obtain the sub-inner product of the group of associated sub-segment pairs; here, the position of each feature element corresponds to the position of a weight element.
  • in an embodiment, the sub-inner product of each group of associated sub-segment pairs can be calculated through the following formula (3):
  • Oi,s,j = (absmax(Ii,s) × absmax(Ws,j) / (2^(B-1) - 1)^2) × Σt I′i,t × W′t,j, where t ranges over the s-th sub-segment, i.e., t = (s-1)·C+1, ..., s·C  (3)
  • where Oi,s,j is the sub-inner product of the associated sub-segment pair consisting of the s-th feature element sub-segment in the i-th row of the feature matrix I and the s-th weight element sub-segment in the j-th column of the weight matrix W; C is the target segmentation coefficient; I′i,t is the quantized value of the feature element in the i-th row and t-th column of the feature matrix I; W′t,j is the quantized value of the weight element in the t-th row and j-th column of the weight matrix W; absmax(Ii,s) is the feature reference value of the s-th feature element sub-segment in the i-th row of the feature matrix I; absmax(Ws,j) is the weight reference value of the s-th weight element sub-segment in the j-th column of the weight matrix W; and the value of t is a positive integer.
  • then, the output features of the target quantization network layer are determined based on the sub-inner products of the groups of associated sub-segment pairs: the sub-inner products of the groups of associated sub-segment pairs with the same row number in the feature matrix and the same column number in the weight matrix are summed to obtain the element value at the corresponding row and column position in the output features, as in the following formula (4):
  • Oi,j = Σs Oi,s,j, s = 1, ..., k/C  (4)
  • where Oi,j is the element value in the i-th row and j-th column of the matrix where the output features are located; k is the total number of columns of the feature matrix (and also the total number of rows of the weight matrix); and Oi,s,j is the sub-inner product of the associated sub-segment pair consisting of the s-th feature element sub-segment in the i-th row of the feature matrix I and the s-th weight element sub-segment in the j-th column of the weight matrix W.
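  • Putting formulas (1) through (4) together, the following is a minimal NumPy sketch of the full blockwise quantized matrix product, assuming a single segmentation coefficient C for both matrices and the scaling reconstructed above; it is an illustration, not the claimed implementation.

```python
import numpy as np

def quantized_matmul(I, W, C, B):
    """Blockwise quantized product of I (m x k) and W (k x n): a sketch of
    formulas (1)-(4) using one segmentation coefficient C for both matrices."""
    m, k = I.shape
    n = W.shape[1]
    assert k % C == 0
    scale = 2 ** (B - 1) - 1

    def quant(segs):
        ref = np.abs(segs).max(axis=-1, keepdims=True)
        ref = np.where(ref == 0, 1.0, ref)
        return np.rint(segs * scale / ref).astype(np.int32), ref

    qI, refI = quant(I.reshape(m, k // C, C))    # (m, S, C), S = k // C
    qW, refW = quant(W.T.reshape(n, k // C, C))  # (n, S, C)

    # Formula (3): integer products summed within each associated pair,
    # then inverse-quantized with the two reference values.
    sub = np.einsum('isc,jsc->isj', qI, qW).astype(np.float32)
    sub *= refI / scale                          # broadcast (m, S, 1)
    sub *= (refW.squeeze(-1).T / scale)[None]    # broadcast (1, S, n)
    return sub.sum(axis=1)                       # formula (4): sum over s

rng = np.random.default_rng(0)
I = rng.standard_normal((4, 8)).astype(np.float32)
W = rng.standard_normal((8, 2)).astype(np.float32)
# Quantization error; it shrinks as B grows or C shrinks.
print(np.abs(quantized_matmul(I, W, C=4, B=4) - I @ W).max())
```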
  • in the technical solution of this embodiment, each row of the feature matrix is divided into feature element sub-segments and each column of the weight matrix is divided into weight element sub-segments according to the target segmentation coefficient; the divided sub-segments are quantized according to the target quantization bit number; the corresponding feature element sub-segments and weight element sub-segments are determined as sets of associated sub-segment pairs; and the output features of the target quantization network layer are determined based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
  • because this solution first determines the correspondence between feature element sub-segments and weight element sub-segments before computing the output features, the output features can be determined more accurately and quickly based on this correspondence, thereby ensuring the accuracy of the operation results of the target quantization network layer.
  • in an embodiment, the Tensor Core computing unit of an NVIDIA GPU is used to determine the output features of the target quantization network layer based on the quantization processing results.
  • one implementation is as follows: after the quantization processing results of each feature element sub-segment and each weight element sub-segment are obtained based on the above embodiments, the quantization results are sequentially loaded into the cache space of the Tensor Core computing unit; then, the quantization results of the feature element sub-segments contained in each group of associated sub-segment pairs (i.e., the feature reference value and the quantized values of the feature elements) and the quantization results of the weight element sub-segments (i.e., the weight reference value and the quantized values of the weight elements) in the cache space are used as the input of the Tensor Core computing unit.
  • the Tensor Core computing unit can first perform low-bit multiplication based on the input quantization results, that is, the quantized values of the feature elements and the quantized values of the weight elements at corresponding positions are multiplied and then summed to obtain a low-bit calculation result (for example, when the target quantization bit number is 4, the low-bit calculation result is an int32 integer); then inverse quantization is performed, that is, the low-bit calculation result is multiplied with the feature reference value and the weight reference value to obtain the sub-inner product of each group of associated sub-segment pairs, which is a single-precision floating-point value; finally, the output features of the target quantization network layer are determined based on the sub-inner products of the groups of associated sub-segment pairs.
  • This embodiment provides an example of implementing the data processing method of this embodiment based on the Tensor Core computing unit of an NVIDIA GPU, which provides a basis for the subsequent customization of chips, such as application-specific integrated circuit (ASIC) chips.
  • the above data processing method in this embodiment sequentially completes the process of converting floating point numbers into low-bit integers, low-bit matrix multiplication, and inverse quantization. Since the value of the weight matrix will not change during the entire calculation process, its quantization process can be completed offline, while the input feature matrix needs to be quantized online.
  • the size of the target segmentation coefficient C of the target quantization network layer directly affects the accuracy of the quantization process. Generally, the larger the target segmentation coefficient C, the lower the numerical accuracy of the quantized representation, and the lower the accuracy of the final output features; the smaller the target segmentation coefficient C, the higher the numerical accuracy of the quantized representation, and the higher the accuracy of the final output features. At the same time, a smaller C produces more sub-segments and reference values to process, so the target segmentation coefficient C also affects calculation efficiency.
  • the target segmentation coefficient C is the key to balancing model accuracy and speed, and the value selection needs to be customized according to the scene requirements.
  • Figure 3 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiment of the present disclosure explains how to determine the target quantization network layer and the target segmentation coefficient and target quantization bit number of the target quantization network layer. As shown in Figure 3, the data processing method provided by this embodiment may include:
  • S301 Determine the optional quantization strategies of the original model.
  • the original model can be a deep learning model that needs to be quantized, which contains at least one network layer that can be quantized, that is, an optional quantization network layer.
  • This optional quantized network layer contains matrix multiplication operators.
  • the optional quantization strategy is the strategy used to quantize the original model, which includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits.
  • different optional quantization strategies may include different optional quantization network layers while the optional segmentation coefficients and optional quantization bit numbers corresponding to the different optional quantization network layers are the same; they may include the same optional quantization network layer with different optional segmentation coefficients and/or optional quantization bit numbers; or they may include different optional quantization network layers with different optional segmentation coefficients and optional quantization bit numbers, etc.; there is no limitation on this.
  • one implementation of determining the optional quantization strategies of the original model may be: first determine the network layers containing matrix multiplication operators in the original model as optional quantization network layers, then configure at least one optional segmentation coefficient and optional quantization bit number for each optional quantization network layer based on experience, and then use each optional quantization network layer together with its corresponding optional segmentation coefficient and optional quantization bit number as one optional quantization strategy of the original model.
  • Another implementation is: first determine the network layers containing matrix multiplication operators in the original model as optional quantization network layers; then, for each optional quantization network layer, randomly select a segmentation coefficient from a predetermined set of alternative segmentation coefficients and randomly select a quantization bit number from a set of alternative quantization bit numbers, and combine them with the optional quantization network layer to obtain multiple optional quantization strategies.
  • S302 Obtain the quantization contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.
  • the quantization contribution information in this embodiment refers to the degree of contribution of the optional quantization strategy to the quantization effect of the original model, and may include: model accuracy information and compression volume information.
  • the model accuracy information is the accuracy value of the model after the original model is quantized based on the optional quantization strategy.
  • the compression volume information is the amount by which the model volume is compressed, compared with before quantization, after the original model is quantized based on the optional quantization strategy.
  • in an embodiment, when the original model is quantized based on an optional quantization strategy, the optional quantization network layer corresponding to the optional quantization strategy is found in the original model, and the optional segmentation coefficient and optional quantization bit number of the optional quantization strategy are assigned to the quantization parameters of that optional quantization network layer in the original model; the verification data set of the original model is then input into the original model.
  • each network layer of the original model performs data processing based on its network parameters to obtain the corresponding output results.
  • this embodiment mainly obtains the output results of the optional quantization network layer, that is, the test output features; here, the optional quantization network layer to which the optional segmentation coefficient and optional quantization bit number have been assigned determines the test output features based on its input feature matrix and its own weight matrix.
  • error analysis is performed between the test output features and the real output features of the optional quantization network layer before quantization processing based on the optional segmentation coefficient and optional quantization bit number, to obtain the model accuracy value in the quantization contribution information corresponding to the optional quantization strategy; then, the compression volume information in the quantization contribution information corresponding to the optional quantization strategy is determined according to the optional quantization bit number in the optional quantization strategy.
  • the method of determining the target quantization strategy from the optional quantization strategies may be to weigh the model accuracy information and the compression volume information at the same time, and select an optional quantization strategy with a relatively small loss in model accuracy and a relatively large compression volume as the target quantization strategy.
  • one implementation is: first, based on the model accuracy information in the quantization contribution information, select the optional quantization strategies whose model accuracy loss is within an acceptable range, then determine the compression volumes corresponding to this part of the selected optional quantization strategies, and use at least one strategy with the largest compression volume as the target quantization strategy.
  • Another possible implementation is: according to the model accuracy information in the quantization contribution information, sort the multiple optional quantization strategies in order of model accuracy from high to low, and then, based on the compression volume information in the quantization contribution information and the expected compression volume, determine the target quantization strategies from the optional quantization strategies in that order; for example, determine how many of the top-ranked optional quantization strategies are needed for their total compression volume to reach the expected compression volume, and use these optional quantization strategies as the target quantization strategies.
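  • The sort-then-accumulate selection just described can be sketched as a simple greedy pass. The following Python sketch is illustrative only: the dictionary keys and the candidate values are assumptions, not part of the disclosure.

```python
def pick_target_strategies(strategies, expected_volume):
    """Greedy sketch of one selection pass: rank candidate strategies by
    model accuracy (high to low) and take them until their combined
    compression volume reaches the expected volume."""
    ranked = sorted(strategies, key=lambda s: s['accuracy'], reverse=True)
    picked, total = [], 0.0
    for s in ranked:
        if total >= expected_volume:
            break
        picked.append(s)
        total += s['volume']
    return picked, total

candidates = [
    {'layer': 'fc1', 'accuracy': 0.97, 'volume': 3.0},
    {'layer': 'fc2', 'accuracy': 0.95, 'volume': 5.0},
    {'layer': 'attn', 'accuracy': 0.90, 'volume': 8.0},
]
picked, total = pick_target_strategies(candidates, expected_volume=7.0)
print([s['layer'] for s in picked], total)  # ['fc1', 'fc2'] 8.0
```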
  • subsequent data processing operations can be performed based on the corresponding target quantization network layer in each target quantization strategy and its corresponding target segmentation coefficient and target number of quantization bits.
  • the operation process of the target quantization network layer in the original model can thereby be quantized, achieving the effect of quantizing the original model.
  • the number of columns of the feature matrix is equal to the number of rows of the weight matrix.
  • S306 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.
  • in the technical solution of this embodiment, the original model is controlled to perform data processing based on the optional quantization strategies, and the quantization contribution information of each optional quantization strategy is determined based on the processing results.
  • the quantization contribution information is used to determine the target quantization strategy, and the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are then processed according to the target segmentation coefficient and target quantization bit number in the target quantization strategy to obtain the output features.
  • This solution determines the final quantization strategy of the model based on the quantization contribution information of multiple optional quantization strategies, which reduces the model volume while ensuring the accuracy of model quantization.
  • Figure 4 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure explain how to determine the target quantization strategy from the optional quantization strategies based on the quantization contribution information. As shown in Figure 4, the data processing method provided by this embodiment may include:
  • S401 Determine the optional quantization strategies of the original model; wherein an optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional quantization bit number.
  • S402 Obtain the quantization contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.
  • S403 Determine newly selected quantization strategies from the optional quantization strategies based on the quantization contribution information corresponding to the optional quantization strategies.
  • in this embodiment, the target quantization strategies are determined from the optional quantization strategies through multiple rounds of screening, that is, a part is screened out each time, and all the optional quantization strategies screened out over the multiple rounds are used as the target quantization strategies; the optional quantization strategies selected in the current round are called the newly selected quantization strategies, and the optional quantization strategies selected before the current round are called the historically selected quantization strategies.
  • one possible way to determine the newly selected quantization strategies from the optional quantization strategies based on the corresponding quantization contribution information is to combine the model accuracy information and the compression volume information and, in each round, select a preset number (such as 3) of optional quantization strategies with smaller accuracy loss and larger compression volume from the optional quantization strategies as the newly selected quantization strategies.
  • Another implementation is to sort the optional quantization strategies according to their corresponding model accuracy information, and then determine the newly selected quantization strategies from the optional quantization strategies according to the sorting results and the compression volume information corresponding to the optional quantization strategies.
  • for example, the optional quantization strategies are sorted from high to low according to their corresponding model accuracy, and if the sum of the compression volume information corresponding to the top-ranked optional quantization strategies reaches the compression volume of this round of screening, these optional quantization strategies are used as the newly selected quantization strategies of this round.
  • the values of L, R and R' are positive numbers;
  • the second method can be used to determine the newly selected quantization strategy.
  • This method can select a target quantization strategy that meets the requirements of quantization accuracy and quantization volume faster and more accurately.
  • S404 Determine the total compression volume of the newly selected strategy and the historically selected quantization strategy.
  • in an embodiment, according to the compression volume information, the total compression volume of the original model achieved by the newly selected strategies and the historically selected strategies is calculated; for example, the total compression volume is obtained by summing the compression volume corresponding to the newly selected strategies and the compression volume corresponding to the historically selected strategies.
  • S405 Determine whether the total compression volume meets the quantization requirement. If the total compression volume does not meet the quantization requirement, execute S406. If the total compression volume reaches the quantization requirement, execute S409.
  • the quantization requirement may be a preset expected compression volume. After each round of determining newly selected strategies, this embodiment can determine whether the total compression volume currently reached has reached the expected compression volume, that is, whether the quantization requirement has been met. If not, the subsequent operation of S406 needs to be performed; if so, the subsequent operation of S409 needs to be performed.
  • in an embodiment, when the total compression volume does not meet the quantization requirement, quantization parameters are assigned to the corresponding optional quantization network layers in the original model according to the newly selected quantization strategies, that is, preliminary quantization of the original model is achieved; then, the training samples are used to perform model training, which can include forward training and reverse training, on the preliminarily quantized original model to obtain a preliminary quantization model.
  • S408 Use the optional quantization strategies other than the newly selected quantization strategies as the new optional quantization strategies, add the newly selected quantization strategies to the historically selected quantization strategies, use the preliminary quantization model as the original model, and return to the operation of S402.
  • the number of columns of the feature matrix is equal to the number of rows of the weight matrix.
  • S412 Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.
  • in the technical solution of this embodiment, the original model is controlled to perform data processing based on the optional quantization strategies, and the quantization contribution information of each optional quantization strategy is determined based on the processing results.
  • the newly selected quantization strategies are determined from the optional quantization strategies in batches. If the total compression volume of the newly selected and historically selected quantization strategies does not reach the quantization requirement, the original model is quantized and trained based on the newly selected quantization strategies, and the operation of obtaining the quantization contribution information of the optional quantization strategies and its subsequent operations are performed again, until the total compression volume of the newly selected and historically selected quantization strategies reaches the quantization requirement; the newly selected and historically selected quantization strategies are then used as the target quantization strategies, and the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed according to the target segmentation coefficient and target quantization bit number in the target quantization strategy to obtain the output features.
  • This solution obtains the target quantization strategies in batches and, between batches, quantizes and trains the original model based on the newly selected quantization strategies, which greatly ensures the accuracy of the extracted target quantization strategies and thereby ensures the accuracy of model quantization.
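  • The overall loop of Figure 4 (S402 through S409) can be summarized in a short control-flow sketch. Everything below is illustrative: the callback names (pick_batch, quantize, train, measure) are hypothetical stand-ins for the operations described above, not APIs from the disclosure.

```python
def iterative_quantization(model, candidates, required_volume,
                           pick_batch, quantize, train, measure):
    """Sketch of the Figure 4 loop; the callbacks are hypothetical
    stand-ins for steps S402-S409."""
    selected, total = [], 0.0
    while candidates and total < required_volume:
        info = measure(model, candidates)        # S402: contribution info
        batch = pick_batch(candidates, info)     # S403: newly selected strategies
        total += sum(s['volume'] for s in batch) # S404: total compression volume
        selected += batch                        # history update
        candidates = [s for s in candidates if s not in batch]  # S408
        if total < required_volume:              # S405: requirement not met yet
            model = train(quantize(model, batch))  # preliminary quantization + training
    return selected, model  # S409: selected strategies become the target strategies
```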
  • Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present disclosure.
  • The embodiment of the present disclosure is suitable for quantizing the data calculation process of the target quantization network layer in a deep learning model, that is, for processing the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer to obtain the output features of the target quantization network layer.
  • the device can be configured in an electronic device installed with a deep learning model and implemented using software and/or hardware.
  • the device can implement the data processing method of any embodiment of the present disclosure. As shown in Figure 5, the data processing device 500 includes:
  • the matrix acquisition module 501 is configured to acquire the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer, wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix; the matrix segmentation module 502 is configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments and divide each column of weight elements of the weight matrix into at least two weight element sub-segments according to the target segmentation coefficient of the target quantization network layer; the quantization processing module 503 is configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer; and the feature determination module 504 is configured to determine the output features of the target quantization network layer based on the quantization processing results.
  • in the technical solution of this embodiment, each row of the feature matrix is divided into at least two feature element sub-segments and each column of the weight matrix is divided into at least two weight element sub-segments according to the target segmentation coefficient; the divided feature element sub-segments and weight element sub-segments are then quantized according to the target quantization bit number, and the output features of the target quantization network layer are determined based on the processing results.
  • by dividing each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization, this solution refines the quantization granularity and thus improves the accuracy of the quantization results.
  • the matrix segmentation module 502 is configured to:
  • divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the first coefficient in the target segmentation coefficient, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments according to the second coefficient in the target segmentation coefficient; wherein the first coefficient and the second coefficient are in an integer-multiple relationship.
  • the feature determination module 504 includes:
  • the sub-segment pair determining unit 610 is configured to determine each feature element sub-segment in the feature matrix and the corresponding weight element sub-segment in the weight matrix, and to use the corresponding feature element sub-segments and weight element sub-segments as a group of associated sub-segment pairs; the feature calculation unit 620 is configured to determine the output features of the target quantization network layer based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
  • the ratio of the number of feature element sub-segments and weight element sub-segments contained in each group of associated sub-segment pairs is the same as the ratio of the segmentation coefficients that divide the weight matrix and the feature matrix.
  • the feature determination module 504 is configured to:
  • the output characteristics of the target quantization network layer are determined based on the quantization processing results.
  • the feature matrix is the speech features obtained after the speech segments are processed by the feature extraction layer; the output features are used to perform semantic recognition processing on the speech segments.
  • the data processing device 500 also includes:
  • the optional strategy determination module 505 is configured to determine the optional quantization strategy of the original model; wherein the optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits;
  • the contribution information acquisition module 506 is configured to obtain the quantization contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy;
  • the target strategy determination module 507 is configured to determine the target quantization strategy from the optional quantization strategies based on the quantization contribution information, to obtain the target quantization network layer and the target segmentation coefficient and target quantization bit number of the target quantization network layer.
  • the target policy determination module 507 includes:
  • the new strategy determination unit 710 is configured to determine newly selected quantization strategies from the optional quantization strategies based on the quantization contribution information corresponding to the optional quantization strategies; the compression volume determination unit 720 is configured to determine the total compression volume of the newly selected quantization strategies and the historically selected quantization strategies; the target strategy determination unit 730 is configured to use the newly selected quantization strategies and the historically selected quantization strategies as the target quantization strategies when the total compression volume reaches the quantization requirement.
  • the quantization contribution information includes: model accuracy information and compression volume information; the new strategy determination unit 710 is configured to:
  • the target policy determination module 507 also includes:
  • the quantization training unit 740 is configured to, when the total compression volume does not meet the quantization requirement, perform preliminary quantization on the original model based on the newly selected quantization strategies and train the preliminarily quantized original model to obtain a preliminary quantization model;
  • the historical quantization strategy update unit 750 is configured to add the newly selected quantization strategies to the historically selected quantization strategies;
  • the loop operation unit 760 is configured to use the optional quantization strategies other than the newly selected quantization strategies as new optional quantization strategies, use the preliminary quantization model as the original model, and return to the operation of obtaining the quantization contribution information.
  • the above-mentioned products can execute the methods provided by any embodiment of the present disclosure, and have corresponding functional modules and effects for executing the methods.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product to implement the above data processing method.
  • FIG. 10 shows a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure.
  • Electronic device 600 is intended to represent many forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the electronic device 600 can also be stored.
  • Computing unit 601, ROM 602 and RAM 603 are connected to each other via bus 604.
  • An input/output (I/O) interface 605 is also connected to bus 604.
  • multiple components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, etc.
  • the communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.
  • the computing unit 601 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 601 performs the methods and processes described above, such as the data processing method.
  • the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608.
  • part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609.
  • When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the data processing method described above may be performed.
  • Alternatively, the computing unit 601 may be configured to perform the data processing method in any other suitable manner (e.g., by means of firmware).
  • Various implementations of the systems and techniques described above may be realized in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard parts (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device configured to display information to the user, such as a cathode ray tube (CRT) or a liquid crystal display (LCD) monitor; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be configured to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.
  • Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • The server can be a cloud server, also known as a cloud computing server or cloud host; it is a host product in the cloud computing service system that overcomes the defects of difficult management and weak business scalability found in traditional physical host and virtual private server (VPS) services.
  • the server can also be a distributed system server or a server combined with a blockchain.
  • Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include major directions such as computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, and knowledge graph technology.
  • Cloud computing refers to a technical system that accesses a flexible and scalable shared pool of physical or virtual resources through a network, where the resources can include servers, operating systems, networks, software, applications, storage devices, etc., and can be deployed and managed in an on-demand, self-service manner.
  • It should be understood that steps can be reordered, added, or removed using the various forms of flows shown above. For example, the steps described in the present disclosure can be executed in parallel, sequentially, or in different orders; as long as the desired results of the technical solution disclosed in the present disclosure can be achieved, no limitation is imposed herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Analysis (AREA)

Abstract

Data processing method and apparatus, device, and storage medium. The data processing method includes: acquiring a feature matrix input to a target quantization network layer and a weight matrix of the target quantization network layer (S101), where the number of columns of the feature matrix is equal to the number of rows of the weight matrix; according to a target segmentation coefficient of the target quantization network layer, dividing each row of feature elements of the feature matrix into at least two feature element sub-segments, and dividing each column of weight elements of the weight matrix into at least two weight element sub-segments (S102); and performing quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to a target quantization bit number of the target quantization network layer, and determining an output feature of the target quantization network layer according to a quantization processing result (S103).
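To make the segmented quantization described in the abstract concrete, the following is a minimal NumPy sketch of a sub-segment-quantized matrix product: each row of the feature matrix and each column of the weight matrix are split into sub-segments according to a segmentation coefficient, each sub-segment is quantized at the target bit number, and the partial products are accumulated. The symmetric per-segment scaling, the function names, and the per-row re-quantization of weight sub-segments are assumptions made for illustration; the disclosure does not bind the method to this particular quantization formula.

    # Minimal sketch of sub-segment quantization (assumed scheme, for illustration).
    import numpy as np

    def quantize(x, num_bits):
        """Symmetric linear quantization of a 1-D segment (assumed formula)."""
        qmax = 2 ** (num_bits - 1) - 1
        scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
        return np.round(x / scale).astype(np.int32), scale

    def segmented_quant_matmul(features, weights, seg_coeff=2, num_bits=8):
        m, k = features.shape
        k2, n = weights.shape
        assert k == k2, "feature columns must equal weight rows"
        bounds = np.linspace(0, k, seg_coeff + 1, dtype=int)  # sub-segment split points
        out = np.zeros((m, n))
        for s, e in zip(bounds[:-1], bounds[1:]):
            for i in range(m):
                qf, fs = quantize(features[i, s:e], num_bits)      # feature sub-segment
                for j in range(n):
                    qw, ws = quantize(weights[s:e, j], num_bits)   # weight sub-segment
                    out[i, j] += fs * ws * np.dot(qf, qw)          # rescaled integer partial product
        return out

In practice the weight sub-segments would be quantized once and reused rather than re-quantized for every output row; they are recomputed here only to keep the sketch short.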
PCT/CN2022/132429 2022-04-28 2022-11-17 Procédé et appareil de traitement de données, et dispositif et support de stockage WO2023207039A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210463316.9A CN114781650B (zh) 2022-04-28 2022-04-28 一种数据处理方法、装置、设备以及存储介质
CN202210463316.9 2022-04-28

Publications (1)

Publication Number Publication Date
WO2023207039A1 true WO2023207039A1 (fr) 2023-11-02

Family

ID=82434750

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132429 WO2023207039A1 (fr) 2022-04-28 2022-11-17 Procédé et appareil de traitement de données, et dispositif et support de stockage

Country Status (2)

Country Link
CN (1) CN114781650B (fr)
WO (1) WO2023207039A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312255A (zh) * 2023-11-29 2023-12-29 湖南中斯信息科技有限公司 一种电子文档拆分优化管理方法及系统

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781650B (zh) * 2022-04-28 2024-02-27 北京百度网讯科技有限公司 一种数据处理方法、装置、设备以及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598227A (zh) * 2020-05-20 2020-08-28 字节跳动有限公司 数据处理方法、装置、电子设备及计算机可读存储介质
WO2020190772A1 (fr) * 2019-03-15 2020-09-24 Futurewei Technologies, Inc. Compression et optimisation de modèle de réseau de neurones artificiels
CN112529189A (zh) * 2020-11-10 2021-03-19 北京百度网讯科技有限公司 模型压缩方法、装置、电子设备及存储介质
WO2021174370A1 (fr) * 2020-03-05 2021-09-10 Huawei Technologies Co., Ltd. Procédé et système de division et d'attribution de largeur de bit de modèles d'apprentissage profond pour inférence sur des systèmes distribués
CN113408704A (zh) * 2021-06-29 2021-09-17 深圳市商汤科技有限公司 数据处理方法、装置、设备及计算机可读存储介质
CN114781650A (zh) * 2022-04-28 2022-07-22 北京百度网讯科技有限公司 一种数据处理方法、装置、设备以及存储介质

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106796668B (zh) * 2016-03-16 2019-06-14 香港应用科技研究院有限公司 用于人工神经网络中比特深度减少的方法和系统
CN107944555B (zh) * 2017-12-07 2021-09-17 广州方硅信息技术有限公司 神经网络压缩和加速的方法、存储设备和终端
CN108133266B (zh) * 2017-12-12 2021-07-09 北京信息科技大学 一种基于非均匀量化的神经网络权值压缩方法及使用方法
WO2019127362A1 (fr) * 2017-12-29 2019-07-04 清华大学 Procédé de compression de bloc de modèle de réseau neuronal, procédé d'apprentissage, dispositif informatique et système
CN108765247B (zh) * 2018-05-15 2023-01-10 腾讯科技(深圳)有限公司 图像处理方法、装置、存储介质及设备
CN110874636B (zh) * 2018-09-04 2023-06-30 杭州海康威视数字技术股份有限公司 一种神经网络模型压缩方法、装置和计算机设备
CN112955907A (zh) * 2018-10-30 2021-06-11 谷歌有限责任公司 量化训练的长短期记忆神经网络
KR102659494B1 (ko) * 2019-01-21 2024-04-23 삼성전자주식회사 전자 장치 및 그 제어 방법
KR102152374B1 (ko) * 2019-02-25 2020-09-07 주식회사 딥엑스 인공신경망의 비트 양자화 방법 및 시스템
CN110263913A (zh) * 2019-05-23 2019-09-20 深圳先进技术研究院 一种深度神经网络压缩方法及相关设备
CN110348562B (zh) * 2019-06-19 2021-10-15 北京迈格威科技有限公司 神经网络的量化策略确定方法、图像识别方法和装置
CN110288030B (zh) * 2019-06-27 2023-04-07 重庆大学 基于轻量化网络模型的图像识别方法、装置及设备
CN110782003A (zh) * 2019-09-20 2020-02-11 北京航空航天大学 一种基于哈希学习的神经网络压缩方法及系统
CN111222638B (zh) * 2019-11-21 2023-05-12 湖南大学 一种基于神经网络的网络异常检测方法及装置
CN113762493A (zh) * 2020-06-01 2021-12-07 阿里巴巴集团控股有限公司 神经网络模型的压缩方法、装置、加速单元和计算系统
CN111695695B (zh) * 2020-06-09 2023-08-08 北京百度网讯科技有限公司 用户决策行为量化分析方法及装置
CN112669861B (zh) * 2020-12-09 2023-04-07 北京百度网讯科技有限公司 音频数据处理方法、装置、设备和存储介质
CN114005452A (zh) * 2021-10-29 2022-02-01 北京百度网讯科技有限公司 提取语音特征的方法、装置、电子设备及存储介质
CN114282670A (zh) * 2022-01-14 2022-04-05 北京百度网讯科技有限公司 神经网络模型的压缩方法、设备和存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020190772A1 (fr) * 2019-03-15 2020-09-24 Futurewei Technologies, Inc. Compression et optimisation de modèle de réseau de neurones artificiels
WO2021174370A1 (fr) * 2020-03-05 2021-09-10 Huawei Technologies Co., Ltd. Procédé et système de division et d'attribution de largeur de bit de modèles d'apprentissage profond pour inférence sur des systèmes distribués
CN111598227A (zh) * 2020-05-20 2020-08-28 字节跳动有限公司 数据处理方法、装置、电子设备及计算机可读存储介质
CN112529189A (zh) * 2020-11-10 2021-03-19 北京百度网讯科技有限公司 模型压缩方法、装置、电子设备及存储介质
CN113408704A (zh) * 2021-06-29 2021-09-17 深圳市商汤科技有限公司 数据处理方法、装置、设备及计算机可读存储介质
CN114781650A (zh) * 2022-04-28 2022-07-22 北京百度网讯科技有限公司 一种数据处理方法、装置、设备以及存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312255A (zh) * 2023-11-29 2023-12-29 湖南中斯信息科技有限公司 一种电子文档拆分优化管理方法及系统
CN117312255B (zh) * 2023-11-29 2024-02-20 湖南中斯信息科技有限公司 一种电子文档拆分优化管理方法及系统

Also Published As

Publication number Publication date
CN114781650A (zh) 2022-07-22
CN114781650B (zh) 2024-02-27

Similar Documents

Publication Publication Date Title
WO2023207039A1 (fr) Procédé et appareil de traitement de données, et dispositif et support de stockage
US11295208B2 (en) Robust gradient weight compression schemes for deep learning applications
JP2022058915A (ja) 画像認識モデルをトレーニングするための方法および装置、画像を認識するための方法および装置、電子機器、記憶媒体、並びにコンピュータプログラム
WO2019155064A1 (fr) Compression de données à l'aide d'un codeur, d'un décodeur et de réseaux neuronaux antérieurs appris conjointement
CN110766142A (zh) 模型生成方法和装置
JP2022172362A (ja) 画像処理方法、顔認識モデルトのレーニング方法、装置及び機器
US20220374678A1 (en) Method for determining pre-training model, electronic device and storage medium
CN113837308B (zh) 基于知识蒸馏的模型训练方法、装置、电子设备
CN112560985A (zh) 神经网络的搜索方法、装置及电子设备
KR20220127332A (ko) 다양한 텍스트의 자동 생성
CN116362325A (zh) 一种基于模型压缩的电力图像识别模型轻量化应用方法
JP7357114B2 (ja) 生体検出モデルのトレーニング方法、装置、電子機器および記憶媒体
JP2023085353A (ja) 特徴抽出モデル訓練方法、画像分類方法および関連装置
CN113409898B (zh) 分子结构获取方法、装置、电子设备及存储介质
CN112949433B (zh) 视频分类模型的生成方法、装置、设备和存储介质
CN114020950A (zh) 图像检索模型的训练方法、装置、设备以及存储介质
CN113657468A (zh) 预训练模型的生成方法、装置、电子设备和存储介质
CN113435208A (zh) 学生模型的训练方法、装置及电子设备
CN113408304B (zh) 文本翻译方法、装置、电子设备及存储介质
CN114881227A (zh) 模型压缩方法、图像处理方法、装置和电子设备
US20210232891A1 (en) Neural network model compression with structured weight unification
CN114998649A (zh) 图像分类模型的训练方法、图像分类方法及装置
CN114882388A (zh) 多任务模型的训练及预测方法、装置、设备和介质
US20210201157A1 (en) Neural network model compression with quantizability regularization
CN113961765A (zh) 基于神经网络模型的搜索方法、装置、设备和介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939854

Country of ref document: EP

Kind code of ref document: A1