WO2023207039A1

WO2023207039A1 - Data processing method and apparatus, and device and storage medium

Info

Publication number: WO2023207039A1
Application number: PCT/CN2022/132429
Authority: WO
Inventors: 王桂彬; 丛士钧; 贾铭; 贾磊
Original assignee: 北京百度网讯科技有限公司
Priority date: 2022-04-28
Filing date: 2022-11-17
Publication date: 2023-11-02
Also published as: CN114781650A; CN114781650B

Abstract

A data processing method and apparatus, and a device and a storage medium. The data processing method comprises: acquiring a feature matrix, which is input by a target quantization network layer, and a weight matrix of the target quantization network layer (S101), wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix; according to a target segmentation coefficient of the target quantization network layer, dividing each row of feature elements of the feature matrix into at least two feature element sub-segments, and dividing each column of weight elements of the weight matrix into at least two weight element sub-segments (S102); and performing quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the number of target quantization bits of the target quantization network layer, and determining an output feature of the target quantization network layer according to a quantization processing result (S103).

Description

Data processing methods, devices, equipment and storage media

This application claims priority to the Chinese patent application with application number 202210463316.9, which was submitted to the China Patent Office on April 28, 2022. The entire content of this application is incorporated into this application by reference.

Technical field

This disclosure relates to the field of artificial intelligence technology and the field of deep learning technology, and can be applied to scenarios such as speech recognition, natural language processing, and information recommendation.

Background technique

With the development of artificial intelligence technology, deep learning technology is increasingly used in daily life. In order to continuously improve the model accuracy of deep learning models, the complexity and parameter amount of the model continue to increase, which directly affects the model volume and The computing speed of the model, which affects the cost of implementing artificial intelligence technology, needs to be improved urgently.

Contents of the invention

The present disclosure provides a data processing method, device, equipment and storage medium.

According to one aspect of the present disclosure, a data processing method is provided, including:

Obtain the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer; where the number of columns of the feature matrix is equal to the number of rows of the weight matrix;

According to the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments, and each column of weight elements of the weight matrix is divided into at least two weight element sub-segments;

According to the target quantization bit number of the target quantization network layer, quantization processing is performed on at least two feature element sub-segments and at least two weight element sub-segments, and based on the quantization processing results, the output characteristics of the target quantization network layer are determined.

According to an aspect of the present disclosure, a data processing apparatus is provided, including:

A matrix acquisition module, configured to acquire the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;

A matrix segmentation module configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the target quantification target segmentation coefficient of the network layer, and weight each column of the weight matrix The elements are divided into at least two weight element sub-segments;

A quantization processing module configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer;

A feature determination module is configured to determine the output features of the target quantization network layer based on the quantization processing results.

According to another aspect of the present disclosure, an electronic device is provided, the electronic device including:

at least one processor; and

A memory communicatively connected to at least one processor; wherein,

The memory stores instructions that can be executed by at least one processor, and the instructions are executed by at least one processor, so that at least one processor can execute the above-mentioned data processing method.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause the computer to execute the above-mentioned data processing method.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program that implements the above-mentioned data processing method when executed by a processor.

Description of the drawings

Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure;

Figure 2 is a flow chart of another data processing method provided by an embodiment of the present disclosure;

Figure 3 is a flow chart of another data processing method provided by an embodiment of the present disclosure;

Figure 4 is a flow chart of another data processing method provided by an embodiment of the present disclosure;

Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present disclosure;

Figure 6 is a schematic structural diagram of a feature determination module provided by an embodiment of the present disclosure;

Figure 7 is a schematic structural diagram of another data processing device provided by an embodiment of the present disclosure;

Figure 8 is a schematic structural diagram of a target strategy determination module provided by an embodiment of the present disclosure;

Figure 9 is a schematic structural diagram of another target strategy determination module provided by an embodiment of the present disclosure;

Figure 10 is a block diagram of an electronic device that implements a data processing method provided by an embodiment of the present disclosure.

Detailed ways

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered to be exemplary only. For the sake of clarity and conciseness, descriptions of well-known functions and structures as well as functions and structures that are less relevant to the embodiments described below are omitted from the following description.

Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure; the embodiment of the present disclosure is suitable for performing quantitative processing on the data calculation process of the target quantification network layer in the deep learning model, and is suitable for performing quantitative processing on the data calculation process of the target quantization network layer in the deep learning model. The feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed to obtain the output features of the target network layer. The method may be executed by a data processing device, which may be implemented in software and/or hardware. Can be integrated into electronic devices configured with deep learning models. As shown in Figure 1, the data processing method provided by this embodiment may include:

S101: Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.

The target quantization network layer may be a network layer that operates matrix multiplication operators in the deep learning model. The matrix multiplication operators may include but are not limited to: fully connected operators and other derivative operators, such as transformer operators.

The feature matrix input to the target quantification network layer can be the input information input to the target network layer. For example, if the target quantification network layer is the first network layer in the deep learning model, then the feature matrix can be the input information of the deep learning model. Input, if the target quantization network layer is not the first network layer of the deep learning model, the feature matrix can be the output of the network layer located above the target quantization network layer in the deep learning model. The weight matrix of the target quantization network layer can be the inherent network parameters of the weight coefficients of the target quantization network layer that are obtained during the network training stage and are used to characterize the input features of this layer. Since the target quantization network layer corresponds to the matrix multiplication operator, the number of columns of the feature matrix needs to be equal to the number of rows of the weight matrix. That is, the size of the feature matrix is: m*k, and the size of the weight matrix is k*n. Among them, the values of m, k, and n are positive integers.

This embodiment can obtain the feature data input to the target quantization network layer as a feature matrix, and obtain the inherent weight parameters in the target quantization network layer as a weight matrix. If there are multiple input data to the target network layer, the input data whose number of columns is the same as the number of rows of the weight matrix can be selected as the feature matrix input to the target quantization network layer.

S102. According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.

The target segmentation coefficient may be one of the quantization configuration parameters required when quantizing the operation process of the target quantization network layer. It is used to characterize the number of matrix elements contained in each sub-segment after division. For example, if the target segmentation coefficient is C, each C consecutive elements in the matrix can be divided into a segment, that is, the number of matrix elements contained in each divided subsegment is C. Among them, the value of C is a positive integer. The value of the target segmentation coefficient may be predetermined. For example, one may be selected from a variety of optional segmentation coefficients as the target segmentation coefficient through a large number of test analyses. It can also be set based on experience, etc., which is not limited. In this embodiment, the target segmentation coefficient can be set to divide the feature elements of each row of the feature matrix and the matrix elements of each column of the weight matrix into equal parts. That is, if the number of columns of the feature matrix and the number of rows of the weight matrix are both k, then this When the value of the target segmentation coefficient C can be evenly divided by k.

In this embodiment, the matrix elements in the feature matrix are called feature elements, and each group of feature elements after dividing the feature elements is regarded as a feature element sub-segment; the matrix elements in the weight matrix are called weight elements, and the weight elements are Each divided group of weight elements is treated as a weight element sub-segment.

In this embodiment, according to the target segmentation coefficient C, each row of feature elements in the feature matrix can be divided into at least two segments with C adjacent feature elements as a group, each of which serves as a feature element sub-segment. ; Then according to the target segmentation coefficient C, each column of weight elements in the weight matrix is divided into at least two segments with C adjacent weight elements as a group, and each segment is used as a weight element sub-segment.

For example, if the characteristic matrix is matrix I, that is

The weight matrix is the matrix W, that is

And the target segmentation coefficient C is 4, then each row in the matrix I is divided based on the target segmentation coefficient C, and 8 characteristic element sub-segments are obtained, that is, the characteristic element sub-segment 1 (I ₁₁ , I ₁₂ , I ₁₃ , I ₁₄ ), characteristic element sub-segment 2 (I ₁₅ , I ₁₆ , I ₁₇ , I ₁₈ ), characteristic element sub-segment 3 (I ₂₁ , I ₂₂ , I ₂₃ , I ₂₄ ), characteristic element sub-segment 4 (I ₂₅ ,I ₂₆ ,I ₂₇ ,I ₂₈ ), characteristic element sub-segment 5 (I ₃₁ ,I ₃₂ ,I ₃₃ ,I ₃₄ ), characteristic element sub-segment 6 (I ₃₅ ,I ₃₆ ,I ₃₇ ,I ₃₈ ), characteristic Element sub-segment 7 (I ₄₁ , I ₄₂ , I ₄₃ , I ₄₄ ) and characteristic element sub-segment 8 (I ₄₅ , I ₄₆ , I ₄₇ , I ₄₈ ). Divide each column in the matrix W based on the target segmentation coefficient C to obtain 4 weight element sub-segments, namely weight element sub-segment 1 (W ₁₁ , W ₂₁ , W ₃₁ , W ₄₁ ), weight element sub-segment 2 ( W ₅₁ , W ₆₁ , W ₇₁ , W ₈₁ ), weight element sub-segment 3 (W ₁₂ , W ₂₂ , W ₃₂ , W ₄₂ ) and weight element sub-segment 4 (W ₅₂ , W ₆₂ , W ₇₂ , W ₈₂ ) .

S103: Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.

The target quantization bit number may be another parameter among the quantization configuration parameters required when quantizing the operation process of the target quantization network layer. It is used to represent the degree of quantization of the matrix multiplication operator, that is, the smaller the value of the target number of quantization bits, the higher the degree of quantization. For example, the value of the target number of quantization bits in this embodiment is usually not greater than 4. For example, it can be The value is 1bit, 2bit or 4bit, etc.

In this embodiment, the process of quantizing each feature element sub-segment and each weight element sub-segment based on the target number of quantization bits includes: determining the characteristics of the feature element sub-segment based on the feature element value within each feature element sub-segment. The reference value, for example, can be the feature element value with the largest absolute value within the feature element sub-segment as the feature reference value of the feature element sub-segment, and then based on the feature reference value and the target quantization bit number of the target quantization network layer, as follows: Formula (1) determines the quantified value of each feature element within the feature element sub-segment;

Among them, I′ _i,p is the quantized value of the feature element in the i-th row and p-th column of the feature matrix I; I _i,p is the feature element in the i-th row and p-th column of the feature matrix I; absmax(I _i,s ) is the feature reference value of the s-th feature element sub-segment in the i-th row of feature matrix I; B is the target quantization bit number of the target quantization network layer.

In the same way, according to the weight element value in each weight element sub-segment, determine the weight base value of the weight element sub-segment, and according to the weight base value and the target quantization bit number, determine the weight element sub-segment according to the following formula (2) The quantized value of the weight element.

Among them, W′ _q,j is the quantized value of the weight element in the qth row and jth column of the weight matrix W; W _q,j is the weight element of the qth row and jth column of the weight matrix W; absmax(I _j,s ) is the weight reference value of the s-th weight element subsection of the j-th column of the weight matrix W; B is the target quantization bit number of the target quantization network layer.

The values of variables i, p, s, j, and q in this embodiment are positive integers.

In this embodiment, the process of converting each feature element or weight element into its corresponding quantized value is essentially a process of quantizing the feature element or weight element into a low-bit integer corresponding to the target quantization bit number.

The quantified results obtained in this embodiment can be saved in a compact format first, so that they can be called later when calculating the output features. For example, if the target number of quantization bits is B=4, then since one byte is 8 bits, one byte can be used to store the quantized values of two feature elements or the quantized values of two weight elements, as well as each feature benchmark. value and each weight reference value, where each feature reference value takes up 4 bytes, and each weight reference value also takes up 4 bytes.

After quantizing each feature element sub-segment and each weight element sub-segment based on the above method, the quantification processing results can be used, that is, the feature reference value of each feature element sub-segment and each feature within the feature element sub-segment. The quantized value of the element; as well as the weight reference value of each weight element sub-segment and the quantized value of each weight element within the weight element sub-segment, the target quantization network layer is determined through the process of low-bit matrix multiplication calculation and inverse quantization. Output features.

The target quantization network layer of the above solution in this embodiment can be located in any deep learning model configured with a matrix multiplication operator, for example, it can be located in an image recognition model, a speech recognition model, or a text semantic parsing model, etc.

In this embodiment, the target quantification network layer can be deployed in the speech recognition model. At this time, the corresponding feature matrix is the speech feature obtained after the speech segment is processed by the feature extraction layer; the output feature is used for semantic recognition of the speech segment. deal with.

According to the solution of the embodiment of the present disclosure, after obtaining the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer, based on the target segmentation coefficient, each row of the feature matrix and each column of the weight matrix are divided into at least two weight element sub-segments, and then perform quantization processing on the divided feature element sub-segments and weight element sub-segments according to the target quantization bit number, and determine the output characteristics of the target quantization network layer based on the processing results. By introducing target segmentation coefficients, this solution divides each row of the feature matrix and each column of the weight matrix into multiple sub-segments for quantization. While achieving low-bit quantization matrix multiplication, it also ensures the accuracy of low-bit matrix multiplication quantification. , that is, it can compress the model volume and improve the model running speed while ensuring the accuracy of the model as much as possible, thereby reducing the cost of implementing artificial intelligence technology.

In this embodiment, the target segmentation coefficient of the target quantization network layer may include a first coefficient and a second coefficient. Correspondingly, the method of segmenting the feature matrix and the weight matrix is: according to the target segmentation of the target quantization network layer According to the first coefficient in the coefficient, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments; according to the second coefficient in the target segmentation coefficient, each column weight element of the weight matrix is divided into at least two weight elements. Element subsection; wherein, the first coefficient and the second coefficient may be the same or different. Regardless of whether they are the same or not, the relationship between the first coefficient and the second coefficient must be an integer multiple. For example, the first coefficient C1=4 and the second coefficient C2=2. The method of dividing the feature matrix based on the first coefficient and the method of dividing the weight matrix based on the second coefficient are similar to the methods introduced in the above embodiments and will not be described again here. This method can divide the weight matrix and feature matrix into sub-segments based on different target segment coefficients, which improves the flexibility and diversity of the division rules, improves the subsequent matrix quantification, and determines the accuracy and flexibility of the output matrix based on the quantification results. sex.

Figure 2 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure explain in detail how to determine the output characteristics of the target quantization network layer based on the quantization processing results. As shown in Figure 2, the data processing method provided by this embodiment may include:

S201: Obtain the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer.

The number of columns of the feature matrix is equal to the number of rows of the weight matrix.

S202. According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.

S203: Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer.

S204: Determine each characteristic element sub-segment in the characteristic matrix and the corresponding weight element sub-segment in the weight matrix, and use the corresponding characteristic element sub-segments and weight element sub-segments as a set of associated sub-segment pairs.

In this embodiment, each characteristic element sub-segment of the characteristic matrix has a corresponding weight element sub-segment in each column of the weight matrix to determine the weight element sub-segment corresponding to the s-th characteristic element sub-segment of the i-th row in the characteristic matrix. Taking the segment as an example, perform the following operations for each column of the weight matrix in turn: According to the position of each feature element in the i-th row of the feature element sub-segment, select the weight element where the weight element at the same position in each column of the weight matrix is located. Sub-segment, as the weight element sub-segment corresponding to each column of the weight matrix for the feature element sub-segment.

For example, the characteristic matrix I is

The characteristic matrix W is

And the feature matrix I is divided into sub-segments based on the first coefficient C1 in the target segment coefficient, and the weight matrix W is divided based on the second coefficient C2 in the target segment coefficient. Among them, the values of C1 and C2 are positive integers, and they can be the same or different.

If C1=C2=4, determine the characteristic element sub-segment 1 (I ₁₁ , I ₁₂ , I ₁₃ , I ₁₄ ) of the first row in the feature matrix I when it is the weight element sub-segment corresponding to the first column of the weight matrix W , since multiple characteristic elements in the characteristic element subsection 1 are located at the 1st to 4th element positions in the 1st row of the feature matrix I, this embodiment corresponds to the 1st to 4th element positions in the 1st column of the weight matrix W The weight element sub-segment 1 (W ₁₁ , W ₂₁ , W ₃₁ , W ₄₁ ) where the weight element (i.e. W ₁₁ , W ₂₁ , W ₃₁ , W ₄₁ ) is located, as the characteristic element sub-segment 1 in the weight matrix W The corresponding weight element subsegment in column 1. The weight element sub-segments corresponding to the characteristic element sub-segment 1 in other columns of the weight matrix W are determined in the same way, and will not be described again here.

If C1=4, C2=2, determine the weight element subsection corresponding to the feature element sub-segment 1 (I ₁₁ , I ₁₂ , I ₁₃ , I ₁₄ ) in the first row of the feature matrix I in the first column of the weight matrix W segment, since multiple characteristic elements in the characteristic element sub-segment 1 are located at the 1st to 4th element positions in the 1st row of the feature matrix I, this embodiment places the 1st to 4th elements in the 1st column of the weight matrix W The weight element sub-segment 1 (W ₁₁ , W ₂₁ ) and weight element sub-segment 2 (W ₃₁ , W ₄₁ ) where the weight element corresponding to the position ₍ i.e. W ₁₁ , W ₂₁ , W 31 , W ₄₁ ) is located is used as this feature The weight element sub-segment corresponding to element sub-segment 1 in the first column of the weight matrix W. The weight element sub-segments corresponding to the characteristic element sub-segment 1 in other columns of the weight matrix W are determined in the same way, and will not be described again here.

If C1=2, C2=4, then when determining the characteristic element sub-segment 1 (I ₁₁ , I ₁₂ ) of the first row in the characteristic matrix I in the weight element sub-segment corresponding to the first column of the weight matrix W, due to the characteristic element Multiple feature elements in subsegment 1 are located at the 1st and 2nd element positions in the 1st row of the feature matrix I, so this embodiment uses the weight elements corresponding to the 1st and 2nd element positions in the 1st column of the weight matrix W ( That is, the weight element sub-segment 1 (W ₁₁ , W ₂₁ , W ₃₁ , W ₄₁ ) where W ₁₁ , W ₂₁ ) is located is the corresponding weight element sub-segment 1 of the characteristic element sub-segment 1 in the first column of the weight matrix W. . In the same way, the weight element sub-segment corresponding to the characteristic element sub-segment 2 (I ₁₃ , I ₁₄ ) in the first column of the weight matrix W is also the weight element sub-segment 1 (W ₁₁ , W ₂₁ , W ₃₁ , W ₄₁ ).

In this embodiment, after determining the corresponding characteristic element sub-segments and weight element sub-segments, for each row in the characteristic matrix I, each characteristic element sub-segment in the row can be analyzed to compare with each column of the weight matrix. The corresponding relationship between each weight element sub-segment, and the corresponding row in the feature matrix I and each column of the weight matrix, the feature element sub-segment and the weight element sub-segment with the corresponding relationship are regarded as a set of associated sub-segment pairs.

For example, for the feature matrix I and the weight matrix W, if C1=C2=4, then there are two sets of associated sub-segment pairs between the first row of the feature matrix I and the first column of the weight matrix W, that is, feature elements. Sub-segment 1 (I ₁₁ , I ₁₂ , I ₁₃ , I ₁₄ ) and weight element sub-segment 1 (W ₁₁ , W ₂₁ , W ₃₁ , W ₄₁ ); feature element sub-segment 2 (I ₁₅ , I ₁₆ , I ₁₇ ,I ₁₈ ) and weight element sub-segment 2 (W ₅₁ , W ₆₁ , W ₇₁ , W ₈₁ ). There are also two sets of associated sub-segment pairs between the first row of the feature matrix I and the second column of the weight matrix W, namely the feature element sub-segment 1 (I ₁₁ , I ₁₂ , I ₁₃ , I ₁₄ ) and the weight element sub-segment 1 Segment 3 (W ₁₂ , W ₂₂ , W ₃₂ , W ₄₂ ); feature element sub-segment 2 (I ₁₅ , I ₁₆ , I ₁₇ , I ₁₈ ) and weight element sub-segment 4 (W ₅₂ , W ₆₂ , W ₇₂ , W ₈₂ ).

In this embodiment, the ratio of the number of feature element subsegments and weight element subsegments contained in each group of associated subsegment pairs is the same as the ratio of the segmentation coefficients that divide the weight matrix and the feature matrix. That is, for each group of associated sub-segment pairs, the number of feature element sub-segments contained therein/the number of weight element sub-segments = the first coefficient of the division weight matrix/the second coefficient of the division feature matrix.

S205: Determine the output characteristics of the target quantization network layer based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.

In this embodiment, according to the characteristic reference value and the quantized value of the characteristic element sub-segment in each group of associated sub-segment pairs, as well as the weight reference value and the quantized value of the weight element sub-segment, the corresponding positions are The quantized value of the feature element and the quantized value of the weight element are calculated as low-bit products and then summed, and then the product summation result is multiplied with the feature reference value and the weight reference value to obtain the sub-inner product of the group of associated sub-segment pairs. Among them, if the number of columns where the feature element is located is the same as the number of rows where the weight element is located, then the position of the feature element corresponds to the position of the weight element.

For example, if the target segmentation coefficients of the partitioning feature matrix I and the weight matrix W are the same, that is, C1=C2=C, then the sub-inner product of each group of associated sub-segment pairs can be calculated through the following formula (3).

Among them, O _i,s,j is the sub-inner product of the associated sub-segment pair corresponding to the s-th feature element sub-segment in the i-th row in the feature matrix I and the s-th weight element sub-segment in the j-th column in the weight matrix W. ; C is the target segmentation coefficient; I′ _i,t is the quantized value of the feature element in the i-th row and t-th column in the feature matrix I; W′ _t,j is the feature element in the t-th row and j-th column in the weight matrix W quantified value. bsmax(I _i,s ) is the characteristic reference value of the s-th feature element sub-segment in the i-th row in the feature matrix I; bsmax(W _s,j ) is the s-th weight element sub-segment in the j-th column in the weight matrix W Weight base value. Among them, the value of t is a positive integer.

After the sub-inner product of each group of associated sub-segment pairs is determined in the above manner, the output characteristics of the target quantization network layer are determined based on the sub-inner product of each group of associated sub-segment pairs. It may be to sum up the inner products of multiple groups of associated sub-segment pairs with the same number of corresponding rows in the feature matrix and the same number of corresponding columns in the weight matrix to obtain the element value at the corresponding row and corresponding column position in the output feature.

Right now

Among them, O _i,j is the element value of the i-th row and j-th column in the matrix where the output feature is located; k is the total number of columns of the feature matrix (also the total number of rows of the weight matrix); O _i,s,j is the feature matrix I The sub-inner product of the associated sub-segment pair corresponding to the s-th feature element sub-segment in the i-th row in the weight matrix W and the s-th weight element sub-segment in the j-th column in the weight matrix W.

According to the solution of the embodiment of the present disclosure, after obtaining the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer, based on the target segmentation coefficient, each row of the feature matrix and each column of the weight matrix are divided into at least two weight element sub-segments, and then quantize the divided feature element sub-segments and weight element sub-segments according to the target quantization bit number, and determine the corresponding feature element sub-segments and weight element sub-segments as a set of associations For sub-segment pairs, the output characteristics of the target quantization network layer are determined based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs. When this solution determines the output characteristics of the target quantization network layer based on the feature element sub-segment and the weight element sub-segment, it first determines the correspondence between the feature element sub-segment and the weight element sub-segment. Based on this correspondence, it can be more accurate and The output features are quickly determined, thereby ensuring the accuracy of the target quantization network layer operation results.

Since low-bit matrix multiplication is the computing core of the data processing method introduced above, and the graphics processor (Graphics Processing Unit, GPU) developed by Nasdaq (NVIDIA) has efficient support for low-bit multiplication, it can implement int4 and int1 and other low-bit multiplication operations. Therefore, based on the above embodiment, this embodiment uses the Tensor Core computing unit of the GPU developed by NVIDIA to determine the output characteristics of the target quantization network layer based on the quantization processing results. The implementation method is as follows: after obtaining the quantization processing results of each feature element sub-segment and each weight element sub-segment based on the above embodiment, the quantization results are sequentially loaded into the cache space of the Tensor Core computing unit, and then the cache space is The quantization results of the feature element sub-segments contained in each group of associated sub-segment pairs (i.e., the feature reference value and the quantized value of the feature element) and the quantification results of the weight element sub-segment (i.e., the weight reference value and the quantized value of the weight element) are as The input of the Tensor Core computing unit. The Tensor Core computing unit can first provide low-bit multiplication calculation based on the input quantization result. That is, the quantized value of the feature element corresponding to the position and the quantized value of the weight element are multiplied and then summed to obtain a low-bit multiplication calculation. The bit calculation result (for example, when the target number of quantization bits is 4, the low-bit calculation result obtained at this time is an integer result of type int32), and then the inverse quantization calculation is performed, that is, the low-bit calculation result is combined with the feature reference value and weight reference value Calculate the product to obtain the sub-inner product of each group of associated sub-segment pairs, which is a single-precision floating point type; finally, based on the sub-inner product of each group of associated sub-segment pairs, determine the output characteristics of the target quantization network layer.

This embodiment provides an example of implementing the data processing method of this embodiment based on the Tensor Core computing unit of the GPU developed by NVIDIA, which provides a basis for subsequent customization of chips (such as application specific integrated circuit (Application Specific Integrated Circuit, ASIC) chips). The application of this data processing algorithm provides technical support for the quantification of deep learning models.

The above data processing method in this embodiment sequentially completes the process of converting floating point numbers into low-bit integers, low-bit matrix multiplication, and inverse quantization. Since the value of the weight matrix will not change during the entire calculation process, its quantization process can be completed offline, while the input feature matrix needs to be quantized online. The size of the target segmentation coefficient C of the target quantization network layer will directly affect the accuracy of the quantization process. Generally, the larger the target segmentation coefficient C, the lower the numerical accuracy of the quantification representation, and the corresponding accuracy of the final output feature will also decrease. Decrease; the smaller the target segmentation coefficient C, the higher the numerical accuracy of the quantitative representation, and the corresponding accuracy of the final output feature will also increase. That is, the target segmentation coefficient C affects the calculation efficiency. Generally, the larger the value, the fewer instructions are required, which means the calculation time is smaller; conversely, the smaller the value, the greater the calculation time is. Therefore, the target segmentation coefficient C is the key to balancing model accuracy and speed, and the value selection needs to be customized according to the scene requirements.

Figure 3 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiment of the present disclosure explains how to determine the target quantization network layer, the target segmentation coefficient and the target quantization bit number of the target quantization network layer. As shown in Figure 3, the data provided by this embodiment Treatment methods may include:

S301. Determine the optional quantization strategy of the original model.

The original model can be a deep learning model that needs to be quantized, which contains at least one network layer capable of quantification, that is, an optional quantization network layer. This optional quantized network layer contains matrix multiplication operators. The optional quantization strategy is the strategy used to quantize the original model, which includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits. There are multiple optional quantization strategies in this embodiment. For each optional quantization strategy, there is an optional quantization network layer and a set of quantization configuration parameters corresponding to the optional quantization network layer, that is, Select segmentation coefficients and optional number of quantization bits. For different optional quantization strategies, the optional quantization network layers can be different, but the optional segmentation coefficients and the number of optional quantization bits corresponding to different optional quantization network layers are the same; it can also be the optional quantization network included. The layers are the same, but the optional segmentation coefficients and/or the number of optional quantization bits corresponding to the same optional quantization network layer are different; it is also possible to include optional quantization network layers with different optional segmentation coefficients and the number of optional quantization bits. etc., there is no limitation on this.

In this embodiment, an implementation method for determining the optional quantization strategy of the original model may be: first determine the network layer containing the matrix multiplication operator in the original model as the optional quantization network layer, and then based on experience for each possible quantization network layer. Select the quantization network layer to configure at least one optional segmentation coefficient and the number of optional quantization bits; then use each optional quantization network layer and its corresponding optional segmentation coefficient and the number of optional quantization bits in turn as the original model An optional quantification strategy.

Another way to implement it is to first determine the network layer containing the matrix multiplication operator in the original model as an optional quantization network layer, and then for each optional quantization network layer, segment from the predetermined alternative quantization Segmented coefficients are randomly selected from the coefficient set, and the number of quantization bits is randomly selected from the set of alternative quantization bit numbers, and then randomly combined with the optional quantization network layer to obtain multiple optional quantization strategies.

S302: Obtain the quantitative contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.

The quantification contribution information in this embodiment refers to the degree of contribution of the optional quantization strategy to the quantization effect of the original model, which may include: model accuracy information and compression volume information. Among them, the model accuracy information is the accuracy value of the model after quantizing the original model based on the optional quantization strategy. The compressed volume information is the compressed volume value of the model volume after quantizing the original model based on this optional quantization strategy compared to before quantization.

In this embodiment, for each optional quantization strategy obtained, the original model is quantized based on the optional quantization strategy, the optional quantization network layer corresponding to the optional quantization strategy is found in the original model, and then the optional quantization strategy is The optional segmentation coefficients and optional quantization bit numbers of the quantization network layer are assigned to the quantization parameters of the optional quantization network layer in the original model, and then the verification data set of the original model is input into the original model. Each network layer of will perform data processing based on its network parameters to obtain the corresponding output results. This embodiment mainly obtains the output results of the optional quantization network layer, that is, the test output features; combine the test output features with the optional quantization The network layer performs error analysis on the real output features before quantization processing based on the optional segmentation coefficient and the optional number of quantization bits, and obtains the model accuracy value in the quantization contribution information corresponding to the optional quantization strategy. Then, the compression volume information in the quantization contribution information corresponding to the optional quantization strategy is determined according to the number of optional quantization bits in the optional quantization strategy.

In this embodiment, the optional quantization network layer assigned optional segmentation coefficients and optional quantization bit numbers determines the test output characteristics based on its input feature matrix and its own weight matrix. Refer to the method introduced in the above embodiment. The content will not be described in detail here.

S303. According to the quantization contribution information, determine the target quantization strategy from the optional quantization strategies to obtain the target quantization network layer, the target segmentation coefficient and the target quantization bit number of the target quantization network layer.

In this embodiment, based on the quantification contribution information, the method of determining the target quantization strategy from the optional quantization strategies may be to weigh the model accuracy information and the compression volume information at the same time, and select an optional quantization strategy with a relatively small loss in model accuracy and a relatively large compression volume as the Goal quantification strategy. For example, one implementation method is: first, based on the model accuracy information in the quantitative contribution information, select optional quantization strategies with model accuracy losses within an acceptable range, and then determine the corresponding quantization strategies for this part of the selected optional quantization strategies. Compression volume, use at least one of the top compression volumes as the target quantization strategy.

Another possible implementation method is: according to the model accuracy information in the quantified contribution information, sort multiple optional quantization strategies in order from high to low model accuracy, and then based on the compression volume information in the multiple quantified contribution information and the expected compression volume, and determine the target quantization strategy from the optional quantization strategies in order from high to low. For example, it is determined which of the top-ranked optional quantization strategies correspond to a total compression volume information that can reach the expected compression volume, and then these optional quantization strategies are used as target quantization strategies.

In this embodiment, after determining the target quantization strategy, subsequent data processing operations can be performed based on the corresponding target quantization network layer in each target quantization strategy and its corresponding target segmentation coefficient and target number of quantization bits. In this way, the operation process of the target quantization network layer in the original model can be quantified, thereby achieving the effect of quantifying the original model.

S304: Obtain the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer.

S305. According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.

S306: Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.

According to the solution of the embodiment of the present disclosure, after determining the optional quantization strategy of the original model, the original model is controlled to perform data processing based on the optional quantization strategy, and the quantification contribution information of the optional quantization strategy is determined based on the processing results. According to each optional quantization strategy The quantitative contribution information is used to determine the target quantization strategy, and then the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed according to the target segmentation coefficient and the target quantization bit number in the target quantization strategy to obtain the output features. This solution determines the final quantization strategy of the model based on the quantitative contribution information of multiple optional quantization strategies. While ensuring the accuracy of model quantification, it also reduces the model volume, thereby improving the accuracy of model quantification.

Figure 4 is a flow chart of another data processing method provided by an embodiment of the present disclosure. Based on the above embodiments, the embodiments of the present disclosure explain how to determine the target quantification strategy from the optional quantization strategies based on the quantification contribution information. As shown in Figure 4, the data processing method provided by this embodiment may include:

S401. Determine the optional quantization strategy of the original model.

The optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits.

S402: Obtain the quantitative contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.

S403: Determine a newly selected quantification strategy from the optional quantification strategies based on the quantification contribution information corresponding to the optional quantification strategies.

In this embodiment, when determining the target quantification strategy from the optional quantification strategies, it is determined through multiple screenings, that is, a part is screened out each time, and all the optional quantification strategies screened out multiple times are used as the target quantification strategies, and for the current time The selected optional quantitative strategies will be used as newly selected quantitative strategies, and the selected optional quantitative strategies before the current time will be used as historical selected quantitative strategies.

In this embodiment, one possible way to determine a new selected quantization strategy from the optional quantization strategies based on the quantitative contribution information corresponding to the optional quantization strategies is to simultaneously combine the model accuracy information and the compression volume information, each time from the available Among the selected quantization strategies, select a preset number of optional quantization strategies (such as 3) with smaller accuracy loss and larger compression volume as the newly selected quantization strategy.

Another implementation method is to sort the optional quantization strategies according to the model accuracy information and compression volume information corresponding to the optional quantization strategies; according to the sorting results and the compression volume information corresponding to the optional quantization strategies, sort the optional quantization strategies from Confirm to add the selected quantitative strategy. According to the model volume L of the current original model and the expected compression volume R, the compression volume R' of this screening is calculated, where R'=(L-R)/2. Then, each optional quantization strategy is sorted from high to low according to its corresponding model accuracy, and it is judged that the sum of the compression volume information corresponding to the top-ranked optional quantization strategies can reach the compression volume of this screening, then These optional quantitative strategies are newly selected quantitative strategies this time. Wherein, the values of L, R and R' are positive numbers;

In this embodiment, the second method can be used to determine the newly selected quantization strategy. This method can select a target quantization strategy that meets the requirements of quantization accuracy and quantization volume faster and more accurately.

S404: Determine the total compression volume of the newly selected strategy and the historically selected quantization strategy.

Each time a newly selected strategy is determined in this embodiment, the compression volume information in the quantitative contribution information corresponding to the newly selected strategy and the compression volume information in the quantitative contribution information corresponding to each previously determined historical selected strategy will be used. Volume information, calculate the total compression volume of the original model by newly selected strategies and historical selected strategies. For example, the total compression volume is obtained by summing the compression volume corresponding to the newly selected strategy and the compression volume corresponding to the historical selected strategy.

S405: Determine whether the total compression volume meets the quantification requirement. If the total compression volume does not meet the quantification requirement, execute S406. If the total compression volume reaches the quantification requirement, execute S409.

The quantization requirement may be a preset expected compression volume. This embodiment can determine whether the currently reached total compression volume has reached the expected compression volume, that is, whether it has met the quantification requirements, after each time a part of the newly selected strategies is determined. If the currently reached total compression volume has not reached the expected compression volume, it means that the quantification requirements have not been met, and the subsequent operation of S406 needs to be performed. If the total compression volume currently reached reaches the expected compression volume, it means that the quantification requirements have been met, and the subsequent operations of S409 need to be performed.

S406, when the total compression volume does not meet the quantization requirements, perform preliminary quantization on the original model based on the newly selected quantization strategy, and train the original model after preliminary quantization to obtain a preliminary quantization model.

If S405 determines that the total compression volume does not meet the quantization requirements, that is, the expected compression volume is not reached, then based on the optional quantization network layer in the new quantization strategy and its optional segmentation coefficients and optional number of quantization bits, the corresponding compression volume in the original model will be The optional quantization network layer assigns quantization parameters, that is, achieves preliminary quantification of the original model. Then use the training samples to perform model training on the initially quantized original model, which can include forward training and reverse training. to obtain a preliminary quantitative model.

S407: Add the newly selected quantitative strategy to the historically selected quantitative strategy.

S408: Use optional quantification strategies other than the newly selected quantification strategy among the optional quantification strategies as new optional quantification strategies, add the newly selected quantification strategy to the historically selected quantification strategy, and add the preliminary The quantized model is used as the original model, and the operation of S402 is returned.

S409, when the total compression volume reaches the quantization requirement, use the newly selected quantization strategy and the historically selected quantization strategy as the target quantization strategy to obtain the target quantization network layer and the target segmentation coefficient and target quantization of the target quantization network layer. Number of bits.

S410: Obtain the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer.

S411. According to the target segmentation coefficient of the target quantization network layer, divide each row of feature elements of the feature matrix into at least two feature element sub-segments, and divide each column of weight elements of the weight matrix into at least two weight element sub-segments.

S412: Perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer, and determine the output characteristics of the target quantization network layer based on the quantization processing results.

According to the solution of the embodiment of the present disclosure, after determining the optional quantization strategy of the original model, the original model is controlled to perform data processing based on the optional quantization strategy, and the quantification contribution information of the optional quantization strategy is determined based on the processing results. According to each optional quantization strategy Based on the quantitative contribution information, the newly selected quantization strategy is determined from the optional quantization strategies in batches. If the total compression volume of the newly selected quantization strategy and the historically selected quantization strategy does not reach the quantization strategy, the newly selected quantization strategy will be used based on the newly selected quantization strategy. After the original model is quantized and trained, return and re-execute the quantification contribution information of the optional quantization strategy and its subsequent operations until the total compression volume of the new and historical selected quantization strategies reaches the quantization strategy, and then the new and historical selected quantization strategies will be added. The quantization strategy is selected as the target quantization strategy, and then the feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed according to the target segmentation coefficient and the target quantization bit number in the target quantization strategy to obtain the output features. This solution obtains the target quantification strategy in batches, and performs quantification and training processing of the original model based on the newly selected quantification strategy between batches, which greatly ensures the accuracy of the extracted target quantification strategy, thereby ensuring Model quantification accuracy.

Figure 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present disclosure. The embodiment of the present disclosure is suitable for performing quantitative processing on the data calculation process of the target quantization network layer in the deep learning model, and is suitable for performing quantitative processing on the data calculation process of the target quantization network layer in the deep learning model. The feature matrix input to the target quantization network layer and the weight matrix of the target quantization network layer are processed to obtain the output features of the target network layer. The device can be configured in an electronic device installed with a deep learning model and implemented using software and/or hardware. The device can implement the data processing method of any embodiment of the present disclosure. As shown in Figure 5, the data processing device 500 includes:

The matrix acquisition module 501 is configured to acquire the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix; the matrix segmentation module 502 is configured to obtain the target quantization according to The target segmentation coefficient of the network layer divides each row of feature elements of the feature matrix into at least two feature element sub-segments, and divides each column of weight elements of the weight matrix into at least two weight element sub-segments; the quantification processing module 503, It is configured to perform quantization processing on at least two feature element sub-segments and at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer; the feature determination module 504 is configured to determine the target quantization network layer based on the quantization processing results. output characteristics.

In one embodiment, the matrix segmentation module 502 is configured as:

According to the first coefficient in the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments; according to the second coefficient in the target segmentation coefficient, each row of the weight matrix is A column of weight elements is divided into at least two weight element sub-segments; wherein the first coefficient and the second coefficient are in an integer multiple relationship.

As shown in Figure 6, in one embodiment, the feature determination module 504 includes:

The sub-segment pair determining unit 610 is configured to determine each feature element sub-segment in the feature matrix and the corresponding weight element sub-segment in the weight matrix, and use the corresponding feature element sub-segments and weight element sub-segments as a group Associated sub-segment pairs; the feature calculation unit 620 is configured to determine the output features of the target quantization network layer based on the quantization processing results of the feature element sub-segments and weight element sub-segments in each set of associated sub-segment pairs.

In one embodiment, the ratio of the number of feature element sub-segments and weight element sub-segments contained in each group of associated sub-segment pairs is the same as the ratio of the segmentation coefficients that divide the weight matrix and the feature matrix.

In one embodiment, the feature determination module 504 is configured as:

Through the Tensor Core computing unit of the graphics processor GPU, the output characteristics of the target quantization network layer are determined based on the quantization processing results.

In one embodiment, the feature matrix is the speech features obtained after the speech segments are processed by the feature extraction layer; the output features are used to perform semantic recognition processing on the speech segments.

As shown in Figure 7, in one embodiment, the data processing device 500 also includes:

The optional strategy determination module 505 is configured to determine the optional quantization strategy of the original model; wherein the optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits; The contribution information acquisition module 506 is configured to obtain the quantitative contribution information of the optional quantification strategy obtained by performing data processing based on the optional quantification strategy on the original model; the target strategy determination module 507 is configured to obtain the quantitative contribution information from the optional quantification strategy based on the quantitative contribution information. Determine the target quantization strategy to obtain the target quantization network layer, the target segmentation coefficient and the target quantization bit number of the target quantization network layer.

As shown in Figure 8, in one embodiment, the target policy determination module 507 includes:

The new strategy determination unit 710 is configured to determine a newly selected quantization strategy from the optional quantization strategies based on the quantification contribution information corresponding to the optional quantization strategy; the compression volume determination unit 720 is configured to determine the newly selected strategy and history The total compression volume of the selected quantization strategy; the target strategy determination unit 730 is configured to use the newly selected quantization strategy and the historically selected quantization strategy as the target quantization strategy when the total compression volume reaches the quantization requirement.

In one embodiment, the quantitative contribution information includes: model accuracy information and compression volume information; a new strategy determination unit 710 is set to:

Sort the optional quantization strategies according to the model accuracy information and compression volume information corresponding to the optional quantization strategies; determine the newly selected quantization strategy from the optional quantization strategies based on the sorting results and the compression volume information corresponding to the optional quantization strategies. .

As shown in Figure 9, in one embodiment, the target policy determination module 507 also includes:

The quantization training unit 740 is configured to perform preliminary quantization on the original model based on the newly selected quantization strategy when the total compression volume does not meet the quantization requirements, and trains the original model after preliminary quantization to obtain a preliminary quantization model; historical quantification The strategy update unit 750 is configured to add the newly selected quantization strategy to the historically selected quantization strategy; the loop operation unit 760 is configured to add other optional quantization strategies in the optional quantization strategies except the newly selected quantization strategy. As a new optional quantization strategy, and using the preliminary quantization model as the original model, return to perform the operation of obtaining the quantitative contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.

The above-mentioned products can execute the methods provided by any embodiment of the present disclosure, and have corresponding functional modules and effects for executing the methods.

In the technical solution of this disclosure, the acquisition, storage and application of feature matrices, weight matrices, output features, voice clips, etc. are all in compliance with relevant laws and regulations and do not violate public order and good customs.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product to implement the above data processing method.

Figure 10 shows a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic device 600 is intended to represent many forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.

As shown in Figure 10, the electronic device 600 includes a computing unit 601, which can be loaded into a random access memory (Random Access Memory) according to a computer program stored in a read-only memory (Read-Only Memory, ROM) 602 or from a storage unit 608. Computer program in RAM) 603 to perform various appropriate actions and processes. In the RAM 603, various programs and data required for the operation of the electronic device 600 can also be stored. Computing unit 601, ROM 602 and RAM 603 are connected to each other via bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

Multiple components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc. etc.; and a communication unit 609, such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks.

Computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a variety of dedicated artificial intelligence (Artificial Intelligence, AI) computing chips, a variety of running The computing unit of the machine learning model algorithm, Digital Signal Processing (DSP), and any appropriate processor, controller, microcontroller, etc. The computing unit 601 performs a plurality of methods and processes described above, such as data processing methods. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by computing unit 601, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the data processing method in any other suitable manner (eg, by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Parts (ASSP), System on Chip (SOC), Complex Programmable Logic Device (CPLD), computer hardware, firmware, software, and/or their realized in combination. Various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor that may is a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions specified in the flowcharts and/or block diagrams/ The operation is implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, devices or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media would include one or more wire-based electrical connections, laptop disks, hard drives, RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), or flash memory ), optical fiber, portable compact disk read-only memory (Compact Disc Read-Only Memory, CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a cathode ray tube (CRT)) or a liquid crystal display (e.g., a CRT) configured to display information to a user. Liquid Crystal Display (LCD) monitor); and a keyboard and pointing device (e.g., a mouse or a trackball) through which a user can provide input to the computer. Other kinds of devices may also be configured to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and may be provided in any form, including Acoustic input, voice input or tactile input) to receive input from the user.

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., A user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and technologies described herein), or including such backend components, middleware components, or any combination of front-end components in a computing system. The components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN), blockchain network, and the Internet.

Computer systems may include clients and servers. Clients and servers are generally remote from each other and typically interact over a communications network. The relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other. The server can be a cloud server, also known as cloud computing server or cloud host. It is a host product in the cloud computing service system to solve the problems that exist in traditional physical host and virtual private server (VPS) services. It has the disadvantages of difficult management and weak business scalability. The server can also be a distributed system server or a server combined with a blockchain.

Artificial intelligence is the study of using computers to simulate some human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.). It has both hardware-level technology and software-level technology. Artificial intelligence hardware technology generally includes sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and other technologies; artificial intelligence software technology mainly includes computer vision technology, speech recognition technology, natural language processing technology and machine learning/depth Learning technology, big data processing technology, knowledge graph technology and other major directions.

Cloud computing refers to a flexible and scalable shared physical or virtual resource pool through network access. Resources can include servers, operating systems, networks, software, applications, storage devices, etc., and can be on-demand and self-service. A technical system for deploying and managing resources. Through cloud computing technology, it can provide efficient and powerful data processing capabilities for artificial intelligence, blockchain and other technology applications and model training.

Steps can be reordered, added, or removed using various forms of the process shown above. For example, multiple steps described in the present disclosure can be executed in parallel, sequentially, or in different orders. As long as the desired results of the technical solution disclosed in the present disclosure can be achieved, there is no limitation here.

Claims

A data processing method including:

Obtain the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;

According to the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments, and each column of weight elements of the weight matrix is divided into at least two weights. element subsection;

According to the target quantization bit number of the target quantization network layer, quantization processing is performed on the at least two feature element sub-segments and the at least two weight element sub-segments, and the target quantization network layer is determined according to the quantization processing result. output characteristics.
The method according to claim 1, wherein, according to the target segmentation coefficient of the target quantization network layer, each row of feature elements of the feature matrix is divided into at least two feature element sub-segments, and the Each column weight element of the weight matrix is divided into at least two weight element sub-segments, including:

Divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the first coefficient among the target segmentation coefficients of the target quantization network layer;

Divide each column weight element of the weight matrix into at least two weight element sub-segments according to the second coefficient in the target segment coefficient;

Wherein, the first coefficient and the second coefficient are in an integer multiple relationship.
The method according to claim 1, wherein determining the output characteristics of the target quantization network layer according to the quantization processing results includes:

Determine each characteristic element sub-segment in the characteristic matrix, the corresponding weight element sub-segment in the weight matrix, and use the characteristic element sub-segments and weight element sub-segments with corresponding relationships as a set of associated sub-segment pairs;

The output characteristics of the target quantization network layer are determined according to the quantization processing results of the feature element sub-segments and weight element sub-segments in each group of associated sub-segment pairs.
The method according to claim 3, wherein the ratio of the number of characteristic element sub-segments and weight element sub-segments contained in each group of associated sub-segment pairs is divided by the segmentation coefficient dividing the weight matrix and the characteristic matrix. The ratios are the same.
The method according to any one of claims 1 to 4, wherein determining the output characteristics of the target quantization network layer according to the quantization processing results includes:

Through the Tensor Core computing unit of the graphics processor GPU, the output characteristics of the target quantization network layer are determined based on the quantization processing results.
The method according to any one of claims 1 to 5, wherein the feature matrix is a speech feature obtained after a speech segment is processed by a feature extraction layer; the output feature is used to perform semantic recognition processing on the speech segment. .
The method according to any one of claims 1-6, further comprising:

Determine the optional quantization strategy of the original model; wherein the optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and an optional number of quantization bits;

Obtain the quantitative contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy;

According to the quantization contribution information, a target quantization strategy is determined from the optional quantization strategies to obtain the target quantization network layer, the target segmentation coefficient of the target quantization network layer, and the target quantization of the target quantization network layer. Number of bits.
The method according to claim 7, wherein determining a target quantization strategy from optional quantization strategies according to the quantization contribution information includes:

Determine a newly selected quantization strategy from the optional quantization strategies according to the quantification contribution information corresponding to the optional quantization strategies;

Determine the total compression volume of the newly selected strategy and the historically selected quantization strategy;

When the total compression volume reaches the quantization requirement, the newly selected quantization strategy and the historically selected quantization strategy are used as the target quantization strategy.
The method according to claim 8, wherein the quantitative contribution information includes model accuracy information and compression volume information;

Determining a newly selected quantization strategy from the optional quantization strategies based on the quantification contribution information corresponding to the optional quantization strategies includes:

Sort the optional quantization strategies according to the model accuracy information and the compression volume information corresponding to the optional quantization strategies;

The newly selected quantization strategy is determined from the optional quantization strategies according to the sorting result and the compression volume information corresponding to the optional quantization strategies.
The method according to claim 8 or 9, further comprising:

When the total compression volume does not meet the quantification requirements, perform preliminary quantification on the original model based on the newly selected quantization strategy, and train the original model after preliminary quantification to obtain a preliminary quantization model;

Add the newly selected quantization strategy to the historically selected quantization strategy;

Use the optional quantization strategies other than the newly selected quantization strategy among the optional quantization strategies as new optional quantization strategies, and use the preliminary quantization model as the original model, and return to execute to obtain the The original model performs data processing based on the optional quantization strategy to obtain the quantified contribution information of the optional quantization strategy.
A data processing device including:

A matrix acquisition module, configured to acquire the feature matrix input by the target quantization network layer and the weight matrix of the target quantization network layer; wherein the number of columns of the feature matrix is equal to the number of rows of the weight matrix;

A matrix segmentation module configured to divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the target quantification target segmentation coefficient of the network layer, and weight each column of the weight matrix The elements are divided into at least two weight element sub-segments;

A quantization processing module configured to perform quantization processing on the at least two feature element sub-segments and the at least two weight element sub-segments according to the target quantization bit number of the target quantization network layer;

A feature determination module is configured to determine the output features of the target quantization network layer based on the quantization processing results.
The device according to claim 11, wherein the matrix segmentation module is configured to:

Divide each row of feature elements of the feature matrix into at least two feature element sub-segments according to the first coefficient among the target segmentation coefficients of the target quantization network layer;

Divide each column weight element of the weight matrix into at least two weight element sub-segments according to the second coefficient in the target segment coefficient;

Wherein, the first coefficient and the second coefficient are in an integer multiple relationship.
The device according to claim 11, wherein the feature determination module includes:

The sub-segment pair determination unit is configured to determine each characteristic element sub-segment in the characteristic matrix, the corresponding weight element sub-segment in the weight matrix, and combine the characteristic element sub-segments and weight element sub-segments with corresponding relationships as a set of associated subsegment pairs;

The feature calculation unit is configured to determine the output features of the target quantization network layer based on the quantization processing results of the feature element sub-segments and the weight element sub-segments in each group of associated sub-segment pairs.
The device according to claim 13, wherein the ratio of the number of characteristic element sub-segments and weight element sub-segments contained in each group of associated sub-segment pairs is divided by the segmentation coefficient dividing the weight matrix and the characteristic matrix. The ratios are the same.
The device according to any one of claims 11-14, wherein the feature determination module is configured to:

Through the Tensor Core computing unit of the graphics processor GPU, the output characteristics of the target quantization network layer are determined based on the quantization processing results.
The device according to any one of claims 11 to 15, wherein the feature matrix is a speech feature obtained after a speech segment is processed by a feature extraction layer; the output feature is used to perform semantic recognition processing on the speech segment. .
The device according to any one of claims 11-16, further comprising:

The optional strategy determination module is configured to determine the optional quantization strategy of the original model; wherein the optional quantization strategy includes: an optional quantization network layer, an optional segmentation coefficient of the optional quantization network layer, and optional quantization. number of bits;

A contribution information acquisition module, configured to acquire the quantitative contribution information of the optional quantification strategy obtained by performing data processing on the original model based on the optional quantification strategy;

A target strategy determination module configured to determine a target quantization strategy from the optional quantization strategies based on the quantification contribution information to obtain the target quantization network layer, the target segmentation coefficient of the target quantization network layer and the The target number of quantization bits for the target quantization network layer.
The device according to claim 17, wherein the target policy determination module includes:

A new strategy determination unit configured to determine a newly selected quantization strategy from the optional quantization strategies based on the quantification contribution information corresponding to the optional quantization strategies;

A compression volume determination unit configured to determine the total compression volume of the newly selected strategy and the historically selected quantization strategy;

The target strategy determining unit is configured to use the newly selected quantization strategy and the historically selected quantization strategy as the target quantization strategy when the total compression volume reaches the quantization requirement.
The apparatus of claim 18, wherein the quantified contribution information includes model accuracy information and compression volume information;

The new strategy determination unit is set to:

Sort the optional quantization strategies according to the model accuracy information and the compression volume information corresponding to the optional quantization strategies;

According to the sorting result and the compression volume information corresponding to the optional quantization strategy, a newly selected quantization strategy is determined from the optional quantization strategies.
The device according to claim 18 or 19, the target policy determination module further includes:

The quantification training unit is configured to perform preliminary quantification on the original model based on the newly selected quantization strategy when the total compression volume does not meet the quantification requirements, and train the original model after preliminary quantification to obtain preliminary quantification. Model;

A historical quantification strategy update unit configured to add the newly selected quantification strategy to the historical selected quantification strategy;

A loop operation unit configured to use the optional quantization strategies other than the newly selected quantization strategy among the optional quantization strategies as new optional quantization strategies, and use the preliminary quantization model as the original model. , return to perform the operation of obtaining the quantization contribution information of the optional quantization strategy obtained by performing data processing on the original model based on the optional quantization strategy.
An electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform any one of claims 1-10. data processing methods.
A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the data processing method according to any one of claims 1-10.
A computer program product, including a computer program that implements the data processing method according to any one of claims 1-10 when executed by a processor.