CN115393633A - Data processing method, electronic device, storage medium, and program product - Google Patents

Data processing method, electronic device, storage medium, and program product

Info

Publication number
CN115393633A
CN115393633A CN202210936853.0A
Authority
CN
China
Prior art keywords
matrix
quantization
data
processed
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210936853.0A
Other languages
Chinese (zh)
Inventor
孙培钦
张天禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kuangshi Jinzhi Technology Co ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Shenzhen Kuangshi Jinzhi Technology Co ltd
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kuangshi Jinzhi Technology Co ltd, Beijing Megvii Technology Co Ltd filed Critical Shenzhen Kuangshi Jinzhi Technology Co ltd
Priority to CN202210936853.0A priority Critical patent/CN115393633A/en
Publication of CN115393633A publication Critical patent/CN115393633A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Image Processing (AREA)

Abstract

The application provides a data processing method, an electronic device, a storage medium and a program product, comprising: acquiring data to be processed; inputting the data to be processed into a Transformer model, and processing the data to be processed through a Softmax module of the Transformer model to obtain an attention map matrix corresponding to the data to be processed; performing log2 logarithmic quantization on elements in the attention map matrix through a quantization node in the Transformer model to obtain a logarithmic quantization matrix, and quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to a target bit width to obtain a quantization matrix; and performing subsequent calculation on the basis of the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed.

Description

Data processing method, electronic device, storage medium, and program product
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, an electronic device, a storage medium, and a program product.
Background
Post-training quantization of a model refers to mapping the floating-point numbers in a trained model to an integer domain with the corresponding bit width, so that the model size can be greatly reduced and the inference speed improved while the model accuracy remains almost lossless.
The Transformer model (a model based on a multi-head attention mechanism) has strong performance in visual tasks (such as image classification, image detection, face recognition and other tasks for processing images), but has the disadvantages of large network structure, high computational complexity, high hardware overhead and the like, and is difficult to deploy on a mobile terminal.
Currently, there is little research on post-training quantization of Transformer models. In particular, the Softmax module specific to the Transformer model requires switching from integer arithmetic back to floating-point arithmetic, and the resulting data transfer imposes time and memory overhead, so the quantization methods in the related art are not well suited to the Transformer model. How to quantize the Transformer model to lower bit widths, so as to greatly reduce its size and speed up its processing of the data to be processed while preserving accuracy, is an urgent problem to be solved.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a data processing method, an electronic device, a storage medium, and a program product, so as to overcome the above problems or at least partially solve the above problems.
In a first aspect of the embodiments of the present application, a data processing method is provided, including:
acquiring data to be processed; the data to be processed comprises at least one of image data, face data and point cloud data;
inputting the data to be processed into a Transformer model, and processing the data to be processed through a Softmax module of the Transformer model to obtain an attention map matrix corresponding to the data to be processed;
performing log2 logarithmic quantization on elements in the attention map matrix through a quantization node in the Transformer model to obtain a logarithmic quantization matrix, and quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to a target bit width to obtain a quantization matrix;
and performing subsequent calculation on the basis of the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed.
Optionally, performing log2 logarithmic quantization on the elements in the attention map matrix to obtain a logarithmic quantization matrix includes:
taking the log2 logarithm of the elements in the attention map matrix and negating the result to obtain a positive-valued matrix;
rounding the elements in the positive-valued matrix to obtain the logarithmic quantization matrix;
quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to the target bit width to obtain a quantization matrix includes:
truncating the elements in the logarithmic quantization matrix according to the target bit width to obtain the quantization matrix.
Optionally, after the obtaining the quantization matrix, the method further includes:
acquiring the Value matrix that would originally be operated on with the attention map matrix;
performing shift operation on the quantization matrix and the Value matrix to obtain an operation result matrix, and using the operation result matrix as the output of the quantization node;
storing the operation result matrix for subsequent calculation;
performing subsequent calculation based on the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed includes:
and performing subsequent calculation through a subsequent network module in the Transformer model based on the operation result matrix to obtain a processing result of the data to be processed.
Optionally, the performing a shift operation on the quantization matrix and the Value matrix to obtain an operation result matrix includes:
determining a target numerical value N according to the target bit width, wherein N is a positive integer;
subtracting the value of each element in the quantization matrix from the target value to obtain a shift amount corresponding to the value of each element in the quantization matrix;
and performing the shift operation according to the shift amount corresponding to the value of each element in the quantization matrix to obtain the operation result matrix.
Optionally, the quantization node is further configured to: obtain a quantization scale of 1/2^N according to the target value N.
Performing subsequent calculation based on the operation result matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed, including:
when the subsequent calculation is full-precision calculation, multiplying each element in the operation result matrix by the quantization scale through a subsequent network module in the Transformer model to obtain a full-precision matrix, and performing full-precision calculation according to the full-precision matrix to obtain a processing result of the data to be processed; or
When the subsequent calculation is not full-precision calculation, reading the operation result matrix through a subsequent network module in the Transformer model and participating in the subsequent calculation to obtain a quantization result matrix; and when a full-precision processing result needs to be obtained, multiplying each element in the quantization result matrix by the quantization scale to obtain the full-precision processing result.
Optionally, the target bit width is denoted by b, and the target value N is 2^b or a positive integer greater than 2^b.
Optionally, the Transformer model is deployed on a mobile terminal.
Optionally, when the Transformer model is an image processing model for performing an image processing task, the data to be processed is image data, and a processing result of the data to be processed is an image processing result; the attention map matrix is a self-attention value between sub-images of the image data, the self-attention value represents the importance degree of the sub-images to an image processing result of the image processing task, and the image processing result is a classification result of the image or an identification result of an object contained in the image.
In a second aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a computer program stored on the memory, where the processor executes the computer program to implement the data processing method disclosed in the embodiments of the present application.
In a third aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program/instruction is stored, which when executed by a processor implements the data processing method as disclosed in the embodiments of the present application.
In a fourth aspect of the embodiments of the present application, a computer program product is provided, which includes computer programs/instructions, and when the computer programs/instructions are executed by a processor, the computer programs/instructions implement the data processing method disclosed in the embodiments of the present application.
The embodiment of the application has the following advantages:
in this embodiment, the number of elements of the attribute map matrix in the transform model is large, the operation complexity of the attribute map matrix is high, the transform model includes quantization nodes, and the quantization nodes quantize the attribute map matrix, so that the inference speed of the transform model can be effectively improved. The elements in the attribute map matrix are (0, 1) long-tail distribution, a large number of elements are gathered near 0, and a small part of elements exist near 1, so that log2 logarithmic quantization is performed on the elements in the attribute map matrix, and the elements near 1 are considered while sufficient bit width is given to the elements near 0. The elements in the logarithmic quantization matrix are quantized into a plurality of quantization intervals according to the target bit width, most of the elements in the attribute map matrix can be distributed into the plurality of quantization intervals, and therefore the accuracy loss of the transform model can be guaranteed to be within an acceptable range. Therefore, the subsequent network module in the transform model performs subsequent calculation based on the quantization matrix to obtain the processing result of the data to be processed, and the accuracy can be ensured to be within an acceptable range. Therefore, when the Transformer model processes the data to be processed, the precision of the processing result can be ensured, and the reasoning speed of the Transformer model can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments of the present application will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart of steps of a data processing method in an embodiment of the present application;
FIG. 2 is a schematic diagram of two kinds of quantization performed on an attention map matrix;
FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has been actively developed. Artificial Intelligence (AI) is an emerging scientific technology for studying and developing theories, methods, techniques and application systems for simulating and extending human Intelligence. The artificial intelligence subject is a comprehensive subject and relates to various technical categories such as chips, big data, cloud computing, internet of things, distributed storage, deep learning, machine learning and neural networks. Computer vision is an important branch of artificial intelligence, particularly a machine is used for identifying the world, and computer vision technologies generally comprise technologies such as face identification, living body detection, fingerprint identification and anti-counterfeiting verification, biological feature identification, face detection, pedestrian detection, target detection, pedestrian identification, image processing, image identification, image semantic understanding, image retrieval, character identification, video processing, video content identification, three-dimensional reconstruction, virtual reality, augmented reality, synchronous positioning and map construction (SLAM), computational photography, robot navigation and positioning and the like. With the research and progress of artificial intelligence technology, the technology is applied to many fields, such as safety control, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, intelligent medical treatment, face payment, face unlocking, fingerprint unlocking, person certificate verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
Referring to fig. 1, a flowchart illustrating steps of a data processing method in an embodiment of the present application is shown, and as shown in fig. 1, the data processing method includes the following steps:
step S11: acquiring data to be processed; the data to be processed comprises at least one of image data, face data and point cloud data;
step S12: inputting the data to be processed into a Transformer model, and processing the data to be processed through a Softmax module of the Transformer model to obtain an attribution map matrix corresponding to the data to be processed;
step S13: performing log2 logarithmic quantization on elements in the attribute map matrix through a quantization node in the transform model to obtain a logarithmic quantization matrix, and quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to a target bit width to obtain a quantization matrix;
step S14: and performing subsequent calculation on the basis of the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed.
Transformer models trained differently can process different data to be processed, including image data, face data, point cloud data and the like. The multi-head self-attention module (Multi-head Self-Attention) is one of the most important components in a Transformer-based architecture and is considered one of the most computing-resource-consuming modules. The computational complexity of the multi-head self-attention module is quadratic in the sequence length; for example, if the Transformer model is used for image classification and divides the input image into m sub-images, the computational complexity of the multi-head self-attention module is on the order of m². The storage and operation of the attention map matrix generated by the Softmax module in the multi-head self-attention module are the bottleneck for improving the performance and inference speed of the Transformer model. Therefore, in order to improve the performance and inference speed of the Transformer model, a quantization node can be added at the output of the Softmax module in the multi-head self-attention module, and the attention map matrix generated by the Softmax module of the Transformer model is quantized at this quantization node.
Model quantization includes quantization during training and post-training quantization. Quantization during training requires the complete training data and takes a long time, whereas post-training quantization requires no further training of the model and can be completed quickly with only a small amount of test data and simple operations.
The data to be processed is input into a trained Transformer model in which a quantization node has been added at the output of the Softmax module. The quantization node quantizes the attention map matrix, realizing post-training quantization of the Transformer model, so that the Transformer model can obtain the processing result of the data to be processed quickly and accurately. It can be understood that the quantization node does not necessarily need the complete attention map matrix before it starts quantizing. The quantization node may quantize the complete attention map matrix once it is obtained, or it may quantize the attention map matrix as the Softmax module generates it, so that the generation of the attention map matrix by the Softmax module and its quantization by the quantization node are performed in parallel.
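As an illustration only, and not code from the patent, the following minimal NumPy sketch shows where such a quantization node would sit in a single attention head, assuming no masking and a caller-supplied quantize callable; it dequantizes before the product with the Value matrix purely for readability, whereas the shift-based replacement described later avoids that floating-point step.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_quant_node(q, k, v, quantize):
    # q, k, v: (seq_len, dim) float arrays; quantize: a log2 quantization callable.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores)              # attention map produced by the Softmax module
    attn_q = quantize(attn)             # quantization node inserted at the Softmax output
    # For readability this sketch dequantizes (2 ** -Attn_Q) before the product with V;
    # the shift-based replacement described later in the text avoids this float step.
    return (2.0 ** (-attn_q.astype(np.float64))) @ v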
Fig. 2 shows a schematic diagram of two kinds of quantization of the attention map matrix. It can be seen that the elements of the attention map matrix follow a long-tailed distribution over (0, 1): a large number of elements cluster near 0 and a small number lie near 1. If uniform 4-bit quantization is performed on the elements in the attention map matrix, most of its elements fall into only 1 quantization interval; whereas if the elements are log2 logarithmically quantized, they can be spread across 12 of the 16 quantization intervals available at 4 bits. The fewer quantization intervals actually used, the more the performance of the model is impaired. Therefore, the elements in the attention map matrix are log2 logarithmically quantized to obtain a logarithmic quantization matrix.
The number of quantization intervals is 2 to the power of the target bit width. For example, if the target bit width is 4, the number of quantization intervals is 2^4 = 16; if the target bit width is 6, the number of quantization intervals is 2^6 = 64. However, referring to Fig. 2, within the range of 0 to 1 some regions may contain no elements of the attention map matrix, so the elements of the logarithmic quantization matrix may not be quantized into all quantization intervals. The log2 logarithmic quantization adopted in the embodiment of the application ensures that the elements of the logarithmic quantization matrix are spread over as many quantization intervals as possible. Therefore, when the subsequent network module in the Transformer model performs subsequent calculation based on the quantization matrix, the accuracy of the resulting processing result of the data to be processed is higher. Compared with quantizing the Transformer model during training, the technical scheme of the embodiment of the application adds a quantization node to the trained Transformer model, and only simple operations need to be carried out at the quantization node to complete the quantization of the Transformer model. The attention map matrix in the Transformer model has a large number of elements and its operations have high complexity; the Transformer model includes a quantization node, and the quantization node quantizes the attention map matrix, so the inference speed of the Transformer model can be effectively improved. The elements in the attention map matrix follow a long-tailed distribution over (0, 1): a large number of elements cluster near 0 while a small number lie near 1, so performing log2 logarithmic quantization on the elements gives sufficient bit width to the elements near 0 while still accounting for the elements near 1. The elements in the logarithmic quantization matrix are quantized into a plurality of quantization intervals according to the target bit width, so most of the elements in the attention map matrix can be distributed across multiple quantization intervals, and the accuracy loss of the Transformer model can be kept within an acceptable range. Therefore, when the subsequent network module in the Transformer model performs subsequent calculation based on the quantization matrix to obtain the processing result of the data to be processed, the accuracy remains within an acceptable range. In this way, when the Transformer model processes the data to be processed, the precision of the processing result is preserved and the inference speed of the Transformer model is improved.
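As an illustrative check of this interval-occupancy argument, rather than an experiment from the patent, the short NumPy sketch below quantizes a synthetic long-tailed attention map with uniform 4-bit quantization and with log2 quantization and counts how many of the 16 available intervals each scheme actually uses; the Dirichlet-generated values and all variable names are assumptions.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical long-tailed attention values in (0, 1): softmax-like rows over 197 tokens.
attn = rng.dirichlet(np.ones(197), size=64).ravel()
attn = np.clip(attn, 1e-8, 1.0)

b = 4                                                            # target bit width -> 2**4 = 16 intervals

uniform_q = np.clip(np.floor(attn * (2 ** b)), 0, 2 ** b - 1)    # uniform 4-bit quantization
log2_q = np.clip(np.round(-np.log2(attn)), 0, 2 ** b - 1)        # log2 quantization

print("intervals used by uniform quantization:", np.unique(uniform_q).size)  # typically 1 or 2
print("intervals used by log2 quantization:  ", np.unique(log2_q).size)      # typically 10 or more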
The unquantized Transformer model has a huge network structure, high computational complexity and high hardware overhead, so it is difficult to deploy at a mobile terminal. The Transformer model can be quantized through the quantization node, which greatly reduces its size and allows it to be deployed on a mobile terminal. Deploying the Transformer model on a mobile terminal helps expand its application scenarios.
Optionally, on the basis of the above technical solution, performing log2 logarithmic quantization on the elements in the attention map matrix through the quantization node to obtain a logarithmic quantization matrix may specifically include: taking the log2 logarithm of the elements in the attention map matrix and negating the result to obtain a positive-valued matrix; and rounding the elements in the positive-valued matrix to obtain the logarithmic quantization matrix.
Because the value range of the elements in the attention map matrix is 0 to 1, taking the log2 logarithm of the elements gives values in the range -∞ to 0; negating them then gives the positive-valued matrix, in which each element lies in the range 0 to +∞.
Because quantization maps floating-point numbers onto an integer domain, each element of the positive-valued matrix is rounded to obtain the logarithmic quantization matrix. Rounding up, rounding down, and the like may be used.
Since the number of quantization intervals is determined by the target bit-width, quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to the target bit-width to obtain a quantization matrix, which may include: and truncating the elements in the logarithmic quantization matrix according to the target bit width to obtain the quantization matrix.
If the target bit width is b, the elements in the logarithmic quantization matrix are truncated according to the target bit width, quantizing each element of the attention map matrix to an integer in the range 0 to 2^b - 1 and yielding the quantization matrix.
The process of obtaining the quantization matrix from the attention map matrix can be characterized by the following formula:
Attn_Q = Q(Attn | b) = clip(round(-log2(Attn)), 0, 2^b - 1)
where Attn denotes the attention map matrix, Attn_Q denotes the quantization matrix, b is the target bit width, Q denotes the quantization process, round denotes rounding, and clip denotes truncation.
This completes the process of obtaining the quantization matrix from the attention map matrix.
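A minimal NumPy sketch of the formula above, assuming round-to-nearest (the text also allows rounding up or down); the helper name and the small clipping constant are illustrative choices, not part of the patent.

import numpy as np

def log2_quantize(attn: np.ndarray, b: int) -> np.ndarray:
    # Attn_Q = clip(round(-log2(Attn)), 0, 2**b - 1) for attention values in (0, 1].
    positive = -np.log2(np.clip(attn, 1e-12, 1.0))            # take log2, then negate: values in [0, +inf)
    rounded = np.round(positive)                              # map onto the integer domain
    return np.clip(rounded, 0, 2 ** b - 1).astype(np.int64)   # truncate to the target bit width

# Example usage: log2_quantize(attn, b=4) yields integers in the range 0 to 15.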
In the Transformer model, the original attention map matrix output by the Softmax module needs to be multiplied by the Value matrix generated by a module preceding the Softmax module, and the subsequent calculation of the Transformer model is then carried out on the multiplication result. Because the attention map matrix has been log2 logarithmically quantized, directly multiplying the quantization matrix by the Value matrix would give a wrong result.
The output of the quantization node corresponds to a quantized version of the original output of the Softmax module. Therefore, to obtain the output of the quantization node, the quantization node also needs to replace the calculation between the attention map matrix and the Value matrix.
The quantization node can equivalently replace the original multiplication of the attention map matrix with the Value matrix by a shift operation between the quantization matrix and the Value matrix, obtaining an operation result matrix that serves as the output of the quantization node. When the subsequent network module of the Transformer model performs subsequent calculation based on the quantization matrix, the subsequent calculation is actually performed based on the operation result matrix obtained from the quantization matrix, thereby obtaining the processing result of the data to be processed.
Optionally, the operation result matrix may also be stored, and when the operation result matrix needs to be used in subsequent calculation, the stored operation result matrix is obtained. And the subsequent calculation is calculation performed by the Transformer model after a quantization matrix or an operation result matrix is obtained.
By adopting the technical scheme of the embodiment of the application, the computation required by the shift operation between matrices is less than that of the multiplication between matrices, so computing resources can be saved. Because the data volume of the quantization matrix is smaller than that of the attention map matrix, the data volume of the operation result matrix obtained from the quantization matrix is also smaller than that of the operation result originally obtained from the attention map matrix. Therefore, the storage space for storing the operation result matrix can be saved, and the subsequent calculation of the Transformer model can be accelerated.
Optionally, on the basis of the foregoing technical solution, performing the shift operation on the quantization matrix and the Value matrix to obtain an operation result matrix may include: determining a target value N according to the target bit width, where N is a positive integer; subtracting the value of each element in the quantization matrix from the target value to obtain a shift amount corresponding to the value of each element in the quantization matrix; and performing the shift operation according to the shift amount corresponding to the value of each element in the quantization matrix to obtain the operation result matrix.
N is a sufficiently large positive integer: if the target bit width is denoted by b, N may be 2^b or a positive integer greater than 2^b. The multiplication of the original attention map matrix with the Value matrix is replaced by left-shifting the Value matrix by the obtained shift amounts to obtain the operation result matrix.
The shift operation performed between the quantization matrix and the Value matrix can be characterized by the following formula:
Attn · V_Q = V_Q << (N - Attn_Q)
where V_Q denotes the Value matrix; the meaning of the remaining symbols is as described above.
Therefore, the calculation resources are saved, and the data volume of the operation result matrix is reduced.
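A minimal NumPy sketch of this shift-based replacement, assuming N = 2^b and an already-quantized integer Value matrix; the dense integer matrix product below is a stand-in for a per-element shift kernel, not the patented implementation itself.

import numpy as np

def shift_attention_value_product(attn_q: np.ndarray, v_q: np.ndarray, b: int) -> np.ndarray:
    # Computes, per row, sum over k of V_Q[k, :] << (N - Attn_Q[i, k]), i.e. 2**N * (Attn · V_Q).
    n = 2 ** b                                      # target value N, taken here as 2**b
    shifts = n - attn_q.astype(np.int64)            # shift amount for every attention element
    weights = np.left_shift(np.int64(1), shifts)    # 2**(N - Attn_Q) as integer weights
    return weights @ v_q.astype(np.int64)           # integer accumulation replaces the float product

Multiplying the returned integers by the quantization scale 1/2^N described next restores the magnitude of the original product.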
Optionally, on the basis of the above technical solution, the quantization scale is determined by the target value N: the quantization node may obtain a quantization scale of 1/2^N according to the target value N. The quantization scale may be understood as a scaling factor.
When the subsequent calculation performed by the subsequent network module of the Transformer model is full-precision calculation, the subsequent network module can obtain a full-precision matrix only by multiplying each element in the operation result matrix by the quantization scale, and can perform the subsequent full-precision calculation of the Transformer model by using the full-precision matrix to obtain a processing result of the data to be processed.
When the subsequent calculation performed by the subsequent network module of the Transformer model is not full-precision calculation, the operation result matrix can be read directly and used to participate in the subsequent calculation, yielding a quantization result matrix, which is the non-full-precision processing result of the data to be processed output by the Transformer model.
When the subsequent calculation of the Transformer model is not full-precision calculation but a full-precision processing result is desired, each element in the quantization result matrix may be multiplied by the quantization scale after the quantization result matrix is obtained, which yields the full-precision processing result. In this way, a full-precision processing result is obtained while the intermediate calculations remain non-full-precision and simple.
Therefore, after the operation result matrix is obtained, different calculations can be performed, different requirements of users are met, and the use experience of the users is improved.
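A minimal sketch, assuming N = 2^b, of how a full-precision consumer would apply the quantization scale 1/2^N to the integer operation result; the function name is illustrative.

import numpy as np

def apply_quantization_scale(result_int: np.ndarray, b: int) -> np.ndarray:
    n = 2 ** b                         # target value N, assumed equal to 2**b here
    scale = 2.0 ** (-n)                # quantization scale 1/2**N
    return result_int.astype(np.float64) * scale

For example, with b = 4 and N = 16 the scale is 1/65536.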
Optionally, the Transformer model may be an image processing model for performing an image processing task, where the image processing task may include image classification, image detection, image segmentation, image-text bimodal tasks, three-dimensional point cloud recognition, and the like; such tasks have wide applications in face recognition, medical image recognition and analysis, three-dimensional modeling, automatic driving, and other fields.
In the case that the Transformer model is an image processing model for performing an image processing task, the data to be processed is image data, and a quantization node is added at the output of the Softmax module of the image processing model. The image data is input into the image processing model and processed by the Softmax module of the image processing model to obtain the attention map matrix corresponding to the image data. The quantization node in the Transformer model performs log2 logarithmic quantization on the elements in the attention map matrix to obtain a logarithmic quantization matrix, quantizes the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to the target bit width to obtain a quantization matrix, and the quantization matrix replaces the attention map matrix in the calculations the attention map matrix would originally participate in. A subsequent network module in the image processing model performs subsequent calculation based on the quantization matrix to obtain an image processing result.
The quantization node performs the shift operation on the quantization matrix and the Value matrix generated by the module in front of the Softmax module to obtain an operation result matrix. Based on the operation result matrix, the subsequent network module of the image processing model performs subsequent calculation to obtain the image processing result. The attention map matrix consists of self-attention values between sub-images of the image input to the image processing model; a self-attention value represents the importance of a sub-image to the image processing result of the image processing task, and the image processing result is a classification result of the image or a recognition result of an object contained in the image.
In the case that the Transformer model is an image processing model for performing an image processing task, the image processing model divides the input image into a plurality of sub-images, and the attention map matrix generated by the Softmax module consists of the self-attention values between the sub-images. In the case that the Transformer model is an image classification model for performing an image classification task, the image classification model divides the input image to be classified into a plurality of sub-images and obtains an attention map matrix representing the self-attention value of each sub-image; the self-attention value of each sub-image represents the importance of that sub-image to the classification result of the image to be classified. log2 logarithmic quantization is performed on the elements in the attention map matrix of the image classification model to obtain a logarithmic quantization matrix, and the elements in the logarithmic quantization matrix are quantized into a plurality of quantization intervals according to the target bit width to obtain a quantization matrix. The quantization matrix is substituted for the attention map matrix and participates in the calculations the attention map matrix would originally take part in, and the image classification result of the image to be classified is output by the image classification model.
Therefore, the image processing models for different image processing tasks can be quantized, the accuracy of the image processing result is guaranteed, the size of the image processing model is greatly reduced, and the reasoning speed of the image processing model is improved.
Experiments were performed on multiple Vision Transformer (Transformer for vision tasks) structures on the ImageNet dataset (a large-scale visual database), verifying the loss of model accuracy caused by uniform quantization and by the log2 logarithmic quantization performed by the quantization node in the embodiments of the present application. Except for the LayerNorm module, all weights and activations of each model were uniformly quantized using MinMax uniform quantization, and the attention map matrix of each model was quantized with log2 logarithmic quantization and with uniform quantization, respectively.
TABLE 1 accuracy of each model after log2 logarithmic and uniform quantization
(Table 1 is presented as an image in the original publication; the per-model accuracy figures are not reproduced here.)
It can be seen that when each model is quantized to 8 bits, the loss of model accuracy is small; when a model is quantized to 4 bits using uniform quantization, the accuracy loss is severe; however, if the model is quantized to 4 bits using log2 logarithmic quantization, the loss of model accuracy is almost the same as when quantizing to 8 bits, and the accuracy drop is only about 0.5% compared with the floating-point model. Therefore, with the quantization method of the embodiment of the application, the accuracy loss of the Transformer model can be kept small.
When quantizing to 8 bits, the results obtained with the uniform quantization method and with the quantization method of the embodiment of the application differ little; when quantizing to lower bit widths (6 bits and 4 bits), the accuracy obtained with uniform quantization drops rapidly, and large portions of the attention map matrix are even deactivated; in contrast, when the quantization method of the embodiment of the present application is used at lower bit widths (6 bits and 4 bits), the results show no obvious change from those obtained when quantizing to 8 bits. Therefore, with the technical scheme of the embodiment of the application, the performance loss of the Transformer model can be kept small even at lower bit widths.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the data processing apparatus includes an obtaining module, an input module, a quantizing module, and a result obtaining module, where:
the acquisition module is used for acquiring data to be processed; the data to be processed comprises at least one of image data, face data and point cloud data;
the input module is used for inputting the data to be processed into a Transformer model, and processing the data to be processed through a Softmax module of the Transformer model to obtain an attention map matrix corresponding to the data to be processed;
a quantization module, configured to perform log2 logarithmic quantization on elements in the attention map matrix through a quantization node in the Transformer model to obtain a logarithmic quantization matrix, and quantize the elements in the logarithmic quantization matrix into multiple quantization intervals according to a target bit width to obtain a quantization matrix;
and the result acquisition module is used for performing subsequent calculation on the basis of the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed.
Optionally, the quantization module comprises:
a logarithm-taking unit, configured to take the log2 logarithm of the elements in the attention map matrix and negate the result to obtain a positive-valued matrix;
a rounding unit, configured to round the elements in the positive-valued matrix to obtain the logarithmic quantization matrix;
the quantization module comprises:
and the truncation unit is used for truncating the elements in the logarithmic quantization matrix according to the target bit width to obtain the quantization matrix.
Optionally, after the obtaining of the quantization matrix, the apparatus further includes:
a Value matrix obtaining module, configured to obtain the Value matrix that would originally be operated on with the attention map matrix;
the shift operation module is used for performing shift operation on the quantization matrix and the Value matrix to obtain an operation result matrix, and the operation result matrix is used as the output of the quantization node;
the storage module is used for storing the operation result matrix for subsequent calculation;
the result obtaining comprises:
and the result acquisition unit is used for performing subsequent calculation on the basis of the operation result matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed.
Optionally, the shift operation module includes:
a target value determining unit, configured to determine a target value N according to the target bit width, where N is a positive integer;
a shift amount obtaining unit, configured to subtract the value of each element in the quantization matrix from the target value to obtain a shift amount corresponding to the value of each element in the quantization matrix;
and a shift operation unit, configured to perform the shift operation according to the shift amount corresponding to the value of each element in the quantization matrix to obtain the operation result matrix.
Optionally, the quantization node is further configured to: obtain a quantization scale of 1/2^N according to the target value N.
The result acquisition module comprises:
a first result obtaining unit, configured to, when the subsequent calculation is full-precision calculation, multiply each element in the operation result matrix by the quantization scale through a subsequent network module in the Transformer model to obtain a full-precision matrix, and perform full-precision calculation according to the full-precision matrix to obtain a processing result of the data to be processed; or
a second result obtaining unit, configured to, when the subsequent calculation is not full-precision calculation, read the operation result matrix through a subsequent network module in the Transformer model and participate in the subsequent calculation to obtain a quantization result matrix; and, when a full-precision processing result needs to be obtained, multiply each element in the quantization result matrix by the quantization scale to obtain the full-precision processing result.
Optionally, the target bit width is represented by b, and the target value N is 2^b or a positive integer greater than 2^b.
Optionally, the Transformer model is deployed on a mobile terminal.
Optionally, in a case that the Transformer model is an image processing model for performing an image processing task, the data to be processed is image data, and a processing result of the data to be processed is an image processing result; the attention map matrix is a self-attention value between sub-images of the image data, the self-attention value represents the importance degree of the sub-images to an image processing result of the image processing task, and the image processing result is a classification result of the image or an identification result of an object contained in the image.
It should be noted that the device embodiments are similar to the method embodiments, so that the description is simple, and reference may be made to the method embodiments for relevant points.
An electronic device is further provided in the embodiment of the present application; referring to fig. 4, fig. 4 is a schematic diagram of the electronic device provided in the embodiment of the present application. As shown in fig. 4, the electronic device 100 includes a memory 110 and a processor 120 connected in communication through a bus; the memory 110 stores a computer program that can be run on the processor 120 so as to implement the steps of the data processing method disclosed in the embodiment of the present application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program/instruction is stored, which, when executed by a processor, implements the data processing method as disclosed in the embodiments of the present application.
Embodiments of the present application further provide a computer program product, which includes a computer program/instruction, and the computer program/instruction, when executed by a processor, implement the data processing method disclosed in the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the true scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or terminal device that comprises the element.
The data processing method, the electronic device, the storage medium, and the program product provided by the present application are introduced in detail, and a specific example is applied to illustrate the principles and embodiments of the present application, and the description of the above embodiment is only used to help understand the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (11)

1. A data processing method, comprising:
acquiring data to be processed; the data to be processed comprises at least one of image data, face data and point cloud data;
inputting the data to be processed into a Transformer model, and processing the data to be processed through a Softmax module of the Transformer model to obtain an attention map matrix corresponding to the data to be processed;
performing log2 logarithmic quantization on elements in the attention map matrix through a quantization node in the Transformer model to obtain a logarithmic quantization matrix, and quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to a target bit width to obtain a quantization matrix;
and performing subsequent calculation on the basis of the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed.
2. The method of claim 1, wherein log2 logarithmically quantizing the elements in the attention map matrix to obtain a logarithmic quantization matrix comprises:
taking the log2 logarithm of the elements in the attention map matrix and negating the result to obtain a positive-valued matrix;
rounding the elements in the positive-valued matrix to obtain the logarithmic quantization matrix;
quantizing the elements in the logarithmic quantization matrix into a plurality of quantization intervals according to the target bit width to obtain a quantization matrix, including:
and truncating the elements in the logarithmic quantization matrix according to the target bit width to obtain the quantization matrix.
3. The method of claim 1, further comprising, after said deriving the quantization matrix:
acquiring the Value matrix that would originally be operated on with the attention map matrix;
performing shift operation on the quantization matrix and the Value matrix to obtain an operation result matrix, and using the operation result matrix as the output of the quantization node;
storing the operation result matrix for subsequent calculation;
performing subsequent calculation based on the quantization matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed includes:
and performing subsequent calculation through a subsequent network module in the Transformer model based on the operation result matrix to obtain a processing result of the data to be processed.
4. The method of claim 3, wherein the performing a shift operation on the quantization matrix and the Value matrix to obtain an operation result matrix comprises:
determining a target numerical value N according to the target bit width, wherein N is a positive integer;
subtracting the value of each element in the quantization matrix from the target value to obtain a shift amount corresponding to the value of each element in the quantization matrix;
and performing the shift operation according to the shift amount corresponding to the value of each element in the quantization matrix to obtain the operation result matrix.
5. The method of claim 4, wherein the quantization node is further configured to: obtain a quantization scale of 1/2^N according to the target value N.
Performing subsequent calculation based on the operation result matrix through a subsequent network module in the Transformer model to obtain a processing result of the data to be processed, including:
when the subsequent calculation is full-precision calculation, multiplying each element in the operation result matrix by the quantization scale through a subsequent network module in the Transformer model to obtain a full-precision matrix, and performing full-precision calculation according to the full-precision matrix to obtain a processing result of the data to be processed; or
When the subsequent calculation is not full-precision calculation, reading the operation result matrix through a subsequent network module in the Transformer model and participating in the subsequent calculation to obtain a quantization result matrix; and when a full-precision processing result needs to be obtained, multiplying each element in the quantization result matrix by the quantization scale to obtain the full-precision processing result.
6. Method according to claim 4 or 5, characterized in that said target bit width is denoted by b, and said target value N is 2^b or a positive integer greater than 2^b.
7. The method of any of claims 1-6, wherein the Transformer model is deployed on a mobile terminal.
8. The method according to any one of claims 1 to 7, wherein, in the case that the Transformer model is an image processing model for performing an image processing task, the data to be processed is image data, and the processing result of the data to be processed is an image processing result; the attention map matrix is a self-attention value between sub-images of the image data, the self-attention value represents the importance degree of the sub-images to an image processing result of the image processing task, and the image processing result is a classification result of the image or an identification result of an object contained in the image.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the data processing method of any one of claims 1 to 8.
10. A computer-readable storage medium, on which a computer program/instructions are stored, characterized in that the computer program/instructions, when executed by a processor, implement the data processing method according to any one of claims 1 to 8.
11. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the data processing method according to any one of claims 1 to 8.
CN202210936853.0A 2022-08-05 2022-08-05 Data processing method, electronic device, storage medium, and program product Pending CN115393633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210936853.0A CN115393633A (en) 2022-08-05 2022-08-05 Data processing method, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210936853.0A CN115393633A (en) 2022-08-05 2022-08-05 Data processing method, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN115393633A true CN115393633A (en) 2022-11-25

Family

ID=84119431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210936853.0A Pending CN115393633A (en) 2022-08-05 2022-08-05 Data processing method, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN115393633A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992965A (en) * 2023-09-27 2023-11-03 之江实验室 Reasoning method, device, computer equipment and storage medium of transducer large model
CN116992965B (en) * 2023-09-27 2024-01-09 之江实验室 Reasoning method, device, computer equipment and storage medium of transducer large model
CN117689044A (en) * 2024-02-01 2024-03-12 厦门大学 Quantification method suitable for vision self-attention model

Similar Documents

Publication Publication Date Title
CN111507378A (en) Method and apparatus for training image processing model
CN110033023B (en) Image data processing method and system based on picture book recognition
CN115393633A (en) Data processing method, electronic device, storage medium, and program product
CN111382868A (en) Neural network structure search method and neural network structure search device
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113792871A (en) Neural network training method, target identification method, device and electronic equipment
EP4318313A1 (en) Data processing method, training method for neural network model, and apparatus
CN111008631B (en) Image association method and device, storage medium and electronic device
CN111368656A (en) Video content description method and video content description device
CN112200296A (en) Network model quantification method and device, storage medium and electronic equipment
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN117217280A (en) Neural network model optimization method and device and computing equipment
CN111241258A (en) Data cleaning method and device, computer equipment and readable storage medium
CN114511083A (en) Model training method and device, storage medium and electronic device
CN113762331A (en) Relational self-distillation method, apparatus and system, and storage medium
CN113850373A (en) Filter pruning method based on categories
CN111260074B (en) Method for determining hyper-parameters, related device, equipment and storage medium
CN112532251A (en) Data processing method and device
CN116485943A (en) Image generation method, electronic device and storage medium
CN115115947A (en) Remote sensing image detection method and device, electronic equipment and storage medium
CN110826726B (en) Target processing method, target processing device, target processing apparatus, and medium
CN114373071A (en) Target detection method and device and electronic equipment
CN113821610A (en) Information matching method, device, equipment and storage medium
CN113822291A (en) Image processing method, device, equipment and storage medium
CN115880486B (en) Target detection network distillation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination