CN112189216A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN112189216A
Authority
CN
China
Prior art keywords
data
bit number
bits
point type
floating point
Prior art date
Legal status
Pending
Application number
CN201980033583.9A
Other languages
Chinese (zh)
Inventor
周爱春
余俊峰
聂谷洪
Current Assignee
SZ DJI Technology Co Ltd
SZ DJI Innovations Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN112189216A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

A data processing method and device. The method comprises the following steps: reading target data (201) of a computing node in a neural network from a storage device, wherein the target data is fixed-point data represented by a first bit number, the number of bits occupied by the target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computing device; and calculating the output data (202) of the computing node according to the target data. The method can reduce the bandwidth occupied by the neural network between the computing device and the storage device while ensuring the computing efficiency of the neural network on the computing device.

Description

Data processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and device.
Background
With the development of artificial intelligence technology, neural networks (NN) are being applied more and more widely. A neural network is a complex network system formed by a large number of simple processing units (also referred to as neurons) that are extensively interconnected.
In the prior art, a neural network may include a plurality of computing nodes, and each computing node may include a plurality of neurons. The parameters and activation data of the computing nodes in the neural network are stored in a storage device, and the computing device needs to perform a large amount of data interaction with the storage device when running the neural network. Generally, in order to improve the computational efficiency of the neural network on the computing device, the parameters and activation data of the neural network are fixed-point data represented by a certain number of bits, i.e., the number of bits of the operand of the computing device, for example, 8 bits.
However, in the prior art, the neural network occupies a large amount of bandwidth between the computing device and the storage device.
Disclosure of Invention
The embodiments of the present application provide a data processing method and device, which are used to solve the prior-art problem that a neural network occupies a large amount of bandwidth between a computing device and a storage device.
In a first aspect, an embodiment of the present application provides a data processing method applied to a computing device, including:
reading target data of a computational node in a neural network from a storage device, wherein the target data are fixed-point data represented by a first bit number, the bit number occupied by the target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computational device;
and calculating to obtain the output data of the computing node according to the target data.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including: a processor and a memory; the memory for storing program code; the processor, invoking the program code, when executed, is configured to:
reading target data of a computational node in a neural network from a storage device, wherein the target data are fixed-point data represented by a first bit number, the bit number occupied by the target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computational device;
and calculating to obtain the output data of the computing node according to the target data.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, where the computer program includes at least one piece of code, where the at least one piece of code is executable by a computer to control the computer to perform the method according to any one of the above first aspects.
In a fourth aspect, the present application provides a computer program, which is used to implement the method of any one of the above first aspects when the computer program is executed by a computer.
The embodiments of the present application provide a data processing method and device. Target data of a computing node in a neural network is read from a storage device, where the target data is fixed-point data and the number of bits it occupies in the storage device is a first bit number smaller than the second bit number of the operand of the computing device, and the output data of the computing node is obtained by calculation according to the target data. Because the target data is fixed-point data, the computational efficiency of the neural network on the computing device can be ensured. Because the number of bits occupied by the target data in the storage device is the first bit number, which is smaller than the second bit number of the operand of the computing device, the amount of data of the neural network exchanged between the computing device and the storage device can be reduced. Therefore, the bandwidth occupied by the neural network between the computing device and the storage device can be reduced while the computational efficiency of the neural network on the computing device is ensured. In addition, by reducing the bandwidth requirement between the computing device and the storage device, cost may also be reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a data processing method according to another embodiment of the present application;
FIG. 4A is a schematic storage diagram of target data before expansion according to an embodiment of the present application;
FIG. 4B is a schematic storage diagram of the expanded target data according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a data processing method according to another embodiment of the present application;
FIG. 6A is a schematic storage diagram of output data before compression according to an embodiment of the present application;
FIG. 6B is a schematic storage diagram of compressed output data according to an embodiment of the present application;
FIG. 7 is a diagram illustrating the relationship between expansion and compression and the calculation process provided by the embodiment of the present application;
FIG. 8 is a schematic diagram illustrating a training process of a compute node according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a prediction process of a compute node according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The data processing method provided by the embodiment of the application can be applied to any scene needing data processing through a neural network, and the data processing method can be specifically executed by computing equipment. An application scenario schematic diagram of the data processing method provided in the embodiment of the present application may be as shown in fig. 1, specifically, the computing device 11 is in communication connection with the server 12, the computing device 11 may obtain the neural network from the server 12, and the computing device 11 may perform data processing through the neural network obtained from the server 12. The data processing may be any type of processing that can be performed by a neural network.
The computing device 11 may be specifically a device capable of performing a computing function. Illustratively, the computing device includes, but is not limited to, one or more of the following: a Central Processing Unit (CPU), an Advanced RISC Machine (ARM), a Digital Signal Processor (DSP), and a Graphics Processing Unit (GPU).
It should be noted that the specific manner of the communication connection between the computing device 11 and the server 12 is not limited in the present application; for example, a wireless communication connection may be implemented based on a Bluetooth interface, or a wired communication connection may be implemented based on an RS232 interface.
It should be noted that fig. 1 illustrates an example in which the computing device obtains the neural network from a server. Alternatively, the computing device may obtain the neural network in other ways; for example, it may obtain the neural network from another device, or through its own training.
In the data processing method provided by the embodiments of the present application, the computing device reads the target data of a computing node in the neural network from the storage device, and the number of bits occupied by one piece of target data in the storage device is smaller than the second bit number of the operand of the computing device. Compared with the prior art, in which the target data of the computing node is fixed-point data represented by the second bit number and one piece of target data occupies the second bit number of bits in the storage device, the amount of data of the neural network exchanged between the computing device and the storage device is reduced, so the bandwidth occupied by the neural network between the computing device and the storage device is reduced.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application, where an execution subject of the embodiment may be a computing device, and specifically may be a processor of the computing device. As shown in fig. 2, the method of this embodiment may include:
in step 201, target data of a computational node in a neural network is read from a storage device.
In this step, the storage device may specifically be a device accessible by the computing device and used for implementing a data storage function. Illustratively, the storage device includes, but is not limited to, one or more of the following: memory, solid state disk, mechanical hard disk, floppy disk.
The neural network can be composed of a plurality of computing nodes. Each computing node can fetch activation data from the storage device, perform calculation, and then store the calculation result in the storage device as the input data of the next-stage computing node (i.e., the activation data of the next-stage computing node). Taking a convolutional neural network (Convolutional Neural Network, CNN) as an example, one convolution block (Conv Block) may correspond to one computing node. It should be noted that the computing nodes in this embodiment may be all of the computing nodes in the neural network, or may be some of the computing nodes in the neural network.
The target data is fixed-point data represented by a first bit number, the number of bits occupied by one piece of target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computing device. Fixed-point data means that the position of the decimal point in the number is fixed, usually as either a fixed-point integer or a fixed-point fraction. After the position of the decimal point is chosen, all numbers in the operation are unified as fixed-point integers or fixed-point fractions, and the decimal point does not need to be handled during the operation, which improves calculation efficiency.
Illustratively, the second bit number may be 16 or 8. Optionally, when the bit numbers follow powers of 2 and the second bit number is 16, the first bit number may be 8 or 4; when the bit numbers follow powers of 2 and the second bit number is 8, the first bit number may be 4.
Since the target data is fixed-point data, the computational efficiency of the neural network on the computing device can be ensured. Compared with the prior art, in which the target data of the computing node is fixed-point data represented by the second bit number and one piece of target data occupies the second bit number of bits in the storage device, the amount of data of the neural network exchanged between the computing device and the storage device is reduced.
It should be noted that the target data may be one or more of any types of data that the neural network needs to read from the storage device in the calculation process. Illustratively, the target data includes a weight parameter and/or activation data, where the weight parameter is a network parameter of the neural network.
And step 202, calculating to obtain output data of the computing node according to the target data.
In this step, the specific calculation manner for performing calculation according to the target data may include a preset calculation related to the type of the neural network. For example, when the neural network is a convolutional neural network, the preset calculation may include a convolutional calculation, and specifically, a convolutional operation may be performed on the activation data and the network parameter.
It should be noted that the output data of the computing node is the activation data of the next-stage computing node.
In this embodiment, the target data of a computing node in the neural network is read from the storage device, where the target data is fixed-point data and the number of bits it occupies in the storage device is smaller than the second bit number of the operand of the computing device, and the output data of the computing node is obtained by calculation according to the target data. Since the target data is fixed-point data, the computational efficiency of the neural network on the computing device can be ensured. Since the number of bits occupied by the target data in the storage device is the first bit number, which is smaller than the second bit number of the operand of the computing device, the amount of data of the neural network exchanged between the computing device and the storage device can be reduced. Therefore, the bandwidth occupied by the neural network between the computing device and the storage device can be reduced while the computational efficiency of the neural network on the computing device is ensured. In addition, by reducing the bandwidth requirement between the computing device and the storage device, cost may also be reduced.
Fig. 3 is a schematic flow chart of a data processing method according to another embodiment of the present application. This embodiment mainly describes an optional implementation of calculating the output data of the computing node according to the target data, on the basis of the embodiment shown in fig. 2. As shown in fig. 3, the method of this embodiment may include:
step 301, reading target data of a computing node in the neural network from a storage device.
In this step, the number of bits occupied by one target data in the storage device is the first number of bits, and the number of bits occupied by one target data read from the storage device in step 301 is also the first number of bits.
Step 302, extending the bit number occupied by the target data from the first bit number to the second bit number to obtain the extended target data.
In this step, the number of bits of the operand supported by the computing device is the second bit number, while the number of bits occupied by the target data read from the storage device is the first bit number, which is smaller than the second bit number. Therefore, to enable the computing device to perform the calculation of the neural network based on the target data, the number of bits occupied by the target data may be extended from the first bit number to the second bit number in step 302 to obtain the extended target data. Illustratively, the extension from the first bit number to the second bit number may be accomplished using a combination of an AND instruction, a shift instruction, an add instruction, and the like.
The process of expanding the bit number occupied by the target data from the first bit number to the second bit number can be understood as an unpacking (unpack) process of the target data. Illustratively, step 302 may specifically include: and adding N filling bits to the head and/or tail of the first bit number of bits occupied by the target data to obtain the expanded target data, wherein N is the difference between the second bit number and the first bit number.
For example, when the first number of bits is equal to 4, the second number of bits is equal to 8, and N padding bits are added to the header of the first number of bits occupied by the target data to obtain the extended target data, valid bits in the extended target data may be located at lower 4 bits. For another example, when the first number of bits is equal to 4, the second number of bits is equal to 8, and N padding bits are added to the tail of the first number of bits occupied by the target data to obtain the extended target data, valid bits in the extended target data may be located at the upper 4 bits.
It should be noted that the padding bits can be understood as invalid bits, and the bits other than the padding bits in the expanded target data can be understood as valid bits; the valid bits are the first-bit-number consecutive bits. The padding bits can be designed flexibly according to requirements. For example, taking the first bit number as 4 and the second bit number as 8, assuming that the extended target data needs to be used as a signed 8-bit number and the computing device represents signed numbers in two's complement, the valid bits may be placed at the lower 4 bits and the upper 4 bits may be filled according to the sign, e.g., filled with 1 for a negative value, that is, the padding bit may be 1.
Taking the first bit number as 4, the second bit number as 8, and the expanded valid bits located at the lower 4 bits as an example, the unpack process may be as shown in fig. 4A and 4B. Specifically, before expansion each piece of data occupies 4 bits, and the storage address relationship of data A1 to F1 in the computing device may be as shown in fig. 4A; after the unpack process each piece of data occupies 8 bits, and the storage address relationship of the expanded data in the computing device may be as shown in fig. 4B.
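For illustration only, the unpack process described above may be sketched in C as follows, assuming that each packed byte stores two 4-bit values (low nibble first), that the valid bits are placed at the lower 4 bits of the expanded value, and that the padding bits are obtained by sign extension; the function name and the packing order are assumptions and are not specified in this application.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch: expand packed 4-bit fixed-point values (two per byte,
 * low nibble first) into 8-bit values. The valid bits occupy the lower 4 bits
 * of each output; the upper 4 bits are padding filled by sign extension, so a
 * negative 4-bit value is padded with 1s. */
void unpack_4bit_to_8bit(const uint8_t *src, int8_t *dst, size_t num_values)
{
    for (size_t i = 0; i < num_values; ++i) {
        uint8_t byte = src[i / 2];
        uint8_t nibble = (i % 2 == 0) ? (uint8_t)(byte & 0x0Fu) : (uint8_t)(byte >> 4);
        /* Move the 4 valid bits to the top of an 8-bit value, then shift back
         * down arithmetically to sign-extend. */
        dst[i] = (int8_t)((int8_t)(nibble << 4) >> 4);
    }
}
```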
And 303, calculating to obtain output data of the computing node according to the expanded target data.
In this step, the specific calculation manner for performing calculation according to the expanded target data may include a preset calculation related to the type of the neural network. For example, when the neural network is a convolutional neural network, the preset calculation may include a convolutional calculation, and specifically, a convolutional operation may be performed on the activation data and the network parameter.
In this embodiment, the number of bits occupied by the target data read from the storage device is expanded from the first bit number to the second bit number to obtain the expanded target data, and the output data of the computing node is calculated from the expanded target data. In this way, the computing device can calculate the output data of the computing node based on target data that occupies fewer bits in the storage device than the second bit number, which avoids the problem that the neural network cannot be deployed on the computing device because the number of bits occupied by the target data in the storage device does not match the bit number of the operand of the computing device.
Fig. 5 is a schematic flow chart of a data processing method according to another embodiment of the present application, and this embodiment mainly describes an optional implementation manner of obtaining output data of the computing node by calculation according to the target data on the basis of the embodiment shown in fig. 2. As shown in fig. 5, the method of this embodiment may include:
step 501, reading target data of a computing node in a neural network from a storage device.
It should be noted that step 501 is similar to step 201 and step 301, and is not described herein again.
And 502, performing preset calculation according to the target data to obtain a calculation result.
Optionally, when step 302 is executed before step 502, step 502 may specifically include: and carrying out preset calculation according to the expanded target data to obtain a calculation result. For example, when the preset calculation is a convolution operation, the calculation result may specifically be a convolution calculation result.
Step 503, quantizing the calculation result by using the first bit number, so as to obtain the output data of the calculation node.
In this step, the output data is fixed-point data represented by the first bit number, and the number of bits it occupies is the second bit number. For example, assuming that the first bit number is 4 and the second bit number is 8, and the calculation result is 0b10010110, the output data of the computing node obtained by quantizing the calculation result with the first bit number may be 0b1001.
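For example, this quantization step, in which the upper 4 valid bits of the 8-bit calculation result are retained as in the 0b10010110 to 0b1001 example above, may be sketched as follows (an illustrative assumption; the quantization in step 503 may also involve additional scaling):

```c
#include <stdint.h>

/* Illustrative sketch: quantize an 8-bit calculation result to 4 bits of
 * precision by keeping its upper 4 bits, matching the 0b10010110 -> 0b1001
 * example. Inside the computing device the value still occupies 8 bits
 * (the second bit number). */
static inline uint8_t quantize_result_to_4bit(uint8_t result)
{
    return (uint8_t)(result >> 4); /* 0b10010110 -> 0b00001001 */
}
```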
Optionally, when there is no need to reduce the amount of output data of the computing node read by the computing device from the storage device, step 503 may be replaced by quantizing the calculation result with the second bit number to obtain the output data of the computing node.
Step 504, compressing the bit number occupied by the output data from the second bit number to the first bit number to obtain the compressed output data.
In this step, the process of compressing the number of bits occupied by the output data from the second bit number to the first bit number can be understood as a packing (pack) process of the output data. Illustratively, step 504 may specifically include: selecting the valid consecutive bits of the first bit number from the second bit number of bits occupied by the output data as the compressed output data, where the invalid bits may be discarded. It should be noted that the pack process corresponds to the unpack process. For example, assuming that each piece of data occupies 8 bits before being compressed, the storage address relationship of data G to L in the computing device may be as shown in fig. 6A; after the pack process each piece of data occupies 4 bits, and the storage address relationship of the compressed data in the computing device may be as shown in fig. 6B.
For example, the compression of the number of bits occupied by the output data from the second bit number to the first bit number may be accomplished using a combination of an AND instruction, a shift instruction, an add instruction, and the like, so as to obtain the compressed output data.
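For illustration only, the pack process may be sketched in C as follows, assuming that the valid 4 bits of each 8-bit output value are located at the lower 4 bits and that two compressed values are stored per byte (low nibble first); these assumptions mirror the unpack sketch above and are not specified in this application.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch: compress 8-bit values carrying 4 valid bits (assumed to
 * be the lower 4 bits) into packed 4-bit values, two per byte, low nibble
 * first. The invalid upper 4 bits are discarded. */
void pack_8bit_to_4bit(const int8_t *src, uint8_t *dst, size_t num_values)
{
    for (size_t i = 0; i < num_values; ++i) {
        uint8_t nibble = (uint8_t)src[i] & 0x0Fu;    /* keep the valid low 4 bits */
        if (i % 2 == 0)
            dst[i / 2] = nibble;                     /* low nibble of a new byte  */
        else
            dst[i / 2] |= (uint8_t)(nibble << 4);    /* high nibble of same byte  */
    }
}
```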
Step 505, writing the compressed output data into the storage device.
After step 505 is executed, the storage device stores the compressed output data, that is, the number of bits occupied by one compressed output data in the storage device is a first number of bits.
In this embodiment, a calculation result is obtained by performing the preset calculation according to the target data, the calculation result is quantized with the first bit number to obtain the output data of the computing node, the number of bits occupied by the output data is compressed from the second bit number to the first bit number to obtain the compressed output data, and the compressed output data is written into the storage device, so that the amount of output data of the computing node subsequently read from the storage device by the computing device can be reduced.
It should be noted that the unpack process in step 302 may instead be implemented in the data transfer process of moving the target data from the storage device to the computing device; that is, the number of bits occupied by the target data may be expanded from the first bit number to the second bit number during the reading of the target data from the storage device in step 201.
And/or, the pack process of steps 504 and 505 may instead be implemented in the data transfer process in which the computing device writes the output data into the storage device; that is, the number of bits occupied by the output data obtained in step 503 may be compressed from the second bit number to the first bit number during the writing of the output data into the storage device.
For example, a Direct Memory Access (DMA) transfer may be used to implement a pack process and an unpack process in the data relocation process.
It should be noted that the above process of calculating the output data of a computing node according to the target data of the computing node may be regarded as the calculation process of the computing node; for example, the calculation process may include step 502 and step 503. Further, taking the first bit number as 4 and the second bit number as 8 as an example, as shown in fig. 7, an unpack process may be performed before calculation process k to expand the input data i[k], which occupies 4 bits in the storage device, to 8 bits, and a pack process may be performed after calculation process k to compress the output data o[k], which occupies 8 bits when output by calculation process k, to 4 bits, so that o[k] stored in the storage device occupies 4 bits.
On the basis of the above embodiments, the neural network may be trained according to a specific training strategy. For example, when the target data includes a weight parameter, the training strategy may include a first training strategy. Wherein the first training strategy comprises: and storing the floating point type weight parameters in a training process, and quantizing the floating point type weight parameters by adopting the first bit number before calculating according to the floating point type weight parameters of the calculation nodes.
It should be noted that the process of calculating the output data of a computing node according to the activation data and the weight parameters of the computing node may be regarded as the calculation process of the computing node. The data type used in the calculation process of a computing node during prediction should be consistent with the data type used in the calculation process of that node during training, namely fixed-point data represented by the first bit number, and the precision of fixed-point data is strongly affected by its bit number. Therefore, floating-point weight parameters are stored during training, and the fixed-point weight parameters represented by the first bit number are simulated with the floating-point weight parameters. Compared with storing fixed-point weight parameters represented by the first bit number during training, this can improve the precision of parameter learning in the neural network and thus the precision of the neural network.
For example, quantizing the floating-point weight parameter with the first bit number may specifically include the following step A1 and step A2.
Step A1, converting the floating point type weight parameter to a first preset range to obtain a converted floating point type weight parameter; the minimum value of the first preset range is greater than or equal to 0.
Step A2, quantizing the transformed floating point type weight parameter by using the first bit number.
Specifically, in step A1, the floating-point weight parameter may be converted from the real-number range to a non-negative range. To improve the computational efficiency of the neural network, the first preset range includes 0 to 1. Step A2 performs fixed-point quantization with the first bit number on the floating-point weight parameter transformed to the first preset range, so as to obtain a fixed-point weight parameter represented by the first bit number; the computing node may then perform calculation according to the fixed-point weight parameter represented by the first bit number.
Illustratively, step A1 may specifically include the following steps A11 and A12.
Step A11: perform a nonlinear transformation (non-linear transform) on the floating-point weight parameter to transform it to a second preset range, where the minimum value of the second preset range is smaller than 0.
Step A12: perform a scaling and shift (scale & shift) transformation on the nonlinearly transformed floating-point weight parameter to transform it to the first preset range.
The specific manner of the nonlinear transformation in step A11 is not limited in this application; for example, the nonlinear transformation may be implemented by a tangent function or a sigmoid (S-type) function. Illustratively, the second preset range includes -1 to 1. The purpose of further scaling and shifting the nonlinearly transformed floating-point weight parameter in step A12 is to make the value range of the transformed floating-point weight parameter be the first preset range, and the specific manner of the scaling and shifting is not limited in this application.
For example, in step A2, the transformed floating-point weight parameter may be quantized with symmetric uniform quantization using the first bit number. Although asymmetric non-uniform quantization may achieve a better quantization effect than symmetric uniform quantization, symmetric uniform quantization is highly efficient to implement in hardware, so using symmetric uniform quantization in step A2 ensures hardware efficiency.
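For illustration only, steps A1 and A2 may be sketched as follows, assuming a tanh-based nonlinear transformation, a linear map from [-1, 1] to [0, 1], and a uniform grid with 2^k - 1 levels; the specific functions are assumptions, since this application does not fix them here.

```c
#include <math.h>

/* Illustrative sketch of weight quantization during training:
 * step A11: nonlinear transformation to (-1, 1) (tanh is one possible choice),
 * step A12: scale and shift from [-1, 1] to [0, 1],
 * step A2:  symmetric uniform quantization with 2^k - 1 levels.
 * The result is still stored as floating point during training. */
float quantize_weight(float w, int k)
{
    float levels = (float)((1 << k) - 1);
    float t = tanhf(w);                     /* step A11: (-1, 1)     */
    float s = 0.5f * (t + 1.0f);            /* step A12: [0, 1]      */
    return roundf(s * levels) / levels;     /* step A2: uniform grid */
}
```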
Optionally, the operations involved in quantizing the floating-point weight parameter with the first bit number are differentiable, so that the network parameters of the computing node can be solved iteratively by a gradient descent method.
When the target data includes activation data, the output data is fixed-point data represented by the first bit number, and the number of bits occupied by the output data is the second bit number.
illustratively, when the target data includes activation data, the training strategy includes a second training strategy. Wherein the second training strategy comprises: and storing floating point type activation data in a training process, and quantizing the floating point type activation data by adopting the first bit number before calculating according to the floating point type activation data of the calculation node.
Because the data type used in the calculation process of a computing node during prediction should be consistent with the data type used in the calculation process of that node during training, namely fixed-point data represented by the first bit number, and the precision of fixed-point data is strongly affected by its bit number, floating-point activation data is used to simulate the fixed-point activation data represented by the first bit number. Compared with storing fixed-point activation data represented by the first bit number during training, this can improve the precision of parameter learning in the neural network and thus the precision of the neural network.
Illustratively, the floating point type activation data may be quantized using step B1 as follows.
And step B1, when the value range of the floating point type activation data does not include a negative number, directly adopting the first bit number to quantize the floating point type activation data.
Alternatively, for example, the floating point type activation data may be quantized using the following steps B2 and B3.
And step B2, when the value range of the floating point type activation data includes a negative number, scaling and shifting the floating point type activation data to transform the floating point type activation data to a third preset range, wherein the minimum value of the third preset range is greater than or equal to 0.
And step B3, quantizing the scaled and shifted floating point type activation data by using the first bit number.
The purpose of scaling and shifting the floating-point activation data in step B2 is to make the minimum value of the value range of the transformed floating-point activation data greater than or equal to 0; the specific manner of the scaling and shifting is not limited in this application.
Step B1, or steps B2 and B3, specifically implement fixed-point quantization of the floating-point activation data with the first bit number, so as to obtain fixed-point activation data represented by the first bit number; the computing node may then perform calculation according to the fixed-point activation data represented by the first bit number.
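For illustration only, steps B1 to B3 may be sketched as follows, assuming the activation range [range_min, range_max] is known and a linear scale-and-shift is used; these choices are assumptions and are not fixed in this application.

```c
#include <math.h>

/* Illustrative sketch of activation quantization during training with k bits.
 * Step B1: if the value range contains no negative numbers, quantize directly.
 * Steps B2-B3: otherwise scale and shift to a non-negative range first. */
float quantize_activation(float x, float range_min, float range_max, int k)
{
    float levels = (float)((1 << k) - 1);
    float y;

    if (range_min >= 0.0f)
        y = x / range_max;                             /* step B1              */
    else
        y = (x - range_min) / (range_max - range_min); /* steps B2-B3          */

    if (y < 0.0f) y = 0.0f;                            /* clamp to [0, 1]      */
    if (y > 1.0f) y = 1.0f;
    return roundf(y * levels) / levels;                /* uniform quantization */
}
```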
Because the offset (bias) parameters account for a small proportion of the computation among the network parameters, the offset parameters may be quantized in a higher-precision manner in the embodiments of the present application. Illustratively, the training strategy may further include a third training strategy. The third training strategy includes: storing floating-point offset parameters during training, and quantizing the floating-point offset parameters with a third bit number before performing calculation according to the floating-point offset parameters of the computing node, where the third bit number may be greater than the first bit number.
Illustratively, the third bit number may be greater than the second bit number; for example, the third bit number is equal to 32, the second bit number is equal to 8, and the first bit number is equal to 4.
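As an illustrative sketch of the third training strategy, a floating-point offset parameter could be quantized to a 32-bit fixed-point value as follows; the scale-based scheme and the function name are assumptions and are not specified in this application.

```c
#include <stdint.h>
#include <math.h>

/* Illustrative sketch: quantize a floating-point offset (bias) parameter to a
 * 32-bit fixed-point value with the given number of fractional bits. The wider
 * third bit number keeps the bias at higher precision than the 4-bit weights
 * and activations. */
int32_t quantize_bias(float bias, int frac_bits)
{
    return (int32_t)lroundf(bias * (float)(1 << frac_bits));
}
```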
It should be noted that the first training strategy, the second training strategy, and the third training strategy may be combined, that is, the training strategy may include one or more of the first training strategy, the second training strategy, and the third training strategy.
Taking as an example the case where the training strategy includes the first training strategy and the second training strategy, the neural network is a convolutional neural network, and the first bit number is 4, the training process (training procedure) of a computing node may be as shown in fig. 8.
As shown in fig. 8, the quantization in the training process can be divided into weight quantization and input quantization. The weight quantization can be composed of three parts: (1) nonlinear transformation, (2) scaling and shifting, and (3) symmetric uniform quantization. (1) The nonlinear transformation transforms the floating-point (float) weight to the range (-1, 1), for example using a tangent-type or sigmoid-type function. (2) The scaling and shifting converts the weight parameter from the range [-1, 1] to [0, 1] through a transfer function. (3) Symmetric uniform quantization is then performed through a quantization function. The input quantization, i.e., the quantization of the activation data, distinguishes whether the data contains negative numbers; if it does, the data can be scaled and shifted to a range of [0, a], and finally 4-bit uniform quantization is carried out.
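For example, with a tanh-based nonlinear transformation and a 4-bit uniform grid, as in the sketch of steps A1 and A2 above (one possible concrete form, assumed for illustration), the three weight-quantization parts can be written as:

```latex
\tilde{w} = \tanh(w) \in (-1, 1), \qquad
w' = \frac{\tilde{w} + 1}{2} \in [0, 1], \qquad
w_q = \frac{\operatorname{round}\left((2^{4} - 1)\, w'\right)}{2^{4} - 1}
```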
Further, after the floating-point input and the floating-point weight of the compute node are quantized, convolution operation may be performed on the quantized input and the quantized weight based on the conv2d function, and further, output data of the compute node may be obtained by performing Batch Normalization (BN) on an operation result of the convolution operation.
Based on the training process shown in fig. 8, the prediction process (inference process) of the computing node in the neural network may be as shown in fig. 9. As shown in fig. 9, the activation data and the weight parameters used by the computing node for prediction are both fixed-point data represented by 4 bits. In addition, as shown in fig. 9, in order to reduce the amount of output data of the computing node read by the computing device from the storage device, after the convolution calculation in the calculation process of the computing node, the result of the convolution calculation may be quantized with 4 bits to obtain fixed-point data represented by 4 bits as the output data of the computing node. It can be understood that, when the second bit number is 8, although the activation data and the weight parameters used by the computing node during prediction and the output data of the computing node are all fixed-point data represented by 4 bits, inside the computing device they each occupy the second bit number of bits, for example 8 bits.
It is to be noted that, as can be seen from fig. 8 and 9, when the target data includes the weight parameter, the weight parameter is stored in the form of floating-point data in the training process, but the weight parameter is stored in the form of fixed-point data in the prediction process. Specifically, the weight parameter may be converted from floating-point type data to fixed-point type data represented by a first bit number after the neural network training is completed.
Optionally, the computing nodes in the above method embodiments may be some of the computing nodes in the neural network. For the other computing nodes in the neural network, the node data may be fixed-point data represented by the second bit number, and the number of bits occupied by one piece of node data in the storage device is the second bit number. Specifically, on the basis of the above method embodiments, the method may further include: reading node data of other computing nodes in the neural network from the storage device, where the node data is fixed-point data represented by the second bit number. The node data may include activation data and network parameters.
Optionally, the other computing nodes may be nodes with higher precision requirements. Illustratively, the other computing nodes are computing nodes for which representing node data with the first bit number, compared with representing node data with the second bit number, would reduce the precision of the neural network by more than a threshold. Keeping the node data of the other computing nodes with higher precision requirements as fixed-point data represented by the second bit number can reduce the occupation of bandwidth between the computing device and the storage device while ensuring the precision of the algorithm.
Further optionally, the other computing nodes include, but are not limited to, one or more of the following: the computing nodes of the first layer in the neural network, the computing nodes of the last layer in the neural network, and the computing nodes related to coordinates in the neural network. A computing node related to coordinates may specifically be a computing node whose input or output includes coordinates, for example a computing node for extracting a region of interest from an image.
It should be noted that, for other computing nodes, quantization is also performed based on the second bit number during the training process. For example, the training strategy may further include: before calculation is carried out according to the floating point type weight parameters of the other calculation nodes, the second bit number is adopted to quantize the floating point type weight parameters; and/or quantizing the floating point type activation data by adopting the second bit number before calculation according to the floating point type activation data of other calculation nodes.
Fig. 10 is a schematic structural diagram of a computing device according to an embodiment of the present application, and as shown in fig. 10, the computing device 100 may include: a processor 101 and a memory 102.
The memory 102 is used for storing program codes;
the processor 101, invoking the program code, when executed, is configured to perform the following:
reading target data of a computational node in a neural network from a storage device, wherein the target data are fixed-point data represented by a first bit number, the bit number occupied by the target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computational device;
and calculating to obtain the output data of the computing node according to the target data.
The computing device provided in this embodiment may be configured to execute the technical solution of the foregoing method embodiment, and the implementation principle and technical effect of the computing device are similar to those of the method embodiment, and are not described herein again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (58)

1. A data processing method applied to computing equipment is characterized by comprising the following steps:
reading target data of a computational node in a neural network from a storage device, wherein the target data are fixed-point data represented by a first bit number, the bit number occupied by the target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computational device;
and calculating to obtain the output data of the computing node according to the target data.
2. The method of claim 1, wherein computing the output data of the compute node from the target data comprises:
expanding the bit number occupied by the target data from the first bit number to the second bit number to obtain the expanded target data;
and calculating to obtain the output data of the computing node according to the expanded target data.
3. The method of claim 1, wherein reading target data of a compute node in a neural network from a storage device comprises:
and in the process of reading target data of a calculation node in the neural network from the storage device, expanding the bit number occupied by the target data from the first bit number to the second bit number.
4. The method according to claim 2 or 3, wherein the expanding the number of bits occupied by the target data from the first number of bits to the second number of bits comprises:
and adding N padding bits at the head and/or tail of the first bit number occupied by the target data, wherein N is the difference between the second bit number and the first bit number.
5. The method of claim 1, wherein computing the output data of the compute node from the target data comprises:
performing preset calculation according to the target data to obtain a calculation result;
quantizing the calculation result by adopting the first bit number to obtain output data of the calculation node; the output data is fixed-point data represented by the first bit number, and the occupied bit number is the second bit number;
writing the output data to the storage device.
6. The method of claim 5, wherein prior to writing the output data to the storage device, further comprising:
compressing the bit number occupied by the output data from the second bit number to the first bit number to obtain the compressed output data;
the writing the output data to the storage device includes: and writing the compressed output data into the storage device.
7. The method according to claim 6, wherein compressing the number of bits occupied by the output data from the second number of bits to the first number of bits to obtain the compressed output data comprises:
and selecting effective continuous bits of the first bit number from the second bit numbers occupied by the output data as the compressed output data.
8. The method according to any of claims 1-3, 5-7, wherein the target data comprises a weight parameter and/or activation data.
9. The method of claim 8, wherein the neural network is trained according to a training strategy, the training strategy comprising a first training strategy when the target data comprises weight parameters;
the first training strategy comprises: and storing the floating point type weight parameters in a training process, and quantizing the floating point type weight parameters by adopting the first bit number before calculating according to the floating point type weight parameters of the calculation nodes.
10. The method of claim 9, wherein quantizing the floating-point weight parameter with the first number of bits comprises:
converting the floating point type weight parameter to a first preset range to obtain a converted floating point type weight parameter; the minimum value of the first preset range is greater than or equal to 0;
and quantizing the transformed floating point type weight parameters by adopting the first bit number.
11. The method of claim 10, wherein transforming the floating-point type weight parameter to a first predetermined range comprises:
performing nonlinear transformation on the floating-point type weight parameters to transform the floating-point type weight parameters to a second preset range, wherein the minimum value of the second preset range is less than 0;
and carrying out scaling and shifting transformation on the floating point type weight parameters after the nonlinear transformation so as to transform the floating point type weight parameters to the first preset range.
12. The method of claim 11, wherein the second predetermined range comprises-1 to 1.
13. The method of claim 10, wherein the first predetermined range comprises 0 to 1.
14. The method of claim 10, wherein the operations for quantizing the floating-point weight parameter with the first number of bits are differentiable.
15. The method of claim 8, wherein the neural network is trained according to a training strategy, the training strategy comprising a second training strategy when the target data comprises activation data;
the second training strategy comprises: and storing floating point type activation data in a training process, and quantizing the floating point type activation data by adopting the first bit number before calculating according to the floating point type activation data of the calculation node.
16. The method of claim 15, wherein quantizing the floating-point activation data with the first number of bits comprises:
and when the value range of the floating point type activation data does not comprise negative numbers, directly adopting the first bit number to quantize the floating point type activation data.
17. The method of claim 16, wherein quantizing the floating-point activation data with the first number of bits comprises:
when the value range of the floating point type activation data comprises a negative number, carrying out scaling and shifting transformation on the floating point type activation data so as to transform the floating point type activation data to a third preset range, wherein the minimum value of the third preset range is greater than or equal to 0;
and quantizing the floating point type activation data after scaling and shifting transformation by adopting the first bit number.
18. The method of claim 8, wherein the training strategies further comprise a third training strategy;
the third training strategy comprises: and storing the floating point type offset parameters in a training process, and quantizing the floating point type offset parameters by adopting a third bit number before calculating according to the floating point type offset parameters of the calculation nodes, wherein the third bit number is larger than the first bit number.
19. The method of any one of claims 1-3, 5-7, wherein the compute nodes are part of compute nodes in the neural network.
20. The method of claim 19, further comprising:
and reading node data of other computing nodes in the neural network from the storage device, wherein the node data is fixed-point data represented by the second bit number.
21. The method of claim 20, wherein the other computing nodes are computing nodes for which representing node data with the first number of bits, compared with representing node data with the second number of bits, reduces the accuracy of the neural network by more than a threshold degree.
22. The method of claim 21, wherein the other computing nodes comprise one or more of:
the computing nodes of the first layer in the neural network, the computing nodes of the last layer in the neural network and the computing nodes related to the coordinates in the neural network.
23. The method of claim 1, wherein the second number of bits is equal to 8.
24. The method of claim 23, wherein the first number of bits is equal to 4.
25. The method of claim 1, wherein the second number of bits is equal to 16.
26. The method of claim 25, wherein the first number of bits is equal to 8.
27. The method of claim 1, wherein the computing device comprises one or more of:
CPU, Advanced RISC Machine (ARM), DSP and GPU.
28. The method of claim 1, wherein the neural network comprises a convolutional neural network.
29. A computing device, comprising: a processor and a memory;
the memory is configured to store program code;
the processor is configured to invoke the program code, and when the program code is executed, the processor is configured to:
reading target data of a computational node in a neural network from a storage device, wherein the target data are fixed-point data represented by a first bit number, the bit number occupied by the target data in the storage device is the first bit number, the first bit number is smaller than a second bit number, and the second bit number is the bit number of an operand of the computational device;
and calculating to obtain the output data of the computing node according to the target data.
30. The apparatus according to claim 29, wherein the processor is configured to calculate the output data of the computing node according to the target data by:
expanding the bit number occupied by the target data from the first bit number to the second bit number to obtain the expanded target data;
and calculating to obtain the output data of the computing node according to the expanded target data.
31. The device of claim 29, wherein the processor is configured to read the target data of the computing node in the neural network from the storage device by:
expanding, in the process of reading the target data of the computing node in the neural network from the storage device, the bit number occupied by the target data from the first bit number to the second bit number.
32. The apparatus according to claim 30 or 31, wherein the processor is configured to expand the bit number occupied by the target data from the first bit number to the second bit number by:
adding N padding bits at the head and/or tail of the first bit number of bits occupied by the target data, wherein N is the difference between the second bit number and the first bit number.
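Claims 30-32 expand each stored value from the first bit number to the second bit number by adding N = second bit number - first bit number padding bits at the head and/or tail before computation. The sketch below assumes a 4-bit first bit number, an 8-bit second bit number, two values packed per stored byte, and zero padding at the head; none of these specifics are fixed by the claims.

```python
import numpy as np

def expand_to_operand_width(packed_bytes):
    """Sketch of the bit expansion in claims 30-32 (assumptions noted above).

    Each stored byte carries two 4-bit values; every value receives
    N = 8 - 4 = 4 zero padding bits at its head to become an 8-bit operand.
    """
    packed = np.asarray(packed_bytes, dtype=np.uint8)
    low = packed & 0x0F              # value stored in the tail of the byte
    high = packed >> 4               # value stored in the head of the byte
    operands = np.empty(packed.size * 2, dtype=np.uint8)
    operands[0::2] = low             # upper four bits are already zero padding
    operands[1::2] = high
    return operands

ops = expand_to_operand_width([0x3A, 0x7F])   # -> [0x0A, 0x03, 0x0F, 0x07]
```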
33. The apparatus according to claim 29, wherein the processor is configured to calculate the output data of the computing node according to the target data by:
performing a preset calculation according to the target data to obtain a calculation result;
quantizing the calculation result with the first bit number to obtain the output data of the computing node, wherein the output data is fixed-point data represented by the first bit number and the bit number it occupies is the second bit number; and
writing the output data to the storage device.
34. The apparatus of claim 33, wherein the processor is further configured to compress the bit number occupied by the output data from the second bit number to the first bit number to obtain compressed output data; and
the processor is configured to write the output data into the storage device by writing the compressed output data into the storage device.
35. The apparatus according to claim 34, wherein the processor is configured to compress the bit number occupied by the output data from the second bit number to the first bit number to obtain the compressed output data by:
selecting, from the second bit number of bits occupied by the output data, the first bit number of effective continuous bits as the compressed output data.
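Claims 34-35 compress each second-bit-number result back to the first bit number by keeping only the effective continuous bits before writing to storage. The sketch keeps the low 4 bits of each 8-bit value and packs two values per byte, mirroring the expansion sketch above; the 4-in-8 packing and the choice of the low bits as the effective bits are assumptions.

```python
import numpy as np

def compress_to_storage_width(operands):
    """Sketch of the bit compression in claims 34-35 (assumptions noted above)."""
    values = np.asarray(operands, dtype=np.uint8) & 0x0F   # keep the effective 4 bits
    if values.size % 2:                                     # pad to an even count
        values = np.append(values, np.uint8(0))
    # Pack value pairs: even indices go to the tail, odd indices to the head.
    return (values[0::2] | (values[1::2] << 4)).astype(np.uint8)

packed = compress_to_storage_width([0x0A, 0x03, 0x0F, 0x07])   # -> [0x3A, 0x7F]
```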
36. The apparatus according to any one of claims 29-31 and 33-35, wherein the target data comprises a weight parameter and/or activation data.
37. The apparatus of claim 36, wherein the neural network is trained according to a training strategy, the training strategy comprising a first training strategy when the target data comprises the weight parameter;
the first training strategy comprises: storing floating-point weight parameters in a training process, and quantizing the floating-point weight parameters with the first bit number before performing calculation according to the floating-point weight parameters of the computing node.
38. The apparatus of claim 37, wherein quantizing the floating-point weight parameter with the first bit number comprises:
transforming the floating-point weight parameter to a first preset range to obtain a transformed floating-point weight parameter, wherein the minimum value of the first preset range is greater than or equal to 0; and
quantizing the transformed floating-point weight parameter with the first bit number.
39. The apparatus of claim 38, wherein transforming the floating-point weight parameter to the first preset range comprises:
performing a nonlinear transformation on the floating-point weight parameter to transform the floating-point weight parameter to a second preset range, wherein the minimum value of the second preset range is less than 0; and
scaling and shifting the nonlinearly transformed floating-point weight parameter to transform the floating-point weight parameter to the first preset range.
40. The apparatus of claim 39, wherein the second preset range comprises -1 to 1.
41. The apparatus of claim 38, wherein the first preset range comprises 0 to 1.
42. The apparatus of claim 38, wherein the operation of quantizing the floating-point weight parameter with the first bit number is differentiable.
43. The apparatus of claim 36, wherein the neural network is trained according to a training strategy, the training strategy comprising a second training strategy when the target data comprises activation data;
the second training strategy comprises: storing floating-point activation data in a training process, and quantizing the floating-point activation data with the first bit number before performing calculation according to the floating-point activation data of the computing node.
44. The apparatus of claim 43, wherein quantizing the floating-point activation data with the first bit number comprises:
when the value range of the floating-point activation data does not include negative numbers, quantizing the floating-point activation data directly with the first bit number.
45. The apparatus of claim 43, wherein quantizing the floating-point activation data with the first bit number comprises:
when the value range of the floating-point activation data includes negative numbers, scaling and shifting the floating-point activation data to transform the floating-point activation data to a third preset range, wherein the minimum value of the third preset range is greater than or equal to 0; and
quantizing the scaled and shifted floating-point activation data with the first bit number.
46. The apparatus of claim 37, wherein the training strategy further comprises a third training strategy;
the third training strategy comprises: storing floating-point offset parameters in a training process, and quantizing the floating-point offset parameters with a third bit number before performing calculation according to the floating-point offset parameters of the computing node, wherein the third bit number is greater than the first bit number.
47. The apparatus of any one of claims 29-31 and 33-35, wherein the computing node belongs to a subset of the computing nodes in the neural network.
48. The device of claim 47, wherein the processor is further configured to:
reading node data of other computing nodes in the neural network from the storage device, wherein the node data is fixed-point data represented by the second bit number.
49. The apparatus of claim 48, wherein the other computing nodes are computing nodes for which representing node data with the first bit number, compared with representing node data with the second bit number, degrades the accuracy of the neural network by more than a degree threshold.
50. The device of claim 49, wherein the other computing nodes comprise one or more of:
a computing node in the first layer of the neural network, a computing node in the last layer of the neural network, and a computing node related to coordinates in the neural network.
51. The apparatus of claim 29, wherein the second number of bits is equal to 8.
52. The apparatus of claim 51, wherein the first number of bits is equal to 4.
53. The apparatus of claim 29, wherein the second number of bits is equal to 16.
54. The apparatus of claim 53, wherein the first number of bits is equal to 8.
55. The device of claim 29, wherein the computing device comprises one or more of:
CPU, Advanced RISC Machine (ARM), DSP and GPU.
56. The apparatus of claim 29, wherein the neural network comprises a convolutional neural network.
57. A computer-readable storage medium, having stored thereon a computer program comprising at least one code section executable by a computer for controlling the computer to perform the method according to any one of claims 1-28.
58. A computer program for implementing the method according to any one of claims 1-28 when the computer program is executed by a computer.
CN201980033583.9A 2019-08-29 2019-08-29 Data processing method and device Pending CN112189216A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/103198 WO2021035598A1 (en) 2019-08-29 2019-08-29 Data processing method and device

Publications (1)

Publication Number Publication Date
CN112189216A 2021-01-05

Family

ID=73918993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980033583.9A Pending CN112189216A (en) 2019-08-29 2019-08-29 Data processing method and device

Country Status (2)

Country Link
CN (1) CN112189216A (en)
WO (1) WO2021035598A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022151505A1 (en) * 2021-01-18 2022-07-21 深圳市大疆创新科技有限公司 Neural network quantization method and apparatus, and computer-readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3128555B1 (en) * 2021-10-21 2024-04-19 St Microelectronics Rousset COMPUTER SYSTEM FOR PROCESSING PIXEL DATA OF AN IMAGE

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502626A (en) * 2016-11-03 2017-03-15 北京百度网讯科技有限公司 Data processing method and device
CN109074335A (en) * 2017-12-29 2018-12-21 深圳市大疆创新科技有限公司 Data processing method, equipment, dma controller and computer readable storage medium
CN109992541A (en) * 2017-12-29 2019-07-09 深圳云天励飞技术有限公司 A kind of data method for carrying, Related product and computer storage medium
CN108229670A (en) * 2018-01-05 2018-06-29 中国科学技术大学苏州研究院 Deep neural network based on FPGA accelerates platform
CN108108809A (en) * 2018-03-05 2018-06-01 山东领能电子科技有限公司 A kind of hardware structure and its method of work that acceleration is made inferences for convolutional Neural metanetwork

Also Published As

Publication number Publication date
WO2021035598A1 (en) 2021-03-04

Similar Documents

Publication Publication Date Title
US11907830B2 (en) Neural network architecture using control logic determining convolution operation sequence
CN108427990B (en) Neural network computing system and method
CN107578099B (en) Computing device and method
CN107729989B (en) Device and method for executing artificial neural network forward operation
US11868867B1 (en) Decompression and compression of neural network data using different compression schemes
JP6998968B2 (en) Deep neural network execution method, execution device, learning method, learning device and program
WO2018205708A1 (en) Processing system and method for binary weight convolutional network
CN107256424B (en) Three-value weight convolution network processing system and method
CN112200300A (en) Convolutional neural network operation method and device
KR20190130443A (en) Method and apparatus for quantization of neural network
CN112189216A (en) Data processing method and device
KR20190089685A (en) Method and apparatus for processing data
CN114626516A (en) Neural network acceleration system based on floating point quantization of logarithmic block
CN117435855B (en) Method for performing convolution operation, electronic device, and storage medium
CN114781618A (en) Neural network quantization processing method, device, equipment and readable storage medium
CN116762080A (en) Neural network generation device, neural network operation device, edge device, neural network control method, and software generation program
KR102368590B1 (en) Electronic apparatus and control method thereof
US11551087B2 (en) Information processor, information processing method, and storage medium
CN109389209B (en) Processing apparatus and processing method
US20220076095A1 (en) Multi-level sparse neural networks with dynamic rerouting
US20220405561A1 (en) Electronic device and controlling method of electronic device
US20220300784A1 (en) Computer-readable recording medium having stored therein machine-learning program, method for machine learning, and calculating machine
KR20230076641A (en) Apparatus and method for floating-point operations
CN113869517A (en) Inference method based on deep learning model
KR20220100030A (en) Pattern-Based Cache Block Compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210105)