WO2021012148A1 - Data processing method and apparatus based on deep neural network, and mobile device

Info

Publication number
WO2021012148A1
Authority
WO
WIPO (PCT)
Prior art keywords: fixed, point, network model, point network, deep
Application number
PCT/CN2019/097072
Other languages
French (fr)
Chinese (zh)
Inventor
陈诗南
余俊峰
周爱春
张伟
Original Assignee
深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN201980005317.5A (publication CN111344719A)
Priority to PCT/CN2019/097072
Publication of WO2021012148A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates to the technical field of artificial neural networks, and in particular to a data processing method, device and mobile device based on a deep neural network.
  • Deep neural networks have been widely used in mobile devices.
  • when a deep neural network is deployed on a mobile device, its limited computing resources require the network model to be quantized. If many quantization bits are used, bandwidth pressure is high; if few are used, network accuracy suffers. Deep neural networks on mobile devices therefore face a conflict between ensuring network accuracy and reducing bandwidth.
  • the present disclosure provides a data processing method based on a deep neural network, which includes: acquiring a floating-point network model of the deep neural network; quantizing the floating-point network model to obtain at least two fixed-point network models with different precisions; selecting one of the at least two fixed-point network models according to their accuracies; and processing data using the selected fixed-point network model.
  • the present disclosure also provides a data processing device based on a deep neural network, which includes:
  • an obtaining unit configured to obtain the floating-point network model of the deep neural network;
  • a quantization unit configured to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions;
  • a selection unit configured to select one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
  • a processing unit configured to process data using the selected fixed-point network model.
  • the present disclosure also provides a mobile device, which includes: the above-mentioned data processing device based on a deep neural network.
  • FIG. 1 is a flowchart of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • FIG. 2 is a data diagram of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • Figure 3 is a schematic diagram of the structure of the convolutional layer of the deep neural network.
  • FIG. 4 is a schematic diagram of a deep convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • FIG. 5 is a schematic diagram of a point convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • An embodiment of the present disclosure provides a data processing method based on a deep neural network.
  • a deep neural network generally refers to an artificial neural network including an input layer, multiple hidden layers, and an output layer.
  • Deep neural networks include: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM) and many other types of neural networks.
  • a convolutional neural network is taken as an example to describe the data processing method, but those skilled in the art should understand that the data processing method of this embodiment is not limited to convolutional neural networks and is applicable to all types of deep neural networks.
  • each hidden layer includes multiple operations such as convolution, bias, normalization (BN), activation, and quantization.
  • the convolution operation is generally called a convolution layer.
  • the above description of the hidden layer is only exemplary, and does not constitute a limitation on the order and quantity of each operation or each layer.
  • Each operation or layer can have many variations, and the position and number of several operations or layers can change. For example, some hidden layers may have no normalization or activation operations, and some hidden layers may also include other operations or layers such as pooling and fully connected layers.
  • when the convolutional neural network is a fixed-point network, the hidden layers include quantization operations; when the convolutional neural network is a floating-point network, the hidden layers do not include quantization operations.
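  • To make the hidden-layer sequence above concrete, the following is a minimal PyTorch sketch of one such hidden layer; the symmetric fake-quantization step, layer sizes, and ReLU activation are illustrative assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class FixedPointHiddenLayer(nn.Module):
    """One hidden layer: convolution -> bias -> normalization (BN) ->
    activation -> quantization, in the order described above."""
    def __init__(self, in_ch, out_ch, act_bits=8):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=True)  # convolution + bias
        self.bn = nn.BatchNorm2d(out_ch)                               # normalization (BN)
        self.act = nn.ReLU()                                           # activation
        self.qmax = 2 ** (act_bits - 1) - 1                            # 127 for 8-bit

    def forward(self, x):
        x = self.act(self.bn(self.conv(x)))
        # Quantization (present only in fixed-point networks): snap activations
        # onto a symmetric uniform grid with a per-tensor scale.
        scale = x.abs().max().clamp(min=1e-8) / self.qmax
        return torch.round(x / scale) * scale
```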
  • the depthwise separable convolution in the convolutional neural network is taken as an example to describe the data processing method.
  • the convolutional layer introduced above is a standard convolutional layer, which performs standard convolution operations.
  • in a depthwise separable convolutional neural network, the convolutional layer in the hidden layer includes two convolutional layers obtained by splitting the standard convolutional layer: a deep convolution layer and a point convolution layer, which perform the deep convolution operation and the point convolution operation, respectively.
  • Convolutional neural networks generally target the data of multiple input channels.
  • the deep convolution layer first uses a convolution kernel to perform deep convolution on the data of each input channel to obtain the deep convolution results of each input channel.
  • the point convolution layer then performs point convolution on the deep convolution results, fusing the information of the input channels.
  • compared with a standard convolutional neural network, a depthwise separable convolutional neural network requires far fewer parameters and much less computation, and is especially suitable for scenarios with limited computing resources such as mobile devices.
  • the depthwise separable convolutional neural network in this embodiment can be implemented in many different ways, for example as the MobileNet network model.
  • the size of the convolution kernel of the point convolution layer of MobileNet is 1×1.
  • a standard 28-layer MobileNet network model can be used, or a depthwise separable convolutional neural network obtained by shrinking, expanding, or otherwise transforming the standard MobileNet model.
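  • As a concrete illustration of the split described above, here is a minimal PyTorch sketch of one depthwise separable block; the 3×3 depthwise kernel and ReLU activations are common MobileNet choices assumed here, not details fixed by the patent:

```python
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise conv (one kernel per input channel) followed by a 1x1
    pointwise conv that fuses information across input channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes this a depthwise (deep) convolution.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch)
        # 1x1 kernel: the point convolution layer.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.act(self.depthwise(x))))
```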
  • the data processing method based on the deep neural network of this embodiment includes the following steps:
  • S101: Obtain the floating-point network model of the deep neural network.
  • the data processing method of this embodiment is suitable for mobile devices.
  • the mobile device can directly obtain a floating-point network model from the outside, and the floating-point network model is a trained deep neural network.
  • the depthwise separable convolutional neural network constructed in this step can be MobileNet or another type of neural network.
  • the convolutional layers of these neural networks include deep convolutional layers and point convolutional layers.
  • as shown in FIG. 3, in one example, both the deep convolution layer and the point convolution layer include convolution, bias, activation, and quantization operations.
  • Convolutional neural networks are commonly used in image processing, especially in scenarios such as image recognition and image classification. In this step, training images are therefore used to train the depthwise separable convolutional neural network: the training image data is first normalized to [-1, 1), and the normalized data is then fed into the constructed network for training to obtain the depthwise separable convolutional neural network model.
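  • A one-line way to perform the normalization described above, assuming 8-bit input pixels and division by 128 (one common convention for mapping [0, 255] into [-1, 1)):

```python
import numpy as np

def normalize_image(img_u8: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values [0, 255] onto [-1, 1): 0 -> -1.0, 255 -> ~0.992."""
    return img_u8.astype(np.float32) / 128.0 - 1.0
```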
  • a floating point number with a larger number of digits is generally used to represent the convolutional neural network in the training phase.
  • in this step, 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network.
  • the weights, bias values, and activation values in the deep convolution layer and the point convolution layer are all represented by 32-bit floating-point numbers.
  • the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the volume of network parameters and the amount of training computation are very large, and mobile devices cannot provide enough computing resources. Deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
  • floating-point numbers with other bit widths may also be used to represent the depthwise separable convolutional neural network.
  • the training data may also be data other than image data, such as voice data.
  • S201 Perform quantization on the floating-point network model to obtain at least two fixed-point network models with different precisions.
  • the quantization method used in this step can be called mixed precision quantization.
  • for the deep convolution layer of the floating-point network model, different bit widths are used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the point convolution layer, a single bit width is used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the deep convolution layer, the weights can be quantized into 8-bit and 16-bit fixed-point numbers respectively; for the 8-bit fixed-point deep convolution layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point deep convolution layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the point convolution layer, the weights and activation values are quantized into 8-bit fixed-point numbers.
  • in this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model.
  • in the first fixed-point network model, the weights and activation values of the deep convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8);
  • in the second fixed-point network model, the weights of the deep convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8).
  • the activation values of the deep convolution layer can also be quantized with mixed precision; that is, a single bit width is used to quantize the weights of the deep convolution layer, and different bit widths are used to quantize its activation values.
  • the weights and activation values of the point convolution layer can also be quantized with mixed precision, including: for the deep convolution layer of the floating-point network model, quantizing the weights with a single bit width and the activation values with a single bit width; and for the point convolution layer, quantizing the weights with different bit widths and the activation values with a single bit width, or quantizing the weights with a single bit width and the activation values with different bit widths.
  • the quantization bit widths of the first and second fixed-point network models above are only an example; this embodiment may also use quantization bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the deep convolution layer and the point convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the quantization bit width of the weights and that of the activation values may be the same or different.
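  • The patent does not fix the quantizer itself; the sketch below assumes a simple symmetric uniform scheme to show how one set of floating-point deep convolution weights could be quantized at two bit widths to produce the w8a8 and w16a8 variants (the weight shapes are placeholders):

```python
import numpy as np

def quantize(w: np.ndarray, bits: int):
    """Quantize a float tensor to signed `bits`-bit fixed-point integers plus
    a shared scale (symmetric uniform quantization, an assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()) / qmax, 1e-8)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

# Stand-in for trained 32-bit floating-point deep convolution weights.
dw_weights = np.random.randn(32, 1, 3, 3).astype(np.float32)

dw_w8, s8 = quantize(dw_weights, bits=8)     # first model: w8a8 deep conv weights
dw_w16, s16 = quantize(dw_weights, bits=16)  # second model: w16a8 deep conv weights
```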
  • S301: According to the accuracy of the fixed-point network models, select one of the at least two fixed-point network models.
  • the accuracy of a fixed-point network model and the computing power it requires are related to its quantization bit widths. Step S201 produces at least two fixed-point network models with different precisions, corresponding to different computing requirements: more quantization bits mean higher accuracy and greater required computing power; fewer quantization bits mean lower accuracy and smaller required computing power.
  • for example, since the deep convolution layer weights of the second fixed-point network model use 16 bits while those of the first fixed-point network model use 8 bits, the second fixed-point network model is more accurate than the first but requires more computing power. This step therefore selects one fixed-point network model from the at least two, so as to reduce the required computing power as much as possible without significantly degrading accuracy, thereby reducing bandwidth requirements and balancing accuracy against bandwidth.
  • the same test data is first input into the at least two fixed-point network models, and the fixed-point network models run inference on the test data to obtain processing results.
  • the test data uses an 8-bit integer (i8), which can be image data, or other data besides image data, such as voice data.
  • when the first and second fixed-point network models described above are used, the test image is input into the first fixed-point network model and the second fixed-point network model, and each model runs inference on the test image to obtain an image processing result.
  • the accuracy value of the processing result is characterized by the mean average precision (mAP).
  • the accuracy value of the first fixed-point network model is called the first accuracy value
  • the accuracy value of the second fixed-point network model is called the second accuracy value.
  • the accuracy value of the processing result can also be characterized by other parameters such as the average precision (AP).
  • the threshold may be set according to experience or accuracy requirements, for example, it may be 1%. In other words, it is judged whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the fixed-point network model with the highest accuracy.
  • for example, if the second fixed-point network model is the fixed-point network model with the highest accuracy, it is judged whether the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%.
  • if no such fixed-point network model exists, the fixed-point network model with the highest accuracy is selected.
  • when the first and second fixed-point network models are used: if the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
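  • Assuming the candidate models are ordered from lowest to highest required computing power, the selection rule just described reduces to a few lines; the 0.01 threshold mirrors the 1% example above, and all names are illustrative:

```python
def select_model(models, map_scores, threshold=0.01):
    """Return the cheapest fixed-point model whose mAP is within `threshold`
    of the most accurate one, falling back to the most accurate model.
    `models` must be ordered from lowest to highest computing cost."""
    best = max(map_scores)
    for model, score in zip(models, map_scores):
        if best - score <= threshold:
            return model  # first (cheapest) model close enough to the best
    return models[map_scores.index(best)]  # unreachable: best always qualifies

# e.g. select_model([model_w8a8, model_w16a8], [0.712, 0.719]) -> model_w8a8
```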
  • after a fixed-point network model is selected, the input data can be processed with it to obtain the data processing result.
  • the data processing method based on the deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models with different precisions and selects one of them, according to its accuracy, to process data.
  • this reduces the required computing power and bandwidth as much as possible while maintaining network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
  • Another embodiment of the present disclosure provides a data processing method based on a deep neural network.
  • features that are the same as or similar to those of the previous embodiment will not be repeated; only the features that differ from the previous embodiment are described below.
  • in step S401, during processing of the data with the selected fixed-point network model, the deep convolution result of the deep convolution layer is stored in on-chip memory, the deep convolution result is read from the on-chip memory, and the point convolution layer processes the deep convolution result.
  • each hidden layer processes the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer.
  • the input data and output data of each hidden layer are called feature maps.
  • the feature maps are processed by tiles.
  • a data block (tile) of the feature map is stored in the on-chip memory.
  • the size of the data block is equal to the size of the convolution kernel of the following deep convolution layer.
  • the on-chip memory refers to memory internal to the processor rather than external memory; it may be on-chip RAM or a cache.
  • the data block is read from the on-chip memory, and the deep convolution layer processes the data block to obtain the deep convolution result of the data block.
  • the data block is convolved with the weight of the convolution kernel, and the convolution result is superimposed with the offset value.
  • [1, C, Ht, Wt] represents the parameters of the data block, where C represents the number of input channels, and Ht and Wt represent the height and width of the data block, respectively.
  • [C, 1, Hw, Ww] represents the parameters of the weight matrix of the deep convolution layer, where C represents the number of input channels, and Hw and Ww represent the height and width of the weight matrix, respectively.
  • [C, 1, 1, 1] represents the parameters of the bias values of the deep convolution layer, where C represents the number of input channels. If the deep convolution layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the deep convolution result.
  • the deep convolution result of the data block is stored in the on-chip memory, that is, stored in the on-chip memory or cache of the processor, rather than stored in an off-chip memory such as DDR.
  • the above-mentioned deep convolution operation can be used to process the feature maps of each input channel in parallel to improve computing efficiency.
  • the deep convolution result is read from the processor's on-chip memory or cache instead of off-chip memory such as DDR.
  • the point convolution layer processes the deep convolution result of the data block to obtain the point convolution result of the data block.
  • the deep convolution result of the data block for each input channel is convolved with the weights of the convolution kernel, the convolution results are accumulated across channels, and then the offset value is added.
  • [1, C, 1, 1] represents the parameters of the weight matrix of the point convolution layer, where C represents the number of input channels, and the third and fourth elements represent the height and width of the weight matrix; that is, the weight matrix of the point convolution layer is a 1×1 matrix.
  • [1, 1, 1, 1] represents the parameters of the offset value of the point convolution layer.
  • if the point convolution layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the point convolution result of the data block.
  • the point convolution layer generally has multiple output channels, and the aforementioned point convolution operation can be used to process each output channel in parallel to improve computing efficiency.
  • in the data processing method based on the deep neural network of this embodiment, the deep convolution result is stored in on-chip memory and the point convolution layer reads it from the on-chip memory for processing, so the reads and writes of the intermediate result (the deep convolution result) are all done on-chip, without accessing off-chip memory.
  • compared with writing intermediate results to off-chip memory and reading them back, this further saves the bandwidth of mobile devices, improves the performance of mobile devices in executing deep neural network operations, and can support lower-end mobile devices with low computing power and little bandwidth.
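  • The NumPy sketch below illustrates this dataflow only (it is not the patent's implementation): each tile's deep convolution result lives in a local variable standing in for on-chip memory and is consumed immediately by the point convolution, so it never touches simulated off-chip storage; bias, activation, and quantization are omitted for brevity:

```python
import numpy as np

def fused_tile_pipeline(feature_map, dw_kernels, pw_weights):
    """Per-tile deep convolution -> point convolution.
    feature_map: [C, H, W]; dw_kernels: [C, Hw, Ww] (one kernel per input
    channel); pw_weights: [Cout, C] (the 1x1 point convolution weights)."""
    C, H, W = feature_map.shape
    Hw, Ww = dw_kernels.shape[1:]
    out = np.zeros((pw_weights.shape[0], H - Hw + 1, W - Ww + 1),
                   dtype=np.float32)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            tile = feature_map[:, i:i + Hw, j:j + Ww]  # one data block (tile)
            # "On-chip" intermediate: deep convolution result of this tile.
            dw = (tile * dw_kernels).sum(axis=(1, 2))  # shape [C]
            # The point convolution consumes it immediately; nothing is
            # written to (simulated) off-chip memory such as DDR.
            out[:, i, j] = pw_weights @ dw
    return out
```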
  • Another embodiment of the present disclosure provides a data processing device based on a deep neural network, including:
  • the obtaining unit is used to obtain the floating point network model of the deep neural network.
  • the quantization unit is used to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions.
  • the selection unit is configured to select one of at least two fixed-point network models according to the accuracy of the fixed-point network model.
  • the processing unit uses the selected fixed-point network model to process the data.
  • the data processing apparatus of this embodiment is used for mobile equipment, and the mobile equipment can directly obtain a floating-point network model from the outside, and the floating-point network model is a trained deep neural network.
  • Training a deep neural network is performed in two steps. First, build a deep separable convolutional neural network. Then, train the constructed deep-separable convolutional neural network to obtain a deep-separable convolutional neural network model.
  • the depthwise separable convolutional neural network constructed in this embodiment can be MobileNet or another type of neural network.
  • the convolutional layers of these neural networks include deep convolutional layers and point convolutional layers.
  • training images are used to train the depthwise separable convolutional neural network.
  • 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network, and the weights, bias values, and activation values in the deep convolution layer and the point convolution layer are all represented by 32-bit floating-point numbers.
  • the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the volume of network parameters and the amount of training computation are very large, and mobile devices cannot provide enough computing resources. Deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
  • the quantization method used by the quantization unit can be called mixed precision quantization.
  • for the deep convolution layer of the floating-point network model, different bit widths are used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the point convolution layer, a single bit width is used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the deep convolution layer, the weights can be quantized into 8-bit and 16-bit fixed-point numbers respectively; for the 8-bit fixed-point deep convolution layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point deep convolution layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the point convolution layer, the weights and activation values are quantized into 8-bit fixed-point numbers.
  • in this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model.
  • in the first fixed-point network model, the weights and activation values of the deep convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8);
  • in the second fixed-point network model, the weights of the deep convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8).
  • the activation values of the deep convolution layer can also be quantized with mixed precision; that is, a single bit width is used to quantize the weights of the deep convolution layer, and different bit widths are used to quantize its activation values.
  • the weights and activation values of the point convolution layer can also be quantized with mixed precision, including: for the deep convolution layer of the floating-point network model, quantizing the weights with a single bit width and the activation values with a single bit width; and for the point convolution layer, quantizing the weights with different bit widths and the activation values with a single bit width, or quantizing the weights with a single bit width and the activation values with different bit widths.
  • the quantization bit widths of the first and second fixed-point network models above are only an example; this embodiment may also use quantization bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the deep convolution layer and the point convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the quantization bit width of the weights and that of the activation values may be the same or different.
  • the accuracy of a fixed-point network model and the computing power it requires are related to its quantization bit widths. The at least two fixed-point network models with different precisions obtained by the quantization unit correspond to different computing requirements: more quantization bits mean higher accuracy and greater required computing power; fewer quantization bits mean lower accuracy and smaller required computing power.
  • the selection unit needs to select one fixed-point network model from the at least two fixed-point network models, so as to reduce the required computing power as much as possible without significantly degrading accuracy, thereby reducing bandwidth requirements and balancing accuracy against bandwidth.
  • the selection unit first inputs the same test data into the at least two fixed-point network models, and the fixed-point network models run inference on the test data to obtain processing results.
  • the test data uses an 8-bit integer (i8), which can be image data, or other data besides image data, such as voice data.
  • when the first and second fixed-point network models described above are used, the test image is input into the first fixed-point network model and the second fixed-point network model, and each model runs inference on the test image to obtain an image processing result.
  • the selection unit obtains the accuracy value of the processing result of each fixed-point network model.
  • the accuracy value of the processing result is characterized by the mean average precision (mAP).
  • the accuracy value of the first fixed-point network model is called the first accuracy value
  • the accuracy value of the second fixed-point network model is called the second accuracy value.
  • the accuracy value of the processing result can also be characterized by other parameters such as the average precision (AP).
  • the selection unit judges whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the fixed-point network model with the highest accuracy by no more than a threshold.
  • the threshold may be set according to experience or accuracy requirements, for example, it may be 1%. In other words, it is judged whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the fixed-point network model with the highest accuracy.
  • for example, if the second fixed-point network model is the fixed-point network model with the highest accuracy, it is judged whether the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%.
  • if no such fixed-point network model exists, the fixed-point network model with the highest accuracy is selected.
  • when the first and second fixed-point network models are used: if the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
  • the processing unit uses the selected fixed-point network model to process the data. After the fixed-point network model is selected, the processing unit can process the input data to obtain the data processing result.
  • the data processing device based on the deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models with different precisions and selects one of them, according to its accuracy, to process data.
  • this reduces the required computing power and bandwidth as much as possible while maintaining network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
  • during processing of the data with the selected fixed-point network model, the processing unit stores the deep convolution result of the deep convolution layer in on-chip memory and reads the stored deep convolution result back from the on-chip memory, and the point convolution layer processes the deep convolution result.
  • the processing unit uses each hidden layer to process the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer.
  • the input data and output data of each hidden layer are called feature maps.
  • the feature maps are processed by tiles.
  • the processing unit first performs a deep convolution operation:
  • a data block (tile) of the feature map is stored in the on-chip memory.
  • the size of the data block is equal to the size of the convolution kernel of the following deep convolution layer.
  • the on-chip memory refers to memory internal to the processing unit rather than external memory; it may be on-chip RAM or a cache.
  • the deep convolution layer processes the data block to obtain the deep convolution result of the data block.
  • the data block is convolved with the weight of the convolution kernel, and the convolution result is superimposed with the offset value. If the deep convolutional layer has an activation operation, the activation function is used to activate the output value of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the deep convolution result.
  • the deep convolution result of the data block is stored in the on-chip memory, that is, stored in the on-chip memory or cache of the processing unit, instead of being stored in an off-chip memory such as DDR.
  • the processing unit can use the aforementioned deep convolution operation to process the feature maps of each input channel in parallel to improve computing efficiency.
  • the deep convolution result of the data block stored in the on-chip memory is read from the on-chip memory or cache of the processing unit instead of off-chip memory such as DDR.
  • the point convolution layer processes the deep convolution result of the data block to obtain the point convolution result of the data block.
  • the deep convolution result of the data block for each input channel is convolved with the weights of the convolution kernel, the convolution results are accumulated across channels, and then the offset value is added. If the point convolution layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the point convolution result of the data block.
  • the point convolution layer generally has multiple output channels, and the processing unit can use the above point convolution operation to process each output channel in parallel to improve the computing efficiency.
  • in the data processing device based on the deep neural network of this embodiment, the deep convolution result is stored in on-chip memory and the point convolution layer reads it from the on-chip memory for processing, so the reads and writes of the intermediate result (the deep convolution result) are all done on-chip, without accessing off-chip memory.
  • compared with writing intermediate results to off-chip memory and reading them back, this further saves the bandwidth of mobile devices, improves the performance of mobile devices in executing deep neural network operations, and can support lower-end mobile devices with low computing power and little bandwidth.
  • the embodiments of the present disclosure do not limit the implementation form of the deep neural network-based data processing device and its acquisition unit, quantization unit, selection unit, and processing unit.
  • each unit can be implemented by a separate processor, and some or all of the units can also be implemented by one processor.
  • the processor mentioned in the embodiments of the present disclosure may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the on-chip memory mentioned in the embodiments of the present disclosure refers to memory integrated inside the processor, which may be volatile memory, non-volatile memory, or a combination of both.
  • an embodiment of the present disclosure also provides a mobile device, which includes the deep neural network-based data processing device of any of the foregoing embodiments.
  • the mobile device may be a portable mobile terminal, a drone, a handheld gimbal, a remote control, etc.
  • the portable mobile terminal may be a cell phone, a tablet computer, etc.
  • the remote control may be a remote control of a drone.

Abstract

Disclosed are a data processing method and apparatus based on a deep neural network, and a mobile device. The method comprises: acquiring a floating-point network model of a deep neural network; quantizing the floating-point network model to obtain at least two fixed-point network models with different accuracies; selecting one of the at least two fixed-point network models according to the accuracies of the fixed-point network models; and using the selected fixed-point network model to process data.

Description

Data processing method, device and mobile device based on deep neural network
Technical field
The present disclosure relates to the technical field of artificial neural networks, and in particular to a data processing method, device and mobile device based on a deep neural network.
Background art
Deep neural networks have been widely used in mobile devices. When a deep neural network is deployed on a mobile device, its limited computing resources require the network model to be quantized. If many quantization bits are used, bandwidth pressure is high; if few are used, network accuracy suffers. Deep neural networks on mobile devices therefore face a conflict between ensuring network accuracy and reducing bandwidth.
At the same time, deep neural network operations produce a large number of intermediate results. The prior art often stores these intermediate results in off-chip memory and then reads them back, and such frequent reads and writes to off-chip memory further increase the bandwidth pressure.
Summary
The present disclosure provides a data processing method based on a deep neural network, which includes:
acquiring a floating-point network model of the deep neural network;
quantizing the floating-point network model to obtain at least two fixed-point network models with different precisions;
selecting one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
processing data using the selected fixed-point network model.
The present disclosure also provides a data processing device based on a deep neural network, which includes:
an obtaining unit configured to obtain the floating-point network model of the deep neural network;
a quantization unit configured to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions;
a selection unit configured to select one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
a processing unit configured to process data using the selected fixed-point network model.
The present disclosure also provides a mobile device, which includes the above data processing device based on a deep neural network.
It can be seen from the above technical solutions that the present disclosure has at least the following beneficial effects:
By quantizing the floating-point network model into at least two fixed-point network models with different precisions and selecting one of them according to its accuracy to process data, the required computing power can be reduced as much as possible while network accuracy is maintained, bandwidth requirements are reduced, and accuracy and bandwidth are balanced, effectively resolving the conflict between network accuracy and bandwidth and improving the performance of mobile devices in executing deep neural network operations.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a flowchart of a data processing method based on a deep neural network according to an embodiment of the disclosure.
FIG. 2 is a data diagram of a data processing method based on a deep neural network according to an embodiment of the disclosure.
FIG. 3 is a schematic diagram of the structure of a convolutional layer of a deep neural network.
FIG. 4 is a schematic diagram of a deep convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
FIG. 5 is a schematic diagram of a point convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
Detailed description
The technical solutions of the present disclosure will be described clearly and completely below with reference to the embodiments and the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present disclosure.
An embodiment of the present disclosure provides a data processing method based on a deep neural network. A deep neural network (DNN) generally refers to an artificial neural network including an input layer, multiple hidden layers, and an output layer. Deep neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM) and many other types of neural networks. In this embodiment, a convolutional neural network is taken as an example to describe the data processing method, but those skilled in the art should understand that the data processing method of this embodiment is not limited to convolutional neural networks and is applicable to all types of deep neural networks.
For a convolutional neural network, each hidden layer includes multiple operations such as convolution, bias, normalization (BN), activation, and quantization. The convolution operation is generally called a convolution layer.
It should be noted that, for a convolutional neural network, the above description of the hidden layers is only exemplary and does not limit the order or number of the operations or layers; each operation or layer can have many variations, and the position and number of several operations or layers can change. For example, some hidden layers may have no normalization or activation operations, and some hidden layers may also include other operations or layers such as pooling and fully connected layers. When the convolutional neural network is a fixed-point network, the hidden layers include quantization operations; when it is a floating-point network, they do not.
In this embodiment, the depthwise separable convolution in the convolutional neural network is taken as an example to describe the data processing method. The convolutional layer introduced above is a standard convolutional layer, which performs standard convolution operations. In a depthwise separable convolutional neural network, the convolutional layer in the hidden layer includes two convolutional layers obtained by splitting the standard convolutional layer: a deep convolution layer and a point convolution layer, which perform the deep convolution operation and the point convolution operation, respectively. Convolutional neural networks generally operate on data from multiple input channels. The deep convolution layer first performs deep convolution on the data of each input channel with one convolution kernel to obtain the deep convolution result of each input channel; the point convolution layer then performs point convolution on the deep convolution results, fusing the information of the input channels. Compared with a standard convolutional neural network, a depthwise separable convolutional neural network requires far fewer parameters and much less computation, and is especially suitable for scenarios with limited computing resources such as mobile devices.
The depthwise separable convolutional neural network in this embodiment can be implemented in many different ways, for example as the MobileNet network model. The convolution kernel of the point convolution layer of MobileNet is 1×1. This embodiment may use the standard 28-layer MobileNet network model, or a depthwise separable convolutional neural network obtained by shrinking, expanding, or otherwise transforming the standard MobileNet model.
The data processing method based on the deep neural network of this embodiment, as shown in FIG. 1 and FIG. 2, includes the following steps:
S101: Obtain the floating-point network model of the deep neural network.
The data processing method of this embodiment is suitable for mobile devices. In this step, the mobile device can directly obtain the floating-point network model from the outside; the floating-point network model is a trained deep neural network.
Training the deep neural network is performed in two steps; see the "training" stage in FIG. 2:
First, construct a depthwise separable convolutional neural network.
Then, train the constructed network to obtain a depthwise separable convolutional neural network model.
As mentioned above, the depthwise separable convolutional neural network constructed in this step can be MobileNet or another type of neural network; the convolutional layers of these networks all include a deep convolution layer and a point convolution layer. As shown in FIG. 3, in one example, both the deep convolution layer and the point convolution layer include convolution, bias, activation, and quantization operations.
Convolutional neural networks are commonly used in image processing, especially in scenarios such as image recognition and image classification. In this step, training images are therefore used to train the network. The training image data is first normalized to [-1, 1), and the normalized data is then fed into the constructed depthwise separable convolutional neural network for training to obtain the depthwise separable convolutional neural network model.
It should be noted that, to ensure network accuracy, floating-point numbers with a larger bit width are generally used to represent the convolutional neural network during training. In this step, 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network: the weights, bias values, and activation values in the deep convolution layer and the point convolution layer are all represented by 32-bit floating-point numbers, and the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the volume of network parameters and the amount of training computation are very large, and mobile devices cannot provide enough computing resources; deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
The above is only an exemplary description. In this embodiment, floating-point numbers with other bit widths may also be used to represent the depthwise separable convolutional neural network, and the training data may also be data other than image data, such as voice data.
S201: Quantize the floating-point network model to obtain at least two fixed-point network models with different precisions.
Because the storage and computing capabilities of mobile devices are limited, and floating-point operations may not even be supported, directly using the floating-point network model for data processing would place a heavy burden on the storage, computing capability, and power consumption of a mobile device, which might not even be able to complete the processing. The mobile device therefore needs to quantize the floating-point network model into a fixed-point network model. Compared with the floating-point network model, a fixed-point network model requires less storage and computing power, making it well suited to mobile devices.
Referring to the "quantization" stage in FIG. 2, the quantization used in this step can be called mixed precision quantization. Specifically, for the deep convolution layer of the floating-point network model, different bit widths are used to quantize the weights, and a single bit width is used to quantize the activation values; for the point convolution layer, a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values. By quantizing the weights of the deep convolution layer with mixed precision, at least two fixed-point network models corresponding to the weight quantization bit widths are obtained.
In this embodiment, for the deep convolution layer, the weights are quantized into 8-bit and 16-bit fixed-point numbers respectively; for the 8-bit fixed-point deep convolution layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point deep convolution layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the point convolution layer, the weights and activation values are quantized into 8-bit fixed-point numbers.
In this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model. In the first fixed-point network model, the weights and activation values of the deep convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8). In the second fixed-point network model, the weights of the deep convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8).
This step has been described above through an example, but this embodiment is not limited to it. For example, the activation values of the deep convolution layer can also be quantized with mixed precision; that is, a single bit width is used to quantize the weights of the deep convolution layer, and different bit widths are used to quantize its activation values.
The weights and activation values of the point convolution layer can also be quantized with mixed precision, including: for the deep convolution layer of the floating-point network model, quantizing the weights with a single bit width and the activation values with a single bit width; and for the point convolution layer, quantizing the weights with different bit widths and the activation values with a single bit width, or quantizing the weights with a single bit width and the activation values with different bit widths.
Mixed precision quantization can also be applied to both the deep convolution layer and the point convolution layer; the specific quantization is similar to that of the deep convolution layer or the point convolution layer described above.
Those skilled in the art should understand that the quantization bit widths of the first and second fixed-point network models above are only an example; this embodiment may also use quantization bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the deep convolution layer and the point convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the quantization bit width of the weights and that of the activation values may be the same or different.
S301: selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models.
The accuracy of a fixed-point network model and the computing power it requires are both related to its quantization bit widths. Step S201 yields at least two fixed-point network models of different accuracies, corresponding to different computing-power requirements. The more quantization bits, the higher the accuracy and the greater the required computing power; the fewer quantization bits, the lower the accuracy and the smaller the required computing power. For example, for the first and second fixed-point network models, since the depthwise-convolution weight bit width of the second fixed-point network model is 16 while that of the first is 8, the accuracy of the second fixed-point network model is better than that of the first, but its required computing power is also greater. Therefore, this step selects one fixed-point network model from the at least two fixed-point network models, so as to reduce the required computing power as much as possible while the accuracy drops only slightly, thereby lowering the bandwidth requirement and balancing accuracy against bandwidth.
Referring to the "selection" stage in FIG. 2: specifically, the same test data is first input into the at least two fixed-point network models, and the at least two fixed-point network models each perform inference on the test data to obtain processing results. The test data uses 8-bit integers (i8) and may be image data, or data other than image data, such as speech data.
When the above first and second fixed-point network models are used, a test image is input into the first fixed-point network model and the second fixed-point network model, and each of them performs inference on the test image to obtain an image processing result.
Next, the accuracy value of the processing result of each fixed-point network model is obtained.
In this embodiment, the accuracy value of a processing result is characterized by the mean average precision (mAP). When the above first and second fixed-point network models are used, the accuracy value of the first fixed-point network model is called the first accuracy value, and the accuracy value of the second fixed-point network model is called the second accuracy value. Of course, the accuracy value of a processing result may also be characterized by other metrics such as the average precision (AP).
It is then determined whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold.
The most accurate fixed-point network model serves as the reference, and the accuracy values of the other fixed-point network models are compared against it. The threshold may be set empirically or according to the accuracy requirement; for example, it may be 1%. That is, it is determined whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the most accurate model. When the above first and second fixed-point network models are used, the second fixed-point network model is the most accurate one, and it is determined whether the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%.
If not, the most accurate fixed-point network model is taken as the selected fixed-point network model; if so, the fixed-point network model occupying the least memory among the at least one qualifying fixed-point network model is taken as the selected fixed-point network model.
If not, the accuracy values of the other fixed-point network models differ too much from that of the most accurate model. Selecting one of them would reduce the required computing power and the bandwidth requirement, but would seriously degrade the network accuracy; therefore, to guarantee the network accuracy, the most accurate fixed-point network model is selected in this case.
If so, the accuracy values of the other fixed-point network models differ only slightly from that of the most accurate model and will not unduly affect the network accuracy; therefore, to reduce the required computing power and the bandwidth requirement as much as possible, the fixed-point network model occupying the least memory is selected, and the network accuracy is still guaranteed not to drop noticeably.
When the above first and second fixed-point network models are used, if the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
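The selection logic of this stage can be summarized in a short sketch; the model names, mAP figures, and memory sizes below are illustrative placeholders, and the 1% threshold follows the example above:

```python
from typing import Dict

def select_model(acc: Dict[str, float], mem: Dict[str, int],
                 threshold: float = 0.01) -> str:
    """Pick the lowest-memory model whose mAP is within `threshold` of the best.

    `acc` maps model name to its mAP on the shared test data, and `mem` maps
    model name to its memory footprint; both are assumed inputs.
    """
    best = max(acc, key=acc.get)
    # Candidates whose accuracy gap to the most accurate model is within the threshold.
    close = [m for m in acc if m != best and acc[best] - acc[m] <= threshold]
    if not close:
        return best                  # every cheaper model loses too much accuracy
    return min(close, key=mem.get)   # smallest-footprint qualifying model

# Example with the two models of this embodiment (illustrative numbers):
choice = select_model(acc={"w8a8": 0.712, "w16a8": 0.718},
                      mem={"w8a8": 4_000_000, "w16a8": 7_500_000})
# 0.718 - 0.712 = 0.006 <= 0.01, so the smaller w8a8 model is selected.
```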
S401: processing data using the selected fixed-point network model.
Referring to the "processing" stage in FIG. 2, once the fixed-point network model is selected, the input data can be processed with it to obtain the data processing result.
It can thus be seen that the data processing method based on a deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models of different accuracies and, according to the accuracy of the fixed-point network models, selects one of them to process the data. This reduces the required computing power and the bandwidth requirement as much as possible while guaranteeing the network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
Another embodiment of the present disclosure provides a data processing method based on a deep neural network. For brevity, features that are the same as or similar to those of the previous embodiment are not repeated; only the features that differ from the previous embodiment are described below.
In the data processing method of this embodiment, in step S401, in the process of processing data with the selected fixed-point network model, the depthwise convolution result of the depthwise convolution layer is stored into an on-chip memory, the depthwise convolution result stored in the on-chip memory is then read, and the pointwise convolution layer processes the depthwise convolution result.
In a deep neural network, each hidden layer processes its input data and outputs the processing result to the next hidden layer as that layer's input data. When image data is processed, the input and output data of each hidden layer are called feature maps, and the depthwise and pointwise convolution operations process a feature map tile by tile.
In this embodiment, for each convolution layer of the selected fixed-point network model, a depthwise convolution operation is performed first:
First, one data block (tile) of the feature map is stored into the on-chip memory. The tile size equals the kernel size of the depthwise convolution layer described below. The on-chip memory refers to memory inside the processor rather than external memory; it may be on-chip RAM or a cache.
Then the tile is read from the on-chip memory, and the depthwise convolution layer processes the tile to obtain the depthwise convolution result of the tile. As shown in FIG. 4, the tile is convolved with the kernel weights, and a bias value is added to the convolution result. In FIG. 4, [1, C, H_t, W_t] denotes the parameters of the tile, where C is the number of input channels and H_t and W_t are the height and width of the tile, respectively. [C, 1, H_w, W_w] denotes the parameters of the weight matrix of the depthwise convolution layer, where C is the number of input channels and H_w and W_w are the height and width of the weight matrix, respectively. [C, 1, 1, 1] denotes the parameters of the bias values of the depthwise convolution layer, where C is the number of input channels. If the depthwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the depthwise convolution result.
Finally, the depthwise convolution result of the tile is stored into the on-chip memory, that is, into the processor's on-chip RAM or cache, rather than into off-chip memory such as DDR.
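A minimal sketch of this depthwise step on one tile follows; the int32 accumulation, the ReLU activation, and the requantization factor are assumptions, since the publication names the operations but not their numeric details:

```python
import numpy as np

def depthwise_tile(tile, weights, bias, act_scale):
    """Depthwise convolution of one kernel-sized tile (a sketch of the steps above).

    tile:      int8 input block, shape [C, Hw, Ww] (tile size equals kernel size).
    weights:   int8 depthwise kernels, shape [C, Hw, Ww], one kernel per channel.
    bias:      int32 per-channel bias, shape [C].
    act_scale: requantization factor; this and ReLU are assumed choices.
    Returns one int8 output value per channel, to be kept in on-chip memory.
    """
    acc = (tile.astype(np.int32) * weights.astype(np.int32)).sum(axis=(1, 2))
    acc = acc + bias                      # add the per-channel bias value
    acc = np.maximum(acc, 0)              # assumed ReLU activation
    return np.clip(np.round(acc * act_scale), -128, 127).astype(np.int8)
```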
When feature maps of multiple input channels are processed, the above depthwise convolution operation can process the feature maps of the input channels in parallel to improve computational efficiency.
A pointwise convolution operation is then performed:
First, the depthwise convolution result of the tile stored in the on-chip memory is read; that is, the depthwise convolution result is read from the processor's on-chip RAM or cache rather than from off-chip memory such as DDR.
Then the pointwise convolution layer processes the depthwise convolution result of the tile to obtain the pointwise convolution result of the tile. As shown in FIG. 5, the tile of each input channel is convolved with the kernel weights, the convolution results are accumulated, and then a bias value is added. In FIG. 5, [1, C, 1, 1] denotes the parameters of the weight matrix of the pointwise convolution layer, where C is the number of input channels and the third and fourth elements, 1 and 1, are the height and width of the weight matrix; that is, the weight matrix of the pointwise convolution layer is a 1×1 matrix. [1, 1, 1, 1] denotes the parameters of the bias value of the pointwise convolution layer.
If the pointwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the pointwise convolution result of the tile.
Finally, the pointwise convolution result of the tile is stored into the on-chip memory.
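A companion sketch of the pointwise step follows, consuming the per-tile depthwise result as it would be read from the on-chip buffer; as before, the activation and requantization details are assumptions:

```python
import numpy as np

def pointwise_tile(dw_result, weights, bias, act_scale):
    """Pointwise (1x1) convolution over one tile's depthwise result (a sketch).

    dw_result: int8 depthwise output read from on-chip memory, shape [C].
    weights:   int8 1x1 kernels, shape [N, C], for N output channels.
    bias:      int32 per-output-channel bias, shape [N].
    """
    acc = weights.astype(np.int32) @ dw_result.astype(np.int32)  # sum over C
    acc = acc + bias                      # add the per-output-channel bias
    acc = np.maximum(acc, 0)              # assumed ReLU activation
    return np.clip(np.round(acc * act_scale), -128, 127).astype(np.int8)
```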
Since a pointwise convolution layer generally has multiple output channels, the above pointwise convolution operation can process the output channels in parallel to improve computational efficiency.
The tile is then moved, and the above depthwise and pointwise convolution operations are performed on every tile of the feature map until all tiles of the feature map have been processed; a sketch of this loop follows.
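Putting the two sketches together, the per-tile loop might look as follows; stride 1 and the absence of padding are assumptions, and the driver reuses the hypothetical depthwise_tile and pointwise_tile helpers above:

```python
import numpy as np

def fused_conv_layer(feature_map, dw_w, dw_b, pw_w, pw_b, dw_scale, pw_scale):
    """Slide a kernel-sized tile over the feature map, fusing the depthwise
    and pointwise steps so the intermediate depthwise result never leaves
    the on-chip buffer (a sketch under the stated assumptions)."""
    C, H, W = feature_map.shape
    _, Hw, Ww = dw_w.shape
    N = pw_w.shape[0]
    out = np.empty((N, H - Hw + 1, W - Ww + 1), dtype=np.int8)
    for i in range(H - Hw + 1):
        for j in range(W - Ww + 1):
            tile = feature_map[:, i:i + Hw, j:j + Ww]        # one on-chip tile
            dw = depthwise_tile(tile, dw_w, dw_b, dw_scale)  # stays on chip
            out[:, i, j] = pointwise_tile(dw, pw_w, pw_b, pw_scale)
    return out
```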
It can thus be seen that the data processing method based on a deep neural network of this embodiment stores the depthwise convolution result in the on-chip memory, and the pointwise convolution layer reads the depthwise convolution result from the on-chip memory for processing; reading and writing of the intermediate result (the depthwise convolution result) are completed entirely on chip, with no read or write to off-chip memory. Compared with writing the intermediate result to off-chip memory and then reading it back, this further saves the bandwidth of the mobile device, improves the performance of the mobile device in executing deep neural network operations, and can support lower-end mobile devices with little computing power and bandwidth.
Another embodiment of the present disclosure provides a data processing apparatus based on a deep neural network, including:
an obtaining unit for obtaining a floating-point network model of the deep neural network;
a quantization unit for quantizing the floating-point network model to obtain at least two fixed-point network models of different accuracies;
a selection unit for selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
a processing unit for processing data using the selected fixed-point network model.
The data processing apparatus of this embodiment is used in a mobile device. The mobile device may obtain the floating-point network model directly from outside; the floating-point network model is a trained deep neural network.
Training the deep neural network is performed in two steps. First, a depthwise separable convolutional neural network is constructed. Then, the constructed depthwise separable convolutional neural network is trained to obtain a depthwise separable convolutional neural network model.
As described above, the depthwise separable convolutional neural network constructed in this embodiment may be MobileNet or another type of neural network; the convolution layers of these neural networks each include two convolution layers, a depthwise convolution layer and a pointwise convolution layer.
In this embodiment, training images are used to train the depthwise separable convolutional neural network. The training image data is first normalized into [-1, 1), and the normalized data is then input into the constructed depthwise separable convolutional neural network for training, yielding the depthwise separable convolutional neural network model.
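As a concrete illustration, one common way to realize the [-1, 1) normalization is shown below; the publication specifies only the target range, so the exact transform is an assumption:

```python
import numpy as np

def normalize_image(img_u8):
    """Map 8-bit pixel values in [0, 255] into [-1, 1)."""
    return img_u8.astype(np.float32) / 128.0 - 1.0  # 0 -> -1.0, 255 -> ~0.992
```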
In this embodiment, the depthwise separable convolutional neural network may be represented with 32-bit floating-point numbers: the weights, bias values, activation values and so on in the depthwise and pointwise convolution layers are all expressed as 32-bit floating-point numbers, and the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the data volume of the convolutional neural network parameters and the computation required for training are large, and a mobile device cannot provide sufficient computing resources. Therefore, the deep neural network is generally trained not on the mobile device but on a server or computer, and the trained floating-point network model is then ported to the mobile device.
Because the storage and computing capability of a mobile device is limited, and the device may not even support floating-point operations, directly using the floating-point network model for data processing would place a severe burden on the storage, computing capability and power consumption of the mobile device, and the mobile device might not even be able to complete the data processing. Therefore, the mobile device needs to quantize the floating-point network model, obtaining a fixed-point network model after quantization. A fixed-point network model requires less storage and computing power than a floating-point network model and is well suited to mobile devices.
The quantization scheme used by the quantization unit may be called mixed-precision quantization. Specifically, for the depthwise convolution layer of each convolution layer of the floating-point network model, the weights of the depthwise convolution layer are quantized with different bit widths, while its activation values are quantized with a single bit width. For the pointwise convolution layer of the floating-point network model, the weights are quantized with a single bit width and the activation values are quantized with a single bit width. By applying mixed-precision quantization to the weights of the depthwise convolution layer, at least two fixed-point network models are obtained, one for each weight bit width.
In this embodiment, the weights of the depthwise convolution layer may be quantized into 8-bit and 16-bit fixed-point numbers, respectively. For the depthwise convolution layer with 8-bit fixed-point weights, the corresponding activation values are quantized into 8-bit fixed-point numbers; for the depthwise convolution layer with 16-bit fixed-point weights, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the pointwise convolution layer, both the weights and the activation values are quantized into 8-bit fixed-point numbers.
In this manner, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model. In the first fixed-point network model, the weights and activation values of the depthwise convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the pointwise convolution layer are both 8-bit fixed-point numbers (w8a8). In the second fixed-point network model, the weights of the depthwise convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the pointwise convolution layer are both 8-bit fixed-point numbers (w8a8).
The above describes the quantization by way of an example, but this embodiment is not limited thereto. For example, the activation values of the depthwise convolution layer may also be quantized with mixed precision, that is, a single bit width is used to quantize the weights of the depthwise convolution layer while different bit widths are used to quantize its activation values.
The weights and activation values of the pointwise convolution layer may also be quantized with mixed precision, including: for the depthwise convolution layer of each convolution layer of the floating-point network model, quantizing its weights with a single bit width and its activation values with a single bit width; and for the pointwise convolution layer of the floating-point network model, quantizing its weights with different bit widths and its activation values with a single bit width, or quantizing its weights with a single bit width and its activation values with different bit widths.
Mixed-precision quantization may also be applied to both the depthwise convolution layer and the pointwise convolution layer; the specific quantization manner is similar to that described above for the depthwise convolution layer or the pointwise convolution layer.
Those skilled in the art should understand that the quantization bit widths of the above first and second fixed-point network models are merely an example. This embodiment may also use quantization bit widths other than 8 bits and 16 bits, and may quantize the floating-point network model into three or more fixed-point network models. For the depthwise convolution layer and the pointwise convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the weight bit width and the activation bit width may be the same or different.
The accuracy of a fixed-point network model and the computing power it requires are both related to its quantization bit widths. The at least two fixed-point network models of different accuracies obtained by the quantization unit correspond to different computing-power requirements. The more quantization bits, the higher the accuracy and the greater the required computing power; the fewer quantization bits, the lower the accuracy and the smaller the required computing power. For example, for the first and second fixed-point network models, since the depthwise-convolution weight bit width of the second fixed-point network model is 16 while that of the first is 8, the accuracy of the second fixed-point network model is better than that of the first, but its required computing power is also greater. Therefore, the selection unit selects one fixed-point network model from the at least two fixed-point network models, so as to reduce the required computing power as much as possible while the accuracy drops only slightly, thereby lowering the bandwidth requirement and balancing accuracy against bandwidth.
Specifically, the selection unit first inputs the same test data into the at least two fixed-point network models, and the at least two fixed-point network models each perform inference on the test data to obtain processing results. The test data uses 8-bit integers (i8) and may be image data, or data other than image data, such as speech data.
When the above first and second fixed-point network models are used, a test image is input into the first fixed-point network model and the second fixed-point network model, and each of them performs inference on the test image to obtain an image processing result.
The selection unit then obtains the accuracy value of the processing result of each fixed-point network model.
In this embodiment, the accuracy value of a processing result is characterized by the mean average precision (mAP). When the above first and second fixed-point network models are used, the accuracy value of the first fixed-point network model is called the first accuracy value, and the accuracy value of the second fixed-point network model is called the second accuracy value. Of course, the accuracy value of a processing result may also be characterized by other metrics such as the average precision (AP).
The selection unit then determines whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold.
The most accurate fixed-point network model serves as the reference, and the accuracy values of the other fixed-point network models are compared against it. The threshold may be set empirically or according to the accuracy requirement; for example, it may be 1%. That is, it is determined whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the most accurate model. When the above first and second fixed-point network models are used, the second fixed-point network model is the most accurate one, and it is determined whether the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%.
If not, the most accurate fixed-point network model is taken as the selected fixed-point network model; if so, the fixed-point network model occupying the least memory among the at least one qualifying fixed-point network model is taken as the selected fixed-point network model.
If not, the accuracy values of the other fixed-point network models differ too much from that of the most accurate model. Selecting one of them would reduce the required computing power and the bandwidth requirement, but would seriously degrade the network accuracy; therefore, to guarantee the network accuracy, the most accurate fixed-point network model is selected in this case.
If so, the accuracy values of the other fixed-point network models differ only slightly from that of the most accurate model and will not unduly affect the network accuracy; therefore, to reduce the required computing power and the bandwidth requirement as much as possible, the fixed-point network model occupying the least memory is selected, and the network accuracy is still guaranteed not to drop noticeably.
When the above first and second fixed-point network models are used, if the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
The processing unit processes the data using the selected fixed-point network model. Once the fixed-point network model is selected, the processing unit can process the input data to obtain the data processing result.
It can thus be seen that the data processing apparatus based on a deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models of different accuracies and, according to the accuracy of the fixed-point network models, selects one of them to process the data. This reduces the required computing power and the bandwidth requirement as much as possible while guaranteeing the network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
For brevity, in the data processing apparatus based on a deep neural network of another embodiment of the present disclosure, features that are the same as or similar to those of the previous embodiment are not repeated; only the features that differ from the previous embodiment are described below.
In the data processing apparatus of this embodiment, in the process of processing data with the selected fixed-point network model, the processing unit stores the depthwise convolution result of the depthwise convolution layer into an on-chip memory and reads the depthwise convolution result stored in the on-chip memory, and the pointwise convolution layer processes the depthwise convolution result.
In the deep neural network, the processing unit uses each hidden layer to process its input data and outputs the processing result to the next hidden layer as that layer's input data. When image data is processed, the input and output data of each hidden layer are called feature maps, and the depthwise and pointwise convolution operations process a feature map tile by tile.
In this embodiment, for each convolution layer of the selected fixed-point network model, the processing unit first performs a depthwise convolution operation:
First, one data block (tile) of the feature map is stored into the on-chip memory. The tile size equals the kernel size of the depthwise convolution layer described below. The on-chip memory refers to memory inside the processing unit rather than external memory; it may be on-chip RAM or a cache.
Then the tile is read from the on-chip memory, and the depthwise convolution layer processes the tile to obtain the depthwise convolution result of the tile. The tile is convolved with the kernel weights, and a bias value is added to the convolution result. If the depthwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the depthwise convolution result.
Finally, the depthwise convolution result of the tile is stored into the on-chip memory, that is, into the on-chip RAM or cache of the processing unit, rather than into off-chip memory such as DDR.
When feature maps of multiple input channels are processed, the processing unit can use the above depthwise convolution operation to process the feature maps of the input channels in parallel to improve computational efficiency.
The processing unit then performs a pointwise convolution operation:
First, the depthwise convolution result of the tile stored in the on-chip memory is read; that is, the depthwise convolution result is read from the on-chip RAM or cache of the processing unit rather than from off-chip memory such as DDR.
Then the pointwise convolution layer processes the depthwise convolution result of the tile to obtain the pointwise convolution result of the tile. The tile of each input channel is convolved with the kernel weights, the convolution results are accumulated, and then a bias value is added. If the pointwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the pointwise convolution result of the tile.
Finally, the pointwise convolution result of the tile is stored into the on-chip memory.
Since a pointwise convolution layer generally has multiple output channels, the processing unit can use the above pointwise convolution operation to process the output channels in parallel to improve computational efficiency.
The tile is then moved, and the above depthwise and pointwise convolution operations are performed on every tile of the feature map until all tiles of the feature map have been processed.
It can thus be seen that the data processing apparatus based on a deep neural network of this embodiment stores the depthwise convolution result in the on-chip memory, and the pointwise convolution layer reads the depthwise convolution result from the on-chip memory for processing; reading and writing of the intermediate result (the depthwise convolution result) are completed entirely on chip, with no read or write to off-chip memory. Compared with writing the intermediate result to off-chip memory and then reading it back, this further saves the bandwidth of the mobile device, improves the performance of the mobile device in executing deep neural network operations, and can support lower-end mobile devices with little computing power and bandwidth.
It should be noted that the embodiments of the present disclosure do not limit the implementation form of the data processing apparatus based on a deep neural network or of its obtaining unit, quantization unit, selection unit and processing unit. Each of the obtaining unit, quantization unit, selection unit and processing unit may be implemented by a separate processor, or some or all of the units may be implemented by one processor. It should be understood that the processor mentioned in the embodiments of the present disclosure may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the on-chip memory mentioned in the embodiments of the present disclosure refers to memory integrated inside the processor; it may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
Yet another embodiment of the present disclosure provides a mobile device, which includes the data processing apparatus based on a deep neural network of any of the above embodiments. The mobile device may be a portable mobile terminal, an unmanned aerial vehicle, a handheld gimbal, a remote controller or the like; the portable mobile terminal may be a mobile phone, a tablet computer or the like, and the remote controller may be a remote controller of an unmanned aerial vehicle.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; where no conflict arises, the features in the embodiments of the present disclosure may be combined arbitrarily; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (39)

  1. A data processing method based on a deep neural network, comprising:
    obtaining a floating-point network model of the deep neural network;
    quantizing the floating-point network model to obtain at least two fixed-point network models of different accuracies;
    selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
    processing data using the selected fixed-point network model.
  2. The data processing method according to claim 1, wherein the deep neural network is a depthwise separable convolutional neural network.
  3. The data processing method according to claim 2, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer.
  4. The data processing method according to claim 3, wherein quantizing the floating-point network model comprises:
    quantizing the weights of the depthwise convolution layer with different bit widths, and quantizing the activation values with a single bit width.
  5. The data processing method according to claim 3, wherein quantizing the floating-point network model comprises:
    quantizing the weights of the pointwise convolution layer with different bit widths, and quantizing the activation values with a single bit width.
  6. The data processing method according to claim 3, wherein quantizing the floating-point network model comprises:
    for both the depthwise convolution layer and the pointwise convolution layer, quantizing the weights with different bit widths and quantizing the activation values with a single bit width.
  7. The data processing method according to claim 4, wherein, for the pointwise convolution layer, the weights are quantized with a single bit width and the activation values are quantized with a single bit width.
  8. The data processing method according to claim 7, wherein the at least two fixed-point network models comprise a first fixed-point network model and a second fixed-point network model;
    in the first fixed-point network model, the weights and activation values of the depthwise convolution layer each have a first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits; and
    in the second fixed-point network model, the weights of the depthwise convolution layer have a second number of bits and the activation values have the first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits.
  9. The data processing method according to claim 8, wherein the first number of bits is eight and the second number of bits is sixteen.
  10. The data processing method according to claim 1, wherein selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models comprises:
    processing the same test data with the at least two fixed-point network models;
    obtaining an accuracy value of the processing result of each of the fixed-point network models;
    determining whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold; and
    if not, taking the most accurate fixed-point network model as the selected fixed-point network model, and if so, taking the fixed-point network model occupying the least memory among the at least one fixed-point network model as the selected fixed-point network model.
  11. The data processing method according to claim 10, wherein the deep neural network is a depthwise separable convolutional neural network.
  12. The data processing method according to claim 11, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    the accuracy of the fixed-point network model corresponds to the weight bit width and activation bit width of at least one of the depthwise convolution layer and the pointwise convolution layer.
  13. The data processing method according to claim 12, wherein, among the at least two fixed-point network models, the weight bit widths of the depthwise convolution layers differ from one another while their activation bit widths are the same, and the weight bit widths of the pointwise convolution layers are the same and their activation bit widths are the same.
  14. The data processing method according to claim 13, wherein the at least two fixed-point network models comprise a first fixed-point network model and a second fixed-point network model;
    in the first fixed-point network model, the weights and activation values of the depthwise convolution layer each have a first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits;
    in the second fixed-point network model, the weights of the depthwise convolution layer have a second number of bits and the activation values have the first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits;
    the fixed-point network model occupying the least memory is the first fixed-point network model; and
    the most accurate fixed-point network model is the second fixed-point network model.
  15. The data processing method according to claim 14, wherein the first number of bits is eight and the second number of bits is sixteen.
  16. The data processing method according to claim 2, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    processing data using the selected fixed-point network model comprises:
    storing the depthwise convolution result of the depthwise convolution layer into an on-chip memory, reading the depthwise convolution result stored in the on-chip memory, and processing the depthwise convolution result with the pointwise convolution layer.
  17. The data processing method according to claim 16, wherein, for each convolution layer of the fixed-point network model:
    the depthwise convolution layer processes a data block of the data to obtain a depthwise convolution result of the data block;
    the depthwise convolution result of the data block is stored into the on-chip memory;
    the depthwise convolution result of the data block stored in the on-chip memory is read;
    the pointwise convolution layer processes the depthwise convolution result of the data block to obtain a pointwise convolution result of the data block; and
    the data block is moved, and the above processing is performed on all data blocks of the data.
  18. The data processing method according to claim 2 or 11, wherein the depthwise separable convolutional neural network is a MobileNet network.
  19. The data processing method according to claim 10, wherein the accuracy value is at least one of an average precision and a mean average precision.
  20. A data processing apparatus based on a deep neural network, comprising:
    an obtaining unit for obtaining a floating-point network model of the deep neural network;
    a quantization unit for quantizing the floating-point network model to obtain at least two fixed-point network models of different accuracies;
    a selection unit for selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
    a processing unit for processing data using the selected fixed-point network model.
  21. The data processing apparatus according to claim 20, wherein the deep neural network is a depthwise separable convolutional neural network.
  22. The data processing apparatus according to claim 21, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer.
  23. The data processing apparatus according to claim 22, wherein the quantization unit quantizes the weights of the depthwise convolution layer with different bit widths and quantizes the activation values with a single bit width.
  24. The data processing apparatus according to claim 22, wherein the quantization unit quantizes the weights of the pointwise convolution layer with different bit widths and quantizes the activation values with a single bit width.
  25. The data processing apparatus according to claim 22, wherein, for both the depthwise convolution layer and the pointwise convolution layer, the quantization unit quantizes the weights with different bit widths and quantizes the activation values with a single bit width.
  26. The data processing apparatus according to claim 23, wherein the quantization unit quantizes the weights of the pointwise convolution layer with a single bit width and quantizes the activation values with a single bit width.
  27. The data processing apparatus according to claim 26, wherein:
    in the first fixed-point network model, the weights and activation values of the depthwise convolution layer each have a first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits; and
    in the second fixed-point network model, the weights of the depthwise convolution layer have a second number of bits and the activation values have the first number of bits, and the weights and activation values of its pointwise convolution layer each have the second number of bits.
  28. The data processing apparatus according to claim 20, wherein the selection unit processes the same test data with the at least two fixed-point network models;
    obtains an accuracy value of the processing result of each of the fixed-point network models;
    determines whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold; and
    if not, takes the most accurate fixed-point network model as the selected fixed-point network model, and if so, takes the fixed-point network model occupying the least memory among the at least one fixed-point network model as the selected fixed-point network model.
  29. The data processing apparatus according to claim 28, wherein the deep neural network is a depthwise separable convolutional neural network.
  30. The data processing device according to claim 29, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    the precision of each fixed-point network model corresponds to the weight bit width of at least one of the depthwise convolution layer and the pointwise convolution layer.
  31. The data processing device according to claim 30, wherein the weight bit widths of the depthwise convolution layers of at least two of the fixed-point network models differ from each other while their activation bit widths are the same, and the weight bit widths of their pointwise convolution layers are the same and the activation bit widths are the same.
  32. The data processing device according to claim 31, wherein the at least two fixed-point network models include a first fixed-point network model and a second fixed-point network model;
    the weights and activation values of the depthwise convolution layer of the first fixed-point network model all have a first bit width, and the weights and activation values of its pointwise convolution layer also have the first bit width;
    the weights of the depthwise convolution layer of the second fixed-point network model have a second bit width, its activation values have the second bit width, and the weights and activation values of its pointwise convolution layer both have the first bit width; and
    the selection unit takes the first fixed-point network model as the fixed-point network model occupying the least memory, and takes the second fixed-point network model as the most accurate fixed-point network model.
  33. The data processing device according to claim 32, wherein the first bit width is eight bits and the second bit width is sixteen bits.
  34. The data processing device according to claim 21, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    the processing unit stores the depthwise convolution result of the depthwise convolution layer in an on-chip memory of the processing unit, reads the depthwise convolution result from the on-chip memory, and processes the depthwise convolution result with the pointwise convolution layer.
  35. The data processing device according to claim 34, wherein the processing unit performs the following operations with each convolution layer of the fixed-point network model:
    performing depthwise convolution on a data block of the data with the depthwise convolution layer to obtain a depthwise convolution result of the data block;
    storing the depthwise convolution result of the data block in the on-chip memory;
    reading the depthwise convolution result of the data block from the on-chip memory;
    processing the depthwise convolution result of the data block with the pointwise convolution layer to obtain a pointwise convolution result of the data block; and
    moving to the next data block and repeating the above operations until all data blocks of the data have been processed.
  36. The data processing device according to claim 21 or 29, wherein the depthwise separable convolutional neural network is a MobileNet network.
  37. The data processing device according to claim 28, wherein the accuracy value is at least one of average precision (AP) and mean average precision (mAP).
  38. A mobile device, comprising the deep-neural-network-based data processing device according to any one of claims 20 to 37.
  39. The mobile device according to claim 38, wherein the mobile device is at least one of a portable mobile terminal, an unmanned aerial vehicle, a handheld gimbal, and a remote controller.
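
The claims above describe three concrete mechanisms: per-layer quantization with mixed bit widths (claims 23 to 27), accuracy-versus-memory model selection (claim 28), and blockwise depthwise-then-pointwise processing through on-chip memory (claims 34 and 35). The short Python sketches below illustrate each; they are editorial illustrations under stated assumptions, not code from the application itself.

Claims 23 to 25 have the quantization unit quantize weights with differing bit widths while activations share one bit width. A minimal sketch of per-tensor symmetric fixed-point quantization, assuming NumPy; the claims do not fix a quantization formula, so the scale rule here (largest magnitude maps to the top quantization level) is an assumption:

    import numpy as np

    def quantize_tensor(x, num_bits):
        """Map a float tensor to signed num_bits fixed point; returns (q, scale)."""
        qmax = 2 ** (num_bits - 1) - 1                     # 127 for 8 bits, 32767 for 16
        scale = max(float(np.abs(x).max()), 1e-12) / qmax  # largest magnitude -> qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
        return q, scale

    # Weights at different bit widths, activations at one shared width (8 bits
    # here is an illustrative choice, not a value taken from the claims).
    dw_weights = np.random.randn(3, 3, 32)
    w8, s8 = quantize_tensor(dw_weights, 8)
    w16, s16 = quantize_tensor(dw_weights, 16)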
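Claims 32 and 33 name two concrete configurations, written out here as plain data; the field names are hypothetical, and only the eight/sixteen-bit values come from claim 33:

    # First model (least memory): every bit width is the first value (8 bits).
    FIRST_MODEL = {"dw_weight_bits": 8, "dw_act_bits": 8,
                   "pw_weight_bits": 8, "pw_act_bits": 8}

    # Second model (most accurate): depthwise weights and activations at the
    # second value (16 bits), pointwise weights and activations at 8 bits.
    SECOND_MODEL = {"dw_weight_bits": 16, "dw_act_bits": 16,
                    "pw_weight_bits": 8, "pw_act_bits": 8}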
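Claim 28's selection rule, as a minimal sketch; it assumes each candidate model is summarized by an accuracy value and a memory footprint, and reads "within the threshold" as an absolute accuracy difference (both are assumptions, since the claim leaves the metric and data structure open):

    def select_model(candidates, threshold):
        """candidates: dicts with 'accuracy' and 'memory' keys; returns one of them."""
        best = max(candidates, key=lambda m: m["accuracy"])
        # Other models whose accuracy is within `threshold` of the most accurate one.
        close = [m for m in candidates
                 if m is not best and best["accuracy"] - m["accuracy"] <= threshold]
        if not close:
            return best          # no near-tie: keep the most accurate model
        # Near-tie: take the smallest model among the close ones and the best itself.
        return min(close + [best], key=lambda m: m["memory"])

    models = [{"name": "first",  "accuracy": 0.710, "memory": 4.2e6},
              {"name": "second", "accuracy": 0.722, "memory": 6.1e6}]
    print(select_model(models, threshold=0.02)["name"])   # -> "first"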
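Claim 35's blockwise pipeline, as a minimal sketch assuming NumPy, a 3x3 depthwise kernel with padding 1 and stride 1, and on-chip memory modeled as an ordinary array; the tile size and tensor shapes are illustrative:

    import numpy as np

    def depthwise_pointwise_tiled(x, dw_kernels, pw_weights, tile=32):
        """x: (H, W, C) input; dw_kernels: (3, 3, C); pw_weights: (C, C_out)."""
        h, w, c = x.shape
        out = np.zeros((h, w, pw_weights.shape[1]), dtype=x.dtype)
        padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
        for i in range(0, h, tile):
            for j in range(0, w, tile):
                ih, iw = min(tile, h - i), min(tile, w - j)
                # Depthwise convolution of one data block.
                block = np.zeros((ih, iw, c), dtype=x.dtype)
                for di in range(3):
                    for dj in range(3):
                        block += (padded[i + di:i + di + ih, j + dj:j + dj + iw, :]
                                  * dw_kernels[di, dj, :])
                on_chip = block   # claim 35: the block result is kept on chip
                # Pointwise (1x1) convolution reads the block back and mixes channels.
                out[i:i + ih, j:j + iw] = (on_chip.reshape(-1, c)
                                           @ pw_weights).reshape(ih, iw, -1)
        return out

Keeping each depthwise block on chip and consuming it immediately with the pointwise layer spares the intermediate feature map a round trip through external memory, which is the bandwidth saving the claims are after.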
PCT/CN2019/097072 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device WO2021012148A1 (en)

Priority Applications (2)

Application Number   Priority Date  Filing Date  Title
CN201980005317.5A    2019-07-22     2019-07-22   Data processing method and device based on deep neural network and mobile device (CN 111344719 A)
PCT/CN2019/097072    2019-07-22     2019-07-22   Data processing method and apparatus based on deep neural network, and mobile device (WO 2021012148 A1)


Publications (1)

Publication Number
WO 2021012148 A1 (en)

Family

ID=71187736


Country Status (2)

CN (1): CN 111344719 A (en)
WO (1): WO 2021012148 A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409773B * 2021-08-18 2022-01-18 中科南京智能技术研究院 Binarized neural network voice wake-up method and system
CN116720563B (en) * 2022-09-19 2024-03-29 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657316B (en) * 2016-08-12 2020-04-07 北京深鉴智能科技有限公司 Design of cooperative system of general processor and neural network processor

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224984A * 2014-05-31 2016-01-06 华为技术有限公司 Data category recognition method and device based on deep neural network
CN106203624A * 2016-06-23 2016-12-07 上海交通大学 Vector quantization method based on deep neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089888A1 (en) * 2019-09-25 2021-03-25 Arm Limited Hybrid Filter Banks for Artificial Neural Networks
US11561767B2 (en) 2019-09-25 2023-01-24 Arm Limited Mixed-precision computation unit

Also Published As

Publication number Publication date
CN111344719A (en) 2020-06-26


Legal Events

Code  Description
121   EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 19938761; country of ref document: EP; kind code of ref document: A1)
NENP  Non-entry into the national phase (ref country code: DE)
122   EP: PCT application non-entry in European phase (ref document number: 19938761; country of ref document: EP; kind code of ref document: A1)