WO2021012148A1 - Data processing method and apparatus based on deep neural network, and mobile device

Info

Publication number
WO2021012148A1
Authority
WO
WIPO (PCT)
Prior art keywords: fixed, point, network model, point network, deep
Application number
PCT/CN2019/097072
Other languages
French (fr)
Chinese (zh)
Inventor
陈诗南
余俊峰
周爱春
张伟
Original Assignee
深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN201980005317.5A (publication CN111344719A)
Priority to PCT/CN2019/097072
Publication of WO2021012148A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates to the technical field of artificial neural networks, and in particular to a data processing method, device and mobile device based on a deep neural network.
  • Deep neural networks have been widely used in mobile devices.
  • when a deep neural network is deployed on a mobile device, its limited computing resources require the network model to be quantized. If many quantization bits are used, bandwidth pressure is high; if few are used, network accuracy suffers. Deep neural networks on mobile devices therefore face a conflict between ensuring network accuracy and reducing bandwidth.
  • the present disclosure provides a data processing method based on a deep neural network, which includes: acquiring a floating-point network model of the deep neural network; quantizing the floating-point network model to obtain at least two fixed-point network models with different precisions; selecting one of the at least two fixed-point network models according to their accuracies; and processing data using the selected fixed-point network model.
  • the present disclosure also provides a data processing device based on a deep neural network, which includes:
  • an obtaining unit configured to obtain the floating-point network model of the deep neural network;
  • a quantization unit configured to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions;
  • a selection unit configured to select one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
  • a processing unit configured to process data using the selected fixed-point network model.
  • the present disclosure also provides a mobile device, which includes: the above-mentioned data processing device based on a deep neural network.
  • FIG. 1 is a flowchart of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • FIG. 2 is a data diagram of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • Figure 3 is a schematic diagram of the structure of the convolutional layer of the deep neural network.
  • FIG. 4 is a schematic diagram of a deep convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • FIG. 5 is a schematic diagram of a point convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • An embodiment of the present disclosure provides a data processing method based on a deep neural network.
  • a deep neural network generally refers to an artificial neural network including an input layer, multiple hidden layers, and an output layer.
  • Deep neural networks include: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM) and many other types of neural networks.
  • a convolutional neural network is taken as an example to describe the data processing method, but those skilled in the art should understand that the data processing method of this embodiment is not limited to convolutional neural networks and is applicable to all types of deep neural networks.
  • each hidden layer includes multiple operations such as convolution, bias, normalization (BN), activation, and quantization.
  • the convolution operation is generally called a convolution layer.
  • the above description of the hidden layer is only exemplary, and does not constitute a limitation on the order and quantity of each operation or each layer.
  • Each operation or layer can have many variations, and the position and number of several operations or layers can change. For example, some hidden layers may have no normalization or activation operations, and some hidden layers may also include other operations or layers such as pooling and fully connected layers.
  • when the convolutional neural network is a fixed-point network, the hidden layers include quantization operations; when the convolutional neural network is a floating-point network, the hidden layers do not include quantization operations.
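  • To make the hidden-layer sequence above concrete, the following is a minimal PyTorch sketch of one such hidden layer; the symmetric fake-quantization step, layer sizes, and ReLU activation are illustrative assumptions rather than the patent's implementation:

```python
import torch
import torch.nn as nn

class FixedPointHiddenLayer(nn.Module):
    """One hidden layer: convolution -> bias -> normalization (BN) ->
    activation -> quantization, in the order described above."""
    def __init__(self, in_ch, out_ch, act_bits=8):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=True)  # convolution + bias
        self.bn = nn.BatchNorm2d(out_ch)                               # normalization (BN)
        self.act = nn.ReLU()                                           # activation
        self.qmax = 2 ** (act_bits - 1) - 1                            # 127 for 8-bit

    def forward(self, x):
        x = self.act(self.bn(self.conv(x)))
        # Quantization (present only in fixed-point networks): snap activations
        # onto a symmetric uniform grid with a per-tensor scale.
        scale = x.abs().max().clamp(min=1e-8) / self.qmax
        return torch.round(x / scale) * scale
```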
  • the depthwise separable convolution in the convolutional neural network is taken as an example to describe the data processing method.
  • the convolutional layer introduced above is a standard convolutional layer, which performs standard convolution operations.
  • in a depthwise separable convolutional neural network, the convolutional layer in the hidden layer includes two convolutional layers obtained by splitting the standard convolutional layer: a deep convolution layer and a point convolution layer, which perform the deep convolution operation and the point convolution operation, respectively.
  • Convolutional neural networks generally target the data of multiple input channels.
  • the deep convolution layer first uses a convolution kernel to perform deep convolution on the data of each input channel to obtain the deep convolution results of each input channel.
  • the point convolution layer then performs point convolution on the deep convolution results, fusing the information of the input channels.
  • compared with a standard convolutional neural network, a depthwise separable convolutional neural network requires far fewer parameters and much less computation, and is especially suitable for scenarios with limited computing resources such as mobile devices.
  • the depthwise separable convolutional neural network in this embodiment can be implemented in many different ways, for example as the MobileNet network model.
  • the size of the convolution kernel of the point convolution layer of MobileNet is 1×1.
  • a standard 28-layer MobileNet network model can be used, or a depthwise separable convolutional neural network obtained by shrinking, expanding, or otherwise transforming the standard MobileNet model.
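  • As a concrete illustration of the split described above, here is a minimal PyTorch sketch of one depthwise separable block; the 3×3 depthwise kernel and ReLU activations are common MobileNet choices assumed here, not details fixed by the patent:

```python
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise conv (one kernel per input channel) followed by a 1x1
    pointwise conv that fuses information across input channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes this a depthwise (deep) convolution.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch)
        # 1x1 kernel: the point convolution layer.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.pointwise(self.act(self.depthwise(x))))
```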
  • the data processing method based on the deep neural network of this embodiment includes the following steps:
  • S101: Obtain the floating-point network model of the deep neural network.
  • the data processing method of this embodiment is suitable for mobile devices.
  • the mobile device can directly obtain a floating-point network model from the outside, and the floating-point network model is a trained deep neural network.
  • the depthwise separable convolutional neural network constructed in this step can be MobileNet or another type of neural network.
  • the convolutional layers of these neural networks include deep convolutional layers and point convolutional layers.
  • as shown in FIG. 3, in one example, both the deep convolution layer and the point convolution layer include convolution, bias, activation, and quantization operations.
  • Convolutional neural networks are commonly used in image processing, especially in scenarios such as image recognition and image classification. In this step, training images are therefore used to train the depthwise separable convolutional neural network: the training image data is first normalized to [-1, 1), and the normalized data is then fed into the constructed network for training to obtain the depthwise separable convolutional neural network model.
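  • A one-line way to perform the normalization described above, assuming 8-bit input pixels and division by 128 (one common convention for mapping [0, 255] into [-1, 1)):

```python
import numpy as np

def normalize_image(img_u8: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values [0, 255] onto [-1, 1): 0 -> -1.0, 255 -> ~0.992."""
    return img_u8.astype(np.float32) / 128.0 - 1.0
```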
  • a floating point number with a larger number of digits is generally used to represent the convolutional neural network in the training phase.
  • in this step, 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network.
  • the weights, bias values, and activation values in the deep convolution layer and the point convolution layer are all represented by 32-bit floating-point numbers.
  • the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the volume of network parameters and the amount of training computation are very large, and mobile devices cannot provide enough computing resources. Deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
  • floating-point numbers with other bit widths may also be used to represent the depthwise separable convolutional neural network.
  • the training data may also be data other than image data, such as voice data.
  • S201 Perform quantization on the floating-point network model to obtain at least two fixed-point network models with different precisions.
  • the quantization method used in this step can be called mixed precision quantization.
  • for the deep convolution layer of the floating-point network model, different bit widths are used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the point convolution layer, a single bit width is used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the deep convolution layer, the weights can be quantized into 8-bit and 16-bit fixed-point numbers respectively; for the 8-bit fixed-point deep convolution layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point deep convolution layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the point convolution layer, the weights and activation values are quantized into 8-bit fixed-point numbers.
  • in this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model.
  • in the first fixed-point network model, the weights and activation values of the deep convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8);
  • in the second fixed-point network model, the weights of the deep convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8).
  • the activation values of the deep convolution layer can also be quantized with mixed precision; that is, a single bit width is used to quantize the weights of the deep convolution layer, and different bit widths are used to quantize its activation values.
  • the weights and activation values of the point convolution layer can also be quantized with mixed precision, including: for the deep convolution layer of the floating-point network model, quantizing the weights with a single bit width and the activation values with a single bit width; and for the point convolution layer, quantizing the weights with different bit widths and the activation values with a single bit width, or quantizing the weights with a single bit width and the activation values with different bit widths.
  • the quantization bit widths of the first and second fixed-point network models above are only an example; this embodiment may also use quantization bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the deep convolution layer and the point convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the quantization bit width of the weights and that of the activation values may be the same or different.
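  • The patent does not fix the quantizer itself; the sketch below assumes a simple symmetric uniform scheme to show how one set of floating-point deep convolution weights could be quantized at two bit widths to produce the w8a8 and w16a8 variants (the weight shapes are placeholders):

```python
import numpy as np

def quantize(w: np.ndarray, bits: int):
    """Quantize a float tensor to signed `bits`-bit fixed-point integers plus
    a shared scale (symmetric uniform quantization, an assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(w).max()) / qmax, 1e-8)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

# Stand-in for trained 32-bit floating-point deep convolution weights.
dw_weights = np.random.randn(32, 1, 3, 3).astype(np.float32)

dw_w8, s8 = quantize(dw_weights, bits=8)     # first model: w8a8 deep conv weights
dw_w16, s16 = quantize(dw_weights, bits=16)  # second model: w16a8 deep conv weights
```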
  • S301: According to the accuracy of the fixed-point network models, select one of the at least two fixed-point network models.
  • the accuracy of a fixed-point network model and the computing power it requires are related to its quantization bit widths. Step S201 produces at least two fixed-point network models with different precisions, corresponding to different computing requirements: more quantization bits mean higher accuracy and greater required computing power; fewer quantization bits mean lower accuracy and smaller required computing power.
  • for example, since the deep convolution layer weights of the second fixed-point network model use 16 bits while those of the first fixed-point network model use 8 bits, the second fixed-point network model is more accurate than the first but requires more computing power. This step therefore selects one fixed-point network model from the at least two, so as to reduce the required computing power as much as possible without significantly degrading accuracy, thereby reducing bandwidth requirements and balancing accuracy against bandwidth.
  • the same test data is first input into the at least two fixed-point network models, and the fixed-point network models run inference on the test data to obtain processing results.
  • the test data uses an 8-bit integer (i8), which can be image data, or other data besides image data, such as voice data.
  • when the first and second fixed-point network models described above are used, the test image is input into the first fixed-point network model and the second fixed-point network model, and each model runs inference on the test image to obtain an image processing result.
  • the accuracy value of the processing result is characterized by the mean average precision (mAP).
  • the accuracy value of the first fixed-point network model is called the first accuracy value
  • the accuracy value of the second fixed-point network model is called the second accuracy value.
  • the accuracy value of the processing result can also be characterized by other parameters such as the average precision (AP).
  • the threshold may be set according to experience or accuracy requirements, for example, it may be 1%. In other words, it is judged whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the fixed-point network model with the highest accuracy.
  • for example, if the second fixed-point network model is the fixed-point network model with the highest accuracy, it is judged whether the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%.
  • if no such fixed-point network model exists, the fixed-point network model with the highest accuracy is selected.
  • when the first and second fixed-point network models are used: if the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
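  • Assuming the candidate models are ordered from lowest to highest required computing power, the selection rule just described reduces to a few lines; the 0.01 threshold mirrors the 1% example above, and all names are illustrative:

```python
def select_model(models, map_scores, threshold=0.01):
    """Return the cheapest fixed-point model whose mAP is within `threshold`
    of the most accurate one, falling back to the most accurate model.
    `models` must be ordered from lowest to highest computing cost."""
    best = max(map_scores)
    for model, score in zip(models, map_scores):
        if best - score <= threshold:
            return model  # first (cheapest) model close enough to the best
    return models[map_scores.index(best)]  # unreachable: best always qualifies

# e.g. select_model([model_w8a8, model_w16a8], [0.712, 0.719]) -> model_w8a8
```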
  • after a fixed-point network model is selected, the input data can be processed with it to obtain the data processing result.
  • the data processing method based on the deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models with different precisions and selects one of them, according to its accuracy, to process data.
  • this reduces the required computing power and bandwidth as much as possible while maintaining network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
  • Another embodiment of the present disclosure provides a data processing method based on a deep neural network.
  • features that are the same as or similar to those of the previous embodiment will not be repeated; only the features that differ from the previous embodiment are described below.
  • in step S401, during processing of the data with the selected fixed-point network model, the deep convolution result of the deep convolution layer is stored in on-chip memory, the deep convolution result is read from the on-chip memory, and the point convolution layer processes the deep convolution result.
  • each hidden layer processes the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer.
  • the input data and output data of each hidden layer are called feature maps.
  • the feature maps are processed by tiles.
  • a data block (tile) of the feature map is stored in the on-chip memory.
  • the size of the data block is equal to the size of the convolution kernel of the following deep convolution layer.
  • the on-chip memory refers to memory internal to the processor rather than external memory; it may be on-chip RAM or a cache.
  • the data block is read from the on-chip memory, and the deep convolution layer processes the data block to obtain the deep convolution result of the data block.
  • the data block is convolved with the weight of the convolution kernel, and the convolution result is superimposed with the offset value.
  • [1, C, Ht, Wt] represents the parameters of the data block, where C represents the number of input channels, and Ht and Wt represent the height and width of the data block, respectively.
  • [C, 1, Hw, Ww] represents the parameters of the weight matrix of the deep convolution layer, where C represents the number of input channels, and Hw and Ww represent the height and width of the weight matrix, respectively.
  • [C, 1, 1, 1] represents the parameters of the bias values of the deep convolution layer, where C represents the number of input channels. If the deep convolution layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the deep convolution result.
  • the deep convolution result of the data block is stored in the on-chip memory, that is, stored in the on-chip memory or cache of the processor, rather than stored in an off-chip memory such as DDR.
  • the above-mentioned deep convolution operation can be used to process the feature maps of each input channel in parallel to improve computing efficiency.
  • the deep convolution result is read from the processor's on-chip memory or cache instead of off-chip memory such as DDR.
  • the point convolution layer processes the deep convolution result of the data block to obtain the point convolution result of the data block.
  • the deep convolution result of the data block for each input channel is convolved with the weights of the convolution kernel, the convolution results are accumulated across channels, and then the offset value is added.
  • [1, C, 1, 1] represents the parameters of the weight matrix of the point convolution layer, where C represents the number of input channels, and the third and fourth elements represent the height and width of the weight matrix; that is, the weight matrix of the point convolution layer is a 1×1 matrix.
  • [1, 1, 1, 1] represents the parameters of the offset value of the point convolution layer.
  • if the point convolution layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the point convolution result of the data block.
  • the point convolution layer generally has multiple output channels, and the aforementioned point convolution operation can be used to process each output channel in parallel to improve computing efficiency.
  • in the data processing method based on the deep neural network of this embodiment, the deep convolution result is stored in on-chip memory and the point convolution layer reads it from the on-chip memory for processing, so the reads and writes of the intermediate result (the deep convolution result) are all done on-chip, without accessing off-chip memory.
  • compared with writing intermediate results to off-chip memory and reading them back, this further saves the bandwidth of mobile devices, improves the performance of mobile devices in executing deep neural network operations, and can support lower-end mobile devices with low computing power and little bandwidth.
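  • The NumPy sketch below illustrates this dataflow only (it is not the patent's implementation): each tile's deep convolution result lives in a local variable standing in for on-chip memory and is consumed immediately by the point convolution, so it never touches simulated off-chip storage; bias, activation, and quantization are omitted for brevity:

```python
import numpy as np

def fused_tile_pipeline(feature_map, dw_kernels, pw_weights):
    """Per-tile deep convolution -> point convolution.
    feature_map: [C, H, W]; dw_kernels: [C, Hw, Ww] (one kernel per input
    channel); pw_weights: [Cout, C] (the 1x1 point convolution weights)."""
    C, H, W = feature_map.shape
    Hw, Ww = dw_kernels.shape[1:]
    out = np.zeros((pw_weights.shape[0], H - Hw + 1, W - Ww + 1),
                   dtype=np.float32)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            tile = feature_map[:, i:i + Hw, j:j + Ww]  # one data block (tile)
            # "On-chip" intermediate: deep convolution result of this tile.
            dw = (tile * dw_kernels).sum(axis=(1, 2))  # shape [C]
            # The point convolution consumes it immediately; nothing is
            # written to (simulated) off-chip memory such as DDR.
            out[:, i, j] = pw_weights @ dw
    return out
```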
  • Another embodiment of the present disclosure provides a data processing device based on a deep neural network, including:
  • the obtaining unit is used to obtain the floating point network model of the deep neural network.
  • the quantization unit is used to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions.
  • the selection unit is configured to select one of at least two fixed-point network models according to the accuracy of the fixed-point network model.
  • the processing unit uses the selected fixed-point network model to process the data.
  • the data processing apparatus of this embodiment is used for mobile equipment, and the mobile equipment can directly obtain a floating-point network model from the outside, and the floating-point network model is a trained deep neural network.
  • Training a deep neural network is performed in two steps. First, build a deep separable convolutional neural network. Then, train the constructed deep-separable convolutional neural network to obtain a deep-separable convolutional neural network model.
  • the depthwise separable convolutional neural network constructed in this embodiment can be MobileNet or another type of neural network.
  • the convolutional layers of these neural networks include deep convolutional layers and point convolutional layers.
  • training images are used to train the depthwise separable convolutional neural network.
  • 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network, and the weights, bias values, and activation values in the deep convolution layer and the point convolution layer are all represented by 32-bit floating-point numbers.
  • the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the volume of network parameters and the amount of training computation are very large, and mobile devices cannot provide enough computing resources. Deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
  • the quantization method used by the quantization unit can be called mixed precision quantization.
  • for the deep convolution layer of the floating-point network model, different bit widths are used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the point convolution layer, a single bit width is used to quantize the weights, and a single bit width is used to quantize the activation values.
  • for the deep convolution layer, the weights can be quantized into 8-bit and 16-bit fixed-point numbers respectively; for the 8-bit fixed-point deep convolution layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point deep convolution layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the point convolution layer, the weights and activation values are quantized into 8-bit fixed-point numbers.
  • in this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model.
  • in the first fixed-point network model, the weights and activation values of the deep convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8);
  • in the second fixed-point network model, the weights of the deep convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8).
  • the activation values of the deep convolution layer can also be quantized with mixed precision; that is, a single bit width is used to quantize the weights of the deep convolution layer, and different bit widths are used to quantize its activation values.
  • the weights and activation values of the point convolution layer can also be quantized with mixed precision, including: for the deep convolution layer of the floating-point network model, quantizing the weights with a single bit width and the activation values with a single bit width; and for the point convolution layer, quantizing the weights with different bit widths and the activation values with a single bit width, or quantizing the weights with a single bit width and the activation values with different bit widths.
  • the quantization bit widths of the first and second fixed-point network models above are only an example; this embodiment may also use quantization bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the deep convolution layer and the point convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the quantization bit width of the weights and that of the activation values may be the same or different.
  • the accuracy of a fixed-point network model and the computing power it requires are related to its quantization bit widths. The at least two fixed-point network models with different precisions obtained by the quantization unit correspond to different computing requirements: more quantization bits mean higher accuracy and greater required computing power; fewer quantization bits mean lower accuracy and smaller required computing power.
  • the selection unit needs to select one fixed-point network model from the at least two fixed-point network models, so as to reduce the required computing power as much as possible without significantly degrading accuracy, thereby reducing bandwidth requirements and balancing accuracy against bandwidth.
  • the selection unit first inputs the same test data into the at least two fixed-point network models, and the fixed-point network models run inference on the test data to obtain processing results.
  • the test data uses an 8-bit integer (i8), which can be image data, or other data besides image data, such as voice data.
  • when the first and second fixed-point network models described above are used, the test image is input into the first fixed-point network model and the second fixed-point network model, and each model runs inference on the test image to obtain an image processing result.
  • the selection unit obtains the accuracy value of the processing result of each fixed-point network model.
  • the accuracy value of the processing result is characterized by the mean average precision (mAP).
  • the accuracy value of the first fixed-point network model is called the first accuracy value
  • the accuracy value of the second fixed-point network model is called the second accuracy value.
  • the accuracy value of the processing result can also be characterized by other parameters such as the average precision (AP).
  • the selection unit judges whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the fixed-point network model with the highest accuracy by no more than a threshold.
  • the threshold may be set according to experience or accuracy requirements, for example, it may be 1%. In other words, it is judged whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the fixed-point network model with the highest accuracy.
  • for example, if the second fixed-point network model is the fixed-point network model with the highest accuracy, it is judged whether the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%.
  • if no such fixed-point network model exists, the fixed-point network model with the highest accuracy is selected.
  • when the first and second fixed-point network models are used: if the difference between the accuracy value of the first fixed-point network model and the accuracy value of the second fixed-point network model is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
  • the processing unit uses the selected fixed-point network model to process the data. After the fixed-point network model is selected, the processing unit can process the input data to obtain the data processing result.
  • the data processing device based on the deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models with different precisions and selects one of them, according to its accuracy, to process data.
  • this reduces the required computing power and bandwidth as much as possible while maintaining network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
  • during processing of the data with the selected fixed-point network model, the processing unit stores the deep convolution result of the deep convolution layer in on-chip memory and reads the stored deep convolution result back from the on-chip memory, and the point convolution layer processes the deep convolution result.
  • the processing unit uses each hidden layer to process the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer.
  • the input data and output data of each hidden layer are called feature maps.
  • the feature maps are processed by tiles.
  • the processing unit first performs a deep convolution operation:
  • a data block (tile) of the feature map is stored in the on-chip memory.
  • the size of the data block is equal to the size of the convolution kernel of the following deep convolution layer.
  • the on-chip memory refers to memory internal to the processing unit rather than external memory; it may be on-chip RAM or a cache.
  • the deep convolution layer processes the data block to obtain the deep convolution result of the data block.
  • the data block is convolved with the weight of the convolution kernel, and the convolution result is superimposed with the offset value. If the deep convolutional layer has an activation operation, the activation function is used to activate the output value of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the deep convolution result.
  • the deep convolution result of the data block is stored in the on-chip memory, that is, stored in the on-chip memory or cache of the processing unit, instead of being stored in an off-chip memory such as DDR.
  • the processing unit can use the aforementioned deep convolution operation to process the feature maps of each input channel in parallel to improve computing efficiency.
  • the deep convolution result of the data block stored in the on-chip memory is read from the on-chip memory or cache of the processing unit instead of off-chip memory such as DDR.
  • the point convolution layer processes the deep convolution result of the data block to obtain the point convolution result of the data block.
  • the deep convolution result of the data block for each input channel is convolved with the weights of the convolution kernel, the convolution results are accumulated across channels, and then the offset value is added. If the point convolution layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the point convolution result of the data block.
  • the point convolution layer generally has multiple output channels, and the processing unit can use the above point convolution operation to process each output channel in parallel to improve the computing efficiency.
  • in the data processing device based on the deep neural network of this embodiment, the deep convolution result is stored in on-chip memory and the point convolution layer reads it from the on-chip memory for processing, so the reads and writes of the intermediate result (the deep convolution result) are all done on-chip, without accessing off-chip memory.
  • compared with writing intermediate results to off-chip memory and reading them back, this further saves the bandwidth of mobile devices, improves the performance of mobile devices in executing deep neural network operations, and can support lower-end mobile devices with low computing power and little bandwidth.
  • the embodiments of the present disclosure do not limit the implementation form of the deep neural network-based data processing device and its acquisition unit, quantization unit, selection unit, and processing unit.
  • each unit can be implemented by a separate processor, and some or all of the units can also be implemented by one processor.
  • the processor mentioned in the embodiments of the present disclosure may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the on-chip memory mentioned in the embodiments of the present disclosure refers to memory integrated inside the processor, which may be volatile memory, non-volatile memory, or a combination of both.
  • an embodiment of the present disclosure also provides a mobile device, which includes the deep neural network-based data processing device of any of the foregoing embodiments.
  • the mobile device may be a portable mobile terminal, a drone, a handheld gimbal, a remote control, etc.
  • the portable mobile terminal may be a cell phone, a tablet computer, etc.
  • the remote control may be a remote control of a drone.

Abstract

Disclosed are a data processing method and apparatus based on a deep neural network, and a mobile device. The method comprises: acquiring a floating-point network model of a deep neural network; quantizing the floating-point network model to obtain at least two fixed-point network models with different accuracies; selecting one of the at least two fixed-point network models according to the accuracies of the fixed-point network models; and using the selected fixed-point network model to process data.

Description

Data processing method, device and mobile device based on deep neural network
Technical field
The present disclosure relates to the technical field of artificial neural networks, and in particular to a data processing method, device and mobile device based on a deep neural network.
Background art
Deep neural networks have been widely used in mobile devices. When a deep neural network is deployed on a mobile device, its limited computing resources require the network model to be quantized. If many quantization bits are used, bandwidth pressure is high; if few are used, network accuracy suffers. Deep neural networks on mobile devices therefore face a conflict between ensuring network accuracy and reducing bandwidth.
At the same time, deep neural network operations produce a large number of intermediate results. The prior art often stores these intermediate results in off-chip memory and then reads them back, and such frequent reads and writes to off-chip memory further increase the bandwidth pressure.
Summary
The present disclosure provides a data processing method based on a deep neural network, which includes:
acquiring a floating-point network model of the deep neural network;
quantizing the floating-point network model to obtain at least two fixed-point network models with different precisions;
selecting one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
processing data using the selected fixed-point network model.
The present disclosure also provides a data processing device based on a deep neural network, which includes:
an obtaining unit configured to obtain the floating-point network model of the deep neural network;
a quantization unit configured to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions;
a selection unit configured to select one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
a processing unit configured to process data using the selected fixed-point network model.
The present disclosure also provides a mobile device, which includes the above data processing device based on a deep neural network.
It can be seen from the above technical solutions that the present disclosure has at least the following beneficial effects:
By quantizing the floating-point network model into at least two fixed-point network models with different precisions and selecting one of them according to its accuracy to process data, the required computing power can be reduced as much as possible while network accuracy is maintained, bandwidth requirements are reduced, and accuracy and bandwidth are balanced, effectively resolving the conflict between network accuracy and bandwidth and improving the performance of mobile devices in executing deep neural network operations.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a flowchart of a data processing method based on a deep neural network according to an embodiment of the disclosure.
FIG. 2 is a data diagram of a data processing method based on a deep neural network according to an embodiment of the disclosure.
FIG. 3 is a schematic diagram of the structure of a convolutional layer of a deep neural network.
FIG. 4 is a schematic diagram of a deep convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
FIG. 5 is a schematic diagram of a point convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
Detailed description
The technical solutions of the present disclosure will be described clearly and completely below with reference to the embodiments and the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present disclosure.
An embodiment of the present disclosure provides a data processing method based on a deep neural network. A deep neural network (DNN) generally refers to an artificial neural network including an input layer, multiple hidden layers, and an output layer. Deep neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM) and many other types of neural networks. In this embodiment, a convolutional neural network is taken as an example to describe the data processing method, but those skilled in the art should understand that the data processing method of this embodiment is not limited to convolutional neural networks and is applicable to all types of deep neural networks.
For a convolutional neural network, each hidden layer includes multiple operations such as convolution, bias, normalization (BN), activation, and quantization. The convolution operation is generally called a convolution layer.
It should be noted that, for a convolutional neural network, the above description of the hidden layers is only exemplary and does not limit the order or number of the operations or layers; each operation or layer can have many variations, and the position and number of several operations or layers can change. For example, some hidden layers may have no normalization or activation operations, and some hidden layers may also include other operations or layers such as pooling and fully connected layers. When the convolutional neural network is a fixed-point network, the hidden layers include quantization operations; when it is a floating-point network, they do not.
In this embodiment, the depthwise separable convolution in the convolutional neural network is taken as an example to describe the data processing method. The convolutional layer introduced above is a standard convolutional layer, which performs standard convolution operations. In a depthwise separable convolutional neural network, the convolutional layer in the hidden layer includes two convolutional layers obtained by splitting the standard convolutional layer: a deep convolution layer and a point convolution layer, which perform the deep convolution operation and the point convolution operation, respectively. Convolutional neural networks generally operate on data from multiple input channels. The deep convolution layer first performs deep convolution on the data of each input channel with one convolution kernel to obtain the deep convolution result of each input channel; the point convolution layer then performs point convolution on the deep convolution results, fusing the information of the input channels. Compared with a standard convolutional neural network, a depthwise separable convolutional neural network requires far fewer parameters and much less computation, and is especially suitable for scenarios with limited computing resources such as mobile devices.
The depthwise separable convolutional neural network in this embodiment can be implemented in many different ways, for example as the MobileNet network model. The convolution kernel of the point convolution layer of MobileNet is 1×1. This embodiment may use the standard 28-layer MobileNet network model, or a depthwise separable convolutional neural network obtained by shrinking, expanding, or otherwise transforming the standard MobileNet model.
The data processing method based on the deep neural network of this embodiment, as shown in FIG. 1 and FIG. 2, includes the following steps:
S101: Obtain the floating-point network model of the deep neural network.
The data processing method of this embodiment is suitable for mobile devices. In this step, the mobile device can directly obtain the floating-point network model from the outside; the floating-point network model is a trained deep neural network.
Training the deep neural network is performed in two steps; see the "training" stage in FIG. 2:
First, construct a depthwise separable convolutional neural network.
Then, train the constructed network to obtain a depthwise separable convolutional neural network model.
As mentioned above, the depthwise separable convolutional neural network constructed in this step can be MobileNet or another type of neural network; the convolutional layers of these networks all include a deep convolution layer and a point convolution layer. As shown in FIG. 3, in one example, both the deep convolution layer and the point convolution layer include convolution, bias, activation, and quantization operations.
Convolutional neural networks are commonly used in image processing, especially in scenarios such as image recognition and image classification. In this step, training images are therefore used to train the network. The training image data is first normalized to [-1, 1), and the normalized data is then fed into the constructed depthwise separable convolutional neural network for training to obtain the depthwise separable convolutional neural network model.
It should be noted that, to ensure network accuracy, floating-point numbers with a larger bit width are generally used to represent the convolutional neural network during training. In this step, 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network: the weights, bias values, and activation values in the deep convolution layer and the point convolution layer are all represented by 32-bit floating-point numbers, and the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the volume of network parameters and the amount of training computation are very large, and mobile devices cannot provide enough computing resources; deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
The above is only an exemplary description. In this embodiment, floating-point numbers with other bit widths may also be used to represent the depthwise separable convolutional neural network, and the training data may also be data other than image data, such as voice data.
S201: Quantize the floating-point network model to obtain at least two fixed-point network models with different precisions.
Because the storage and computing capabilities of mobile devices are limited, and floating-point operations may not even be supported, directly using the floating-point network model for data processing would place a heavy burden on the storage, computing capability, and power consumption of a mobile device, which might not even be able to complete the processing. The mobile device therefore needs to quantize the floating-point network model into a fixed-point network model. Compared with the floating-point network model, a fixed-point network model requires less storage and computing power, making it well suited to mobile devices.
Referring to the "quantization" stage in FIG. 2, the quantization used in this step can be called mixed precision quantization. Specifically, for the deep convolution layer of the floating-point network model, different bit widths are used to quantize the weights, and a single bit width is used to quantize the activation values; for the point convolution layer, a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values. By quantizing the weights of the deep convolution layer with mixed precision, at least two fixed-point network models corresponding to the weight quantization bit widths are obtained.
In this embodiment, for the deep convolution layer, the weights are quantized into 8-bit and 16-bit fixed-point numbers respectively; for the 8-bit fixed-point deep convolution layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point deep convolution layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the point convolution layer, the weights and activation values are quantized into 8-bit fixed-point numbers.
In this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model. In the first fixed-point network model, the weights and activation values of the deep convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8). In the second fixed-point network model, the weights of the deep convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the point convolution layer are both 8-bit fixed-point numbers (w8a8).
This step has been described above through an example, but this embodiment is not limited to it. For example, the activation values of the deep convolution layer can also be quantized with mixed precision; that is, a single bit width is used to quantize the weights of the deep convolution layer, and different bit widths are used to quantize its activation values.
The weights and activation values of the point convolution layer can also be quantized with mixed precision, including: for the deep convolution layer of the floating-point network model, quantizing the weights with a single bit width and the activation values with a single bit width; and for the point convolution layer, quantizing the weights with different bit widths and the activation values with a single bit width, or quantizing the weights with a single bit width and the activation values with different bit widths.
Mixed precision quantization can also be applied to both the deep convolution layer and the point convolution layer; the specific quantization is similar to that of the deep convolution layer or the point convolution layer described above.
Those skilled in the art should understand that the quantization bit widths of the first and second fixed-point network models above are only an example; this embodiment may also use quantization bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the deep convolution layer and the point convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the quantization bit width of the weights and that of the activation values may be the same or different.
S301: selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models.
The accuracy of a fixed-point network model and the computing power it requires are both related to its quantization bit widths. Step S201 yields at least two fixed-point network models of different accuracies, corresponding to different computing-power requirements. The more quantization bits, the higher the accuracy and the greater the required computing power; the fewer quantization bits, the lower the accuracy and the smaller the required computing power. For example, for the first and second fixed-point network models, since the depthwise-convolution weight bit width of the second fixed-point network model is 16 while that of the first is 8, the accuracy of the second fixed-point network model is better than that of the first, but its required computing power is also greater. Therefore, this step selects one fixed-point network model from the at least two fixed-point network models, so as to reduce the required computing power as much as possible while the accuracy drops only slightly, thereby lowering the bandwidth requirement and balancing accuracy against bandwidth.
Referring to the "selection" stage in FIG. 2: specifically, the same test data is first input into the at least two fixed-point network models, and the at least two fixed-point network models each perform inference on the test data to obtain processing results. The test data uses 8-bit integers (i8) and may be image data, or data other than image data, such as speech data.
When the above first and second fixed-point network models are used, a test image is input into the first fixed-point network model and the second fixed-point network model, and each of them performs inference on the test image to obtain an image processing result.
Next, the accuracy value of the processing result of each fixed-point network model is obtained.
In this embodiment, the accuracy value of a processing result is characterized by the mean average precision (mAP). When the above first and second fixed-point network models are used, the accuracy value of the first fixed-point network model is called the first accuracy value, and the accuracy value of the second fixed-point network model is called the second accuracy value. Of course, the accuracy value of a processing result may also be characterized by other metrics such as the average precision (AP).
It is then determined whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold.
The most accurate fixed-point network model serves as the reference, and the accuracy values of the other fixed-point network models are compared against it. The threshold may be set empirically or according to the accuracy requirement; for example, it may be 1%. That is, it is determined whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the most accurate model. When the above first and second fixed-point network models are used, the second fixed-point network model is the most accurate one, and it is determined whether the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%.
If not, the most accurate fixed-point network model is taken as the selected fixed-point network model; if so, the fixed-point network model occupying the least memory among the at least one qualifying fixed-point network model is taken as the selected fixed-point network model.
If not, the accuracy values of the other fixed-point network models differ too much from that of the most accurate model. Selecting one of them would reduce the required computing power and the bandwidth requirement, but would seriously degrade the network accuracy; therefore, to guarantee the network accuracy, the most accurate fixed-point network model is selected in this case.
If so, the accuracy values of the other fixed-point network models differ only slightly from that of the most accurate model and will not unduly affect the network accuracy; therefore, to reduce the required computing power and the bandwidth requirement as much as possible, the fixed-point network model occupying the least memory is selected, and the network accuracy is still guaranteed not to drop noticeably.
When the above first and second fixed-point network models are used, if the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
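The selection logic of this stage can be summarized in a short sketch; the model names, mAP figures, and memory sizes below are illustrative placeholders, and the 1% threshold follows the example above:

```python
from typing import Dict

def select_model(acc: Dict[str, float], mem: Dict[str, int],
                 threshold: float = 0.01) -> str:
    """Pick the lowest-memory model whose mAP is within `threshold` of the best.

    `acc` maps model name to its mAP on the shared test data, and `mem` maps
    model name to its memory footprint; both are assumed inputs.
    """
    best = max(acc, key=acc.get)
    # Candidates whose accuracy gap to the most accurate model is within the threshold.
    close = [m for m in acc if m != best and acc[best] - acc[m] <= threshold]
    if not close:
        return best                  # every cheaper model loses too much accuracy
    return min(close, key=mem.get)   # smallest-footprint qualifying model

# Example with the two models of this embodiment (illustrative numbers):
choice = select_model(acc={"w8a8": 0.712, "w16a8": 0.718},
                      mem={"w8a8": 4_000_000, "w16a8": 7_500_000})
# 0.718 - 0.712 = 0.006 <= 0.01, so the smaller w8a8 model is selected.
```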
S401: processing data using the selected fixed-point network model.
Referring to the "processing" stage in FIG. 2, once the fixed-point network model is selected, the input data can be processed with it to obtain the data processing result.
It can thus be seen that the data processing method based on a deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models of different accuracies and, according to the accuracy of the fixed-point network models, selects one of them to process the data. This reduces the required computing power and the bandwidth requirement as much as possible while guaranteeing the network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
Another embodiment of the present disclosure provides a data processing method based on a deep neural network. For brevity, features that are the same as or similar to those of the previous embodiment are not repeated; only the features that differ from the previous embodiment are described below.
In the data processing method of this embodiment, in step S401, in the process of processing data with the selected fixed-point network model, the depthwise convolution result of the depthwise convolution layer is stored into an on-chip memory, the depthwise convolution result stored in the on-chip memory is then read, and the pointwise convolution layer processes the depthwise convolution result.
In a deep neural network, each hidden layer processes its input data and outputs the processing result to the next hidden layer as that layer's input data. When image data is processed, the input and output data of each hidden layer are called feature maps, and the depthwise and pointwise convolution operations process a feature map tile by tile.
In this embodiment, for each convolution layer of the selected fixed-point network model, a depthwise convolution operation is performed first:
First, one data block (tile) of the feature map is stored into the on-chip memory. The tile size equals the kernel size of the depthwise convolution layer described below. The on-chip memory refers to memory inside the processor rather than external memory; it may be on-chip RAM or a cache.
Then the tile is read from the on-chip memory, and the depthwise convolution layer processes the tile to obtain the depthwise convolution result of the tile. As shown in FIG. 4, the tile is convolved with the kernel weights, and a bias value is added to the convolution result. In FIG. 4, [1, C, H_t, W_t] denotes the parameters of the tile, where C is the number of input channels and H_t and W_t are the height and width of the tile, respectively. [C, 1, H_w, W_w] denotes the parameters of the weight matrix of the depthwise convolution layer, where C is the number of input channels and H_w and W_w are the height and width of the weight matrix, respectively. [C, 1, 1, 1] denotes the parameters of the bias values of the depthwise convolution layer, where C is the number of input channels. If the depthwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the depthwise convolution result.
Finally, the depthwise convolution result of the tile is stored into the on-chip memory, that is, into the processor's on-chip RAM or cache, rather than into off-chip memory such as DDR.
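A minimal sketch of this depthwise step on one tile follows; the int32 accumulation, the ReLU activation, and the requantization factor are assumptions, since the publication names the operations but not their numeric details:

```python
import numpy as np

def depthwise_tile(tile, weights, bias, act_scale):
    """Depthwise convolution of one kernel-sized tile (a sketch of the steps above).

    tile:      int8 input block, shape [C, Hw, Ww] (tile size equals kernel size).
    weights:   int8 depthwise kernels, shape [C, Hw, Ww], one kernel per channel.
    bias:      int32 per-channel bias, shape [C].
    act_scale: requantization factor; this and ReLU are assumed choices.
    Returns one int8 output value per channel, to be kept in on-chip memory.
    """
    acc = (tile.astype(np.int32) * weights.astype(np.int32)).sum(axis=(1, 2))
    acc = acc + bias                      # add the per-channel bias value
    acc = np.maximum(acc, 0)              # assumed ReLU activation
    return np.clip(np.round(acc * act_scale), -128, 127).astype(np.int8)
```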
When feature maps of multiple input channels are processed, the above depthwise convolution operation can process the feature maps of the input channels in parallel to improve computational efficiency.
A pointwise convolution operation is then performed:
First, the depthwise convolution result of the tile stored in the on-chip memory is read; that is, the depthwise convolution result is read from the processor's on-chip RAM or cache rather than from off-chip memory such as DDR.
Then the pointwise convolution layer processes the depthwise convolution result of the tile to obtain the pointwise convolution result of the tile. As shown in FIG. 5, the tile of each input channel is convolved with the kernel weights, the convolution results are accumulated, and then a bias value is added. In FIG. 5, [1, C, 1, 1] denotes the parameters of the weight matrix of the pointwise convolution layer, where C is the number of input channels and the third and fourth elements, 1 and 1, are the height and width of the weight matrix; that is, the weight matrix of the pointwise convolution layer is a 1×1 matrix. [1, 1, 1, 1] denotes the parameters of the bias value of the pointwise convolution layer.
If the pointwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the pointwise convolution result of the tile.
Finally, the pointwise convolution result of the tile is stored into the on-chip memory.
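A companion sketch of the pointwise step follows, consuming the per-tile depthwise result as it would be read from the on-chip buffer; as before, the activation and requantization details are assumptions:

```python
import numpy as np

def pointwise_tile(dw_result, weights, bias, act_scale):
    """Pointwise (1x1) convolution over one tile's depthwise result (a sketch).

    dw_result: int8 depthwise output read from on-chip memory, shape [C].
    weights:   int8 1x1 kernels, shape [N, C], for N output channels.
    bias:      int32 per-output-channel bias, shape [N].
    """
    acc = weights.astype(np.int32) @ dw_result.astype(np.int32)  # sum over C
    acc = acc + bias                      # add the per-output-channel bias
    acc = np.maximum(acc, 0)              # assumed ReLU activation
    return np.clip(np.round(acc * act_scale), -128, 127).astype(np.int8)
```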
Since a pointwise convolution layer generally has multiple output channels, the above pointwise convolution operation can process the output channels in parallel to improve computational efficiency.
The tile is then moved, and the above depthwise and pointwise convolution operations are performed on every tile of the feature map until all tiles of the feature map have been processed; a sketch of this loop follows.
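Putting the two sketches together, the per-tile loop might look as follows; stride 1 and the absence of padding are assumptions, and the driver reuses the hypothetical depthwise_tile and pointwise_tile helpers above:

```python
import numpy as np

def fused_conv_layer(feature_map, dw_w, dw_b, pw_w, pw_b, dw_scale, pw_scale):
    """Slide a kernel-sized tile over the feature map, fusing the depthwise
    and pointwise steps so the intermediate depthwise result never leaves
    the on-chip buffer (a sketch under the stated assumptions)."""
    C, H, W = feature_map.shape
    _, Hw, Ww = dw_w.shape
    N = pw_w.shape[0]
    out = np.empty((N, H - Hw + 1, W - Ww + 1), dtype=np.int8)
    for i in range(H - Hw + 1):
        for j in range(W - Ww + 1):
            tile = feature_map[:, i:i + Hw, j:j + Ww]        # one on-chip tile
            dw = depthwise_tile(tile, dw_w, dw_b, dw_scale)  # stays on chip
            out[:, i, j] = pointwise_tile(dw, pw_w, pw_b, pw_scale)
    return out
```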
It can thus be seen that the data processing method based on a deep neural network of this embodiment stores the depthwise convolution result in the on-chip memory, and the pointwise convolution layer reads the depthwise convolution result from the on-chip memory for processing; reading and writing of the intermediate result (the depthwise convolution result) are completed entirely on chip, with no read or write to off-chip memory. Compared with writing the intermediate result to off-chip memory and then reading it back, this further saves the bandwidth of the mobile device, improves the performance of the mobile device in executing deep neural network operations, and can support lower-end mobile devices with little computing power and bandwidth.
Another embodiment of the present disclosure provides a data processing apparatus based on a deep neural network, including:
an obtaining unit for obtaining a floating-point network model of the deep neural network;
a quantization unit for quantizing the floating-point network model to obtain at least two fixed-point network models of different accuracies;
a selection unit for selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
a processing unit for processing data using the selected fixed-point network model.
The data processing apparatus of this embodiment is used in a mobile device. The mobile device may obtain the floating-point network model directly from outside; the floating-point network model is a trained deep neural network.
Training the deep neural network is performed in two steps. First, a depthwise separable convolutional neural network is constructed. Then, the constructed depthwise separable convolutional neural network is trained to obtain a depthwise separable convolutional neural network model.
As described above, the depthwise separable convolutional neural network constructed in this embodiment may be MobileNet or another type of neural network; the convolution layers of these neural networks each include two convolution layers, a depthwise convolution layer and a pointwise convolution layer.
In this embodiment, training images are used to train the depthwise separable convolutional neural network. The training image data is first normalized into [-1, 1), and the normalized data is then input into the constructed depthwise separable convolutional neural network for training, yielding the depthwise separable convolutional neural network model.
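As a concrete illustration, one common way to realize the [-1, 1) normalization is shown below; the publication specifies only the target range, so the exact transform is an assumption:

```python
import numpy as np

def normalize_image(img_u8):
    """Map 8-bit pixel values in [0, 255] into [-1, 1)."""
    return img_u8.astype(np.float32) / 128.0 - 1.0  # 0 -> -1.0, 255 -> ~0.992
```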
In this embodiment, the depthwise separable convolutional neural network may be represented with 32-bit floating-point numbers: the weights, bias values, activation values and so on in the depthwise and pointwise convolution layers are all expressed as 32-bit floating-point numbers, and the resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the data volume of the convolutional neural network parameters and the computation required for training are large, and a mobile device cannot provide sufficient computing resources. Therefore, the deep neural network is generally trained not on the mobile device but on a server or computer, and the trained floating-point network model is then ported to the mobile device.
Because the storage and computing capability of a mobile device is limited, and the device may not even support floating-point operations, directly using the floating-point network model for data processing would place a severe burden on the storage, computing capability and power consumption of the mobile device, and the mobile device might not even be able to complete the data processing. Therefore, the mobile device needs to quantize the floating-point network model, obtaining a fixed-point network model after quantization. A fixed-point network model requires less storage and computing power than a floating-point network model and is well suited to mobile devices.
The quantization scheme used by the quantization unit may be called mixed-precision quantization. Specifically, for the depthwise convolution layer of each convolution layer of the floating-point network model, the weights of the depthwise convolution layer are quantized with different bit widths, while its activation values are quantized with a single bit width. For the pointwise convolution layer of the floating-point network model, the weights are quantized with a single bit width and the activation values are quantized with a single bit width. By applying mixed-precision quantization to the weights of the depthwise convolution layer, at least two fixed-point network models are obtained, one for each weight bit width.
In this embodiment, the weights of the depthwise convolution layer may be quantized into 8-bit and 16-bit fixed-point numbers, respectively. For the depthwise convolution layer with 8-bit fixed-point weights, the corresponding activation values are quantized into 8-bit fixed-point numbers; for the depthwise convolution layer with 16-bit fixed-point weights, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the pointwise convolution layer, both the weights and the activation values are quantized into 8-bit fixed-point numbers.
In this manner, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model. In the first fixed-point network model, the weights and activation values of the depthwise convolution layer are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of the pointwise convolution layer are both 8-bit fixed-point numbers (w8a8). In the second fixed-point network model, the weights of the depthwise convolution layer are 16-bit fixed-point numbers and the activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of the pointwise convolution layer are both 8-bit fixed-point numbers (w8a8).
The above describes the quantization by way of an example, but this embodiment is not limited thereto. For example, the activation values of the depthwise convolution layer may also be quantized with mixed precision, that is, a single bit width is used to quantize the weights of the depthwise convolution layer while different bit widths are used to quantize its activation values.
The weights and activation values of the pointwise convolution layer may also be quantized with mixed precision, including: for the depthwise convolution layer of each convolution layer of the floating-point network model, quantizing its weights with a single bit width and its activation values with a single bit width; and for the pointwise convolution layer of the floating-point network model, quantizing its weights with different bit widths and its activation values with a single bit width, or quantizing its weights with a single bit width and its activation values with different bit widths.
Mixed-precision quantization may also be applied to both the depthwise convolution layer and the pointwise convolution layer; the specific quantization manner is similar to that described above for the depthwise convolution layer or the pointwise convolution layer.
Those skilled in the art should understand that the quantization bit widths of the above first and second fixed-point network models are merely an example. This embodiment may also use quantization bit widths other than 8 bits and 16 bits, and may quantize the floating-point network model into three or more fixed-point network models. For the depthwise convolution layer and the pointwise convolution layer, when a single bit width is used to quantize the weights and a single bit width is used to quantize the activation values, the weight bit width and the activation bit width may be the same or different.
The accuracy of a fixed-point network model and the computing power it requires are both related to its quantization bit widths. The at least two fixed-point network models of different accuracies obtained by the quantization unit correspond to different computing-power requirements. The more quantization bits, the higher the accuracy and the greater the required computing power; the fewer quantization bits, the lower the accuracy and the smaller the required computing power. For example, for the first and second fixed-point network models, since the depthwise-convolution weight bit width of the second fixed-point network model is 16 while that of the first is 8, the accuracy of the second fixed-point network model is better than that of the first, but its required computing power is also greater. Therefore, the selection unit selects one fixed-point network model from the at least two fixed-point network models, so as to reduce the required computing power as much as possible while the accuracy drops only slightly, thereby lowering the bandwidth requirement and balancing accuracy against bandwidth.
Specifically, the selection unit first inputs the same test data into the at least two fixed-point network models, and the at least two fixed-point network models each perform inference on the test data to obtain processing results. The test data uses 8-bit integers (i8) and may be image data, or data other than image data, such as speech data.
When the above first and second fixed-point network models are used, a test image is input into the first fixed-point network model and the second fixed-point network model, and each of them performs inference on the test image to obtain an image processing result.
The selection unit then obtains the accuracy value of the processing result of each fixed-point network model.
In this embodiment, the accuracy value of a processing result is characterized by the mean average precision (mAP). When the above first and second fixed-point network models are used, the accuracy value of the first fixed-point network model is called the first accuracy value, and the accuracy value of the second fixed-point network model is called the second accuracy value. Of course, the accuracy value of a processing result may also be characterized by other metrics such as the average precision (AP).
The selection unit then determines whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold.
The most accurate fixed-point network model serves as the reference, and the accuracy values of the other fixed-point network models are compared against it. The threshold may be set empirically or according to the accuracy requirement; for example, it may be 1%. That is, it is determined whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the most accurate model. When the above first and second fixed-point network models are used, the second fixed-point network model is the most accurate one, and it is determined whether the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%.
If not, the most accurate fixed-point network model is taken as the selected fixed-point network model; if so, the fixed-point network model occupying the least memory among the at least one qualifying fixed-point network model is taken as the selected fixed-point network model.
If not, the accuracy values of the other fixed-point network models differ too much from that of the most accurate model. Selecting one of them would reduce the required computing power and the bandwidth requirement, but would seriously degrade the network accuracy; therefore, to guarantee the network accuracy, the most accurate fixed-point network model is selected in this case.
If so, the accuracy values of the other fixed-point network models differ only slightly from that of the most accurate model and will not unduly affect the network accuracy; therefore, to reduce the required computing power and the bandwidth requirement as much as possible, the fixed-point network model occupying the least memory is selected, and the network accuracy is still guaranteed not to drop noticeably.
When the above first and second fixed-point network models are used, if the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
The processing unit processes the data using the selected fixed-point network model. Once the fixed-point network model is selected, the processing unit can process the input data to obtain the data processing result.
It can thus be seen that the data processing apparatus based on a deep neural network of this embodiment quantizes the floating-point network model into at least two fixed-point network models of different accuracies and, according to the accuracy of the fixed-point network models, selects one of them to process the data. This reduces the required computing power and the bandwidth requirement as much as possible while guaranteeing the network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of mobile devices in executing deep neural network operations.
For brevity, in the data processing apparatus based on a deep neural network of another embodiment of the present disclosure, features that are the same as or similar to those of the previous embodiment are not repeated; only the features that differ from the previous embodiment are described below.
In the data processing apparatus of this embodiment, in the process of processing data with the selected fixed-point network model, the processing unit stores the depthwise convolution result of the depthwise convolution layer into an on-chip memory and reads the depthwise convolution result stored in the on-chip memory, and the pointwise convolution layer processes the depthwise convolution result.
In the deep neural network, the processing unit uses each hidden layer to process its input data and outputs the processing result to the next hidden layer as that layer's input data. When image data is processed, the input and output data of each hidden layer are called feature maps, and the depthwise and pointwise convolution operations process a feature map tile by tile.
In this embodiment, for each convolution layer of the selected fixed-point network model, the processing unit first performs a depthwise convolution operation:
First, one data block (tile) of the feature map is stored into the on-chip memory. The tile size equals the kernel size of the depthwise convolution layer described below. The on-chip memory refers to memory inside the processing unit rather than external memory; it may be on-chip RAM or a cache.
Then the tile is read from the on-chip memory, and the depthwise convolution layer processes the tile to obtain the depthwise convolution result of the tile. The tile is convolved with the kernel weights, and a bias value is added to the convolution result. If the depthwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the depthwise convolution result.
Finally, the depthwise convolution result of the tile is stored into the on-chip memory, that is, into the on-chip RAM or cache of the processing unit, rather than into off-chip memory such as DDR.
When feature maps of multiple input channels are processed, the processing unit can use the above depthwise convolution operation to process the feature maps of the input channels in parallel to improve computational efficiency.
The processing unit then performs a pointwise convolution operation:
First, the depthwise convolution result of the tile stored in the on-chip memory is read; that is, the depthwise convolution result is read from the on-chip RAM or cache of the processing unit rather than from off-chip memory such as DDR.
Then the pointwise convolution layer processes the depthwise convolution result of the tile to obtain the pointwise convolution result of the tile. The tile of each input channel is convolved with the kernel weights, the convolution results are accumulated, and then a bias value is added. If the pointwise convolution layer has an activation operation, an activation function is applied to the output of the bias operation to obtain the activation value; finally the activation value is quantized to obtain the pointwise convolution result of the tile.
Finally, the pointwise convolution result of the tile is stored into the on-chip memory.
Since a pointwise convolution layer generally has multiple output channels, the processing unit can use the above pointwise convolution operation to process the output channels in parallel to improve computational efficiency.
The tile is then moved, and the above depthwise and pointwise convolution operations are performed on every tile of the feature map until all tiles of the feature map have been processed.
It can thus be seen that the data processing apparatus based on a deep neural network of this embodiment stores the depthwise convolution result in the on-chip memory, and the pointwise convolution layer reads the depthwise convolution result from the on-chip memory for processing; reading and writing of the intermediate result (the depthwise convolution result) are completed entirely on chip, with no read or write to off-chip memory. Compared with writing the intermediate result to off-chip memory and then reading it back, this further saves the bandwidth of the mobile device, improves the performance of the mobile device in executing deep neural network operations, and can support lower-end mobile devices with little computing power and bandwidth.
It should be noted that the embodiments of the present disclosure do not limit the implementation form of the data processing apparatus based on a deep neural network or of its obtaining unit, quantization unit, selection unit and processing unit. Each of the obtaining unit, quantization unit, selection unit and processing unit may be implemented by a separate processor, or some or all of the units may be implemented by one processor. It should be understood that the processor mentioned in the embodiments of the present disclosure may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that the on-chip memory mentioned in the embodiments of the present disclosure refers to memory integrated inside the processor; it may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
Yet another embodiment of the present disclosure provides a mobile device, which includes the data processing apparatus based on a deep neural network of any of the above embodiments. The mobile device may be a portable mobile terminal, an unmanned aerial vehicle, a handheld gimbal, a remote controller or the like; the portable mobile terminal may be a mobile phone, a tablet computer or the like, and the remote controller may be a remote controller of an unmanned aerial vehicle.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the apparatus described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; where no conflict arises, the features in the embodiments of the present disclosure may be combined arbitrarily; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (39)

  1. A data processing method based on a deep neural network, comprising:
    obtaining a floating-point network model of the deep neural network;
    quantizing the floating-point network model to obtain at least two fixed-point network models of different accuracies;
    selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
    processing data using the selected fixed-point network model.
  2. The data processing method according to claim 1, wherein the deep neural network is a depthwise separable convolutional neural network.
  3. The data processing method according to claim 2, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer.
  4. The data processing method according to claim 3, wherein quantizing the floating-point network model comprises:
    quantizing the weights of the depthwise convolution layer with different bit widths, and quantizing the activation values with a single bit width.
  5. The data processing method according to claim 3, wherein quantizing the floating-point network model comprises:
    quantizing the weights of the pointwise convolution layer with different bit widths, and quantizing the activation values with a single bit width.
  6. The data processing method according to claim 3, wherein quantizing the floating-point network model comprises:
    for both the depthwise convolution layer and the pointwise convolution layer, quantizing the weights with different bit widths and quantizing the activation values with a single bit width.
  7. The data processing method according to claim 4, wherein, for the pointwise convolution layer, the weights are quantized with a single bit width and the activation values are quantized with a single bit width.
  8. The data processing method according to claim 7, wherein the at least two fixed-point network models comprise a first fixed-point network model and a second fixed-point network model;
    in the first fixed-point network model, the weights and activation values of the depthwise convolution layer each have a first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits; and
    in the second fixed-point network model, the weights of the depthwise convolution layer have a second number of bits and the activation values have the first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits.
  9. The data processing method according to claim 8, wherein the first number of bits is eight and the second number of bits is sixteen.
  10. The data processing method according to claim 1, wherein selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models comprises:
    processing the same test data with the at least two fixed-point network models;
    obtaining an accuracy value of the processing result of each of the fixed-point network models;
    determining whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold; and
    if not, taking the most accurate fixed-point network model as the selected fixed-point network model, and if so, taking the fixed-point network model occupying the least memory among the at least one fixed-point network model as the selected fixed-point network model.
  11. The data processing method according to claim 10, wherein the deep neural network is a depthwise separable convolutional neural network.
  12. The data processing method according to claim 11, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    the accuracy of the fixed-point network model corresponds to the weight bit width and activation bit width of at least one of the depthwise convolution layer and the pointwise convolution layer.
  13. The data processing method according to claim 12, wherein, among the at least two fixed-point network models, the weight bit widths of the depthwise convolution layers differ from one another while their activation bit widths are the same, and the weight bit widths of the pointwise convolution layers are the same and their activation bit widths are the same.
  14. The data processing method according to claim 13, wherein the at least two fixed-point network models comprise a first fixed-point network model and a second fixed-point network model;
    in the first fixed-point network model, the weights and activation values of the depthwise convolution layer each have a first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits;
    in the second fixed-point network model, the weights of the depthwise convolution layer have a second number of bits and the activation values have the first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits;
    the fixed-point network model occupying the least memory is the first fixed-point network model; and
    the most accurate fixed-point network model is the second fixed-point network model.
  15. The data processing method according to claim 14, wherein the first number of bits is eight and the second number of bits is sixteen.
  16. The data processing method according to claim 2, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    processing data using the selected fixed-point network model comprises:
    storing the depthwise convolution result of the depthwise convolution layer into an on-chip memory, reading the depthwise convolution result stored in the on-chip memory, and processing the depthwise convolution result with the pointwise convolution layer.
  17. The data processing method according to claim 16, wherein, for each convolution layer of the fixed-point network model:
    the depthwise convolution layer processes a data block of the data to obtain a depthwise convolution result of the data block;
    the depthwise convolution result of the data block is stored into the on-chip memory;
    the depthwise convolution result of the data block stored in the on-chip memory is read;
    the pointwise convolution layer processes the depthwise convolution result of the data block to obtain a pointwise convolution result of the data block; and
    the data block is moved, and the above processing is performed on all data blocks of the data.
  18. The data processing method according to claim 2 or 11, wherein the depthwise separable convolutional neural network is a MobileNet network.
  19. The data processing method according to claim 10, wherein the accuracy value is at least one of an average precision and a mean average precision.
  20. A data processing apparatus based on a deep neural network, comprising:
    an obtaining unit for obtaining a floating-point network model of the deep neural network;
    a quantization unit for quantizing the floating-point network model to obtain at least two fixed-point network models of different accuracies;
    a selection unit for selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
    a processing unit for processing data using the selected fixed-point network model.
  21. The data processing apparatus according to claim 20, wherein the deep neural network is a depthwise separable convolutional neural network.
  22. The data processing apparatus according to claim 21, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer.
  23. The data processing apparatus according to claim 22, wherein the quantization unit quantizes the weights of the depthwise convolution layer with different bit widths and quantizes the activation values with a single bit width.
  24. The data processing apparatus according to claim 22, wherein the quantization unit quantizes the weights of the pointwise convolution layer with different bit widths and quantizes the activation values with a single bit width.
  25. The data processing apparatus according to claim 22, wherein, for both the depthwise convolution layer and the pointwise convolution layer, the quantization unit quantizes the weights with different bit widths and quantizes the activation values with a single bit width.
  26. The data processing apparatus according to claim 23, wherein the quantization unit quantizes the weights of the pointwise convolution layer with a single bit width and quantizes the activation values with a single bit width.
  27. The data processing apparatus according to claim 26, wherein:
    in the first fixed-point network model, the weights and activation values of the depthwise convolution layer each have a first number of bits, and the weights and activation values of its pointwise convolution layer each have the first number of bits; and
    in the second fixed-point network model, the weights of the depthwise convolution layer have a second number of bits and the activation values have the first number of bits, and the weights and activation values of its pointwise convolution layer each have the second number of bits.
  28. The data processing apparatus according to claim 20, wherein the selection unit processes the same test data with the at least two fixed-point network models;
    obtains an accuracy value of the processing result of each of the fixed-point network models;
    determines whether there is at least one fixed-point network model whose accuracy value differs from the accuracy value of the most accurate fixed-point network model by no more than a threshold; and
    if not, takes the most accurate fixed-point network model as the selected fixed-point network model, and if so, takes the fixed-point network model occupying the least memory among the at least one fixed-point network model as the selected fixed-point network model.
  29. The data processing apparatus according to claim 28, wherein the deep neural network is a depthwise separable convolutional neural network.
  30. The data processing device according to claim 29, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    the precision of each fixed-point network model corresponds to the weight bit width of at least one of the depthwise convolution layer and the pointwise convolution layer.
  31. The data processing device according to claim 30, wherein the weight bit widths of the depthwise convolution layers of at least two of the fixed-point network models differ from each other while their activation bit widths are the same, and the weight bit widths of their pointwise convolution layers are the same and the activation bit widths are the same.
  32. The data processing device according to claim 31, wherein the at least two fixed-point network models include a first fixed-point network model and a second fixed-point network model;
    the weights and activation values of the depthwise convolution layer of the first fixed-point network model all have a first bit width, and the weights and activation values of its pointwise convolution layer also have the first bit width;
    the weights of the depthwise convolution layer of the second fixed-point network model have a second bit width, its activation values have the second bit width, and the weights and activation values of its pointwise convolution layer both have the first bit width; and
    the selection unit takes the first fixed-point network model as the fixed-point network model occupying the least memory, and takes the second fixed-point network model as the most accurate fixed-point network model.
  33. The data processing device according to claim 32, wherein the first bit width is eight bits and the second bit width is sixteen bits.
  34. The data processing device according to claim 21, wherein the depthwise separable convolutional neural network comprises a plurality of convolution layers, each of the convolution layers comprising a depthwise convolution layer and a pointwise convolution layer; and
    the processing unit stores the depthwise convolution result of the depthwise convolution layer in an on-chip memory of the processing unit, reads the depthwise convolution result from the on-chip memory, and processes the depthwise convolution result with the pointwise convolution layer.
  35. The data processing device according to claim 34, wherein the processing unit performs the following operations with each convolution layer of the fixed-point network model:
    performing depthwise convolution on a data block of the data with the depthwise convolution layer to obtain a depthwise convolution result of the data block;
    storing the depthwise convolution result of the data block in the on-chip memory;
    reading the depthwise convolution result of the data block from the on-chip memory;
    processing the depthwise convolution result of the data block with the pointwise convolution layer to obtain a pointwise convolution result of the data block; and
    moving to the next data block and repeating the above operations until all data blocks of the data have been processed.
  36. The data processing device according to claim 21 or 29, wherein the depthwise separable convolutional neural network is a MobileNet network.
  37. The data processing device according to claim 28, wherein the accuracy value is at least one of average precision (AP) and mean average precision (mAP).
  38. A mobile device, comprising the deep-neural-network-based data processing device according to any one of claims 20 to 37.
  39. The mobile device according to claim 38, wherein the mobile device is at least one of a portable mobile terminal, an unmanned aerial vehicle, a handheld gimbal, and a remote controller.
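
The claims above describe three concrete mechanisms: per-layer quantization with mixed bit widths (claims 23 to 27), accuracy-versus-memory model selection (claim 28), and blockwise depthwise-then-pointwise processing through on-chip memory (claims 34 and 35). The short Python sketches below illustrate each; they are editorial illustrations under stated assumptions, not code from the application itself.

Claims 23 to 25 have the quantization unit quantize weights with differing bit widths while activations share one bit width. A minimal sketch of per-tensor symmetric fixed-point quantization, assuming NumPy; the claims do not fix a quantization formula, so the scale rule here (largest magnitude maps to the top quantization level) is an assumption:

    import numpy as np

    def quantize_tensor(x, num_bits):
        """Map a float tensor to signed num_bits fixed point; returns (q, scale)."""
        qmax = 2 ** (num_bits - 1) - 1                     # 127 for 8 bits, 32767 for 16
        scale = max(float(np.abs(x).max()), 1e-12) / qmax  # largest magnitude -> qmax
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
        return q, scale

    # Weights at different bit widths, activations at one shared width (8 bits
    # here is an illustrative choice, not a value taken from the claims).
    dw_weights = np.random.randn(3, 3, 32)
    w8, s8 = quantize_tensor(dw_weights, 8)
    w16, s16 = quantize_tensor(dw_weights, 16)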
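Claims 32 and 33 name two concrete configurations, written out here as plain data; the field names are hypothetical, and only the eight/sixteen-bit values come from claim 33:

    # First model (least memory): every bit width is the first value (8 bits).
    FIRST_MODEL = {"dw_weight_bits": 8, "dw_act_bits": 8,
                   "pw_weight_bits": 8, "pw_act_bits": 8}

    # Second model (most accurate): depthwise weights and activations at the
    # second value (16 bits), pointwise weights and activations at 8 bits.
    SECOND_MODEL = {"dw_weight_bits": 16, "dw_act_bits": 16,
                    "pw_weight_bits": 8, "pw_act_bits": 8}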
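Claim 28's selection rule, as a minimal sketch; it assumes each candidate model is summarized by an accuracy value and a memory footprint, and reads "within the threshold" as an absolute accuracy difference (both are assumptions, since the claim leaves the metric and data structure open):

    def select_model(candidates, threshold):
        """candidates: dicts with 'accuracy' and 'memory' keys; returns one of them."""
        best = max(candidates, key=lambda m: m["accuracy"])
        # Other models whose accuracy is within `threshold` of the most accurate one.
        close = [m for m in candidates
                 if m is not best and best["accuracy"] - m["accuracy"] <= threshold]
        if not close:
            return best          # no near-tie: keep the most accurate model
        # Near-tie: take the smallest model among the close ones and the best itself.
        return min(close + [best], key=lambda m: m["memory"])

    models = [{"name": "first",  "accuracy": 0.710, "memory": 4.2e6},
              {"name": "second", "accuracy": 0.722, "memory": 6.1e6}]
    print(select_model(models, threshold=0.02)["name"])   # -> "first"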
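Claim 35's blockwise pipeline, as a minimal sketch assuming NumPy, a 3x3 depthwise kernel with padding 1 and stride 1, and on-chip memory modeled as an ordinary array; the tile size and tensor shapes are illustrative:

    import numpy as np

    def depthwise_pointwise_tiled(x, dw_kernels, pw_weights, tile=32):
        """x: (H, W, C) input; dw_kernels: (3, 3, C); pw_weights: (C, C_out)."""
        h, w, c = x.shape
        out = np.zeros((h, w, pw_weights.shape[1]), dtype=x.dtype)
        padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
        for i in range(0, h, tile):
            for j in range(0, w, tile):
                ih, iw = min(tile, h - i), min(tile, w - j)
                # Depthwise convolution of one data block.
                block = np.zeros((ih, iw, c), dtype=x.dtype)
                for di in range(3):
                    for dj in range(3):
                        block += (padded[i + di:i + di + ih, j + dj:j + dj + iw, :]
                                  * dw_kernels[di, dj, :])
                on_chip = block   # claim 35: the block result is kept on chip
                # Pointwise (1x1) convolution reads the block back and mixes channels.
                out[i:i + ih, j:j + iw] = (on_chip.reshape(-1, c)
                                           @ pw_weights).reshape(ih, iw, -1)
        return out

Keeping each depthwise block on chip and consuming it immediately with the pointwise layer spares the intermediate feature map a round trip through external memory, which is the bandwidth saving the claims are after.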
PCT/CN2019/097072 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device WO2021012148A1 (en)

Priority Applications (2)

Application Number   Priority Date  Filing Date  Title
CN201980005317.5A    2019-07-22     2019-07-22   Data processing method and device based on deep neural network and mobile device (CN 111344719 A)
PCT/CN2019/097072    2019-07-22     2019-07-22   Data processing method and apparatus based on deep neural network, and mobile device (WO 2021012148 A1)


Publications (1)

Publication Number
WO 2021012148 A1 (en)

Family

ID=71187736


Country Status (2)

CN (1): CN 111344719 A (en)
WO (1): WO 2021012148 A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409773B * 2021-08-18 2022-01-18 中科南京智能技术研究院 Binarized neural network voice wake-up method and system
CN116720563B (en) * 2022-09-19 2024-03-29 荣耀终端有限公司 Method and device for improving fixed-point neural network model precision and electronic equipment


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657316B (en) * 2016-08-12 2020-04-07 北京深鉴智能科技有限公司 Design of cooperative system of general processor and neural network processor

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224984A * 2014-05-31 2016-01-06 华为技术有限公司 Data category recognition method and device based on deep neural network
CN106203624A * 2016-06-23 2016-12-07 上海交通大学 Vector quantization method based on deep neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089888A1 (en) * 2019-09-25 2021-03-25 Arm Limited Hybrid Filter Banks for Artificial Neural Networks
US11561767B2 (en) 2019-09-25 2023-01-24 Arm Limited Mixed-precision computation unit

Also Published As

Publication number Publication date
CN111344719A (en) 2020-06-26


Legal Events

Code  Description
121   EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 19938761; country of ref document: EP; kind code of ref document: A1)
NENP  Non-entry into the national phase (ref country code: DE)
122   EP: PCT application non-entry in European phase (ref document number: 19938761; country of ref document: EP; kind code of ref document: A1)