WO2021012148A1 - Data processing method and apparatus based on deep neural network, and mobile device - Google Patents

Data processing method and apparatus based on deep neural network, and mobile device

Info

Publication number
WO2021012148A1
WO2021012148A1 (PCT/CN2019/097072)
Authority
WO
WIPO (PCT)
Prior art keywords
fixed
point
network model
point network
deep
Prior art date
Application number
PCT/CN2019/097072
Other languages
English (en)
Chinese (zh)
Inventor
陈诗南
余俊峰
周爱春
张伟
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN201980005317.5A (CN111344719A)
Priority to PCT/CN2019/097072 (WO2021012148A1)
Publication of WO2021012148A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The present disclosure relates to the technical field of artificial neural networks, and in particular to a data processing method, apparatus, and mobile device based on a deep neural network.
  • Deep neural networks have been widely used in mobile devices.
  • The network model needs to be quantized. If the number of quantization bits is large, bandwidth pressure is high; if the number of quantization bits is small, network accuracy suffers. Deep neural networks on mobile devices therefore face a conflict between ensuring network accuracy and reducing bandwidth.
  • In view of this, the present disclosure provides a data processing method based on a deep neural network, which includes:
  • obtaining a floating-point network model of the deep neural network;
  • quantizing the floating-point network model to obtain at least two fixed-point network models with different precisions;
  • selecting one fixed-point network model from the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
  • processing data using the selected fixed-point network model.
  • Correspondingly, the present disclosure also provides a data processing device based on a deep neural network, which includes:
  • an obtaining unit, configured to obtain the floating-point network model of the deep neural network;
  • a quantization unit, configured to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions;
  • a selection unit, configured to select one of the at least two fixed-point network models according to the accuracy of the fixed-point network models; and
  • a processing unit, configured to process data using the selected fixed-point network model.
  • the present disclosure also provides a mobile device, which includes: the above-mentioned data processing device based on a deep neural network.
  • FIG. 1 is a flowchart of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • FIG. 2 is a data diagram of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • Figure 3 is a schematic diagram of the structure of the convolutional layer of the deep neural network.
  • FIG. 4 is a schematic diagram of a deep convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • FIG. 5 is a schematic diagram of a point convolution operation of a data processing method based on a deep neural network according to an embodiment of the disclosure.
  • An embodiment of the present disclosure provides a data processing method based on a deep neural network.
  • a deep neural network generally refers to an artificial neural network including an input layer, multiple hidden layers, and an output layer.
  • Deep neural networks include: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM) and many other types of neural networks.
  • In this embodiment, a convolutional neural network is taken as an example to describe the data processing method, but those skilled in the art should understand that the data processing method of this embodiment is not limited to convolutional neural networks and is applicable to all types of deep neural networks.
  • For a convolutional neural network, each hidden layer includes multiple operations such as convolution, bias, normalization (BN), activation, and quantization.
  • the convolution operation is generally called a convolution layer.
  • the above description of the hidden layer is only exemplary, and does not constitute a limitation on the order and quantity of each operation or each layer.
  • Each operation or each layer can have various modifications. The position and number of several operations or layers can be changed. For example, some hidden layers may not have normalization operations and activation operations, and some hidden layers also include other operations or layers such as pooling and full connection.
  • If the convolutional neural network is a fixed-point network, its hidden layers include quantization operations; if it is a floating-point network, its hidden layers do not.
  • the depthwise separable convolution in the convolutional neural network is taken as an example to describe the data processing method.
  • the convolutional layer introduced above is a standard convolutional layer, which performs standard convolution operations.
  • In a depthwise separable convolutional neural network, the convolutional layer in the hidden layer is split into two convolutional layers: a depthwise convolutional layer and a pointwise convolutional layer, which perform the depthwise convolution operation and the pointwise convolution operation, respectively.
  • Convolutional neural networks generally operate on data with multiple input channels.
  • The depthwise convolutional layer first uses a convolution kernel to perform a depthwise convolution on the data of each input channel, producing a depthwise convolution result per input channel.
  • The pointwise convolutional layer then performs a pointwise convolution on the depthwise convolution results, fusing the information of all input channels.
  • Compared with a standard convolutional neural network, a depthwise separable convolutional neural network requires far fewer parameters and much less computation, making it especially suitable for scenarios with limited computing resources, such as mobile devices.
  • The depthwise separable convolutional neural network in this embodiment has a variety of implementations, such as the MobileNet network model.
  • The convolution kernel of the pointwise convolutional layer of MobileNet has a size of 1×1.
  • A standard 28-layer MobileNet network model can be used, or a depthwise separable convolutional neural network obtained by shrinking, widening, or otherwise modifying the standard MobileNet network model.
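  • As a quick sanity check on the parameter savings (a worked example, not figures from the disclosure): a standard convolution with C input channels, K output channels, and an Hw×Ww kernel has K·C·Hw·Ww weights, while its depthwise separable counterpart has C·Hw·Ww + K·C:

```python
def conv_params(c_in: int, c_out: int, kh: int, kw: int) -> tuple:
    """Weight counts for a standard conv vs. its depthwise separable split."""
    standard = c_out * c_in * kh * kw            # one kh x kw kernel per (in, out) pair
    separable = c_in * kh * kw + c_out * c_in    # depthwise kernels + 1x1 pointwise kernels
    return standard, separable

# Example: 3x3 kernels, 64 input channels, 128 output channels.
std, sep = conv_params(64, 128, 3, 3)
print(std, sep, round(std / sep, 1))  # 73728 8768 8.4 -> roughly 8x fewer parameters
```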
  • the data processing method based on the deep neural network of this embodiment includes the following steps:
  • the data processing method of this embodiment is suitable for mobile devices.
  • the mobile device can directly obtain a floating-point network model from the outside, and the floating-point network model is a trained deep neural network.
  • The depthwise separable convolutional neural network constructed in this step can be MobileNet or another type of neural network.
  • The convolutional layers of these neural networks include depthwise convolutional layers and pointwise convolutional layers.
  • As shown in FIG. 3, in an example, both the depthwise convolutional layer and the pointwise convolutional layer include convolution, bias, activation, and quantization operations.
  • Convolutional neural networks are usually used in image processing, especially in scenarios such as image recognition and image classification. Therefore, in this step, training images are used to train the depthwise separable convolutional neural network. First, the training image data is normalized to [-1, 1); the normalized data is then fed into the constructed depthwise separable convolutional neural network for training, yielding the depthwise separable convolutional neural network model.
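  • A minimal sketch of this normalization step (the disclosure does not specify the exact mapping, so a symmetric scaling of 8-bit pixel values is assumed here):

```python
import numpy as np

def normalize_images(images: np.ndarray) -> np.ndarray:
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1).

    Assumed mapping: (x - 128) / 128, so 0 -> -1.0 and 255 -> 127/128.
    """
    return (images.astype(np.float32) - 128.0) / 128.0

# Example: a batch of four 224x224 RGB training images.
batch = np.random.randint(0, 256, size=(4, 3, 224, 224), dtype=np.uint8)
normalized = normalize_images(batch)
assert normalized.min() >= -1.0 and normalized.max() < 1.0
```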
  • In the training phase, floating-point numbers with a large bit width are generally used to represent the convolutional neural network.
  • For example, 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network.
  • That is, the weights, bias values, and activation values in the depthwise convolutional layer and the pointwise convolutional layer are all represented by 32-bit floating-point numbers.
  • The resulting floating-point network model is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the parameter volume and training computation of the convolutional neural network are very large, and mobile devices cannot provide enough computing resources. Deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
  • In other examples, floating-point numbers with other bit widths may also be used to represent the depthwise separable convolutional neural network.
  • the training data may also be other data besides image data, such as voice data.
  • S201: Perform quantization on the floating-point network model to obtain at least two fixed-point network models with different precisions.
  • The quantization method used in this step can be called mixed-precision quantization.
  • For the depthwise convolutional layer of the floating-point network model, different bit widths are used to quantize its weights, while the same bit width is used to quantize its activation values.
  • For the pointwise convolutional layer, the same bit width is used to quantize its weights, and the same bit width is used to quantize its activation values.
  • For example, the weights of the depthwise convolutional layer can be converted into 8-bit and 16-bit fixed-point numbers, respectively; for the 8-bit fixed-point depthwise convolutional layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point depthwise convolutional layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the pointwise convolutional layer, the weights and activation values are both quantized into 8-bit fixed-point numbers.
  • In this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model.
  • The weights and activation values of the depthwise convolutional layer of the first fixed-point network model are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of its pointwise convolutional layer are both 8-bit fixed-point numbers (w8a8).
  • The weights of the depthwise convolutional layer of the second fixed-point network model are 16-bit fixed-point numbers and its activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of its pointwise convolutional layer are both 8-bit fixed-point numbers (w8a8).
  • In other implementations, the activation values of the depthwise convolutional layer can also be quantized with mixed precision; that is, the same bit width is used to quantize the weights of the depthwise convolutional layer while different bit widths are used to quantize its activation values.
  • The weights and activation values of the pointwise convolutional layer can also be quantized with mixed precision: for the depthwise convolutional layer of the floating-point network model, the weights are quantized with the same bit width and the activation values are quantized with the same bit width; for the pointwise convolutional layer, either the weights are quantized with different bit widths and the activation values with the same bit width, or the weights are quantized with the same bit width and the activation values with different bit widths.
  • The quantization bit widths of the first and second fixed-point network models above are only examples. This embodiment may also use bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the depthwise and pointwise convolutional layers, when the same bit width is used to quantize the weights and the same bit width is used to quantize the activation values, the weight bit width and the activation bit width may be the same or different.
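  • The sketch below illustrates one plausible way to build the two fixed-point network models of this embodiment from a floating-point model. The symmetric per-tensor scaling and the layer-name convention (names containing "depthwise") are assumptions for illustration, not details given in the disclosure:

```python
import numpy as np

def quantize_tensor(x: np.ndarray, n_bits: int):
    """Quantize a float tensor to n-bit signed fixed point (symmetric, per tensor)."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for 8 bits, 32767 for 16 bits
    scale = max(float(np.abs(x).max()) / qmax, 1e-12)  # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                                    # fixed-point values plus their scale

def quantize_model(float_weights: dict):
    """Produce the first (all w8) and second (w16 depthwise, w8 pointwise) models."""
    first = {name: quantize_tensor(w, 8) for name, w in float_weights.items()}
    second = {name: quantize_tensor(w, 16 if "depthwise" in name else 8)
              for name, w in float_weights.items()}
    return first, second
```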
  • The accuracy of a fixed-point network model and the computing power it requires are related to its quantization bit width.
  • The at least two fixed-point network models with different precisions obtained in step S201 therefore correspond to different computing-power requirements: the more quantization bits, the higher the accuracy and the greater the required computing power; the fewer quantization bits, the lower the accuracy and the smaller the required computing power.
  • For example, the weights of the depthwise convolutional layer of the second fixed-point network model are 16-bit fixed-point numbers, whereas those of the first fixed-point network model are 8-bit fixed-point numbers; the accuracy of the second fixed-point network model is therefore better than that of the first, but it also requires more computing power. This step thus selects one fixed-point network model from the at least two, reducing the required computing power as much as possible without greatly reducing accuracy, thereby reducing bandwidth requirements and balancing accuracy against bandwidth.
  • To make the selection, the same test data is first input into the at least two fixed-point network models, each of which performs inference on the test data to obtain a processing result.
  • The test data uses 8-bit integers (i8) and can be image data or other data, such as voice data.
  • In this embodiment, the test images are input into the first fixed-point network model and the second fixed-point network model.
  • The first and second fixed-point network models each perform inference on the test images and obtain image processing results.
  • The accuracy value of a processing result is characterized by the mean average precision (mAP).
  • the accuracy value of the first fixed-point network model is called the first accuracy value
  • the accuracy value of the second fixed-point network model is called the second accuracy value.
  • In other examples, the accuracy value of the processing result can also be characterized by other metrics, such as the average precision (AP).
  • Next, it is judged whether there is at least one fixed-point network model whose accuracy value differs from that of the most accurate fixed-point network model by no more than a threshold. The threshold may be set according to experience or accuracy requirements, for example 1%; in other words, it is judged whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the most accurate fixed-point network model.
  • In this embodiment, the second fixed-point network model is the one with the highest accuracy, so it is judged whether the difference between the accuracy value of the first fixed-point network model and that of the second fixed-point network model is within 1%.
  • If no such fixed-point network model exists, the fixed-point network model with the highest accuracy is selected.
  • That is, in this embodiment with the first and second fixed-point network models, if the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
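  • A minimal sketch of this selection rule, assuming mAP values in [0, 1] and candidate models ordered from lowest to highest required computing power:

```python
def select_model(models: list, map_scores: list, threshold: float = 0.01):
    """Pick the cheapest model whose mAP is within `threshold` of the best mAP."""
    best = max(map_scores)
    for model, score in zip(models, map_scores):   # cheapest model first
        if best - score <= threshold:
            return model
    return models[map_scores.index(best)]          # fallback: most accurate model

# Example: the first model (w8a8) loses only 0.4% mAP, so it is selected.
chosen = select_model(["w8a8", "w16a8"], [0.712, 0.716])
assert chosen == "w8a8"
```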
  • After the fixed-point network model is selected, the input data can be processed with it to obtain the data processing result.
  • In summary, the data processing method of this embodiment quantizes the floating-point network model into at least two fixed-point network models with different accuracies and selects one fixed-point network model according to the accuracy of the fixed-point network models.
  • Processing data in this way reduces the required computing power and bandwidth as much as possible while ensuring network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of deep neural network operations on mobile devices.
  • Another embodiment of the present disclosure provides a data processing method based on a deep neural network.
  • Features that are the same as or similar to the previous embodiment will not be repeated; only the features that differ from the previous embodiment are described below.
  • In step S401, in the process of processing data with the selected fixed-point network model, the depthwise convolution result of the depthwise convolutional layer is stored in on-chip memory, the depthwise convolution result is read from the on-chip memory, and the pointwise convolutional layer processes the depthwise convolution result.
  • each hidden layer processes the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer.
  • the input data and output data of each hidden layer are called feature maps.
  • In this embodiment, the feature maps are processed tile by tile.
  • First, a data block (tile) of the feature map is stored in the on-chip memory.
  • The size of the data block is equal to the size of the convolution kernel of the following depthwise convolutional layer.
  • Here, on-chip memory refers to memory internal to the processor rather than external memory; it may be on-chip RAM or a cache.
  • Next, the data block is read from the on-chip memory, and the depthwise convolutional layer processes it to obtain the depthwise convolution result of the data block.
  • Specifically, the data block is convolved with the weights of the convolution kernel, and the bias value is added to the convolution result.
  • [1, C, Ht, Wt] represents the parameters of the data block, where C is the number of input channels and Ht and Wt are the height and width of the data block, respectively.
  • [C, 1, Hw, Ww] represents the parameters of the weight matrix of the depthwise convolutional layer, where C is the number of input channels and Hw and Ww are the height and width of the weight matrix, respectively.
  • [C, 1, 1, 1] represents the parameters of the bias values of the depthwise convolutional layer, where C is the number of input channels. If the depthwise convolutional layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the depthwise convolution result.
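  • A sketch of this depthwise convolution on one tile, using the shapes above and assuming the tile size equals the kernel size (Ht = Hw, Wt = Ww) so each channel reduces to a single value; the activation and re-quantization steps are omitted:

```python
import numpy as np

def depthwise_conv_tile(tile: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Depthwise convolution of one feature-map tile.

    tile:    [1, C, Ht, Wt]  one data block read from on-chip memory
    weights: [C, 1, Hw, Ww]  one kernel per input channel
    bias:    [C]             one bias value per input channel
    Returns one depthwise output value per channel, shape [C].
    """
    _, C, _, _ = tile.shape
    out = np.empty(C, dtype=np.float32)
    for c in range(C):  # channels are independent, so they can run in parallel
        out[c] = np.sum(tile[0, c] * weights[c, 0]) + bias[c]
    return out  # kept in on-chip memory for the following pointwise layer
```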
  • Then, the depthwise convolution result of the data block is stored in the on-chip memory, that is, in the processor's on-chip RAM or cache, rather than in off-chip memory such as DDR.
  • The above depthwise convolution operation can process the feature maps of all input channels in parallel to improve computing efficiency.
  • Next, the depthwise convolution result is read from the processor's on-chip RAM or cache instead of from off-chip memory such as DDR.
  • The pointwise convolutional layer processes the depthwise convolution result of the data block to obtain the pointwise convolution result of the data block.
  • Specifically, the depthwise convolution results of all input channels are convolved with the weights of the convolution kernel, the convolution results are accumulated, and the bias value is then added.
  • [1, C, 1, 1] represents the parameters of the weight matrix of the pointwise convolutional layer, where C is the number of input channels and the third and fourth elements represent the height and width of the weight matrix; that is, the weight matrix of the pointwise convolutional layer is a 1×1 matrix.
  • [1, 1, 1, 1] represents the parameter of the bias value of the pointwise convolutional layer.
  • If the pointwise convolutional layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the pointwise convolution result of the data block.
  • The pointwise convolutional layer generally has multiple output channels, and the above pointwise convolution operation can process all output channels in parallel to improve computing efficiency.
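  • A companion sketch of the pointwise (1×1) convolution that fuses the per-channel depthwise results; the [K, C] weight layout generalizes the [1, C, 1, 1] shape above to K output channels, and activation/re-quantization are again omitted:

```python
import numpy as np

def pointwise_conv(depth_result: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 (pointwise) convolution over the depthwise outputs.

    depth_result: [C]     per-channel depthwise results read from on-chip memory
    weights:      [K, C]  one 1x1 kernel (C weights) per output channel
    bias:         [K]     one bias value per output channel
    Returns one pointwise output value per output channel, shape [K].
    """
    return weights @ depth_result + bias  # each output channel is independent
```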
  • In summary, the data processing method of this embodiment stores the depthwise convolution result in on-chip memory, and the pointwise convolutional layer reads the depthwise convolution result from the on-chip memory for processing; the reading and writing of the intermediate result (the depthwise convolution result) are done entirely on-chip, without accessing off-chip memory.
  • Compared with writing intermediate results to off-chip memory and reading them back, this further saves mobile-device bandwidth, improves the performance of deep neural network operations on mobile devices, and can support lower-end mobile devices with low computing power and limited bandwidth.
  • Another embodiment of the present disclosure provides a data processing device based on a deep neural network, including:
  • the obtaining unit is used to obtain the floating point network model of the deep neural network.
  • the quantization unit is used to quantize the floating-point network model to obtain at least two fixed-point network models with different precisions.
  • the selection unit is configured to select one of at least two fixed-point network models according to the accuracy of the fixed-point network model.
  • the processing unit uses the selected fixed-point network model to process the data.
  • the data processing apparatus of this embodiment is used for mobile equipment, and the mobile equipment can directly obtain a floating-point network model from the outside, and the floating-point network model is a trained deep neural network.
  • Training the deep neural network is performed in two steps: first, a depthwise separable convolutional neural network is constructed; then, the constructed network is trained to obtain the depthwise separable convolutional neural network model.
  • The depthwise separable convolutional neural network constructed in this embodiment can be MobileNet or another type of neural network.
  • The convolutional layers of these neural networks include depthwise convolutional layers and pointwise convolutional layers.
  • During training, training images are used to train the depthwise separable convolutional neural network.
  • For example, 32-bit floating-point numbers can be used to represent the depthwise separable convolutional neural network; the weights, bias values, and activation values in the depthwise convolutional layer and the pointwise convolutional layer are all represented by 32-bit floating-point numbers.
  • The floating-point network model obtained is a 32-bit floating-point network model. Because 32-bit floating-point numbers are used, the parameter volume and training computation of the convolutional neural network are very large, and mobile devices cannot provide enough computing resources. Deep neural networks are therefore generally trained on servers or computers rather than on mobile devices, and the trained floating-point network model is then ported to the mobile device.
  • The quantization method used by the quantization unit can be called mixed-precision quantization.
  • For the depthwise convolutional layer of the floating-point network model, different bit widths are used to quantize its weights, while the same bit width is used to quantize its activation values.
  • For the pointwise convolutional layer, the same bit width is used to quantize its weights, and the same bit width is used to quantize its activation values.
  • For example, the weights of the depthwise convolutional layer can be converted into 8-bit and 16-bit fixed-point numbers, respectively; for the 8-bit fixed-point depthwise convolutional layer, the corresponding activation values are quantized into 8-bit fixed-point numbers, and for the 16-bit fixed-point depthwise convolutional layer, the corresponding activation values are also quantized into 8-bit fixed-point numbers. For the pointwise convolutional layer, the weights and activation values are both quantized into 8-bit fixed-point numbers.
  • In this way, this embodiment quantizes the floating-point network model into two fixed-point network models: a first fixed-point network model and a second fixed-point network model.
  • The weights and activation values of the depthwise convolutional layer of the first fixed-point network model are both 8-bit fixed-point numbers (w8a8), and the weights and activation values of its pointwise convolutional layer are both 8-bit fixed-point numbers (w8a8).
  • The weights of the depthwise convolutional layer of the second fixed-point network model are 16-bit fixed-point numbers and its activation values are 8-bit fixed-point numbers (w16a8), while the weights and activation values of its pointwise convolutional layer are both 8-bit fixed-point numbers (w8a8).
  • In other implementations, the activation values of the depthwise convolutional layer can also be quantized with mixed precision; that is, the same bit width is used to quantize the weights of the depthwise convolutional layer while different bit widths are used to quantize its activation values.
  • The weights and activation values of the pointwise convolutional layer can also be quantized with mixed precision: for the depthwise convolutional layer of the floating-point network model, the weights are quantized with the same bit width and the activation values are quantized with the same bit width; for the pointwise convolutional layer, either the weights are quantized with different bit widths and the activation values with the same bit width, or the weights are quantized with the same bit width and the activation values with different bit widths.
  • The quantization bit widths of the first and second fixed-point network models above are only examples. This embodiment may also use bit widths other than 8 and 16 bits, and may quantize the model into three or more fixed-point network models. For the depthwise and pointwise convolutional layers, when the same bit width is used to quantize the weights and the same bit width is used to quantize the activation values, the weight bit width and the activation bit width may be the same or different.
  • The accuracy of a fixed-point network model and the computing power it requires are related to its quantization bit width. The at least two fixed-point network models with different precisions obtained by the quantization unit correspond to different computing-power requirements: the more quantization bits, the higher the accuracy and the greater the required computing power; the fewer quantization bits, the lower the accuracy and the smaller the required computing power.
  • The selection unit therefore needs to select one fixed-point network model from the at least two, reducing the required computing power as much as possible without greatly reducing accuracy, thereby reducing bandwidth requirements and balancing accuracy against bandwidth.
  • To do so, the selection unit first inputs the same test data into the at least two fixed-point network models, each of which performs inference on the test data to obtain a processing result.
  • The test data uses 8-bit integers (i8) and can be image data or other data, such as voice data.
  • In this embodiment, the test images are input into the first fixed-point network model and the second fixed-point network model.
  • The first and second fixed-point network models each perform inference on the test images and obtain image processing results.
  • The selection unit then obtains the accuracy value of the processing result of each fixed-point network model.
  • The accuracy value of a processing result is characterized by the mean average precision (mAP).
  • the accuracy value of the first fixed-point network model is called the first accuracy value
  • the accuracy value of the second fixed-point network model is called the second accuracy value.
  • In other examples, the accuracy value of the processing result can also be characterized by other metrics, such as the average precision (AP).
  • Next, the selection unit judges whether there is at least one fixed-point network model whose accuracy value differs from that of the most accurate fixed-point network model by no more than a threshold.
  • The threshold may be set according to experience or accuracy requirements, for example 1%; in other words, it is judged whether there is a fixed-point network model whose accuracy value is within 1% of the accuracy value of the most accurate fixed-point network model.
  • In this embodiment, the second fixed-point network model is the one with the highest accuracy, so it is judged whether the difference between the accuracy value of the first fixed-point network model and that of the second fixed-point network model is within 1%.
  • If no such fixed-point network model exists, the fixed-point network model with the highest accuracy is selected.
  • That is, in this embodiment with the first and second fixed-point network models, if the difference between the accuracy value of the first fixed-point network model and that of the second is within 1%, the first fixed-point network model is selected; otherwise, the second fixed-point network model is selected.
  • The processing unit processes the data using the selected fixed-point network model. After the fixed-point network model is selected, the processing unit can process the input data to obtain the data processing result.
  • In summary, the data processing device of this embodiment quantizes the floating-point network model into at least two fixed-point network models with different accuracies and selects one fixed-point network model according to the accuracy of the fixed-point network models.
  • Processing data in this way reduces the required computing power and bandwidth as much as possible while ensuring network accuracy, balances accuracy against bandwidth, effectively resolves the conflict between network accuracy and bandwidth, and improves the performance of deep neural network operations on mobile devices.
  • In another embodiment, in the process of processing data with the selected fixed-point network model, the processing unit stores the depthwise convolution result of the depthwise convolutional layer in on-chip memory and reads the stored depthwise convolution result back, and the pointwise convolutional layer processes the depthwise convolution result.
  • the processing unit uses each hidden layer to process the input data, and outputs the processing result to the next hidden layer as the input data of the next hidden layer.
  • the input data and output data of each hidden layer are called feature maps.
  • In this embodiment, the feature maps are processed tile by tile.
  • The processing unit first performs the depthwise convolution operation:
  • a data block (tile) of the feature map is stored in the on-chip memory.
  • The size of the data block is equal to the size of the convolution kernel of the following depthwise convolutional layer.
  • Here, on-chip memory refers to memory internal to the processing unit rather than external memory; it may be on-chip RAM or a cache.
  • Next, the depthwise convolutional layer processes the data block to obtain the depthwise convolution result of the data block.
  • Specifically, the data block is convolved with the weights of the convolution kernel, and the bias value is added to the convolution result. If the depthwise convolutional layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the depthwise convolution result.
  • Then, the depthwise convolution result of the data block is stored in the on-chip memory, that is, in the processing unit's on-chip RAM or cache, rather than in off-chip memory such as DDR.
  • The processing unit can use the above depthwise convolution operation to process the feature maps of all input channels in parallel to improve computing efficiency.
  • Next, the depthwise convolution result of the data block is read from the processing unit's on-chip RAM or cache instead of from off-chip memory such as DDR.
  • The pointwise convolutional layer processes the depthwise convolution result of the data block to obtain the pointwise convolution result of the data block.
  • Specifically, the depthwise convolution results of all input channels are convolved with the weights of the convolution kernel, the convolution results are accumulated, and the bias value is then added. If the pointwise convolutional layer has an activation operation, the activation function is applied to the output of the bias operation to obtain the activation value, and finally the activation value is quantized to obtain the pointwise convolution result of the data block.
  • The pointwise convolutional layer generally has multiple output channels, and the processing unit can use the above pointwise convolution operation to process all output channels in parallel to improve computing efficiency.
  • In summary, the data processing device of this embodiment stores the depthwise convolution result in on-chip memory, and the pointwise convolutional layer reads the depthwise convolution result from the on-chip memory for processing; the reading and writing of the intermediate result (the depthwise convolution result) are done entirely on-chip, without accessing off-chip memory.
  • Compared with writing intermediate results to off-chip memory and reading them back, this further saves mobile-device bandwidth, improves the performance of deep neural network operations on mobile devices, and can support lower-end mobile devices with low computing power and limited bandwidth.
  • the embodiments of the present disclosure do not limit the implementation form of the deep neural network-based data processing device and its acquisition unit, quantization unit, selection unit, and processing unit.
  • each unit can be implemented by a separate processor, and some or all of the units can also be implemented by one processor.
  • The processor mentioned in the embodiments of the present disclosure may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • The on-chip memory mentioned in the embodiments of the present disclosure refers to memory integrated inside the processor, which may be volatile memory, non-volatile memory, or a combination of both.
  • An embodiment of the present disclosure also provides a mobile device, which includes the deep neural network-based data processing device of any of the foregoing embodiments.
  • The mobile device may be a portable mobile terminal, a drone, a handheld gimbal, a remote control, etc.
  • the portable mobile terminal may be a cell phone, a tablet computer, etc.
  • the remote control may be a remote control of a drone.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A data processing method and apparatus based on a deep neural network, and a mobile device are provided. The method comprises: acquiring a floating-point network model of a deep neural network; quantizing the floating-point network model to obtain at least two fixed-point network models with different precisions; selecting one fixed-point network model from the at least two fixed-point network models according to the precisions of the fixed-point network models; and processing data using the selected fixed-point network model.
PCT/CN2019/097072 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device WO2021012148A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980005317.5A 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device (CN111344719A)
PCT/CN2019/097072 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device (WO2021012148A1)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/097072 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device

Publications (1)

Publication Number Publication Date
WO2021012148A1 (fr)

Family

ID=71187736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/097072 2019-07-22 2019-07-22 Data processing method and apparatus based on deep neural network, and mobile device

Country Status (2)

Country Link
CN (1) CN111344719A (fr)
WO (1) WO2021012148A1 (fr)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409773B (zh) * 2021-08-18 2022-01-18 中科南京智能技术研究院 Binarized neural network voice wake-up method and system
CN116720563B (zh) * 2022-09-19 2024-03-29 荣耀终端有限公司 Method, apparatus, and electronic device for improving the accuracy of a fixed-point neural network model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224984A (zh) * 2014-05-31 2016-01-06 华为技术有限公司 Data category identification method and apparatus based on deep neural network
CN106203624A (zh) * 2016-06-23 2016-12-07 上海交通大学 Vector quantization system and method based on deep neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657316B (zh) * 2016-08-12 2020-04-07 北京深鉴智能科技有限公司 Collaborative system design of a general-purpose processor and a neural network processor

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224984A (zh) * 2014-05-31 2016-01-06 华为技术有限公司 Data category identification method and apparatus based on deep neural network
CN106203624A (zh) * 2016-06-23 2016-12-07 上海交通大学 Vector quantization system and method based on deep neural network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089888A1 (en) * 2019-09-25 2021-03-25 Arm Limited Hybrid Filter Banks for Artificial Neural Networks
US11561767B2 (en) 2019-09-25 2023-01-24 Arm Limited Mixed-precision computation unit

Also Published As

Publication number Publication date
CN111344719A (zh) 2020-06-26

Similar Documents

Publication Publication Date Title
CN109754066B (zh) Method and apparatus for generating a fixed-point neural network
US10878273B2 (en) Dynamic quantization for deep neural network inference system and method
US10096134B2 (en) Data compaction and memory bandwidth reduction for sparse neural networks
CN110363279B (zh) Image processing method and apparatus based on convolutional neural network model
US11755901B2 (en) Dynamic quantization of neural networks
CN109840589B (zh) Method and apparatus for running a convolutional neural network on an FPGA
CN111931922B (zh) Quantization method for improving model inference accuracy
WO2021012148A1 (fr) Data processing method and apparatus based on deep neural network, and mobile device
WO2019238029A1 (fr) Convolutional neural network system and method for quantizing a convolutional neural network
US20200327185A1 (en) Signal Processing Method and Apparatus
CN111240746B (zh) Method and device for dequantizing and quantizing floating-point data
WO2019001323A1 (fr) Signal processing system and method
CN113132723A (zh) Image compression method and apparatus
CN113780549A (zh) Overflow-aware quantization model training method, apparatus, medium, and terminal device
CN116992946B (zh) Model compression method, apparatus, storage medium, and program product
CN116884398B (zh) Speech recognition method, apparatus, device, and medium
CN112561050B (zh) Neural network model training method and apparatus
CN111160517A (zh) Convolutional layer quantization method and apparatus for a deep neural network
JP2020021208A (ja) Neural network processor, neural network processing method, and program
CN114841325A (zh) Data processing method, medium, and electronic device for a neural network model
JP7040771B2 (ja) Neural network processing device, communication device, neural network processing method, and program
WO2021037174A1 (fr) Neural network model training method and apparatus
CN115705486A (zh) Quantization model training method and apparatus, electronic device, and readable storage medium
CN113282535A (zh) Quantization processing method and apparatus, and quantization processing chip
KR20200139071A (ko) 뉴럴 네트워크에서 파라미터를 양자화하는 방법 및 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19938761

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19938761

Country of ref document: EP

Kind code of ref document: A1