WO2022027242A1 - Data processing method and apparatus for a neural network, movable platform and computer-readable storage medium - Google Patents

Data processing method and apparatus for a neural network, movable platform and computer-readable storage medium

Info

Publication number
WO2022027242A1
WO2022027242A1 PCT/CN2020/106865 CN2020106865W WO2022027242A1 WO 2022027242 A1 WO2022027242 A1 WO 2022027242A1 CN 2020106865 W CN2020106865 W CN 2020106865W WO 2022027242 A1 WO2022027242 A1 WO 2022027242A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
quantized
target
quantization
Prior art date
Application number
PCT/CN2020/106865
Other languages
English (en)
French (fr)
Inventor
聂谷洪
蒋阳
李思晋
张李亮
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/106865 priority Critical patent/WO2022027242A1/zh
Publication of WO2022027242A1 publication Critical patent/WO2022027242A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the technical field of data processing, and in particular, to a data processing method, apparatus, movable platform and computer-readable storage medium of a neural network.
  • with the development of technology, neural network techniques are applied to many aspects of life, such as image recognition using neural networks (e.g., face recognition, content-based image retrieval, or expression recognition) and natural language processing (e.g., speech recognition, text classification, or information retrieval).
  • the operation of a neural network is a computationally and memory-intensive process.
  • in order to save the bandwidth occupied while a neural network runs, the parameters in the neural network model are usually quantized. In the related art, the same quantization method is used for every layer in the neural network, but the redundancy of different layers differs, so applying the same quantization method to every layer causes layers with high redundancy to consume too much storage.
  • one of the objectives of the present application is to provide a data processing method and apparatus for a neural network, a movable platform, and a computer-readable storage medium.
  • an embodiment of the present application provides a data processing method for a neural network, including:
  • for each layer in the neural network, searching for the target quantization bit number of the layer from multiple candidate quantization bit numbers included in the search space of the layer; and quantizing the parameters of the layer according to the target quantization bit number of the layer to obtain a quantized neural network;
  • determining the accuracy of the quantized neural network; and adjusting the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and performing data processing based on the adjusted neural network.
  • an embodiment of the present application provides a data processing apparatus, including a processor and a memory for storing executable instructions; when executing the executable instructions, the processor is configured to:
  • for each layer in the neural network, search for the target quantization bit number of the layer from multiple candidate quantization bit numbers included in the search space of the layer; and quantize the parameters of the layer according to the target quantization bit number of the layer to obtain a quantized neural network;
  • determine the accuracy of the quantized neural network; and adjust the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and perform data processing based on the adjusted neural network.
  • embodiments of the present application provide a computer-readable storage medium on which computer instructions are stored; when the instructions are executed by a processor, the method described in the first aspect is implemented.
  • an embodiment of the present application provides a movable platform, including the data processing apparatus described in the second aspect.
  • according to the data processing method, apparatus, movable platform and computer-readable storage medium provided by the embodiments of the present application, for each layer in the neural network, the target quantization bit number of the layer is searched out from the multiple candidate quantization bit numbers included in the search space of the layer, and the parameters of the layer are quantized according to that bit number to obtain a quantized neural network; the accuracy of the quantized neural network is then determined, and that accuracy is used to adjust the target quantization bit number of each layer until it is determined from the accuracy that a suitable target quantization bit number has been searched out for every layer in the neural network. This yields a high-performance mixed-precision network: layers with low redundancy use higher-bit quantization and layers with high redundancy use lower-bit quantization. Using such a mixed-precision network for data processing helps reduce the bandwidth occupied during data processing while improving the data processing effect.
  • FIG. 1 is a schematic diagram of an application scenario of a method for processing a neural network provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a processing method of a neural network provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of obtaining the quantized neural network based on reinforcement learning provided by an embodiment of the present application
  • FIG. 4 is a schematic flowchart of another method for processing a neural network provided by an embodiment of the present application.
  • FIG. 5A is a schematic diagram of quantizing the parameters of a convolution layer provided by an embodiment of the present application.
  • FIG. 5B is a schematic diagram of quantizing the parameters of a pooling layer provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of quantization based on different quantization bit numbers provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of scaling a weight value based on a hyperbolic tangent function provided by an embodiment of the present application.
  • FIG. 8A is a schematic diagram of weight values quantized layer by layer in the related art, provided by an embodiment of the present application.
  • FIG. 8B is a schematic diagram of weight values quantized channel by channel, provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a processing apparatus provided by an embodiment of the present application.
  • in the related art, the same quantization method is used for every layer in the neural network, without considering that: first, the redundancy of different layers differs, so applying the same quantization method to every layer makes layers with high redundancy occupy too much storage; second, the computing density of different layers also differs, as do their demands on computing and storage resources. For example, a standard convolution layer is compute-intensive while a depthwise (layer-wise) convolution layer is storage-intensive, so layers with different computing densities have different requirements on the number of quantization bits for their parameters.
  • based on this, the embodiment of the present application provides a data processing method for a neural network that searches out a suitable target quantization bit number for each layer in the neural network, thereby further improving model performance; using such a neural network for data processing helps improve processing efficiency.
  • the neural network includes but is not limited to a BP neural network or a deep neural network (DNN); a DNN generally refers to a neural network including an input layer, multiple hidden layers, and an output layer, and includes but is not limited to a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), and the like.
  • the data processing method of the neural network in the embodiment of the present application can be applied to different data processing fields.
  • the data processing method of the neural network can be applied to the field of image processing, such as using the method to perform face recognition, expression recognition, image retrieval, object recognition, behavior classification, or pose estimation.
  • the data processing method of the neural network can be applied to the field of natural language processing, such as using the method to perform speech recognition, text classification, text retrieval or automatic word segmentation. Since a suitable target quantization bit number has been searched out for each layer of the neural network in the embodiments of the present application, using such a neural network for data processing helps improve processing efficiency.
  • the data processing method of the neural network can be applied to a data processing apparatus, which includes but is not limited to a computer chip (such as an ARM processor, a DSP processor, a GPU processor, or an FPGA processor) or an entity (such as a computer device).
  • in one implementation, when the data processing apparatus is a computer chip, it can be mounted on a movable platform to provide the movable platform with neural-network-based data processing functions. Movable platforms include, but are not limited to, unmanned aerial vehicles, unmanned vehicles, unmanned ships, mobile robots, or gimbals.
  • FIG. 1 is a schematic diagram of an application scenario provided by the embodiment of the present application.
  • the unmanned aerial vehicle 11 carries the data processing device 12, on which executable instructions of the neural network data processing method of this embodiment are deployed. Assuming the method is used for object recognition, the UAV 11 can implement a tracking and shooting function for a target object based on the method. Specifically, the shooting device 10 on the UAV 11 can continuously capture multiple images, and the executable instructions in the data processing device 12 process the multiple images to recognize the target object 13 in them and determine the position of the target object 13 in the images; the flying attitude of the UAV is then adjusted according to that position to track and shoot the target object 13.
  • since a suitable target quantization bit number has been searched out for each layer of the neural network in the embodiments of the present application, using such a neural network for object recognition helps improve processing efficiency.
  • FIG. 2 is a flowchart of a data processing method of a neural network provided by an embodiment of the present application.
  • the method includes:
  • in step S101, for each layer in the neural network, the target quantization bit number of the layer is searched from the multiple candidate quantization bit numbers included in the search space of the layer; and the parameters of the layer are quantized according to the target quantization bit number of the layer to obtain the quantized neural network.
  • in step S102, the accuracy of the quantized neural network is determined.
  • in step S103, the target quantization bit number of each layer is adjusted according to the accuracy of the quantized neural network and the search space of each layer, and data processing is performed based on the adjusted neural network.
  • first, for each layer in the neural network, a search space corresponding to the layer is defined; the search space includes multiple candidate quantization bit numbers, for example {1 bit, 2 bit, 4 bit, 8 bit, 16 bit}.
  • considering that current neural networks have many layers, the candidate quantization bit numbers in the search spaces defined for the layers may be the same, which helps reduce the workload of the staff; of course, the candidate quantization bit numbers may also differ from layer to layer, and this embodiment does not limit this.
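  • as an illustration of the search-space setup just described, the following is a minimal Python/NumPy sketch of a shared candidate set and of sampling one target quantization bit number per layer from per-layer sampling probabilities; the function names are hypothetical and not taken from the patent:

```python
import numpy as np

# Candidate quantization bit numbers shared by every layer's search space.
SEARCH_SPACE = [1, 2, 4, 8, 16]

def init_sampling_probs(num_layers, num_candidates=len(SEARCH_SPACE)):
    # First search: the sampling probability of each layer is randomly
    # generated (random positive weights normalized to sum to 1).
    p = np.random.rand(num_layers, num_candidates)
    return p / p.sum(axis=1, keepdims=True)

def sample_bit_numbers(probs, rng=None):
    # Draw one target quantization bit number per layer according to
    # that layer's sampling probabilities.
    rng = rng or np.random.default_rng()
    return [int(rng.choice(SEARCH_SPACE, p=layer_p)) for layer_p in probs]

probs = init_sampling_probs(num_layers=5)
bits_per_layer = sample_bit_numbers(probs)  # e.g. [4, 2, 8, 4, 16]
```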
  • in this embodiment, for each layer in the neural network, the target quantization bit number of the layer is searched from the multiple candidate quantization bit numbers included in the layer's search space, and the parameters of the layer are quantized according to that bit number to obtain a quantized neural network; the quantized network is then trained and the trained network is tested to obtain its accuracy; next, the accuracy is used to adjust the target quantization bit number of each layer, that is, the target quantization bit number of the layer is re-searched from the layer's search space according to the accuracy of the quantized neural network, until it is determined from the accuracy that a suitable target quantization bit number has been found for every layer (the target quantization bit numbers of different layers may be the same or different). This yields a high-performance mixed-precision network in which layers with low redundancy use higher-bit quantization and layers with high redundancy use lower-bit quantization; using such a mixed-precision network for data processing helps reduce the bandwidth occupied during data processing while improving the data processing effect.
  • in an embodiment, after the accuracy of the quantized neural network is obtained, if the accuracy does not meet the preset condition, indicating that the target quantization bit numbers of the layers searched this time may be inappropriate, then for each layer in the neural network, the target quantization bit number of the layer is re-searched from the layer's search space according to the accuracy of the quantized neural network. If the accuracy meets the preset condition, indicating that suitable target quantization bit numbers have been found for all layers this time, the search ends and the quantized neural network is used for data processing. In this embodiment, whether suitable target quantization bit numbers have been found this time is determined from the accuracy of the quantized neural network, yielding a high-performance mixed-precision network.
  • the preset condition may be specifically set according to an actual application scenario, which is not limited in this embodiment of the present application.
  • here, re-searching the target quantization bit number of a layer from the layer's search space according to the accuracy of the quantized neural network is explained: if the accuracy of the quantized neural network does not meet the preset condition, indicating that the target quantization bit numbers searched this time may be inappropriate, first the sampling probability with which the target quantization bit number of each layer was searched from its search space this time is obtained; then the sampling probability is adjusted according to the accuracy of the quantized neural network to obtain an adjusted sampling probability; finally, the target quantization bit number of the layer is re-searched from the layer's search space according to the adjusted sampling probability.
  • in this embodiment, if the accuracy of the quantized neural network is high (for example, the difference between this accuracy and a preset accuracy is within a preset range, indicating that this accuracy is high), the sampling probability with which each layer sampled the target quantization bit number this time can be increased; if the accuracy is low (for example, the difference is outside the preset range, indicating that this accuracy is low), that sampling probability can be reduced. Adjusting the sampling probability ensures that a neural network whose accuracy meets the preset condition can be searched out. Further, automatically searching the target quantization bit number of each layer by adjusting the sampling probability reduces the workload of manual operation and helps improve search efficiency.
  • in an exemplary embodiment, the embodiment of the present application adopts a reinforcement learning method and uses a controller to adjust the target quantization bit number of each layer.
  • referring to FIG. 3, a recurrent neural network (such as an LSTM network) is used to construct a controller; for each layer in the neural network, the controller searches the target quantization bit number of the layer from the multiple candidate quantization bit numbers in the layer's search space, the parameters of the layer are quantized according to that bit number to obtain the quantized neural network, and the quantized network is trained and tested to obtain its accuracy. If the accuracy of the quantized neural network is high, the controller can increase the sampling probability with which each layer sampled the target quantization bit number this time; if the accuracy is low, the controller can reduce that sampling probability, thereby ensuring by adjustment of the sampling probability that a neural network whose accuracy meets the preset condition can be searched out.
  • in the first search, the sampling probability corresponding to each layer of the neural network is randomly generated, and the controller searches each layer's target quantization bit number from the layer's search space according to the randomly generated sampling probabilities. In subsequent iterations, the parameters of the controller are updated using the policy gradient algorithm according to the accuracy of the quantized neural network obtained in the previous round and the previous round's sampling probabilities; the updated controller is then used to adjust the sampling probability, and the adjusted sampling probability is used to re-search each layer's target quantization bit number from the layer's search space. During iteration, if the accuracy of this round's quantized neural network is high, the controller increases the sampling probability of the target quantization bit numbers sampled this round; if the accuracy is low, the controller reduces that sampling probability, thereby ensuring that a neural network whose accuracy meets the preset condition can be searched out.
  • the embodiment of the present application uses the policy gradient algorithm to update the parameters of the controller. By controlling the search step size of the policy gradient, the search space is searched effectively in the early stage of the search, an evaluation is made probabilistically according to the accuracy of the quantized neural network, and feedback is then given according to the policy gradient, which can be expressed as: ∇_{θ_c} J(θ_c) = (1/m) Σ_{k=1}^{m} Σ_{t=1}^{T} ∇_{θ_c} log P(a_t | a_{(t-1):1}; θ_c) · R_k, where m is the number of test samples, θ_c is the parameter of the controller, T is the number of layers of the neural network, P(a_t | a_{(t-1):1}; θ_c) is the sampling probability with which each layer's target quantization bit number is searched, and R_k is the accuracy of this round's quantized neural network.
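  • to make the policy-gradient feedback concrete, the following is a hedged NumPy sketch of a single-sample (m = 1) REINFORCE step; it uses independent per-layer categorical policies in place of the LSTM controller, and every name is an assumption rather than the patent's implementation:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def reinforce_update(logits, actions, reward, lr=0.1):
    # One policy-gradient step on per-layer categorical policies.
    # logits : (T, C) array, T layers, C candidate bit numbers
    # actions: length-T indices of the bit numbers sampled this round
    # reward : accuracy R_k of this round's quantized network
    probs = softmax(logits)
    grad = np.zeros_like(logits)
    for t, a in enumerate(actions):
        onehot = np.zeros(logits.shape[1])
        onehot[a] = 1.0
        # gradient of log P(a_t) for a softmax policy: onehot(a_t) - probs[t]
        grad[t] = (onehot - probs[t]) * reward
    return logits + lr * grad  # ascent: raise the probability of good choices

# One search iteration: sample bit numbers, evaluate (stubbed), update.
T, C = 5, 5
logits = np.zeros((T, C))
rng = np.random.default_rng(0)
actions = [int(rng.choice(C, p=p)) for p in softmax(logits)]
accuracy = 0.9  # stand-in for the train-then-test accuracy
logits = reinforce_update(logits, actions, accuracy)
```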
  • in an embodiment, it is considered that when the neural network is applied to a specific task, the task may impose different requirements on the operation of the neural network; for example, when the specific task is applied in certain real-time scenarios, the neural network is required to process fast enough to meet real-time requirements, which also affects the determination of the target quantization bit number of each layer.
  • therefore, when adjusting the target quantization bit numbers, the embodiment of the present application takes into account the specific task to be executed by the neural network: for each layer in the neural network, the target quantization bit number of the layer is re-searched from the layer's search space according to the accuracy of the quantized neural network and the operating state information of the neural network when executing the specific task.
  • in this embodiment, the target quantization bit number of each layer in the neural network is selected based on the accuracy of the quantized neural network and the operating state information related to the specific task, so that the finally obtained quantized neural network not only has good performance but is also better suited to executing the specific task and meets its operating requirements; a neural network with optimal performance is obtained under the condition that the operating requirements of the specific task are met, so that the final quantized neural network is well adapted to the specific task.
  • the specific tasks include but are not limited to tasks in image processing, such as face recognition, expression recognition, or image classification, or tasks in natural language processing, such as speech recognition or text retrieval; the embodiments of the present application do not limit this.
  • considering the operating environment of certain specific tasks, for example devices running the neural network that have limited bandwidth or high real-time requirements, the operating state information includes but is not limited to the bandwidth occupied by the neural network when executing the specific task, the speed at which the specific task is executed, and/or the running time when executing the specific task (where "and/or" denotes any combination of the three), so that a neural network with optimal performance is obtained while the operating requirements of the specific task are met.
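  • one plausible way to fold such operating state information into the search signal, sketched here purely as an assumption since the source gives no concrete formula, is to combine accuracy with bandwidth and runtime budgets into a single reward for the controller:

```python
def task_aware_reward(accuracy, bandwidth, runtime,
                      max_bandwidth, max_runtime, penalty=1.0):
    # Hypothetical reward: accuracy, penalized when the quantized network
    # exceeds the bandwidth or runtime budget of the specific task.
    reward = accuracy
    if bandwidth > max_bandwidth:
        reward -= penalty * (bandwidth / max_bandwidth - 1.0)
    if runtime > max_runtime:
        reward -= penalty * (runtime / max_runtime - 1.0)
    return reward
```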
  • in an embodiment, considering that the neural network finally used for data processing may require many iterations, the training process would be time-consuming if the quantized neural network had to be trained from scratch to convergence each time. Based on this, in this embodiment, after each search of the per-layer target quantization bit numbers, the weight values of each layer can reuse the weight values of the corresponding layers in the neural network from the previous search; that is, the weight value of each layer in the network of this search is the same as the weight value of the corresponding layer in the network of the previous search, and the reused weight values are then quantized with the target quantization bit numbers found this time. This embodiment adopts weight sharing, which helps reduce the amount of computation and improve training efficiency.
  • the process of quantizing the parameters of a layer according to the layer's target quantization bit number is the process of quantizing the layer's weight values and/or activation values; referring to FIG. 4, a schematic flowchart of another data processing method for a neural network provided by the present application, the method includes:
  • in step S201, for each layer in the neural network, the target quantization bit number of the layer is searched from the multiple candidate quantization bit numbers included in the search space of the layer; and the weight values and/or activation values of the layer are quantized according to the target quantization bit number of the layer to obtain the quantized neural network.
  • in step S202, the accuracy of the quantized neural network is determined; this is similar to step S102 and is not repeated here.
  • in step S203, the target quantization bit number of each layer is adjusted according to the accuracy of the quantized neural network and the search space of each layer, and data processing is performed based on the adjusted neural network; this is similar to step S103 and is not repeated here.
  • the neural network may include convolution layers, pooling layers, fully connected layers, etc., and layers of different natures have different parameters: for example, the parameters to be quantized in a convolution layer include the weights and the output parameters (that is, the activation values), while the parameters to be quantized in a pooling layer include the output parameters. The parameters of a layer can therefore be quantized according to the nature of the layer.
  • in one example, referring to FIG. 5A, for a convolution layer, the weight values of the layer can be quantized according to the layer's target quantization bit number, the convolution operation is performed with the quantized weight values and the input values, and the activation values obtained from the convolution are then quantized according to the layer's target quantization bit number; referring to FIG. 5B, for a pooling layer or a fully connected layer (FIG. 5B takes a pooling layer as an example), after the pooling operation on the input values yields the activation values, the activation values of the layer can be quantized according to the layer's target quantization bit number.
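  • schematically, the forward passes of FIG. 5A and FIG. 5B could look as follows; this is a toy Python sketch assuming a plain uniform quantizer and a 1-D convolution, not the specific quantizers detailed later in the text:

```python
import numpy as np

def quantize_uniform(x, k, lo, hi):
    # Simple k-bit uniform quantizer used as a stand-in.
    levels = 2 ** k - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def conv_layer_forward(x, w, k):
    # FIG. 5A: quantize the weights with the layer's target bit number k,
    # run the convolution, then quantize the resulting activations.
    wq = quantize_uniform(w, k, -1.0, 1.0)
    a = np.convolve(x, wq, mode="valid")  # 1-D convolution for brevity
    hi = a.max() if a.max() > 0 else 1.0
    return quantize_uniform(a, k, 0.0, hi)

def pool_layer_forward(x, k, window=2):
    # FIG. 5B: pooling first computes activations, which are then quantized.
    a = x[: len(x) // window * window].reshape(-1, window).max(axis=1)
    hi = a.max() if a.max() > 0 else 1.0
    return quantize_uniform(a, k, 0.0, hi)

x = np.random.rand(16)
w = np.random.uniform(-1.0, 1.0, 3)
y = pool_layer_forward(conv_layer_forward(x, w, k=4), k=4)
```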
  • in an embodiment, when the activation values of each layer in the neural network are quantized, the activation values can be quantized into discrete values according to the target quantization bit number of the layer and a preset range of the activation values. In one example, let the activation value be x, the quantized activation value be Quant_x, β be the preset range of the activation value, and k be the target quantization bit number of the layer; the activation value is clipped to the preset range and mapped onto the discrete levels determined by k using the round() function, which rounds according to the specified number of decimal places, thereby turning continuous values into discrete values.
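  • the exact quantization formula appears only as an image in the source, so the following Python sketch assumes the common form of this step: clip the activation to [0, β], then round it onto the 2^k - 1 uniform levels:

```python
import numpy as np

def quantize_activation(x, k, beta):
    # Assumed reconstruction: clip to the preset range [0, beta], then
    # round onto 2**k - 1 uniform levels and map back to [0, beta].
    levels = 2 ** k - 1
    x = np.clip(x, 0.0, beta)
    return np.round(x / beta * levels) / levels * beta

acts = np.array([-0.3, 0.2, 0.7, 1.4])
print(quantize_activation(acts, k=2, beta=1.0))  # [0. 0.3333 0.6667 1.]
```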
  • in an embodiment, it is considered that in the related art, when the weight values in a neural network are quantized with a low bit number (fewer than 8 bits), the weight values of each layer are usually quantized uniformly; however, if the weight values of some channels in the layer are small, the weight values of those channels are easily quantized to 0 and become ineffective, causing performance degradation. Based on this, when quantizing the weight values of each layer in the neural network, this embodiment quantizes the weight values corresponding to each channel of the layer separately according to the target quantization bit number of the layer. Quantizing the weight values corresponding to each channel in units of channels narrows the quantization interval compared with quantizing the weight values within a layer in units of layers as in the related art, thereby improving quantization accuracy.
  • when quantizing the weight values of each layer in the neural network, for each channel of the layer, in order to prevent outliers from making the quantization error too large and ultimately preventing the neural network training process from converging, the weight values of the channel are first scaled into a first preset range, and the scaled weight values are then quantized. Because a certain quantization error exists in the scaling and quantization processes (the smaller the number of quantization bits, the larger the quantization error; the larger the number of quantization bits, the more uniform the distribution of quantized values), this embodiment characterizes the quantization error with a quantization parameter. The quantization parameter can be a regularization coefficient, and different values of the quantization parameter correspond to different quantization errors; that is, the quantization error is related to the number of quantization bits, and each quantization bit number corresponds to a value of the quantization parameter that keeps the quantization error relatively small. Therefore, before scaling, the quantization parameter corresponding to the quantization bit number must be determined, such that after the parameters are quantized with that bit number, the corresponding quantization error is relatively small.
  • in an exemplary embodiment, referring to FIG. 6, FIG. 6 shows the weight values of one channel quantized with different quantization bit numbers (2 bit, 4 bit, and 8 bit) using the quantization method of the embodiment of the present application; it shows the curves for 2-bit quantization (2bit quant), 4-bit quantization (4bit quant), and 8-bit quantization (8bit quant). It can be seen that the smaller the quantization bit number, the larger the quantization error, and the larger the bit number, the more uniform the distribution of the quantized values.
  • in low-precision quantization, because the quantization function is not differentiable, a common method adopts the Straight-Through Estimator (STE) to solve the backward-differentiation problem during training: in forward propagation, the quantized values are used for computation; in backward propagation, the derivative of the quantization function is set to 1, so by default the gradient is taken directly with respect to the values before quantization, the model weights are updated, and the quantization function is skipped. The STE method assumes that the values before and after quantization are the same; as can be seen from FIG. 6, a certain quantization error exists when using the STE method, and the smaller the number of quantization bits, the larger the quantization error.
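  • the straight-through estimator described above can be written in a few lines; the sketch below uses the common PyTorch idiom in which the rounded value is used in the forward pass while the gradient flows through as if the quantizer were the identity (derivative set to 1); the helper name is illustrative:

```python
import torch

def ste_round(x):
    # Forward: round(x). Backward: gradient passes straight through,
    # i.e. the non-differentiable round() is skipped in the backward pass.
    return x + (torch.round(x) - x).detach()

x = torch.tensor([0.2, 0.7, 1.4], requires_grad=True)
y = ste_round(x * 3) / 3  # rounding onto thirds, 2-bit style
y.sum().backward()
print(x.grad)  # tensor([1., 1., 1.]), as if no rounding had happened
```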
  • in addition, the hyperbolic tangent function tanh(x) can be used to scale the weight values of the channel to [-1, 1]. Referring to FIG. 7, FIG. 7 shows the influence of different quantization parameters, i.e., the alpha parameter, on the scaling range of the weight value x; the alpha parameter is a regularization coefficient used to characterize the quantization error of the weight values, and different values of the alpha parameter correspond to different quantization errors. Looking at the third quadrant of FIG. 7, from left to right the value of the alpha parameter increases; the larger the value of the alpha parameter, the steeper the curve, the closer the gradients on the two sides are to 0, and the lower the discrimination. Referring to the first or third quadrant of FIG. 7, the larger the value of the alpha parameter, the smaller the difference between the scaled values once the absolute value of the weight before scaling exceeds a certain range; for example, in the first quadrant, when the value of the alpha parameter is 0.25, the weight values in the interval [0, 5] all correspond to different scaled values, whereas when the value is 2, a portion of the weight values in [0, 5] all correspond to 1, which reduces the expressive capability of the neural network model. Therefore, combining FIG. 6 and FIG. 7, the relationship between the quantization bit number and the quantization error, i.e., the alpha parameter, can be determined: as the quantization bit number increases, the value of the alpha parameter needs to decrease gradually in order to satisfy the expressive needs of the neural network model.
  • therefore, when the weight values corresponding to each channel of a layer are quantized according to the target quantization bit number of the layer, the quantization parameter of the layer is first determined from the target quantization bit number of the layer; the quantization parameter characterizes the quantization error of the weight values and is negatively correlated with the quantization bit number. The weight values corresponding to each channel of the layer are then quantized according to the target quantization bit number and quantization parameter of the layer. In this way, a quantization parameter adaptive to the target quantization bit number of the layer is determined, which helps reduce quantization errors.
  • further, the quantization parameter has a monotonically decreasing relationship with the target quantization bit number. In one example, let the original quantization parameter be α_0, the target quantization bit number be k, and the determined quantization parameter be α_k; then α_k = α_0 / k. This embodiment determines a quantization parameter adaptive to the target quantization bit number of the layer, which helps reduce quantization errors.
  • next, to prevent outliers from making the quantization error too large and ultimately preventing the neural network training process from converging, for each channel of the layer, the weight values of the channel are scaled into the first preset range according to the quantization parameter of the layer to obtain a first intermediate result; the weight values of the channel are then quantized according to the first intermediate result and the target quantization bit number of the layer. The first preset range is [-1, 1]; an outlier may refer to a maximum value among the weight values of the channel. In one example, the hyperbolic tangent function tanh(x) can be used to scale the weight values of the channel to [-1, 1], giving the first intermediate result tanh(α_k·w), where α_k is the quantization parameter adaptive to the target quantization bit number k and w is a weight value of the channel.
  • then, for each channel of the layer, the first intermediate result of the channel is scaled into a second preset range to obtain a second intermediate result, and the weight values of the channel are quantized according to the second intermediate result and the quantization bit number of the layer. The second preset range is [0, 1], which facilitates quantizing the weight values of each channel. In one example, letting the second intermediate result be normalize(w), it is obtained by mapping the first intermediate result tanh(α_k·w) from [-1, 1] into [0, 1].
  • finally, for each channel of the layer, the maximum value of the second intermediate result in the channel is obtained, and the weight values of the channel are quantized according to that maximum value, the result of normalizing by that maximum value, and the quantization bit number of the layer. Specifically, a third intermediate result quant_w(normalize(w), k) is obtained from the maximum value of the second intermediate result in the channel, the normalized result, and the target quantization bit number k, quantizing the weight values of the channel into a third preset range, which may be [-1, 1]; the quantized weight values of the channel, Quant_W(w, k), are then obtained from the third intermediate result as Quant_W(w, k) = 2·quant_w(normalize(w), k) - 1, where scale_channel(normalize(w)) is used to obtain the maximum value of the second intermediate result in the channel, norm_channel(normalize(w)) is used to obtain the result of normalizing by that maximum value, and the round() function rounds according to the specified number of decimal places, turning continuous values into discrete values.
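  • putting the pieces together, the following Python sketch implements the per-channel pipeline described above; since the exact normalize(w) and quant_w(...) formulas are images in the source, the map into [0, 1] and the rescaling by the per-channel maximum are assumptions chosen to be consistent with the surrounding text (tanh(α_k·w) as the first intermediate result, round() over 2^k - 1 levels, and Quant_W = 2·quant_w - 1):

```python
import numpy as np

def quantize_weights_per_channel(w, k, alpha0=1.0):
    # w: (channels, n) weight matrix; k: target quantization bit number.
    alpha_k = alpha0 / k                 # quantization parameter a_k = a_0 / k
    first = np.tanh(alpha_k * w)         # first intermediate result, in [-1, 1]
    second = (first + 1.0) / 2.0         # assumed map into [0, 1]
    scale = second.max(axis=1, keepdims=True)  # per-channel maximum
    norm = second / scale                      # normalized by the maximum
    levels = 2 ** k - 1
    quant_w = np.round(norm * levels) / levels * scale  # third intermediate result
    return 2.0 * quant_w - 1.0           # Quant_W(w, k) = 2 * quant_w - 1

w = np.random.randn(2, 8)                # 2 channels, 8 weights each
print(quantize_weights_per_channel(w, k=4))
```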
  • in one example, referring to FIG. 8A and FIG. 8B, FIG. 8A shows layer-by-layer quantization of weight values in units of layers, and FIG. 8B shows channel-by-channel quantization of weight values in units of channels according to an embodiment of the present application; two channels are taken as an example. In FIG. 8A and FIG. 8B, the matrix in the first row is the original weight values before quantization, the matrix in the second row is the quantized weight values, and the matrix in the third row is the absolute error between the weight values before and after quantization. The comparison shows that with the channel-by-channel quantization method of the embodiment of the present application, because the quantization interval is narrowed, the quantization accuracy is also improved: the absolute differences before and after quantization are significantly smaller and closer to the values before quantization, which further reduces the quantization error and can improve the convergence speed and performance of the model.
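  • the effect shown in FIG. 8A and FIG. 8B can be reproduced with a toy example that quantizes a two-channel weight matrix once with a shared per-layer range and once with per-channel ranges, then compares the absolute errors; the numbers and the plain uniform quantizer here are hypothetical:

```python
import numpy as np

def uniform_q(x, k, lo, hi):
    levels = 2 ** k - 1
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

w = np.array([[0.02, -0.03, 0.01],   # channel with small weights
              [0.90, -0.80, 0.70]])  # channel with large weights
k = 2
per_layer = uniform_q(w, k, w.min(), w.max())            # one shared range
per_chan = np.vstack([uniform_q(c, k, c.min(), c.max())  # range per channel
                      for c in w])
print(np.abs(w - per_layer).mean())  # larger mean error (coarse shared step)
print(np.abs(w - per_chan).mean())   # smaller mean error (narrower intervals)
```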
  • correspondingly, referring to FIG. 9, an embodiment of the present application further provides a data processing apparatus 30, including a processor 31 and a memory 32 for storing executable instructions; when executing the executable instructions, the processor 31 is configured to:
  • for each layer in the neural network, search for the target quantization bit number of the layer from multiple candidate quantization bit numbers included in the search space of the layer; and quantize the parameters of the layer according to the target quantization bit number of the layer to obtain a quantized neural network;
  • determine the accuracy of the quantized neural network; and adjust the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and perform data processing based on the adjusted neural network.
  • the processor 31 executes the executable instructions included in the memory 32; the processor 31 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 32 stores executable instructions of the data processing method of the neural network
  • the memory 32 may include at least one type of storage medium, and the storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), Magnetic memory, magnetic disk, optical disk, etc.
  • the device may cooperate with a network storage device that performs the storage function of the memory through a network connection.
  • the memory 32 may be an internal storage unit of the device 30 , such as a hard disk or a memory of the device 30 .
  • the memory 32 may also be an external storage device of the apparatus 30, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the apparatus 30. Further, the memory 32 may include both an internal storage unit of the apparatus 30 and an external storage device. The memory 32 is used to store the computer program 33 and other programs and data required by the apparatus, and may also be used to temporarily store data that has been or will be output.
  • the apparatus described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product having a certain function.
  • the above-mentioned apparatus may be implemented by an electronic device, and the electronic device may be a computing device such as a desktop computer, a notebook, a palmtop computer, a server, a cloud server, and a mobile phone.
  • the apparatus 30 may include, but is not limited to, a processor 31 and a memory 32 .
  • FIG. 9 is only an example of the apparatus 30 and does not constitute a limitation on the electronic device 30; the apparatus may include more or fewer components than shown, combine certain components, or use different components. For example, the apparatus may also include input and output devices, network access devices, buses, and the like.
  • in an embodiment, when adjusting the target quantization bit number of each layer, the processor 31 is specifically configured to: for each layer in the neural network, re-search the target quantization bit number of the layer from the layer's search space according to the accuracy of the quantized neural network and the operating state information of the neural network when executing a specific task.
  • in an embodiment, the operating state information includes the bandwidth occupied by the neural network when executing the specific task and/or the running time.
  • in an embodiment, when determining the accuracy of the quantized neural network, the processor 31 is specifically configured to: train the quantized neural network, and test the trained neural network to obtain the accuracy of the quantized neural network.
  • in an embodiment, when adjusting the target quantization bit number of each layer, the processor 31 is specifically configured to: if the accuracy of the quantized neural network does not meet the preset condition, for each layer in the neural network, re-search the target quantization bit number of the layer from the layer's search space according to the accuracy of the quantized neural network; if the accuracy of the quantized neural network meets the preset condition, use the quantized neural network for data processing.
  • in an embodiment, when re-searching the target quantization bit number of a layer, the processor 31 is specifically configured to: obtain the sampling probability with which the target quantization bit number of the layer was searched from the search space of each layer this time; adjust the sampling probability according to the accuracy of the quantized neural network to obtain an adjusted sampling probability; and re-search the target quantization bit number of the layer from the layer's search space according to the adjusted sampling probability.
  • a controller is used to adjust the target number of quantization bits for each layer based on reinforcement learning.
  • the processor 31 is specifically configured to: update the parameters of the controller according to the accuracy of the quantized neural network and the sampling probability; and adjust the sampling probability using the updated controller to obtain the adjusted sampling probability.
  • in an embodiment, the multiple candidate quantization bit numbers in the search spaces of the layers of the neural network are the same.
  • the processor 31 is specifically configured to: quantize the weight value and/or activation value of the layer according to the target quantization bit number of the layer.
  • in an embodiment, when quantizing the activation value, the processor 31 is specifically configured to: quantize the activation value into discrete values according to the target quantization bit number of the layer and the preset range of the activation value.
  • the processor 31 when quantizing the weight value, is specifically configured to: quantize the weight value corresponding to each channel of the layer according to the target quantization bit number of the layer.
  • in an embodiment, when quantizing the weight value, the processor 31 is specifically configured to: determine the quantization parameter of the layer according to the target quantization bit number of the layer, the quantization parameter characterizing the quantization error of the weight value and being negatively correlated with the number of quantization bits; and quantize the weight values corresponding to each channel of the layer separately according to the target quantization bit number and quantization parameter of the layer.
  • the quantization parameter has a monotonically decreasing relationship with the target number of quantized bits.
  • in an embodiment, when quantizing the weight value, the processor 31 is specifically configured to: for each channel of the layer, scale the weight values of the channel into a first preset range according to the quantization parameter of the layer to obtain a first intermediate result; and quantize the weight values of the channel according to the first intermediate result and the target quantization bit number of the layer.
  • in an embodiment, the first preset range is [-1, 1].
  • in an embodiment, when quantizing the weight value, the processor 31 is specifically configured to: for each channel of the layer, scale the first intermediate result of the channel into a second preset range to obtain a second intermediate result; and quantize the weight values of the channel according to the second intermediate result and the quantization bit number of the layer.
  • the second preset range is [0,1].
  • in an embodiment, when quantizing the weight value, the processor 31 is specifically configured to: for each channel of the layer, obtain the maximum value of the second intermediate result in the channel; and quantize the weight values of the channel according to the maximum value of the second intermediate result in the channel, the result of normalizing by that maximum value, and the quantization bit number of the layer.
  • in an embodiment, the weight value of each layer in the neural network for which the per-layer target quantization bit numbers are searched this time is the same as the weight value of the corresponding layer in the neural network for which the per-layer target quantization bit numbers were searched last time.
  • the various embodiments described herein can be implemented using computer software, hardware, or any combination thereof in a computer-readable medium.
  • for a hardware implementation, the embodiments described herein can be implemented using at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and electronic units designed to perform the functions described herein.
  • embodiments such as procedures or functions may be implemented with separate software modules that allow the performance of at least one function or operation.
  • the software codes may be implemented by a software application (or program) written in any suitable programming language, and may be stored in a memory and executed by a controller.
  • in an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as a memory including instructions executable by a processor of an apparatus to perform the above-described method.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • a non-transitory computer-readable storage medium when the instructions in the storage medium are executed by the processor of the terminal, enable the terminal to execute the above method.
  • a movable platform is also provided, and the movable platform includes the above-mentioned data processing apparatus.
  • the movable platform includes, but is not limited to, an unmanned aerial vehicle, an unmanned vehicle, a mobile robot, or a gimbal.

Abstract

A data processing method and apparatus for a neural network, a movable platform, and a computer-readable storage medium. The method includes: for each layer in the neural network, searching for a target quantization bit number of the layer from multiple candidate quantization bit numbers included in a search space of the layer, and quantizing the parameters of the layer according to the target quantization bit number of the layer to obtain a quantized neural network; determining the accuracy of the quantized neural network; and adjusting the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and performing data processing based on the adjusted neural network. In the embodiments of the present application, it is determined from the accuracy of the quantized neural network that a suitable target quantization bit number has been found for each layer of the neural network; using such a quantized neural network for data processing helps reduce the bandwidth occupied during data processing while improving the data processing effect.

Description

Data processing method and apparatus for a neural network, movable platform, and computer-readable storage medium
Technical Field
The present application relates to the technical field of data processing, and in particular to a data processing method and apparatus for a neural network, a movable platform, and a computer-readable storage medium.
Background
With the development of technology, neural network techniques are applied to many aspects of life, for example image recognition using neural networks (such as face recognition, content-based image retrieval, or expression recognition), natural language processing (such as speech recognition, text classification, or information retrieval), and so on.
However, running a neural network is a computationally and memory-intensive process. To save the bandwidth occupied while a neural network runs, the parameters in the neural network model are usually quantized. In the related art, the same quantization method is used for every layer of the neural network; however, the redundancy of different layers differs, and applying the same quantization method to every layer makes layers with high redundancy occupy too much storage.
Summary
In view of this, one of the objectives of the present application is to provide a data processing method and apparatus for a neural network, a movable platform, and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a data processing method for a neural network, including:
for each layer in the neural network, searching for a target quantization bit number of the layer from multiple candidate quantization bit numbers included in a search space of the layer; and quantizing the parameters of the layer according to the target quantization bit number of the layer to obtain a quantized neural network;
determining the accuracy of the quantized neural network;
adjusting the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and performing data processing based on the adjusted neural network.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including a processor and a memory for storing executable instructions; when executing the executable instructions, the processor is configured to:
for each layer in the neural network, search for a target quantization bit number of the layer from multiple candidate quantization bit numbers included in a search space of the layer; and quantize the parameters of the layer according to the target quantization bit number of the layer to obtain a quantized neural network;
determine the accuracy of the quantized neural network;
adjust the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and perform data processing based on the adjusted neural network.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a movable platform including the data processing apparatus described in the second aspect.
According to the data processing method and apparatus, movable platform, and computer-readable storage medium provided by the embodiments of the present application, for each layer in the neural network, a target quantization bit number of the layer is searched out from multiple candidate quantization bit numbers included in a search space of the layer, and the parameters of the layer are quantized according to that bit number to obtain a quantized neural network; the accuracy of the quantized neural network is then determined, and the accuracy is used to adjust the target quantization bit number of each layer until it is determined from the accuracy that a suitable target quantization bit number has been found for every layer. This yields a high-performance mixed-precision network in which layers with low redundancy are quantized with more bits and layers with high redundancy with fewer bits; using such a mixed-precision network for data processing helps reduce the bandwidth occupied during data processing while improving the data processing effect.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario of a neural network processing method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a neural network processing method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of obtaining the quantized neural network based on reinforcement learning provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of another neural network processing method provided by an embodiment of the present application;
FIG. 5A is a schematic diagram of quantizing the parameters of a convolution layer provided by an embodiment of the present application;
FIG. 5B is a schematic diagram of quantizing the parameters of a pooling layer provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of quantization based on different quantization bit numbers provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of scaling weight values based on the hyperbolic tangent function provided by an embodiment of the present application;
FIG. 8A is a schematic diagram of weight values quantized layer by layer in the related art, provided by an embodiment of the present application;
FIG. 8B is a schematic diagram of weight values quantized channel by channel, provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a processing apparatus provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
In the related art, the same quantization method is used for every layer of the neural network, without considering that: first, the redundancy of different layers differs, so applying the same quantization method to every layer makes layers with high redundancy occupy too much storage; second, the computing density of different layers also differs, as do their demands on computing and storage resources. For example, a standard convolution layer is compute-intensive while a depthwise (layer-wise) convolution layer is storage-intensive, so layers with different computing densities have different requirements on the number of quantization bits for their parameters.
Based on this, an embodiment of the present application provides a data processing method for a neural network that searches out a suitable target quantization bit number for each layer of the network, thereby further improving model performance; using such a neural network for data processing helps improve processing efficiency. The neural network includes but is not limited to a BP neural network or a deep neural network (DNN); a DNN generally refers to a neural network including an input layer, multiple hidden layers, and an output layer, and includes but is not limited to a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), and the like.
The data processing method of the neural network in the embodiments of the present application can be applied to different data processing fields. In one example, it can be applied to image processing, for instance to perform face recognition, expression recognition, image retrieval, object recognition, behavior classification, or pose estimation. In another example, it can be applied to natural language processing, for instance to perform speech recognition, text classification, text retrieval, or automatic word segmentation. Since a suitable target quantization bit number has been searched out for each layer of the neural network in the embodiments of the present application, using such a neural network for data processing helps improve processing efficiency.
In an embodiment, the data processing method of the neural network can be applied to a data processing apparatus, which includes but is not limited to a computer chip (such as an ARM processor, a DSP processor, a GPU processor, or an FPGA processor) or an entity (such as a computer device).
In one implementation, when the data processing apparatus is a computer chip, it can be mounted on a movable platform to provide the movable platform with neural-network-based data processing functions. Movable platforms include but are not limited to unmanned aerial vehicles, unmanned vehicles, unmanned ships, mobile robots, or gimbals.
In an exemplary embodiment, referring to FIG. 1, a schematic diagram of an application scenario provided by an embodiment of the present application, an unmanned aerial vehicle 11 carries the data processing device 12, on which executable instructions of the neural network data processing method of this embodiment are deployed. Assuming in this embodiment that the method is used for object recognition, the UAV 11 can implement a tracking and shooting function for a target object based on the method. Specifically, the shooting device 10 on the UAV 11 can continuously capture multiple images, and the executable instructions of the neural network data processing method in the data processing device 12 process the multiple images to recognize the target object 13 in them and determine the position of the target object 13 in the images; the flying attitude of the UAV can then be adjusted according to that position to track and shoot the target object 13. Further, since a suitable target quantization bit number has been searched out for each layer of the neural network in the embodiments of the present application, using such a neural network for object recognition helps improve processing efficiency.
Referring to FIG. 2, a flowchart of a data processing method for a neural network provided by an embodiment of the present application, the method includes:
In step S101, for each layer in the neural network, a target quantization bit number of the layer is searched from multiple candidate quantization bit numbers included in a search space of the layer; and the parameters of the layer are quantized according to the target quantization bit number of the layer to obtain a quantized neural network.
In step S102, the accuracy of the quantized neural network is determined.
In step S103, the target quantization bit number of each layer is adjusted according to the accuracy of the quantized neural network and the search space of each layer, and data processing is performed based on the adjusted neural network.
First, for each layer in the neural network, a search space corresponding to the layer is defined; the search space includes multiple candidate quantization bit numbers, for example {1 bit, 2 bit, 4 bit, 8 bit, 16 bit}.
Considering that current neural networks have many layers, the candidate quantization bit numbers in the search spaces defined for the layers may be the same, which helps reduce the workload of the staff; of course, they may also differ, and this embodiment does not limit this.
In this embodiment, for each layer in the neural network, the target quantization bit number of the layer is searched from the multiple candidate quantization bit numbers included in the layer's search space, and the parameters of the layer are quantized according to that bit number to obtain a quantized neural network; the quantized network is then trained and the trained network is tested to obtain its accuracy; next, the accuracy is used to adjust the target quantization bit number of each layer, that is, the target quantization bit number of the layer is re-searched from the layer's search space according to the accuracy of the quantized neural network, until it is determined from the accuracy that a suitable target quantization bit number has been found for every layer (the target quantization bit numbers of different layers may be the same or different). This yields a high-performance mixed-precision network in which layers with low redundancy use higher-bit quantization and layers with high redundancy use lower-bit quantization; using such a mixed-precision network for data processing helps reduce the bandwidth occupied during data processing while improving the data processing effect.
In an embodiment, after the accuracy of the quantized neural network is obtained, if the accuracy does not meet the preset condition, indicating that the target quantization bit numbers of the layers searched this time may be inappropriate, then for each layer in the neural network, the target quantization bit number of the layer is re-searched from the layer's search space according to the accuracy of the quantized neural network. If the accuracy meets the preset condition, indicating that suitable target quantization bit numbers have been found for all layers this time, the search ends and the quantized neural network is used for data processing. In this embodiment, whether suitable target quantization bit numbers have been found this time is determined from the accuracy of the quantized neural network, yielding a high-performance mixed-precision network.
It can be understood that the preset condition can be set according to the actual application scenario, and the embodiments of the present application do not limit it.
Re-searching a layer's target quantization bit number from its search space according to the accuracy of the quantized neural network is explained here. If the accuracy does not meet the preset condition, indicating that the target quantization bit numbers searched this time may be inappropriate, first the sampling probability with which each layer's target quantization bit number was searched from its search space this time is obtained; then the sampling probability is adjusted according to the accuracy of the quantized neural network to obtain an adjusted sampling probability; finally, the layer's target quantization bit number is re-searched from its search space according to the adjusted sampling probability. In this embodiment, if the accuracy of the quantized neural network is high (for example, the difference between this accuracy and a preset accuracy is within a preset range, indicating high accuracy), the sampling probability with which each layer sampled the target quantization bit number this time can be increased; if the accuracy is low (the difference is outside the preset range, indicating low accuracy), that sampling probability can be reduced. Adjusting the sampling probability ensures that a neural network whose accuracy meets the preset condition can be searched out. Further, automatically searching each layer's target quantization bit number by adjusting the sampling probability reduces the workload of manual operation and helps improve search efficiency.
In an exemplary embodiment, the embodiment of the present application adopts a reinforcement learning method and uses a controller to adjust the target quantization bit number of each layer. Referring to FIG. 3, a recurrent neural network (such as an LSTM network) is used to construct a controller; for each layer of the neural network, the controller searches the target quantization bit number of the layer from the multiple candidate quantization bit numbers in the layer's search space, and the parameters of the layer are quantized according to that bit number to obtain a quantized neural network; the quantized network is then trained and the trained network is tested to obtain its accuracy; next, the sampling probability with which each layer's target quantization bit number was searched from its search space this time is obtained, and the parameters of the controller are updated with a policy gradient algorithm according to the accuracy of the quantized network and the sampling probability; the updated controller then adjusts the sampling probability, and the adjusted sampling probability is used to re-search each layer's target quantization bit number from its search space. In this embodiment, if the accuracy of the quantized neural network is high, the controller can increase the sampling probability of the target quantization bit numbers sampled this time for each layer; if the accuracy is low, the controller can reduce it, thereby ensuring that a neural network whose accuracy meets the preset condition can be searched out.
In the first search, the sampling probability corresponding to each layer of the neural network is randomly generated, and the controller searches each layer's target quantization bit number from its search space according to the randomly generated sampling probabilities. In subsequent iterations, the controller's parameters are updated with the policy gradient algorithm according to the accuracy of the quantized network obtained in the previous round and the previous round's sampling probabilities; the updated controller then adjusts the sampling probability, and the adjusted sampling probability is used to re-search each layer's target quantization bit number from its search space. During iteration, if the accuracy of this round's quantized network is high, the controller increases the sampling probability of the target quantization bit numbers sampled this round; if the accuracy is low, the controller reduces it, thereby ensuring that a neural network whose accuracy meets the preset condition can be searched out.
The embodiment of the present application uses the policy gradient algorithm to update the parameters of the controller. By controlling the search step size of the policy gradient, the search space is searched effectively in the early stage of the search, an evaluation is made probabilistically according to the accuracy of the quantized neural network, and feedback is then given according to the policy gradient, which can be expressed as:
∇_{θ_c} J(θ_c) = (1/m) Σ_{k=1}^{m} Σ_{t=1}^{T} ∇_{θ_c} log P(a_t | a_{(t-1):1}; θ_c) · R_k
where m is the number of test samples, θ_c is the parameter of the controller, T is the number of layers of the neural network, P(a_t | a_{(t-1):1}; θ_c) is the sampling probability with which each layer's target quantization bit number is searched, and R_k is the accuracy of this round's quantized neural network. The parameters of the controller are updated according to the result fed back by the policy gradient; the updated controller then updates the search strategy from a probabilistic perspective and re-searches the search space based on the updated strategy, thus achieving sufficient and effective search, feedback, and strategy updating.
In an embodiment, considering that when the neural network is applied to a specific task, the task may impose different requirements on the operation of the neural network (for example, when the specific task is applied in certain real-time scenarios, the network must process fast enough to meet real-time requirements), the determination of each layer's target quantization bit number is also affected. Therefore, when adjusting the target quantization bit numbers, the embodiment of the present application takes into account the specific task to be executed by the neural network: for each layer, the target quantization bit number of the layer is re-searched from the layer's search space according to both the accuracy of the quantized neural network and the operating state information of the neural network when executing the specific task. This embodiment selects each layer's target quantization bit number based on the accuracy of the quantized neural network and the operating state information related to the specific task, so that the finally obtained quantized neural network not only has good performance but is also better suited to executing the specific task and meets its operating requirements, achieving a neural network with optimal performance under the condition that the operating requirements of the specific task are met; the final quantized neural network is thus well adapted to the specific task.
It can be understood that the specific tasks include but are not limited to tasks in image processing, such as face recognition, expression recognition, or image classification, or tasks in natural language processing, such as speech recognition or text retrieval; the embodiments of the present application do not limit this.
Considering the operating environment of certain specific tasks, for example devices running the neural network that have limited bandwidth or high real-time requirements, the operating state information includes but is not limited to the bandwidth occupied by the neural network when executing the specific task, the speed at which the specific task is executed, and/or the running time when executing the specific task (where "and/or" denotes any combination of the three), so that a neural network with optimal performance is obtained while the operating requirements of the specific task are met.
In an embodiment, considering that the neural network finally used for data processing may require many iterations, the training process would be time-consuming if the quantized neural network had to be trained from scratch to convergence each time. Based on this, in this embodiment, after each search of the per-layer target quantization bit numbers, the weight values of each layer can reuse the weight values of the corresponding layers in the network from the previous search; that is, the weight value of each layer in the network of this search is the same as that of the corresponding layer in the network of the previous search, and the reused weight values are then quantized with the target quantization bit numbers found this time. This embodiment adopts weight sharing, which helps reduce computation and improve training efficiency.
The process of quantizing a layer's parameters according to the layer's target quantization bit number is the process of quantizing the layer's weight values and/or activation values. Referring to FIG. 4, a schematic flowchart of another data processing method for a neural network provided by the present application, the method includes:
In step S201, for each layer in the neural network, a target quantization bit number of the layer is searched from multiple candidate quantization bit numbers included in a search space of the layer; and the weight values and/or activation values of the layer are quantized according to the target quantization bit number of the layer to obtain a quantized neural network.
In step S202, the accuracy of the quantized neural network is determined. This is similar to step S102 and is not repeated here.
In step S203, the target quantization bit number of each layer is adjusted according to the accuracy of the quantized neural network and the search space of each layer, and data processing is performed based on the adjusted neural network. This is similar to step S103 and is not repeated here.
The neural network may include convolution layers, pooling layers, fully connected layers, etc., and layers of different natures have different parameters; for example, the parameters to be quantized in a convolution layer include the weights and the output parameters (that is, the activation values), while the parameters to be quantized in a pooling layer include the output parameters. The parameters of a layer can therefore be quantized according to the nature of the layer.
In one example, referring to FIG. 5A, for a convolution layer, the weight values of the layer can be quantized according to the layer's target quantization bit number, the convolution operation is performed with the quantized weight values and the input values, and the activation values obtained from the convolution are then quantized according to the layer's target quantization bit number. Referring to FIG. 5B, for a pooling layer or a fully connected layer (FIG. 5B takes a pooling layer as an example), after the pooling operation on the input values yields the activation values, the activation values of the layer can be quantized according to the layer's target quantization bit number.
In an embodiment, when quantizing the activation values of each layer of the neural network, the activation values can be quantized into discrete values according to the layer's target quantization bit number and a preset range of the activation values. In one example, let the activation value be x and the quantized activation value be Quant_x; the activation value is clipped to the preset range β and mapped onto the discrete levels determined by the target quantization bit number k of the layer using the round() function, which rounds according to the specified number of decimal places, thereby turning continuous values into discrete values.
In an embodiment, it is considered that in the related art, when the weight values in a neural network are quantized with a low bit number (fewer than 8 bits), the weight values of each layer are usually quantized uniformly; however, if the weight values of some channels in the layer are small, the weight values of those channels are easily quantized to 0 and become ineffective, degrading performance. Based on this, when quantizing the weight values of each layer, this embodiment quantizes the weight values corresponding to each channel of the layer separately according to the layer's target quantization bit number. Quantizing the per-channel weight values in units of channels narrows the quantization interval compared with quantizing a layer's weight values in units of layers as in the related art, thereby improving quantization accuracy.
When quantizing the weight values of each layer, for each channel of the layer, in order to prevent outliers from making the quantization error too large and ultimately preventing the training process from converging, the weight values of the channel are first scaled into a first preset range, and the scaled weight values are then quantized. Because a certain quantization error exists in the scaling and quantization processes (the smaller the number of quantization bits, the larger the quantization error; the larger the number of quantization bits, the more uniform the distribution of quantized values), this embodiment characterizes the quantization error with a quantization parameter. The quantization parameter can be a regularization coefficient, and different values of the quantization parameter correspond to different quantization errors; that is, the quantization error is related to the number of quantization bits, and each quantization bit number corresponds to a value of the quantization parameter that keeps the quantization error relatively small. Therefore, before scaling, the quantization parameter corresponding to the quantization bit number must be determined, such that after the parameters are quantized with that bit number, the corresponding quantization error is relatively small.
In an exemplary embodiment, referring to FIG. 6, FIG. 6 shows the weight values of one channel quantized with different quantization bit numbers (2 bit, 4 bit, and 8 bit) using the quantization method of the embodiment of the present application; it shows the curves for 2-bit quantization (2bit quant), 4-bit quantization (4bit quant), and 8-bit quantization (8bit quant). It can be seen that the smaller the bit number, the larger the quantization error, and the larger the bit number, the more uniform the distribution of the quantized values. In low-precision quantization, because the quantization function is not differentiable, a common method adopts the Straight-Through Estimator (STE) to solve the backward-differentiation problem in training: during forward propagation, the quantized values are used for computation; during backward propagation, the derivative of the quantization function is set to 1, so by default the gradient is taken directly with respect to the values before quantization, the model weights are updated, and the quantization function is skipped. The STE method assumes that the values before and after quantization are the same; as can be seen from FIG. 6, a certain quantization error exists when using the STE method, and the smaller the number of quantization bits, the larger the quantization error. In addition, the hyperbolic tangent function tanh(x) can be used to scale the channel's weight values to [-1, 1]. Referring to FIG. 7, FIG. 7 shows the influence of different quantization parameters, i.e., the alpha parameter, on the scaling range of the weight value x; the alpha parameter is a regularization coefficient used to characterize the quantization error of the weight values, and different values of the alpha parameter correspond to different quantization errors. Looking at the third quadrant of FIG. 7, from left to right the value of the alpha parameter increases; the larger the alpha value, the steeper the curve, the closer the gradients on the two sides are to 0, and the lower the discrimination. Referring to the first or third quadrant of FIG. 7, the larger the alpha value, the smaller the difference between the scaled values once the absolute value of the weight before scaling exceeds a certain range; for example, in the first quadrant, when the alpha value is 0.25, the weight values in the interval [0, 5] all correspond to different scaled values, whereas when the alpha value is 2, a portion of the weight values in [0, 5] all correspond to 1, which reduces the expressive capability of the neural network model. Therefore, combining FIG. 6 and FIG. 7, the relationship between the quantization bit number and the quantization error, i.e., the alpha parameter, can be determined: as the quantization bit number increases, the value of the alpha parameter needs to decrease gradually in order to satisfy the expressive needs of the neural network model.
Therefore, when the weight values corresponding to each channel of a layer are quantized according to the layer's target quantization bit number, the quantization parameter of the layer is first determined from the layer's target quantization bit number; the quantization parameter characterizes the quantization error of the weight values and is negatively correlated with the quantization bit number. The weight values corresponding to each channel of the layer are then quantized according to the layer's target quantization bit number and quantization parameter. In this embodiment, a quantization parameter adaptive to the layer's target quantization bit number is determined, which helps reduce the quantization error.
Further, the quantization parameter has a monotonically decreasing relationship with the target quantization bit number. In one example, let the original quantization parameter be α_0, the target quantization bit number be k, and the determined quantization parameter be α_k; then α_k = α_0 / k. This embodiment determines a quantization parameter adaptive to the layer's target quantization bit number, which helps reduce the quantization error.
Next, to prevent outliers from making the quantization error too large and ultimately preventing the training process from converging, for each channel of the layer, the channel's weight values are scaled into the first preset range according to the layer's quantization parameter to obtain a first intermediate result; the channel's weight values are then quantized according to the first intermediate result and the layer's target quantization bit number. The first preset range is [-1, 1]. An outlier may refer to a maximum value among the channel's weight values.
In one example, the hyperbolic tangent function tanh(x) can be used to scale the channel's weight values to [-1, 1], giving the first intermediate result tanh(α_k·w), where α_k is the quantization parameter adaptive to the target quantization bit number k, and w is a weight value of the channel.
Then, for each channel of the layer, the first intermediate result of the channel is scaled into a second preset range to obtain a second intermediate result, and the channel's weight values are quantized according to the second intermediate result and the layer's quantization bit number. The second preset range is [0, 1], which facilitates quantizing the weight values of each channel. In one example, letting the second intermediate result be normalize(w), it is obtained by mapping the first intermediate result tanh(α_k·w) from [-1, 1] into [0, 1].
Finally, for each channel of the layer, the maximum value of the second intermediate result in the channel is obtained, and the channel's weight values are quantized according to that maximum value, the result of normalizing by that maximum value, and the layer's quantization bit number. Specifically, a third intermediate result can be obtained from the maximum value of the second intermediate result in the channel, the result normalized by the maximum value, and the layer's quantization bit number, quantizing the channel's weight values into a third preset range, which may be [-1, 1]; the quantized weight values of the channel are then obtained from the third intermediate result. In one example, let the second intermediate result be normalize(w), the target quantization bit number be k, the third intermediate result be quant_w(normalize(w), k), and the quantized weight value of the channel be Quant_W(w, k); then Quant_W(w, k) = 2·quant_w(normalize(w), k) - 1, where scale_channel(normalize(w)) is used to obtain the maximum value of the second intermediate result in the channel, norm_channel(normalize(w)) is used to obtain the result of normalizing by that maximum value, and the round() function rounds according to the specified number of decimal places, turning continuous values into discrete values. See FIG. 6 for a schematic of the third intermediate results obtained under different quantization bit numbers.
In one example, referring to FIG. 8A and FIG. 8B, FIG. 8A shows layer-by-layer quantization of weight values in units of layers, and FIG. 8B shows channel-by-channel quantization of weight values in units of channels according to an embodiment of the present application; two channels are taken as an example. In FIG. 8A and FIG. 8B, the matrix in the first row is the original weight values before quantization, the matrix in the second row is the quantized weight values, and the matrix in the third row is the absolute error between the weight values before and after quantization. The comparison shows that with the channel-by-channel quantization of the embodiment of the present application, because the quantization interval is narrowed, the quantization accuracy is also improved; the absolute differences before and after quantization are significantly smaller than in FIG. 8A and closer to the values before quantization, which further reduces the quantization error and can improve the convergence speed and performance of the model.
相应的,请参阅图9,本申请实施例还提供了一种数据处理装置30,包括:处理器31,用于存储可执行指令的存储器31;所述处理器31在执行所述可执行指令时,被配置为:
for each layer in the neural network, search for a target quantization bit number of the layer from a plurality of candidate quantization bit numbers included in a search space of the layer; and quantize parameters of the layer according to the target quantization bit number of the layer, to obtain a quantized neural network;
determine an accuracy of the quantized neural network;
adjust the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and perform data processing based on the adjusted neural network.
The processor 31 executes the executable instructions included in the memory 32. The processor 31 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 32 stores the executable instructions of the data processing method of the neural network. The memory 32 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and so on. Moreover, the apparatus may cooperate with a network storage device that performs the storage function of the memory over a network connection. The memory 32 may be an internal storage unit of the apparatus 30, such as a hard disk or internal memory of the apparatus 30. The memory 32 may also be an external storage device of the apparatus 30, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the apparatus 30. Further, the memory 32 may include both an internal storage unit and an external storage device of the apparatus 30. The memory 32 is used to store the computer program 33 as well as other programs and data required by the apparatus, and may also be used to temporarily store data that has been output or is to be output.
The apparatus set forth in the above embodiments may specifically be implemented by a computer chip or an entity, or by a product having a certain function. For example, the above apparatus may be implemented by an electronic device, which may be a computing device such as a desktop computer, a notebook, a palmtop computer, a server, a cloud server, or a mobile phone.
The apparatus 30 may include, but is not limited to, the processor 31 and the memory 32. Those skilled in the art will understand that FIG. 9 is merely an example of the apparatus 30 and does not constitute a limitation on the apparatus 30, which may include more or fewer components than shown, combine certain components, or have different components; for example, the apparatus may also include input/output devices, network access devices, buses, and so on.
In an embodiment, when adjusting the target quantization bit number of each layer, the processor 31 is specifically configured to: for each layer in the neural network, re-search the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network and running state information of the neural network when executing a specific task.
In an embodiment, the running state information includes the bandwidth occupied and/or the running duration of the neural network when executing the specific task.
In an embodiment, when determining the accuracy of the quantized neural network, the processor 31 is specifically configured to: train the quantized neural network, and test the trained neural network to obtain the accuracy of the quantized neural network.
In an embodiment, when adjusting the target quantization bit number of each layer, the processor 31 is specifically configured to: if the accuracy of the quantized neural network does not satisfy a preset condition, for each layer in the neural network, re-search the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network; and if the accuracy of the quantized neural network satisfies the preset condition, use the quantized neural network for data processing.
In an embodiment, when re-searching the target quantization bit number of the layer, the processor 31 is specifically configured to: obtain the sampling probability with which the target quantization bit number of the layer was searched out of the search space of each layer in the neural network this time; adjust the sampling probability according to the accuracy of the quantized neural network to obtain an adjusted sampling probability; and re-search the target quantization bit number of the layer from the search space of the layer according to the adjusted sampling probability.
In an embodiment, the target quantization bit number of each layer is adjusted using a controller based on reinforcement learning.
The processor 31 is specifically configured to: update the parameters of the controller according to the accuracy of the quantized neural network and the sampling probability; and adjust the sampling probability using the updated controller to obtain the adjusted sampling probability.
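As an illustrative sketch only, one way to realize such a controller is a REINFORCE-style update of per-layer sampling logits; the use of the measured accuracy as the reward, and all names below, are assumptions, since the application does not fix the controller design:

    import numpy as np

    def softmax(logits):
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def search_step(logits, search_space, accuracy_fn, lr=0.1):
        # logits: (num_layers, num_candidates) controller parameters.
        # search_space: candidate bit numbers, e.g. [2, 4, 8].
        # accuracy_fn: trains/tests the network quantized with the sampled bits.
        probs = np.apply_along_axis(softmax, 1, logits)
        choices = [np.random.choice(len(search_space), p=p) for p in probs]
        bits = [search_space[c] for c in choices]
        reward = accuracy_fn(bits)
        for layer, c in enumerate(choices):
            grad = -probs[layer]                 # gradient of log prob w.r.t. logits
            grad[c] += 1.0
            logits[layer] += lr * reward * grad  # push up the sampled choices
        return logits, bits, reward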
In an embodiment, the plurality of candidate quantization bit numbers in the search space of each layer in the neural network are the same.
In an embodiment, when quantizing, the processor 31 is specifically configured to: quantize the weight values and/or activation values of the layer according to the target quantization bit number of the layer.
In an embodiment, when quantizing the activation values, the processor 31 is specifically configured to: quantize the activation values into discrete values according to the target quantization bit number of the layer and the preset range of the activation values.
In an embodiment, when quantizing the weight values, the processor 31 is specifically configured to: quantize the weight values corresponding to each channel of the layer separately according to the target quantization bit number of the layer.
In an embodiment, when quantizing the weight values, the processor 31 is specifically configured to: determine the quantization parameter of the layer according to the target quantization bit number of the layer, where the quantization parameter characterizes the quantization error of the weight values and is negatively correlated with the quantization bit number; and quantize the weight values corresponding to each channel of the layer separately according to the target quantization bit number and the quantization parameter of the layer.
In an embodiment, the quantization parameter decreases monotonically with the target quantization bit number.
In an embodiment, when quantizing the weight values, the processor 31 is specifically configured to: for each channel of the layer, scale the weight values of the channel into a first preset range according to the quantization parameter of the layer, to obtain a first intermediate result; and quantize the weight values of the channel according to the first intermediate result and the target quantization bit number of the layer.
In an embodiment, the first preset range is [-1, 1].
In an embodiment, when quantizing the weight values, the processor 31 is specifically configured to: for each channel of the layer, scale the first intermediate result of the channel into a second preset range, to obtain a second intermediate result; and quantize the weight values of the channel according to the second intermediate result and the quantization bit number of the layer.
In an embodiment, the second preset range is [0, 1].
In an embodiment, when quantizing the weight values, the processor 31 is specifically configured to: for each channel of the layer, obtain the maximum value of the second intermediate results in the channel; and quantize the weight values of the channel according to the maximum value of the second intermediate results in the channel, the result of normalizing by the maximum value, and the quantization bit number of the layer.
In an embodiment, the weight values of each layer in the neural network each time the target quantization bit numbers of the layers are searched out are the same as the weight values of the corresponding layers in the neural network the previous time the target quantization bit numbers were searched out.
The various implementations described herein may be implemented using a computer-readable medium such as computer software, hardware, or any combination thereof. For hardware implementation, the implementations described herein may be implemented by using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. For software implementation, implementations such as procedures or functions may be implemented with a separate software module that allows at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in a memory and executed by a controller.
As for the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant description of the method embodiments. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as a memory including instructions, which are executable by a processor of an apparatus to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided such that, when the instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform the above method.
In an exemplary embodiment, a movable platform is also provided, the movable platform including the above data processing apparatus. The movable platform includes, but is not limited to, an unmanned aerial vehicle, an unmanned vehicle, a mobile robot, a gimbal, or the like.
It should be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. The terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device including that element.
The method and apparatus provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and application scope based on the idea of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (40)

  1. A data processing method of a neural network, comprising:
    for each layer in the neural network, searching for a target quantization bit number of the layer from a plurality of candidate quantization bit numbers included in a search space of the layer; and quantizing parameters of the layer according to the target quantization bit number of the layer, to obtain a quantized neural network;
    determining an accuracy of the quantized neural network;
    adjusting the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and performing data processing based on the adjusted neural network.
  2. The method according to claim 1, wherein the adjusting the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer comprises:
    for each layer in the neural network, re-searching the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network and running state information of the neural network when executing a specific task.
  3. The method according to claim 2, wherein the running state information comprises a bandwidth occupied and/or a running duration of the neural network when executing the specific task.
  4. The method according to claim 1, wherein the determining the accuracy of the quantized neural network comprises:
    training the quantized neural network, and testing the trained neural network to obtain the accuracy of the quantized neural network.
  5. The method according to claim 1, wherein the adjusting the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer comprises:
    if the accuracy of the quantized neural network does not satisfy a preset condition, for each layer in the neural network, re-searching the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network;
    if the accuracy of the quantized neural network satisfies the preset condition, using the quantized neural network for data processing.
  6. The method according to claim 5, wherein the re-searching the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network comprises:
    obtaining a sampling probability with which the target quantization bit number of the layer was searched out of the search space of each layer in the neural network this time;
    adjusting the sampling probability according to the accuracy of the quantized neural network, to obtain an adjusted sampling probability;
    re-searching the target quantization bit number of the layer from the search space of the layer according to the adjusted sampling probability.
  7. The method according to claim 6, wherein the target quantization bit number of each layer is adjusted using a controller based on reinforcement learning;
    the adjusting the sampling probability according to the accuracy of the quantized neural network to obtain the adjusted sampling probability comprises:
    updating parameters of the controller according to the accuracy of the quantized neural network and the sampling probability;
    adjusting the sampling probability using the updated controller, to obtain the adjusted sampling probability.
  8. The method according to claim 1, wherein the plurality of candidate quantization bit numbers in the search space of each layer in the neural network are the same.
  9. The method according to claim 1, wherein the quantizing the parameters of the layer according to the target quantization bit number of the layer comprises:
    quantizing weight values and/or activation values of the layer according to the target quantization bit number of the layer.
  10. The method according to claim 9, wherein the quantizing the activation values of the layer according to the target quantization bit number of the layer comprises:
    quantizing the activation values into discrete values according to the target quantization bit number of the layer and a preset range of the activation values.
  11. The method according to claim 9, wherein the quantizing the weight values of the layer according to the target quantization bit number of the layer comprises:
    quantizing weight values corresponding to each channel of the layer separately according to the target quantization bit number of the layer.
  12. The method according to claim 11, wherein the quantizing the weight values corresponding to each channel of the layer separately according to the target quantization bit number of the layer comprises:
    determining a quantization parameter of the layer according to the target quantization bit number of the layer, wherein the quantization parameter characterizes a quantization error of the weight values, and the quantization parameter is negatively correlated with the quantization bit number;
    quantizing the weight values corresponding to each channel of the layer separately according to the target quantization bit number and the quantization parameter of the layer.
  13. The method according to claim 12, wherein the quantization parameter decreases monotonically with the target quantization bit number.
  14. The method according to claim 12, wherein the quantizing the weight values corresponding to each channel of the layer separately according to the target quantization bit number and the quantization parameter of the layer comprises:
    for each channel of the layer, scaling the weight values of the channel into a first preset range according to the quantization parameter of the layer, to obtain a first intermediate result;
    quantizing the weight values of the channel according to the first intermediate result and the target quantization bit number of the layer.
  15. The method according to claim 14, wherein the first preset range is [-1, 1].
  16. The method according to claim 14, wherein the quantizing the weight values of the channel according to the first intermediate result and the target quantization bit number of the layer comprises:
    for each channel of the layer, scaling the first intermediate result of the channel into a second preset range, to obtain a second intermediate result;
    quantizing the weight values of the channel according to the second intermediate result and the quantization bit number of the layer.
  17. The method according to claim 16, wherein the second preset range is [0, 1].
  18. The method according to claim 16, wherein the quantizing the weight values of the channel according to the second intermediate result and the quantization bit number of the layer comprises:
    for each channel of the layer, obtaining a maximum value of the second intermediate results in the channel;
    quantizing the weight values of the channel according to the maximum value of the second intermediate results in the channel, a result of normalizing by the maximum value, and the quantization bit number of the layer.
  19. The method according to claim 1, wherein the weight values of each layer in the neural network each time the target quantization bit numbers of the layers are searched out are the same as the weight values of the corresponding layers in the neural network the previous time the target quantization bit numbers were searched out.
  20. A data processing apparatus, comprising: a processor, and a memory for storing executable instructions; wherein when executing the executable instructions, the processor is configured to:
    for each layer in the neural network, search for a target quantization bit number of the layer from a plurality of candidate quantization bit numbers included in a search space of the layer; and quantize parameters of the layer according to the target quantization bit number of the layer, to obtain a quantized neural network;
    determine an accuracy of the quantized neural network;
    adjust the target quantization bit number of each layer according to the accuracy of the quantized neural network and the search space of each layer, and perform data processing based on the adjusted neural network.
  21. The apparatus according to claim 20, wherein when adjusting the target quantization bit number of each layer, the processor is specifically configured to: for each layer in the neural network, re-search the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network and running state information of the neural network when executing a specific task.
  22. The apparatus according to claim 21, wherein the running state information comprises a bandwidth occupied and/or a running duration of the neural network when executing the specific task.
  23. The apparatus according to claim 20, wherein when determining the accuracy of the quantized neural network, the processor is specifically configured to: train the quantized neural network, and test the trained neural network to obtain the accuracy of the quantized neural network.
  24. The apparatus according to claim 20, wherein when adjusting the target quantization bit number of each layer, the processor is specifically configured to: if the accuracy of the quantized neural network does not satisfy a preset condition, for each layer in the neural network, re-search the target quantization bit number of the layer from the search space of the layer according to the accuracy of the quantized neural network; and if the accuracy of the quantized neural network satisfies the preset condition, use the quantized neural network for data processing.
  25. The apparatus according to claim 24, wherein when re-searching the target quantization bit number of the layer, the processor is specifically configured to:
    obtain a sampling probability with which the target quantization bit number of the layer was searched out of the search space of each layer in the neural network this time;
    adjust the sampling probability according to the accuracy of the quantized neural network, to obtain an adjusted sampling probability;
    re-search the target quantization bit number of the layer from the search space of the layer according to the adjusted sampling probability.
  26. The apparatus according to claim 25, wherein the target quantization bit number of each layer is adjusted using a controller based on reinforcement learning;
    the processor is specifically configured to: update parameters of the controller according to the accuracy of the quantized neural network and the sampling probability; and adjust the sampling probability using the updated controller, to obtain the adjusted sampling probability.
  27. The apparatus according to claim 20, wherein the plurality of candidate quantization bit numbers in the search space of each layer in the neural network are the same.
  28. The apparatus according to claim 20, wherein when quantizing, the processor is specifically configured to: quantize weight values and/or activation values of the layer according to the target quantization bit number of the layer.
  29. The apparatus according to claim 28, wherein when quantizing the activation values, the processor is specifically configured to: quantize the activation values into discrete values according to the target quantization bit number of the layer and a preset range of the activation values.
  30. The apparatus according to claim 28, wherein when quantizing the weight values, the processor is specifically configured to: quantize weight values corresponding to each channel of the layer separately according to the target quantization bit number of the layer.
  31. The apparatus according to claim 28, wherein when quantizing the weight values, the processor is specifically configured to:
    determine a quantization parameter of the layer according to the target quantization bit number of the layer, wherein the quantization parameter characterizes a quantization error of the weight values, and the quantization parameter is negatively correlated with the quantization bit number;
    quantize the weight values corresponding to each channel of the layer separately according to the target quantization bit number and the quantization parameter of the layer.
  32. The apparatus according to claim 31, wherein the quantization parameter decreases monotonically with the target quantization bit number.
  33. The apparatus according to claim 31, wherein when quantizing the weight values, the processor is specifically configured to:
    for each channel of the layer, scale the weight values of the channel into a first preset range according to the quantization parameter of the layer, to obtain a first intermediate result;
    quantize the weight values of the channel according to the first intermediate result and the target quantization bit number of the layer.
  34. The apparatus according to claim 33, wherein the first preset range is [-1, 1].
  35. The apparatus according to claim 33, wherein when quantizing the weight values, the processor is specifically configured to:
    for each channel of the layer, scale the first intermediate result of the channel into a second preset range, to obtain a second intermediate result;
    quantize the weight values of the channel according to the second intermediate result and the quantization bit number of the layer.
  36. The apparatus according to claim 35, wherein the second preset range is [0, 1].
  37. The apparatus according to claim 35, wherein when quantizing the weight values, the processor is specifically configured to:
    for each channel of the layer, obtain a maximum value of the second intermediate results in the channel;
    quantize the weight values of the channel according to the maximum value of the second intermediate results in the channel, a result of normalizing by the maximum value, and the quantization bit number of the layer.
  38. The apparatus according to claim 20, wherein the weight values of each layer in the neural network each time the target quantization bit numbers of the layers are searched out are the same as the weight values of the corresponding layers in the neural network the previous time the target quantization bit numbers were searched out.
  39. A computer-readable storage medium, having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the method according to any one of claims 1 to 19.
  40. A movable platform, comprising the data processing apparatus according to any one of claims 20 to 38.
PCT/CN2020/106865 2020-08-04 2020-08-04 Data processing method and apparatus of neural network, movable platform and computer-readable storage medium WO2022027242A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/106865 WO2022027242A1 (zh) 2020-08-04 2020-08-04 Data processing method and apparatus of neural network, movable platform and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022027242A1 true WO2022027242A1 (zh) 2022-02-10

Family

ID=80118682

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/106865 WO2022027242A1 (zh) 2020-08-04 2020-08-04 Data processing method and apparatus of neural network, movable platform and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2022027242A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200012926A1 * 2018-07-05 2020-01-09 Hitachi, Ltd. Neural network learning device and neural network learning method
CN110852438A * 2019-11-11 2020-02-28 Beijing Baidu Netcom Science and Technology Co., Ltd. Model generation method and apparatus
CN110852421A * 2019-11-11 2020-02-28 Beijing Baidu Netcom Science and Technology Co., Ltd. Model generation method and apparatus
CN110889503A * 2019-11-26 2020-03-17 Cambricon Technologies Corporation Limited Data processing method and apparatus, computer device and storage medium


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20948015; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20948015; Country of ref document: EP; Kind code of ref document: A1)