WO2020134829A1 - Neural network batch normalization optimization method and apparatus - Google Patents

Neural network batch normalization optimization method and apparatus

Info

Publication number
WO2020134829A1
WO2020134829A1 (PCT/CN2019/121522)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
network layer
layers
target network
network
Prior art date
Application number
PCT/CN2019/121522
Other languages
English (en)
French (fr)
Inventor
肖海勇
Original Assignee
深圳云天励飞技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Priority to US17/257,314 (granted as US11270208B2)
Publication of WO2020134829A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Definitions

  • The present application relates to the field of neural networks, and in particular to a method and apparatus for optimizing the batch normalization of neural networks.
  • Batch Normalization is a method for speeding up the training of deep networks.
  • Batch Norm processing, however, requires a bias to be introduced artificially.
  • If this artificially introduced bias is not eliminated in time during subsequent network processing, it may be amplified step by step.
  • If the offset becomes too large, fixed-point implementation of the network becomes difficult.
  • If the output data of each network layer is directly quantized and truncated, a large amount of bit width is wasted on the invalid data offset, while the offset-removed data that actually needs the quantization bits receives only the remaining low-order quantization bits, causing the fixed-point bit width of the network to expand.
  • The embodiments of the present application provide a method and an apparatus for optimizing the batch normalization of a neural network, which can reduce the influence of the artificially introduced offsets and effectively lower the fixed-point bit width requirements of the network.
  • A first aspect of the embodiments of the present application provides a neural network batch normalization optimization method, including: setting the first network layer in the neural network as the starting layer; sequentially obtaining the initial offset values of different network layers backward from the starting layer; calling a preset algorithm to process these initial offset values separately to obtain the equivalent offset values of the different network layers; and determining whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold.
  • If the target network layer exists, the offset values of the different network layers from the starting layer to the layer preceding the target network layer are all set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer.
  • If the target network layer is not the last layer of the neural network, the layer following the target network layer is set as the starting layer, and the step of calling the preset algorithm to separately process the initial offset values sequentially obtained backward from the starting layer is executed again, to obtain the equivalent offset values of the different network layers.
  • A second aspect of the embodiments of the present application provides a neural network batch normalization optimization apparatus.
  • The apparatus includes:
  • a starting layer setting module, used to set the first network layer in the neural network as the starting layer;
  • an initial offset value obtaining module, configured to sequentially obtain the initial offset values of different network layers backward from the starting layer;
  • a data processing module, configured to call a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers;
  • a determining module, used to determine whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
  • an optimization processing module, used to set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0 when the target network layer exists, and to use the equivalent offset value of the target network layer as the offset value of the target network layer.
  • A third aspect of the embodiments of the present application provides a terminal, including a processor, an input device, an output device, and a memory, where the processor, input device, output device, and memory are connected to one another, and the memory is used to store a computer program.
  • The computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions according to the first aspect of the embodiments of the present application.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • A fifth aspect of the embodiments of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • The computer program product may be a software installation package.
  • In the embodiments of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range.
  • The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer; this ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is a very small offset value, so that the offset values of the different network layers of the final neural network are all small.
  • This solves the problem that the artificially introduced offset in a neural network with batch normalization operations grows too large and wastes high-order information bits during fixed-point conversion. This solution can effectively reduce the bit width of the network after fixed-point conversion, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
  • FIG. 1 is a schematic interaction diagram of a neural network batch normalization optimization apparatus provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a neural network batch normalization optimization method provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of a neural network batch normalization optimization method provided by an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of a neural network batch normalization optimization apparatus provided by an embodiment of the present application.
  • FIG. 1 provides a schematic diagram of the optimization interaction for neural network batch normalization according to an embodiment of the present application.
  • The interaction involves a terminal 101 and a preset algorithm database 102, where the terminal 101 sets the first network layer in the neural network as the starting layer, and the terminal 101 calls a preset algorithm from the preset algorithm database 102 to calculate the equivalent bias values of the different network layers starting from the starting layer.
  • The terminal 101 determines whether a target network layer exists, where the ratio between the equivalent bias value corresponding to the layer preceding the target network layer and the equivalent bias value corresponding to the target network layer is not less than a preset threshold; if it exists, the terminal 101 sets the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and uses the equivalent offset value of the target network layer as the offset value of the target network layer.
  • If no target network layer exists, the initial offset value of each network layer is left unprocessed.
  • The terminal 101 then confirms whether the target network layer is the last layer of the neural network; if not, the terminal 101 sets the layer following the target network layer as the starting layer and repeats the calculation of the equivalent bias values; if so, the terminal 101 ends the batch normalization optimization of the neural network.
  • In this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range.
  • The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer; this ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is a very small offset value, so that the offset values of the different network layers of the final neural network are all small.
  • This solves the problem that the artificially introduced offset in a neural network with batch normalization operations grows too large and wastes high-order information bits during fixed-point conversion. This solution can effectively reduce the bit width of the network after fixed-point conversion, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
  • FIG. 2 is a schematic flowchart of a neural network batch normalization optimization method according to an embodiment of the present application. As shown in FIG. 2, the method may include steps 201-205, as follows.
  • The initial offset value refers to the bias value obtained from machine training of the neural network; it serves as the initial offset value before optimization.
  • The equivalent offset value of the first network layer (the starting layer) is the initial offset value of the first network layer.
  • The equivalent offset value of the second network layer is calculated by multiplying the initial offset value of the starting layer by the sum of the weights of the second network layer and then adding the initial offset value of the second network layer.
  • In general, the equivalent offset value of the N-th network layer is the equivalent offset value of the (N-1)-th network layer multiplied by the weight sum of the N-th network layer, plus the initial offset value of the N-th network layer.
  • The preset threshold may be 100, 1000, or several thousand. For example, if the preset threshold is 100, the equivalent bias value of the target network layer is 3.5, and the equivalent bias value of the layer preceding the target network layer is 2000, then 2000 divided by 3.5 equals 571.43, which is not less than the preset threshold.
  • If the target network layer exists, the offset values of the different network layers from the starting layer to the layer preceding the target network layer are all set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer.
  • That is, by optimizing the offset value of the network layer whose equivalent offset value satisfies the condition to that equivalent offset value, and optimizing the offset values of all network layers before it to 0, the biases of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced bias in a neural network with batch normalization operations wastes high-order information bits during fixed-point conversion.
  • The method also includes: when it is confirmed that the target network layer is not the last layer of the neural network, setting the layer following the target network layer as the starting layer, and executing the step of calling the preset algorithm to separately process the initial offset values of the different network layers sequentially obtained backward from the starting layer, to obtain the equivalent offset values of the different network layers.
  • In this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range.
  • The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer; this ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is a very small offset value, so that the offset values of the different network layers of the final neural network are all small.
  • This solves the problem that the artificially introduced offset in a neural network with batch normalization operations grows too large and wastes high-order information bits during fixed-point conversion. This solution can effectively reduce the bit width of the network after fixed-point conversion, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
  • When N target network layers exist, the smallest equivalent bias value among the equivalent bias values respectively corresponding to the N target network layers may be obtained, and the network layer corresponding to the smallest equivalent bias value is determined as the target network layer to optimize.
  • The offset value of each network layer before the target network layer is then 0, so that the offset values of the different network layers of the final neural network are small, which solves the problem that the artificially introduced bias in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion.
  • Alternatively, the last network layer among the N target network layers is determined as the target network layer.
  • Compared with selecting an earlier network layer as the target network layer, determining the network layer corresponding to a later equivalent bias value as the target network layer reduces computation: when an earlier layer is selected, the next cycle of obtaining a target network layer must take the layer following the current target network layer as the starting layer, sequentially obtain the equivalent offset values of that layer and all layers after it, and obtain the ratios of the equivalent offset values of each pair of adjacent layers among them.
  • Therefore, selecting the network layer corresponding to the later equivalent bias value as the target network layer reduces, to a certain extent, the amount of computation of the optimization method when selecting the target network layer, and improves the efficiency of the optimization processing.
  • FIG. 3 is a schematic flowchart of a neural network batch normalization optimization method according to an embodiment of the present application.
  • When the bias of a network layer changes, the activation operation of the activation layer must be modified accordingly, that is, the activation functions of the different activation layers must be modified; otherwise, if data below 0 is still set to 0 even though the bias of the network layer has not been added, errors will occur.
  • As shown in FIG. 3, the method may include steps 301-309.
  • If the target network layer exists, the offset values of the different network layers from the starting layer to the layer preceding the target network layer are all set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer.
  • If no target network layer exists, the initial offset values of all network layers from the starting layer onward are left unchanged, and processing follows the normal batch normalization flow of the neural network.
  • The initial activation functions of the different activation layers between the starting layer and the target network layer are processed separately according to the equivalent offset values of those activation layers, to obtain the activation functions of the different activation layers between the starting layer and the target network layer.
  • The conventional activation operation uses the following activation function: relu(x) = x if x ≥ 0, and relu(x) = 0 if x < 0.
  • The optimized activation function satisfies relu(x + bias_eq) = relu_new(x) + bias_eq, where relu_new(x) = x if x ≥ -bias_eq, and relu_new(x) = -bias_eq if x < -bias_eq.
  • Here x is the input value of the activation function of a given activation layer before adjustment, bias_eq is the equivalent bias of that activation layer, and relu_new(x) is the modified activation function.
  • Through this embodiment, the initial offset value of each network layer is processed to obtain the target network layer, the offset value of each network layer before the target network layer is set to 0, the equivalent offset value of the target network layer is used as the offset value of the target network layer, and the activation functions of the activation layers are modified accordingly to ensure that the activation operations proceed correctly.
  • The method further includes processing the initial padding values: padding_new(y) = y - bias_eq, where y is the initial padding value of a network layer before optimization.
  • This algorithm is used to adjust accordingly the padding values of the layers in the neural network that have initial padding values.
  • By adjusting the padding values of the different network layers through this specific algorithm, the optimization of the neural network keeps the other parameters of the neural network well-formed.
  • The method further includes: obtaining the P batch normalization layers in the neural network and confirming that P′ of them are each preceded by a convolutional or fully connected layer.
  • Each of the P′ batch normalization layers is merged with the network layer preceding it.
  • By merging the P′ batch normalization layers that satisfy the condition with their corresponding preceding network layers, the neural network has fewer layers.
  • The amount of computation when calculating the equivalent offset values of the network layers and when searching for the target network layer is correspondingly reduced, and the processing efficiency is improved.
  • The k-th layer is a Batch Norm layer and includes multiple output channels.
  • The i-th output of the k-th Batch Norm layer, denoted data_bn_k[i,:,:], can be expressed as: data_bn_k[i,:,:] = gamma_k[i] * (data_k[i,:,:] - data_mean_k[i]) / sqrt(data_var_k[i] + ε) + beta_k[i], with data_mean_k[i] = mean(data_k[i,:,:]) and data_var_k[i] = var(data_k[i,:,:]).
  • data_k[i,:,:] is the input data of the k-th layer.
  • gamma_k[i] and beta_k[i] are the trained parameters of the i-th channel of the k-th network layer.
  • data_var_k[i] is the trained variance of the i-th channel of the k-th network layer.
  • data_mean_k[i] is the trained mean of the i-th channel of the k-th network layer.
  • ε is a constant.
  • The expression for the i-th output of the k-th network layer can then be simplified as: data_bn_k[i,:,:] = weight_k[i] * data_k[i,:,:] + bias_k[i], where weight_k[i] = gamma_k[i] / sqrt(data_var_k[i] + ε) and bias_k[i] = beta_k[i] - gamma_k[i] * data_mean_k[i] / sqrt(data_var_k[i] + ε).
  • weight_k[i] denotes the weight of the i-th channel of the k-th network layer.
  • bias_k[i] denotes the pre-merge offset value of the i-th channel of the k-th network layer.
  • Formula (2) can be processed when the neural network is deployed, merging it into the (k-1)-th network layer, where the (k-1)-th network layer is a convolutional layer or a fully connected layer, namely: data_bn_k[i,:,:] = weight_new_k-1[i] * data_k-1[i,:,:] + bias_new_k-1[i].
  • Here weight_new_k-1[i] = weight_k[i] * weight_k-1[i];
  • bias_new_k-1[i] = weight_k[i] * bias_k-1[i] + bias_k[i];
  • bias_new_k-1[i] denotes the initial offset value of the i-th channel of the (k-1)-th network layer after merging according to the above formulas.
  • weight_new_k-1[i] denotes the transformed weight of the i-th channel of the (k-1)-th network layer. That is, the output of the k-th layer can be represented by the (k-1)-th layer, with the (k-1)-th layer using weight_new_k-1 and bias_new_k-1 as its coefficients and offsets.
  • By merging the batch normalization layer with its corresponding preceding convolutional or fully connected layer, the neural network has fewer layers, the corresponding amount of computation when calculating the equivalent offset value of each network layer is reduced, and the processing efficiency is improved.
  • When the k-th network layer is a convolutional or fully connected layer, the k-th network layer includes multiple channels, and the equivalent bias is propagated and accumulated backward using the following formulas:
  • bias_eq_k[i] = bias_new_k[i] + sum(weight_new_k-1[i]) * bias_eq_k-1;    (4)
  • bias_eq_k+1[i] = bias_new_k+1[i] + sum(weight_new_k[i]) * bias_eq_k;    (5)
  • bias_eq_k[i] is the equivalent offset value of the i-th channel of the k-th network layer.
  • After the equivalent offset value of each layer is calculated, if the ratio between the equivalent offset value of the layer preceding a certain network layer and the equivalent offset value of that network layer is not less than the preset threshold, then all offset values from the starting layer to the layer preceding that network layer are set to 0, and the equivalent offset value of that layer is used as the offset value of that layer.
  • With the offset values of all layers before that layer set to 0, the fixed-point data before that layer mainly encodes the informative, highly varying valid data, which can greatly reduce the bit width of the network.
  • The conventional activation operation is performed as follows: relu(x) = x if x ≥ 0, and relu(x) = 0 if x < 0; the modified activation satisfies relu(x + bias_eq) = relu_new(x) + bias_eq, with relu_new(x) = x if x ≥ -bias_eq, and relu_new(x) = -bias_eq if x < -bias_eq.
  • x is the input value of the activation operation of a given activation layer before adjustment.
  • bias_eq is the equivalent offset of that activation layer.
  • relu_new(x) is the modified activation function.
  • When padding exists in the network, the padding value must be modified accordingly.
  • The modification is similar to the activation operation: if the padding value is 0, it must be changed to -bias_eq; in general, if the padding value is v, the modified padding value is v - bias_eq.
  • By adjusting the activation operations of the different network layers and the padding values of the layers that have padding values through specific algorithms, the optimization adjustments made to the neural network keep the other parameters of the neural network well-formed.
  • FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to one another; the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the steps of the method described above.
  • In those steps, the offset values of the different network layers from the starting layer to the layer preceding the target network layer are all set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer.
  • The initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range.
  • The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer; this ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is a very small offset value, so that the offset values of the different network layers of the final neural network are all small.
  • This solves the problem that the artificially introduced offset in a neural network with batch normalization operations grows too large and wastes high-order information bits during fixed-point conversion. This solution can effectively reduce the bit width of the network after fixed-point conversion, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
  • The terminal includes hardware structures and/or software modules corresponding to each function.
  • The present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods for each specific application to implement the described functions, but such implementations should not be considered beyond the scope of the present application.
  • The embodiments of the present application may divide the terminal into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • The above integrated unit can be implemented in the form of hardware or of a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and is merely a division by logical function; other division manners are possible in actual implementation.
  • FIG. 5 is a schematic structural diagram of a neural network batch normalization optimization apparatus according to an embodiment of the present application.
  • The apparatus includes a starting layer setting module 501, an initial offset value obtaining module 502, a data processing module 503, a determining module 504, and an optimization processing module 505, where:
  • the starting layer setting module 501 is used to set the first network layer in the neural network as the starting layer;
  • the initial offset value obtaining module 502 is configured to sequentially obtain the initial offset values of different network layers backward from the starting layer;
  • the data processing module 503 is configured to call a preset algorithm to separately process the initial offset values of the different network layers, to obtain the equivalent offset values of the different network layers;
  • the determining module 504 is used to determine whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
  • the optimization processing module 505 is configured to set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0 when the target network layer exists, and to use the equivalent offset value of the target network layer as the offset value of the target network layer.
  • The apparatus further includes an activation function processing module, a padding value processing module, and a merge processing module.
  • The merge processing module merges each of the P′ batch normalization layers with the network layer preceding it, where the network layer preceding each of the P′ batch normalization layers is a convolutional layer/fully connected layer, and 1 ≤ P′ ≤ P.
  • The apparatus further includes a target network layer determining module, used to:
  • when N target network layers exist, obtain the smallest equivalent offset value among the equivalent offset values respectively corresponding to the N target network layers, where N is an integer greater than 1, and determine the network layer corresponding to the smallest equivalent offset value as the target network layer;
  • or, when N target network layers exist, determine the last network layer among the N target network layers as the target network layer.
  • In this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range.
  • The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer; this ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is a very small offset value, so that the offset values of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced offset in a neural network with batch normalization operations wastes high-order information bits during fixed-point conversion. This solution can effectively reduce the bit width of the network after fixed-point conversion.
  • An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any neural network batch normalization fixed-point method described in the above method embodiments.
  • An embodiment of the present application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to execute some or all of the steps of any neural network batch normalization fixed-point method described in the above method embodiments.
  • In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways.
  • The apparatus embodiments described above are merely schematic.
  • The division of the units is merely a division by logical function.
  • In actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
  • The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • The functional units in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above integrated unit may be implemented in the form of hardware or of a software program module.
  • If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, and the computer software product is stored in a memory.
  • It includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application.
  • The foregoing memory includes media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • The program may be stored in a computer-readable memory, and the memory may include a flash drive, read-only memory, random access memory, a magnetic disk, an optical disc, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A neural network batch normalization optimization method, including: setting the first network layer in a neural network as the starting layer; sequentially obtaining the initial offset values of different network layers backward from the starting layer; calculating the equivalent offset values of the different network layers; determining whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold; and, if the target network layer exists, setting the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and using the equivalent offset value of the target network layer as the offset value of the target network layer. The method can effectively reduce the bit width of the network after fixed-point conversion, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.

Description

Neural network batch normalization optimization method and apparatus
This application claims priority to Chinese patent application No. 201811590504.8, filed with the Chinese Patent Office on December 25, 2018 and entitled "Neural network batch normalization optimization method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of neural networks, and in particular to a neural network batch normalization optimization method and apparatus.
Background
In recent years, with the development of deep learning, deep neural networks have become a key core technology of artificial intelligence. Batch Normalization (Batch Norm) is a method for accelerating the training of deep networks. However, Batch Norm processing requires a bias to be introduced artificially, and if this artificially introduced bias is not eliminated in time during subsequent network processing, it may be amplified step by step. If the bias is amplified too much, fixed-point implementation of the network becomes difficult. If the output data of every network layer is directly quantized and truncated, a large amount of bit width is wasted on the invalid data offset, while the offset-removed data that actually needs the quantization bits receives only the remaining low-order quantization bits, causing the fixed-point bit width of the network to expand.
Summary
The embodiments of the present application provide a neural network batch normalization optimization method and apparatus, which can reduce the influence of the artificially introduced bias and effectively lower the fixed-point bit width requirements of the network.
A first aspect of the embodiments of the present application provides a neural network batch normalization optimization method, including:
setting the first network layer in a neural network as the starting layer;
sequentially obtaining the initial offset values of different network layers backward from the starting layer;
calling a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers;
determining whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
if the target network layer exists, setting the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and using the equivalent offset value of the target network layer as the offset value of the target network layer.
Preferably, if it is confirmed that the target network layer is not the last layer of the neural network, the layer following the target network layer is set as the starting layer, and the step of calling the preset algorithm to separately process the initial offset values of the different network layers sequentially obtained backward from the starting layer is executed, to obtain the equivalent offset values of the different network layers.
A second aspect of the embodiments of the present application provides a neural network batch normalization optimization apparatus, the apparatus including:
a starting layer setting module, configured to set the first network layer in a neural network as the starting layer;
an initial offset value obtaining module, configured to sequentially obtain the initial offset values of different network layers backward from the starting layer;
a data processing module, configured to call a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers;
a determining module, configured to determine whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
an optimization processing module, configured to, when the target network layer exists, set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and use the equivalent offset value of the target network layer as the offset value of the target network layer.
A third aspect of the embodiments of the present application provides a terminal, including a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to one another, the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the step instructions according to the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
A fifth aspect of the embodiments of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
Implementing the embodiments of the present application provides at least the following beneficial effects:
Through the embodiments of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range. The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer. This ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is very small, so that the offset values of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced offset in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion. With this solution, the bit width of the network after fixed-point conversion can be effectively reduced, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic interaction diagram of a neural network batch normalization optimization apparatus provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a neural network batch normalization optimization method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a neural network batch normalization optimization method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a neural network batch normalization optimization apparatus provided by an embodiment of the present application.
Detailed Description
Referring to FIG. 1, FIG. 1 provides a schematic diagram of the optimization interaction for neural network batch normalization according to an embodiment of the present application. As shown in FIG. 1, the interaction involves a terminal 101 and a preset algorithm database 102, where the terminal 101 sets the first network layer in the neural network as the starting layer, and the terminal 101 calls a preset algorithm from the preset algorithm database 102 to calculate the equivalent bias values of the different network layers starting from the starting layer. The terminal 101 determines whether a target network layer exists, where the ratio between the equivalent bias value corresponding to the layer preceding the target network layer and the equivalent bias value corresponding to the target network layer is not less than a preset threshold. If it exists, the terminal 101 sets the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and uses the equivalent offset value of the target network layer as the offset value of the target network layer; if no target network layer exists, the initial offset value of each network layer is left unprocessed. The terminal 101 then confirms whether the target network layer is the last layer of the neural network; if not, the terminal 101 sets the layer following the target network layer as the starting layer and again calls the preset algorithm from the preset algorithm database 102 to calculate the equivalent bias values of the different network layers starting from that starting layer; if so, the terminal 101 ends the batch normalization optimization of the neural network.
Through this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range. The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer. This ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is very small, so that the offset values of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced offset in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion. With this solution, the bit width of the network after fixed-point conversion can be effectively reduced, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a neural network batch normalization optimization method provided by an embodiment of the present application. As shown in FIG. 2, the method may include steps 201-205, as follows:
201. Set the first network layer in a neural network as the starting layer.
202. Sequentially obtain the initial offset values of different network layers backward from the starting layer.
The initial offset value refers to the bias value of a layer obtained through machine training of the neural network; it serves as the initial offset value before optimization.
203. Call a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers.
Preferably, the equivalent offset value of the first network layer (the starting layer) is the initial offset value of the first network layer. The equivalent offset value of the second network layer is calculated by multiplying the initial offset value of the starting layer by the sum of the weights of the second network layer and then adding the initial offset value of the second network layer. In general, the equivalent offset value of the N-th network layer is the equivalent offset value of the (N-1)-th network layer multiplied by the weight sum of the N-th network layer, plus the initial offset value of the N-th network layer.
204. Determine whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold.
The preset threshold may be 100, 1000, or several thousand. For example, if the preset threshold is 100, the equivalent bias value of the target network layer is 3.5, and the equivalent bias value of the layer preceding the target network layer is 2000, then 2000 divided by 3.5 equals 571.43, which is not less than the preset threshold.
205. If the target network layer exists, set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and use the equivalent offset value of the target network layer as the offset value of the target network layer.
When the target network layer exists, the offset values of the different network layers from the starting layer to the layer preceding the target network layer are all set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer. That is, by optimizing the offset value of the network layer whose equivalent offset value satisfies the condition to that equivalent offset value, and optimizing the offset values of all network layers before it to 0, the biases of the different layers of the final neural network are all small, which solves the problem that the artificially introduced bias in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion.
Further, the method also includes: if it is confirmed that the target network layer is not the last layer of the neural network, setting the layer following the target network layer as the starting layer, and executing the step of calling the preset algorithm to separately process the initial offset values of the different network layers sequentially obtained backward from the starting layer, to obtain the equivalent offset values of the different network layers.
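The following Python sketch illustrates this flow (steps 201-205 plus the iteration over subsequent starting layers). It is a minimal illustration under simplifying assumptions, not the patented implementation: each layer is reduced to a scalar weight sum and a scalar offset, and the names `Layer`, `equivalent_biases`, and `optimize_biases` are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    weight_sum: float  # sum of the layer's weights (scalar stand-in)
    bias: float        # initial (trained) offset value

def equivalent_biases(layers, start):
    """Equivalent offsets from the starting layer onward:
    bias_eq[start] = bias[start];
    bias_eq[n] = bias_eq[n-1] * weight_sum[n] + bias[n]."""
    eq = {start: layers[start].bias}
    for n in range(start + 1, len(layers)):
        eq[n] = eq[n - 1] * layers[n].weight_sum + layers[n].bias
    return eq

def optimize_biases(layers, threshold=100.0):
    start = 0  # step 201: the first network layer is the starting layer
    while start < len(layers):
        eq = equivalent_biases(layers, start)    # steps 202-203
        target = None
        for n in range(start + 1, len(layers)):  # step 204
            # take the first qualifying layer; the patent also describes
            # picking the smallest-offset or the last qualifying layer
            if eq[n] != 0 and abs(eq[n - 1] / eq[n]) >= threshold:
                target = n
                break
        if target is None:
            break  # no target network layer: leave offsets unchanged
        for n in range(start, target):           # step 205
            layers[n].bias = 0.0
        layers[target].bias = eq[target]
        if target == len(layers) - 1:
            break  # target layer is the last layer: done
        start = target + 1  # next cycle starts after the target layer
```

For instance, with `threshold=100.0`, a layer whose predecessor has equivalent offset 2000 while its own is 3.5 is selected as the target, since 2000 / 3.5 ≈ 571.43 ≥ 100.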
Through this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range. The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer. This ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is very small, so that the offset values of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced offset in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion. With this solution, the bit width of the network after fixed-point conversion can be effectively reduced, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
On the basis of this embodiment, when N target network layers exist for the same starting layer, in order to determine one suitable equivalent bias value, that is, one target network layer:
the smallest equivalent bias value among the equivalent bias values respectively corresponding to the N target network layers may be obtained, and the network layer corresponding to the smallest equivalent bias value is determined as the target network layer to optimize.
By determining the network layer corresponding to the smallest equivalent bias value as the target network layer, the offset values of the network layers before the target network layer become 0, so that the offset values of the different layers of the final neural network are small, which solves the problem that the artificially introduced bias in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion.
Alternatively, when N target network layers exist, the last network layer among the N target network layers is determined as the target network layer.
Compared with selecting an earlier network layer as the target network layer, determining the network layer corresponding to a later equivalent bias value as the target network layer reduces computation: when an earlier layer is selected, the next cycle of obtaining a target network layer must take the layer following the current target network layer as the starting layer, sequentially obtain the equivalent offset values of that layer and all layers after it, and obtain the ratios of the equivalent offset values of each pair of adjacent layers among them. Therefore, selecting the network layer corresponding to the later equivalent bias value as the target network layer reduces, to a certain extent, the amount of computation of the optimization method when selecting the target network layer, and improves the efficiency of the optimization processing.
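The two selection strategies just described can be sketched with a hypothetical helper (the names `select_target`, `eq`, and `candidates` are assumptions of this sketch; `eq` maps layer indices to equivalent offset values and `candidates` lists the N qualifying layer indices in network order):

```python
def select_target(eq, candidates, mode="min_bias"):
    """Pick one target layer when N candidates qualify for the same
    starting layer: either the layer with the smallest equivalent offset,
    or the last (rearmost) candidate, which spares later recomputation."""
    if mode == "min_bias":
        return min(candidates, key=lambda n: abs(eq[n]))
    return candidates[-1]  # mode == "last"
```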
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a neural network batch normalization optimization method provided by an embodiment of the present application. When the bias of a network layer is changed, the activation operation of the activation layer must be modified accordingly, that is, the activation functions of the different activation layers must be modified; otherwise, if data below 0 is still set to 0 even though the bias of the network layer has not been added, errors will occur. As shown in FIG. 3, the method may include steps 301-309, as follows:
301. Set the first network layer in a neural network as the starting layer.
302. Sequentially obtain the initial offset values of different network layers backward from the starting layer.
303. Call a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers.
304. Determine whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold.
305. If the target network layer exists, set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and use the equivalent offset value of the target network layer as the offset value of the target network layer.
If no target network layer exists, the initial offset values of all network layers from the starting layer onward are left unchanged, and processing follows the normal batch normalization flow of the neural network.
306. Obtain the initial activation functions of the different activation layers between the starting layer and the target network layer.
307. Process the initial activation functions of the different activation layers between the starting layer and the target network layer separately, to obtain the activation functions of the different activation layers between the starting layer and the target network layer.
This is done by obtaining the equivalent offset values of the different activation layers between the starting layer and the target network layer, and processing the initial activation functions of the different activation layers separately according to their equivalent offset values, to obtain the activation functions of the different activation layers between the starting layer and the target network layer.
The conventional activation operation uses the following activation function:
relu(x) = x, if x ≥ 0; relu(x) = 0, if x < 0
Optimizing the above activation function yields:
relu(x + bias_eq) = relu_new(x) + bias_eq;
relu_new(x) = x, if x ≥ -bias_eq; relu_new(x) = -bias_eq, if x < -bias_eq
where x is the input value of the activation function of a given activation layer before adjustment, bias_eq is the equivalent offset of that activation layer, and relu_new(x) is the modified activation function.
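As a brief sketch (assumed for illustration, not taken from the patent text), the modified activation and the identity it is built from can be written and checked as follows; note that relu_new(x) equals max(x + bias_eq, 0) - bias_eq, i.e. x clipped from below at -bias_eq:

```python
def relu(x):
    return max(x, 0.0)

def relu_new(x, bias_eq):
    # max(x + bias_eq, 0) - bias_eq simplifies to max(x, -bias_eq)
    return max(x, -bias_eq)

# verify the identity relu(x + bias_eq) == relu_new(x, bias_eq) + bias_eq
for x in (-5.0, -1.0, 0.0, 2.0):
    assert relu(x + 3.5) == relu_new(x, 3.5) + 3.5
```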
308. Perform activation processing on the different activation layers between the starting layer and the target network layer according to their respective activation functions.
309. If it is confirmed that the target network layer is not the last layer of the neural network, set the layer following the target network layer as the starting layer, and execute the step of calling the preset algorithm to separately process the initial offset values of the different network layers sequentially obtained backward from the starting layer, to obtain the equivalent offset values of the different network layers.
Through this embodiment of the present application, the initial offset value of each network layer is processed, the target network layer is obtained, the offset value of each network layer before the target network layer is set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer; correspondingly, when the offset values are modified, the activation functions of the activation layers are modified accordingly to ensure that the activation operations proceed correctly.
Further, on the basis of the above embodiments, the method also includes:
confirming that an A-th network layer exists between the starting layer and the target network layer, the A-th network layer having an initial padding value, where A is an integer and A is not less than 1;
processing the initial padding value of the A-th network layer to obtain the padding value of the A-th network layer.
This is done by obtaining the equivalent offset value of the A-th network layer and processing the initial padding value of the A-th network layer according to the equivalent offset value of the A-th network layer,
where padding_new(y) = y - bias_eq, and y is the initial padding value of a network layer before optimization; this algorithm is used to adjust accordingly the padding values of the layers in the neural network that have initial padding values.
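A one-line sketch of this padding adjustment (the function signature is an assumption of this sketch):

```python
def padding_new(y, bias_eq):
    """Adjusted padding for a layer whose inputs no longer carry bias_eq:
    an initial padding value y becomes y - bias_eq, so that zero padding
    (y = 0) turns into -bias_eq."""
    return y - bias_eq
```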
Through this embodiment of the present application, after the offset values of the network layers are optimized, the padding values of the different network layers are also adjusted accordingly by a specific algorithm, so that after the optimization of the neural network, the other parameters of the neural network remain well-formed.
Further, before the offset values of the network layers are optimized, the method also includes:
obtaining P batch normalization layers in the neural network, where P is a positive integer;
confirming that P′ batch normalization layers exist among the P batch normalization layers, where the network layer preceding each of the P′ batch normalization layers is a convolutional layer/fully connected layer, and 1 ≤ P′ ≤ P;
merging each of the P′ batch normalization layers with its corresponding preceding network layer.
That is, before the offset value optimization, the P′ batch normalization layers that satisfy the condition are merged with their corresponding preceding network layers, so that the neural network has fewer layers; accordingly, the amount of computation when calculating the equivalent offset values of the network layers and when searching for the target network layer is reduced, and the processing efficiency is improved.
As a specific embodiment of this solution, the following operations are performed on a Batch Norm layer in the neural network.
The k-th layer is a Batch Norm layer and includes multiple output channels. The i-th output of the k-th Batch Norm layer, denoted data_bn_k[i,:,:], can be expressed as:
data_bn_k[i,:,:] = gamma_k[i] * (data_k[i,:,:] - data_mean_k[i]) / sqrt(data_var_k[i] + ε) + beta_k[i]    (1)
data_mean_k[i] = mean(data_k[i,:,:])
data_var_k[i] = var(data_k[i,:,:])
where data_k[i,:,:] is the input data of the k-th layer; gamma_k[i] and beta_k[i] are the trained parameters of the i-th channel of the k-th network layer; data_var_k[i] is the trained variance of the i-th channel of the k-th network layer; data_mean_k[i] is the trained mean of the i-th channel of the k-th network layer; and ε is a constant.
Expanding and combining equation (1), the expression for the i-th output of the k-th network layer simplifies to:
data_bn_k[i,:,:] = (gamma_k[i] / sqrt(data_var_k[i] + ε)) * data_k[i,:,:] + beta_k[i] - gamma_k[i] * data_mean_k[i] / sqrt(data_var_k[i] + ε)
data_bn_k[i,:,:] = weight_k[i] * data_k[i,:,:] + bias_k[i]    (2)
with the definitions:
weight_k[i] = gamma_k[i] / sqrt(data_var_k[i] + ε)
bias_k[i] = beta_k[i] - gamma_k[i] * data_mean_k[i] / sqrt(data_var_k[i] + ε)
where weight_k[i] denotes the weight of the i-th channel of the k-th network layer, and bias_k[i] denotes the pre-merge offset value of the i-th channel of the k-th network layer.
To reduce the amount of computation, equation (2) can usually be processed when the neural network is deployed, merging it into the (k-1)-th network layer, where the (k-1)-th network layer is a convolutional layer or a fully connected layer, namely:
data_bn_k[i,:,:] = weight_new_k-1[i] * data_k-1[i,:,:] + bias_new_k-1[i]    (3)
where weight_new_k-1[i] = weight_k[i] * weight_k-1[i];
bias_new_k-1[i] = weight_k[i] * bias_k-1[i] + bias_k[i];
bias_new_k-1[i] denotes the initial offset value of the i-th channel of the (k-1)-th network layer after merging according to the above formulas, and weight_new_k-1[i] denotes the transformed weight of the i-th channel of the (k-1)-th network layer. That is, the output of the k-th layer can be represented by the (k-1)-th layer, with the (k-1)-th layer using weight_new_k-1 and bias_new_k-1 as its coefficients and offsets.
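The merge described by equations (2) and (3) can be sketched per channel as below. This is a simplified illustration under assumptions, not the patent's implementation: `fold_batch_norm` is an invented name, and the 2-D `weight_prev` matrix stands in for the per-channel kernels of the convolutional or fully connected layer.

```python
import numpy as np

def fold_batch_norm(weight_prev, bias_prev, gamma, beta, mean, var, eps=1e-5):
    """Fold the Batch Norm parameters of layer k into conv/FC layer k-1.
    weight_prev: (out_channels, in_features) weight of layer k-1;
    bias_prev, gamma, beta, mean, var: per-output-channel vectors."""
    weight_k = gamma / np.sqrt(var + eps)              # weight_k[i], eq. (2)
    bias_k = beta - gamma * mean / np.sqrt(var + eps)  # bias_k[i],  eq. (2)
    weight_new = weight_k[:, None] * weight_prev       # weight_new_{k-1}[i], eq. (3)
    bias_new = weight_k * bias_prev + bias_k           # bias_new_{k-1}[i],  eq. (3)
    return weight_new, bias_new
```

After folding, the merged layer alone reproduces the Batch Norm layer's output, so the network has one layer fewer to traverse when the equivalent offsets are computed.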
Merging the batch normalization layer with its corresponding preceding convolutional or fully connected layer reduces the number of layers of the neural network, so the corresponding amount of computation when calculating the equivalent offset value of each network layer is reduced, improving processing efficiency.
The equivalent offset values of the merged network layers are then calculated as follows.
When the k-th network layer is a convolutional or fully connected layer, the k-th network layer includes multiple channels, and the equivalent bias is propagated and accumulated backward using the following formulas:
bias_eq_k[i] = bias_new_k[i] + sum(weight_new_k-1[i]) * bias_eq_k-1;    (4)
bias_eq_k+1[i] = bias_new_k+1[i] + sum(weight_new_k[i]) * bias_eq_k;    (5)
where bias_eq_k[i] is the equivalent offset value of the i-th channel of the k-th network layer.
After the equivalent offset value of each layer is calculated, if the ratio between the equivalent offset value corresponding to the layer preceding a certain network layer and the equivalent offset value corresponding to that network layer is not less than the preset threshold, then all offset values from the starting layer to the layer preceding that network layer are set to 0, and the equivalent offset value of that layer is used as the offset value of that layer. With the offset values of all layers before that layer set to 0, the fixed-point data before that layer mainly encodes the informative, highly varying valid data, which can greatly reduce the bit width of the network.
With the above processing, when the offset values are modified, the activation operations must also be modified correspondingly, since the network generally contains nonlinear activation operations; otherwise, if data below 0 is still set to 0 even though the bias of the network has not been added, errors will occur.
The conventional activation operation is performed as follows:
relu(x) = x, if x ≥ 0; relu(x) = 0, if x < 0
Modifying the above activation operation yields:
relu(x + bias_eq) = relu_new(x) + bias_eq;
relu_new(x) = x, if x ≥ -bias_eq; relu_new(x) = -bias_eq, if x < -bias_eq
where x is the input value of the activation operation of a given activation layer before adjustment, bias_eq is the equivalent offset of that activation layer, and relu_new(x) is the modified activation function.
In addition, when padding exists in the network, the padding value must be modified accordingly. The modification is similar to the activation operation: if the padding value is 0, it must be changed to -bias_eq; in general, if the padding value is v, the modified padding value is v - bias_eq.
By adjusting the activation operations of the different network layers and the padding values of the layers that have padding values through these specific algorithms, the optimization adjustments made to the neural network keep the other parameters of the neural network well-formed.
Consistent with the above embodiments, referring to FIG. 4, FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present application. As shown in the figure, the terminal includes a processor, an input device, an output device, and a memory, which are connected to one another; the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions; the program includes instructions for executing the following steps:
setting the first network layer in a neural network as the starting layer;
sequentially obtaining the initial offset values of different network layers backward from the starting layer;
calling a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers;
determining whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
if the target network layer exists, setting the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and using the equivalent offset value of the target network layer as the offset value of the target network layer;
if it is confirmed that the target network layer is not the last layer of the neural network, setting the layer following the target network layer as the starting layer, and executing the step of calling the preset algorithm to separately process the initial offset values of the different network layers sequentially obtained backward from the starting layer, to obtain the equivalent offset values of the different network layers.
Through this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range. The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer. This ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is very small, so that the offset values of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced offset in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion. With this solution, the bit width of the network after fixed-point conversion can be effectively reduced, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
The above has described the solutions of the embodiments of the present application mainly from the perspective of the method-side execution process. It can be understood that, to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods for each specific application to implement the described functions, but such implementations should not be considered beyond the scope of the present application.
The embodiments of the present application may divide the terminal into functional units according to the above method examples; for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and is merely a division by logical function; other division manners are possible in actual implementation.
Consistent with the above, referring to FIG. 5, FIG. 5 is a schematic structural diagram of a neural network batch normalization optimization apparatus provided by an embodiment of the present application. The apparatus includes a starting layer setting module 501, an initial offset value obtaining module 502, a data processing module 503, a determining module 504, and an optimization processing module 505, where:
the starting layer setting module 501 is configured to set the first network layer in a neural network as the starting layer;
the initial offset value obtaining module 502 is configured to sequentially obtain the initial offset values of different network layers backward from the starting layer;
the data processing module 503 is configured to call a preset algorithm to process the initial offset values of the different network layers separately, to obtain the equivalent offset values of the different network layers;
the determining module 504 is configured to determine whether a target network layer exists, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
the optimization processing module 505 is configured to, when the target network layer exists, set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and use the equivalent offset value of the target network layer as the offset value of the target network layer.
Further, the apparatus also includes an activation function processing module, configured to:
obtain the initial activation functions of the different activation layers between the starting layer and the target network layer, obtain the equivalent offset values of the different activation layers between the starting layer and the target network layer, and process the initial activation functions of the different activation layers separately according to their equivalent offset values, to obtain the activation functions of the different activation layers between the starting layer and the target network layer; and perform activation processing on the different activation layers between the starting layer and the target network layer according to their respective activation functions.
Further, the apparatus also includes a padding value processing module, configured to:
confirm that an A-th network layer exists between the starting layer and the target network layer, the A-th network layer having an initial padding value, where A is an integer and A is not less than 1, and process the initial padding value of the A-th network layer to obtain the padding value of the A-th network layer.
Further, the apparatus also includes a merge processing module, configured to:
obtain P batch normalization layers in the neural network, where P is a positive integer, and, upon confirming that P′ batch normalization layers exist among the P batch normalization layers, merge each of the P′ batch normalization layers with the network layer preceding it, where the network layer preceding each of the P′ batch normalization layers is a convolutional layer/fully connected layer, and 1 ≤ P′ ≤ P.
Further, the apparatus also includes a target network layer determining module, configured to:
when N target network layers exist, obtain the smallest equivalent offset value among the equivalent offset values respectively corresponding to the N target network layers, where N is an integer greater than 1, and determine the network layer corresponding to the smallest equivalent offset value as the target network layer;
or, when N target network layers exist, determine the last network layer among the N target network layers as the target network layer.
Through this embodiment of the present application, the initial offset value of each network layer is processed to obtain the equivalent offset value of each network layer, and a target network layer is obtained, where the ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold, which establishes that the equivalent offset value of the target network layer is a comparatively small equivalent offset value within a certain range. The offset value of each network layer before the target network layer is then set to 0, and the equivalent offset value of the target network layer is used as the offset value of the target network layer. This ensures that the offset values of the non-target network layers are 0 and the offset value of the target network layer is very small, so that the offset values of the different network layers of the final neural network are all small, which solves the problem that the artificially introduced offset in a neural network with batch normalization operations is too large and wastes high-order information bits during fixed-point conversion. With this solution, the bit width of the network after fixed-point conversion can be effectively reduced, which facilitates deployment of the network on various existing bit-width-limited AI hardware platforms.
An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any neural network batch normalization fixed-point method described in the above method embodiments.
An embodiment of the present application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to execute some or all of the steps of any neural network batch normalization fixed-point method described in the above method embodiments.
It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
The descriptions of the embodiments each have their own emphasis; for parts not detailed in one embodiment, refer to the relevant descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely schematic; for instance, the division of the units is merely a division by logical function, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
A person of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware; the program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.
The embodiments of the present application have been introduced in detail above; specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope based on the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

  1. A neural network batch normalization optimization method, comprising:
    setting the first network layer in a neural network as a starting layer;
    sequentially obtaining initial offset values of different network layers backward from the starting layer;
    calling a preset algorithm to separately process the initial offset values of the different network layers sequentially obtained backward from the starting layer, to obtain equivalent offset values of the different network layers;
    determining whether a target network layer exists, wherein a ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
    if the target network layer exists, setting the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and using the equivalent offset value of the target network layer as the offset value of the target network layer.
  2. The method according to claim 1, further comprising:
    obtaining initial activation functions of different activation layers between the starting layer and the target network layer;
    obtaining equivalent offset values of the different activation layers between the starting layer and the target network layer;
    processing the initial activation functions of the different activation layers separately according to the equivalent offset values of the different activation layers, to obtain activation functions of the different activation layers between the starting layer and the target network layer;
    performing activation processing on the different activation layers between the starting layer and the target network layer according to their respective activation functions.
  3. The method according to claim 2, wherein after the performing activation processing on the different activation layers between the starting layer and the target network layer according to the activation functions of the different activation layers, the method comprises:
    confirming that an A-th network layer exists between the starting layer and the target network layer, the A-th network layer having an initial padding value, wherein A is an integer and A is not less than 1;
    obtaining an equivalent offset value of the A-th network layer;
    processing the initial padding value of the A-th network layer according to the equivalent offset value of the A-th network layer, to obtain a padding value of the A-th network layer.
  4. The method according to any one of claims 1 to 3, wherein before the setting the first network layer in the neural network as the starting layer, the method further comprises:
    obtaining P batch normalization layers in the neural network, wherein P is a positive integer;
    confirming that P′ batch normalization layers exist among the P batch normalization layers, wherein the network layer preceding each of the P′ batch normalization layers is a convolutional layer/fully connected layer, and 1 ≤ P′ ≤ P;
    merging each of the P′ batch normalization layers with its corresponding preceding network layer.
  5. The method according to claim 4, comprising:
    when N target network layers exist, obtaining the smallest equivalent offset value among the equivalent offset values respectively corresponding to the N target network layers, wherein N is an integer greater than 1;
    determining the network layer corresponding to the smallest equivalent offset value as the target network layer;
    or, when N target network layers exist, determining the last network layer among the N target network layers as the target network layer.
  6. A neural network batch normalization optimization apparatus, wherein the apparatus comprises:
    a starting layer setting module, configured to set the first network layer in a neural network as a starting layer;
    an initial offset value obtaining module, configured to sequentially obtain initial offset values of different network layers backward from the starting layer;
    a data processing module, configured to call a preset algorithm to separately process the initial offset values of the different network layers, to obtain equivalent offset values of the different network layers;
    a determining module, configured to determine whether a target network layer exists, wherein a ratio between the equivalent offset value corresponding to the layer preceding the target network layer and the equivalent offset value corresponding to the target network layer is not less than a preset threshold;
    an optimization processing module, configured to, when the target network layer exists, set the offset values of the different network layers from the starting layer to the layer preceding the target network layer to 0, and use the equivalent offset value of the target network layer as the offset value of the target network layer.
  7. The apparatus according to claim 6, further comprising an activation function processing module, configured to:
    obtain initial activation functions of different activation layers between the starting layer and the target network layer, obtain equivalent offset values of the different activation layers between the starting layer and the target network layer, and process the initial activation functions of the different activation layers separately according to the equivalent offset values of the different activation layers, to obtain activation functions of the different activation layers between the starting layer and the target network layer; and perform activation processing on the different activation layers between the starting layer and the target network layer according to their respective activation functions.
  8. The apparatus according to claim 7, further comprising a padding value processing module, configured to:
    confirm that an A-th network layer exists between the starting layer and the target network layer, the A-th network layer having an initial padding value, wherein A is an integer and A is not less than 1; obtain an equivalent offset value of the A-th network layer; and then process the initial padding value of the A-th network layer according to the equivalent offset value of the A-th network layer, to obtain a padding value of the A-th network layer.
  9. The apparatus according to claim 8, further comprising a merge processing module, configured to:
    obtain P batch normalization layers in the neural network, wherein P is a positive integer, and, upon confirming that P′ batch normalization layers exist among the P batch normalization layers, merge each of the P′ batch normalization layers with the network layer preceding each batch normalization layer, wherein the network layer preceding each of the P′ batch normalization layers is a convolutional layer/fully connected layer, and 1 ≤ P′ ≤ P.
  10. The apparatus according to claim 9, further comprising a target network layer determining module, configured to:
    when N target network layers exist, obtain the smallest equivalent offset value among the equivalent offset values respectively corresponding to the N target network layers, wherein N is an integer greater than 1, and determine the network layer corresponding to the smallest equivalent offset value as the target network layer;
    or, when N target network layers exist, determine the last network layer among the N target network layers as the target network layer.
PCT/CN2019/121522 2018-12-25 2019-11-28 Neural network batch normalization optimization method and apparatus WO2020134829A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/257,314 US11270208B2 (en) 2018-12-25 2019-11-28 Neural network batch normalization optimization method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811590504.8A 2018-12-25 2019-11-28 Neural network batch normalization optimization method and apparatus
CN201811590504.8 2018-12-25

Publications (1)

Publication Number Publication Date
WO2020134829A1 (zh)

Family

ID=66557498

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121522 2018-12-25 2019-11-28 Neural network batch normalization optimization method and apparatus (WO2020134829A1)

Country Status (3)

Country Link
US (1) US11270208B2 (zh)
CN (1) CN109800859B (zh)
WO (1) WO2020134829A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800859B (zh) 2018-12-25 2021-01-12 深圳云天励飞技术有限公司 Neural network batch normalization optimization method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358070A1 (en) * 2015-06-04 2016-12-08 Samsung Electronics Co., Ltd. Automatic tuning of artificial neural networks
CN107451653A (zh) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computation method and apparatus for deep neural networks, and readable storage medium
CN108416423A (zh) * 2017-02-10 2018-08-17 三星电子株式会社 Automatic thresholds for neural network pruning and retraining
CN108734268A (zh) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and apparatus for deep neural network models, terminal, and storage medium
CN109800859A (zh) * 2018-12-25 2019-05-24 深圳云天励飞技术有限公司 Neural network batch normalization optimization method and apparatus

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2763398B2 (ja) * 1990-11-20 1998-06-11 キヤノン株式会社 Pattern recognition apparatus
CN104199536B (zh) * 2014-07-23 2017-06-06 西安空间无线电技术研究所 FPGA dynamic power consumption estimation method based on a BP neural network
US9633282B2 (en) * 2015-07-30 2017-04-25 Xerox Corporation Cross-trained convolutional neural networks using multimodal images
EP3564866A4 (en) * 2016-12-28 2020-03-25 Shanghai Cambricon Information Technology Co., Ltd CALCULATION PROCEDURE
CN108345939B (zh) * 2017-01-25 2022-05-24 微软技术许可有限责任公司 Neural network based on fixed-point operations
KR102415508B1 (ko) * 2017-03-28 2022-07-01 삼성전자주식회사 Convolutional neural network processing method and apparatus
JP6933367B2 (ja) * 2017-09-20 2021-09-08 Tokyo Artisan Intelligence株式会社 Neural network circuit device, system, processing method, and execution program
CN108875504B (zh) * 2017-11-10 2021-07-23 北京旷视科技有限公司 Neural-network-based image detection method and image detection apparatus
US11354888B2 (en) * 2018-11-16 2022-06-07 GM Global Technology Operations LLC Method and apparatus for a neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160358070A1 (en) * 2015-06-04 2016-12-08 Samsung Electronics Co., Ltd. Automatic tuning of artificial neural networks
CN108416423A (zh) * 2017-02-10 2018-08-17 三星电子株式会社 Automatic thresholds for neural network pruning and retraining
CN108734268A (zh) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and apparatus for deep neural network models, terminal, and storage medium
CN107451653A (zh) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computation method and apparatus for deep neural networks, and readable storage medium
CN109800859A (zh) * 2018-12-25 2019-05-24 深圳云天励飞技术有限公司 Neural network batch normalization optimization method and apparatus

Also Published As

Publication number Publication date
CN109800859B (zh) 2021-01-12
CN109800859A (zh) 2019-05-24
US20210182685A1 (en) 2021-06-17
US11270208B2 (en) 2022-03-08

Similar Documents

Publication Publication Date Title
TWI785227B (zh) Method for pruning batch normalization layers in a deep neural network
TWI794157B (zh) Automatic multi-threshold feature filtering method and apparatus
WO2020134828A1 (zh) Neural network inference structure optimization method and apparatus
US7782979B2 (en) Base-band digital pre-distortion-based method for improving efficiency of RF power amplifier
WO2015089148A2 (en) Reducing dynamic range of low-rank decomposition matrices
US20210342696A1 (en) Deep Learning Model Training Method and System
CN109583586B (zh) Convolution kernel processing method and apparatus for speech recognition or image recognition
CN110503194B (zh) Distributed parallel training method and system
WO2022148071A1 (zh) Image feature extraction method, apparatus, device, and storage medium
US11853594B2 (en) Neural network computing chip and computing method
CN111414641A (zh) Sampling-based personalized differential privacy protection method and system
CN104112445A (zh) Terminal and speech recognition method
CN111758104A (zh) Neural network parameter optimization method suitable for hardware implementation, and neural network computing method and apparatus
CN113436105A (zh) Model training and image optimization method and apparatus, electronic device, and storage medium
WO2020134829A1 (zh) Neural network batch normalization optimization method and apparatus
JPH03278166A (ja) Method for adjusting network parameters of a multilayer perceptron device, and perceptron device having means for implementing the method
WO2021056941A1 (zh) Method, apparatus, and device for dynamically adjusting neural network channels
CN110458287B (zh) Parameter update method and apparatus for a neural network optimizer, terminal, and storage medium
WO2020177378A1 (zh) Feature extraction method and apparatus for text information, computer device, and storage medium
CN110874635A (zh) Deep neural network model compression method and apparatus
CN113077056A (zh) Data processing system based on horizontal federated learning
CN109102468A (zh) Image enhancement method and apparatus, terminal device, and storage medium
US20230087774A1 (en) Parameter optimization method, electronic device, and storage medium
US11170297B2 (en) Method of neural network training using floating-point signed digit representation
CN115953651A (zh) Cross-domain-device-based model training method, apparatus, device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19903660

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19903660

Country of ref document: EP

Kind code of ref document: A1