CN111860814A - Device and method for executing batch normalization operation


Info

Publication number
CN111860814A
CN111860814A
Authority
CN
China
Prior art keywords
neuron
operation module
input neuron
vector
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010617696.8A
Other languages
Chinese (zh)
Other versions
CN111860814B (en)
Inventor
刘少礼
于涌
陈云霁
陈天石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN202010617696.8A
Publication of CN111860814A
Application granted
Publication of CN111860814B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses an apparatus for performing a batch normalization operation, comprising an operation module. The apparatus can be used to implement the batch normalization operation in a multilayer artificial neural network. In the forward process, the input minus the mean is divided by the square root of the sum of the variance and an infinitesimal constant eps; the result is then multiplied by a learning parameter alpha and added to a learning parameter beta to obtain the layer output. In the reverse training process, the mean of the gradient vector is subtracted from the input gradient vector, the mean of the product of the gradient vector and the forward output, multiplied by that output, is subtracted as well, and the resulting difference is divided by the square root of the sum of the forward variance and the infinitesimal constant to obtain the output gradient vector of the layer. The apparatus effectively improves support for the forward and backward batch normalization operations in artificial neural networks.

Description

Device and method for executing batch normalization operation
Technical Field
The present disclosure relates to artificial neural network technology, and in particular to an apparatus and method for performing batch normalization forward and backward operations in an artificial neural network.
Background
In recent years, multilayer artificial neural networks have received increasingly broad attention from academia and industry owing to their high recognition accuracy and good parallelism. Because the batch normalization operation can accelerate neural network training and improve recognition accuracy, it is applied more and more widely in multilayer neural networks.
One known method of supporting batch normalization operations is to use a general-purpose processor, executing general instructions through a general register file and general functional units. One disadvantage of this method is that the operation performance of a single general-purpose processor is low and cannot meet the performance requirements of common multilayer artificial neural network operations. When multiple general-purpose processors execute in parallel, communication between them becomes a performance bottleneck. In addition, a general-purpose processor must decode the multilayer artificial neural network forward operation into a long sequence of operation and memory-access instructions, and the processor's front-end decoding brings a large power consumption overhead.
Another known method to support batch normalization is to use a graphics processing unit (GPU), which supports the algorithm by executing general-purpose SIMD instructions through a general register file and general stream processing units. Because the GPU is a device specialized for graphics operations and scientific computing, it has no dedicated support for multilayer artificial neural network batch normalization operations, so a large amount of front-end decoding work is still required, bringing substantial additional overhead. In addition, the GPU has only a small on-chip cache; the model data of the multilayer artificial neural network batch normalization must be transferred from off-chip repeatedly, so off-chip bandwidth becomes the main performance bottleneck. Moreover, batch normalization involves a large number of normalization operations such as summation, for which the parallel architecture of the GPU is ill suited.
Disclosure of Invention
One aspect of the present disclosure provides an apparatus for performing an artificial neural network batch normalization operation, including an instruction storage unit, a controller unit, a data access unit, and an operation module, wherein: the instruction storage unit reads in instructions through the data access unit and caches them; the controller unit reads instructions from the instruction storage unit and decodes them into microinstructions that control the operation module; the data access unit writes data from an external address space into the corresponding data cache unit of the operation module, or reads data from the data cache unit back to the external address space; and the operation module performs the actual computation on the data.
Another aspect of the disclosure provides a method of performing a batch normalization forward operation using the apparatus described above. During use, let x be each input neuron element and y be an output element. The learning parameters alpha and beta, the infinitesimal constant eps, the mean E[x], and the variance var[x] are constants obtained during training, and the apparatus completes in parallel the batch normalization forward computation y = f(x) = alpha * (x - E[x]) / sqrt(var(x) + eps) + beta to obtain the output neurons. During training, the forward operation must compute the mean E[x] and variance var[x] dynamically; the operation module of the apparatus performs the accumulation (summation) and other normalization operations involved in computing the mean and variance, thereby calculating the mean and variance of each iteration of the training process.
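For concreteness, the forward computation with trained constants can be written as a minimal NumPy sketch. This illustrates only the arithmetic of the formula above; the function name, argument order, and the eps default are assumptions, not the device's interface.

```python
import numpy as np

def batchnorm_forward(x, alpha, beta, mean, var, eps=1e-5):
    """Sketch of y = f(x) = alpha * (x - E[x]) / sqrt(var(x) + eps) + beta.

    alpha, beta, mean (E[x]) and var (var[x]) are the constants obtained
    during training; eps is the infinitesimal constant.
    """
    return alpha * (x - mean) / np.sqrt(var + eps) + beta
```

In the device itself, this elementwise computation is what the operation module performs in parallel over the output neurons.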
Another aspect of the disclosure provides a method of performing a batch normalization backward operation using the apparatus described above. Assuming the gradient passed to one pixel is dl/dY and the forward output is Y, the gradient propagated backward through batch normalization is dl/dx = (alpha / sqrt(var(x) + eps)) * (dl/dY - mean(dl/dY) - mean(dl/dY * Y) * Y). The gradient of the learning parameter alpha is dl/dalpha = Σ(dl/dY * Y), and the gradient of the learning parameter beta is dl/dbeta = Σ(dl/dY). The backward process of batch normalization completes the normalization operations on the neurons, such as taking the mean and variance, in parallel through the operation unit.
The present disclosure may be applied in the following (but not limited to these) scenarios: data processing devices, robots, computers, printers, scanners, telephones, tablet computers, intelligent terminals, mobile phones, vehicle data recorders, navigators, sensors, cameras, cloud servers, video cameras, projectors, watches, earphones, mobile storage, wearable devices, and other electronic products; various vehicles such as airplanes, ships, and cars; various household appliances such as televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; and various medical devices including nuclear magnetic resonance apparatuses, B-mode ultrasound apparatuses, and electrocardiographs. By adopting the device and instruction set for executing batch normalization operations, the present disclosure solves the problems of insufficient CPU and GPU operation performance and high front-end decoding overhead, effectively improving support for the forward and backward batch normalization operations.
By adopting a dedicated on-chip cache for the batch normalization operation, the present disclosure fully exploits the reusability of input neurons and intermediate data, avoids repeatedly reading data from memory, and reduces memory access bandwidth, so that memory bandwidth does not become a bottleneck for the forward operation performance of a multilayer artificial neural network.
By employing dedicated operation units for the batch normalization operation, the present disclosure better balances parallel and serial execution. It avoids the drawbacks of a CPU architecture, which operates only serially and is slow at large data scales, and of a GPU architecture, which operates only in parallel and handles normalization operations poorly. The data storage unit and the operation unit cooperate to balance the serial and parallel parts of the normalization well.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
fig. 1 shows an example block diagram of the overall structure of an apparatus for performing a batch normalization operation according to an embodiment of the present disclosure.
FIG. 2 illustrates an example block diagram of an operational block configuration in an apparatus for performing a batch normalization operation according to an embodiment of this disclosure.
FIG. 3 illustrates an example block diagram of a batch normalization operation process in accordance with an embodiment of this disclosure.
FIG. 4 illustrates a flow diagram of a batch normalization operation according to an embodiment of the disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
The batch normalization operation includes two parts: forward and backward. Both the forward and backward directions must be applied during training of the artificial neural network, while only the forward process is executed when the trained network is used. The parameters obtained during training are reused during use, so data such as the mean and variance in the batch normalization operation need not be recalculated.
Fig. 1 shows an overall block diagram of an apparatus for performing an artificial neural network batch normalization operation according to the present disclosure. As shown in fig. 1, the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, and an operation module 4. The instruction storage unit 1, the controller unit 2, the data access unit 3, and the operation module 4 may each be implemented by hardware circuits (including but not limited to an FPGA, a CGRA, an application-specific integrated circuit (ASIC), analog circuits, memristors, and the like).
The instruction storage unit 1 reads in instructions through the data access unit 3 and buffers the read instructions. The instruction storage unit may be implemented by various different memory devices (SRAM, eDRAM, DRAM, memristor, 3D-DRAM, or nonvolatile storage, etc.).
The controller unit 2 reads instructions from the instruction storage unit 1, decodes them into microinstructions that control the behavior of other units or modules, such as the data access unit 3 and the operation module 4, and then distributes the microinstructions to the respective units or modules.
The data access unit 3 can access an external address space, and directly read and write data to each cache unit in the device to complete loading and storing of the data.
Fig. 2 shows an example block diagram of the structure of the operation module 4 in the apparatus for performing the artificial neural network batch normalization operation according to an embodiment of the present disclosure. As shown in fig. 2, the operation module 4 includes an operation unit 41, a data dependency determination unit 42, a neuron cache unit 43, and an intermediate value cache unit 44.
The operation unit 41 receives microinstructions issued by the controller unit 2 and performs arithmetic and logic operations.
The data dependency determination unit 42 is responsible for read and write operations on the neuron cache unit during computation. Before performing a read or write, it first ensures that there is no read-write consistency conflict among the data used by the instructions. For example, all microinstructions destined for the data dependency determination unit 42 are stored in an instruction queue inside the unit; in this queue, if the read range of a read instruction conflicts with the write range of a write instruction earlier in the queue, the read instruction must wait until the write instruction it depends on has executed.
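The waiting rule can be sketched in software terms as follows; the address-range records and function names are hypothetical stand-ins for the hardware instruction queue, shown only to make the consistency check concrete.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    kind: str   # "read" or "write"
    start: int  # first cache address touched
    end: int    # one past the last cache address touched

def overlaps(a, b):
    # Two half-open address ranges [start, end) conflict if they intersect.
    return a.start < b.end and b.start < a.end

def read_must_wait(read_op, earlier_ops):
    # A read must wait while any write earlier in the queue overlaps its range.
    return any(op.kind == "write" and overlaps(read_op, op) for op in earlier_ops)
```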
The neuron cache unit 43 caches the input neuron vector data and the output neuron value data of the operation module 4.
The intermediate value cache unit 44 caches the intermediate data required by the operation module 4 during computation, such as the partial sums and partial sums of squares produced along the way. For each operation module 4, the intermediate value cache unit 44 stores the intermediate data of the batch normalization process, for example during the forward batch normalization operation in the use phase of the artificial neural network. Let x be each input neuron datum and y the output neuron datum. The learning parameters alpha and beta are continuously updated during reverse training and are used later in the formula that computes the output neuron data y. The infinitesimal constant eps denotes a very small quantity, typically on the order of 10^-5; in practice it may also be set to 0. The mean E[x] denotes the mean of the input neuron data x taken with the batch size as the total, and var[x] denotes the corresponding variance, also taken with the batch size as the total. In artificial neural network algorithms, input neuron data typically has four dimensions: the batch size (the number of samples), the number of input channels, the input height, and the input width; these four dimensions determine the total number of input data x, and E[x] and var[x] are the mean and variance computed over the other three dimensions with the batch as the total. The operation unit 41 can perform the computation y = f(x) = alpha * (x - E[x]) / sqrt(var(x) + eps) + beta in parallel, where sqrt denotes the square-root operation; the constant data of this process can be stored in the intermediate value cache unit, and the result is returned through the data access unit to obtain the output neurons. In addition, because the device stores data along the channel, height, and width dimensions, after reading the input neuron data x the device can perform the summation, averaging, and variance operations as sequential reads.
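Under the conventional reading of the paragraph above, one mean and one variance per channel, reduced over the batch, height, and width axes, the statistics could be computed as in this NumPy sketch; the (batch, channel, height, width) layout and the function name are assumptions for illustration.

```python
import numpy as np

def batch_statistics(x):
    """E[x] and var[x] for x laid out as (batch, channel, height, width).

    Reduces over every axis except the channel axis, so each channel gets
    its own mean and variance, computed with the batch as the total.
    """
    mean = x.mean(axis=(0, 2, 3))
    var = x.var(axis=(0, 2, 3))
    return mean, var
```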
For the forward batch normalization operation during use of the network, the previously calculated mean and variance E(x) and var(x) can be used directly as the mean and variance of the batch normalization operation, stored and operated on as constants. During training, by contrast, the mean and variance must be calculated from the input data in each forward pass: in each training iteration, the operation unit computes the mean and variance of the input neurons and places this data in the intermediate value cache unit 44 for the subsequent computation of f(x) in that iteration.
The present disclosure also provides an instruction set for executing an artificial neural network batch normalization operation on the aforementioned apparatus. The instruction set comprises a CONFIG instruction, a COMPUTE instruction, an IO instruction, a NOP instruction, a JUMP instruction and a MOVE instruction, wherein:
the CONFIG instruction configures various constants required by calculation of a current layer before the beginning of batch normalization calculation;
the COMPUTE instruction completes the arithmetic logic calculation of the batch normalization process;
the IO instruction reads input data required by calculation from an external address space and stores the data back to the external space after the calculation is finished;
The NOP instruction is responsible for emptying all microinstructions in the microinstruction storage queue in the current device, and all instructions before the NOP instruction are guaranteed to be completely executed. NOP instructions do not contain any operations themselves;
the JUMP instruction is responsible for controlling the JUMP of the next instruction address to be read from the instruction storage unit and is used for realizing the JUMP of a control flow;
the MOVE instruction is responsible for carrying data at one address in the internal address space of the device to another address in the internal address space of the device, and the process is independent of the arithmetic unit and does not occupy the resources of the arithmetic unit in the execution process.
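As an illustration only, a hypothetical dispatch loop over this six-instruction set might look like the sketch below; the device object and its methods are invented stand-ins for the hardware behavior described above, not the patent's interface.

```python
def run(program, device):
    # program is a list of (opcode, args) pairs; device is a hypothetical
    # object exposing the behaviors the instruction descriptions assign.
    pc = 0
    while pc < len(program):
        op, args = program[pc]
        if op == "CONFIG":
            device.configure(**args)      # constants for the current layer
        elif op == "COMPUTE":
            device.batchnorm(**args)      # arithmetic/logic of the BN pass
        elif op == "IO":
            device.transfer(**args)       # load from / store to external space
        elif op == "NOP":
            device.drain()                # wait until earlier microinstructions finish
        elif op == "JUMP":
            pc = args["target"]           # control-flow transfer
            continue
        elif op == "MOVE":
            device.copy_internal(**args)  # internal address-to-address copy
        pc += 1
```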
FIG. 3 illustrates an example block diagram of the artificial neural network batch normalization forward and backward operations according to an embodiment of this disclosure. For the formula out = (in - middle) / middle, in is the input neuron data and out is the output neuron data. middle is an intermediate value of the operation process; the intermediate values are intermediate results, such as the mean and variance, that require normalization operations. The intermediate values [middle1, ..., middleN] of the normalization process are computed in parallel by the operation module 4 and stored in the intermediate value cache unit 44. The operation module 4 then uses these intermediate values to compute, in parallel, the output neuron data out for each input neuron datum, obtaining the final output vector.
FIG. 4 illustrates a flow diagram of the batch normalization forward operation in the training process, according to one embodiment. This flow chart describes the process of implementing the forward operation of the batch normalization operation shown in FIG. 3 using the apparatus and instruction set of the present disclosure.
In step S1, an IO instruction is stored in advance at the head address of the instruction storage unit 1.
In step S2, the operation starts: the controller unit 2 reads the IO instruction from the first address of the instruction storage unit 1, and according to the decoded microinstruction, the data access unit 3 reads all the corresponding batch normalization forward operation instructions from the external address space and caches them in the instruction storage unit 1.
In step S3, the controller unit 2 then reads the next IO instruction from the instruction storage unit, and according to the decoded microinstruction, the data access unit 3 reads all the data required by the operation module 4 (e.g., the input neuron vector, the batch size, the learning parameters alpha and beta, the infinitesimal constant eps, the mean, and the variance) from the external address space into the neuron cache unit 43 of the operation module 4.
In step S4, the controller unit 2 then reads the next CONFIG instruction from the instruction storage unit, and according to the decoded microinstruction, the device configures the constants for the batch normalization operation, for example whether the forward pass uses the previously calculated mean and variance or computes them from the input.
In step S5, the controller unit 2 reads the next COMPUTE instruction from the instruction storage unit, and according to the decoded microinstruction, the operation module 4 reads the input neuron vector from the neuron cache unit, calculates the mean and variance of the input neurons, and stores them in the intermediate value cache unit.
In step S6, according to the microinstruction decoded from the COMPUTE instruction, the operation module 4 subtracts the mean from the data in the neuron cache unit, divides the result by the square root of the sum of the variance and the infinitesimal constant eps, and stores the result back into the intermediate value cache unit.
In step S7, according to the microinstruction decoded from the COMPUTE instruction, the operation module 4 reads the learning parameter alpha from the neuron cache unit 43, multiplies it by the intermediate value, adds the learning parameter beta, and writes the result back to the neuron cache unit.
In step S8, the controller unit then reads the next IO instruction from the instruction storage unit, and according to the decoded microinstruction, the data access unit 3 stores the output neuron vector in the neuron cache unit 43 to the specified address in the external address space, and the operation ends.
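The arithmetic of steps S5 through S7 condenses into a short NumPy sketch; this is a software analogue under the assumption that the batch axis is axis 0, not the device's microcode.

```python
import numpy as np

def batchnorm_forward_training(x, alpha, beta, eps=1e-5):
    mean = x.mean(axis=0)                    # S5: mean of the input neurons
    var = x.var(axis=0)                      # S5: variance of the input neurons
    x_hat = (x - mean) / np.sqrt(var + eps)  # S6: subtract mean, divide by sqrt(var + eps)
    y = alpha * x_hat + beta                 # S7: multiply by alpha, add beta
    return y, mean, var  # mean and var play the role of the intermediate values
```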
The forward process of the batch normalization operation during use of the network differs from the forward process during training in that it uses the constant mean and variance configured in step S4 and need not compute them dynamically each time; that is, step S5 is eliminated. The rest is the same as FIG. 4.
The backward process of the batch normalization operation is similar to the forward process described above; the difference lies in the data operated on. Assuming the gradient passed to one pixel is dl/dY, the gradient propagated backward is dl/dx, the forward output is Y, and the remaining parameters have the same meanings as in the forward process, the gradient propagated backward through batch normalization is dl/dx = (alpha / sqrt(var(x) + eps)) * (dl/dY - mean(dl/dY) - mean(dl/dY * Y) * Y), where mean denotes the averaging operation. The gradient of the learning parameter alpha is dl/dalpha = Σ(dl/dY * Y), and the gradient of the learning parameter beta is dl/dbeta = Σ(dl/dY); the learning parameters are updated using these two gradients. The backward process of batch normalization first normalizes the gradient data through the operation unit, for example taking the mean and variance; the operation units then complete the remaining operations of the formula in parallel.
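Under the same assumptions, the backward formulas translate into the following NumPy sketch; the names and the batch-axis reductions are illustrative.

```python
import numpy as np

def batchnorm_backward(dy, y, var, alpha, eps=1e-5):
    """dy is dl/dY, y is the forward output Y; returns dl/dx, dl/dalpha, dl/dbeta."""
    dx = (alpha / np.sqrt(var + eps)) * (
        dy - dy.mean(axis=0) - (dy * y).mean(axis=0) * y
    )
    dalpha = (dy * y).sum(axis=0)  # dl/dalpha = sum(dl/dY * Y)
    dbeta = dy.sum(axis=0)         # dl/dbeta  = sum(dl/dY)
    return dx, dalpha, dbeta
```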
By adopting the device and instruction set for executing batch normalization operations, the problems of insufficient CPU and GPU operation performance and high front-end decoding overhead are solved, and support for the forward and backward batch normalization operations is effectively improved.
By adopting a dedicated on-chip cache for the batch normalization operation, the reusability of input neurons and intermediate data is fully exploited, repeated reads of data from memory are avoided, memory access bandwidth is reduced, and memory bandwidth is prevented from becoming a bottleneck for the forward operation performance of a multilayer artificial neural network.
By employing dedicated operation units for the batch normalization operation, the relationship between parallel and serial execution is better balanced. The drawbacks of a CPU architecture, which operates only serially and is slow at large data scales, and of a GPU architecture, which operates only in parallel and handles normalization operations poorly, are avoided. The data storage unit and the operation unit cooperate to balance the serial and parallel parts of the normalization well.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (14)

1. A method for performing batch normalization operations during neural network training, the method being applied in a computing device comprising a controller unit and an operation module;
the controller unit reads instructions and decodes them into microinstructions;
the controller unit sends the microinstructions to the operation module;
and the operation module executes the forward operation or the backward operation of the batch normalization according to the microinstructions.
2. The method of claim 1, wherein the operation module includes a neuron cache unit and an intermediate value cache unit;
the operation module executes the forward operation of the batch normalization operation according to the microinstruction, and the forward operation comprises the following steps:
the operation module reads an input neuron vector from the neuron cache unit according to the microinstruction and calculates the mean value of the input neuron vector and the variance of the input neuron vector;
the operation module stores the mean value of the input neuron vector and the variance of the input neuron vector into the intermediate value cache unit;
the operation module reads a learning parameter and an input neuron vector from the neuron cache unit, reads a mean value of the input neuron vector and a variance of the input neuron vector from the intermediate value cache unit, and calculates and obtains an output neuron according to the learning parameter, the input neuron vector, the mean value of the input neuron vector and the variance of the input neuron vector;
and the operation module stores the output neurons into the neuron cache unit.
3. The method of claim 2, wherein the operation module reads the learning parameters and the input neuron vector from the neuron cache unit, reads the mean of the input neuron vector and the variance of the input neuron vector from the intermediate value cache unit, and calculates the output neuron according to the learning parameters, the input neuron vector, the mean of the input neuron vector, and the variance of the input neuron vector, comprising:
the operation module reads an input neuron vector from the neuron cache unit, reads the mean of the input neuron vector and the variance of the input neuron vector from the intermediate value cache unit, and calculates in parallel an intermediate value of the normalization process according to the input neuron vector, the mean of the input neuron vector, and the variance of the input neuron vector;
the operation module stores the intermediate value of the normalization process into an intermediate value cache unit;
the operation module reads the learning parameters from the neuron cache unit, reads the intermediate value of the normalization process from the intermediate value cache unit, and obtains the output neurons according to the learning parameters and the intermediate value.
4. The method of claim 2, wherein the learning parameters comprise learning parameters alpha and beta; and
the operation module calculating the output neuron according to the learning parameters, the input neuron vector, the mean of the input neuron vector, and the variance of the input neuron vector comprises the following steps:
the operation module computes y = alpha * (x - E[x]) / sqrt(var(x) + eps) + beta to obtain the output neuron;
wherein x is the input neuron vector, y is the output neuron, and alpha and beta are learning parameters that are continuously updated during training and are used to calculate new output neuron data; eps is an infinitesimal constant, E[x] is the mean, and var(x) is the variance.
5. The method of claim 2, the operational module further comprising a data dependency determination unit;
the operation module reads an input neuron vector from the neuron cache unit according to the microinstruction, and comprises:
the data dependency relationship judging unit judges whether read-write consistency conflict exists between the microinstructions;
and if the read-write consistency conflict does not exist, the operation module reads an input neuron vector from the neuron cache unit according to the microinstruction.
6. The method of claim 2, the operational module further comprising a data dependency determination unit;
the operation module stores the output neuron into the neuron cache unit, and includes:
the data dependency relationship judging unit judges whether read-write consistency conflict exists between the microinstructions;
and if the read-write consistency conflict does not exist, the operation module stores the output neuron into the neuron cache unit.
7. The method of claim 1, wherein the operation module includes a neuron cache unit and an intermediate value cache unit, characterized in that:
the operation module executes the inverse operation of the batch normalization operation according to the microinstruction, and the inverse operation comprises the following steps:
the operation module reads an input neuron gradient from the neuron cache unit according to the microinstruction and calculates a mean value of the input neuron gradient and a variance of the input neuron gradient;
the operation module stores the mean value of the input neuron gradient and the variance of the input neuron gradient into the intermediate value cache unit;
the operation module reads a learning parameter gradient and an input neuron gradient from the neuron cache unit, reads a mean value of the input neuron gradient and a variance of the input neuron gradient from an intermediate value cache unit, and calculates to obtain an output neuron gradient according to the learning parameter gradient, the input neuron gradient, the mean value of the input neuron gradient and the variance of the input neuron gradient;
and the operation module stores the output neuron gradient into the neuron cache unit.
8. The method of claim 7, wherein the operation module reads a learning parameter gradient and an input neuron gradient from the neuron cache unit, reads the mean of the input neuron gradient and the variance of the input neuron gradient from the intermediate value cache unit, and calculates the output neuron gradient according to the learning parameter gradient, the input neuron gradient, the mean of the input neuron gradient, and the variance of the input neuron gradient, comprising:
dl/dx = (alpha / sqrt(var(x) + eps)) * (dl/dY - mean(dl/dY) - mean(dl/dY * Y) * Y), where the input neuron gradient is dl/dY, the output neuron gradient is dl/dx, mean is the averaging operation, eps is an infinitesimal constant, E[x] is the mean, and alpha and beta are the learning parameters; the learning parameters are continuously updated during training and are used to calculate new output neuron gradients; the gradient of the learning parameter alpha is dl/dalpha = Σ(dl/dY * Y), and the gradient of the learning parameter beta is dl/dbeta = Σ(dl/dY).
9. A method for executing a batch normalization operation during use of a neural network, the method being applied in a computing device comprising a controller unit and an operation module;
the controller unit reads instructions and decodes them into microinstructions;
the controller unit sends the microinstructions to the operation module;
the operation module executes batch normalization operation according to the microinstruction.
10. The method of claim 9, wherein the operation module includes a neuron cache unit and an intermediate value cache unit;
the operation module executes batch normalization operation according to the microinstruction, and comprises the following steps:
the operation module reads an input neuron vector and a learning parameter from the neuron cache unit according to the microinstruction;
the operation module configures a mean constant and a variance constant according to the microinstruction;
the operation module calculates and obtains an output neuron according to the learning parameter, the input neuron vector, the mean constant and the variance constant;
and the operation module stores the output neurons into the neuron cache unit.
11. The method of claim 9, wherein the operation module calculating the output neuron according to the learning parameter, the input neuron vector, the mean constant, and the variance constant comprises:
the operation module reads an input neuron vector from the neuron cache unit, and calculates in parallel an intermediate value of the normalization process according to the input neuron vector, the mean constant, and the variance constant;
the operation module stores the intermediate value of the normalization process into an intermediate value cache unit;
the operation module reads the learning parameters from the neuron cache unit, reads the intermediate value of the normalization process from the intermediate value cache unit, and obtains the output neurons according to the learning parameters and the intermediate value.
12. The method of claim 9, the operational module further comprising a data dependency determination unit;
the operation module reads an input neuron vector from the neuron cache unit according to the microinstruction, and comprises:
the data dependency relationship judging unit judges whether read-write consistency conflict exists between the microinstructions;
and if the read-write consistency conflict does not exist, the operation module reads an input neuron vector from the neuron cache unit according to the microinstruction.
13. The method of claim 9, the operational module further comprising a data dependency determination unit;
The operation module stores the output neuron into the neuron cache unit, and includes:
the data dependency relationship judging unit judges whether read-write consistency conflict exists between the microinstructions;
and if the read-write consistency conflict does not exist, the operation module stores the output neuron into the neuron cache unit.
14. An apparatus for implementing the method of any of claims 1-13, the apparatus comprising at least one of: data processing devices, robots, computers, printers, scanners, telephones, tablet computers, intelligent terminals, mobile phones, vehicle data recorders, navigators, sensors, cameras, cloud servers, video cameras, projectors, watches, earphones, mobile storage, wearable devices, and other electronic products; various vehicles such as airplanes, ships, and cars; various household appliances such as televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; and various medical devices including nuclear magnetic resonance apparatuses, B-mode ultrasound apparatuses, and electrocardiographs.
CN202010617696.8A (filed 2016-04-29, priority date 2016-04-29): Apparatus and method for performing batch normalization operations. Active; granted as CN111860814B.

Priority Applications (1)

CN202010617696.8A (granted as CN111860814B); priority date and filing date 2016-04-29; title: Apparatus and method for performing batch normalization operations

Applications Claiming Priority (2)

CN201610282550.6A (granted as CN107341546B); priority date and filing date 2016-04-29; title: Device and method for executing batch normalization operation
CN202010617696.8A (granted as CN111860814B); priority date and filing date 2016-04-29; title: Apparatus and method for performing batch normalization operations

Related Parent Applications (1)

CN201610282550.6A (parent, by division; granted as CN107341546B); title: Device and method for executing batch normalization operation

Publications (2)

CN111860814A, published 2020-10-30
CN111860814B, published 2024-01-16

Family

ID=60221813

Family Applications (2)

CN202010617696.8A (granted as CN111860814B), filed 2016-04-29: Apparatus and method for performing batch normalization operations (Active)
CN201610282550.6A (granted as CN107341546B), filed 2016-04-29: Device and method for executing batch normalization operation (Active)

Family Applications After (1)

CN201610282550.6A (granted as CN107341546B), filed 2016-04-29: Device and method for executing batch normalization operation (Active)

Country Status (1)

Country Link
CN (2) CN111860814B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
CN109978157B * (priority 2017-12-28, published 2020-06-02), Cambricon Technologies Corp., Ltd. (中科寒武纪科技股份有限公司): Integrated circuit chip device and related product
KR102148110B1 * (priority 2018-02-13, published 2020-08-25), Shanghai Cambricon Information Technology Co., Ltd.: Computing device and method
CN109918999A * (priority 2019-01-22, published 2019-06-21), Xi'an Jiaotong University: Intelligent fault diagnosis method for mechanical equipment based on a generative model under small-sample data

Citations (3)

* Cited by examiner, † Cited by third party
CN104809426A * (priority 2014-01-27, published 2015-07-29), NEC Corporation: Convolutional neural network training method and target identification method and device
CN105512723A * (priority 2016-01-20, published 2016-04-20), Nanjing Aixi Information Technology Co., Ltd.: Artificial neural network calculating device and method for sparse connection
CN105512725A * (priority 2015-12-14, published 2016-04-20), Hangzhou Langhe Technology Co., Ltd.: Neural network training method and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
CN103020740A * (priority 2012-12-25, published 2013-04-03), Lin'an Power Supply Bureau: Method for predicting icing thickness of power lines based on micrometeorological data
CN104978601B * (priority 2015-06-26, published 2017-08-25), Shenzhen Tencent Computer Systems Co., Ltd.: Neural network model training system and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kratzert, "Understanding the backward pass through Batch Normalization Layer", GitHub, pages 1-17 *
Yunji Chen et al., "DaDianNao: A Machine-Learning Supercomputer", 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 609-622 *

Also Published As

Publication number Publication date
CN111860814B (en) 2024-01-16
CN107341546B (en) 2021-06-08
CN107341546A (en) 2017-11-10

Similar Documents

Publication Publication Date Title
CN109284825B (en) Apparatus and method for performing LSTM operations
US11922132B2 (en) Information processing method and terminal device
CN111310904B (en) Apparatus and method for performing convolutional neural network training
KR102470264B1 (en) Apparatus and method for performing reverse training of a fully-connected layer neural network
KR102486030B1 (en) Apparatus and method for executing forward operation of fully-connected layer neural network
CN110298443B (en) Neural network operation device and method
CN107704267B (en) Convolution neural network operation instruction and method thereof
CN107316078B (en) Apparatus and method for performing artificial neural network self-learning operation
WO2017185347A1 (en) Apparatus and method for executing recurrent neural network and lstm computations
EP3564863B1 (en) Apparatus for executing lstm neural network operation, and operational method
CN107886166B (en) Device and method for executing artificial neural network operation
US10853722B2 (en) Apparatus for executing LSTM neural network operation, and operational method
WO2017185336A1 (en) Apparatus and method for executing pooling operation
WO2017185335A1 (en) Apparatus and method for executing batch normalization operation
CN107341546B (en) Device and method for executing batch normalization operation
CN109711540B (en) Computing device and board card
WO2017185248A1 (en) Apparatus and method for performing auto-learning operation of artificial neural network
CN107329733B (en) Apparatus and method for performing posing operations
CN113934678A (en) Computing device, integrated circuit chip, board card, equipment and computing method

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant