WO2018228424A1 - A neural network training method and apparatus - Google Patents

A neural network training method and apparatus

Info

Publication number
WO2018228424A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
rram
layer
value
values
Application number
PCT/CN2018/091033
Other languages
English (en)
French (fr)
Inventor
姚骏
刘武龙
汪玉
夏立雪
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP18818039.2A (EP3627401B1)
Publication of WO2018228424A1
Priority to US16/714,011 (US11475300B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation using electronic means
    • G06N 3/065 Analogue means
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • The present application relates to the field of data processing and, more particularly, to a neural network training method and apparatus.
  • Neural networks (such as deep neural networks) are widely used in computer vision, natural language processing, and big data mining. Neural network computing has the following two typical characteristics:
  • First, the main operation of a neural network is multidimensional matrix multiplication, whose computational complexity is generally O(N³); that is, the time required to complete the operation on N data items is proportional to the cube of N.
  • For example, GoogLeNet, a 22-layer neural network structure proposed by Google researchers, involves about 6 GFLOPS (floating-point operations per second) of computation.
  • Second, training neural networks often relies on massive training data; for example, ImageNet 2012 contains 14 million images. Neural networks also contain the connection parameters of hundreds of millions of neurons, which must be adjusted, especially during training, and the network generates a large number of intermediate results, such as gradient information, during operation. The memory-access overhead of training data, connection weights, and intermediate results makes optimizing the data storage structure and computational performance an urgent requirement.
  • Emerging RRAM devices are considered one of the device technologies that can improve the energy efficiency of neural network computing. RRAM is a non-volatile memory with high integration density, higher access speed than flash (FLASH) devices, and lower power consumption, making it well suited for data access close to the processor and as a non-volatile data storage medium in devices such as mobile terminals.
  • An RRAM memory cell can represent multiple values using its variable-resistance characteristic, rather than only the binary 0 and 1 of a conventional memory cell. Based on these characteristics, a cross-array structure can be built from RRAM cells, as shown in Figure 1; this structure is very well suited to the matrix-vector multiplication inherent to neural networks, and with it the multiply-accumulate work of digital circuits can be realized quickly.
  • For example, if matrix A is of size n×n and matrix B is of size n×1, the matrix C = A×B holds the n target values of the column. Each element requires n multiplications, so n×n multiplications are needed in total.
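The cost statement above can be made concrete with a short sketch (illustrative Python, not part of the patent): a naive n×n matrix times n×1 vector product performs exactly n multiplications per output element, n×n in total.

```python
def matvec(A, B):
    """Naive product C = A x B for an n x n matrix A and an
    n-element vector B; counts the multiplications performed."""
    n = len(A)
    C = [0] * n
    mults = 0
    for i in range(n):        # one pass per target element of C
        for j in range(n):    # n multiplications per element
            C[i] += A[i][j] * B[j]
            mults += 1
    return C, mults

C, mults = matvec([[1, 2], [3, 4]], [5, 6])
# C == [17, 39] and mults == 4 (= 2 x 2)
```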
  • However, RRAM is limited by its lifetime: the lifetime of a single RRAM cell is limited by the number of erase/write cycles it can endure. When the number of impedance changes of a cell reaches a certain count, its ability to change impedance weakens, which shortens the RRAM's lifetime and produces errors.
  • The present application therefore provides a neural network training method and apparatus that extend the service life of an RRAM used for neural network training. In a first aspect, a neural network training method is provided.
  • The method is applied to a resistive random access memory (RRAM) and includes: inputting the neuron input values <ri1, ri2, ..., rin> of the r-th layer of a neural network into the RRAM, and calculating the neuron input values <ri1, ri2, ..., rin> according to a filter in the RRAM to obtain the neuron output values <ro1, ro2, ..., rom> of the r-th layer of the neural network, where n and m are positive integers greater than 0.
  • A calculation is then performed according to the kernel value of the RRAM, the neuron input values <ri1, ri2, ..., rin> of the r-th layer, the neuron output values <ro1, ro2, ..., rom> of the r-th layer, and the back-propagation error values <B1, B2, ..., Bm> of the r-th layer, to obtain the back-propagation update values <C1, C2, ..., Cm> of the r-th layer. The back-propagation error values <B1, B2, ..., Bm> of the r-th layer are obtained from the neuron output values <ro1, ro2, ..., rom> of the r-th layer and the neuron reference output values <rt1, rt2, ..., rtm> of the r-th layer.
  • The back-propagation update values <C1, C2, ..., Cm> of the r-th layer are compared with a preset threshold; when they are greater than the preset threshold, the filter in the RRAM is updated according to the back-propagation update values <C1, C2, ..., Cm> of the r-th layer.
  • This solution gates the update operations of neural network training with a preset threshold, performing an update only when the update value exceeds the threshold. Because the magnitude of weight updates in neural network training is generally not large, the solution greatly reduces the erase/write operations that the many update operations of training would otherwise impose on the RRAM, prolonging the RRAM's service life; fewer update operations also make additional hard errors less likely, ensuring the RRAM's reliability.
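As an illustration of this gating step (a minimal Python sketch with assumed names; the patent itself gives no code), each update value C_k is written to the filter only when its magnitude exceeds the preset threshold:

```python
def apply_update(weights, updates, threshold):
    """Write an update into the stored filter weights only when
    its magnitude exceeds the preset threshold; small updates are
    dropped, sparing the RRAM cell an erase/write cycle."""
    writes = 0
    for k, c in enumerate(updates):      # c = update value C_k
        if abs(c) > threshold:
            weights[k] += c              # costs one RRAM write
            writes += 1
    return weights, writes

w, writes = apply_update([0.5, -0.2, 0.8], [0.001, 0.3, -0.002], 0.01)
# only the second update (0.3) exceeds the threshold, so one write occurs
```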
  • The preset threshold is a static threshold or a dynamic threshold. When it is a static threshold, the preset thresholds of all layers in the neural network have the same value; when it is a dynamic threshold, the preset thresholds of different layers in the neural network are wholly or partially different.
  • The static threshold in this implementation provides a fixed comparison threshold for the update operations of the neural network: the back-propagation update values of every layer are compared with the same static threshold, reducing the number of erase operations and extending the RRAM's service life.
  • The dynamic threshold in this implementation provides different (or partially different) thresholds for different layers. Because of error propagation, different layers of the neural network have different sensitivities to updates; setting wholly or partially different thresholds for different layers allows update operations to be performed in a more targeted way, further protecting the RRAM's service life.
  • Error testing is performed on the RRAM, a hard-error profile of the RRAM is output, and the data of the neural network is rearranged according to that profile. Obtaining the hard-error profile from the error test and then rearranging the neural network's data according to it reduces the impact of RRAM hard errors on training accuracy, or increases the usability of an RRAM that has some bad cells.
  • Performing data rearrangement of the neural network according to the RRAM's hard-error profile includes: arranging the sparse data of the neural network into the hard-error regions of the RRAM whose cells are stuck at a constant 0. Placing the sparse data in the stuck-at-0 regions effectively reduces the impact of RRAM hard errors on the neural network's training accuracy, or increases the usability of an RRAM with some bad cells.
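One way to realize this rearrangement (an illustrative greedy sketch; the patent does not fix an algorithm) is to pair the regions with the most stuck-at-0 cells with the data blocks containing the most zeros:

```python
def rearrange(blocks, error_map):
    """Greedy sketch: pair the RRAM regions that have the most
    stuck-at-0 cells with the data blocks that contain the most
    zeros, so hard errors land on values that are zero anyway."""
    regions = sorted(range(len(error_map)),
                     key=lambda r: -sum(error_map[r]))
    by_sparsity = sorted(range(len(blocks)),
                         key=lambda b: -blocks[b].count(0))
    # placement[region] = index of the block stored in that region
    return {r: b for r, b in zip(regions, by_sparsity)}

# Region 1 has two stuck-at-0 cells, so it receives block 0,
# whose values are mostly zero already.
placement = rearrange([[0, 0, 0, 1], [5, 6, 7, 8]],
                      [[0, 0, 0, 0], [1, 1, 0, 0]])
# placement == {1: 0, 0: 1}
```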
  • Performing error testing on the RRAM and outputting its hard-error profile includes: writing a test value to each cell in the RRAM and comparing each cell's test value with its actual read-out value to obtain the hard-error condition of each cell; the hard-error conditions of all cells constitute the RRAM's hard-error profile.
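The write-then-read comparison can be sketched as follows (the `write`/`read` callbacks are hypothetical stand-ins for the RRAM interface, not part of the patent):

```python
def hard_error_map(write, read, rows, cols, test_value=1):
    """Write a test value to every cell, read it back, and flag
    cells whose read-out differs; the 2-D result is the RRAM's
    hard-error profile."""
    profile = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            write(i, j, test_value)
            if read(i, j) != test_value:
                profile[i][j] = 1  # failed the write/read comparison
    return profile

# Simulated 2x2 array in which cell (0, 1) is stuck at 0.
cells = [[0, 0], [0, 0]]
def write(i, j, v):
    cells[i][j] = 0 if (i, j) == (0, 1) else v
def read(i, j):
    return cells[i][j]

profile = hard_error_map(write, read, 2, 2)
# profile == [[0, 1], [0, 0]]
```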
  • The preset thresholds of different layers in the neural network are wholly or partially different. Specifically, wholly different means that the value of the preset threshold decreases step by step from the back to the front of the network's layers; partially different means that the preset threshold of a front layer is smaller than that of a back layer, where a front layer is one close to the network's input layer (specifically, layers 1 to X) and a back layer is one close to the output layer (specifically, layers R-X to R), R being the total number of layers of the neural network and X being greater than 1 and less than R.
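A simple way to generate such layer-by-layer decreasing thresholds (an illustrative linear schedule; `t_min` and `t_max` are assumed bounds, not values from the patent) is:

```python
def layer_thresholds(R, t_min, t_max):
    """Per-layer dynamic thresholds: largest at the back layer
    (layer R, near the output) and decreasing step by step toward
    the front layer (layer 1, near the input)."""
    if R == 1:
        return [t_max]
    step = (t_max - t_min) / (R - 1)
    return [t_min + k * step for k in range(R)]  # index 0 = layer 1

ts = layer_thresholds(5, 0.02, 0.10)
# layer 1 gets 0.02, layer 5 gets 0.10, increasing front to back
```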
  • In a second aspect, a neural network training apparatus applied to a resistive random access memory (RRAM) is provided. The apparatus comprises: a forward calculation module, configured to input the neuron input values <ri1, ri2, ..., rin> of the r-th layer of a neural network into the RRAM and calculate them according to a filter in the RRAM to obtain the neuron output values <ro1, ro2, ..., rom> of the r-th layer, where n and m are positive integers greater than 0; and a reverse calculation module, configured to perform a calculation according to the kernel value of the RRAM, the neuron input values <ri1, ri2, ..., rin>, the neuron output values <ro1, ro2, ..., rom>, and the back-propagation error values <B1, B2, ..., Bm> of the r-th layer, to obtain the back-propagation update values <C1, C2, ..., Cm>. The kernel value of the RRAM is the matrix value of the filter in the RRAM, and the back-propagation error values <B1, B2, ..., Bm> of the r-th layer are obtained from the neuron output values <ro1, ro2, ..., rom> and the neuron reference output values <rt1, rt2, ..., rtm> of the r-th layer.
  • This solution gates the update operations of neural network training with a preset threshold, performing an update only when the update value exceeds the threshold. Because the magnitude of weight updates in neural network training is generally not large, the solution greatly reduces the erase/write operations imposed on the RRAM by the many update operations of training, thereby extending the life of the RRAM.
  • The apparatus further includes a threshold generation module configured to generate the preset threshold, which includes a static threshold or a dynamic threshold. A static threshold means the threshold generation module sets the preset thresholds of all layers in the neural network to the same value; a dynamic threshold means the module sets the preset thresholds of different layers to wholly or partially different values.
  • The static threshold in this implementation provides a fixed comparison threshold for the update operations of the neural network: the back-propagation update values of every layer are compared with the same static threshold, reducing the number of erase operations and extending the RRAM's service life.
  • The dynamic threshold in this implementation provides different (or partially different) thresholds for different layers. Because of error propagation, different layers of the neural network have different sensitivities to updates; setting wholly or partially different thresholds for different layers allows update operations to be performed in a more targeted way, further protecting the RRAM's service life.
  • The comparison module includes the preset threshold, and the preset threshold is a static threshold; here, the static threshold means that the preset thresholds of all layers in the neural network are set to the same value.
  • The apparatus further includes an error testing module and a rearrangement module. The error testing module is configured to perform error testing on the RRAM and output the RRAM's hard-error profile to the rearrangement module; the rearrangement module is configured to rearrange the data of the neural network according to that profile. Obtaining the hard-error profile from the error test and rearranging the data accordingly reduces the impact of RRAM hard errors on training accuracy, or increases the usability of an RRAM that has some bad cells.
  • The rearrangement module is specifically configured to arrange the sparse data of the neural network into the stuck-at-0 hard-error regions of the RRAM, which effectively reduces the impact of RRAM hard errors on the neural network's training accuracy, or increases the usability of an RRAM with some bad cells.
  • The error testing module is specifically configured to write a test value to each cell in the RRAM and compare each cell's test value with its actual read-out value to obtain the hard-error condition of each cell; the hard-error conditions of all cells constitute the RRAM's hard-error profile.
  • The threshold generation module sets the preset thresholds of different layers in the neural network to wholly or partially different values. Specifically, it sets the preset threshold to decrease layer by layer from the back to the front of the network, or sets the preset threshold of a front layer to be smaller than that of a back layer, where a front layer is one close to the network's input layer (specifically, layers 1 to X) and a back layer is one close to the output layer (specifically, layers R-X to R), R being the total number of layers of the neural network and X being greater than 1 and less than R.
  • In a third aspect, a neural network training apparatus for use with a resistive random access memory (RRAM) is provided. The apparatus comprises a processor configured to input the neuron input values <ri1, ri2, ..., rin> of the r-th layer of the neural network into the RRAM and calculate them according to a filter in the RRAM to obtain the neuron output values <ro1, ro2, ..., rom> of the r-th layer, where n and m are positive integers greater than 0; and to perform a calculation according to the kernel value of the RRAM, the neuron input values <ri1, ri2, ..., rin>, the neuron output values <ro1, ro2, ..., rom>, and the back-propagation error values <B1, B2, ..., Bm> of the r-th layer, to obtain the back-propagation update values <C1, C2, ..., Cm>. The kernel value of the RRAM is the matrix value of the filter in the RRAM, and the back-propagation error values <B1, B2, ..., Bm> of the r-th layer are obtained from the neuron output values <ro1, ro2, ..., rom> and the neuron reference output values <rt1, rt2, ..., rtm> of the r-th layer.
  • The apparatus also comprises a comparator configured to compare the back-propagation update values <C1, C2, ..., Cm> of the r-th layer with a preset threshold; when the update values are greater than the preset threshold, the processor updates the filter in the RRAM according to the back-propagation update values <C1, C2, ..., Cm> of the r-th layer.
  • This solution gates the update operations of neural network training with a preset threshold, performing an update only when the update value exceeds the threshold. Because the magnitude of weight updates in neural network training is generally not large, the solution greatly reduces the erase/write operations imposed on the RRAM by the many update operations of training, thereby extending the life of the RRAM.
  • The processor is further configured to generate the preset threshold, which includes a static threshold or a dynamic threshold. Specifically, a static threshold means the processor sets the preset thresholds of all layers in the neural network to the same value; a dynamic threshold means the processor sets the preset thresholds of different layers to wholly or partially different values.
  • The static threshold in this implementation provides a fixed comparison threshold for the update operations of the neural network: the back-propagation update values of every layer are compared with the same static threshold, reducing the number of erase operations and extending the RRAM's service life.
  • The dynamic threshold in this implementation provides different (or partially different) thresholds for different layers. Because of error propagation, different layers of the neural network have different sensitivities to updates; setting wholly or partially different thresholds for different layers allows update operations to be performed in a more targeted way, further protecting the RRAM's service life.
  • The comparator includes the preset threshold, and the preset threshold is a static threshold; here, the static threshold means that the processor sets the preset thresholds of all layers in the neural network to the same value.
  • The processor is further configured to receive a hard-error profile of the RRAM and to rearrange the data of the neural network according to it; the hard-error profile is obtained by error testing of the RRAM performed by a peripheral circuit of the processor.
  • Obtaining the RRAM's hard-error profile from the error test and then rearranging the neural network's data according to it reduces the impact of RRAM hard errors on training accuracy, or increases the usability of an RRAM that has some bad cells.
  • The processor is specifically configured to arrange the sparse data of the neural network into the stuck-at-0 hard-error regions of the RRAM according to the RRAM's hard-error profile, which effectively reduces the impact of RRAM hard errors on the neural network's training accuracy, or increases the usability of an RRAM with some bad cells.
  • The hard-error profile received by the processor is obtained by an error test circuit, which is specifically configured to write a test value to each cell in the RRAM and compare each cell's test value with its actual read-out value to obtain the hard-error condition of each cell; the hard-error conditions of all cells constitute the RRAM's hard-error profile. The error test circuit exists independently of the processor and is electrically coupled to it.
  • The processor sets the preset thresholds of different layers in the neural network to wholly or partially different values. Specifically, the processor sets the preset threshold to decrease layer by layer from the back to the front of the network, or sets the preset threshold of a front layer to be smaller than that of a back layer, where a front layer is one close to the network's input layer (specifically, layers 1 to X) and a back layer is one close to the output layer (specifically, layers R-X to R), R being the total number of layers of the neural network and X being greater than 1 and less than R.
  • In a fourth aspect, a computer-readable storage medium comprising instructions is provided; when the instructions are executed on a server or a terminal, they cause the server or terminal to perform the method of any one of claims 1 to 6.
  • Figure 1 is a cross-array structure diagram of an RRAM cell.
  • Figure 2 is a schematic diagram of the working principle of neurons.
  • FIG. 3 is a flow chart of a neural network training method according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a neural network training device according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the update logic of an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a neural network training apparatus according to another embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a neural network training apparatus according to still another embodiment of the present application.
  • FIG. 8 is a schematic diagram of the operation of the error detecting module of the present application.
  • FIG. 9 is a schematic structural diagram of a neural network training apparatus according to still another embodiment of the present application.
  • Neural networks, also called connection models, are mathematical models of algorithms that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network relies on the complexity of the system, adjusting the relationships among a large number of internal nodes, to process information.
  • RRAM (resistive random access memory) is a rewritable memory technology that significantly improves durability and data-transfer speed. It is a memory in which, depending on the voltage applied to a metal oxide, the resistance of the material changes correspondingly between a high-resistance state and a low-resistance state, thereby opening or blocking a current-flow channel; this behavior is used to store various kinds of information.
  • A crossbar (xbar) structure is a structure in which rows and columns cross. Each cross node is provided with an NVM device (hereinafter a cross node is referred to as an NVM node) used for storing data and for calculation. Such a cross array is very well suited to neural network calculation.
  • Suppose the dimension of the vector to be calculated is n, with its n elements represented by digital signals D1 to Dn. The digital signals D1 to Dn are converted into analog signals V1 to Vn by a digital-to-analog converter (DAC), so the n elements are then represented by the analog signals V1 to Vn, which are input to the n rows of the cross array. The conductance value of each NVM node in a column represents the magnitude of the weight stored by that node; after the analog signals V1 to Vn act on the corresponding NVM nodes of each column, the output current of each NVM node represents the product of the weight stored by the node and the data element represented by the analog signal it receives.
  • Each column of the cross array corresponds to a kernel vector, and the sum of the output currents of each column represents the result of the matrix product of that column's kernel with the sub-matrix corresponding to the vector to be calculated. At the end of each column, this result is converted from an analog quantity to a digital quantity by an analog-to-digital converter (ADC).
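The column computation described above can be modeled in a few lines (an idealized sketch that ignores analog noise and the DAC/ADC conversions):

```python
def crossbar_matvec(G, V):
    """Idealized crossbar: row voltages V drive conductances G;
    each column current is sum_i G[i][j] * V[i], i.e. the dot
    product of the input vector with the column's kernel vector
    (Ohm's law plus Kirchhoff's current law)."""
    cols = len(G[0])
    return [sum(G[i][j] * V[i] for i in range(len(V)))
            for j in range(cols)]

# Two kernel columns stored as conductances; inputs V = [1, 2, 3].
I = crossbar_matvec([[1, 0], [0, 1], [1, 1]], [1, 2, 3])
# I == [4, 5]: column 0 computes 1*1 + 0*2 + 1*3, column 1 computes 0*1 + 1*2 + 1*3
```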
  • Fig. 2 shows the information transfer from the neuron inputs a1, a2, ..., an to the output t. The signal intensity generated by each neuron is transmitted toward the output with the transmission strength of its synapse; this can be expressed as the multiply-add of the vector <a1, a2, ..., an> with the synaptic weight vector <w1, w2, ..., wn>. The vector <a1, a2, ..., an> is equivalent to the vector to be calculated described above, and the synaptic weight vector <w1, w2, ..., wn> is equivalent to the kernel vector mentioned above.
  • In the cross array, the DAC side on the left imports the neuron intensities <a1, ..., an>, and the cross array stores the kernels: the first column is <w1, ..., wn>, and the multiply-add result (SUM) is output at the ADC. b is the bias, used to raise or lower the neuron input by a fixed value.
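The neuron of Fig. 2 therefore computes a multiply-add plus bias, which can be written directly (illustrative sketch):

```python
def neuron(a, w, b):
    """Neuron output t: multiply-add of the inputs <a1..an> with
    the synaptic weights <w1..wn>, plus the bias b that raises or
    lowers the neuron input by a fixed value."""
    return sum(ai * wi for ai, wi in zip(a, w)) + b

t = neuron([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], 1.0)
# t == 4.0 (weighted sum 3.0 plus bias 1.0)
```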
  • RRAM is similar to FLASH devices in that the lifetime of a single cell is limited by the number of erase/write cycles it can endure. After the number of impedance changes of a cell reaches a certain count, its ability to change impedance weakens, which affects lifetime and causes errors. Frequent update operations also greatly increase the potential for hard errors, affecting the reliability of the RRAM.
  • In the prior art, the RRAM cross array is used only for the forward calculation of a neural network: parameters trained on other devices, including the strengths (weights) of neurons and synapses, are loaded into the RRAM cross array. This approach exploits RRAM's low power consumption, fusion of computation and storage, and high-speed matrix multiplication to promote the application of neural networks in the inference process. Going a step further and applying these same advantages to the training of neural networks, however, is affected by the soft errors and hard errors described above.
  • The inference process contains only forward calculation, which includes a large number of matrix multiplications. The training process includes forward calculation, back-propagation, and parameter updates; both the forward and reverse passes include a large number of matrix multiplications, and the parameter updates must update the strengths of neurons and synapses according to the results of back-propagation. Training is usually carried out over a large training set to train a large number of synapse and neuron strengths, so the process involves a large number of update operations; the frequent write operations they cause can greatly affect the reliability of RRAM cells and increase the potential for soft errors and hard errors.
  • a neural network training method is described in detail below.
  • the method is applied to a Resistive Random Access Memory (RRAM).
  • RRAM Resistive Random Access Memory
  • CROSSBAR cross array of memories
  • FIG. 3 is a flowchart of a method for applying for network training according to an embodiment of the present application, including:
  • calculations are performed on the neuron input values <ri1, ri2...rin> to obtain the neuron output values <ro1, ro2...rom> of the rth layer in the neural network, where n is a positive integer greater than 0 and m is a positive integer greater than 0.
  • the calculation of the neuron input values <ri1, ri2...rin> according to a filter in the RRAM may be performed by a neural network accelerator built from RRAM.
  • the neural network accelerator includes a plurality of filters that form a filter group. The filter group is represented by a matrix, in which each column corresponds to one filter and can also be represented as a kernel vector.
  • the neuron output is calculated according to the filter in the RRAM; specifically, the neuron input value vector is multiplied by the kernel vector described above, and the multiply-accumulate result is converted by the ADC in the RRAM, thereby obtaining the neuron output values <ro1, ro2...rom> of the rth layer in the neural network.
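The multiply-accumulate described above can be sketched in software as a plain matrix-vector product. This is an illustrative stand-in for the analog crossbar computation, not the patent's circuit; the function name and shapes are assumptions.

```python
# Software sketch of the crossbar forward pass: each column of the kernel
# matrix W is one filter (kernel vector); each neuron output is the
# multiply-accumulate of the input vector with one column.

def crossbar_forward(ri, W):
    """ri: neuron input values <ri1..rin>; W: n x m kernel matrix
    (one filter per column). Returns neuron output values <ro1..rom>."""
    n, m = len(W), len(W[0])
    assert len(ri) == n
    return [sum(ri[i] * W[i][j] for i in range(n)) for j in range(m)]

# Example: 3 inputs, 2 filters.
ri = [1.0, 2.0, 3.0]
W = [[0.1, 0.4],
     [0.2, 0.5],
     [0.3, 0.6]]
ro = crossbar_forward(ri, W)  # approximately [1.4, 3.2]
```

In the actual device, the DAC drives the rows with the input vector, the cell conductances hold W, and the ADC digitizes the summed column currents; the sketch only mirrors the arithmetic.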
  • the crossbar array is one form of RRAM; RRAM itself covers a wide range of structures, and the crossbar array is the carrier on which most RRAM-based neural network accelerators are implemented. Using a crossbar array as a neural network accelerator relies on the RRAM characteristic of impedance-based storage.
  • the kernel value of the RRAM, the neuron input values <ri1, ri2...rin> of the rth layer, the neuron output values <ro1, ro2...rom> of the rth layer, and the back propagation error values <B1, B2...Bm> of the rth layer in the neural network are used in a calculation that yields the back propagation update values <C1, C2...Cm> of the rth layer in the neural network.
  • the kernel value of the RRAM is the matrix value of a filter in the RRAM, and the back propagation error values <B1, B2...Bm> of the rth layer in the neural network are obtained from the neuron output values <ro1, ro2...rom> of the rth layer of the neural network and the neuron reference output values <rt1, rt2...rtm> of the rth layer of the neural network.
  • the neuron input values <ri1, ri2...rin>, the kernel values of the RRAM denoted <W1, W2...Wn>, the neuron output values <ro1, ro2...rom>, and the backpropagation error values <B1, B2...Bm> are multiplied together to obtain the back propagation update values <C1, C2...Cm> of the layer.
  • the number of neuron input values is not necessarily the same as the number of neuron output values, so they are denoted separately by n and m.
  • both n and m are positive integers greater than 0; n can equal m, but in most cases n and m differ.
  • this solution gates the update operation in neural network training with a preset threshold: the update is performed only when the update value is greater than the preset threshold. Because the magnitude of each weight update in neural network training is generally not large, this greatly reduces the erase operations imposed on the RRAM by the many update operations in training, prolonging the service life of the RRAM; with fewer update operations, the RRAM is also less likely to develop hard errors, ensuring its reliability.
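A minimal sketch of this gated update, under the assumption that each component of the update vector C is compared with the threshold by magnitude (the patent speaks of the update value being "greater than" the threshold without fixing a sign convention; names here are illustrative):

```python
def apply_gated_update(W_col, C, threshold):
    """Write an update component into the modelled RRAM column only when
    its magnitude exceeds the preset threshold; smaller updates are
    skipped, saving erase/write operations. Illustrative sketch only:
    W_col models one filter column, C the back propagation update vector."""
    writes = 0
    for j, c in enumerate(C):
        if abs(c) > threshold:   # write enable = 1
            W_col[j] += c
            writes += 1
        # else: write enable = 0, the cell is left untouched
    return writes

W = [0.5, -0.2, 0.1]
n_writes = apply_gated_update(W, [0.001, 0.05, -0.0004], threshold=0.01)
# only the 0.05 component exceeds the threshold, so n_writes == 1
```

The skipped components are not lost outright: as the later description notes, small updates accumulate across iterations through the training loop itself until they grow large enough to cross the threshold.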
  • the preset threshold is a static threshold or a dynamic threshold; when the preset threshold is a static threshold, the preset thresholds of all layers in the neural network have the same value.
  • when the preset threshold is a dynamic threshold, the preset thresholds of different layers in the neural network are entirely or partially different.
  • the static threshold in this implementation provides a fixed comparison threshold for the update operations of the neural network: the back propagation update values of each layer are compared against the same static threshold, reducing the number of erase operations and extending the service life of the RRAM.
  • the dynamic threshold in this implementation provides different, or partially different, thresholds for different layers in the neural network. Because of how errors propagate through the neural network, different layers have different sensitivities to updates; setting different or partially different thresholds for different layers allows updates to be applied more selectively, further protecting the service life of the RRAM.
  • the preset thresholds of different layers in the neural network being entirely or partially different specifically includes: the value of the preset threshold decreases layer by layer from the back of the neural network to the front; that is, the preset threshold of a front layer in the neural network is smaller than the preset threshold of a rear layer, where a front layer of the neural network is a layer close to the input layer (specifically the 1st layer to the Xth layer), and a rear layer is a layer close to the output layer (specifically the (R-X)th layer to the Rth layer), R being the total number of layers of the neural network and X being greater than 1 and less than R.
  • layers close to the input layer of the neural network are more sensitive to updates, so their preset thresholds can take smaller values; layers close to the output layer are relatively insensitive to updates, so their preset thresholds can take larger values.
  • because error propagation in the neural network makes different layers differently sensitive to updates, setting different or partially different thresholds for different layers allows updates to be applied more selectively, further protecting the service life of the RRAM.
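One way to realize such layer-wise thresholds is sketched below. Linear interpolation between a minimum and a maximum is an assumption for illustration; the patent only requires that the values decrease from the back of the network (near the output) to the front (near the input).

```python
def layer_thresholds(R, t_min, t_max):
    """Dynamic per-layer thresholds: smallest at the front (layer 1, near
    the input layer), largest at the back (layer R, near the output layer),
    so the values decrease layer by layer from back to front.
    The linear spacing is an illustrative choice."""
    if R == 1:
        return [t_max]
    step = (t_max - t_min) / (R - 1)
    return [t_min + step * r for r in range(R)]

ths = layer_thresholds(5, 0.001, 0.01)
# ths[0] (front, near input) == 0.001; ths[-1] (back, near output) == 0.01
```

Each layer r then compares its update values against `ths[r]` instead of a single network-wide static threshold.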
  • the RRAM is error-tested, a hard error distribution map of the RRAM is output, and the data of the neural network is rearranged according to the hard error distribution map of the RRAM.
  • a hard error distribution map of the RRAM is obtained from the error test of the RRAM, and the data of the neural network is then rearranged according to the hard error distribution map, thereby reducing the impact of RRAM hard errors on the training accuracy of the neural network, or increasing the utilization of an RRAM that contains some bad cells.
  • performing data rearrangement on the neural network according to the hard error distribution map of the RRAM includes: arranging the sparse data of the neural network onto the stuck-at-0 hard error regions of the RRAM.
  • arranging the sparse data of the neural network onto the stuck-at-0 hard error regions of the RRAM can effectively reduce the impact of RRAM hard errors on the training accuracy of the neural network, or increase the utilization of an RRAM that contains some bad cells.
  • the above error test need not be performed on every iteration of the neural network. It may be performed before training, outputting a hard error distribution map of the RRAM so that the neural network can be rearranged according to that map. It may also be performed after a certain number of iterations, because hard errors in the RRAM may continue to appear during training; error-testing the RRAM periodically during training and outputting the hard error distribution map at that time, then rearranging the data accordingly, helps make full use of the stuck-at-0 hard error cells, allocating the sparse data of the neural network as much as possible to these stuck-at-0 error regions and thereby increasing the efficiency of the neural network on the RRAM.
  • performing the error test on the RRAM and outputting the hard error distribution map of the RRAM includes: writing a test value to each cell in the RRAM, and comparing the test value of each cell with the value actually read back from that cell to obtain the hard error condition of each cell in the RRAM; the hard error conditions of all cells together constitute the hard error distribution map of the RRAM.
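The write-then-read-back comparison can be sketched as follows. The access primitives `write_cell`/`read_cell` stand in for device operations the patent leaves abstract, and the single test value is a simplification of the stuck-at-0/stuck-at-1 procedure detailed later.

```python
def build_hard_error_map(write_cell, read_cell, rows, cols, test_value=1.0):
    """Write a test value to every cell, read it back, and mark cells whose
    read-back differs from the test value as hard errors. The result is the
    hard error distribution map (Q in the figures)."""
    error_map = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            write_cell(i, j, test_value)
            if read_cell(i, j) != test_value:
                error_map[i][j] = 1   # hard error, e.g. stuck at 0
    return error_map

# Toy device model: cell (0, 1) is stuck at 0 and ignores writes.
mem = [[0.0, 0.0], [0.0, 0.0]]
def write_cell(i, j, v):
    if (i, j) != (0, 1):
        mem[i][j] = v
def read_cell(i, j):
    return mem[i][j]

Q = build_hard_error_map(write_cell, read_cell, 2, 2)  # [[0, 1], [0, 0]]
```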
  • a forward calculation module configured to input the neuron input values <ri1, ri2...rin> of the rth layer in the neural network into the RRAM and calculate on those input values according to a filter in the RRAM, obtaining the neuron output values <ro1, ro2...rom> of the rth layer in the neural network, where n is a positive integer greater than 0 and m is a positive integer greater than 0.
  • a reverse calculation module configured to calculate, from the kernel value of the RRAM, the neuron input values <ri1, ri2...rin> of the rth layer in the neural network, the neuron output values <ro1, ro2...rom> of the rth layer, and the back propagation error values <B1, B2...Bm> of the rth layer, the back propagation update values <C1, C2...Cm> of the rth layer in the neural network.
  • a comparison module configured to compare the back propagation update values ⁇ C1, C2, . . . Cm> of the rth layer in the neural network with a preset threshold.
  • an update module configured to update the filter in the RRAM according to the back propagation update values <C1, C2...Cm> of the rth layer in the neural network when those values are greater than the preset threshold.
  • the comparison module here can be part of the RRAM or can be provided independently of the RRAM; this is not limited here.
  • this solution gates the update operation in neural network training with a preset threshold: the update is performed only when the update value is greater than the preset threshold. Because the magnitude of each weight update in neural network training is generally not large, this greatly reduces the erase operations imposed on the RRAM by the many update operations in training, prolonging the service life of the RRAM; with fewer update operations, the RRAM is also less likely to develop hard errors, ensuring its reliability.
  • the apparatus further includes a threshold generation module configured to generate the preset threshold, where the preset threshold includes a static threshold or a dynamic threshold. Specifically, the static threshold means that the threshold generation module sets the preset thresholds of all layers in the neural network to the same value; the dynamic threshold means that the threshold generation module sets the preset thresholds of different layers in the neural network to entirely or partially different values.
  • the comparison module includes the preset threshold, where the preset threshold is a static threshold; the static threshold means that the preset thresholds of all layers in the neural network are set to the same value.
  • the threshold generation module sets the preset thresholds of different layers in the neural network to entirely or partially different values, which specifically includes: the threshold generation module sets the value of the preset threshold to decrease layer by layer from the back of the neural network to the front; or sets the preset threshold of a front layer in the neural network to be smaller than the preset threshold of a rear layer, where a front layer of the neural network is a layer close to the input layer (specifically the 1st layer to the Xth layer), and a rear layer is a layer close to the output layer (specifically the (R-X)th layer to the Rth layer), R being the total number of layers of the neural network and X being greater than 1 and less than R.
  • FIG. 5 embodies the update logic of the embodiment of the present invention: the back propagation update values <C1, C2...Cm> (represented by the vector C in the figure) are calculated by the reverse calculation module; C enters the comparison module and is compared with the preset threshold from the threshold generation module. When a back propagation update value is less than the preset threshold, the write enable signal is set to 0 and the value is not written into the RRAM crossbar array; when it is greater than the preset threshold, the write enable signal is set to 1 and the value is written into the RRAM crossbar array.
  • this approximation method accumulates multiple back propagation update values, and this accumulation is realized by the main path of the neural network training process itself: a special unit (such as the update logic shown in FIG. 5) is added only at update time to block small updates, and the other stages of neural network training are unaffected.
  • the embodiment of the present invention may also set entirely or partially different thresholds for different layers of the neural network, referred to as a dynamic threshold. Because the error transmitted through the neural network decreases rapidly from back to front, being attenuated layer by layer during backpropagation, the back propagation update values become smaller after a certain number of layers. Layers close to the input layer of the neural network are therefore more sensitive to updates, so their preset thresholds take smaller values, while layers close to the output layer are relatively insensitive to updates, so their preset thresholds take larger values. Setting different or partially different thresholds for different layers in this way allows updates to be applied more selectively, further protecting the service life of the RRAM.
  • the apparatus further includes an error test module and a rearrangement module. The error test module is configured to perform an error test on the RRAM and output a hard error distribution map of the RRAM to the rearrangement module; the rearrangement module is configured to perform data rearrangement on the neural network according to the hard error distribution map of the RRAM.
  • the rearrangement module is specifically configured to arrange the sparse data of the neural network onto the stuck-at-0 hard error regions of the RRAM.
  • the error test module performs the error test on the RRAM, and the rearrangement module rearranges the neural network according to the hard error distribution map output by the test. It should be noted here that the error test module and the rearrangement module need not perform the error test and rearrangement in every neural network training iteration: the error test can be performed before training, that is, completed offline in advance, or after a certain number of iterations, because hard errors in the RRAM may continue to appear during training, so the RRAM is error-tested periodically.
  • the rearrangement module receives as one input the neural network parameters W' obtained when an iteration of the neural network completes or partially completes (W' is output by the update module in FIG. 4; W' is the W vector after being updated by the C vector, and multiple columns of W' vectors form a matrix P), shown in FIG. 7 as the weight storage area of the neural network to be rearranged (denoted P).
  • the other input is the hard error distribution map of the RRAM (denoted Q).
  • the main function of this module is to match the weight matrix P against the error distribution matrix Q.
  • Figure 7 shows an example of such an exchange.
  • in Figure 7, assuming the RRAM is a 6x6 crossbar array, Q is a 6x6 matrix. After the error test module performs the RRAM error test, Q contains three cells with stuck-at-0 errors: the element in row 1, column 5; the element in row 2, column 6; and the element in row 4, column 3.
  • P is an example weight matrix before rearrangement (i.e., the weight data of the neural network, composed of the updated neural network weights W'), in which a cell shown as 0 actually holds a value close to 0; a value such as 0.003 that is small compared with the others is represented by 0 in the figure. It can be seen that approximately 2/3 of the values in P are zero. If P were written directly to the RRAM, nonzero elements of P, such as the 0.009 in row 1, column 5 and the 0.003 in row 3, column 3, would fall onto the stuck-at-0 error cells in Q.
  • according to the distribution of the stuck-at-0 cells in Q, the rearrangement module exchanges the first row with the second row in P, and the fifth row with the sixth row, obtaining the updated matrix P'.
  • the three cells (1, 5), (2, 6), (4, 3) that are stuck at 0 in Q (note: the format is (row, column)) correspond to positions whose value in P' is 0, so they have no effect on subsequent calculations.
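The row-exchange idea of FIG. 7 can be sketched as a greedy matching: permute the rows of the weight matrix P so that stuck-at-0 cells in Q land on zero weights wherever possible. The greedy strategy below is an assumption for illustration; the patent only requires that the sparse data be matched to the stuck-at-0 regions.

```python
def rearrange_rows(P, Q):
    """Greedily assign each weight row to a physical RRAM row so that every
    stuck-at-0 cell (Q[i][j] == 1) coincides with a zero weight when
    possible. Returns the rearranged matrix P'. Illustrative sketch only."""
    n = len(P)
    stuck = [[j for j in range(len(Q[i])) if Q[i][j]] for i in range(n)]
    remaining = list(range(n))     # weight rows not yet placed
    placement = [None] * n         # physical row -> weight row
    # Handle the most constrained physical rows (most stuck cells) first.
    for i in sorted(range(n), key=lambda i: -len(stuck[i])):
        # Prefer a weight row that is 0 at every stuck column of this row.
        fit = next((r for r in remaining
                    if all(P[r][j] == 0 for j in stuck[i])), remaining[0])
        placement[i] = fit
        remaining.remove(fit)
    return [P[placement[i]] for i in range(n)]

P = [[0.1, 0.5],
     [0.2, 0.0]]
Q = [[0, 1],
     [0, 0]]                  # cell (0, 1) is stuck at 0
P2 = rearrange_rows(P, Q)     # rows swapped: [[0.2, 0.0], [0.1, 0.5]]
```

As the description notes, the sparsity of the neural network (most values near zero) is what makes such a matching easy to find.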
  • the rearrangement logic may be implemented in software; specifically, it can be performed by a general purpose processor. The hard error distribution map Q obtained by the error test module can be written into a dedicated area of the peripheral circuitry of the RRAM and sent, together with the updated weight area of the neural network, to the general purpose processor for rearrangement.
  • the starting point of the rearrangement logic is that the neural network is sparse: most of its data is close to zero. In a sparse neural network, a matching for the stuck-at-0 cells can be obtained relatively simply.
  • hard errors of the RRAM include stuck-at-0 hard errors and stuck-at-1 hard errors. Since stuck-at-1 errors are much rarer than stuck-at-0 errors, the embodiment of the present invention focuses on stuck-at-0 hard errors; stuck-at-1 errors can be handled by skipping the entire row, or in another manner, which is not limited in the embodiment of the present invention.
  • the error test module is specifically configured to write test values to the respective cells in the RRAM and compare the test value of each cell with the value actually read back from that cell, obtaining the hard error condition of each cell in the RRAM; the hard error conditions of all cells constitute the hard error distribution map of the RRAM.
  • Figure 8 shows the operation of the error detection module. Stuck-at-0 and stuck-at-1 errors must be detected by modifying the RRAM resistance value and comparing the result with the ideal modification result. Therefore, before error detection, the current resistances of all RRAM cells must be read and recorded.
  • the embodiment of the present invention may use a block error detection method: the original RRAM array is first divided into several mutually disjoint sub-matrices, and error detection is performed on each sub-matrix as a whole.
  • FIG. 8 shows an exemplary error detection circuit with an array size of 5x5 (the array is an error detection sub-array, which may be a partial array of the crossbar array described above, hereinafter referred to as the original crossbar array; the original crossbar array is usually relatively large, so the error detection module detects errors one sub-array at a time). When performing stuck-at-0 and stuck-at-1 error detection, a minimum deviation is first written to all RRAM cells in the sub-array to be detected, the magnitude of the deviation being determined by the resistance range of the RRAM device: for stuck-at-0 error detection, a minimum deviation that decreases resistance (increases conductance) is written; for stuck-at-1 error detection, a minimum deviation that increases resistance (decreases conductance) is written.
  • an error detection voltage is then applied to the input interface of the error detection sub-array (the input interface is shown at the left end of FIG. 8), and the post-write calculation result at the output interface of the sub-array (the output interface is shown below the ADC in FIG. 8) is analog-to-digital converted and used as the signal to be compared.
  • RRAM devices are bipolar and allow a reverse bias voltage to be applied. Therefore, the error detection voltage can also be applied to the output interface of the error detection sub-array, with the error detection circuit shown in FIG. 8 connected to the input interface of the sub-array, thereby performing error detection on the rows of the RRAM.
  • finally, the row error detection results and the column error detection results of the sub-array are combined: if both the row and the column corresponding to a cell contain errors, the cell is considered possibly erroneous and is recorded in the error distribution map Q shown in FIG. 7.
  • assume the size of the error detection sub-array is S x T. Since sub-arrays in the same row can perform column error detection at the same time, and sub-arrays in the same column can perform row error detection at the same time, this method effectively reduces the error detection complexity compared with detecting cells one by one.
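The intersection rule above (flag a cell only when both its row test and its column test report an error) can be sketched as follows; the boolean row/column test results are assumed inputs produced by the detection circuit.

```python
def intersect_errors(row_err, col_err):
    """row_err[i] is truthy when row i's error detection test failed;
    col_err[j] likewise for column j. A cell is recorded as possibly
    erroneous only when both its row and its column contain errors
    (illustrative sketch of the sub-array row/column intersection)."""
    return [[1 if row_err[i] and col_err[j] else 0
             for j in range(len(col_err))]
            for i in range(len(row_err))]

# 3x3 sub-array: row 1 and column 2 failed their tests,
# so only cell (1, 2) is flagged.
Q_sub = intersect_errors([0, 1, 0], [0, 0, 1])
# -> [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
```

This is conservative: a flagged cell may still be healthy (its row and column errors may come from different cells), which is why the text says the cell "has the possibility of including the error".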
  • this device embodiment corresponds to the foregoing method embodiments; therefore, for the parts not described in detail, reference may be made to the foregoing method embodiments, and details are not repeated here.
  • FIG. 9 is a schematic structural diagram of a neural network training device 900 according to an embodiment of the present application.
  • the neural network training device is applied to a Resistive Random Access Memory (RRAM), and the device includes:
  • the processor 910 is configured to input the neuron input values <ri1, ri2...rin> of the rth layer in the neural network into the RRAM and calculate on those input values according to a filter in the RRAM, obtaining the neuron output values <ro1, ro2...rom> of the rth layer in the neural network, where n is a positive integer greater than 0 and m is a positive integer greater than 0.
  • the processor 910 is further configured to calculate, from the kernel value of the RRAM, the neuron input values <ri1, ri2...rin> of the rth layer in the neural network, the neuron output values <ro1, ro2...rom> of the rth layer, and the back propagation error values <B1, B2...Bm> of the rth layer, the back propagation update values <C1, C2...Cm> of the rth layer in the neural network.
  • the kernel value of the RRAM is the matrix value of a filter in the RRAM, and the back propagation error values <B1, B2...Bm> of the rth layer in the neural network are obtained from the neuron output values <ro1, ro2...rom> of the rth layer of the neural network and the neuron reference output values <rt1, rt2...rtm> of the rth layer of the neural network.
  • a comparator 920 is configured to compare the back propagation update values <C1, C2...Cm> of the rth layer in the neural network with a preset threshold; when the back propagation update values <C1, C2...Cm> of the rth layer are greater than the preset threshold, the filter in the RRAM is updated by the processor 910 according to the back propagation update values <C1, C2...Cm> of the rth layer in the neural network.
  • the comparator 920 here may be part of the RRAM or may be provided independently of the RRAM; this is not limited here.
  • this solution gates the update operation in neural network training with a preset threshold: the update is performed only when the update value is greater than the preset threshold. Because the magnitude of each weight update in neural network training is generally not large, this greatly reduces the erase operations imposed on the RRAM by the many update operations in training, prolonging the service life of the RRAM; with fewer update operations, the RRAM is also less likely to develop hard errors, ensuring its reliability.
  • the processor 910 is further configured to generate the preset threshold, where the preset threshold includes a static threshold or a dynamic threshold. Specifically, the static threshold means that the processor 910 sets the preset thresholds of all layers in the neural network to the same value; the dynamic threshold means that the processor 910 sets the preset thresholds of different layers in the neural network to entirely or partially different values.
  • the comparator 920 includes the preset threshold, the preset threshold being a static threshold; the static threshold means that the processor 910 sets the preset thresholds of all layers in the neural network to the same value.
  • the processor 910 sets the preset thresholds of different layers in the neural network to entirely or partially different values, which specifically includes: the processor 910 sets the value of the preset threshold to decrease layer by layer from the back of the neural network to the front; or sets the preset threshold of a front layer in the neural network to be smaller than the preset threshold of a rear layer, where a front layer of the neural network is a layer close to the input layer (specifically the 1st layer to the Xth layer), and a rear layer is a layer close to the output layer (specifically the (R-X)th layer to the Rth layer), R being the total number of layers of the neural network and X being greater than 1 and less than R.
  • the embodiment of the present invention may also set entirely or partially different thresholds for different layers of the neural network, referred to as a dynamic threshold. Because the error transmitted through the neural network decreases rapidly from back to front, being attenuated layer by layer during backpropagation, the back propagation update values become smaller after a certain number of layers. Layers close to the input layer of the neural network are therefore more sensitive to updates, so their preset thresholds take smaller values, while layers close to the output layer are relatively insensitive to updates, so their preset thresholds take larger values. Setting different or partially different thresholds for different layers in this way allows updates to be applied more selectively, further protecting the service life of the RRAM.
  • the processor 910 is further configured to receive a hard error distribution map of the RRAM and perform data rearrangement on the neural network according to the hard error distribution map of the RRAM, where the hard error distribution map is obtained by error-testing the RRAM using peripheral circuitry of the processor 910.
  • the processor 910 is specifically configured to arrange the sparse data of the neural network onto the stuck-at-0 hard error regions of the RRAM according to the hard error distribution map of the RRAM.
  • the error test can be implemented as a program, or as a test circuit carried on the RRAM.
  • the above error test logic need not be executed in every iteration of the neural network. It may be performed before training, outputting a hard error distribution map of the RRAM so that the processor 910 can rearrange the neural network according to that map. It may also be performed after a certain number of iterations, because hard errors in the RRAM may continue to appear during training; the RRAM is therefore error-tested periodically and the hard error distribution map at that time is output, after which the processor 910 rearranges the data of the neural network according to the map. This helps make full use of the stuck-at-0 hard error cells in the RRAM, allocating the sparse data of the neural network as much as possible to the stuck-at-0 error regions and increasing the efficiency of the neural network on the RRAM.
  • the hard error distribution map of the RRAM received by the processor 910 is obtained by an error test circuit, which is specifically configured to write test values to the respective cells in the RRAM and compare the test value of each cell with the value actually read back from that cell, obtaining the hard error condition of each cell in the RRAM; the hard error conditions of all cells constitute the hard error distribution map of the RRAM. The error test circuit can exist independently of the processor 910 and be electrically coupled to the processor 910.
  • a computer readable storage medium is also provided which, when run on a server or terminal, causes the server or terminal to perform the neural network training method described in any of the above method embodiments.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division; in actual implementation there may be another division manner. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or module, and may be electrical, mechanical or otherwise.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • if the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer readable storage medium.
  • the technical solutions of the present application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product, which is stored in a storage medium and includes several
  • instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Abstract

一种神经网络训练方法,应用于阻变存储器RRAM上,通过设置预设阈值减少神经网络训练中的更新操作,从而延长RRAM的使用寿命。该方法包括:将神经网络中第r层的神经元输入值输入到RRAM中,根据所述RRAM中的过滤器对神经元输入值进行计算,得到所述神经网络中第r层的神经元输出值(301),根据所述RRAM的核值、所述神经网络中第r层的神经元输入值、所述神经网络中第r层的神经元输出值以及所述神经网络中第r层的反向传播误差值进行计算,得到所述神经网络中第r层的反向传播更新值(302),将所述神经网络中第r层的反向传播更新值与预设阈值进行比较,当所述神经网络中第r层的反向传播更新值大于所述预设阈值(303),根据所述神经网络中第r层的反向传播更新值对所述RRAM中的过滤器进行更新(304)。

Description

一种神经网络训练方法和装置 技术领域
本申请涉及数据处理领域，并且更具体地，涉及一种神经网络训练方法和装置。
背景技术
神经网络(如深度神经网络)在计算机视觉、自然语言处理、大数据挖掘等领域得到广泛应用。神经网络计算具有如下两个典型特点:
1)计算密集
神经网络主要进行的运算为多维矩阵乘法，其计算复杂度一般为O(N³)，即完成对N个数据的操作需要耗费的时间跟N的三次方成正比。例如，22层的googlenet(谷歌网络，一种神经元网络结构，由谷歌的研究者提出)一般需要6GFLOPS(Floating-point Operations Per Second，每秒所执行的浮点运算)的计算量。因此对计算硬件和性能优化提出了较高要求。
2)访存密集
首先,神经网络的训练过程往往需要依赖海量的训练数据,如imagenet 2012包含1400万幅图片;其次,神经网络包含上亿级的神经元的连接参数,尤其在训练过程中需要频繁更新;再者,神经网络在运算过程中会产生大量的中间结果,如梯度信息。训练数据、连接权重、中间结果等大量数据的访存开销对于数据存储结构和计算性能优化提出迫切要求。
新兴的RRAM器件(阻变式存储器,Resistive Random Access Memory)被认为是提升神经网络计算能效的器件之一。首先,RRAM是一种非易失性的存储器,且具备较高的集成密度,相比闪存FLASH设备有更高的存取速度,且耗电量更低,更合适于进行靠近处理器的数据存取,从而十分适合应用于手机终端等设备中的非易失性的数据存储介质。再者,RRAM存储单元利用阻值可变特性能够表征多值,而非传统存储单元的0和1二值。基于RRAM以上这些特性,通过RRAM组建一种交叉阵列结构,如图1利用RRAM单元的交叉阵列结构所示,非常适应神经网络本身的矩阵向量乘运算。
通过使用RRAM的模拟电路形式，可以快速的实现数字电路中的乘加工作。例如，矩阵运算C=AxB中，产生某一列的n个目标数据，所对应的计算复杂度为O(n²)，上述目标数据可以认为是矩阵A的所有数据乘以矩阵B的对应的列数据获得的。假设矩阵A为nxn大小，矩阵B的一列为nx1大小，则C矩阵(结果矩阵)的这一个对应列的n个目标数据，每一个元素的获得都需要一次n次乘加，总共需要nxn次计算。而在RRAM的计算中，可以通过DAC(Digital to Analog Converter，数字/模拟转换器)和ADC(Analog to Digital Converter，模拟/数字转换器)的协同作用，将此过程转换到基于RRAM的模拟电路计算的过程中。现有工作表明，利用RRAM存储单元搭建的交叉阵列结构对神经网络计算进行加速，同CPU或者GPU相比可提升100-1000倍的能效。
然而RRAM的应用受到RRAM寿命的制约，即RRAM中单个单元的寿命受可擦写次数的限制，当单个单元的阻抗改变次数达到一定次数之后，阻抗的改变能力将变弱，从而影响RRAM的寿命，并产生错误。尤其是，在神经网络训练过程中，通常有大数据量的训练集进行大量的突触和神经元强度的训练，这个过程中将有大量的更新操作，从而产生大量的擦写任务，直接影响RRAM的使用寿命，并且，频繁的更新操作也会大幅增加硬错误的潜在发生可能性，从而影响RRAM的可靠性，其中，硬错误指的是RRAM中的阻抗已经不能够完成改变，从而在电路上永远呈现为断路(阻抗无穷大)或短路(阻抗无穷小)。对应的数值即为Stuck-at-0，即恒为0错误，或者Stuck-at-1，即恒为1错误。
发明内容
本申请提供一种神经网络训练的方法和装置,以提高进行神经网络训练的RRAM的使用寿命。
第一方面,提供一种神经网络训练方法,该方法应用于阻变存储器(Resistive Random Access Memory,RRAM)上,包括:将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中,根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算,得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>,其中,n为大于0的正整数,m为大于0的正整数;根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算,得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>;其中,所述RRAM的核值为所述RRAM中的过滤器的矩阵值,所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的;将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较,当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
本方案通过设置预设阈值对神经网络训练中的更新操作进行判断,当该更新值大于该预设阈值时才执行该更新操作,由于在神经网络训练中权值更新的幅度整体上并不太大,因此该方案可以大大减少由神经网络训练中大量更新操作带给RRAM的擦写操作,从而延长了RRAM的使用寿命,也由于更少的更新操作,减少了RRAM发生更多硬错误的可能性,从而保障了RRAM的可靠性。
结合第一方面,在第一方面的某些实现方式中,所述预设阈值为静态阈值或动态阈值;当所述预设阈值为静态阈值时,所述神经网络中所有层级的预设阈值的值相同,当所述预设阈值为动态阈值时,所述神经网络中不同层级的预设阈值的值不同或部分不同。
本实现方式中的静态阈值为神经网络的更新操作提供了一个固定的比较阈值,即该神经网络中每一层的反向传播更新值都与该静态阈值进行比较,减少很多该静态阈值以下的擦写操作,从而延长RRAM的使用寿命,本实现方式中的动态阈值为神经网络中的不同层提供不同或部分不同的阈值,这是由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
结合第一方面,在第一方面的某些实现方式中,对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图,根据所述RRAM的硬错误分布图对所述神经网络进行数据重排。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,根据对RRAM进行错误测试得到RRAM的硬错误分布图,并进一步根据该硬错误分布图对神经网络进行数据重排,从而降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
结合第一方面,在第一方面的某些实现方式中,所述根据所述RRAM的硬错误分布图对所述神经网络进行数据重排包括:将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域,可以有效降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
结合第一方面,在第一方面的某些实现方式中,所述对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图包括:对所述RRAM中各个单元分别写入测试值,将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较,得到所述RRAM中各个单元的硬错误情况,所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图。
结合第一方面,在第一方面的某些实现方式中,所述当所述预设阈值为动态阈值时,所述神经网络中不同层级的预设阈值的值不同或部分不同,具体的,所述神经网络中不同层级的预设阈值的值不同包括:所述预设阈值的值随所述神经网络层级的由后至前逐层递减;所述神经网络中不同层级的预设阈值的值部分不同包括:所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
第二方面，提供一种神经网络训练装置，该装置应用于阻变存储器(Resistive Random Access Memory，RRAM)上，所述装置包括：前向计算模块，用于将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算，得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>，其中，n为大于0的正整数，m为大于0的正整数；反向计算模块，用于根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算，得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>；其中，所述RRAM的核值为所述RRAM中的过滤器的矩阵值，所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的；比较模块，用于将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较；更新模块，用于当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值，则根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
本方案通过设置预设阈值对神经网络训练中的更新操作进行判断,当该更新值大于该预设阈值时才执行该更新操作,由于在神经网络训练中权值更新的幅度整体上并不太大,因此该方案可以大大减少由神经网络训练中大量更新操作带给RRAM的擦写操作,从而延长了RRAM的使用寿命。
结合第二方面,在第二方面的某些实现方式中,所述装置还包括阈值生成模块,用于生成所述预设阈值,所述预设阈值包括静态阈值或动态阈值;具体的,所述静态阈值是指:所述阈值生成模块将所述神经网络中所有层级的预设阈值的值设置为相同;所述动态阈值是指:所述阈值生成模块将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同。
本实现方式中的静态阈值为神经网络的更新操作提供了一个固定的比较阈值,即该神经网络中每一层的反向传播更新值都与该静态阈值进行比较,减少很多该静态阈值以下的擦写操作,从而延长RRAM的使用寿命,本实现方式中的动态阈值为神经网络中的不同层提供不同或部分不同的阈值,这是由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
结合第二方面,在第二方面的某些实现方式中,所述比较模块中包含所述预设阈值,所述预设阈值为静态阈值,具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同。
结合第二方面,在第二方面的某些实现方式中,所述装置还包括错误测试模块和重排模块,所述错误测试模块用于对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图给所述重排模块,所述重排模块用于根据所述RRAM的硬错误分布图对所述神经网络进行数据重排。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,根据对RRAM进行错误测试得到RRAM的硬错误分布图,并进一步根据该硬错误分布图对神经网络进行数据重排,从而降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
结合第二方面,在第二方面的某些实现方式中,所述重排模块具体用于:将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域,可以有效降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
结合第二方面，在第二方面的某些实现方式中，所述错误测试模块具体用于：对所述RRAM中各个单元分别写入测试值，将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较，得到所述RRAM中各个单元的硬错误情况，所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图。
结合第二方面,在第二方面的某些实现方式中,所述阈值生成模块将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同,具体包括:所述阈值生成模块设置所述预设阈值的值随所述神经网络层级由后至前逐层递减;或设置所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
第三方面，提供一种神经网络训练装置，应用于阻变存储器(Resistive Random Access Memory，RRAM)上，所述装置包括：处理器，用于将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算，得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>，其中，n为大于0的正整数，m为大于0的正整数；并根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算，得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>；其中，所述RRAM的核值为所述RRAM中的过滤器的矩阵值，所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的；比较器，用于将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较，当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值，则由所述处理器根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
本方案通过设置预设阈值对神经网络训练中的更新操作进行判断,当该更新值大于该预设阈值时才执行该更新操作,由于在神经网络训练中权值更新的幅度整体上并不太大,因此该方案可以大大减少由神经网络训练中大量更新操作带给RRAM的擦写操作,从而延长了RRAM的使用寿命。
结合第三方面,在第三方面的某些实现方式中,所述处理器还用于生成所述预设阈值,所述预设阈值包括静态阈值或动态阈值;具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同;所述动态阈值是指:所述处理器将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同。
本实现方式中的静态阈值为神经网络的更新操作提供了一个固定的比较阈值，即该神经网络中每一层的反向传播更新值都与该静态阈值进行比较，减少很多该静态阈值以下的擦写操作，从而延长RRAM的使用寿命，本实现方式中的动态阈值为神经网络中的不同层提供不同或部分不同的阈值，这是由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同，为不同层的神经网络层设置不同或部分不同的阈值，可以更有针对性的进行更新操作，从而进一步保证了RRAM的使用寿命。
结合第三方面,在第三方面的某些实现方式中,所述比较器中包含所述预设阈值,所述预设阈值为静态阈值,具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同。
结合第三方面,在第三方面的某些实现方式中,所述处理器还用于接收RRAM的硬错误分布图,并根据所述RRAM的硬错误分布图对所述神经网络进行数据重排,其中,所述硬错误分布图由所述处理器的周边电路对所述RRAM进行错误测试得到。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,根据对RRAM进行错误测试得到RRAM的硬错误分布图,并进一步根据该硬错误分布图对神经网络进行数据重排,从而降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
结合第三方面,在第三方面的某些实现方式中,所述处理器具体用于根据所述RRAM的硬错误分布图将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域,可以有效降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
结合第三方面,在第三方面的某些实现方式中,所述处理器接收到的RRAM的硬错误分布图,具体是通过错误测试电路得到的,所述错误测试电路具体用于:对所述RRAM中各个单元分别写入测试值,将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较,得到所述RRAM中各个单元的硬错误情况,所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图,其中,所述错误测试电路独立于处理器存在,并与所述处理器电连接。
结合第三方面,在第三方面的某些实现方式中,所述处理器将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同,具体包括:所述处理器设置所述预设阈值的值随所述神经网络层级由后至前逐层递减;或设置所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
第四方面,提供一种计算机可读存储介质,包括指令,其特征在于,当其在服务器或终端上运行时,使得所述服务器或所述终端执行如权利要求1-6中任一项所述的神经网络训练方法。
附图说明
图1是RRAM单元的交叉阵列结构图。
图2是神经元工作原理图。
图3是本申请一个实施例的神经网络训练方法流程图。
图4是本申请一个实施例的神经网络训练装置结构示意图。
图5是本申请实施例的更新逻辑示意图。
图6是本申请另一个实施例的神经网络训练装置结构示意图。
图7是本申请又一个实施例的神经网络训练装置结构示意图。
图8是本申请错误检测模块的工作示意图。
图9是本申请又一个实施例的神经网络训练装置结构示意图。
具体实施方式
为了便于理解,先对神经网络及RRAM进行相关介绍。
神经网络(NNs)或称作连接模型(Connection Model),它是一种模仿动物神经网络行为特征,进行分布式并行信息处理的算法数学模型。这种网络依靠系统的复杂程度,通过调节内部大量节点之间相互连接的关系,从而达到处理信息的目的。
RRAM,阻变式存储器(Resistive Random Access Memory),可显著提高耐久性和数据传输速度的可擦写内存技术。RRAM是一种“根据施加在金属氧化物(Metal Oxide)上的电压的不同,使材料的电阻在高阻态和低阻态间发生相应变化,从而开启或阻断电流流动通道,并利用这种性质储存各种信息的内存”。
新兴的RRAM器件被认为是提升神经网络计算能效的器件之一。首先,RRAM是一种非易失性的存储器,且具备较高的集成密度,相比闪存FLASH设备有更高的存取速度,且耗电量更低,更合适于进行靠近处理器的数据存取,从而十分适合应用于手机终端等设备中的非易失性的数据存储介质。再者,RRAM存储单元利用阻值可变特性能够表征多值,而非传统存储单元的0和1二值。基于RRAM以上这些特性,通过RRAM组建一种交叉阵列结构,如图1利用RRAM单元的交叉阵列结构所示,非常适应神经网络本身的矩阵向量乘运算。
其中,交叉阵列(crossbar或xbar)结构是指具有行列交叉的一种结构。如图1所示,每个交叉节点设置有NVM(下称交叉节点为NVM节点),用于存储数据和计算。由于神经网络层的计算主要以向量-矩阵乘法,或矩阵-矩阵乘法为主,因此,交叉阵列很适合用于神经网络计算。具体的,如图1所示,假设待计算向量的维度为n,待计算向量中的n个元素分别通过数字信号D1至Dn表示。然后,通过模拟/数字转换器(Digital to Analog Converter,DAC)将数字信号D1至Dn转换成模拟信号V1至Vn,此时,待计算向量中的n个元素分别通过模拟信号V1至Vn表示。接着,将该模拟信号V1至Vn分别输入至交叉阵列的n行。交叉阵列中的每列的NVM节点的电导值代表的是该NVM节点存储的权值的大小,因此,当模拟信号V1至Vn作用在每列对应的NVM节点上之后,每个NVM节点输出的电流值表示该NVM节点存储的权值与该NVM节点接收到的模拟信号所表示的数据元素的乘积。由于交叉阵列的每列对应一个核(kernel)向量,因 此,每列的输出电流总和代表的是该列对应的核与待计算向量对应的子矩阵的矩阵乘积的运算结果。然后,如图1所示,通过交叉阵列每列末尾的模拟/数字转换器(Analog to Digital Converter,ADC)将矩阵乘积的运算结果从模拟量转换成数字量进行输出。基于以上工作原理可以看出,交叉阵列将矩阵-矩阵乘法转换成两个向量(待计算向量和核向量)的乘法运算,并能够基于模拟计算快速得到计算结果,非常适于处理向量-矩阵乘法或矩阵-矩阵乘法等运算。由于神经网络中90%以上的运算均为此类运算,因此,交叉阵列非常适合作为神经网络中的计算单元。
又如图2所示,其表示了一个由a1,a2,…,an神经元的输入到出口t的一个信息传递。其中,传递的方式为,每个神经元产生的信号强度,通过突触的传递力度向出口方向汇集传递。在数学上可以表示为:向量<a1,a2,…,an>与突触的权值向量<w1,w2,…,wn>的乘加。这里的向量<a1,a2,…,an>相当于上文所说的待计算向量,这里的突触的传递力度,即突触的权值向量<w1,w2,…,wn>相当于上文所说的核向量。综上,这样的乘加比较容易采用图1中的交叉阵列来实现。即左方DAC端汇入神经元强度<a1,…an>,交叉阵列中储存kernel,第一列纵向为<w1,…,wn>,则在ADC上输出SUM(乘加)的结果。其中,b表示偏差,用于将神经元输入固定的调高或者调低一个值。
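上述交叉阵列在数值上完成的乘加，可以用如下Python草图示意。该草图仅表达计算关系，不模拟DAC/ADC等模拟电路细节；函数名crossbar_mac为本文假设，并非本申请限定的实现：

```python
import numpy as np

def crossbar_mac(a, kernels, b):
    """示意：交叉阵列完成的乘加。a为神经元强度向量<a1,...,an>，
    kernels为n x m矩阵，每一列对应一个核向量<w1,...,wn>(电导值)，
    b为各列偏差。每列在ADC端输出SUM(ai*wi)+b，
    即输入向量与该列核向量的乘加结果。"""
    return a @ kernels + b
```

例如输入<1,2>、核矩阵取单位阵、偏差取<1,1>时，两列ADC端分别读出2和3。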
然而，RRAM的应用受到RRAM寿命的制约，RRAM与FLASH设备类似，单个单元的寿命受可擦写次数的限制，单个单元的阻抗改变次数到达一定次数后，阻抗改变的能力将变弱，影响寿命，产生错误。且频繁的更新操作也会大幅增加硬错误的潜在发生可能性，从而影响RRAM的可靠性。
在现有工作中,大多数情况下只是将RRAM交叉阵列用于神经网络的前向计算,即,将在其他设备上训练完的神经网络参数,包含神经元和突触的强度(权值),载入RRAM交叉阵列。这种方法主要使用RRAM的低功耗、计算与存储融合、高速矩阵乘的特性来增加神经网络在推理(inference)过程中的应用。如果更进一步,把RRAM的低功耗、计算存储融合、高速矩阵乘的特性应用到神经网络的训练(Training)的过程中,就会受到上述的软错误、硬错误的影响。这是由于训练过程和推理过程的不同引起的:推理过程只包含前向计算,包含大量的矩阵乘;训练过程包含前向计算、反向传播和参数更新,其中的前向和反向,包含大量的矩阵乘,而参数更新则需要根据反向传播的结果,对神经元和突触的强度进行更新。由于训练过程中,可以认为是针对训练集,通常是大数据量的训练集进行大量的突触和神经元强度的训练,这个过程中将有大量的更新操作。更新操作带来的频繁的写操作会大幅度影响RRAM单元的可靠性,增加软错误和硬错误潜在的发生可能性。
为了解决上述问题,下面结合图3,详细描述本申请实施例的一种神经网络训练方法,该方法应用于阻变存储器(Resistive Random Access Memory,RRAM)上,具体的,该方法应用于阻变存储器的交叉阵列(CROSSBAR)结构上。
图3是本申请实施例提供的一种神经网络训练方法的流程图，包括：
301、将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中,根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算,得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>,其中,n为大于0的正整数,m为大于0的正整数。
我们将RRAM交叉阵列中的每个节点视为一个单元(cell)，该单元用于数据的储存和计算，将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，所述根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算具体可以为：由RRAM构成的神经网络加速器对该神经元输入值<ri1、ri2……rin>进行计算，具体的，该神经网络加速器包括多个过滤器，这些过滤器形成过滤器组，该过滤器组(filters)表现为一个矩阵，其中每一列由一个过滤器，也称为核向量表示。因此，根据所述RRAM中的过滤器对神经元输入值进行计算，具体表现为将神经元输入值向量与上述核向量相乘加，乘加的结果由RRAM中的ADC进行转换，从而得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>。
值得介绍的是,交叉阵列是RRAM的一种。RRAM的范围较广,近年来,交叉阵列是用的比较多的一种神经网络加速器实现的载体。其中,交叉阵列用作神经网络加速器是基于它的RRAM特性,即基于阻抗的存储方式。
303、根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算,得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>;其中,所述RRAM的核值为所述RRAM中的过滤器的矩阵值,所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的。
如图3所述的b步骤所示，针对所述神经网络的第r层，将神经元输入值<ri1、ri2……rin>与RRAM的核值(图中表示为<W1、W2……Wn>)、神经元输出值<ro1、ro2……rom>以及反向传播误差值<B1、B2……Bm>进行乘积运算，得到该层的反向传播更新值<C1、C2……Cm>。值得说明的是，神经元输入值的个数与神经元输出值的个数不一定相同，因此用n和m分开表示，n与m的取值范围均为大于0的正整数，n可以等于m，但更多的情况下，n与m并不相同。
305、将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较,当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则
307、根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
本方案通过设置预设阈值对神经网络训练中的更新操作进行判断,当该更新值大于该预设阈值时才执行该更新操作,由于在神经网络训练中权值更新的幅度整体上并不太大,因此该方案可以大大减少由神经网络训练中大量更新操作带给RRAM的擦写操作,从而延长了RRAM的使用寿命,也由于更少的更新操作,减少了RRAM发生更多硬错误的可能性,从而保障了RRAM的可靠性。
可选地,在一些实施例中,所述预设阈值为静态阈值或动态阈值;当所述预设阈值为静态阈值时,所述神经网络中所有层级的预设阈值的值相同,当所述预设阈值为动态阈值时,所述神经网络中不同层级的预设阈值的值不同或部分不同。
当采用静态阈值时，比较直接的做法是取一个较小的通用的阈值，例如0.01。当更新值不超过0.01的时候，忽略更新。当采用动态阈值时，分为两种情况，一种是对神经网络中不同层取不同阈值，另一种是对神经网络中部分层取一个阈值，另一部分层取另一个值，当然，此处的另一部分也可以是另几部分，这样的话将神经网络不同层级分为不同部分，每个部分对应的阈值取值均不相同，这样做是考虑到神经网络中的误差传递导致的各层级更新敏感度不同。
本实现方式中的静态阈值为神经网络的更新操作提供了一个固定的比较阈值,即该神经网络中每一层的反向传播更新值都与该静态阈值进行比较,减少很多该静态阈值以下的擦写操作,从而延长RRAM的使用寿命,本实现方式中的动态阈值为神经网络中的不同层提供不同或部分不同的阈值,这是由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
具体的,在一些实施例中,所述当所述预设阈值为动态阈值时,所述神经网络中不同层级的预设阈值的值不同或部分不同,具体的,所述神经网络中不同层级的预设阈值的值不同包括:所述预设阈值的值随所述神经网络层级的由后至前逐层递减;所述神经网络中不同层级的预设阈值的值部分不同包括:所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
这样做是由于神经网络的误差传递自后向前会迅速变小,因此靠近神经网络输入层的层级对更新较为敏感,因此其预设阈值可以取较小阈值,而靠近神经网络输出层的层级对更新相对不敏感,因此其预设阈值可以取较大阈值。
综上,由于神经网络的误差传递使得神经网络中不同层网络的更新敏感度不同,为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
可选地,在一些实施例中,对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图,根据所述RRAM的硬错误分布图对所述神经网络进行数据重排。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,根据对RRAM进行错误测试得到RRAM的硬错误分布图,并进一步根据该硬错误分布图对神经网络进行数据重排,从而降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
可选地,在一些实施例中,所述根据所述RRAM的硬错误分布图对所述神经网络进行数据重排包括:将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
由于RRAM的应用受很多非易失性存储的自然特性的制约,比如产生较多硬错误,又由于神经网络具有稀疏性,因此在本实现方式中,将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域,可以有效降低RRAM硬错误对神经网络训练精度的影响,或增加神经网络在有一定不良单元的RRAM上的使用率。
值得说明的是，上述的错误测试并不需要在每次神经网络迭代的时候都进行，其可以是在神经网络训练之前进行，输出RRAM的硬错误分布图给本发明实施例以使得本发明实施例根据该硬错误分布图对神经网络进行重排，其也可以在进行了一定次数迭代之后进行，这是由于在神经网络训练的过程中，RRAM的硬错误有可能会持续产生，这样训练一段时间即对该RRAM进行阶段性的错误测试并输出当时的RRAM的硬错误分布图，再根据这个RRAM的硬错误分布图对神经网络进行数据重排，这样有利于充分的利用RRAM中恒为0的硬错误单元，使神经网络中的稀疏数据尽可能的分配到这个恒为0的错误区域，从而增加神经网络在RRAM中的使用效率。
可选地,在一些实施例中,所述对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图包括:对所述RRAM中各个单元分别写入测试值,将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较,得到所述RRAM中各个单元的硬错误情况,所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图。
下面对本申请的装置实施例进行描述,装置实施例与方法实施例对应,因此未详细描述的部分可以参见前面各方法实施例,同样的,在方法实施例中未详细展开的内容也可参见装置实施例的描述,不再重复赘述。
图4是本申请实施例的一种神经网络训练装置结构示意图。该神经网络训练装置应用于阻变存储器(Resistive Random Access Memory,RRAM)上,所述装置包括:
前向计算模块，用于将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算，得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>，其中，n为大于0的正整数，m为大于0的正整数。
反向计算模块,用于根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算,得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>;其中,所述RRAM的核值为所述RRAM中的过滤器的矩阵值,所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的。
比较模块,用于将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较。
更新模块,用于当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
值得说明的是,此处的比较模块可以作为RRAM的一部分,也可以独立于RRAM设置,此处不做限制。
本方案通过设置预设阈值对神经网络训练中的更新操作进行判断,当该更新值大于该预设阈值时才执行该更新操作,由于在神经网络训练中权值更新的幅度整体上并不太大,因此该方案可以大大减少由神经网络训练中大量更新操作带给RRAM的擦写操作,从而延长了RRAM的使用寿命,也由于更少的更新操作,减少了RRAM发生更多硬错误的可能性,从而保障了RRAM的可靠性。
可选地,在一些实施例中,所述装置还包括阈值生成模块,用于生成所述预设阈值,所述预设阈值包括静态阈值或动态阈值;具体的,所述静态阈值是指:所述阈值生成模块将所述神经网络中所有层级的预设阈值的值设置为相同;所述动态阈值是指:所述阈值生成模块将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同。
可选地,在一些实施例中,所述比较模块中包含所述预设阈值,所述预设阈值为静态阈值,具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同。
可选地,在一些实施例中,所述阈值生成模块将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同,具体包括:所述阈值生成模块设置所述预设阈值的值随所述神经网络层级由后至前逐层递减;或设置所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
图5体现了本发明实施例的更新逻辑，即通过反向计算模块计算出反向传播更新值<C1、C2……Cm>(图中用C表示向量)，该反向传播更新值C进入比较模块，与阈值生成模块的预设阈值进行比较，当所述反向传播更新值小于预设阈值时，写入使能信号置0，则该反向传播更新值不被写入RRAM交叉阵列；当所述反向传播更新值大于预设阈值时，写入使能信号置1，该反向传播更新值写入RRAM交叉阵列。基于这个逻辑，一些过小的反向传播更新将不被即时的反映到RRAM阵列中去，仅有足够显著的反向传播更新值才会得到反映，这样的结果是得到一个近似的神经网络训练，在下一次迭代的过程中，这个近似的神经网络训练将产生比非近似的神经网络更大的反向传播误差值<B1、B2……Bm>(图中用B表示向量)，而这个更大的反向传播误差值B将在反向计算的过程中倾向性的产生一个更大的反向传播更新值C，当这个反向传播更新值C超过所述预设阈值时，则可以在下一次迭代中反映到RRAM交叉阵列中。通过这样的方式，可以认为这种近似的方法累计了多次的反向传播更新值，而这种累计的方式，是通过神经网络的训练过程本身的主要通路实现的。本发明实施例在更新的时候加入特殊的单元(如图5所示的更新逻辑)对部分更新进行阻止，而整个神经网络训练的其他过程并不受影响。
并且,本发明实施例除了可以设置静态阈值,还可以为神经网络的不同网络层设置不同或部分不同的阈值,上面称为动态阈值,由于神经网络的误差传递自后向前会迅速变小,在反向传播的过程中相当于对误差逐层取微分,因此超过一定层数后反向传播更新值就变得很小,因此对于靠近神经网络输入层的层级对于更新更为敏感,可以对其对应的预设阈值取较小阈值,而靠近神经网络输出层的层级对更新相对不敏感,因此对其对应的预设阈值取较大阈值。这样为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
可选地,在一些实施例中,所述装置还包括错误测试模块和重排模块,所述错误测试模块用于对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图给所述重排模块,所述重排模块用于根据所述RRAM的硬错误分布图对所述神经网络进行数据重排。
可选地,在一些实施例中,所述重排模块具体用于:将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
如图6所示，前向计算模块、反向计算模块、比较模块及更新模块完成一次或多次神经网络训练的迭代后，错误测试模块可以对所述RRAM进行错误测试，并由重排模块根据测试输出的RRAM的硬错误分布图对所述神经网络进行重排。这里需要说明的是，错误测试模块和重排模块不一定也不需要在每次神经网络训练迭代中都执行其测错和重排的工作，错误测试的工作可以在神经网络训练前，即离线的时候提前进行，也可以在进行了一定次数迭代之后进行，这是由于在神经网络训练的过程中，RRAM的硬错误有可能会持续产生，这样训练一段时间即对该RRAM进行阶段性的错误测试并输出当时的RRAM的硬错误分布图，再由重排模块根据这个RRAM的硬错误分布图对神经网络进行数据重排，这样有利于充分的利用RRAM中恒为0的硬错误单元，使神经网络中的稀疏数据尽可能的分配到这个恒为0的错误区域，从而增加神经网络在RRAM中的使用效率。
具体的,可以如图7所示,重排模块接受神经网络迭代完成或者部分完成得到的神经网络参数W’输入(该神经网络参数W’具体由图4中的更新模块输出,该W’为经过C向量对W向量进行更新后的结果,多列W’向量构成P矩阵),具体如图7中的待重排神经网络权值内存区(假设为P)所示。另外一个输入为该RRAM的硬错误分布图(假设为Q)。该模块的主要作用为对权值矩阵P和错误分布矩阵Q进行匹配,以P的行为粒度,将P中为0(或者近似接近0)的部分,最优化的交换到Q中恒为0的部分。重排的结果为待写入神经网络权值内存区(假设为P’)。本发明中P’是由P经过行交换获得的。
对于交换,图7给出了一个交换的例子。如图7中的错误测试模块所示,假设该RRAM是一个6x6的交叉阵列,则Q为一个6x6的矩阵。错误测试模块执行完RRAM的错误测试之后,Q中得到了3个带有恒为0错误的单元,分别为第一行的第5列元素,第二行的第6列元素,和第四行的第3列元素。
如图7中的重排模块所示,P是一个示例的重排前的权值矩阵(即神经网络的权值数据,由更新后的神经网络权值W’构成),其中为0的单元实际为近似接近0的值,与其他例如0.003这样的值相比,属于更高阶的小值,在图中用0表示。可以看到,P中的大约有2/3的数值为0。如果直接将P写入RRAM,则P中的不为0的元素,例如第一行第5列的0.009和第四行第3列的0.003,会落入到Q中的恒为0的错误单元上。
如图7中的重排模块所示,重排过程中,根据Q中的恒为0的单元的分布情况,重排模块将P中的第一行和第二行进行了交换,第5行和第6行进行了交换,得到了更新后的P’,在P’中,Q中恒为0的3个单元格(1,5),(2,6),(4,3)(注:格式为行,列),在P’中相应的位置都对应为数值0,对后续的计算不产生影响。
在本发明实施例中,具体可以采用软件的方法进行重排逻辑的实现。具体的执行可以使用通用处理器进行完成。错误测试模块得到的硬错误分布图Q可以写入到RRAM的外围电路的专用区域中,和神经网络更新后的权值区域一起送入到通用处理器中进行重排。
在重排模块中,其重排逻辑的出发点在于神经网络具有稀疏性。大多数数据都近似的接近0。在具有稀疏性的神经网络中,能够较简单的获得和恒为0的单元的匹配结果。
另外,需要说明的是,由于RRAM的硬错误包括恒为0的硬错误和恒为1的硬错误,由于恒为1的错误远少于恒为0的错误,因此在本发明的实施例中,重点关注恒为0的硬错误,对于恒为1的错误可以采取整行跳过的方式进行处理,或者也可以采取别的方式进行处理,本发明实施例对此不做限制。
可选地，在一些实施例中，所述错误测试模块具体用于：对所述RRAM中各个单元分别写入测试值，将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较，得到所述RRAM中各个单元的硬错误情况，所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图。
图8示出了错误检测模块的工作原理：恒为0、恒为1的错误需要通过修改RRAM阻值并与理想修改结果进行比较来检测。因此，在检错前，需先读取并记录全部RRAM的当前阻值。为提升检错效率，本发明实施例可以使用分块检错的方式，首先将原始RRAM阵列分成若干个互不相交的子矩阵，并对每一个子矩阵整体进行检错。
如图8展示了阵列大小为5×5的示例性检错电路(该阵列是一个检错子阵列,具体可以是前文或前图中交叉阵列(下文称原始交叉阵列)中的部分阵列,由于总的交叉阵列通常比较大,因此,错误检测模块进行检错的时候可以依次对检错子阵列进行错误检测),在进行恒为0,恒为1的检错时,首先对待检错的子阵列中的所有RRAM单元写入一个最小偏差,该偏差的大小由RRAM器件的阻值精度决定。为避免器件阻值的写入饱和,在检错恒为0错误时,需写入减小电阻(增大电导)的最小偏差;在检错恒为1错误时,需写入增大电阻(减小电导)的最小偏差。之后,对检错子阵列的输入接口(该输入接口如图8左端所示)施加检错电压,可以在该检错子阵列的输出接口(该输出接口如图8下端ADC所示)得到写入偏差后的计算结果,经过模数转换作为待比较的信号。同时,对同一列的原始计算结果增加相同数量的最小偏差,可以得到无错误情况下的理想计算结果,用作待比较的参考信号。将每一列的实际结果(即待比较的信号)和理想结果(即待比较的参考信号)进行比较,若二者相同,则认为该列不包含错误;若二者不同,则认为该列包含错误。
RRAM器件具有双极性,允许施加反向偏置电压。因此,可以在检错子阵列的输出接口施加检错电压,并将图8所示检错电路连接在检错子阵列的输入接口,从而进行RRAM行的检错。最终综合子阵列的行检错结果与列检错结果,如果某一单元对应的行和列都包含错误,则认为该单元具有包含错误的可能性,并将其记录在图7所示的错误分布图Q中。
假设原始交叉阵列的尺寸为M×N,检错子阵列的尺寸为S×T。由于同一行的子阵列可以同时进行列检错,同一列的子阵列可以同时进行行检错,相比于原始的逐个检错的方法,该方法能够有效的降低检错复杂度。
下面对本申请的另一装置实施例进行描述,该装置实施例与上面的各方法实施例及装置实施例对应,因此未详细描述的部分可以参见前面各方法实施例与装置实施例,不再重复赘述。
图9是本申请实施例的一种神经网络训练装置900结构示意图。该神经网络训练装置应用于阻变存储器(Resistive Random Access Memory,RRAM)上,所述装置包括:
处理器910，用于将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算，得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>，其中，n为大于0的正整数，m为大于0的正整数；并根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算，得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>；其中，所述RRAM的核值为所述RRAM中的过滤器的矩阵值，所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的。
比较器920,用于将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较,当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则
由所述处理器910根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
值得说明的是,此处的比较器920可以作为RRAM的一部分,也可以独立于RRAM设置,此处不做限制。
本方案通过设置预设阈值对神经网络训练中的更新操作进行判断,当该更新值大于该预设阈值时才执行该更新操作,由于在神经网络训练中权值更新的幅度整体上并不太大,因此该方案可以大大减少由神经网络训练中大量更新操作带给RRAM的擦写操作,从而延长了RRAM的使用寿命,也由于更少的更新操作,减少了RRAM发生更多硬错误的可能性,从而保障了RRAM的可靠性。
可选地,在一些实施例中,所述处理器910还用于生成所述预设阈值,所述预设阈值包括静态阈值或动态阈值;具体的,所述静态阈值是指:所述处理器910将所述神经网络中所有层级的预设阈值的值设置为相同;所述动态阈值是指:所述处理器910将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同。
可选地,在一些实施例中,所述比较器920中包含所述预设阈值,所述预设阈值为静态阈值,具体的,所述静态阈值是指:所述处理器910将所述神经网络中所有层级的预设阈值的值设置为相同。
可选地,在一些实施例中,所述处理器910将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同,具体包括:所述处理器910设置所述预设阈值的值随所述神经网络层级由后至前逐层递减;或设置所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
本发明实施例除了可以设置静态阈值,还可以为神经网络的不同网络层设置不同或部分不同的阈值,上面称为动态阈值,由于神经网络的误差传递自后向前会迅速变小,在反向传播的过程中相当于对误差逐层取微分,因此超过一定层数后反向传播更新值就变得很小,因此对于靠近神经网络输入层的层级对于更新更为敏感,可以对其对应的预设阈值取较小阈值,而靠近神经网络输出层的层级对更新相对不敏感,因此对其对应的预设阈值取较大阈值。这样为不同层的神经网络层设置不同或部分不同的阈值,可以更有针对性的进行更新操作,从而进一步保证了RRAM的使用寿命。
可选地,在一些实施例中,所述处理器910还用于接收RRAM的硬错误分布图,并根据所述RRAM的硬错误分布图对所述神经网络进行数据重排,其中,所述硬错误分布图由所述处理器910的周边电路对所述RRAM进行错误测试得到。
可选地,在一些实施例中,所述处理器910具体用于根据所述RRAM的硬错误分布图将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
值得说明的是，该错误测试可以是一个程序，或一个测试电路，搭载在所述RRAM上实现。
值得说明的是,上述的错误测试逻辑并不需要在每次神经网络迭代的时候都进行,其可以是在神经网络训练之前进行,输出RRAM的硬错误分布图给本发明实施例以使得本发明实施例中的处理器910根据该硬错误分布图对神经网络进行重排,其也可以在进行了一定次数迭代之后进行,这是由于在神经网络训练的过程中,RRAM的硬错误有可能会持续产生,这样训练一段时间即对该RRAM进行阶段性的错误测试并输出当时的RRAM的硬错误分布图,再由处理器910根据这个RRAM的硬错误分布图对神经网络进行数据重排,这样有利于充分的利用RRAM中恒为0的硬错误单元,使神经网络中的稀疏数据尽可能的分配到这个恒为0的错误区域,从而增加神经网络在RRAM中的使用效率。
可选地,在一些实施例中,所述处理器910接收到的RRAM的硬错误分布图,具体是通过错误测试电路得到的,所述错误测试电路具体用于:对所述RRAM中各个单元分别写入测试值,将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较,得到所述RRAM中各个单元的硬错误情况,所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图,其中,所述错误测试电路可以独立于处理器910存在,并与所述处理器910电连接。
一种计算机可读存储介质,当其在服务器或终端上运行时,使得所述服务器或所述终端执行如上述方法实施例中任一个实施例所述的神经网络训练方法。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或模块的间接耦合或通信连接,可以是电性,机械或其它的形式。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以权利要求的保护范围为准。

Claims (20)

  1. 一种神经网络训练方法,其特征在于,应用于阻变存储器(Resistive Random Access Memory,RRAM)上,所述方法包括:
    将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中,根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算,得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>,其中,n为大于0的正整数,m为大于0的正整数;
    根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算,得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>;其中,所述RRAM的核值为所述RRAM中的过滤器的矩阵值,所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的;
    将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较,当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则
    根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
  2. 如权利要求1所述的方法,其特征在于,所述预设阈值为静态阈值或动态阈值;当所述预设阈值为静态阈值时,所述神经网络中所有层级的预设阈值的值相同,当所述预设阈值为动态阈值时,所述神经网络中不同层级的预设阈值的值不同或部分不同。
  3. 如权利要求1或2所述的方法,其特征在于,对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图,根据所述RRAM的硬错误分布图对所述神经网络进行数据重排。
  4. 如权利要求3所述的方法,其特征在于,所述根据所述RRAM的硬错误分布图对所述神经网络进行数据重排包括:将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
  5. 如权利要求3或4所述的方法,其特征在于,所述对所述RRAM进行错误测试,输出所述RRAM的硬错误分布图包括:
    对所述RRAM中各个单元分别写入测试值,将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较,得到所述RRAM中各个单元的硬错误情况,所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图。
  6. 如权利要求2至5任一所述的方法,其特征在于,所述当所述预设阈值为动态阈值时,所述神经网络中不同层级的预设阈值的值不同或部分不同,具体的,
    所述神经网络中不同层级的预设阈值的值不同包括:所述预设阈值的值随所述神经网络层级的由后至前逐层递减;
    所述神经网络中不同层级的预设阈值的值部分不同包括:所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
  7. 一种神经网络训练装置,其特征在于,应用于阻变存储器(Resistive Random Access Memory,RRAM)上,所述装置包括:
    前向计算模块，用于将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算，得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>，其中，n为大于0的正整数，m为大于0的正整数；
    反向计算模块,用于根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算,得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>;其中,所述RRAM的核值为所述RRAM中的过滤器的矩阵值,所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的;
    比较模块，用于将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较；
    更新模块,用于当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
  8. 如权利要求7所述的装置,其特征在于,所述装置还包括阈值生成模块,用于生成所述预设阈值,所述预设阈值包括静态阈值或动态阈值;具体的,所述静态阈值是指:所述阈值生成模块将所述神经网络中所有层级的预设阈值的值设置为相同;所述动态阈值是指:所述阈值生成模块将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同。
  9. 如权利要求7所述的装置,其特征在于,所述比较模块中包含所述预设阈值,所述预设阈值为静态阈值,具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同。
  10. 如权利要求7至9中任一项所述的装置，其特征在于，所述装置还包括错误测试模块和重排模块，所述错误测试模块用于对所述RRAM进行错误测试，输出所述RRAM的硬错误分布图给所述重排模块，所述重排模块用于根据所述RRAM的硬错误分布图对所述神经网络进行数据重排。
  11. 如权利要求10所述的装置,其特征在于,所述重排模块具体用于:将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
  12. 如权利要求10或11所述的装置,其特征在于,所述错误测试模块具体用于:对所述RRAM中各个单元分别写入测试值,将所述各个单元的测试值与所述各个单元的实际读出值分别进行比较,得到所述RRAM中各个单元的硬错误情况,所述RRAM中各个单元的硬错误情况构成所述RRAM的硬错误分布图。
  13. 如权利要求8、10至12任一所述的装置,其特征在于,所述阈值生成模块将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同,具体包括:
    所述阈值生成模块设置所述预设阈值的值随所述神经网络层级由后至前逐层递减;或设置所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
  14. 一种神经网络训练装置,其特征在于,应用于阻变存储器(Resistive Random Access Memory,RRAM)上,所述装置包括:
    处理器，用于将神经网络中第r层的神经元输入值<ri1、ri2……rin>输入到所述RRAM中，根据所述RRAM中的过滤器(filter)对所述神经元输入值<ri1、ri2……rin>进行计算，得到所述神经网络中第r层的神经元输出值<ro1、ro2……rom>，其中，n为大于0的正整数，m为大于0的正整数；并根据所述RRAM的核值、所述神经网络中第r层的神经元输入值<ri1、ri2……rin>、所述神经网络中第r层的神经元输出值<ro1、ro2……rom>以及所述神经网络中第r层的反向传播误差值<B1、B2……Bm>进行计算，得到所述神经网络中第r层的反向传播更新值<C1、C2……Cm>；其中，所述RRAM的核值为所述RRAM中的过滤器的矩阵值，所述神经网络中第r层的反向传播误差值<B1、B2……Bm>是根据所述神经网络第r层的神经元输出值<ro1、ro2……rom>和所述神经网络第r层的神经元参考输出值<rt1、rt2……rtm>得到的；
    比较器,用于将所述神经网络中第r层的反向传播更新值<C1、C2……Cm>与预设阈值进行比较,当所述神经网络中第r层的反向传播更新值<C1、C2……Cm>大于所述预设阈值,则
    由所述处理器根据所述神经网络中第r层的反向传播更新值<C1、C2……Cm>对所述RRAM中的过滤器进行更新。
  15. 如权利要求14所述的装置,其特征在于,所述处理器还用于生成所述预设阈值,所述预设阈值包括静态阈值或动态阈值;具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同;所述动态阈值是指:所述处理器将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同。
  16. 如权利要求14所述的装置,其特征在于,所述比较器中包含所述预设阈值,所述预设阈值为静态阈值,具体的,所述静态阈值是指:所述处理器将所述神经网络中所有层级的预设阈值的值设置为相同。
  17. 如权利要求14至16任一所述的装置,其特征在于,所述处理器还用于接收RRAM的硬错误分布图,并根据所述RRAM的硬错误分布图对所述神经网络进行数据重排,其中,所述硬错误分布图由所述处理器的周边电路对所述RRAM进行错误测试得到。
  18. 如权利要求17所述的装置，其特征在于，所述处理器具体用于根据所述RRAM的硬错误分布图将所述神经网络的稀疏数据排布到所述RRAM上恒为0的硬错误区域。
  19. 如权利要求14、15、17或18任一所述的装置,其特征在于,所述处理器将所述神经网络中不同层级的预设阈值的值设置为不同或部分不同,具体包括:
    所述处理器设置所述预设阈值的值随所述神经网络层级由后至前逐层递减;或设置所述神经网络中靠前层级的预设阈值的值小于所述神经网络中靠后层级的预设阈值的值,其中,所述神经网络中靠前层级为靠近所述神经网络输入层的层级,具体为第1层至第X层;所述神经网络中靠后层级为靠近所述神经网络输出层的层级,具体为第R-X层至第R层,其中,所述R为所述神经网络的总层数,X大于1且小于R。
  20. 一种计算机可读存储介质,包括指令,其特征在于,当其在服务器或终端上运行时,使得所述服务器或所述终端执行如权利要求1-6中任一项所述的神经网络训练方法。
PCT/CN2018/091033 2017-06-16 2018-06-13 一种神经网络训练方法和装置 WO2018228424A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18818039.2A EP3627401B1 (en) 2017-06-16 2018-06-13 Method and device for training neural network
US16/714,011 US11475300B2 (en) 2017-06-16 2019-12-13 Neural network training method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710459806.0 2017-06-16
CN201710459806.0A CN109146073B (zh) 2017-06-16 2017-06-16 一种神经网络训练方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/714,011 Continuation US11475300B2 (en) 2017-06-16 2019-12-13 Neural network training method and apparatus

Publications (1)

Publication Number Publication Date
WO2018228424A1 true WO2018228424A1 (zh) 2018-12-20

Family

ID=64659809

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/091033 WO2018228424A1 (zh) 2017-06-16 2018-06-13 一种神经网络训练方法和装置

Country Status (4)

Country Link
US (1) US11475300B2 (zh)
EP (1) EP3627401B1 (zh)
CN (1) CN109146073B (zh)
WO (1) WO2018228424A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918237A (zh) * 2019-04-01 2019-06-21 北京中科寒武纪科技有限公司 异常网络层确定方法及相关产品
WO2020260849A1 (en) * 2019-06-25 2020-12-30 Arm Limited Non-volatile memory-based compact mixed-signal multiply-accumulate engine

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6708146B2 (ja) * 2017-03-03 2020-06-10 株式会社デンソー ニューラルネットワーク回路
US11880769B2 (en) * 2018-11-14 2024-01-23 Advanced Micro Devices, Inc. Using multiple functional blocks for training neural networks
SG11202108799QA (en) * 2019-02-26 2021-09-29 Lightmatter Inc Hybrid analog-digital matrix processors
US11402233B2 (en) * 2019-07-23 2022-08-02 Mapsted Corp. Maintaining a trained neural network in magnetic fingerprint based indoor navigation
CN110515454B (zh) * 2019-07-24 2021-07-06 电子科技大学 一种基于内存计算的神经网络架构电子皮肤
KR20210024865A (ko) * 2019-08-26 2021-03-08 삼성전자주식회사 데이터를 처리하는 방법 및 장치
CN111126596B (zh) * 2019-12-17 2021-03-19 百度在线网络技术(北京)有限公司 神经网络训练中的信息处理方法、设备与存储介质
US11507443B2 (en) * 2020-04-10 2022-11-22 Micron Technology, Inc. Memory fault map for an accelerated neural network
US20220058489A1 (en) * 2020-08-19 2022-02-24 The Toronto-Dominion Bank Two-headed attention fused autoencoder for context-aware recommendation
CN113613252B (zh) * 2021-07-14 2023-11-07 上海德衡数据科技有限公司 基于5g的网络安全的分析方法及系统
CN115564036B (zh) * 2022-10-25 2023-06-30 厦门半导体工业技术研发有限公司 基于rram器件的神经网络阵列电路及其设计方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120011089A1 (en) * 2010-07-08 2012-01-12 Qualcomm Incorporated Methods and systems for neural processor training by encouragement of correct output
CN104376362A (zh) * 2014-11-21 2015-02-25 北京大学 用于人工神经网络的突触器件和人工神经网络
CN105303252A (zh) * 2015-10-12 2016-02-03 国家计算机网络与信息安全管理中心 基于遗传算法的多阶段神经网络模型训练方法
CN106530210A (zh) * 2016-10-31 2017-03-22 北京大学 基于阻变存储器件阵列实现并行卷积计算的设备和方法

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342780B2 (en) * 2010-07-30 2016-05-17 Hewlett Packard Enterprise Development Lp Systems and methods for modeling binary synapses
KR101818671B1 (ko) * 2011-04-19 2018-02-28 삼성전자주식회사 불휘발성 메모리 장치, 불휘발성 메모리 시스템 및 그것의 랜덤 데이터 읽기 방법
NL2010887C2 (en) * 2013-05-29 2014-12-02 Univ Delft Tech Memristor.
CN103810497B (zh) * 2014-01-26 2017-04-19 华中科技大学 一种基于忆阻器的图像识别系统及方法
US9563373B2 (en) * 2014-10-21 2017-02-07 International Business Machines Corporation Detecting error count deviations for non-volatile memory blocks for advanced non-volatile memory block management
CN104463324A (zh) * 2014-11-21 2015-03-25 Changsha Masha Electronic Technology Co., Ltd. Convolutional neural network parallel processing method based on a large-scale high-performance cluster
CN105224986B (zh) * 2015-09-29 2018-01-23 Tsinghua University Deep neural network system based on memristive devices
CN106201651A (zh) * 2016-06-27 2016-12-07 Innovation Center of Yinzhou Zhejiang Tsinghua Yangtze River Delta Research Institute Simulator for neuromorphic chips
CN106650922B (zh) * 2016-09-29 2019-05-03 Tsinghua University Hardware neural network conversion method, computing device, and software-hardware cooperation system
CN106599586B (zh) * 2016-12-19 2019-05-28 Beijing Guoneng Zhongdian Energy Conservation and Environmental Protection Technology Co., Ltd. Neural-network-based SCR intelligent ammonia injection optimization method and apparatus
CN106845634B (zh) * 2016-12-28 2018-12-14 Huazhong University of Science and Technology Neuron circuit based on memristive devices
US11550709B2 (en) * 2019-04-03 2023-01-10 Macronix International Co., Ltd. Memory device and wear leveling method for the same

Non-Patent Citations (1)

Title
See also references of EP3627401A4 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN109918237A (zh) * 2019-04-01 2019-06-21 Beijing Zhongke Cambricon Technology Co., Ltd. Abnormal network layer determination method and related product
CN109918237B (zh) * 2019-04-01 2022-12-09 Cambricon Technologies Corporation Limited Abnormal network layer determination method and related product
WO2020260849A1 (en) * 2019-06-25 2020-12-30 Arm Limited Non-volatile memory-based compact mixed-signal multiply-accumulate engine

Also Published As

Publication number Publication date
CN109146073A (zh) 2019-01-04
EP3627401B1 (en) 2022-09-28
EP3627401A1 (en) 2020-03-25
EP3627401A4 (en) 2020-07-29
CN109146073B (zh) 2022-05-24
US11475300B2 (en) 2022-10-18
US20200117997A1 (en) 2020-04-16

Similar Documents

Publication Publication Date Title
WO2018228424A1 (zh) Neural network training method and apparatus
CN111279366B (zh) Training of artificial neural networks
US10692570B2 (en) Neural network matrix multiplication in memory cells
JP6995131B2 (ja) 抵抗型処理ユニットアレイ、抵抗型処理ユニットアレイを形成する方法およびヒステリシス動作のための方法
US11409438B2 (en) Peripheral circuit and system supporting RRAM-based neural network training
US9715655B2 (en) Method and apparatus for performing close-loop programming of resistive memory devices in crossbar array based hardware circuits and systems
CN109800876B (zh) Data operation method for a NOR-Flash-module-based neural network
US20200012924A1 (en) Pipelining to improve neural network inference accuracy
Zhou et al. Noisy machines: Understanding noisy neural networks and enhancing robustness to analog hardware errors using distillation
US20190325291A1 (en) Resistive processing unit with multiple weight readers
US10340002B1 (en) In-cell differential read-out circuitry for reading signed weight values in resistive processing unit architecture
CN111478703B (zh) Processing circuit based on a memristive crossbar array and output current compensation method
Schuman et al. Resilience and robustness of spiking neural networks for neuromorphic systems
US20210383203A1 (en) Apparatus and method with neural network
Eldebiky et al. Correctnet: Robustness enhancement of analog in-memory computing for neural networks by error suppression and compensation
Yang et al. Essence: Exploiting structured stochastic gradient pruning for endurance-aware reram-based in-memory training systems
CN111461308B (zh) Memristive neural network and weight training method
CN115796252A (zh) Weight writing method and apparatus, electronic device, and storage medium
de Lima et al. Quantization-aware in-situ training for reliable and accurate edge ai
CN115699028A (zh) Efficient tile mapping of row-by-row convolutional neural network mapping for analog artificial intelligence network inference
Chen et al. A Multifault‐Tolerant Training Scheme for Nonideal Memristive Neural Networks
Zhao et al. Intra-array non-idealities modeling and algorithm optimization for RRAM-based computing-in-memory applications
US20230298663A1 (en) Neural network based method and device
CN116523011B (zh) Memristor-based binary neural network layer circuit and binary neural network training method
Nie et al. Cross-layer designs against non-ideal effects in ReRAM-based processing-in-memory system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18818039; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2018818039; Country of ref document: EP; Effective date: 20191220)