WO2023217017A1 - Variational inference method and device for a Bayesian neural network based on a memristor array - Google Patents

Variational inference method and device for a Bayesian neural network based on a memristor array

Info

Publication number
WO2023217017A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
memristor
neural network
bayesian neural
memristor array
Prior art date
Application number
PCT/CN2023/092447
Other languages
English (en)
French (fr)
Inventor
吴华强 (Huaqiang Wu)
林钰登 (Yudeng Lin)
高滨 (Bin Gao)
唐建石 (Jianshi Tang)
张清天 (Qingtian Zhang)
钱鹤 (He Qian)
Original Assignee
清华大学 (Tsinghua University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University (清华大学)
Publication of WO2023217017A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • Embodiments of the present disclosure relate to a variational inference method and a variational inference device for a Bayesian neural network based on a memristor array.
  • Variational inference is a deterministic approximate inference method. Approximate inference is simply a method used to approximate a computationally complex distribution or at least obtain some statistics of the target distribution. In Bayesian statistics, all inference problems for unknown quantities can be regarded as the calculation of posterior probability (posterior), and this probability is usually difficult to calculate.
  • the Markov chain Monte Carlo algorithm can be used for approximation. However, for a large amount of data, the Markov chain Monte Carlo algorithm is slow to calculate, and variational reasoning provides people with a faster and simpler approximate reasoning method suitable for large amounts of data.
  • At least one embodiment of the present disclosure provides a variational inference method for a Bayesian neural network based on a memristor array.
  • the memristor array includes a plurality of memristors arranged in an array.
  • the weight matrix obtained by training the Bayesian neural network is mapped to the memristor array.
  • the method includes: for the memristor array mapped according to the weight matrix of the Bayesian neural network, obtain the conductance states of multiple memristors of the current memristor array.
  • the variational inference method provided by at least one embodiment of the present disclosure further includes: in response to a need to continue training, continuing training using the updated conductance states of the memristors of the memristor array used for the weight matrix of the Bayesian neural network; otherwise, stopping training.
  • N memristors are used in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network, N being an integer greater than or equal to 2; each weight sampling sample among the multiple weight sampling samples of each weight is computed from the read currents of the N memristors, where both offset and scale are hyperparameters and Current_n is the current value obtained by read-sampling the n-th memristor among the N memristors, 1 ≤ n ≤ N.
  • using N memristors in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network includes: using N memristors for each weight to realize the distribution corresponding to the weight; for the random probability distribution corresponding to the weight, N conductance values are calculated, and the N conductance values are mapped to the N memristors respectively.
  • the current value obtained by read-sampling the target memristor obeys a probability distribution with the conductance state θ of the target memristor as the mean and S as the scale factor.
  • the loss function includes a KL loss term and a likelihood loss term
  • the KL loss term is obtained from the multiple weight sampling samples of each weight, and the likelihood loss term is calculated from the multiple output results.
  • the KL loss term is calculated by calculating the mean and standard deviation of multiple weight sampling samples for each weight.
  • At least one embodiment of the present disclosure also provides a variational inference device for a Bayesian neural network based on a memristor array.
  • the memristor array includes a plurality of memristors arranged in an array.
  • the weight matrix obtained by training the Bayesian neural network is mapped to the memristor array.
  • the device includes: an acquisition unit configured to obtain, for the memristor array mapped according to the weight matrix of the Bayesian neural network, the conductance states of multiple memristors of the current memristor array;
  • a forward propagation unit configured to obtain a plurality of weight sampling samples for each weight of the Bayesian neural network and to perform multiple forward propagations using the Bayesian neural network to obtain multiple output results;
  • the computing unit is configured to obtain the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight, and to backpropagate the loss function to obtain a gradient of the conductance state of the memristor for each weight of the Bayesian neural network in the memristor array; an update unit is configured to update, according to the gradient, the conductance states of the memristors used for the Bayesian neural network in the memristor array.
  • Figure 1 shows a schematic flow chart of a variational inference method for a memristor array-based Bayesian neural network provided by at least one embodiment of the present disclosure
  • Figure 2A shows a schematic structure of a memristor array
  • Figure 2B is a schematic diagram of a memristor device
  • Figure 2C is a schematic diagram of another memristor device
  • Figure 2D shows a schematic diagram of mapping the weight matrix of a Bayesian neural network to a memristor array
  • Figure 3 shows a schematic flow chart of a variational inference method provided by at least one embodiment of the present disclosure
  • FIG. 4 shows a schematic block diagram of a variational inference device for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure.
  • Bayesian Neural Network is a probabilistic model that places neural networks in a Bayesian framework and can describe complex random patterns. In order to account for parameter uncertainty, it is best to be able to construct a Bayesian model. Under the Bayesian model, parameters are not represented by single values but by probability distributions. Given observational data, the distribution of all parameters of the Bayesian model is called the posterior distribution. As an analogue of deriving optimal deterministic models via gradient-based updates, Bayesian machine learning aims to learn an approximation of the posterior distribution. To this end, the inventor's previous Chinese patent application publication CN110956256A describes a method and device for implementing a Bayesian neural network using the intrinsic noise of a memristor, which is hereby cited in its entirety as part of this application.
  • At least one embodiment of the present disclosure provides a variational inference method for a Bayesian neural network based on a memristor array.
  • the memristor array includes a plurality of memristors arranged in an array.
  • the weight matrix obtained by training the Bayesian neural network is mapped to the memristor array.
  • the method includes: for the memristor array mapped according to the weight matrix of the Bayesian neural network, obtaining the conductance states of multiple memristors of the current memristor array;
  • obtaining multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network, and performing multiple forward propagations using the memristor array to obtain multiple output results; obtaining the loss value of the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight; backpropagating the loss function to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array;
  • updating, according to the gradient, the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array.
  • the variational inference method provided by the above embodiments of the present disclosure uses the stochastic characteristics of memristors to learn an approximation of the posterior distribution, so that the neural network under the Bayesian framework has the ability to estimate uncertainty, and improves the computational accuracy of the training and learning algorithm.
  • At least one embodiment of the present disclosure also provides a variational inference device corresponding to the above variational inference method.
  • the posterior distribution of the weights w given the training data D is P(w|D), which can be inferred with the Bayesian neural network
  • the predictive distribution of the unknown label of a test datum can be obtained by taking the expectation under the posterior distribution of the weights
  • for each possible weight w, weighting is performed according to the posterior distribution P(w|D)
  • variational methods can be used to approximate the posterior distribution of the weights of a Bayesian neural network.
  • Variational learning finds the parameters θ of the weight distribution q(w|θ) such that the KL divergence between this distribution and the true posterior distribution is minimized.
  • KL divergence also known as relative entropy or information divergence, is an asymmetric measure of the difference between two probability distributions.
  • the parameter ⁇ is the mean parameter of the conductance state of the memristor (for example, exhibiting Laplace distribution), that is, finding the conductance state ⁇ of the memristor to minimize the KL divergence, that is:
  • the cost function in formula (1) is the sum of a data-dependent term (the likelihood cost) and a prior-dependent term (the complexity cost). It therefore embodies a trade-off between satisfying the complexity of the data and satisfying the simplicity of the prior. It is not possible to calculate the minimum value of this cost function exactly; instead, gradient descent or some approximation method is used. Under certain conditions, the derivative of an expectation can be expressed as the expectation of the derivative, and equation (1) can be written as:
  • the memristor's read noise sampling can be used to estimate the expectation.
  • the problem of computing the gradient of an expectation can thus be transformed into computing the gradient and then estimating it with the memristor's read-noise samples.
  • a backpropagation-like variational inference algorithm for the memristor array-based Bayesian neural network is thereby obtained.
  • w^(i) represents the i-th sampled value of the memristor weights according to the posterior distribution q(w^(i)|θ).
  • the inventors propose a variational inference method for a Bayesian neural network based on a memristor array according to the embodiments of the present disclosure.
  • FIG. 1 shows a schematic flowchart of a variational inference method for a memristor array-based Bayesian neural network provided by at least one embodiment of the present disclosure.
  • a memristor array includes a plurality of memristors arranged in an array; for example, a trained weight matrix of a Bayesian neural network is mapped to the memristor array.
  • the structure of the Bayesian neural network includes a fully connected structure or a convolutional neural network structure.
  • Each weight of this Bayesian neural network is a random variable.
  • each weight is a distribution, such as Gaussian distribution or Laplace distribution.
  • the Bayesian neural network can be trained offline to obtain the weight matrix.
  • the method of training the Bayesian neural network can follow conventional methods.
  • for example, a central processing unit (CPU), a graphics processing unit (GPU), a neural network processing unit (NPU), a neural network accelerator, or the like can be used for training, which will not be described in detail here.
  • FIG. 2A shows a schematic structure of a memristor array.
  • the memristor array is composed of, for example, multiple memristor units.
  • the multiple memristor units form an array of M rows and N columns, where M and N are both positive integers.
  • Each memristor cell includes a switching element and one or more memristors.
  • WL<1>, WL<2>, ..., WL<M> respectively represent the word lines of the first row, the second row, ..., the M-th row; the control electrode of the switching element (such as the gate of a transistor) in the memristor unit circuit of each row is connected to the word line corresponding to that row.
  • BL<1>, BL<2>, ..., BL<N> respectively represent the bit lines of the first column, the second column, ..., the N-th column; the memristor in the memristor unit circuit of each column is connected to the bit line corresponding to that column.
  • SL<1>, SL<2>, ..., SL<M> respectively represent the source lines of the first row, the second row, ..., the M-th row; the source of the transistor in the memristor unit circuit of each row is connected to the source line corresponding to that row. According to Kirchhoff's law, by setting the states (such as the resistance values) of the memristor units and applying corresponding word line signals and bit line signals to the word lines and bit lines, the above memristor array can complete multiply-accumulate calculations in parallel.
  • FIG. 2B is a schematic diagram of a memristor device, which includes a memristor array and its peripheral driving circuit.
  • the memristor device includes a signal acquisition device, a word line driving circuit, a bit line driving circuit, a source line driving circuit, a memristor array, and a data output circuit.
  • the signal acquisition device is configured to convert a digital signal into a plurality of analog signals through a digital to analog converter (DAC), so as to be input to a plurality of column signal input terminals of the memristor array.
  • a memristor array includes M source lines, M word lines, and N bit lines, and a plurality of memristor cells arranged in M rows and N columns.
  • the operation of the memristor array is implemented through a word line driving circuit, a bit line driving circuit and a source line driving circuit.
  • the word line driving circuit includes multiple multiplexers (Mux) for switching the word line input voltage; the bit line driving circuit includes multiple multiplexers for switching the bit line input voltage; the source line driving circuit also includes multiple multiplexers (Mux) for switching the source line input voltage.
  • the source line driving circuit also includes multiple ADCs for converting analog signals into digital signals; in addition, a trans-impedance amplifier (TIA) may be arranged between the Mux and the ADC in the source line driving circuit to perform current-to-voltage conversion for the ADC.
  • a memristor array has an operating mode and a computation mode.
  • when the memristor array is in the operating mode, the memristor units are in an initialized state, and the values of the parameter elements in the parameter matrix can be written into the memristor array.
  • the source line input voltage, bit line input voltage and word line input voltage of the memristor are switched to corresponding preset voltage ranges through a multiplexer.
  • the word line input voltage is switched to the corresponding voltage range through the control signal WL_sw[1:M] of the multiplexer in the word line driving circuit in FIG. 2B.
  • for example, when performing a set operation on the memristor, the word line input voltage is set to 2 V (volts); when performing a reset operation on the memristor, the word line input voltage is set to 5 V; for example, the word line input voltage can be obtained from the voltage signal V_WL[1:M] in Figure 2B.
  • the source line input voltage is switched to the corresponding voltage range through the control signal SL_sw[1:M] of the multiplexer in the source line driving circuit in FIG. 2B.
  • for example, when performing a set operation on the memristor, the source line input voltage is set to 0 V; when performing a reset operation on the memristor, the source line input voltage is set to 2 V; for example, the source line input voltage can be obtained from the voltage signal V_SL[1:M] in Figure 2B.
  • the bit line input voltage is switched to the corresponding voltage range through the control signal BL_sw[1:N] of the multiplexer in the bit line driving circuit in FIG. 2B.
  • for example, when performing a set operation on the memristor, the bit line input voltage is set to 2 V; when performing a reset operation on the memristor, the bit line input voltage is set to 0 V; for example, the bit line input voltage can be obtained from the DAC in Figure 2B.
  • when the memristor array is in the computation mode, the memristors in the memristor array are in conductance states usable for computation, and the bit line input voltage applied to the column signal input terminals does not change the conductance values of the memristors; for example, the computation can be completed by performing multiply-accumulate operations with the memristor array.
  • the word line input voltage is switched to the corresponding voltage range through the control signal WL_sw[1:M] of the multiplexers in the word line driving circuit in Figure 2B.
  • for example, when a turn-on signal is applied, the word line input voltage of the corresponding row is set to 5 V; when no turn-on signal is applied, the word line input voltage of the corresponding row is set to 0 V, for example connected to the GND signal; the source line input voltage is switched to the corresponding voltage range through the control signal SL_sw[1:M] of the multiplexers in the source line driving circuit in Figure 2B, for example set to 0 V, so that the current signals from the multiple row signal output terminals can flow into the data output circuit.
  • the control signal BL_sw[1:N] of the multiplexers in the bit line driving circuit in Figure 2B switches the bit line input voltage to the corresponding voltage range, for example setting the bit line input voltage to 0.1 V-0.3 V, so that multiply-accumulate operations are performed with the memristor array.
  • the data output circuit can include multiple trans-impedance amplifiers (TIAs) and ADCs, which convert the current signal at each row signal output terminal into a voltage signal and then into a digital signal for subsequent processing.
  • Figure 2C is a schematic diagram of another memristor device.
  • the structure of the memristor device shown in FIG. 2C is basically the same as that of the memristor device shown in FIG. 2B, and also includes a memristor array and its peripheral driving circuit.
  • the memristor device includes a signal acquisition device, a word line driving circuit, a bit line driving circuit, a source line driving circuit, a memristor array, and a data output circuit.
  • a memristor array includes M source lines, 2M word lines, and 2N bit lines, and a plurality of memristor cells arranged in M rows and N columns.
  • each memristor unit has a 2T2R structure, and mapping of positive and negative values can be achieved through this 2T2R structure. The operation of mapping the parameter matrix used for transformation processing to multiple different memristor units in the memristor array will not be described again here.
  • the memristor array may also include M source lines, M word lines and 2N bit lines, and a plurality of memristor units arranged in M rows and N columns.
  • Figure 2D shows the process of mapping the weight matrix of the Bayesian neural network to the memristor array.
  • Memristor arrays are used to implement the weight matrix between layers in the Bayesian neural network.
  • N memristors are used for each weight to implement the distribution corresponding to the weight.
  • N is an integer greater than or equal to 2.
  • N conductance values are calculated, and the N conductance values are respectively mapped to the N memristors. In this way, the weight matrix in the Bayesian neural network is converted into target conductance values and mapped into the crossbar of the memristor array.
  • the left side of the figure is a three-layer Bayesian neural network, which includes three neuron layers connected one by one.
  • the input layer includes layer 1 neurons
  • the hidden layer includes layer 2 neurons
  • the output layer includes layer 3 neurons.
  • the input layer passes the received input data to the hidden layer
  • the hidden layer performs calculations and transformations on the input data and sends it to the output layer
  • the output layer outputs the output result of the Bayesian neural network.
  • the input layer, hidden layer and output layer all include multiple neuron nodes, and the number of neuron nodes in each layer can be set according to different application situations.
  • the number of neurons in the input layer is 2 (including N 1 and N 2 )
  • the number of neurons in the middle hidden layer is 3 (including N 3 , N 4 and N 5 )
  • the number of neurons in the output layer is 1 (including N 6 ).
  • the weight matrix is implemented by a memristor array as shown on the right side of Figure 2D.
  • the weight parameters can be programmed directly to the conductance of the memristor array.
  • the weight parameters can also be mapped to the conductance of the memristor array according to a certain rule.
  • the difference in conductance of two memristors can also be used to represent a weight parameter.
  • the structure of the memristor array on the right side in FIG. 2D is, for example, as shown in FIG. 2A .
  • the memristor array may include a plurality of memristors arranged in an array.
  • the weight connecting the input N1 and the output N3 is implemented by three memristors (G 11 , G 12 , G 13 ), and other weights in the weight matrix can be implemented in the same way.
  • source line SL 1 corresponds to neuron N 3
  • source line SL 2 corresponds to neuron N 4
  • source line SL 5 corresponds to neuron N 5
  • bit lines BL 1 , BL 2 and BL 3 correspond to neuron N 1
  • a weight between the input layer and the hidden layer is converted into three target conductance values according to the distribution, and these are mapped into the crossbar of the memristor array; here the target conductance values are G11, G12, and G13, respectively, outlined with a dashed box in the memristor array, as illustrated in the sketch below.
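  • As an illustration of this mapping step, the following minimal sketch (not part of the original text; the even split, the conductance window, and the linear rescaling rule are assumptions) converts one trained weight mean into N = 3 target conductances:

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4     # assumed programmable conductance window (S)
W_MIN, W_MAX = -1.0, 1.0      # assumed range of trained weight means

def weight_to_conductances(mu, n_devices=3):
    """Split one trained Bayesian weight (mean mu) into N target conductances.
    The patent only states that N conductance values are computed from the
    weight's probability distribution and mapped to N memristors; the even
    split and the linear rescaling used here are illustrative assumptions."""
    per_device = mu / n_devices                              # even split over the group
    frac = (per_device - W_MIN / n_devices) / ((W_MAX - W_MIN) / n_devices)
    g = G_MIN + frac * (G_MAX - G_MIN)                       # linear map into the window
    return np.clip(np.full(n_devices, g), G_MIN, G_MAX)

# Example: the weight between neurons N1 and N3 becomes the targets G11, G12, G13.
G11, G12, G13 = weight_to_conductances(0.42)
```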
  • the variational inference method includes the following steps S101 to S105.
  • Step S101 For the memristor array mapped according to the weight matrix of the Bayesian neural network, obtain the conductance states of multiple memristors of the current memristor array.
  • Step S102 Obtain multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network, and use the memristor array to perform multiple forward propagations to obtain multiple output results.
  • N memristors are used in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network, and N is an integer greater than or equal to 2.
  • each weight sampling sample among the multiple weight sampling samples of each weight is computed from the read currents of the N memristors together with the hyperparameters offset and scale.
  • Current_n is the current value obtained by read-sampling the n-th memristor among the N memristors, 1 ≤ n ≤ N; scale is a scale hyperparameter that can change the standard deviation of the current, and offset is a bias hyperparameter that can change the mean value of the current.
  • the aforementioned hyperparameters can be set before the training process starts; if necessary, these hyperparameters can also be optimized to obtain optimal hyperparameters, so as to improve the performance and effect of learning. Whether to refine and optimize these two hyperparameters can be decided according to the complexity of the specific task, which is not described in further detail in this disclosure. A sketch of the sampling step is given below.
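  • The exact sampling formula appears in the original only as an image and is not reproduced here; the minimal sketch below assumes one plausible combination rule (sum the N read currents, subtract offset, multiply by scale), consistent with the stated roles of offset and scale. The current-window value is also an assumption:

```python
import numpy as np

def sample_weight(read_currents, scale=1.0, offset=None):
    """One weight sample from one read of the N memristors of a weight group.

    read_currents: array of N currents (A) obtained by read-sampling the group.
    scale / offset: hyperparameters; scale rescales the spread of the current,
    offset shifts its mean. The combining rule below is an assumption; the
    patent's exact formula is not reproduced in the available text.
    """
    if offset is None:
        i_mid = 5e-6                          # assumed mid-point of the current window (A)
        offset = i_mid * len(read_currents)   # default suggested in the description
    return scale * (np.sum(read_currents) - offset)

# Example: three noisy reads of the G11, G12, G13 group give one weight sample.
rng = np.random.default_rng(0)
currents = rng.normal(loc=[4.8e-6, 5.1e-6, 5.3e-6], scale=2e-7)  # read noise
w_sample = sample_weight(currents, scale=1.0)
```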
  • using N memristors in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network may include: using N memristors for each weight to realize the distribution corresponding to the weight; based on the random probability distribution corresponding to the weight, N conductance values are calculated, and the N conductance values are mapped to the N memristors respectively.
  • three memristors are used to correspond to the weight between input N 1 and output N 3.
  • the weight is converted into three conductance values according to a random probability distribution.
  • the three conductance values here are G11, G12, and G13, and the three conductance values are mapped to the three memristors respectively. The first of the three memristors is read-sampled to obtain the current value Current_1, the second is read-sampled to obtain the current value Current_2, and the third is read-sampled to obtain the current value Current_3; through the above formula, the multiple weight samples of that weight are obtained, and the memristor array is used to perform multiple forward propagations to obtain multiple output results.
  • the input signal is a voltage signal and the output signal is a current signal.
  • the output signal is read and analog-to-digital converted for subsequent processing.
  • the process of read-sampling the memristors in the memristor array to obtain current values is: the input sequence is applied to the BLs (bit lines) in the form of voltage pulses, and the output currents flowing out of the SLs (source lines) are then collected.
  • the input sequence can be converted into an analog voltage signal by a DAC, and the analog voltage signal is applied to the bit line BL through a multiplexer.
  • the output current is obtained from the source line SL, which can be converted into a voltage signal through a transimpedance amplifier, and converted into a digital signal through the ADC, and the digital signal can be used for subsequent processing.
  • when the N memristors are read and N is relatively large, the total output current exhibits a certain distribution, such as a distribution resembling a Gaussian or Laplace distribution.
  • the total output current of all voltage pulses is the result of multiplying the input vector and the weight matrix.
  • in the memristor crossbar array, such a parallel read operation is equivalent to implementing the two operations of sampling and vector-matrix multiplication at once.
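  • Under simplified assumptions (the Gaussian read-noise model and the array sizes below are illustrative, not from the patent), the following sketch shows why one parallel read acts as both weight sampling and vector-matrix multiplication:

```python
import numpy as np

def crossbar_forward(voltages, conductance, read_noise=0.03, rng=None):
    """Simulate one parallel read of a memristor crossbar.

    voltages:    input vector applied to the bit lines as voltage pulses.
    conductance: programmed conductance matrix (the conductance states theta).
    read_noise:  relative std of the read current; an assumed noise model (the
                 patent only states that the read current follows a distribution
                 with mean theta and scale factor S).
    Returns the currents collected on the source lines, i.e. one sample of
    x @ W where W is drawn around the programmed conductances.
    """
    rng = rng or np.random.default_rng()
    noisy_g = conductance * (1.0 + read_noise * rng.standard_normal(conductance.shape))
    return voltages @ noisy_g     # Kirchhoff summation on the source lines

rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(2, 3))   # 2 inputs x 3 hidden neurons (assumed sizes)
x = np.array([0.2, 0.1])                   # input voltages (V)
outputs = [crossbar_forward(x, G, rng=rng) for _ in range(5)]  # five forward passes
```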
  • the current values obtained by read-sampling the target memristor follow a probability distribution with the conductance state θ of the target memristor as the mean and S as the scale factor.
  • Step S103 Obtain the loss value of the loss function of the Bayesian neural network based on multiple weight sampling samples and multiple output results of each weight.
  • the loss function is a function that maps the value of a random event or its related random variables into a non-negative real number to represent the "risk” or “loss” of the random event. In applications, loss functions are often associated with optimization problems as learning criteria.
  • the loss function includes a KL loss term and a likelihood loss term.
  • KL[q(w|θ)||P(w)] is the KL loss term, and E_{q(w|θ)}[log P(D|w)] is the likelihood loss term.
  • a KL loss term is obtained from multiple weight sampling samples of each weight, and a likelihood loss term is calculated from multiple output results.
  • the KL loss term is calculated by calculating the mean and standard deviation of multiple weight sampling samples for each weight.
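  • A sketch of how the loss value could be assembled from the weight samples and the forward-pass outputs; the Gaussian prior, the closed-form KL computed from each weight's sample mean and standard deviation, and the softmax cross-entropy likelihood are illustrative assumptions beyond the text:

```python
import numpy as np

def kl_loss(weight_samples, prior_mu=0.0, prior_sigma=1.0):
    """KL[q(w|theta) || P(w)] estimated per weight from its samples.
    q is summarized by the sample mean/std of each weight's samples; the
    Gaussian prior and the closed-form KL are illustrative assumptions."""
    kl = 0.0
    for samples in weight_samples:                 # one entry per weight
        mu, sigma = np.mean(samples), np.std(samples) + 1e-12
        kl += (np.log(prior_sigma / sigma)
               + (sigma**2 + (mu - prior_mu)**2) / (2 * prior_sigma**2) - 0.5)
    return kl

def likelihood_loss(outputs, labels):
    """-E_q[log P(D|w)] estimated by averaging cross-entropy over the multiple
    forward-pass outputs (softmax over class logits assumed, shape (batch, classes))."""
    losses = []
    for logits in outputs:                         # one entry per forward pass
        p = np.exp(logits - logits.max(axis=-1, keepdims=True))
        p /= p.sum(axis=-1, keepdims=True)
        losses.append(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())
    return np.mean(losses)

def total_loss(weight_samples, outputs, labels):
    # Loss function = KL loss term + likelihood loss term, as in the text.
    return kl_loss(weight_samples) + likelihood_loss(outputs, labels)
```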
  • Step S104 Backpropagate the loss function to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix of the Bayesian neural network in the memristor array.
  • Step S105 Update the conductance state of the memristor in the weight matrix of the Bayesian neural network in the memristor array according to the gradient.
  • update methods include, for example: the stochastic gradient descent method, the momentum update method, the adaptive gradient (Adagrad) method, the root mean square propagation (RMSProp) method, the adaptive momentum (Adam) method, etc.; a minimal update sketch is given below.
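  • As an illustration of the update step, the sketch below applies plain stochastic gradient descent to the conductance states and clips them back into an assumed programmable window; any of the optimizers listed above could be substituted. The learning rate and the window are assumptions:

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4   # assumed programmable conductance window (S)

def update_conductance(theta, grad, lr=1e-10):
    """One SGD step on the conductance states theta (the weight-matrix memristors).
    Real hardware would reprogram each device toward its new target conductance;
    here the write is modeled as an exact assignment."""
    theta_new = theta - lr * grad               # move against the gradient
    return np.clip(theta_new, G_MIN, G_MAX)

# Example usage with illustrative shapes (2 weights, 3 devices per weight).
theta = np.full((2, 3), 5e-5)
grad = np.random.default_rng(2).normal(scale=1e4, size=theta.shape)
theta = update_conductance(theta, grad)
```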
  • the variational inference method provided by at least one embodiment of the present disclosure may further include: in response to a need to continue training, continuing training using the updated conductance states of the memristors of the memristor array used for the weight matrix of the Bayesian neural network; otherwise, stopping training.
  • when a stop condition is reached, the training and the updating of the conductance states are stopped; otherwise, the training and the updating of the conductance states are continued.
  • the norm of the gradient is less than a preset threshold (for example, the preset threshold is 1e-4)
  • the training and the updating of the conductance state are stopped.
  • Another example is to set a number of updates. When the number of updates is reached, the training and the updating of the conductance state are stopped.
  • Figure 3 shows a schematic flowchart of a variational inference method provided by at least one embodiment of the present disclosure.
  • the conductance states ⁇ of multiple memristors of the memristor array are obtained.
  • the memristor is read and sampled to obtain the current value, and multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network are calculated based on the current value.
  • the KL loss term is calculated by computing the mean and standard deviation of the multiple weight sampling samples, the memristor array is used to perform multiple forward propagations to obtain multiple output results, and the likelihood loss term is calculated based on the multiple output results.
  • the loss function is the KL loss term plus the likelihood loss term.
  • the loss function is backpropagated to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix used in the Bayesian neural network in the memristor array.
  • the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array are updated according to the gradient. If the stop condition is not reached, training continues and the conductance states continue to be updated; otherwise, training and the updating of the conductance states are stopped. The end-to-end sketch below puts these steps together.
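  • Putting the steps of Figure 3 together, the following end-to-end sketch trains a toy one-layer model; the data, network size, hyperparameter values, and the noisy finite-difference gradient (standing in for the backpropagation of equation (2)) are all assumptions made only for illustration:

```python
import numpy as np

G_MIN, G_MAX = 1e-6, 1e-4                 # assumed programmable conductance window (S)
N, NOISE = 3, 0.03                        # memristors per weight, relative read noise (assumed)
SCALE, OFFSET = 2e4, N * 5e-5             # assumed sampling hyperparameters (see text above)
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 2))              # toy regression data, an assumption for illustration
Y = X @ np.array([0.7, -0.3])

def loss(theta, seed, n_pass=8):
    """Loss value = KL term + likelihood term. A fixed read-noise seed is used so
    that the finite differences below see the same noise realisation (a numerical
    stand-in for backpropagating to the conductance states)."""
    r = np.random.default_rng(seed)
    reads = theta * (1 + NOISE * r.standard_normal((n_pass,) + theta.shape))
    W = SCALE * (reads.sum(-1) - OFFSET)          # weight samples, one row per forward pass
    mu, sigma = W.mean(0), W.std(0) + 1e-12
    kl = np.sum(-np.log(sigma) + (sigma**2 + mu**2) / 2 - 0.5)   # KL to an assumed N(0,1) prior
    nll = np.mean((W @ X.T - Y) ** 2)             # likelihood term from the multiple outputs
    return kl + nll

theta = rng.uniform(4e-5, 6e-5, size=(2, N))      # conductance states: 2 weights x N devices
for step in range(200):
    base, grad, eps = loss(theta, step), np.zeros_like(theta), 1e-8
    for idx in np.ndindex(theta.shape):           # numerical gradient w.r.t. each conductance
        t = theta.copy(); t[idx] += eps
        grad[idx] = (loss(t, step) - base) / eps
    if np.linalg.norm(grad) < 1e-4:               # stop-condition example from the text
        break
    theta = np.clip(theta - 1e-10 * grad, G_MIN, G_MAX)   # update the conductance states
```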
  • FIG. 4 shows a schematic block diagram of a variational inference device 400 of a memristor array-based Bayesian neural network provided by at least one embodiment of the present disclosure.
  • the variational inference device 400 can be used to perform the variational inference method shown in FIG. 1.
  • a memristor array includes a plurality of memristors arranged in an array, and a trained weight matrix of a Bayesian neural network is mapped to the memristor array.
  • the variational inference device 400 includes an acquisition unit 401 , a forward propagation unit 402 , a calculation unit 403 and an update unit 404 .
  • the acquisition unit 401 is configured to obtain the conductance states of a plurality of memristors of the current memristor array for the memristor array mapped according to the weight matrix of the Bayesian neural network.
  • the forward propagation unit 402 is configured to obtain multiple weight sampling samples for each weight of the Bayesian neural network, and use the Bayesian neural network to perform multiple forward propagations to obtain multiple output results.
  • the computing unit 403 is configured to obtain the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight, and to backpropagate the loss function to obtain the gradient of the conductance state of the memristor for each weight of the Bayesian neural network in the memristor array.
  • the update unit 404 is configured to update the conductance state of the memristor in the memristor array for the Bayesian neural network according to the gradient.
  • variational inference device 400 can be implemented using hardware, software, firmware, and any feasible combination thereof, and this disclosure is not limited thereto.
  • the variational reasoning device 400 may further include one or more memristor arrays for mapping the weight matrix of the Bayesian neural network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Micromachines (AREA)

Abstract

A variational inference method and device for a Bayesian neural network based on a memristor array. The memristor array includes a plurality of memristors, and a weight matrix obtained by training the Bayesian neural network is mapped to the memristor array. For the mapped memristor array, the conductance states of the plurality of memristors of the current memristor array are obtained (S101); multiple weight sampling samples are acquired for each weight of the weight matrix of the Bayesian neural network, and multiple forward propagations are performed using the memristor array to obtain multiple output results (S102); based on the multiple weight sampling samples and the multiple output results of each weight, a loss value of the loss function of the Bayesian neural network is obtained (S103); the loss function is backpropagated to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array (S104); and the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array are updated according to the gradient (S105).

Description

Variational inference method and device for a Bayesian neural network based on a memristor array
This application claims priority to Chinese Patent Application No. 202210497666.7 filed on May 9, 2022, the disclosure of which is incorporated herein by reference in its entirety as part of this application.
Technical Field
Embodiments of the present disclosure relate to a variational inference method and a variational inference device for a Bayesian neural network based on a memristor array.
Background Art
Variational inference is a deterministic approximate-inference method. Simply put, approximate inference is a method used to approximate a computationally complex distribution, or at least to obtain some statistics of the target distribution. In Bayesian statistics, all inference problems about unknown quantities can be regarded as the computation of a posterior probability (posterior), and this probability is usually hard to compute. The Markov chain Monte Carlo algorithm can be used for approximation, but for large amounts of data it is slow; variational inference provides a faster and simpler approximate-inference method suited to large amounts of data.
Summary of the Invention
At least one embodiment of the present disclosure provides a variational inference method for a Bayesian neural network based on a memristor array. The memristor array includes a plurality of memristors arranged in an array, and a weight matrix obtained by training the Bayesian neural network is mapped to the memristor array. The method includes: for the memristor array mapped according to the weight matrix of the Bayesian neural network, obtaining the conductance states of a plurality of memristors of the current memristor array; acquiring multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network, and performing multiple forward propagations using the memristor array to obtain multiple output results; obtaining a loss value of a loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight; backpropagating the loss function to obtain a gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array; and updating, according to the gradient, the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array.
For example, the variational inference method provided by at least one embodiment of the present disclosure further includes: in response to a need to continue training, continuing training using the updated conductance states of the memristors of the memristor array used for the weight matrix of the Bayesian neural network; otherwise, stopping training.
For example, in the variational inference method provided by at least one embodiment of the present disclosure, N memristors in the memristor array are used to correspond to each weight of the weight matrix between layers in the Bayesian neural network, N being an integer greater than or equal to 2; each weight sampling sample among the multiple weight sampling samples of each weight is computed with offset and scale both being hyperparameters and Current_n being the current value obtained by read-sampling the n-th memristor among the N memristors, 1 ≤ n ≤ N.
For example, in the variational inference method provided by at least one embodiment of the present disclosure, using N memristors in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network includes: using N memristors for each weight to realize the distribution corresponding to the weight; computing N conductance values for the random probability distribution corresponding to the weight; and mapping the N conductance values to the N memristors respectively.
For example, in the variational inference method provided by at least one embodiment of the present disclosure, the current value obtained by read-sampling a target memristor obeys a probability distribution with the conductance state θ of the target memristor as the mean and S as the scale factor.
For example, in the variational inference method provided by at least one embodiment of the present disclosure, the loss function includes a KL loss term and a likelihood loss term; the KL loss term is obtained from the multiple weight sampling samples of each weight, and the likelihood loss term is calculated from the multiple output results.
For example, in the variational inference method provided by at least one embodiment of the present disclosure, the KL loss term is calculated by computing the mean and standard deviation of the multiple weight sampling samples of each weight.
For example, in the variational inference method provided by at least one embodiment of the present disclosure, the loss function is expressed as: F(D,θ) = KL[q(w|θ)||P(w)] − E_{q(w|θ)}[log P(D|w)], where KL[q(w|θ)||P(w)] is the KL loss term and E_{q(w|θ)}[log P(D|w)] is the likelihood loss term.
At least one embodiment of the present disclosure further provides a variational inference device for a Bayesian neural network based on a memristor array. The memristor array includes a plurality of memristors arranged in an array, and a weight matrix obtained by training the Bayesian neural network is mapped to the memristor array. The device includes: an acquisition unit configured to obtain, for the memristor array mapped according to the weight matrix of the Bayesian neural network, the conductance states of a plurality of memristors of the current memristor array; a forward propagation unit configured to acquire multiple weight sampling samples for each weight of the Bayesian neural network and to perform multiple forward propagations using the Bayesian neural network to obtain multiple output results; a computation unit configured to obtain the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight, and to backpropagate the loss function to obtain the gradient of the conductance state of the memristor for each weight of the Bayesian neural network in the memristor array; and an update unit configured to update, according to the gradient, the conductance states of the memristors used for the Bayesian neural network in the memristor array.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and are not a limitation of the present disclosure.
FIG. 1 shows a schematic flowchart of a variational inference method for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure;
FIG. 2A shows a schematic structure of a memristor array;
FIG. 2B is a schematic diagram of a memristor device;
FIG. 2C is a schematic diagram of another memristor device;
FIG. 2D shows a schematic diagram of mapping the weight matrix of a Bayesian neural network to a memristor array;
FIG. 3 shows a schematic flowchart of a variational inference method provided by at least one embodiment of the present disclosure;
FIG. 4 shows a schematic block diagram of a variational inference device for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments without creative effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by persons with ordinary skill in the field to which the present disclosure belongs. The words "first", "second", and similar terms used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Likewise, words such as "a", "an", or "the" do not denote a limitation of quantity, but rather indicate the presence of at least one. Words such as "comprise" or "include" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
At present, training and learning algorithms for memristor-array-based neural networks cannot be reconciled with the intrinsic non-ideal characteristics of memristors (for example, device-to-device fluctuation, stuck conductance, conductance-state drift, and the like). The main way of training such memristor-based models is gradient-based learning algorithms, in which a loss metric is differentiated with respect to the current parameters of the model. However, the intrinsic stochastic characteristics of memristors prevent such training methods from providing sufficient computational accuracy, which makes training the neural network extremely challenging. In addition, in such deterministic modeling approaches, each parameter is described by a single value, and parameter uncertainty cannot be taken into account. In most artificial-intelligence systems, capturing the uncertainty in the parameters is very important. Probabilistic models offer a way to address uncertainty: they allow one to make informed decisions using the model's predictions while remaining cautious about the uncertainty of those predictions.
A Bayesian neural network (BNN) is a probabilistic model that places a neural network in a Bayesian framework and can describe complex stochastic patterns. In order to account for parameter uncertainty, it is preferable to construct a Bayesian model. Under a Bayesian model, parameters are represented not by single values but by probability distributions. Given observed data, the distribution over all parameters of the Bayesian model is called the posterior distribution. As an analogue of deriving an optimal deterministic model through gradient-based updates, the goal of Bayesian machine learning is to learn an approximation to the posterior distribution. To this end, the inventors' earlier Chinese patent application publication CN110956256A describes a method and device for implementing a Bayesian neural network using the intrinsic noise of memristors, which is incorporated herein by reference in its entirety as part of this application.
At least one embodiment of the present disclosure provides a variational inference method for a Bayesian neural network based on a memristor array. The memristor array includes a plurality of memristors arranged in an array, and a weight matrix obtained by training the Bayesian neural network is mapped to the memristor array. The method includes: for the memristor array mapped according to the weight matrix of the Bayesian neural network, obtaining the conductance states of a plurality of memristors of the current memristor array; acquiring multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network, and performing multiple forward propagations using the memristor array to obtain multiple output results; obtaining a loss value of the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight; backpropagating the loss function to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array; and updating, according to the gradient, the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array.
The variational inference method provided by the above embodiments of the present disclosure uses the stochastic characteristics of memristors to learn an approximation of the posterior distribution, so that the neural network under the Bayesian framework has the ability to estimate uncertainty, and improves the computational accuracy of the training and learning algorithm.
At least one embodiment of the present disclosure further provides a variational inference device corresponding to the above variational inference method.
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments.
First, the overall idea underlying the variational inference method of the embodiments of the present disclosure is explained.
The posterior distribution of the weights w given the training data D is P(w|D), which can be inferred and computed with a Bayesian neural network. By taking the expectation under the posterior distribution of the weights, the predictive distribution of the unknown label of a test datum can be obtained:
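The predictive formula itself appears in the original only as an equation image; in the standard formulation, consistent with the sentence above (a reconstruction, not a verbatim copy of the patent's equation), it reads:

P(ŷ | x̂) = E_{P(w|D)}[P(ŷ | x̂, w)] = ∫ P(ŷ | x̂, w) P(w|D) dw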
For each possible weight w, weighting is performed according to the posterior distribution P(w|D), so that the unknown label of the given test datum is predicted. Taking the expectation under the posterior distribution of the weights w is therefore equivalent to using an ensemble of an infinite number of neural networks, which is intractable for neural networks of any practical size. However, variational methods can be used to approximate the posterior distribution of the weights of the Bayesian neural network. Variational learning finds the parameters θ of the distribution q(w|θ) over the weights of the Bayesian neural network that minimize the KL (Kullback-Leibler) divergence between this distribution and the true posterior distribution. The KL divergence, also known as relative entropy or information divergence, is an asymmetric measure of the difference between two probability distributions. Here, the parameter θ is the mean parameter of the memristor conductance state (which exhibits, for example, a Laplace distribution); that is, the conductance state θ of the memristor is sought so that the KL divergence is minimized, i.e.:

The cost function is also called the variational free energy or the expected lower bound, and is denoted as:
F(D, θ) = KL[q(w|θ)||P(w)] − E_{q(w|θ)}[log P(D|w)]    (1)
The cost function in equation (1) is the sum of a data-dependent term (the likelihood cost) and a prior-dependent term (the complexity cost); it therefore embodies a trade-off between satisfying the complexity of the data and satisfying the simplicity of the prior. It is not possible to compute the minimum of this cost function exactly; instead, gradient descent or some approximation method is used. Under certain conditions, the derivative of an expectation can be expressed as the expectation of the derivative, and equation (1) can be written as:
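Equation (2), referenced in the next paragraph, appears in the original only as an equation image; a standard Monte-Carlo form consistent with the surrounding description (a reconstruction following mainstream variational-inference formulations, not a verbatim copy of the patent's equation) is:

F(D, θ) ≈ Σ_{i=1}^{m} [ log q(w^(i)|θ) − log P(w^(i)) − log P(D|w^(i)) ]    (2)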
For the case where a memristor array is used to map the neural network, the read-noise sampling of the memristors can be used to estimate the expectation. In this way, the problem of taking the gradient of an expectation is transformed into taking the gradient and then solving it with read-noise samples of the memristors, yielding a backpropagation-like variational inference algorithm for the memristor array-based Bayesian neural network. In equation (2), w^(i) denotes the i-th sampled value of the memristor weights according to the posterior distribution q(w^(i)|θ). An unbiased estimate of the gradient of the cost function in equation (1) can thus be used to find the conductance state θ of the memristor.
According to the above idea, the inventors propose the variational inference method for a Bayesian neural network based on a memristor array of the embodiments of the present disclosure.
FIG. 1 shows a schematic flowchart of a variational inference method for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure.
For example, the memristor array includes a plurality of memristors arranged in an array; for example, a weight matrix obtained by training the Bayesian neural network is mapped to the memristor array.
For example, the structure of the Bayesian neural network includes a fully connected structure, a convolutional neural-network structure, or the like. Each weight of the Bayesian neural network is a random variable. For example, after the Bayesian neural network has been trained, every weight is a distribution, such as a Gaussian distribution or a Laplace distribution.
For example, the Bayesian neural network can be trained offline to obtain the weight matrix. The method of training the Bayesian neural network may follow conventional methods; for example, a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), a neural-network accelerator, or the like may be used for training, which is not described in detail here.
For example, for the structure of the memristor array, reference may be made to FIG. 2A.
FIG. 2A shows a schematic structure of a memristor array. The memristor array is composed of, for example, a plurality of memristor cells, which form an array of M rows and N columns, M and N both being positive integers. Each memristor cell includes a switching element and one or more memristors. In FIG. 2A, WL<1>, WL<2>, ..., WL<M> respectively denote the word lines of the first row, the second row, ..., the M-th row; the control electrode of the switching element in the memristor cell circuit of each row (for example, the gate of a transistor) is connected to the word line corresponding to that row. BL<1>, BL<2>, ..., BL<N> respectively denote the bit lines of the first column, the second column, ..., the N-th column; the memristor in the memristor cell circuit of each column is connected to the bit line corresponding to that column. SL<1>, SL<2>, ..., SL<M> respectively denote the source lines of the first row, the second row, ..., the M-th row; the source of the transistor in the memristor cell circuit of each row is connected to the source line corresponding to that row. According to Kirchhoff's law, by setting the states (for example, the resistance values) of the memristor cells and applying corresponding word-line signals and bit-line signals to the word lines and bit lines, the above memristor array can complete multiply-accumulate computation in parallel.
FIG. 2B is a schematic diagram of a memristor device, which includes a memristor array and its peripheral driving circuits. For example, as shown in FIG. 2B, the memristor device includes a signal acquisition device, a word-line driving circuit, a bit-line driving circuit, a source-line driving circuit, a memristor array, and a data output circuit.
For example, the signal acquisition device is configured to convert a digital signal into a plurality of analog signals through a digital-to-analog converter (DAC), so as to input them to a plurality of column signal input terminals of the memristor array.
For example, the memristor array includes M source lines, M word lines, and N bit lines, and a plurality of memristor cells arranged in M rows and N columns.
For example, operations on the memristor array are implemented through the word-line driving circuit, the bit-line driving circuit, and the source-line driving circuit.
For example, the word-line driving circuit includes a plurality of multiplexers (Mux) for switching the word-line input voltages; the bit-line driving circuit includes a plurality of multiplexers for switching the bit-line input voltages; and the source-line driving circuit also includes a plurality of multiplexers (Mux) for switching the source-line input voltages. For example, the source-line driving circuit further includes a plurality of ADCs for converting analog signals into digital signals. In addition, a trans-impedance amplifier (TIA) (not shown in the figure) may further be arranged between the Mux and the ADC in the source-line driving circuit to perform current-to-voltage conversion, so as to facilitate processing by the ADC.
For example, the memristor array has an operating mode and a computation mode. When the memristor array is in the operating mode, the memristor cells are in an initialized state, and the values of the parameter elements in the parameter matrix can be written into the memristor array. For example, the source-line input voltages, bit-line input voltages, and word-line input voltages of the memristors are switched to the corresponding preset voltage ranges through the multiplexers.
For example, the word-line input voltage is switched to the corresponding voltage range through the control signal WL_sw[1:M] of the multiplexers in the word-line driving circuit in FIG. 2B. For example, when a set operation is performed on a memristor, the word-line input voltage is set to 2 V (volts); when a reset operation is performed on a memristor, the word-line input voltage is set to 5 V. For example, the word-line input voltage can be obtained from the voltage signal V_WL[1:M] in FIG. 2B.
For example, the source-line input voltage is switched to the corresponding voltage range through the control signal SL_sw[1:M] of the multiplexers in the source-line driving circuit in FIG. 2B. For example, when a set operation is performed on a memristor, the source-line input voltage is set to 0 V; when a reset operation is performed, the source-line input voltage is set to 2 V. For example, the source-line input voltage can be obtained from the voltage signal V_SL[1:M] in FIG. 2B.
For example, the bit-line input voltage is switched to the corresponding voltage range through the control signal BL_sw[1:N] of the multiplexers in the bit-line driving circuit in FIG. 2B. For example, when a set operation is performed on a memristor, the bit-line input voltage is set to 2 V; when a reset operation is performed, the bit-line input voltage is set to 0 V. For example, the bit-line input voltage can be obtained from the DAC in FIG. 2B.
For example, when the memristor array is in the computation mode, the memristors in the memristor array are in conductance states usable for computation, and the bit-line input voltages applied at the column signal input terminals do not change the conductance values of the memristors; for example, the computation can be completed by performing multiply-accumulate operations with the memristor array. For example, the word-line input voltage is switched to the corresponding voltage range through the control signal WL_sw[1:M] of the multiplexers in the word-line driving circuit in FIG. 2B; for example, when a turn-on signal is applied, the word-line input voltage of the corresponding row is set to 5 V, and when no turn-on signal is applied, the word-line input voltage of the corresponding row is set to 0 V, for example connected to the GND signal. The source-line input voltage is switched to the corresponding voltage range through the control signal SL_sw[1:M] of the multiplexers in the source-line driving circuit in FIG. 2B, for example set to 0 V, so that the current signals from the plurality of row signal output terminals can flow into the data output circuit; the bit-line input voltage is switched to the corresponding voltage range through the control signal BL_sw[1:N] of the multiplexers in the bit-line driving circuit in FIG. 2B, for example set to 0.1 V-0.3 V, so that multiply-accumulate operations are performed with the memristor array.
For example, the data output circuit may include a plurality of trans-impedance amplifiers (TIAs) and ADCs, which can convert the current signals at the plurality of row signal output terminals into voltage signals and then into digital signals for subsequent processing.
FIG. 2C is a schematic diagram of another memristor device. The structure of the memristor device shown in FIG. 2C is basically the same as that of the memristor device shown in FIG. 2B, and it also includes a memristor array and its peripheral driving circuits. For example, as shown in FIG. 2C, the memristor device includes a signal acquisition device, a word-line driving circuit, a bit-line driving circuit, a source-line driving circuit, a memristor array, and a data output circuit.
For example, the memristor array includes M source lines, 2M word lines, and 2N bit lines, and a plurality of memristor cells arranged in M rows and N columns. For example, each memristor cell has a 2T2R structure, through which both positive and negative values can be mapped. The operation of mapping the parameter matrix used for the transformation processing to a plurality of different memristor cells in the memristor array is not repeated here. It should be noted that the memristor array may also include M source lines, M word lines, and 2N bit lines, and a plurality of memristor cells arranged in M rows and N columns.
For the description of the signal acquisition device, the control and driving circuits, and the data output circuit, reference may be made to the preceding description, which is not repeated here.
For example, for the process of mapping the weight matrix of the Bayesian neural network to the memristor array, reference may be made to FIG. 2D.
FIG. 2D shows the process of mapping the weight matrix of the Bayesian neural network to the memristor array. The memristor array is used to implement the weight matrix between layers in the Bayesian neural network; for each weight, N memristors are used to realize the distribution corresponding to that weight, N being an integer greater than or equal to 2. For the random probability distribution corresponding to the weight, N conductance values are computed, and the N conductance values are mapped to the N memristors. In this way, the weight matrix in the Bayesian neural network is converted into target conductance values and mapped into the crossbar of the memristor array.
As shown in FIG. 2D, the left side of the figure is a three-layer Bayesian neural network, which includes three neuron layers connected one after another. For example, the input layer includes the first neuron layer, the hidden layer includes the second neuron layer, and the output layer includes the third neuron layer. For example, the input layer passes the received input data to the hidden layer; the hidden layer computes and transforms the input data and sends it to the output layer; and the output layer outputs the output result of the Bayesian neural network.
As shown in FIG. 2D, the input layer, the hidden layer, and the output layer all include a plurality of neuron nodes, and the number of neuron nodes in each layer can be set according to different application situations. For example, the number of neurons in the input layer is 2 (including N1 and N2), the number of neurons in the middle hidden layer is 3 (including N3, N4, and N5), and the number of neurons in the output layer is 1 (including N6).
As shown in FIG. 2D, adjacent neuron layers of the Bayesian neural network are connected by a weight matrix. For example, the weight matrix is implemented by the memristor array shown on the right side of FIG. 2D. For example, the weight parameters may be programmed directly as the conductances of the memristor array; for example, the weight parameters may also be mapped to the conductances of the memristor array according to a certain rule; for example, the difference between the conductances of two memristors may also be used to represent one weight parameter. Although the present disclosure describes its technical solution in terms of programming the weight parameters directly as conductances of the memristor array or mapping them to conductances according to a certain rule, this is merely exemplary and not a limitation of the present disclosure.
The structure of the memristor array on the right side of FIG. 2D is, for example, as shown in FIG. 2A; the memristor array may include a plurality of memristors arranged in an array. In the example shown in FIG. 2D, the weight connecting input N1 and output N3 is implemented by three memristors (G11, G12, G13), and the other weights in the weight matrix can be implemented in the same way. More specifically, source line SL1 corresponds to neuron N3, source line SL2 corresponds to neuron N4, source line SL5 corresponds to neuron N5, and bit lines BL1, BL2, and BL3 correspond to neuron N1. One weight between the input layer and the hidden layer (the weight between neuron N1 and neuron N3) is converted into three target conductance values according to the distribution and mapped into the crossbar of the memristor array; here the target conductance values are G11, G12, and G13, respectively, outlined with a dashed box in the memristor array.
Returning to FIG. 1, as shown in FIG. 1, the variational inference method includes the following steps S101 to S105.
Step S101: For the memristor array mapped according to the weight matrix of the Bayesian neural network, obtain the conductance states of a plurality of memristors of the current memristor array.
Step S102: Acquire multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network, and perform multiple forward propagations using the memristor array to obtain multiple output results.
For example, in some embodiments of the present disclosure, N memristors in the memristor array are used to correspond to each weight of the weight matrix between layers in the Bayesian neural network, N being an integer greater than or equal to 2, and each weight sampling sample among the multiple weight sampling samples of each weight is given by the following formula:
Here, Current_n is the current value obtained by read-sampling the n-th memristor among the N memristors, 1 ≤ n ≤ N; offset and scale are both hyperparameters: scale is a scale hyperparameter that can change the standard deviation of the current, and offset is a bias hyperparameter that can change the mean value of the current. The scale parameter may be set by default to scale = 1, and the offset parameter may be set by default to offset = the median of the memristor current window × N. The aforementioned hyperparameters can be set before the training process starts; if necessary, these hyperparameters can also be optimized to obtain optimal hyperparameters, so as to improve the performance and effect of learning. Whether to refine and optimize these two hyperparameters can be decided according to the complexity of the specific task, which is not described in further detail in the present disclosure.
For example, in some embodiments of the present disclosure, using N memristors in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network may include: using N memristors for each weight to realize the distribution corresponding to the weight; computing N conductance values for the random probability distribution corresponding to the weight; and mapping the N conductance values to the N memristors respectively.
For example, in FIG. 2D, three memristors are used to correspond to the weight between input N1 and output N3; the weight is converted into three conductance values according to the random probability distribution, here G11, G12, and G13, and the three conductance values are mapped to the three memristors respectively. The first of the three memristors is read-sampled to obtain the current value Current1, the second is read-sampled to obtain the current value Current2, and the third is read-sampled to obtain the current value Current3; the multiple weight sampling samples of this weight are obtained through the above formula, and the memristor array is used to perform multiple forward propagations to obtain multiple output results.
It should be noted that when Current_n can be positive or negative, offset may be 0. The sign of Current_n can be determined by the sign of the input signal of the memristor array.
For the memristor array, the input signal is a voltage signal and the output signal is a current signal; the output signal is read and analog-to-digital converted for subsequent processing. For example, the process of read-sampling the memristors in the memristor array to obtain current values is as follows: the input sequence is applied to the BLs (bit lines) in the form of voltage pulses, and the output currents flowing out of the SLs (source lines) are then collected. For example, for the memristor device shown in FIG. 2B or 2C, the input sequence can be converted into an analog voltage signal by a DAC, and the analog voltage signal is applied to the bit line BL through a multiplexer. Correspondingly, the output current is obtained from the source line SL; this current can be converted into a voltage signal by a trans-impedance amplifier and into a digital signal by an ADC, and the digital signal can be used for subsequent processing. When the N memristors are read and N is relatively large, the total output current exhibits a certain distribution, for example a distribution resembling a Gaussian or Laplace distribution. The total output current of all voltage pulses is the result of multiplying the input vector by the weight matrix. In the memristor crossbar array, such a parallel read operation is equivalent to implementing the two operations of sampling and vector-matrix multiplication at once.
For example, in some embodiments of the present disclosure, the current values obtained by read-sampling a target memristor obey a probability distribution with the conductance state θ of the target memristor as the mean and S as the scale factor.
Step S103: Obtain the loss value of the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight.
A loss function is a function that maps the values of a random event or of its related random variables to non-negative real numbers so as to represent the "risk" or "loss" of that random event. In applications, the loss function is usually associated with an optimization problem as the learning criterion.
For example, in some embodiments of the present disclosure, the loss function includes a KL loss term and a likelihood loss term. For example, the loss function is expressed as follows:
F(D, θ) = KL[q(w|θ)||P(w)] − E_{q(w|θ)}[log P(D|w)],
where KL[q(w|θ)||P(w)] is the KL loss term and E_{q(w|θ)}[log P(D|w)] is the likelihood loss term.
For example, in some embodiments of the present disclosure, the KL loss term is obtained from the multiple weight sampling samples of each weight, and the likelihood loss term is calculated from the multiple output results.
For example, in some embodiments of the present disclosure, the KL loss term is calculated by computing the mean and standard deviation of the multiple weight sampling samples of each weight.
Step S104: Backpropagate the loss function to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array.
For example, when the weights and other parameters of the neural network are trained with gradient descent, backpropagation is used to compute the partial derivatives of the loss function with respect to the weights, thereby obtaining the gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network. Geometrically, the function increases fastest along the direction of the gradient and decreases fastest along the direction opposite to the gradient, which makes it easier to find the minimum.
Step S105: Update, according to the gradient, the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array.
Once the gradient has been computed by backpropagation, it can be used for parameter updates. By using the gradient information, the parameters (for example, the conductance states) are updated iteratively to find the optimal solution that minimizes the loss function. In this embodiment, for example, the conductance states of the memristors used for the weight matrix of the Bayesian neural network are updated. There are many update methods; the simplest is to change the parameters gradually along the negative gradient direction. In the embodiments of the present disclosure, usable update methods include, for example, the stochastic gradient descent method, the momentum update method, the adaptive gradient (Adagrad) method, the root-mean-square propagation (RMSProp) method, the adaptive momentum (Adam) method, and the like.
For example, the variational inference method provided by at least one embodiment of the present disclosure may further include: in response to a need to continue training, continuing training using the updated conductance states of the memristors of the memristor array used for the weight matrix of the Bayesian neural network; otherwise, stopping training.
For example, when a stop condition is reached, the training and the updating of the conductance states are stopped; otherwise, training and updating of the conductance states continue. For example, when the norm of the gradient is smaller than a preset threshold (for example, a preset threshold of 1e-4), the training and the updating of the conductance states are stopped. As another example, a number of updates is set, and when this number of updates is reached, the training and the updating of the conductance states are stopped.
FIG. 3 shows a schematic flowchart of a variational inference method provided by at least one embodiment of the present disclosure.
As shown in FIG. 3, first, for a memristor array mapped according to the weight matrix of the Bayesian neural network, the conductance states θ of the plurality of memristors of the memristor array are obtained. Then, the memristors are read-sampled to obtain current values, and the multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network are computed from the current values. Next, the KL loss term is calculated by computing the mean and standard deviation of the multiple weight sampling samples, the memristor array is used to perform multiple forward propagations to obtain multiple output results, and the likelihood loss term is calculated based on the multiple output results. Based on the computed KL loss term and likelihood loss term, the loss value of the loss function of the Bayesian neural network is obtained (the loss function is the KL loss term plus the likelihood loss term). The loss function is then backpropagated to obtain the gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array. Finally, the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array are updated according to the gradient. If the stop condition has not been reached, training and updating of the conductance states continue; otherwise, training and updating of the conductance states are stopped.
FIG. 4 shows a schematic block diagram of a variational inference device 400 for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure. The variational inference device 400 can be used to perform the variational inference method shown in FIG. 1. For example, the memristor array includes a plurality of memristors arranged in an array, and a weight matrix obtained by training the Bayesian neural network is mapped to the memristor array.
As shown in FIG. 4, the variational inference device 400 includes an acquisition unit 401, a forward propagation unit 402, a computation unit 403, and an update unit 404.
The acquisition unit 401 is configured to obtain, for the memristor array mapped according to the weight matrix of the Bayesian neural network, the conductance states of a plurality of memristors of the current memristor array.
The forward propagation unit 402 is configured to acquire multiple weight sampling samples for each weight of the Bayesian neural network and to perform multiple forward propagations using the Bayesian neural network to obtain multiple output results.
The computation unit 403 is configured to obtain the loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight, and to backpropagate the loss function to obtain the gradient of the conductance state of the memristor for each weight of the Bayesian neural network in the memristor array.
The update unit 404 is configured to update, according to the gradient, the conductance states of the memristors used for the Bayesian neural network in the memristor array.
For example, the above variational inference device 400 can be implemented with hardware, software, firmware, or any feasible combination thereof, and the present disclosure is not limited in this respect.
In the embodiments of the present disclosure, the variational inference device 400 may further include one or more memristor arrays for mapping the weight matrix of the Bayesian neural network.
The technical effects of the above variational inference device are the same as those of the variational inference method shown in FIG. 1 and are not repeated here.
The following points should be noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; other structures may follow common designs.
(2) Where there is no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another to obtain new embodiments.
The above are only specific implementations of the present disclosure, but the scope of protection of the present disclosure is not limited thereto; the scope of protection of the present disclosure shall be determined by the scope of protection of the claims.

Claims (9)

  1. A variational inference method for a Bayesian neural network based on a memristor array, the memristor array comprising a plurality of memristors arranged in an array, a weight matrix obtained by training the Bayesian neural network being mapped to the memristor array, the method comprising:
    for the memristor array mapped according to the weight matrix of the Bayesian neural network, obtaining conductance states of a plurality of memristors of the current memristor array;
    acquiring multiple weight sampling samples for each weight of the weight matrix of the Bayesian neural network, and performing multiple forward propagations using the memristor array to obtain multiple output results;
    obtaining a loss value of a loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight;
    backpropagating the loss function to obtain a gradient of the conductance state of the memristor for each weight of the weight matrix used for the Bayesian neural network in the memristor array;
    updating, according to the gradient, the conductance states of the memristors used for the weight matrix of the Bayesian neural network in the memristor array.
  2. The variational inference method according to claim 1, further comprising: in response to a need to continue training, continuing training using the updated conductance states of the memristors of the memristor array used for the weight matrix of the Bayesian neural network; otherwise, stopping training.
  3. The variational inference method according to claim 2, wherein N memristors in the memristor array are used to correspond to each weight of the weight matrix between layers in the Bayesian neural network, N being an integer greater than or equal to 2,
    each weight sampling sample among the multiple weight sampling samples of each weight is given by a formula
    in which offset and scale are both hyperparameters, and Current_n is the current value obtained by read-sampling the n-th memristor among the N memristors, 1 ≤ n ≤ N.
  4. The variational inference method according to claim 3, wherein using N memristors in the memristor array to correspond to each weight of the weight matrix between layers in the Bayesian neural network comprises:
    using N memristors for each weight to realize the distribution corresponding to the weight, computing N conductance values for the random probability distribution corresponding to the weight, and mapping the N conductance values to the N memristors respectively.
  5. The variational inference method according to claim 3, wherein the current value obtained by read-sampling a target memristor obeys a probability distribution with the conductance state θ of the target memristor as the mean and S as the scale factor.
  6. The variational inference method according to any one of claims 1-5, wherein the loss function comprises a KL loss term and a likelihood loss term, the KL loss term is obtained from the multiple weight sampling samples of each weight, and the likelihood loss term is calculated from the multiple output results.
  7. The variational inference method according to claim 6, wherein the KL loss term is calculated by computing the mean and standard deviation of the multiple weight sampling samples of each weight.
  8. The variational inference method according to claim 7, wherein the loss function is expressed as:
    F(D, θ) = KL[q(w|θ)||P(w)] − E_{q(w|θ)}[log P(D|w)],
    wherein KL[q(w|θ)||P(w)] is the KL loss term and E_{q(w|θ)}[log P(D|w)] is the likelihood loss term.
  9. A variational inference device for a Bayesian neural network based on a memristor array, the memristor array comprising a plurality of memristors arranged in an array, a weight matrix obtained by training the Bayesian neural network being mapped to the memristor array, the device comprising:
    an acquisition unit configured to obtain, for the memristor array mapped according to the weight matrix of the Bayesian neural network, conductance states of a plurality of memristors of the current memristor array;
    a forward propagation unit configured to acquire multiple weight sampling samples for each weight of the Bayesian neural network and to perform multiple forward propagations using the Bayesian neural network to obtain multiple output results;
    a computation unit configured to obtain a loss function of the Bayesian neural network based on the multiple weight sampling samples and the multiple output results of each weight, and to backpropagate the loss function to obtain a gradient of the conductance state of the memristor for each weight of the Bayesian neural network in the memristor array;
    an update unit configured to update, according to the gradient, the conductance states of the memristors used for the Bayesian neural network in the memristor array.
PCT/CN2023/092447 2022-05-09 2023-05-06 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置 WO2023217017A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210497666.7A CN114819128A (zh) 2022-05-09 2022-05-09 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置
CN202210497666.7 2022-05-09

Publications (1)

Publication Number Publication Date
WO2023217017A1 true WO2023217017A1 (zh) 2023-11-16

Family

ID=82514021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092447 WO2023217017A1 (zh) 2022-05-09 2023-05-06 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置

Country Status (2)

Country Link
CN (1) CN114819128A (zh)
WO (1) WO2023217017A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118193298A (zh) * 2023-12-29 2024-06-14 深圳芯瑞华声科技有限公司 存算一体化芯片的测试方法、装置、电子设备及存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114819128A (zh) * 2022-05-09 2022-07-29 清华大学 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置
CN116130046B (zh) * 2023-03-02 2024-03-08 广东技术师范大学 一种用于血压分级量化的模糊忆阻计算方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902801A (zh) * 2019-01-22 2019-06-18 华中科技大学 一种基于变分推理贝叶斯神经网络的洪水集合预报方法
WO2021114859A1 (zh) * 2019-12-09 2021-06-17 清华大学 利用忆阻器本征噪声实现贝叶斯神经网络的方法及装置
CN113191402A (zh) * 2021-04-14 2021-07-30 华中科技大学 基于忆阻器的朴素贝叶斯分类器设计方法、系统及分类器
US20210397936A1 (en) * 2018-11-13 2021-12-23 The Board Of Trustees Of The University Of Illinois Integrated memory system for high performance bayesian and classical inference of neural networks
CN114819128A (zh) * 2022-05-09 2022-07-29 清华大学 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210397936A1 (en) * 2018-11-13 2021-12-23 The Board Of Trustees Of The University Of Illinois Integrated memory system for high performance bayesian and classical inference of neural networks
CN109902801A (zh) * 2019-01-22 2019-06-18 华中科技大学 一种基于变分推理贝叶斯神经网络的洪水集合预报方法
WO2021114859A1 (zh) * 2019-12-09 2021-06-17 清华大学 利用忆阻器本征噪声实现贝叶斯神经网络的方法及装置
CN113191402A (zh) * 2021-04-14 2021-07-30 华中科技大学 基于忆阻器的朴素贝叶斯分类器设计方法、系统及分类器
CN114819128A (zh) * 2022-05-09 2022-07-29 清华大学 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GAO DI; HUANG QINGRONG; ZHANG GRACE LI; YIN XUNZHAO; LI BING; SCHLICHTMANN ULF; ZHUO CHENG: "Bayesian Inference Based Robust Computing on Memristor Crossbar", 2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), IEEE, 5 December 2021 (2021-12-05), pages 121 - 126, XP034013601, DOI: 10.1109/DAC18074.2021.9586160 *
胡小方 等 (HU, XIAOFANG ET AL.): "基于忆阻循环神经网络的层次化状态正则变分自编码器 (Hierarchical State Regularization Variational AutoEncoder Based on Memristor Recurrent Neural Network)", 电子与信息学报 (JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY), 16 March 2022 (2022-03-16) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118193298A (zh) * 2023-12-29 2024-06-14 深圳芯瑞华声科技有限公司 存算一体化芯片的测试方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN114819128A (zh) 2022-07-29

Similar Documents

Publication Publication Date Title
CN111279366B (zh) 人工神经网络的训练
WO2023217017A1 (zh) 基于忆阻器阵列的贝叶斯神经网络的变分推理方法和装置
CN110352436B (zh) 用于神经网络训练的具有迟滞更新的电阻处理单元
US10990651B2 (en) Systems and methods for efficient matrix multiplication
WO2021082325A1 (zh) 基于忆阻器的神经网络的训练方法及其训练装置
TWI673657B (zh) 具有非揮發性突觸陣列的神經網路電路
WO2021098821A1 (zh) 神经网络系统中数据处理的方法、神经网络系统
WO2018228424A1 (zh) 一种神经网络训练方法和装置
US20210192325A1 (en) Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine
US11620505B2 (en) Neuromorphic package devices and neuromorphic computing systems
US11386319B2 (en) Training of artificial neural networks
US20200118001A1 (en) Mixed-precision deep-learning with multi-memristive devices
WO2023217027A1 (zh) 利用基于忆阻器阵列的环境模型的策略优化方法和装置
US10446231B2 (en) Memory cell structure
WO2023217021A1 (zh) 基于忆阻器阵列的数据处理方法和数据处理装置
US11301752B2 (en) Memory configuration for implementing a neural network
US12050997B2 (en) Row-by-row convolutional neural network mapping for analog artificial intelligence network training
TW201937413A (zh) 具有非揮發性突觸陣列的神經網路電路
US20230113627A1 (en) Electronic device and method of operating the same
CN115796252A (zh) 权重写入方法及装置、电子设备和存储介质
Agarwal et al. Designing and modeling analog neural network training accelerators
KR20170117861A (ko) 뉴럴 네트워크 시스템
US20170300810A1 (en) Neural network system
CN117808062A (zh) 计算装置、电子装置以及用于计算装置的操作方法
CN115796250A (zh) 权重部署方法及装置、电子设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23802789

Country of ref document: EP

Kind code of ref document: A1