CN116128035A - Training method and device, electronic equipment and computer storage medium - Google Patents


Info

Publication number
CN116128035A
Authority
CN
China
Prior art keywords
memristor
neural network
training
bayesian neural
array
Prior art date
Legal status
Pending
Application number
CN202211490875.5A
Other languages
Chinese (zh)
Inventor
吴华强
林钰登
高滨
唐建石
张清天
钱鹤
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202211490875.5A priority Critical patent/CN116128035A/en
Publication of CN116128035A publication Critical patent/CN116128035A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

A training method and device for a Bayesian neural network based on a memristor array, electronic equipment and a storage medium. The conductance values of memristors in the memristor array are used for mapping weights of the Bayesian neural network, and the training method comprises the following steps: acquiring first priori knowledge and second priori knowledge of the memristor array based on inherent non-ideal characteristics of memristors; calculating a total loss function of the Bayesian neural network based on the first priori knowledge; back-propagating the total loss function to update the current parameters in the Bayesian neural network to obtain object parameters; and constraining the object parameters based on the second priori knowledge to obtain a training result of the weights of the Bayesian neural network. The training method can improve the robustness of the network to memristor conductance value fluctuations.

Description

Training method and device, electronic equipment and computer storage medium
Technical Field
Embodiments of the present disclosure relate to a training method and apparatus for a bayesian neural network based on a memristor array, an electronic device, and a computer storage medium.
Background
Memristors are non-volatile devices whose conductance state can be adjusted by applying an external stimulus. According to Kirchhoff's current law and Ohm's law, an array of memristors can perform multiply-accumulate calculations in parallel, with both storage and computation occurring in each device of the array. Based on this computing architecture, in-memory computing that does not require large amounts of data movement can be implemented. At the same time, multiply-accumulate is the core computational task required to run a neural network. Thus, by using the conductances of the memristors in the array to represent weight values, energy-efficient neural network operations can be implemented on the basis of such in-memory computing. In neural network operations, the conductance values of the memristors in a memristor array represent the synaptic weights of the neural network. These conductance values should be written into the memristors before the memristor array is used for computational acceleration. Widely used offline training methods can be employed to write the conductance values into the memristors.
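By way of illustration only (this snippet is not part of the original disclosure; the array shapes and conductance range are assumed), the following Python sketch shows the multiply-accumulate principle described above: with weights stored as conductances and read voltages applied to the array, the output currents follow Ohm's law and Kirchhoff's current law without any movement of the stored weights.

```python
import numpy as np

# A minimal illustrative sketch (not from the patent) of crossbar multiply-accumulate:
# with conductances G and read voltages V, the output currents are I_i = sum_j V_j * G_ij.
def crossbar_mac(read_voltages, conductances):
    # read_voltages: shape (N,); conductances: shape (M, N); returns currents, shape (M,)
    return conductances @ read_voltages

voltages = np.array([0.1, 0.2, 0.3])             # example read voltages (V)
g = np.random.uniform(1e-6, 1e-4, size=(2, 3))   # example conductances (S), hypothetical range
print(crossbar_mac(voltages, g))                 # output currents (A), one per row
```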
A Bayesian neural network (Bayesian neural network, BNN) is a parameterized model that places the flexibility of a neural network in a Bayesian framework. The weights in a BNN are not fixed values as in conventional neural networks, but are represented by probability distributions.
Disclosure of Invention
At least one embodiment of the present disclosure provides a training method for a bayesian neural network based on a memristor array, conductance values of memristors in the memristor array being used to map weights of the bayesian neural network, the training method comprising: acquiring first priori knowledge and second priori knowledge of the memristor array based on inherent non-ideal characteristics of the memristor; calculating a total loss function of the bayesian neural network based on the first priori knowledge; back-propagating the total loss function to update current parameters in the Bayesian neural network to obtain object parameters; and constraining the object parameters based on the second prior knowledge to obtain a training result of the weights of the bayesian neural network.
For example, in a training method provided by at least one embodiment of the present disclosure, the first priori knowledge includes a weight fluctuation standard deviation, and calculating the total loss function of the bayesian neural network based on the first priori knowledge includes: calculating the total loss function using variational learning based on the weight fluctuation standard deviation, wherein the variational learning includes a complexity cost term, and the weight fluctuation standard deviation is used as a priori standard deviation in the complexity cost term.
For example, in a training method provided in at least one embodiment of the present disclosure, before performing a first training, the bayesian neural network is initialized with initialization parameters, the initialization parameters obeying a gaussian distribution, wherein the initialization parameters are the current parameters of the first training of the bayesian neural network.
For example, in a training method provided in at least one embodiment of the present disclosure, the object parameters include a mean of the gaussian distribution, and the second prior knowledge includes a weight window range of memristor cells in the memristor array; constraining the object parameters based on the second prior knowledge to obtain the training result of the weights of the bayesian neural network comprises: constraining the mean value of the Gaussian distribution in the object parameters to be within the weight window range.
For example, in a training method provided in at least one embodiment of the present disclosure, the object parameter includes a standard deviation of the gaussian distribution, and the second prior knowledge includes a read fluctuation standard deviation, where the read fluctuation standard deviation is an error caused by reading a conductance value of a memristor in the memristor array; constraining the object parameters based on the second prior knowledge to obtain the training result of the weights of the bayesian neural network, comprising: and constraining the standard deviation of the Gaussian distribution in the object parameter so that the standard deviation of the Gaussian distribution is greater than or equal to the read fluctuation standard deviation.
For example, in a training method provided by at least one embodiment of the present disclosure, obtaining a first a priori knowledge of the memristor array based on intrinsic non-ideal characteristics of the memristor includes: carrying out multiple groups of electrical tests on memristors in the memristor array to obtain multiple test results of the multiple groups of electrical tests, wherein each test result comprises a weight fluctuation standard deviation of the memristors in the memristor array; and taking the maximum value of the weight fluctuation standard deviation as the first priori knowledge.
In a training method provided by at least one embodiment of the present disclosure, for example, a memristor array includes a plurality of rows and columns of memristor cells, each memristor cell including a first memristor and a second memristor provided in pairs,
the second prior knowledge includes a weight window range of memristors in the memristor array, the weights of the bayesian neural network are expressed based on a difference value between a conductance value of a first memristor included in each memristor unit in the memristor array and a conductance value of a second memristor, and the obtaining of the second prior knowledge of the memristor array based on inherent non-ideal characteristics of the memristor array includes: calculating a value range of a difference value between a conductance value of a first memristor and a conductance value of a second memristor included in each memristor unit in the memristor array; and taking the value range as the weight window range.
For example, in a training method provided in at least one embodiment of the present disclosure, a training result includes a training mean of weights in the bayesian neural network, and the training method further includes: judging whether the training of the Bayesian neural network reaches the preset training times or not; and if the preset training times are reached, mapping the training mean value to memristors in the memristor array so as to utilize the memristor array to calculate, or if the preset training times are not reached, taking the training result as the current parameter, and training the Bayesian neural network again until the preset training times are reached.
At least one embodiment of the present disclosure also provides a training apparatus for a bayesian neural network based on a memristor array, conductance values of memristors in the memristor array being used to map weights of the bayesian neural network, the training apparatus comprising: an acquisition unit configured to acquire a first priori knowledge and a second priori knowledge of the memristor array based on intrinsic non-ideal characteristics of the memristor; a calculation unit configured to calculate a total loss function of the bayesian neural network based on the first prior knowledge; the updating unit is configured to back propagate the total loss function so as to update the current parameters in the Bayesian neural network to obtain object parameters; and a constraint unit configured to constrain the object parameters based on the second prior knowledge to obtain a training result of the weights of the bayesian neural network.
At least one embodiment of the present disclosure also provides an electronic device, including: a processor; a memory storing one or more computer program instructions; the one or more computer program instructions, when executed by the processor, are for implementing the training method provided by at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure also provides a computer-readable storage medium having computer-readable instructions stored non-transitory for implementing the training method provided by at least one embodiment of the present disclosure when the computer-readable instructions are executed by a processor.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
FIG. 1A shows a schematic structure of a memristor array;
FIG. 1B is a schematic diagram of a memristor device;
FIG. 1C is a schematic diagram of another memristor device;
FIG. 1D illustrates a schematic diagram of mapping a weighting matrix of a Bayesian neural network to a memristor array;
FIG. 2 illustrates a schematic flow diagram of a training method for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure;
FIG. 3 shows a schematic flow chart of a method of obtaining a first priori knowledge provided by at least one embodiment of the present disclosure;
FIG. 4A shows a schematic flow chart of a method of obtaining second prior knowledge provided by at least one embodiment of the present disclosure;
FIG. 4B is a schematic diagram of another memristor array provided in accordance with at least one embodiment of the present disclosure;
FIG. 4C is a schematic diagram of another memristor array provided in accordance with at least one embodiment of the present disclosure;
FIG. 5 illustrates a schematic flow diagram of another training method provided by at least one embodiment of the present disclosure;
FIG. 6 illustrates a schematic block diagram of a training apparatus for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure;
FIG. 7 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure;
FIG. 8 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure;
FIG. 9 is a schematic diagram of a storage medium according to some embodiments of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the terms "a," "an," or "the" and similar terms do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
In an offline training process of a neural network, the neural network is trained on a software platform, and the target weights obtained through training are then transferred to the memristor array. However, the conductance value of a memristor may fluctuate due to the diffusion or recombination of oxygen vacancies in its multiple weakly conductive filament regions. Even under the same programming conditions, the conductance change of different memristors may differ. Because of this non-ideal conductance behavior, deviations between the weights written to the memristors and the trained target weights are unavoidable, resulting in a significant degradation of network performance. Current training algorithms for neural networks based on memristor arrays cannot accommodate the inherent non-ideal characteristics of memristors (e.g., device-to-device fluctuations, device conductance clamping, conductance state drift, etc.). The trained network is often a deterministic-weight model in which each parameter is described by a single value, without regard to the uncertainty of the weights. Such training methods lack a means of handling the stochastic characteristics of memristors, so disturbances in memristor conductance values can have a large impact on network performance. In most artificial intelligence systems, capturing the uncertainty of parameters is very important. Probabilistic models provide a way to address uncertainty: they enable informed decisions to be made from a model's predictions while remaining cautious about the uncertainty of those predictions.
A Bayesian neural network (Bayesian Neural Network, BNN) is a probabilistic model that places a neural network in a Bayesian framework and can describe complex stochastic patterns. To take the uncertainty of the parameters into account, a Bayesian model is preferably constructed. Under a Bayesian model, the parameters are represented not by single values but by probability distributions. Given observation data, the distribution of all parameters of the Bayesian model is called the posterior distribution. As an analogue of deriving the best deterministic model by gradient-based updating, Bayesian machine learning aims at learning an approximation of the posterior distribution.
At least one embodiment of the present disclosure provides a training method for a bayesian neural network based on a memristor array in which conductance values of memristors are used to map weights of the bayesian neural network. The training method comprises the following steps: acquiring first priori knowledge and second priori knowledge of the memristor array based on inherent non-ideal characteristics of memristors; calculating a total loss function of the Bayesian neural network based on the first priori knowledge; back-propagating the total loss function to update the current parameters in the Bayesian neural network to obtain object parameters; and constraining the object parameters based on the second prior knowledge to obtain a training result of the weights of the Bayesian neural network.
The training method provided by the embodiment of the disclosure can ensure that the output of the memristor neural network is robust and reliable even under the disturbance of the memristor weight, and the robustness of the neural network to the fluctuation of the memristor conductance value is improved.
At least one embodiment of the present disclosure further provides a training apparatus corresponding to the above-described bayesian neural network for memristor-based arrays.
Embodiments of the present disclosure will be described in detail below with reference to the attached drawings, but the present disclosure is not limited to these specific embodiments.
First, a training method of the bayesian neural network is described.
Given a data set D, the training objective of a Bayesian neural network is to optimize the posterior distribution of the weights p(w|D) using Bayes' theorem:
p(w|D) = p(D|w) p(w) / p(D),
where p(w) is the prior weight distribution, p(D|w) = p(y|x, w) is the likelihood corresponding to the Bayesian neural network output, and p(D) is the marginal likelihood, i.e., the evidence. Since the true posterior distribution p(w|D) is difficult to obtain, it is generally not calculated directly but approximated using an inference method. For example, the posterior distribution of the weights of the Bayesian neural network is approximated by a variational learning method.
Variational learning seeks the parameters θ of a distribution q(w|θ) over the weights of the Bayesian neural network that minimize the KL (Kullback-Leibler) divergence between this distribution and the true posterior distribution. The KL divergence measures how close the distribution q(w|θ) is to the true posterior distribution; also known as relative entropy or information divergence, it is an asymmetric measure of the difference between two probability distributions. Through mathematical transformation, the objective of minimizing the KL divergence between q(w|θ) and p(w|D) can be expressed as:
θ* = argmin_θ KL[q(w|θ) || p(w|D)] = argmin_θ { KL[q(w|θ) || p(w)] - E_{q(w|θ)}[log p(D|w)] }.
The two terms KL[q(w|θ) || p(w)] and E_{q(w|θ)}[log p(D|w)] can be optimized using the back-propagation algorithm during optimization, where KL[q(w|θ) || p(w)] is referred to as the complexity cost term and E_{q(w|θ)}[log p(D|w)] is referred to as the likelihood cost term. For Gaussian-distributed weights, θ corresponds to the mean μ and standard deviation σ, and the posterior q(w|θ) can be expressed as:
q(w|θ) = ∏_i N(w_i; μ_i, σ_i²).
The BNN derives the posterior weight distribution from the prior p(w) and the likelihood p(D|w). This main feature introduces the weight uncertainty of the network into the learning process, so the learned weight parameters and calculations are robust under weight disturbances.
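As an illustration of the variational objective described above, the following Python sketch (an assumed, simplified implementation for clarity rather than the patent's reference code; the cross-entropy likelihood and zero-mean Gaussian prior are assumptions) estimates the likelihood cost with a single Monte Carlo weight sample drawn via the reparameterization trick and evaluates the complexity cost term in closed form:

```python
import torch
import torch.nn.functional as F

# A hedged sketch of the variational objective F(D, theta): one Monte Carlo weight
# sample gives the likelihood cost, and the complexity cost KL[q(w|theta)||p(w)] is
# computed in closed form against a zero-mean Gaussian prior (an assumed choice).
def bnn_loss(mu, rho, x, y, forward_fn, prior_sigma=0.1):
    sigma = F.softplus(rho)                                   # posterior standard deviation, kept positive
    w = mu + sigma * torch.randn_like(mu)                     # reparameterized weight sample w ~ q(w|theta)
    likelihood_cost = F.cross_entropy(forward_fn(x, w), y)    # -E_q[log p(D|w)], Monte Carlo estimate
    kl = (torch.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5).sum()
    return kl + likelihood_cost                               # total loss F(D, theta)
```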
FIG. 1A shows a schematic structure of a memristor array, which is composed, for example, of a plurality of memristor cells forming an array of M rows and N columns, with M and N each being a positive integer. Each memristor cell includes a switching element and one or more memristors. In FIG. 1A, WL<1>, WL<2>, …, WL<M> represent the word lines of the first row, the second row, …, and the M-th row, respectively, and the gate of the switching element (e.g., the gate of a transistor) in each row of memristor cell circuits is connected to the word line corresponding to that row; BL<1>, BL<2>, …, BL<N> represent the bit lines of the first column, the second column, …, and the N-th column, respectively, and the memristors in each column of memristor cell circuits are connected to the bit line corresponding to that column; SL<1>, SL<2>, …, SL<M> represent the source lines of the first row, the second row, …, and the M-th row, respectively, and the sources of the transistors in each row of memristor cell circuits are connected to the source line corresponding to that row. According to Kirchhoff's law, the memristor array can complete multiply-accumulate computations in parallel by setting the states (e.g., resistances) of the memristor cells and applying corresponding word line and bit line signals to the word lines and bit lines.
FIG. 1B is a schematic diagram of a memristor device including a memristor array and its peripheral drive circuitry. For example, as shown in FIG. 1B, the memristor device includes a signal acquisition device, a word line drive circuit, a bit line drive circuit, a source line drive circuit, a memristor array, and a data output circuit.
For example, the signal acquisition device is configured to convert a digital signal to a plurality of analog signals through a digital-to-analog converter (Digital to Analog converter, DAC for short) for input to a plurality of column signal inputs of the memristor array.
For example, a memristor array includes M source lines, M word lines, and N bit lines, and a plurality of memristor cells arranged in M rows and N columns.
For example, operation of the memristor array is achieved by a word line drive circuit, a bit line drive circuit, and a source line drive circuit.
For example, the word line driving circuit includes a plurality of multiplexers (Mux) for switching word line input voltages; the bit line driving circuit includes a plurality of multiplexers for switching bit line input voltages; the source line driving circuit also includes a plurality of multiplexers (Mux) for switching the source line input voltages. For example, the source line driving circuit further includes a plurality of ADCs for converting analog signals into digital signals. In addition, a transimpedance amplifier (Trans-Impedance Amplifier, TIA for short) (not shown in the figure) can be further arranged between the Mux and the ADC in the source line driving circuit to complete the conversion from current to voltage, so as to facilitate ADC processing.
For example, a memristor array includes an operational mode and a computation mode. When the memristor array is in the operation mode, the memristor unit is in an initialized state, and values of parameter elements in the parameter matrix can be written into the memristor array. For example, the source line input voltage, the bit line input voltage, and the word line input voltage of the memristor are switched to corresponding preset voltage intervals through the multiplexer.
For example, the word line input voltage is switched to the corresponding voltage interval by the control signal wl_sw [1:M ] of the multiplexer in the word line driving circuit in fig. 1B. For example, when the memristor is set, the word line input voltage is set to 2V (volts), for example, when the memristor is reset, the word line input voltage is set to 5V, for example, the word line input voltage may be obtained by the voltage signal v_wl [1:M ] in fig. 1B.
For example, the source line input voltage is switched to the corresponding voltage section by the control signal sl_sw [1:M ] of the multiplexer in the source line driving circuit in fig. 1B. For example, when the memristor is set, the source line input voltage is set to 0V, for example, when the memristor is reset, the source line input voltage is set to 2V, for example, the source line input voltage may be obtained by the voltage signal v_sl [1:M ] in fig. 1B.
For example, the bit line input voltages are switched to the corresponding voltage intervals by the control signals BL_sw [1:N ] of the multiplexers in the bit line driving circuit in FIG. 1B. For example, when the memristor is set, the bit line input voltage is set to 2V, for example, when the memristor is reset, the bit line input voltage is set to 0V, for example, the bit line input voltage may be obtained by the DAC in fig. 1B.
For example, when the memristor array is in the calculation mode, the memristors in the memristor array are in conductance states available for calculation, and the bit line input voltage applied at the column signal input terminals does not change the conductance values of the memristors; the calculation can be completed, for example, by performing multiply-add operations through the memristor array. In this mode, the word line input voltage is switched to the corresponding voltage interval by the control signal WL_sw[1:M] of the multiplexer in the word line driving circuit in FIG. 1B; for example, when a turn-on signal is applied, the word line input voltage of the corresponding row is set to 5V, and when no turn-on signal is applied, the word line input voltage of the corresponding row is set to 0V (e.g., the GND signal is connected). The source line input voltage is switched to the corresponding voltage interval by the control signal SL_sw[1:M] of the multiplexer in the source line driving circuit in FIG. 1B; for example, the source line input voltage is set to 0V, so that the current signals of the plurality of row signal output terminals can flow into the data output circuit. The bit line input voltage is switched to the corresponding voltage interval by the control signal BL_sw[1:N] of the multiplexer in the bit line driving circuit in FIG. 1B; for example, the bit line input voltage is set to 0.1V-0.3V, so that the multiply-add operation is performed using the memristor array.
For example, the data output circuit may include a plurality of transimpedance amplifiers (TIAs), ADCs, which may convert current signals at the plurality of row signal outputs to voltage signals and then to digital signals for subsequent processing.
FIG. 1C is a schematic diagram of another memristor device. The memristor device shown in fig. 1C is substantially the same structure as the memristor device shown in fig. 1B, and also includes a memristor array and its peripheral driving circuitry. For example, as shown in FIG. 1C, the memristor device includes a signal acquisition device, a word line drive circuit, a bit line drive circuit, a source line drive circuit, a memristor array, and a data output circuit.
For example, the memristor array includes M source lines, 2M word lines, and 2N bit lines, and a plurality of memristor cells arranged in an array of M rows and N columns. For example, each memristor cell has a 2T2R structure, by which mapping of both positive and negative values can be achieved. The operation of mapping the parameter matrix used for the transformation process to different memristor cells in the memristor array is not described in detail here. It should be noted that the memristor array may also include M source lines, M word lines, 2N bit lines, and a plurality of memristor cells arranged in M rows and N columns.
The description of the signal acquisition device, the control driving circuit, and the data output circuit may refer to the previous description, and will not be repeated here.
For example, reference may be made to FIG. 1D with respect to a process of mapping a weighting matrix of a Bayesian neural network to a memristor array.
FIG. 1D illustrates a process of mapping a weight matrix of a Bayesian neural network to a memristor array. The weight matrix between layers of the Bayesian neural network is implemented using the memristor array: the distribution corresponding to each weight is realized with N memristors, N conductance values are calculated from the random probability distribution corresponding to that weight, and these N conductance values are mapped into the N memristors. In this way, the weight matrix in the Bayesian neural network is converted into target conductance values that are mapped into a cross sequence of the memristor array.
As shown in fig. 1D, the left side of the figure is a three-layer Bayesian neural network including three neuron layers connected one to another. For example, the input layer includes the first neuron layer, the hidden layer includes the second neuron layer, and the output layer includes the third neuron layer. For example, the input layer transfers the received input data to the hidden layer, the hidden layer performs computational transformation on the input data and sends the result to the output layer, and the output layer outputs the output result of the Bayesian neural network.
As shown in fig. 1D, the input layer, the hidden layer, and the output layer each include a plurality of neuron nodes, and the number of neuron nodes in each layer can be set according to different applications. For example, the number of neurons in the input layer is 2 (N1 and N2), the number of neurons in the middle hidden layer is 3 (N3, N4, and N5), and the number of neurons in the output layer is 1 (N6).
As shown in fig. 1D, two adjacent neuron layers of the bayesian neural network are connected through a weight matrix. For example, the weight matrix is implemented by a memristor array as shown on the right side of fig. 1D.
The structure of the memristor array on the right in fig. 1D is shown, for example, in fig. 1A, and the memristor array may include a plurality of memristors arranged in an array. In the example shown in fig. 1D, the weights are mapped to the conductances of the memristor array according to a certain rule: the weight connecting input N1 and output N3 is represented by 3 memristors (G11, G12, G13), and the other weights in the weight matrix may be implemented in the same way. More specifically, source line SL1 corresponds to neuron N3, source line SL2 corresponds to neuron N4, source line SL3 corresponds to neuron N5, and bit lines BL1, BL2, and BL3 correspond to neuron N1. A weight between the input layer and the hidden layer (the weight between neuron N1 and neuron N3) is converted into three target conductance values according to its distribution, and the distribution is mapped into a cross sequence of the memristor array; the target conductance values are G11, G12, and G13, respectively, outlined with a dashed box in the memristor array.
In other embodiments of the present disclosure, the weights in the weight matrix are programmed directly to the conductances of the memristor array, i.e., the weights in the weight matrix correspond one-to-one to the memristors in the memristor array. Each weight is implemented with 1 memristor.
In other embodiments of the present disclosure, a weight may also be represented by the difference in conductance of two memristors. For example, the difference in conductance of two memristors in the same column and adjacent rows represents one weight. That is, each weight is implemented with 2 memristors.
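For clarity, the following Python sketch illustrates the N-memristors-per-weight scheme of FIG. 1D (the linear weight-to-conductance scaling and the values of g_min, g_max, and w_max are hypothetical, not taken from the disclosure):

```python
import numpy as np

# An illustrative sketch (assumed scaling and hypothetical g_min/g_max/w_max) of
# realizing one trained weight distribution N(mu, sigma^2) with N memristors.
def weight_to_conductances(mu, sigma, n=3, g_min=1e-6, g_max=1e-4, w_max=1.0):
    samples = np.clip(np.random.normal(mu, sigma, size=n), -w_max, w_max)  # N values from the weight distribution
    scale = (g_max - g_min) / (2.0 * w_max)
    return g_min + (samples + w_max) * scale   # target conductances for the N memristors

print(weight_to_conductances(0.2, 0.05))
```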
FIG. 2 illustrates a schematic flow diagram of a training method for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure.
For example, a memristor array includes a plurality of memristors arranged in an array of rows and columns; for example, a trained weight matrix of a bayesian neural network is mapped into a memristor array.
For example, the structure of the bayesian neural network includes a fully connected structure or a convolutional neural network structure, or the like. Each weight of the bayesian neural network is a random variable. For example, after the bayesian neural network is trained, each weight is a distribution, such as a gaussian distribution or a laplace distribution.
For example, offline training may be performed on the Bayesian neural network to obtain the weight matrix. The Bayesian neural network may be trained using conventional methods; for example, training may be performed using a central processing unit (CPU), a graphics processing unit (GPU), a neural network processing unit (NPU), a neural network accelerator, or the like, which is not described in detail here.
As shown in fig. 2, the training method includes the following steps S101 to S104.
Step S101: a first priori knowledge and a second priori knowledge of the memristor array based on intrinsic non-ideal characteristics of the memristors are obtained.
Step S102: based on the first prior knowledge, a total loss function of the bayesian neural network is calculated.
Step S103: and back-propagating the total loss function to update the current parameters in the Bayesian neural network to obtain the object parameters.
Step S104: and constraining the object parameters based on the second priori knowledge to obtain a training result of the weights of the Bayesian neural network.
The training method provided by the embodiment of the disclosure integrates the influence of memristor conductivity value fluctuation into the training of the Bayesian neural network, so that the output of the memristor neural network is robust and reliable even under the disturbance of the memristor weight.
In some embodiments of the present disclosure, the weight values written to the memristor array may be regarded as samples generated from some distribution; for example, the weights represented by the memristors may be regarded as uncertain weights in the BNN that follow a Gaussian distribution N(μ, σ²). The mean μ is the target weight value to be transferred onto the memristor array. Second, when the complexity cost term in the BNN objective function is optimized, the weight distribution N(μ, σ²) will resemble the prior as much as possible.
For step S101, inherent non-ideal characteristics of memristors include, for example, device-to-device fluctuations, device conductance clamping, conductance state drift, and the like. The inherent non-ideal characteristics of these memristors, for example, cause the conductance values written to the memristors to drift, thereby affecting the computational accuracy of the memristor array.
The first a priori knowledge may include, for example, a fluctuating parameter of the conductance value of the memristor cell. The fluctuation parameter may include, for example, a weight fluctuation standard deviation (i.e., a fluctuation standard deviation of the conductance value), a weight fluctuation mean (i.e., a fluctuation mean of the conductance value), and the like. The fluctuation parameters are not limited to the weight fluctuation standard deviation and the weight fluctuation mean value, but can be other parameters, and the fluctuation parameters are not particularly limited in the disclosure. For example, the fluctuation parameter may also include a fluctuation range of the conductance value. In embodiments of the present disclosure, a memristor cell may include a combination of 1 memristor and 1 transistor (1T 1R), a combination of 2 memristors and 2 transistors (2T 2R), and so on.
The second a priori knowledge may include, for example, read fluctuation parameters. The read fluctuation parameter refers to an error caused by reading the conductance value of the memristor. For example, errors due to inaccuracy in reading the conductance values of the memristors from voltmeters, ammeter, or multimeters. The read ripple parameter may include, for example, a read ripple standard deviation, which may be calculated by reading the conductance value of the memristor multiple times.
The second a priori knowledge may also include, for example, a weight window range. The weight window range refers, for example, to the range of weight values corresponding to the conductance values of a memristor cell. A memristor cell may include one or more memristors.
Fig. 3 and 4 below illustrate embodiments of obtaining the first priori knowledge and the second priori knowledge, see description below.
For step S102, for example, the first prior knowledge includes the weight fluctuation standard deviation σ_prior, and step S102 includes: calculating the total loss function using variational learning based on the weight fluctuation standard deviation. The variational learning includes a complexity cost term, and the weight fluctuation standard deviation is used as the prior standard deviation in the complexity cost term.
For example, in some embodiments of the present disclosure, the total loss function obtained using variational learning includes a KL loss term and a likelihood loss term, as described above. For example, the expression of the total loss function is as follows:
F(D, θ) = KL[q(w|θ) || P(w)] - E_{q(w|θ)}[log P(D|w)],
where KL[q(w|θ) || P(w)] is the KL loss term and E_{q(w|θ)}[log P(D|w)] is the likelihood loss term.
In this example, the weight fluctuation standard deviation σ_prior is used as the standard deviation of the prior P(w); σ_prior is substituted into the expression of the loss function, and the total loss function is calculated.
In the case where the first prior knowledge is another parameter, a person skilled in the art can calculate the total loss function for that parameter from the expression of the total loss function.
For step S103, for example, when parameters such as the weights of the neural network are trained using a gradient descent method, back propagation is needed to calculate the partial derivative of the loss with respect to each weight of the weight matrix of the Bayesian neural network, i.e., the gradient with respect to the conductance state of the corresponding memristors. Geometrically, the gradient direction is the direction in which the function increases fastest, and the direction opposite to the gradient is the direction in which the function decreases fastest, so the minimum is easier to find.
For example, for each pair of parameters μ_i, σ_i of the Bayesian neural network, each weight w_i obeys the Gaussian distribution w_i ~ N(μ_i, σ_i²). The total loss function calculated in step S102 is back-propagated to calculate an update amount Δ for each current parameter μ_i, σ_i, and each parameter is updated according to its update amount Δ. For example, the parameter μ_i is updated to μ_i + Δ, and the parameter σ_i is updated to σ_i + Δ. The object parameters are the parameters μ_i + Δ and σ_i + Δ obtained by updating the current parameters μ_i, σ_i.
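By way of example only, the following toy Python sketch mirrors step S103 (plain gradient descent and the stand-in loss are assumptions for illustration): the loss is back-propagated, and each current parameter receives its own update amount to give the object parameters.

```python
import torch

# A toy sketch of step S103: back-propagation yields an update amount for each
# current parameter; the loss below is a stand-in, not the total loss of step S102.
mu = torch.tensor([0.30, -0.10], requires_grad=True)     # current means mu_i
sigma = torch.tensor([0.20, 0.20], requires_grad=True)   # current standard deviations sigma_i
total_loss = ((mu ** 2) / (2 * sigma ** 2) + torch.log(sigma)).sum()
total_loss.backward()                                    # gradients w.r.t. mu_i and sigma_i

lr = 0.01
with torch.no_grad():
    mu -= lr * mu.grad          # object parameter: mu_i + delta
    sigma -= lr * sigma.grad    # object parameter: sigma_i + delta
```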
In some embodiments of the present disclosure, the bayesian neural network is initialized with initialization parameters that follow a gaussian distribution prior to the first training, the initialization parameters being current parameters for the first training of the bayesian neural network.
In some embodiments of the present disclosure, the object parameter comprises a mean of a gaussian distribution, and the second prior knowledge comprises a weight window range of memristor cells in the memristor array.
Step S104 may include constraining the mean of the gaussian distribution in the object parameters to be within a weight window.
For example, the weight window range is [-w_max, w_max], and μ_i + Δ in the object parameters is constrained to lie within [-w_max, w_max]. For example, if μ_i + Δ < -w_max, then μ_i + Δ is constrained to -w_max.
In this embodiment, because the conductance window of a memristor is limited, the weights are truncated to a symmetric range, i.e., the above weight window range [-w_max, w_max]. In this way, the Bayesian neural network is more compatible with the memristor array, and the mean of the Gaussian distribution is prevented from falling outside the weight window range, where it would be difficult to map into a memristor cell.
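A minimal sketch of this mean constraint, assuming a placeholder value of w_max, might look as follows:

```python
import torch

# A one-line sketch of the mean constraint in step S104 (w_max = 1.0 is a placeholder;
# the real window follows from the memristor conductance range).
w_max = 1.0
mu_updated = torch.tensor([-1.4, 0.3, 1.7])          # means after the update mu_i + delta
mu_constrained = mu_updated.clamp(-w_max, w_max)      # kept inside [-w_max, w_max]
print(mu_constrained)                                 # tensor([-1.0000, 0.3000, 1.0000])
```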
The weight window range is a range of conductance values of the memristor cell. For example, the memristor cell includes a first memristor and a second memristor, and a difference between a conductance value of the first memristor and a conductance value of the second memristor forms a conductance value of the memristor cell as a weight of the bayesian neural network.
In other embodiments of the present disclosure, the subject parameter comprises a standard deviation of a gaussian distribution, and the second prior knowledge comprises a read fluctuation standard deviation, which is an error introduced by reading conductance values of memristors in the memristor array.
Step S104 may include constraining the standard deviation of the gaussian distribution in the object parameters such that the standard deviation of the gaussian distribution is greater than or equal to the read fluctuation standard deviation.
For example, if the standard deviation of the Gaussian distribution σ_i is equal to 0.1 and the read fluctuation standard deviation σ_read is equal to 0.15, then σ_i < σ_read, so the standard deviation of the Gaussian distribution σ_i is constrained to 0.15, i.e., the value of σ_i is updated to 0.15.
Constraining the standard deviation of the Gaussian distribution to be greater than or equal to the read fluctuation standard deviation allows the Bayesian neural network to accommodate errors caused by read fluctuation, giving better robustness. Since read fluctuation is always present during the life cycle of a memristor, the conductance deviation is never zero, which further bounds the minimum of the posterior standard deviation during learning (i.e., by the read fluctuation standard deviation σ_read).
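A corresponding sketch of the standard-deviation constraint, using the σ_read = 0.15 example above and illustrative tensor values, might look as follows:

```python
import torch

# A short sketch of the standard-deviation constraint in step S104; the tensor
# values are illustrative, and sigma_read = 0.15 follows the example above.
sigma_read = 0.15
sigma_updated = torch.tensor([0.10, 0.20, 0.05])          # std devs after the update sigma_i + delta
sigma_constrained = sigma_updated.clamp(min=sigma_read)    # forced to be >= sigma_read
print(sigma_constrained)                                   # tensor([0.1500, 0.2000, 0.1500])
```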
Fig. 3 shows a schematic flow chart of a method of obtaining first a priori knowledge provided by at least one embodiment of the present disclosure.
As shown in fig. 3, the method may include step S111 and step S121.
Step S111: and carrying out multiple groups of electrical tests on memristors in the memristor array to obtain multiple test results of the multiple groups of electrical tests, wherein each test result comprises a weight fluctuation standard deviation of the memristors in the memristor array.
Step S121: the maximum value of the weight fluctuation standard deviation is taken as the first priori knowledge.
This embodiment may use the worst case memristor weight fluctuation as a priori knowledge, obtaining a large standard deviation of the posterior during training. After training, the learned weight distribution has higher tolerance to the worst case, and simultaneously ensures the network performance.
For step S111, for example, each set of electrical tests includes a plurality of electrical tests. Multiple electrical tests may be performed for one memristor in the array of memristors, or for multiple memristors in the array of memristors.
For example, each electrical test includes a write operation, a delay operation, and a read operation on the memristor. The delay operation means that the read operation is performed after a predetermined delay time; the delay allows the conductance value written to the memristor to drift due to the inherent non-ideal characteristics. The preset duration is, for example, 1 s, 10 s, etc.; the present disclosure does not limit the specific duration. For example, for each set of electrical tests, an initial conductance value is first written to the memristor; after the preset duration, the current conductance value of the memristor is read, and the absolute value of the difference between the current conductance value and the initial conductance value is taken as the weight fluctuation value for this electrical test. Multiple electrical tests are performed according to this method to obtain multiple weight fluctuation values, and the standard deviation of these weight fluctuation values is calculated as the weight fluctuation standard deviation obtained by this set of electrical tests.
For step S121, for example, the weight fluctuation standard deviations obtained from the plurality of sets of electrical tests are compared to obtain the maximum value of the weight fluctuation standard deviations, which is taken as the first prior knowledge, i.e., the weight fluctuation standard deviation σ_prior.
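A schematic Python sketch of steps S111 and S121 is given below; the hardware-access functions write_conductance and read_conductance are hypothetical stubs, and the number of tests and the delay are illustrative rather than prescribed values.

```python
import time
import numpy as np

# A schematic sketch of steps S111/S121; write_conductance and read_conductance are
# hypothetical hardware-access stubs, and n_tests / delay_s are illustrative values.
def weight_fluctuation_std(write_conductance, read_conductance, target, n_tests=100, delay_s=1.0):
    fluctuations = []
    for _ in range(n_tests):
        write_conductance(target)                              # write operation
        time.sleep(delay_s)                                    # delay operation: wait for drift
        fluctuations.append(abs(read_conductance() - target))  # read operation
    return np.std(fluctuations)                                # weight fluctuation std of this test group

def first_prior_knowledge(group_stds):
    return max(group_stds)    # the maximum over all test groups is taken as sigma_prior
```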
Fig. 4A shows a schematic flow chart of a method of obtaining second prior knowledge provided by at least one embodiment of the present disclosure.
As shown in fig. 4A, the method may include step S131 and step S141.
Step S131: and calculating a value range of a difference value between the conductance value of the first memristor and the conductance value of the second memristor included in each memristor unit in the memristor array.
Step S141: the value range is used as a weight window range.
In this embodiment, the memristor array includes a plurality of rows and columns of memristor cells, each memristor cell including a first memristor and a second memristor provided in pairs. The second prior knowledge includes a range of weight windows for memristors in the memristor array, and the weights of the bayesian neural network are represented based on a difference between a conductance value of a first memristor included in each memristor cell in the memristor array and a conductance value of a second memristor.
For step S131, for example, the conductance value of the first memristor is a, the conductance value of the second memristor is b, and the difference between the conductance values of the first and second memristors is a - b, where a > 0 and b > 0.
FIG. 4B is a schematic block diagram of a memristor array provided in accordance with at least one embodiment of the present disclosure.
As shown in fig. 4B, memristor 401 and memristor 402 may form a memristor pair; the conductance value of memristor 401 is denoted G11, and the conductance value of memristor 402 is denoted G12. Because memristor 402 is connected to an inverter, when memristor 401 receives an input voltage signal of positive polarity, the inverter inverts the polarity of the input voltage signal, so that memristor 402 receives an input voltage signal of negative polarity. For example, the input voltage signal received by memristor 401 is denoted v(t), and the input voltage signal received by memristor 402 is denoted -v(t). Memristor 401 and memristor 402 are connected to two different SLs, and an input voltage signal generates an output current through each memristor. The output current through memristor 401 and the output current through memristor 402 are superimposed at the SL end. Thus, the result of the multiply-accumulate computation of memristor 401 and memristor 402 is v(t)G11 + (-v(t))G12, that is, v(t)(G11 - G12). Thus, the memristor pair consisting of memristor 401 and memristor 402 may correspond to one weight whose value is G11 - G12, and by configuring the numerical relationship of G11 and G12, positive, zero, and negative elements can be implemented.
FIG. 4C is a schematic diagram of another memristor array provided in accordance with at least one embodiment of the present disclosure.
As shown in fig. 4C, for example, memristor 401 and memristor 402 may form a memristor pair; the conductance value of memristor 401 is denoted G11, and the conductance value of memristor 402 is denoted G12. Unlike FIG. 4B, memristor 402 is not connected to an inverter, so when memristor 401 receives an input voltage signal of positive polarity, memristor 402 also receives an input voltage signal of positive polarity. For example, the input voltage signal received by memristor 401 is denoted v(t), and the input voltage signal received by memristor 402 is also denoted v(t). Memristor 401 and memristor 402 are connected to two different SLs, and the output current through memristor 401 and the output current through memristor 402 are subtracted at the SL ends. Thus, the result of the multiply-accumulate computation of memristor 401 and memristor 402 is v(t)G11 - v(t)G12, i.e., v(t)(G11 - G12). Thus, the memristor pair consisting of memristor 401 and memristor 402 may correspond to one weight whose value is G11 - G12, and by configuring the numerical relationship of G11 and G12, positive, zero, and negative elements can be implemented.
For example, the conductance value written to the first memristor G11 is within a first value range, and the conductance value written to the second memristor G12 is within a second value range; the value range of the difference between the conductance value of the first memristor and the conductance value of the second memristor is calculated from the first value range and the second value range.
For step S141, for example, if the difference between the conductance of the first memristor and the conductance of the second memristor lies in the range [-w_max, w_max], then the weight window range is [-w_max, w_max].
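An illustrative sketch of steps S131 and S141, with a hypothetical conductance range, is:

```python
# An illustrative sketch of steps S131/S141 (the conductance range is a hypothetical
# example, not a patent value): the weight of a paired cell is the difference a - b
# of two conductances, so the weight window follows from their ranges.
g_min, g_max = 1e-6, 1e-4              # assumed programmable range for both memristors (S)
w_max = g_max - g_min                  # largest achievable difference a - b
weight_window = (-w_max, w_max)        # value range of the weight, used as the weight window
print(weight_window)
```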
Fig. 5 shows a schematic flow chart of another training method provided by at least one embodiment of the present disclosure.
As shown in fig. 5, the training method further includes steps S501 to S503 in addition to steps S101 to S104 shown in fig. 2. Steps S501 to S503 may be performed after step S104, for example. The training results include training averages of weights in the bayesian neural network.
Step S501: judging whether the training of the Bayesian neural network reaches the preset training times.
Step S502: and if the preset training times are reached, mapping the training mean value to memristors in the memristor array so as to calculate by using the memristor array.
Step S503: if the preset training times are not reached, training the Bayesian neural network again by taking the training result as the current parameter until the preset training times are reached.
For step S501, the preset number of training times is set empirically by those skilled in the art, for example. For example, in the process of training the bayesian neural network, the number of training times of the bayesian neural network is counted. For example, each time an object parameter is obtained, the bayesian neural network is represented as having completed a training, the count is once. Judging whether the current count value reaches the preset training times or not.
For step S502, if the current count value reaches the preset training times, mapping an average matrix formed by training averages of the bayesian neural network into the memristor array, where elements in the average matrix correspond to memristor units in the memristor array one by one.
For step S503, if the preset number of training times is not reached, for example, the training result is used as the current parameters, and steps S102 to S103 and S501 are executed again until the training of the Bayesian neural network reaches the preset number of training times.
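The overall flow of FIG. 5 can be sketched as follows; the helper functions are hypothetical stand-ins for steps S102 to S104 and S502, and the preset count is illustrative.

```python
# A high-level, self-contained sketch of the flow in FIG. 5; the helpers below are
# hypothetical stand-ins for steps S102-S104 and S502, and preset_trainings is illustrative.
def train_and_constrain(mu, sigma):      # stands in for steps S102-S104
    return mu, sigma

def map_means_to_array(mu):              # stands in for step S502
    print("writing training means to memristor array:", mu)

preset_trainings = 5
mu, sigma = [0.0], [0.1]                 # initialization parameters (current parameters, first pass)
for count in range(preset_trainings):    # step S501: stop once the preset count is reached
    mu, sigma = train_and_constrain(mu, sigma)   # training result becomes the current parameters (S503)
map_means_to_array(mu)                   # step S502: map the training means for computation
```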
FIG. 6 illustrates a schematic block diagram of a training apparatus 600 for a Bayesian neural network based on a memristor array provided by at least one embodiment of the present disclosure. The training apparatus 600 may be used to perform the training method shown in fig. 2. For example, the memristor array includes a plurality of memristors arranged in an array, and the trained weight matrix of the Bayesian neural network is mapped into the memristor array.
As shown in fig. 6, the training apparatus 600 includes an acquisition unit 601, a calculation unit 602, an update unit 603, and a constraint unit 604.
The acquisition unit 601 is configured to acquire a first priori knowledge and a second priori knowledge of the memristor array based on intrinsic non-ideal characteristics of the memristors.
The calculation unit 602 is configured to calculate a total loss function of the bayesian neural network based on the first prior knowledge.
The updating unit 603 is configured to back-propagate the total loss function to update current parameters in the bayesian neural network to obtain object parameters.
The constraint unit 604 is configured to constrain the object parameters based on the second prior knowledge to obtain a training result of the weights of the bayesian neural network.
The technical effects of the training device are the same as those of the training method shown in fig. 2, and will not be described herein.
For example, the acquisition unit 601, the calculation unit 602, the update unit 603, and the constraint unit 604 may be hardware, software, firmware, and any feasible combination thereof. For example, the acquisition unit 601, the calculation unit 602, the update unit 603, and the constraint unit 604 may be dedicated or general-purpose circuits, chips, devices, or the like, or may be a combination of a processor and a memory. With respect to the specific implementation forms of the respective units described above, the embodiments of the present disclosure are not limited thereto.
It should be noted that, in the embodiment of the present disclosure, each unit of the training apparatus 600 corresponds to each step of the foregoing training method, and the specific function of the training apparatus 600 may refer to the related description of the training method, which is not repeated herein. The components and structures of the exercise device 600 shown in fig. 6 are exemplary only and not limiting, and the exercise device 600 may also include other components and structures as desired.
At least one embodiment of the present disclosure also provides an electronic device comprising a processor and a memory storing one or more computer program instructions. One or more computer program instructions, when executed by the processor, are configured to implement the training method described above. The electronic device can improve the robustness of the network to memristor conductance value fluctuation.
Fig. 7 is a schematic block diagram of an electronic device provided by some embodiments of the present disclosure. As shown in fig. 7, the electronic device 700 includes a processor 710 and a memory 720. Memory 720 is used to store non-transitory computer-readable instructions (e.g., one or more computer program modules). Processor 710 is configured to execute non-transitory computer readable instructions that, when executed by processor 710, may perform one or more of the steps of the training method described above. The memory 720 and the processor 710 may be interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, processor 710 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or another form of processing unit having data processing capabilities and/or program execution capabilities. For example, the Central Processing Unit (CPU) may be of an X86 or ARM architecture, or the like. Processor 710, which may be a general-purpose processor or a special-purpose processor, may control other components in electronic device 700 to perform the desired functions.
For example, memory 720 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache), and the like. Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer program modules may be stored on the computer-readable storage medium and executed by the processor 710 to implement the various functions of the electronic device 700. Various applications and various data, as well as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
It should be noted that, in the embodiments of the present disclosure, specific functions and technical effects of the electronic device 700 may refer to the above description about the training method, which is not repeated herein.
Fig. 8 is a schematic block diagram of another electronic device provided by some embodiments of the present disclosure. The electronic device 800 is suitable, for example, for implementing the training method provided by embodiments of the present disclosure. For example, the electronic device 800 may be a terminal device or the like. It should be noted that the electronic device 800 illustrated in fig. 8 is merely an example and is not intended to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 8, the electronic device 800 may include a processing means (e.g., a central processor, a graphics processor, etc.) 810 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 820 or a program loaded from a storage means 880 into a Random Access Memory (RAM) 830. In the RAM 830, various programs and data required for the operation of the electronic device 800 are also stored. The processing device 810, the ROM 820, and the RAM 830 are connected to each other by a bus 840. An input/output (I/O) interface 850 is also connected to bus 840.
In general, the following devices may be connected to the I/O interface 850: input devices 860 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 870 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 880 including, for example, magnetic tape, hard disk, etc.; and communication device 890. Communication device 890 may allow electronic device 800 to communicate wirelessly or by wire with other electronic devices to exchange data. While fig. 8 shows the electronic device 800 with various means, it is to be understood that not all of the illustrated means are required to be implemented or provided, and that the electronic device 800 may alternatively be implemented or provided with more or fewer means.
For example, according to embodiments of the present disclosure, the training method described above may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the training method described above. In such an embodiment, the computer program may be downloaded and installed from a network via communications device 890, or from storage 880, or from ROM 820. The functions defined in the training method provided by the embodiments of the present disclosure may be implemented when the computer program is executed by the processing means 810.
At least one embodiment of the present disclosure also provides a computer-readable storage medium for storing non-transitory computer-readable instructions that, when executed by a computer, implement the training method described above. With the computer readable storage medium, the robustness of the network to memristor conductance value fluctuation can be improved.
Fig. 9 is a schematic diagram of a storage medium according to some embodiments of the present disclosure. As shown in fig. 9, storage medium 900 is used to store non-transitory computer readable instructions 910. For example, non-transitory computer readable instructions 910, when executed by a computer, may perform one or more steps in accordance with the training methods described above.
For example, the storage medium 900 may be applied to the electronic device 700 described above. For example, the storage medium 900 may be the memory 720 in the electronic device 700 shown in fig. 7. For example, the relevant description of the storage medium 900 may refer to the corresponding description of the memory 720 in the electronic device 700 shown in fig. 7, which is not repeated here.
The following points need to be described:
(1) The drawings of the embodiments of the present disclosure relate only to the structures to which the embodiments of the present disclosure relate, and reference may be made to the general design for other structures.
(2) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely a description of specific embodiments of the present disclosure; however, the scope of the disclosure is not limited thereto and should be determined by the appended claims.

Claims (11)

1. A training method for a Bayesian neural network based on a memristor array, conductance values of memristors in the memristor array being used to map weights of the Bayesian neural network, the training method comprising:
acquiring first prior knowledge and second prior knowledge of the memristor array based on inherent non-ideal characteristics of the memristor;
calculating a total loss function of the Bayesian neural network based on the first prior knowledge;
back-propagating the total loss function to update current parameters in the Bayesian neural network to obtain object parameters; and
constraining the object parameters based on the second prior knowledge to obtain a training result of the weights of the Bayesian neural network.
2. The training method of claim 1, wherein the first prior knowledge comprises a weight fluctuation standard deviation, and
calculating the total loss function of the Bayesian neural network based on the first prior knowledge comprises:
calculating the total loss function using variational learning based on the weight fluctuation standard deviation,
wherein the variational learning includes a complexity cost term, and the weight fluctuation standard deviation is used as a prior standard deviation in the complexity cost term.
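For illustration only, a minimal sketch of the total loss of claim 2, assuming a Gaussian weight posterior and a zero-mean Gaussian prior whose standard deviation is the weight fluctuation standard deviation; the function and variable names are hypothetical and not taken from the claims.

```python
import numpy as np

def complexity_cost(mu_q, sigma_q, sigma_prior):
    # KL( N(mu_q, sigma_q^2) || N(0, sigma_prior^2) ), summed over all weights.
    return np.sum(np.log(sigma_prior / sigma_q)
                  + (sigma_q ** 2 + mu_q ** 2) / (2.0 * sigma_prior ** 2)
                  - 0.5)

def total_loss(likelihood_cost, mu_q, sigma_q, weight_fluct_std, kl_scale=1.0):
    # Total loss = likelihood cost + complexity cost, with the weight
    # fluctuation standard deviation used as the prior standard deviation.
    return likelihood_cost + kl_scale * complexity_cost(mu_q, sigma_q, weight_fluct_std)
```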
3. The training method according to claim 1 or 2, wherein the Bayesian neural network is initialized with initialization parameters that obey a Gaussian distribution,
wherein the initialization parameters are used as the current parameters for the first training of the Bayesian neural network.
4. The training method of claim 3, wherein the object parameters comprise a mean of the Gaussian distribution, and the second prior knowledge comprises a weight window range of memristor cells in the memristor array; and
constraining the object parameters based on the second prior knowledge to obtain the training result of the weights of the Bayesian neural network comprises:
constraining the mean of the Gaussian distribution in the object parameters to be within the weight window range.
5. The training method of claim 4, wherein the object parameters comprise a standard deviation of the Gaussian distribution, and the second prior knowledge comprises a read fluctuation standard deviation, which is the error introduced by reading a conductance value of a memristor in the memristor array; and
constraining the object parameters based on the second prior knowledge to obtain the training result of the weights of the Bayesian neural network comprises:
constraining the standard deviation of the Gaussian distribution in the object parameters to be greater than or equal to the read fluctuation standard deviation.
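For illustration only, a minimal sketch of the constraints of claims 4 and 5, assuming the object parameters are the mean and standard deviation of the Gaussian weight distribution; the names are hypothetical.

```python
import numpy as np

def constrain_object_params(mu, sigma, weight_window, read_fluct_std):
    lo, hi = weight_window                     # second prior knowledge (claim 4)
    mu = np.clip(mu, lo, hi)                   # keep the mean inside the weight window
    sigma = np.maximum(sigma, read_fluct_std)  # claim 5: std >= read fluctuation std
    return mu, sigma
```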
6. The training method of claim 1, wherein acquiring the first prior knowledge of the memristor array based on the intrinsic non-ideal characteristics of the memristor comprises:
performing multiple groups of electrical tests on the memristors in the memristor array to obtain a plurality of test results, wherein each test result comprises a weight fluctuation standard deviation of the memristors in the memristor array; and
taking the maximum value of the weight fluctuation standard deviations as the first prior knowledge.
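For illustration only, a minimal sketch of claim 6, assuming a hypothetical run_electrical_test() routine that returns the readings of one test group; taking the per-group standard deviation and keeping the maximum is one way to realize the recited step.

```python
import numpy as np

def first_prior_from_tests(run_electrical_test, num_groups):
    # One weight fluctuation standard deviation per group of electrical tests;
    # the maximum over all groups is taken as the first prior knowledge.
    stds = [float(np.std(run_electrical_test())) for _ in range(num_groups)]
    return max(stds)
```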
7. The training method of claim 1, wherein the memristor array comprises memristor cells arranged in a plurality of rows and columns, each memristor cell comprising a first memristor and a second memristor provided as a pair,
the second prior knowledge comprises a weight window range of the memristor cells in the memristor array, and the weights of the Bayesian neural network are represented based on the difference between the conductance value of the first memristor and the conductance value of the second memristor included in each memristor cell of the memristor array, and
acquiring the second prior knowledge of the memristor array based on the intrinsic non-ideal characteristics of the memristor comprises:
calculating the value range of the difference between the conductance value of the first memristor and the conductance value of the second memristor included in each memristor cell in the memristor array; and
taking the value range as the weight window range.
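For illustration only, a minimal sketch of the weight window computation in claim 7, assuming both memristors of a cell share the same conductance range [g_min, g_max]; the function name is hypothetical.

```python
def weight_window(g_min, g_max):
    # The weight is the difference (g_first - g_second) of the paired devices,
    # so its value range is [g_min - g_max, g_max - g_min].
    return (g_min - g_max, g_max - g_min)
```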
8. The training method of claim 1, wherein the training result comprises a training mean of the weights in the Bayesian neural network, and the training method further comprises:
judging whether the training of the Bayesian neural network has reached a preset number of training iterations;
if the preset number of training iterations has been reached, mapping the training mean onto the memristors in the memristor array so as to perform computation using the memristor array; or
if the preset number of training iterations has not been reached, training the Bayesian neural network again with the training result as the current parameters until the preset number of training iterations is reached.
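For illustration only, a minimal sketch of the loop in claim 8; train_once, map_to_array, and the params dictionary (with a "mean" entry) are hypothetical placeholders.

```python
def train_until_preset(params, train_once, map_to_array, preset_iterations):
    for _ in range(preset_iterations):
        params = train_once(params)  # the training result becomes the current parameters
    # After the preset number of training iterations, map the training mean
    # onto the memristor array so the array can be used for computation.
    map_to_array(params["mean"])
    return params
```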
9. A training apparatus for a Bayesian neural network based on a memristor array, conductance values of memristors in the memristor array being used to map weights of the Bayesian neural network, the training apparatus comprising:
an acquisition unit configured to acquire first prior knowledge and second prior knowledge of the memristor array based on intrinsic non-ideal characteristics of the memristor;
a calculation unit configured to calculate a total loss function of the Bayesian neural network based on the first prior knowledge;
an updating unit configured to back-propagate the total loss function to update current parameters in the Bayesian neural network to obtain object parameters; and
a constraint unit configured to constrain the object parameters based on the second prior knowledge to obtain a training result of the weights of the Bayesian neural network.
10. An electronic device, comprising:
a processor;
a memory storing one or more computer program instructions;
wherein the one or more computer program instructions, when executed by the processor, are for implementing the training method of any of claims 1-8.
11. A computer-readable storage medium non-transitorily storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the training method of any one of claims 1-8.
CN202211490875.5A 2022-11-25 2022-11-25 Training method and device, electronic equipment and computer storage medium Pending CN116128035A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211490875.5A CN116128035A (en) 2022-11-25 2022-11-25 Training method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211490875.5A CN116128035A (en) 2022-11-25 2022-11-25 Training method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN116128035A true CN116128035A (en) 2023-05-16

Family

ID=86299869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211490875.5A Pending CN116128035A (en) 2022-11-25 2022-11-25 Training method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN116128035A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination