CN116451760A - Training method and device for a recurrent neural network based on a memristor array

Training method and device for a recurrent neural network based on a memristor array

Info

Publication number
CN116451760A
CN116451760A (application number CN202210015968.6A)
Authority
CN
China
Prior art keywords
memristor
weight matrix
weight
value
initial
Prior art date
Legal status
Pending
Application number
CN202210015968.6A
Other languages
Chinese (zh)
Inventor
吴华强
周姝
唐建石
张清天
钱鹤
高滨
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202210015968.6A
Publication of CN116451760A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

A training method and device for a memristor-array-based recurrent neural network (RNN). The method comprises: performing at least one training operation to obtain an object weight matrix; performing quantization processing on the object weight matrix to obtain a quantized object weight matrix; and mapping the quantized object weight matrix to a memristor array. Each training operation includes: acquiring an initial weight matrix; performing quantization processing and forward propagation processing on the initial weight matrix to obtain a forward computation result; obtaining a penalty term based on the current weight values of the weight units in the initial weight matrix; obtaining an initial loss value using a loss function based on the forward computation result, and processing the initial loss value with the penalty term to obtain a processed loss value; and performing backward propagation with the processed loss value to update the weight matrix, obtaining an updated weight matrix. The method can improve the inference accuracy of the RNN.

Description

Training method and device for a recurrent neural network based on a memristor array
Technical Field
Embodiments of the present disclosure relate to a training method and device for a recurrent neural network based on a memristor array.
Background
With the rapid development of information science, more and more artificial intelligence algorithms and techniques are being applied in all areas of social life. The recurrent neural network (RNN) is a common neural network architecture that performs well at processing and predicting sequence data.
Disclosure of Invention
At least some embodiments of the present disclosure address the read noise of a memristor array by training a set of weight matrices with stronger resistance to read noise and mapping them into the memristor array, thereby improving the inference accuracy of a recurrent neural network based on the memristor array.
At least one embodiment of the present disclosure provides a method for training a recurrent neural network based on a memristor array, including:
performing at least one training operation on the recurrent neural network to obtain an object weight matrix;
performing quantization processing on the object weight values of each weight unit in the object weight matrix to obtain a quantized object weight matrix; and
mapping the quantized object weight matrix to the memristor array,
wherein each of the at least one training operation comprises:
acquiring an initial weight matrix for the recurrent neural network in the current training operation;
performing quantization processing on the current weight value of each weight unit in the initial weight matrix to obtain a first quantized weight matrix;
performing forward propagation processing using the first quantized weight matrix to obtain a forward computation result;
obtaining a penalty term based on the current weight values of the weight units in the initial weight matrix;
obtaining an initial loss value using a loss function based on the forward computation result, and processing the initial loss value with the penalty term to obtain a processed loss value; and
performing backward propagation using the processed loss value to update the weight matrix, obtaining an updated weight matrix,
wherein the updated weight matrix is used as the initial weight matrix for a next training operation or as the object weight matrix.
At least one embodiment of the present disclosure provides a training apparatus for a memristor-array-based recurrent neural network, comprising:
a training unit configured to perform at least one training operation on the recurrent neural network to obtain an object weight matrix;
a quantization unit configured to quantize the object weight values of each weight unit in the object weight matrix to obtain a quantized object weight matrix; and
a control unit configured to map the quantized object weight matrix to the memristor array, wherein the training unit includes:
an acquisition module configured to acquire an initial weight matrix for the recurrent neural network in the current training operation;
a quantization module configured to perform the quantization processing on the current weight values of each weight unit in the initial weight matrix to obtain a first quantized weight matrix; and
a processing module configured to perform forward propagation processing using the first quantized weight matrix to obtain a forward computation result, obtain a penalty term based on the current weight values of each weight unit in the initial weight matrix, obtain an initial loss value using a loss function based on the forward computation result, process the initial loss value with the penalty term to obtain a processed loss value, and perform backward propagation using the processed loss value to update the weight matrix to obtain an updated weight matrix,
wherein the updated weight matrix is used as the initial weight matrix for a next training operation or as the object weight matrix.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments are briefly described below. It is apparent that the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 is a schematic diagram of a recurrent neural network;
FIG. 2A is a schematic diagram of a memristor array;
FIG. 2B is a schematic diagram of a memristor device;
FIG. 2C is a schematic diagram of another memristor device;
FIG. 3 shows a schematic diagram of a method of implementing a Recurrent Neural Network (RNN) using a memristor array;
FIG. 4A is a schematic flow chart of a method for training a recurrent neural network according to at least one embodiment of the present disclosure;
FIG. 4B is a schematic flow chart of each training operation in the training method shown in FIG. 4A;
FIG. 5A is a schematic diagram showing the read noise distribution of a resistive random access memory at a low read current;
FIG. 5B is a schematic diagram showing the read noise distribution of the resistive random access memory at a larger read current;
FIG. 6A illustrates a schematic block diagram of a memristor array provided by at least one embodiment of the present disclosure;
FIG. 6B illustrates a schematic diagram of another memristor array provided by at least one embodiment of the present disclosure;
FIG. 6C illustrates a memristor array constructed with memristor cells in a 2T2R structure;
FIG. 6D illustrates another memristor array constructed with memristor cells in a 2T2R structure; and
FIG. 7 illustrates a schematic diagram of a training apparatus for memristor array-based recurrent neural networks provided in at least one embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the word "comprising" or "comprises," and the like, means that the elements or items preceding the word include the elements or items listed after the word and equivalents thereof, but do not exclude other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
Fig. 1 shows a schematic diagram of a recurrent neural network. As shown in Fig. 1, V denotes the weight matrix of the output layer, y_t denotes the output value (vector) of the output layer, and g denotes the activation function of the output layer; h_t denotes the output value (vector) of the current hidden layer, x_t denotes the input value (vector) of the current input layer, W_x is the weight matrix for the input value x_t of the input layer, W_h is the weight matrix applied to the previous hidden-layer output h_{t-1} when it is fed back as part of the current input, and f is the activation function of the hidden layer, where

y_t = g(V·h_t)     (1)

h_t = f(W_x·x_t + W_h·h_{t-1})     (2)

From formulas (1) and (2):

y_t = g(V·h_t) = g(V·f(W_x·x_t + W_h·h_{t-1})) = g(V·f(W_x·x_t + W_h·f(W_x·x_{t-1} + W_h·h_{t-2})))     (3)

As can be seen from equation (3), the output y_t of the recurrent neural network is affected not only by the current input value x_t but also by the output h_{t-1} of the previous hidden layer.
Compared with feedforward neural networks such as convolutional networks, which do not consider the correlation between data items, an RNN's output value is related not only to the input at the current time but also to inputs at previous times. Because of this feature, RNNs have a time-memory capability and achieve notable performance in sequential tasks such as text processing, speech recognition, and machine translation.
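To make equations (1) and (2) concrete, the following is a minimal sketch of one forward step of such a recurrent network. The matrix shapes and the choice of tanh and softmax as the activation functions f and g are assumptions for illustration, not requirements of the disclosure.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, V):
    # Hidden layer, equation (2): h_t = f(W_x x_t + W_h h_{t-1}), with f = tanh (assumed)
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev)
    # Output layer, equation (1): y_t = g(V h_t), with g = softmax (assumed)
    z = V @ h_t
    e = np.exp(z - z.max())
    return h_t, e / e.sum()

# Example: 4-dimensional inputs, 8-dimensional hidden state, 3 output classes
rng = np.random.default_rng(0)
W_x, W_h, V = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), rng.normal(size=(3, 8))
h = np.zeros(8)
for x in rng.normal(size=(5, 4)):  # a length-5 input sequence
    h, y = rnn_step(x, h, W_x, W_h, V)
```

Note how h is carried from one step to the next; this recurrence is what gives the output y_t its dependence on earlier inputs.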
Memristors (e.g., resistive random access memories, phase-change memories, conductive-bridge memories, etc.), on the other hand, are a new type of micro/nano electronic device whose conductance state can be adjusted by applying external stimuli. Memristors are two-terminal devices with adjustable resistance and non-volatility. According to Kirchhoff's current law and Ohm's law, a compute-in-memory array composed of such devices can perform analog multiply-add computations in parallel, directly processing input analog signals, with both memory and computation taking place in the memristors of the array. The compute-in-memory array comprises operation units arranged in multiple rows and columns, each operation unit being a memristor cell.
Fig. 2A shows a schematic structure of a compute-in-memory array composed of a plurality of operation units (e.g., memristor cells) arranged in M rows and N columns, with M and N each being a positive integer. Each operation unit includes a switching element and one or more memristors. In Fig. 2A, WL<1>, WL<2>, …, WL<M> represent the word lines of the first row, the second row, …, the M-th row, respectively; BL<1>, BL<2>, …, BL<N> represent the bit lines of the first column, the second column, …, the N-th column, respectively, and the memristors in the operation unit circuits of each column are connected to the bit line corresponding to that column; SL<1>, SL<2>, …, SL<M> represent the source lines of the first row, the second row, …, the M-th row, respectively, and the sources of the transistors in the operation unit circuits of each row are connected to the source line corresponding to that row. According to Kirchhoff's law, the compute-in-memory array can perform multiply-accumulate computations in parallel by setting the states (e.g., resistances) of the operation units and applying corresponding word line signals and bit line signals to the word lines and bit lines.
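The multiply-accumulate behavior of such an array follows directly from Ohm's law and Kirchhoff's current law: each output-line current is the conductance-weighted sum of the applied voltages. A minimal numerical sketch of this ideal behavior (device non-idealities ignored; the array size and value ranges are assumed for illustration):

```python
import numpy as np

# Conductance matrix G: M rows x N columns of memristor cells (siemens, assumed range)
M, N = 4, 3
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(M, N))

# Input voltages applied to the M input lines
v = rng.uniform(0.1, 0.3, size=M)

# Kirchhoff's current law: the current on each of the N output lines is
# I_j = sum_i v_i * G_ij, i.e., one analog multiply-accumulate per column
I = v @ G  # shape (N,)
```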
The operation units in the memory array of fig. 2A may have, for example, a 1T1R structure or a 2T2R structure, where the operation unit of the 1T1R structure includes one transistor and one memristor, and the operation unit of the 2T2R structure includes two transistors and two memristors, e.g., memristors including, but not limited to RRAM, PCRAM, ECRAM, flash, and the like. It should be noted that the present disclosure does not limit the structure of the memristor cell.
It should be noted that the transistors used in the embodiments of the present disclosure may be thin film transistors or field effect transistors (e.g., MOS field effect transistors) or other switching devices with the same characteristics. The source and drain of the transistor used herein may be symmetrical in structure, so that the source and drain may be indistinguishable in structure.
FIG. 2B is a schematic diagram of a memristor device provided by at least one embodiment of the present disclosure, including a memristor (sub) array and its peripheral drive circuitry for implementing the input-output modules of the present disclosure. For example, as shown in FIG. 2B, the memristor device includes a signal acquisition device, a word line drive circuit, a bit line drive circuit, a source line drive circuit, a memristor array, and a data output circuit.
For example, the signal acquisition device is configured to convert a digital signal into a plurality of first analog signals through a digital-to-analog converter (DAC), to be input to a plurality of column signal input terminals of the memristor array when convolution processing is performed.
For example, the memristor array includes M source lines, M word lines, and N bit lines, and a plurality of memristor cells arranged in M rows and N columns. For example, each memristor cell has a 1T1R structure, and a parameter matrix, for example one used for a Fourier transformation, may be mapped onto a plurality of memristor cells in the memristor array.
For example, operation of the memristor array is achieved by a word line drive circuit, a bit line drive circuit, and a source line drive circuit.
For example, the word line driving circuit includes a plurality of multiplexers (Mux) for switching word line input voltages; the bit line driving circuit includes a plurality of multiplexers for switching bit line input voltages; the source line driving circuit also includes a plurality of multiplexers (Mux) for switching the source line input voltages. For example, the source line driving circuit further includes a plurality of ADCs for converting analog signals into digital signals. In addition, a transimpedance amplifier (Trans-Impedance Amplifier, TIA for short) (not shown in the figure) can be further arranged between the Mux and the ADC in the source line driving circuit to complete the conversion from current to voltage, so as to facilitate ADC processing.
For example, a memristor array includes an operational mode and a computation mode. When the memristor array is in the operation mode, the memristor unit is in an initialized state, and values of parameter elements in the parameter matrix can be written into the memristor array. For example, the source line input voltage, the bit line input voltage, and the word line input voltage of the memristor are switched to corresponding preset voltage intervals through the multiplexer.
For example, the word line input voltage is switched to the corresponding voltage interval by the control signal wl_sw [1:M ] of the multiplexer in the word line driving circuit in fig. 2B. For example, when the memristor is set, the word line input voltage is set to 2V (volts), for example, when the memristor is reset, the word line input voltage is set to 5V, for example, the word line input voltage may be obtained by the voltage signal v_wl [1:M ] in fig. 2B.
For example, the source line input voltage is switched to the corresponding voltage section by the control signal sl_sw [1:M ] of the multiplexer in the source line driving circuit in fig. 2B. For example, when the memristor is set, the source line input voltage is set to 0V, for example, when the memristor is reset, the source line input voltage is set to 2V, for example, the source line input voltage may be obtained by the voltage signal v_sl [1:M ] in fig. 2B.
For example, the bit line input voltages are switched to the corresponding voltage intervals by the control signals BL_sw [1:N ] of the multiplexers in the bit line driving circuit in FIG. 2B. For example, the bit line input voltage is set to 2V when the memristor is set, for example, the bit line input voltage is set to 0V when the memristor is reset, for example, the bit line input voltage may be obtained by the DAC in fig. 2B.
For example, when the memristor array is in the calculation mode, the memristors in the memristor array are in conductance states available for calculation, and the bit line input voltage applied at the column signal input terminals does not change the conductance values of the memristors; the calculation can be completed by performing multiply-add operations through the memristor array. The word line input voltage is switched to the corresponding voltage interval by the control signal WL_sw[1:M] of the multiplexer in the word line driving circuit in Fig. 2B: for example, when a turn-on signal is applied, the word line input voltage of the corresponding row is set to 5 V, and when no turn-on signal is applied, the word line input voltage of the corresponding row is set to 0 V, e.g., the GND signal is connected. The source line input voltage is switched to the corresponding voltage interval by the control signal SL_sw[1:M] of the multiplexer in the source line driving circuit in Fig. 2B; for example, the source line input voltage is set to 0 V so that the current signals of the plurality of row signal output terminals can flow into the data output circuit. The bit line input voltage is switched to the corresponding voltage interval by the control signal BL_sw[1:N] of the multiplexer in the bit line driving circuit in Fig. 2B; for example, the bit line input voltage is set to 0.1 V-0.3 V, so that the convolution operation is completed using the multiply-add capability of the memristor array.
For example, the data output circuit may include a plurality of TIAs, ADCs, and may convert the current signals at the plurality of row signal outputs to voltage signals and then to digital signals for subsequent processing.
FIG. 2C is a schematic diagram of another memristor device provided in at least one embodiment of the present disclosure. The memristor device shown in FIG. 2C is substantially the same in structure as the memristor device shown in FIG. 2B, and also includes a memristor (sub-)array and its peripheral driving circuitry for implementing the input-output modules of the present disclosure. For example, as shown in FIG. 2C, the memristor device includes a signal acquisition device, a word line driving circuit, a bit line driving circuit, a source line driving circuit, a memristor array, and a data output circuit.
For example, the memristor array includes M source lines, 2M word lines, and 2N bit lines, and a plurality of memristor cells arranged in an array of M rows and N columns. For example, each memristor cell has a 2T2R structure; the mapping of the parameter matrix used for the transformation processing onto the different memristor cells of the array operates as described above and is not repeated here. It should be noted that the memristor array may instead include M source lines, M word lines, 2N bit lines, and a plurality of memristor cells arranged in M rows and N columns; since the turn-on signal is applied to the plurality of signal control terminals of the memristor array at the same time when step S130 is performed, the two memristors in each memristor cell of a row can be controlled simultaneously by each word line.
The description of the signal acquisition device, the control driving circuit, and the data output circuit may refer to the previous description, and will not be repeated here.
FIG. 3 shows a schematic diagram of a method of implementing a recurrent neural network (RNN) using a memristor array. The memristor array shown in FIG. 3 is used to implement the input-layer-to-hidden-layer computation of the RNN, and a combination of two 1T1R (one-transistor-one-memristor) cells is used to represent one weight value. The input value x_t and the output value h_{t-1} of the previous hidden layer are input into the memristor array in the form of voltage signals through the bit lines BL, and an output value in the form of a current signal is obtained at the source lines SL of the memristor array. After analog-to-digital conversion, the output value is processed with the activation function f to obtain the hidden-layer output value h_t; after conversion, this output value h_t is used for the next forward computation. The part of the memristor array corresponding to the input value x_t implements the corresponding weight matrix W_x, and the part corresponding to the previous hidden-layer output value h_{t-1} implements the corresponding weight matrix W_h.
Similarly, a memristor array may be employed to implement the hidden-layer-to-output-layer computation of the RNN; this computation involves only the weight matrix V for the output values of the hidden layer. The hidden-layer output value h_t is input into the memristor array in the form of a voltage signal through the bit lines BL, an output value in the form of a current signal is obtained on the source lines SL of the memristor array, and after analog-to-digital conversion the output value is processed with the activation function g to obtain the output value y_t of the output layer, i.e., the output value of the RNN's forward computation.
The activation function g and/or the activation function f in the above operations may be sigmoid, tanh, ReLU, or softmax; embodiments of the present disclosure are not limited in this regard.
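For reference, these activation functions have the following standard forms (a sketch for illustration; the disclosure does not mandate any particular choice):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))  # shifted for numerical stability
    return e / e.sum()

# tanh is available directly as np.tanh
```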
When performing an inference operation (forward propagation) with the above RNN implementation, W_x and W_h are values shared across the multiple loop iterations. However, when these weight values are mapped into a memristor array, the weights of the two parts fluctuate rather than remaining constant values, so the output value y_t of the RNN is affected by the fluctuations of W_x and W_h and deviates from the target value that should originally have been output.
As mentioned above, the non-ideal characteristics of memristors, such as the read-noise characteristics of the devices, cannot be ignored in practical applications; the read noise affects the computed output of the hidden layer, thereby degrading the inference accuracy of the neural network. For a recurrent neural network in particular, since the output value of the hidden layer is fed back into the network together with the next input value, the accumulated computation error grows as forward propagation proceeds. Existing methods for mitigating the read noise of a memristor array mainly model the read-noise characteristics of the memristor and then add the model to the neural network for simulation, but this approach has obvious limitations: the read-noise characteristics of memristor devices are highly random and cannot be completely fitted with a particular noise model, and the read-noise characteristics differ at different times, so they cannot be described very accurately by a particular model. When the measured data of the memristors are then used for inference, the neural network struggles to adapt to the real noise in the memristor array, so its inference accuracy is difficult to improve substantially. Other approaches consider mitigating the read-noise characteristics from the process side, attempting to fabricate better memristor devices.
The inventors of the present disclosure have noted that merely adding a noise model on top of existing hardware, or spending substantial time redesigning and manufacturing new devices, is neither an efficient nor an effective solution. The inventors therefore propose a training method for a memristor-array-based RNN that trains a set of weight matrices with stronger resistance to read noise and maps them into the memristor array, improving the inference accuracy of the memristor-array-based recurrent neural network.
At least one embodiment of the present disclosure provides a method of training a recurrent neural network based on a memristor array. The recurrent neural network involved in the method includes an input layer, a hidden layer, and an output layer as shown in FIG. 1; to process the input value (input vector) x_t into the output value (output vector) of the recurrent neural network, a fully connected layer may be used, for example. The memristor array used in the training method may be as exemplified above and corresponds to the weight matrix W (combining W_x and W_h; see, e.g., FIG. 3); the weight matrix V and the weight matrix corresponding to the fully connected layer may each be implemented with a different memristor array, or with different parts of the same memristor array. In at least one of the above embodiments of the present disclosure, at least one of the weight matrix V and the weight matrix corresponding to the fully connected layer may also be implemented not by a memristor array but by software, for example.
FIG. 4A illustrates a flow chart of a training method provided by at least one embodiment of the present disclosure. As shown in FIG. 4A, the method includes the following steps S100 to S300:
step S100: and performing at least one training operation on the cyclic neural network to obtain an object weight matrix.
In this step, at least one (e.g., a plurality of) training operation is performed on the recurrent neural network to be trained (also called the "target RNN"), for example by software, firmware, or hardware, thereby determining an object weight matrix, i.e., the weight matrix that is the training result. For example, in an at least partially software-based approach, the software may run on a CPU, GPU, NPU (AI accelerator), or the like. Typically, training terminates after a specified number of training rounds, or once the computed value of the loss function falls below a preset value. For example, the weight matrix may be iteratively updated during training until a convergence condition is reached, after which the resulting optimal weight matrix (or the weight matrix deemed optimal) is mapped into the memristor array.
Step S200: performing quantization processing on the object weight values of each weight unit in the object weight matrix to obtain a quantized object weight matrix.
In this step, the quantization processing converts floating-point weight values into integer (int) values. For example, the data before quantization are 32-bit or 16-bit floating-point weight values, and the quantized bit width may be 1 bit (binary network), 2 bits (ternary network), 3, 4, 5, 6, 7, or 8 bits, etc., all of which are integer value types. The quantization may be symmetric (the quantization result includes positive and negative numbers) or asymmetric (e.g., the quantization result is all positive numbers). Embodiments of the present disclosure are not limited to a particular quantization algorithm.
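As one concrete possibility, and consistent with the three-value quantization discussed later in this disclosure, a symmetric ternary quantizer might look like the following sketch. The threshold rule is an assumption for illustration; the disclosure does not fix a particular quantization algorithm.

```python
import numpy as np

def ternary_quantize(w, threshold_ratio=0.05):
    # Symmetric three-value quantization to {-1, 0, 1}: weights whose magnitude
    # falls below a threshold (here an assumed fraction of the maximum
    # magnitude) quantize to 0; the rest keep only their sign.
    t = threshold_ratio * np.abs(w).max()
    q = np.zeros_like(w, dtype=np.int8)
    q[w > t] = 1
    q[w < -t] = -1
    return q

w = np.random.default_rng(2).normal(size=(8, 4)).astype(np.float32)
q = ternary_quantize(w)  # int8 matrix with entries in {-1, 0, 1}
```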
Step S300: mapping the quantized object weight matrix to the memristor array. In this step, the memristor array driving circuit is controlled to set each memristor in the memristor array through corresponding setting operations on the word lines, bit lines, and source lines of the memristor array, so that each memristor is set to the resistance value corresponding to its weight value; the quantized object weight matrix is thereby mapped to the memristor array. For example, after the memristors are set, the resistance value of each memristor in the array can be read by performing corresponding read operations on the word lines, bit lines, and source lines; if the resistance value of a memristor does not fall within a predetermined range of its target resistance value (for example, an error of no more than ±5%), the setting can be performed again until the resistance value falls within the predetermined range.
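The program-and-verify procedure described here can be sketched as a simple loop: write a cell, read it back, and re-program until the read value falls within the tolerance band around the target. The `write_cell` and `read_cell` calls below are hypothetical stand-ins for the word/bit/source-line driver operations, and the ±5% tolerance follows the example in the text.

```python
def program_and_verify(array, row, col, target, tol=0.05, max_tries=20):
    """Set one memristor to `target` (e.g., a target read current),
    re-programming until the read-back value is within +/- tol of the target.
    `array.write_cell` / `array.read_cell` are hypothetical driver calls."""
    for _ in range(max_tries):
        array.write_cell(row, col, target)  # set/reset pulses via WL/BL/SL
        actual = array.read_cell(row, col)  # read operation
        if abs(actual - target) <= tol * abs(target):
            return actual  # within the predetermined range
    raise RuntimeError(f"cell ({row},{col}) failed to verify")
```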
An inference operation may then be performed with the mapped memristor array, for example to classify input values; specifically, forward propagation processing is performed using the memristor array to obtain a forward computation result. In this step, the memristor array driving circuit is controlled to perform the corresponding forward computation (inference computation) operations on the word lines, bit lines, and source lines of the memristor array.
First, for example, the input value x_t is derived from the actual classification task. The input signal x_t and the output value h_{t-1} of the previous hidden layer (if any) are input in the form of voltage signals via the bit lines to the memristor array corresponding to the weight matrix W, while the output value in the form of a current is obtained via the source lines of the memristor array. This output is then subjected to operations such as analog-to-digital conversion, yielding a digital value that can be further processed, for example with the activation function f, to obtain the computation result h_t of the hidden layer. This result is then input again in the form of a voltage signal to the memristor array corresponding to the weight matrix V, from which an output value in the form of a current is obtained; after analog-to-digital conversion and similar operations, a digital value is obtained that can be further processed, for example with the activation function g, to obtain the computation result y_t of the output layer. The result y_t may be processed further, for example by the memristor array corresponding to the fully connected layer, to obtain a classification result, which is the forward computation result of the RNN.
As described above, to obtain the final applicable weight matrix and map it to the memristor array, at least one training operation needs to be performed, e.g., a large number (hundreds, thousands, etc.) of training operations striving for an optimal weight matrix, which is then mapped to the memristor array. In the case of multiple training operations, they are performed iteratively.
Fig. 4B shows a schematic flow chart of each training operation of the training method. As described above, the training method is performed, for example, by software, hardware, firmware, or the like, to which embodiments of the present disclosure are not limited. As shown in fig. 4B, the method includes the following steps S101 to S106 for the current training operation of the at least one training operation:
step S101: an initial weight matrix for a current training operation for the recurrent neural network is obtained.
In this step, the initial weight matrix in the current training operation may be the updated weight matrix obtained in the previous training operation by backpropagating the loss value; the weight matrix (W and/or V) obtained from the backpropagation computation consists, for example, of 32-bit floating-point values.
Step S102: performing quantization processing on the current weight value of each weight unit in the initial weight matrix to obtain a first quantized weight matrix.
In this step, the quantization processing converts the floating-point weight values into integer (int) values; for example, the quantized bit width may be 1 bit (binary network), 2 bits (ternary network), 3, 4, 5, 6, 7, or 8 bits, etc., all integer value types, and the quantization may be symmetric (the quantization result includes positive and negative numbers) or asymmetric (e.g., the quantization result is all positive numbers). The quantization processing here is, for example, the same quantization manner (or algorithm) as in the inference operation described with FIG. 4A; embodiments of the present disclosure are not limited to a particular quantization algorithm.
Step S103: performing forward propagation processing using the first quantized weight matrix to obtain a forward computation result.
In the forward propagation processing, first, for example, the input value x_t is derived from the training dataset. The input signal x_t and the output value h_{t-1} of the previous hidden layer (if any) are processed with the first quantized weight matrix, and further processed with the activation function f to obtain the hidden-layer computation result h_t. This result is processed by the weight matrix V and then by the activation function g to obtain the computation result y_t of the output layer, which may be further processed, for example by the fully connected layer, to obtain a classification result; this classification result is the forward computation result of the RNN.
For example, in at least one embodiment of the present disclosure, a noise model of the memristor array corresponding to the weight matrix W or of the memristor array corresponding to the weight matrix V may further be added when performing the forward computation to obtain the computation result y_t of the output layer. For example, after the first quantized weight matrix is obtained, a read-noise model acquired from the memristor array to be used is applied to the first quantized weight matrix to simulate the noise actually present in the memristor array, and the forward propagation processing is then performed.
For example, the noise model models the read-noise characteristics of the memristor array: the error of the array's read noise is fitted with a classical probability distribution model by statistical modeling (typically the noise distribution conforms to a Laplace or Gaussian distribution), and during neural network training a perturbation approximately conforming to the actual noise distribution is added to the weight values in the weight matrix, thereby simulating the noise encountered during actual forward propagation. Embodiments of the present disclosure are not limited to this noise model, and any suitable noise model may be used.
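A minimal sketch of this kind of noise injection follows: Gaussian (or Laplace) perturbations with an assumed scale are drawn and added to the quantized weights before the forward pass. The noise scale is an assumption for illustration; in practice it would be fitted to measured read-noise statistics of the target array.

```python
import numpy as np

def add_read_noise(w_quantized, sigma=0.02, dist="gaussian", rng=None):
    # Perturb quantized weights with noise approximating the array's read noise.
    # sigma (relative to the weight scale) is an assumed, fitted parameter.
    rng = rng if rng is not None else np.random.default_rng()
    if dist == "gaussian":
        noise = rng.normal(0.0, sigma, size=w_quantized.shape)
    else:  # "laplace"; scale chosen so the standard deviation is sigma
        noise = rng.laplace(0.0, sigma / np.sqrt(2), size=w_quantized.shape)
    return w_quantized + noise
```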
Step S104: obtaining a penalty term based on the current weight values of the weight units in the initial weight matrix.
In this step, a penalty term is obtained based on the current weight values of the weight units in the initial weight matrix, whereby the penalty term records, for example, the distribution information of the current weight values of the weight units in the initial weight matrix.
Step S105: obtaining an initial loss value using a loss function based on the forward computation result, and processing the initial loss value with the penalty term to obtain a processed loss value.
In this step, an initial loss value is obtained using a loss function based on the forward computation result obtained in the previous step and its corresponding target result (ideal result); the initial loss value is then processed with the penalty term to obtain the processed loss value. The processed loss value thus also records, for example, the distribution information of the current weight values of the weight units in the initial weight matrix. Likewise, the target results corresponding to the input values x_t (and the corresponding computation results y_t) are derived from the training dataset.
Step S106: performing backward propagation using the processed loss value to update the weight matrix, obtaining an updated weight matrix.
In this step, backpropagation (BP) is performed with the processed loss value to update the weight matrix, obtaining an updated weight matrix. The updated weight matrix may include an updated weight matrix W and/or an updated weight matrix V, etc. The weight values of the weight units of the updated weight matrix may be floating-point values, such as 32-bit floating-point values. Embodiments of the present disclosure do not limit the backpropagation algorithm used.
At this point the current training operation is complete, and it is judged whether training should continue. If the next training operation proceeds, the obtained updated weight matrix is used as the initial weight matrix of the next training operation; if the updated weight matrix meets the expected requirement, for example if the convergence condition is met, training ends and the updated weight matrix is used as the object weight matrix in the method shown in FIG. 4A.
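Putting steps S101-S106 together, one training operation can be sketched as follows. This is a schematic composition rather than the patented implementation: it reuses the `ternary_quantize` and `add_read_noise` sketches above, the `forward`, `loss_fn`, and `backward_update` callables are hypothetical placeholders for the RNN forward pass, the chosen loss function, and the backpropagation update, and the penalty follows Equation 1 introduced below.

```python
import numpy as np

def training_operation(theta, theta_star, fisher, x, target,
                       forward, loss_fn, backward_update, beta=0.1):
    # theta: initial weight matrix (float), step S101
    # theta_star: maximum likelihood estimate from the old task (assumed given)
    # fisher: Fisher information matrix, same shape as theta
    q = ternary_quantize(theta)                   # S102: quantization
    q = add_read_noise(q.astype(np.float32))      # optional: read-noise model
    y = forward(q, x)                             # S103: forward propagation
    penalty = beta * np.sum(fisher * (theta - theta_star) ** 2)  # S104, Equation 1
    loss = loss_fn(y, target) + penalty           # S105: processed loss value
    return backward_update(theta, loss)           # S106: updated weight matrix
```

The returned matrix becomes the initial weight matrix for the next training operation, or the object weight matrix once the convergence condition is met.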
After the updated weight matrix is obtained, if it is used for training on the next batch of input values, its weight values carry the influence of the initial weight matrix, because the updated weight matrix was obtained from a loss value processed with the penalty term. Training on the next batch of input values is achieved by repeating steps S101-S106 with the obtained updated weight matrix, so that the "knowledge" learned in the current training is continuously passed on.
The next batch of training thus carries the "knowledge" learned from the current batch of training. Updating is repeated iteratively until the convergence condition is reached, at which point the gradient updates end, yielding a set of optimal weights that is mapped onto the memristor array.
In the method of the above embodiments, input values (input data) of different batches are treated as different tasks: the previous input data is an old task for the RNN and the current input data is a new task, and during training the new task "learns" the "knowledge" of the old task, passed on through the updated weight matrix.
For example, in at least some embodiments of the present disclosure, this "knowledge" includes the importance of the individual weights in the old task. To this end, the weight values in the weight matrix obtained from the old task (i.e., the "initial weight matrix" used by the current training operation) are first ranked by importance. For the relatively important weight values, when the loss value obtained in the current training is backpropagated to update the weight matrix, the weights that appear relatively important in the current task are changed more restrictively, i.e., changed relatively less. As a result, when the updated weight matrix is mapped into the memristor array, the mapped weight values do not deviate too far from their target values under the disturbance of memristor read noise. A set of weight matrices trained in this way retains the relatively more critical information (e.g., including the most critical information) in the weight matrix and has good noise immunity.
The penalty term reflects this "knowledge" in the loss value through the penalty term processing: the penalty term is used to process the initial loss value to obtain the processed loss value, and the processed loss value is used for backward propagation. The "knowledge" is thereby carried into the updated weight matrix, and when the updated weight matrix is used for the next training operation, it is further carried into the next training result.
For example, in at least one embodiment of the present disclosure, in step S104, obtaining the penalty term based on the current weight values of the weight units in the initial weight matrix includes: calculating a maximum likelihood estimate (MLE) of the initial weight matrix; calculating the Fisher information matrix of the maximum likelihood estimate of the initial weight matrix; and obtaining the penalty term using the maximum likelihood estimate and the Fisher information matrix.
For example, in at least one example of the above embodiment, the maximum likelihood estimate (MLE), here, e.g., the maximum likelihood estimate of a discrete random variable, is obtained by constructing a likelihood function and then solving it; embodiments of the present disclosure do not specifically limit this. The Fisher information matrix of the maximum likelihood estimate is obtained by computing the second derivative at the maximum likelihood estimate (MLE).
Also, in at least one example of the above embodiment, obtaining the penalty term using the maximum likelihood estimate and the Fisher information matrix includes: penalty term = β·Ω_{i-1}·(θ_{i-1} - θ*_{i-1})² (hereinafter "Equation 1"), where β is a hyperparameter, Ω_{i-1} denotes the Fisher information matrix, θ_{i-1} denotes the initial weight matrix, and θ*_{i-1} denotes the maximum likelihood estimate. This penalty term reflects the importance of the individual weights in the old task and, when combined with the initial loss value (as described in the specific example below), causes the weights that are relatively important in the current task to be changed more restrictively, i.e., to change relatively less.
More specifically, the initial weight matrix θ_{i-1} in the current training operation is transformed by noise perturbation during forward propagation. The objective of the penalty term is to require that the most critical weights in the initial weight matrix θ_{i-1} remain as stable as possible; in the optimal case they are unchanged. The Fisher information matrix Ω_{i-1} in Equation 1 stores, for this set of weights, the magnitude of each weight's influence on the final classification result: the larger the value of a matrix element, the more important the weight at the corresponding position. Because the penalty term is superimposed inside the loss function, the end effect is that the loss values corresponding to the more important weights are more strongly driven to be small; minimizing Equation 1 drives the initial weight matrix θ_{i-1} toward the maximum likelihood estimate θ*_{i-1}, so that the finally trained weight matrix retains the most critical weight values and has good noise immunity.
Furthermore, since β is a hyperparameter, it can be freely adjusted; in some examples the penalty term may therefore also be expressed as (β/2)·Ω_{i-1}·(θ_{i-1} - θ*_{i-1})², which is substantially equivalent to Equation 1.
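This penalty has the same form as the regularizer used in elastic-weight-consolidation-style continual learning. A sketch of computing it, with the diagonal Fisher information approximated from squared gradients of the log-likelihood (a common approximation, assumed here rather than specified by the disclosure):

```python
import numpy as np

def fisher_diagonal(grads_log_likelihood):
    # Diagonal Fisher information, approximated as the mean squared gradient
    # of the log-likelihood over samples (an assumed estimator).
    return np.mean(np.stack(grads_log_likelihood) ** 2, axis=0)

def penalty_term(theta, theta_star, fisher, beta):
    # Equation 1: beta * Omega_{i-1} * (theta_{i-1} - theta*_{i-1})^2,
    # summed over all weight units to give a scalar added to the loss.
    return beta * np.sum(fisher * (theta - theta_star) ** 2)
```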
The hyperparameter β in the penalty term can be freely adjusted, as described above, and it affects the weight values obtained after backpropagation. A suitable hyperparameter β may be found, for example, by simulation in software or by multiple attempts. In Equation 1, the hyperparameter β significantly affects the convergence speed of training: the larger the value of β, the more important the previously learned weight distribution (the "knowledge") is treated as being, and the easier it is to quickly find the optimal point. On the other hand, if β is too large, the important weights learned in the old task exert too much influence, i.e., a large number of weight distributions learned in the old task are inherited, making it harder to change the weight distribution for the current task, which may in turn make it harder to improve the inference accuracy. A suitable hyperparameter β allows the penalty term to appropriately transfer the "knowledge" learned in past tasks, providing a weight matrix with stronger resistance to read noise for mapping to the memristor array, and thus improving the inference accuracy of the memristor-array-based recurrent neural network.
For example, as β increases, the convergence of the updated weight matrix becomes faster and faster; but if β is too large, further improvement of the inference accuracy becomes more limited. Thus, as described above, by simulating in software or making multiple attempts to find a set of preferred (or even optimal) hyperparameters, a set of weights can be trained and mapped to the memristor array for inference operations, improving the inference accuracy of the memristor-array-based recurrent neural network.
In at least one embodiment of the present disclosure, the quantization processing is three-value quantization, which quantizes the weight values to -1, 0, or 1; correspondingly, the subsequent penalty term processing increases the proportion of weight units with weight value 0 in the quantized object weight matrix.
In at least one embodiment of the present disclosure, to map the first quantized weight matrix to the memristor array, each single quantized weight value is mapped using a combination of two memristors, and the read current of the memristor corresponding to 0 is smaller than the read current of the memristor corresponding to 1.
For example, in one example, the memristors of the memristor cells of the memristor array are resistive random access memories (RRAM). For example, the RRAM has a "sandwich" structure, i.e., a resistive memory layer between two electrodes, whose material composition may be the stacked structure HfO_x/AlO_x/TaO_x, with the thicknesses of the individual material layers in the stack being, for example, 60 nm/10 nm/220 nm, respectively. A resistive random access memory of this structure and material is taken as an example below, but the embodiments are not limited to this example; the same concept of quantizing the weight values and setting the memristor array can be applied to memristors of other structures and materials.
FIG. 5A shows a schematic diagram of the read noise distribution of the resistive random access memory at a low read current, and FIG. 5B shows a schematic diagram of the read noise distribution at a larger read current. In both cases, a read voltage of 0.2 V is used to read the value of the resistive random access memory.
As can be seen from FIGS. 5A and 5B, at a smaller read current the current state of the resistive random access memory is more stable and its fluctuation range is smaller, so the read noise is smaller; at a larger read current the current state is less stable and its fluctuation range is larger, so the read noise is larger. Here, the resistance state with a read current of 0.4 μA (microamperes) is selected to map "0", and the resistance state with a read current of 4 μA is selected to map "1" and/or "-1".
Therefore, in the case of combining two memristors, the difference between two memristors both in the 0.4 μA resistance state may be selected to map the quantized weight value "0", while the difference between one memristor in the 4 μA resistance state and another in the 0.4 μA resistance state may be selected to map the quantized weight value "1" and/or "-1". For quantization, the read-current difference between the two memristors (3.6 μA between the 0.4 μA and 4 μA states) may be divided into 36 intervals, each corresponding to 0.1 μA, i.e., the range between 0 and 1 (or -1) is divided into 36 intervals.
As described above, the current state at a read current of 0.4 μA is more stable, so "0" is represented by the difference between two memristors both in the 0.4 μA resistance state rather than both in the 4 μA resistance state. This makes the read noise of the memristors corresponding to 0 smaller, which helps make the read noise of the memristor array onto which the weight matrix is mapped smaller as a whole.
Of course, the resistance states corresponding to the specific read currents described above are not the only possible choices for the weight values 0 and 1, as long as the read noise of the memristor corresponding to 0 is smaller than the read noise of the memristor corresponding to 1. For example, the difference between two memristors both in the 0.5 μA resistance state may map the quantized weight value "0", while the difference between one memristor in the 4.5 μA resistance state and another in the 0.5 μA resistance state may map the quantized weight value "1"; in this case the read-current difference between the two memristors (4 μA) may be divided into 40 intervals, each corresponding to 0.1 μA.
For example, in at least one embodiment of the present disclosure, the two memristors combined to map the same weight value include a first memristor (RRAM 1) and a second memristor (RRAM 2), so using the combination of two memristors to map a single quantized weight value includes the following cases:
for a single quantized weight value of 0, map both memristors to 0;
for a single quantized weight value of 1, mapping the first memristor to 1 and the second memristor to 0;
for a single quantized weight value of-1, the first memristor is mapped to 0 and the second memristor is mapped to 1.
The mapping relationships described above are shown in Table 1 below.

TABLE 1

Weight value | RRAM 1 | RRAM 2
-1 | 0.4 μA | 4 μA
0 | 0.4 μA | 0.4 μA
1 | 4 μA | 0.4 μA
With this mapping, after the quantized weight matrix is mapped into the memristor array (here a resistive random access memory array is taken as an example), each weight value is expressed as much as possible using resistance values corresponding to low read currents, which have low read noise. As shown in Table 1, each weight value involves at least one memristor set to the resistance state with a 0.4 μA read current. This reduces the weight-value fluctuations in the weight matrix caused by the read-noise characteristics of the individual memristors in the array and reduces error accumulation during forward inference, so that a weight matrix with stronger resistance to read noise can be provided for mapping to the memristor array, improving the inference accuracy of the memristor-array-based recurrent neural network.
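A sketch of the Table 1 mapping as code: each ternary weight becomes a pair of target read currents, and the analog weight is recovered as the difference of the two cell currents quantized in 0.1 μA steps, following the interval rule given above. The current levels come from Table 1; the helper names are illustrative.

```python
LOW, HIGH = 0.4e-6, 4e-6  # read-current levels from Table 1, in amperes

def map_ternary_weight(q):
    # Map one quantized weight in {-1, 0, 1} to (RRAM 1, RRAM 2) target read currents.
    if q == 0:
        return (LOW, LOW)   # 0: both cells in the low-noise 0.4 uA state
    if q == 1:
        return (HIGH, LOW)  # 1: first cell high, second cell low
    return (LOW, HIGH)      # -1: first cell low, second cell high

def effective_weight(i1, i2, step=0.1e-6):
    # Weight read back as the current difference in 0.1 uA quantization steps
    # (36 steps between the 0.4 uA and 4 uA states), normalized to [-1, 1].
    return round((i1 - i2) / step) / 36.0
```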
In other quantization schemes, for example one employing binarization (quantized values of 0 and 1), the mapping of Table 2 below may be adopted in the same way. This likewise helps reduce the weight-value fluctuation in the weight matrix caused by the read-noise characteristics of the memristor array itself.
TABLE 2

Weight value    RRAM 1    RRAM 2
0               0.4 μA    0.4 μA
1               4 μA      0.4 μA
For example, in at least one embodiment of the present disclosure, in step S106 described above, the loss function includes a cross entropy loss function, a focal loss function (focal loss), a mean square error loss function (mean squared error loss), a mean absolute error loss function (mean absolute error loss), or a quantile loss function (quantile loss). Using these loss functions, a cross entropy loss value, a focal loss value, a mean square error loss value, a mean absolute error loss value, or a quantile loss value, respectively, can be obtained based on the forward processing result.
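As one concrete possibility, the cross entropy case can be sketched in plain NumPy as follows; the array shapes, values, and names are assumptions made only for this illustration.

    import numpy as np

    def cross_entropy_loss(probs, labels):
        # probs: (batch, classes) softmax outputs of the forward processing;
        # labels: (batch,) integer class ids.  Returns the mean cross entropy.
        eps = 1e-12  # guard against log(0)
        picked = probs[np.arange(probs.shape[0]), labels]
        return float(-np.log(picked + eps).mean())

    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
    labels = np.array([0, 1])
    print(cross_entropy_loss(probs, labels))  # initial loss value, before the penalty term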
For example, in at least one embodiment of the present disclosure, in step S106 described above, the penalty term processing includes adding the penalty term to the initial loss value.
Corresponding to the chosen loss function, after an initial loss value is obtained with that loss function, penalty term processing of the initial loss value yields the processed loss value as follows:
post-processing loss value = cross entropy loss value + penalty term;
post-processing loss value = focal loss value + penalty term;
post-processing loss value = mean square error loss value + penalty term;
post-processing loss value = mean absolute error loss value + penalty term;
post-processing loss value = quantile loss value + penalty term.
As described above, the penalty term processing adds a correction term to the selected loss function.
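Combining this with the penalty term of the form βΩ_{i-1}(θ_{i-1} - θ*_{i-1})² given in claim 3 below, the penalty term processing can be sketched as follows; β, the Fisher information matrix Ω, and the maximum likelihood estimate θ* are taken here as given inputs, and all numerical values are placeholders.

    import numpy as np

    def processed_loss(initial_loss, beta, fisher, theta, theta_star):
        # Add the quadratic penalty beta * Omega_(i-1) * (theta_(i-1) - theta*_(i-1))^2,
        # summed over all weight units, to the initial loss value.
        penalty = beta * np.sum(fisher * (theta - theta_star) ** 2)
        return initial_loss + float(penalty)

    theta = np.array([[0.9, -0.1], [0.0, 1.1]])      # current initial weight matrix
    theta_star = np.array([[1.0, 0.0], [0.0, 1.0]])  # maximum likelihood estimate
    fisher = np.ones_like(theta)                     # placeholder Fisher information
    print(processed_loss(0.35, 0.5, fisher, theta, theta_star))  # 0.35 + 0.5*0.03 = 0.365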
For example, in at least one embodiment of the present disclosure, the memristor array includes a first memristor sub-array and a second memristor sub-array, the two memristors being located in the first memristor sub-array and the second memristor sub-array, respectively.
First, examples of memristor arrays capable of implementing negative elements are described below with reference to FIG. 6A and FIG. 6B.
FIG. 6A is a schematic block diagram of a memristor array provided in accordance with at least one embodiment of the present disclosure.
As shown in FIG. 6A, the memristor 801 and the memristor 802 may form a memristor pair, with the conductance value of the memristor 801 denoted as G11 and the conductance value of the memristor 802 denoted as G12. Because the memristor 802 is connected to an inverter, when the memristor 801 receives an input voltage signal of positive polarity, the inverter may invert the polarity of the input voltage signal, thereby causing the memristor 802 to receive an input voltage signal of negative polarity. For example, the input voltage signal received by the memristor 801 is denoted by v(t), and the input voltage signal received by the memristor 802 is denoted by -v(t). The memristor 801 and the memristor 802 are connected to two different SLs, through which the input voltage signals generate output currents, and the output current through the memristor 801 and the output current through the memristor 802 are superimposed at the SL ends. Thus, the result of the multiply-accumulate computation of the memristor 801 and the memristor 802 is v(t)G11 + (-v(t))G12, that is, v(t)(G11 - G12). Thus, a memristor pair consisting of the memristor 801 and the memristor 802 may correspond to one weight element, and the weight element is G11 - G12; by configuring the numerical relationship of G11 and G12, positive, zero, and negative elements may be implemented.
FIG. 6B is a schematic diagram of another memristor array provided in accordance with at least one embodiment of the present disclosure.
As shown in FIG. 6B, for example, the memristor 801 and the memristor 802 may form a memristor pair, with the conductance value of the memristor 801 denoted as G11 and the conductance value of the memristor 802 denoted as G12. Unlike FIG. 6A, the memristor 802 is not connected to an inverter, so when the memristor 801 receives an input voltage signal of positive polarity, the memristor 802 also receives an input voltage signal of positive polarity. For example, the input voltage signal received by the memristor 801 is denoted by v(t), and the input voltage signal received by the memristor 802 is also denoted by v(t). The memristor 801 and the memristor 802 are connected to two different SLs, and the output current through the memristor 801 and the output current through the memristor 802 are subtracted at the SL ends. Thus, the result of the multiply-accumulate computation of the memristor 801 and the memristor 802 is v(t)G11 - v(t)G12, that is, v(t)(G11 - G12). Thus, a memristor pair consisting of the memristor 801 and the memristor 802 may correspond to one weight element, and the weight element is G11 - G12; by configuring the numerical relationship of G11 and G12, positive, zero, and negative elements may be implemented.
In addition, a memristor cell of a 2T2R structure can be utilized to correspond to one weight element. An example of constructing a memristor array with memristor cells of a 2T2R structure is described below with reference to FIG. 6C and FIG. 6D.
FIG. 6C illustrates a memristor array constructed with memristor cells in a 2T2R structure.
As shown in FIG. 6C, for example, a memristor cell of a 2T2R structure includes two memristors, namely the memristor 801 and the memristor 802; the conductance value of the memristor 801 is denoted as G11 and the conductance value of the memristor 802 is denoted as G12. For example, since the memristor 802 is connected to an inverter, when the memristor 801 receives an input voltage signal of positive polarity, the inverter may invert the polarity of the input voltage signal, thereby causing the memristor 802 to receive an input voltage signal of negative polarity. For example, the input voltage signal received by the memristor 801 is denoted by v(t), and the memristor 802 receives the inverted input voltage signal, i.e., -v(t). The memristor 801 and the memristor 802 are connected to the same SL, at the end of which the output current through the memristor 801 and the output current through the memristor 802 are superimposed. Thus, the result of the multiply-accumulate computation of the memristor 801 and the memristor 802 is v(t)G11 + (-v(t))G12, that is, v(t)(G11 - G12). Thus, a memristor cell of a 2T2R structure containing the memristor 801 and the memristor 802 may correspond to one weight element, and the weight element is G11 - G12; by configuring the numerical relationship of G11 and G12, positive, zero, and negative elements may be implemented.
FIG. 6D illustrates another memristor array constructed with memristor cells in a 2T2R structure.
As shown in FIG. 6D, for example, a memristor cell of a 2T2R structure includes two memristors, namely the memristor 801 and the memristor 802; the conductance value of the memristor 801 is denoted as G11 and the conductance value of the memristor 802 is denoted as G12. Unlike FIG. 6C, the memristor 802 is not connected to an inverter, so when the memristor 801 receives an input voltage signal of positive polarity, the memristor 802 also receives an input voltage signal of positive polarity. For example, the input voltage signal received by the memristor 801 is denoted by v(t), and the input voltage signal received by the memristor 802 is also denoted by v(t). The memristor 801 and the memristor 802 are connected to different SLs, and the output current through the memristor 801 and the output current through the memristor 802 are subtracted at the SL ends. Thus, the result of the multiply-accumulate computation of the memristor 801 and the memristor 802 is v(t)G11 - v(t)G12, that is, v(t)(G11 - G12). Thus, a memristor cell of a 2T2R structure containing the memristor 801 and the memristor 802 may correspond to one weight element, and the weight element is G11 - G12; by configuring the numerical relationship of G11 and G12, positive, zero, and negative elements may be implemented.
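The differential computation common to FIGs. 6A-6D can be summarized in a short Python sketch; the conductance and voltage values below are arbitrary illustrative numbers, not values from the disclosure.

    import numpy as np

    def differential_mac(v, g1, g2):
        # One column of memristor pairs: input voltage v[i] drives a pair with
        # conductances g1[i] and g2[i]; the currents combine on the source line
        # as sum(v[i] * (g1[i] - g2[i])), i.e. v(t)(G11 - G12) per pair.
        return float(np.sum(v * (g1 - g2)))

    v = np.array([0.2, 0.2, 0.2])     # input voltages
    g1 = np.array([4.0, 0.4, 0.4])    # G11-side conductances (arbitrary units)
    g2 = np.array([0.4, 0.4, 4.0])    # G12-side conductances
    print(differential_mac(v, g1, g2))  # 0.2*3.6 + 0 - 0.2*3.6 = 0.0

The three pairs here realize effective weights of +1, 0, and -1 respectively, so their contributions cancel in this example.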
At least one embodiment of the present disclosure also provides a training device for a recurrent neural network based on a memristor array. FIG. 7 shows a schematic diagram of the training apparatus of the memristor array-based recurrent neural network.
As shown in FIG. 7, the training apparatus 700 includes a training unit 701, a quantization unit 702, and a control unit 703. The training apparatus 700 is configured to train a recurrent neural network based on one or more memristor arrays 704: it performs the RNN training, controls a memristor array drive circuit 705 to drive the memristor arrays 704 for weight mapping, and then performs RNN forward inference operations using the mapped memristor arrays 704.
The training unit 701 is configured to perform at least one training operation on the recurrent neural network to obtain an object weight matrix.
The quantization unit 702 is configured to perform quantization processing on the object weight values of the respective weight units in the object weight matrix to obtain a quantized object weight matrix.
The control unit 703 is configured to map the quantized object weight matrix to the memristor array.
For example, the training unit 701 includes an acquisition module 7011, a quantization module 7012, and a processing module 7013.
The acquisition module 7011 is configured to acquire an initial weight matrix for the recurrent neural network in the current training operation.
The quantization module 7012 is configured to perform quantization processing on the initial weight values of the weight units in the initial weight matrix to obtain a first quantized weight matrix.
The processing module 7013 is configured to perform forward propagation processing using the first quantized weight matrix to obtain a forward calculation result; obtain a penalty term based on the current weight value of each weight unit in the initial weight matrix; obtain an initial loss value using a loss function based on the forward processing result; perform penalty term processing on the initial loss value using the penalty term to obtain a post-processing loss value; and perform backward propagation using the post-processing loss value to update the weight matrix, obtaining an updated weight matrix. Here, the updated weight matrix is used as the initial weight matrix for the next training operation or as the aforementioned object weight matrix.
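The per-operation flow of the processing module can be sketched end to end as follows; every name here (the quantizer, forward_fn, loss_fn, grad_fn) is a hypothetical stand-in for the corresponding step above, not the disclosed implementation.

    import numpy as np

    def ternary_quantize(w, threshold=0.5):
        # Three-value quantization: entries beyond +/-threshold become +/-1,
        # the rest become 0 (one simple choice of quantizer).
        return np.sign(w) * (np.abs(w) > threshold)

    def training_operation(theta, theta_star, fisher, forward_fn, loss_fn,
                           grad_fn, beta=0.5, lr=0.01):
        # One training operation: quantize, forward, penalized loss, update.
        quantized = ternary_quantize(theta)                 # first quantized weight matrix
        result = forward_fn(quantized)                      # forward propagation processing
        initial_loss = loss_fn(result)                      # e.g. cross entropy
        penalty = beta * np.sum(fisher * (theta - theta_star) ** 2)
        processed = initial_loss + penalty                  # penalty term processing
        return theta - lr * grad_fn(theta, processed)       # backward propagation / update

    # Dummy stand-ins to show the call shape only.
    theta = np.random.randn(2, 2)
    theta = training_operation(theta, np.zeros((2, 2)), np.ones((2, 2)),
                               forward_fn=lambda q: q.sum(),
                               loss_fn=lambda r: float(r ** 2),
                               grad_fn=lambda t, loss: 2 * t)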
For example, the quantization unit 702 and the quantization module 7012 may be independent of each other (e.g., having the same design), may share at least some elements (e.g., code), or, in at least one example, may be the same unit/module.
Referring to FIGs. 2B and 2C, the memristor array driving circuit 705 here may include, for example, a word line driving circuit, a bit line driving circuit, a source line driving circuit, a signal acquisition device, a data output circuit, and the like; likewise, the memristor array 704 may be as described with reference to FIGs. 2B-2C, FIG. 3, or FIGs. 6A-6D, and is therefore not described again here.
The control unit 703 is coupled to the memristor array drive circuit 705, thereby controlling the memristor array drive circuit 705 to drive the memristor array 704 to perform operations such as set (mapping), read, and forward computation (forward propagation) on the memristors.
Also, the training device 700 may further comprise a storage device 706, which is coupled to the training unit 701, the quantization unit 702, and the control unit 703 and is adapted to store data and/or instructions. For example, the control unit 703 reads the initial weight matrix from the storage device 706; the quantization unit 702 stores the first quantized weight matrix in the storage device 706; the control unit 703 then retrieves the first quantized weight matrix from the storage device 706, controls the memristor array driving circuit 705, acquires the forward calculation result from the memristor array driving circuit 705, obtains the updated weight matrix, and stores the updated weight matrix in the storage device 706; and so on.
The training unit 701, the quantization unit 702, and the control unit 703 may be implemented by software, firmware, or hardware, or any combination thereof. Where such units are implemented in software, they may be realized by a processor executing executable computer code stored on a storage medium (e.g., a storage device). For the operations of the training unit 701, the quantization unit 702, and the control unit 703, reference is made to the description of the corresponding training method above with reference to FIGs. 4A and 4B, and no further description is given here.
Embodiments of the present disclosure are not limited to the type of processor (e.g., RISC, CISC), nor to the type of storage medium (e.g., semiconductor storage medium, optical storage medium, magnetic storage medium, etc.).
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing are merely exemplary embodiments of the present disclosure and are not intended to limit the scope of the disclosure, which is defined by the appended claims.

Claims (11)

1. A memristor array-based training method for a recurrent neural network, comprising:
performing at least one training operation on the recurrent neural network to obtain an object weight matrix;
performing quantization processing on the object weight values of each weight unit in the object weight matrix to obtain a quantized object weight matrix; and
mapping the quantized object weight matrix to the memristor array,
wherein each of the at least one training operation comprises:
acquiring an initial weight matrix for the recurrent neural network in the current training operation;
carrying out quantization processing on the current weight value of each weight unit in the initial weight matrix to obtain a first quantized weight matrix;
forward propagation processing is carried out by using the first quantized weight matrix, so that a forward calculation result is obtained;
obtaining a penalty term based on the current weight values of the weight units in the initial weight matrix;
obtaining an initial loss value by using a loss function based on the forward processing result, and performing penalty term processing on the initial loss value by using the penalty term to obtain a processed loss value; and
performing backward propagation using the processed loss value to update the weight matrix, obtaining an updated weight matrix,
wherein the updated weight matrix is used as an initial weight matrix for a next training operation or as the object weight matrix.
2. The training method of claim 1, wherein the obtaining a penalty term based on the current weight values of the respective weight units in the initial weight matrix comprises:
calculating the maximum likelihood estimation value of the initial weight matrix;
calculating a Fisher information matrix of the maximum likelihood estimation value;
the penalty term is obtained using the maximum likelihood estimate and the Fisher information matrix.
3. The training method of claim 2, wherein using the maximum likelihood estimate and the Fisher information matrix to obtain the penalty term comprises:
the punishment term is beta omega i-1i-1 -θ* i-1 ) 2
Wherein beta is a super parameter, omega i-1 Representing the Fisher information matrix, θ i-1 Represents an initial weight matrix, θ i-1 Representing the maximum likelihood estimate.
4. The training method of claim 1 wherein the penalty term processing includes adding the penalty term to the initial loss value.
5. The training method of claim 1, wherein the loss function comprises a cross entropy loss function, a focus loss function, a mean square error loss function, a mean absolute error loss function, or a quantile loss function.
6. The training method according to any of claims 1-5, wherein the quantization processing is three-value quantization processing, quantizing the weight values to -1, 0, or 1.
7. The training method of claim 6, wherein mapping the first quantized weight matrix to the memristor array comprises:
mapping a single quantized weight value using a combination of two memristors.
8. The training method of claim 7, wherein the memristor array comprises a first memristor sub-array and a second memristor sub-array, the two memristors being located in the first memristor sub-array and in the second memristor sub-array, respectively.
9. The training method of claim 7, wherein the two memristors include a first memristor and a second memristor,
the mapping of a single quantized weight value using the combination of two memristors includes:
for a single quantized weight value of 0, mapping both memristors to 0;
mapping the first memristor to 1 and the second memristor to 0 for a single quantized weight value of 1;
for a single quantized weight value of -1, mapping the first memristor to 0 and the second memristor to 1,
wherein the read current of the memristor corresponding to 0 is less than the read current of the memristor corresponding to 1.
10. The training method of claim 9, wherein a read noise of a read current of the memristor corresponding to 0 is less than a read noise of a read current of the memristor corresponding to 1.
11. A training device of a memristor array-based recurrent neural network, comprising:
the training unit is configured to perform at least one training operation on the recurrent neural network to obtain an object weight matrix;
the quantization unit is configured to quantize the object weight values of each weight unit in the object weight matrix to obtain a quantized object weight matrix; and
a control unit configured to map the quantized object weight matrix to the memristor array, wherein the training unit includes:
the acquisition module is configured to acquire an initial weight matrix for the recurrent neural network in the current training operation;
the quantization module is configured to perform the quantization processing on the initial weight values of all weight units in the initial weight matrix to obtain a first quantized weight matrix;
a processing module configured to perform forward propagation processing using the first quantized weight matrix to obtain a forward calculation result; obtain a penalty term based on the current weight values of each weight unit in the initial weight matrix; obtain an initial loss value using a loss function based on the forward processing result; perform penalty term processing on the initial loss value using the penalty term to obtain a post-processing loss value; and perform backward propagation using the post-processing loss value to update the weight matrix to obtain an updated weight matrix,
wherein the updated weight matrix is used as an initial weight matrix for a next training operation or as the object weight matrix.