CN116523014A - Device and method for realizing physical RC network capable of learning on chip - Google Patents


Info

Publication number
CN116523014A
Authority
CN
China
Prior art keywords
processing result
circuit
matrix
output
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310438361.3A
Other languages
Chinese (zh)
Inventor
朱云来
方修全
吴祖恒
冯哲
徐祖雨
代月花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310438361.3A priority Critical patent/CN116523014A/en
Publication of CN116523014A publication Critical patent/CN116523014A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a device and a method for realizing a physical RC network capable of on-chip learning, belonging to the field of memristor-based brain-like computing systems. Building on the next-generation RC network, the invention implements the RC network in hardware, addressing the low parallelism of conventional RC network computation. The fully hardware RC network can be trained in situ rather than merely classifying data after a simple weight mapping, which improves the network's tolerance to noise caused by device nonlinearity and lets it adapt to changes in the external environment.

Description

Device and method for realizing physical RC network capable of learning on chip
Technical Field
The invention relates to the field of memristor brain-like computing systems, in particular to a device and a method for realizing a physical RC network capable of on-chip learning.
Background
Reservoir computing (RC) is a simplified form of recurrent neural network (RNN). The RC concept was originally proposed to model how the cortico-striatal system, with its large number of recurrent connections in the biological brain, processes visuospatial sequence information. The core of RC is a recurrent hidden layer called the "reservoir" (pool). This network maps the time-series input signal into a high-dimensional space; after this high-dimensional conversion, the features of the input signal can be read out easily and effectively by a simple linear regression.
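The reservoir idea (fixed high-dimensional dynamics plus a trained linear readout) can be sketched in a few lines of NumPy. This is an illustrative echo-state example; the pool size, scaling and ridge strength are our own choices, not from the patent:

```python
import numpy as np

# Illustrative echo-state sketch: a fixed random recurrent "pool" lifts
# a scalar time series into a high-dimensional state space, and only a
# linear readout is trained (here by ridge regression).
rng = np.random.default_rng(0)
N = 100
W_in = rng.uniform(-0.5, 0.5, (N, 1))            # fixed input weights
W = rng.uniform(-0.5, 0.5, (N, N))               # fixed recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # keep spectral radius < 1

def run_pool(u):
    """Collect pool states for a scalar input sequence u."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in[:, 0] * u_t)
        states.append(x.copy())
    return np.array(states)                      # shape (len(u), N)

# Train the readout to predict the next input sample.
u = np.sin(0.1 * np.arange(500))
X, y = run_pool(u)[:-1], u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-4 * np.eye(N), X.T @ y)
pred = X @ W_out
```

Only `W_out` is learned; the pool weights stay fixed, which is what makes the readout a simple linear-regression problem.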
Memristors are devices with a memory function that have attracted much attention in recent years. A crossbar array of memristor devices can perform matrix-vector multiplication in situ, in parallel and physically, as an in-memory computation governed by Ohm's law and Kirchhoff's law. This greatly reduces data movement during computation and offers low power consumption and high speed. The invention uses a memristor array as the weights of the last layer and, because of the devices' non-ideal characteristics, adopts a constant-pulse update scheme. The result is a next-generation RC network built entirely from hardware circuits that supports in-situ training.
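The crossbar's in-memory matrix-vector multiply can be illustrated numerically (a behavioural sketch; the conductance range and voltage values are assumed):

```python
import numpy as np

# With input voltages V applied on the rows and the columns held at
# virtual ground, Ohm's law gives a per-device current G_ij * V_i, and
# Kirchhoff's current law sums these along each column, so the column
# current vector is G^T @ V in a single physical step.
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, (4, 3))      # conductance matrix (siemens)
V = np.array([0.1, 0.2, -0.1, 0.05])     # row voltages (volts)

I_columns = G.T @ V                      # parallel column read-out

# The same result written as the explicit per-device current sum:
I_check = np.array([sum(G[i, j] * V[i] for i in range(4)) for j in range(3)])
```

The key point is that the multiply-accumulate happens in the physics of the array, not in sequential arithmetic.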
Because of the non-ideal characteristics of memristors, updating weights with fixed pulses is the most stable approach. SBP-movement is an existing on-chip learning method, but it requires a large number of registers to store floating-point data whenever the error values of the previous layer must be saved.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a device for realizing a physical RC network capable of on-chip learning.
The aim of the invention is achieved by the following technical solution:
an implementation apparatus of a physical RC network capable of on-chip learning, comprising:
a signal input circuit configured to receive an input signal;
a reservoir module comprising reservoir units, each configured to receive the input signal; the input signal is multiplied with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left of the multiplier matrix; the upper-triangular part of the multiplier matrix is taken as the result, giving a first processing result; the first processing result is vector-spliced with the input signal to give a second processing result; the second processing result is matrix-multiplied with a constant memristor array and dimension-reduced to give a third processing result;
and an output layer unit configured to multiply the third processing results of the plurality of reservoir units by a weight matrix to obtain a fourth processing result and to output the fourth processing result.
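The reservoir unit's three processing results can be sketched numerically. This is our reading of the text; the sizes, values and the helper name `reservoir_unit` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def reservoir_unit(x, P):
    """Sketch of one reservoir unit's dataflow.

    x : input signal vector (row and column inputs of the multiplier matrix)
    P : fixed random projection (standing in for the constant memristor array)
    """
    # First result: upper triangle of the outer product x x^T,
    # i.e. all products x_i * x_j with i <= j.
    iu = np.triu_indices(len(x))
    first = np.outer(x, x)[iu]
    # Second result: vector splice (concatenation) with the raw input.
    second = np.concatenate([first, x])
    # Third result: dimension reduction by the constant array.
    return P @ second

x = np.array([0.2, -0.5, 0.8])
n_feat = len(x) * (len(x) + 1) // 2 + len(x)   # 6 products + 3 inputs = 9
P = rng.normal(0.0, 1.0, (4, n_feat))          # project 9 dims down to 4
third = reservoir_unit(x, P)
```

The quadratic upper-triangle expansion is what gives the "next-generation" reservoir its nonlinearity without recurrent state.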
In some disclosures, the output layer unit comprises a weight matrix, a back-propagation circuit, a cross-entropy circuit and a storage logic circuit. The third processing results pass through the weight matrix, and the class probabilities are output by a softmax circuit; the cross-entropy circuit computes the error of the input probabilities, yielding an update sign for each weight. If the sign is positive, a forward (Set) pulse is applied to the device and the weight decreases; if the sign is negative, a reverse (Reset) pulse is applied and the weight increases.
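The sign-driven constant-pulse update described here can be sketched as follows (the step sizes and the helper name `pulse_update` are assumptions; real Set/Reset conductance steps depend on the device):

```python
import numpy as np

# Only the SIGN of the gradient is used: a positive sign applies one
# forward (Set) pulse, which lowers the effective weight here, and a
# negative sign one reverse (Reset) pulse, which raises it.
SET_STEP = -0.01    # weight change per forward (Set) pulse (assumed)
RESET_STEP = +0.01  # weight change per reverse (Reset) pulse (assumed)

def pulse_update(W, grad_sign):
    """Apply one constant pulse per weight according to the update sign."""
    W = W + np.where(grad_sign > 0, SET_STEP, 0.0)
    W = W + np.where(grad_sign < 0, RESET_STEP, 0.0)
    return W

W = np.array([[0.5, -0.2], [0.1, 0.3]])
sign = np.array([[+1, -1], [0, +1]])    # 0 means no update for that weight
W2 = pulse_update(W, sign)
```

Because every update is the same fixed step, no floating-point gradient magnitudes need to be stored, which is the point of the constant-pulse scheme.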
In some disclosures, the cross-entropy circuit calculates the error of the input probabilities. The cross-entropy and softmax formulas are shown in formulas (1) and (2):

L = -Σ_i y_i ln(ŷ_i)    (1)

ŷ_i = e^(C_i) / Σ_j e^(C_j)    (2)

where formula (2) is the softmax calculation formula. The cross-entropy circuit takes the logarithm by an equivalent-substitution principle: the logarithm ln x is Taylor-expanded at x = 1, as shown in formula (3):

ln x = (x - 1) - (x - 1)^2/2 + (x - 1)^3/3 - …    (3)

The higher-order terms are ignored and only the first term, ln x ≈ x - 1, is kept. The output error voltage then passes through a comparator that compares it with a given threshold voltage; if the error voltage is below the threshold, a low level is output. Finally, a counter judges whether training has converged and stops training upon convergence.
In some disclosures, the back-propagation circuit works as follows. The update direction of the weight matrix depends on the sign of the back-propagated gradient. During forward propagation, the input data is the second processing result, the weight matrix is G, and the output voltage vector obtained by Kirchhoff's law is the third processing result; the final output probability is then obtained by formula (2). The output probability passes through the cross-entropy calculation circuit to give the error value, so the sign of the back-propagated derivative depends on the derivative of the error with respect to each weight.
The principle is shown by formula (4):

∂L/∂W_ij = (∂L/∂ŷ)(∂ŷ/∂C)(∂C/∂W_ij) = (ŷ_i - y_i) X_j    (4)

where L is the error obtained through cross-entropy, ŷ is the output of the Softmax circuit, C is the input of the Softmax circuit (i.e. the output through the weight matrix), W is the weight matrix G, and X is the data fed into the weight matrix (i.e. the second processing result). The sign of each propagated update ultimately depends on the update matrix obtained by a logic circuit from the input voltage (the second processing result) and the difference between the softmax output ŷ and the target label.
In some disclosures, the cross-entropy back-propagation circuit adds a threshold when computing the error of each class to accelerate convergence: an update is performed only when the error exceeds the threshold. Formula (4) must therefore be modified, giving formula (5):

ΔW_ij = B_i · sign((ŷ_i - y_i) X_j),  with B_i = 1 if |ŷ_i - y_i| > V_th, else 0    (5)

where B is the update condition obtained by comparing the differences ŷ - y against the threshold: 1 means update, 0 means no update.
In some disclosures, the storage logic circuit stores the last update state by using two memristors to store the update state of one weight.
An implementation method of a physical RC network capable of on-chip learning, for use with the above implementation device, comprises: performing an inference calculation operation with the implementation device; or performing a training calculation operation with the implementation device.
The inference calculation operation comprises: receiving, through the signal input circuit, an input signal for the inference calculation; through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain the fourth processing result, and outputting the fourth processing result.
The training calculation operation includes:
receiving, through the signal input circuit, an input signal for the training calculation and a label value for that input signal;
through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain a fourth processing result, and outputting the fourth processing result;
calculating the error of the weight matrix from the fourth processing results and the label values of the training input signals, updating the weight matrix accordingly, and writing the updated weight matrix into the output layer unit.
The beneficial effects of the invention are as follows. Current RC networks generally process data through a physical reservoir and train only after the data have been collected; processing and training do not happen simultaneously, and a large number of registers is needed to store the history states. To address this, the invention provides an RC network implementation device based on the next-generation RC network, realized in hardware, which solves the low parallelism of conventional RC network computation. The fully hardware RC network can be trained in situ rather than merely classifying data after a simple weight mapping, which improves the network's tolerance to noise caused by device nonlinearity and lets it adapt to changes in the external environment.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a system block diagram of the present application;
FIG. 2 is a Softmax circuit of the present application;
FIG. 3 is a cross entropy calculation circuit of the present application;
FIG. 4 is a cross-entropy back propagation circuit of the present application;
FIG. 5 is a storage logic diagram of the present application;
FIG. 6 is an updated logic diagram of the present application;
FIG. 7 is a graph comparing results of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings; it is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
An implementation device of a physical RC network capable of on-chip learning comprises:
a signal input circuit configured to receive an input signal;
a reservoir module comprising reservoir units, each configured to receive the input signal; the input signal is multiplied with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left of the multiplier matrix; the upper-triangular part of the multiplier matrix is taken as the result, giving a first processing result; the first processing result is vector-spliced with the input signal to give a second processing result; the second processing result is matrix-multiplied with a constant memristor array and dimension-reduced to give a third processing result. The constant array exploits the randomness of memristors: all devices are written toward the same weight, and the device-to-device and cycle-to-cycle variability of the memristors yields a weight array obeying a normal distribution; the dimension-reduction operation relies mainly on the random projection theorem. In general, the collected data pass through a delay circuit to obtain features of a time window and through a signal coding circuit to obtain feature points of different time windows, giving the Vn signal.
And an output layer unit configured to multiply the third processing results of the plurality of reservoir units by a weight matrix to obtain a fourth processing result and to output the fourth processing result.
An implementation method of a physical RC network capable of on-chip learning, for use with the above implementation device, comprises: performing an inference calculation operation with the implementation device; or performing a training calculation operation with the implementation device.
The inference calculation operation comprises: receiving, through the signal input circuit, an input signal for the inference calculation; through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain the fourth processing result, and outputting the fourth processing result.
The training calculation operation includes:
receiving, through the signal input circuit, an input signal for the training calculation and a label value for that input signal;
through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain a fourth processing result, and outputting the fourth processing result;
calculating the error of the weight matrix from the fourth processing results and the label values of the training input signals, updating the weight matrix accordingly, and writing the updated weight matrix into the output layer unit.
The hardware part is shown in FIG. 1, a block diagram of the whole system. An analog voltage signal V_linear collected by a sensor first goes through the product stage (e.g. x_1, x_2 => x_1*x_2, x_1*x_1, x_2*x_2). Here the analog multipliers are arranged in a matrix; the top and left inputs are the same, and the result takes the upper triangle of the multiplier matrix to give V_nonlinear. V_linear and V_nonlinear are then vector-spliced into a new voltage vector V_t1, which is matrix-multiplied with the constant memristor array and dimension-reduced to give V_t2. The constant array exploits the randomness of memristors; its conductances obey a normal distribution.
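The FIG. 1 dataflow can be walked through numerically, using the text's signal names (sizes and values are ours; the normally distributed conductances model the device variability mentioned in the text):

```python
import numpy as np

# V_linear -> products -> V_nonlinear -> splice -> V_t1
#          -> constant memristor array -> V_t2
rng = np.random.default_rng(3)
V_linear = np.array([0.1, 0.4, -0.2])                 # sensor voltages

iu = np.triu_indices(len(V_linear))
V_nonlinear = np.outer(V_linear, V_linear)[iu]        # x_i * x_j, i <= j
V_t1 = np.concatenate([V_linear, V_nonlinear])        # vector splice

# "Constant" array: conductances written toward one target value but
# spread by device variability, modelled here as a normal distribution.
G_const = rng.normal(1.0, 0.1, (4, len(V_t1)))
V_t2 = G_const @ V_t1                                 # reduce 9 dims to 4
```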
Finally, the training part consists mainly of the weight matrix, the back-propagation circuit, the cross-entropy circuit and the storage logic circuit. The voltage vector V_t2 from the previous layer passes through the weight matrix, and the class probabilities are output by the softmax circuit; the cross-entropy circuit computes the error of the input probabilities to obtain the update sign of each weight. If the sign is positive, a forward pulse (Set) is applied to the device and the weight decreases; if the sign is negative, a reverse pulse (Reset) is applied and the weight increases. The storage logic circuit stores the last update state.
2. In situ training part
Cross-entropy circuit: the cross-entropy and softmax formulas are shown in formulas (1) and (2):

L = -Σ_i y_i ln(ŷ_i)    (1)

ŷ_i = e^(C_i) / Σ_j e^(C_j)    (2)

where formula (2) is the softmax calculation formula. The circuit diagrams of FIG. 2 and FIG. 3 follow from these formulas. FIG. 2 follows from formula (2): the input data pass through an e-exponent circuit to obtain the exponent output, denoted the first output Vi, which is the numerator of formula (2); the inputs also pass through a non-inverting adder built from an integrated operational amplifier to obtain Vsum, denoted the second output, which is the denominator; finally, dividing the first output by the second output in turn yields each softmax output. The Exponential block is a temperature-drift-suppressed circuit that generates the e-exponent output, mainly exploiting the exponential relation between the base current and the collector current of a transistor.
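The FIG. 2 signal path (exponent stage, summing amplifier, divider) maps onto a few lines of Python; this is a behavioural sketch, not a circuit model:

```python
import numpy as np

# Each input channel passes an e-exponent stage (first output Vi), a
# summing amplifier forms the common denominator Vsum, and a divider
# yields each class probability in turn.
def softmax_like_circuit(C):
    Vi = np.exp(C)        # per-channel exponent stage (numerator)
    Vsum = Vi.sum()       # summing-amplifier output (denominator)
    return Vi / Vsum      # divider stage, one output per class

C = np.array([1.0, 2.0, 0.5])     # example weight-matrix outputs (assumed)
probs = softmax_like_circuit(C)
```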
The cross-entropy circuit of FIG. 3 takes the logarithm by an equivalent-substitution principle. It follows mainly from formula (1): substituting formula (2) into formula (1), and noting that the one-hot-coded label y takes only the values 0 and 1, the final simplified result for target class t is

L = -ln(ŷ_t) = ln(Σ_j e^(C_j)) - C_t = ln(Vsum) - C_t

Since ln(Vsum) can be Taylor-expanded through formula (3) using the Vsum output of the FIG. 2 circuit, the final reduced result is

L ≈ Vsum - 1 - C_t

An adder circuit formed by a first operational amplifier combines the 1 and x_i terms of the expansion, with the constant 1 supplied through the reference input Vref;
the second operational amplifier acts as a subtractor whose purpose is to finally obtain the error L, and a voltage comparator (compare) judges whether the algorithm has converged.
First, the logarithm ln x is Taylor-expanded at x = 1, as shown in formula (3):

ln x = (x - 1) - (x - 1)^2/2 + (x - 1)^3/3 - …    (3)

The higher-order terms are ignored and only the first term is kept. The output error voltage then passes through a comparator and is compared with a given threshold voltage; if the error voltage is below the threshold, a low level is output. Finally, a counter judges whether training has converged and stops training upon convergence.
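The comparator-plus-counter convergence test can be sketched behaviourally (the threshold and counter target are assumed values):

```python
# Each sample's error voltage is compared against a threshold, and a
# run of consecutive sub-threshold errors stops training.
V_TH = 0.1        # comparator threshold voltage (assumed)
PATIENCE = 3      # counter target: consecutive low-error samples (assumed)

def converged(error_trace, v_th=V_TH, patience=PATIENCE):
    count = 0
    for e in error_trace:
        count = count + 1 if e < v_th else 0   # comparator feeds the counter
        if count >= patience:
            return True                        # stop training
    return False

done = converged([0.5, 0.09, 0.05, 0.02])      # an error trace that settles
```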
Back-propagation circuit: the update direction of the weight matrix depends on the sign of the back-propagated gradient. During forward propagation, the input data is V_t2, the weight matrix is G, and the output voltage vector obtained by Kirchhoff's law is V_t3; the final output probability is then obtained through formula (2) and the circuit of FIG. 2. The error value is derived from the output probability by the circuit of FIG. 3, so the sign of the back-propagated derivative depends on the derivative of the error value with respect to each weight.
The principle is shown by formula (4):

∂L/∂W_ij = (∂L/∂ŷ)(∂ŷ/∂C)(∂C/∂W_ij) = (ŷ_i - y_i) X_j    (4)

where L is the error obtained through cross-entropy, ŷ is the output of the Softmax circuit, C is the input of the Softmax circuit (i.e. the output through the weight matrix), W is the weight matrix G, and X is the data fed into the weight matrix (i.e. V_t2). The sign of each propagated update ultimately depends on the update matrix obtained by a logic circuit from the input voltage V_t2 and the difference between the softmax output and the target label. As shown in FIG. 4, Vo is the voltage signal output by the softmax circuit and Vc is the one-hot-coded input vector, whose entries are only 0 and 1, with 1 marking the label position. The select-target part screens out the actual loss of the target label; the losses of the other labels are their actual output probabilities, obtained under control of the Vc signals. For each input datum, the network thus yields a final label loss. To accelerate convergence of network training, a threshold-voltage comparison decides whether a loss needs to be updated: a threshold is added when computing each class's error, i.e. an update occurs only when the error exceeds the threshold. Formula (4) must therefore be modified, giving formula (5):

ΔW_ij = B_i · sign((ŷ_i - y_i) X_j),  with B_i = 1 if |ŷ_i - y_i| > V_th, else 0    (5)
where B is the update condition obtained by comparing the differences ŷ - y against the threshold: 1 means update, 0 means no update.
Storage logic circuit: the purpose of this circuit is to store the last update state, here using two memristors to store the update state of one weight, as shown in FIG. 5. The two memristors holding the history state are driven simultaneously with voltages of opposite sign. By the series-resistance voltage-divider principle, if both memristors are in the high-resistance state the circuit output is 0; different high/low state combinations pull the output voltage toward a negative or a positive value, which the voltage comparators convert into a 1-0 or 0-1 output. R1 in the high-resistance state with R2 low stores Set; R1 in the low-resistance state with R2 high stores Reset; R1 and R2 both in the high-resistance state stores no update. V+ and V- are the logic voltages obtained from the positive and the negative voltage comparator respectively, and the stored information is recovered through the logic circuits.
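The two-memristor history store can be modelled behaviourally. The resistance values, divider orientation and comparator thresholds are our assumptions; the Set/Reset mapping follows the text:

```python
# R1 and R2 form a voltage divider driven by opposite-sign voltages, so
# the sign of the midpoint voltage encodes which device is in its low-
# resistance state; two comparators turn that into stored flags.
HRS, LRS = 1e6, 1e3          # high/low resistance in ohms (assumed values)

def read_history(R1, R2, V=1.0):
    """Recover the stored flag from the R1-R2 divider midpoint voltage."""
    v_mid = V * (R2 - R1) / (R1 + R2)   # +V applied above R1, -V below R2
    v_plus = v_mid > 0.5                # positive comparator output
    v_minus = v_mid < -0.5              # negative comparator output
    if v_minus:
        return "Set"                    # R1 high-resistance, R2 low
    if v_plus:
        return "Reset"                  # R1 low-resistance, R2 high
    return "no update"                  # both high-resistance: midpoint ~ 0
```

Encoding the history in two device states rather than a register is what removes the need for floating-point storage between updates.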
FIG. 6 shows the update logic based on formula (5). Vi is the sign of the voltage input to the weight array, i.e. the X input signal: positive is 1, negative is 0. Ve is the error sign, i.e. the sign of the difference after the voltage-threshold comparison, and Vec decides, for B, whether to update. Vhset and Vhreset (i.e. Vset and Vreset, the final outputs of FIG. 5) are the read history states, the logic signals obtained from the storage logic circuit, determined by Set and Reset respectively.
Table 1. Update logic table
Table 1 gives the update state table, where 1 denotes Set, -1 denotes Reset and 0 denotes no update. History is the update state stored on the memristor array; Current is the current update state calculated according to formula (5); the Update column is the final update state.
The invention mainly provides an RC network implementation device for the next-generation RC network, comprising a hardware implementation and an in-situ training method. Compared with software, the hardware implementation reduces power consumption and increases operation speed; the hardware adopts an in-situ training method that improves on the original in-situ training scheme and gives a concrete circuit implementation.
This patent provides a hardware implementation method based on the next-generation RC network. To demonstrate its feasibility and practicality, the method was tested on the MNIST handwritten-digit recognition task. Simulation results show that a system using this method reaches 93.1% accuracy after 4 training passes; compared with the previous method, simulation shows the post-training test accuracy improves by 10%. The final result is compared with the SBP-movement algorithm, as shown in FIG. 7, which mainly contrasts the SBP-movement hardware design and the original hardware design with the present design, plotting final test accuracy against the amount of training data.
In the description of the present invention, it should be understood that the terms "open," "upper," "lower," "thickness," "top," "middle," "length," "inner," "peripheral," and the like indicate orientation or positional relationships, merely for convenience in describing the present invention and to simplify the description, and do not indicate or imply that the components or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (9)

1. An implementation apparatus of a physical RC network capable of on-chip learning, comprising:
a signal input circuit configured to receive an input signal;
a reserve pool module comprising a plurality of reserve pool units, each reserve pool unit configured to: receive the input signal; multiply the input signal with itself through analog multipliers arranged in a matrix, the same input signal being applied along the top and the left of the multiplier matrix, so that the upper-triangular part of the multiplier matrix yields a first processing result; perform a vector splicing operation on the first processing result and the input signal to obtain a second processing result; and perform a matrix multiplication of the second processing result with a constant memristor array, carrying out a dimension-reduction operation to obtain a third processing result;
and an output layer unit, configured to multiply the plurality of third processing results of the plurality of reserve pool units by a weight matrix to obtain a fourth processing result, and to output the fourth processing result.
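The data path of claim 1 can be sketched in NumPy as follows; the sizes, random values, and function names are illustrative, not taken from the patent:

```python
import numpy as np

def reservoir_unit(u, G_const):
    # First processing result: the upper-triangular part of the outer product
    # u u^T, i.e. every pairwise product produced by the multiplier matrix.
    iu, ju = np.triu_indices(len(u))
    first = u[iu] * u[ju]
    # Second processing result: vector splicing of the products with the input.
    second = np.concatenate([first, u])
    # Third processing result: matrix multiplication with the constant
    # memristor array, reducing the feature dimension.
    return G_const @ second

rng = np.random.default_rng(0)
u = rng.standard_normal(4)              # 4 inputs -> 10 upper-triangular products
G_const = rng.standard_normal((6, 14))  # reduce the 14-element vector to 6
third = reservoir_unit(u, G_const)
```

Because the two multiplier inputs carry the same signal, only the upper triangle of the product matrix is distinct, which is why the first processing result has n(n+1)/2 entries for n inputs.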
2. The implementation apparatus of an on-chip learnable physical RC network according to claim 1, wherein the output layer unit comprises the weight matrix, a back propagation circuit, a cross entropy circuit, and a storage logic circuit; the plurality of third processing results pass through the weight matrix and a softmax circuit to output class probabilities; the cross entropy circuit calculates the error of the output probabilities so as to obtain an update sign for each weight: when the sign is positive, a forward pulse is applied to the device and the weight is reduced; when the sign is negative, a reverse pulse is applied to the device and the weight is increased.
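A minimal sketch of the softmax output and the sign-driven pulse update described in claim 2; `delta`, the pulse step, is an assumed value, not taken from the patent:

```python
import numpy as np

def softmax(c):
    # Softmax circuit: turns the weight-matrix output into class probabilities.
    e = np.exp(c - c.max())
    return e / e.sum()

def pulse_update(g, sign, delta=0.01):
    # Pulse-based conductance update: a positive update sign applies a forward
    # pulse (weight decreases); a negative sign applies a reverse pulse
    # (weight increases). delta is an illustrative step size.
    return g - delta * np.sign(sign)

p = softmax(np.array([2.0, 1.0, 0.1]))
```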
3. The apparatus according to claim 2, wherein the cross entropy circuit calculates the error of the output probabilities, the cross entropy being given by formulas (1) and (2):

L = −Σᵢ yᵢ ln ŷᵢ  (1)

ŷᵢ = e^(cᵢ) / Σⱼ e^(cⱼ)  (2)

wherein formula (2) is the softmax calculation formula, L is the finally obtained error value, y is the target label value, and ŷ is the output result of formula (2); the cross entropy circuit obtains the logarithm by an equivalent replacement principle: first, the logarithm ln x is Taylor-expanded at x = 1, as shown in formula (3),

ln x = (x − 1) − (x − 1)²/2 + (x − 1)³/3 − …  (3)

wherein x here stands for the output of formula (2), although formula (3) only indicates that the logarithm can be Taylor-expanded and does not itself belong to the system equations; the higher-order terms are ignored and only the first term is taken, so that ln x ≈ x − 1; the resulting error voltage then passes through a comparator and is compared with a given threshold voltage, a low level being output when the error voltage is smaller than the threshold voltage; finally, a counter judges whether the training has converged, and the training is stopped upon convergence.
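The first-order equivalent replacement of claim 3 (ln x ≈ x − 1) and the comparator check can be sketched as follows; the logit values and threshold are illustrative:

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max())
    return e / e.sum()

def cross_entropy_first_order(y, y_hat):
    # Equivalent replacement from claim 3: ln x is Taylor-expanded at x = 1
    # and only the first term is kept, ln x ~= x - 1, so
    # L = -sum(y * ln(y_hat)) ~= -sum(y * (y_hat - 1)).
    return -np.sum(y * (y_hat - 1.0))

def comparator(error_voltage, threshold_voltage):
    # Output low (False, i.e. converged for this sample) when the error
    # voltage is below the given threshold voltage.
    return error_voltage < threshold_voltage

y = np.array([0.0, 1.0, 0.0])               # one-hot target label
y_hat = softmax(np.array([0.2, 3.0, 0.1]))  # class probabilities
err = cross_entropy_first_order(y, y_hat)
```

For a one-hot label this approximation reduces to 1 − ŷ_target, which stays between 0 and 1 and shrinks as the correct class probability grows, so it is a cheap circuit-friendly surrogate for the exact cross entropy.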
4. The implementation apparatus of an on-chip learnable physical RC network according to claim 2, wherein the back propagation circuit is configured such that the update direction of the weight matrix depends on the sign of the back-propagated gradient: during forward propagation, the input data is the second processing result and the weight matrix is G; the output voltage vector obtained by Kirchhoff's law is the third processing result, and the final output probability is then obtained by formula (2); the output probability passes through the cross entropy calculation circuit to obtain the error value, so that the sign of the back-propagated derivative depends on the derivative of the error value with respect to each weight;

the principle is shown by formula (4):

∂L/∂W = Xᵀ(Ŷ − Y)  (4)

wherein L is the error obtained through cross entropy, Ŷ is the output of the softmax circuit, C is the input of the softmax circuit, namely the output of the weight matrix, W is the weight matrix G, X is the data input to the weight matrix, namely the second processing result, and Y is the label value; all variables in the formula are in matrix form; the sign of each back-propagated update ultimately depends on the update matrix obtained by a logic circuit from the input-voltage second processing result and the difference between the softmax output Ŷ and the target label Y.
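For a softmax-plus-cross-entropy output layer, formula (4) reduces to taking the sign of Xᵀ(Ŷ − Y); a small numerical sketch (all values illustrative):

```python
import numpy as np

def backprop_signs(X, Y_hat, Y):
    # Formula (4): dL/dW = X^T (Y_hat - Y); only the sign is kept, since the
    # memristor weights are updated with fixed-amplitude pulses.
    return np.sign(X.T @ (Y_hat - Y))

X = np.array([[1.0, 0.0],
              [0.0, 1.0]])        # second processing results, one per row
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])        # one-hot labels
Y_hat = np.array([[0.7, 0.3],
                  [0.4, 0.6]])    # softmax outputs
S = backprop_signs(X, Y_hat, Y)
```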
5. The implementation apparatus of an on-chip learnable physical RC network according to claim 4, wherein the back propagation circuit calculates the error of each class against an added threshold, above which updates are applied, so that the algorithm converges faster; formula (4) therefore requires a certain modification, giving formula (5):

∂L/∂W = Xᵀ(B ⊙ (Ŷ − Y))  (5)

wherein B is a 0-1 matrix obtained from the difference Ŷ − Y, with an entry of 1 where the difference exceeds the threshold and 0 otherwise, and all variables in the formula are matrices; the threshold comparison thus yields an error update condition, a 1 meaning the weight is updated and a 0 meaning it is not.
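The thresholded update condition of claim 5 can be sketched as follows; `theta` is an assumed threshold value:

```python
import numpy as np

def update_mask(Y_hat, Y, theta=0.05):
    # Formula (5) sketch: compare each class error |Y_hat - Y| against the
    # threshold theta; entries of B are 1 (update the weight) or 0 (skip it).
    return (np.abs(Y_hat - Y) > theta).astype(int)

B = update_mask(np.array([0.92, 0.04, 0.04]), np.array([1.0, 0.0, 0.0]))
```

Masking small per-class errors means nearly correct outputs stop triggering pulses, which is why convergence speeds up once most samples are classified well.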
6. The implementation apparatus of an on-chip learnable physical RC network according to claim 2, wherein the storage logic circuit is configured to store the last update state, the weight update state being stored by means of two memristors.
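Claim 6 does not fix the circuit; one common realization, shown here purely as an assumption, is a differential memristor pair whose conductance difference encodes the weight, with the last update sign remembered alongside:

```python
class TwoMemristorWeight:
    """Illustrative sketch of claim 6: a differential pair is one common way
    to store a signed weight with two memristor conductances; the exact
    circuit is an assumption, not taken from the patent."""

    def __init__(self, g_pos=0.5, g_neg=0.5):
        self.g_pos, self.g_neg = g_pos, g_neg
        self.last_sign = 0  # last update state, as held by the logic circuit

    @property
    def weight(self):
        return self.g_pos - self.g_neg

    def apply_pulse(self, sign, delta=0.01):
        # Forward pulse (positive sign) lowers the effective weight and a
        # reverse pulse (negative sign) raises it, matching claim 2.
        self.last_sign = sign
        if sign > 0:
            self.g_neg += delta
        elif sign < 0:
            self.g_pos += delta

w = TwoMemristorWeight()
w.apply_pulse(+1)
```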
7. An implementation method of an on-chip learnable physical RC network, applied to the implementation apparatus of an on-chip learnable physical RC network according to any one of claims 1 to 6, comprising the steps of: performing an inference calculation operation using the implementation apparatus; or performing a training calculation operation using the implementation apparatus.
8. The implementation method of an on-chip learnable physical RC network according to claim 7, wherein the inference calculation operation comprises: receiving, through the signal input circuit, an input signal for the inference calculation operation; through the reserve pool circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, the same input signal being applied along the top and the left of the multiplier matrix, so that the upper-triangular part of the multiplier matrix yields a first processing result; splicing the first processing result with the input signal into a second processing result; then performing a matrix multiplication of the second processing result with a constant memristor array and carrying out a dimension-reduction operation to obtain a third processing result;
multiplying the plurality of third processing results by the weight matrix through the output layer unit to obtain the fourth processing result, and outputting the fourth processing result.
9. The implementation method of an on-chip learnable physical RC network according to claim 7, wherein the training calculation operation comprises:
receiving, through the signal input circuit, an input signal for the training calculation operation and a label value of the input signal;
through the reserve pool circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, the same input signal being applied along the top and the left of the multiplier matrix, so that the upper-triangular part of the multiplier matrix yields a first processing result; splicing the first processing result with the input signal into a second processing result; then performing a matrix multiplication of the second processing result with a constant memristor array and carrying out a dimension-reduction operation to obtain a third processing result;
multiplying the plurality of third processing results by the weight matrix through the output layer unit to obtain a fourth processing result, and outputting the fourth processing result;
calculating an error of the weight matrix according to the plurality of fourth processing results and the label values of the training input signals so as to update the weight matrix; and writing the updated weight matrix into the output layer unit.
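The steps of the training calculation operation in claim 9 can be combined into one toy step; all sizes, random values, and the constants `theta` and `delta` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def reservoir(u, G_const):
    # Reserve pool: pairwise products, splice with input, reduce dimension.
    iu, ju = np.triu_indices(len(u))
    return G_const @ np.concatenate([u[iu] * u[ju], u])  # third processing result

def softmax(c):
    e = np.exp(c - c.max())
    return e / e.sum()

# Toy sizes: 4 inputs -> 14 spliced features -> 6 reservoir outputs -> 3 classes.
G_const = 0.5 * rng.standard_normal((6, 14))  # fixed memristor array
W = 0.1 * rng.standard_normal((6, 3))         # trainable output weight matrix
theta, delta = 0.05, 0.02                     # update threshold, pulse step (assumed)

def train_step(u, y):
    """One training pass: forward inference, thresholded error, then a
    sign-only pulse update of the output weight matrix."""
    global W
    x = reservoir(u, G_const)
    y_hat = softmax(W.T @ x)           # fourth processing result -> probability
    diff = y_hat - y
    mask = np.abs(diff) > theta        # formula (5) update condition
    W = W - delta * np.sign(np.outer(x, diff * mask))
    return y_hat

y_hat = train_step(rng.standard_normal(4), np.array([1.0, 0.0, 0.0]))
```

Only the sign of the masked gradient is used, mirroring the fixed-amplitude programming pulses the hardware can actually apply to the memristors.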
CN202310438361.3A 2023-04-23 2023-04-23 Device and method for realizing physical RC network capable of learning on chip Pending CN116523014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310438361.3A CN116523014A (en) 2023-04-23 2023-04-23 Device and method for realizing physical RC network capable of learning on chip

Publications (1)

Publication Number Publication Date
CN116523014A true CN116523014A (en) 2023-08-01

Family

ID=87389629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310438361.3A Pending CN116523014A (en) 2023-04-23 2023-04-23 Device and method for realizing physical RC network capable of learning on chip

Country Status (1)

Country Link
CN (1) CN116523014A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination