CN116523014A - Device and method for realizing physical RC network capable of learning on chip - Google Patents


Info

Publication number
CN116523014A
Authority
CN
China
Prior art keywords
processing result
circuit
matrix
output
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310438361.3A
Other languages
Chinese (zh)
Inventor
朱云来
方修全
吴祖恒
冯哲
徐祖雨
代月花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310438361.3A priority Critical patent/CN116523014A/en
Publication of CN116523014A publication Critical patent/CN116523014A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a device and a method for realizing a physical RC network capable of on-chip learning, belonging to the field of memristor-based brain-like computing systems. Building on the next-generation RC network, the invention implements the RC network in hardware, addressing the low parallelism of conventional RC network computation. The fully hardware RC network can be trained in situ rather than merely classifying data after a simple weight mapping, which improves the network's tolerance to noise caused by device nonlinearity and lets it adapt to changes in the external environment.

Description

Device and method for realizing physical RC network capable of learning on chip
Technical Field
The invention relates to the field of memristor brain-like computing systems, in particular to a device and a method for realizing a physical RC network capable of on-chip learning.
Background
Reservoir computing (RC) is a simplified form of recurrent neural network (RNN). The RC concept was originally proposed to model how the cortico-striatal system, with its large number of recurrent connections in the biological brain, processes visuospatial sequence information. The core of RC is a recurrent hidden layer called the "reservoir" (pool). This network maps the time-series input signal into a high-dimensional space; after this high-dimensional conversion, the features of the input signal can be read out easily and effectively by a simple linear regression.
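The reservoir idea (fixed high-dimensional dynamics plus a trained linear readout) can be sketched in a few lines of NumPy. This is an illustrative echo-state example; the pool size, scaling and ridge strength are our own choices, not from the patent:

```python
import numpy as np

# Illustrative echo-state sketch: a fixed random recurrent "pool" lifts
# a scalar time series into a high-dimensional state space, and only a
# linear readout is trained (here by ridge regression).
rng = np.random.default_rng(0)
N = 100
W_in = rng.uniform(-0.5, 0.5, (N, 1))            # fixed input weights
W = rng.uniform(-0.5, 0.5, (N, N))               # fixed recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # keep spectral radius < 1

def run_pool(u):
    """Collect pool states for a scalar input sequence u."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in[:, 0] * u_t)
        states.append(x.copy())
    return np.array(states)                      # shape (len(u), N)

# Train the readout to predict the next input sample.
u = np.sin(0.1 * np.arange(500))
X, y = run_pool(u)[:-1], u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-4 * np.eye(N), X.T @ y)
pred = X @ W_out
```

Only `W_out` is learned; the pool weights stay fixed, which is what makes the readout a simple linear-regression problem.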
Memristors are devices with a memory function that have attracted much attention in recent years. A crossbar array of memristor devices can perform matrix-vector multiplication in situ, in parallel and physically, as an in-memory computation governed by Ohm's law and Kirchhoff's law. This greatly reduces data movement during computation and offers low power consumption and high speed. The invention uses a memristor array as the weights of the last layer and, because of the devices' non-ideal characteristics, adopts a constant-pulse update scheme. The result is a next-generation RC network built entirely from hardware circuits that supports in-situ training.
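The crossbar's in-memory matrix-vector multiply can be illustrated numerically (a behavioural sketch; the conductance range and voltage values are assumed):

```python
import numpy as np

# With input voltages V applied on the rows and the columns held at
# virtual ground, Ohm's law gives a per-device current G_ij * V_i, and
# Kirchhoff's current law sums these along each column, so the column
# current vector is G^T @ V in a single physical step.
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, (4, 3))      # conductance matrix (siemens)
V = np.array([0.1, 0.2, -0.1, 0.05])     # row voltages (volts)

I_columns = G.T @ V                      # parallel column read-out

# The same result written as the explicit per-device current sum:
I_check = np.array([sum(G[i, j] * V[i] for i in range(4)) for j in range(3)])
```

The key point is that the multiply-accumulate happens in the physics of the array, not in sequential arithmetic.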
Because of the non-ideal characteristics of memristors, updating weights with fixed pulses is the most stable approach. SBP-movement is an existing on-chip learning method, but it requires a large number of registers to store floating-point data whenever the error values of the previous layer must be saved.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method and a device for realizing a physical RC network capable of on-chip learning.
The aim of the invention is achieved by the following technical solution:
an implementation apparatus of a physical RC network capable of on-chip learning, comprising:
a signal input circuit configured to receive an input signal;
a reservoir module comprising reservoir units, each configured to receive the input signal; the input signal is multiplied with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left of the multiplier matrix; the upper-triangular part of the multiplier matrix is taken as the result, giving a first processing result; the first processing result is vector-spliced with the input signal to give a second processing result; the second processing result is matrix-multiplied with a constant memristor array and dimension-reduced to give a third processing result;
and an output layer unit configured to multiply the third processing results of the plurality of reservoir units by a weight matrix to obtain a fourth processing result and to output the fourth processing result.
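The reservoir unit's three processing results can be sketched numerically. This is our reading of the text; the sizes, values and the helper name `reservoir_unit` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def reservoir_unit(x, P):
    """Sketch of one reservoir unit's dataflow.

    x : input signal vector (row and column inputs of the multiplier matrix)
    P : fixed random projection (standing in for the constant memristor array)
    """
    # First result: upper triangle of the outer product x x^T,
    # i.e. all products x_i * x_j with i <= j.
    iu = np.triu_indices(len(x))
    first = np.outer(x, x)[iu]
    # Second result: vector splice (concatenation) with the raw input.
    second = np.concatenate([first, x])
    # Third result: dimension reduction by the constant array.
    return P @ second

x = np.array([0.2, -0.5, 0.8])
n_feat = len(x) * (len(x) + 1) // 2 + len(x)   # 6 products + 3 inputs = 9
P = rng.normal(0.0, 1.0, (4, n_feat))          # project 9 dims down to 4
third = reservoir_unit(x, P)
```

The quadratic upper-triangle expansion is what gives the "next-generation" reservoir its nonlinearity without recurrent state.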
In some disclosures, the output layer unit comprises a weight matrix, a back-propagation circuit, a cross-entropy circuit and a storage logic circuit. The third processing results pass through the weight matrix, and the class probabilities are output by a softmax circuit; the cross-entropy circuit computes the error of the input probabilities, yielding an update sign for each weight. If the sign is positive, a forward (Set) pulse is applied to the device and the weight decreases; if the sign is negative, a reverse (Reset) pulse is applied and the weight increases.
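The sign-driven constant-pulse update described here can be sketched as follows (the step sizes and the helper name `pulse_update` are assumptions; real Set/Reset conductance steps depend on the device):

```python
import numpy as np

# Only the SIGN of the gradient is used: a positive sign applies one
# forward (Set) pulse, which lowers the effective weight here, and a
# negative sign one reverse (Reset) pulse, which raises it.
SET_STEP = -0.01    # weight change per forward (Set) pulse (assumed)
RESET_STEP = +0.01  # weight change per reverse (Reset) pulse (assumed)

def pulse_update(W, grad_sign):
    """Apply one constant pulse per weight according to the update sign."""
    W = W + np.where(grad_sign > 0, SET_STEP, 0.0)
    W = W + np.where(grad_sign < 0, RESET_STEP, 0.0)
    return W

W = np.array([[0.5, -0.2], [0.1, 0.3]])
sign = np.array([[+1, -1], [0, +1]])    # 0 means no update for that weight
W2 = pulse_update(W, sign)
```

Because every update is the same fixed step, no floating-point gradient magnitudes need to be stored, which is the point of the constant-pulse scheme.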
In some disclosures, the cross-entropy circuit calculates the error of the input probabilities. The cross-entropy and softmax formulas are shown in formulas (1) and (2):

L = -Σ_i y_i ln(ŷ_i)    (1)

ŷ_i = e^(C_i) / Σ_j e^(C_j)    (2)

where formula (2) is the softmax calculation formula. The cross-entropy circuit takes the logarithm by an equivalent-substitution principle: the logarithm ln x is Taylor-expanded at x = 1, as shown in formula (3):

ln x = (x - 1) - (x - 1)^2/2 + (x - 1)^3/3 - …    (3)

The higher-order terms are ignored and only the first term, ln x ≈ x - 1, is kept. The output error voltage then passes through a comparator that compares it with a given threshold voltage; if the error voltage is below the threshold, a low level is output. Finally, a counter judges whether training has converged and stops training upon convergence.
In some disclosures, the back-propagation circuit works as follows. The update direction of the weight matrix depends on the sign of the back-propagated gradient. During forward propagation, the input data is the second processing result, the weight matrix is G, and the output voltage vector obtained by Kirchhoff's law is the third processing result; the final output probability is then obtained by formula (2). The output probability passes through the cross-entropy calculation circuit to give the error value, so the sign of the back-propagated derivative depends on the derivative of the error with respect to each weight.
The principle is shown by formula (4):

∂L/∂W_ij = (∂L/∂ŷ)(∂ŷ/∂C)(∂C/∂W_ij) = (ŷ_i - y_i) X_j    (4)

where L is the error obtained through cross-entropy, ŷ is the output of the Softmax circuit, C is the input of the Softmax circuit (i.e. the output through the weight matrix), W is the weight matrix G, and X is the data fed into the weight matrix (i.e. the second processing result). The sign of each propagated update ultimately depends on the update matrix obtained by a logic circuit from the input voltage (the second processing result) and the difference between the softmax output ŷ and the target label.
In some disclosures, the cross-entropy back-propagation circuit adds a threshold when computing the error of each class to accelerate convergence: an update is performed only when the error exceeds the threshold. Formula (4) must therefore be modified, giving formula (5):

ΔW_ij = B_i · sign((ŷ_i - y_i) X_j),  with B_i = 1 if |ŷ_i - y_i| > V_th, else 0    (5)

where B is the update condition obtained by comparing the differences ŷ - y against the threshold: 1 means update, 0 means no update.
In some disclosures, the storage logic circuit stores the last update state by using two memristors to store the update state of one weight.
An implementation method of a physical RC network capable of on-chip learning, for use with the above implementation device, comprises: performing an inference calculation operation with the implementation device; or performing a training calculation operation with the implementation device.
The inference calculation operation comprises: receiving, through the signal input circuit, an input signal for the inference calculation; through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain the fourth processing result, and outputting the fourth processing result.
The training calculation operation includes:
receiving, through the signal input circuit, an input signal for the training calculation and a label value for that input signal;
through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain a fourth processing result, and outputting the fourth processing result;
calculating the error of the weight matrix from the fourth processing results and the label values of the training input signals, updating the weight matrix accordingly, and writing the updated weight matrix into the output layer unit.
The beneficial effects of the invention are as follows. Current RC networks generally process data through a physical reservoir and train only after the data have been collected; processing and training do not happen simultaneously, and a large number of registers is needed to store the history states. To address this, the invention provides an RC network implementation device based on the next-generation RC network, realized in hardware, which solves the low parallelism of conventional RC network computation. The fully hardware RC network can be trained in situ rather than merely classifying data after a simple weight mapping, which improves the network's tolerance to noise caused by device nonlinearity and lets it adapt to changes in the external environment.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a system block diagram of the present application;
FIG. 2 is a Softmax circuit of the present application;
FIG. 3 is a cross entropy calculation circuit of the present application;
FIG. 4 is a cross-entropy back propagation circuit of the present application;
FIG. 5 is a storage logic diagram of the present application;
FIG. 6 is an updated logic diagram of the present application;
FIG. 7 is a graph comparing results of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings; it is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
An implementation device of a physical RC network capable of on-chip learning comprises:
a signal input circuit configured to receive an input signal;
a reservoir module comprising reservoir units, each configured to receive the input signal; the input signal is multiplied with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left of the multiplier matrix; the upper-triangular part of the multiplier matrix is taken as the result, giving a first processing result; the first processing result is vector-spliced with the input signal to give a second processing result; the second processing result is matrix-multiplied with a constant memristor array and dimension-reduced to give a third processing result. The constant array exploits the randomness of memristors: all devices are written toward the same weight, and the device-to-device and cycle-to-cycle variability of the memristors yields a weight array obeying a normal distribution; the dimension-reduction operation relies mainly on the random projection theorem. In general, the collected data pass through a delay circuit to obtain features of a time window and through a signal coding circuit to obtain feature points of different time windows, giving the Vn signal.
And an output layer unit configured to multiply the third processing results of the plurality of reservoir units by a weight matrix to obtain a fourth processing result and to output the fourth processing result.
An implementation method of a physical RC network capable of on-chip learning, for use with the above implementation device, comprises: performing an inference calculation operation with the implementation device; or performing a training calculation operation with the implementation device.
The inference calculation operation comprises: receiving, through the signal input circuit, an input signal for the inference calculation; through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain the fourth processing result, and outputting the fourth processing result.
The training calculation operation includes:
receiving, through the signal input circuit, an input signal for the training calculation and a label value for that input signal;
through the reservoir circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, with the same input signal applied along the top and the left, and taking the upper-triangular part of the multiplier matrix as the first processing result; splicing the first processing result with the input signal into a second processing result; matrix-multiplying the second processing result with the constant memristor array and performing dimension reduction to obtain a third processing result;
multiplying the third processing results by the weight matrix through the output layer unit to obtain a fourth processing result, and outputting the fourth processing result;
calculating the error of the weight matrix from the fourth processing results and the label values of the training input signals, updating the weight matrix accordingly, and writing the updated weight matrix into the output layer unit.
The hardware part is shown in FIG. 1, a block diagram of the whole system. An analog voltage signal V_linear collected by a sensor first goes through the product stage (e.g. x_1, x_2 => x_1*x_2, x_1*x_1, x_2*x_2). Here the analog multipliers are arranged in a matrix; the top and left inputs are the same, and the result takes the upper triangle of the multiplier matrix to give V_nonlinear. V_linear and V_nonlinear are then vector-spliced into a new voltage vector V_t1, which is matrix-multiplied with the constant memristor array and dimension-reduced to give V_t2. The constant array exploits the randomness of memristors; its conductances obey a normal distribution.
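The FIG. 1 dataflow can be walked through numerically, using the text's signal names (sizes and values are ours; the normally distributed conductances model the device variability mentioned in the text):

```python
import numpy as np

# V_linear -> products -> V_nonlinear -> splice -> V_t1
#          -> constant memristor array -> V_t2
rng = np.random.default_rng(3)
V_linear = np.array([0.1, 0.4, -0.2])                 # sensor voltages

iu = np.triu_indices(len(V_linear))
V_nonlinear = np.outer(V_linear, V_linear)[iu]        # x_i * x_j, i <= j
V_t1 = np.concatenate([V_linear, V_nonlinear])        # vector splice

# "Constant" array: conductances written toward one target value but
# spread by device variability, modelled here as a normal distribution.
G_const = rng.normal(1.0, 0.1, (4, len(V_t1)))
V_t2 = G_const @ V_t1                                 # reduce 9 dims to 4
```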
Finally, the training part consists mainly of the weight matrix, the back-propagation circuit, the cross-entropy circuit and the storage logic circuit. The voltage vector V_t2 from the previous layer passes through the weight matrix, and the class probabilities are output by the softmax circuit; the cross-entropy circuit computes the error of the input probabilities to obtain the update sign of each weight. If the sign is positive, a forward pulse (Set) is applied to the device and the weight decreases; if the sign is negative, a reverse pulse (Reset) is applied and the weight increases. The storage logic circuit stores the last update state.
2. In situ training part
Cross-entropy circuit: the cross-entropy and softmax formulas are shown in formulas (1) and (2):

L = -Σ_i y_i ln(ŷ_i)    (1)

ŷ_i = e^(C_i) / Σ_j e^(C_j)    (2)

where formula (2) is the softmax calculation formula. The circuit diagrams of FIG. 2 and FIG. 3 follow from these formulas. FIG. 2 follows from formula (2): the input data pass through an e-exponent circuit to obtain the exponent output, denoted the first output Vi, which is the numerator of formula (2); the inputs also pass through a non-inverting adder built from an integrated operational amplifier to obtain Vsum, denoted the second output, which is the denominator; finally, dividing the first output by the second output in turn yields each softmax output. The Exponential block is a temperature-drift-suppressed circuit that generates the e-exponent output, mainly exploiting the exponential relation between the base current and the collector current of a transistor.
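The FIG. 2 signal path (exponent stage, summing amplifier, divider) maps onto a few lines of Python; this is a behavioural sketch, not a circuit model:

```python
import numpy as np

# Each input channel passes an e-exponent stage (first output Vi), a
# summing amplifier forms the common denominator Vsum, and a divider
# yields each class probability in turn.
def softmax_like_circuit(C):
    Vi = np.exp(C)        # per-channel exponent stage (numerator)
    Vsum = Vi.sum()       # summing-amplifier output (denominator)
    return Vi / Vsum      # divider stage, one output per class

C = np.array([1.0, 2.0, 0.5])     # example weight-matrix outputs (assumed)
probs = softmax_like_circuit(C)
```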
The cross-entropy circuit of FIG. 3 takes the logarithm by an equivalent-substitution principle. It follows mainly from formula (1): substituting formula (2) into formula (1), and noting that the one-hot-coded label y takes only the values 0 and 1, the final simplified result for target class t is

L = -ln(ŷ_t) = ln(Σ_j e^(C_j)) - C_t = ln(Vsum) - C_t

Since ln(Vsum) can be Taylor-expanded through formula (3) using the Vsum output of the FIG. 2 circuit, the final reduced result is

L ≈ Vsum - 1 - C_t

An adder circuit formed by a first operational amplifier combines the 1 and x_i terms of the expansion, with the constant 1 supplied through the reference input Vref;
the second operational amplifier acts as a subtractor whose purpose is to finally obtain the error L, and a voltage comparator (compare) judges whether the algorithm has converged.
First, the logarithm ln x is Taylor-expanded at x = 1, as shown in formula (3):

ln x = (x - 1) - (x - 1)^2/2 + (x - 1)^3/3 - …    (3)

The higher-order terms are ignored and only the first term is kept. The output error voltage then passes through a comparator and is compared with a given threshold voltage; if the error voltage is below the threshold, a low level is output. Finally, a counter judges whether training has converged and stops training upon convergence.
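The comparator-plus-counter convergence test can be sketched behaviourally (the threshold and counter target are assumed values):

```python
# Each sample's error voltage is compared against a threshold, and a
# run of consecutive sub-threshold errors stops training.
V_TH = 0.1        # comparator threshold voltage (assumed)
PATIENCE = 3      # counter target: consecutive low-error samples (assumed)

def converged(error_trace, v_th=V_TH, patience=PATIENCE):
    count = 0
    for e in error_trace:
        count = count + 1 if e < v_th else 0   # comparator feeds the counter
        if count >= patience:
            return True                        # stop training
    return False

done = converged([0.5, 0.09, 0.05, 0.02])      # an error trace that settles
```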
Back-propagation circuit: the update direction of the weight matrix depends on the sign of the back-propagated gradient. During forward propagation, the input data is V_t2, the weight matrix is G, and the output voltage vector obtained by Kirchhoff's law is V_t3; the final output probability is then obtained through formula (2) and the circuit of FIG. 2. The error value is derived from the output probability by the circuit of FIG. 3, so the sign of the back-propagated derivative depends on the derivative of the error value with respect to each weight.
The principle is shown by formula (4):

∂L/∂W_ij = (∂L/∂ŷ)(∂ŷ/∂C)(∂C/∂W_ij) = (ŷ_i - y_i) X_j    (4)

where L is the error obtained through cross-entropy, ŷ is the output of the Softmax circuit, C is the input of the Softmax circuit (i.e. the output through the weight matrix), W is the weight matrix G, and X is the data fed into the weight matrix (i.e. V_t2). The sign of each propagated update ultimately depends on the update matrix obtained by a logic circuit from the input voltage V_t2 and the difference between the softmax output and the target label. As shown in FIG. 4, Vo is the voltage signal output by the softmax circuit and Vc is the one-hot-coded input vector, whose entries are only 0 and 1, with 1 marking the label position. The select-target part screens out the actual loss of the target label; the losses of the other labels are their actual output probabilities, obtained under control of the Vc signals. For each input datum, the network thus yields a final label loss. To accelerate convergence of network training, a threshold-voltage comparison decides whether a loss needs to be updated: a threshold is added when computing each class's error, i.e. an update occurs only when the error exceeds the threshold. Formula (4) must therefore be modified, giving formula (5):

ΔW_ij = B_i · sign((ŷ_i - y_i) X_j),  with B_i = 1 if |ŷ_i - y_i| > V_th, else 0    (5)
where B is the update condition obtained by comparing the differences ŷ - y against the threshold: 1 means update, 0 means no update.
Storage logic circuit: the purpose of this circuit is to store the last update state, here using two memristors to store the update state of one weight, as shown in FIG. 5. The two memristors holding the history state are driven simultaneously with voltages of opposite sign. By the series-resistance voltage-divider principle, if both memristors are in the high-resistance state the circuit output is 0; different high/low state combinations pull the output voltage toward a negative or a positive value, which the voltage comparators convert into a 1-0 or 0-1 output. R1 in the high-resistance state with R2 low stores Set; R1 in the low-resistance state with R2 high stores Reset; R1 and R2 both in the high-resistance state stores no update. V+ and V- are the logic voltages obtained from the positive and the negative voltage comparator respectively, and the stored information is recovered through the logic circuits.
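The two-memristor history store can be modelled behaviourally. The resistance values, divider orientation and comparator thresholds are our assumptions; the Set/Reset mapping follows the text:

```python
# R1 and R2 form a voltage divider driven by opposite-sign voltages, so
# the sign of the midpoint voltage encodes which device is in its low-
# resistance state; two comparators turn that into stored flags.
HRS, LRS = 1e6, 1e3          # high/low resistance in ohms (assumed values)

def read_history(R1, R2, V=1.0):
    """Recover the stored flag from the R1-R2 divider midpoint voltage."""
    v_mid = V * (R2 - R1) / (R1 + R2)   # +V applied above R1, -V below R2
    v_plus = v_mid > 0.5                # positive comparator output
    v_minus = v_mid < -0.5              # negative comparator output
    if v_minus:
        return "Set"                    # R1 high-resistance, R2 low
    if v_plus:
        return "Reset"                  # R1 low-resistance, R2 high
    return "no update"                  # both high-resistance: midpoint ~ 0
```

Encoding the history in two device states rather than a register is what removes the need for floating-point storage between updates.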
FIG. 6 shows the update logic based on formula (5). Vi is the sign of the voltage input to the weight array, i.e. the X input signal: positive is 1, negative is 0. Ve is the error sign, i.e. the sign of the difference after the voltage-threshold comparison, and Vec decides, for B, whether to update. Vhset and Vhreset (i.e. Vset and Vreset, the final outputs of FIG. 5) are the read history states, the logic signals obtained from the storage logic circuit, determined by Set and Reset respectively.
Table 1. Update logic table
Table 1 gives the update state table, where 1 denotes Set, -1 denotes Reset and 0 denotes no update. History is the update state stored on the memristor array; Current is the current update state calculated according to formula (5); the Update column is the final update state.
The invention mainly provides an RC network implementation device for the next-generation RC network, comprising a hardware implementation and an in-situ training method. Compared with software, the hardware implementation reduces power consumption and increases operation speed; the hardware adopts an in-situ training method that improves on the original in-situ training scheme and gives a concrete circuit implementation.
This patent provides a hardware implementation method based on the next-generation RC network. To demonstrate its feasibility and practicality, the method was tested on the MNIST handwritten-digit recognition task. Simulation results show that a system using this method reaches 93.1% accuracy after 4 training passes; compared with the previous method, simulation shows the post-training test accuracy improves by 10%. The final result is compared with the SBP-movement algorithm, as shown in FIG. 7, which mainly contrasts the SBP-movement hardware design and the original hardware design with the present design, plotting final test accuracy against the amount of training data.
In the description of the present invention, it should be understood that the terms "open," "upper," "lower," "thickness," "top," "middle," "length," "inner," "peripheral," and the like indicate orientation or positional relationships, merely for convenience in describing the present invention and to simplify the description, and do not indicate or imply that the components or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the present invention.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims.

Claims (9)

1. An implementation apparatus of a physical RC network capable of on-chip learning, comprising:
a signal input circuit configured to receive an input signal;
a reserve pool module comprising a plurality of reserve pool units, each reserve pool unit configured to: receive the input signal; multiply the input signal with itself through analog multipliers arranged in a matrix, the same input signal being applied along the top and the left of the multiplier matrix, so that the upper-triangular part of the multiplier matrix yields a first processing result; perform a vector splicing operation on the first processing result and the input signal to obtain a second processing result; and perform a matrix multiplication of the second processing result with a constant memristor array, carrying out a dimension-reduction operation to obtain a third processing result;
and an output layer unit, configured to multiply the plurality of third processing results of the plurality of reserve pool units by a weight matrix to obtain a fourth processing result, and to output the fourth processing result.
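The data path of claim 1 can be sketched in NumPy as follows; the sizes, random values, and function names are illustrative, not taken from the patent:

```python
import numpy as np

def reservoir_unit(u, G_const):
    # First processing result: the upper-triangular part of the outer product
    # u u^T, i.e. every pairwise product produced by the multiplier matrix.
    iu, ju = np.triu_indices(len(u))
    first = u[iu] * u[ju]
    # Second processing result: vector splicing of the products with the input.
    second = np.concatenate([first, u])
    # Third processing result: matrix multiplication with the constant
    # memristor array, reducing the feature dimension.
    return G_const @ second

rng = np.random.default_rng(0)
u = rng.standard_normal(4)              # 4 inputs -> 10 upper-triangular products
G_const = rng.standard_normal((6, 14))  # reduce the 14-element vector to 6
third = reservoir_unit(u, G_const)
```

Because the two multiplier inputs carry the same signal, only the upper triangle of the product matrix is distinct, which is why the first processing result has n(n+1)/2 entries for n inputs.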
2. The implementation apparatus of an on-chip learnable physical RC network according to claim 1, wherein the output layer unit comprises the weight matrix, a back propagation circuit, a cross entropy circuit, and a storage logic circuit; the plurality of third processing results pass through the weight matrix and a softmax circuit to output class probabilities; the cross entropy circuit calculates the error of the output probabilities so as to obtain an update sign for each weight: when the sign is positive, a forward pulse is applied to the device and the weight is reduced; when the sign is negative, a reverse pulse is applied to the device and the weight is increased.
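A minimal sketch of the softmax output and the sign-driven pulse update described in claim 2; `delta`, the pulse step, is an assumed value, not taken from the patent:

```python
import numpy as np

def softmax(c):
    # Softmax circuit: turns the weight-matrix output into class probabilities.
    e = np.exp(c - c.max())
    return e / e.sum()

def pulse_update(g, sign, delta=0.01):
    # Pulse-based conductance update: a positive update sign applies a forward
    # pulse (weight decreases); a negative sign applies a reverse pulse
    # (weight increases). delta is an illustrative step size.
    return g - delta * np.sign(sign)

p = softmax(np.array([2.0, 1.0, 0.1]))
```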
3. The apparatus according to claim 2, wherein the cross entropy circuit calculates the error of the output probabilities, the cross entropy being given by formulas (1) and (2):

L = −Σᵢ yᵢ ln ŷᵢ  (1)

ŷᵢ = e^(cᵢ) / Σⱼ e^(cⱼ)  (2)

wherein formula (2) is the softmax calculation formula, L is the finally obtained error value, y is the target label value, and ŷ is the output result of formula (2); the cross entropy circuit obtains the logarithm by an equivalent replacement principle: first, the logarithm ln x is Taylor-expanded at x = 1, as shown in formula (3),

ln x = (x − 1) − (x − 1)²/2 + (x − 1)³/3 − …  (3)

wherein x here stands for the output of formula (2), although formula (3) only indicates that the logarithm can be Taylor-expanded and does not itself belong to the system equations; the higher-order terms are ignored and only the first term is taken, so that ln x ≈ x − 1; the resulting error voltage then passes through a comparator and is compared with a given threshold voltage, a low level being output when the error voltage is smaller than the threshold voltage; finally, a counter judges whether the training has converged, and the training is stopped upon convergence.
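The first-order equivalent replacement of claim 3 (ln x ≈ x − 1) and the comparator check can be sketched as follows; the logit values and threshold are illustrative:

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max())
    return e / e.sum()

def cross_entropy_first_order(y, y_hat):
    # Equivalent replacement from claim 3: ln x is Taylor-expanded at x = 1
    # and only the first term is kept, ln x ~= x - 1, so
    # L = -sum(y * ln(y_hat)) ~= -sum(y * (y_hat - 1)).
    return -np.sum(y * (y_hat - 1.0))

def comparator(error_voltage, threshold_voltage):
    # Output low (False, i.e. converged for this sample) when the error
    # voltage is below the given threshold voltage.
    return error_voltage < threshold_voltage

y = np.array([0.0, 1.0, 0.0])               # one-hot target label
y_hat = softmax(np.array([0.2, 3.0, 0.1]))  # class probabilities
err = cross_entropy_first_order(y, y_hat)
```

For a one-hot label this approximation reduces to 1 − ŷ_target, which stays between 0 and 1 and shrinks as the correct class probability grows, so it is a cheap circuit-friendly surrogate for the exact cross entropy.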
4. The implementation apparatus of an on-chip learnable physical RC network according to claim 2, wherein the back propagation circuit is configured such that the update direction of the weight matrix depends on the sign of the back-propagated gradient: during forward propagation, the input data is the second processing result and the weight matrix is G; the output voltage vector obtained by Kirchhoff's law is the third processing result, and the final output probability is then obtained by formula (2); the output probability passes through the cross entropy calculation circuit to obtain the error value, so that the sign of the back-propagated derivative depends on the derivative of the error value with respect to each weight;

the principle is shown by formula (4):

∂L/∂W = Xᵀ(Ŷ − Y)  (4)

wherein L is the error obtained through cross entropy, Ŷ is the output of the softmax circuit, C is the input of the softmax circuit, namely the output of the weight matrix, W is the weight matrix G, X is the data input to the weight matrix, namely the second processing result, and Y is the label value; all variables in the formula are in matrix form; the sign of each back-propagated update ultimately depends on the update matrix obtained by a logic circuit from the input-voltage second processing result and the difference between the softmax output Ŷ and the target label Y.
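For a softmax-plus-cross-entropy output layer, formula (4) reduces to taking the sign of Xᵀ(Ŷ − Y); a small numerical sketch (all values illustrative):

```python
import numpy as np

def backprop_signs(X, Y_hat, Y):
    # Formula (4): dL/dW = X^T (Y_hat - Y); only the sign is kept, since the
    # memristor weights are updated with fixed-amplitude pulses.
    return np.sign(X.T @ (Y_hat - Y))

X = np.array([[1.0, 0.0],
              [0.0, 1.0]])        # second processing results, one per row
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])        # one-hot labels
Y_hat = np.array([[0.7, 0.3],
                  [0.4, 0.6]])    # softmax outputs
S = backprop_signs(X, Y_hat, Y)
```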
5. The implementation apparatus of an on-chip learnable physical RC network according to claim 4, wherein the back propagation circuit calculates the error of each class against an added threshold, above which updates are applied, so that the algorithm converges faster; formula (4) therefore requires a certain modification, giving formula (5):

∂L/∂W = Xᵀ(B ⊙ (Ŷ − Y))  (5)

wherein B is a 0-1 matrix obtained from the difference Ŷ − Y, with an entry of 1 where the difference exceeds the threshold and 0 otherwise, and all variables in the formula are matrices; the threshold comparison thus yields an error update condition, a 1 meaning the weight is updated and a 0 meaning it is not.
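The thresholded update condition of claim 5 can be sketched as follows; `theta` is an assumed threshold value:

```python
import numpy as np

def update_mask(Y_hat, Y, theta=0.05):
    # Formula (5) sketch: compare each class error |Y_hat - Y| against the
    # threshold theta; entries of B are 1 (update the weight) or 0 (skip it).
    return (np.abs(Y_hat - Y) > theta).astype(int)

B = update_mask(np.array([0.92, 0.04, 0.04]), np.array([1.0, 0.0, 0.0]))
```

Masking small per-class errors means nearly correct outputs stop triggering pulses, which is why convergence speeds up once most samples are classified well.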
6. The implementation apparatus of an on-chip learnable physical RC network according to claim 2, wherein the storage logic circuit is configured to store the last update state, the weight update state being stored by means of two memristors.
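Claim 6 does not fix the circuit; one common realization, shown here purely as an assumption, is a differential memristor pair whose conductance difference encodes the weight, with the last update sign remembered alongside:

```python
class TwoMemristorWeight:
    """Illustrative sketch of claim 6: a differential pair is one common way
    to store a signed weight with two memristor conductances; the exact
    circuit is an assumption, not taken from the patent."""

    def __init__(self, g_pos=0.5, g_neg=0.5):
        self.g_pos, self.g_neg = g_pos, g_neg
        self.last_sign = 0  # last update state, as held by the logic circuit

    @property
    def weight(self):
        return self.g_pos - self.g_neg

    def apply_pulse(self, sign, delta=0.01):
        # Forward pulse (positive sign) lowers the effective weight and a
        # reverse pulse (negative sign) raises it, matching claim 2.
        self.last_sign = sign
        if sign > 0:
            self.g_neg += delta
        elif sign < 0:
            self.g_pos += delta

w = TwoMemristorWeight()
w.apply_pulse(+1)
```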
7. An implementation method of an on-chip learnable physical RC network, applied to the implementation apparatus of an on-chip learnable physical RC network according to any one of claims 1 to 6, comprising the steps of: performing an inference calculation operation using the implementation apparatus; or performing a training calculation operation using the implementation apparatus.
8. The implementation method of an on-chip learnable physical RC network according to claim 7, wherein the inference calculation operation comprises: receiving, through the signal input circuit, an input signal for the inference calculation operation; through the reserve pool circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, the same input signal being applied along the top and the left of the multiplier matrix, so that the upper-triangular part of the multiplier matrix yields a first processing result; splicing the first processing result with the input signal into a second processing result; then performing a matrix multiplication of the second processing result with a constant memristor array and carrying out a dimension-reduction operation to obtain a third processing result;
multiplying the plurality of third processing results by the weight matrix through the output layer unit to obtain the fourth processing result, and outputting the fourth processing result.
9. The implementation method of an on-chip learnable physical RC network according to claim 7, wherein the training calculation operation comprises:
receiving, through the signal input circuit, an input signal for the training calculation operation and a label value of the input signal;
through the reserve pool circuit, multiplying the input signal with itself by analog multipliers arranged in a matrix, the same input signal being applied along the top and the left of the multiplier matrix, so that the upper-triangular part of the multiplier matrix yields a first processing result; splicing the first processing result with the input signal into a second processing result; then performing a matrix multiplication of the second processing result with a constant memristor array and carrying out a dimension-reduction operation to obtain a third processing result;
multiplying the plurality of third processing results by the weight matrix through the output layer unit to obtain a fourth processing result, and outputting the fourth processing result;
calculating an error of the weight matrix according to the plurality of fourth processing results and the label values of the training input signals so as to update the weight matrix; and writing the updated weight matrix into the output layer unit.
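The steps of the training calculation operation in claim 9 can be combined into one toy step; all sizes, random values, and the constants `theta` and `delta` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def reservoir(u, G_const):
    # Reserve pool: pairwise products, splice with input, reduce dimension.
    iu, ju = np.triu_indices(len(u))
    return G_const @ np.concatenate([u[iu] * u[ju], u])  # third processing result

def softmax(c):
    e = np.exp(c - c.max())
    return e / e.sum()

# Toy sizes: 4 inputs -> 14 spliced features -> 6 reservoir outputs -> 3 classes.
G_const = 0.5 * rng.standard_normal((6, 14))  # fixed memristor array
W = 0.1 * rng.standard_normal((6, 3))         # trainable output weight matrix
theta, delta = 0.05, 0.02                     # update threshold, pulse step (assumed)

def train_step(u, y):
    """One training pass: forward inference, thresholded error, then a
    sign-only pulse update of the output weight matrix."""
    global W
    x = reservoir(u, G_const)
    y_hat = softmax(W.T @ x)           # fourth processing result -> probability
    diff = y_hat - y
    mask = np.abs(diff) > theta        # formula (5) update condition
    W = W - delta * np.sign(np.outer(x, diff * mask))
    return y_hat

y_hat = train_step(rng.standard_normal(4), np.array([1.0, 0.0, 0.0]))
```

Only the sign of the masked gradient is used, mirroring the fixed-amplitude programming pulses the hardware can actually apply to the memristors.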
CN202310438361.3A 2023-04-23 2023-04-23 Device and method for realizing physical RC network capable of learning on chip Pending CN116523014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310438361.3A CN116523014A (en) 2023-04-23 2023-04-23 Device and method for realizing physical RC network capable of learning on chip

Publications (1)

Publication Number Publication Date
CN116523014A true CN116523014A (en) 2023-08-01

Family

ID=87389629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310438361.3A Pending CN116523014A (en) 2023-04-23 2023-04-23 Device and method for realizing physical RC network capable of learning on chip

Country Status (1)

Country Link
CN (1) CN116523014A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination