CN112749784B - Computing device and acceleration method of neural network - Google Patents

Computing device and acceleration method of neural network

Info

Publication number
CN112749784B
CN112749784B (application CN201911063237.3A)
Authority
CN
China
Prior art keywords
calculation result
analog
operation circuit
circuit
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911063237.3A
Other languages
Chinese (zh)
Other versions
CN112749784A (en)
Inventor
张悠慧
韩建辉
蒋磊
王侃文
吴华强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Huawei Technologies Co Ltd
Original Assignee
Tsinghua University
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Huawei Technologies Co Ltd filed Critical Tsinghua University
Priority to CN201911063237.3A priority Critical patent/CN112749784B/en
Publication of CN112749784A publication Critical patent/CN112749784A/en
Application granted granted Critical
Publication of CN112749784B publication Critical patent/CN112749784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation using electronic means
    • G06N 3/065: Analogue means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the application discloses a computing device, including: a computing array comprising a plurality of computing units, configured to compute input data against a preset weight matrix to obtain a first calculation result, where the first calculation result is an analog signal and the input data is a digital signal; an analog operation circuit connected to the computing array, configured to perform point-by-point operations on the first calculation result to obtain a second calculation result, where the second calculation result is an analog signal; and an analog-to-digital conversion circuit connected to the analog operation circuit, configured to convert the second calculation result into a digital signal to obtain a third calculation result. The embodiment of the application reduces both hardware cost and power consumption.

Description

Computing device and acceleration method of neural network
Technical Field
The present application relates to the field of neural networks, and in particular, to a computing device and a method for accelerating a neural network.
Background
A long short-term memory (LSTM) network is a type of recurrent neural network that, owing to its design, is well suited to processing and predicting important events separated by very long intervals and delays in a time series. An LSTM may use a memristor array to process matrix multiplication. The peripheral circuits of the memristor array consist mainly of analog-to-digital converters (ADCs), which account for a large share of the hardware overhead and power consumption in an LSTM hardware system.
Disclosure of Invention
The embodiments of the application provide a computing device and a neural network acceleration method that reduce hardware resource overhead and power consumption.
In a first aspect, an embodiment of the present application provides a computing device, including: a computing array comprising a plurality of computing units, configured to compute input data against a preset weight matrix to obtain a first calculation result, where the first calculation result is an analog signal and the input data is a digital signal; an analog operation circuit connected to the computing array, configured to perform point-by-point operations on the first calculation result to obtain a second calculation result, where the second calculation result is an analog signal; and an analog-to-digital conversion circuit connected to the analog operation circuit, configured to convert the second calculation result into a digital signal to obtain a third calculation result. By feeding the first calculation result into the analog operation circuit and only then feeding the second calculation result output by the analog operation circuit into the analog-to-digital conversion circuit, the hardware cost of analog-to-digital conversion in the neural network is reduced, and power consumption is reduced accordingly. Moreover, converting the analog signal into a digital signal through the analog-to-digital conversion circuit allows the digital signal to be transmitted over the network-on-chip, so that multiple tiles can cooperate more efficiently to realize a large-scale neural network.
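The three-stage dataflow described above can be sketched numerically. The following is a minimal model, not the patent's circuits: the ADC is an idealized 8-bit quantizer, and a tanh activation followed by a point-wise scaling stands in for the analog operation circuit.

```python
import numpy as np

def adc(x, bits=8, vmax=1.0):
    """Idealized ADC model: clip to the input range, then quantize."""
    levels = 2 ** (bits - 1) - 1
    return np.round(np.clip(x, -vmax, vmax) * levels / vmax) / levels * vmax

def computing_device(x_digital, W, gate=0.5):
    """First result: analog crossbar output; second result: analog
    activation plus a point-wise multiply; third result: digitized
    output ready for the network-on-chip."""
    first = W @ x_digital            # computing array (analog output)
    second = np.tanh(first) * gate   # analog operation circuit
    third = adc(second)              # analog-to-digital conversion circuit
    return third
```

Because the activation and point-wise operations happen before digitization, only the final vector needs ADC channels, rather than every crossbar column.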
In one possible design, the analog operation circuit includes an activation function circuit configured to perform an activation function calculation on the first calculation result to obtain a first intermediate result, and a point-by-point calculation circuit connected to the activation function circuit and configured to perform point-by-point calculation on the first intermediate result. Introducing an activation function circuit and a point-by-point calculation circuit improves the accuracy of the neural network operation.
In another possible design, the weight matrix is determined from error terms obtained from circuit characteristics of the analog operation circuit. The weight matrix of the neural network model is adjusted by introducing circuit characteristics, so that the reasoning accuracy of the neural network is improved.
In another possible design, the error term includes a relational expression of a first derivative of the error term; the analog operation circuit is also used for adjusting the output value of the corresponding operator realized by the analog operation circuit according to the relational expression of the first derivative of the error term, and the output value of the corresponding operator realized by the analog operation circuit is used for adjusting the initial weight matrix to obtain the weight matrix. And adjusting the weight matrix of the neural network model through the relational expression of the first derivative of the error term, thereby improving the reasoning accuracy of the neural network.
In another possible design, the analog operation circuit is further configured to train through a sum of a theoretical relational expression of the first derivative of the activation function and a relational expression of the first derivative of the error term, and adjust an output value of a corresponding operator implemented by the analog operation circuit.
In another possible design, the error term includes a relational expression of the error term; and the analog operation circuit is also used for adjusting the output value of the corresponding operator realized by the analog operation circuit according to the relational expression of the error term. And then the weight matrix of the neural network model is adjusted through the output value of the corresponding operator realized by the adjusted analog operation circuit, so that the reasoning precision of the neural network is improved.
In another possible design, the analog operation circuit is further configured to train through a sum of a theoretical relational expression of the analog operation circuit and a relational expression of the error term, and adjust an output value of a corresponding operator implemented by the analog operation circuit.
In another possible design, the computing device is applied to a long short term memory network LSTM system.
In a second aspect, an embodiment of the present application provides a method for accelerating a neural network, where the method is performed by the computing array, the analog operation circuit, and the analog-to-digital conversion circuit in the computing device provided in the first aspect, following the same flow and steps as the first aspect.
In a third aspect, the application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any of the above aspects.
In a fourth aspect, the present application provides a computer program product for storing a computer program for causing a computer to carry out the method of any one of the preceding aspects when the computer program is run on the computer.
Drawings
In order to more clearly describe the embodiments of the present application or the technical solutions in the background art, the following description will describe the drawings that are required to be used in the embodiments of the present application or the background art.
Fig. 1 is a schematic structural diagram of a neural network chip according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a weight matrix according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a computing unit according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a neural network processing unit according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a tanh function circuit according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a sigma function circuit according to an embodiment of the application;
FIG. 8 is a schematic diagram of a point-wise multiplication circuit according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another computing device provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a training model deployment provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of a circuit feature provided by an embodiment of the present application;
FIG. 12 is a flowchart of an output value adjustment method according to an embodiment of the present application;
Fig. 13 is a flow chart of an acceleration method of a neural network according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
An artificial neural network (ANN), simply referred to as a neural network (NN) or neural-like network, is a mathematical or computational model in the fields of machine learning and cognitive science that mimics the structure and function of a biological neural network (the central nervous system of an animal, particularly the brain) and is used to estimate or approximate functions. Artificial neural networks include convolutional neural networks (convolutional neural network, CNN), deep neural networks (deep neural network, DNN), multi-layer perceptrons (multilayer perceptron, MLP), and the like. A neural network circuit may be a chip array composed of a plurality of neural network chips (chips).
As shown in fig. 1, fig. 1 is a schematic structural diagram of a neural network chip according to an embodiment of the present application. The neural network chip may include a plurality of neural network processing units and a plurality of routers. Fig. 1 illustrates a neural network processing unit as a tile. One tile may be connected to one or more routers, and a plurality of tiles having the same structure may be connected through the routers to form a network-on-chip (NoC). Each tile may include an ADC, an input/output buffer, a memory buffer, a router, a memristor array (ReRAM crossbar array), and an analog operation circuit, where the memristor array is configured to compute input data (e.g., x_t) against a preset weight matrix (W_f, W_i, W_c, W_o). The analog operation circuit may include a sigmoid unit (e.g., the sigma function), a tanh unit (e.g., the tanh function), a multiplication unit (e.g., ×), and an addition unit (e.g., +).
In the embodiment of the application, each tile of the neural network system comprises at least one ReRAM crossbar. Because ReRAM combines storage and computation, weights can be configured into the ReRAM cells before calculation, and calculation results can be sent directly to the next layer for pipelined computation. Weights are typically used to represent the importance of the input data to the output data, and in neural networks they are typically represented by a matrix. As shown in fig. 2, the weight matrix of j rows and k columns shown in fig. 2 may be the weights of one neural network layer, and each element in the weight matrix represents a weight value.
In the embodiment of the application, the weights in each computing node can be pre-configured. Specifically, each element of a weight matrix is configured into a ReRAM cell of the corresponding crossbar array, so that the matrix multiply-add operation of the input data with the configured weights can be implemented through the ReRAM crossbar array.
For clarity of description, a brief description is given below of how the ReRAM crossbar implements the matrix multiply-add operation. Fig. 3 is a schematic structural diagram of a ReRAM crossbar in a computing unit according to an embodiment of the present application; for convenience of description, the embodiment of the application refers to it simply as the crossbar. As shown in FIG. 3, the crossbar includes a plurality of ReRAM cells, such as G1,1 and G2,1, which form a neural network matrix. In the embodiment of the present application, the weights shown in fig. 2 may be written into the crossbar through the bit lines of the crossbar shown in fig. 3 (the input port 402 in fig. 3) during configuration of the neural network, so that each weight element is configured into a corresponding ReRAM cell. For example, the weight element W0,0 in fig. 2 is configured into G1,1 of fig. 3, the weight element W1,0 in fig. 2 is configured into G2,1 of fig. 3, and so on; each weight element corresponds to one ReRAM cell. When performing neural network calculations, input data is input to the crossbar via the word lines of the crossbar (e.g., input ports 404 shown in FIG. 3). It will be appreciated that the input data may be represented by voltages, so that the input data and the weight values configured in the ReRAM cells implement a dot-product operation, and the calculation result of each column of the crossbar is output in the form of an output voltage from the output terminal (e.g., the output port 406 shown in fig. 3).
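In idealized form, the dot-product behavior described above reduces to Ohm's and Kirchhoff's laws. The sketch below models only that arithmetic (real cells add nonlinearity, noise, and conductance limits not captured here):

```python
import numpy as np

def crossbar_mvm(voltages, conductances):
    """Analog matrix-vector multiply on an idealized ReRAM crossbar.

    Each input voltage V_i drives one word line; each cell stores a
    conductance G[i, j] encoding a weight. By Ohm's law the cell passes
    current V_i * G[i, j], and Kirchhoff's current law sums the currents
    on each bit line, so column j outputs I_j = sum_i V_i * G[i, j].
    """
    voltages = np.asarray(voltages, dtype=float)
    conductances = np.asarray(conductances, dtype=float)
    return voltages @ conductances  # one output current per column

# A 2x3 weight matrix mapped to conductances, driven by 2 input voltages
G = np.array([[1.0, 2.0, 0.5],
              [0.5, 1.0, 2.0]])
V = np.array([0.2, 0.4])
I = crossbar_mvm(V, G)  # column currents [0.4, 0.8, 0.9]
```

The entire matrix-vector product completes in one analog step, which is why the crossbar is attractive for the matrix multiplications in rows 1 to 4 of the LSTM equations.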
The neural network in this application may be a long short-term memory (LSTM) network, a recurrent neural network that, due to its design, is well suited to processing and predicting important events separated by very long intervals and delays in a time series. Because the LSTM network is suited to time- or sequence-related input and output, it is now widely applied in speech recognition, machine translation, image captioning, machine question answering, video-to-text conversion, and similar fields, with good results. Forward calculation flow of the LSTM network: for time steps t = 1, ..., T, given the input sequence X = (x_1, ..., x_T), the LSTM network computes its output hidden states H = (h_1, ..., h_T).
The specific calculation mode is as follows:
f_t = σ(W_fx x_t + W_fh h_{t-1} + b_f)
i_t = σ(W_ix x_t + W_ih h_{t-1} + b_i)
C̃_t = tanh(W_cx x_t + W_ch h_{t-1} + b_c)
o_t = σ(W_ox x_t + W_oh h_{t-1} + b_o)
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t
h_t = o_t ⊙ tanh(C_t)
Rows 1 to 4 of the above formulas perform matrix multiplications, while rows 5 and 6 perform point-wise multiplication (the ⊙ sign) and point-wise addition (the + sign). It can be seen that, compared with other neural network models, the LSTM model requires point-wise multiplication and point-wise addition of vectors, and two different types of activation functions (sigma and tanh) must be supported.
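The six equations amount to one forward time step. A minimal NumPy version follows; the weight layout and gate naming here are illustrative, not the patent's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the six equations above.

    W maps a gate name to its (W_gx, W_gh) weight pair; b maps a gate
    name to its bias. Rows 1-4 are matrix multiplies (crossbar-friendly);
    the cell-state and hidden-state updates are the point-wise operations.
    """
    f = sigmoid(W['f'][0] @ x_t + W['f'][1] @ h_prev + b['f'])
    i = sigmoid(W['i'][0] @ x_t + W['i'][1] @ h_prev + b['i'])
    c_tilde = np.tanh(W['c'][0] @ x_t + W['c'][1] @ h_prev + b['c'])
    o = sigmoid(W['o'][0] @ x_t + W['o'][1] @ h_prev + b['o'])
    c_t = f * c_prev + i * c_tilde   # point-wise multiply and add (row 5)
    h_t = o * np.tanh(c_t)           # point-wise multiply (row 6)
    return h_t, c_t
```

The four matrix products map naturally onto the memristor arrays, while the last two lines are exactly the point-wise operations the analog operation circuit must support.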
As shown in fig. 4, fig. 4 is a schematic structural diagram of a neural network processing unit. The neural network processing unit may be a tile including a memory array, a buffer array, a dataflow controller, a plurality of ReRAM crossbars, a plurality of multipliers (Mult), and a plurality of special function units (SFU), where Mult and SFU are both digital arithmetic units. The neural network processing unit executes the following flow: first, the memristor array processes the matrix multiplication to obtain a calculation result; the calculation result is then converted into a digital signal through an ADC; finally, the digital signal is processed by the activation functions and point-by-point operations. As also shown in fig. 4, the peripheral circuitry arranged around the memristor array consists mainly of analog-to-digital converters (ADCs). Because each analog signal path output by the memristor array requires its own analog-to-digital conversion circuit, many such circuits are needed, so the cost and power consumption of the analog-to-digital converters in the hardware system are high. To solve this technical problem, the embodiments of the application provide the following solution.
FIG. 5 is a schematic diagram of a computing device according to an embodiment of the present application. The computing device may be an improved neural network processing unit (e.g., tile) as mentioned above, the computing device comprising at least:
the computing array 501 includes a plurality of computing units, and is configured to compute input data and a preset weight matrix to obtain a first computing result, where the first computing result is in an analog signal form, and the input data is in a digital signal form. The computing array 501 may be a memristor array. The specific calculation method of the calculation array 501 may refer to the description above with respect to fig. 2 and 3.
The analog operation circuit 502 is connected to the calculation array, and is configured to perform a point-by-point operation on the first calculation result to obtain a second calculation result, where the second calculation result is in an analog signal form.
Specifically, the analog operation circuit 502 may further include an activation function circuit and a point-by-point calculation circuit, where the activation function circuit is configured to perform activation function calculation on the first calculation result to obtain a first intermediate result. And the point-by-point calculation circuit is connected with the activation function circuit and is used for executing point-by-point calculation on the first intermediate result. The activation function circuit comprises a sigma function circuit and a tanh function circuit, and the point-by-point calculation circuit comprises a point-by-point addition circuit and a point-by-point multiplication circuit. The circuit structure of each analog operation circuit may refer to fig. 6 to 8. Fig. 6 is a schematic diagram of a tanh function circuit according to an embodiment of the present application. Fig. 7 is a schematic diagram of a sigma function circuit according to an embodiment of the present application. Fig. 8 is a schematic diagram of a point-by-point multiplication circuit according to an embodiment of the present application. Each analog operation circuit is formed by connecting a plurality of components.
The analog-to-digital conversion circuit 503 is connected to the analog operation circuit, and is configured to convert the second calculation result into a digital signal, so as to obtain a third calculation result.
It should be noted that, because analog signals are not suitable for network-on-chip transmission, the analog signal is converted into a digital signal by the analog-to-digital conversion circuit so that the digital signal can be transmitted on the network-on-chip; in this way, multiple tiles can cooperate more efficiently to realize a larger-scale LSTM network.
Optionally, the computing device may further include a router connected to the analog-to-digital conversion circuit, and the plurality of tiles with the same structure are connected through the router to form an on-chip network of the LSTM network, and data processed by any one tile may be sent to other tiles through the on-chip network formed by the router, where the computing device may be used for data operation of the LSTM network.
Optionally, the computing device may further include a memory coupled to the analog-to-digital conversion circuit for storing the input data and the output data.
In an embodiment of the present application, each tile may include 16 memristor arrays with dimensions 128×128. The on-chip memory may employ enhanced dynamic random access memory (eDRAM) and static random access memory (SRAM). Analog-to-digital conversion circuit 503 may use a 1.3 GHz sampling frequency, thereby achieving low power consumption.
As shown in fig. 9, fig. 9 is a schematic structural diagram of another computing device according to an embodiment of the present application. First, the computing array computes the input data against the preset weight matrix to obtain a first calculation result. The first calculation result is then input into the analog operation circuit, and the second calculation result output by the analog operation circuit is input into the analog-to-digital conversion circuit. In this process, after the multiple analog signal paths pass through the analog operation circuit, the number of analog output paths is reduced, so that fewer analog-to-digital conversion circuits are needed. The technical solution of the embodiment of the application thus reduces the hardware cost of the analog-to-digital converters in the LSTM hardware system, thereby reducing circuit power consumption.
As shown in Table 1, Table 1 compares the hardware efficiency (power efficiency and area efficiency) of computing device A and computing device B. As can be seen from Table 1, computing device A improves power efficiency by 79.6% and area efficiency by a factor of 3.28 relative to computing device B.

TABLE 1

                       Power efficiency   Area efficiency
Computing device A     208.9 GOP/s/W      913.9 GOP/s/mm²
Computing device B     116.3 GOP/s/W      278.2 GOP/s/mm²
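The quoted improvements follow directly from the table's raw numbers; a quick arithmetic check:

```python
# Sanity-check the improvement figures quoted for Table 1.
power_a, power_b = 208.9, 116.3   # power efficiency, GOP/s/W
area_a, area_b = 913.9, 278.2     # area efficiency, GOP/s/mm^2

power_gain = power_a / power_b - 1.0   # fractional gain, ~0.796 (79.6% higher)
area_ratio = area_a / area_b           # ratio, ~3.28-3.29x
```

These match the 79.6% power-efficiency improvement and the roughly 3.28× area-efficiency factor stated in the text.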
Optionally, because the analog operation circuit introduces errors when realizing the activation functions, point-wise multiplication, and point-wise addition, the embodiment of the application may first train the LSTM network model, adjusting the output values of the analog operation circuit and thereby the initial weight matrix of the LSTM network model, and then carry out the calculation process with the adjusted weight matrix. As shown in fig. 10, fig. 10 is a schematic diagram of training-model deployment provided in an embodiment of the present application: the LSTM network model is first trained and then deployed onto an LSTM accelerator for operation, where the LSTM accelerator is a neural network circuit including the computing device shown in fig. 5.
The weight matrix is determined from an error term obtained from the circuit characteristics of the analog operation circuit. The circuit characteristics of the sigma function circuit, the tanh function circuit, the point-by-point addition circuit, and the point-by-point multiplication circuit can each be extracted and analyzed by the same method, so that the initial weight matrix of the LSTM network model is adjusted to obtain the adjusted weight matrix. The error term may include a relational expression of the error term and a relational expression of the first derivative of the error term, determined as follows.
First, a circuit characteristic of an analog operation circuit may be extracted, wherein the circuit characteristic may include an actual value of a correspondence relationship between an input value and an output value. Specifically, the hardware circuitry employed in the actual design may be simulated with a circuit simulation tool (e.g., HSPICE, etc.). For each analog operation circuit, one input value corresponds to one output value, and finally, the scattered point relation of circuit characteristics from input to output can be obtained. As shown in fig. 11, the circuit characteristics of the analog operation circuit are obtained by circuit simulation, in which tanh function circuits are taken as an example for illustration, and the results of other analog operation circuits are similar and will not be described here.
Then, a relational expression of the error term and a relational expression of the first derivative of the error term may be determined from the circuit characteristics of the analog operation circuit. The error terms for the various analog circuits may be fitted by a digital analysis tool (e.g., MATLAB). The method comprises the following steps:
Step 1, subtract the theoretical input-output values of the analog operation circuit from the actual values of its circuit characteristic obtained above, yielding the error term as a set of scattered data points. As shown in fig. 11, the actual circuit characteristic differs from the theoretical characteristic of the analog operation circuit, and the difference between the two gives the error terms of the scattered data.
Step 2, fit the error terms of the scattered data points to obtain a continuous relational expression of the error term; a piecewise polynomial approximation may be used for the fitting.
The errors of the first derivatives of the activation functions (i.e. sigma and tanh functions) are extracted and fitted as follows.
Step 3, determine a relational expression of the first derivative of the discrete circuit characteristic from the actual values of the circuit characteristic obtained above.
Step 4, determine a discrete relational expression of the first derivative of the error term in the same manner as step 1.
Step 5, fit the discrete relational expression of the first derivative of the error term into a continuous relational expression, again using piecewise polynomial approximation.
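Steps 1 and 2 can be sketched with standard numerical tools. The sketch below stands in for the MATLAB fitting the text mentions; the synthetic tanh-circuit error (0.02·x²) is purely illustrative, not measured data:

```python
import numpy as np

def fit_error_term(x, err, n_segments=4, degree=3):
    """Fit scattered error samples (actual minus theoretical) with a
    piecewise polynomial: one low-degree least-squares fit per
    equal-width segment. Returns a callable continuous approximation."""
    edges = np.linspace(x.min(), x.max(), n_segments + 1)
    polys = []
    for k in range(n_segments):
        mask = (x >= edges[k]) & (x <= edges[k + 1])
        polys.append(np.polynomial.Polynomial.fit(x[mask], err[mask], degree))

    def error(v):
        k = int(np.clip(np.searchsorted(edges, v, side='right') - 1,
                        0, n_segments - 1))
        return polys[k](v)
    return error

# Synthetic circuit-simulation samples with a small systematic error
x = np.linspace(-2, 2, 200)
actual = np.tanh(x) + 0.02 * x ** 2    # simulated circuit output
err_samples = actual - np.tanh(x)      # step 1: actual minus theoretical
err_fn = fit_error_term(x, err_samples)  # step 2: continuous expression
```

The same procedure applied to numerically differentiated samples gives the continuous first-derivative expression of steps 3 to 5.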
And finally, when the LSTM network model is trained to call the corresponding operator realized by the analog operation circuit, the output value of the operator corresponding to the analog operation circuit is adjusted through the relational expression of the error term and the relational expression of the first derivative of the error term obtained through the steps, so that the weight matrix of the LSTM network model is adjusted. As shown in fig. 12, fig. 12 is a flowchart of an output value adjustment method according to an embodiment of the present application. In this process, it is necessary to determine whether the training process is forward-propagating or backward-propagating, and whether the activation function is called at the time of backward propagation. The forward propagation represents the process of sequentially propagating the input data from front to back in the training model to obtain an output result. Because the output result obtained after forward propagation and the true value of the training sample have certain error terms, the error terms can be utilized to sequentially calculate the partial derivatives of each parameter from back to front so as to adjust the weight matrix of the training model, and the reverse process is called reverse propagation.
Optionally, the analog operation circuit 502 is further configured to adjust an output value of a corresponding operator implemented by the analog operation circuit according to a relational expression of the first derivative of the error term, where the output value of the corresponding operator implemented by the analog operation circuit is used to adjust an initial weight matrix to obtain the weight matrix. Further, when the first derivative of the activation function is called in the back propagation process, training is performed through the sum of the theoretical relational expression of the first derivative of the activation function and the relational expression of the first derivative of the error term, and the output value of the corresponding operator realized by the analog operation circuit is adjusted. When the first derivative of the activation function is not called in the back propagation process, training is carried out through a theoretical relational expression of the first derivative of the activation function, and the output value of a corresponding operator realized by the simulation operation circuit is adjusted.
Optionally, the analog operation circuit 502 is further configured to adjust the output value of the corresponding operator implemented by the analog operation circuit according to the relational expression of the error term. Forward propagation is then performed with the adjusted output values of the operators implemented by the analog operation circuit, and backward propagation is performed using the output result obtained by the forward propagation, thereby adjusting the weight matrix of the LSTM network model. Further, when a corresponding operator implemented by the analog operation circuit is invoked during forward propagation, training is performed using the sum of the theoretical relational expression of the analog operation circuit and the relational expression of the error term.
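The compensation rule in the two paragraphs above, theoretical expression plus fitted error term in the forward pass, and theoretical first derivative plus the error term's first derivative in the backward pass, can be sketched as follows. The error expressions here are hypothetical placeholders; a real deployment would substitute the expressions fitted from circuit simulation:

```python
import numpy as np

def circuit_error(x):
    """Hypothetical fitted error term of the tanh circuit (placeholder)."""
    return 0.02 * x ** 2

def circuit_error_deriv(x):
    """First derivative of the hypothetical error term above."""
    return 0.04 * x

def tanh_forward(x):
    # Forward propagation: theoretical activation plus the error term,
    # so training sees the operator as the analog circuit realizes it.
    return np.tanh(x) + circuit_error(x)

def tanh_backward(x):
    # Backward propagation: theoretical first derivative plus the
    # relational expression of the first derivative of the error term.
    return 1.0 - np.tanh(x) ** 2 + circuit_error_deriv(x)
```

Training with these adjusted operators lets the weight matrix absorb the circuit's systematic deviation before deployment.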
By introducing the circuit characteristics into the adjustment of the weight matrix of the LSTM network model, the inference accuracy of the LSTM network is improved. In addition, modifying the backward propagation process during training improves the accuracy of the introduced circuit characteristics and ensures the convergence of the training method.
Fig. 13 is a schematic flowchart of an acceleration method of a neural network according to an embodiment of the present application. As shown in fig. 13, the method mainly includes the following steps:
S1301, calculating the input data and a preset weight matrix to obtain a first calculation result, where the first calculation result is in analog signal form and the input data is in digital signal form.
Wherein the weight matrix is determined based on error terms obtained from circuit characteristics of the analog operation circuit.
Optionally, the error term includes a relational expression of the first derivative of the error term. Before the input data and the preset weight matrix are calculated to obtain the first calculation result, the output value of the corresponding operator implemented by the analog operation circuit may be adjusted according to the relational expression of the first derivative of the error term, and this output value is used to adjust an initial weight matrix to obtain the weight matrix. Further, training is performed with the sum of the theoretical relational expression of the first derivative of the activation function and the relational expression of the first derivative of the error term, and the output value of the corresponding operator implemented by the analog operation circuit is adjusted.
Optionally, the error term includes a relational expression of the error term. Before the input data and the preset weight matrix are calculated to obtain the first calculation result, the output value of the corresponding operator implemented by the analog operation circuit may be adjusted according to the relational expression of the error term. Further, training may be performed with the sum of the theoretical relational expression of the analog operation circuit and the relational expression of the error term, and the output value of the corresponding operator implemented by the analog operation circuit is adjusted. The weight matrix of the neural network model is then adjusted through the adjusted output value of that operator, thereby improving the inference accuracy of the neural network.
S1302, performing a point-by-point operation on the first calculation result to obtain a second calculation result, where the second calculation result is in analog signal form.
Specifically, an activation function calculation is performed on the first calculation result to obtain a first intermediate result, and a point-wise calculation is then performed on the first intermediate result.
S1303, converting the second calculation result into a digital signal to obtain a third calculation result.
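Steps S1301-S1303 can be sketched end to end as a software behavioral model of the analog datapath. This is an illustrative sketch under stated assumptions (tanh as the activation, an element-wise gate product as the point-by-point operation, and a uniform 8-bit ADC over an assumed voltage range), not the circuit itself; all names and parameters are hypothetical.

```python
import numpy as np

def accelerate_step(x_digital, weights, gate, n_bits=8, v_range=(-1.0, 1.0)):
    # S1301: calculate the input data against the preset weight matrix; in
    # hardware the crossbar output (first calculation result) is analog.
    first = weights @ x_digital

    # S1302: point-by-point operation on the first calculation result --
    # activation function circuit, then point-by-point calculation circuit.
    intermediate = np.tanh(first)   # first intermediate result
    second = intermediate * gate    # second calculation result (analog)

    # S1303: analog-to-digital conversion, modeled as uniform quantization
    # of the second calculation result over the assumed voltage range.
    lo, hi = v_range
    clipped = np.clip(second, lo, hi)
    levels = 2 ** n_bits - 1
    third = np.rint((clipped - lo) / (hi - lo) * levels).astype(int)
    return third
```

A zero input lands at the ADC mid-scale code, while a strongly saturated activation maps to the full-scale code, which is the expected behavior of the quantization model.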
For the specific implementation process, reference may be made to the corresponding description of the apparatus embodiment shown in fig. 5; the method performs the procedure executed by the computing apparatus in the above embodiment.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
The above specific embodiments further describe the objects, technical solutions, and advantageous effects of the present application in detail. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (8)

1. A computing device, comprising:
a computing array comprising a plurality of computing units, configured to calculate input data and a preset weight matrix to obtain a first calculation result, wherein the first calculation result is in analog signal form and the input data is in digital signal form;
an analog operation circuit connected to the computing array and configured to perform a point-by-point operation on the first calculation result to obtain a second calculation result, wherein the second calculation result is in analog signal form; and
an analog-to-digital conversion circuit connected to the analog operation circuit and configured to convert the second calculation result into a digital signal to obtain a third calculation result;
wherein the weight matrix is determined according to an error term obtained from circuit characteristics of the analog operation circuit, the error term comprising a relational expression of a first derivative of the error term; and
the analog operation circuit is further configured to adjust an output value of a corresponding operator implemented by the analog operation circuit according to the relational expression of the first derivative of the error term, wherein the output value of the corresponding operator implemented by the analog operation circuit is used to adjust an initial weight matrix to obtain the weight matrix.
2. The computing device of claim 1, wherein the analog operation circuit comprises:
an activation function circuit configured to perform an activation function calculation on the first calculation result to obtain a first intermediate result; and
a point-by-point calculation circuit connected to the activation function circuit and configured to perform a point-by-point calculation on the first intermediate result.
3. The computing device of claim 1, wherein the analog operation circuit is further configured to adjust the output value of the corresponding operator implemented by the analog operation circuit by training with the sum of a theoretical relational expression of a first derivative of an activation function and the relational expression of the first derivative of the error term.
4. A computing device as claimed in any one of claims 1 to 3, wherein the computing device is applied to a long short term memory network LSTM system.
5. A method for accelerating a neural network, comprising:
calculating input data and a preset weight matrix to obtain a first calculation result, wherein the first calculation result is in analog signal form and the input data is in digital signal form;
performing a point-by-point operation on the first calculation result to obtain a second calculation result, wherein the second calculation result is in analog signal form;
converting the second calculation result into a digital signal to obtain a third calculation result, wherein the weight matrix is determined according to an error term obtained from circuit characteristics of an analog operation circuit, and the error term comprises a relational expression of a first derivative of the error term; and
adjusting an output value of a corresponding operator implemented by the analog operation circuit according to the relational expression of the first derivative of the error term, wherein the output value of the corresponding operator implemented by the analog operation circuit is used to adjust an initial weight matrix to obtain the weight matrix.
6. The method of claim 5, wherein performing a point-wise operation on the first calculation result to obtain a second calculation result comprises:
performing an activation function calculation on the first calculation result to obtain a first intermediate result; and
performing a point-wise calculation on the first intermediate result.
7. The method of claim 5, wherein said adjusting the output value of the corresponding operator implemented by the analog operation circuit according to the relational expression of the first derivative of the error term comprises:
training with the sum of a theoretical relational expression of the first derivative of the activation function and the relational expression of the first derivative of the error term, and adjusting the output value of the corresponding operator implemented by the analog operation circuit.
8. The method according to any of claims 5-7, wherein the method is applied to a long short term memory network LSTM system.
CN201911063237.3A 2019-10-31 2019-10-31 Computing device and acceleration method of neural network Active CN112749784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911063237.3A CN112749784B (en) 2019-10-31 2019-10-31 Computing device and acceleration method of neural network


Publications (2)

Publication Number Publication Date
CN112749784A CN112749784A (en) 2021-05-04
CN112749784B true CN112749784B (en) 2024-05-14

Family

ID=75645460


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656751B (en) * 2021-08-10 2024-02-27 上海新氦类脑智能科技有限公司 Method, apparatus, device and medium for realizing signed operation by unsigned DAC
US11876527B2 (en) 2021-09-27 2024-01-16 Skymizer Taiwan Inc. Error calibration apparatus and method
CN115081373B (en) * 2022-08-22 2022-11-04 统信软件技术有限公司 Memristor simulation method and device, computing equipment and readable storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2004027680A1 (en) * 2002-09-18 2004-04-01 Canon Kabushiki Kaisha Arithmetic circuit
CN109472348A (en) * 2018-10-23 2019-03-15 华中科技大学 A kind of LSTM nerve network system based on memristor crossed array
CN110245324A (en) * 2019-05-19 2019-09-17 南京惟心光电系统有限公司 A kind of de-convolution operation accelerator and its method based on photoelectricity computing array
CN110325963A (en) * 2017-02-28 2019-10-11 微软技术许可有限责任公司 The multi-functional unit for programmable hardware node for Processing with Neural Network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11501130B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
US11061646B2 (en) * 2018-09-28 2021-07-13 Intel Corporation Compute in memory circuits with multi-Vdd arrays and/or analog multipliers




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant