CN111984921A - Memory numerical calculation accelerator and memory numerical calculation method - Google Patents

Info

Publication number
CN111984921A
Authority
CN
China
Prior art keywords
vector
memory
memory array
matrix
solving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010879915.XA
Other languages
Chinese (zh)
Other versions
CN111984921B (en)
Inventor
李祎
李健聪
缪向水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202010879915.XA
Publication of CN111984921A
Application granted
Publication of CN111984921B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an in-memory numerical calculation accelerator and an in-memory numerical calculation method. The accelerator comprises an external control module and an in-memory computing module, the latter comprising a nonvolatile memory array. Combining a numerical iterative algorithm with the integrated storage-and-computation capability of nonvolatile memory, the nonvolatile memory array executes the data-intensive vector-matrix multiplications while the external control module executes the control flow of the iterative algorithm. Because most numerical problems can be solved by algorithms built on vector-matrix multiplication, the accelerator can be applied to tasks such as solving linear equations, systems of linear equations, stationary and time-varying partial differential equations, matrix eigenvalues and eigenvectors, least-squares curve fitting problems and linear regression equations, which makes the system highly reconfigurable. In addition, the accelerator is compatible with various nonvolatile memories and is therefore highly extensible.

Description

Memory numerical calculation accelerator and memory numerical calculation method
Technical Field
The invention belongs to the field of analog circuits, and particularly relates to an in-memory numerical calculation accelerator and an in-memory numerical calculation method.
Background
In the big-data era, large amounts of data are moved between the storage unit and the arithmetic unit during computation. This data movement consumes a great deal of energy, so the traditional computer architecture has very low energy efficiency on data-intensive tasks.
The integrated storage-and-computation architecture based on various nonvolatile memories is a new computing architecture for data-intensive tasks. Because the computation is carried out directly in the memory, data transfer during computation is reduced to a minimum, and the architecture therefore has high energy efficiency. The architecture has already achieved remarkable results in neuromorphic computing, where various artificial neural networks built on nonvolatile memories have demonstrated its potential.
However, numerical computation is itself a data-intensive task, and accelerating it with nonvolatile-memory-based in-memory computing remains a challenge. Although various nonvolatile-memory-based acceleration circuits for numerical computation have been proposed, most can handle only one or two kinds of task. Taking systems of linear equations as an example: the system itself takes several forms, such as consistent and inconsistent systems, and it has several applications, such as linear regression and least-squares curve fitting. Existing equation solvers cannot handle all of these and are each limited to one fixed problem. An in-memory numerical calculation accelerator with high system reconfigurability and high energy efficiency is therefore needed.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides an in-memory numerical calculation accelerator and an in-memory numerical calculation method, and aims to solve the technical problem of low system reconfigurability of the existing mathematical method and operation architecture.
To achieve the above object, in a first aspect, there is provided a memory numerical computation accelerator including an external control module and a memory computation module; wherein the memory computing module comprises a non-volatile memory array;
the external control module is used for converting, in an initial stage, the task to be solved into the form of a matrix X multiplied by a vector w to be solved; presetting a vector r_n; and transmitting the matrix X and the vector r_n in sequence to the in-memory computing module; wherein the vector r_n has the same dimension as the vector w;
the in-memory computing module is used for writing the received matrix X into the nonvolatile memory array in the initial stage; and, upon receiving the vector r_n, inputting it into the nonvolatile memory array to multiply the matrix X by the vector r_n, and feeding the multiplication result back to the external control module;
the external control module is also used for judging, in an iteration stage after receiving the multiplication result fed back by the in-memory computing module, whether the preset number of iterations has been reached or the multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module;
the in-memory computing module is also used for, in the iteration stage, inputting the current vector r_n into the nonvolatile memory array upon receiving it, to multiply the matrix X by the vector r_n, and feeding the multiplication result back to the external control module.
Further preferably, the external control module includes a first control unit and a dynamic random access memory DRAM;
the first control unit converts, in the initial stage, the task to be solved into the form of a matrix multiplied by a vector, presets a vector r_n, and transmits the matrix X and the vector r_n in sequence to the in-memory computing module; in the iteration stage, after the DRAM receives the multiplication result fed back by the in-memory computing module, the first control unit judges whether the multiplication result has reached the preset precision or the number of iterations has reached the preset number; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module via the DRAM.
Further preferably, the memory computing module comprises a second control unit and a multiplication unit; the multiplication unit comprises a digital-to-analog converter, an analog-to-digital converter and the nonvolatile memory array;
the second control unit is connected with the multiplication unit; the nonvolatile memory array comprises an input end and an output end; the digital-to-analog converter is connected with the input end of the nonvolatile memory array, and the analog-to-digital converter is connected with the output end of the nonvolatile memory array;
the second control unit is used for gating corresponding rows and columns in the nonvolatile memory array and controlling the input and the output of the nonvolatile memory array;
when matrix and vector multiplication operation is performed, the data input by the external control module is converted into a voltage vector by the digital-to-analog converter and input into the nonvolatile memory array to perform operation, and a current vector output by the nonvolatile memory array is converted into a data quantity after passing through the analog-to-digital converter, namely a multiplication operation result, and the data quantity is fed back to the external control module.
Further preferably, the nonvolatile memory array is a resistance change memory array, a phase change memory array, a NOR-FLASH array, a spin transfer torque magnetic memory array, or a ferroelectric field effect transistor array.
Further preferably, the in-memory numerical computation accelerator is adapted to any numerical problem that can be solved using a numerical iterative algorithm including matrix and vector operations.
Further preferably, the in-memory numerical calculation accelerator is suitable for solving linear equations, systems of linear equations, partial differential equations, matrix eigenvalues and eigenvectors, least-squares curve fitting problems, and linear regression equations.
In a second aspect, the present invention provides a memory numerical calculation method for a memory numerical calculation accelerator based on the first aspect of the present invention, including the following steps:
S1, converting the task to be solved into the form of a matrix X multiplied by a vector w to be solved, and writing the matrix X into the nonvolatile memory array;
S2, presetting a vector r_n and inputting it into the nonvolatile memory array to multiply the matrix X by the vector r_n; wherein the vector r_n has the same dimension as the vector w;
S3, judging whether the preset number of iterations has been reached or whether the multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation ends; if not, updating the vector r_n and going to step S4;
S4, inputting the vector r_n into the nonvolatile memory array to multiply the matrix X by the vector r_n, and returning to step S3.
Further preferably, the method for updating the vector r_n and the method for judging whether the multiplication result has reached the preset precision are determined by a solving algorithm, which is chosen according to the task to be solved and includes the gradient descent method, the conjugate gradient method and the power method.
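Steps S1 to S4 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the claimed circuit: the names `InMemoryArray`, `solve`, `update` and `converged` are hypothetical, and the array is modelled as an ideal, noise-free multiplier.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


class InMemoryArray:
    """Hypothetical stand-in for the nonvolatile memory array (ideal, noise-free)."""

    def __init__(self) -> None:
        self.X: Matrix = []
        self.writes = 0  # counts how often the matrix is programmed

    def write(self, X: Matrix) -> None:
        # S1: the matrix is written once and then left unchanged.
        self.X = [row[:] for row in X]
        self.writes += 1

    def multiply(self, r: Vector) -> Vector:
        # S2 / S4: the array returns X . r for the current vector r.
        return [sum(x * v for x, v in zip(row, r)) for row in self.X]


def solve(
    X: Matrix,
    r0: Vector,
    update: Callable[[Vector, Vector], Vector],
    converged: Callable[[Vector, Vector], bool],
    max_iters: int,
) -> Vector:
    """External-control loop; the solver-specific rules are passed in as callbacks."""
    array = InMemoryArray()
    array.write(X)                    # S1
    r = r0[:]                         # S2: preset vector
    for _ in range(max_iters):        # S3: preset number of iterations
        product = array.multiply(r)   # S2 / S4: in-memory multiplication
        if converged(r, product):     # S3: preset precision
            break
        r = update(r, product)        # S3: update chosen by the solving algorithm
    return r
```

The two callbacks are where a gradient descent, conjugate gradient or power method rule would plug in; the loop itself never rewrites the matrix.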
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
1. The invention provides an in-memory numerical calculation accelerator that combines a numerical iterative algorithm with the integrated storage-and-computation capability of nonvolatile memory. It comprises an external control unit and an integrated storage-and-computation unit: the nonvolatile memory array executes the data-intensive vector-matrix multiplications, and the external control unit executes the control flow of the iterative algorithm. Because most numerical problems can be solved by algorithms built on vector-matrix multiplication, the accelerator can be applied to tasks such as solving linear equations, systems of linear equations, stationary and time-varying partial differential equations, matrix eigenvalues and eigenvectors, least-squares curve fitting problems and linear regression equations, so the system is highly reconfigurable.
2. Emerging nonvolatile memories offer high speed, low power consumption, easy integration and compatibility with CMOS (complementary metal oxide semiconductor) processes. The in-memory computing unit of the accelerator stores the data matrix as conductances and thereby performs vector-matrix multiplication inside the memory; combining this integrated storage-and-computation unit with the external control unit gives the accelerator high energy efficiency and high calculation precision.
3. In the memory numerical calculation accelerator provided by the invention, the external control circuit is used for executing iterative control, the calculation precision is determined by the floating point operation precision of the first control unit, the influence of the non-ideal effect of the nonvolatile memory array on the calculation precision is overcome to a certain extent, and the calculation result has higher precision.
4. In the in-memory numerical calculation accelerator and method provided by the invention, the preferred solving algorithms leave the matrix X unchanged throughout the solving process, so only one write is needed. This reduces circuit complexity, minimizes data transfer and lowers circuit power consumption. Compared with running the numerical iterative algorithm on a conventional computer, the circuit also reduces time complexity, and the integration of storage and computation saves considerable energy and time, with high reliability and further improved energy efficiency.
5. The memory computing unit of the memory numerical computing accelerator can use a plurality of nonvolatile memories such as a resistive random access memory array, a phase change memory array, a NOR-FLASH array, a spin transfer torque magnetic memory array or a ferroelectric field effect transistor array, and has strong expandability.
6. The in-memory numerical calculation method provided by the invention solves inverse numerical problems by running a numerical iterative algorithm on the accelerator provided by the invention. It can be used to solve linear equations, systems of linear equations, stationary and time-varying partial differential equations, matrix eigenvalues and eigenvectors, least-squares curve fitting problems, linear regression equations and the like, and is therefore highly general.
Drawings
FIG. 1 is a schematic diagram of a memory numerical computation accelerator according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a multiplication unit according to embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the multiplication of the matrix X and the vector r_n according to embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of an operation process of solving a linear regression equation in the in-memory numerical calculation accelerator according to embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Embodiment 1
An in-memory numerical computation accelerator, as shown in fig. 1, includes an external control module and an in-memory computation module; wherein the memory computing module comprises a non-volatile memory array;
the external control module is used for converting, in an initial stage, the task to be solved into the form of a matrix X multiplied by a vector w to be solved, written as X·w = y, where y is a target vector; presetting a vector r_n; and transmitting the matrix X and the vector r_n in sequence to the in-memory computing module; wherein the vector r_n has the same dimension as the vector w, namely n;
the in-memory computing module is used for writing the received matrix X into the nonvolatile memory array in the initial stage; and, upon receiving the vector r_n, inputting it into the nonvolatile memory array to multiply the matrix X by the vector r_n, and feeding the multiplication result back to the external control module;
the external control module is also used for judging, in an iteration stage after receiving the multiplication result fed back by the in-memory computing module, whether the preset number of iterations has been reached or the multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module; specifically, the method for updating the vector r_n and the method for judging whether the multiplication result has reached the preset precision are determined by the solving algorithm, which is chosen according to the type of equation to be solved and includes the gradient descent method, the conjugate gradient method and the power method.
The in-memory computing module is also used for, in the iteration stage, inputting the current vector r_n into the nonvolatile memory array upon receiving it, to multiply the matrix X by the vector r_n, and feeding the multiplication result back to the external control module.
Specifically, the external control module comprises a first control unit and a Dynamic Random Access Memory (DRAM); the memory computing module comprises a second control unit and a multiplication operation unit; wherein, the nonvolatile memory array is positioned in a multiplication unit;
The task to be solved is input into the first control unit, which converts it into the form of the matrix X multiplied by the vector w to be solved and presets a vector r_n according to the vector w. The matrix X is transmitted to the second control unit through the DRAM and is written into the nonvolatile memory array under the control of the second control unit. The first control unit then transfers the vector r_n through the DRAM to the second control unit, which gates the corresponding rows and columns of the nonvolatile memory array and inputs the vector r_n into the array to perform the data-intensive multiplication of the vector r_n and the matrix X; the result is returned to the DRAM, completing one round of iteration. The first control unit judges whether the multiplication result has reached the preset precision or the number of iterations has reached the preset number; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted through the DRAM to the second control unit, the multiplication of the vector r_n and the matrix X is performed under the control of the second control unit, and the result is returned to the DRAM, completing the second round of iteration. The process continues in this way, and after multiple rounds of iteration the external control module outputs the solution of the task.
Further, the multiplication unit comprises the nonvolatile memory array, a digital-to-analog converter and an analog-to-digital converter, as shown in FIG. 2. In this embodiment, the nonvolatile memory array is a resistive random access memory (RRAM) array and has an input end and an output end; the digital-to-analog converter is connected to the input end and the analog-to-digital converter is connected to the output end. When a matrix-vector multiplication is performed, the data input by the second control unit are converted into a voltage vector by the digital-to-analog converter, the voltage vector is input into the nonvolatile memory array to perform the multiplication, and the current vector output by the nonvolatile memory array is converted by the analog-to-digital converter into digital data, namely the multiplication result. Besides the RRAM array, the nonvolatile memory array may be a NOR-FLASH array, a spin transfer torque magnetic memory (STT-MRAM) array, a ferroelectric field effect transistor (FeFET) array, or the like.
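The signal path through the multiplication unit (digital-to-analog converter, array, analog-to-digital converter) can be modelled roughly as below. The source does not specify converter resolution or ranges, so the 8-bit width, the full-scale values `v_max` and `i_max`, the uniform rounding model and the function names are all assumptions made for illustration; device noise is ignored.

```python
from typing import List


def quantize(value: float, full_scale: float, bits: int) -> float:
    """Round value to the nearest level of a uniform signed converter (assumed model)."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 levels per polarity for 8 bits
    clipped = max(-full_scale, min(full_scale, value))
    return round(clipped / full_scale * levels) / levels * full_scale


def multiply_unit(G: List[List[float]], data: List[float],
                  v_max: float, i_max: float, bits: int = 8) -> List[float]:
    """DAC -> array -> ADC, with ideal conductances G and no device noise."""
    voltages = [quantize(d, v_max, bits) for d in data]                  # DAC
    currents = [sum(g * v for g, v in zip(row, voltages)) for row in G]  # array
    return [quantize(i, i_max, bits) for i in currents]                  # ADC
```

In this model the result differs from the exact product only by converter rounding and clipping, which is the kind of error the iterative control loop is expected to tolerate.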
Further, it should be noted that the above in-memory numerical calculation accelerator is applicable to any numerical problem that can be solved by a numerical iterative algorithm involving matrix-vector operations, and in particular to tasks such as solving linear equations, systems of linear equations, partial differential equations, matrix eigenvalues and eigenvectors, least-squares curve fitting problems, and linear regression equations.
Embodiment 2
A memory numerical calculation method of a memory numerical calculation accelerator according to embodiment 1 of the present invention includes the steps of:
S1, converting the task to be solved into the form of a matrix X multiplied by a vector w to be solved, and writing the matrix X into the nonvolatile memory array;
S2, presetting a vector r_n and inputting it into the nonvolatile memory array to multiply the matrix X by the vector r_n; wherein the vector r_n has the same dimension as the vector w;
specifically, the control unit in the external control module presets the vector r_n and inputs it, via the DRAM in the external control module, into the nonvolatile memory array to multiply the matrix X by the vector r_n; the multiplication result is fed back to the external control module and stored in the DRAM.
Further, the multiplication of the matrix X and the vector r_n is shown in FIG. 3. The elements of the matrix X written into the nonvolatile memory array are X_ij, where 1 ≤ i ≤ m and 1 ≤ j ≤ n, m is the number of rows of X and n is the number of columns of X. The matrix elements are stored in the nonvolatile memory array as conductances, X_ij = G_ij. The data vector r_n transmitted to the multiplication unit is converted by the digital-to-analog converter into a voltage vector V = (V_1, …, V_n) and input into the nonvolatile memory array, and each cell in the array produces a current according to Ohm's law. According to Kirchhoff's current law, the output current of each row of the array is the sum of the currents of the cells in that row, so the output current of row i is
$$I_i = \sum_{j=1}^{n} G_{ij} V_j, \qquad i = 1, \dots, m.$$
Thus, a series of output currents are obtained on the row lines of the nonvolatile memory array, and a current vector is formed. The current vector is converted into a data vector through an analog-to-digital converter and is fed back to the external control module.
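The two physical laws named above map directly onto a multiply step and a sum step. The sketch below is an idealized model with hypothetical function names: conductances in siemens, voltages in volts, no wire resistance or device variation.

```python
from typing import List


def cell_currents(G: List[List[float]], V: List[float]) -> List[List[float]]:
    """Ohm's law per cell: the current through cell (i, j) is G[i][j] * V[j]."""
    return [[g * v for g, v in zip(row, V)] for row in G]


def row_currents(G: List[List[float]], V: List[float]) -> List[float]:
    """Kirchhoff's current law per row line: sum the cell currents of each row."""
    return [sum(row) for row in cell_currents(G, V)]
```

For example, `row_currents([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [1.0, 0.0, -1.0])` yields one current per row line, which equals the matrix-vector product of the stored conductances and the applied voltages.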
S3, judging whether the preset number of iterations has been reached or whether the multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation ends; if not, updating the vector r_n and going to step S4; specifically, the method for updating the vector r_n and the method for judging whether the multiplication result has reached the preset precision are determined by the solving algorithm, which is chosen according to the type of equation to be solved and includes algorithms such as the gradient descent method, the conjugate gradient method and the power method.
S4, inputting the vector r_n into the nonvolatile memory array to multiply the matrix X by the vector r_n, and returning to step S3.
It should be noted that the in-memory numerical calculation method of the accelerator provided in embodiment 1 of the present invention is applicable to any numerical problem that can be solved by a numerical iterative algorithm involving matrix-vector operations, specifically including: solving linear equations, solving systems of linear equations, solving partial differential equations, solving matrix eigenvalues and eigenvectors, least-squares curve fitting, and solving linear regression equations.
When solving a linear regression equation, the operation process in the in-memory numerical calculation accelerator is as shown in FIG. 4. The linear regression problem is written as X·w = y, where X is an m × n coefficient matrix, y is a column vector with m rows, and w is the n-dimensional column vector to be solved. The solving algorithm adopted in this embodiment is the gradient descent method. Specifically: (1) the external control module writes the matrix X into the nonvolatile memory array of the in-memory computing module, presets the vector r_n, and determines a learning rate η; (2) the external control module inputs the vector r_n into the nonvolatile memory array to compute X·r_n, and the in-memory computing module feeds the multiplication result back to the external control module; (3) the external control module computes the least-squares error E = ||y - X·r_n||, where ||·|| is the 2-norm, and judges whether the preset number of iterations has been reached or whether E is less than or equal to the preset error limit t; if so, the iteration stops and the current vector r_n is the result; otherwise, the vector r_n is updated by a gradient-descent step with the learning rate η (the update formula is given as equation image BDA0002653794510000101 in the original publication),
and the process returns to step (2).
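Steps (1) to (3) can be sketched as follows. The exact update formula is an equation image in the source and is not reproduced here; the sketch assumes the textbook step r_n ← r_n + η·Xᵀ(y - X·r_n), which follows the negative gradient of ½||y - X·r_n||², and it assumes a zero starting vector. The Xᵀ product is computed in software here; how the hardware obtains it is not stated in this passage. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(X: Matrix, r: Vector) -> Vector:
    """X . r; this is the product the memory array would supply."""
    return [sum(x * v for x, v in zip(row, r)) for row in X]


def gradient_descent(X: Matrix, y: Vector, eta: float, t: float,
                     max_iters: int) -> Tuple[Vector, float]:
    """Return the final vector and the last computed error E."""
    m, n = len(X), len(X[0])
    r = [0.0] * n                                    # step (1): preset r_n (assumed zero)
    E = float("inf")
    for _ in range(max_iters):
        Xr = matvec(X, r)                            # step (2): X . r_n
        residual = [yi - pi for yi, pi in zip(y, Xr)]
        E = math.sqrt(sum(e * e for e in residual))  # step (3): 2-norm error
        if E <= t:
            break
        # Assumed update: r_n <- r_n + eta * X^T (y - X . r_n).
        r = [rj + eta * sum(X[i][j] * residual[i] for i in range(m))
             for j, rj in enumerate(r)]
    return r, E
```

With the data points (0, 1), (1, 3), (2, 5) and a design matrix whose first column is all ones, the iteration approaches the intercept 1 and slope 2, provided η is small enough for the step to be stable.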
Further, when solving a system of linear equations, the task to be solved can likewise be converted into the form of the matrix X multiplied by the vector w to be solved, and the solving process is largely the same as for the linear regression equation. The difference is that the preferred solving algorithm is the conjugate gradient method, so the vector r_n is updated, and the precision is judged, according to the rules of the conjugate gradient method. Specifically, given an initial iterate w_0, the residual u_0 = y - X·w_0 is computed and p_0 = u_0 is set. The update is
$$w_{k+1} = w_k + \alpha_k p_k, \qquad u_{k+1} = u_k - \alpha_k X p_k, \qquad p_{k+1} = u_{k+1} + \beta_k p_k,$$
with
$$\alpha_k = \frac{u_k^{T} u_k}{p_k^{T} X p_k},$$
$$\beta_k = \frac{u_{k+1}^{T} u_{k+1}}{u_k^{T} u_k}.$$
It should be noted that k here denotes the k-th iteration. Whether the preset precision has been reached is judged by the convergence criterion given as equation image BDA0002653794510000104 in the original publication,
where η is a constant close to 0.
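A minimal sketch of the conjugate gradient iteration, using the residual u and search direction p named in the text. Several points are assumptions: the matrix X is taken to be symmetric positive definite (the usual condition for this method), the starting vector is zero, the step sizes use the standard conjugate-gradient expressions, and the stopping test ||u_k|| < η stands in for the source's criterion, which is an equation image and not recoverable here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(X: Matrix, p: Vector) -> Vector:
    """X . p; the only step that would run on the memory array."""
    return [sum(x * v for x, v in zip(row, p)) for row in X]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(X: Matrix, y: Vector, eta: float = 1e-10,
                       max_iters: int = 100) -> Vector:
    w = [0.0] * len(y)                                # w_0 (assumed zero)
    u = [yi - xi for yi, xi in zip(y, matvec(X, w))]  # u_0 = y - X . w_0
    p = u[:]                                          # p_0 = u_0
    uu = dot(u, u)
    for _ in range(max_iters):
        if math.sqrt(uu) < eta:                       # assumed criterion: ||u_k|| < eta
            break
        Xp = matvec(X, p)
        alpha = uu / dot(p, Xp)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        u = [ui - alpha * xi for ui, xi in zip(u, Xp)]
        uu_next = dot(u, u)
        beta = uu_next / uu
        p = [ui + beta * pi for ui, pi in zip(u, p)]
        uu = uu_next
    return w
```

Each iteration needs exactly one matrix-vector product with the unchanged matrix X; everything else is vector arithmetic that can stay in the external control module.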
Further, when the equation to be solved is a partial differential equation, the finite difference method is used to convert the partial differential equation into a system of linear equations. For a stationary partial differential equation, the solving process is the same as for the system of linear equations, and the conjugate gradient method is adopted. For a time-varying partial differential equation, the discretization uses a fully explicit scheme, specifically: B_1 u_{k+1} = B_0 u_k + Δt F_k, k = 0, 1, …, N-1, where N is the order of the time-varying partial differential equation. Iterating this formula (the stopping condition being that k reaches N-1) reduces each step to a matrix-vector multiplication, so the solving process is essentially the same as for the system of linear equations; the difference is that termination is determined by whether the time parameter has reached the preset time limit, which is not described in detail here.
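The explicit time-stepping loop can be illustrated with a one-dimensional heat equation. The sketch makes several assumptions that are not in the source: B_1 is taken to be the identity matrix, B_0 is the standard finite-difference matrix I + (Δt/h²)·tridiag(1, -2, 1) with zero boundary values, and the names `heat_matrix` and `time_step` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(B: Matrix, u: Vector) -> Vector:
    """B . u; the product the memory array would supply at every time step."""
    return [sum(b * v for b, v in zip(row, u)) for row in B]


def heat_matrix(n: int, dt: float, h: float) -> Matrix:
    """Assumed B_0 for u_t = u_xx on n interior grid points with spacing h."""
    lam = dt / h ** 2  # the explicit scheme is stable for lam <= 0.5
    B0 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B0[i][i] = 1.0 - 2.0 * lam
        if i > 0:
            B0[i][i - 1] = lam
        if i < n - 1:
            B0[i][i + 1] = lam
    return B0


def time_step(B0: Matrix, u0: Vector, F: Callable[[int], Vector],
              dt: float, N: int) -> Vector:
    """Iterate u_{k+1} = B0 . u_k + dt * F_k for k = 0, 1, ..., N-1 (B_1 = I assumed)."""
    u = u0[:]
    for k in range(N):
        Bu = matvec(B0, u)
        u = [b + dt * f for b, f in zip(Bu, F(k))]
    return u
```

Because B_0 does not change between steps, it is written to the array once and every step costs a single in-memory multiplication.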
Further, when solving for the eigenvalues and eigenvectors of a matrix, the task to be solved is converted into the form of a matrix multiplied by a vector, and the solving process is largely the same as for the linear regression equation. The difference is that the preferred solving algorithm is the power method, so the vector r_n is updated, and the precision is judged, according to the rules of the power method. Specifically, the eigenvalue equation of the matrix is A·x = λx, which is converted into the matrix-vector form (λE - A)·x = 0, where E is the identity matrix. A vector u_0 with the same dimension as x is preset. The eigenvector is updated in the power method as u_{k+1} = A·u_k, where k denotes the k-th iteration. Whether the preset precision has been reached is judged by whether |u_k - u_{k-1}| < ε is satisfied, where ε is a constant close to 0.
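A minimal sketch of the power method follows. It departs from the bare update in the text in two labelled ways: the vector is rescaled to unit length after every multiplication (a common variant that keeps the difference test meaningful), and the eigenvalue is estimated with the Rayleigh quotient. It also assumes the dominant eigenvalue is positive and simple; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, u: Vector) -> Vector:
    """A . u; the product the memory array would supply."""
    return [sum(a * v for a, v in zip(row, u)) for row in A]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_method(A: Matrix, u0: Vector, eps: float = 1e-12,
                 max_iters: int = 1000) -> Tuple[float, Vector]:
    """Return (dominant eigenvalue estimate, unit eigenvector estimate)."""
    u = normalize(u0)
    lam = 0.0
    for _ in range(max_iters):
        v = matvec(A, u)
        lam = sum(a * b for a, b in zip(u, v))  # Rayleigh quotient, since |u| = 1
        v = normalize(v)                        # assumed rescaling step
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, u)))
        u = v
        if diff < eps:                          # |u_k - u_{k-1}| < eps
            break
    return lam, u
```

For the matrix [[2, 1], [1, 2]], whose eigenvalues are 3 and 1, the iteration started from (1, 0) approaches the eigenvalue 3 and an eigenvector with equal components.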
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. An in-memory numerical computation accelerator is characterized by comprising an external control module and an in-memory computation module; wherein the in-memory computing module comprises a non-volatile memory array;
the external control module is used for, in an initial stage, converting the task to be solved into the form of a multiplication of a matrix X and a vector w to be solved, presetting a vector $r_n$, and sequentially transmitting the matrix X and the vector $r_n$ to the in-memory computing module; wherein the vector $r_n$ has the same dimension as the vector w;
the in-memory computing module is used for writing the received matrix X into the nonvolatile memory array in the initial stage, and, upon receiving the vector $r_n$, inputting it into the nonvolatile memory array to realize the multiplication of the matrix X and the vector $r_n$, and feeding back the multiplication result to the external control module;
the external control module is further used for, in an iteration stage, after receiving the multiplication result fed back by the in-memory computing module, judging whether a preset number of iterations is reached or whether the multiplication result reaches a preset precision; if so, the current vector $r_n$ is the vector w to be solved, and operation stops; if not, the vector $r_n$ is updated and transmitted to the in-memory computing module;
the in-memory computing module is further used for, in the iteration stage, upon receiving the current vector $r_n$, inputting it into the nonvolatile memory array to realize the multiplication of the matrix X and the vector $r_n$, and feeding back the multiplication result to the external control module.
2. The in-memory numerical computation accelerator of claim 1, wherein the external control module comprises a first control unit and a Dynamic Random Access Memory (DRAM);
the first control unit, in the initial stage, converts the task to be solved into the form of a matrix and vector multiplication, presets a vector $r_n$, and sequentially transmits the matrix X and the vector $r_n$ to the in-memory computing module; in the iteration stage, after the DRAM receives the multiplication result fed back by the in-memory computing module, the first control unit judges whether the multiplication result meets the preset precision or the number of iterations meets the preset number of iterations; if so, the current vector $r_n$ is the vector w to be solved, and operation stops; if not, the vector $r_n$ is updated and transmitted to the in-memory computing module through the DRAM.
3. The in-memory numerical computation accelerator of claim 1, wherein the in-memory computation module comprises a second control unit and a multiplication unit; the multiplication operation unit comprises a digital-to-analog converter, an analog-to-digital converter and the nonvolatile memory array;
the second control unit is connected with the multiplication unit; the non-volatile memory array comprises an input terminal and an output terminal; the digital-to-analog converter is connected with the input end of the nonvolatile memory array, and the analog-to-digital converter is connected with the output end of the nonvolatile memory array;
the second control unit is used for gating corresponding rows and columns in the nonvolatile memory array and controlling the input and the output of the nonvolatile memory array;
when a matrix and vector multiplication is performed, the digital-to-analog converter converts the data input by the external control module into a voltage vector, the voltage vector is input into the nonvolatile memory array to perform the multiplication, and the current vector output by the nonvolatile memory array is converted by the analog-to-digital converter into a digital quantity, namely the multiplication result, which is fed back to the external control module.
4. The in-memory numerical computation accelerator of claim 1, wherein the non-volatile memory array is a resistive memory array, a phase change memory array, a NOR-FLASH array, a spin transfer torque magnetic memory array, or a ferroelectric field effect transistor array.
5. An in-memory numerical computation accelerator according to any one of claims 1 to 4, adapted to any numerical problem that can be solved using a numerical iterative algorithm involving matrix and vector operations.
6. The in-memory numerical computation accelerator of claim 5, adapted for solving linear equations, solving systems of linear equations, solving partial differential equations, solving matrix eigenvalues and eigenvectors, least-squares curve fitting, and solving linear regression equations.
7. A memory numerical calculation method of a memory numerical calculation accelerator based on any one of claims 1 to 6, characterized by comprising the steps of:
S1, converting the task to be solved into the form of a multiplication of the matrix X and the vector w to be solved, and writing the matrix X into the nonvolatile memory array;
S2, presetting a vector $r_n$ and inputting it into the nonvolatile memory array to realize the multiplication of the matrix X and the vector $r_n$; wherein the vector $r_n$ has the same dimension as the vector w;
S3, judging whether the preset number of iterations is reached or whether the obtained multiplication result reaches the preset precision; if so, the current vector $r_n$ is the vector w to be solved, and operation ends; if not, updating the vector $r_n$ and going to step S4;
S4, inputting the vector $r_n$ into the nonvolatile memory array to realize the multiplication of the matrix X and the vector $r_n$, and returning to step S3.
8. The memory numerical calculation method of claim 7, wherein the method of updating the vector $r_n$ and of judging whether the multiplication result reaches the preset precision is determined by a solving algorithm, wherein the solving algorithm is determined according to the task to be solved and comprises a gradient descent method, a conjugate gradient method and a power method.
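The control flow of steps S1 to S4 can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: the class `InMemoryArray` is a hypothetical, idealized stand-in for the nonvolatile memory array (no converter or device error is modelled), and the update rule is a plain gradient-descent step for a small system X·w = b, chosen as one of the solving algorithms the claims mention.

```python
class InMemoryArray:
    """Hypothetical ideal model of the nonvolatile memory array."""

    def __init__(self):
        self.matrix = None

    def write(self, matrix):
        """Store the matrix once; it stays in place across iterations."""
        self.matrix = [list(row) for row in matrix]

    def multiply(self, vector):
        """Return the stored matrix times the input vector."""
        return [sum(a * x for a, x in zip(row, vector)) for row in self.matrix]


def solve(matrix, b, step=0.2, tol=1e-9, max_iter=10_000):
    """External-control loop: iterate until precision or iteration limit."""
    array = InMemoryArray()
    array.write(matrix)                       # S1: write X into the array
    r = [0.0] * len(b)                        # S2: preset vector r_n
    product = array.multiply(r)               # S2: first multiplication
    for _ in range(max_iter):                 # S3: iteration limit
        residual = [p - bi for p, bi in zip(product, b)]
        if max(abs(e) for e in residual) < tol:   # S3: precision test
            break
        # Assumed update rule: gradient-descent step on the residual.
        r = [ri - step * e for ri, e in zip(r, residual)]
        product = array.multiply(r)           # S4: multiply, back to S3
    return r


w = solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Only the vector travels between the control loop and the array in each iteration, while the matrix is written once in S1. The step size of 0.2 suits this particular symmetric positive-definite matrix and would need to be chosen per problem.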
CN202010879915.XA 2020-08-27 2020-08-27 Memory numerical calculation accelerator and memory numerical calculation method Active CN111984921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010879915.XA CN111984921B (en) 2020-08-27 2020-08-27 Memory numerical calculation accelerator and memory numerical calculation method

Publications (2)

Publication Number Publication Date
CN111984921A true CN111984921A (en) 2020-11-24
CN111984921B CN111984921B (en) 2024-04-19

Family

ID=73440021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010879915.XA Active CN111984921B (en) 2020-08-27 2020-08-27 Memory numerical calculation accelerator and memory numerical calculation method

Country Status (1)

Country Link
CN (1) CN111984921B (en)

Also Published As

Publication number Publication date
CN111984921B (en) 2024-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant