CN111984921A

CN111984921A - Memory numerical calculation accelerator and memory numerical calculation method

Info

Publication number: CN111984921A
Application number: CN202010879915.XA
Authority: CN
Inventors: 李祎; 李健聪; 缪向水
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2020-08-27
Filing date: 2020-08-27
Publication date: 2020-11-24
Anticipated expiration: 2040-08-27
Also published as: CN111984921B

Abstract

The invention discloses an in-memory numerical calculation accelerator and an in-memory numerical calculation method. The accelerator comprises an external control module and an in-memory computing module, wherein the in-memory computing module comprises a non-volatile memory array. Building on numerical iteration algorithms and the integrated storage-and-computation characteristic of non-volatile memory, the non-volatile memory array executes the data-intensive vector-matrix multiplication, while the external control unit executes the control flow of the numerical iteration algorithm. Because most numerical problems can be solved with a numerical algorithm that contains vector-matrix multiplication, the accelerator can be widely applied to tasks such as solving linear equations, solving systems of linear equations, solving stationary/time-varying partial differential equations, solving matrix eigenvalues and eigenvectors, curve least-squares fitting, and solving linear regression equations, so the system is highly reconfigurable. In addition, the accelerator is compatible with various non-volatile memories and is highly scalable.

Description

Memory numerical calculation accelerator and memory numerical calculation method

Technical Field

The invention belongs to the field of analog circuits, and particularly relates to an in-memory numerical calculation accelerator and an in-memory numerical calculation method.

Background

In the big data era, conventional computer architectures must move large amounts of data between the storage unit and the operation unit during computation. When processing data-intensive tasks, this data movement consumes a great deal of energy, so the computing energy efficiency of conventional architectures on such tasks is very low.

The integrated storage-and-computation architecture based on various non-volatile memories is a new computing architecture for processing data-intensive tasks. Because computation is carried out directly in the memory, data transmission during computation is reduced to a minimum, which gives the architecture high computing energy efficiency. At present, this architecture has achieved remarkable results in the field of neuromorphic computing, and various artificial neural networks built on non-volatile memories have demonstrated the great potential of the technology.

However, the integrated storage-and-computation technology based on non-volatile memory faces challenges when accelerating numerical calculation, which is also a data-intensive task. Although various acceleration circuits for numerical calculation based on non-volatile memory have been proposed, most of these circuits can handle only one or two kinds of task. Taking systems of linear equations as an example, the systems themselves come in various forms, such as consistent and inconsistent systems, and have various applications, such as linear regression and curve least-squares fitting. Existing equation solvers are not compatible with the solution of these various equations and are each limited to a fixed problem, so it is imperative to develop an in-memory numerical calculation accelerator with high system reconfigurability and high energy efficiency.

Disclosure of Invention

In view of the above defects or improvement requirements of the prior art, the present invention provides an in-memory numerical calculation accelerator and an in-memory numerical calculation method, and aims to solve the technical problem of low system reconfigurability of the existing mathematical method and operation architecture.

To achieve the above object, in a first aspect, the invention provides an in-memory numerical calculation accelerator comprising an external control module and an in-memory computing module, wherein the in-memory computing module comprises a non-volatile memory array;

the external control module is used, in the initial stage, for converting the task to be solved into the form of a matrix X multiplied by the vector w to be solved, presetting a vector r_n, and transmitting the matrix X and the vector r_n in turn to the in-memory computing module, wherein the vector r_n has the same dimension as the vector w;

the in-memory computing module is used, in the initial stage, for writing the received matrix X into the non-volatile memory array and, upon receiving the vector r_n, inputting it into the non-volatile memory array to multiply the matrix X by the vector r_n, and for feeding the multiplication result back to the external control module;

the external control module is also used, in the iteration stage, for judging, after receiving the multiplication result fed back by the in-memory computing module, whether the preset number of iterations has been reached or the multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module;

the in-memory computing module is also used, in the iteration stage, for inputting the current vector r_n into the non-volatile memory array upon receiving it, so as to multiply the matrix X by the vector r_n, and for feeding the multiplication result back to the external control module.

Further preferably, the external control module includes a first control unit and a dynamic random access memory DRAM;

the first control unit is used, in the initial stage, for converting the task to be solved into the form of a matrix-vector multiplication, presetting a vector r_n, and transmitting the matrix X and the vector r_n in turn to the in-memory computing module; in the iteration stage, after the DRAM receives the multiplication result fed back by the in-memory computing module, the first control unit judges whether the multiplication result meets the preset precision or the number of iterations has reached the preset number; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module through the DRAM.

Further preferably, the memory computing module comprises a second control unit and a multiplication unit; the multiplication unit comprises a digital-to-analog converter, an analog-to-digital converter and the nonvolatile memory array;

the second control unit is connected with the multiplication unit; the nonvolatile memory array comprises an input end and an output end; the digital-to-analog converter is connected with the input end of the nonvolatile memory array, and the analog-to-digital converter is connected with the output end of the nonvolatile memory array;

the second control unit is used for gating corresponding rows and columns in the nonvolatile memory array and controlling the input and the output of the nonvolatile memory array;

when matrix and vector multiplication operation is performed, the data input by the external control module is converted into a voltage vector by the digital-to-analog converter and input into the nonvolatile memory array to perform operation, and a current vector output by the nonvolatile memory array is converted into a data quantity after passing through the analog-to-digital converter, namely a multiplication operation result, and the data quantity is fed back to the external control module.
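The signal path described above (digital data, DAC, array, ADC, digital result) can be sketched in software as follows. This is a minimal illustration only: the uniform quantizer, the 8-bit resolutions, the full-scale values and the names `quantize` and `analog_mvm` are assumptions made for the example and are not specified in the text.

```python
from typing import List


def quantize(value: float, full_scale: float, bits: int) -> float:
    """Hypothetical uniform signed quantizer standing in for a DAC or an ADC."""
    max_code = 2 ** (bits - 1) - 1
    step = full_scale / max_code
    clamped = max(-full_scale, min(full_scale, value))
    return round(clamped / step) * step


def analog_mvm(G: List[List[float]], data: List[float],
               v_full_scale: float, i_full_scale: float,
               dac_bits: int = 8, adc_bits: int = 8) -> List[float]:
    # DAC: digital input data -> voltage vector.
    voltages = [quantize(d, v_full_scale, dac_bits) for d in data]
    # Array: each row sums the currents of its cells (ideal cells assumed).
    currents = [sum(g * v for g, v in zip(row, voltages)) for row in G]
    # ADC: current vector -> digital multiplication result.
    return [quantize(i, i_full_scale, adc_bits) for i in currents]


G = [[1.0, 2.0], [3.0, 4.0]]          # illustrative conductance values
result = analog_mvm(G, [0.5, -0.25], v_full_scale=1.0, i_full_scale=8.0)
exact = [1.0 * 0.5 + 2.0 * -0.25, 3.0 * 0.5 + 4.0 * -0.25]
```

The result differs from `exact` only by the quantization error of the two converters, which is why the iteration control is kept in the external control module.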

Further preferably, the nonvolatile memory array is a resistance change memory array, a phase change memory array, a NOR-FLASH array, a spin transfer torque magnetic memory array, or a ferroelectric field effect transistor array.

Further preferably, the in-memory numerical computation accelerator is adapted to any numerical problem that can be solved using a numerical iterative algorithm including matrix and vector operations.

Further preferably, the in-memory numerical calculation accelerator is suitable for solving linear equations, solving systems of linear equations, solving partial differential equations, solving matrix eigenvalues and eigenvectors, curve least-squares fitting, and solving linear regression equations.

In a second aspect, the present invention provides a memory numerical calculation method for a memory numerical calculation accelerator based on the first aspect of the present invention, including the following steps:

S1, converting the task to be solved into the form of a matrix X multiplied by the vector w to be solved, and writing the matrix X into the non-volatile memory array;

S2, presetting a vector r_n and inputting it into the non-volatile memory array to multiply the matrix X by the vector r_n, wherein the vector r_n has the same dimension as the vector w;

S3, judging whether the preset number of iterations has been reached or the obtained multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation ends; if not, updating the vector r_n and going to step S4;

S4, inputting the vector r_n into the non-volatile memory array to multiply the matrix X by the vector r_n, and returning to step S3.

Further preferably, the method for updating the vector r_n and the method for judging whether the multiplication result reaches the preset precision are determined by a solving algorithm, wherein the solving algorithm is determined according to the task to be solved and comprises a gradient descent method, a conjugate gradient method and a power method.
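Steps S1 to S4 can be sketched as a host-side loop around a simulated array. The class `SimulatedArray`, the functions `solve` and `make_residual_step`, and the simple residual-correction update used in the demo are all hypothetical names and choices introduced for illustration; the text only requires that the update and the precision check come from the chosen solving algorithm.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# A step function maps (current r_n, X·r_n) to (next r_n, finished?).
StepFn = Callable[[Vector, Vector], Tuple[Vector, bool]]


class SimulatedArray:
    """Hypothetical, ideal stand-in for the non-volatile memory array."""

    def __init__(self) -> None:
        self.X: Matrix = []
        self.writes = 0

    def write(self, X: Matrix) -> None:
        # S1: the matrix is written once and is not modified afterwards.
        self.X = [row[:] for row in X]
        self.writes += 1

    def multiply(self, r: Vector) -> Vector:
        # S2/S4: one matrix-vector product X·r_n per iteration.
        return [sum(x * v for x, v in zip(row, r)) for row in self.X]


def solve(X: Matrix, r0: Vector, step: StepFn, max_iters: int) -> Tuple[Vector, int]:
    array = SimulatedArray()
    array.write(X)                       # S1
    r = r0[:]                            # preset vector r_n
    for _ in range(max_iters):           # S3: preset number of iterations
        product = array.multiply(r)      # S2 on the first pass, S4 afterwards
        r, finished = step(r, product)   # S3: precision check and update
        if finished:
            break
    return r, array.writes


def make_residual_step(y: Vector, eta: float, tol: float) -> StepFn:
    # Demo-only update r <- r + eta * (y - X·r); not taken from the patent text.
    def step(r: Vector, product: Vector) -> Tuple[Vector, bool]:
        residual = [yi - pi for yi, pi in zip(y, product)]
        if max(abs(e) for e in residual) <= tol:
            return r, True               # current r_n is the solution w
        return [ri + eta * e for ri, e in zip(r, residual)], False
    return step


X = [[2.0, 0.0], [0.0, 4.0]]
w, writes = solve(X, [0.0, 0.0], make_residual_step([2.0, 4.0], 0.2, 1e-9), 200)
```

Swapping in a different `step` function changes the solving algorithm without touching the array, which is the sense in which the system is reconfigurable; `writes` stays at 1 because the matrix is written only once.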

In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:

1. The invention provides an in-memory numerical calculation accelerator that builds on numerical iteration algorithms and the integrated storage-and-computation characteristic of non-volatile memory. It comprises an external control unit and an integrated storage-and-computation unit: the non-volatile memory array executes the data-intensive vector-matrix multiplication, and the external control unit executes the control flow of the numerical iteration algorithm. Because most numerical problems can be solved with a numerical algorithm that contains vector-matrix multiplication, the accelerator can be widely applied to tasks such as solving linear equations, solving systems of linear equations, solving stationary/time-varying partial differential equations, solving matrix eigenvalues and eigenvectors, curve least-squares fitting, and solving linear regression equations, so the system is highly reconfigurable.

2. Emerging non-volatile memories offer high speed, low power consumption, easy integration and compatibility with the CMOS (complementary metal oxide semiconductor) process, and the in-memory computing unit of the accelerator provided by the invention can perform vector-matrix multiplication by storing the data matrix as conductances. Because the accelerator combines the integrated storage-and-computation unit with the external control unit, it features high computing energy efficiency and high calculation precision.

3. In the memory numerical calculation accelerator provided by the invention, the external control circuit is used for executing iterative control, the calculation precision is determined by the floating point operation precision of the first control unit, the influence of the non-ideal effect of the nonvolatile memory array on the calculation precision is overcome to a certain extent, and the calculation result has higher precision.

4. With the preferred solving algorithms, the in-memory numerical calculation accelerator and method provided by the invention keep the matrix X unchanged throughout the solving process, so only one write operation is needed. This reduces circuit complexity, minimizes data transmission and lowers circuit power consumption. Compared with running the numerical iteration algorithm on a conventional computer, the circuit effectively reduces time complexity; integrating storage and computation greatly saves operation energy and time, offers high reliability, and further improves computing energy efficiency.

5. The memory computing unit of the memory numerical computing accelerator can use a plurality of nonvolatile memories such as a resistive random access memory array, a phase change memory array, a NOR-FLASH array, a spin transfer torque magnetic memory array or a ferroelectric field effect transistor array, and has strong expandability.

6. The in-memory numerical calculation method provided by the invention solves inverse numerical problems with numerical iteration algorithms on the accelerator provided by the invention. It can be used to solve linear equations, systems of linear equations, stationary/time-varying partial differential equations, matrix eigenvalues and eigenvectors, curve least-squares fitting problems, linear regression equations and the like, and is highly general.

Drawings

FIG. 1 is a schematic diagram of a memory numerical computation accelerator according to embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of a multiplication unit according to embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of the process of multiplying the matrix X by the vector r_n according to embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of the operation process of solving a linear regression equation in the in-memory numerical calculation accelerator according to embodiment 2 of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Embodiment 1

An in-memory numerical computation accelerator, as shown in fig. 1, includes an external control module and an in-memory computation module; wherein the memory computing module comprises a non-volatile memory array;

the external control module is used, in the initial stage, for converting the task to be solved into the form of a matrix X multiplied by the vector w to be solved, written as X·w = y, where y is a target vector, for presetting a vector r_n, and for transmitting the matrix X and the vector r_n in turn to the in-memory computing module, wherein the vector r_n has the same dimension as the vector w, namely n;

the external control module is also used, in the iteration stage, for judging, after receiving the multiplication result fed back by the in-memory computing module, whether the preset number of iterations has been reached or the multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module. Specifically, the method for updating the vector r_n and the method for judging whether the multiplication result reaches the preset precision are determined by the solving algorithm, which is chosen according to the type of equation to be solved and includes the gradient descent method, the conjugate gradient method and the power method.

Specifically, the external control module comprises a first control unit and a Dynamic Random Access Memory (DRAM); the memory computing module comprises a second control unit and a multiplication operation unit; wherein, the nonvolatile memory array is positioned in a multiplication unit;

The task to be solved is input into the first control unit, which converts it into the form of a matrix X multiplied by the vector w to be solved and presets a vector r_n according to the vector w. The matrix X is transmitted to the second control unit through the DRAM and is written into the non-volatile memory array under the control of the second control unit. The first control unit then transfers the vector r_n through the DRAM to the second control unit, which gates the corresponding rows and columns of the non-volatile memory array and inputs the vector r_n into the array to perform the data-intensive multiplication of the vector r_n by the matrix X; the result is returned to the DRAM, completing one round of iteration. The first control unit judges whether the multiplication result has reached the preset precision or the number of iterations has reached the preset number; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, it updates the vector r_n and transmits it through the DRAM to the second control unit, under whose control the multiplication of the vector r_n by the matrix X is performed again and the result is returned to the DRAM, completing the second iteration. The process continues in the same way, and after multiple rounds of iteration the solution of the task is output from the external control unit.

Further, the multiplication unit comprises the non-volatile memory array, a digital-to-analog converter and an analog-to-digital converter, as shown in FIG. 2. In this embodiment, the non-volatile memory array is specifically a resistive random access memory (RRAM) array and includes an input end and an output end; the digital-to-analog converter is connected to the input end, and the analog-to-digital converter is connected to the output end. When a matrix-vector multiplication is performed, the data input by the second control unit is converted into a voltage vector by the digital-to-analog converter, the voltage vector is input into the non-volatile memory array to perform the multiplication, and the current vector output by the array is converted by the analog-to-digital converter into digital data, which is the multiplication result. Besides the RRAM array, the non-volatile memory array may be a NOR-FLASH array, a spin transfer torque magnetic memory (STT-MRAM) array, a ferroelectric field effect transistor (FeFET) array, or the like.

Further, it should be noted that the above in-memory numerical calculation accelerator is applicable to any numerical problem that can be solved by a numerical iterative algorithm containing matrix-vector operations, and is particularly applicable to tasks such as solving linear equations, solving systems of linear equations, solving partial differential equations, solving matrix eigenvalues and eigenvectors, curve least-squares fitting, and solving linear regression equations.

Embodiment 2

A memory numerical calculation method of a memory numerical calculation accelerator according to embodiment 1 of the present invention includes the steps of:

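S1, converting the task to be solved into the form of a matrix X multiplied by the vector w to be solved, and writing the matrix X into the non-volatile memory array;

S2, presetting a vector r_n and inputting it into the non-volatile memory array to multiply the matrix X by the vector r_n, wherein the vector r_n has the same dimension as the vector w;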

Specifically, the control unit in the external control module presets the vector r_n and inputs it through the DRAM of the external control module into the non-volatile memory array to multiply the matrix X by the vector r_n; the multiplication result is fed back to the external control module and stored in the DRAM.

Further, the process of multiplying the matrix X by the vector r_n is shown in FIG. 3. The elements of the matrix X written into the non-volatile memory array are x_{ij}, where 1 ≤ i ≤ m and 1 ≤ j ≤ n, m is the number of rows of X and n is the number of columns of X. Each matrix element is stored in the array as a conductance, x_{ij} = G_{ij}. The data vector r_n transmitted to the multiplication unit is converted by the digital-to-analog converter into a voltage vector with components V_j and input into the non-volatile memory array, where each cell produces a current G_{ij}·V_j according to Ohm's law. According to Kirchhoff's current law, the output current of each row of the array is the sum of the currents of the cells in that row, so the output current of row i is

I_i = Σ_{j=1}^{n} G_{ij}·V_j, 1 ≤ i ≤ m.

Thus, a series of output currents are obtained on the row lines of the nonvolatile memory array, and a current vector is formed. The current vector is converted into a data vector through an analog-to-digital converter and is fed back to the external control module.
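As a worked illustration of the row-current summation, the relation can be written in matrix form and evaluated for a small array. The 2×2 conductance values and input voltages below are made-up numbers chosen only to make the arithmetic easy to follow, and the symbols I, G and V follow the Ohm's-law and Kirchhoff's-law description above.

```latex
\[
I_i = \sum_{j=1}^{n} G_{ij} V_j, \quad 1 \le i \le m
\qquad\Longleftrightarrow\qquad
\mathbf{I} = G\,\mathbf{V}
\]
\[
G = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\,\mu\mathrm{S},\quad
\mathbf{V} = \begin{pmatrix} 0.1 \\ 0.2 \end{pmatrix}\,\mathrm{V}
\;\Rightarrow\;
\mathbf{I} = \begin{pmatrix} 1\cdot 0.1 + 2\cdot 0.2 \\ 3\cdot 0.1 + 4\cdot 0.2 \end{pmatrix}\,\mu\mathrm{A}
= \begin{pmatrix} 0.5 \\ 1.1 \end{pmatrix}\,\mu\mathrm{A}
\]
```

All m row currents appear at once, so the array delivers the whole matrix-vector product in a single read operation.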

S3, judging whether the preset number of iterations has been reached or the obtained multiplication result has reached the preset precision; if so, the current vector r_n is the vector w to be solved and the operation ends; if not, updating the vector r_n and going to step S4. Specifically, the method for updating the vector r_n and the method for judging whether the multiplication result reaches the preset precision are determined by the solving algorithm, which is chosen according to the type of equation to be solved and includes algorithms such as the gradient descent method, the conjugate gradient method and the power method.

S4, inputting the vector r_n into the non-volatile memory array to multiply the matrix X by the vector r_n, and returning to step S3.

It should be noted that the in-memory numerical calculation method of the accelerator provided in embodiment 1 of the present invention is applicable to any numerical problem that can be solved by a numerical iterative algorithm containing matrix-vector operations, specifically including: solving linear equations, solving systems of linear equations, solving partial differential equations, solving matrix eigenvalues and eigenvectors, curve least-squares fitting, and solving linear regression equations.

When solving a linear regression equation, the operation process in the in-memory numerical calculation accelerator is shown schematically in FIG. 4. The linear regression problem is written as X·w = y, where X is an m×n coefficient matrix, y is a column vector with m rows, and w is the n-dimensional column vector to be solved. The solving algorithm adopted in this embodiment is the gradient descent method. Specifically: (1) the external control module writes the matrix X into the non-volatile memory array of the in-memory computing module, presets a vector r_n, and determines a learning rate η; (2) the external control module inputs the vector r_n into the non-volatile memory array to compute X·r_n, and the in-memory computing module feeds the multiplication result back to the external control module; (3) the external control module calculates the least-squares error E = ||y − X·r_n|| and judges whether the preset number of iterations has been reached or the error E is less than or equal to the preset error limit t; if so, the iteration stops and the current vector r_n is the result; otherwise, the vector r_n is updated by the gradient descent rule with learning rate η and the process returns to step (2).
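The gradient descent procedure for the linear regression case can be sketched as below. The update r ← r + η·Xᵀ(y − X·r) is the textbook gradient step for a least-squares error and is an assumption here, since the exact update formula is not reproduced in the text; the function names and the sample data are likewise hypothetical. How the transposed product would be obtained in hardware is not stated, so it is simply computed on the host in this sketch.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def gradient_descent(X: Matrix, y: Vector, eta: float, t: float,
                     max_iters: int) -> Vector:
    Xt = transpose(X)
    r = [0.0] * len(X[0])                      # (1) preset vector r_n
    for _ in range(max_iters):                 # preset number of iterations
        Xr = matvec(X, r)                      # (2) X·r_n, the in-array product
        residual = [yi - pi for yi, pi in zip(y, Xr)]
        E = math.sqrt(sum(e * e for e in residual))   # (3) E = ||y - X·r_n||
        if E <= t:                             # error limit t reached
            break
        grad = matvec(Xt, residual)            # assumed step: X^T (y - X·r_n)
        r = [ri + eta * g for ri, g in zip(r, grad)]
    return r


# Points on the line y = 1 + 2x, so the exact solution is w = [1, 2].
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
w = gradient_descent(X, y, eta=0.2, t=1e-9, max_iters=1000)
```

The learning rate must be small enough for the iteration to converge; 0.2 works for this sample matrix but is not a general recommendation.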

Further, when solving a system of linear equations, the task to be solved can likewise be converted into the form of a matrix X multiplied by the vector w to be solved, and the solving process is largely the same as for the linear regression equation. The difference is that the solving algorithm is preferably the conjugate gradient method, and the methods for updating the vector r_n and for judging whether the multiplication result reaches the preset precision are those of the conjugate gradient method. Specifically, when the conjugate gradient method is adopted, an initial iterate w₀ is given, the residual u₀ = y − X·w₀ is calculated, and p₀ = u₀ is set. In the update process, w_{k+1} = w_k + α_k·p_k, u_{k+1} = u_k − α_k·X·p_k, and p_{k+1} = u_{k+1} + β_k·p_k, where α_k is the step size and β_k is the coefficient used to form the next search direction.

It should be noted that k here denotes the k-th iteration. Whether the preset precision has been reached is judged against a threshold η, where η is a constant close to 0.
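The conjugate gradient updates can be sketched as follows. The coefficients α_k = (u_kᵀu_k)/(p_kᵀX·p_k) and β_k = (u_{k+1}ᵀu_{k+1})/(u_kᵀu_k) are the standard textbook choices, and the stopping test ||u_k|| < η is an assumed form of the precision check; neither is spelled out in the text. The sketch also assumes X is symmetric positive definite, as the standard method requires, and all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(X: Matrix, y: Vector, w0: Vector, eta: float,
                       max_iters: int) -> Vector:
    w = w0[:]
    u = [yi - pi for yi, pi in zip(y, matvec(X, w))]   # u0 = y - X·w0
    p = u[:]                                           # p0 = u0
    uu = dot(u, u)
    for _ in range(max_iters):
        if math.sqrt(uu) < eta:                # assumed check: ||u_k|| < eta
            break
        Xp = matvec(X, p)                      # the in-array product X·p_k
        alpha = uu / dot(p, Xp)                # textbook step size
        w = [wi + alpha * pi for wi, pi in zip(w, p)]      # w_{k+1}
        u = [ui - alpha * xi for ui, xi in zip(u, Xp)]     # u_{k+1}
        uu_next = dot(u, u)
        beta = uu_next / uu                    # textbook direction coefficient
        p = [ui + beta * pi for ui, pi in zip(u, p)]       # p_{k+1}
        uu = uu_next
    return w


# Small symmetric positive definite system with solution [1/11, 7/11].
X = [[4.0, 1.0], [1.0, 3.0]]
w = conjugate_gradient(X, [1.0, 2.0], [0.0, 0.0], eta=1e-10, max_iters=50)
```

Each iteration needs exactly one product with the unchanged matrix X, which matches the single-write property of the array.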

Further, when the equation to be solved is a partial differential equation, the finite difference method is used to transform it into the form of a system of linear equations. For a stationary partial differential equation, the solving process is the same as for the system of linear equations, and the conjugate gradient algorithm is adopted. For a time-varying partial differential equation, the discretization uses a fully explicit scheme, specifically: B₁·u^{k+1} = B₀·u^k + Δt·F^k, k = 0, 1, …, N−1, where N is the order of the time-varying partial differential equation. The above formula is evaluated in a repeated iterative loop (the stopping condition is that k reaches N−1), each pass taking the form of a matrix multiplied by a vector, so the solving process is essentially the same as for the system of linear equations; the difference is that termination is determined by whether the time parameter reaches the preset time limit, which is not described in detail here.
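The explicit time-stepping loop can be sketched for a one-dimensional heat equation. The sketch assumes B₁ is the identity matrix, so that each step reduces to u^{k+1} = B₀·u^k + Δt·F^k, and it builds B₀ from the usual three-point finite difference stencil with zero boundary values; these choices, the grid size and the names `build_B0` and `time_step` are illustrative assumptions rather than details from the text.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def build_B0(n: int, lam: float) -> Matrix:
    """B0 = I + lam * tridiag(1, -2, 1), with lam = dt / h**2 (assumed stencil)."""
    B0 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B0[i][i] = 1.0 - 2.0 * lam
        if i > 0:
            B0[i][i - 1] = lam
        if i < n - 1:
            B0[i][i + 1] = lam
    return B0


def time_step(B0: Matrix, u0: Vector, F: Callable[[int], Vector],
              dt: float, N: int) -> Vector:
    u = u0[:]
    for k in range(N):                      # k = 0, 1, ..., N-1
        Bu = matvec(B0, u)                  # the in-array product B0·u^k
        u = [b + dt * f for b, f in zip(Bu, F(k))]   # add the source term dt·F^k
    return u


h, dt = 1.0, 0.25                           # lam = 0.25 keeps the scheme stable
B0 = build_B0(5, dt / h ** 2)
u = time_step(B0, [0.0, 0.0, 1.0, 0.0, 0.0], lambda k: [0.0] * 5, dt, N=10)
```

Because B₀ does not change between time steps, it is written to the array once and every step is a single matrix-vector product, with the loop ending after the preset number of steps rather than on a precision check.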

Further, when solving for matrix eigenvalues and eigenvectors, the task to be solved is converted into the form of a matrix multiplied by a vector, and the solving process is largely the same as for the linear regression equation. The difference is that the solving algorithm is preferably the power method, and the methods for updating the vector r_n and for judging whether the multiplication result reaches the preset precision are those of the power method. Specifically, the eigenvalue equation of the matrix is A·x = λ·x, which is converted into the matrix-vector product form (λE − A)·x = 0, where E is the identity matrix. A vector u₀ is preset, with the same dimension as x. The eigenvector is updated in the power method as u_{k+1} = A·u_k, where k denotes the k-th iteration. Whether the preset precision has been reached is judged by whether |u_k − u_{k−1}| is smaller than a threshold, where the threshold is a constant close to 0.
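A minimal sketch of the power method follows. Normalizing the vector after every product and estimating the eigenvalue with a Rayleigh quotient are standard additions assumed here for numerical stability; they are not described in the text, and the function names and test matrix are hypothetical. The simple difference test also assumes the dominant eigenvalue is positive, so the iterate does not flip sign between steps.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_method(A: Matrix, u0: Vector, tol: float,
                 max_iters: int) -> Tuple[float, Vector]:
    u = normalize(u0)                        # preset vector u0
    lam = 0.0
    for _ in range(max_iters):
        Au = matvec(A, u)                    # the in-array product A·u_k
        lam = sum(a * b for a, b in zip(u, Au))   # Rayleigh quotient (|u| = 1)
        u_next = normalize(Au)               # assumed normalization step
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_next, u)))
        u = u_next
        if diff < tol:                       # |u_k - u_{k-1}| below threshold
            break
    return lam, u


# Symmetric matrix with eigenvalues 3 and 1; dominant eigenvector is [1, 1]/sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
lam, u = power_method(A, [1.0, 0.0], tol=1e-10, max_iters=200)
```

Only the dominant eigenpair is obtained this way; the text does not describe how further eigenpairs would be extracted.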

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. An in-memory numerical computation accelerator is characterized by comprising an external control module and an in-memory computation module; wherein the in-memory computing module comprises a non-volatile memory array;

the external control module is used, in an initial stage, for converting the task to be solved into the form of a matrix X multiplied by the vector w to be solved, presetting a vector r_n, and transmitting the matrix X and the vector r_n in turn to the in-memory computing module, wherein the vector r_n has the same dimension as the vector w;

the in-memory computing module is used, in the initial stage, for writing the received matrix X into the non-volatile memory array and, upon receiving the vector r_n, inputting it into the non-volatile memory array to multiply the matrix X by the vector r_n, and for feeding the multiplication result back to the external control module;

the external control module is further used, in an iteration stage, for judging, after receiving the multiplication result fed back by the in-memory computing module, whether a preset number of iterations has been reached or the multiplication result has reached a preset precision; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module;

the in-memory computing module is further used, in the iteration stage, for inputting the current vector r_n into the non-volatile memory array upon receiving it, so as to multiply the matrix X by the vector r_n, and for feeding the multiplication result back to the external control module.

2. The in-memory numerical computation accelerator of claim 1, wherein the external control module comprises a first control unit and a Dynamic Random Access Memory (DRAM);

the first control unit is used, in the initial stage, for converting the task to be solved into the form of a matrix-vector multiplication, presetting a vector r_n, and transmitting the matrix X and the vector r_n in turn to the in-memory computing module; in the iteration stage, after the DRAM receives the multiplication result fed back by the in-memory computing module, the first control unit judges whether the multiplication result meets the preset precision or the number of iterations has reached the preset number; if so, the current vector r_n is the vector w to be solved and the operation stops; if not, the vector r_n is updated and transmitted to the in-memory computing module through the DRAM.

3. The in-memory numerical computation accelerator of claim 1, wherein the in-memory computation module comprises a second control unit and a multiplication unit; the multiplication operation unit comprises a digital-to-analog converter, an analog-to-digital converter and the nonvolatile memory array;

the second control unit is connected with the multiplication unit; the non-volatile memory array comprises an input terminal and an output terminal; the digital-to-analog converter is connected with the input end of the nonvolatile memory array, and the analog-to-digital converter is connected with the output end of the nonvolatile memory array;

when matrix and vector multiplication operation is performed, the digital-to-analog converter converts data input by the external control module into a voltage vector, the voltage vector is input into the nonvolatile memory array to perform multiplication operation, and a current vector output by the nonvolatile memory array is converted into a data quantity after passing through the analog-to-digital converter, namely a multiplication operation result, and the data quantity is fed back to the external control module.

4. The in-memory numerical computation accelerator of claim 1, wherein the non-volatile memory array is a resistive memory array, a phase change memory array, a NOR-FLASH array, a spin transfer torque magnetic memory array, or a ferroelectric field effect transistor array.

5. An in-memory numerical computation accelerator according to any one of claims 1 to 4, adapted to any numerical problem that can be solved using a numerical iterative algorithm involving matrix and vector operations.

6. The in-memory numerical computation accelerator of claim 5, adapted for solving linear equations, solving systems of linear equations, solving partial differential equations, solving matrix eigenvalues and eigenvectors, curve least-squares fitting, and solving linear regression equations.

7. A memory numerical calculation method of a memory numerical calculation accelerator based on any one of claims 1 to 6, characterized by comprising the steps of:

8. The in-memory numerical calculation method of claim 7, wherein the method for updating the vector r_n and the method for judging whether the multiplication result reaches the preset precision are determined by a solving algorithm, and the solving algorithm is determined according to the task to be solved and comprises a gradient descent method, a conjugate gradient method and a power method.