CN115982092A - Storage and calculation integrated circuit, chip system and electronic equipment - Google Patents

Info

Publication number
CN115982092A
Authority
CN
China
Prior art keywords
bit data
transistor
processing circuit
data
imaginary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111205493.9A
Other languages
Chinese (zh)
Inventor
吴志航
倪磊滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Complex Calculations (AREA)

Abstract

The application discloses a storage and calculation integrated circuit that performs in-memory complex multiply-accumulate calculation with a smaller area and lower power consumption. The storage and calculation integrated circuit comprises at least one storage sub-array and at least one multiply-accumulate (MAC) processing circuit. Each storage sub-array comprises n × m storage units arranged in n rows and m columns, and each storage unit is connected to a group of row signal lines and a group of column signal lines. A group of row signal lines is connected to the m storage units in the same row; a group of column signal lines is connected to the n storage units in the same column and to the corresponding target MAC processing circuit. Each storage unit is configured to calculate the products of the real and imaginary parts of a first complex number with the real part, the imaginary part, and the inverted data of the imaginary part of a second complex number, to obtain a first product result and a second product result. The target MAC processing circuit is configured to accumulate the plurality of first product results and the plurality of second product results.

Description

Storage and calculation integrated circuit, chip system and electronic equipment
Technical Field
The application relates to the technical field of circuits, in particular to a storage and calculation integrated circuit, a chip system and electronic equipment.
Background
In recent years, big data and artificial intelligence technologies have developed rapidly in various fields, and the demand for data communication and computation has increased sharply, which places greater demands on the energy efficiency of integrated circuits. Recently, a new computing architecture, computing in memory (CIM), in which storage and calculation are integrated, has received much attention. By designing the calculation circuit into the storage circuit, it addresses the energy cost of, for example, moving data during a calculation task. With the development of artificial intelligence (AI) algorithms and signal processing algorithms, a storage and calculation integrated architecture needs to support complex-number operations as a basic operation.
At present, a storage and calculation integrated architecture for complex-number operations is usually implemented based on static random access memory (SRAM). In a complex-number operation, an input signal in complex form is input to the SRAM cells, which store the real part of another complex number in positive form, its imaginary part in positive form, and its imaginary part in negative form. The input signal is multiplied by the stored real and imaginary parts, and the products are then accumulated to obtain a multiply-accumulate result. Because two real parts and two imaginary parts are stored, twice the storage array resources are needed, and each storage array needs its own peripheral resources (such as peripheral driving circuits and back-end wiring). As a result, current storage and calculation integrated circuits are large, consume considerable power, and have low calculation efficiency.
Disclosure of Invention
The application provides a storage and calculation integrated circuit for performing in-memory complex multiply-accumulate calculation with a smaller area and lower power consumption. The embodiments of the application also provide a corresponding chip system and electronic equipment.
The first aspect of the present application provides a storage and calculation integrated circuit, which includes a storage array and a data processing circuit. The storage array includes at least one storage sub-array, each of which includes n × m storage units arranged in n rows and m columns. The data processing circuit includes at least one multiply-accumulate (MAC) processing circuit; the at least one storage sub-array and the at least one MAC processing circuit are in one-to-one correspondence, and n and m are integers greater than 1. Each storage unit is connected to a group of row signal lines and a group of column signal lines. A group of row signal lines is connected to the m storage units in the same row; a group of column signal lines is connected to the n storage units in the same column and to the target MAC processing circuit corresponding to the storage sub-array in which those n storage units are located.
Each storage unit is configured to receive first real part bit data and first imaginary part bit data of a first complex number from the connected group of row signal lines, and to operate on them with second real part bit data of a second complex number and with the data related to target bit data, to obtain a first product result and a second product result. The second real part bit data and the target bit data are stored in the storage unit. The target bit data is either second imaginary part bit data or the inverted data of the second imaginary part bit data; the data related to the target bit data comprises both the second imaginary part bit data and the inverted data of the second imaginary part bit data.
The target MAC processing circuit is configured to receive the plurality of first product results and the plurality of second product results from the m connected groups of column signal lines, accumulate the plurality of first product results to obtain first output data, and accumulate the plurality of second product results to obtain second output data.
In the present application, the term "storage and calculation integrated circuit" refers to a circuit that integrates a calculation circuit into a memory circuit to realize storage and calculation integration, and the storage and calculation integrated circuit may be applied to complex number calculation or real number calculation. The storage unit is used for realizing the calculation process of multiplying two complex numbers, and the MAC processing circuit is used for realizing the accumulation process of the calculation results of the plurality of storage units.
In this application, the storage and calculation integrated circuit may be any form of memory with a bit array structure, such as a register or a random access memory (RAM). Random access memory includes static random access memory (SRAM), embedded dynamic random access memory (eDRAM), and the like.
In this application, the first complex number may also be referred to as a first operand, and the second complex number may also be referred to as a second operand. The first complex number may be represented by $x_r + x_i i$, where $x_r$ is the real part of the first complex number and $x_i$ is the imaginary part of the first complex number. The second complex number may be represented by $w_r + w_i i$, where $w_r$ is the real part of the second complex number and $w_i$ is the imaginary part of the second complex number. The first real part bit data may be bit data of different significant bits, in complement form, after binary conversion of the real part of the first complex number, such as $x_{r,1} \dots x_{r,b}$. The first imaginary part bit data may be bit data of different significant bits after binary conversion of the imaginary part of the first complex number, such as $x_{i,1} \dots x_{i,b}$. The second real part bit data may be bit data of different significant bits after binary conversion of the real part of the second complex number, such as $w_{r,1} \dots w_{r,m}$. The second imaginary part bit data may be bit data of different significant bits after binary conversion of the imaginary part of the second complex number, such as $w_{i,1} \dots w_{i,m}$. The inverted data of the second imaginary part bit data refers to data obtained by inverting the second imaginary part bit data, e.g. $\overline{w_{i,1}} \dots \overline{w_{i,m}}$.
Here, subscript r denotes the real part, subscript i denotes the imaginary part, subscript b denotes the bit width of the first complex number, and subscript m denotes the bit width of the second complex number.
It should be noted that, in the present application, the first real part bit data, the first imaginary part bit data, the second real part bit data, the second imaginary part bit data, and the inverse data of the second imaginary part bit data may be analog signals, and are represented by levels. The real part of the first complex number refers to the value of the real part of the first complex number, the imaginary part of the first complex number refers to the value of the imaginary part of the first complex number, the real part of the second complex number refers to the value of the real part of the second complex number, and the imaginary part of the second complex number refers to the value of the imaginary part of the second complex number.
In this application, the first product result refers to a product result that includes the imaginary unit after the multiplication, such as $x_{r,p} \cdot w_{i,q}$ and $x_{i,p} \cdot w_{r,q}$. The second product result refers to a product result that does not include the imaginary unit after the multiplication, such as $x_{r,p} \cdot w_{r,q}$ and $x_{i,p} \cdot \overline{w_{i,q}}$. Here p is any value in 1 … b and q is any value in 1 … m.
In the present application, the first output data may be the accumulated result of a plurality of $(x_{r,p} \cdot w_{i,q} + x_{i,p} \cdot w_{r,q})$, and the second output data may be the accumulated result of a plurality of $(x_{r,p} \cdot w_{r,q} + x_{i,p} \cdot \overline{w_{i,q}})$.
As can be seen from the first aspect, the storage unit stores either the second imaginary part bit data or the inverted data of the second imaginary part bit data, and each can be obtained from the other by inversion. When the complex multiplication is performed, the products with the second imaginary part bit data and with its inverted data can therefore both be computed directly. The multiply-accumulate of the two complex numbers can be completed by storing only one copy of the real part and one copy of the imaginary part of the second complex number; it is not necessary to store two positive copies of the real part, one positive copy of the imaginary part, and one negative copy of the imaginary part of the second complex number.
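As a reference for the arithmetic that the circuit is meant to realize, the following minimal Python sketch accumulates complex products using four real products per pair of operands. The function name `complex_mac` and the tuple representation are illustrative assumptions and do not come from the application; the sketch works on whole integers rather than on bit data.

```python
def complex_mac(xs, ws):
    """Accumulate sum_k x_k * w_k with four real products per pair.

    xs, ws: lists of (real, imag) integer tuples (hypothetical representation).
    Returns (real_sum, imag_sum).
    """
    real_sum = 0
    imag_sum = 0
    for (xr, xi), (wr, wi) in zip(xs, ws):
        # Products that still carry the imaginary unit i.
        imag_sum += xr * wi + xi * wr
        # Products that are real; i * i = -1, hence the negated imaginary part.
        real_sum += xr * wr + xi * (-wi)
    return real_sum, imag_sum


result = complex_mac([(1, 2), (3, -1)], [(4, 5), (2, 2)])
```

The negated imaginary part in the real sum is the term for which the conventional approach described in the Background stores a separate negative copy.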
In a possible implementation manner of the first aspect, each storage unit includes a first calculation unit, a second calculation unit, a third calculation unit, a fourth calculation unit, a first storage unit, and a second storage unit.
The first storage unit is connected with the first calculation unit and the third calculation unit, and the second storage unit is connected with the second calculation unit and the fourth calculation unit.
At least one first row signal line in the group of row signal lines is connected with the first calculating unit and the fourth calculating unit, at least one second row signal line is connected with the second calculating unit and the third calculating unit, a first column signal line in the group of column signal lines is connected with the first calculating unit and the second calculating unit, and a second column signal line is connected with the third calculating unit and the fourth calculating unit.
The first storage unit is used for storing target bit data, and the second storage unit is used for storing second real part bit data.
In this possible implementation manner, the storage unit includes a first calculation unit to a fourth calculation unit. Each calculation unit may be understood as a local computing unit (LCC), and the first to fourth calculation units may be denoted LCC1, LCC2, LCC3, and LCC4, respectively. Of the two storage units, the first storage unit, connected in the row direction to LCC1 and LCC3, stores the second imaginary part bit data or its inverted data, and the second storage unit, connected in the row direction to LCC2 and LCC4, stores the second real part bit data. There may be one or more first row signal lines and, similarly, one or more second row signal lines. However many there are, each first row signal line is connected to LCC1 and LCC4 and inputs first real part bit data $x_{r,p}$ to LCC1 and LCC4, and each second row signal line is connected to LCC2 and LCC3 and inputs first imaginary part bit data $x_{i,p}$ to LCC2 and LCC3. The first real part bit data may differ between first row signal lines, and the first imaginary part bit data may differ between second row signal lines. The first column signal line is connected to LCC1 and LCC2 and transmits the product results calculated by LCC1 and LCC2 to the target MAC processing circuit; the second column signal line is connected to LCC3 and LCC4 and transmits the product results calculated by LCC3 and LCC4 to the target MAC processing circuit. As can be seen from the above description, this way of connecting and routing the row signal lines and column signal lines to the four LCCs, and of routing between the storage units and the four LCCs, simplifies the circuit connections and optimizes the circuit structure of the storage unit.
In a possible implementation manner of the first aspect, the first storage unit is configured to output the second imaginary part bit data to the first calculation unit and to output the inverted data of the second imaginary part bit data to the third calculation unit, and the second storage unit is configured to output the second real part bit data to the second calculation unit and the fourth calculation unit.
The first calculation unit is used for calculating the product of the first real part bit data and the second imaginary part bit data, the second calculation unit is used for calculating the product of the first imaginary part bit data and the second real part bit data, the third calculation unit is used for calculating the product of the first imaginary part bit data and the inverted data of the second imaginary part bit data, and the fourth calculation unit is used for calculating the product of the first real part bit data and the second real part bit data.
The first product result includes a product of the first real bit data and the second imaginary bit data, and a product of the first imaginary bit data and the second real bit data, and the second product result includes a product of the first real bit data and the second real bit data, and a product of the first imaginary bit data and an inverse of the second imaginary bit data.
In this possible implementation, the first storage unit outputs $w_{i,q}$ to LCC1 and outputs $\overline{w_{i,q}}$ to LCC3. The second storage unit outputs $w_{r,q}$ to LCC2 and LCC4. LCC1 calculates $x_{r,p} \cdot w_{i,q}$, LCC2 calculates $x_{i,p} \cdot w_{r,q}$, LCC3 calculates $x_{i,p} \cdot \overline{w_{i,q}}$, and LCC4 calculates $x_{r,p} \cdot w_{r,q}$. As this implementation shows, the two storage units store only one copy of the real part bit data and the imaginary part bit data of the second complex number, yet the multiplication of the two complex numbers can be realized, which effectively reduces the volume of the storage and calculation integrated circuit.
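The four single-bit products described above can be sketched behaviourally in Python. This is only a logic-level illustration: the function name `storage_unit_products` is hypothetical, each product of two bits is modelled as a bitwise AND, and the inverted bit is modelled as `1 - bit`.

```python
def storage_unit_products(x_r_bit, x_i_bit, w_r_bit, w_i_bit):
    """Return the bit products of LCC1..LCC4 for one storage unit.

    The first tuple holds the products sent to the first column signal line
    (LCC1, LCC2); the second tuple holds those sent to the second column
    signal line (LCC3, LCC4).
    """
    for bit in (x_r_bit, x_i_bit, w_r_bit, w_i_bit):
        if bit not in (0, 1):
            raise ValueError("all inputs must be single bits (0 or 1)")
    w_i_inv = 1 - w_i_bit          # inverted data of the second imaginary bit
    lcc1 = x_r_bit & w_i_bit       # x_{r,p} * w_{i,q}
    lcc2 = x_i_bit & w_r_bit       # x_{i,p} * w_{r,q}
    lcc3 = x_i_bit & w_i_inv       # x_{i,p} * inverted w_{i,q}
    lcc4 = x_r_bit & w_r_bit       # x_{r,p} * w_{r,q}
    return (lcc1, lcc2), (lcc3, lcc4)


first_products, second_products = storage_unit_products(1, 1, 1, 0)
```

Only `w_r_bit` and one of `w_i_bit` or its inverse need to be stored; the other value is derived, which is the storage saving the text describes.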
In one possible implementation form of the first aspect, the target MAC processing circuit comprises an imaginary processing circuit and a real processing circuit, the imaginary processing circuit being connected to the m first column signal lines, the real processing circuit being connected to the m second column signal lines.
The imaginary part processing circuit is used for receiving the first product result from the storage unit of each column through each first column signal line, accumulating a plurality of first product results from the same column to obtain a first column accumulation result, performing weighting operation on the first column accumulation results of different columns, and accumulating the first column accumulation results of different columns after the weighting operation to obtain first output data.
The real part processing circuit is used for receiving the second product result from the storage unit of each column through each second column signal line, accumulating a plurality of second product results from the same column to obtain a second column accumulation result, performing a weighting operation on the second column accumulation results of different columns, and accumulating the weighted second column accumulation results of different columns to obtain second output data.
In this possible implementation manner, the target MAC processing circuit uses two circuits to accumulate, respectively, the product results that include the imaginary unit after multiplication and the product results that are real after multiplication. The imaginary part processing circuit accumulates a plurality of $(x_{r,p} \cdot w_{i,q} + x_{i,p} \cdot w_{r,q})$. The real part processing circuit accumulates a plurality of $(x_{r,p} \cdot w_{r,q} + x_{i,p} \cdot \overline{w_{i,q}})$. The accumulation order may be as follows: the product results in the same column are accumulated to obtain the corresponding column accumulation result, each column accumulation result is then weighted, and the weighted accumulation results of the different columns are then accumulated. This accumulation mode can assign different weights according to the significance of the bit data in different columns, so that more accurate output data can be obtained.
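The column-then-weight accumulation order can be illustrated with a short Python sketch of the imaginary part path. Several things here are assumptions rather than statements from the application: the stored values are treated as unsigned m-bit integers, each row receives a single input bit, and the column weights are powers of two matching bit significance. The names `to_bits` and `imaginary_path` are hypothetical.

```python
def to_bits(value, width):
    """Unsigned binary conversion, least significant bit first."""
    return [(value >> q) & 1 for q in range(width)]


def imaginary_path(x_r_bits, x_i_bits, w_r, w_i, m):
    """Accumulate first product results per column, then weight and sum.

    x_r_bits, x_i_bits: one input bit per row.
    w_r, w_i: unsigned m-bit stored values, one pair per row.
    """
    n = len(w_r)
    w_r_bits = [to_bits(v, m) for v in w_r]
    w_i_bits = [to_bits(v, m) for v in w_i]
    column_sums = []
    for q in range(m):
        col = 0
        for k in range(n):
            # LCC1 and LCC2 of row k, column q share the first column line.
            col += (x_r_bits[k] & w_i_bits[k][q]) + (x_i_bits[k] & w_r_bits[k][q])
        column_sums.append(col)
    # Weight each column accumulation result by 2**q (assumed weighting).
    return sum(col << q for q, col in enumerate(column_sums))


first_output = imaginary_path([1, 0, 1], [1, 1, 0], [3, 5, 2], [1, 7, 4], 3)
```

Under these assumptions the result equals the sum over rows of `x_r * w_i + x_i * w_r`, so the bit-sliced accumulation reproduces the imaginary part of the multiply-accumulate for single-bit inputs.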
In one possible implementation manner of the first aspect, the data processing circuit further includes a compensation circuit, one end of which is connected to the n second row signal lines through a target column signal line, and the other end of which is connected to the real part processing circuit in each of the at least one MAC processing circuit.
The compensation circuit is used for receiving a plurality of first imaginary part bit data from the n second row signal lines, accumulating them, and transmitting the accumulated result to the real part processing circuit.
The real part processing circuit is used for performing summation operation on the accumulated result of the plurality of first imaginary part bit data and the second output data to obtain third output data.
In this possible implementation, it is taken into account that, in the product operation of the two complex numbers, using the product of the inverted data $\overline{w_{i,q}}$ of the second imaginary part bit data and the first imaginary part bit data $x_{i,p}$ affects the second output data. The compensation circuit therefore accumulates the $x_{i,p}$ transmitted on the second row signal lines and transmits the accumulated result to the real part processing circuit. This compensates the second output data, so that the third output data output after compensation is more accurate.
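Why an accumulated sum of first imaginary part bit data is enough to compensate can be seen from a small Python sketch. The application states only that the accumulated result is combined with the second output data by a summation operation; the scale factor `2**m - 1` and the subtraction used below are assumptions that hold for unsigned m-bit stored values, single-bit inputs per row, and power-of-two column weights. The function name is hypothetical.

```python
def real_path_with_compensation(x_r_bits, x_i_bits, w_r, w_i, m):
    """Return (second_output, compensation, third_output).

    The inverted bit is modelled as 1 - bit, so each product with the
    inverted data equals x - x * w for single bits.
    """
    second_output = 0
    for q in range(m):
        col = 0
        for k in range(len(w_r)):
            w_r_bit = (w_r[k] >> q) & 1
            w_i_inv = 1 - ((w_i[k] >> q) & 1)
            # LCC4 and LCC3 of row k, column q share the second column line.
            col += (x_r_bits[k] & w_r_bit) + (x_i_bits[k] & w_i_inv)
        second_output += col << q          # assumed power-of-two weighting
    compensation = sum(x_i_bits)           # accumulated first imaginary bits
    # Assumed combination: remove the offset introduced by the inversion.
    third_output = second_output - ((1 << m) - 1) * compensation
    return second_output, compensation, third_output


outputs = real_path_with_compensation([1, 0, 1], [1, 1, 0], [3, 5, 2], [1, 7, 4], 3)
```

With these inputs the third output equals the sum over rows of `x_r * w_r - x_i * w_i`, that is, the real part of the multiply-accumulate, even though no negative copy of the imaginary part is stored.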
In a possible implementation manner of the first aspect, the m storage units located in the same row are configured to store bit data of different significant bits after binary conversion is performed on a real part and an imaginary part of the second complex number, respectively; the n storage units in the same column are used for storing bit data of the same effective bit of different second complex numbers.
In this possible implementation manner, in a storage sub-array arranged in n rows and m columns, the m storage units in the same row store $w_{r,1} \dots w_{r,m}$ together with either $w_{i,1} \dots w_{i,m}$ or $\overline{w_{i,1}} \dots \overline{w_{i,m}}$. The arrangement order in the m storage units may be from low-order bit to high-order bit or from high-order bit to low-order bit. The storage units in the same column may be arranged in the order $w_1 \dots w_n$ from top to bottom.
In one possible implementation manner of the first aspect, the first real part bit data of the first complex number includes bit data of different significant bits after binary conversion of a real part of the first complex number; the first imaginary part bit data of the first complex number comprises bit data of different effective bits after binary conversion of the imaginary part of the first complex number.
In this possible implementation, the first real part bit data includes at least one bit of $x_{r,1} \dots x_{r,b}$, and the first imaginary part bit data includes at least one bit of $x_{i,1} \dots x_{i,b}$.
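The binary conversion into bit data of different significant bits can be sketched in Python as follows. The sketch assumes that the complement form mentioned earlier is two's complement and that bits are listed least significant first; both function names are hypothetical.

```python
def to_twos_complement_bits(value, width):
    """Convert a signed integer to `width` bits, least significant bit first."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {width} bits")
    unsigned = value & ((1 << width) - 1)   # wrap negatives into 0..2**width-1
    return [(unsigned >> p) & 1 for p in range(width)]


def from_twos_complement_bits(bits):
    """Inverse of to_twos_complement_bits."""
    unsigned = sum(bit << p for p, bit in enumerate(bits))
    # The most significant bit carries negative weight.
    return unsigned - (1 << len(bits)) if bits[-1] else unsigned


x_r_bits = to_twos_complement_bits(-3, 4)
x_i_bits = to_twos_complement_bits(5, 4)
```

Each element of `x_r_bits` or `x_i_bits` corresponds to one $x_{r,p}$ or $x_{i,p}$ that could be driven onto a row signal line.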
In one possible implementation form of the first aspect, the first memory cell includes a first transistor and a second transistor, and a first inverter and/or a second inverter.
The grid electrode of the first transistor and the grid electrode of the second transistor are respectively connected with a Word Line (WL), and the word line is used for activating the first transistor and the second transistor; the source of the first transistor is connected to the input terminal of the first inverter or to the output terminal of the second inverter, and the drain of the first transistor is connected to a first Bit Line (BL).
The source of the second transistor is connected to the second bit line BLB, and the drain of the second transistor is connected to the output terminal of the first inverter or to the input terminal of the second inverter.
When the target bit data is the second imaginary bit data: the first inverter is used for converting second imaginary part bit data stored between the source electrode of the first transistor and the input end of the first inverter into inverted data of the second imaginary part bit data and outputting the inverted data to the second transistor; the first bit line and the second bit line are used for reading or writing the second imaginary bit data from the first memory cell and reading or writing the second real bit data from the second memory cell.
When the target bit data is inverted data of the second imaginary bit data: the second inverter is used for converting the inverted data of the second imaginary part bit data stored between the drain electrode of the second transistor and the input end of the second inverter into second imaginary part bit data and outputting the second imaginary part bit data to the first transistor; the first bit line and the second bit line are used for reading or writing the second imaginary part bit data from the first memory cell and reading or writing the second real part bit data from the second memory cell.
In this possible implementation manner, the first storage unit in the storage unit may be implemented by two transistors and one or two inverters, so that when the second imaginary part bit data or its inverted data is stored, the other can be obtained quickly and effectively, thereby implementing the multiply-accumulate operation of complex numbers with a circuit of smaller volume.
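A logic-level Python model of the first storage unit shows how either stored form yields both values. The class name, the `stores_inverted` flag, and the property names are illustrative assumptions; transistor-level behaviour such as word line activation and bit line access is not modelled.

```python
class FirstStorageUnit:
    """Behavioural model: one stored bit plus an inverter giving its complement."""

    def __init__(self, stored_bit, stores_inverted=False):
        if stored_bit not in (0, 1):
            raise ValueError("stored_bit must be 0 or 1")
        self.stored_bit = stored_bit
        # True when the target bit data is the inverted second imaginary bit.
        self.stores_inverted = stores_inverted

    @staticmethod
    def _inverter(bit):
        return 1 - bit

    @property
    def w_i(self):
        """Second imaginary part bit, as supplied to the first calculation unit."""
        return self._inverter(self.stored_bit) if self.stores_inverted else self.stored_bit

    @property
    def w_i_inv(self):
        """Inverted second imaginary part bit, as supplied to the third calculation unit."""
        return self.stored_bit if self.stores_inverted else self._inverter(self.stored_bit)


direct_cell = FirstStorageUnit(1)
inverted_cell = FirstStorageUnit(1, stores_inverted=True)
```

In both configurations the two outputs are always complementary, so only one physical copy of the imaginary part bit is needed.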
In one possible implementation manner of the first aspect, the first calculation unit includes a third transistor and a fourth transistor, a gate of the third transistor is connected to the first storage unit, the second imaginary part bit data is received from the first storage unit, a source of the third transistor is grounded, a drain of the third transistor is connected to a source of the fourth transistor, a gate of the fourth transistor is connected to the first row signal line, and a drain of the fourth transistor is connected to the first column signal line.
The second calculation unit includes a fifth transistor and a sixth transistor, a gate of the fifth transistor is connected to the second storage unit, the second real part bit data is received from the second storage unit, a source of the fifth transistor is grounded, a drain of the fifth transistor is connected to a source of the sixth transistor, a gate of the sixth transistor is connected to the second row signal line, and a drain of the sixth transistor is connected to the first column signal line.
The third calculation unit includes a seventh transistor and an eighth transistor, a gate of the seventh transistor is connected to the first memory cell, and receives inverted data of the second imaginary part bit data from the first memory cell, a source of the seventh transistor is grounded, a drain of the seventh transistor is connected to a source of the eighth transistor, a gate of the eighth transistor is connected to the second row signal line, and a drain of the eighth transistor is connected to the second column signal line.
The fourth calculation unit includes a ninth transistor and a tenth transistor, a gate of the ninth transistor is connected to the second memory unit, the second real part bit data is received from the second memory unit, a source of the ninth transistor is grounded, a drain of the ninth transistor is connected to a source of the tenth transistor, a gate of the tenth transistor is connected to the first row signal line, and a drain of the tenth transistor is connected to the second column signal line.
It should be noted that the connection relationships between the sources and the drains in the third transistor to the tenth transistor and other devices may be interchanged, and the above-described positional relationship is not limited.
In this possible implementation, each of the four LCCs may include two transistors, through which the multiplication is performed and the result of the multiplication is transmitted on the column signal line.
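The two-transistor structure can be read as a series pull-down stack: a path from the column signal line to ground exists only when both gates are high, which is a single-bit multiplication. The Python sketch below models this with ideal switches; the function names are hypothetical, and counting conducting stacks as the column's accumulated value is an assumption about how the analog sum is formed.

```python
def stack_conducts(stored_bit, row_bit):
    """Two ideal series switches conduct only if both gates are driven high."""
    return bool(stored_bit) and bool(row_bit)


def column_pull_down_count(stored_bits, row_bits):
    """Count the LCC stacks on one column line that conduct.

    stored_bits: bit at the gate of the storage-side transistor, one per row.
    row_bits: bit on the row signal line at the other gate, one per row.
    """
    if len(stored_bits) != len(row_bits):
        raise ValueError("one stored bit and one row bit are needed per row")
    return sum(stack_conducts(s, r) for s, r in zip(stored_bits, row_bits))


count = column_pull_down_count([1, 0, 1, 1], [1, 1, 0, 1])
```

Rows 0 and 3 have both bits high here, so two stacks conduct on this column line.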
In one possible implementation manner of the first aspect, the target column signal line is connected to the second row signal line through an eleventh transistor, a gate of the eleventh transistor is connected to the second row signal line, a source of the eleventh transistor is grounded, and a drain of the eleventh transistor is connected to the target column signal line.
In this possible implementation, the transfer of the first imaginary part bit data on the second row signal line to the target column signal line is realized by one transistor on each second row signal line.
In one possible implementation manner of the first aspect, the imaginary part processing circuit or the real part processing circuit includes a digital post-processing circuit and m analog-to-digital conversion circuits, the digital post-processing circuit is connected with the m analog-to-digital conversion circuits, and each analog-to-digital conversion circuit is connected with one column signal line.
Each analog-to-digital conversion circuit comprises a switch, a capacitor, a power supply and an analog-to-digital converter; the fixed end of the switch is connected with one end of the capacitor, and the movable end of the switch is connected with a power supply or a corresponding column signal wire; the other end of the capacitor is connected with an analog signal input end of the analog-to-digital converter; the digital signal output end of the analog-to-digital converter is connected with the digital post-processing circuit; the analog-to-digital converter is used for converting the first product result or the second product result in an analog signal state into a digital signal and transmitting the digital signal to the digital post-processing circuit.
In this possible implementation, each column signal line is provided with an analog-to-digital converter (ADC); after each ADC converts the analog signal on its column signal line into a digital signal, the digital post-processing circuit performs the accumulation operation.
In one possible implementation manner of the first aspect, the imaginary part processing circuit or the real part processing circuit includes a digital post-processing circuit, an analog-to-digital converter, and m capacitance circuits, the digital post-processing circuit is connected with the m capacitance circuits, and each capacitance circuit is connected with one column signal line.
Each capacitor circuit comprises a switch, a capacitor and a power supply; the fixed end of the switch is connected with one end of the capacitor, and the movable end of the switch is connected with a power supply or a corresponding column signal wire; the other end of the capacitor is connected with an analog signal input end of the analog-to-digital converter; the digital signal output end of the analog-to-digital converter is connected with the digital post-processing circuit; the analog-to-digital converter is used for converting the accumulation result of the first multiplication result or the accumulation result of the second multiplication result in an analog signal state into a digital signal and transmitting the digital signal to the digital post-processing circuit.
In this possible implementation, the analog signals on the m capacitor circuits are collected to one ADC, and then converted into digital signals by the ADC. This can reduce the number of ADCs and further reduce the area of the integrated circuit.
In one possible implementation manner of the first aspect, the digital post-processing circuit in the real part processing circuit further receives a compensation digital signal from an analog-to-digital conversion circuit of the compensation circuit.
In this possible implementation, the compensation circuit is also configured with an analog-to-digital conversion circuit, and after the compensation circuit converts the analog signal into a digital signal, the digital signal is transmitted to the digital post-processing circuit of the real part processing circuit for accumulation operation.
A second aspect of the present application provides a chip system, which includes a storage and calculation integrated circuit and a peripheral circuit, where the storage and calculation integrated circuit is connected to the peripheral circuit, and the storage and calculation integrated circuit is the storage and calculation integrated circuit described in the first aspect or any possible implementation manner of the first aspect.
A third aspect of the present application provides an electronic device, which includes the storage and calculation integrated circuit according to the first aspect or any one of the possible implementation manners of the first aspect.
Drawings
FIG. 1 is a schematic structural diagram of a storage and calculation integrated circuit provided in an embodiment of the present application;
FIG. 2A is a schematic structural diagram of a storage unit provided in an embodiment of the present application;
FIG. 2B is a schematic structural diagram of a storage unit provided in an embodiment of the present application;
FIG. 3 is another schematic structural diagram of a storage unit provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a first storage unit in a storage unit provided in an embodiment of the present application;
FIG. 5A to FIG. 5D are schematic structural diagrams of four LCCs in a storage unit provided in an embodiment of the present application;
FIG. 6 is another schematic structural diagram of a storage and calculation integrated circuit provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a circuit configuration for connecting a compensation circuit to a second row signal line provided in an embodiment of the present application;
FIG. 8 is another schematic structural diagram of a storage and calculation integrated circuit provided in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a real part processing circuit provided in an embodiment of the present application;
FIG. 10 is another schematic structural diagram of a real part processing circuit provided in an embodiment of the present application;
FIG. 11 is another schematic structural diagram of a storage and calculation integrated circuit provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application will now be described with reference to the accompanying drawings, and it is to be understood that the described embodiments are merely illustrative of some, but not all, embodiments of the present application. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application provide a storage and calculation integrated circuit for implementing complex multiply-accumulate calculations with smaller area and power consumption. The embodiments of the application further provide a corresponding chip system and electronic equipment. These are described in detail below.
In the embodiment of the present application, the storage and calculation integrated circuit refers to a circuit that integrates a calculation circuit into a memory circuit to realize storage and calculation integration, and the storage and calculation integrated circuit may be applicable to complex number operation and may also be applicable to real number operation. The storage unit is used for realizing the calculation process of multiplying two complex numbers, and the MAC processing circuit is used for realizing the accumulation process of the calculation results of the plurality of storage units.
In the embodiment of the present application, the storage and calculation integrated circuit may be a memory with any form of bit array structure, such as a register or a random access memory (RAM). The random access memory includes static random access memory (SRAM), embedded dynamic random access memory (eDRAM), and the like.
The structure of the storage and calculation integrated circuit provided by the embodiment of the present application can be understood with reference to fig. 1. As shown in fig. 1, the architecture of the storage and calculation integrated circuit includes a storage and calculation array 10 and a data processing circuit 20. The storage and calculation array 10 includes at least one storage sub-array 101, where each storage sub-array 101 includes N × M storage units 1011 arranged in N rows and M columns; the data processing circuit 20 includes at least one multiply-and-accumulate (MAC) processing circuit 201; the at least one storage sub-array 101 corresponds one to one to the at least one MAC processing circuit 201; and N and M are integers greater than 1. Each of the storage units 1011 is connected to a group of row signal lines and a group of column signal lines. A group of row signal lines is connected to the M storage units 1011 located on the same row. A group of column signal lines is connected to the N storage units 1011 located on the same column and to the target MAC processing circuit 201 corresponding to the storage sub-array 101 in which those N storage units 1011 are located.
Each of the storage units 1011 is configured to receive first real bit data and first imaginary bit data of the first complex number from the connected group of row signal lines, and perform an operation with second real bit data of the second complex number and related data of target bit data to obtain a first product result and a second product result, where the second real bit data and the target bit data are stored in the storage unit, the target bit data includes the second imaginary bit data or inverted data of the second imaginary bit data, and the related data of the target bit data includes the second imaginary bit data and inverted data of the second imaginary bit data.
Target MAC processing circuit 201 is configured to receive a plurality of first multiplication results and a plurality of second multiplication results from the m groups of connected column signal lines, and accumulate the plurality of first multiplication results to obtain first output data, and accumulate the plurality of second multiplication results to obtain second output data.
In the embodiment of the present application, the first complex number may also be referred to as a first operand, and the second complex number may also be referred to as a second operand. The first complex number may be represented by $x_r + x_i i$, where $x_r$ is the real part of the first complex number and $x_i$ is the imaginary part of the first complex number. The second complex number may be represented by $w_r + w_i i$, where $w_r$ is the real part of the second complex number and $w_i$ is the imaginary part of the second complex number. The first real bit data may be bit data of different significant bits after binary conversion of the real part of the first complex number, such as $x_{r,1} \dots x_{r,b}$. The first imaginary bit data may be bit data of different significant bits after binary conversion of the imaginary part of the first complex number, such as $x_{i,1} \dots x_{i,b}$. The second real bit data may be bit data of different significant bits after binary conversion of the real part of the second complex number, such as $w_{r,1} \dots w_{r,m}$. The second imaginary bit data may be bit data of different significant bits after binary conversion of the imaginary part of the second complex number, such as $w_{i,1} \dots w_{i,m}$. The inverted data of the second imaginary bit data refers to data obtained by inverting the second imaginary bit data, e.g. $\overline{w_{i,1}} \dots \overline{w_{i,m}}$.
Here, subscript r denotes the real part, subscript i denotes the imaginary part, subscript b denotes the bit width of the first complex number, and subscript m denotes the bit width of the second complex number.
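The bit decomposition described above can be illustrated with a minimal Python sketch. The helper names `to_bits` and `invert_bits`, the least-significant-bit-first ordering, and the example value 3 + 5i with bit width 4 are assumptions made only for illustration; they are not taken from the application.

```python
def to_bits(value, width):
    """Return `width` bits of `value`, least significant bit first.

    Negative values wrap to their two's-complement pattern (an assumption).
    """
    masked = value & ((1 << width) - 1)
    return [(masked >> k) & 1 for k in range(width)]


def invert_bits(bits):
    """Bitwise inversion: each 0 becomes 1 and each 1 becomes 0."""
    return [1 - bit for bit in bits]


# Hypothetical second complex number w = 3 + 5i with bit width m = 4.
w_r_bits = to_bits(3, 4)          # second real bit data
w_i_bits = to_bits(5, 4)          # second imaginary bit data
w_i_inv = invert_bits(w_i_bits)   # inverted second imaginary bit data
```

Only one of `w_i_bits` and `w_i_inv` needs to be stored, since either can be derived from the other by inversion.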
It should be noted that, in the embodiment of the present application, the first real part bit data, the first imaginary part bit data, the second real part bit data, the second imaginary part bit data, and the inverse data of the second imaginary part bit data may be analog signals, and are represented by levels. The real part of the first complex number refers to the value of the real part of the first complex number, the imaginary part of the first complex number refers to the value of the imaginary part of the first complex number, the real part of the second complex number refers to the value of the real part of the second complex number, and the imaginary part of the second complex number refers to the value of the imaginary part of the second complex number.
In the embodiment of the present application, the first product result (also referred to as the first multiplication result) refers to a product result that includes an imaginary number after multiplication, such as $x_{r,p} \cdot w_{i,q}$ and $x_{i,p} \cdot w_{r,q}$. The second product result (also referred to as the second multiplication result) refers to a product result that does not include an imaginary number after multiplication, such as $x_{r,p} \cdot w_{r,q}$ and $x_{i,p} \cdot \overline{w_{i,q}}$. Here, p is any value of 1 … b and q is any value of 1 … m.
In the embodiment of the present application, the first output data may be the result of accumulating a plurality of $(x_{r,p} \cdot w_{i,q} + x_{i,p} \cdot w_{r,q})$, and the second output data may be the result of accumulating a plurality of $(x_{r,p} \cdot w_{r,q} + x_{i,p} \cdot \overline{w_{i,q}})$.
As can be seen from this embodiment, the storage unit 1011 stores the second imaginary bit data $w_{i,q}$ or the inverted data $\overline{w_{i,q}}$ of the second imaginary bit data, and either one can be obtained from the other by inversion. Therefore, when a complex multiplication is performed, the second imaginary bit data and its inverted data can be used directly in the multiplication, and the multiply-accumulate operation of two complex numbers can be completed by storing only one copy of the real part and one copy of the imaginary part of the second complex number. It is not necessary to store two copies of the positive-valued real part of the second complex number, one copy of the positive-valued imaginary part, and one copy of the negative-valued imaginary part. This effectively reduces the size of the storage and calculation integrated circuit and lowers power consumption; because two copies of the data do not need to be processed, calculation efficiency is also improved.
The structure of the storage unit 1011 in the embodiment shown in fig. 1 described above can be understood with reference to fig. 2A. As shown in fig. 2A, the storage unit 1011 may include a first calculation unit 10111, a second calculation unit 10112, a third calculation unit 10113, a fourth calculation unit 10114, a first storage unit 10115, and a second storage unit 10116. The first storage unit 10115 is connected to the first calculation unit 10111 and the third calculation unit 10113, and the second storage unit 10116 is connected to the second calculation unit 10112 and the fourth calculation unit 10114.
At least one first row signal line in one group of row signal lines is connected to the first calculation unit 10111 and the fourth calculation unit 10114, at least one second row signal line is connected to the second calculation unit 10112 and the third calculation unit 10113, a first column signal line in one group of column signal lines is connected to the first calculation unit 10111 and the second calculation unit 10112, and a second column signal line is connected to the third calculation unit 10113 and the fourth calculation unit 10114.
The first storage unit 10115 is used to store target bit data, and the second storage unit 10116 is used to store second real bit data. The target bit data and the second real bit data may be pre-configured in the first storage unit 10115 and the second storage unit 10116, and of course, the data stored in the first storage unit 10115 and the second storage unit 10116 may be updated according to the requirements of different application scenarios.
The first storage unit 10115 is configured to output the second imaginary bit data to the first calculation unit 10111, output inverted data of the second imaginary bit data to the third calculation unit 10113, and the second storage unit 10116 is configured to output the second real bit data to the second calculation unit 10112 and the fourth calculation unit 10114.
The first calculation unit 10111 is configured to calculate a product of the first real bit data and the second imaginary bit data, the second calculation unit 10112 is configured to calculate a product of the first imaginary bit data and the second real bit data, the third calculation unit 10113 is configured to calculate a product of the first imaginary bit data and inverse data of the second imaginary bit data, and the fourth calculation unit 10114 is configured to calculate a product of the first real bit data and the second real bit data.
The first product result includes a product of the first real bit data and the second imaginary bit data, and a product of the first imaginary bit data and the second real bit data, and the second product result includes a product of the first real bit data and the second real bit data, and a product of the first imaginary bit data and an inverse of the second imaginary bit data.
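As a minimal sketch of the four products just listed, the following Python function models one storage unit for single-bit inputs, assuming that the product of two bits is their logical AND. The function name `storage_unit_products` and the tuple layout of the return value are illustrative assumptions, not part of the application.

```python
def storage_unit_products(x_r, x_i, w_r, w_i):
    """Model one storage unit 1011 for single-bit inputs (each 0 or 1).

    Returns (first_product_result, second_product_result), where each is a
    pair of bit products routed to the first or second column signal line.
    """
    w_i_inv = 1 - w_i  # inverted data of the second imaginary bit data

    # First and second calculation units feed the first column signal line.
    first = (x_r & w_i, x_i & w_r)
    # Third and fourth calculation units feed the second column signal line.
    second = (x_i & w_i_inv, x_r & w_r)
    return first, second


first, second = storage_unit_products(x_r=1, x_i=1, w_r=1, w_i=0)
```

The first pair contributes to the imaginary part of the complex product and the second pair to the real part.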
The group of row signal lines shown in fig. 2A includes one first row signal line and one second row signal line. In practice, there may be one or more first row signal lines and, similarly, one or more second row signal lines. Regardless of how many first row signal lines or second row signal lines there are, each first row signal line is connected to the first calculation unit 10111 and the fourth calculation unit 10114 and inputs the first real bit data $x_{r,p}$ to them, and each second row signal line is connected to the second calculation unit 10112 and the third calculation unit 10113 and inputs the first imaginary bit data $x_{i,p}$ to them. The difference is that the first real bit data on different first row signal lines may be different, and the first imaginary bit data on different second row signal lines may be different.
Fig. 2B below illustrates connection relationships of each of the first row signal lines and each of the second row signal lines to the first calculation unit 10111 to the fourth calculation unit 10114 by taking an example in which one group of row signal lines includes two first row signal lines and two second row signal lines.
As shown in fig. 2B, two first row signal lines are connected to the first calculation unit 10111 and the fourth calculation unit 10114, respectively, and two second row signal lines are connected to the second calculation unit 10112 and the third calculation unit 10113, respectively.
The first to fourth calculation units of the storage unit may be local computing cells (LCCs), and the first calculation unit 10111 to the fourth calculation unit 10114 may be represented by LCC1, LCC2, LCC3 and LCC4 shown in fig. 3, respectively. The first storage unit stores $w_{i,q}$ or $\overline{w_{i,q}}$, and the second storage unit stores $w_{r,q}$. The first storage unit inputs $w_{i,q}$ to LCC1 and $\overline{w_{i,q}}$ to LCC3, and the second storage unit inputs $w_{r,q}$ to LCC2 and LCC4, respectively. LCC1 receives $x_{r,p}$ from the first row signal line and calculates the product $x_{r,p} \cdot w_{i,q}$. LCC2 receives $x_{i,p}$ from the second row signal line and calculates the product $x_{i,p} \cdot w_{r,q}$. LCC3 receives $x_{i,p}$ from the second row signal line and calculates the product $x_{i,p} \cdot \overline{w_{i,q}}$. LCC4 receives $x_{r,p}$ from the first row signal line and calculates the product $x_{r,p} \cdot w_{r,q}$. LCC1 then transmits $x_{r,p} \cdot w_{i,q}$ to the target MAC processing circuit through the first column signal line, LCC2 transmits $x_{i,p} \cdot w_{r,q}$ to the target MAC processing circuit through the first column signal line, LCC3 transmits $x_{i,p} \cdot \overline{w_{i,q}}$ to the target MAC processing circuit through the second column signal line, and LCC4 transmits $x_{r,p} \cdot w_{r,q}$ to the target MAC processing circuit through the second column signal line.
As can be seen from the above description of the embodiments corresponding to fig. 2A, fig. 2B, and fig. 3, the connection and routing manner between the row signal line and the column signal line in the storage unit and the routing manner between the storage unit and the four LCCs can simplify the circuit connection and optimize the circuit structure of the storage unit, and the real part bit data and the imaginary part bit data of the second complex number are stored in the two storage units, so that the multiplication of the two complex numbers can be realized, and the volume of the storage-calculation integrated circuit is effectively reduced.
The structure of the first memory cell shown in fig. 2A, 2B or 3 can be understood with reference to fig. 4, which includes a first transistor T1 and a second transistor T2, and a first inverter S1 and a second inverter S2, as shown in fig. 4. The first inverter S1 and the second inverter S2 can generate complementary inverted signals, and two inverters are generally included in the SRAM. Similarly, for a register or Latch-like memory, since the data holding node thereof usually includes 2 inverters, a complementary inverted signal can be generated. In a memory without complementary signals (such as a DRAM and an eDRAM), an inverter is additionally arranged to generate an inverted signal for realizing the multiply-accumulate calculation. In the embodiment of the present application, the inverted signal may be replaced with inverted data.
The gate of the first transistor T1 and the gate of the second transistor T2 are respectively connected to a Word Line (WL) for activating the first transistor and the second transistor. The source of the first transistor T1 is connected to the input of the first inverter S1 or to the output of the second inverter S2, and the drain of the first transistor T1 is connected to a first Bit Line (BL).
The source of the second transistor T2 is connected to the second bit line BLB, and the drain of the second transistor T2 is connected to the output terminal of the first inverter S1 or the input terminal of the second inverter S2.
When the target bit data is the second imaginary bit data $w_{i,q}$: the first inverter S1 is configured to convert the second imaginary bit data stored between the source of the first transistor T1 and the input terminal of the first inverter S1 into the inverted data $\overline{w_{i,q}}$ of the second imaginary bit data and output it to the second transistor T2; the first bit line and the second bit line are used for reading or writing the second imaginary bit data $w_{i,q}$ from or to the first memory cell, and for reading or writing the second real bit data $w_{r,q}$ from or to the second memory cell.
When the target bit data is the inverted data $\overline{w_{i,q}}$ of the second imaginary bit data: the second inverter S2 is configured to convert the inverted data $\overline{w_{i,q}}$ stored between the drain of the second transistor T2 and the input terminal of the second inverter S2 into the second imaginary bit data $w_{i,q}$ and output it to the first transistor T1; the first bit line and the second bit line are used for reading or writing the second imaginary bit data $w_{i,q}$ from or to the first memory cell, and for reading or writing the second real bit data $w_{r,q}$ from or to the second memory cell.
In this embodiment, the first storage unit in the storage unit may be implemented by two transistors and one or two inverters to quickly and effectively obtain the other one when storing one second imaginary part bit data or its inverse data, thereby implementing the multiplication and accumulation operation of complex numbers with a circuit having a smaller volume.
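The behavior of the first storage unit can be sketched as a toy Python model: one bit is held, and the inverter supplies its complement, so both the second imaginary bit data and its inverted data are available whichever of the two is stored. The class name `FirstMemoryCell` and the `stores_inverted` flag are hypothetical; transistor-level behavior such as word line activation and bit line access is not modeled.

```python
class FirstMemoryCell:
    """Toy model of the first storage unit: a stored bit plus its complement."""

    def __init__(self, target_bit, stores_inverted=False):
        # `stores_inverted` selects which case applies: the target bit data is
        # either the second imaginary bit data or its inverted data.
        if target_bit not in (0, 1):
            raise ValueError("target_bit must be 0 or 1")
        self._w_i = 1 - target_bit if stores_inverted else target_bit

    @property
    def w_i(self):
        """Second imaginary bit data, as supplied to LCC1."""
        return self._w_i

    @property
    def w_i_inv(self):
        """Inverted second imaginary bit data, as supplied to LCC3."""
        return 1 - self._w_i


cell = FirstMemoryCell(1)
inverted_cell = FirstMemoryCell(1, stores_inverted=True)
```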
The structures of LCCs 1-4 shown in fig. 3 can be understood with reference to fig. 5A-5D.
As shown in fig. 5A, LCC1 includes a third transistor T3 and a fourth transistor T4. The gate of the third transistor T3 is connected to the first memory cell and receives the second imaginary bit data $w_{i,q}$ from the first memory cell; the source of the third transistor T3 is grounded, and the drain of the third transistor T3 is connected to the source of the fourth transistor T4. The gate of the fourth transistor T4 is connected to the first row signal line and receives the first real bit data $x_{r,p}$ from the first row signal line; the drain of the fourth transistor T4 is connected to the first column signal line and outputs $x_{r,p} \cdot w_{i,q}$ to the first column signal line.
As shown in fig. 5B, LCC2 includes a fifth transistor T5 and a sixth transistor T6. The gate of the fifth transistor T5 is connected to the second memory cell and receives the second real bit data $w_{r,q}$ from the second memory cell; the source of the fifth transistor T5 is grounded, and the drain of the fifth transistor T5 is connected to the source of the sixth transistor T6. The gate of the sixth transistor T6 is connected to the second row signal line and receives the first imaginary bit data $x_{i,p}$ from the second row signal line; the drain of the sixth transistor T6 is connected to the first column signal line and outputs $x_{i,p} \cdot w_{r,q}$ to the first column signal line.
As shown in fig. 5C, LCC3 includes a seventh transistor T7 and an eighth transistor T8. The gate of the seventh transistor T7 is connected to the first memory cell and receives the inverted data $\overline{w_{i,q}}$ of the second imaginary bit data from the first memory cell; the source of the seventh transistor T7 is grounded, and the drain of the seventh transistor T7 is connected to the source of the eighth transistor T8. The gate of the eighth transistor T8 is connected to the second row signal line and receives the first imaginary bit data $x_{i,p}$ from the second row signal line; the drain of the eighth transistor T8 is connected to the second column signal line and outputs $x_{i,p} \cdot \overline{w_{i,q}}$ to the second column signal line.
As shown in fig. 5D, LCC4 includes a ninth transistor T9 and a tenth transistor T10. The gate of the ninth transistor T9 is connected to the second memory cell and receives the second real bit data $w_{r,q}$ from the second memory cell; the source of the ninth transistor T9 is grounded, and the drain of the ninth transistor T9 is connected to the source of the tenth transistor T10. The gate of the tenth transistor T10 is connected to the first row signal line and receives the first real bit data $x_{r,p}$ from the first row signal line; the drain of the tenth transistor T10 is connected to the second column signal line and outputs $x_{r,p} \cdot w_{r,q}$ to the second column signal line.
It should be noted that the connection relationships between the source and the drain of the third transistor to the tenth transistor and other devices may be interchanged, and the above-described positional relationship is not limited.
In this possible implementation, two transistors may be included in each of the four LCCs, through which multiplication is performed, and transmission of the product result on the column signal line.
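Why two series transistors realize a one-bit multiplication can be sketched in Python under a simplifying assumption: each transistor is treated as an ideal n-type switch that conducts when its gate is at logic 1. The application does not state the transistor polarity, so this is an assumption, and the function names are hypothetical. Under this model the path from the column signal line to ground conducts only when both gate inputs are 1, which is the logical AND, i.e. the product of two bits.

```python
def switch_conducts(gate_bit):
    """Idealized n-type transistor: conducts when its gate is at logic 1."""
    return gate_bit == 1


def lcc_path_conducts(stored_bit, input_bit):
    """Two transistors in series between the column signal line and ground.

    `stored_bit` drives the gate of the grounded transistor (e.g. T3) and
    `input_bit` drives the gate of the transistor on the column side (e.g. T4).
    """
    return switch_conducts(stored_bit) and switch_conducts(input_bit)


truth_table = {
    (stored, inp): lcc_path_conducts(stored, inp)
    for stored in (0, 1)
    for inp in (0, 1)
}
```

In this simplified view the column signal line is affected only for the input combination (1, 1); how the resulting analog level is sensed is outside the sketch.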
Optionally, as shown in fig. 6, the target MAC processing circuit includes an imaginary part processing circuit 2011 and a real part processing circuit 2012; the imaginary part processing circuit 2011 is connected to the m first column signal lines, and the real part processing circuit 2012 is connected to the m second column signal lines.
The imaginary part processing circuit 2011 is configured to receive, through each first column signal line, the first multiplication results from the storage units of each column, accumulate the plurality of first multiplication results from the same column to obtain a first column accumulation result, perform a weighting operation on the first column accumulation results of different columns, and accumulate the weighted first column accumulation results of different columns to obtain the first output data.
The real part processing circuit 2012 is configured to receive, through each second column signal line, the second multiplication results from the storage units of each column, accumulate the plurality of second multiplication results from the same column to obtain a second column accumulation result, perform a weighting operation on the second column accumulation results of different columns, and accumulate the weighted second column accumulation results of different columns to obtain the second output data.
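The accumulate, weight, accumulate sequence performed by either processing circuit can be sketched as follows. The function name `mac_accumulate` is hypothetical, and the power-of-two column weights in the example are an assumption for unsigned binary data; the weights actually used by the circuit are not reproduced here.

```python
def mac_accumulate(products, column_weights):
    """Accumulate per column, weight each column sum, then accumulate again.

    products[k][q] is the product result delivered by the storage unit in
    row k, column q; column_weights[q] is the weight applied to column q.
    """
    n_cols = len(column_weights)
    # Step 1: column accumulation results (one sum per column signal line).
    column_sums = [sum(row[q] for row in products) for q in range(n_cols)]
    # Steps 2 and 3: weight each column sum and accumulate across columns.
    return sum(weight * total for weight, total in zip(column_weights, column_sums))


# Hypothetical 2-row, 3-column example with assumed weights 1, 2, 4.
example_products = [[1, 0, 1],
                    [1, 1, 0]]
output = mac_accumulate(example_products, column_weights=[1, 2, 4])
```

Here the column sums are 2, 1 and 1, so the weighted total is 2·1 + 1·2 + 1·4 = 8.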
Optionally, the data processing circuit further includes a compensation circuit 203, one end of the compensation circuit 203 being connected to the N second row signal lines through the target column signal line, and the other end being connected to the real part processing circuit 2012 in each of the at least one MAC processing circuit 201.
The compensation circuit 203 is configured to receive a plurality of first imaginary bit data from the N second row signal lines, accumulate the plurality of first imaginary bit data, and transmit an accumulation result of the plurality of first imaginary bit data to the real processing circuit.
The real part processing circuit 2012 is configured to sum the accumulated result of the plurality of first imaginary part bit data and the second output data to obtain third output data.
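One plausible reading of why the accumulated first imaginary part data is added to the second output data is the two's-complement identity −w = (bitwise inverse of w) + 1, which gives x_i · (−w_i) = x_i · inv(w_i) + x_i. The application does not spell this identity out, so the following Python check is a sketch under that assumption. It works on whole signed integers rather than on individual bit lines, and the function names are hypothetical.

```python
def invert_signed(value, width):
    """Flip every bit of a `width`-bit two's-complement integer."""
    mask = (1 << width) - 1
    flipped = (~value) & mask
    # Re-interpret the flipped bit pattern as a signed two's-complement value.
    return flipped - (1 << width) if flipped >= (1 << (width - 1)) else flipped


def real_part_with_compensation(x_r, x_i, w_r, w_i, width):
    """Real part of (x_r + x_i*i)(w_r + w_i*i) using only inverted w_i."""
    second_output = x_r * w_r + x_i * invert_signed(w_i, width)
    compensation = x_i  # assumed role of the accumulated first imaginary data
    return second_output + compensation


# Exhaustive check against x_r*w_r - x_i*w_i for 4-bit signed operands.
WIDTH = 4
values = range(-(1 << (WIDTH - 1)), 1 << (WIDTH - 1))
all_match = all(
    real_part_with_compensation(x_r, x_i, w_r, w_i, WIDTH) == x_r * w_r - x_i * w_i
    for x_r in values for x_i in values for w_r in values for w_i in values
)
```

Under this assumption, no negated copy of the imaginary part has to be stored: the inverted bits plus one extra accumulation of the input imaginary data reproduce the subtraction in the real part.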
The target column signal line and the second row signal line may be connected by an eleventh transistor: the gate of the eleventh transistor is connected to the second row signal line, the source of the eleventh transistor is grounded, and the drain of the eleventh transistor is connected to the target column signal line. The connection relationship between the target column signal line and the eleventh transistor can be understood with reference to fig. 7. As shown in fig. 7, the second row signal line is connected to the gate of the eleventh transistor T11, the source of T11 is grounded, and the drain of T11 is connected to the target column signal line.
For better understanding of the above-mentioned storage integrated circuit, the process of multiply-accumulate operation of the storage integrated circuit on complex numbers will be described with reference to fig. 8.
As shown in fig. 8, in the data preparation stage, binary conversion is performed on the real part and imaginary part values of n second complex numbers with a bit width of m, so as to obtain, in two's complement form, m imaginary bit data and m real bit data for each second complex number. In fig. 8 the imaginary bit data and the real bit data on the same row are arranged from the lower bit to the higher bit; they may also be arranged from the higher bit to the lower bit, which is not limited in this embodiment.
The form of the imaginary bit data and real bit data obtained after binary conversion of the n second complex numbers $w_r + w_i i$ in the embodiments of the present application can be understood with reference to table 1 below.
Table 1: imaginary bit data and real bit data of n second complex numbers
Figure BDA0003306676480000131
In the storage and calculation integrated circuit shown in fig. 8, each storage unit stores the real bit data and the imaginary bit data of the same significant bit. For example, $w1_{i,1}, w1_{r,1}$ are stored in the first storage unit of the first row, $w1_{i,2}, w1_{r,2}$ are stored in the second storage unit of the same row, …, and $w1_{i,m}, w1_{r,m}$ are stored in the m-th storage unit of the same row. $w2_{i,1}, w2_{r,1}$ are stored in the first storage unit of the second row, $w2_{i,2}, w2_{r,2}$ are stored in the second storage unit of the same row, …, and $w2_{i,m}, w2_{r,m}$ are stored in the m-th storage unit of the same row. $wn_{i,1}, wn_{r,1}$ are stored in the first storage unit of the n-th row, $wn_{i,2}, wn_{r,2}$ are stored in the second storage unit of the same row, …, and $wn_{i,m}, wn_{r,m}$ are stored in the m-th storage unit of the same row. That is, the m storage units located in the same row are used for storing bit data of different significant bits after binary conversion of the real part and the imaginary part of one second complex number. As can also be seen from table 1 and fig. 8, the n storage units located in the same column are used for storing bit data of the same significant bit of different second complex numbers.
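The row and column layout described above can be sketched in Python. The function name `build_weight_array`, the (imaginary bit, real bit) pair order inside each cell, the lower-bit-first column order, and the use of small unsigned example values are assumptions for illustration only.

```python
def build_weight_array(weights, m):
    """Return an n x m grid of stored bit pairs.

    weights is a list of (w_r, w_i) unsigned integers, one per row.
    grid[k][q] holds (imaginary bit q, real bit q) of the k-th second complex
    number, with column 0 holding the least significant bit.
    """
    return [
        [((w_i >> q) & 1, (w_r >> q) & 1) for q in range(m)]
        for (w_r, w_i) in weights
    ]


# Two hypothetical second complex numbers, 3 + 5i and 6 + 1i, with m = 3.
grid = build_weight_array([(3, 5), (6, 1)], m=3)

# Column 0 collects the same significant bit of different complex numbers.
column_0 = [row[0] for row in grid]
```

Each row spreads one complex number across its m significant bits, while each column gathers the same significant bit of all n complex numbers.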
The imaginary bit data in table 1 may be understood as the second imaginary bit data in the foregoing embodiment, and the real bit data is the second real bit data in the foregoing embodiment.
Table 1 lists the case where the second imaginary bit data is stored in the storage unit. In practice, the second imaginary bit data may be replaced by the inverted data of the second imaginary bit data; for example, $\overline{wn_{i,m}}$ may replace $wn_{i,m}$. This case is not expanded on here; it can be understood from the inverse relationship between $\overline{wn_{i,m}}$ and $wn_{i,m}$, and the same applies when n and m take other values.
The representation of the real bit data and the imaginary bit data of the n first complex numbers as the input signal, where the bit width is b, can be understood with reference to table 2.
Table 2: imaginary bit data and real bit data of n first complex numbers
Figure BDA0003306676480000132
The imaginary bit data in table 2 may be understood as the first imaginary bit data in the foregoing embodiment, and the real bit data is the first real bit data in the foregoing embodiment.
In the multiply-accumulate calculation, in the first row, $x1_{r,1} \dots x1_{r,b}$ are sequentially input to each storage unit through the first row signal line, and $x1_{i,1} \dots x1_{i,b}$ are sequentially input to each storage unit through the second row signal line. In the second row, $x2_{r,1} \dots x2_{r,b}$ are sequentially input to each storage unit through the first row signal line, and $x2_{i,1} \dots x2_{i,b}$ are sequentially input to each storage unit through the second row signal line. In the n-th row, $xn_{r,1} \dots xn_{r,b}$ are sequentially input to each storage unit through the first row signal line, and $xn_{i,1} \dots xn_{i,b}$ are sequentially input to each storage unit through the second row signal line.
After the first imaginary bit data and the first real bit data of the first complex number are input into the storage unit, the storage unit multiplies them by the second imaginary bit data, the second real bit data, and the inverted data of the second imaginary bit data of the second complex number. The specific operation process can be understood with reference to the process described in fig. 3. Taking the multiplication of the bit data $xn_{r,b}$, $xn_{i,b}$, $wn_{r,m}$, $wn_{i,m}$ and $\overline{wn_{i,m}}$ as an example, the following products are obtained: $xn_{r,b} \cdot wn_{i,m}$, $xn_{i,b} \cdot wn_{r,m}$, $xn_{i,b} \cdot \overline{wn_{i,m}}$ and $xn_{r,b} \cdot wn_{r,m}$. Then $xn_{r,b} \cdot wn_{i,m}$ and $xn_{i,b} \cdot wn_{r,m}$ are transmitted to the imaginary part processing circuit 2011 through the first column signal line, and $xn_{i,b} \cdot \overline{wn_{i,m}}$ and $xn_{r,b} \cdot wn_{r,m}$ are transmitted to the real part processing circuit 2012 through the second column signal line.
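The per-unit multiplication described above can be sketched in a few lines of Python. This is only a behavioural model under stated assumptions: each datum is a single bit (0 or 1), a product of two bits is their logical AND, the inverted data of a bit w is 1 - w, and the two products placed on the same column signal line simply add. The function name `cell_products` is hypothetical and does not appear in the application.

```python
def cell_products(x_r, x_i, w_r, w_i):
    """Behavioural model of one storage unit (illustrative assumption).

    x_r, x_i: first real / first imaginary bit data (inputs on the row lines).
    w_r, w_i: second real / second imaginary bit data (stored in the unit).
    Returns (first_product_result, second_product_result).
    """
    for bit in (x_r, x_i, w_r, w_i):
        if bit not in (0, 1):
            raise ValueError("all inputs must be single bits")
    w_i_inv = 1 - w_i                     # inverted data of the second imaginary bit
    first = x_r * w_i + x_i * w_r         # goes to the imaginary part processing circuit
    second = x_i * w_i_inv + x_r * w_r    # goes to the real part processing circuit
    return first, second


print(cell_products(1, 1, 1, 0))  # (1, 2)
```

Because x_i * (1 - w_i) equals x_i - x_i * w_i for single bits, the second product result carries an extra x_i term compared with the real part of a complex product; the sketch leaves that term in place, as the storage unit itself does.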
The imaginary part processing circuit 2011 may perform accumulation of the product results of the respective storage units received through the first column signal lines, and the real part processing circuit 2012 may perform accumulation of the product results of the respective storage units received through the second column signal lines.
The accumulation process in the imaginary part processing circuit 2011 may be as follows: accumulate the first product results of the same column to obtain a first column accumulation result, perform a weighting operation on the first column accumulation results of different columns, and accumulate the weighted first column accumulation results of the different columns to obtain the first output data. The process can be formulated as:
Figure BDA0003306676480000141
This formula represents the first column accumulation result in the m-th column for the bit data of the first complex numbers at the b-th significant bit. The accumulation of the first column accumulation results of different columns after the weighting operation can be expressed as:
Figure BDA0003306676480000142
The accumulation process in the real part processing circuit 2012 may be as follows: accumulate the second product results of the same column to obtain a second column accumulation result, perform a weighting operation on the second column accumulation results of different columns, and accumulate the weighted second column accumulation results of the different columns to obtain the second output data. The process can be formulated as:
Figure BDA0003306676480000143
The accumulated result in the compensation circuit 203 can be formulated as:
Figure BDA0003306676480000144
These two equations represent the second column accumulation result in the m-th column, and the corresponding compensation result, for the bit data of the first complex numbers at the b-th significant bit. The accumulation of the second column accumulation results of different columns after the weighting operation can be expressed as:
Figure BDA0003306676480000145
Figure BDA0003306676480000151
The weighting coefficients in the above formulas are $c_{m,b} = c_m \cdot c_b$, where:
Figure BDA0003306676480000152
Figure BDA0003306676480000153
Third output data: $y_r = y'_r + y''_r$
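The accumulation, weighting and compensation steps can be checked end to end with a small bit-level model. The sketch below is not the formulas of the application, which are only available as images here. It assumes unsigned operands, bits listed least significant first, binary weights c_m = 2^p for column p and c_b = 2^q for input bit q, and a compensation term y''_r that enters with a negative sign so that it cancels the extra x_i contribution introduced by the inverted imaginary bit. The names `to_bits` and `complex_mac` are hypothetical.

```python
def to_bits(value, width):
    """Unsigned binary expansion, least significant bit first."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given bit width")
    return [(value >> k) & 1 for k in range(width)]


def complex_mac(xs, ws, b, m):
    """Bit-level model of the complex multiply-accumulate.

    xs, ws: lists of (real, imag) pairs of unsigned integers (inputs, stored weights).
    b, m:   bit widths of the inputs and of the weights.
    Returns (y_i, y_r), the imaginary and real parts of sum(x * w).
    """
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    x_bits = [(to_bits(xr, b), to_bits(xi, b)) for xr, xi in xs]
    w_bits = [(to_bits(wr, m), to_bits(wi, m)) for wr, wi in ws]

    y_i = 0        # first output data (imaginary part)
    y_r_raw = 0    # second output data, y'_r
    y_r_comp = 0   # compensation term, y''_r (assumed negative)
    for q in range(b):                               # input bit fed in this step
        comp = sum(xi[q] for _, xi in x_bits)        # compensation column: sum of x_i bits
        for p in range(m):                           # column = weight bit position
            first_col = 0
            second_col = 0
            for (xr, xi), (wr, wi) in zip(x_bits, w_bits):   # the n rows
                first_col += xr[q] * wi[p] + xi[q] * wr[p]
                second_col += xr[q] * wr[p] + xi[q] * (1 - wi[p])
            weight = (1 << p) * (1 << q)             # c_{m,b} = c_m * c_b (assumed binary)
            y_i += weight * first_col
            y_r_raw += weight * second_col
            y_r_comp -= weight * comp
    return y_i, y_r_raw + y_r_comp


xs = [(3, 1), (2, 5)]   # x = 3+1j, 2+5j
ws = [(4, 7), (6, 1)]   # w = 4+7j, 6+1j
reference = sum(complex(*x) * complex(*w) for x, w in zip(xs, ws))
y_i, y_r = complex_mac(xs, ws, b=3, m=3)
print(y_r, y_i)  # 12 57
assert (y_r, y_i) == (int(reference.real), int(reference.imag))
```

In this example the uncompensated real accumulation y'_r is 54 and the compensation term is -42, giving the expected real part 12. Signed operands would need a different weight for the most significant bit, which this sketch does not model.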
The structure of the imaginary part processing circuit 2011 or the real part processing circuit 2012 described above can be understood with reference to fig. 9 or 10.
As shown in fig. 9, the real part processing circuit 2012 includes a digital post-processing circuit 20121 and m analog-to-digital conversion circuits 20122, the digital post-processing circuit 20121 is connected to the m analog-to-digital conversion circuits 20122, and each analog-to-digital conversion circuit 20122 is connected to one column signal line.
Each analog-to-digital conversion circuit 20122 includes a switch K, a capacitor C, a power supply VDD, and an analog-to-digital converter (ADC); the fixed end of the switch K is connected with one end of the capacitor C, and the movable end of the switch K is connected with a power supply VDD or a corresponding column signal line; the other end of the capacitor C is connected with an analog signal input end of the analog-to-digital converter ADC; the digital signal output end of the analog-to-digital converter ADC is connected with the digital post-processing circuit 20121; the analog-to-digital converter ADC is configured to convert the second product result in the analog signal state into a digital signal, and transmit the digital signal to the digital post-processing circuit 20121.
The digital post-processing circuit 20121 also receives the compensated digital signal from the analog-to-digital conversion circuit of the compensation circuit.
The digital post-processing circuit 20121 performs the accumulation operation according to the accumulation procedure in the foregoing embodiment to obtain the third output data.
Fig. 9 shows a real part processing circuit. For the imaginary part processing circuit, only the compensation circuit needs to be removed; the other parts can be understood with reference to the real part processing circuit shown in fig. 9.
As shown in fig. 10, the real part processing circuit 2012 includes a digital post-processing circuit 20121, an analog-to-digital converter ADC, and m capacitance circuits 20123, the digital post-processing circuit 20121 is connected to the m capacitance circuits 20123, and each capacitance circuit 20123 is connected to one column signal line.
Each capacitor circuit 20123 includes a switch K, a capacitor C, and a power supply VDD; the fixed end of the switch K is connected with one end of the capacitor C, and the movable end of the switch K is connected with a power supply VDD or a corresponding column signal line; the other end of the capacitor C is connected with an analog signal input end of the analog-to-digital converter ADC; the digital signal output end of the analog-to-digital converter ADC is connected with the digital post-processing circuit 20121; the ADC is configured to convert the accumulated result of the second product result in the analog signal state into a digital signal, and transmit the digital signal to the digital post-processing circuit 20121.
The digital post-processing circuit 20121 performs the accumulation operation according to the accumulation procedure in the foregoing embodiment to obtain the third output data.
Fig. 10 shows a real part processing circuit, and if the imaginary part processing circuit is used, only the compensation circuit needs to be removed, and other parts can be understood by referring to the real part processing circuit shown in fig. 10.
As can be seen from a comparison between fig. 9 and fig. 10, in fig. 10 the analog signals on the m capacitor circuits are collected into a single ADC and converted into digital signals by that ADC. This reduces the number of ADCs, and therefore the area of the integrated circuit.
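The difference between the two read-out structures can be illustrated with a toy model. The sketch assumes an ideal, noise-free ADC that rounds to the nearest integer code, and it represents the collection of the capacitor signals as a plain sum; how the capacitor network actually combines or weights the column signals is not described in this passage, so that part is an assumption. All function names are hypothetical.

```python
def ideal_adc(level):
    """Ideal converter: rounds an analog level to the nearest integer code."""
    return int(round(level))


def per_column_adcs(column_levels):
    """Fig. 9 style: one conversion per column, summation done digitally."""
    codes = [ideal_adc(v) for v in column_levels]
    return sum(codes), len(codes)        # (digital total, converters used)


def shared_adc(column_levels):
    """Fig. 10 style: column signals merged in the analog domain, converted once."""
    merged = sum(column_levels)          # stands in for collecting the capacitor signals
    return ideal_adc(merged), 1


levels = [3.0, 1.0, 4.0, 2.0]            # hypothetical noise-free column results
print(per_column_adcs(levels))  # (10, 4)
print(shared_adc(levels))       # (10, 1)
```

Under these idealised assumptions both structures give the same digital total, while the shared structure needs one converter instead of m. A real design would also have to consider the larger input range seen by the shared converter.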
Optionally, as shown in fig. 11, the integrated circuit may further include a data input driving interface, a row input driving circuit, a control circuit, a column output driving and processing circuit, and a data processing driving interface in addition to those described above.
The data input interface and the data output interface are used for digital communication between the storage and computation integrated circuit and other modules of the chip system.
Row input drive circuit: provides the storage and calculation array with the n groups of row-direction input signals required for the operation, and controls parameters of these signals, such as the voltage or current amplitude and the signal pulse width, according to the input data.
Column output driving and processing circuit: provides the storage and calculation array with the column-direction driving signals required for the operation, and further processes the column-direction calculation output signals of the array to obtain the complete multiply-accumulate result.
Control circuit: includes a clock and the like, and controls the timing of the functional modules described above.
In an embodiment of the present application, a chip system is further provided, where the chip system includes a storage and computation integrated circuit and a peripheral circuit, where the storage and computation integrated circuit is connected to the peripheral circuit, and the storage and computation integrated circuit is a storage and computation integrated circuit of any one of the foregoing optional structures.
In an embodiment of the present application, an electronic device is further provided, where the electronic device includes the storage and computation integrated circuit with any one of the optional structures described above.
The electronic device may be a terminal device, a server, or a virtual machine.
The above description is only a specific embodiment of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto.

Claims (15)

1. A storage and calculation integrated circuit, characterized by comprising a storage and calculation array and a data processing circuit, wherein the storage and calculation array comprises at least one storage and calculation sub-array, each storage and calculation sub-array comprises n × m storage units arranged in n rows and m columns, the data processing circuit comprises at least one multiply-accumulate (MAC) processing circuit, the at least one storage and calculation sub-array corresponds one-to-one to the at least one MAC processing circuit, and n and m are integers greater than 1;
each storage unit is connected with a group of row signal lines and a group of column signal lines, the group of row signal lines are connected with m storage units on the same row, the group of column signal lines are connected with n storage units on the same column, and a target MAC processing circuit corresponding to the storage sub-array where the n storage units are located;
each storage unit is configured to receive first real part bit data and first imaginary part bit data of a first complex number from a group of connected row signal lines, respectively, and perform an operation with second real part bit data of a second complex number and related data of target bit data to obtain a first product result and a second product result, where the second real part bit data and the target bit data are stored in the storage unit, the target bit data includes second imaginary part bit data or inverted data of the second imaginary part bit data, and the related data of the target bit data includes the second imaginary part bit data and inverted data of the second imaginary part bit data;
the target MAC processing circuit is configured to receive a plurality of the first product results and a plurality of the second product results from the m groups of connected column signal lines, accumulate the plurality of first product results to obtain first output data, and accumulate the plurality of second product results to obtain second output data.
2. The integrated circuit of claim 1, wherein each of the storage units comprises a first calculation unit, a second calculation unit, a third calculation unit, a fourth calculation unit, a first storage unit and a second storage unit;
the first storage unit is connected with the first calculation unit and the third calculation unit, and the second storage unit is connected with the second calculation unit and the fourth calculation unit;
at least one first row signal line in the group of row signal lines is connected with the first calculating unit and the fourth calculating unit, at least one second row signal line is connected with the second calculating unit and the third calculating unit, a first column signal line in the group of column signal lines is connected with the first calculating unit and the second calculating unit, and a second column signal line is connected with the third calculating unit and the fourth calculating unit;
the first storage unit is used for storing the target bit data, and the second storage unit is used for storing the second real bit data.
3. The memory integrated circuit of claim 2,
the first storage unit is configured to output the second imaginary part bit data to the first calculation unit, and output inverted data of the second imaginary part bit data to the third calculation unit;
the second storage unit is configured to output the second real part bit data to the second calculation unit and the fourth calculation unit;
the first calculation unit is configured to calculate a product of the first real-part bit data and the second imaginary-part bit data;
the second calculation unit is configured to calculate a product of the first imaginary bit data and the second real bit data;
the third calculating unit is used for calculating the product of the first imaginary part bit data and the inverted data of the second imaginary part bit data;
the fourth calculation unit is configured to calculate a product of the first real bit data and the second real bit data;
the first product result includes a product of the first real bit data and the second imaginary bit data, and a product of the first imaginary bit data and the second real bit data, and the second product result includes a product of the first real bit data and the second real bit data, and a product of the first imaginary bit data and an inverse of the second imaginary bit data.
4. The memory integrated circuit of claim 2 or 3, wherein the target MAC processing circuit comprises an imaginary processing circuit and a real processing circuit, the imaginary processing circuit is connected to m first column signal lines, the real processing circuit is connected to m second column signal lines;
the imaginary part processing circuit is used for receiving the first product result from the storage unit of each column through each first column signal line, accumulating a plurality of first product results from the same column to obtain a first column accumulated result, performing weighting operation on the first column accumulated results of different columns, and accumulating the first column accumulated results of different columns after the weighting operation to obtain the first output data;
the real part processing circuit is used for receiving the second product result from the storage unit of each column through each second column signal line, accumulating a plurality of second product results from the same column to obtain a second column accumulated result, performing weighting operation on second column accumulated results of different columns, and accumulating the second column accumulated results of different columns after the weighting operation to obtain the second output data.
5. The memory integrated circuit according to claim 4, wherein the data processing circuit further comprises a compensation circuit having one end connected to the n second row signal lines through the target column signal line and the other end connected to the real part processing circuit in each of the at least one MAC processing circuit;
the compensation circuit is used for receiving a plurality of first imaginary part bit data from the n second row signal lines, accumulating the plurality of first imaginary part bit data and transmitting the accumulated result of the plurality of first imaginary part bit data to the real part processing circuit;
the real part processing circuit is configured to perform a summation operation on the accumulated result of the plurality of first imaginary part bit data and the second output data to obtain third output data.
6. The integrated circuit of claim 4 or 5, wherein the m storage units located in the same row are configured to store bit data of different significant bits after binary conversion is performed on the real part and the imaginary part of the second complex number respectively;
the n storage units in the same column are used for storing the bit data of the same effective bit of different second complex numbers.
7. The memory and computation integrated circuit of claim 4 or 5, wherein the first real bit data of the first complex number comprises bit data of different significant bits after binary conversion of the real part of the first complex number;
the first imaginary part bit data of the first complex number comprises bit data of different effective bits after binary conversion of the imaginary part of the first complex number.
8. The memory integrated circuit of any of claims 2-7, wherein the first storage unit comprises a first transistor and a second transistor, and a first inverter and/or a second inverter;
the gate of the first transistor and the gate of the second transistor are each connected with a word line WL, and the word line is used for activating the first transistor and the second transistor;
the source of the first transistor is connected with the input end of the first inverter or the output end of the second inverter, and the drain of the first transistor is connected with a first bit line BL;
the source of the second transistor is connected with a second bit line BLB, and the drain of the second transistor is connected with the output end of the first inverter or the input end of the second inverter;
when the target bit data is the second imaginary bit data:
the first inverter is configured to convert the second imaginary part bit data stored between the source of the first transistor and the input terminal of the first inverter into inverted data of the second imaginary part bit data, and output the inverted data to the second transistor;
the first bit line and the second bit line are used for reading or writing the second imaginary part bit data from the first storage unit and reading or writing the second real part bit data from the second storage unit;
when the target bit data is inverse data of the second imaginary bit data:
the second inverter is configured to convert inverted data of the second imaginary part bit data stored between a drain of the second transistor and an input terminal of the second inverter into the second imaginary part bit data, and output the second imaginary part bit data to the first transistor;
the first bit line and the second bit line are used for reading or writing the second imaginary bit data from the first storage unit and reading or writing the second real bit data from the second storage unit.
9. The memory integrated circuit of any of claims 2-8,
the first calculation unit includes a third transistor and a fourth transistor, a gate of the third transistor is connected to the first storage unit, the second imaginary part bit data is received from the first storage unit, a source of the third transistor is grounded, a drain of the third transistor is connected to a source of the fourth transistor, a gate of the fourth transistor is connected to the first row signal line, and a drain of the fourth transistor is connected to the first column signal line;
the second calculation unit includes a fifth transistor and a sixth transistor, a gate of the fifth transistor is connected to the second storage unit, the second real part bit data is received from the second storage unit, a source of the fifth transistor is grounded, a drain of the fifth transistor is connected to a source of the sixth transistor, a gate of the sixth transistor is connected to the second row signal line, and a drain of the sixth transistor is connected to the first column signal line;
the third calculation unit includes a seventh transistor and an eighth transistor, a gate of the seventh transistor is connected to the first storage unit, inverted data of the second imaginary part bit data is received from the first storage unit, a source of the seventh transistor is grounded, a drain of the seventh transistor is connected to a source of the eighth transistor, a gate of the eighth transistor is connected to the second row signal line, and a drain of the eighth transistor is connected to the second column signal line;
the fourth calculation unit includes a ninth transistor and a tenth transistor, a gate of the ninth transistor is connected to the second storage unit, the second real part bit data is received from the second storage unit, a source of the ninth transistor is grounded, a drain of the ninth transistor is connected to a source of the tenth transistor, a gate of the tenth transistor is connected to the first row signal line, and a drain of the tenth transistor is connected to the second column signal line.
10. The memory integrated circuit according to any one of claims 2 to 9, wherein the target column signal line and the second row signal line are connected via an eleventh transistor, a gate of the eleventh transistor is connected to the second row signal line, a source of the eleventh transistor is grounded, and a drain of the eleventh transistor is connected to the target column signal line.
11. The memory integrated circuit according to any one of claims 4 to 7, wherein the imaginary part processing circuit or the real part processing circuit comprises a digital post-processing circuit and m analog-to-digital conversion circuits, the digital post-processing circuit being connected to the m analog-to-digital conversion circuits, each analog-to-digital conversion circuit being connected to one column signal line;
each analog-to-digital conversion circuit comprises a switch, a capacitor, a power supply and an analog-to-digital converter;
the fixed end of the switch is connected with one end of the capacitor, and the movable end of the switch is connected with the power supply or the corresponding column signal line;
the other end of the capacitor is connected with an analog signal input end of the analog-to-digital converter;
the digital signal output end of the analog-to-digital converter is connected with the digital post-processing circuit;
the analog-to-digital converter is used for converting the first product result or the second product result in an analog signal state into a digital signal and transmitting the digital signal to the digital post-processing circuit.
12. The memory integrated circuit according to any one of claims 4 to 7, wherein the imaginary part processing circuit or the real part processing circuit comprises a digital post-processing circuit, an analog-to-digital converter, and m capacitance circuits, the digital post-processing circuit being connected to the m capacitance circuits, each capacitance circuit being connected to one column signal line;
each capacitor circuit comprises a switch, a capacitor and a power supply;
the fixed end of the switch is connected with one end of the capacitor, and the movable end of the switch is connected with the power supply or the corresponding column signal line;
the other end of the capacitor is connected with an analog signal input end of the analog-to-digital converter;
the digital signal output end of the analog-to-digital converter is connected with the digital post-processing circuit;
the analog-to-digital converter is used for converting the accumulated result of the first product result or the accumulated result of the second product result, which is in an analog signal state, into a digital signal and transmitting the digital signal to the digital post-processing circuit.
13. The memory integrated circuit of claim 11 or 12, wherein the digital post-processing circuit in the real-part processing circuit further receives a compensated digital signal from an analog-to-digital conversion circuit of the compensation circuit.
14. A chip system, comprising: a memory integrated circuit according to any of claims 1 to 13.
15. An electronic device, characterized by comprising a memory integrated circuit according to any of the preceding claims 1-13.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111205493.9A CN115982092A (en) 2021-10-15 2021-10-15 Storage and calculation integrated circuit, chip system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111205493.9A CN115982092A (en) 2021-10-15 2021-10-15 Storage and calculation integrated circuit, chip system and electronic equipment

Publications (1)

Publication Number Publication Date
CN115982092A true CN115982092A (en) 2023-04-18

Family

ID=85962809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111205493.9A Pending CN115982092A (en) 2021-10-15 2021-10-15 Storage and calculation integrated circuit, chip system and electronic equipment

Country Status (1)

Country Link
CN (1) CN115982092A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821047A (en) * 2023-08-31 2023-09-29 北京犀灵视觉科技有限公司 Sensing and storing integrated circuit, system and method
CN116821047B (en) * 2023-08-31 2023-10-31 北京犀灵视觉科技有限公司 Sensing and storing integrated circuit, system and method

Similar Documents

Publication Publication Date Title
JP3260357B2 (en) Information processing device
CN112114776A (en) Quantum multiplication method and device, electronic device and storage medium
EP3985670A1 (en) Subunit, mac array, and analog and digital combined in-memory computing module having reconstructable bit width
WO2020230374A1 (en) Arithmetic operation device and arithmetic operation system
CN115982092A (en) Storage and calculation integrated circuit, chip system and electronic equipment
US20170168775A1 (en) Methods and Apparatuses for Performing Multiplication
US11469770B2 (en) Architecture for multiplier accumulator using unit elements for multiplication, bias, accumulation, and analog to digital conversion over a shared charge transfer bus
JPH06502265A (en) Calculation circuit device for matrix operations in signal processing
US12014151B2 (en) Scaleable analog multiplier-accumulator with shared result bus
US20220209788A1 (en) Differential Analog Multiplier-Accumulator
Bohrn et al. Field programmable neural array for feed-forward neural networks
CN116543808A (en) All-digital domain in-memory approximate calculation circuit based on SRAM unit
CN115658013B (en) ROM in-memory computing device of vector multiply adder and electronic equipment
CN113378109A (en) Mixed base fast Fourier transform calculation circuit based on memory calculation
CN115658012B (en) SRAM analog memory computing device of vector multiply adder and electronic equipment
CN110717580B (en) Calculation array based on voltage modulation and oriented to binarization neural network
US20220207247A1 (en) Unit Element for Asynchronous Analog Multiplier Accumulator
US20220206753A1 (en) Cascade Multiplier using Unit Element Analog Multiplier-Accumulator
US11983507B2 (en) Differential analog multiplier for a signed binary input
US12026479B2 (en) Differential unit element for multiply-accumulate operations on a shared charge transfer bus
US11476866B2 (en) Successive approximation register using switched unit elements
US20220244915A1 (en) Layout Structure for Shared Analog Bus in Unit Element Multiplier
US20230043170A1 (en) Memory device for performing convolution operation
WO2023084299A1 (en) Hybrid matrix multiplier
CN115906735A (en) Multi-bit-number storage and calculation integrated circuit based on analog signals, chip and calculation device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination