US10839894B2 - Memory computation circuit and method - Google Patents

Memory computation circuit and method

Info

Publication number
US10839894B2
US10839894B2 (application US16/405,822)
Authority
US
United States
Prior art keywords
circuit
memory
data
segment
memory cells
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/405,822
Other versions
US20200005859A1
Inventor
Yen-Huei Chen
Hidehiro Fujiwara
Hung-jen Liao
Jonathan Tsung-Yung Chang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiwan Semiconductor Manufacturing Co TSMC Ltd
Original Assignee
Taiwan Semiconductor Manufacturing Co TSMC Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US16/405,822 (published as US10839894B2)
Application filed by Taiwan Semiconductor Manufacturing Co TSMC Ltd filed Critical Taiwan Semiconductor Manufacturing Co TSMC Ltd
Priority to TW108121134A (published as TW202001884A)
Priority to CN201910538988.XA (published as CN110660417A)
Assigned to TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHANG, JONATHAN TSUNG-YUNG; LIAO, HUNG-JEN; CHEN, YEN-HUEI; FUJIWARA, HIDEHIRO
Publication of US20200005859A1
Priority to US17/077,401 (published as US11398275B2)
Publication of US10839894B2
Application granted
Priority to US17/808,536 (published as US11830543B2)
Priority to US18/448,039 (published as US20230395143A1)

Classifications

    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/08: Learning methods
    • G11C 11/412: Static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, using field-effect transistors only
    • G11C 11/419: Read-write [R-W] circuits
    • G11C 11/54: Digital stores using elements simulating biological cells, e.g. neuron
    • G11C 7/1006: Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor
    • G11C 11/418: Address circuits

Definitions

  • Memory arrays are often used to store and access data used for various types of computations such as logic or mathematical operations. To perform these operations, data bits are moved between the memory arrays and circuits used to perform the computations. In some cases, computations include multiple layers of operations, and the results of a first operation are used as input data in a second operation.
  • FIG. 1 is a diagram of a memory circuit, in accordance with some embodiments.
  • FIG. 2A is a diagram of a system, in accordance with some embodiments.
  • FIG. 2B is a diagram of a network circuit, in accordance with some embodiments.
  • FIG. 2C is a diagram of a neural network circuit, in accordance with some embodiments.
  • FIG. 3 is a diagram of a memory circuit, in accordance with some embodiments.
  • FIG. 4 is a diagram of a memory cell circuit, in accordance with some embodiments.
  • FIG. 5 is a plot of memory circuit operating parameters, in accordance with some embodiments.
  • FIG. 6 is a flowchart of a method of performing an in-memory computation, in accordance with some embodiments.
  • In some embodiments, the first and second features are formed in direct contact; in other embodiments, additional features may be formed between the first and second features, such that the first and second features may not be in direct contact.
  • present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
  • spatially relative terms such as “beneath,” “below,” “lower,” “above,” “upper” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures.
  • the spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures.
  • the apparatus may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein may likewise be interpreted accordingly.
  • In various embodiments, a circuit includes a memory array positioned between a write circuit and a read circuit.
  • the write circuit stores data in the memory array based on data received at an input port, and the read circuit retrieves stored data for a computation circuit that outputs result data to an output port.
  • the circuit is capable of reducing data movement compared to approaches that do not perform such in-memory computations, particularly in cases in which the circuit is used in one or more layers of a network circuit such as a neural network.
  • the circuit performs in-memory computations by operating at least one segment of the memory array separately from at least one other segment of the memory array, and is further capable of reducing data movement compared to approaches in which a circuit performs computations based on multiple memory arrays that do not operate segments separately.
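  • For illustration only, the following minimal Python sketch models the behavior described above: a memory array split into two separately operated segments, a write path that stores data IN, and a computation performed between the read path and the output port. The class and method names are hypothetical and are not taken from the patent; this is a behavioral sketch, not the disclosed hardware.

```python
# Hypothetical behavioral sketch (not the patented circuit): a memory circuit whose
# array has two segments that can be written and read separately, with a simple
# in-memory computation producing the output instead of the raw stored bits.

class SegmentedMemoryCircuit:
    def __init__(self, rows, cols_a, cols_b):
        # Segment A and segment B are modeled as independent bit matrices.
        self.seg_a = [[0] * cols_a for _ in range(rows)]
        self.seg_b = [[0] * cols_b for _ in range(rows)]

    def write(self, row, bits_a=None, bits_b=None):
        # The write path may target one segment without disturbing the other.
        if bits_a is not None:
            self.seg_a[row] = list(bits_a)
        if bits_b is not None:
            self.seg_b[row] = list(bits_b)

    def compute(self, row):
        # The read path retrieves both segments of a row; the computation stage
        # combines them (here, bitwise AND followed by a sum) to form data OUT.
        return sum(x & w for x, w in zip(self.seg_a[row], self.seg_b[row]))

circuit = SegmentedMemoryCircuit(rows=4, cols_a=8, cols_b=8)
circuit.write(0, bits_a=[1, 0, 1, 1, 0, 0, 1, 0], bits_b=[1, 1, 0, 1, 0, 1, 1, 0])
print(circuit.compute(0))  # data OUT derived in-memory from the stored data, prints 3
```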
  • FIG. 1 is a diagram of a memory circuit 100 , in accordance with some embodiments.
  • Memory circuit 100 includes a memory array 110 , a row decode circuit 120 , a write circuit 130 , a write control circuit 140 , a read circuit 150 , a read control circuit 160 , a computation circuit 170 , and a control circuit 180 .
  • Memory array 110 is positioned between and coupled with each one of write circuit 130 and read circuit 150 .
  • Read circuit 150 is positioned between and coupled with each one of memory array 110 and computation circuit 170 .
  • Write control circuit 140 is adjacent to and coupled with write circuit 130 ; row decode circuit 120 is adjacent to and coupled with memory array 110 ; and read control circuit 160 is adjacent to and coupled with read circuit 150 .
  • In the embodiment depicted in FIG. 1, both write circuit 130 and write control circuit 140 are positioned at the top of memory array 110, and read circuit 150, read control circuit 160, and computation circuit 170 are positioned at the bottom of memory array 110.
  • In some embodiments, both write circuit 130 and write control circuit 140 are positioned at the bottom of memory array 110, and read circuit 150, read control circuit 160, and computation circuit 170 are positioned at the top of memory array 110.
  • Row decode circuit 120 is positioned between and coupled with each one of write control circuit 140 and read control circuit 160 .
  • Control circuit 180 is coupled with each one of write control circuit 140 , row decode circuit 120 , read control circuit 160 , and computation circuit 170 . In some embodiments, control circuit 180 is not coupled with one or more of write control circuit 140 , row decode circuit 120 , read control circuit 160 , or computation circuit 170 .
  • Two or more circuit elements are considered to be coupled based on one or more direct signal connections and/or one or more indirect signal connections that include one or more logic devices, e.g., an inverter or logic gate, between the two or more circuit elements.
  • signal communications between the two or more coupled circuit elements are capable of being modified, e.g., inverted or made conditional, by the one or more logic devices.
  • control circuit 180 is adjacent to each one of write control circuit 140 , row decode circuit 120 , read control circuit 160 , and computation circuit 170 .
  • control circuit 180 is positioned apart from one or more of write control circuit 140 , row decode circuit 120 , read control circuit 160 , or computation circuit 170 , and/or control circuit 180 includes one or more of write control circuit 140 , row decode circuit 120 , read control circuit 160 , or computation circuit 170 .
  • memory circuit 100 does not include control circuit 180 , and one or more of row decode circuit 120 , write control circuit 140 , read control circuit 160 , or computation circuit 170 is configured to receive one or more control signals (not shown) from one or more circuits, e.g., a processor 210 discussed below with respect to FIG. 2A , external to memory circuit 100 .
  • In embodiments including control circuit 180, one or more of row decode circuit 120, write control circuit 140, read control circuit 160, or computation circuit 170 is configured to receive one or more control signals (not shown) from one or more circuits, e.g., a processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100.
  • Memory array 110 is an array of memory cells 112 arranged in rows and columns.
  • memory array 110 includes a segment 110 A including one or more columns of memory cells 112 , and a segment 110 B including one or more columns of memory cells 112 .
  • memory array 110 includes a single segment, or greater than two segments, each segment including one or more columns of memory cells 112 .
  • memory array 110 includes one or more of memory array segments 310 X[ 1 ] . . . 310 X[N] and/or 310 W[ 1 ] . . . 310 W[N], discussed below with respect to FIG. 3 .
  • memory circuit 100 is configured to operate at least one segment separately from at least one other segment, as discussed below.
  • a memory cell 112 of memory array 110 includes electrical, electromechanical, electromagnetic, or other devices configured to store bit data represented by logical states.
  • Each column of a number C columns of memory cells 112 is coupled with a corresponding bit line of bit lines 114 [ 1 ] . . . 114 [C] through which the logical states are programmed in a write operation and detected in a read operation.
  • Each row of a number R rows of memory cells 112 is coupled with a corresponding word line of word lines 116 [ 1 ] . . . 116 [R] through which the memory cell 112 is selected in the read and write operations.
  • a logical state corresponds to a voltage level of an electrical charge stored in a given memory cell 112 . In some embodiments, a logical state corresponds to a physical property, e.g., a resistance or magnetic orientation, of a component of a given memory cell 112 .
  • memory cells 112 include static random-access memory (SRAM) cells.
  • SRAM cells include five-transistor (5T) SRAM cells, six-transistor (6T) SRAM cells, eight-transistor (8T) SRAM cells, nine-transistor (9T) SRAM cells, or SRAM cells having other numbers of transistors.
  • memory cells 112 include dynamic random-access memory (DRAM) cells or other memory cell types capable of storing bit data.
  • memory cells 112 include memory cells 412 X and 412 W, discussed below with respect to FIG. 4 .
  • Row decode circuit 120 is an electronic circuit configured to generate one or more word line signals (not labeled) on word lines 116 [ 1 ] . . . 116 [R] based on one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A , external to memory circuit 100 .
  • the one or more word line signals are capable of causing one or more memory cells 112 to become activated during read and write operations, thereby selecting the one or more memory cells 112 during a read or write operation.
  • row decode circuit 120 is configured to select an entirety of a given row of memory cells 112 during a read or write operation. In some embodiments, during a read or write operation, row decode circuit 120 is configured to select one or more subsets of a given row of memory cells 112 by generating one or more subsets of word line signals on one or more subsets of word lines 116 [ 1 ] . . . 116 [R], memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110 A, of memory array 110 separately from at least one other segment, e.g., segment 110 B, of memory array 110 .
  • row decode circuit 120 includes row decode circuit 320 configured to generate one or more of word line signals WX[ 1 ] . . . WX[M] on word lines 316 X[ 1 ] . . . 316 X[M] or word line signals WW[ 1 ] . . . WW[M] on word lines 316 W[ 1 ] . . . 316 W[M], discussed below with respect to FIG. 3 .
  • Write circuit 130 is an electronic circuit configured to generate voltage levels corresponding to logical states on bit lines 114 [ 1 ] . . . 114 [C] during a write operation, the one or more memory cells 112 selected during the write operation thereby being programmed to logical states based on the voltage levels on bit lines 114 [ 1 ] . . . 114 [C].
  • each memory cell 112 is coupled with a single one of bit lines 114 [ 1 ] . . . 114 [C]
  • write circuit 130 is configured to output a single voltage level on the single one of bit lines 114 [ 1 ] . . . 114 [C] corresponding to a given memory cell 112 .
  • each memory cell 112 is coupled with a pair of bit lines of bit lines 114 [ 1 ] . . . 114 [C], and write circuit 130 is configured to output complementary voltage levels on the pair of bit lines of bit lines 114 [ 1 ] . . . 114 [C] corresponding to a given memory cell 112 .
  • Write circuit 130 is configured to generate the voltage levels based on data IN received at an input port 100 -I, and on one or more control signals (not shown) received from write control circuit 140 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A , external to memory circuit 100 .
  • A port, e.g., input port 100 -I, includes one or more electrical connections configured to carry signals such as data IN.
  • Data IN includes a plurality of voltage levels, each voltage level being carried on one or more electrical connections of input port 100 -I and corresponding to a logical state of a data bit of data IN.
  • write circuit 130 is configured to generate the one or more voltage levels for an entirety of the columns of memory cells 112 during a write operation. In some embodiments, during a write operation, write circuit 130 is configured to write to one or more subsets of the columns of memory cells 112 , memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110 A, of memory array 110 separately from at least one other segment, e.g., segment 110 B, of memory array 110 .
  • memory circuit 100 is configured so that write circuit 130 writes to the one or more subsets of the columns of memory cells 112 based on the one or more subsets of the columns of memory cells 112 being activated by row decode circuit 120 during a write operation as discussed above.
  • write circuit 130 is configured to write to one or more subsets of the columns of memory cells 112 by masking one or more portions of data IN during a write operation.
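  • For illustration only, a minimal sketch of a masked write as described above: only the unmasked columns of a row are updated, so a subset of columns (one segment) can be written while the rest is left unchanged. The function and variable names are hypothetical.

```python
# Hypothetical sketch of a masked write: positions where the mask is 0 keep their
# previously stored value, so only a subset of columns receives data IN.

def masked_write(stored_row, data_in, write_mask):
    return [d if m else old for old, d, m in zip(stored_row, data_in, write_mask)]

stored  = [0, 0, 0, 0, 1, 1, 1, 1]
data_in = [1, 1, 1, 1, 0, 0, 0, 0]
mask    = [1, 1, 1, 1, 0, 0, 0, 0]   # write only the first four columns
print(masked_write(stored, data_in, mask))  # [1, 1, 1, 1, 1, 1, 1, 1]
```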
  • Write control circuit 140 is an electronic circuit configured to generate and output the one or more control signals to write circuit 130 based on one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A , external to memory circuit 100 .
  • Read circuit 150 is an electronic circuit configured to receive voltage signals (not labeled) on one or more of bit lines 114 [ 1 ] . . . 114 [C] during a read operation, the voltage signals being based on the logical states of the one or more memory cells 112 selected during the read operation.
  • Read circuit 150 is configured to determine the logical states of the one or more memory cells 112 selected during the read operation based on the voltage signals on the one or more of bit lines 114 [ 1 ] . . . 114 [C].
  • read circuit 150 includes one or more sense amplifiers, e.g., sense amplifier SA discussed below with respect to FIG. 3 , configured to determine the logical states of the one or more memory cells 112 .
  • each memory cell 112 is coupled with a single bit line of bit lines 114 [ 1 ] . . . 114 [C], and read circuit 150 is configured to determine the logical state of a given memory cell 112 based on the voltage signal on the single bit line of bit lines 114 [ 1 ] . . . 114 [C] corresponding to the given memory cell 112 .
  • each memory cell 112 is coupled with a pair of bit lines of bit lines 114 [ 1 ] . . . 114 [C]
  • read circuit 150 is configured to determine the logical state of a given memory cell 112 based on the voltage signals on the pair of bit lines of bit lines 114 [ 1 ] . . . 114 [C] corresponding to the given memory cell 112 .
  • Read circuit 150 is configured to generate one or more data signals (not shown) based on the determined logical states of memory cells 112 , and on one or more control signals (not shown) received from read control circuit 160 .
  • read circuit 150 is configured to generate the one or more data signals based on an entirety of the columns of memory cells 112 during a read operation. In some embodiments, during a read operation, read circuit 150 is configured to generate one or more data signals based on one or more subsets of the columns of memory cells 112 , memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110 A, of memory array 110 separately from at least one other segment, e.g., segment 110 B, of memory array 110 . In some embodiments, read circuit 150 is configured to generate one or more data signals based on one or more subsets of the columns of memory cells 112 by masking one or more voltage signals on bit lines 114 [ 1 ] . . . 114 [C] during a read operation.
  • memory circuit 100 is configured so that read circuit 150 generates one or more data signals based on the one or more subsets of the columns of memory cells 112 being activated by row decode circuit 120 during a read operation as discussed above.
  • read circuit 150 includes read circuit 350 configured to generate data signals X[ 1 ] . . . X[N] and W[ 1 ] . . . W[N], discussed below with respect to FIG. 3 .
  • Read control circuit 160 is an electronic circuit configured to generate and output the one or more control signals to read circuit 150 based on one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A , external to memory circuit 100 .
  • Computation circuit 170 is an electronic circuit configured to receive the one or more data signals from read circuit 150 , and perform one or more logical and/or mathematical operations based on the one or more data signals and one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A , external to memory circuit 100 .
  • memory circuit 100 is configured so that one or more logical and/or mathematical operations performed by computation circuit 170 are coordinated with one or more operations performed by read circuit 150 , memory circuit 100 thereby being configured to perform an in-memory computation. In some embodiments, memory circuit 100 is configured so that computation circuit 170 performs one or more logical and/or mathematical operations in a sequence coordinated with a sequence by which read circuit 150 determines logical states of memory cells 112 . In some embodiments, memory circuit 100 is configured so that read circuit 150 and computation circuit 170 operations are coordinated to perform a matrix computation as discussed below with respect to the non-limiting examples of FIGS. 2C and 5 .
  • computation circuit 170 is configured to perform the one or more logical functions based on performing a first operation on a first subset of the one or more data signals and performing a second operation on a second subset of the one or more data signals, memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110 A, of memory array 110 separately from at least one other segment, e.g., segment 110 B, of memory array 110 .
  • computation circuit 170 is configured to perform a matrix computation using the first subset of the one or more data signals as input data and the second subset of the one or more data signals as weight data.
  • computation circuit 170 includes a multiplier-accumulator configured to perform a multiply-accumulate operation.
  • computation circuit 170 includes operation circuit 370 A and addition circuit 370 B, discussed below with respect to FIG. 3 .
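  • For illustration only, a minimal sketch of the multiply-accumulate behavior attributed to computation circuit 170: data signals read from one segment are treated as input data, data signals read from another segment as weight data, and the products are summed into a single result. This is a software analogy with hypothetical names, not the disclosed multiplier-accumulator hardware.

```python
# Hypothetical sketch of a multiply-accumulate (MAC) operation over data signals
# read from two separately operated segments of a memory array.

def multiply_accumulate(x_signals, w_signals):
    acc = 0
    for x, w in zip(x_signals, w_signals):   # x: input data, w: weight data
        acc += x * w
    return acc

print(multiply_accumulate([1, 0, 1, 1], [3, 5, 2, 7]))  # 1*3 + 0*5 + 1*2 + 1*7 = 12
```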
  • Computation circuit 170 is configured to output data OUT on an output port 100 -O.
  • Data OUT includes a plurality of voltage levels, each voltage level being carried on one or more electrical connections of output port 100 -O.
  • data OUT includes a same, greater, or lesser number of voltage levels as a number of voltage levels included in data IN.
  • the plurality of voltage levels of data OUT are based on one or more results of the one or more logical and/or mathematical operations. In some embodiments, one or more voltage levels are based on one or more results of a logical or mathematical operation performed by computation circuit 170 on two or more data bits stored in memory array 110 and retrieved by read circuit 150 . In various embodiments, memory circuit 100 is configured to generate data OUT including none, one or more, or all of the plurality of voltage levels of data OUT representing a logical state of a memory cell 112 in memory array 110 .
  • In the embodiment depicted in FIG. 1, memory array 110 is positioned between input port 100 -I at the top of memory circuit 100 and output port 100 -O at the bottom of memory circuit 100.
  • In some embodiments, memory array 110 is positioned between input port 100 -I at the bottom of memory circuit 100 and output port 100 -O at the top of memory circuit 100.
  • memory array 110 is positioned between input port 100 -I and output port 100 -O based on one or both of input port 100 -I or output port 100 -O being positioned at a side or sides of memory circuit 100 .
  • In operation, memory circuit 100 is capable of receiving data IN at input port 100 -I, storing logical states based on data IN, performing one or more logical functions based on the stored logical states, and generating data OUT at output port 100 -O.
  • Memory circuit 100 is thereby configured to perform an in-memory computation in which data flows in the direction determined by the positioning of input port 100 -I and output port 100 -O.
  • By including separately positioned input and output ports and in-memory computation, memory circuit 100 is capable of being included in circuits in which data movement distances are reduced compared to approaches in which a memory circuit does not include one or both of separately positioned input and output ports or in-memory computation. By reducing data movement distances, memory circuit 100 enables reduced power and simplified circuit configurations by reducing parasitic capacitances associated with data bus lengths and/or numbers of data buffers compared to approaches in which a memory circuit does not include one or both of separately positioned input and output ports or in-memory computation.
  • memory circuit 100 is further capable of reducing data movement distances compared to approaches in which a memory circuit includes multiple memory arrays that do not include in-memory computation or segmented arrays.
  • FIG. 2A is a diagram of a system 200 A, in accordance with some embodiments.
  • System 200 A includes memory circuit 100 , discussed above with respect to FIG. 1 , and a processor 210 .
  • Processor 210 is an electronic circuit configured to perform one or more logic operations and is coupled with memory circuit 100 through a data bus BUS.
  • System 200 A is an electronic or electromechanical system configured to perform one or more predetermined functions based on the one or more logic operations performed by processor 210 and on data and in-memory computation operations performed by memory circuit 100 including computation circuit 170 , as discussed above with respect to FIG. 1 .
  • system 200 A is configured to perform one or more functions, e.g., a feed-forward or multiply-accumulate function, of a neural network.
  • system 200 A includes one or more circuits (not shown) in addition to memory circuit 100 and processor 210 .
  • system 200 A includes a network circuit, e.g., network circuit 200 B discussed below with respect to FIG. 2B , that includes a plurality of memory circuits 100 .
  • Data bus BUS is a plurality of electrical connections configured to conduct one or more signals between memory circuit 100 and processor 210 .
  • Data bus BUS is coupled with input port 100 -I and output port 100 -O of memory circuit 100 and is thereby configured to conduct one or both of data IN from processor 210 to memory circuit 100 or data OUT from memory circuit 100 to processor 210 .
  • data bus BUS is further coupled with memory circuit 100 and is thereby configured to conduct one or more control or other signals (not shown) between memory circuit 100 and processor 210 .
  • system 200 A including memory circuit 100 is capable of realizing the benefits discussed above with respect to memory circuit 100 .
  • FIG. 2B is a diagram of a network circuit 200 B, in accordance with some embodiments.
  • Network circuit 200 B includes multiple layers of memory circuits 100 , discussed above with respect to FIG. 1 .
  • Network circuit 200 B includes a number L of layers of memory circuits 100 labeled 100 - 1 through 100 -L, the layers including respective input ports 100 - 1 -I through 100 -L-I and output ports 100 - 1 -O through 100 -L-O.
  • Input port 100 - 1 -I is an input port of network circuit 200 B
  • output port 100 -L-O is an output port of network circuit 200 B.
  • Output port 100 - 1 -O is coupled with input port 100 - 2 -I, and output port 100 - 2 -O is coupled with the input port of the adjacent layer (not shown), the pattern being repeated through input port 100 -L-I such that data paths from input port 100 - 1 -I to output port 100 -L-O include each one of memory circuits 100 - 1 through 100 -L.
  • memory circuit 100 - 1 receives data IN- 1 at input port 100 - 1 -I and outputs data OUT- 1 on output port 100 - 1 -O
  • memory circuit 100 - 2 receives data OUT- 1 as data IN- 2 at input port 100 - 2 -I and outputs data OUT- 2 on output port 100 - 2 -O, the pattern being repeated such that data flows from input port 100 - 1 -I to output port 100 -L-O through each one of memory circuits 100 - 1 through 100 -L.
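  • For illustration only, a minimal sketch of the layered data flow described above: the data OUT of each memory circuit layer is presented as the data IN of the next layer. Each layer is represented by a stand-in callable; the names and the trivial per-layer operations are hypothetical.

```python
# Hypothetical sketch of data flowing through a chain of in-memory computation
# layers, where each layer's output becomes the next layer's input.

def run_layers(data_in, layers):
    data = data_in
    for layer in layers:        # OUT of this layer becomes IN of the next
        data = layer(data)
    return data

# Three illustrative stand-ins for per-layer in-memory computations.
layer_1 = lambda bits: [b ^ 1 for b in bits]   # invert each bit
layer_2 = lambda bits: bits[::-1]              # reverse the bit order
layer_3 = lambda bits: [sum(bits)]             # reduce to a single value

print(run_layers([1, 0, 1, 1], [layer_1, layer_2, layer_3]))  # [1]
```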
  • In the embodiment depicted in FIG. 2B, network circuit 200 B includes the number L of layers of memory circuits 100 equal to three. In various embodiments, network circuit 200 B includes the number L of layers of memory circuits 100 fewer or greater than three.
  • In the embodiment depicted in FIG. 2B, input ports 100 - 1 -I through 100 -L-I are positioned at the tops of respective memory circuits 100 - 1 through 100 -L, and output ports 100 - 1 -O through 100 -L-O are positioned at the bottoms of respective memory circuits 100 - 1 through 100 -L, so that, in operation, data flows from input port 100 - 1 -I at the top of network circuit 200 B to output port 100 -L-O at the bottom of network circuit 200 B.
  • In some embodiments, input ports 100 - 1 -I through 100 -L-I are positioned at the bottoms of respective memory circuits 100 - 1 through 100 -L, and output ports 100 - 1 -O through 100 -L-O are positioned at the tops of respective memory circuits 100 - 1 through 100 -L, so that, in operation, data flows from input port 100 - 1 -I at the bottom of network circuit 200 B to output port 100 -L-O at the top of network circuit 200 B.
  • one or more subsets of input ports 100 - 1 -I through 100 -L-I and/or one or more subsets of output ports 100 - 1 -O through 100 -L-O are positioned on respective memory circuits 100 - 1 through 100 -L at one or more locations other than those depicted in FIG. 2B so that, in operation, data flows in more than one direction within network circuit 200 B.
  • network circuit 200 B includes memory circuits 100 - 1 through 100 -L arranged in multiple rows and/or columns so that, in operation, data flows in a multi-directional pattern, e.g., a serpentine pattern, within network circuit 200 B.
  • In various embodiments, the input and output ports of each layer of network circuit 200 B have a same number of electrical connections, or at least one pair of input and output ports of adjacent layers of network circuit 200 B has one or more numbers of electrical connections different from one or more numbers of electrical connections of one or more other pairs of input and output ports of adjacent layers of network circuit 200 B.
  • the memory circuits 100 of each layer of network circuit 200 B are configured to output and receive data having a same number of data bits, or at least one pair of memory circuits 100 of adjacent layers of network circuit 200 B is configured to output and receive data having a number of data bits different from a number of data bits of data output and received by one or more other pairs of memory circuits 100 of adjacent layers of network circuit 200 B.
  • the data output on an output port of a memory circuit 100 of a given layer of network circuit 200 B is the same data as the data received at the input port of the memory circuit of the corresponding adjacent layer of network circuit 200 B.
  • one or more of the data output from a given layer is a subset or a superset of the data received at the corresponding adjacent layer
  • the data output from a given layer includes data received by a circuit, e.g., processor 210 discussed above with respect to FIG. 2A , other than the corresponding adjacent layer
  • the data received at the corresponding adjacent layer includes data output from a circuit, e.g., processor 210 discussed above with respect to FIG. 2A , other than the given layer.
  • each one of memory circuits 100 - 1 through 100 -L includes computation circuit 170 , discussed above with respect to FIG. 1
  • network circuit 200 B includes memory circuits 100 - 1 through 100 -L configured as discussed above
  • network circuit 200 B is configured to perform a series of computations in which the computational results of each one of memory circuits 100 - 1 through 100 -(L−1) are included in one or more computations performed by each one of corresponding memory circuits 100 - 2 through 100 -L.
  • Network circuit 200 B is thereby configured to perform a layered computational operation based on data received at input port 100 - 1 -I and to output the results of the layered computational operation on output port 100 -L-O.
  • network circuit 200 B includes at least one memory circuit 100 configured to operate at least one segment of memory array 110 separately from at least one other segment of memory array 110 .
  • network circuit 200 B includes at least one memory circuit 100 including computation circuit 170 configured to perform a matrix computation using data stored in segment 110 A of memory array 110 as input data and data stored in segment 110 B of memory array 110 as weight data.
  • By the configuration discussed above, data movement distances in network circuit 200 B are reduced compared to approaches in which a network circuit does not include memory circuits 100 such that data flows in a given direction and in which in-memory computation is performed within the data flow.
  • network circuit 200 B enables reduced power and simplified circuit configurations compared to approaches in which a network circuit does not include memory circuits that include one or both of separately positioned input and output ports or in-memory computation, as discussed above with respect to memory circuit 100 .
  • network circuit 200 B includes at least one memory circuit 100 configured to perform in-memory computation by operating at least one segment, e.g., segment 110 A, of memory array 110 separately from at least one other segment, e.g., segment 110 B, of memory array 110
  • network circuit 200 B is further capable of reducing data movement distances compared to approaches in which a network circuit includes multiple memory arrays that do not include in-memory computation or segmented arrays.
  • FIG. 2C is a diagram of neural network circuit 200 C, in accordance with some embodiments.
  • Neural network circuit 200 C is a non-limiting example of network circuit 200 B, discussed above with respect to FIG. 2B , in which L−1 layers of memory circuits 100 are configured as hidden layers of a deep learning neural network.
  • Neural network circuit 200 C includes memory circuits 100 - 1 through 100 -L, discussed above with respect to FIG. 2B , and an input layer 200 I coupled with input port 100 - 1 -I of memory circuit 100 - 1 .
  • Input layer 200 I includes an input port 200 I-I of neural network circuit 200 C, and memory circuit 100 -L is configured as an output layer of neural network circuit 200 C by including output port 100 -L-O configured as an output port of neural network circuit 200 C.
  • each of memory circuits 100 - 1 through 100 -L includes segments 110 A and 110 B, and computation circuit 170 configured to perform one or more matrix computations on data signals based on segments 110 A and 110 B, as discussed above with respect to FIG. 1 .
  • the one or more matrix computations are represented in FIG. 2C as intersecting line segments in each instance of computation circuit 170 .
  • the instances of computation circuit 170 are configured to perform a same one or more matrix computations on a same portion or all of the data signals based on segments 110 A and 110 B. In various embodiments, the instances of computation circuit 170 are configured so that at least one instance of computation circuit 170 is configured to perform one or more matrix computations different from one or more matrix computations performed based on a configuration of at least one other instance of computation circuit 170 .
  • the instances of computation circuit 170 are configured so that at least one instance of computation circuit 170 is configured to perform one or more matrix computations on a portion or all of the data signals different from a portion or all of the data signals on which one or more matrix computations are performed based on a configuration of at least one other instance of computation circuit 170 .
  • Input layer 200 I is an electronic circuit configured to receive one or more data and/or control signals and, responsive to the one or more data and/or control signals, output data IN- 1 to input port 100 - 1 -I.
  • Data IN- 1 includes a number M1 of input data bits X1-XM1 and a number N1 of weight data bits W1-WN1.
  • Memory circuit 100 - 1 is configured to store bit data corresponding to input data bits X1-XM1 in segment 110 A and bit data corresponding to weight data bits W1-WN1 in segment 110 B, perform the one or more matrix computations by combining the data stored in segment 110 A with the data stored in segment 110 B, and output data OUT- 1 to output port 100 - 1 -O.
  • Data OUT- 1 includes a number M2 of input data bits X1-XM2 and a number N2 of weight data bits W1-WN2.
  • Memory circuit 100 - 2 is configured to receive data OUT- 1 as data IN- 2 at input port 100 - 2 -I, store bit data corresponding to input data bits X1-XM2 in segment 110 A and bit data corresponding to weight data bits W1-WN2 in segment 110 B, perform the one or more matrix computations by combining the data stored in segment 110 A with the data stored in segment 110 B, and output data OUT- 2 to output port 100 - 2 -O.
  • Data OUT- 2 includes a number M3 of input data bits X1-XM3 and a number N3 of weight data bits W1-WN3.
  • Memory circuit 100 -L is configured to receive data IN-L at input port 100 -L-I, store bit data corresponding to input data bits X1-XML in segment 110 A and bit data corresponding to weight data bits W1-WNL in segment 110 B, perform the one or more matrix computations by combining the data stored in segment 110 A with the data stored in segment 110 B, and output data OUT-L to output port 100 -L-O.
  • Data OUT-L includes a number K of data bits Y1-YK.
  • In some embodiments, numbers M1-M(L−1) are a same number of input data bits and numbers N1-N(L−1) are a same number of weight data bits.
  • In various embodiments, at least one of numbers M1-M(L−1) is different from at least one other of numbers M1-M(L−1) and/or at least one of numbers N1-N(L−1) is different from at least one other of numbers N1-N(L−1).
  • In various embodiments, the number K of data bits Y1-YK is the same as or different from at least one of numbers M1-M(L−1) and/or numbers N1-N(L−1).
  • memory circuit 100 - 1 performs a matrix computation on input data bits X1-XM1 and weight data bits W1-WN1 to generate input data bits X1-XM2 and weight data bits W1-WN2
  • memory circuit 100 - 2 performs a matrix computation on input data bits X1-XM2 and weight data bits W1-WN2 to generate input data bits X1-XM3 and weight data bits W1-WN3, the pattern being repeated such that data flows from input port 100 - 1 -I to output port 100 -L-O through each one of memory circuits 100 - 1 through 100 -L.
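  • For illustration only, a minimal sketch of one hidden layer of the arrangement described above: the layer holds input data X in one segment and weight data W in the other, combines them with a matrix (multiply-accumulate) computation, and forwards the result as the next layer's input data together with that layer's weight data. The packing of X and W, and all names, are hypothetical.

```python
# Hypothetical sketch of a hidden layer: y[j] is the dot product of the stored
# input data with the j-th column of the stored weight data.

def hidden_layer(x_bits, w_columns, next_weights):
    y = [sum(x * w for x, w in zip(x_bits, column)) for column in w_columns]
    return y, next_weights          # (input data X and weight data W for the next layer)

x1 = [1, 0, 1]                      # input data bits X1..XM1
w1 = [[1, 2, 3], [0, 1, 0]]         # weight data W1..WN1 arranged as two columns
x2, w2 = hidden_layer(x1, w1, next_weights=[[1, 1], [2, 0]])
print(x2)                           # input data presented to the next layer: [4, 0]
```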
  • Because neural network circuit 200 C includes input layer 200 I between input port 200 I-I and memory circuit 100 - 1 , and memory circuit 100 -(L−1) is separated from output port 100 -L-O by memory circuit 100 -L configured as an output layer, memory circuits 100 - 1 through 100 -(L−1) are sometimes referred to as hidden layers of neural network circuit 200 C.
  • neural network circuit 200 C is included in a neural network, and each layer of neural network circuit 200 C is a layer of the neural network. In some embodiments, each hidden layer of neural network circuit 200 C is a multiplier-accumulator layer of a feed-forward neural network.
  • FIG. 3 is a diagram of a memory circuit 300 , in accordance with some embodiments.
  • Memory circuit 300 is usable as a portion of memory circuit 100 , discussed above with respect to FIG. 1 .
  • Memory circuit 300 includes memory array segments 310 X[ 1 ] . . . 310 X[N] and 310 W[ 1 ] . . . 310 W[N] usable as all or a portion of memory array 110 including segments 110 A and 110 B, a row decode circuit 320 usable as all or a portion of row decode circuit 120 , write circuit 130 , a read circuit 350 usable as all or a portion of read circuit 150 , and operation circuit 370 A and addition circuit 370 B, collectively usable as all or a portion of computation circuit 170 , as discussed above with respect to FIG. 1 .
  • Each one of memory array segments 310 X[ 1 ] . . . 310 X[N] and 310 W[ 1 ] . . . 310 W[N] corresponds to a segment 110 A or 110 B and includes at least one column of memory cells 112 coupled with a bit line of bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], or BLW[ 1 B] . . . BLW[NB] corresponding to a bit line of bit lines 114 [ 1 ] . . . 114 [C], discussed above with respect to FIG. 1 .
  • a given memory cell 112 is coupled with a single bit line of bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], or BLW[ 1 B] . . . BLW[NB].
  • a given memory cell 112 is coupled with a pair of bit lines of bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], or BLW[ 1 B] . . . BLW[NB].
  • each one of memory array segments 310 X[ 1 ] . . . 310 X[N] and 310 W[ 1 ] . . . 310 W[N] includes two columns of memory cells 112 .
  • one or more of memory array segments 310 X[ 1 ] . . . 310 X[N] or 310 W[ 1 ] . . . 310 W[N] includes one or greater than two columns of memory cells 112 .
  • each one of memory array segments 310 X[ 1 ] . . . 310 X[N] and 310 W[ 1 ] . . . 310 W[N] includes a same number of columns of memory cells 112 .
  • one or more of memory array segments 310 X[ 1 ] . . . 310 X[N] includes a first number of columns of memory cells 112 and one or more of memory array segments 310 W[ 1 ] . . . 310 W[N] includes a second number of columns of memory cells 112 different from the first number of columns of memory cells 112 .
  • Memory array segments 310 X[ 1 ] . . . 310 X[N] and 310 W[ 1 ] . . . 310 W[N] are positioned such that each memory array segment 310 X[n] is adjacent to a corresponding memory array segment 310 W[n].
  • a given row of memory cells 112 thereby includes a first subset of memory cells 112 in memory array segments 310 X[ 1 ] . . . 310 X[N] alternating with a second subset of memory cells 112 in memory array segments 310 W[ 1 ] . . . 310 W[N].
  • the first subset of memory cells 112 of a given row m is coupled with one of word lines 316 X[m]
  • the second subset of memory cells 112 of the given row m is coupled with one of word lines 316 W[m].
  • a given row m of memory cells 112 includes a memory cell 412 X coupled with a word line 316 X[m] and a memory cell 412 W coupled with a word line 316 W[m], discussed below with respect to FIG. 4 .
  • Row decode circuit 320 is configured to output word line signals WX[ 1 ] . . . WX[M] corresponding to the first subset of memory cells 112 on word lines 316 X[ 1 ] . . . 316 X[M], and to output word line signals WW[ 1 ] . . . WW[M] corresponding to the second subset of memory cells 112 on word lines 316 W[ 1 ] . . . 316 W[M].
  • Row decode circuit 320 is thereby configured to, during a read or write operation, select the first subset of memory cells 112 of a row m by generating word line signal WX[m] on the corresponding word line 316 X[m], and/or to select the second subset of memory cells 112 of the row m by generating word line signal WW[m] on the corresponding word line 316 W[m].
  • write circuit 130 is configured to generate the voltage levels on bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], and BLW[ 1 B] . . . BLW[NB] based on data IN received at input port 100 -I
  • memory circuit 300 is thereby configured, in a write operation, to write a first subset of data IN to memory array segments 310 X[ 1 ] . . . 310 X[N], write a second subset of data IN to memory array segments 310 W[ 1 ] . . . 310 W[N], or write an entirety of data IN to memory array segments 310 X[ 1 ] . . . 310 X[N] and 310 W[ 1 ] . . . 310 W[N].
  • Read circuit 350 includes a plurality of sense amplifiers SA coupled with bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], and BLW[ 1 B] . . . BLW[NB] through a plurality of selection circuits SEL.
  • a given sense amplifier SA is coupled with a pair of bit lines of bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], or BLW[ 1 B] . . . BLW[NB] through a corresponding selection circuit SEL.
  • In some embodiments, read circuit 350 does not include a plurality of selection circuits SEL, and a given sense amplifier SA is coupled with a single bit line of bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], or BLW[ 1 B] . . . BLW[NB].
  • a given sense amplifier SA is coupled with greater than two bit lines of bit lines BLX[ 1 A] . . . BLX[NA], BLX[ 1 B] . . . BLX[NB], BLW[ 1 A] . . . BLW[NA], or BLW[ 1 B] . . . BLW[NB] through a corresponding selection circuit SEL.
  • a selection circuit SEL includes a multiplexer. In some embodiments, read circuit 350 does not include a selection circuit SEL, and each one of memory array segments 310 X[ 1 ] . . . 310 X[N] and/or 310 W[ 1 ] . . . 310 W[N] includes a selection circuit SEL.
  • Each sense amplifier SA is an electronic circuit configured to determine the logical state of a corresponding selected memory cell 112 during a read operation.
  • a first subset of sense amplifiers SA is coupled with the first subsets of the rows of memory cells 112 corresponding to memory array segments 310 X[ 1 ] . . . 310 X[N], and a second subset of sense amplifiers SA is coupled with the second subsets of the rows of memory cells 112 corresponding to memory array segments 310 W[ 1 ] . . . 310 W[N].
  • the first subset of sense amplifiers SA is configured to generate data signals X[ 1 ] . . . X[N] having voltage levels based on the logical states of the corresponding selected memory cells 112 during a read operation
  • the second subset of sense amplifiers SA is configured to generate data signals W[ 1 ] . . . W[N] having voltage levels based on the logical states of the corresponding selected memory cells 112 during a read operation.
  • each sense amplifier SA of the first subset of sense amplifiers SA includes a latch circuit configured to generate data signals X[ 1 ] . . . X[N] having latched voltage levels.
  • each sense amplifier SA of the second subset of sense amplifiers SA includes a latch circuit configured to generate data signals W[ 1 ] . . . W[N] having latched voltage levels.
  • Operation circuit 370 A includes the number N of logic units 372 .
  • An nth logic unit 372 is configured to receive a pair of data signals X[n] and W[n], perform one or more logic or mathematical operations based on the voltage levels of data signals X[n] and W[n], and generate a signal R[n] of signals R[ 1 ] . . . R[N] having a voltage level representing a result of the one or more logic or mathematical operations.
  • the nth logic unit 372 is configured to perform the one or more logic or mathematical operations based solely on data signals X[n] and W[n], or to perform the one or more logic or mathematical operations based on one or more data signals (not shown) in addition to data signals X[n] and W[n].
  • logic units 372 are configured to perform one or more of an OR, NOR, XOR, AND, NAND, or multiplication operation, or one or more other operations suitable for processing two or more data bits.
  • each logic unit 372 is configured to perform a same logic or mathematical operation. In various embodiments, at least one logic unit 372 is configured to perform a logic or mathematical operation different from one or more logic or mathematical operations performed by one or more other logic units 372 .
  • each logic unit 372 is configured to perform a same logic or mathematical operation during all operations. In various embodiments, at least one logic unit 372 is configurable so as to perform at least one logic or mathematical operation of a plurality of varying logic or mathematical operations responsive to one or more received signals (not shown).
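  • For illustration only, a minimal sketch of a configurable logic unit as described above: it receives one data signal from an X segment and one from a W segment and produces a result signal R according to a selected operation. The operation table and selection mechanism shown here are hypothetical.

```python
# Hypothetical sketch of a logic unit selectable among the operations listed above.

OPS = {
    "AND":  lambda x, w: x & w,
    "NAND": lambda x, w: 1 - (x & w),
    "OR":   lambda x, w: x | w,
    "NOR":  lambda x, w: 1 - (x | w),
    "XOR":  lambda x, w: x ^ w,
    "MUL":  lambda x, w: x * w,
}

def logic_unit(x_signal, w_signal, op="AND"):
    return OPS[op](x_signal, w_signal)   # result signal R[n]

print(logic_unit(1, 0, op="XOR"))  # R[n] = 1
```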
  • memory circuit 300 is capable of performing an in-memory computation by coordinating read circuit 350 generating data signals X[n] and W[n] with operation circuit 370 A performing one or more logical and/or mathematical operations on data signals X[n] and W[n].
  • operation circuit 370 A is capable of performing multiple logic and/or mathematical operations on data stored in memory cells 112 by operating on data in memory array segments 310 X[ 1 ] . . . 310 X[N] separately from data in respective memory array segments 310 W[ 1 ] . . . 310 W[N].
  • memory circuit 300 is configured to, in an in-memory computing operation, use a first sense amplifier SA to generate a latched one of data signals X[n] or W[n], use a corresponding second sense amplifier SA to dynamically generate the other one of data signals X[n] or W[n] by sequentially selecting memory cells 112 from multiple rows in a given column, and use an nth logic unit 372 to repeatedly perform a given logic or mathematical operation to generate signal R[n].
  • Memory circuit 300 is configured to sequentially select memory cells 112 from multiple rows in a given column by generating either word line signal WX[m] on a word line 316 X[m] or word line signal WW[m] on a word line 316 W[m] while changing values of m.
  • memory circuit 300 is configured to, in an in-memory computing operation, sequentially select memory cells 112 by stepping values of m from 1 through M, from M through 1, from 1 to a value less than M, from M to a value greater than 1, or using another order to change values of m within the span of 1 through M.
  • memory circuit 300 is configured such that, in an in-memory computing operation, operation circuit 370 A repeats the nth logic unit 372 repeatedly performing the given logic or mathematical operation to generate signal R[n] for multiple values of n.
  • memory circuit 300 is configured to, in an in-memory computing operation, generate signal R[n] for multiple values of n by using each value of n from 1 through N or by using a subset of values of n from within the span of 1 through N. In various embodiments, memory circuit 300 is configured to, in an in-memory computing operation, generate signal R[n] for multiple values of n by using multiple logic units 372 in parallel, in series, or in a combination of parallel and series operation.
  • memory circuit 300 is configured to perform a non-limiting example of an in-memory computing operation discussed below with respect to FIG. 5 .
  • Addition circuit 370 B is configured to receive signals R[ 1 ] . . . R[N], perform an addition operation based on the results represented by the voltage levels of signals R[ 1 ] . . . R[N], generate data OUT, and output data OUT on output port 100 -O.
  • addition circuit 370 B is configured to perform the addition operation by adding each of the results of an nth logic unit 372 repeatedly performing the given logic or mathematical operation represented by signal R[n] for each signal R[n] of signals R[ 1 ] . . . R[N]. In various embodiments, addition circuit 370 B is configured to perform the addition operation by adding one or more subsets of the results of the nth logic unit 372 repeatedly performing the given logic or mathematical operation represented by signal R[n] for each signal R[n] of signals R[ 1 ] . . . R[N].
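  • For illustration only, a minimal sketch of the sequential in-memory computing operation described above: for each column pair n, one sense amplifier holds a latched weight bit while the other sweeps rows m = 1 through M, the nth logic unit combines the latched bit with each dynamically read bit to form R[n], and the addition stage sums the results. The data layout and names are hypothetical.

```python
# Hypothetical sketch of a dot product built from per-column results R[n] that
# are generated by repeatedly applying one operation while stepping through rows.

def in_memory_dot_product(x_columns, w_latched, op=lambda x, w: x * w):
    total = 0
    for column_bits, w_bit in zip(x_columns, w_latched):
        r_n = sum(op(x_bit, w_bit) for x_bit in column_bits)  # signal R[n]
        total += r_n                                          # addition stage
    return total

x_cols = [[1, 0, 1, 1], [0, 1, 1, 0]]   # bits read row by row (m = 1..4) from two X columns
w_bits = [1, 0]                          # latched bits from the paired W columns
print(in_memory_dot_product(x_cols, w_bits))  # (1+0+1+1)*1 + (0+1+1+0)*0 = 3
```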
  • addition circuit 370 B is configured to generate data OUT having N data bits, fewer than N data bits, or greater than N data bits.
  • memory circuit 300 is capable of performing a series of in-memory computing operations, e.g., a matrix computation, based on data in memory array segments 310 X[ 1 ] . . . 310 X[N] separate from data in respective memory array segments 310 W[ 1 ] . . . 310 W[N].
  • a memory circuit 100 , system 200 A, or network circuit 200 B including memory circuit 300 is thereby capable of realizing the benefits discussed above with respect to memory circuit 100 , system 200 A, and network circuit 200 B.
  • In embodiments in which memory circuit 300 is configured to dynamically generate one of data signals X[n] or W[n] by sequentially selecting memory cells 112 from multiple rows in a given column with the other of data signals X[n] or W[n] latched, memory circuit 300 enables reduced power and simplified circuit configurations compared to approaches in which a memory circuit does not dynamically generate a first data signal while a second data signal is latched.
  • FIG. 4 is a diagram of a memory cell circuit 400 , in accordance with some embodiments.
  • Memory cell circuit 400 is usable as a portion of a memory circuit 100 or 300 , discussed above with respect to FIGS. 1 and 3 .
  • Memory cell circuit 400 includes word line 316X[m] configured to carry word line signal WX[m] and word line 316W[m] configured to carry word line signal WW[m], discussed above with respect to FIG. 3.
  • Memory cell circuit 400 also includes memory cells 412X and 412W, each usable as a memory cell 112, and bit lines BL and BLB, each usable as a bit line of bit lines 114[1] . . . 114[C], each discussed above with respect to FIG. 1.
  • FIG. 4 depicts memory cell circuit 400 including one each of memory cells 412X and 412W for the purpose of illustration.
  • memory cell circuit 400 includes greater than one each of one or both of memory cells 412X and 412W.
  • Each one of memory cells 412X and 412W is configured as a 6T SRAM cell by including power nodes VDD and VSS, PMOS transistors P1 and P2, and NMOS transistors N1, N2, N3, and N4, in which transistor pair P1 and N1 and transistor pair P2 and N2 are each configured as an inverter coupled between power nodes VDD and VSS.
  • Gates of transistors P2 and N2 are coupled together, to drain terminals of transistors P1 and N1, and to one of a source or drain terminal of transistor N3.
  • the other of the source or drain terminal of transistor N3 is coupled with bit line BL.
  • Gates of transistors P1 and N1 are coupled together, to drain terminals of transistors P2 and N2, and to one of a source or drain terminal of transistor N4.
  • the other of the source or drain terminal of transistor N4 is coupled with complementary bit line BLB.
  • Transistor pairs P1 and N1, and P2 and N2, are thereby cross-coupled and configured to be selectively coupled with bit lines BL and BLB through respective transistors N3 and N4.
  • Memory cell 412X includes the gates of transistors N3 and N4 coupled with word line 316X[m], and is thereby configured to be coupled with bit lines BL and BLB responsive to word line signal WX[m].
  • Memory cell 412W includes the gates of transistors N3 and N4 coupled with word line 316W[m], and is thereby configured to be coupled with bit lines BL and BLB responsive to word line signal WW[m].
  • Memory cell circuit 400 is thereby configured to selectively activate one or both of memory cells 412X or 412W in a read or write operation.
  • a memory circuit 100 or 300 including memory cell circuit 400 is thereby capable of realizing the benefits discussed above with respect to memory circuit 100, system 200A, and network circuit 200B.
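As a rough behavioral companion to the cell description above, the following sketch models two cells that share a bit-line pair but are gated by separate word lines, so a read can place either cell's state onto BL/BLB. It abstracts the cross-coupled inverters and pass transistors to a single stored bit per cell; all class and argument names are illustrative, and the sketch models only single-cell reads even though the circuit can activate one or both cells.

```python
# Behavioral sketch of memory cell circuit 400: cells 412X and 412W share bit
# lines BL/BLB but are selected by separate word lines (316X[m] and 316W[m]).
# The cross-coupled inverters (P1/N1, P2/N2) and pass gates (N3/N4) are
# abstracted to a stored bit; only one word line is asserted per read here.

class SixTCell:
    def __init__(self, value=0):
        self.value = value                      # state held by the cross-coupled inverters

    def drive_bit_lines(self):
        return self.value, 1 - self.value       # (BL, BLB) as complementary levels


class CellPair:
    def __init__(self, x_value=0, w_value=0):
        self.cell_x = SixTCell(x_value)         # memory cell 412X on word line 316X[m]
        self.cell_w = SixTCell(w_value)         # memory cell 412W on word line 316W[m]

    def read(self, wx_asserted=False, ww_asserted=False):
        if wx_asserted == ww_asserted:
            raise ValueError("this simplified read asserts exactly one word line")
        cell = self.cell_x if wx_asserted else self.cell_w
        return cell.drive_bit_lines()           # levels seen on the shared BL/BLB pair


pair = CellPair(x_value=1, w_value=0)
print(pair.read(wx_asserted=True))              # reads 412X -> (1, 0)
print(pair.read(ww_asserted=True))              # reads 412W -> (0, 1)
```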
  • FIG. 5 is a plot of memory circuit operating parameters, in accordance with some embodiments.
  • data signals W1[m] and X1[m] include pulses that do not indicate a particular logic state determined by a sense amplifier SA. Instead, the data signal pulses indicate that a sense amplifier SA is actively outputting a data signal W1[m] or X1[m] based on any determined logic state of a selected memory cell 112.
  • Clock signal CLK includes pulses that indicate step numbers.
  • Data signal W1[M] is active from step 1 through step M, illustrating that the corresponding sense amplifier SA is outputting data signal W1[M] latched to a voltage level indicating a logic state of the memory cell 112 in row M of a given column in memory array segment 310W[1].
  • Data signal W1[M−1] is active from step M+1 through step 2M (not shown), illustrating that the corresponding sense amplifier SA is outputting data signal W1[M−1] latched to a voltage level indicating a logic state of the memory cell 112 in row M−1 of the given column in memory array segment 310W[1].
  • Data signal X1[M] is active during steps 1 and M+1, illustrating that the corresponding sense amplifier SA is outputting data signal X1[M] at a voltage level indicating a logic state of the memory cell 112 in row M of a given column in memory array segment 310X[1] only during a first step in a sequence of M steps.
  • Data signal X1[M−1] is active during steps 2 and M+2, illustrating that the corresponding sense amplifier SA is outputting data signal X1[M−1] at a voltage level indicating a logic state of the memory cell 112 in row M−1 of the given column in memory array segment 310X[1] only during a second step in the sequence of M steps.
  • Data signal X1[M−2] is active during steps 3 and M+3 (not shown), illustrating that the corresponding sense amplifier SA is outputting data signal X1[M−2] at a voltage level indicating a logic state of the memory cell 112 in row M−2 of the given column in memory array segment 310X[1] only during a third step in the sequence of M steps.
  • Data signal X1[1] is active during steps M and 2M, illustrating that the corresponding sense amplifier SA is outputting data signal X1[1] at a voltage level indicating a logic state of the memory cell 112 in row 1 of the given column in memory array segment 310X[1] only during the Mth step in the sequence of M steps.
  • Steps 1 through M correspond to a first portion of a matrix computation in which a given logic operation is repeatedly performed, e.g., using operation circuit 370A discussed above with respect to FIG. 3, by combining latched data signal W1[M] with data signals X1[M] through X1[1] sequentially selected at each step.
  • steps M+1 through 2M correspond to a second portion of the matrix computation in which the given logic operation is repeatedly performed by combining latched data signal W1[M−1] with data signals X1[M] through X1[1] sequentially selected at each step.
  • Additional portions of the matrix computation correspond to combining each of latched data signals W1[M−2] through W1[1] with data signals X1[M] through X1[1] sequentially selected at corresponding steps.
  • data signals W[M−2] through W[1] correspond to weight data, and data signals X[M] through X[1] correspond to input data of a multiply-accumulate operation.
  • steps 1 through M are repeated for each value of m and n, thereby resulting in the following matrix multiplication operation:
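The equation referred to here is not reproduced in this text. Based on the step sequence described above, and assuming the given logic operation is a bit-wise multiplication, one plausible form of the per-segment operation is the product of the latched weight bits, taken as a column vector, with the sequentially read input bits, taken as a row vector:

\[
\begin{bmatrix} W_1[M] \\ W_1[M-1] \\ \vdots \\ W_1[1] \end{bmatrix}
\begin{bmatrix} X_1[M] & X_1[M-1] & \cdots & X_1[1] \end{bmatrix}
=
\begin{bmatrix}
W_1[M]\,X_1[M] & \cdots & W_1[M]\,X_1[1] \\
\vdots & \ddots & \vdots \\
W_1[1]\,X_1[M] & \cdots & W_1[1]\,X_1[1]
\end{bmatrix}
\]

with the analogous product formed for each value of n, and with addition circuit 370B adding some or all of the resulting terms to generate data OUT.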
  • a memory circuit 100 or 300 configured to perform an in-memory computing operation in accordance with the non-limiting example depicted in FIG. 5 is capable of operating one memory array segment separately from at least one other memory array segment and is thereby capable of realizing the benefits discussed above with respect to memory circuit 100, system 200A, and network circuit 200B.
  • FIG. 6 is a flowchart of a method 600 of performing an in-memory computation, in accordance with one or more embodiments.
  • Method 600 is usable with a memory circuit, e.g., memory circuit 100 discussed above with respect to FIG. 1, a system, e.g., system 200A discussed above with respect to FIG. 2A, or a network circuit, e.g., network circuit 200B discussed above with respect to FIG. 2B.
  • The sequence in which the operations of method 600 are depicted in FIG. 6 is for illustration only; the operations of method 600 are capable of being executed in sequences that differ from that depicted in FIG. 6. In some embodiments, operations in addition to those depicted in FIG. 6 are performed before, between, during, and/or after the operations depicted in FIG. 6.
  • some or all of the operations of method 600 are a subset of operations of a method of performing a memory circuit or network, e.g., neural network, computation. In some embodiments, some or all of the operations of method 600 are used to perform an in-memory computing operation in accordance with the non-limiting example depicted in FIG. 5 .
  • input data is received at an input port of a memory circuit.
  • the memory circuit includes a memory array positioned between the input port and an output port, a write circuit positioned between the input port and the memory array, and a read circuit positioned between the memory array and the output port.
  • receiving the input data at the input port includes receiving input data IN at input port 100 -I, discussed above with respect to FIG. 1 .
  • receiving the input data at the input port includes receiving data from an output port of another memory circuit. In some embodiments, receiving the input data at the input port includes receiving data at one of memory circuits 100-2 through 100-L from an adjacent one of memory circuits 100-1 through 100-(L−1), discussed above with respect to FIG. 2.
  • a first subset of the input data is stored in a first segment of the memory array and a second subset of the input data is stored in a second segment of the memory array.
  • storing the first subset in the first segment and the second subset in the second segment includes storing input data in one of the first or second segments and weight data in the other of the first or second segments.
  • Storing the first subset of the input data in the first segment and the second subset in the second segment includes storing the first and second subsets using the write circuit separate from the read circuit. In some embodiments, storing the first and second subsets includes using the write circuit at a first end of the columns of the memory array opposite a second end of the columns of the memory array at which the read circuit is positioned. In some embodiments, storing the first and second subsets includes using write circuit 130 , discussed above with respect to FIGS. 1 and 3 .
  • storing the first subset in the first segment includes storing the first subset in one of memory array segments 310X[1] . . . 310X[N], and storing the second subset in the second segment includes storing the second subset in one of memory array segments 310W[1] . . . 310W[N], discussed above with respect to FIG. 3.
  • a first data bit from a first column of memory cells in one of the first segment of the memory array or the second segment of the memory array is latched.
  • latching the first data bit includes latching a weight bit of weight data.
  • latching the first data bit includes latching an input bit of input data.
  • latching the first data bit includes latching the first data bit with a sense amplifier of the read circuit. In some embodiments, latching the first data bit includes selecting the first column using a selection circuit, e.g., a multiplexer. In some embodiments, latching the first data bit includes latching one of data signals X[n] or W[n], discussed above with respect to FIG. 3 .
  • a plurality of second data bits from a second column of memory cells in the other of the first segment or the second segment is sequentially read.
  • sequentially reading the second data bits includes sequentially reading input data bits of input data.
  • sequentially reading the second data bits includes sequentially reading weight data bits of weight data.
  • sequentially reading the second data bits includes sequentially reading the second data bits with a sense amplifier of the read circuit. In some embodiments, sequentially reading the second data bits includes selecting the second column using a selection circuit, e.g., a multiplexer. In some embodiments, sequentially reading the second data bits includes sequentially reading one of data signals X[n] or W[n], discussed above with respect to FIG. 3 .
  • a logic operation is performed on each combination of the latched first data bit and each second data bit of the plurality of second data bits.
  • performing the logic operation includes one or more of performing an OR, NOR, XOR, AND, NAND, or multiplication operation, or one or more other operations suitable for processing at least two data bits.
  • performing the logic operation includes combining a weight data bit with an input data bit.
  • Performing the logic operation includes using a logic circuit.
  • performing the logic operation includes using computation circuit 170 , discussed above with respect to FIG. 1 .
  • performing the logic operation includes using a logic unit 372 , discussed above with respect to FIG. 3 .
  • repeating one or more or all of operations 630 through 650 includes latching a third data bit from the first column of memory cells, sequentially reading the plurality of second data bits from the second column of memory cells, and performing the logic operation on each combination of the latched third data bit and each second data bit of the plurality of second data bits.
  • repeating one or more or all of operations 630 through 650 includes repeating the operations of latching a given data bit, sequentially reading a corresponding plurality of data bits, and performing the logic operation on the resultant combinations for a plurality of columns in respective first and second memory array segments.
  • the respective first and second memory array segments are memory array segments 310W[1] . . . 310W[N] and 310X[1] . . . 310X[N], discussed above with respect to FIG. 3.
  • a sum is calculated by adding some or all of the results of performing the logic operation on each combination of each latched data bit and each sequentially read data bit. In some embodiments, calculating the sum is part of performing a matrix computation. In some embodiments, calculating the sum is part of performing a matrix combination of weight and input data.
  • Calculating the sum includes using an addition circuit. In some embodiments, calculating the sum includes using computation circuit 170 , discussed above with respect to FIG. 1 . In some embodiments, calculating the sum includes using addition circuit 370 B, discussed above with respect to FIG. 3 .
  • the sum is output by the memory circuit.
  • Outputting the sum includes outputting the sum at the output port of the memory circuit.
  • outputting the sum includes outputting the sum at output port OUT, discussed above with respect to FIG. 1 .
  • the sum is included in an input to a layer of a network circuit.
  • including the sum in an input includes including the sum in an input to an input port of another memory circuit.
  • including the sum in an input includes including the sum in an input to one of memory circuits 100-2 through 100-(L−1), discussed above with respect to FIG. 2.
  • including the sum in an input includes including the sum in an input to a layer of a neural network computation.
  • some or all of an in-memory computation is performed, thereby obtaining the benefits discussed above with respect to memory circuit 100 , system 200 A, and network circuit 200 B.
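The operations of method 600 can also be summarized as a compact behavioral sketch. The Python below only illustrates the data flow under assumptions (the incoming data is split in half between a weight column and an input column, and multiplication stands in for the generic logic operation); it is not the claimed method, and every name in it is a placeholder.

```python
# Behavioral sketch of the data flow in method 600 (not the claimed method):
# receive input data, store subsets in two segments, latch a bit from one
# segment's column, sequentially read the other segment's column, perform a
# logic operation on each combination, accumulate, and output the sum.
# Multiplication is an assumed stand-in for the generic logic operation, and
# the half-and-half split of the input is an assumed mapping.

def method_600_flow(input_data, num_rows):
    # Receive input data at the input port, then store a first subset (treated
    # here as weight data) and a second subset (treated here as input data).
    weight_column = input_data[:num_rows]              # e.g., a column in segment 310W[1]
    input_column = input_data[num_rows:2 * num_rows]   # e.g., a column in segment 310X[1]

    total = 0
    for w_bit in weight_column:                        # latch a first data bit
        for x_bit in input_column:                     # sequentially read second data bits
            total += w_bit * x_bit                     # perform the logic operation, then add
    return total                                       # sum to be output by the memory circuit

print(method_600_flow([1, 0, 1, 1, 1, 1, 0, 1], num_rows=4))   # -> 9
```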
  • a circuit includes a memory array, a write circuit configured to store data in memory cells of the memory array, a read circuit configured to retrieve the stored data from the memory cells of the memory array, and a computation circuit configured to perform one or more logic operations on the retrieved stored data, wherein the memory array is positioned between the write circuit and the read circuit.
  • a memory circuit includes a memory array including a first segment of memory cells and a second segment of memory cells, and a computation circuit configured to perform a matrix computation by combining first data retrieved from the memory cells of the first segment with second data retrieved from the memory cells of the second segment.
  • a method of performing an in-memory computation includes latching a first data bit from a first column of memory cells in one of a first segment of a memory array or a second segment of the memory array, sequentially reading a plurality of second data bits from a second column of memory cells in the other of the first segment or the second segment, and performing a logic operation on each combination of the latched first data bit and each second data bit of the plurality of second data bits.

Abstract

A circuit includes a memory array, a write circuit configured to store data in memory cells of the memory array, a read circuit configured to retrieve the stored data from the memory cells of the memory array, and a computation circuit configured to perform one or more logic operations on the retrieved stored data. The memory array is positioned between the write circuit and the read circuit.

Description

BACKGROUND
Memory arrays are often used to store and access data used for various types of computations such as logic or mathematical operations. To perform these operations, data bits are moved between the memory arrays and circuits used to perform the computations. In some cases, computations include multiple layers of operations, and the results of a first operation are used as input data in a second operation.
BRIEF DESCRIPTION OF THE DRAWINGS
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
FIG. 1 is a diagram of a memory circuit, in accordance with some embodiments.
FIG. 2A is a diagram of a system, in accordance with some embodiments.
FIG. 2B is a diagram of a network circuit, in accordance with some embodiments.
FIG. 2C is a diagram of a neural network circuit, in accordance with some embodiments.
FIG. 3 is a diagram of a memory circuit, in accordance with some embodiments.
FIG. 4 is a diagram of a memory cell circuit, in accordance with some embodiments.
FIG. 5 is a plot of memory circuit operating parameters, in accordance with some embodiments.
FIG. 6 is a flowchart of a method of performing an in-memory computation, in accordance with some embodiments.
DETAILED DESCRIPTION
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components, values, operations, materials, arrangements, or the like, are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Other components, values, operations, materials, arrangements, or the like, are contemplated. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Further, spatially relative terms, such as “beneath,” “below,” “lower,” “above,” “upper” and the like, may be used herein for ease of description to describe one element or feature's relationship to another element(s) or feature(s) as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein may likewise be interpreted accordingly.
In various embodiments, a circuit includes a memory array positioned between a write circuit and a read circuit. The write circuit stores data in the memory array based on data received at an input port, and the read circuit retrieves stored data for a computation circuit that outputs result data to an output port. By performing this in-memory computation in which data flows from the input port to the output port, the circuit is capable of reducing data movement compared to approaches that do not perform such in-memory computations, particularly in cases in which the circuit is used in one or more layers of a network circuit such as a neural network.
In some embodiments, the circuit performs in-memory computations by operating at least one segment of the memory array separately from at least one other segment of the memory array, and is further capable of reducing data movement compared to approaches in which a circuit performs computations based on multiple memory arrays that do not operate segments separately.
FIG. 1 is a diagram of a memory circuit 100, in accordance with some embodiments. Memory circuit 100 includes a memory array 110, a row decode circuit 120, a write circuit 130, a write control circuit 140, a read circuit 150, a read control circuit 160, a computation circuit 170, and a control circuit 180.
Memory array 110 is positioned between and coupled with each one of write circuit 130 and read circuit 150. Read circuit 150 is positioned between and coupled with each one of memory array 110 and computation circuit 170. Write control circuit 140 is adjacent to and coupled with write circuit 130; row decode circuit 120 is adjacent to and coupled with memory array 110; and read control circuit 160 is adjacent to and coupled with read circuit 150.
In the embodiment depicted in FIG. 1, both write circuit 130 and write control circuit 140 are positioned at the top of memory array 110, and read circuit 150, read control circuit 160, and computation circuit 170 are positioned at the bottom of memory array 110. In some embodiments, both write circuit 130 and write control circuit 140 are positioned at the bottom of memory array 110, and read circuit 150, read control circuit 160, and computation circuit 170 are positioned at the top of memory array 110.
Row decode circuit 120 is positioned between and coupled with each one of write control circuit 140 and read control circuit 160. Control circuit 180 is coupled with each one of write control circuit 140, row decode circuit 120, read control circuit 160, and computation circuit 170. In some embodiments, control circuit 180 is not coupled with one or more of write control circuit 140, row decode circuit 120, read control circuit 160, or computation circuit 170.
Two or more circuit elements are considered to be coupled based on one or more direct signal connections and/or one or more indirect signal connections that include one or more logic devices, e.g., an inverter or logic gate, between the two or more circuit elements. In some embodiments, signal communications between the two or more coupled circuit elements are capable of being modified, e.g., inverted or made conditional, by the one or more logic devices.
In the embodiment depicted in FIG. 1, control circuit 180 is adjacent to each one of write control circuit 140, row decode circuit 120, read control circuit 160, and computation circuit 170. In various embodiments, control circuit 180 is positioned apart from one or more of write control circuit 140, row decode circuit 120, read control circuit 160, or computation circuit 170, and/or control circuit 180 includes one or more of write control circuit 140, row decode circuit 120, read control circuit 160, or computation circuit 170.
In some embodiments, memory circuit 100 does not include control circuit 180, and one or more of row decode circuit 120, write control circuit 140, read control circuit 160, or computation circuit 170 is configured to receive one or more control signals (not shown) from one or more circuits, e.g., a processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100.
Memory array 110 is an array of memory cells 112 arranged in rows and columns. In the embodiment depicted in FIG. 1, memory array 110 includes a segment 110A including one or more columns of memory cells 112, and a segment 110B including one or more columns of memory cells 112. In various embodiments, memory array 110 includes a single segment, or greater than two segments, each segment including one or more columns of memory cells 112. In some embodiments, memory array 110 includes one or more of memory array segments 310X[1] . . . 310X[N] and/or 310W[1] . . . 310W[N], discussed below with respect to FIG. 3.
In embodiments in which memory array 110 includes more than one segment, memory circuit 100 is configured to operate at least one segment separately from at least one other segment, as discussed below.
A memory cell 112 of memory array 110 includes electrical, electromechanical, electromagnetic, or other devices configured to store bit data represented by logical states.
Each column of a number C columns of memory cells 112 is coupled with a corresponding bit line of bit lines 114[1] . . . 114[C] through which the logical states are programmed in a write operation and detected in a read operation. Each row of a number R rows of memory cells 112 is coupled with a corresponding word line of word lines 116[1] . . . 116[R] through which the memory cell 112 is selected in the read and write operations.
In some embodiments, a logical state corresponds to a voltage level of an electrical charge stored in a given memory cell 112. In some embodiments, a logical state corresponds to a physical property, e.g., a resistance or magnetic orientation, of a component of a given memory cell 112.
In some embodiments, memory cells 112 include static random-access memory (SRAM) cells. In various embodiments, SRAM cells include five-transistor (5T) SRAM cells, six-transistor (6T) SRAM cells, eight-transistor (8T) SRAM cells, nine-transistor (9T) SRAM cells, or SRAM cells having other numbers of transistors. In some embodiments, memory cells 112 include dynamic random-access memory (DRAM) cells or other memory cell types capable of storing bit data. In some embodiments, memory cells 112 include memory cells 412X and 412W, discussed below with respect to FIG. 4.
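A minimal behavioral stand-in for the array organization described above (R rows selected over word lines, C columns written and read over bit lines) is sketched below. It models logical states only and ignores the cell type entirely; the class name and method names are illustrative.

```python
# Minimal behavioral model of an R x C memory array: a word line selects a row,
# and bit lines carry the column data for that row. Logical states only; the
# cell implementation (6T SRAM, DRAM, etc.) is abstracted away.

class MemoryArrayModel:
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write_row(self, row, data_bits):
        # Word line for `row` asserted; the write circuit drives all C bit lines.
        assert len(data_bits) == len(self.cells[row])
        self.cells[row] = list(data_bits)

    def read_row(self, row):
        # Word line for `row` asserted; the read circuit senses all C bit lines.
        return list(self.cells[row])

array_110 = MemoryArrayModel(rows=8, cols=4)        # stand-in for memory array 110
array_110.write_row(0, [1, 0, 1, 1])
print(array_110.read_row(0))                         # -> [1, 0, 1, 1]
```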
Row decode circuit 120 is an electronic circuit configured to generate one or more word line signals (not labeled) on word lines 116[1] . . . 116[R] based on one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100. The one or more word line signals are capable of causing one or more memory cells 112 to become activated during read and write operations, thereby selecting the one or more memory cells 112 during a read or write operation.
In some embodiments, row decode circuit 120 is configured to select an entirety of a given row of memory cells 112 during a read or write operation. In some embodiments, during a read or write operation, row decode circuit 120 is configured to select one or more subsets of a given row of memory cells 112 by generating one or more subsets of word line signals on one or more subsets of word lines 116[1] . . . 116[R], memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110A, of memory array 110 separately from at least one other segment, e.g., segment 110B, of memory array 110.
In some embodiments, row decode circuit 120 includes row decode circuit 320 configured to generate one or more of word line signals WX[1] . . . WX[M] on word lines 316X[1] . . . 316X[M] or word line signals WW[1] . . . WW[M] on word lines 316W[1] . . . 316W[M], discussed below with respect to FIG. 3.
Write circuit 130 is an electronic circuit configured to generate voltage levels corresponding to logical states on bit lines 114[1] . . . 114[C] during a write operation, the one or more memory cells 112 selected during the write operation thereby being programmed to logical states based on the voltage levels on bit lines 114[1] . . . 114[C]. In the embodiment depicted in FIG. 1, each memory cell 112 is coupled with a single one of bit lines 114[1] . . . 114[C], and write circuit 130 is configured to output a single voltage level on the single one of bit lines 114[1] . . . 114[C] corresponding to a given memory cell 112. In some embodiments, each memory cell 112 is coupled with a pair of bit lines of bit lines 114[1] . . . 114[C], and write circuit 130 is configured to output complementary voltage levels on the pair of bit lines of bit lines 114[1] . . . 114[C] corresponding to a given memory cell 112.
Write circuit 130 is configured to generate the voltage levels based on data IN received at an input port 100-I, and on one or more control signals (not shown) received from write control circuit 140 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100.
A port, e.g., input port 100-I, is a plurality of electrical connections configured to conduct one or more signals, e.g., data IN, in and/or out of a circuit or portion of a circuit. Data IN includes a plurality of voltage levels, each voltage level being carried on one or more electrical connections of input port 100-I and corresponding to a logical state of a data bit of data IN.
In some embodiments, write circuit 130 is configured to generate the one or more voltage levels for an entirety of the columns of memory cells 112 during a write operation. In some embodiments, during a write operation, write circuit 130 is configured to write to one or more subsets of the columns of memory cells 112, memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110A, of memory array 110 separately from at least one other segment, e.g., segment 110B, of memory array 110.
In some embodiments, memory circuit 100 is configured so that write circuit 130 writes to the one or more subsets of the columns of memory cells 112 based on the one or more subsets of the columns of memory cells 112 being activated by row decoder 120 during a write operation as discussed above. In some embodiments, write circuit 130 is configured to write to one or more subsets of the columns of memory cells 112 by masking one or more portions of data IN during a write operation.
Write control circuit 140 is an electronic circuit configured to generate and output the one or more control signals to write circuit 130 based on one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100.
Read circuit 150 is an electronic circuit configured to receive voltage signals (not labeled) on one or more of bit lines 114[1] . . . 114[C] during a read operation, the voltage signals being based on the logical states of the one or more memory cells 112 selected during the read operation. Read circuit 150 is configured to determine the logical states of the one or more memory cells 112 selected during the read operation based on the voltage signals on the one or more of bit lines 114[1] . . . 114[C]. In some embodiments, read circuit 150 includes one or more sense amplifiers, e.g., sense amplifier SA discussed below with respect to FIG. 3, configured to determine the logical states of the one or more memory cells 112.
In the embodiment depicted in FIG. 1, each memory cell 112 is coupled with a single bit line of bit lines 114[1] . . . 114[C], and read circuit 150 is configured to determine the logical state of a given memory cell 112 based on the voltage signal on the single bit line of bit lines 114[1] . . . 114[C] corresponding to the given memory cell 112. In some embodiments, each memory cell 112 is coupled with a pair of bit lines of bit lines 114[1] . . . 114[C], and read circuit 150 is configured to determine the logical state of a given memory cell 112 based on the voltage signals on the pair of bit lines of bit lines 114[1] . . . 114[C] corresponding to the given memory cell 112.
Read circuit 150 is configured to generate one or more data signals (not shown) based on the determined logical states of memory cells 112, and on one or more control signals (not shown) received from read control circuit 160.
In some embodiments, read circuit 150 is configured to generate the one or more data signals based on an entirety of the columns of memory cells 112 during a read operation. In some embodiments, during a read operation, read circuit 150 is configured to generate one or more data signals based on one or more subsets of the columns of memory cells 112, memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110A, of memory array 110 separately from at least one other segment, e.g., segment 110B, of memory array 110. In some embodiments, read circuit 150 is configured to generate one or more data signals based on one or more subsets of the columns of memory cells 112 by masking one or more voltage signals on bit lines 114[1] . . . 114[C] during a read operation.
In some embodiments, memory circuit 100 is configured so that read circuit 150 generates one or more data signals based on the one or more subsets of the columns of memory cells 112 being activated by row decoder 120 during a read operation as discussed above. In some embodiments, read circuit 150 includes read circuit 350 configured to generate data signals X[1] . . . X[N] and W[1] . . . W[N], discussed below with respect to FIG. 3.
Read control circuit 160 is an electronic circuit configured to generate and output the one or more control signals to read circuit 150 based on one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100.
Computation circuit 170 is an electronic circuit configured to receive the one or more data signals from read circuit 150, and perform one or more logical and/or mathematical operations based on the one or more data signals and one or more control signals (not shown) received from control circuit 180 or from one or more circuits, e.g., processor 210 discussed below with respect to FIG. 2A, external to memory circuit 100.
In some embodiments, memory circuit 100 is configured so that one or more logical and/or mathematical operations performed by computation circuit 170 are coordinated with one or more operations performed by read circuit 150, memory circuit 100 thereby being configured to perform an in-memory computation. In some embodiments, memory circuit 100 is configured so that computation circuit 170 performs one or more logical and/or mathematical operations in a sequence coordinated with a sequence by which read circuit 150 determines logical states of memory cells 112. In some embodiments, memory circuit 100 is configured so that read circuit 150 and computation circuit 170 operations are coordinated to perform a matrix computation as discussed below with respect to the non-limiting examples of FIGS. 2C and 5.
In some embodiments, computation circuit 170 is configured to perform the one or more logical functions based on performing a first operation on a first subset of the one or more data signals and performing a second operation on a second subset of the one or more data signals, memory circuit 100 thereby being configured in part to operate at least one segment, e.g., segment 110A, of memory array 110 separately from at least one other segment, e.g., segment 110B, of memory array 110.
In some embodiments, computation circuit 170 is configured to perform a matrix computation using the first subset of the one or more data signals as input data and the second subset of the one or more data signals as weight data. In some embodiments, computation circuit 170 includes a multiplier-accumulator configured to perform a multiply-accumulate operation. In some embodiments, computation circuit 170 includes operation circuit 370A and addition circuit 370B, discussed below with respect to FIG. 3.
Computation circuit 170 is configured to output data OUT on an output port 100-O. Data OUT includes a plurality of voltage levels, each voltage level being carried on one or more electrical connections of output port 100-O. In various embodiments, data OUT includes a same, greater, or lesser number of voltage levels as a number of voltage levels included in data IN.
The plurality of voltage levels of data OUT are based on one or more results of the one or more logical and/or mathematical operations. In some embodiments, one or more voltage levels are based on one or more results of a logical or mathematical operation performed by computation circuit 170 on two or more data bits stored in memory array 110 and retrieved by read circuit 150. In various embodiments, memory circuit 100 is configured to generate data OUT including none, one or more, or all of the plurality of voltage levels of data OUT representing a logical state of a memory cell 112 in memory array 110.
In the embodiment depicted in FIG. 1, memory array 110 is positioned between input port 100-I at the top of memory circuit 100 and output port 100-O at the bottom of memory circuit 100. In some embodiments in which write circuit 130 and write control circuit 140 are positioned at the bottom of memory array 110, and read circuit 150, read control circuit 160, and computation circuit 170 are positioned at the top of memory array 110, memory array 110 is positioned between input port 100-I at the bottom of memory circuit 100 and output port 100-O at the top of memory circuit 100. In various embodiments, memory array 110 is positioned between input port 100-I and output port 100-O based on one or both of input port 100-I or output port 100-O being positioned at a side or sides of memory circuit 100.
By the configuration discussed above, memory circuit 100, in operation, is capable of receiving data IN at input port 100-I, storing logical states based on data IN, performing one or more logical functions based on the stored logical states, and generating data OUT at output port 100-O. Memory circuit 100 is thereby configured to perform an in-memory computation in which data flows in the direction determined by the positioning of input port 100-I and output port 100-O.
By including separately positioned input and output ports and in-memory computation, memory circuit 100 is capable of being included in circuits in which data movement distances are reduced compared to approaches in which a memory circuit does not include one or both of separately positioned input and output ports or in-memory computation. By reducing data movement distances, memory circuit 100 enables reduced power and simplified circuit configurations by reducing parasitic capacitances associated with data bus lengths and/or numbers of data buffers compared to approaches in which a memory circuit does not include one or both of separately positioned input and output ports or in-memory computation.
In some embodiments in which memory circuit 100 is configured to perform in-memory computation by operating at least one segment of memory array 110 separately from at least one other segment of memory array 110, memory circuit 100 is further capable of reducing data movement distances compared to approaches in which a memory circuit includes multiple memory arrays that do not include in-memory computation or segmented arrays.
FIG. 2A is a diagram of a system 200A, in accordance with some embodiments. System 200A includes memory circuit 100, discussed above with respect to FIG. 1, and a processor 210. Processor 210 is an electronic circuit configured to perform one or more logic operations and is coupled with memory circuit 100 through a data bus BUS.
System 200A is an electronic or electromechanical system configured to perform one or more predetermined functions based on the one or more logic operations performed by processor 210 and on data and in-memory computation operations performed by memory circuit 100 including computation circuit 170, as discussed above with respect to FIG. 1. In various embodiments, system 200A is configured to perform one or more functions, e.g., a feed-forward or multiply-accumulate function, of a neural network.
In some embodiments, system 200A includes one or more circuits (not shown) in addition to memory circuit 100 and processor 210. In some embodiments, system 200A includes a network circuit, e.g., network circuit 200B discussed below with respect to FIG. 2B, that includes a plurality of memory circuits 100.
Data bus BUS is a plurality of electrical connections configured to conduct one or more signals between memory circuit 100 and processor 210. Data bus BUS is coupled with input port 100-I and output port 100-O of memory circuit 100 and is thereby configured to conduct one or both of data IN from processor 210 to memory circuit 100 or data OUT from memory circuit 100 to processor 210.
In some embodiments, data bus BUS is further coupled with memory circuit 100 and is thereby configured to conduct one or more control or other signals (not shown) between memory circuit 100 and processor 210.
By the configuration discussed above, system 200A including memory circuit 100 is capable of realizing the benefits discussed above with respect to memory circuit 100.
FIG. 2B is a diagram of a network circuit 200B, in accordance with some embodiments. Network circuit 200B includes multiple layers of memory circuits 100, discussed above with respect to FIG. 1.
Network circuit 200B includes a number L of layers of memory circuits 100 labeled 100-1 through 100-L, the layers including respective input ports 100-1-I through 100-L-I and output ports 100-1-O through 100-L-O. Input port 100-1-I is an input port of network circuit 200B, and output port 100-L-O is an output port of network circuit 200B.
Output port 100-1-O is coupled with input port 100-2-I, and output port 100-2-O is coupled with the input port of the adjacent layer (not shown), the pattern being repeated through input port 100-L-I such that data paths from input port 100-1-I to output port 100-L-O include each one of memory circuits 100-1 through 100-L.
By the configuration discussed above, in operation, memory circuit 100-1 receives data IN-1 at input port 100-1-I and outputs data OUT-1 on output port 100-1-O, and memory circuit 100-2 receives data OUT-1 as data IN-2 at input port 100-2-I and outputs data OUT-2 on output port 100-2-O, the pattern being repeated such that data flows from input port 100-1-I to output port 100-L-O through each one of memory circuits 100-1 through 100-L.
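The layer-to-layer flow just described amounts to feeding each layer's output into the next layer's input. A tiny sketch of that composition follows; `layer_compute` is a placeholder for whatever in-memory computation a given memory circuit 100 performs, and the bit-flip used here is purely illustrative.

```python
# Sketch of the layered data flow in network circuit 200B: data enters at input
# port 100-1-I, each layer's data OUT becomes the next layer's data IN, and the
# final layer's data OUT leaves at output port 100-L-O. `layer_compute` is a
# placeholder for the in-memory computation of each memory circuit 100.

def layer_compute(data_in, layer_index):
    # Placeholder per-layer computation (illustrative only); a real layer would
    # combine stored input data and weight data as discussed for FIG. 2C.
    return [bit ^ (layer_index % 2) for bit in data_in]

def network_200b(data_in, num_layers):
    data = data_in
    for layer in range(1, num_layers + 1):      # memory circuits 100-1 .. 100-L
        data = layer_compute(data, layer)       # OUT of layer l is IN of layer l+1
    return data

print(network_200b([1, 0, 1, 1], num_layers=3))
```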
In the embodiment depicted in FIG. 2B, network circuit 200B includes the number L layers of memory circuits 100 equal to three. In various embodiments, network circuit 200B includes the number L layers of memory circuits 100 fewer or greater than three.
In the embodiment depicted in FIG. 2B, input ports 100-1-I through 100-L-I are positioned at the tops of respective memory circuits 100-1 through 100-L, and output ports 100-1-O through 100-L-O are positioned at the bottoms of respective memory circuits 100-1 through 100-L, so that, in operation, data flows from input port 100-1-I at the top of network circuit 200B to output port 100-L-O at the bottom of network circuit 200B. In some embodiments, input ports 100-1-I through 100-L-I are positioned at the bottoms of respective memory circuits 100-1 through 100-L, and output ports 100-1-O through 100-L-O are positioned at the tops of respective memory circuits 100-1 through 100-L, so that, in operation, data flows from input port 100-1-I at the bottom of network circuit 200B to output port 100-L-O at the top of network circuit 200B.
In various embodiments, one or more subsets of input ports 100-1-I through 100-L-I and/or one or more subsets of output ports 100-1-O through 100-L-O are positioned on respective memory circuits 100-1 through 100-L at one or more locations other than those depicted in FIG. 2B so that, in operation, data flows in more than one direction within network circuit 200B. In some embodiments, network circuit 200B includes memory circuits 100-1 through 100-L arranged in multiple rows and/or columns so that, in operation, data flows in a multi-directional pattern, e.g., a serpentine pattern, within network circuit 200B.
In various embodiments, the input and output ports of each layer of network circuit 200B have a same number of electrical connections, or at least one pair of input and output ports of adjacent layers of network circuit 200B has one or more numbers of electrical connections different from one or more numbers of electrical connections of one or more other pairs of input and output ports of adjacent layers of network circuit 200B.
In various embodiments, the memory circuits 100 of each layer of network circuit 200B are configured to output and receive data having a same number of data bits, or at least one pair of memory circuits 100 of adjacent layers of network circuit 200B is configured to output and receive data having a number of data bits different from a number of data bits of data output and received by one or more other pairs of memory circuits 100 of adjacent layers of network circuit 200B.
In some embodiments, the data output on an output port of a memory circuit 100 of a given layer of network circuit 200B is the same data as the data received at the input port of the memory circuit of the corresponding adjacent layer of network circuit 200B. In various embodiments, one or more of the data output from a given layer is a subset or a superset of the data received at the corresponding adjacent layer, the data output from a given layer includes data received by a circuit, e.g., processor 210 discussed above with respect to FIG. 2A, other than the corresponding adjacent layer, or the data received at the corresponding adjacent layer includes data output from a circuit, e.g., processor 210 discussed above with respect to FIG. 2A, other than the given layer.
Because each one of memory circuits 100-1 through 100-L includes computation circuit 170, discussed above with respect to FIG. 1, and network circuit 200B includes memory circuits 100-1 through 100-L configured as discussed above, network circuit 200B is configured to perform a series of computations in which the computational results of each one of memory circuits 100-1 through 100-(L−1) are included in one or more computations performed by each one of corresponding memory circuits 100-2 through 100-L. Network circuit 200B is thereby configured to perform a layered computational operation based on data received at input port 100-1-I and to output the results of the layered computational operation on output port 100-L-O.
In some embodiments, network circuit 200B includes at least one memory circuit 100 configured to operate at least one segment of memory array 110 separately from at least one other segment of memory array 110. In some embodiments, e.g., a neural network circuit 200C discussed below with respect to FIG. 2C, network circuit 200B includes at least one memory circuit 100 including computation circuit 170 configured to perform a matrix computation using data stored in segment 110A of memory array 110 as input data and data stored in segment 110B of memory array 110 as weight data.
By the configuration discussed above, data movement distances in network circuit 200B are reduced compared to approaches in which a network circuit does not include memory circuits 100 such that data flows in a given direction and in which in-memory computation is performed within the data flow. By reducing data movement distances, network circuit 200B enables reduced power and simplified circuit configurations compared to approaches in which a network circuit does not include memory circuits that include one or both of separately positioned input and output ports or in-memory computation, as discussed above with respect to memory circuit 100.
In some embodiments in which network circuit 200B includes at least one memory circuit 100 configured to perform in-memory computation by operating at least one segment, e.g., segment 110A, of memory array 110 separately from at least one other segment, e.g., segment 110B, of memory array 110, network circuit 200B is further capable of reducing data movement distances compared to approaches in which a network circuit includes multiple memory arrays that do not include in-memory computation or segmented arrays.
FIG. 2C is a diagram of neural network circuit 200C, in accordance with some embodiments. Neural network circuit 200C is a non-limiting example of network circuit 200B, discussed above with respect to FIG. 2B, in which L−1 layers of memory circuits 100 are configured as hidden layers of a deep learning neural network.
Neural network circuit 200C includes memory circuits 100-1 through 100-L, discussed above with respect to FIG. 2B, and an input layer 2001 coupled with input port 100-1-I of memory circuit 100-1. Input layer 2001 includes an input port 2001-I of neural network circuit 200C, and memory circuit 100-L is configured as an output layer of neural network circuit 200C by including output port 100-L-O configured as an output port of neural network circuit 200C.
In neural network circuit 200C, each of memory circuits 100-1 through 100-L includes segments 110A and 110B, and computation circuit 170 configured to perform one or more matrix computations on data signals based on segments 110A and 110B, as discussed above with respect to FIG. 1. The one or more matrix computations are represented in FIG. 2C as intersecting line segments in each instance of computation circuit 170.
In some embodiments, the instances of computation circuit 170 are configured to perform a same one or more matrix computations on a same portion or all of the data signals based on segments 110A and 110B. In various embodiments, the instances of computation circuit 170 are configured so that at least one instance of computation circuit 170 is configured to perform one or more matrix computations different from one or more matrix computations performed based on a configuration of at least one other instance of computation circuit 170. In various embodiments, the instances of computation circuit 170 are configured so that at least one instance of computation circuit 170 is configured to perform one or more matrix computations on a portion or all of the data signals different from a portion or all of the data signals on which one or more matrix computations are performed based on a configuration of at least one other instance of computation circuit 170.
Input layer 2001 is an electronic circuit configured to receive one or more data and/or control signals and, responsive to the one or more data and/or control signals, output data IN-1 to input port 100-1-I. Data IN-1 includes a number M1 of input data bits X1-XM1 and a number N1 of weight data bits W1-WN1.
Memory circuit 100-1 is configured to store bit data corresponding to input data bits X1-XM1 in segment 110A and bit data corresponding to weight data bits W1-WN1 in segment 110B, perform the one or more matrix computations by combining the data stored in segment 110A with the data stored in segment 110B, and output data OUT-1 to output port 100-1-O. Data OUT-1 includes a number M2 of input data bits X1-XM2 and a number N2 of weight data bits W1-WN2.
Memory circuit 100-2 is configured to receive data OUT-1 as data IN-2 at input port 100-2-I, store bit data corresponding to input data bits X1-XM2 in segment 110A and bit data corresponding to weight data bits W1-WN2 in segment 110B, perform the one or more matrix computations by combining the data stored in segment 110A with the data stored in segment 110B, and output data OUT-2 to output port 100-2-O. Data OUT-2 includes a number M3 of input data bits X1-XM3 and a number N3 of weight data bits W1-WN3.
Memory circuit 100-L is configured to receive data IN-L at input port 100-L-I, store bit data corresponding to input data bits X1-XML in segment 110A and bit data corresponding to weight data bits W1-WNL in segment 110B, perform the one or more matrix computations by combining the data stored in segment 110A with the data stored in segment 110B, and output data OUT-L to output port 100-L-O. Data OUT-L includes a number K of data bits Y1-YK.
In some embodiments, numbers M1-M(L−1) are a same number of input data bits and numbers N1-N(L−1) are a same number of weight data bits. In various embodiments, at least one of numbers M1-M(L−1) is different from at least one other of numbers M1-M(L−1) and/or at least one of numbers N1-N(L−1) is different from at least one other of numbers N1-N(L−1). In various embodiments the number K of data bits Y1-YK is the same as or different from at least one of numbers M1-M(L−1) and/or numbers N1-N(L−1).
By the configuration discussed above, in operation, memory circuit 100-1 performs a matrix computation on input data bits X1-XM1 and weight data bits W1-WN1 to generate input data bits X1-XM2 and weight data bits W1-WN2, and memory circuit 100-2 performs a matrix computation on input data bits X1-XM2 and weight data bits W1-WN2 to generate input data bits X1-XM3 and weight data bits W1-WN3, the pattern being repeated such that data flows from input port 100-1-I to output port 100-L-O through each one of memory circuits 100-1 through 100-L.
Because neural network circuit 200C includes input layer 2001 between input port 2001-I and memory circuit 100-1, and memory circuit 100-(L−1) is separated from output port 100-L-O by memory circuit 100-L configured as an output layer, memory circuits 100-1 through 100-(L−1) are sometimes referred to as hidden layers of neural network circuit 200C.
In some embodiments, neural network circuit 200C is included in a neural network, and each layer of neural network circuit 200C is a layer of the neural network. In some embodiments, each hidden layer of neural network circuit 200C is a multiplier-accumulator layer of a feed-forward neural network.
A neural network that includes neural network circuit 200C, including memory circuits 100-1 through 100-L configured as discussed above, is thereby capable of realizing the benefits discussed above with respect to network circuit 200B.
FIG. 3 is a diagram of a memory circuit 300, in accordance with some embodiments. Memory circuit 300 is usable as a portion of memory circuit 100, discussed above with respect to FIG. 1.
Memory circuit 300 includes memory array segments 310X[1] . . . 310X[N] and 310W[1] . . . 310W[N] usable as all or a portion of memory array 110 including segments 110A and 110B, a row decode circuit 320 usable as all or a portion of row decode circuit 120, write circuit 130, a read circuit 350 usable as all or a portion of read circuit 150, and operation circuit 370A and addition circuit 370B, collectively usable as all or a portion of computation circuit 170, as discussed above with respect to FIG. 1.
Each one of memory array segments 310X[1] . . . 310X[N] and 310W[1] . . . 310W[N] corresponds to a segment 110A or 110B and includes at least one column of memory cells 112 coupled with a bit line of bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], or BLW[1B] . . . BLW[NB] corresponding to a bit line of bit lines 114[1] . . . 114[C], discussed above with respect to FIG. 1. In the embodiment depicted in FIG. 3, a given memory cell 112 is coupled with a single bit line of bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], or BLW[1B] . . . BLW[NB]. In some embodiments, a given memory cell 112 is coupled with a pair of bit lines of bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], or BLW[1B] . . . BLW[NB].
In the embodiment depicted in FIG. 3, each one of memory array segments 310X[1] . . . 310X[N] and 310W[1] . . . 310W[N] includes two columns of memory cells 112. In various embodiments, one or more of memory array segments 310X[1] . . . 310X[N] or 310W[1] . . . 310W[N] includes one or greater than two columns of memory cells 112.
In the embodiment depicted in FIG. 3, each one of memory array segments 310X[1] . . . 310X[N] and 310W[1] . . . 310W[N] includes a same number of columns of memory cells 112. In some embodiments, one or more of memory array segments 310X[1] . . . 310X[N] includes a first number of columns of memory cells 112 and one or more of memory array segments 310W[1] . . . 310W[N] includes a second number of columns of memory cells 112 different from the first number of columns of memory cells 112.
Memory array segments 310X[1] . . . 310X[N] and 310W[1] . . . 310W[N] are positioned such that each memory array segment 310X[n] is adjacent to a corresponding memory array segment 310W[n].
A given row of memory cells 112 thereby includes a first subset of memory cells 112 in memory array segments 310X[1] . . . 310X[N] alternating with a second subset of memory cells 112 in memory array segments 310W[1] . . . 310W[N]. The first subset of memory cells 112 of a given row m is coupled with one of word lines 316X[m], and the second subset of memory cells 112 of the given row m is coupled with one of word lines 316W[m].
In some embodiments, a given row m of memory cells 112 includes a memory cell 412X coupled with a word line 316X[m] and a memory cell 412W coupled with a word line 316W[m], discussed below with respect to FIG. 4.
Row decode circuit 320 is configured to output word line signals WX[1] . . . WX[M] corresponding to the first subset of memory cells 112 on word lines 316X[1] . . . 316X[M], and to output word line signals WW[1] . . . WW[M] corresponding to the second subset of memory cells 112 on word lines 316W[1] . . . 316W[M].
Row decode circuit 320 is thereby configured to, during a read or write operation, select the first subset of memory cells 112 of a row m by generating word line signal WX[m] on the corresponding word line 316X[m], and/or to select the second subset of memory cells 112 of the row m by generating word line signal WW[m] on the corresponding word line 316W[m].
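For illustration only, the following Python sketch gives a behavioral (non-circuit) model of the segmented row organization described above, in which the memory cells of a row are split into an X subset and a W subset that are selected by separate word lines. The class name SegmentedArray and its methods are assumptions introduced here for illustration and do not correspond to elements of the embodiments.

    # Behavioral sketch only: rows hold an X subset and a W subset of cells,
    # each accessed through its own word line, analogous to word lines
    # 316X[m] and 316W[m] described above (indices are 0-based here).
    class SegmentedArray:
        def __init__(self, num_rows, num_segments):
            # x_cells[m][n] and w_cells[m][n] hold one stored bit per cell.
            self.x_cells = [[0] * num_segments for _ in range(num_rows)]
            self.w_cells = [[0] * num_segments for _ in range(num_rows)]

        def write_x(self, m, n, bit):
            # Asserting word line WX[m] couples the X cells of row m to bit lines.
            self.x_cells[m][n] = bit

        def write_w(self, m, n, bit):
            # Asserting word line WW[m] couples the W cells of row m to bit lines.
            self.w_cells[m][n] = bit

        def read_x(self, m, n):
            return self.x_cells[m][n]

        def read_w(self, m, n):
            return self.w_cells[m][n]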
Because write circuit 130 is configured to generate the voltage levels on bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], and BLW[1B] . . . BLW[NB] based on data IN received at input port 100-I, memory circuit 300 is thereby configured, in a write operation, to write a first subset of data IN to memory array segments 310X[1] . . . 310X[N], write a second subset of data IN to memory array segments 310W[1] . . . 310W[N], or write an entirety of data IN to memory array segments 310X[1] . . . 310X[N] and 310W[1] . . . 310W[N].
Read circuit 350 includes a plurality of sense amplifiers SA coupled with bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], and BLW[1B] . . . BLW[NB] through a plurality of selection circuits SEL. In the embodiment depicted in FIG. 3, a given sense amplifier SA is coupled with a pair of bit lines of bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], or BLW[1B] . . . BLW[NB] through a corresponding selection circuit SEL.
In some embodiments, read circuit 350 does not include a plurality of selection circuits SEL, and a given sense amplifier SA is coupled with a single bit line of bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], or BLW[1B] . . . BLW[NB]. In some embodiments, a given sense amplifier SA is coupled with greater than two bit lines of bit lines BLX[1A] . . . BLX[NA], BLX[1B] . . . BLX[NB], BLW[1A] . . . BLW[NA], or BLW[1B] . . . BLW[NB] through a corresponding selection circuit SEL.
In some embodiments, a selection circuit SEL includes a multiplexer. In some embodiments, read circuit 350 does not include a selection circuit SEL, and each one of memory array segments 310X[1] . . . 310X[N] and/or 310W[1] . . . 310W[N] includes a selection circuit SEL.
Each sense amplifier SA is an electronic circuit configured to determine the logical state of a corresponding selected memory cell 112 during a read operation. A first subset of sense amplifiers SA is coupled with the first subsets of the rows of memory cells 112 corresponding to memory array segments 310X[1] . . . 310X[N], and a second subset of sense amplifiers SA is coupled with the second subsets of the rows of memory cells 112 corresponding to memory array segments 310W[1] . . . 310W[N].
The first subset of sense amplifiers SA is configured to generate data signals X[1] . . . X[N] having voltage levels based on the logical states of the corresponding selected memory cells 112 during a read operation, and the second subset of sense amplifiers SA is configured to generate data signals W[1] . . . W[N] having voltage levels based on the logical states of the corresponding selected memory cells 112 during a read operation.
In some embodiments, each sense amplifier SA of the first subset of sense amplifiers SA includes a latch circuit configured to generate data signals X[1] . . . X[N] having latched voltage levels. In some embodiments, each sense amplifier SA of the second subset of sense amplifiers SA includes a latch circuit configured to generate data signals W[1] . . . W[N] having latched voltage levels.
Operation circuit 370A includes the number N of logic units 372. An nth logic unit 372 is configured to receive a pair of data signals X[n] and W[n], perform one or more logic or mathematical operations based on the voltage levels of data signals X[n] and W[n], and generate a signal R[n] of signals R[1] . . . R[N] having a voltage level representing a result of the one or more logic or mathematical operations.
In various embodiments, the nth logic unit 372 is configured to perform the one or more logic or mathematical operations based solely on data signals X[n] and W[n], or to perform the one or more logic or mathematical operations based on one or more data signals (not shown) in addition to data signals X[n] and W[n].
In various embodiments, logic units 372 are configured to perform one or more of an OR, NOR, XOR, AND, NAND, or multiplication operation, or one or more other operations suitable for processing two or more data bits.
In some embodiments, each logic unit 372 is configured to perform a same logic or mathematical operation. In various embodiments, at least one logic unit 372 is configured to perform a logic or mathematical operation different from one or more logic or mathematical operations performed by one or more other logic units 372.
In some embodiments, each logic unit 372 is configured to perform a same logic or mathematical operation during all operations. In various embodiments, at least one logic unit 372 is configurable so as to perform at least one logic or mathematical operation of a plurality of varying logic or mathematical operations responsive to one or more received signals (not shown).
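As a non-limiting illustration of a configurable logic unit, the Python sketch below selects one of several two-input operations by name and applies it to a pair of data bits; the dictionary OPERATIONS and the function name logic_unit are assumptions made here for illustration only.

    # Illustrative two-input logic unit: applies a selected operation to
    # data bits x and w (each 0 or 1) and returns the result R.
    OPERATIONS = {
        "OR":   lambda x, w: x | w,
        "NOR":  lambda x, w: 1 - (x | w),
        "XOR":  lambda x, w: x ^ w,
        "AND":  lambda x, w: x & w,
        "NAND": lambda x, w: 1 - (x & w),
        "MUL":  lambda x, w: x * w,
    }

    def logic_unit(x, w, op="MUL"):
        return OPERATIONS[op](x, w)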
By the configuration discussed above, memory circuit 300 is capable of performing an in-memory computation by coordinating read circuit 350 generating data signals X[n] and W[n] with operation circuit 370A performing one or more logic and/or mathematical operations on data signals X[n] and W[n].
By the configuration discussed above, operation circuit 370A is capable of performing multiple logic and/or mathematical operations on data stored in memory cells 112 by operating on data in memory array segments 310X[1] . . . 310X[N] separately from data in respective memory array segments 310W[1] . . . 310W[N].
In some embodiments, memory circuit 300 is configured to, in an in-memory computing operation, use a first sense amplifier SA to generate a latched one of data signals X[n] or W[n], use a corresponding second sense amplifier SA to dynamically generate the other one of data signals X[n] or W[n] by sequentially selecting memory cells 112 from multiple rows in a given column, and use an nth logic unit 372 to repeatedly perform a given logic or mathematical operation to generate signal R[n]. Memory circuit 300 is configured to sequentially select memory cells 112 from multiple rows in a given column by generating either word line signal WX[m] on a word line 316X[m] or word line signal WW[m] on a word line 316W[m] while changing values of m.
In various embodiments, memory circuit 300 is configured to, in an in-memory computing operation, sequentially select memory cells 112 by stepping values of m from 1 through M, from M through 1, from 1 to a value less than M, from M to a value greater than 1, or using another order to change values of m within the span of 1 through M.
In some embodiments, memory circuit 300 is configured such that, in an in-memory computing operation, operation circuit 370A repeats, for multiple values of n, the operation of the nth logic unit 372 repeatedly performing the given logic or mathematical operation to generate signal R[n].
In various embodiments, memory circuit 300 is configured to, in an in-memory computing operation, generate signal R[n] for multiple values of n by using each value of n from 1 through N or by using a subset of values of n from within the span of 1 through N. In various embodiments, memory circuit 300 is configured to, in an in-memory computing operation, generate signal R[n] for multiple values of n by using multiple logic units 372 in parallel, in series, or in a combination of parallel and series operation.
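A non-limiting Python sketch of this latched/sequential scheme follows; it models one logic unit operating on one latched W bit against sequentially read X bits, reusing the illustrative helpers sketched earlier. The function name compute_segment and its parameters are assumptions for illustration only.

    # Illustrative in-memory computing step for one segment pair n:
    # latch one W bit, then sweep the X word lines from the last row to the
    # first, repeatedly applying the selected operation to build signal R[n].
    def compute_segment(array, n, m_latched, op="MUL"):
        w_latched = array.read_w(m_latched, n)       # held by a latching sense amp
        partial_results = []
        for m in reversed(range(len(array.x_cells))):  # rows M down to 1 (0-based)
            x_dynamic = array.read_x(m, n)           # dynamically generated X bit
            partial_results.append(logic_unit(x_dynamic, w_latched, op))
        return partial_results                       # contributions to signal R[n]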
In some embodiments, memory circuit 300 is configured to perform a non-limiting example of an in-memory computing operation discussed below with respect to FIG. 5.
Addition circuit 370B is configured to receive signals R[1] . . . R[N], perform an addition operation based on the results represented by the voltage levels of signals R[1] . . . R[N], generate data OUT, and output data OUT on output port 100-O.
In some embodiments, addition circuit 370B is configured to perform the addition operation by adding each of the results of an nth logic unit 372 repeatedly performing the given logic or mathematical operation represented by signal R[n] for each signal R[n] of signals R[1] . . . R[N]. In various embodiments, addition circuit 370B is configured to perform the addition operation by adding one or more subsets of the results of the nth logic unit 372 repeatedly performing the given logic or mathematical operation represented by signal R[n] for each signal R[n] of signals R[1] . . . R[N].
In various embodiments, addition circuit 370B is configured to generate data OUT having N data bits, fewer than N data bits, or greater than N data bits.
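A corresponding non-limiting sketch of the addition step follows; the function name addition_circuit is an assumption for illustration, and the sketch simply totals the partial results produced for each segment pair.

    # Illustrative addition step: sum the partial results produced for each
    # segment pair, analogous to addition circuit 370B combining signals
    # R[1] .. R[N] into output data OUT.
    def addition_circuit(all_partial_results):
        return sum(sum(r) for r in all_partial_results)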
By the configuration discussed above, memory circuit 300 is capable of performing a series of in-memory computing operations, e.g., a matrix computation, based on data in memory array segments 310X[1] . . . 310X[N] separate from data in respective memory array segments 310W[1] . . . 310W[N]. A memory circuit 100, system 200A, or network circuit 200B including memory circuit 300 is thereby capable of realizing the benefits discussed above with respect to memory circuit 100, system 200A, and network circuit 200B.
In embodiments in which memory circuit 300 is configured to dynamically generate one of data signals X[n] or W[n] by sequentially selecting memory cells 112 from multiple rows in a given column with the other of data signals X[n] or W[n] latched, memory circuit 300 enables reduced power and simplified circuit configurations compared to approaches in which a memory circuit does not dynamically generate a first data signal while a second data signal is latched.
FIG. 4 is a diagram of a memory cell circuit 400, in accordance with some embodiments. Memory cell circuit 400 is usable as a portion of a memory circuit 100 or 300, discussed above with respect to FIGS. 1 and 3.
Memory cell circuit 400 includes word line 316X[m] configured to carry word line signal WX[m] and word line 316W[m] configured to carry word line signal WW[m], discussed above with respect to FIG. 3. Memory cell circuit 400 also includes memory cells 412X and 412W, each usable as a memory cell 112, and bit lines BL and BLB, each usable as a bit line of bit lines 114[1] . . . 114[C], each discussed above with respect to FIG. 1.
FIG. 4 depicts memory cell circuit 400 including one each of memory cells 412X and 412W for the purpose of illustration. In various embodiments, memory cell circuit 400 includes greater than one each of one or both of memory cells 412X and 412W.
Each one of memory cells 412X and 412W is configured as a 6T SRAM cell by including power nodes VDD and VSS, PMOS transistors P1 and P2, and NMOS transistors N1, N2, N3, and N4, in which each of transistor pairs P1 and N1, and P2 and N2, is configured as an inverter coupled between power nodes VDD and VSS.
Gates of transistors P2 and N2 are coupled together, to drain terminals of transistors P1 and N1, and to one of a source or drain terminal of transistor N3. The other of the source or drain terminal of transistor N3 is coupled with bit line BL.
Gates of transistors P1 and N1 are coupled together, to drain terminals of transistors P2 and N2, and to one of a source or drain terminal of transistor N4. The other of the source or drain terminal of transistor N4 is coupled with complementary bit line BLB. Transistor pairs P1 and N1, and P2 and N2, are thereby cross-coupled and configured to be selectively coupled with bit lines BL and BLB through respective transistors N3 and N4.
Memory cell 412X includes the gates of transistors N3 and N4 coupled with word line 316X[m], and is thereby configured to be coupled with bit lines BL and BLB responsive to word line signal WX[m]. Memory cell 412W includes the gates of transistors N3 and N4 coupled with word line 316W[m], and is thereby configured to be coupled with bit lines BL and BLB responsive to word line signal WW[m].
Memory cell circuit 400 is thereby configured to selectively activate one or both of memory cells 412X or 412W in a read or write operation. A memory circuit 100 or 300 including memory cell circuit 400 is thereby capable of realizing the benefits discussed above with respect to memory circuit 100, system 200A, and network circuit 200B.
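For illustration only, the short Python sketch below models which of the two cells in a row is coupled to the shared bit lines for given word line signals; the function name coupled_cells and its boolean arguments are assumptions and do not model the transistor-level behavior described above.

    # Behavioral sketch: two cells in one row share bit lines BL/BLB, and the
    # word line signals WX[m] and WW[m] determine which cell (if either) is
    # coupled to them during a read or write operation.
    def coupled_cells(wx_m_asserted, ww_m_asserted):
        cells = []
        if wx_m_asserted:
            cells.append("412X")   # coupled via its access transistors N3/N4
        if ww_m_asserted:
            cells.append("412W")
        return cells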
FIG. 5 is a plot of memory circuit operating parameters, in accordance with some embodiments. FIG. 5 depicts a non-limiting example of an in-memory computing operation in which a data signal W1[m] is latched while a data signal X1[m] is dynamically generated by stepping from m=M to m=1 based on a clock signal CLK. Data signals W1[m] and X1[m] are non-limiting examples of respective data signals W[n] and X[n], discussed above with respect to FIG. 3, for a case in which n=1.
For the purpose of illustration, data signals W1[m] and X1[m] include pulses that do not indicate a particular logic state determined by a sense amplifier SA. Instead, the data signal pulses indicate that a sense amplifier SA is actively outputting a data signal W1[m] or X1[m] based on any determined logic state of a selected memory cell 112. Clock signal CLK includes pulses that indicate step numbers.
Data signal W1[M] is active from step 1 through step M, illustrating that the corresponding sense amplifier SA is outputting data signal W1[M] latched to a voltage level indicating a logic state of the memory cell 112 in row M of a given column in memory array segment 310W[1].
Data signal W1[M−1] is active from step M+1 through step 2M (not shown), illustrating that the corresponding sense amplifier SA is outputting data signal W1[M−1] latched to a voltage level indicating a logic state of the memory cell 112 in row M−1 of the given column in memory array segment 310W[1].
Data signal X1[M] is active during steps 1 and M+1, illustrating that the corresponding sense amplifier SA is outputting data signal X1[M] at a voltage level indicating a logic state of the memory cell 112 in row M of a given column in memory array segment 310X[1] only during a first step in a sequence of M steps.
Data signal X1[M−1] is active during steps 2 and M+2, illustrating that the corresponding sense amplifier SA is outputting data signal X1[M−1] at a voltage level indicating a logic state of the memory cell 112 in row M−1 of the given column in memory array segment 310X[1] only during a second step in the sequence of M steps.
Data signal X1[M−2] is active during steps 3 and M+3 (not shown), illustrating that the corresponding sense amplifier SA is outputting data signal X1[M−2] at a voltage level indicating a logic state of the memory cell 112 in row M−2 of the given column in memory array segment 310X[1] only during a third step in the sequence of M steps.
Data signal X1[1] is active during steps M and 2M, illustrating that the corresponding sense amplifier SA is outputting data signal X1[1] at a voltage level indicating a logic state of the memory cell 112 in row 1 of the given column in memory array segment 310X[1] only during the Mth step in the sequence of M steps.
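The stepping pattern just described can be summarized by a short non-limiting sketch that maps a clock step to the row index of the latched W signal and the dynamically read X signal; the helper name fig5_schedule and the one-based step numbering are assumptions for illustration.

    # For clock step s = 1, 2, ... in the n = 1 case of FIG. 5:
    # the latched W row stays fixed for M consecutive steps while the X row
    # steps from M down to 1, matching data signals W1[m] and X1[m] above.
    def fig5_schedule(step, M):
        w_row = M - (step - 1) // M   # W1 row latched during this block of steps
        x_row = M - (step - 1) % M    # X1 row read at this step
        return w_row, x_row

    # Example: with M = 4, steps 1..4 pair W1[4] with X1[4], X1[3], X1[2], X1[1];
    # steps 5..8 pair W1[3] with the same X1 sequence, and so on.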
Steps 1 through M correspond to a first portion of a matrix computation in which a given logic operation is repeatedly performed, e.g., using operation circuit 370A discussed above with respect to FIG. 3, by combining latched data signal W1[M] with data signals X1[M] through X1[1] sequentially selected at each step. Similarly, steps M+1 through 2M correspond to a second portion of the matrix computation in which the given logic operation is repeatedly performed by combining latched data signal W1[M−1] with data signals X1[M] through X1[1] sequentially selected at each step. Additional portions of the matrix computation correspond to combining each of latched data signals W1[M−2] through W1[1] with data signals X1[M] through X1[1] sequentially selected at corresponding steps.
To complete the matrix computation, the results of each logic operation performed on the combination of data signals W[1] . . . W[M] and X[1] . . . X[M] are summed, e.g., using addition circuit 370B discussed above with respect to FIG. 3.
In some embodiments, data signals W[M−2] through W[1] correspond to weight data, and data signals X[M] through X[1] correspond to input data of a multiply-accumulate operation.
In some embodiments, for cases in which n>1, steps 1 through M are repeated for each value of m and n, thereby resulting in the following matrix multiplication operation:
\[
\begin{matrix} \text{1st cycle} \\ \vdots \\ \text{Mth cycle} \end{matrix}
\begin{bmatrix}
X_1[M] & X_2[M] & \cdots & X_n[M] \\
X_1[M-1] & X_2[M-1] & \cdots & X_n[M-1] \\
\vdots & \vdots & & \vdots \\
X_1[2] & X_2[2] & \cdots & X_n[2] \\
X_1[1] & X_2[1] & \cdots & X_n[1]
\end{bmatrix}
\cdot
\begin{bmatrix}
W_1[M] & W_2[M] & \cdots & W_n[M] \\
W_1[M-1] & W_2[M-1] & \cdots & W_n[M-1] \\
\vdots & \vdots & & \vdots \\
W_1[2] & W_2[2] & \cdots & W_n[2] \\
W_1[1] & W_2[1] & \cdots & W_n[1]
\end{bmatrix}
\quad [1]
\]
The output OUT of the matrix multiplication operation is represented by the equation:
\[
\mathrm{OUT} = \sum_{i=0}^{n} X_i \cdot W_i \quad [2]
\]
wherein Xi represents data signals Xi[1] through Xi[M] and Wi represents data signals Wi[1] through Wi[M].
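For illustration only, the following Python sketch carries out the computation expressed in equations [1] and [2] on small example matrices; the variable names are assumptions, and the sketch runs the summation index over 1 through n (0-based in code), since no X0 or W0 signals are defined above.

    # X[m][i] and W[m][i] hold the bits stored in memory array segments
    # 310X[i] and 310W[i] for rows m = 1 .. M; OUT accumulates the products,
    # mirroring OUT = sum over i of Xi . Wi in equation [2].
    def matrix_compute(X, W):
        M = len(X)        # number of rows
        n = len(X[0])     # number of segment pairs
        out = 0
        for i in range(n):            # segment pair i
            for m in range(M):        # dot product Xi . Wi over rows
                out += X[m][i] * W[m][i]
        return out

    # Example with M = 2, n = 2:
    X = [[1, 0],
         [1, 1]]
    W = [[1, 1],
         [0, 1]]
    print(matrix_compute(X, W))   # 1*1 + 1*0 + 0*1 + 1*1 = 2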
A memory circuit 100 or 300 configured to perform an in-memory computing operation in accordance with the non-limiting example depicted in FIG. 5 is capable of operating one memory array segment separately from at least one other memory array segment and is thereby capable of realizing the benefits discussed above with respect to memory circuit 100, system 200A, and network circuit 200B.
FIG. 6 is a flowchart of a method 600 of performing an in-memory computation, in accordance with one or more embodiments. Method 600 is usable with a memory circuit, e.g., memory circuit 100 discussed above with respect to FIG. 1, a system, e.g., system 200A discussed above with respect to FIG. 2A, or a network circuit, e.g., network circuit 200B discussed above with respect to FIG. 2B.
The sequence in which the operations of method 600 are depicted in FIG. 6 is for illustration only; the operations of method 600 are capable of being executed in sequences that differ from that depicted in FIG. 6. In some embodiments, operations in addition to those depicted in FIG. 6 are performed before, between, during, and/or after the operations depicted in FIG. 6.
In some embodiments, some or all of the operations of method 600 are a subset of operations of a method of performing a memory circuit or network, e.g., neural network, computation. In some embodiments, some or all of the operations of method 600 are used to perform an in-memory computing operation in accordance with the non-limiting example depicted in FIG. 5.
At operation 610, in some embodiments, input data is received at an input port of a memory circuit. The memory circuit includes a memory array positioned between the input port and an output port, a write circuit positioned between the input port and the memory array, and a read circuit positioned between the memory array and the output port.
In some embodiments, receiving the input data at the input port includes receiving input data IN at input port 100-I, discussed above with respect to FIG. 1.
In some embodiments, receiving the input data at the input port includes receiving data from an output port of another memory circuit. In some embodiments, receiving the input data at the input port includes receiving data at one of memory circuits 100-2 through 100-L from an adjacent one of memory circuits 100-1 through 100-(L−1), discussed above with respect to FIG. 2.
At operation 620, in some embodiments, a first subset of the input data is stored in a first segment of the memory array and a second subset of the input data is stored in a second segment of the memory array. In some embodiments, storing the first subset in the first segment and the second subset in the second segment includes storing input data in one of the first or second segments and weight data in the other of the first or second segments.
Storing the first subset of the input data in the first segment and the second subset in the second segment includes storing the first and second subsets using the write circuit separate from the read circuit. In some embodiments, storing the first and second subsets includes using the write circuit at a first end of the columns of the memory array opposite a second end of the columns of the memory array at which the read circuit is positioned. In some embodiments, storing the first and second subsets includes using write circuit 130, discussed above with respect to FIGS. 1 and 3.
In some embodiments, storing the first subset in the first segment includes storing the first subset in one of memory array segments 310X[1] . . . 310X[N], and storing the second subset in the second segment includes storing the second subset in one of memory array segments 310W[1] . . . 310W[N], discussed above with respect to FIG. 3.
At operation 630, in some embodiments, a first data bit from a first column of memory cells in one of the first segment of the memory array or the second segment of the memory array is latched. In some embodiments, latching the first data bit includes latching a weight bit of weight data. In some embodiments, latching the first data bit includes latching an input bit of input data.
In some embodiments, latching the first data bit includes latching the first data bit with a sense amplifier of the read circuit. In some embodiments, latching the first data bit includes selecting the first column using a selection circuit, e.g., a multiplexer. In some embodiments, latching the first data bit includes latching one of data signals X[n] or W[n], discussed above with respect to FIG. 3.
At operation 640, in some embodiments, a plurality of second data bits from a second column of memory cells in the other of the first segment or the second segment is sequentially read. In some embodiments, sequentially reading the second data bits includes sequentially reading input data bits of input data. In some embodiments, sequentially reading the second data bits includes sequentially reading weight data bits of weight data.
In some embodiments, sequentially reading the second data bits includes sequentially reading the second data bits with a sense amplifier of the read circuit. In some embodiments, sequentially reading the second data bits includes selecting the second column using a selection circuit, e.g., a multiplexer. In some embodiments, sequentially reading the second data bits includes sequentially reading one of data signals X[n] or W[n], discussed above with respect to FIG. 3.
At operation 650, in some embodiments, a logic operation is performed on each combination of the latched first data bit and each second data bit of the plurality of second data bits. In various embodiments, performing the logic operation includes one or more of performing an OR, NOR, XOR, AND, NAND, or multiplication operation, or one or more other operations suitable for processing at least two data bits. In some embodiments, performing the logic operation includes combining a weight data bit with an input data bit.
Performing the logic operation includes using a logic circuit. In some embodiments, performing the logic operation includes using computation circuit 170, discussed above with respect to FIG. 1. In some embodiments, performing the logic operation includes using a logic unit 372, discussed above with respect to FIG. 3.
At operation 660, in some embodiments, one or more or all of operations 630 through 650 are repeated. In some embodiments, repeating one or more or all of operations 630 through 650 includes latching a third data bit from the first column of memory cells, sequentially reading the plurality of second data bits from the second column of memory cells, and performing the logic operation on each combination of the latched third data bit and each second data bit of the plurality of second data bits.
In some embodiments, repeating one or more or all of operations 630 through 650 includes repeating the operations of latching a given data bit, sequentially reading a corresponding plurality of data bits, and performing the logic operation on the resultant combinations for a plurality of columns in respective first and second memory array segments. In some embodiments, the respective first and second memory array segments are memory array segments 310W[1] . . . 310W[N] and 310X[1] . . . 310X[N], discussed above with respect to FIG. 3.
At operation 670, in some embodiments, a sum is calculated by adding some or all of the results of performing the logic operation on each combination of each latched data bit and each sequentially read data bit. In some embodiments, calculating the sum is part of performing a matrix computation. In some embodiments, calculating the sum is part of performing a matrix combination of weight and input data.
Calculating the sum includes using an addition circuit. In some embodiments, calculating the sum includes using computation circuit 170, discussed above with respect to FIG. 1. In some embodiments, calculating the sum includes using addition circuit 370B, discussed above with respect to FIG. 3.
At operation 680, in some embodiments, the sum is output by the memory circuit. Outputting the sum includes outputting the sum at the output port of the memory circuit. In some embodiments, outputting the sum includes outputting the sum as data OUT at output port 100-O, discussed above with respect to FIG. 1.
At operation 690, in some embodiments, the sum is included in an input to a layer of a network circuit. In some embodiments, including the sum in an input includes including the sum in an input to an input port of another memory circuit. In some embodiments, including the sum in an input includes including the sum in an input to one of memory circuits 100-2 through 100-(L−1), discussed above with respect to FIG. 2.
In some embodiments, including the sum in an input includes including the sum in an input to a layer of a neural network computation.
By executing some or all of the operations of method 600, some or all of an in-memory computation is performed, thereby obtaining the benefits discussed above with respect to memory circuit 100, system 200A, and network circuit 200B.
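Putting the operations of method 600 together, the non-limiting Python sketch below walks through storing the two data subsets, latching, sequentially reading, combining, and summing; it reuses the illustrative helpers sketched earlier, and all names are assumptions rather than elements of the claimed method. It sums every result, whereas operation 670 also permits summing only a subset.

    # Illustrative end-to-end walk-through of operations 620-680 of method 600
    # for an array with M rows and N segment pairs, using MUL as the logic op.
    def method_600(input_bits, weight_bits):
        M, N = len(input_bits), len(input_bits[0])
        array = SegmentedArray(M, N)

        # Operation 620: store input data in the X segments, weight data in W.
        for m in range(M):
            for n in range(N):
                array.write_x(m, n, input_bits[m][n])
                array.write_w(m, n, weight_bits[m][n])

        # Operations 630-660: latch each W bit and sequentially read the X
        # column, applying the logic operation to every combination.
        results = []
        for n in range(N):
            for m_latched in reversed(range(M)):
                results.append(compute_segment(array, n, m_latched, op="MUL"))

        # Operations 670-680: sum the results and output the sum.
        return addition_circuit(results)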
In some embodiments, a circuit includes a memory array, a write circuit configured to store data in memory cells of the memory array, a read circuit configured to retrieve the stored data from the memory cells of the memory array, and a computation circuit configured to perform one or more logic operations on the retrieved stored data, wherein the memory array is positioned between the write circuit and the read circuit.
In some embodiments, a memory circuit includes a memory array including a first segment of memory cells and a second segment of memory cells, and a computation circuit configured to perform a matrix computation by combining first data retrieved from the memory cells of the first segment with second data retrieved from the memory cells of the second segment.
In some embodiments, a method of performing an in-memory computation includes latching a first data bit from a first column of memory cells in one of a first segment of a memory array or a second segment of the memory array, sequentially reading a plurality of second data bits from a second column of memory cells in the other of the first segment or the second segment, and performing a logic operation on each combination of the latched first data bit and each second data bit of the plurality of second data bits.
The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

Claims (18)

What is claimed is:
1. A memory circuit comprising:
an input port;
an output port;
a memory array;
a write circuit configured to receive data at the input port and store the data in memory cells of the memory array;
a read circuit configured to retrieve the stored data from the memory cells of the memory array; and
a computation circuit configured to perform one or more logic operations on the retrieved stored data, and output result data on the output port,
wherein
the memory array is positioned between the input port and the output port,
the computation circuit is configured to output the result data comprising accumulated sum data based on a matrix computation, and
the memory circuit further comprises another write circuit configured to store the accumulated sum data in memory cells of another memory array.
2. The memory circuit of claim 1, wherein the memory circuit is configured to coordinate the read circuit performing one or more read operations with the computation circuit performing the one or more logic operations.
3. The memory circuit of claim 1, further comprising:
a plurality of first word lines coupled with a plurality of rows of memory cells in a first segment of the memory array; and
a plurality of second word lines coupled with the plurality of rows of memory cells in a second segment of the memory array.
4. The memory circuit of claim 2, wherein the computation circuit is configured to perform the matrix computation by combining a first subset of the stored data retrieved from the first segment with a second subset of the stored data retrieved from the second segment.
5. The memory circuit of claim 4, wherein
the read circuit comprises a latch circuit coupled with a first column of the memory cells in one of the first segment or the second segment, and
the computation circuit is configured to perform the matrix computation by sequentially combining a data bit stored in the latch circuit with a plurality of data bits retrieved from a second column of the memory cells in the other of the first segment or the second segment.
6. The memory circuit of claim 1, wherein
the memory array is a first memory array of a plurality of memory arrays,
the another memory array is a second memory array of the plurality of memory arrays, and
the write circuit is configured to receive data from another read circuit positioned between the write circuit and a third memory array of the plurality of memory arrays.
7. A memory circuit comprising:
a memory array comprising a first segment of memory cells and a second segment of memory cells;
a sense amplifier comprising a latch circuit coupled with a first column of the memory cells of the first segment; and
a computation circuit configured to perform a matrix computation by sequentially combining first data retrieved from the memory cells of the first segment and stored in the latch circuit with second data retrieved from a second column of the memory cells of the second segment.
8. The memory circuit of claim 7, further comprising:
a plurality of first word lines coupled with the first column; and
a plurality of second word lines coupled with the second column,
wherein, during the matrix computation, the memory circuit is configured to sequentially activate the plurality of second word lines.
9. The memory circuit of claim 7, wherein
the first segment comprises a plurality of first columns of memory cells configured to store the first data, the plurality of first columns comprising the first column,
the second segment comprises a plurality of second columns of memory cells configured to store the second data, the plurality of second columns comprising the second column, and
the plurality of first columns is adjacent to the plurality of second columns.
10. The memory circuit of claim 9, further comprising:
a plurality of first word lines, each first word line of the plurality of first word lines coupled with a memory cell of each first column of the plurality of first columns; and
a plurality of second word lines, each second word line of the plurality of second word lines coupled with a memory cell of each second column of the plurality of second columns.
11. The memory circuit of claim 9, wherein a number of first columns of the plurality of first columns is equal to a number of second columns of the plurality of second columns.
12. The memory circuit of claim 7, wherein each memory cell of the first segment of memory cells and the second segment of memory cells is a static random-access memory (SRAM) cell.
13. A method of performing an in-memory computation, the method comprising:
latching a first data bit from a first column of memory cells in one of a first segment of a memory array or a second segment of the memory array;
sequentially reading a plurality of second data bits from a second column of memory cells in the other of the first segment or the second segment; and
performing a logic operation on each combination of the latched first data bit and each second data bit of the plurality of second data bits.
14. The method of claim 13, further comprising:
latching a third data bit from the first column of memory cells;
sequentially reading the plurality of second data bits from the second column of memory cells; and
performing the logic operation on each combination of the latched third data bit and each second data bit of the plurality of second data bits.
15. The method of claim 14, further comprising calculating a sum by adding results of the performing the logic operation on each combination of the latched first data bit and each second data bit of the plurality of second data bits to results of the performing the logic operation on each combination of the latched third data bit and each second data bit of the plurality of second data bits.
16. The method of claim 15, wherein the calculating the sum comprises further adding results of performing the logic operation on each combination of a latched fifth data bit from a third column of memory cells and each sixth data bit of a plurality of sixth data bits from a fourth column of memory cells.
17. The method of claim 16, further comprising including the sum in an input to a layer of a neural network computation.
18. The memory circuit of claim 7, wherein
the first column of the memory cells of the first segment is one first column of a plurality of first columns of the memory cells of the first segment, and
the sense amplifier is coupled to each first column of the plurality of first columns through a selection circuit.
US16/405,822 2018-06-29 2019-05-07 Memory computation circuit and method Active US10839894B2 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US16/405,822 US10839894B2 (en) 2018-06-29 2019-05-07 Memory computation circuit and method
TW108121134A TW202001884A (en) 2018-06-29 2019-06-18 Memory computation circuit
CN201910538988.XA CN110660417A (en) 2018-06-29 2019-06-20 Memory computing circuit
US17/077,401 US11398275B2 (en) 2018-06-29 2020-10-22 Memory computation circuit and method
US17/808,536 US11830543B2 (en) 2018-06-29 2022-06-23 Memory computation circuit
US18/448,039 US20230395143A1 (en) 2018-06-29 2023-08-10 Memory computation method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862691903P 2018-06-29 2018-06-29
US16/405,822 US10839894B2 (en) 2018-06-29 2019-05-07 Memory computation circuit and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/077,401 Continuation US11398275B2 (en) 2018-06-29 2020-10-22 Memory computation circuit and method

Publications (2)

Publication Number Publication Date
US20200005859A1 US20200005859A1 (en) 2020-01-02
US10839894B2 true US10839894B2 (en) 2020-11-17

Family

ID=69054729

Family Applications (4)

Application Number Title Priority Date Filing Date
US16/405,822 Active US10839894B2 (en) 2018-06-29 2019-05-07 Memory computation circuit and method
US17/077,401 Active US11398275B2 (en) 2018-06-29 2020-10-22 Memory computation circuit and method
US17/808,536 Active US11830543B2 (en) 2018-06-29 2022-06-23 Memory computation circuit
US18/448,039 Pending US20230395143A1 (en) 2018-06-29 2023-08-10 Memory computation method

Family Applications After (3)

Application Number Title Priority Date Filing Date
US17/077,401 Active US11398275B2 (en) 2018-06-29 2020-10-22 Memory computation circuit and method
US17/808,536 Active US11830543B2 (en) 2018-06-29 2022-06-23 Memory computation circuit
US18/448,039 Pending US20230395143A1 (en) 2018-06-29 2023-08-10 Memory computation method

Country Status (2)

Country Link
US (4) US10839894B2 (en)
TW (1) TW202001884A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230420043A1 (en) * 2022-06-24 2023-12-28 Macronix International Co., Ltd. Memory device and operation method thereof for performing multiply-accumulate operation

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11277455B2 (en) 2018-06-07 2022-03-15 Mellanox Technologies, Ltd. Streaming system
US20200106828A1 (en) * 2018-10-02 2020-04-02 Mellanox Technologies, Ltd. Parallel Computation Network Device
FR3088767B1 (en) * 2018-11-16 2022-03-04 Commissariat Energie Atomique MEMORY CIRCUIT SUITABLE FOR IMPLEMENTING CALCULATION OPERATIONS
US11625393B2 (en) 2019-02-19 2023-04-11 Mellanox Technologies, Ltd. High performance computing system
EP3699770A1 (en) 2019-02-25 2020-08-26 Mellanox Technologies TLV Ltd. Collective communication system and methods
US11573834B2 (en) * 2019-08-22 2023-02-07 Micron Technology, Inc. Computational partition for a multi-threaded, self-scheduling reconfigurable computing fabric
DE102020100541A1 (en) * 2020-01-13 2021-07-15 Infineon Technologies Ag DETERMINATION OF A RESULTING DATA WORD WHEN ACCESSING A MEMORY
US11750699B2 (en) 2020-01-15 2023-09-05 Mellanox Technologies, Ltd. Small message aggregation
US11252027B2 (en) 2020-01-23 2022-02-15 Mellanox Technologies, Ltd. Network element supporting flexible data reduction operations
US11714570B2 (en) 2020-02-26 2023-08-01 Taiwan Semiconductor Manufacturing Company, Ltd. Computing-in-memory device and method
US11876885B2 (en) 2020-07-02 2024-01-16 Mellanox Technologies, Ltd. Clock queue with arming and/or self-arming features
US11556378B2 (en) 2020-12-14 2023-01-17 Mellanox Technologies, Ltd. Offloading execution of a multi-task parameter-dependent operation to a network device
CN113346895B (en) * 2021-04-27 2022-09-02 北京航空航天大学 Simulation and storage integrated structure based on pulse cut-off circuit
US11922237B1 (en) 2022-09-12 2024-03-05 Mellanox Technologies, Ltd. Single-step collective operations

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160062947A1 (en) * 2014-08-29 2016-03-03 Nvidia Corporation Performing multi-convolution operations in a parallel processing system
US20170103307A1 (en) * 2015-10-08 2017-04-13 Via Alliance Semiconductor Co., Ltd. Processor with hybrid coprocessor/execution unit neural network unit
US20170236578A1 (en) * 2005-07-01 2017-08-17 Apple Inc. Integrated Circuit With Separate Supply Voltage For Memory That Is Different From Logic Circuit Supply Voltage
US20200013436A1 (en) * 2017-03-21 2020-01-09 Socionext Inc. Semiconductor integrated circuit
US10553285B2 (en) * 2017-11-28 2020-02-04 Western Digital Technologies, Inc. Single-port memory with opportunistic writes

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8417758B1 (en) * 2009-09-01 2013-04-09 Xilinx, Inc. Left and right matrix multiplication using a systolic array
US9747546B2 (en) * 2015-05-21 2017-08-29 Google Inc. Neural network processor
US10049322B2 (en) * 2015-05-21 2018-08-14 Google Llc Prefetching weights for use in a neural network processor
US9910827B2 (en) * 2016-07-01 2018-03-06 Hewlett Packard Enterprise Development Lp Vector-matrix multiplications involving negative values
US10879904B1 (en) * 2017-07-21 2020-12-29 X Development Llc Application specific integrated circuit accelerators
US10642922B2 (en) * 2018-09-28 2020-05-05 Intel Corporation Binary, ternary and bit serial compute-in-memory circuits
US11409352B2 (en) * 2019-01-18 2022-08-09 Silicon Storage Technology, Inc. Power management for an analog neural memory in a deep learning artificial neural network


Also Published As

Publication number Publication date
US20230395143A1 (en) 2023-12-07
US20200005859A1 (en) 2020-01-02
US20220328096A1 (en) 2022-10-13
US11830543B2 (en) 2023-11-28
US20210043253A1 (en) 2021-02-11
TW202001884A (en) 2020-01-01
US11398275B2 (en) 2022-07-26

Similar Documents

Publication Publication Date Title
US10839894B2 (en) Memory computation circuit and method
Ali et al. IMAC: In-memory multi-bit multiplication and accumulation in 6T SRAM array
US11322195B2 (en) Compute in memory system
US11568223B2 (en) Neural network circuit
CN110729011B (en) In-memory arithmetic device for neural network
US11714570B2 (en) Computing-in-memory device and method
US20220269483A1 (en) Compute in memory accumulator
Ali et al. RAMANN: in-SRAM differentiable memory computations for memory-augmented neural networks
US20220375508A1 (en) Compute in memory (cim) memory array
Rai et al. Perspectives on emerging computation-in-memory paradigms
US20230315389A1 (en) Compute-in-memory cell
CN109698000B (en) Dummy word line tracking circuit
US20230045840A1 (en) Computing device, memory controller, and method for performing an in-memory computation
EP3940527A1 (en) In-memory computation circuit and method
US10073655B2 (en) Semiconductor integrated circuit apparatus
US20220067501A1 (en) Sram architecture for convolutional neural network application
US10672465B1 (en) Neuromorphic memory device
CN110660417A (en) Memory computing circuit
US10847215B2 (en) Bitcell shifting technique
Damodaran A Novel SRAM Architecture for In-Memory Computing
US20230127502A1 (en) Memory cell and method of operating the same
US20230131308A1 (en) Memory devices, computing devices, and methods for in-memory computing
US20230176770A1 (en) Data sequencing circuit and method
CN115512729A (en) Memory device and method of operating the same
US20210043241A1 (en) Polarity Swapping Circuitry

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD.,

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHEN, YEN-HUEI;FUJIWARA, HIDEHIRO;LIAO, HUNG-JEN;AND OTHERS;SIGNING DATES FROM 20190625 TO 20190701;REEL/FRAME:049955/0301

Owner name: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD., TAIWAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHEN, YEN-HUEI;FUJIWARA, HIDEHIRO;LIAO, HUNG-JEN;AND OTHERS;SIGNING DATES FROM 20190625 TO 20190701;REEL/FRAME:049955/0301

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE