US20200176056A1 - In-memory convolution for machine learning - Google Patents

In-memory convolution for machine learning

Info

Publication number
US20200176056A1
Authority
US
United States
Prior art keywords
block
array
memory cells
values
cells
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US16/205,743
Other versions
US10672469B1 (en
Inventor
Hsiang-Lan Lung
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Macronix International Co Ltd
Original Assignee
Macronix International Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Macronix International Co Ltd
Priority to US16/205,743 (US10672469B1)
Priority to TW108119229A (TWI696189B)
Priority to CN201910488755.3A (CN111261210B)
Application granted
Publication of US10672469B1
Publication of US20200176056A1
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/401 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells
    • G11C11/4063 Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing or timing
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021 Auxiliary circuits
    • G11C13/004 Reading or sensing circuits or methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065 Analogue means
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C13/00 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002 Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021 Auxiliary circuits
    • G11C13/0069 Writing or programming circuits or methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1051 Data output circuits, e.g. read-out amplifiers, data output buffers, data output registers, data output level conversion circuits
    • G11C7/1069 I/O lines read out arrangements
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C7/00 Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/10 Input/output [I/O] data interface arrangements, e.g. I/O data control circuits, I/O data buffers
    • G11C7/1078 Data input circuits, e.g. write amplifiers, data input buffers, data input registers, data input level conversion circuits
    • G11C7/1096 Write circuits, e.g. I/O line write drivers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • the present invention relates to circuitry that can be used to perform in-memory convolution for machine learning.
  • CNN: convolutional neural networks
  • GPU: graphics processing units
  • DRAM: dynamic random access memory
  • data is frequently moved between multiple GPUs and DRAMs for convolutional operations, through components on printed circuit boards such as conductive traces and pads.
  • data movement can consume a significant amount of power and slow down the performance.
  • a device comprises a first block of memory cells, a second block of memory cells to store a feature array, and a third block of memory cells to store an array of output values.
  • Sensing circuitry is coupled to the first block of memory cells and the second block of memory cells to compare electrical differences between the memory cells in the first block and the memory cells in the second block to generate the array of output values.
  • Writing circuitry operatively coupled to the third block can store the array of output values in the third block of memory cells.
  • an analog level can be stored without verify cycles, i.e. cycles that check whether the cell has been changed to the target resistance or threshold range corresponding to a particular digital value. Storing output values in the third block of memory cells as analog levels instead of digital values can improve performance when storing the array of output values, because the verify cycles are not needed.
  • in-place convolution refers to convolution of a function of a filter array over an input array to generate an array of output values, where the filter array and the input array are stored in an addressable memory before the convolution, the convolution is executed while the filter array and the input array remain stored in the same addressable memory, and are not moved to another addressable memory before or during the execution of the convolution.
  • the sensing circuitry is configured to compare electrical differences between the feature array with each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
  • the device includes address generation circuits that apply addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
  • the first block can be configured to store an input array.
  • the device can further comprise a fourth block of memory cells to store a filter array, and a fifth block of memory cells to store an input array.
  • Convolution circuitry operatively coupled to the fourth block of memory cells and the fifth block of memory cells can execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values.
  • Writing circuitry operatively coupled to the first block of memory cells can store the array of convolved values in the first block.
  • the input array and the filter array can include digital values, and the convolution circuitry can receive the digital values as inputs to the function.
  • the function convolves the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array.
  • the device includes address generation circuits that apply addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • the writing circuitry operatively coupled to the third block can be configured to store an analog level in each cell of the third block for the array of output values.
  • the writing circuitry can apply a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values.
  • the writing circuitry can apply a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values.
  • the writing circuitry can apply a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values.
  • the first block of memory cells, the second block of memory cells, and the third block of memory cells can be implemented on a single integrated circuit chip or a multichip module under one package.
  • a method for operating a device that comprises a first block of memory cells, a second block of memory cells to store a feature array, and a third block of memory cells to store an array of output values.
  • the method comprises comparing electrical differences between memory cells in the first block and the memory cells in the second block to generate the array of output values, and storing the array of output values in the third block of memory cells.
  • For a set of frames of cells in the first block, the method includes comparing electrical differences between the feature array and each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
  • the method includes applying addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
  • the method can include storing an input array in the first block of memory cells.
  • the device can comprise a fourth block of memory cells to store a filter array and a fifth block of memory cells to store an input array, and the method can include executing in-place convolution of a function of the filter array over the input array to generate an array of convolved values, and storing the array of convolved values in the first block.
  • the input array and the filter array can include digital values, and the method can include receiving the digital values as inputs to the function.
  • the method can include convolving the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array.
  • the method can include applying addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • the method can include storing an analog level in each cell of the third block for the array of output values.
  • a sequence of write pulses can be applied for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values.
  • a sequence of write pulses can be applied for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values.
  • a sequence of write pulses can be applied for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values.
  • FIG. 1 illustrates an example device for comparing electrical differences between a feature array and a frame in an input array.
  • FIG. 2 illustrates an example device for executing in-place convolution of a function of a filter array over an input array.
  • FIG. 3 illustrates executing in-place convolution as shown in FIG. 2 in more detail.
  • FIG. 4 illustrates an example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 5 illustrates a second example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 6 illustrates a third example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 7 illustrates a fourth example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 8 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for programmable resistance memory cells.
  • FIG. 9 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for charge storage memory cells.
  • FIGS. 10A, 10B and 10C illustrate example pulse shapes of set pulses for changing the resistance level of a cell having a body of phase change material.
  • FIG. 11 illustrates a simplified flowchart for a flow in operating a device.
  • FIG. 12 is a simplified block diagram of an integrated circuit in accordance with the present technology.
  • FIG. 1 illustrates an example device for comparing electrical differences between a feature array and a frame in an input array.
  • Device 100 comprises a first block of memory cells 110 , a second block of memory cells 120 to store a feature array, and a third block of memory cells 130 to store an array of output values.
  • the first block of memory cells 110 can store an input array, such as supplied via the data-in line 1295 from input ports on the integrated circuit 1200 ( FIG. 12 ), or an array of convolved values from in-place convolution executed by the convolution circuitry 180 ( FIG. 2 ).
  • Sensing circuitry 160 is coupled to the first block of memory cells 110 and the second block of memory cells 120 to compare electrical differences between the memory cells in the first block and the memory cells in the second block to generate the array of output values.
  • the electrical differences indicate write strength for the memory cells in the array of output values.
  • the write strength can be referred to as a weight, and the array of output values can be referred to as a weight array.
  • Writing circuitry 170 is operatively coupled to the third block of memory cells 130 to store the array of output values in the third block of memory cells 130 .
  • the writing circuitry operatively coupled to the third block is configured to store an analog level in each cell of the third block for the array of output values, for instance, according to the electrical differences between analog values from its corresponding frame and analog values from the feature array stored in the second block of memory cells 120 .
  • Sensing circuitry 160 is coupled to the first block of memory cells 110 and the second block of memory cells 120 via lines 115 and 125 respectively.
  • Writing circuitry 170 is coupled to the sensing circuitry 160 and the third block of memory cells 130 via lines 165 and 175 respectively.
  • the first block of memory cells 110 can have a number M of rows of cells and a number N of columns of cells. For instance, M and N can be 128.
  • a plurality of feature arrays can be stored in the second block of memory cells 120 .
  • the second block of memory cells 120 can store feature arrays F1-Fn.
  • a feature array (e.g. F1) can be stored in a number Y of rows of cells and a number X of columns of cells.
  • the sensing circuitry 160 is configured to compare electrical differences between the feature array with each frame (e.g. 111 , FIG. 1 ) in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
  • the device can include address generation circuits ( 1250 , FIG. 12 ) that apply addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
  • Writing circuitry 170 operatively coupled to the third block is configured to store an analog level in each cell of the third block for the array of output values.
  • Writing circuitry 170 can apply a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells can include resistance levels or threshold voltage levels.
  • a difference in analog levels can be compared against a resistance difference threshold, and a number of write pulses for changing the resistance levels can be based on whether the difference is above or below the resistance difference threshold.
  • a difference in analog levels can be compared against a set of resistance difference thresholds (e.g.
  • a number of write pulses for changing the resistance levels can be based on whether the difference is lower than the lowest resistance difference threshold in the set, higher than the highest resistance difference threshold in the set, or between two resistance difference thresholds in the set. For instance, a greater difference in analog levels can correspond to a greater number of write pulses, or vice versa.
  • Writing circuitry 170 can apply a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells can include resistance levels or threshold voltage levels.
  • a difference in analog levels can be compared against a set of resistance difference thresholds (e.g. 0-MΩ), and a pulse duration for changing the resistance levels or threshold voltage levels can be based on whether the difference is lower than the lowest resistance difference threshold in the set, higher than the highest resistance difference threshold in the set, or between two resistance difference thresholds in the set.
  • the pulse duration of a write pulse can be applied to a sequence of write pulses so the write pulses in the sequence have the same pulse duration. For instance, a greater difference in analog levels can correspond to a longer pulse duration of a write pulse, or vice versa.
  • Writing circuitry 170 can apply a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells can include resistance levels.
  • a difference in analog levels can be compared against a set of resistance difference thresholds (e.g. 0-1 MΩ), and a tail length of a write pulse for changing the resistance levels can be based on whether the difference is lower than the lowest resistance difference threshold in the set, higher than the highest resistance difference threshold in the set, or between two resistance difference thresholds in the set.
  • the tail length of a write pulse can be applied to a sequence of write pulses so the write pulses in the sequence have the same tail length. For instance, a greater difference in analog levels can correspond to a longer tail length of a write pulse, or vice versa.
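The threshold comparison described above for the writing circuitry can be sketched in Python as follows. This is only an illustration under stated assumptions: the threshold values, the name `pulse_count`, and the policy of adding one pulse per threshold band (a greater difference giving more pulses) are hypothetical and are not taken from the specification, which also allows the opposite mapping.

```python
from bisect import bisect_right

# Hypothetical resistance-difference thresholds in ohms (illustrative only).
THRESHOLDS_OHM = [250_000, 500_000, 750_000]


def pulse_count(diff_ohm, thresholds=THRESHOLDS_OHM, base_pulses=1):
    """Map an analog-level difference to a number of write pulses.

    The difference is either below the lowest threshold, above the highest
    threshold, or between two thresholds. Assumed policy: each band above
    the lowest one adds one pulse to `base_pulses`.
    """
    band = bisect_right(thresholds, diff_ohm)  # 0 .. len(thresholds)
    return base_pulses + band


# Differences below, between, and above the thresholds.
counts = [pulse_count(d) for d in (100_000, 300_000, 600_000, 900_000)]
print(counts)
```

The same band index could instead select a pulse duration or a tail length, which are the other two options the description mentions.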
  • FIG. 2 illustrates an example device 200 for executing in-place convolution of a function of a filter array over an input array.
  • Device 200 comprises a fourth block of memory cells 140 to store a filter array, and a fifth block of memory cells 150 to store an input array.
  • Convolution circuitry 180 is operatively coupled to the fourth block of memory cells and the fifth block of memory cells to execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values.
  • Writing circuitry 190 is operatively coupled to the first block of memory cells 110 ( FIG. 1 ) to store the array of convolved values in the first block.
  • Convolution circuitry 180 is coupled to the fourth block 140 and the fifth block 150 via lines 145 and 155 , respectively.
  • Writing circuitry 190 is coupled to the convolution circuitry 180 via lines 185 , and coupled to the first block of memory cells 110 ( FIG. 1 ) via lines 195 .
  • writing circuitry 170 ( FIG. 1 ) and writing circuitry 190 can be the same writing circuitry.
  • the input array stored in the fifth block 150 and the filter array can include digital values, and the convolution circuitry can receive the digital values as inputs to the function.
  • the function can convolve the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values can correspond to a frame in the set of frames, and can indicate a number of digital values from its corresponding frame that matches corresponding digital values from the filter array.
  • Address generation circuits ( 1250 , FIG. 12 ) can apply addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • the fifth block of memory cells 150 has a number M of rows of cells and a number N of columns of cells. For instance, M and N can be 128.
  • a plurality of filter arrays can be stored in the fourth block of memory cells 140 .
  • the fourth block of memory cells 140 can store filter arrays G1-Gn.
  • a filter array (e.g. G1) can be stored in a number Y of rows of cells and a number X of columns of cells.
  • a frame of cells can have the same number Y of rows of cells and the same number X of columns of cells as in a filter array.
  • In-place convolution of a different function of the filter array G1 can be executed over a set of frames of cells in the input array stored in the fifth block of memory cells 150 .
  • In-place convolution of a function of a different filter array (e.g. G2) can be executed over a set of frames of cells in the input array.
  • a convolution layer can be generated by executing in-place convolution of a function of each filter array (e.g. G1) in the plurality of filter arrays (e.g. G1-Gn) over each frame of cells ( 511 ) in the set of frames in the input array.
  • convolution circuitry 180 can determine a number of matched digital values between cells in the filter array G1 and corresponding cells in a particular frame of cells 511 in the input array to generate an array of convolved values.
  • Convolution circuitry 180 can determine a number of matched digital values in series, i.e., the digital value of one cell in the filter array G1 and the digital value of the corresponding cell in the frame of cells 511 are compared by convolution circuitry 180 one pair at a time.
  • a number of matched digital values can be determined in parallel, i.e., digital values of all cells in the frame of cells 511 in the input array 150 and all corresponding cells in the filter array G1 can be compared by convolution circuitry 180 in parallel. Convolution operations are further described in reference to FIGS. 3-7 .
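The match-counting function described above can be modelled in software as follows. This is a minimal sketch, not the circuit itself: the names `count_matches` and `convolve_match_count` are hypothetical, the arrays are plain Python lists of 0/1 values, and the cell pairs are compared serially, as in the first option in the description.

```python
def count_matches(frame, filt):
    """Count cells whose digital value equals the corresponding filter cell."""
    return sum(
        1
        for frame_row, filt_row in zip(frame, filt)
        for a, b in zip(frame_row, filt_row)
        if a == b
    )


def convolve_match_count(input_array, filt):
    """Slide the filter over every frame that fits inside the input array."""
    rows, cols = len(input_array), len(input_array[0])
    y, x = len(filt), len(filt[0])
    out = []
    for r in range(rows - y + 1):
        out_row = []
        for c in range(cols - x + 1):
            # A frame has the same Y rows and X columns as the filter array.
            frame = [row[c:c + x] for row in input_array[r:r + y]]
            out_row.append(count_matches(frame, filt))
        out.append(out_row)
    return out


# Illustrative data only: a 4x4 checkerboard input and a 3x3 filter.
input_array = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
filter_array = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
convolved = convolve_match_count(input_array, filter_array)
print(convolved)  # [[9, 0], [0, 9]]
```

A frame identical to the filter yields the maximum count (9 for a 3x3 filter), and a frame that differs in every cell yields 0.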
  • the writing circuitry 190 operatively coupled to the first block 110 is configured to store an analog level in each cell of the first block for the array of convolved values, for instance, according to the determined number of matched digital values between the filter array and the frame of cells in the input array stored in the fifth block of memory cells 150 .
  • Writing circuitry 190 can apply a sequence of write pulses for each cell in the first block 110 having a number of write pulses determined according to a corresponding value in the array of convolved values, where the analog levels in the first block of memory cells can include resistance levels or threshold voltage levels.
  • a corresponding convolved value can indicate a number of matched digital values, and a number of write pulses can be greater for a higher number of matched digital values than for a lower number of matched digital values, or vice versa.
  • Writing circuitry 190 can apply a sequence of write pulses for each cell in the first block 110 having a pulse duration determined according to a corresponding value in the array of convolved values, where the analog levels in the first block of memory cells can include resistance levels or threshold voltage levels.
  • a corresponding convolved value can indicate a number of matched digital values, and a pulse duration can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa.
  • Writing circuitry 190 can apply a sequence of write pulses for each cell in the first block having a tail length of a write pulse determined according to a corresponding value in the array of convolved values, where the analog levels in the first block of memory cells can include resistance levels.
  • a corresponding convolved value can indicate a number of matched digital values, and a tail length of a write pulse can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa.
  • FIG. 3 illustrates executing in-place convolution as shown in FIG. 2 in more detail.
  • convolution circuitry 180 can execute in-place convolution of a function of the filter array stored in the fourth block 140 over the input array stored in the fifth block of memory cells 150 to generate an array of convolved values.
  • Writing circuitry ( 190 , FIG. 2 ) operatively coupled to the first block of memory cells 110 can store the array of convolved values from convolution circuitry 180 in the first block of memory cells 110 .
  • the first block of memory cells 110 , the fourth block of memory cells 140 , and the fifth block of memory cells 150 can be implemented on a single integrated circuit chip or a multichip module under one package.
  • the fifth block of memory cells 150 to store the input array can have a number M of rows of cells and a number N of columns of cells.
  • a number ‘1’ or ‘0’ shown for a cell in the fifth block of memory cells represents a digital value.
  • the fourth block of memory cells 140 to store the filter array can have a number Y of rows of cells and a number X of columns of cells.
  • a number ‘1’ or ‘0’ shown for a cell in the fourth block of memory cells represents a digital value.
  • the fifth block of memory cells 150 has 9 rows (R1-R9) and 9 columns (C1-C9), the fourth block of memory cells 140 has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3), and the first block of memory cells 110 has 7 rows and 7 columns.
  • a frame of cells in the fifth block of memory cells to store the input array can have the same number Y of rows and the same number X of columns as the fourth block of memory cells 140 .
  • a target cell in a frame of cells in the fifth block of memory cells is a cell at the center of the frame of cells, surrounded by at least one row of cells on an upper side, at least one row of cells on a lower side, at least one column of cells on a left side, and at least one column of cells on a right side of the target cell.
  • the frame of cells can include cells in 3 consecutive rows (e.g. R1, R2, R3) and 3 consecutive columns (e.g. C1, C2, C3), and the target cell is at a center row and a center column of the frame of cells (e.g. R2C2 for a frame 511 , FIG. 4 ).
  • cells in the border rows (e.g. R1, R9) and in the border columns (e.g. C1, C9) are not target cells, as they are not surrounded by other cells on at least one of the top, bottom, left and right sides. Accordingly, the number of frames of cells in the input array that can have a target cell at the center of a frame is fewer than the number of cells in the input array, the number of convolutions of a function of the filter array over the frames of cells having a target cell is fewer than the number of cells in the input array, and the number of cells in the first block of memory cells to store the array of convolved values from the convolutions is fewer than the number of cells in the input array.
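The cell counts in the example above follow from simple arithmetic, sketched here in Python. The function name `valid_output_shape` is hypothetical; M, N, Y and X are the row and column counts used in the description.

```python
def valid_output_shape(m, n, y, x):
    """Rows and columns of convolved values when only whole frames are used.

    m, n: rows and columns of the input array.
    y, x: rows and columns of the filter array (and of each frame).
    """
    if y > m or x > n:
        raise ValueError("filter array must fit inside the input array")
    return (m - y + 1, n - x + 1)


# The 9x9 input array and 3x3 filter array from the description.
shape = valid_output_shape(9, 9, 3, 3)
print(shape)  # (7, 7)
```

With a 9x9 input array and a 3x3 filter array this gives 7x7 = 49 convolved values, fewer than the 81 cells of the input array, which matches the 7 rows and 7 columns stated for the first block of memory cells.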
  • zero-padding can be used to pad the fifth block of memory cells 150 with a binary value ‘0’ around the fifth block of memory cells.
  • a row of cells with binary values ‘0’ can be padded adjacent a border row (e.g. R1, R9) in the fifth block of memory cells, and a column of cells with binary values ‘0’ can be padded adjacent a border column (e.g. C1, C9) in the fifth block of memory cells, so the filter array can be applied to cells in a border row or a border column in the fifth block of memory cells.
  • each cell in a border row of cells or a border column of cells can be a target cell in a frame of cells for in-place convolution with a filter array.
  • the first block of memory cells can have the same number M of rows of cells and the same number N of columns as the fifth block of memory cells.
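Zero-padding as described above can be sketched in Python. The function name `zero_pad` and the all-ones input are hypothetical; a pad width of one cell is assumed because the example filter array is 3x3.

```python
def zero_pad(array, pad=1):
    """Surround a 2-D list of digital values with `pad` rows/columns of 0."""
    width = len(array[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]                     # top rows
    padded += [[0] * pad + list(row) + [0] * pad for row in array]
    padded += [[0] * width for _ in range(pad)]                    # bottom rows
    return padded


# Illustrative 9x9 input array of ones, padded to 11x11.
input_array = [[1] * 9 for _ in range(9)]
padded = zero_pad(input_array)

# Number of 3x3 frames that fit inside the padded array.
frames = (len(padded) - 3 + 1, len(padded[0]) - 3 + 1)
print(frames)  # (9, 9)
```

After padding, every cell of the original 9x9 array, including cells in border rows and border columns, sits at the center of some 3x3 frame, so the array of convolved values has the same M rows and N columns as the input array.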
  • the first block of memory cells 110 can include programmable resistance memory cells, in which case the analog levels are resistance levels.
  • Programmable resistance memories can include phase change memory (PCM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM).
  • a number ‘1’, ‘0.9’, ‘0.8’, ‘0.7’, ‘0.6’, etc. for a cell in the first block of memory cells can represent 1 MΩ, 0.9 MΩ, 0.8 MΩ, 0.7 MΩ, 0.6 MΩ, etc. respectively, as shown in the examples of FIGS. 3-7 .
  • the first block of memory cells can be set to the highest resistance level, such as 1 MΩ, representing the case when a number of matched digital values is the same as the number of digital values in a filter array.
  • the first block of memory cells 110 includes charge storage memory cells, and the analog levels in the first block of memory cells are threshold voltage levels.
  • Charge storage memories can include floating gate and nitride trapping memories.
  • a number ‘1’, ‘0.9’, ‘0.8’, ‘0.7’, ‘0.6’, etc. for a cell in the first block of memory cells can represent 10V, 9V, 8V, 7V, 6V, etc. respectively, as shown in the examples of FIGS. 3-7 .
  • the first block of memory cells can be erased to the lowest threshold voltage level, representing the case when a number of matched digital values is zero.
  • Convolution circuitry ( 180 , FIG. 2 ) can execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values.
  • Each value in the array of convolved values can indicate a number of digital values from its corresponding frame that match corresponding digital values from the filter array.
  • Storing the convolved value in the particular cell in the first block of memory cells can include addressing the particular cell in the first block of memory cells, and converting the convolved values from in-place convolution into a set time of a set pulse or a program time of a program pulse for the cell in the first block of memory cells.
  • a set time of a set pulse can be used when analog levels in the first block of memory cells include resistance levels.
  • a program time of a program pulse can be used when analog levels in the first block of memory cells include threshold voltage levels.
  • the set time can be applied to a sequence of set pulses so the set pulses in the sequence have the same set time.
  • the program time can be applied to a sequence of program pulses so the program pulses in the sequence have the same program time.
  • the convolved values can be converted into a number of set pulses for a sequence of set pulses, or a number of program pulses for a sequence of program pulses. Furthermore, the convolved values can be converted into a combination of varying set times and numbers of set pulses, or a combination of varying program times and numbers of program pulses.
  • the convolved values in the array of convolved values are stored as analog levels in the first block of memory cells, and no verify cycles are needed to verify that a cell in the first block of memory cells has been changed to a target resistance or threshold range.
  • the frame address of a frame of cells in the fifth block of memory cells 150 can refer to a row address and a column address of a cell in the frame of cells.
  • a frame address can refer to a row address and a column address of a target cell at the center of a frame of cells (e.g. R2C2 for a frame 511 , FIG. 4 ).
  • the frame address can be sequenced in a row direction from a particular frame of cells by at least one column, or in a column direction from a particular frame of cells by at least one row, to address a next frame of cells.
  • Technology as described herein for executing in-place convolution of the function of the filter array over a frame of cells in the fifth block of memory cells can be applied in sequence to other frames of cells in the fifth block of memory cells.
  • FIG. 4 illustrates an example of executing in-place convolution of a function of a filter array over an input array.
  • a number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and a particular frame of cells 511 at a first frame address R2C2 in the fifth block of memory cells 150 .
  • Convolution circuitry ( 180 , FIG. 2 ) can compare the filter array stored in the fourth block of memory cells 140 and the particular frame of cells 511 stored in the fifth block of memory cells 150 .
  • the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3) of cells, and the particular frame of cells 511 has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3) of cells correspondingly.
  • the cells in the filter array and the particular frame have one bit per cell.
  • the filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively.
  • the particular frame of cells has digital values 1, 1, 1, 1, 0, 1, 1, 1 and 0 at corresponding addresses.
  • the fourth block of memory cells can store different values than shown in this example.
  • the function can be different than determining a number of matched digital values.
  • the function can include determining a number of corresponding digital values in the filter array and the particular frame of cells that are both ‘1’, both ‘0’, not matched, etc.
  • Writing circuitry ( 190 , FIG. 2 ) operatively coupled to the first block of memory cells 110 can change an analog level of a first cell 511 C in the first block of memory cells 110 according to the number of matched digital values.
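  • The comparison in the FIG. 4 example can be sketched in Python as follows, using the filter array and frame 511 values listed above in row-major order (R1C1 through R3C3). This is only a software model of the function; the helper names are hypothetical, and the last three helpers illustrate the alternative functions mentioned above (both ‘1’, both ‘0’, not matched).

```python
FILTER = [0, 1, 1, 1, 0, 1, 1, 1, 0]      # filter array, R1C1 ... R3C3
FRAME_511 = [1, 1, 1, 1, 0, 1, 1, 1, 0]   # frame 511, same address order


def count_matches(filter_bits, frame_bits):
    """Number of positions where the filter bit equals the frame bit."""
    return sum(f == x for f, x in zip(filter_bits, frame_bits))


def count_both_ones(filter_bits, frame_bits):
    """Number of positions where both bits are 1."""
    return sum(f == 1 and x == 1 for f, x in zip(filter_bits, frame_bits))


def count_both_zeros(filter_bits, frame_bits):
    """Number of positions where both bits are 0."""
    return sum(f == 0 and x == 0 for f, x in zip(filter_bits, frame_bits))


def count_mismatches(filter_bits, frame_bits):
    """Number of positions where the bits differ."""
    return sum(f != x for f, x in zip(filter_bits, frame_bits))
```

With these values, `count_matches(FILTER, FRAME_511)` is 8: only the cell at R1C1 differs between the filter array and the frame.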
  • FIG. 5 illustrates a second example of executing in-place convolution of a function of a filter array over an input array.
  • Address generation circuits ( 1250 , FIG. 12 ) can apply addresses for the set of frames and the filter array to the fifth block of memory cells and the fourth block of memory cells in coordination with the in-place convolution.
  • a second frame of cells 512 can be selected at a second frame address in the fifth block of memory cells.
  • the second frame address can be sequenced from the first frame address by a stride, where the stride can include either at least one column in a row direction or at least one row in a column direction.
  • the second frame of cells 512 at the second frame address R2C3 in the fifth block of memory cells 150 is selected, where the second frame address R2C3 is the address of the target cell at the center of the second frame of cells.
  • a second number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and the second frame of cells 512 at the second frame address R2C3 in the fifth block of memory cells 150 .
  • the second frame address R2C3 can be sequenced from the first frame address R2C2 by one column in a row direction.
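  • Convolution circuitry ( 180 , FIG. 2 ) can compare the filter array stored in the fourth block of memory cells 140 and the second frame of cells 512 stored in the fifth block of memory cells 150 .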
  • the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3) of cells, and the second frame of cells 512 has 3 rows (R1, R2, R3) and 3 columns (C2, C3, C4) of cells correspondingly.
  • the cells in the filter array and the second frame have one bit per cell.
  • the filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively.
  • the second frame of cells has digital values 1, 1, 1, 0, 1, 1, 1, 0 and 1 at corresponding addresses.
  • Writing circuitry ( 190 , FIG. 2 ) operatively coupled to the first block of memory cells 110 can change an analog level of a second cell 512 C in the first block of memory cells 110 according to the second number of matched digital values.
  • the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1 + the number of cells in the fourth block of memory cells) in megaohms (MΩ).
  • the second cell 512 C is at a different row/column address than the first cell 511 C in the first block of memory cells 110 .
  • the second cell 512 C can be in the same row of cells as the first cell 511 C in the first block of memory cells 110 , and in a different column of cells than the first cell 511 C in the first block of memory cells 110 .
  • the second cell 512 C can be in a different row of cells and in a different column of cells than the first cell 511 C in the first block of memory cells 110 .
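  • The resistance-level rule stated above (number of matched digital values divided by one plus the number of cells in the fourth block, in megaohms) can be written as a one-line Python sketch. The function name `resistance_megaohm` and the default of 9 filter cells (a 3-by-3 filter array) are assumptions made for illustration.

```python
def resistance_megaohm(matches, filter_cells=9):
    """Target resistance in megaohms: matches / (1 + number of filter cells)."""
    if not 0 <= matches <= filter_cells:
        raise ValueError("matches must be between 0 and filter_cells")
    return matches / (1 + filter_cells)
```

For example, 8 matched digital values with a 9-cell filter array give `resistance_megaohm(8)`, i.e. 0.8 MΩ.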
  • FIG. 6 illustrates a third example of executing in-place convolution of a function of a filter array over an input array.
  • Address generation circuits ( 1250 , FIG. 12 ) can apply addresses for the set of frames and the filter array to the fifth block of memory cells 150 and the fourth block of memory cells 140 in coordination with the in-place convolution.
  • a third frame of cells 521 can be selected at a third frame address in the fifth block of memory cells.
  • the third frame address can be sequenced from the first frame address by a stride, where the stride can include either at least one column in a row direction or at least one row in a column direction.
  • a third frame of cells 521 at a third frame address R3C2 in the fifth block of memory cells 150 is selected, where the third frame address R3C2 is the address of the target cell at the center of the third frame of cells.
  • a third number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and the third frame of cells 521 at the third frame address R3C2 in the fifth block of memory cells 150 .
  • the third frame address R3C2 can be sequenced from the first frame address R2C2 by one row in a column direction.
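  • Convolution circuitry ( 180 , FIG. 2 ) can compare the filter array stored in the fourth block of memory cells 140 and the third frame of cells 521 stored in the fifth block of memory cells 150 .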
  • the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3) of cells, and the third frame of cells 521 has 3 rows (R2, R3, R4) and 3 columns (C1, C2, C3) of cells correspondingly.
  • the cells in the filter array and the third frame have one bit per cell.
  • the filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively.
  • the third frame of cells has digital values 1, 0, 1, 1, 1, 0, 1, 1 and 1 at corresponding addresses.
  • Writing circuitry ( 190 , FIG. 2 ) operatively coupled to the first block of memory cells 110 can change an analog level of a third cell 521 C in the first block of memory cells 110 according to the third number of matched digital values.
  • the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1 + the number of cells in the fourth block of memory cells) in megaohms (MΩ).
  • the third cell 521 C is at a different row/column address than the first cell 511 C and the second cell 512 C in the first block of memory cells 110 .
  • the third cell 521 C can be in the same column of cells as the first cell 511 C in the first block of memory cells 110 , and in a different row of cells than the first cell 511 C in the first block of memory cells 110 .
  • the third cell 521 C can be in a different row of cells and in a different column of cells than the first cell 511 C and the second cell 512 C in the first block of memory cells 110 .
  • executing in-place convolution of a function of the filter array over the input array can include convolving the function of the filter array over frames of cells at a first row address (e.g. R1) in the fifth block of memory cells 150 while sequencing the column addresses (C1-C9) of the frames of cells, and then convolving the function of the filter array over frames of cells at a next row address (e.g. R2) in the fifth block of memory cells 150 while sequencing the column addresses (C1-C9) of the frames of cells.
  • the next row address is sequenced from the first row address by at least one row.
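  • The row-by-row sequencing of frame addresses described above can be sketched as a Python generator. This is an illustrative model only: it assumes an unpadded input array, 1-based addresses, a frame address that refers to the target cell at the center of the frame, and an odd filter size `k`; the name `frame_addresses` is hypothetical.

```python
def frame_addresses(rows, cols, k=3, stride=1):
    """Yield (row, column) center addresses, sweeping columns within each row first."""
    half = k // 2
    for r in range(1 + half, rows - half + 1, stride):      # next row after each sweep
        for c in range(1 + half, cols - half + 1, stride):  # sequence column addresses
            yield (r, c)
```

For a 9-by-9 input array with a stride of one, the sequence starts at R2C2, moves to R2C3 (one column in the row direction), wraps to R3C2 after R2C8, and ends at R8C8.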
  • FIG. 7 illustrates a fourth example of executing in-place convolution of a function of a filter array over an input array.
  • Address generation circuits ( 1250 , FIG. 12 ) can apply addresses for the set of frames and the filter array to the fifth block of memory cells 150 and the fourth block of memory cells 140 in coordination with the in-place convolution.
  • a last number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and a last frame of cells 577 in the fifth block of memory cells 150 .
  • the last frame of cells 577 includes cells addressed in the last three rows of cells in the number M of rows and in the last three columns of cells in the number N of columns, e.g.
  • Convolution circuitry ( 180 , FIG. 2 ) can compare the filter array stored in the fourth block of memory cells 140 and the last frame of cells 577 in the fifth block of memory cells 150 .
  • the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3), and the last frame of cells 577 has 3 rows (R7, R8, R9) and 3 columns (C7, C8, C9) correspondingly.
  • the filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively.
  • the cells in the filter array and the last frame have one bit per cell.
  • the last frame of cells has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 1 at corresponding addresses.
  • Writing circuitry operatively coupled to the first block of memory cells 110 can change an analog level of the cell 577 C in the first block of memory cells 110 according to the last number of matched digital values.
  • the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1 + the number of cells in the fourth block of memory cells) in megaohms (MΩ).
  • Address generation circuits can apply addresses for the set of frames and the filter array to the fifth block 150 and the fourth block of memory cells 140 in coordination with the in-place convolution.
  • a first function of the filter array can be convolved over all frames in the set of frames stored in the input array to generate an array of convolved values, and the array of convolved values can be stored as analog levels in the first block of memory cells.
  • a second function of the filter array can be convolved over all frames in the set of frames stored in the input array to generate a second array of convolved values, and the second array of convolved values can be stored as analog levels in the first block of memory cells.
  • different functions of different filter arrays can be used for executing in-place convolution over the input array to generate respective arrays of convolved values, and the respective arrays of convolved values can be stored as analog levels in the first block of memory cells.
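  • Convolving a function of the filter array over all frames, as described above, can be modeled in Python with a pluggable function argument. This sketch is a functional model under stated assumptions, not the in-place circuitry: the names `convolve`, `matches` and `mismatches` are hypothetical, and the 4-by-4 `INPUT` array is an illustrative example rather than data from the figures.

```python
def convolve(input_array, filter_array, func):
    """Apply func(filter_bits, frame_bits) to every k-by-k frame, row by row."""
    k = len(filter_array)
    flat_filter = [bit for row in filter_array for bit in row]
    out = []
    for r in range(len(input_array) - k + 1):
        out_row = []
        for c in range(len(input_array[0]) - k + 1):
            frame = [input_array[r + i][c + j] for i in range(k) for j in range(k)]
            out_row.append(func(flat_filter, frame))
        out.append(out_row)
    return out


def matches(filter_bits, frame_bits):
    """First example function: number of matched digital values."""
    return sum(f == x for f, x in zip(filter_bits, frame_bits))


def mismatches(filter_bits, frame_bits):
    """Second example function: number of digital values that do not match."""
    return sum(f != x for f, x in zip(filter_bits, frame_bits))


FILTER = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
INPUT = [[1, 1, 1, 1],   # illustrative 4-by-4 input array (assumption)
         [1, 0, 1, 1],
         [1, 1, 0, 1],
         [1, 1, 1, 1]]
```

Here `convolve(INPUT, FILTER, matches)` returns `[[8, 4], [4, 8]]`, and swapping in `mismatches` yields a second array of convolved values, `[[1, 5], [5, 1]]`, from the same stored input.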
  • FIG. 8 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for programmable resistance memory cells.
  • the first block of memory cells 110 includes programmable resistance memory cells having resistance levels.
  • Programmable resistance memories can include phase change memory (PCM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM).
  • a pulse duration can be referred to as a set time, and a write pulse can be referred to as a set pulse.
  • the writing circuitry ( 190 , FIG.
  • the set time of a set pulse can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa.
  • a longer set time of a set pulse can induce lower resistance R, and a shorter set time of a set pulse can induce higher resistance R.
  • the writing circuitry ( 190 , FIG. 2 ) can also determine a number of write pulses for changing the resistance levels according to the number of matched digital values. For instance, a number of write pulses can be greater for a higher number of matched digital values than for a lower number of matched digital values, or vice versa.
  • the first block of memory cells can be set to the highest resistance level, representing the case when a number of matched digital values is the same as the number of digital values in a filter array.
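  • During the process, if a number of matched digital values is the same as the number of digital values in the filter array, then no set pulse is applied to a cell in the first block of memory cells.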
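  • One way to picture the relationship above between the number of matched digital values and the set time is a simple linear mapping, sketched below in Python. The linear form, the function name `set_time_ns` and the `unit_ns` step are assumptions for illustration only; the text specifies only that a longer set time can correspond to a lower number of matches (or vice versa) and that a full match needs no set pulse.

```python
def set_time_ns(matches, filter_cells=9, unit_ns=100):
    """Hypothetical set time: longer for fewer matches, zero for a full match."""
    if not 0 <= matches <= filter_cells:
        raise ValueError("matches must be between 0 and filter_cells")
    # A cell initialized to the highest resistance needs no set pulse on a full match.
    return (filter_cells - matches) * unit_ns
```

Under these assumptions a cell with 9 of 9 matches receives no set pulse and stays at the highest resistance level, while a cell with 4 matches receives a longer set pulse than a cell with 8 matches and so ends at a lower resistance.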
  • FIG. 9 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for charge storage memory cells.
  • the first block of memory cells 110 includes charge storage memory cells having threshold voltage levels.
  • Charge storage memories can include floating gate and nitride trapping memories.
  • a pulse duration can be referred to as a program time, and a write pulse can be referred to as a program pulse.
  • the writing circuitry ( 190 , FIG.
  • the program time of a program pulse can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa.
  • a longer program time of a program pulse can induce higher threshold voltage Vt, and a shorter program time of a program pulse can induce lower threshold voltage Vt.
  • the writing circuitry ( 190 , FIG. 2 ) can also determine a number of write pulses for changing the threshold voltage levels according to the number of matched digital values. For instance, a number of program pulses can be greater for a higher number of matched digital values than for a lower number of matched digital values, or vice versa.
  • the first block of memory cells can be erased to the lowest threshold voltage level, representing the case when a number of matched digital values is zero. During the process, if a number of matched digital values is zero, then no program pulse is applied to a cell in the first block of memory cells.
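  • For charge storage cells, the pulse-count variant described above can be sketched in Python as a plan that maps each cell address to a number of program pulses and skips cells whose number of matched digital values is zero. The proportional rule, the name `program_pulses` and the parameter `pulses_per_match` are hypothetical choices for illustration, following the "greater for a higher number of matched digital values" case.

```python
def program_pulses(convolved, pulses_per_match=1):
    """Map 1-based (row, column) cell addresses to a program-pulse count."""
    plan = {}
    for r, row in enumerate(convolved, start=1):
        for c, matches in enumerate(row, start=1):
            if matches > 0:  # zero matches: cell stays erased, no program pulse
                plan[(r, c)] = matches * pulses_per_match
    return plan
```

For an array of convolved values `[[8, 0], [4, 8]]`, the plan contains three entries; the cell at row 1, column 2 is omitted and remains at the lowest threshold voltage level.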
  • FIGS. 10A, 10B and 10C illustrate example pulse shapes of set pulses for changing the resistance level of a cell having a body of phase change material.
  • FIG. 10A illustrates a single set pulse 1010 having a relatively long pulse duration and rapid rising and falling edges, with an amplitude above a melting threshold 1005 for the phase change material.
  • FIG. 10B illustrates a sequence of set pulses 1021 and 1022 having a shorter pulse duration than the single set pulse 1010 in FIG. 10A .
  • FIG. 10C illustrates a single set pulse with a rapid rising edge and a ramp-shaped trailing edge or a set tail 1035 of constant or near constant slope.
  • a tail length of a set tail 1035 can vary between 10 ns and 1 ms, according to the differences in analog levels between the filter array and the particular frame of cells in the input array stored in the fifth block of memory cells.
  • FIG. 11 illustrates a simplified flowchart for a flow in operating a device.
  • an input array can be stored in a first block of memory cells.
  • a feature array can be stored in a second block of memory cells.
  • the third block of memory cells 130 can be initialized.
  • the third block of memory cells can comprise programmable resistance memory cells having resistance levels, or charge storage memory cells having threshold voltage levels.
  • Step 1130 can include setting the third block of memory cells to the highest resistance level, such as 1 MΩ.
  • the highest resistance level can represent the case where a number of matched digital values between the feature array and a particular frame of cells in the first block of memory cells is the same as the number of digital values in the feature array.
  • Step 1130 can include erasing the third block of memory cells to the lowest threshold voltage level.
  • the lowest threshold voltage level can represent the case where a number of matched digital values between the feature array and a particular frame of cells in the first block of memory cells is zero.
  • The order of Steps 1110 , 1120 and 1130 as shown in the flowchart does not indicate the order in which Steps 1110 , 1120 and 1130 can be executed. For instance, Step 1130 can be executed before Step 1110 , and Step 1110 can be executed after Step 1120 .
  • sensing circuitry coupled to the first block of memory cells and the second block of memory cells can compare electrical differences between memory cells in the first block and the memory cells in the second block to generate an array of output values. For a set of frames of cells in the first block, the sensing circuitry can compare electrical differences between the feature array and each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
  • the writing circuitry operatively coupled to the third block of memory cells 130 can store the array of output values in the third block of memory cells.
  • An analog level can be stored in each cell of the third block for the array of output values.
  • the writing circuitry ( 170 , FIG. 1 ) can apply a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values, where cells in the third block of memory cells can include resistance levels or threshold voltage levels.
  • the writing circuitry can apply a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values, where cells in the third block of memory cells include resistance levels or threshold voltage levels.
  • the writing circuitry can apply a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells include resistance levels.
  • the device can comprise a fourth block of memory cells to store a filter array and a fifth block of memory cells to store an input array.
  • Convolution circuitry is operatively coupled to the fourth block of memory cells and the fifth block of memory cells to generate an array of convolved values.
  • the flow can include executing in-place convolution of a function of the filter array over the input array to generate an array of convolved values, and storing the array of convolved values in the first block.
  • the flow can continue to compare electrical differences between the array of convolved values stored in the first block of memory cells and a feature array stored in the second block of memory cells to generate the array of output values, and store the array of output values in the third block of memory cells.
  • the input array stored in the fifth block of memory cells and the filter array can include digital values
  • the convolution circuitry can receive the digital values as inputs to the function.
  • the function can convolve the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that match corresponding digital values from the filter array.
  • the flow includes applying addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • FIG. 12 is a simplified block diagram of an integrated circuit in accordance with the present technology.
  • the integrated circuit 1200 includes a memory 1270 .
  • the memory 1270 comprises a first block of memory cells 110 , a second block of memory cells 120 to store a feature array, a third block of memory cells 130 to store an array of output values, a fourth block of memory cells 140 to store a filter array, and a fifth block of memory cells 150 .
  • the first block of memory cells 110 is configured to store an input array.
  • the fifth block of memory cells 150 is configured to store an input array.
  • the filter array and the feature array can be the same array.
  • the integrated circuit 1200 includes address generation circuits 1250 that apply addresses for the set of frames in the input array stored in the first block of memory cells and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences. Address generation circuits 1250 can also apply addresses for the set of frames in the input array stored in the fifth block and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • Address generation circuits 1250 can include a first block address generator 1251 , a feature array address generator 1252 , an output array address generator 1253 , a filter address generator 1254 , and a fifth block address generator 1255 .
  • the first block address generator 1251 is coupled to address lines 1261 which in turn are coupled to the first block of memory cells 110 .
  • the feature array address generator 1252 is coupled to address lines 1262 which in turn are coupled to the second block of memory cells 120 .
  • the output array address generator 1253 is coupled to address lines 1263 which in turn are coupled to the third block of memory cells 130 .
  • the filter address generator 1254 is coupled to address lines 1264 which in turn are coupled to the fourth block of memory cells 140 .
  • a fifth block address generator 1255 is coupled to address lines 1265 which in turn are coupled to fifth block 150 . Addresses are supplied on bus 1240 to the first block address generator 1251 , the feature array address generator 1252 , the output array address generator 1253 , the filter address generator 1254 , and the fifth block address generator 1255 .
  • Convolution circuitry 180 is operatively coupled to the fourth block of memory cells 140 , the fifth block of memory cells 150 , and the first block of memory cells 110 via lines 1274 , 1275 and 1271 a respectively, for executing in-place convolution of a function of a filter array over the input array stored in the fifth block of memory cells to generate an array of convolved values.
  • Sensing circuitry 160 is coupled to the first block of memory cells and the second block of memory cells via lines 1271 b and 1272 respectively, for comparing electrical differences between the memory cells in the first block and the memory cells in the second block to generate an array of output values.
  • the third block of memory cells 130 is coupled to the sensing circuitry 160 via lines 1273 , for storing the array of output values in the third block of memory cells.
  • the first block of memory cells 110 , the second block of memory cells 120 , the third block of memory cells 130 , the fourth block of memory cells 140 , and the fifth block of memory cells 150 can be configured in separate blocks of cells.
  • the first block address generator 1251 , the feature array address generator 1252 , the output array address generator 1253 , the filter address generator 1254 , and the fifth block address generator 1255 can be separate address generators, including respective row decoders for word lines and column decoders for bit lines.
  • the first block of memory cells 110 , the second block of memory cells 120 , the third block of memory cells 130 , the fourth block of memory cells 140 , and the fifth block of memory cells 150 can be configured in a common block of cells.
  • the first, second and third arrays of cells can share word lines coupled to a common row decoder, and have respective column decoders for bit lines coupled to respective arrays of cells.
  • Data is supplied via the data-in line 1295 from input/output ports on the integrated circuit 1200 or from other data sources internal or external to the integrated circuit 1200 , to the first block of memory cells 110 , the second block of memory cells 120 , the third block of memory cells 130 , the fourth block of memory cells 140 , and the fifth block of memory cells 150 .
  • Data supplied via the data-in line 1295 can include an input array to be stored in the first block of memory cells 110 or the fifth block of memory cells 150 , a filter array to be stored in the fourth block of memory cells 140 , and a feature array to be stored in the second block of memory cells 120 .
  • circuitry 1290 is included on the integrated circuit, such as a general purpose processor or special purpose application circuitry, or a combination of modules providing system-on-a-chip functionality supported by the memory array.
  • Data is supplied via the data-out line 1285 from the sensing circuitry 160 to input/output ports on the integrated circuit 1200 , or to other data destinations internal or external to the integrated circuit 1200 .
  • Data supplied via the data-out line 1285 can include the array of output values stored in the third block of memory cells 130 .
  • Convolution circuitry 180 can execute in-place convolution of a function of the filter array over the input array stored in the fifth block of memory cells to generate an array of convolved values.
  • Writing circuitry 170 operatively coupled to the third block 130 can change an analog level of a cell in the output array.
  • Writing circuitry 190 operatively coupled to the first block 110 can change an analog level of a cell in the first block 110 .
  • writing circuitry 170 and writing circuitry 190 can be the same writing circuitry.
  • Convolution circuitry 180 , writing circuitry 170 and writing circuitry 190 , implemented in this example using a bias arrangement state machine, control the application of bias arrangement supply voltages 1220 generated or provided through the voltage supply or supplies in block 1220 , such as read, program and erase voltages.
  • Convolution circuitry 180 and writing circuitry 170 can be implemented using special-purpose logic circuitry as known in the art.
  • convolution circuitry 180 and writing circuitry 170 can comprise a general-purpose processor, which can be implemented on the same integrated circuit to control the operations of the device.
  • a combination of special-purpose logic circuitry and a general-purpose processor can be utilized for implementation of convolution circuitry 180 and writing circuitry 170 .

Abstract

A device comprises a first block of memory cells, a second block of memory cells to store a feature array, and a third block of memory cells to store an array of output values. Sensing circuitry is coupled to the first block of memory cells and the second block of memory cells to compare electrical differences between the memory cells in the first block and the memory cells in the second block to generate the array of output values. Writing circuitry is operatively coupled to the third block to store the array of output values in the third block of memory cells.

Description

    BACKGROUND
  • Field
  • The present invention relates to circuitry that can be used to perform in-memory convolution for machine learning.
  • Description of Related Art
  • Convolutional neural networks (CNN) are used in machine learning with applications in fields such as speech recognition, computer vision and text processing. CNN operations can be implemented using a system that includes graphics processing units (GPU) and dynamic random access memory (DRAM) coupled to the GPU. In such a system, data is frequently moved between multiple GPUs and DRAMs for convolutional operations, through components on printed circuit boards such as conductive traces and pads. However, such data movement can consume a significant amount of power and slow down the performance.
  • It is desirable to provide a device for convolutional operations that can improve the performance and reduce power consumption.
  • SUMMARY
  • A device is provided that comprises a first block of memory cells, a second block of memory cells to store a feature array, and a third block of memory cells to store an array of output values. Sensing circuitry is coupled to the first block of memory cells and the second block of memory cells to compare electrical differences between the memory cells in the first block and the memory cells in the second block to generate the array of output values. Writing circuitry operatively coupled to the third block can store the array of output values in the third block of memory cells.
  • As used herein, an analog level can be stored without verify cycles to verify that the cell has been changed to the target resistance or threshold range corresponding to a particular digital value. Storing output values in the third block of memory cells as analog levels instead of digital values can improve the performance for storing the output values in the array of output values, because the verify cycles are not needed.
  • As used herein, “in-place convolution” refers to convolution of a function of a filter array over an input array to generate an array of output values, where the filter array and the input array are stored in an addressable memory before the convolution, the convolution is executed while the filter array and the input array remain stored in the same addressable memory, and are not moved to another addressable memory before or during the execution of the convolution.
  • For a set of frames of cells in the first block, the sensing circuitry is configured to compare electrical differences between the feature array and each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array. The device includes address generation circuits that apply addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
  • In one embodiment, the first block can be configured to store an input array. In an alternative embodiment, the device can further comprise a fourth block of memory cells to store a filter array, and a fifth block of memory cells to store an input array. Convolution circuitry operatively coupled to the fourth block of memory cells and the fifth block of memory cells can execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values. Writing circuitry operatively coupled to the first block of memory cells can store the array of convolved values in the first block.
  • The input array and the filter array can include digital values, and the convolution circuitry can receive the digital values as inputs to the function. For a set of frames of cells in the input array stored in the fifth block of memory cells, the function convolves the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array. The device includes address generation circuits that apply addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • The writing circuitry operatively coupled to the third block can be configured to store an analog level in each cell of the third block for the array of output values. The writing circuitry can apply a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values. The writing circuitry can apply a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values. The writing circuitry can apply a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values.
  • In one embodiment, the first block of memory cells, the second block of memory cells, and the third block of memory cells can be implemented on a single integrated circuit chip or a multichip module under one package.
  • A method is provided for operating a device that comprises a first block of memory cells, a second block of memory cells to store a feature array, and a third block of memory cells to store an array of output values. The method comprises comparing electrical differences between memory cells in the first block and the memory cells in the second block to generate the array of output values, and storing the array of output values in the third block of memory cells.
  • For a set of frames of cells in the first block, the method includes comparing electrical differences between the feature array and each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array. The method includes applying addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
  • The method can include storing an input array in the first block of memory cells.
  • The device can comprise a fourth block of memory cells to store a filter array and a fifth block of memory cells to store an input array, and the method can include executing in-place convolution of a function of the filter array over the input array to generate an array of convolved values, and storing the array of convolved values in the first block. The input array and the filter array can include digital values, and the method can include receiving the digital values as inputs to the function.
  • For a set of frames of cells in the input array, the method can include convolving the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array. The method can include applying addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • The method can include storing an analog level in each cell of the third block for the array of output values. A sequence of write pulses can be applied for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values. A sequence of write pulses can be applied for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values. A sequence of write pulses can be applied for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values.
  • Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description and the claims, which follow.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 illustrates an example device for comparing electrical differences between a feature array and a frame in an input array.
  • FIG. 2 illustrates an example device for executing in-place convolution of a function of a filter array over an input array.
  • FIG. 3 illustrates executing in-place convolution as shown in FIG. 2 in more detail.
  • FIG. 4 illustrates an example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 5 illustrates a second example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 6 illustrates a third example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 7 illustrates a fourth example of executing in-place convolution of a function of a filter array over an input array.
  • FIG. 8 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for programmable resistance memory cells.
  • FIG. 9 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for charge storage memory cells.
  • FIGS. 10A, 10B and 10C illustrate example pulse shapes of set pulses for changing the resistance level of a cell having a body of phase change material.
  • FIG. 11 illustrates a simplified flowchart for a flow in operating a device.
  • FIG. 12 is a simplified block diagram of an integrated circuit in accordance with the present technology.
  • DETAILED DESCRIPTION
  • The following description will typically be with reference to specific structural embodiments and methods. It is to be understood that there is no intention to limit the technology to the specifically disclosed embodiments and methods but that the technology may be practiced using other features, elements, methods and embodiments. Preferred embodiments are described to illustrate the present technology, not to limit its scope, which is defined by the claims. Those of ordinary skill in the art will recognize a variety of equivalent variations on the description that follows.
  • FIG. 1 illustrates an example device for comparing electrical differences between a feature array and a frame in an input array. Device 100 comprises a first block of memory cells 110, a second block of memory cells 120 to store a feature array, and a third block of memory cells 130 to store an array of output values. The first block of memory cells 110 can store an input array, such as supplied via the data-in line 1295 from input ports on the integrated circuit 1200 (FIG. 12), or an array of convolved values from in-place convolution executed by the convolution circuitry 180 (FIG. 2). Sensing circuitry 160 is coupled to the first block of memory cells 110 and the second block of memory cells 120 to compare electrical differences between the memory cells in the first block and the memory cells in the second block to generate the array of output values. The electrical differences indicate write strength for the memory cells in the array of output values. The write strength can be referred to as weight, and the array of output values can be referred to as a weight array.
  • Writing circuitry 170 is operatively coupled to the third block of memory cells 130 to store the array of output values in the third block of memory cells 130. The writing circuitry operatively coupled to the third block is configured to store an analog level in each cell of the third block for the array of output values, for instance, according to the electrical differences between analog values from its corresponding frame and analog values from the feature array stored in the second block of memory cells 120.
  • Sensing circuitry 160 is coupled to the first block of memory cells 110 and the second block of memory cells 120 via lines 115 and 125 respectively. Writing circuitry 170 is coupled to the sensing circuitry 160 and the third block of memory cells 130 via lines 165 and 175 respectively.
  • The first block of memory cells 110 can have a number M of rows of cells and a number N of columns of cells. For instance, M and N can be 128. A plurality of feature arrays can be stored in the second block of memory cells 120. For instance, the second block of memory cells 120 can store feature arrays F1-Fn. A feature array (e.g. F1) can be stored in a number Y of rows of cells and a number X of columns of cells.
  • For a set of frames of cells in the input array stored in the first block of memory cells 110, the sensing circuitry 160 is configured to compare electrical differences between the feature array and each frame (e.g. 111, FIG. 1) in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array. The device can include address generation circuits (1250, FIG. 12) that apply addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
  • Writing circuitry 170 operatively coupled to the third block is configured to store an analog level in each cell of the third block for the array of output values. Writing circuitry 170 can apply a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells can include resistance levels or threshold voltage levels. For instance, a difference in analog levels can be compared against a resistance difference threshold, and a number of write pulses for changing the resistance levels can be based on whether the difference is above or below the resistance difference threshold. For instance, a difference in analog levels can be compared against a set of resistance difference thresholds (e.g. 0-1MΩ), and a number of write pulses for changing the resistance levels can be based on whether the difference is lower than the lowest resistance difference threshold in the set, higher than the highest resistance difference threshold in the set, or between two resistance difference thresholds in the set. For instance, a greater difference in analog levels can correspond to a greater number of write pulses, or vice versa.
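  • The threshold comparison described above can be sketched as a simple bucketing step. This is a minimal illustration under stated assumptions, not the circuit's implementation: the function name `pulses_for_difference`, the threshold values, and the rule of one additional pulse per threshold crossed are all hypothetical. The description only states that the number of write pulses depends on where the difference falls relative to the set of thresholds.

```python
from bisect import bisect_right

# Hypothetical resistance-difference thresholds in ohms, chosen inside the
# 0-1 MΩ range mentioned in the text. The specific values are assumptions.
THRESHOLDS_OHMS = [250e3, 500e3, 750e3]


def pulses_for_difference(diff_ohms, thresholds=THRESHOLDS_OHMS):
    """Map a difference in analog levels to a number of write pulses.

    bisect_right returns 0 when the difference is below the lowest threshold,
    len(thresholds) when it is at or above the highest threshold, and an
    intermediate index when it lies between two thresholds. In this sketch a
    greater difference gives more pulses; the text also allows the opposite.
    """
    return bisect_right(thresholds, diff_ohms)


print(pulses_for_difference(100e3))  # below the lowest threshold
print(pulses_for_difference(600e3))  # between two thresholds
print(pulses_for_difference(900e3))  # above the highest threshold
```

  • The same bucketing could select a pulse duration or a tail length instead of a pulse count by indexing into a table of durations with the returned value.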
  • Writing circuitry 170 can apply a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells can include resistance levels or threshold voltage levels. For instance, a difference in analog levels can be compared against a set of resistance difference thresholds (e.g. 0-1MΩ), and a pulse duration for changing the resistance levels or threshold voltage levels can be based on whether the difference is lower than the lowest resistance difference threshold in the set, higher than the highest resistance difference threshold in the set, or between two resistance difference thresholds in the set. The pulse duration of a write pulse can be applied to a sequence of write pulses so the write pulses in the sequence have the same pulse duration. For instance, a greater difference in analog levels can correspond to a longer pulse duration of a write pulse, or vice versa.
  • Writing circuitry 170 can apply a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells can include resistance levels. For instance, a difference in analog levels can be compared against a set of resistance difference thresholds (e.g. 0-1MΩ), and a tail length of a write pulse for changing the resistance levels can be based on whether the difference is lower than the lowest resistance difference threshold in the set, higher than the highest resistance difference threshold in the set, or between two resistance difference thresholds in the set. The tail length of a write pulse can be applied to a sequence of write pulses so the write pulses in the sequence have the same tail length. For instance, a greater difference in analog levels can correspond to a longer tail length of a write pulse, or vice versa.
  • FIG. 2 illustrates an example device 200 for executing in-place convolution of a function of a filter array over an input array. Device 200 comprises a fourth block of memory cells 140 to store a filter array, and a fifth block of memory cells 150 to store an input array. Convolution circuitry 180 is operatively coupled to the fourth block of memory cells and the fifth block of memory cells to execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values. Writing circuitry 190 is operatively coupled to the first block of memory cells 110 (FIG. 1) to store the array of convolved values in the first block. Convolution circuitry 180 is coupled to the fourth block 140 and the fifth block 150 via lines 145 and 155, respectively. Writing circuitry 190 is coupled to the convolution circuitry 180 via lines 185, and coupled to the first block of memory cells 110 (FIG. 1) via lines 195. In one embodiment, writing circuitry 170 (FIG. 1) and writing circuitry 190 can be the same writing circuitry.
  • The input array stored in the fifth block 150 and the filter array can include digital values, and the convolution circuitry can receive the digital values as inputs to the function. For a set of frames of cells in the input array, the function can convolve the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values can correspond to a frame in the set of frames, and can indicate a number of digital values from its corresponding frame that matches corresponding digital values from the filter array.
  • Address generation circuits (1250, FIG. 12) can apply addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • The fifth block of memory cells 150 has a number M of rows of cells and a number N of columns of cells. For instance, M and N can be 128. A plurality of filter arrays can be stored in the fourth block of memory cells 140. For instance, the fourth block of memory cells 140 can store filter arrays G1-Gn. A filter array (e.g. G1) can be stored in a number Y of rows of cells and a number X of columns of cells.
  • A frame of cells can have the same number Y of rows of cells and the same number X of columns of cells as in a filter array. In-place convolution of a different function of the filter array G1 can be executed over a set of frames of cells in the input array stored in the fifth block of memory cells 150. In-place convolution of a function of a different filter array (e.g. G2) can be executed over a set of frames of cells in the input array. A convolution layer can be generated by executing in-place convolution of a function of each filter array (e.g. G1) in the plurality of filter arrays (e.g. G1-Gn) over each frame of cells (511) in the set of frames in the input array.
  • For instance, convolution circuitry 180 can determine a number of matched digital values between cells in the filter array G1 and corresponding cells in a particular frame of cells 511 in the input array to generate an array of convolved values. Convolution circuitry 180 can determine a number of matched digital values in series, i.e., the digital value of one cell in the filter array G1 and the digital value of its corresponding cell in the frame of cells 511 are compared by convolution circuitry 180 one pair at a time. Alternatively, a number of matched digital values can be determined in parallel, i.e., digital values of all cells in the frame of cells 511 in the input array stored in the fifth block 150 and all corresponding cells in the filter array G1 can be compared by convolution circuitry 180 in parallel. Convolution operations are further described in reference to FIGS. 3-7.
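  • The serial and parallel comparisons can be contrasted in software. The sketch below is only an analogy for the circuit behavior: the function names, the 3x3 sample values, and the use of a packed integer with a bitwise XNOR to stand in for a parallel comparison are assumptions for illustration and are not taken from the description.

```python
def count_matches_serial(filter_array, frame):
    """Compare one filter cell with its corresponding frame cell at a time."""
    matches = 0
    for filter_row, frame_row in zip(filter_array, frame):
        for filter_bit, frame_bit in zip(filter_row, frame_row):
            if filter_bit == frame_bit:
                matches += 1
    return matches


def pack_bits(bits_2d):
    """Pack a 2D array of single-bit values into one integer, row by row."""
    word = 0
    for row in bits_2d:
        for bit in row:
            word = (word << 1) | bit
    return word


def count_matches_parallel(filter_array, frame):
    """Compare all cells in one step with XNOR, then count the set bits."""
    n_cells = sum(len(row) for row in filter_array)
    mask = (1 << n_cells) - 1
    xnor = ~(pack_bits(filter_array) ^ pack_bits(frame)) & mask
    return bin(xnor).count("1")


# Hypothetical sample data, one bit per cell.
filter_g1 = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
frame = [[1, 1, 1], [0, 1, 0], [0, 0, 1]]

print(count_matches_serial(filter_g1, frame))
print(count_matches_parallel(filter_g1, frame))
```

  • Both functions return the same count; they differ only in whether the cell comparisons happen one at a time or all at once.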
  • The writing circuitry 190 operatively coupled to the first block 110 (FIG. 1) is configured to store an analog level in each cell of the first block for the array of convolved values, for instance, according to the determined number of matched digital values between the filter array and the frame of cells in the input array stored in the fifth block of memory cells 150.
  • Writing circuitry 190 can apply a sequence of write pulses for each cell in the first block 110 having a number of write pulses determined according to a corresponding value in the array of convolved values, where the analog levels in the first block of memory cells can include resistance levels or threshold voltage levels. For instance, a corresponding convolved value can indicate a number of matched digital values, and a number of write pulses can be greater for a higher number of matched digital values than for a lower number of matched digital values, or vice versa.
  • Writing circuitry 190 can apply a sequence of write pulses for each cell in the first block 110 having a pulse duration determined according to a corresponding value in the array of convolved values, where the analog levels in the first block of memory cells can include resistance levels or threshold voltage levels. For instance, a corresponding convolved value can indicate a number of matched digital values, and a pulse duration can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa.
  • Writing circuitry 190 can apply a sequence of write pulses for each cell in the first block having a tail length of a write pulse determined according to a corresponding value in the array of convolved values, where the analog levels in the first block of memory cells can include resistance levels. For instance, a corresponding convolved value can indicate a number of matched digital values, and a tail length of a write pulse can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa.
  • FIG. 3 illustrates executing in-place convolution as shown in FIG. 2 in more detail. As described in reference to FIG. 2, convolution circuitry 180 can execute in-place convolution of a function of the filter array stored in the fourth block 140 over the input array stored in the fifth block of memory cells 150 to generate an array of convolved values. Writing circuitry (190, FIG. 2) operatively coupled to the first block of memory cells 110 can store the array of convolved values from convolution circuitry 180 in the first block of memory cells 110. In one embodiment, the first block of memory cells 110, the fourth block of memory cells 140, and the fifth block of memory cells 150 can be implemented on a single integrated circuit chip or a multichip module under one package.
  • As shown in the example of FIG. 3, the fifth block of memory cells 150 to store the input array can have a number M of rows of cells and a number N of columns of cells. A number ‘1’ or ‘0’ shown for a cell in the fifth block of memory cells represents a digital value. The fourth block of memory cells 140 to store the filter array can have a number Y of rows of cells and a number X of columns of cells. A number ‘1’ or ‘0’ shown for a cell in the fourth block of memory cells represents a digital value.
  • In one embodiment, the first block of memory cells 110 can have a number (M-Y+1) of rows of cells and a number (N-X+1) of columns of cells. As shown in the examples of FIGS. 3-7, N=9, M=9, X=3, and Y=3. The fifth block of memory cells 150 has 9 rows (R1-R9) and 9 columns (C1-C9), the fourth block of memory cells 140 has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3), and the first block of memory cells 110 has 7 rows and 7 columns. A frame of cells in the fifth block of memory cells to store the input array can have the same number Y of rows and the same number X of columns as the fourth block of memory cells 140.
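  • The relationship between the input dimensions, the filter dimensions and the (M-Y+1) by (N-X+1) array of convolved values can be checked with a short sketch. It uses the match-counting function described for the convolution circuitry; the function name `convolve_match_count` and the checkerboard input pattern are assumptions chosen for illustration, while the 3x3 filter values are those given for the FIG. 4 example.

```python
def convolve_match_count(input_array, filter_array):
    """Slide the filter over every frame that fits inside the input array.

    Each output value is the number of cells in the frame whose digital value
    matches the corresponding cell of the filter array.
    """
    m, n = len(input_array), len(input_array[0])      # M rows, N columns
    y, x = len(filter_array), len(filter_array[0])    # Y rows, X columns
    output = []
    for r in range(m - y + 1):
        output_row = []
        for c in range(n - x + 1):
            matches = sum(
                input_array[r + i][c + j] == filter_array[i][j]
                for i in range(y)
                for j in range(x)
            )
            output_row.append(matches)
        output.append(output_row)
    return output


# Hypothetical 9x9 input: a checkerboard of single-bit values.
input_array = [[(r + c) % 2 for c in range(9)] for r in range(9)]
filter_array = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

convolved = convolve_match_count(input_array, filter_array)
print(len(convolved), len(convolved[0]))  # 7 7, i.e. (9-3+1) by (9-3+1)
```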
  • As used herein, a target cell in a frame of cells in the fifth block of memory cells is a cell at the center of the frame of cells, surrounded by at least one row of cells on an upper side, at least one row of cells on a lower side, at least one column of cells on a left side, and at least one column of cells on a right side of the target cell. For instance, the frame of cells can include cells in 3 consecutive rows (e.g. R1, R2, R3) and 3 consecutive columns (e.g. C1, C2, C3), and the target cell is at a center row and a center column of the frame of cells (e.g. R2C2 for a frame 511, FIG. 4).
  • In the embodiment described above in reference to FIG. 3, cells in the border rows (e.g. R1, R9) and in the border columns (e.g. C1, C9) are not target cells, as they are not surrounded by other cells on at least one of top, bottom, left and right sides. Accordingly a number of frames of cells in the input array that can have a target cell at the center of a frame is fewer than the number of cells in the input array, the number of convolutions of a function of the filter array over the frames of cells having a target cell is fewer than the number of cells in the input array, and the number of cells in the first block of memory cells to store the array of convolved values from the convolutions is fewer than the number of cells in the input array.
  • In an alternative embodiment, zero-padding can be used to pad the fifth block of memory cells 150 with a binary value ‘0’ around the fifth block of memory cells. For instance, a row of cells with binary values ‘0’ can be padded adjacent a border row (e.g. R1, R9) in the fifth block of memory cells, and a column of cells of ‘0’ can be padded adjacent a border column (e.g. C1, C9) in the fifth block of memory cells, so the filter array can be applied to cells in a border row or a border column in the fifth block of memory cells. In other words, with padded rows of cells and padded columns of cells, each cell in a border row of cells or a border column of cells can be a target cell in a frame of cells for in-place convolution with a filter array. With padded rows of cells and padded columns of cells for the fifth block of memory cells, the first block of memory cells can have the same number M of rows of cells and the same number N of columns as the fifth block of memory cells.
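  • The effect of zero-padding on the size of the array of convolved values can be illustrated as follows. This is a minimal sketch: the helper name `zero_pad` and its parameters are hypothetical, and one padded row and column per side is assumed because the example filter array has 3 rows and 3 columns.

```python
def zero_pad(input_array, pad_rows=1, pad_cols=1):
    """Return a copy of the input array surrounded by cells of value 0."""
    padded_width = len(input_array[0]) + 2 * pad_cols
    top = [[0] * padded_width for _ in range(pad_rows)]
    bottom = [[0] * padded_width for _ in range(pad_rows)]
    middle = [[0] * pad_cols + list(row) + [0] * pad_cols for row in input_array]
    return top + middle + bottom


M, N = 9, 9   # rows and columns of the input array
Y, X = 3, 3   # rows and columns of the filter array

# Hypothetical input of all '1' values so the padded border is easy to see.
input_array = [[1] * N for _ in range(M)]
padded = zero_pad(input_array)

# With padding, the number of frame positions equals the number of input cells.
output_rows = len(padded) - Y + 1
output_cols = len(padded[0]) - X + 1
print(len(padded), len(padded[0]))   # 11 11
print(output_rows, output_cols)      # 9 9
```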
  • In one embodiment, the first block of memory cells 110 includes programmable resistance memory cells, and the analog levels are resistance levels. Programmable resistance memories can include phase change memory (PCM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM). In one embodiment, the analog levels in the first block of memory cells can include at least a number (X times Y) of resistance levels. In this example, X=3, Y=3, and (X times Y)=9 resistance levels. In this embodiment, a number ‘1’, ‘0.9’, ‘0.8’, ‘0.7’, ‘0.6’, etc. for a cell in the first block of memory cells can represent 1MΩ, 0.9MΩ, 0.8MΩ, 0.7MΩ, 0.6MΩ, etc. respectively, as shown in the examples of FIGS. 3-7.
  • Before a process starts to convolve a function of the filter array over the input array to generate an array of convolved values, the first block of memory cells can be set to the highest resistance level, such as 1MΩ, representing the case when a number of matched digital values is the same as the number of digital values in a filter array.
  • In an alternative embodiment, the first block of memory cells 110 includes charge storage memory cells, and the analog levels are threshold voltage levels. Charge storage memories can include floating gate and nitride trapping memories. In one embodiment, the analog levels in the first block of memory cells can include at least a number (X times Y) of threshold voltage levels. In this example, X=3, Y=3, and (X times Y)=9 threshold voltage levels. In this embodiment, a number ‘1’, ‘0.9’, ‘0.8’, ‘0.7’, ‘0.6’, etc. for a cell in the first block of memory cells can represent 10V, 9V, 8V, 7V, 6V, etc. respectively, as shown in the examples of FIGS. 3-7.
  • Before a process starts to convolve a function of the filter array over the input array to generate an array of convolved values, the first block of memory cells can be erased to the lowest threshold voltage level, representing the case when a number of matched digital values is zero.
  • Convolution circuitry (180, FIG. 2) can execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values. Each value in the array of convolved values can indicate a number of digital values from its corresponding frame that match corresponding digital values from the filter array.
  • Storing the convolved value in the particular cell in the first block of memory cells can include addressing the particular cell in the first block of memory cells, and converting the convolved values from in-place convolution into a set time of a set pulse or a program time of a program pulse for the cell in the first block of memory cells. A set time of a set pulse can be used when analog levels in the first block of memory cells include resistance levels. A program time of a program pulse can be used when analog levels in the first block of memory cells include threshold voltage levels. The set time can be applied to a sequence of set pulses so the set pulses in the sequence have the same set time. The program time can be applied to a sequence of program pulses so the program pulses in the sequence have the same program time. The convolved values can be converted into a number of set pulses for a sequence of set pulses, or a number of program pulses for a sequence of program pulses. Furthermore, the convolved values can be converted into a combination of varying set times and numbers of set pulses, or a combination of varying program times and numbers of program pulses. The convolved values in the array of convolved values are stored as analog levels in the first block of memory cells, and no verify cycles are needed to verify that a cell in the first block of memory cells has been changed to a target resistance or threshold range. In comparison, to write a digital value to a cell, verify cycles are needed to verify whether the cell is within a target resistance or threshold range, and to determine whether more set pulses or program pulses are needed. Storing convolved values as analog levels instead of digital values can improve the performance of storing the convolved values in the array of convolved values, because the verify cycles are not needed.
  • The frame address of a frame of cells in the fifth block of memory cells 150 can refer to a row address and a column address of a cell in the frame of cells. For instance, a frame address can refer to a row address and a column address of a target cell at the center of a frame of cells (e.g. R2C2 for a frame 511, FIG. 4). The frame address can be sequenced in a row direction from a particular frame of cells by at least one column, or in a column direction from a particular frame of cells by at least one row, to address a next frame of cells. Technology as described herein for executing in-place convolution of the function of the filter array over a frame of cells in the fifth block of memory cells can be applied in sequence to other frames of cells in the fifth block of memory cells.
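  • The sequencing of frame addresses can be sketched as a generator over target cell addresses. The function name `frame_addresses`, the step parameters, and the row-by-row order are assumptions for illustration; the description only requires that the address advance by at least one column in the row direction or at least one row in the column direction. The sketch assumes odd Y and X so that each frame has a center cell.

```python
def frame_addresses(m, n, y, x, row_step=1, col_step=1):
    """Yield 1-indexed (row, column) addresses of target cells.

    A target cell sits at the center of a Y-by-X frame that lies entirely
    inside the M-by-N input array, so border rows and columns are skipped.
    """
    half_y, half_x = y // 2, x // 2
    for row in range(1 + half_y, m - half_y + 1, row_step):
        for col in range(1 + half_x, n - half_x + 1, col_step):
            yield (row, col)


addresses = list(frame_addresses(9, 9, 3, 3))
first_row, first_col = addresses[0]
print(f"R{first_row}C{first_col}")  # R2C2, the first frame address in FIG. 4
print(len(addresses))               # 49 frames for a 9x9 input and 3x3 filter
```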
  • FIG. 4 illustrates an example of executing in-place convolution of a function of a filter array over an input array. In this example, a number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and a particular frame of cells 511 at a first frame address R2C2 in the fifth block of memory cells 150. Convolution circuitry (180, FIG. 2) can compare the filter array stored in the fourth block of memory cells 140 and the particular frame of cells 511 stored in the fifth block of memory cells 150. A convolved value from the convolution circuitry can indicate a number of digital values (Y=8) from its corresponding frame (511) that matches corresponding digital values from the filter array.
  • In this example, the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3) of cells, and the particular frame of cells 511 has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3) of cells correspondingly. In this example, the cells in the filter array and the particular frame have one bit per cell. The filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively. The particular frame of cells has digital values 1, 1, 1, 1, 0, 1, 1, 1 and 0 at corresponding addresses. Table 1 indicates matched digital values with ‘1’, and digital values that are not matched with ‘0’. In this example, the number of matched digital values is 8 (Y=8).
  • TABLE 1
              R1C1  R1C2  R1C3  R2C1  R2C2  R2C3  R3C1  R3C2  R3C3
    Kernel    0     1     1     1     0     1     1     1     0
    Frame     1     1     1     1     0     1     1     1     0
    Matched   0     1     1     1     1     1     1     1     1
  • The fourth block of memory cells can store different values than shown in this example. The function can be different than determining a number of matched digital values. For example, the function can include determining a number of corresponding digital values in the filter array and the particular frame of cells that are both ‘1’, both ‘0’, not matched, etc.
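  • The matched-value function and the alternative functions mentioned above can be sketched in Python as follows, using the filter array and the frame 511 values of Table 1. The function names and the flat list layout (addresses R1C1 through R3C3 in order) are assumptions made for this illustration.

```python
KERNEL = [0, 1, 1, 1, 0, 1, 1, 1, 0]   # filter array, R1C1..R3C3
FRAME = [1, 1, 1, 1, 0, 1, 1, 1, 0]    # frame 511, corresponding addresses


def count_matched(kernel, frame):
    """Number of positions where the digital values are equal."""
    return sum(k == f for k, f in zip(kernel, frame))


def count_both_one(kernel, frame):
    """Number of positions where both digital values are '1'."""
    return sum(k == 1 and f == 1 for k, f in zip(kernel, frame))


def count_both_zero(kernel, frame):
    """Number of positions where both digital values are '0'."""
    return sum(k == 0 and f == 0 for k, f in zip(kernel, frame))


def count_not_matched(kernel, frame):
    """Number of positions where the digital values differ."""
    return sum(k != f for k, f in zip(kernel, frame))


print(count_matched(KERNEL, FRAME))      # 8, as in Table 1 (Y=8)
print(count_both_one(KERNEL, FRAME))     # 6
print(count_both_zero(KERNEL, FRAME))    # 2
print(count_not_matched(KERNEL, FRAME))  # 1
```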
  • Writing circuitry (190, FIG. 2) operatively coupled to the first block of memory cells 110 can change an analog level of a first cell 511C in the first block of memory cells 110 according to the number of matched digital values. In one embodiment, the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1+ the number of cells in the fourth block of memory cells) in MΩ (Megaohm). In this example, where the number of matched digital values is 8 and the second array has 9 cells, a resistance level of 8/(1+9)=0.8MΩ can be set for a first cell 511C in the first block of memory cells 110.
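  • The resistance level arithmetic of this embodiment can be sketched in Python as follows. The function name `resistance_megaohm` is an assumption made for illustration; the formula is the one stated above, the number of matched digital values divided by (1 + the number of cells in the fourth block of memory cells), in MΩ.

```python
def resistance_megaohm(matched, filter_cells):
    """Resistance level in megaohms for a given number of matched values."""
    if not 0 <= matched <= filter_cells:
        raise ValueError("matched must be between 0 and filter_cells")
    return matched / (1 + filter_cells)


print(resistance_megaohm(8, 9))   # 8/(1+9) MΩ for the first cell 511C
print(resistance_megaohm(4, 9))   # 4/(1+9) MΩ
```

  • For a filter array of 9 cells, this formula yields resistance levels from 0 MΩ (no matched values) to 0.9 MΩ (all 9 values matched).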
  • FIG. 5 illustrates a second example of executing in-place convolution of a function of a filter array over an input array. Address generation circuits (1250, FIG. 12) can apply addresses for the set of frames and the filter array to the fifth block of memory cells and the fourth block of memory cells in coordination with the in-place convolution. A second frame of cells 512 can be selected at a second frame address in the fifth block of memory cells. The second frame address can be sequenced from the first frame address by a stride, where the stride can include either at least one column in a row direction or at least one row in a column direction. In this example, the second frame of cells 512 at the second frame address R2C3 in the fifth block of memory cells 150 is selected, where the second frame address R2C3 is the address of the target cell at the center of the second frame of cells. A second number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and the second frame of cells 512 at the second frame address R2C3 in the fifth block of memory cells 150. The second frame address R2C3 can be sequenced from the first frame address R2C2 by one column in a row direction. Convolution circuitry (180, FIG. 2) can compare the filter array stored in the fourth block of memory cells 140 and the second frame of cells 512 stored in the fifth block of memory cells 150. A convolved value from the convolution circuitry can indicate a number of digital values (Y=4) from its corresponding frame (512) that matches corresponding digital values from the filter array.
  • In this example, the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3), and the second frame of cells 512 has 3 rows (R1, R2, R3) and 3 columns (C2, C3, C4) correspondingly. In this example, the cells in the filter array and the second frame have one bit per cell. The filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively. The second frame of cells has digital values 1, 1, 1, 0, 1, 1, 1, 0 and 1 at corresponding addresses. Table 2 indicates matched digital values with ‘1’, and digital values that are not matched with ‘0’. In this example, the number of matched digital values is 4 (Y=4).
  • TABLE 2
              R1C1  R1C2  R1C3  R2C1  R2C2  R2C3  R3C1  R3C2  R3C3
    Kernel    0     1     1     1     0     1     1     1     0
    Frame     1     1     1     0     1     1     1     0     1
    Matched   0     1     1     0     0     1     1     0     0
  • Writing circuitry (190, FIG. 2) operatively coupled to the first block of memory cells 110 can change an analog level of a second cell 512C in the first block of memory cells 110 according to the second number of matched digital values. In one embodiment, the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1+ the number of cells in the fourth block of memory cells) in Megaohm (MΩ). In this example, where the number of matched digital values is 4 and the second array has 9 cells, a resistance level of 4/(1+9)=0.4MΩ can be set for a second cell 512C in the first block of memory cells 110.
  • The second cell 512C is at a different row/column address than the first cell 511C in the first block of memory cells 110. For instance, the second cell 512C can be at the same row of cells as the first cell 511C in the first block of memory cells 110, and at a different column of cells than the first cell 511C in the first block of memory cells 110. For instance, the second cell 512C can be at a different row of cells and at a different column of cells than the first cell 511C in the first block of memory cells 110.
  • FIG. 6 illustrates a third example of executing in-place convolution of a function of a filter array over an input array. Address generation circuits (1250, FIG. 12) can apply addresses for the set of frames and the filter array to the fifth block of memory cells 150 and the fourth block of memory cells 140 in coordination with the in-place convolution. A third frame of cells 521 can be selected at a third frame address in the fifth block of memory cells. The third frame address can be sequenced from the first frame address by a stride, where the stride can include either at least one column in a row direction or at least one row in a column direction. In this example, a third frame of cells 521 at a third frame address R3C2 in the fifth block of memory cells 150 is selected, where the third frame address R3C2 is the address of the target cell at the center of the third frame of cells. A third number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and the third frame of cells 521 at the third frame address R3C2 in the fifth block of memory cells 150. The third frame address R3C2 can be sequenced from the first frame address R2C2 by one row in a column direction. Convolution circuitry (180, FIG. 2) can compare the filter array stored in the fourth block of memory cells 140 and the third frame of cells 521 in the fifth block of memory cells 150. A convolved value from the convolution circuitry can indicate a number of digital values (Y=4) from its corresponding frame (521) that matches corresponding digital values from the filter array.
  • In this example, the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3), and the third frame of cells 521 has 3 rows (R2, R3, R4) and 3 columns (C1, C2, C3) correspondingly. In this example, the cells in the filter array and the third frame have one bit per cell. The filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively. The third frame of cells has digital values 1, 0, 1, 1, 1, 0, 1, 1 and 1 at corresponding addresses. Table 3 indicates matched digital values with ‘1’, and digital values that are not matched with ‘0’. In this example, the number of matched digital values is 4 (Y=4).
  • TABLE 3
              R1C1  R1C2  R1C3  R2C1  R2C2  R2C3  R3C1  R3C2  R3C3
    Kernel    0     1     1     1     0     1     1     1     0
    Frame     1     0     1     1     1     0     1     1     1
    Matched   0     0     1     1     0     0     1     1     0
  • Writing circuitry (190, FIG. 2) operatively coupled to the first block of memory cells 110 can change an analog level of a third cell 521C in the first block of memory cells 110 according to the third number of matched digital values. In one embodiment, the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1+ the number of cells in the fourth block of memory cells) in Megaohm (MΩ). In this example, where the number of matched digital values is 4 and the second array has 9 cells, a resistance level of 4/(1+9)=0.4MΩ can be set for a third cell 521C in the first block of memory cells 110.
  • The third cell 521C is at a different row/column address than the first cell 511C and the second cell 512C in the first block of memory cells 110. For instance, the third cell 521C can be at the same column of cells as the first cell 511C in the first block of memory cells 110, and at a different row of cells than the first cell 511C in the first block of memory cells 110. For instance, the third cell 521C can be at a different row of cells and at a different column of cells than the first cell 511C and the second cell 512C in the first block of memory cells 110.
  • In one embodiment, executing in-place convolution of a function of the filter array over the input array can include convolving the function of the filter array over frames of cells at a first row address (e.g. R1) in the fifth block of memory cells 150 while sequencing the column addresses (C1-C9) of the frames of cells, and then convolving the function of the filter array over frames of cells at a next row address (e.g. R2) in the fifth block of memory cells 150 while sequencing the column addresses (C1-C9) of the frames of cells. The next row address is sequenced from the first row address by at least one row.
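  • The row-by-row sequencing described above can be sketched in Python as follows. The 4x4 input array is a hypothetical excerpt: its values other than R4C4 follow from the overlapping frames 511, 512 and 521 given in the examples above, while the value 0 at R4C4 is an assumption. The sketch also assumes that only frames lying entirely inside the array are visited (no padding at the edges), and the function names are illustrative.

```python
KERNEL = [[0, 1, 1],
          [1, 0, 1],
          [1, 1, 0]]

# Rows R1..R4, columns C1..C4 of a hypothetical excerpt of the input array.
# R4C4 = 0 is an assumed value; the other values follow from frames 511, 512, 521.
INPUT = [[1, 1, 1, 1],
         [1, 0, 1, 1],
         [1, 1, 0, 1],
         [1, 1, 1, 0]]


def matched_count(kernel, array, top, left):
    """Count equal digital values between the kernel and the frame whose
    top-left cell is at 0-based position (top, left)."""
    return sum(kernel[i][j] == array[top + i][left + j]
               for i in range(len(kernel))
               for j in range(len(kernel[0])))


def convolve_row_major(kernel, array, stride=1):
    """Visit frames row by row, sequencing column addresses within each row."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    convolved = []
    for top in range(0, len(array) - k_rows + 1, stride):          # next row address
        row = []
        for left in range(0, len(array[0]) - k_cols + 1, stride):  # next column address
            row.append(matched_count(kernel, array, top, left))
        convolved.append(row)
    return convolved


print(convolve_row_major(KERNEL, INPUT))   # [[8, 4], [4, 9]]
```

  • The values 8, 4 and 4 correspond to the frames at R2C2, R2C3 and R3C2 in Tables 1 to 3; the value 9 at R3C3 results only from the assumed value at R4C4.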
  • FIG. 7 illustrates a fourth example of executing in-place convolution of a function of a filter array over an input array. Address generation circuits (1250, FIG. 12) can apply addresses for the set of frames and the filter array to the fifth block of memory cells 150 and the fourth block of memory cells 140 in coordination with the in-place convolution. In this example, a last number of matched digital values is determined between the filter array stored in the fourth block of memory cells 140 and a last frame of cells 577 in the fifth block of memory cells 150. The last frame of cells 577 includes cells addressed in the last three rows of cells in the number M of rows and in the last three columns of cells in the number N of columns, e.g. R7C7, R7C8, R7C9, R8C7, R8C8, R8C9, R9C7, R9C8, R9C9. Convolution circuitry (180, FIG. 2) can compare the filter array stored in the fourth block of memory cells 140 and the last frame of cells 577 in the fifth block of memory cells 150. A convolved value from the convolution circuitry can indicate a number of digital values (Y=8) from its corresponding frame (577) that matches corresponding digital values from the filter array.
  • In this example, the fourth block of memory cells 140 to store the filter array has 3 rows (R1, R2, R3) and 3 columns (C1, C2, C3), and the last frame of cells 577 has 3 rows (R7, R8, R9) and 3 columns (C7, C8, C9) correspondingly. The filter array has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 0 at addresses R1C1, R1C2, R1C3, R2C1, R2C2, R2C3, R3C1, R3C2 and R3C3, respectively. In this example, the cells in the filter array and the last frame have one bit per cell. The last frame of cells has digital values 0, 1, 1, 1, 0, 1, 1, 1 and 1 at corresponding addresses. Table 4 indicates matched digital values with ‘1’, and digital values that are not matched with ‘0’. In this example, the number of matched digital values is 8 (Y=8).
  • TABLE 4
              R1C1  R1C2  R1C3  R2C1  R2C2  R2C3  R3C1  R3C2  R3C3
    Kernel    0     1     1     1     0     1     1     1     0
    Frame     0     1     1     1     0     1     1     1     1
    Matched   1     1     1     1     1     1     1     1     0
  • Writing circuitry (190, FIG. 2) operatively coupled to the first block of memory cells 110 can change an analog level of the cell 577C in the first block of memory cells 110 according to the last number of matched digital values. In one embodiment, the analog levels in the first block of memory cells include resistance levels, and a resistance level can be set to the number of matched digital values divided by (1+ the number of cells in the fourth block of memory cells) in Megaohm (MΩ). In this example, where the number of matched digital values is 8 and the second array has 9 cells, a resistance level of 8/(1+9)=0.8MΩ can be set for the last cell 577C in the first block of memory cells 110.
  • Address generation circuits (1250, FIG. 12) can apply addresses for the set of frames and the filter array to the fifth block 150 and the fourth block of memory cells 140 in coordination with the in-place convolution. A first function of the filter array can be convolved over all frames in the set of frames stored in the input array to generate an array of convolved values, and the array of convolved values can be stored as analog levels in the first block of memory cells. Subsequently a second function of the filter array can be convolved over all frames in the set of frames stored in the input array to generate a second array of convolved values, and the second array of convolved values can be stored as analog levels in the first block of memory cells.
  • Furthermore, different functions of different filter arrays can be used for executing in-place convolution over the input array to generate respective arrays of convolved values, and the respective arrays of convolved values can be stored as analog levels in the first block of memory cells.
  • FIG. 8 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for programmable resistance memory cells. In this example, the first block of memory cells 110 includes programmable resistance memory cells having resistance levels. Programmable resistance memories can include phase change memory (PCM), resistive random access memory (RRAM), and magnetoresistive random access memory (MRAM). For programmable resistance memory cells, a pulse duration can be referred to as a set time, and a write pulse can be referred to as a set pulse. The writing circuitry (190, FIG. 2) can determine a pulse duration for write pulses in a sequence of write pulses for changing the resistance levels of cells in the first block of memory cells according to the number of matched digital values Y between a filter array stored in the fourth block of memory cells and a particular frame of cells in the input array stored in the fifth block of memory cells. For instance, the set time of a set pulse can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa. A longer set time of a set pulse can induce lower resistance R, and a shorter set time of a set pulse can induce higher resistance R.
  • The writing circuitry (190, FIG. 2) can also determine a number of write pulses for changing the resistance levels according to the number of matched digital values. For instance, a number of write pulses can be greater for a higher number of matched digital values than for a lower number of matched digital values, or vice versa.
  • Before a process starts to execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values, the first block of memory cells can be set to the highest resistance level, representing the case when a number of matched digital values is the same as the number of digital values in a filter array. During the process, if a number of matched digital values is the same as the number of digital values in a filter array, then no set pulse is applied to a cell in the first block of memory cells.
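  • As a minimal sketch of this behavior for programmable resistance memory cells, the following Python example assumes a hypothetical linear mapping in which each unmatched digital value adds a fixed unit of set time, so that a lower number of matched digital values gives a longer set time and a full match gives no set pulse. The function name `set_time_ns` and the unit of 10 ns are illustrative assumptions, not values from the described device.

```python
def set_time_ns(matched, filter_cells, unit_ns=10.0):
    """Return the set time for one cell in nanoseconds (hypothetical mapping).

    A result of 0.0 means no set pulse is applied, so the cell stays at the
    highest resistance level it was initialized to.
    """
    if not 0 <= matched <= filter_cells:
        raise ValueError("matched must be between 0 and filter_cells")
    return (filter_cells - matched) * unit_ns


print(set_time_ns(9, 9))   # 0.0: all values matched, no set pulse
print(set_time_ns(8, 9))   # 10.0
print(set_time_ns(4, 9))   # 50.0: fewer matches, longer set time
```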
  • FIG. 9 illustrates an example of a pulse duration determined according to a convolved value from in-place convolution for charge storage memory cells. In this example, the first block of memory cells 110 includes charge storage memory cells having threshold voltage levels. Charge storage memories can include floating gate and nitride trapping memories. For charge storage memory cells, a pulse duration can be referred to as a program time, and a write pulse can be referred to as a program pulse. The writing circuitry (190, FIG. 2) can determine a pulse duration for write pulses in a sequence of write pulses for changing the threshold voltage levels in the first block of memory cells according to a number of matched digital values Y between a filter array stored in the fourth block of memory cells and a particular frame of cells in the input array stored in the fifth block of memory cells. For instance, the program time of a program pulse can be longer for a lower number of matched digital values than for a higher number of matched digital values, or vice versa. A longer program time of a program pulse can induce higher threshold voltage Vt, and a shorter program time of a program pulse can induce lower threshold voltage Vt.
  • The writing circuitry (190, FIG. 2) can also determine a number of write pulses for changing the threshold voltage levels according to the number of matched digital values. For instance, a number of program pulses can be greater for a higher number of matched digital values than for a lower number of matched digital values, or vice versa.
  • Before a process starts to execute in-place convolution of a function of the filter array over the input array to generate an array of output values, the first block of memory cells can be erased to the lowest threshold voltage level, representing the case when a number of matched digital values is zero. During the process, if a number of matched digital values is zero, then no program pulse is applied to a cell in the first block of memory cells.
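  • As a minimal sketch of this behavior for charge storage memory cells, the following Python example assumes a hypothetical mapping of one program pulse per matched digital value, with every pulse in the sequence having the same program time. The function name `program_pulse_sequence` and the program time of 1 µs are illustrative assumptions.

```python
def program_pulse_sequence(matched, program_time_us=1.0):
    """Return the program times (in microseconds) of the pulses for one cell.

    An empty list means no program pulse is applied, so the cell stays at the
    lowest threshold voltage level it was erased to.
    """
    if matched < 0:
        raise ValueError("matched must be non-negative")
    return [program_time_us] * matched


print(program_pulse_sequence(0))   # []: zero matched values, no program pulse
print(program_pulse_sequence(4))   # four pulses with the same program time
```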
  • FIGS. 10A, 10B and 10C illustrate example pulse shapes of set pulses for changing the resistance level of a cell having a body of phase change material. FIG. 10A illustrates a single set pulse 1010 having a relatively long pulse duration and rapid rising and falling edges, with an amplitude above a melting threshold 1005 for the phase change material. FIG. 10B illustrates a sequence of set pulses 1021 and 1022 having a shorter pulse duration than the single set pulse 1010 in FIG. 10A. FIG. 10C illustrates a single set pulse with a rapid rising edge and a ramp-shaped trailing edge or a set tail 1035 of constant or near constant slope. For instance, a tail length of a set tail 1035 can vary between 10 ns and 1 ms, according to the differences in analog levels between the filter array and the particular frame of cells in the input array stored in the fifth block of memory cells.
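  • One possible way to choose a tail length within the stated range of 10 ns to 1 ms can be sketched in Python as follows. The logarithmic interpolation, the normalized difference in the range 0 to 1, the direction of the mapping (larger difference, longer tail) and the function name `set_tail_length_ns` are all assumptions made for illustration; only the two range limits come from the description above.

```python
TAIL_MIN_NS = 10.0          # 10 ns, lower limit of the stated range
TAIL_MAX_NS = 1_000_000.0   # 1 ms, upper limit of the stated range


def set_tail_length_ns(normalized_difference):
    """Map a difference in [0, 1] to a tail length on a logarithmic scale."""
    if not 0.0 <= normalized_difference <= 1.0:
        raise ValueError("normalized_difference must be between 0 and 1")
    ratio = TAIL_MAX_NS / TAIL_MIN_NS
    return TAIL_MIN_NS * ratio ** normalized_difference


for d in (0.0, 0.5, 1.0):
    print(d, set_tail_length_ns(d))
```

  • A logarithmic scale is used in this sketch only because the stated range spans five orders of magnitude; a linear mapping would be an equally valid assumption.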
  • FIG. 11 illustrates a simplified flowchart for a flow in operating a device. At Step 1110, an input array can be stored in a first block of memory cells. At Step 1120, a feature array can be stored in a second block of memory cells.
  • At Step 1130, the third block of memory cells 130 can be initialized. The third block of memory cells can comprise programmable resistance memory cells having resistance levels, or charge storage memory cells having threshold voltage levels. Where the analog levels in the third block of memory cells include resistance levels, Step 1130 can include setting the third block of memory cells to the highest resistance level, such as 1MΩ. For example, the highest resistance level can represent the case where a number of matched digital values between the feature array and a particular frame of cells in the first block of memory cells is the same as the number of digital values in the feature array. Where the analog levels in the third block of memory cells include threshold voltage levels, Step 1130 can include erasing the third block of memory cells to the lowest threshold voltage level. For example, the lowest threshold voltage level can represent the case where a number of matched digital values between the feature array and a particular frame of cells in the first block of memory cells is zero.
  • The order of Steps 1110, 1120 and 1130 as shown in the flowchart does not indicate the order in which Steps 1110, 1120 and 1130 can be executed. For instance, Step 1130 can be executed before Step 1110, and Step 1110 can be executed after Step 1120.
  • At Step 1140, sensing circuitry coupled to the first block of memory cells and the second block of memory cells can compare electrical differences between memory cells in the first block and the memory cells in the second block to generate an array of output values. For a set of frames of cells in the first block, the sensing circuitry can compare electrical differences between the feature array with each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
  • At Step 1150, the writing circuitry operatively coupled to the third block of memory cells 130 can store the array of output values in the third block of memory cells. An analog level can be stored in each cell of the third block for the array of output values. The writing circuitry (170, FIG. 1) can apply a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values, where cells in the third block of memory cells can include resistance levels or threshold voltage levels. The writing circuitry can apply a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values, where cells in the third block of memory cells include resistance levels or threshold voltage levels. The writing circuitry can apply a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values, where the analog levels in the third block of memory cells include resistance levels.
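  • Steps 1110 through 1150 can be sketched in Python as follows. The description does not fix how electrical differences are computed, so this sketch assumes, purely for illustration, that each output value is the sum of absolute differences between a frame and the feature array. The function names, the integer stand-ins for analog levels and the initial level of 0 are also assumptions.

```python
def compare_frames(input_array, feature_array, stride=1):
    """Step 1140 (sketch): one output value per frame of the input array,
    here the sum of absolute differences with the feature array."""
    f_rows, f_cols = len(feature_array), len(feature_array[0])
    outputs = []
    for top in range(0, len(input_array) - f_rows + 1, stride):
        row = []
        for left in range(0, len(input_array[0]) - f_cols + 1, stride):
            row.append(sum(abs(input_array[top + i][left + j] - feature_array[i][j])
                           for i in range(f_rows) for j in range(f_cols)))
        outputs.append(row)
    return outputs


def run_flow(input_array, feature_array, initial_level=0):
    first_block = [row[:] for row in input_array]        # Step 1110: store input array
    second_block = [row[:] for row in feature_array]     # Step 1120: store feature array
    out_rows = len(first_block) - len(second_block) + 1
    out_cols = len(first_block[0]) - len(second_block[0]) + 1
    third_block = [[initial_level] * out_cols for _ in range(out_rows)]  # Step 1130
    outputs = compare_frames(first_block, second_block)  # Step 1140: compare
    for r, row in enumerate(outputs):                    # Step 1150: store output values
        for c, value in enumerate(row):
            third_block[r][c] = value
    return third_block


# Hypothetical analog levels expressed as integers (e.g. tenths of a megaohm).
INPUT = [[8, 4, 9],
         [4, 9, 8],
         [8, 8, 4]]
FEATURE = [[8, 4],
           [4, 9]]

print(run_flow(INPUT, FEATURE))   # [[0, 15], [14, 14]]
```

  • In this sketch an output value of 0 indicates a frame identical to the feature array, and larger values indicate larger differences.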
  • Furthermore, the device can comprise a fourth block of memory cells to store a filter array and a fifth block of memory cells to store an input array. Convolution circuitry is operatively coupled to the fourth block of memory cells and the fifth block of memory cells to generate an array of convolved values. The flow can include executing in-place convolution of a function of the filter array over the input array to generate an array of convolved values, and storing the array of convolved values in the first block. The flow can continue to compare electrical differences between the array of convolved values stored in the first block of memory cells and a feature array stored in the second block of memory cells to generate the array of output values, and store the array of output values in the third block of memory cells.
  • The input array stored in the fifth block of memory cells and the filter array can include digital values, and the convolution circuitry can receive the digital values as inputs to the function. For a set of frames of cells in the input array, the function can convolve the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array. The flow includes applying addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • FIG. 12 is a simplified block diagram of an integrated circuit in accordance with the present technology. In the example shown in FIG. 12, the integrated circuit 1200 includes a memory 1270. The memory 1270 comprises a first block of memory cells 110, a second block of memory cells 120 to store a feature array, a third block of memory cells 130 to store an array of output values, a fourth block of memory cells 140 to store a filter array, and a fifth block of memory cells 150. In one embodiment, the first block of memory cells 110 is configured to store an input array. In an alternative embodiment, the fifth block of memory cells 150 is configured to store an input array. In one embodiment, the filter array and the feature array can be the same array.
  • The integrated circuit 1200 includes address generation circuits 1250 that apply addresses for the set of frames in the input array stored in the first block of memory cells and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences. Address generation circuits 1250 can also apply addresses for the set of frames in the input array stored in the fifth block and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
  • Address generation circuits 1250 can include a first block address generator 1251, a feature array address generator 1252, an output array address generator 1253, a filter address generator 1254, and a fifth block address generator 1255. The first block address generator 1251 is coupled to address lines 1261 which in turn are coupled to the first block of memory cells 110. The feature array address generator 1252 is coupled to address lines 1262 which in turn are coupled to the second block of memory cells 120. The output array address generator 1253 is coupled to address lines 1263 which in turn are coupled to the third block of memory cells 130. The filter address generator 1254 is coupled to address lines 1264 which in turn are coupled to the fourth block of memory cells 140. A fifth block address generator 1255 is coupled to address lines 1265 which in turn are coupled to fifth block 150. Addresses are supplied on bus 1240 to the first block address generator 1251, the feature array address generator 1252, the output array address generator 1253, the filter address generator 1254, and the fifth block address generator 1255.
  • Convolution circuitry 180 is operatively coupled to the fourth block of memory cells 140, the fifth block of memory cells 150, and the first block of memory cells 110 via lines 1274, 1275 and 1271a respectively, for executing in-place convolution of a function of a filter array over the input array stored in the fifth block of memory cells to generate an array of convolved values. Sensing circuitry 160 is coupled to the first block of memory cells and the second block of memory cells via lines 1271b and 1272 respectively, for comparing electrical differences between the memory cells in the first block and the memory cells in the second block to generate an array of output values. The third block of memory cells 130 is coupled to the sensing circuitry 160 via lines 1273, for storing the array of output values in the third block of memory cells.
  • In one embodiment, the first block of memory cells 110, the second block of memory cells 120, the third block of memory cells 130, the fourth block of memory cells 140, and the fifth block of memory cells 150 can be configured in separate blocks of cells. The first block address generator 1251, the feature array address generator 1252, the output array address generator 1253, the filter address generator 1254, and the fifth block address generator 1255 can be separate address generators, including respective row decoders for word lines and column decoders for bit lines. In an alternative embodiment, the first block of memory cells 110, the second block of memory cells 120, the third block of memory cells 130, the fourth block of memory cells 140, and the fifth block of memory cells 150 can be configured in a common block of cells. In this embodiment, the first, second and third arrays of cells can share word lines coupled to a common row decoder, and have respective column decoders for bit lines coupled to respective arrays of cells.
  • Data is supplied via the data-in line 1295 from input/output ports on the integrated circuit 1200 or from other data sources internal or external to the integrated circuit 1200, to the first block of memory cells 110, the second block of memory cells 120, the third block of memory cells 130, the fourth block of memory cells 140, and the fifth block of memory cells 150. Data supplied via the data-in line 1295 can include an input array to be stored in the first block of memory cells 110 or the fifth block of memory cells 150, a filter array to be stored in the fourth block of memory cells 140, and a feature array to be stored in the second block of memory cells 120. In the illustrated embodiment, other circuitry 1290 is included on the integrated circuit, such as a general purpose processor or special purpose application circuitry, or a combination of modules providing system-on-a-chip functionality supported by the memory array. Data is supplied via the data-out line 1285 from the sensing circuitry 160 to input/output ports on the integrated circuit 1200, or to other data destinations internal or external to the integrated circuit 1200. Data supplied via the data-out line 1285 can include the array of output values stored in the third block of memory cells 130.
  • Convolution circuitry 180 can execute in-place convolution of a function of the filter array over the input array stored in the fifth block of memory cells to generate an array of convolved values. Writing circuitry 170 operatively coupled to the third block 130 can change an analog level of a cell in the output array. Writing circuitry 190 operatively coupled to the first block 110 can change an analog level of a cell in the first block 110. In one embodiment, writing circuitry 170 and writing circuitry 190 can be the same writing circuitry. Convolution circuitry 180, writing circuitry 170 and writing circuitry 190, implemented in this example using a bias arrangement state machine, control the application of bias arrangement supply voltages generated or provided through the voltage supply or supplies in block 1220, such as read, program and erase voltages.
  • Convolution circuitry 180 and writing circuitry 170 can be implemented using special-purpose logic circuitry as known in the art. In alternative embodiments, convolution circuitry 180 and writing circuitry 170 can comprise a general-purpose processor, which can be implemented on the same integrated circuit to control the operations of the device. In yet other embodiments, a combination of special-purpose logic circuitry and a general-purpose processor can be utilized for implementation of convolution circuitry 180 and writing circuitry 170.
  • While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims. What is claimed is:

Claims (23)

1. A device, comprising:
a first block of memory cells;
a second block of memory cells to store a feature array;
a third block of memory cells to store an array of output values at analog levels;
sensing circuitry coupled to the first block of memory cells and the second block of memory cells to compare electrical differences between the memory cells in the first block and the memory cells in the second block to generate the array of output values; and
writing circuitry operatively coupled to the third block to store the array of output values in the third block of memory cells.
2. The device of claim 1, wherein
for a set of frames of cells in the first block, the sensing circuitry is configured to compare electrical differences between the feature array with each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
3. The device of claim 2, including address generation circuits that apply addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
4. The device of claim 1, wherein the first block is configured to store an input array.
5. The device of claim 1, comprising:
a fourth block of memory cells to store a filter array;
a fifth block of memory cells to store an input array;
convolution circuitry operatively coupled to the fourth block of memory cells and the fifth block of memory cells to execute in-place convolution of a function of the filter array over the input array to generate an array of convolved values; and
writing circuitry operatively coupled to the first block of memory cells to store the array of convolved values in the first block.
6. The device of claim 5, wherein
the input array and the filter array include digital values, and the convolution circuitry receives the digital values as inputs to the function; and
for a set of frames of cells in the input array, the function convolves the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array.
7. The device of claim 6, including address generation circuits that apply addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
8. The device of claim 1, wherein the writing circuitry operatively coupled to the third block is configured to store an analog level in each cell of the third block for the array of output values.
9. The device of claim 1, wherein the writing circuitry applies a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values.
10. The device of claim 1, wherein the writing circuitry applies a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values.
11. The device of claim 1, wherein the writing circuitry applies a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values.
12. The device of claim 1, wherein the first, second and third blocks of memory cells are implemented on a single integrated circuit or multichip module under one package.
13. A method of operating a device comprising a first block of memory cells, a second block of memory cells to store a feature array, and a third block of memory cells to store an array of output values at analog levels, the method comprising:
comparing electrical differences between memory cells in the first block and the memory cells in the second block to generate the array of output values; and
storing the array of output values in the third block of memory cells.
14. The method of claim 13, comprising:
for a set of frames of cells in the first block, comparing electrical differences between the feature array with each frame in the set of frames to generate the array of output values, where each value in the array of output values corresponds to a frame in the set of frames, and indicates electrical differences between analog values from its corresponding frame and analog values from the feature array.
15. The method of claim 14, comprising:
applying addresses for the set of frames and the feature array to the first block and the second block in coordination with the sensing circuitry comparing the electrical differences.
16. The method of claim 13, comprising:
storing an input array in the first block of memory cells.
17. The method of claim 13, wherein the device comprises a fourth block of memory cells to store a filter array and a fifth block of memory cells to store an input array, the method comprising:
executing in-place convolution of a function of the filter array over the input array to generate an array of convolved values; and
storing the array of convolved values in the first block.
18. The method of claim 17, wherein the input array and the filter array include digital values, the method comprising:
receiving the digital values as inputs to the function; and
for a set of frames of cells in the input array, convolving the filter array with each frame in the set of frames to generate the array of convolved values, where each value in the array of convolved values corresponds to a frame in the set of frames, and indicates a number of digital values from its corresponding frame that matches corresponding digital values from the filter array.
19. The method of claim 18, comprising:
applying addresses for the set of frames in the input array and the filter array to the fifth block and the fourth block in coordination with the in-place convolution.
20. The method of claim 13, comprising:
storing an analog level in each cell of the third block for the array of output values.
21. The method of claim 13, comprising:
applying a sequence of write pulses for each cell in the third block having a number of write pulses determined according to a corresponding output value in the array of output values.
22. The method of claim 13, comprising:
applying a sequence of write pulses for each cell in the third block having a pulse duration determined according to a corresponding output value in the array of output values.
23. The method of claim 13, comprising:
applying a sequence of write pulses for each cell in the third block having a tail length of a write pulse determined according to a corresponding output value in the array of output values.
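
As an informal illustration only, and not part of the claims, the digital in-place convolution recited in claims 6 and 18 and the pulse-count write scheme recited in claims 9 and 21 can be sketched in software. The sketch below is a minimal model under stated assumptions: the function names `match_count_convolve` and `write_pulse_count`, the `stride` and `pulses_per_unit` parameters, and the linear mapping from output value to pulse count are all hypothetical and do not appear in the patent. The patent describes circuitry operating on memory cells, which this Python model does not attempt to reproduce.

```python
from typing import List

Array2D = List[List[int]]


def match_count_convolve(input_array: Array2D,
                         filter_array: Array2D,
                         stride: int = 1) -> Array2D:
    """Slide filter_array over input_array, one frame at a time.

    Each output value is the number of digital values in the frame that
    equal the corresponding digital values in the filter array.
    The stride parameter is an assumption made for this sketch.
    """
    in_rows, in_cols = len(input_array), len(input_array[0])
    f_rows, f_cols = len(filter_array), len(filter_array[0])
    if f_rows > in_rows or f_cols > in_cols:
        raise ValueError("filter array must fit inside the input array")
    if stride < 1:
        raise ValueError("stride must be at least 1")

    convolved: Array2D = []
    for top in range(0, in_rows - f_rows + 1, stride):
        row_out: List[int] = []
        for left in range(0, in_cols - f_cols + 1, stride):
            # Count the cells of this frame that match the filter.
            matches = sum(
                1
                for r in range(f_rows)
                for c in range(f_cols)
                if input_array[top + r][left + c] == filter_array[r][c]
            )
            row_out.append(matches)
        convolved.append(row_out)
    return convolved


def write_pulse_count(output_value: int, pulses_per_unit: int = 1) -> int:
    """Hypothetical linear rule: more pulses for a larger output value.

    The claims only state that the number of write pulses is determined
    according to the output value; the linear form is an assumption.
    """
    if output_value < 0:
        raise ValueError("output value must be non-negative")
    return output_value * pulses_per_unit
```

Under these assumptions, convolving the 2x2 filter `[[1, 0], [0, 1]]` over the 3x3 input `[[1, 0, 1], [0, 1, 0], [1, 0, 1]]` gives `[[4, 0], [0, 4]]`: the top-left and bottom-right frames match the filter in all four cells, and the other two frames match in none. Each value could then be passed to `write_pulse_count` to pick a pulse count for the corresponding cell.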
US16/205,743 2018-11-30 2018-11-30 In-memory convolution for machine learning Active US10672469B1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US16/205,743 US10672469B1 (en) 2018-11-30 2018-11-30 In-memory convolution for machine learning
TW108119229A TWI696189B (en) 2018-11-30 2019-06-03 Memory device
CN201910488755.3A CN111261210B (en) 2018-11-30 2019-06-05 Memory device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US16/205,743 US10672469B1 (en) 2018-11-30 2018-11-30 In-memory convolution for machine learning

Publications (2)

Publication Number Publication Date
US10672469B1 US10672469B1 (en) 2020-06-02
US20200176056A1 US20200176056A1 (en) 2020-06-04

Family

ID=70848779

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/205,743 Active US10672469B1 (en) 2018-11-30 2018-11-30 In-memory convolution for machine learning

Country Status (3)

Country Link
US (1) US10672469B1 (en)
CN (1) CN111261210B (en)
TW (1) TWI696189B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5301162A (en) * 1992-03-26 1994-04-05 Nec Corporation Semiconductor random access memory device having shared sense amplifiers serving as a cache memory
US8386884B2 (en) * 2009-07-14 2013-02-26 Macronix International Co., Ltd. Memory apparatus with multi-level cells and operation method thereof
US9244767B1 (en) * 2014-07-07 2016-01-26 Sandisk Technologies Inc. Data storage device with in-memory parity circuitry
US20180025777A1 (en) * 2016-07-19 2018-01-25 Sandisk Technologies Llc High-reliability memory read technique

Family Cites Families (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE2619663C3 (en) 1976-05-04 1982-07-22 Siemens AG, 1000 Berlin und 8000 München Field effect transistor, method of its operation and use as a high-speed switch and in an integrated circuit
US4987090A (en) 1987-07-02 1991-01-22 Integrated Device Technology, Inc. Static ram cell with trench pull-down transistors and buried-layer ground plate
JP3073645B2 (en) 1993-12-27 2000-08-07 株式会社東芝 Nonvolatile semiconductor memory device and method of operating the same
US6107882A (en) 1997-12-11 2000-08-22 Lucent Technologies Inc. Amplifier having improved common mode voltage range
US6960499B2 (en) 1998-02-24 2005-11-01 Texas Instruments Incorporated Dual-counterdoped channel field effect transistor and method
US6313486B1 (en) 2000-06-15 2001-11-06 Board Of Regents, The University Of Texas System Floating gate transistor having buried strained silicon germanium channel layer
US6829598B2 (en) 2000-10-02 2004-12-07 Texas Instruments Incorporated Method and apparatus for modeling a neural synapse function by utilizing a single conventional MOSFET
US6703661B2 (en) 2001-12-27 2004-03-09 Ching-Yuan Wu Contactless NOR-type memory array and its fabrication methods
US7019998B2 (en) 2003-09-09 2006-03-28 Silicon Storage Technology, Inc. Unified multilevel cell memory
JP4620943B2 (en) 2003-10-16 2011-01-26 キヤノン株式会社 Product-sum operation circuit and method thereof
US7057216B2 (en) 2003-10-31 2006-06-06 International Business Machines Corporation High mobility heterojunction complementary field effect transistors and methods thereof
US6906940B1 (en) 2004-02-12 2005-06-14 Macronix International Co., Ltd. Plane decoding method and device for three dimensional memories
US20050287793A1 (en) 2004-06-29 2005-12-29 Micron Technology, Inc. Diffusion barrier process for routing polysilicon contacts to a metallization layer
US8058636B2 (en) 2007-03-29 2011-11-15 Panasonic Corporation Variable resistance nonvolatile memory apparatus
US8155428B2 (en) * 2007-09-07 2012-04-10 Kla-Tencor Corporation Memory cell and page break inspection
CN101256499B (en) * 2008-04-02 2010-09-08 凌阳科技股份有限公司 Method and system for downloading object file
US8860124B2 (en) 2009-01-15 2014-10-14 Macronix International Co., Ltd. Depletion-mode charge-trapping flash device
JP5462490B2 (en) 2009-01-19 2014-04-02 株式会社日立製作所 Semiconductor memory device
JP5317742B2 (en) 2009-02-06 2013-10-16 株式会社東芝 Semiconductor device
US8203187B2 (en) 2009-03-03 2012-06-19 Macronix International Co., Ltd. 3D memory array arranged for FN tunneling program and erase
JP2011065693A (en) 2009-09-16 2011-03-31 Toshiba Corp Non-volatile semiconductor memory device
JP5390337B2 (en) * 2009-10-26 2014-01-15 株式会社東芝 Semiconductor memory device
US8275728B2 (en) 2009-11-05 2012-09-25 The United States Of America As Represented By The Secretary Of The Air Force Neuromorphic computer
US8311965B2 (en) 2009-11-18 2012-11-13 International Business Machines Corporation Area efficient neuromorphic circuits using field effect transistors (FET) and variable resistance material
SG10201700467UA (en) 2010-02-07 2017-02-27 Zeno Semiconductor Inc Semiconductor memory device having electrically floating body transistor, and having both volatile and non-volatile functionality and method
US8331127B2 (en) 2010-05-24 2012-12-11 Macronix International Co., Ltd. Nonvolatile memory device having a transistor connected in parallel with a resistance switching device
US9342780B2 (en) 2010-07-30 2016-05-17 Hewlett Packard Enterprise Development Lp Systems and methods for modeling binary synapses
US20120044742A1 (en) 2010-08-20 2012-02-23 Micron Technology, Inc. Variable resistance memory array architecture
US8432719B2 (en) 2011-01-18 2013-04-30 Macronix International Co., Ltd. Three-dimensional stacked and-type flash memory structure and methods of manufacturing and operating the same hydride
US8630114B2 (en) 2011-01-19 2014-01-14 Macronix International Co., Ltd. Memory architecture of 3D NOR array
US8750042B2 (en) 2011-07-28 2014-06-10 Sandisk Technologies Inc. Combined simultaneous sensing of multiple wordlines in a post-write read (PWR) and detection of NAND failures
JP5722180B2 (en) 2011-09-26 2015-05-20 株式会社日立製作所 Nonvolatile memory device
US9698185B2 (en) 2011-10-13 2017-07-04 Omnivision Technologies, Inc. Partial buried channel transfer device for image sensors
US8981445B2 (en) 2012-02-28 2015-03-17 Texas Instruments Incorporated Analog floating-gate memory with N-channel and P-channel MOS transistors
JP5998521B2 (en) 2012-02-28 2016-09-28 セイコーエプソン株式会社 Nonvolatile semiconductor memory and method for manufacturing nonvolatile semiconductor memory
US9019771B2 (en) 2012-10-26 2015-04-28 Macronix International Co., Ltd. Dielectric charge trapping memory cells with redundancy
KR20140113024A (en) 2013-03-15 2014-09-24 에스케이하이닉스 주식회사 Resistance variable Memory Device And Method of Driving The Same
KR102179899B1 (en) 2013-08-05 2020-11-18 삼성전자주식회사 Neuromophic system and configuration method thereof
US9698156B2 (en) 2015-03-03 2017-07-04 Macronix International Co., Ltd. Vertical thin-channel memory
US9431099B2 (en) 2014-11-11 2016-08-30 Snu R&Db Foundation Neuromorphic device with excitatory and inhibitory functionalities
KR20160073847A (en) 2014-12-17 2016-06-27 에스케이하이닉스 주식회사 Electronic device and method for fabricating the same
CN105989089A (en) * 2015-02-12 2016-10-05 阿里巴巴集团控股有限公司 Data comparison method and device
US9524980B2 (en) 2015-03-03 2016-12-20 Macronix International Co., Ltd. U-shaped vertical thin-channel memory
KR20160122531A (en) 2015-04-14 2016-10-24 에스케이하이닉스 주식회사 Electronic device
US9934463B2 (en) 2015-05-15 2018-04-03 Arizona Board Of Regents On Behalf Of Arizona State University Neuromorphic computational system(s) using resistive synaptic devices
US9589982B1 (en) 2015-09-15 2017-03-07 Macronix International Co., Ltd. Structure and method of operation for improved gate capacity for 3D NOR flash memory
US9892800B2 (en) 2015-09-30 2018-02-13 Sunrise Memory Corporation Multi-gate NOR flash thin-film transistor strings arranged in stacked horizontal active strips with vertical control gates
US9842651B2 (en) 2015-11-25 2017-12-12 Sunrise Memory Corporation Three-dimensional vertical NOR flash thin film transistor strings
KR102084378B1 (en) * 2015-10-23 2020-03-03 가부시키가이샤 한도오따이 에네루기 켄큐쇼 Semiconductor device and electronic device
TWI625729B (en) * 2015-11-25 2018-06-01 旺宏電子股份有限公司 Data allocating method and electric system using the same
WO2017091338A1 (en) 2015-11-25 2017-06-01 Eli Harari Three-dimensional vertical nor flash thin film transistor strings
KR20170065969A (en) 2015-12-04 2017-06-14 에스케이하이닉스 주식회사 Memory device and operation method for the same
WO2017158466A1 (en) 2016-03-18 2017-09-21 Semiconductor Energy Laboratory Co., Ltd. Semiconductor device and system using the same
US20180007302A1 (en) 2016-07-01 2018-01-04 Google Inc. Block Operations For An Image Processor Having A Two-Dimensional Execution Lane Array and A Two-Dimensional Shift Register

Also Published As

Publication number Publication date
CN111261210A (en) 2020-06-09
US10672469B1 (en) 2020-06-02
TW202022865A (en) 2020-06-16
TWI696189B (en) 2020-06-11
CN111261210B (en) 2022-02-22

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4