WO2017171768A1 - Reordering matrices - Google Patents

Reordering matrices

Info

Publication number
WO2017171768A1
Authority
WO
WIPO (PCT)
Prior art keywords
array
matrix
output
reordering
resistive memory
Prior art date
Application number
PCT/US2016/025141
Other languages
French (fr)
Inventor
Naveen Muralimanohar
Ali SHAFIEE ARDESTANI
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp
Priority to PCT/US2016/025141
Publication of WO2017171768A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06G: ANALOGUE COMPUTERS
    • G06G7/00: Devices in which the computing operation is performed by varying electric or magnetic quantities
    • G06G7/12: Arrangements for performing computing operations, e.g. operational amplifiers
    • G06G7/16: Arrangements for performing computing operations, e.g. operational amplifiers for multiplication or division
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C13/00: Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00
    • G11C13/0002: Digital stores characterised by the use of storage elements not covered by groups G11C11/00, G11C23/00, or G11C25/00 using resistive RAM [RRAM] elements
    • G11C13/0021: Auxiliary circuits
    • G11C13/0069: Writing or programming circuits or methods

Definitions

  • Resistive memory devices such as “memristors” have been described in which an electrical component is capable of being written with a resistance in a non-volatile manner. Use of arrays of such devices has been proposed for carrying out logical operations.
  • Figure 1 is a flowchart of an example method for reordering a matrix for use in populating a resistive memory array
  • Figure 2 is a simplified schematic of an example matrix and an example resistive memory array having a representation of the matrix written thereto;
  • Figure 3 is a simplified schematic of an example matrix and an example modified matrix
  • Figure 4 is a simplified schematic of an example resistive memory array having a matrix mapped thereto;
  • Figure 5 is a simplified schematic of an example resistive memory array apparatus
  • Figure 6 is a flowchart of an example method of populating a resistive memory array and employing the array to carry out logical functions
  • Figure 7 is a flowchart of an example method for reordering a matrix
  • Figure 8 is a schematic example of dot product engine apparatus
  • Figure 9 is a schematic example of a processing apparatus.
  • Figure 1 shows an example of a method which may for example be implemented by at least one processor.
  • an original matrix of values representing an operand to be used in processing data with a resistive memory array is received.
  • the operand may comprise, for example, a processing kernel which is designed to carry out a particular processing task.
  • the array comprises a plurality of resistive memory elements arranged in wordlines and bitlines, each wordline comprising an input, and wherein the values of a matrix to be written to an array are arranged in lines to be represented as a resistance of a resistive memory element of a bitline of the resistive memory array.
  • the lines of the original matrix to be mapped to the bitlines are reordered to form a modified matrix, the modified matrix being such that, compared to the original matrix, and when mapped to the array, the distribution of lower resistive elements is adjusted towards the inputs of the wordlines of the array.
  • a resistive memory array comprises a two-dimensional grid of resistive memory elements, which may be a crossbar array. Each element lies in a wordline (by convention, these are usually shown as the rows) and a bitline (by convention, these are usually shown as the columns). Writing to the array comprises setting the resistive value of a resistive memory element.
  • the elements may be a binary bit having one of two values for example, representing 0 or 1 , however resistive memory elements which can take a plurality of values, for example 32 distinct levels (which can represent 5 bits), have been demonstrated.
  • a crossbar array of memristors or other resistive memory elements can process an input voltage vector (for example, comprising a one dimensional array of input voltage values, each of which is applied to a wordline) to provide an output vector comprising or derived from a set of voltages output from the bitlines and in which the input values are weighted by the memristor conductance of the elements of the array.
  • the weights of the elements can be 'programmed' by subjecting the elements to voltage pulses, each voltage pulse incrementally changing the resistance of that element.
  • analogue data may be supplied for processing using a resistive memory array.
  • the data may for example represent at least one pixel of an image, or a word (or sub-word or phrase) of speech, data output from a scientific experiment, or any other data on which a logical operation is to be performed.
  • the input data may be provided as a vector, i.e. a one dimensional data string.
  • the vector of values may for example be processed to provide voltages, for example using a digital to analogue conversion.
  • resistive memory crossbar arrays used in such a manner can suffer from sneak currents. For example, sneak currents may be seen when current passes through resistive elements which are intended to be in an Off or nonconductive state. Sneak currents can negatively impact accuracy and energy efficiency (for example, energy consumption) of an array.
  • Figure 2 shows an example of a matrix and a demonstration of how this matrix 202 is mapped to, or represented by, a resistive memory array 204.
  • the array 204 comprises a plurality of wordlines 206a-e and a plurality of bitlines 208a-e. Each of the wordlines 206a-e comprises an input 210a-e.
  • the array 204 is made up of resistive memory elements 212, 214 (only two of which are labelled to avoid complicating the figures). In this example, there are two states for a resistive memory element: low (shown as white squares, such as element 212) and high (shown as black squares, such as element 214).
  • the 0's of the matrix 202 are represented as high resistance (low conductance) black elements 214, whereas the 1's are represented as low resistance (high conductance) white elements 212.
  • Each element 212, 214 could take either value of 0 or 1.
  • Figure 2 shows the array 204 in a particular state, in which the matrix 202 has been mapped thereto.
  • the columns of the matrix 202 are mapped to the columns (bitlines 208) of the array 204, but in other examples, the rows may be mapped to the bitlines 208. Therefore, in this example, the columns are reordered but if the rows were to be mapped to bitlines, then the rows would be reordered.
  • Figure 3 shows an example in which columns of the matrix 202 have been reordered such that the distribution of 1's is moved to the left in the modified matrix 202'.
  • the columns have been identified by letters A-E to show how they have been reordered.
  • the distribution of low resistance elements 212 will also be shifted toward the left, i.e. towards the inputs 210a-e.
  • Figure 4 shows an example of an array 400 in a state in which the modified matrix 202' has been mapped thereto, such that it represents the modified matrix 202'.
  • an output operator 402 is also shown.
  • the output operator 402 reorders output voltages to restore the original matrix line order as will be further discussed below. This reordering restores the processing effect of the original matrix, but does so using an array 400 in which the distribution of the low resistance elements has been shifted towards the inputs of the word lines 206a-e.
  • the apparatus of Figure 4 may provide a dot product engine (DPE) which is capable of carrying out logical operations.
  • a DPE may comprise a processing unit of a processing apparatus which may carry out computational tasks such as speech or image recognition, machine learning or any other data processing task.
  • FIG. 5 shows an example 3x3 array of resistive memory elements 502a-i arranged in a crossbar array.
  • Sneak currents may be seen when current passes through high resistance elements 502.
  • the element 502a closest to the input of Vin1 is a high resistance element. If the resistance was infinite, all of the current would continue along the top wordline. However, as the resistance of 502a is finite, an amount of sneak current passes through the element 502a and contributes to Iout1 (a voltage value taken across a resistor, not shown, may then be derived from this current).
  • a lower IR drop may allow a drop in a voltage driver size (for example, the transistors in which a voltage driver is implemented may be smaller, leading to a reduction in the overall size of the voltage driver) and a reduction in energy consumed.
  • the method set out above addresses IR drop in a simple manner, and may therefore mean that a mapping of reduced complexity could be used as, through use of a modified matrix, the detrimental effects of parasitic capacitance, noise, non-linearity, IR drop and the like may be reduced.
  • Figure 6 is an example of a method, which may in some examples follow block 102 and 104 described above in relation to Figure 1.
  • In block 602, a modified matrix is mapped to a resistive memory array.
  • the process of mapping may comprise, for example, accounting for (remaining) array parasitic currents and non-linearity, data patterns and the location of a cell in an array or determining an offset or a scaling factor to be applied. Therefore, while the array may be written to represent a matrix, the voltage values may not be directly proportional to the values of the matrix.
  • an input vector is applied to the array.
  • the input vector comprises a plurality of voltage values, and one voltage value is applied to each wordline.
  • output voltage values are obtained.
  • these output voltage values are reordered to have an order corresponding to the arrangement of the original matrix. It may be noted that, at the time of reordering, the voltage values may be represented as digital values, for example following an analogue to digital conversion of the voltage output to provide a digital representation of the voltages.
  • these reordered output voltage values provide an output vector.
  • the output vector comprises a plurality of voltage values output by the array. The output values comprise the input values weighted by the resistances of the resistive memory elements.
  • the output values represent the dot product of the input vector and the original matrix
  • the original matrix which may for example represent an operand defined to carry out a particular process, such as image sharpening, matrix- by-vector multiplication, or operations as part of a fully connected neural network, a convolutional neural network, or the like
  • Figure 7 shows an example of method, which may be a computer implemented method, which provides one example of a method of carrying out block 104.
  • reordering of the lines of the matrix comprises, in block 702, determining a total resistance value for each line of the matrix to be represented by the bitlines of the resistive memory array.
  • the line having the lowest total resistance value not yet assigned a position is identified.
  • Block 706 comprises reordering the matrix such that this line is to be mapped to the bitline closest to the input of the wordlines of the array. In this manner, the line associated with the lowest resistance value will be assigned to a position corresponding to the inputs to the wordlines (i.e., in the example of Figure 4, the left-most bitline).
  • This process may be iterated through the array to result in an ordering of the matrix such that the total resistance to represent the values of the bitlines of the array increases with distance from the inputs of the wordlines of the array.
  • Blocks 704 and 706 may be carried out until all the lines have an assigned position.
  • the method continues by, in block 708, determining a first parameter (or parameter value) of an array written to represent the modified matrix
  • such a parameter may comprise a modelled or estimated average voltage drop along the wordlines of the array, an average noise associated with the outputs of the array, or some other parameter related to the energy efficiency of the array.
  • the parameter may for example be modelled based on a model of a resistive memory array representing the matrix, for example, a computer simulation of such a model.
  • In block 710, a pair of lines in the modified matrix is selected (i.e. reordered lines, or lines in the reordered matrix).
  • the selected lines may be neighboring lines or may be spaced within the matrix.
  • the lines may for example be selected at random.
  • Block 712 comprises determining a second parameter (or parameter value), this parameter being modelled or estimated for an array to be written with the position of the selected lines exchanged.
  • it is determined whether the parameter is improved by the exchange (for example, if the parameter is an average voltage drop, whether the second average voltage drop is lower than the first average voltage drop), for example resulting in improved energy efficiency (e.g.
  • If so, the position of the selected lines in the modified matrix may be exchanged (block 716) and the second parameter may be treated as the first parameter. If not, the position of the lines may be maintained. Blocks 710 to 716 may be carried out repeatedly, each time with reference to the lower of the first and second voltage drops of the previous iteration, and selecting a different pair of lines.
  • Both the modified matrices may be considered as candidate modified matrices from which a selection is made based on a parameter.
  • a plurality of matrices may be defined as candidate modified matrices by reordering the lines of the original matrix and at least one parameter of an array written to represent each of the plurality of candidate modified matrices may be modelled. These parameters may be compared as described in relation to block 714 above, and a candidate modified matrix may be selected as the modified matrix based on the comparison.
  • the different positions of the lines to be mapped to bitlines could be compared in terms of the voltage drop across word lines, and the configuration of lines having the lowest voltage drop identified and adopted as the configuration for the modified matrix. All of the positions could for example be determined when the number of possible configurations is relatively small (the term 'small' in this context being dependent on the processing resources available, and it may be that millions of combinations could be considered a relatively small number).
  • the degree to which an output of a matrix can be 'de- shuffled' may be constrained, for example by use of a particular hardware.
  • all combinations which may be 'de-shuffled' by a particular hardware may be considered and compared.
  • FIG 8 is an example of a dot product engine (DPE) 800.
  • the DPE 800 comprises a resistive memory array 802 and an output operator 804.
  • the array 802 in this example, an 8x8 array 802, is an array of resistive memory elements comprising a plurality of wordlines and a plurality of bitlines (for example as illustrated in relation to Figures 2, 4 or 5).
  • the elements of the array 802 each comprise a resistance, and the array is to receive an input voltage on each of at least one wordline and is to output an output voltage from each of a plurality of bitlines.
  • the output operator 804 generates an output vector, the output vector comprising reordered output voltage values such that the order of values in the output vector is different to the bitline order of the voltage values.
  • the output operator 804 comprises a plurality of sets 806 of switches 808 (only some of which are labelled to avoid overcomplicating the Figure). Each set 806 comprises a switch 808 associated with the output of each bitline, and the order of the values may be changed by each set 806 of switches 808.
  • a switch 808 may be a 'straight through' switch in a first state, in which case the placement of the value received thereby in an order is unchanged, and a 'transfer' switch in the second state, in which case the value is moved to a different position in the order.
  • the output operator 804 may comprise log(N) sets of switches, each comprising N/2 switches, and each switch is reconfigurable between a first state, in which it is to change the position of a value within the order, and a second state, in which it is to maintain the position of a value within an order. This allows for any reconfiguration of the output, and therefore also allows for any reconfiguration of the columns of an original matrix to a modified matrix.
  • the output operator 804 may comprise M sets of switches, each set of switches comprising N/2 switches.
  • the switches 808 may be controlled as part of the corresponding set 806 such that a set of switches may have a first configuration, in which each of the switches 808 has one of the first and second state, and a second configuration, in which each of the switches 808 has the other of the first and second state than the state of that switch in the first configuration.
  • if a switch 808 is a 'straight through' switch in a first configuration, it will be a 'transfer' switch in the second configuration, and the whole set may be switched by a single signal from one configuration to another.
  • the two configurations are the inverse of each other.
  • the options for 'de-shuffling' may be constrained, for example by use of a particular hardware.
  • the potential configurations of a modified array may be selected by toggling between switch configurations. This can lead to a set of possible modified matrices which, in some examples, may be compared, for example based on a modelled parameter of an array written to represent the matrix. In some examples, this may comprise the basis for selecting one of the possible modified matrices (for example, a modified matrix associated with a favourable parameter value) to be represented by an array.
  • the DPE 800 further comprises a voltage driver 810, which can be used both to write resistances to the resistive elements using a relatively high voltage and to apply an input voltage vector, and a bank of resistors 812, from which output voltage values are measured.
  • An analogue to digital converter ADC 814 is also provided.
  • an output operator may comprise a lookup table or the like, which re-arranges an initial output vector to form a final output vector. In such examples, the reordering of the original matrix may be stored in a memory and used to reorder the output of the array 802.
  • FIG. 9 shows an example of a processing apparatus 900.
  • the apparatus 900 comprises at least one DPE 902 (in this example, comprising a plurality of DPEs 902), a processor 904 to control the DPEs 902 and a memory 906 holding instructions which, when executed, cause the processor to carry out certain processes.
  • the instructions may be to cause the processor 904 to carry out at least some of the blocks of Figure 1 , 6 or 7.
  • the processor 904 may also control components of the DPE 902, for example, voltage driver(s) and/or output operator(s) therein.
  • voltage driver(s) and/or output operators may for example be controlled to write a resistive memory array to represent a matrix, and may apply voltages to wordlines within that array.
  • the voltage drivers may be controlled to carry out at least some of the blocks of
  • the switching state of output operators may be controlled by the processor 904.
  • the memory 906 may also store matrices, input voltage vectors and/or output voltage vectors.
  • the DPEs 902 each comprise a matrix reordering module 908, which may in some examples comprise a processor.
  • the matrix reordering module 908 carries out, for example by executing machine readable instructions stored in the memory 906, a row-wise or column-wise reordering of an original matrix, wherein the reordering is to adjust a distribution of lower resistance elements to be closer to an input end of an array of resistive elements of a DPE 902.
  • the matrix may be reordered into one of 2^M configurations. In this example, the reordering is therefore carried out using a matrix reordering module 908 associated with a particular DPE 902.
  • the processor 904 may carry out the reordering (for example, the matrix reordering module 908 may comprise a component of the processor 904).
  • one matrix reordering module 908 may be a resource which is shared by more than one DPE 902. This may assist in achieving a relatively small size for the processing apparatus 900 (in particular if the processing apparatus 900 comprises more than one DPE 902).
  • An output reordering module may also be provided in some examples, for example comprising a processor (which may in some examples be the same processor as provides the matrix reordering module 908 and/or the processor 904). Such an output reordering module may for example carry out the actions described in relation to the output operator 804 above. Such an output reordering module may be provided as part of the DPE 902 or within the processor 904.
  • Examples in the present disclosure can be provided as methods, systems or machine readable instructions, such as any combination of software, hardware, firmware or the like.
  • Such machine readable instructions may be included on a computer readable storage medium (including but not limited to disc storage, CD-ROM, optical storage, etc.) having computer readable program codes therein or thereon.
  • the machine readable instructions may, for example, be executed by a general purpose computer, a special purpose computer, an embedded processor or processors of other programmable data processing devices to realize the functions described in the description and diagrams (for example, the processor 904).
  • a processor or processing apparatus may execute the machine readable instructions.
  • functional modules of the apparatus and devices may be implemented by a processor executing machine readable instructions stored in a memory, or a processor operating in accordance with instructions embedded in logic circuitry.
  • the term 'processor' is to be interpreted broadly to include a CPU, processing unit,
  • Such machine readable instructions may also be stored in a computer readable storage (for example, the memory 906) that can guide the computer or other programmable data processing devices to operate in a specific mode.
  • Such machine readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or other programmable data processing devices perform a series of operations to produce computer-implemented processing, thus the instructions executed on the computer or other programmable devices realize functions specified by flow(s) in the flow charts and/or block(s) in the block diagrams.
  • teachings herein may be implemented in the form of a computer software product, the computer software product being stored in a storage medium and comprising a plurality of instructions for making a computer device implement the methods recited in the examples of the present disclosure.
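
The reordering flow summarised for Figure 7 in the list above can be sketched as follows. This is only an illustration under stated assumptions: each line is given as a list of element resistances, and `proxy_cost` is a hypothetical stand-in for the modelled parameter of block 708 (the disclosure suggests, for example, a modelled average voltage drop, which would need a circuit model). All function names are invented for this sketch.

```python
import random


def order_by_total_resistance(lines):
    """Blocks 702 to 706: place lines with lower total resistance first.

    `lines` is a list of lines, each a list of element resistances.
    Position 0 is the bitline closest to the wordline inputs.
    """
    return sorted(lines, key=sum)


def proxy_cost(lines):
    """Hypothetical stand-in for the modelled parameter of block 708.

    Weights each element's conductance by its distance from the wordline
    inputs; a real flow would use a circuit model instead.
    """
    return sum(position / resistance
               for position, line in enumerate(lines)
               for resistance in line)


def refine_by_swaps(lines, cost, attempts=100, seed=0):
    """Blocks 710 to 716: keep a random pairwise exchange only if it helps."""
    rng = random.Random(seed)
    best = list(lines)
    best_cost = cost(best)
    for _ in range(attempts):
        i, j = rng.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best


# Three lines of three elements; resistances chosen only for illustration.
lines = [[100, 100, 1], [1, 1, 1], [100, 1, 1]]
ordered = order_by_total_resistance(lines)
refined = refine_by_swaps(ordered, proxy_cost)
print(ordered)
print(proxy_cost(lines), proxy_cost(refined))
```

Because an exchange is kept only when the cost falls, the refinement can never make the chosen parameter worse than the sorted starting point; whether it finds the best ordering depends on the model and on the number of attempts.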

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Power Engineering (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Semiconductor Memories (AREA)

Abstract

In an example, a method includes receiving, by at least one processor, an original matrix of values representing an operand to be used in processing data with a resistive memory array. The resistive memory array may include a plurality of resistive memory elements arranged in wordlines and bitlines, each wordline including an input. The values of the matrix may be arranged in lines to be represented as a resistance of a resistive memory element of a bitline of the resistive memory array. The method may include reordering, by the at least one processor, the lines of the original matrix to form a modified matrix. Compared to the original matrix, the reordered matrix may be such that, when the reordered lines are mapped to the bitlines of an array, the distribution of lower resistive elements is adjusted to be towards the inputs of the wordlines of the array.

Description

REORDERING MATRICES
BACKGROUND
[0001] Resistive memory devices, such as "memristors", have been described in which an electrical component is capable of being written with a resistance in a non-volatile manner. Use of arrays of such devices has been proposed for carrying out logical operations.
BRIEF DESCRIPTION OF DRAWINGS
[0002] Non-limiting examples will now be described with reference to the accompanying drawings, in which:
[0003] Figure 1 is a flowchart of an example method for reordering a matrix for use in populating a resistive memory array;
[0004] Figure 2 is a simplified schematic of an example matrix and an example resistive memory array having a representation of the matrix written thereto;
[0005] Figure 3 is a simplified schematic of an example matrix and an example modified matrix;
[0006] Figure 4 is a simplified schematic of an example resistive memory array having a matrix mapped thereto;
[0007] Figure 5 is a simplified schematic of an example resistive memory array apparatus;
[0008] Figure 6 is a flowchart of an example method of populating a resistive memory array and employing the array to carry out logical functions;
[0009] Figure 7 is a flowchart of an example method for reordering a matrix;
[0010] Figure 8 is a schematic example of dot product engine apparatus; and
[0011] Figure 9 is a schematic example of a processing apparatus.
DETAILED DESCRIPTION
[0012] Figure 1 shows an example of a method which may for example be implemented by at least one processor. In block 102, an original matrix of values representing an operand to be used in processing data with a resistive memory array is received. The operand may comprise, for example, a processing kernel which is designed to carry out a particular processing task.
In this example, the array comprises a plurality of resistive memory elements arranged in wordlines and bitlines, each wordline comprising an input, and wherein the values of a matrix to be written to an array are arranged in lines to be represented as a resistance of a resistive memory element of a bitline of the resistive memory array.
[0013] In block 104, the lines of the original matrix to be mapped to the bitlines are reordered to form a modified matrix, the modified matrix being such that, compared to the original matrix, and when mapped to the array, the distribution of lower resistive elements is adjusted towards the inputs of the wordlines of the array.
[0014] In an example, a resistive memory array comprises a two-dimensional grid of resistive memory elements, which may be a crossbar array. Each element lies in a wordline (by convention, these are usually shown as the rows) and a bitline (by convention, these are usually shown as the columns). Writing to the array comprises setting the resistive value of a resistive memory element. In some examples, the elements may be a binary bit having one of two values, for example representing 0 or 1; however, resistive memory elements which can take a plurality of values, for example 32 distinct levels (which can represent 5 bits), have been demonstrated.
[0015] A crossbar array of memristors or other resistive memory elements can process an input voltage vector (for example, comprising a one dimensional array of input voltage values, each of which is applied to a wordline) to provide an output vector comprising or derived from a set of voltages output from the bitlines and in which the input values are weighted by the memristor conductance of the elements of the array. This effectively means that the array performs a dot product matrix operation on the input to produce an output. The weights of the elements can be 'programmed' by subjecting the elements to voltage pulses, each voltage pulse incrementally changing the resistance of that element.
[0016] In an example, analogue data may be supplied for processing using a resistive memory array. The data may for example represent at least one pixel of an image, or a word (or sub-word or phrase) of speech, data output from a scientific experiment, or any other data on which a logical operation is to be performed. The input data may be provided as a vector, i.e. a one dimensional data string. The vector of values may for example be processed to provide voltages, for example using a digital to analogue conversion.
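
As a minimal sketch of the weighting described in paragraph [0015], assuming an idealised crossbar with no wire resistance and no sneak currents, the current collected on each bitline is the sum of each wordline voltage multiplied by the conductance of the element at that crossing. The function name `bitline_currents` and the example values are illustrative assumptions and do not come from the disclosure.

```python
def bitline_currents(input_voltages, conductances):
    """Idealised crossbar read-out (hypothetical helper).

    conductances[i][j] is the conductance of the element where wordline i
    crosses bitline j. Wire resistance and sneak paths are ignored.
    """
    n_bitlines = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(input_voltages, conductances))
        for j in range(n_bitlines)
    ]


# Two wordlines, two bitlines; values chosen only for illustration.
currents = bitline_currents([1.0, 2.0], [[1.0, 0.0], [0.5, 2.0]])
print(currents)
```

Each output is the dot product of the input vector with one column of the conductance matrix, which is why the array as a whole behaves as a vector-by-matrix multiplier.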
[0017] Unlike some memory arrays such as DRAM, where individual wordlines and at least one bitline is activated for a read or write activity, in a resistive memory array, a significant portion of an array (sometimes the entire array) may be accessed to carry out a computation in a small, in some examples, minimum, number of cycles. Resistive memory crossbar arrays used in such a manner can suffer from sneak currents. For example, sneak currents may be seen when current passes through resistive elements which are intended to be in an Off or nonconductive state. Sneak currents can negatively impact accuracy and energy efficiency (for example, energy consumption) of an array.
[0018] Figure 2 shows an example of a matrix and a demonstration of how this matrix 202 is mapped to, or represented by, a resistive memory array 204. The array 204 comprises a plurality of wordlines 206a-e and a plurality of bitlines 208a-e. Each of the wordlines 206a-e comprises an input 210a-e. The array 204 is made up of resistive memory elements 212, 214 (only two of which are labelled to avoid complicating the figures). In this example, there are two states for a resistive memory element: low (shown as white squares, such as element 212) and high (shown as black squares, such as element 214). It will be noted that the 0's of the matrix 202 are represented as high resistance (low conductance) black elements 214, whereas the 1's are represented as low resistance (high conductance) white elements 212.
[0019] Each element 212, 214 could take either value of 0 or 1; Figure 2 shows the array 204 in a particular state, in which the matrix 202 has been mapped thereto.
[0020] In this example, the columns of the matrix 202 are mapped to the columns (bitlines 208) of the array 204, but in other examples, the rows may be mapped to the bitlines 208. Therefore, in this example, the columns are reordered but if the rows were to be mapped to bitlines, then the rows would be reordered.
[0021] Figure 3 shows an example in which columns of the matrix 202 have been reordered such that the distribution of 1's is moved to the left in the modified matrix 202'. The columns have been identified by letters A-E to show how they have been reordered. When such a matrix 202' is represented in an array, the distribution of low resistance elements 212 will also be shifted toward the left, i.e. towards the inputs 210a-e.
[0022] Figure 4 shows an example of an array 400 in a state in which the modified matrix 202' has been mapped thereto, such that it represents the modified matrix 202'. In this example, an output operator 402 is also shown. The output operator 402 reorders output voltages to restore the original matrix line order as will be further discussed below. This reordering restores the processing effect of the original matrix, but does so using an array 400 in which the distribution of the low resistance elements has been shifted towards the inputs of the word lines 206a-e. In some examples, the apparatus of Figure 4 may provide a dot product engine (DPE) which is capable of carrying out logical operations. In some examples, a DPE may comprise a processing unit of a processing apparatus which may carry out computational tasks such as speech or image recognition, machine learning or any other data processing task.
[0023] The effect of this reordering is now discussed with reference to Figure 5, which shows an example 3x3 array of resistive memory elements 502a-i arranged in a crossbar array. Sneak currents may be seen when current passes through high resistance elements 502. In an example, the element 502a closest to the input of Vin1 is a high resistance element. If the resistance were infinite, all of the current would continue along the top wordline. However, as the resistance of 502a is finite, an amount of sneak current passes through the element 502a and contributes to Iout1 (a voltage value taken across a resistor, not shown, may then be derived from this current). This current is determined according to the relationship V/R, where V is Vin1 and R is the resistance of the element 502a. If however the element 502a is a low resistance element, then the majority of the current passes to Iout1 at the start of the wordline, and the remainder of the wordline has relatively reduced current flowing therethrough. This means the voltage drop (determined according to the relationship V=IR) across the rest of the line will be lower. It will be noted that in the last column, all of the current remaining on a wordline passes through an element 502c, 502f, 502i. As the current is relatively low, the voltage drop associated with this element is also relatively low.
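The reasoning above can be made concrete with a first-order sketch of the IR drop along a single wordline. The model is an assumption made for illustration and is not taken from the disclosure: each element is taken to draw a current V·G as if it saw the full input voltage, each wire segment between neighbouring columns has a fixed resistance `r_wire`, and the drop over a segment is that resistance multiplied by the current still travelling along the wire. The names and values are hypothetical.

```python
def wordline_ir_drop(v_in, conductances, r_wire):
    """First-order voltage drop from the wordline input to its last element.

    conductances[k] is the conductance of the element in column k, with
    column 0 closest to the input. Feedback of the drop onto the element
    currents is ignored (illustrative assumption).
    """
    currents = [v_in * g for g in conductances]
    drop = 0.0
    for k in range(len(currents)):
        # Current still on the wire in segment k feeds columns k, k+1, ...
        segment_current = sum(currents[k:])
        drop += r_wire * segment_current
    return drop


# Two low resistance elements (1e-3 S) and one high resistance element (1e-6 S).
drop_low_r_first = wordline_ir_drop(1.0, [1e-3, 1e-3, 1e-6], r_wire=1.0)
drop_high_r_first = wordline_ir_drop(1.0, [1e-6, 1e-3, 1e-3], r_wire=1.0)
print(drop_low_r_first, drop_high_r_first)
```

Under these assumptions, placing the low resistance elements nearest the input gives the smaller accumulated drop, since most of the current leaves the wordline before traversing the later segments.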
[0024] In an arrangement where the current drops off (for example on average over the array, or by minimizing the maximum drop over each of the wordlines) earlier in a wordline than would be the case were the original matrix used (or correspondingly, the shorter the high current path along the wordline), the lower the IR drop is over the wordline as a whole. This in turn can change the effective voltage drop in the last column cells (i.e., the farthest column from the voltage source). A lower IR drop may allow a drop in a voltage driver size (for example, the transistors in which a voltage driver is implemented may be smaller, leading to a reduction in the overall size of the voltage driver) and a reduction in energy consumed.
[0025] The method set out above addresses IR drop in a simple manner, and may therefore mean that a mapping of reduced complexity could be used as, through use of a modified matrix, the detrimental effects of parasitic capacitance, noise, non-linearity, IR drop and the like may be reduced.
[0026] Figure 6 is an example of a method, which may in some examples follow blocks 102 and 104 described above in relation to Figure 1. In block 602, a modified matrix is mapped to a resistive memory array. It may be noted that the process of mapping may comprise, for example, accounting for (remaining) array parasitic currents and non-linearity, data patterns and the location of a cell in an array or determining an offset or a scaling factor to be applied. Therefore, while the array may be written to represent a matrix, the voltage values may not be directly proportional to the values of the matrix.
[0027] In block 604, an input vector is applied to the array. In an example, the input vector comprises a plurality of voltage values, and one voltage value is applied to each wordline. In block 606, output voltage values are obtained. In block 608, these output voltage values are reordered to have an order corresponding to the arrangement of the original matrix. It may be noted that, at the time of reordering, the voltage values may be represented as digital values, for example following an analogue to digital conversion of the voltage output to provide a digital representation of the voltages. In block 610, these reordered output voltage values provide an output vector. In some examples, the output vector comprises a plurality of voltage values output by the array. The output values comprise the input values weighted by the resistances of the resistive memory elements. In other words, the output values represent the dot product of the input vector and the original matrix. This allows the operation of the original matrix (which may for example represent an operand defined to carry out a particular process, such as image sharpening, matrix-by-vector multiplication, or operations as part of a fully connected neural network, a convolutional neural network, or the like) to be implemented in a more energy efficient manner.
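Blocks 604 to 610 can be sketched as follows, treating the array as an ideal dot product and the reordering as a column permutation. The helper names (`permute_columns`, `dot`, `restore_order`) and the convention that `order[k]` names the original column placed at position k are assumptions made for this illustration.

```python
def permute_columns(matrix, order):
    """Build the modified matrix: new column k is original column order[k]."""
    return [[row[j] for j in order] for row in matrix]


def dot(vector, matrix):
    """Ideal array output: one weighted sum per column (bitline)."""
    return [
        sum(v * row[j] for v, row in zip(vector, matrix))
        for j in range(len(matrix[0]))
    ]


def restore_order(outputs, order):
    """Block 608: put each output back at the position of its original column."""
    restored = [None] * len(outputs)
    for new_pos, original_col in enumerate(order):
        restored[original_col] = outputs[new_pos]
    return restored


original = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
order = [2, 0, 1]                      # hypothetical reordering of the columns
modified = permute_columns(original, order)

x = [1, 2, 3]                          # input vector, one value per wordline
raw = dot(x, modified)                 # blocks 604-606: output in bitline order
final = restore_order(raw, order)      # blocks 608-610: original column order
print(raw, final)
```

The restored vector equals the dot product of the input with the original matrix, so the reordering changes where current flows in the array without changing the computed result.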
[0028] Figure 7 shows an example of a method, which may be a computer implemented method, which provides one example of a method of carrying out block 104. In this example, reordering of the lines of the matrix comprises, in block 702, determining a total resistance value for each line of the matrix to be represented by the bitlines of the resistive memory array. In block 704, the line having the lowest total resistance value not yet assigned a position is identified. Block 706 comprises reordering the matrix such that this line is to be mapped to the bitline closest to the input of the wordlines of the array. In this manner, the line associated with the lowest resistance value will be assigned to a position corresponding to the inputs to the wordlines (i.e., in the example of Figure 4, the left-most bitline). This process may be iterated through the array to result in an ordering of the matrix such that the total resistance to represent the values of the bitlines of the array increases with distance from the inputs of the wordlines of the array. Blocks 704 and 706 may be carried out until all the lines have an assigned position.
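Blocks 702 to 706 amount to sorting the lines by total resistance. A minimal sketch for a binary matrix whose columns are mapped to bitlines is given below; the resistance values `R_LOW` and `R_HIGH` and the function names are hypothetical, and ties are left in their original relative order.

```python
R_LOW = 1.0e3    # hypothetical resistance representing a matrix value of 1
R_HIGH = 1.0e6   # hypothetical resistance representing a matrix value of 0


def column_resistance_totals(matrix):
    """Block 702: total resistance of each column to be mapped to a bitline."""
    n_cols = len(matrix[0])
    return [
        sum(R_LOW if row[j] == 1 else R_HIGH for row in matrix)
        for j in range(n_cols)
    ]


def order_by_total_resistance(matrix):
    """Blocks 704-706: lowest total resistance goes closest to the inputs."""
    totals = column_resistance_totals(matrix)
    # sorted() is stable, so columns with equal totals keep their relative order.
    return sorted(range(len(totals)), key=lambda j: totals[j])


def reorder_columns(matrix, order):
    return [[row[j] for j in order] for row in matrix]


matrix = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
order = order_by_total_resistance(matrix)
modified = reorder_columns(matrix, order)
print(order)
for row in modified:
    print(row)
```

Sorting once is equivalent to repeatedly picking the unassigned line with the lowest total, which is how the blocks are described.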
[0029] Although it may be the case that such a reordered matrix will result in an array associated with a low voltage drop, this may be verified or enhanced in some examples. In particular, in this example, the method continues by, in block 708, determining a first parameter (or parameter value) of an array written to represent the modified matrix. For example, such a parameter may comprise a modelled or estimated average voltage drop along the wordlines of the array, an average noise associated with the outputs of the array, or some other parameter related to the energy efficiency of the array. The parameter may for example be modelled based on a model of a resistive memory array representing the matrix, for example, a computer simulation of such a model.
[0030] In block 710, a pair of lines in the modified matrix are selected (i.e. reordered lines, or lines in the reordered matrix). The selected lines may be neighboring lines or may be spaced within the matrix. The lines may for example be selected at random. Block 712 comprises determining a second parameter (or parameter value), this parameter being modelled or estimated for an array to be written with the position of the selected lines exchanged. In block 714, it is determined whether the parameter is improved by the exchange (for example, if the parameter is an average voltage drop, whether the second average voltage drop is lower than the first average voltage drop), for example resulting in improved energy efficiency (e.g. less energy will be lost to sneak currents and the like in use of an array written to represent the modified matrix having the lines in the exchanged position than if the lines remained in the position prior to exchange). If so, the position of the selected lines in the modified matrix may be exchanged (block 716) and the second parameter may be treated as the first parameter. If not, the position of the lines may be maintained. Blocks 710 to 716 may be carried out repeatedly, each time with reference to the lower of the first and second voltage drops of the previous iteration, and selecting a different pair of lines.
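The exchange loop of blocks 708 to 716 can be sketched as below. The function `modelled_drop` is a hypothetical stand-in for the modelled parameter: it simply penalises 1's (low resistance elements) in proportion to their distance from the wordline inputs, whereas a real implementation might use a circuit simulation. The random pair selection, iteration count and seed are likewise assumptions.

```python
import random


def modelled_drop(matrix, order):
    """Hypothetical proxy parameter: 1's placed far from the inputs cost more."""
    return sum(
        row[col] * (pos + 1)
        for row in matrix
        for pos, col in enumerate(order)
    )


def refine_by_exchange(matrix, order, iterations=200, seed=0):
    rng = random.Random(seed)
    order = list(order)
    best = modelled_drop(matrix, order)               # block 708: first parameter
    for _ in range(iterations):
        a, b = rng.sample(range(len(order)), 2)       # block 710: select a pair
        order[a], order[b] = order[b], order[a]
        candidate = modelled_drop(matrix, order)      # block 712: second parameter
        if candidate < best:                          # block 714: improved?
            best = candidate                          # block 716: keep the exchange
        else:
            order[a], order[b] = order[b], order[a]   # otherwise undo it
    return order, best


matrix = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
start = [0, 1, 2, 3]
initial = modelled_drop(matrix, start)
refined_order, refined = refine_by_exchange(matrix, start)
print(initial, refined, refined_order)
```

Because an exchange is kept only when the modelled parameter improves, the result is never worse than the starting order under the chosen model.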
[0031] In other examples, rather than or in addition to comparing voltage drop, values of other parameters, for example the noise of the output voltage, or some other parameter, could be modelled or estimated and compared.
[0032] Both the modified matrices, i.e. before and after exchange of the positions of the lines, may be considered as candidate modified matrices from which a selection is made based on a parameter. In other examples, a plurality of matrices may be defined as candidate modified matrices by reordering the lines of the original matrix and at least one parameter of an array written to represent each of the plurality of candidate modified matrices may be modelled. These parameters may be compared as described in relation to block 714 above, and a candidate modified matrix may be selected as the modified matrix based on the comparison.
[0033] For example, some, most or all of the different positions of the lines to be mapped to bitlines could be compared in terms of the voltage drop across word lines, and the configuration of lines having the lowest voltage drop identified and adopted as the configuration for the modified matrix. All of the positions could for example be determined when the number of possible configurations is relatively small (the term 'small' in this context being dependent on the processing resources available, and it may be that millions of combinations could be considered a relatively small number).
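Where the number of configurations is small enough, every ordering can be compared directly. The sketch below enumerates all column orderings with `itertools.permutations` and keeps the one with the lowest value of a hypothetical proxy parameter, `modelled_drop`, which is an assumption standing in for a modelled wordline voltage drop.

```python
from itertools import permutations


def modelled_drop(matrix, order):
    """Hypothetical proxy parameter: 1's placed far from the inputs cost more."""
    return sum(
        row[col] * (pos + 1)
        for row in matrix
        for pos, col in enumerate(order)
    )


def best_order_exhaustive(matrix):
    """Compare every ordering of the columns and return the lowest-cost one."""
    n_cols = len(matrix[0])
    return min(
        permutations(range(n_cols)),
        key=lambda order: modelled_drop(matrix, order),
    )


matrix = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
best = best_order_exhaustive(matrix)
print(best, modelled_drop(matrix, best))
```

The number of orderings grows factorially with the number of lines, so this approach is practical only for small matrices or for a restricted set of candidate orderings.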
[0034] In some examples, the degree to which an output of a matrix can be 'de-shuffled' (i.e. reordered, for example as described in relation to block 608 above) may be constrained, for example by use of a particular hardware. In such examples, rather than considering all possible combinations, all combinations which may be 'de-shuffled' by a particular hardware may be considered and compared.
[0035] Other examples of considering different combinations could be employed, for example an artificial intelligence based design space exploration technique such as simulated annealing or the like.
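As one illustration of such a design space exploration, a simulated annealing search over column orderings might look like the sketch below. The cooling schedule, step count, seed and the proxy parameter `modelled_drop` are all assumptions chosen for illustration and are not specified in the disclosure.

```python
import math
import random


def modelled_drop(matrix, order):
    """Hypothetical proxy parameter: 1's placed far from the inputs cost more."""
    return sum(
        row[col] * (pos + 1)
        for row in matrix
        for pos, col in enumerate(order)
    )


def anneal_order(matrix, steps=500, start_temp=5.0, seed=0):
    rng = random.Random(seed)
    order = list(range(len(matrix[0])))
    cost = modelled_drop(matrix, order)
    best_order, best_cost = list(order), cost
    for step in range(steps):
        # Linear cooling; the small constant keeps the temperature positive.
        temp = start_temp * (1 - step / steps) + 1e-9
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]
        new_cost = modelled_drop(matrix, order)
        # Always accept improvements; accept a worse order with a probability
        # that shrinks as the temperature falls.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_order, best_cost = list(order), cost
        else:
            order[a], order[b] = order[b], order[a]   # reject: undo the exchange
    return best_order, best_cost


matrix = [
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
annealed_order, annealed_cost = anneal_order(matrix)
print(annealed_order, annealed_cost)
```

Unlike the simple exchange loop, this search occasionally accepts a worse ordering, which may help it escape local minima when the modelled parameter is less well behaved than the proxy used here.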
[0036] Figure 8 is an example of a dot product engine (DPE) 800. The DPE 800 comprises a resistive memory array 802 and an output operator 804.
[0037] The array 802, in this example, an 8x8 array 802, is an array of resistive memory elements comprising a plurality of wordlines and a plurality of bitlines (for example as illustrated in relation to Figures 2, 4 or 5). The elements of the array 802 each comprise a resistance, and the array is to receive an input voltage on each of at least one wordline and is to output an output voltage from each of a plurality of bitlines.
[0038] The output operator 804 generates an output vector, the output vector comprising reordered output voltage values such that the order of values in the output vector is different to the bitline order of the voltage values. In this example, the output operator 804 comprises a plurality of sets 806 of switches 808 (only some of which are labelled to avoid overcomplicating the Figure). Each set 806 comprises a switch 808 associated with the output of each bitline, and the order of the values may be changed by each set 806 of switches 808. In an example, a switch 808 may be a 'straight through' switch in a first state, in which case the placement of the value received thereby in an order is unchanged, and a 'transfer' switch in the second state, in which case the value is moved to a different position in the order.
[0039] In some examples, assuming the array 802 comprises N bitlines, the output operator 804 may comprise log(N) sets of switches, each comprising N/2 switches, and each switch is reconfigurable between a first state, in which it is to change the position of a value within the order, and a second state, in which it is to maintain the position of a value within an order. This allows for any reconfiguration of the output, and therefore also allows for any reconfiguration of the columns of an original matrix to a modified matrix.
[0040] However, this may result in a relatively large switch array, which may in turn be associated with relatively complex switching control and high energy consumption. In some examples (wherein the array 802 again comprises N bitlines), the output operator 804 may comprise M sets of switches, each set of switches comprising N/2 switches. The switches 808 may be controlled as part of the corresponding set 806 such that a set of switches may have a first configuration, in which each of the switches 808 has one of the first and second state, and a second configuration, in which each of the switches 808 has the other of the first and second state than the state of that switch in the first configuration. In other words, if a switch 808 is a 'straight through' switch in a first configuration, it will be a 'transfer' switch in the second configuration, and the whole set may be switched by a single signal from one configuration to another. Thus the two configurations are the inverse of each other.
[0041] As noted above in some examples, the options for 'de-shuffling' may be constrained, for example by use of a particular hardware. Taking the example of the switches 808 in Figure 8, in such examples, the potential configurations of a modified array may be selected by toggling between switch configurations. This can lead to a set of possible modified matrices which, in some examples, may be compared, for example based on a modelled parameter of an array written to represent the matrix. In some examples, this may comprise the basis for selecting one of the possible modified matrices (for example, a modified matrix associated with a favourable parameter value) to be represented by an array.
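The constrained arrangement of paragraphs [0040] and [0041] can be sketched by enumerating the output orders reachable with M sets of switches, where each set is toggled as a whole between a configuration and its inverse. The wiring in `STAGES` (which pair of positions each switch joins, and each switch's state in the first configuration) is entirely hypothetical, chosen for N = 4 bitlines and M = 2 sets.

```python
from itertools import product


def apply_stage(values, pairs, base_transfer, inverted):
    """Apply one set of switches; each switch joins a fixed pair of positions.

    base_transfer[i] is True if switch i is a 'transfer' switch in the first
    configuration. `inverted` selects the second configuration, in which
    every switch of the set takes the opposite state.
    """
    out = list(values)
    for (a, b), transfer in zip(pairs, base_transfer):
        if transfer != inverted:          # XOR: is this switch transferring now?
            out[a], out[b] = out[b], out[a]
    return out


# Hypothetical wiring: two sets (M = 2) of N/2 = 2 switches each.
STAGES = [
    ([(0, 1), (2, 3)], [False, True]),
    ([(0, 2), (1, 3)], [False, False]),
]


def reachable_orders(n, stages):
    """Map each of the 2**M set configurations to the order it produces."""
    orders = {}
    for config in product([False, True], repeat=len(stages)):
        values = list(range(n))
        for (pairs, base), inverted in zip(stages, config):
            values = apply_stage(values, pairs, base, inverted)
        orders[config] = tuple(values)
    return orders


orders = reachable_orders(4, STAGES)
for config, order in orders.items():
    print(config, order)
```

With one control signal per set there are 2 to the power M configurations, here four, and only the matrix reorderings whose outputs can be de-shuffled by one of these configurations would be candidates for the modified matrix.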
[0042] The DPE 800 further comprises a voltage driver 810, which can be used both to write resistances to the resistive elements using a relatively high voltage and to apply an input voltage vector, and a bank of resistors 812, from which output voltage values are measured. An analogue to digital converter (ADC) 814 is also provided.
[0043] In other examples, an output operator may comprise a lookup table or the like, which re-arranges an initial output vector to form a final output vector. In such examples, the reordering of the original matrix may be stored in a memory and used to reorder the output of the array 802.
[0044] Figure 9 shows an example of a processing apparatus 900. The apparatus 900 comprises at least one DPE 902 (in this example, comprising a plurality of DPEs 902), a processor 904 to control the DPEs 902 and a memory 906 holding instructions which, when executed, cause the processor to carry out certain processes. For example the instructions may be to cause the processor 904 to carry out at least some of the blocks of Figure 1, 6 or 7. The processor 904 may also control components of the DPE 902, for example, voltage driver(s) and/or output operator(s) therein. Such voltage drivers may for example be controlled to write a resistive memory array to represent a matrix, and may apply voltages to wordlines within that array. The voltage drivers may be controlled to carry out at least some of the blocks of Figure 6. The switching state of output operators may be controlled by the processor 904. The memory 906 may also store matrices, input voltage vectors and/or output voltage vectors.
[0045] In this example, the DPEs 902 each comprise a matrix reordering module 908, which may in some examples comprise a processor. The matrix reordering module 908 carries out, for example by executing machine readable instructions stored in the memory 906, a row-wise or column-wise reordering of an original matrix, wherein the reordering is to adjust a distribution of lower resistance elements to be closer to an input end of an array of resistive elements of a DPE 902. In some examples, the matrix may be reordered into one of 2^M configurations. In this example, the reordering is therefore carried out using a matrix reordering module 908 associated with a particular DPE 902. In other examples, the processor 904 may carry out the reordering (for example, the matrix reordering module 908 may comprise a component of the processor 904). In some examples, one matrix reordering module 908 may be a resource which is shared by more than one DPE 902. This may assist in achieving a relatively small size for the processing apparatus 900 (in particular if the processing apparatus 900 comprises more than one DPE 902). An output reordering module may also be provided in some examples, for example comprising a processor (which may in some examples be the same processor as provides the matrix reordering module 908 and/or the processor 904). Such an output reordering module may for example carry out the actions described in relation to the output operator 804 above. Such an output reordering module may be provided as part of the DPE 902 or within the processor 904.
[0046] Examples in the present disclosure can be provided as methods, systems or machine readable instructions, such as any combination of software, hardware, firmware or the like. Such machine readable instructions may be included on a computer readable storage medium (including but not limited to disc storage, CD-ROM, optical storage, etc.) having computer readable program codes therein or thereon.
[0047] The present disclosure is described with reference to flow charts and/or block diagrams of the method, devices and systems according to examples of the present disclosure. Although the flow diagrams described above show a specific order of execution, the order of execution may differ from that which is depicted. Blocks described in relation to one flow chart may be combined with those of another flow chart. It shall be understood that each flow and/or block in the flow charts and/or block diagrams, as well as combinations of the flows and/or diagrams in the flow charts and/or block diagrams can be realized by machine readable instructions.
[0048] The machine readable instructions may, for example, be executed by a general purpose computer, a special purpose computer, an embedded processor or processors of other programmable data processing devices to realize the functions described in the description and diagrams (for example, the processor 904). In particular, a processor or processing apparatus may execute the machine readable instructions. Thus functional modules of the apparatus and devices may be implemented by a processor executing machine readable instructions stored in a memory, or a processor operating in accordance with instructions embedded in logic circuitry. The term 'processor' is to be interpreted broadly to include a CPU, processing unit, ASIC, logic unit, or programmable gate array etc. The methods and functional modules may all be performed by a single processor or divided amongst several processors.
[0049] Such machine readable instructions may also be stored in a computer readable storage (for example, the memory 906) that can guide the computer or other programmable data processing devices to operate in a specific mode.
[0050] Such machine readable instructions may also be loaded onto a computer or other programmable data processing devices, so that the computer or other programmable data processing devices perform a series of operations to produce computer-implemented processing, thus the instructions executed on the computer or other programmable devices realize functions specified by flow(s) in the flow charts and/or block(s) in the block diagrams.
[0051] Further, the teachings herein may be implemented in the form of a computer software product, the computer software product being stored in a storage medium and comprising a plurality of instructions for making a computer device implement the methods recited in the examples of the present disclosure.
[0052] While the method, apparatus and related aspects have been described with reference to certain examples, various modifications, changes, omissions, and substitutions can be made without departing from the spirit of the present disclosure. It is intended, therefore, that the method, apparatus and related aspects be limited only by the scope of the following claims and their equivalents. It should be noted that the above-mentioned examples illustrate rather than limit what is described herein, and that those skilled in the art will be able to design many alternative implementations without departing from the scope of the appended claims. Features described in relation to one example may be combined with features of another example.
[0053] The word "comprising" does not exclude the presence of elements other than those listed in a claim, "a" or "an" does not exclude a plurality, and a single processor or other unit may fulfil the functions of several units recited in the claims.
[0054] The features of any dependent claim may be combined with the features of any of the independent claims or other dependent claims.

Claims

1. A method comprising:
receiving, by at least one processor, an original matrix of values representing an operand to be used in processing data with a resistive memory array, the resistive memory array comprising a plurality of resistive memory elements arranged in wordlines and bitlines, each wordline comprising an input, and wherein the values of a matrix to be represented by an array are arranged in lines to be represented as a resistance of a resistive memory element of a bitline of the resistive memory array; and
reordering, by the at least one processor, the lines of the original matrix to form a modified matrix, the modified matrix being such that, compared to the original matrix, when the reordered lines are mapped to the bitlines of an array, a distribution of lower resistive elements is adjusted to be towards the inputs of the wordlines of the array.
2. The method according to claim 1, wherein the reordering of the lines of the matrix comprises:
determining a total resistance value for each line to be represented by the bitline of the resistive memory array; and
ordering the matrix such that, when the matrix is mapped to the array, the total resistance values determined for the bitlines of an array increase with distance from the inputs of the wordlines of the array.
3. The method according to claim 2 wherein the reordering further comprises:
determining a first parameter of an array written to represent the modified matrix; selecting a pair of lines in the modified matrix;
determining a second parameter of an array written to represent the modified matrix when a position of the selected lines is exchanged; and
if the second parameter is indicative of a higher energy efficiency than the first parameter, exchanging the position of the selected pair of reordered lines in the modified matrix.
4. The method according to claim 1 comprising determining, by at least one processor, a plurality of matrices as candidate modified matrices by reordering the lines of the original matrix;
modelling, by at least one processor, at least one parameter of an array written to represent each of the plurality of candidate modified matrices;
comparing, by at least one processor, the modelled at least one parameter of the candidate modified matrices; and
selecting, by at least one processor, a candidate modified matrix as the modified matrix based on the comparison.
5. The method according to claim 1, further comprising writing a resistive memory array to represent the modified matrix.
6. The method according to claim 5, further comprising
applying an input vector to the array, wherein the input vector comprises a plurality of voltage values, and one voltage value is applied to each wordline, and
obtaining an output vector, the output vector comprising a plurality of voltage values output from the bitlines of the array, wherein the output voltage values comprise the input voltage values weighted by the resistances of the resistive memory elements.
7. The method according to claim 6 further comprising reordering the output voltage values to have an order corresponding to the arrangement of the original matrix.
8. The method according to claim 6 further comprising carrying out an analogue to digital conversion on the output voltage values to provide a digital representation of the output voltages.
9. The method according to claim 8 further comprising reordering the digital representation of the output voltage values to have an order corresponding to the arrangement of the original matrix.
10. A dot product engine comprising:
an array of resistive memory elements, the array comprising a plurality of wordlines and a plurality of bitlines, and
an output operator,
wherein the resistive memory elements of the array each comprise a resistance, the array is to receive an input voltage on at least one wordline and is to output an output voltage value from each of a plurality of bitlines, and
the output operator is to generate an output vector, the output vector comprising a representation of the output voltage values reordered to have an order, wherein the order of output voltage values in the output vector is different to a bitline order of the output voltage values.
11. A dot product engine according to claim 10 in which the output operator comprises at least one set of switches, each set comprising a switch associated with the output of each bitline, wherein an order of the output voltage values is adjustable by each set of switches.
12. A dot product engine according to claim 11, wherein the array comprises N bitlines and the output operator comprises log(N) sets of switches, each set comprising N/2 switches, and each switch is reconfigurable between a first state, in which it is to change a position of a value within an order, and a second state, in which it is to maintain the position of a value within an order.
13. A dot product engine according to claim 11, wherein the array comprises N bitlines and the output operator comprises M sets of switches, and a set of switches may have a first configuration, in which each of the switches has one of a first and second state, and a second configuration, in which each of the switches has the other of the first and second state than the state of that switch in the first configuration.
14. A dot product engine according to claim 10 in which at least one element of the array is written with a resistance value according to a matrix, the dot product engine further comprising a matrix reordering module, the matrix reordering module being to carry out a row-wise or column-wise reordering of an original matrix, wherein the reordering is to adjust a distribution of lower resistance elements to be closer to an input end of the array, and in which the array is written with a resistance value according to a reordered matrix.
15. A processing apparatus comprising:
a processor;
a dot product engine comprising an array of resistive memory elements comprising a plurality of word lines and a plurality of bitlines;
a matrix reordering module; and
a memory holding instructions which, when executed, cause the matrix reordering module to carry out a row-wise or column-wise reordering of an original matrix to determine a modified array to be represented by the array of the dot product engine, wherein the reordering is to adjust a distribution of lower resistance elements to be closer to an input end of the array.
PCT/US2016/025141 2016-03-31 2016-03-31 Reordering matrices WO2017171768A1 (en)

Publications (1)

Publication Number Publication Date
WO2017171768A1 true WO2017171768A1 (en) 2017-10-05

