CN109492187B

CN109492187B - Method and system for performing analog complex vector matrix multiplication

Info

Publication number: CN109492187B
Application number: CN201811040668.3A
Authority: CN
Inventors: Ryan M. Hatcher; Borna J. Obradovic; Jorge A. Kittl; Titash Rakshit
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2017-09-11
Filing date: 2018-09-06
Publication date: 2023-03-24
Anticipated expiration: 2038-09-06
Also published as: US10878317B2; KR20190029406A; US20190080230A1; CN109492187A; KR102462792B1

Abstract

A hardware apparatus and method for performing multiply-accumulate operations are described. The apparatus includes input lines, weight cells, and output lines. The input lines receive input signals, each of which has an amplitude and a phase and may represent a complex value. The weight cells couple the input lines to the output lines. Each weight cell has an electrical admittance corresponding to a weight. The electrical admittance is programmable and can be complex valued. The input lines, weight cells, and output lines form a crossbar array, and each output line provides an output signal. The output signal of an output line is the sum, over the input lines connected to that output line, of each input signal multiplied by the admittance of the weight cell connecting that input line to the output line.

Description

Method and system for performing analog complex vector matrix multiplication

Cross Reference to Related Applications

This application claims priority from U.S. Provisional Patent Application No. 62/556,842, filed with the U.S. Patent and Trademark Office on September 11, 2017, and U.S. Patent Application No. 15/849,106, filed with the U.S. Patent and Trademark Office on December 20, 2017, the entire disclosures of which are incorporated herein by reference.

Background

Vector matrix multiplication operations, also known as Multiply and Accumulate (MAC) operations, dominate the performance of applications in various domains. For example, in machine learning, multiple layers of MAC operations may be performed. The input signals may be considered to form an input vector. The input signal may be image data, a byte stream or another data set. The input signal is multiplied by a matrix of values or weights. The output signal is the result of the MAC operation on the input signal and corresponds to the output vector. The output vector may be provided as an input vector to the next layer of MAC operations. This process may be repeated for a large number of layers. Because a large number of MAC operations are performed, the performance of the application depends largely on the performance of the MAC operations. Therefore, there is a need to efficiently and reliably perform MAC operations at low power and high speed.

The MAC operation may be performed digitally. However, analog crossbar arrays can perform MAC operations more efficiently than digital circuits. Such analog crossbar arrays use a DC signal and a resistor at each cross point. The conductance of each resistor corresponds to the weight of the matrix at that location. Multiplication and accumulation can be performed by setting the potential on each input line proportional to the desired input value, $V_i \propto a_i$, where $V_i$ is the potential and $a_i$ is the desired input value. The resistance at each cross point is set proportional to the inverse of the weight. Thus, $w_{ij} \propto 1/R_{ij}$, where $w_{ij}$ is the desired weight and $R_{ij}$ is the resistance at the cross point. The MAC output is then proportional to the current on the output line: $b_j \propto I_j$, where $b_j$ is the output and $I_j$ is the current on the output line. An activation function is applied to the MAC output, which is converted back to a voltage. The resulting voltages form an output vector corresponding to the product of the input vector and the weight array representing the matrix. Thus, vector matrix multiplication for real-valued inputs and weights can be performed in analog.
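As an illustration of the real-valued case described above, the following is a minimal numerical sketch of an ideal resistive crossbar. The function name `crossbar_mac` and the example values are assumptions made for illustration only. The sketch ignores wire resistance and other non-idealities.

```python
def crossbar_mac(voltages, resistances):
    """Return the current on each output line of an ideal resistive crossbar.

    voltages[i]       -- potential V_i on input line i
    resistances[i][j] -- resistance R_ij at cross point (i, j)
    (Hypothetical helper; names are not taken from the source.)
    """
    n_outputs = len(resistances[0])
    currents = [0.0] * n_outputs
    for v_i, row in zip(voltages, resistances):
        for j, r_ij in enumerate(row):
            # Ohm's law at each cross point; the currents add on output line j.
            currents[j] += v_i / r_ij
    return currents


# Two input lines, two output lines; values are illustrative only.
V = [1.0, 2.0]
R = [[1.0, 2.0],
     [4.0, 0.5]]
I = crossbar_mac(V, R)
print(I)  # [1.5, 4.5]
```

Each output current is the weighted sum of the input potentials, with the weights given by the conductances $1/R_{ij}$.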

Due to its potential for use in various fields, a faster, more flexible and more energy efficient dedicated hardware implementation is desirable for vector matrix multiplication.

Disclosure of Invention

A hardware apparatus and method for performing multiply-accumulate operations are described. The apparatus includes input lines, weight cells, and output lines. The input lines receive input signals, each of which is an alternating current (AC) analog harmonic signal having an amplitude and a phase. Each input signal can thereby represent a complex value. The weight cells couple the input lines to the output lines. Each of the weight cells has an electrical admittance corresponding to a weight. The electrical admittance is programmable and can be complex valued. The input lines, weight cells, and output lines form a crossbar array. Each of the output lines provides an output signal. The output signal of an output line is the sum, over the input lines connected to that output line, of each input signal multiplied by the electrical admittance of the weight cell connecting that input line to the output line.

The hardware device may perform vector matrix multiplication on complex signals (complex valued input vectors) and employ complex weights (complex valued matrices). As a result, the speed, flexibility and/or efficiency of such MAC operations may be improved.

Drawings

FIG. 1 depicts a portion of an exemplary embodiment of a hardware crossbar array capable of performing analog complex MAC operations.

FIG. 2 depicts an exemplary embodiment of a portion of a hardware crossbar array capable of performing analog complex-valued MAC operations that contain negative weights.

FIG. 3 depicts an exemplary embodiment of a portion of a hardware crossbar array capable of performing analog complex-valued MAC operations that contain negative weights.

FIG. 4 depicts an exemplary embodiment of a portion of a hardware crossbar array capable of performing analog complex-valued MAC operations that contain negative weights.

FIGS. 5A, 5B, and 5C depict exemplary embodiments of weight cells, programmable resistors, and programmable capacitors that may be used in a hardware crossbar array that performs analog complex MAC operations.

FIG. 6 is a flow chart depicting an exemplary embodiment of a method for performing an analog complex MAC operation.

Detailed Description

Exemplary embodiments relate to a hardware device for performing multiply-accumulate (MAC) operations, also known as vector matrix multiply operations. The methods and systems described herein may be used in various fields, including but not limited to machine learning, artificial intelligence, and neural networks. The method and system also relate to the use of complex valued signals and/or weights that can be used to optimize neural networks or for other applications. The method and system may be extended to other applications that use complex signals (e.g., complex vectors) and/or complex weights (complex valued matrices).

The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the exemplary embodiments and the generic principles and features described herein will be readily apparent. The exemplary embodiments are described primarily in terms of specific methods and systems provided in particular implementations. However, the method and system will operate effectively in other implementations.

Phrases such as "an exemplary embodiment," "one embodiment," and "another embodiment" may refer to the same or different embodiments, as well as multiple embodiments. Embodiments will be described with respect to systems and/or devices having certain components. However, the system and/or apparatus may include more or fewer components than shown, and changes in the arrangement and type of the components may be made without departing from the scope of the invention. The exemplary embodiments will also be described in the context of a particular method having certain steps. However, the method and system are effectively used with other methods having different and/or additional steps and steps in different orders that are not inconsistent with the exemplary embodiments. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features described herein.

The use of the terms "a" and "an" and "the" and similar referents in the context of describing the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms "having," "including," and "containing" are to be construed as open-ended terms (i.e., meaning "including, but not limited to,") unless otherwise noted.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should be noted that the use of any and all examples, or exemplary terminology provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. Moreover, unless otherwise defined, terms defined in generally used dictionaries should not be over-interpreted.

A hardware apparatus and method for performing multiply-accumulate operations are described. The apparatus includes input lines, weight cells, and output lines. The input lines receive input signals, each of which is an alternating current (AC) analog harmonic signal having an amplitude and a phase, and which can represent a complex value. The weight cells couple the input lines to the output lines. Each of the weight cells has an electrical admittance corresponding to a weight. The electrical admittance is programmable and can be complex valued. The input lines, weight cells, and output lines form a crossbar array. Each of the output lines provides an output signal. The output signal of an output line is the sum, over the input lines connected to that output line, of each input signal multiplied by the admittance of the weight cell connecting that input line to the output line.

FIG. 1 depicts a portion of an exemplary embodiment of a hardware device 100 capable of performing analog complex valued multiply-accumulate (MAC) operations/vector matrix multiplication. For simplicity, only a portion of hardware device 100 is shown. Hardware device 100 may be a hybrid analog-digital circuit. Hardware device 100 includes input lines 110-1 through 110-n (collectively/generically referred to as input lines 110), weight cells 120-11 through 120-nm (collectively/generically referred to as weight cells 120), output lines 130-1 through 130-m (collectively/generically referred to as output lines 130), and an optional post-accumulation block 102. In some embodiments, each input line 110 may be considered a pre-synaptic line, each weight cell 120 a synapse, and each output line 130 a post-synaptic line. The input lines 110 are connected to the output lines 130 through the array of weight cells 120. For example, input line 110-1 is connected to output line 130-1 through weight cell 120-11, to output line 130-2 through weight cell 120-12, and to output line 130-m through weight cell 120-1m. Thus, the input lines 110, the weight cells 120, and the output lines 130 form a crossbar array.

Input lines 110 receive complex valued input signals. Each input signal is an alternating current (AC) analog harmonic signal. The AC analog harmonic signal represents a complex value using its amplitude and phase. For example, the input signal to input line 110-i may be a voltage given by $V_i e^{j\omega t}$, where $t$ is time, $\omega$ is the angular frequency, and $V_i$ is an electric potential. The input signal can also be expressed as $V_{i,\mathrm{re}} \cos\omega t + jV_{i,\mathrm{im}} \sin\omega t$, where $V_{i,\mathrm{re}}$ and $V_{i,\mathrm{im}}$ are the magnitudes of the real and imaginary parts of $V_i$. For at least some of the input lines 110, $V_{i,\mathrm{re}}$ or $V_{i,\mathrm{im}}$ may be zero. The input signals provided to input lines 110 correspond to the vector in the vector matrix multiplication operation. In other words, the magnitude of the voltage signal provided to each input line 110 is proportional to the desired input value of the input vector. This input vector may be the output of a previous MAC operation, which may be performed by a device (not shown) that may be similar to hardware device 100.
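To make the amplitude-and-phase representation concrete, the following sketch encodes a complex input value as the amplitude and phase of a harmonic signal and evaluates that signal at a given time. The helper names `encode_phasor` and `harmonic_signal`, the 1 MHz drive frequency, and the example value are assumptions for illustration and are not taken from the disclosure.

```python
import cmath
import math


def encode_phasor(value):
    """Split a complex input value into the (amplitude, phase) of a harmonic signal."""
    return abs(value), cmath.phase(value)


def harmonic_signal(amplitude, phase, omega, t):
    """Complex harmonic signal amplitude * exp(j * (omega * t + phase)) at time t."""
    return amplitude * cmath.exp(1j * (omega * t + phase))


a_i = 3 + 4j                           # illustrative complex input value
amplitude, phase = encode_phasor(a_i)  # amplitude 5, phase atan2(4, 3)
omega = 2 * math.pi * 1.0e6            # assumed angular frequency (1 MHz drive)
v_at_t0 = harmonic_signal(amplitude, phase, omega, 0.0)
# At t = 0 the signal equals the encoded complex value (up to rounding).
```

At later times the signal rotates in the complex plane at angular frequency `omega`, while its amplitude and its phase relative to a reference signal continue to carry the encoded value.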

Each of the weight cells 120 has an electrical admittance $Y_{ij}$, where $i$ corresponds to the input line number and $j$ corresponds to the output line number of the cross point. The electrical admittance is programmable, can be complex valued, and is the weight of the weight cell 120. The admittance of a given weight cell 120 may take a purely real value, a purely imaginary value, or the sum of a real value and an imaginary value. To provide this admittance, the weight cell 120 may include passive electronic components, such as resistors and capacitors. For example, the weight cell 120 may include a resistor in parallel with a capacitor. For such a weight cell, the admittance is $Y_{ij} = G_{ij} + iC_{ij}$, where $G_{ij} = 1/R_{ij}$ is the conductance of the resistor, $R_{ij}$ is the resistance of the resistor, and $C_{ij}$ is the capacitance of the capacitor (the prefactor $i$ of $C_{ij}$ denotes the imaginary unit, not the input line index). In other embodiments, other or additional passive components with complex admittances may be used. The array of weight cells 120 corresponds to the matrix of the vector matrix multiplication. The admittance $Y_{ij}$ of weight cell 120-ij is proportional to the value in the ij-th position of the matrix by which the input vector is multiplied.

Each of the output lines 130 provides an output signal in the form of a current. Due to the connection of the input lines 110, the weight cells 120, and the output lines 130, the output signal of each output line 130 is the sum, over the input lines 110 connected to that output line, of each input signal multiplied by the admittance of the weight cell 120 connecting that input line 110 to the output line 130. Thus, the output signal of each output line 130 is the complex current given by $I_j = \sum_i V_i Y_{ij}$.
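The complex-valued accumulation $I_j = \sum_i V_i Y_{ij}$ can be sketched numerically as follows. The function name `complex_mac` and the example values are assumptions for illustration. Voltages are treated as complex phasors and each admittance as a complex number whose real part stands for the conductive term and whose imaginary part stands for the capacitive term.

```python
def complex_mac(voltages, admittances):
    """Return the complex current I_j = sum_i V_i * Y_ij on each output line.

    voltages[i]       -- complex phasor V_i on input line i
    admittances[i][j] -- complex admittance Y_ij of the weight cell at (i, j)
    (Hypothetical helper; an idealized model without parasitics.)
    """
    n_outputs = len(admittances[0])
    currents = [0j] * n_outputs
    for v_i, row in zip(voltages, admittances):
        for j, y_ij in enumerate(row):
            currents[j] += v_i * y_ij
    return currents


# Illustrative values: two input lines, two output lines.
V = [1 + 1j, 2 + 0j]
Y = [[1 + 0j, 0 + 1j],
     [0 + 2j, 1 + 1j]]
I = complex_mac(V, Y)
print(I)  # [(1+5j), (1+3j)]
```

The purely real entry models a resistive weight cell, the purely imaginary entries model capacitive weight cells, and the mixed entry models a resistor in parallel with a capacitor.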

The post-accumulation processing block 102 may be used to perform additional processing on the output signal. For example, the post-processing block may convert the current in the output line 130 to a voltage. Such a voltage may form an input signal to a subsequent hardware MAC operating device (not shown) that may be similar to hardware device 100. In alternative embodiments, the post-accumulation processing block 102 may be omitted.

In operation, the admittance of each of the weight cells 120 is programmed. A sinusoidal complex voltage signal corresponding to the input vector is driven through each of the input lines 110. The resulting current through each output line 130 depends on the voltage on each input line 110 and the admittance of the weight cell 120 connecting that input line 110 to the output line 130. The current on the j-th output line 130 is given by $I_j = \sum_i V_i Y_{ij} = \sum_i V_i (G_{ij} + iC_{ij})$, where the summation index $i$ corresponds to the input line 110 and ranges from 1 to n, the prefactor $i$ of $C_{ij}$ is the imaginary unit, and each weight cell 120 includes a capacitor in parallel with a resistor. This current may be converted to a voltage or otherwise manipulated by the post-accumulation block 102. Because analog signals are used, the hardware device 100 may incur a latency penalty due to the number of cycles required for the output to stabilize. Furthermore, sufficiently large non-linearity in the resistors or capacitors within the weight cells 120 may cause distortion and errors in the output. During operation of the hardware device 100, such non-linearity should therefore be reduced, and a sufficient number of cycles should be allowed for the output to stabilize.

Using the hardware device 100, MAC operations/vector matrix multiplications may be performed on complex valued analog signals using complex valued weights. Because it is implemented in hardware, the analog hardware device 100 can perform MAC operations with lower power consumption and improved performance. The weight/admittance of a particular weight cell 120 of hardware device 100 may be purely real, purely imaginary, or include both real and imaginary components. Hardware device 100 may thus have increased flexibility compared to conventional hardware implementations that employ only real weights. Hardware device 100 may be used to efficiently perform complex MAC operations in a neural network using complex values. Such complex valued neural networks may use fewer neurons and/or mathematical operations to solve a given problem. Thus, neural network design may be improved, performance enhanced, and power consumption reduced. Similar benefits may be achieved in other applications using complex valued MAC operations. Accordingly, the hardware device 100 can improve the performance of applications that depend on MAC operations.

In the hardware device 100 shown in FIG. 1, if only passive components are used, the admittance of each weight cell 120 may take only positive values. This is because passive components such as the resistors and capacitors discussed herein have non-negative admittances. However, it may be desirable for the weights of the weight cells 120 to take general complex values (positive or negative). FIGS. 2-4 depict exemplary embodiments of hardware devices in which the weights may be positive or negative complex values and are implemented using passive components. A negative complex value may have a real part and an imaginary part, one or both of which may be negative. In an alternative embodiment, active elements such as current mirrors may be employed within the weight cells 120 to provide positive and negative complex valued weights.

FIG. 2 depicts an exemplary embodiment of a portion of a hardware device 100A capable of performing analog complex MAC operations that include negative weights. Hardware device 100A is similar to hardware device 100. Thus, hardware device 100A includes input lines 110A, weight cells 120A, output lines 130, and an optional post-accumulation block 102, which correspond to input lines 110, weight cells 120, output lines 130, and the optional post-accumulation block 102, respectively. The input lines 110A, the weight cells 120A, and the output lines 130 are indexed in a manner similar to that described above. However, as shown in FIG. 2, each input line 110A-i includes two input sub-lines 112-i and 114-i (collectively referred to as sub-lines 112 and 114). For example, input line 110A-1 includes sub-lines 112-1 and 114-1. Each weight cell 120A-ij includes two weight sub-cells 122-ij and 124-ij. For example, weight cell 120A-11 includes sub-cells 122-11 and 124-11.

Input lines 110A include sub-lines 112 and 114, which carry positive and negative input signals, respectively. Similarly, each weight cell 120A-ij includes a positive sub-cell 122-ij (generically referred to as sub-cell 122) connected to the positive sub-line 112 and a negative sub-cell 124-ij (generically referred to as sub-cell 124) connected to the negative sub-line 114. The positive sub-cell 122 has an admittance $Y_{ij+}$, and the negative sub-cell 124 has an admittance $Y_{ij-}$. Both sub-cells 122 and 124 are connected to the corresponding output line 130. Thus, the positive and negative complex input values (voltages) are multiplied by the desired weights (admittances) and both are accumulated on the appropriate output line 130. The output of the MAC operation performed by hardware device 100A is the current carried on each output line 130.

Hardware device 100A operates in a manner similar to hardware device 100. The admittance of each of the weight cells 120A is set. In addition, sinusoidal complex voltages for the positive and negative weights are driven through the input sub-lines 112 and 114, respectively. The resulting current through each output line 130 depends on the voltages on the input sub-lines 112 and 114 and the admittances of the sub-cells 122 and 124, respectively. The current on the j-th output line 130 is given by $I_j = \sum_i V_i (Y_{ij+} - Y_{ij-})$, where $i$ corresponds to input line 110A and ranges from 1 to n. This current may be converted to a voltage by the post-accumulation block 102 and provided to a subsequent hardware device that performs a subsequent MAC operation.
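One way to picture the relation $I_j = \sum_i V_i (Y_{ij+} - Y_{ij-})$ is the sketch below. It assumes, for illustration only, that the negative sub-line carries the inverted signal $-V_i$ and that a signed complex weight is split into two admittances whose real and imaginary parts are all non-negative. The helper names `split_weight` and `differential_mac` are hypothetical.

```python
def split_weight(w):
    """Split a complex weight w into (y_plus, y_minus) with w == y_plus - y_minus.

    Both returned admittances have non-negative real and imaginary parts,
    as would be required of passive resistor/capacitor sub-cells.
    """
    y_plus = complex(max(w.real, 0.0), max(w.imag, 0.0))
    y_minus = complex(max(-w.real, 0.0), max(-w.imag, 0.0))
    return y_plus, y_minus


def differential_mac(voltages, weights):
    """Accumulate currents from positive and negative sub-lines on each output line."""
    n_outputs = len(weights[0])
    currents = [0j] * n_outputs
    for v_i, row in zip(voltages, weights):
        for j, w_ij in enumerate(row):
            y_plus, y_minus = split_weight(w_ij)
            # Assumption: the positive sub-line carries +V_i and the negative
            # sub-line carries -V_i; both currents add on output line j.
            currents[j] += v_i * y_plus + (-v_i) * y_minus
    return currents


V = [1 + 0j, 0 + 1j]        # illustrative input phasors
W = [[1 - 2j],
     [-3 + 1j]]             # signed complex weights, one output line
I = differential_mac(V, W)  # equals sum_i V_i * W_i0 = -5j
```

The result matches a direct multiplication by the signed weights, even though every individual sub-cell admittance in the model is non-negative.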

Using hardware device 100A, the benefits of hardware device 100 may be realized. Thus, the MAC operation/vector matrix multiplication can be performed using analog signals and complex weights, with lower power consumption and improved performance. Hardware device 100A may thus have increased flexibility and may be used to efficiently perform complex MAC operations in a neural network using complex values. Hardware device 100A may also be extended to other applications where complex values may be desired. Hardware device 100A is therefore able to improve the performance of applications that rely on MAC operations. Further, the weight used in the MAC operation may take a positive complex value and a negative complex value. As a result, the utility of the hardware device 100A can be expanded.

FIG. 3 depicts an exemplary embodiment of a portion of a hardware device 100B capable of performing analog complex MAC operations that include negative weights. Hardware device 100B is similar to hardware devices 100 and 100A. Thus, hardware device 100B includes input lines 110, weight cells 120B, output lines 130B, and an optional post-accumulation block 102, which correspond to input lines 110/110A, weight cells 120/120A, output lines 130, and the optional post-accumulation block 102, respectively. The input lines 110, the weight cells 120B, and the output lines 130B are indexed in a manner similar to that described above. As shown in FIG. 3, each output line 130B-j includes two output sub-lines 132-j and 134-j (collectively referred to as sub-lines 132 and 134). For example, output line 130B-1 includes sub-lines 132-1 and 134-1. In addition, each weight cell 120B-ij includes two weight sub-cells 122-ij and 124-ij.

Each weight cell 120B-ij includes a positive sub-cell 122-ij (generically referred to as sub-cell 122) and a negative sub-cell 124-ij (generically referred to as sub-cell 124), both of which are connected to the input line 110. The positive sub-cell 122 is connected to a positive output sub-line 132. The negative sub-cell 124 is connected to a negative output sub-line 134. The positive sub-cell 122 has an admittance $Y_{ij+}$, and the negative sub-cell 124 has an admittance $Y_{ij-}$. Thus, the complex input values (voltages) multiplied by the desired weights (admittances) are summed separately on the positive output sub-line 132 and the negative output sub-line 134. Output line 130B thus carries positively and negatively weighted output signals on sub-line 132 and sub-line 134, respectively.

Hardware device 100B operates in a manner similar to hardware devices 100 and 100A. The admittance of each of the weight cells 120B is set. In addition, a sinusoidal complex voltage is driven through each of the input lines 110. The resulting currents through the output sub-lines 132 and 134 depend on the voltages on the input lines 110 and the admittances of the corresponding sub-cells 122 and 124, respectively. The currents on the j-th output sub-lines 132-j and 134-j are given by $I_{j+} = \sum_i V_i Y_{ij+}$ and $I_{j-} = \sum_i V_i Y_{ij-}$, respectively, where $i$ corresponds to input line 110 and ranges from 1 to n. These currents may be converted to voltages by the post-accumulation block 102 and provided to a subsequent hardware device that performs a subsequent MAC operation.

Using hardware device 100B, the benefits of hardware devices 100 and/or 100A may be realized. Thus, MAC operations/vector matrix multiplication can be performed using analog signals and complex weights, with lower power consumption and improved performance. Hardware device 100B may thus have increased flexibility and may be used to efficiently perform complex MAC operations in a neural network using complex values. Hardware device 100B may also be extended to other applications that may require complex values. Accordingly, hardware device 100B can improve the performance of applications that depend on MAC operations. Further, the weights used in the MAC operation may take positive and negative complex values. As a result, the utility of hardware device 100B can be expanded.

Although hardware devices 100A and 100B function, these devices require a significant amount of additional circuitry. More specifically, each weight cell requires an additional input or output line and a plurality of sub-cells. Accordingly, an improved mechanism for enabling MAC operations to be performed using positive and negative complex weights may be desired.

FIG. 4 depicts an exemplary embodiment of a portion of a hardware device 100C capable of performing analog complex MAC operations containing negative weights without requiring additional sub-cells and input or output lines for each weight cell. Hardware device 100C is similar to hardware devices 100, 100A, and 100B. Thus, hardware device 100C includes input lines 110, weight cells 120, output lines 130, and an optional post-accumulation block 102, which correspond to the input lines 110, weight cells 120, output lines 130, and optional post-accumulation block 102 described above, respectively. The input lines 110, the weight cells 120, and the output lines 130 are indexed in a manner similar to that described above. Further, for each input line 110-i, an offset resistor 150-i (generically referred to as 150) having a resistance $R_{\text{off-}i}$ is provided. Hardware device 100C also includes an additional offset line 140, an optional offset block 160 that can process the output current from the offset line 140, and an optional offset voltage line 170.

It can be shown that $Y_{ij-}$ may be represented by a fixed value that offsets the total admittance for input line 110-i. The offset therefore need not be included in each individual weight cell 120-ij. Instead, the offset may be provided, for each input line 110-i, by the offset resistor 150-i having a resistance $R_{\text{off-}i}$. The products of the conductances of these offset resistors and the corresponding input voltages may be summed on a separate offset line 140 and combined with the outputs on the remaining lines. The offset block 160 may convert the offset current to a voltage. The optional offset voltage line 170 may provide an offset voltage for each output line 130. In an alternative embodiment, the conversion to voltage may be performed by the post-accumulation block 102.

Hardware device 100C operates in a manner similar to hardware devices 100, 100A, and 100B. The admittance of each of the weight cells 120 is set. In addition, a sinusoidal complex voltage is driven through each of the input lines 110. The resulting current through each output line 130 is summed in a manner similar to that described for hardware device 100. In addition, the admittances $1/R_{\text{off-}i}$ of the resistors 150-i connected to the offset line 140 result in an accumulated offset current $I_{\text{off}}$. The offset current on the offset line 140 is given by $I_{\text{off}} = \sum_i V_i / R_{\text{off-}i}$, where $i$ corresponds to input line 110 and ranges from 1 to n. This offset current may be subtracted from the output current on each output line 130. The resulting current for each output line 130 may be converted to a voltage by the post-accumulation block 102 and provided to a subsequent hardware device performing a subsequent MAC operation. Alternatively, the offset current may be converted to an offset voltage and subtracted from the voltage corresponding to the current in the output line 130.
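The effect of the shared offset line can be illustrated with the sketch below, which uses real-valued conductances for brevity. The function name `offset_mac` and all numeric values are assumptions for illustration. Subtracting $I_{\text{off}}$ from every output current is equivalent to using effective weights $G_{ij} - 1/R_{\text{off-}i}$, which may be negative even though each physical conductance is positive.

```python
def offset_mac(voltages, conductances, offset_resistances):
    """Output currents with the shared offset-line current subtracted.

    voltages[i]           -- input voltage V_i on input line i
    conductances[i][j]    -- non-negative conductance of the weight cell at (i, j)
    offset_resistances[i] -- resistance of the offset resistor on input line i
    (Hypothetical helper; idealized model of the offset subtraction.)
    """
    # I_off = sum_i V_i / R_off_i, accumulated once on the offset line.
    i_off = sum(v / r for v, r in zip(voltages, offset_resistances))
    n_outputs = len(conductances[0])
    currents = []
    for j in range(n_outputs):
        i_j = sum(v * row[j] for v, row in zip(voltages, conductances))
        currents.append(i_j - i_off)
    return currents


V = [1.0, 2.0]
G = [[0.5, 2.0],
     [1.0, 0.25]]
R_off = [1.0, 2.0]  # offset conductances 1.0 and 0.5
I = offset_mac(V, G, R_off)
print(I)  # [0.5, 0.5]
```

In this example the effective weights are `[[-0.5, 1.0], [0.5, -0.25]]`, so negative weights are obtained with only one additional line and one additional resistor per input line.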

Using hardware device 100C, the benefits of hardware devices 100, 100A, and/or 100B may be realized. Thus, MAC operations/vector matrix multiplication can be performed using analog signals and complex weights, with lower power consumption and improved performance. Hardware device 100C may thus have increased flexibility and may be used to efficiently perform complex MAC operations in a neural network using complex values. Hardware device 100C may also be extended to other applications where complex values may be desired. Therefore, hardware device 100C can improve the performance of applications that depend on MAC operations. The weights used in the MAC operation may take positive and negative complex values. Furthermore, negative complex values can be accommodated in a simpler, more efficient, and more elegant circuit. As a result, the utility of hardware devices 100, 100A, and/or 100B may be extended.

FIGS. 5A, 5B, and 5C depict exemplary embodiments of a weight cell 120', a programmable resistance circuit 200A, and a programmable capacitance circuit 210A, respectively. Referring to FIG. 5A, the weight cell 120' may be used as one or more of the weight cells 120, 120A, and/or 120B in the hardware devices 100, 100A, 100B, and/or 100C. The weight cell 120' includes a programmable resistor 200 in parallel with a programmable capacitor 210. In some embodiments, switches (not shown) are in series with resistor 200 and capacitor 210. However, in the illustrated embodiment, these switches are omitted. Programmable resistor 200 has a variable conductance $G_p$. The programmable capacitor 210 has a programmable capacitance $C_p$. The programmable admittance of the weight cell 120' is $G_p + iC_p$, where $i$ is the imaginary unit. Thus, the weight cell 120' may be purely resistive, purely capacitive, or both resistive and capacitive.

FIG. 5B depicts one embodiment of a programmable resistance circuit 200A that may be used for the programmable resistor 200. The programmable resistance circuit 200A includes resistors 202, 204, and 206 and switches 203, 205, and 207, respectively, connected in parallel. Resistors 202, 204, and 206 have conductances $G_1$, $G_2$, and $G_3$, respectively. In an exemplary embodiment, each of switches 203, 205, and/or 207 is a flash transistor. However, switches 203, 205, and/or 207 may include other devices having a sufficiently large on/off ratio. For example, the on/off ratio may be at least $10^5$. Such devices may include, but are not limited to, MOSFETs, ferroelectric transistors, and/or resistive random access memory (RRAM) elements.

Programmable resistor 200 and programmable resistance circuit 200A may be implemented in a variety of ways. For example, resistors 202, 204, and/or 206 may be implemented using magnetic random access memory (MRAM) elements (such as magnetic tunnel junctions), flash devices, RRAM elements, phase change memory (PCM) elements, ferroelectric random access memory (FeRAM) elements, and/or the like. In some embodiments, $G_2$ is twice $G_1$ and $G_3$ is twice $G_2$. By selectively opening or closing switches 203, 205, and 207, the resistance of the programmable resistance circuit 200A may be set to one of eight evenly distributed values. However, other configurations with other resistance values are possible.
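The eight settings of a three-branch, binary-weighted circuit can be enumerated as in the sketch below. It assumes, for illustration only, that each resistor is in series with its switch and that the three branches are in parallel, so that the conductances of the closed branches add. Under that assumption the eight switch states give evenly spaced conductance levels. The function name `bank_conductance` and the unit value of `G1` are hypothetical.

```python
from itertools import product


def bank_conductance(switch_states, branch_conductances):
    """Total conductance of parallel branches whose switches are closed (True)."""
    return sum(g for closed, g in zip(switch_states, branch_conductances) if closed)


G1 = 1.0                          # arbitrary unit conductance (assumption)
branches = (G1, 2 * G1, 4 * G1)   # G2 = 2 * G1 and G3 = 2 * G2

# Enumerate all 2**3 = 8 combinations of open/closed switches.
levels = sorted(
    bank_conductance(states, branches)
    for states in product((False, True), repeat=3)
)
print(levels)  # [0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```

The same enumeration applies to a bank of binary-weighted capacitors, since capacitances in parallel also add.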

Fig. 5C depicts one embodiment of a programmable capacitance circuit 210A that may be used for the programmable capacitor 210 of the weight cell 120'. The programmable capacitance circuit 210A includes capacitors 212, 214, and 216 and respective switches 213, 215, and 217, connected in parallel. Capacitors 212, 214, and 216 have capacitances C₁, C₂, and C₃, respectively. In an exemplary embodiment, each switch 213, 215, and/or 217 is a flash transistor. However, switches 213, 215, and/or 217 may include other devices having sufficiently large on/off ratios. Such an on/off ratio may be at least 10⁵. Devices meeting such criteria may include the devices described above. Additional devices that may be used as the programmable capacitor 210/210A include ferroelectric FETs (FeFETs), flash devices, MOS capacitors (MOSCAPs), or similar devices having complex impedances. However, such alternative devices may have relatively non-linear current-voltage characteristics, which may lead to distortion and inaccuracy of the MAC operation. Thus, the use of such devices may be undesirable in at least some embodiments. In alternative embodiments, the programmable capacitors 210/210A may be augmented or replaced by programmable inductors. However, compact programmable and manufacturable inductor devices may be difficult to obtain. In some embodiments, C₂ is twice C₁, and C₃ is twice C₂. However, other configurations are possible. By selectively opening or closing switches 213, 215, and 217, the capacitance of the programmable capacitance circuit 210A can be set to one of eight evenly distributed values. However, other configurations with other capacitance values are possible.
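As a non-limiting sketch, the complex admittance of a weight cell having a programmable resistor in parallel with a programmable capacitor may be modeled as Y = G + jωC, which is the standard admittance of a parallel RC network. The helper names, the least-significant-first ordering of the switch bits, the unit values, and the operating frequency below are hypothetical and chosen only for illustration. Ideal switches and binary-weighted branches are assumed.

```python
import math

def binary_code(bits):
    """Interpret switch states (least significant first, 1 = closed) as an integer."""
    return sum(bit << i for i, bit in enumerate(bits))

def weight_cell_admittance(g_bits, c_bits, g_unit, c_unit, freq_hz):
    """Admittance Y = G + j*omega*C of an idealized parallel RC weight cell."""
    omega = 2.0 * math.pi * freq_hz
    conductance = binary_code(g_bits) * g_unit   # real part of the weight
    capacitance = binary_code(c_bits) * c_unit   # sets the imaginary part
    return complex(conductance, omega * capacitance)

# Hypothetical values: 1 uS unit conductance, 0.1 pF unit capacitance, 1 MHz signal.
y = weight_cell_admittance((1, 0, 1), (0, 1, 1), g_unit=1e-6, c_unit=1e-13, freq_hz=1e6)
```

In this model the resistor switches set the real part of the weight and the capacitor switches set the imaginary part, so each part can be programmed independently.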

The hardware devices 100, 100A, 100B, 100C and/or the like may be implemented using weight cells 120', programmable resistance circuits 200A and/or programmable capacitance circuits 210A. Other implementations may be used in other embodiments. Thus, the beneficial effects of hardware devices 100, 100A, 100B, and/or 100C may be realized.

Fig. 6 is a flow chart depicting an exemplary embodiment of a method 300 for performing an analog complex MAC operation. For simplicity, some steps may be omitted, performed in another order, and/or combined. The method 300 is also described in the context of the hardware device 100. However, the method 300 may be used in conjunction with another hardware device (such as the devices 100A, 100B, and/or 100C) for performing analog complex-valued MAC operations.

The admittance of each weight cell 120 is programmed, via step 302, thereby setting the desired resistance, capacitance, inductance, and/or other electrical characteristics of the weight cells 120. Step 302 may include opening or closing one or more of switches 203, 205, 207, 213, 215, and/or 217 to provide the desired resistance and capacitance in each weight cell 120.

Input signals are received, via step 304. Step 304 may include generating an input signal for each input line 110 and receiving the signal. Thus, complex-valued alternating current harmonic voltage signals may be provided to the hardware device 100.
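For illustration, an alternating current harmonic signal having an amplitude and a phase may be represented by a complex phasor, with the physical waveform given by the real part of the phasor multiplied by exp(jωt). The following is a minimal sketch of that standard representation. The function names and the numerical values are hypothetical, and a single shared frequency is assumed for all input lines.

```python
import cmath
import math

def phasor(amplitude, phase_rad):
    """Complex value encoded by a harmonic signal of the given amplitude and phase."""
    return cmath.rect(amplitude, phase_rad)

def waveform(phasor_value, freq_hz, t):
    """Instantaneous voltage v(t) = Re(V * exp(j*2*pi*f*t)) for phasor V."""
    return (phasor_value * cmath.exp(1j * 2.0 * math.pi * freq_hz * t)).real

# Hypothetical input: amplitude 2, phase 90 degrees, which encodes the value 2j.
v_in = phasor(2.0, math.pi / 2)
```

The amplitude maps to the magnitude of the complex value and the phase maps to its argument.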

The input signals are passed through the crossbar array of the hardware device 100, via step 306. Due to the configuration of the input lines 110, the weight cells 120, and the output lines 130, a MAC operation may be performed on the input signals. The resulting current on each output line 130 is therefore the output of the MAC operation.

The current on each output line 130 may be processed by the post accumulation block 102, via step 308. For example, the current on the output line 130 may be converted to a voltage. Thereby, an output vector resulting from a multiplication of an input vector on the input lines 110 with a matrix formed by the weight cells 120 may be provided.
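The steps of method 300 may be sketched numerically as follows. This is an idealized model rather than a description of any particular embodiment. It assumes each output line is held at a fixed potential (for example, a virtual ground) so that each weight cell contributes a current equal to its admittance times the input phasor, and it ignores wire parasitics. The function names, the admittance values, and the transimpedance gain are hypothetical.

```python
def crossbar_mac(admittance, input_phasors):
    """Output currents I[j] = sum over i of Y[i][j] * V[i] for an ideal crossbar.

    admittance[i][j] is the programmed admittance of the cell coupling
    input line i to output line j (step 302); input_phasors[i] is the
    complex input on line i (step 304).
    """
    n_out = len(admittance[0])
    return [
        sum(row[j] * v for row, v in zip(admittance, input_phasors))
        for j in range(n_out)
    ]

def to_voltage(currents, transimpedance_ohms):
    """Post-accumulation step 308: scale each output current to a voltage."""
    return [transimpedance_ohms * i for i in currents]

# Hypothetical 2x2 example in normalized units.
Y = [[1 + 1j, 2 + 0j],
     [0 + 2j, 1 + 1j]]
V = [1 + 0j, 0 + 1j]

currents = crossbar_mac(Y, V)          # step 306
voltages = to_voltage(currents, 10.0)  # step 308
```

Each output current is the complex dot product of the input vector with one column of the admittance matrix, which corresponds to the vector matrix multiplication described above.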

Thus, using the method 300, the hardware device 100, 100A, 100B, 100C and/or the like may be used. As a result, the advantages of one or more of hardware devices 100, 100A, 100B, and/or 100C may be realized.

A method and system for performing MAC operations/vector matrix multiplication using complex values has been described. The method and system have been described in accordance with the exemplary embodiments shown, and one of ordinary skill in the art will readily recognize that there could be variations to the embodiments, and any variations would be within the spirit and scope of the method and system. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.

Claims

1. A hardware device for performing multiply-accumulate operations, comprising:

a plurality of input lines for receiving a plurality of input signals, each of the plurality of input signals being an alternating current analog harmonic signal having an amplitude and a phase, the alternating current analog harmonic signal capable of representing a complex value;

a plurality of output lines; and

a plurality of weight cells coupling the plurality of input lines with the plurality of output lines, each of the plurality of weight cells having an electrical admittance corresponding to a weight, the electrical admittance being programmable and capable of being complex-valued;

wherein the plurality of input lines, the plurality of weight cells, and the plurality of output lines form a crossbar array;

wherein each of the plurality of output lines provides an output signal, the output signal of one of the plurality of output lines being a sum of the input signals of each of the plurality of input lines connected to the one output line multiplied by an electrical admittance, the electrical admittance being the electrical admittance of each of a portion of the plurality of weight cells connecting the plurality of input lines to the one output line.

2. The hardware device of claim 1, wherein each input line of the plurality of input lines is coupled to each output line of the plurality of output lines through a weight cell of the plurality of weight cells.

3. The hardware device of claim 2, wherein each of the plurality of weight cells comprises a programmable resistor connected in parallel with a programmable capacitor.

4. The hardware device of claim 3, wherein the programmable resistor comprises a plurality of parallel-connected resistive circuits, each resistive circuit comprising a switch connected in series with a resistor.

5. The hardware device of claim 3, wherein the programmable capacitor comprises a plurality of capacitors connected in series with a switch.

6. The hardware device of claim 1, wherein at least one weight of at least one weight cell of the plurality of weight cells may take a negative complex value.

7. The hardware device of claim 6, wherein each weight cell of the at least one weight cell comprises a positive weight subunit and a negative weight subunit, wherein each weight cell is coupled with two input lines of the plurality of input lines.

8. The hardware device of claim 6, wherein each weight cell of the at least one weight cell comprises a positive weight subunit and a negative weight subunit, wherein each weight cell is coupled with two output lines of the plurality of output lines.

9. The hardware device of claim 6, further comprising:

an offset line connected to each of the plurality of input lines through a passive offset component having a real offset impedance and connected to each of the plurality of input lines.

10. A complex valued neural network system, comprising:

a plurality of hardware device layers, each of the hardware devices to perform multiply-accumulate operations of an input vector with a matrix, each of the hardware devices comprising:

a plurality of input lines for receiving a plurality of input signals corresponding to the input vectors, each of the plurality of input signals being an alternating current analog harmonic signal having an amplitude and a phase, the alternating current analog harmonic signal being capable of representing a complex value;

a plurality of output lines coupled to the plurality of input lines; and

a plurality of weight cells coupling each of the plurality of input lines with each of the plurality of output lines to form a crossbar array, the plurality of weight cells corresponding to the matrix, each of the plurality of weight cells having an electrical admittance corresponding to a weight, the electrical admittance being programmable and capable of being complex valued, each of the plurality of weight cells comprising a programmable resistor connected in parallel with a programmable capacitor, the programmable resistor comprising a plurality of resistance circuits connected in parallel, each resistance circuit comprising a switch connected in series with a resistor, the programmable capacitor comprising a plurality of capacitors connected in series with a switch; and

an offset line connected to each of the plurality of input lines through a passive offset component having a real offset impedance and connected to each of the plurality of input lines such that each of the plurality of weights may assume a negative complex value;

wherein each of the plurality of output lines provides an output signal, the output signal of one of the plurality of output lines being a sum of the input signals of each of the plurality of input lines connected to the one output line multiplied by an electrical admittance, the electrical admittance being the electrical admittance of each of a portion of the plurality of weight cells connecting the plurality of input lines to the one output line; and

for each hardware device layer of the plurality of hardware device layers except a last layer, an output signal of each output line of the plurality of output lines corresponds to a component of the input vector of a next hardware device layer of the plurality of hardware device layers.

11. A method for performing multiply-accumulate operations, comprising:

receiving a plurality of input signals, each of the plurality of input signals being an alternating current analog harmonic signal having an amplitude and a phase, the alternating current analog harmonic signal capable of representing a complex value;

passing the plurality of input signals through a crossbar array comprising a plurality of input lines, a plurality of output lines, and a plurality of weight cells coupling the plurality of input lines with the plurality of output lines, each weight cell of the plurality of weight cells having an electrical admittance corresponding to a weight, the electrical admittance being programmable and capable of being complex-valued such that each output line of the plurality of output lines provides an output signal, the output signal of one output line of the plurality of output lines being a sum of the input signals of each input line of the plurality of input lines connected to the one output line multiplied by the electrical admittance, the electrical admittance being an electrical admittance of each weight cell of a fraction of the weight cells of the plurality of weight cells connecting the plurality of input lines to the one output line.

12. The method of claim 11, further comprising:

setting an electrical admittance of each of the plurality of weight cells prior to the step of passing the plurality of input signals through the crossbar array.

13. The method of claim 12, wherein each of the plurality of weight cells comprises a programmable resistor connected in parallel with a programmable capacitor, the method comprising:

setting a resistance of a programmable resistor in each of the plurality of weight cells; and

setting a capacitance of a programmable capacitor in each of the plurality of weight cells.

14. The method of claim 13, wherein the programmable resistor comprises a plurality of parallel-connected resistive circuits, each resistive circuit comprising a switch connected in series with a resistor, the method further comprising:

setting a switch in each of the plurality of resistive circuits to one of a closed state and an open state.

15. The method of claim 13, wherein the programmable capacitor comprises a plurality of capacitors connected in series with a switch, the method further comprising:

setting the switch to one of a closed state and an open state.

16. The method of claim 11, wherein at least one weight of at least one of the plurality of weight cells may take a negative complex value.

17. The method of claim 16, wherein each weight cell of the at least one weight cell comprises a positive weight subunit and a negative weight subunit, wherein each weight cell is coupled with two input lines of the plurality of input lines.

18. The method of claim 16, wherein each of the at least one weight cell comprises a positive weight subunit and a negative weight subunit, wherein each weight cell is coupled to two of the plurality of output lines.

19. The method of claim 16, wherein the crossbar array further comprises: an offset line connected to each of the plurality of input lines through a passive offset component having a real offset impedance and connected to each of the plurality of input lines.