CN111656360B - System and method for sparsity utilization - Google Patents


Info

Publication number
CN111656360B
CN111656360B
Authority
CN
China
Prior art keywords
zero
multipliers
integrated circuit
neural network
multiplier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201880061175.XA
Other languages
Chinese (zh)
Other versions
CN111656360A (en)
Inventor
K.F. Busch
J.H. Holleman III
P. Vorenkamp
S.W. Bailey
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Syntiant Corp.
Original Assignee
Syntiant Corp.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Syntiant Corp.
Priority claimed from PCT/US2018/043168 (WO2019018811A1)
Publication of CN111656360A
Application granted
Publication of CN111656360B


Abstract

Disclosed is a neuromorphic integrated circuit that, in some embodiments, includes a multi-layer neural network disposed in an analog multiplier array of two-quadrant multipliers. When the input signal value to the transistors of a multiplier is approximately zero, the weight value of the transistors of the multiplier is approximately zero, or a combination thereof, each of the multiplier's lines is grounded and draws a negligible amount of current. Also disclosed is a method for a neuromorphic integrated circuit, the method comprising, in some embodiments: training a neural network; tracking the rate of change of the weight values; determining whether and how quickly certain weight values are trending toward zero; and driving those weight values toward zero, thereby encouraging sparsity in the neural network. Sparsity in the neural network, in combination with the multipliers wired to ground, minimizes the power consumption of the neuromorphic integrated circuit, such that battery power is sufficient to power it.

Description

System and method for sparsity utilization
Priority
The present application claims priority from U.S. patent application Ser. No. 16/041,565, filed on July 20, 2018, and U.S. provisional patent application Ser. No. 62/535,705, entitled "Systems and Methods of Sparsity Exploiting," filed on July 21, 2017, both of which are hereby incorporated by reference in their entireties.
Technical Field
Embodiments of the present disclosure relate to the field of neuromorphic computation. More particularly, embodiments of the present disclosure relate to systems and methods for encouraging sparsity in a neural network of neuromorphic integrated circuits and minimizing power consumption of neuromorphic integrated circuits.
Background
Conventional central processing units ("CPUs") process instructions based on "clocked time." Specifically, a CPU operates such that information is transmitted at regular time intervals. Based on complementary metal-oxide-semiconductor ("CMOS") technology, silicon-based chips can be fabricated with more than 5 billion transistors per die, with features as small as 10 nm. Advances in CMOS technology have been successfully applied to parallel computing, which is ubiquitous in personal computers and cellular telephones containing multiple processors.
However, as machine learning becomes commonplace for numerous applications including bioinformatics, computer vision, video games, marketing, medical diagnostics, and online search engines, conventional CPUs often cannot supply a sufficient amount of processing power while maintaining low power consumption. In particular, machine learning is a subfield of computer science directed to software that can learn from and make predictions on data. Furthermore, one branch of machine learning is deep learning, which is directed at utilizing deep (multi-layer) neural networks.
Currently, research is underway to develop direct hardware implementations of deep neural networks, which may include systems that attempt to simulate "silicon" neurons (e.g., "neuromorphic computation"). Neuromorphic chips (e.g., silicon computing chips designed for neuromorphic computation) operate by processing instructions in parallel (e.g., as opposed to conventional sequential computers) using bursts of current transmitted at non-uniform intervals. As a result, neuromorphic chips require significantly less power to process information, particularly artificial intelligence ("AI") algorithms. To accomplish this, a neuromorphic chip may contain as many as five times as many transistors as a conventional processor while consuming as little as 1/2000th of the power. Thus, the development of neuromorphic chips is directed at providing a chip with vast processing capability that consumes far less power than conventional processors. Furthermore, neuromorphic chips are designed to support dynamic learning in the context of complex and unstructured data.
There is a continuing need to develop neuromorphic chips that have significant processing power while consuming much less power than conventional processors. Provided herein are systems and methods for encouraging sparsity in the neural networks of neuromorphic chips and minimizing the power consumption of neuromorphic chips.
Disclosure of Invention
Disclosed herein is a neuromorphic integrated circuit that, in some embodiments, includes a multi-layer neural network disposed in an analog multiplier array of a plurality of two-quadrant multipliers arranged in memory sectors of the neuromorphic integrated circuit. When the input signal value of the input signal to the transistors of the multipliers is approximately zero, the weight value of the transistors of the multipliers is approximately zero, or a combination thereof, each of the multiplier lines in the multipliers is grounded and draws a negligible amount of current. Sparsity in a neural network in combination with a plurality of multipliers wired to ground minimizes power consumption of the neuromorphic integrated circuit.
In some embodiments, each of the multipliers does not draw current when an input signal value of an input signal to a transistor of the multiplier is zero, a weight value of a transistor of the multiplier is zero, or a combination thereof.
In some embodiments, the weight value corresponds to a synaptic weight value between neural nodes disposed in a neural network in the neuromorphic integrated circuit.
In some embodiments, the input signal values multiplied by the weight values provide output signal values that are combined to arrive at a decision for the neural network.
In some embodiments, the transistors of the two-quadrant multiplier include metal oxide semiconductor field effect transistors ("MOSFETs").
In some embodiments, each of the two-quadrant multipliers has a differential structure configured to allow programmatic compensation for overshoot if either of its two cells is set to a weight value beyond its target.
In some embodiments, the neuromorphic integrated circuit is configured for one or more application-specific standard products ("ASSPs") selected from keyword spotting, speaker identification, one or more audio filters, gesture recognition, image recognition, video object classification and segmentation, and autonomous vehicles including drones.
In some embodiments, the neuromorphic integrated circuit is configured to operate on battery power.
Also disclosed herein is a method for a neuromorphic integrated circuit, the method comprising, in some embodiments: training a multi-layer neural network in an analog multiplier array of a plurality of two-quadrant multipliers disposed in a memory sector of the neuromorphic integrated circuit, and encouraging sparsity in the neural network during training. When the input signal value of the input signal to the transistors of the multipliers is approximately zero, the weight value of the transistors of the multipliers is approximately zero, or a combination thereof, each of the multiplier lines in the multipliers is grounded and draws a negligible amount of current. Encouraging sparsity in the neural network includes training with a training algorithm configured to drive a large number of the input signal values, the weight values, or a combination thereof toward zero for the multipliers, thereby enabling minimal power consumption of the neuromorphic integrated circuit.
In some embodiments, each of the multipliers does not draw current when an input signal value of an input signal to a transistor of the multiplier is zero, a weight value of a transistor of the multiplier is zero, or a combination thereof.
In some embodiments, the method further includes tracking the rate of change of the weight values of each of the multipliers during training, and determining whether certain weight values are trending towards zero, and how fast those particular weight values are trending towards zero.
In some embodiments, the method further comprises: as part of encouraging sparsity in the neural network, the weight values are driven toward zero for those weight values that are trending toward zero during training.
In some embodiments, the weight value corresponds to a synaptic weight value between neural nodes in a neural network of the neuromorphic integrated circuit.
In some embodiments, the method further comprises incorporating the neuromorphic integrated circuit into one or more ASSPs selected from keyword spotting, speaker identification, one or more audio filters, gesture recognition, image recognition, video object classification and segmentation, and autonomous vehicles including drones.
In some embodiments, the neuromorphic integrated circuit is configured to operate on battery power.
Also disclosed herein is a method for a neuromorphic integrated circuit, the method comprising, in some embodiments: training a multi-layer neural network in an analog multiplier array of a plurality of two-quadrant multipliers disposed in a memory sector of the neuromorphic integrated circuit; tracking the rate of change of the weight value of each of the multipliers during training; determining whether certain weight values are trending toward zero, and how fast those weight values are trending toward zero; and driving those weight values toward zero, thereby encouraging sparsity in the neural network. When the input signal value of the input signal to the transistors of the multipliers is approximately zero, the weight value of the transistors of the multipliers is approximately zero, or a combination thereof, each of the multiplier lines in the multipliers is grounded and draws a negligible amount of current.
In some embodiments, each of the multipliers does not draw current when an input signal value of an input signal to a transistor of the multiplier is zero, a weight value of a transistor of the multiplier is zero, or a combination thereof.
In some embodiments, the method further comprises setting a subset of the weight values to zero prior to training the neural network, thereby further encouraging sparsity in the neural network.
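For illustration only, such a pre-training initialization could look like the following Python sketch (the array shape and the fraction of weights zeroed are assumptions, not values from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))            # hypothetical initial weight matrix
zero_mask = rng.random(w.shape) < 0.5  # assumed fraction of weights to pre-zero
w[zero_mask] = 0.0                     # subset set to zero before training begins
print(w)
```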
In some embodiments, training utilizes a training algorithm configured to drive a large number of input signal values, weight values, or a combination thereof toward zero for the multipliers, thereby enabling minimal power consumption of the neuromorphic integrated circuit.
In some embodiments, training encourages sparsity in the neural network by minimizing a cost function that includes a count of the non-zero weight values.
In some embodiments, the method further comprises minimizing the cost function with an optimization function comprising gradient descent, back propagation, or both gradient descent and back propagation. An estimate of the power consumption of the neuromorphic integrated circuit is used as an integral part of the cost function.
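For illustration only, the following Python sketch (not the patent's algorithm; the model, the constants, and the use of PyTorch are assumptions) shows one way such a cost function could be assembled: a task loss, an L1 penalty as a differentiable surrogate for the count of non-zero weights, and a crude power estimate that counts active multipliers. The hard count term has zero gradient and serves only as a monitored proxy; the L1 term does the actual driving of weights toward zero under gradient descent and backpropagation.

```python
import torch

def sparsity_cost(model, inputs, targets, lam=1e-3, power_coeff=1e-4):
    # Task loss for the network's primary objective.
    task_loss = torch.nn.functional.mse_loss(model(inputs), targets)
    # Differentiable surrogate for the number of non-zero weights (L1 penalty).
    l1 = sum(w.abs().sum() for w in model.parameters())
    # Hypothetical power estimate: count of weights large enough to draw current.
    # The hard count has zero gradient; it folds a power proxy into the cost.
    power_est = sum((w.abs() > 1e-6).float().sum() for w in model.parameters())
    return task_loss + lam * l1 + power_coeff * power_est

# Usage with plain gradient descent and backpropagation:
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 8), torch.randn(32, 1)
opt.zero_grad()
loss = sparsity_cost(model, x, y)
loss.backward()
opt.step()
```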
In some embodiments, the weight value corresponds to a synaptic weight value between neural nodes in a neural network of the neuromorphic integrated circuit.
In some embodiments, the method further comprises incorporating the neuromorphic integrated circuit into one or more ASSPs selected from keyword localization, speaker identification, one or more audio filters, gesture recognition, image recognition, video object classification and segmentation, and autonomous vehicles including drones.
In some embodiments, the neuromorphic integrated circuit is configured to operate on battery power.
Drawings
Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements and in which:
FIG. 1 provides a schematic diagram illustrating a system 100 for designing and updating neuromorphic integrated circuits ("ICs"), according to some embodiments.
Fig. 2 provides a schematic diagram illustrating an analog multiplier array according to some embodiments.
Fig. 3 provides a schematic diagram illustrating an analog multiplier array according to some embodiments.
FIG. 4 provides a schematic diagram illustrating an unbiased, two-quadrant multiplier of an analog multiplier array according to some embodiments.
Detailed Description
Terminology
In the following description, certain terminology is used to describe features of the invention. For example, in some cases, the term "logic" may represent hardware, firmware, and/or software configured to perform one or more functions. As hardware, logic may include circuitry having data processing or storage functionality. Examples of such circuitry may include, but are not limited to, a microprocessor, one or more processor cores, a programmable gate array, a microcontroller, a controller, an application-specific integrated circuit, wireless receiver, transmitter and/or transceiver circuitry, semiconductor memory, or combinational logic.
The term "process" may include an instance of a computer program (e.g., a set of instructions, also referred to herein as an application). In one embodiment, a process may include one or more threads executing concurrently (e.g., each thread may be executing the same or different instructions concurrently).
The term "processing" may include executing a binary or script, or launching an application in which objects are processed, where launching should be interpreted as placing the application in an open state, and in some implementations, performing a simulation of actions typical of human interaction with the application.
The term "object" generally refers to a collection of data, whether in transmission (e.g., over a network) or at rest (e.g., stored), often having a logical structure or organization that enables classification or typing thereof. The terms "binary file" and "binary" will be used interchangeably herein.
The term "file" is used in a broad sense to refer to a group or collection of data, information, or other content used with a computer program. A file may be accessed, opened, stored, manipulated, or otherwise processed as a single entity, object, or unit. The file may contain other files and may contain related or unrelated content or no content at all. The files may also have a logical format or be part of a file system having a logical structure or organization of files in a plurality of forms. Files may have names (sometimes simply referred to as "filenames") and often have additional attributes or other metadata. There are many types of files, such as data files, text files, program files, and directory files. The file may be generated by a user of the computing device or by the computing device. Access to and/or operation of files may be mediated by the operating system and/or one or more applications of the computing device. The file system may organize files of a computing device of a storage device. The file system may enable tracking of files and enable access to those files. The file system may also enable operations on files. In some embodiments, operations on a file may include file creation, file modification, file opening, file reading, file writing, file closing, and file deletion.
Finally, the terms "or" and "and/or" as used herein are to be interpreted as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any one of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions, steps or acts is in some way inherently mutually exclusive.
Referring now to fig. 1, a schematic diagram illustrating a system 100 for designing and updating neuromorphic ICs is provided, in accordance with some embodiments. As shown, the system 100 may include a simulator 110, a neuromorphic synthesizer 120, and a cloud 130 configured for designing and updating a neuromorphic IC (such as neuromorphic IC 102). As further shown, designing and updating the neuromorphic IC may include creating a machine learning architecture with the simulator 110 based on a particular problem. The neuromorphic synthesizer 120 may subsequently transform the machine learning architecture into a netlist of the electronic components of the neuromorphic IC 102 and the nodes to which the electronic components are connected. In addition, the neuromorphic synthesizer 120 may transform the machine learning architecture into a graphic database system ("GDS") file detailing the IC layout of the neuromorphic IC 102. From the netlist and GDS file of the neuromorphic IC 102, the neuromorphic IC 102 itself may be manufactured in accordance with current IC manufacturing technology. Once the neuromorphic IC 102 is manufactured, it may be deployed to work on the particular problem for which it was designed. While the initially manufactured neuromorphic IC 102 may include initial firmware with customized synaptic weights between the nodes, the initial firmware may be updated as needed by the cloud 130 to adjust the weights. Because the cloud 130 is configured to update the firmware of the neuromorphic IC 102, the cloud 130 is not needed for everyday use.
Neuromorphic ICs, such as neuromorphic IC 102, may be up to 100-fold or more energy efficient than graphics processing unit ("GPU") solutions, and up to 280-fold or more energy efficient than digital CMOS solutions, with precision that meets or exceeds comparable software solutions. This makes such neuromorphic ICs suitable for battery-powered applications.
Neuromorphic ICs, such as the neuromorphic IC 102, may be configured for ASSPs including, but not limited to, keyword spotting, speaker identification, one or more audio filters, gesture recognition, image recognition, video object classification and segmentation, or autonomous vehicles including drones. For example, if the particular problem is one of keyword spotting, the simulator 110 may create a machine learning architecture with respect to one or more aspects of keyword spotting. The neuromorphic synthesizer 120 may then transform the machine learning architecture into a netlist and a GDS file corresponding to a neuromorphic IC for keyword spotting, which may be manufactured in accordance with current IC manufacturing technology. Once the neuromorphic IC for keyword spotting is manufactured, it may be deployed to work on keyword spotting in, for example, a system or device.
Neuromorphic ICs, such as the neuromorphic IC 102, may be deployed in toys, sensors, wearable devices, augmented reality ("AR") systems or devices, mobile systems or devices, appliances, internet of things ("IoT") devices, or hearable devices.
Referring now to fig. 2, a schematic diagram illustrating an analog multiplier array 200 is provided, in accordance with some embodiments. Such an analog multiplier array may be based on a digital NOR flash array, in that the core of the analog multiplier array may be similar to or the same as the core of the digital NOR flash array. However, at least the selection and readout circuitry of the analog multiplier array differ from those of the digital NOR array. For example, output currents are routed to the next layer as analog signals rather than being converted to bits by bit lines feeding sense amplifiers/comparators, and word lines are driven by analog input signals rather than by a digital address decoder. Furthermore, the analog multiplier array 200 may be used in a neuromorphic IC (such as the neuromorphic IC 102). For example, a neural network may be disposed in the analog multiplier array 200 in a memory sector of the neuromorphic IC.
Since the analog multiplier array 200 is an analog circuit, the input and output current values (or signal values) may vary in a continuous range, rather than simply being on or off. This is useful for storing weights (also known as coefficients) of the neural network, as opposed to digital bits. In operation, the weights are multiplied by the input current values to provide output current values that are combined to arrive at a decision for the neural network.
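As a purely numerical illustration of that multiply-accumulate operation (not a circuit model; the shapes and values below are made up for the example), consider the following Python sketch, in which each stored weight scales its input value and the per-column sums are the output values combined downstream:

```python
import numpy as np

# Rows correspond to input lines, columns to outputs; most weights are zero.
weights = np.array([[0.0, 0.8],
                    [0.5, 0.0],
                    [0.0, -0.3]])
inputs = np.array([0.2, 0.0, 0.7])   # input current values in arbitrary units

outputs = inputs @ weights           # per-column sums of input*weight products
print(outputs)                       # these values are combined into a decision
```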
Analog multiplier array 200 may utilize standard programming and erase circuitry to generate tunneling and erase voltages.
Referring now to fig. 3, a schematic diagram illustrating an analog multiplier array 300 is provided, in accordance with some embodiments. The analog multiplier array 300 may use two transistors (e.g., a positive metal-oxide-semiconductor field-effect transistor ["MOSFET"] and a negative MOSFET) to perform two-quadrant multiplication of a signed weight (e.g., a positive weight or a negative weight) by a non-negative input current value. If the input current value is multiplied by a positive weight, the product or output current value is positive; if multiplied by a negative weight, the product is negative. The product of a positive weight may be stored in a first column of the analog multiplier array 300 (e.g., the column corresponding to the positive output current), and the product of a negative weight may be stored in a second column (e.g., the column corresponding to the negative output current). The foregoing positive and negative output signal values may be taken together as a differential current value, providing useful information for making decisions.
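A small numeric sketch of this two-column, signed-weight scheme follows (an illustration under assumed values, not the device physics): the signed weight is split into two non-negative columns, and the difference of the column currents gives the signed result.

```python
import numpy as np

# Signed weights for one output pair, split into two non-negative columns.
w_pos = np.array([0.8, 0.0, 0.2])     # "positive" column cell values
w_neg = np.array([0.0, 0.3, 0.0])     # "negative" column cell values
x = np.array([0.2, 0.5, 0.0])         # non-negative input current values

i_out_pos = float(x @ w_pos)          # current summed on the positive column
i_out_neg = float(x @ w_neg)          # current summed on the negative column
differential = i_out_pos - i_out_neg  # signed output used for decisions
print(differential)
```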
Because each output current from the positive or negative transistor is wired to ground and is proportional to the product of the input current value and the positive or negative weight, respectively, the power consumption of the positive or negative transistor approaches zero when the input current value or the weight is at or near zero. That is, if the input signal value is zero, or if the weight is zero, the corresponding transistor of the analog multiplier array 300 consumes no power. This is significant because, in many neural networks, most of the values or weights are zero, especially after training. Thus, energy is saved when there is nothing to do or nothing happens. This is unlike differential-pair-based multipliers, which consume a constant current (e.g., a tail bias current) regardless of the input signal.
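To make the saving concrete, here is a back-of-the-envelope Python sketch (the per-cell current is a made-up placeholder, not a measured figure) that estimates consumption by counting the multiplier cells whose input and weight are both non-zero:

```python
import numpy as np

def active_multiplier_count(inputs, weights, eps=1e-9):
    # A cell draws current only when both its input and its weight are non-zero.
    products = inputs[:, None] * weights
    return int(np.count_nonzero(np.abs(products) > eps))

weights = np.array([[0.0, 0.8],
                    [0.5, 0.0],
                    [0.0, -0.3]])
inputs = np.array([0.2, 0.0, 0.7])

UNIT_CURRENT = 1.0   # hypothetical per-active-cell current, arbitrary units
power_estimate = UNIT_CURRENT * active_multiplier_count(inputs, weights)
print(power_estimate)  # 2 of 6 cells active: power scales with the non-zeros
```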
Referring now to fig. 4, a schematic diagram illustrating an unbiased, two-quadrant multiplier 400 of an analog multiplier array (such as the analog multiplier array 300) is provided, in accordance with some embodiments. As previously set forth, because each output current from the positive transistor (e.g., M1 of the two-quadrant multiplier 400) or the negative transistor (e.g., M2 of the two-quadrant multiplier 400) is proportional to the product of the input current value and the positive or negative weight, respectively, the power consumption of the positive or negative transistor approaches zero (or is zero) when the input current value or the weight approaches zero (or is zero). This is unlike differential-pair-based multipliers, which consume a constant current (e.g., a tail bias current) regardless of the input signal.
When sparsity (i.e., many zeros) is encouraged via training in a neural network composed of such unbiased, two-quadrant multipliers, significant power savings can be achieved. That is, a neural network in an analog multiplier array of a plurality of two-quadrant multipliers disposed in a memory sector of a neuromorphic IC may be trained so as to encourage sparsity in the neural network, thereby minimizing the power consumption of the neuromorphic IC. A subset of the weight values may even be set to zero before the neural network is trained, further encouraging sparsity in the neural network and minimizing the power consumption of the neuromorphic IC. In practice, the power consumption of the neuromorphic IC can be minimized to the point that the neuromorphic IC can operate on battery power.
Training the neural network may include training with a training algorithm configured to drive a large number of input current values, weight values, or a combination thereof toward zero for the plurality of multipliers, thereby encouraging sparsity in the neural network and minimizing the power consumption of the neuromorphic IC. The training may be iterative, and the weight values may be adjusted with each iteration of the training. The training algorithm may be further configured to track the rate of change of the weight value of each of the plurality of multipliers in order to drive the weight values toward zero. The rate of change of the weight values may be used to determine whether and how quickly certain weight values are trending toward zero, which may be used in training to drive those weight values toward zero faster, such as by programming the weight values to zero or approximately zero. Further, training the neural network and encouraging sparsity in the neural network may include minimizing a cost function that includes a count of the non-zero weight values. Minimizing the cost function may include using an optimization function comprising gradient descent, back propagation, or both gradient descent and back propagation. An estimate of the power consumption of the neuromorphic integrated circuit may be used as an integral part of the cost function.
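A minimal NumPy sketch of this trend-tracking step, under assumed semantics (the thresholds and the snap-to-zero rule are illustrative, not the patent's actual algorithm), follows: between training iterations, each weight's movement toward zero is measured, and weights that are both small and shrinking are programmed to exactly zero.

```python
import numpy as np

def snap_trending_weights(w_prev, w_curr, mag_thresh=0.05, rate_thresh=0.01):
    # Rate of change of each weight's magnitude; negative => trending to zero.
    rate = np.abs(w_curr) - np.abs(w_prev)
    trending = (rate < -rate_thresh) & (np.abs(w_curr) < mag_thresh)
    w_next = w_curr.copy()
    w_next[trending] = 0.0  # drive trending weights to zero to lock in sparsity
    return w_next

# Usage between two training iterations:
w_before = np.array([0.30, 0.06, -0.04, 0.50])
w_after  = np.array([0.29, 0.04, -0.02, 0.51])
print(snap_trending_weights(w_before, w_after))  # small, shrinking weights -> 0
```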
When programming a two-quadrant multiplier such as the unbiased, two-quadrant multiplier 400, it is common to erase each of its programmable cells (e.g., the cell including transistor M1 and the cell including transistor M2) to set the cells to one extreme weight value before setting each cell to its target weight value. This extends to a complete array such as the analog multiplier array 300, in which all programmable cells are set to one extreme weight value before each cell is set to its target weight value. When setting the cells to their target weight values, an overshoot problem arises if one or more cells are set to a weight value beyond the target: ordinarily, all cells in the complete array must be reset to the extreme weight value before being reprogrammed to their target weight values. However, the differential structure of each unbiased, two-quadrant multiplier of the analog multiplier arrays provided herein allows such overshoot to be compensated for by further programming, avoiding the time-consuming process of erasing and resetting all cells in the array.
In an example of compensating for overshoot by programming, the two cells of the two-quadrant multiplier 400 (e.g., the cell including transistor M1 and the cell including transistor M2) may be erased to set the cells to an extreme weight value. After erasing the cells, if the cell including M1 is programmed with an excessive weight value, the cell including M2 can be programmed with a weight value greater than its initial target to compensate for the overshoot and achieve the initially targeted differential weight. Thus, the differential structure can be utilized to compensate for programming overshoot without having to erase any one or more cells and start over.
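The compensation can be illustrated with a toy numeric model (an assumption for illustration, not the device physics): the effective weight is the difference between the two cells, programming is taken to only increase a cell's stored value, and an overshoot on one cell is cancelled by raising the other.

```python
def compensate_overshoot(w_plus, w_minus, target):
    """Restore an effective weight (w_plus - w_minus) after w_plus overshoots.

    Assumes programming can only increase a cell's value, standing in for the
    erase-then-program constraint of the flash cells.
    """
    effective = w_plus - w_minus
    if effective > target:
        w_minus += effective - target  # raise the opposing cell, no array erase
    return w_plus, w_minus

# Target differential weight 0.4, but the positive cell lands at 0.55:
w_plus, w_minus = compensate_overshoot(0.55, 0.0, 0.4)
print(w_plus - w_minus)  # 0.4, without erasing and resetting the whole array
```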
The foregoing systems and methods encourage sparsity in the neural network of the neuromorphic IC and minimize power consumption of the neuromorphic IC so that the neuromorphic IC can operate on battery power.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims.

Claims (20)

1. An integrated circuit, comprising:
a multi-layer neural network disposed in an analog multiplier array of a plurality of two-quadrant multipliers arranged in memory sectors of the integrated circuit, wherein at least one or more of the plurality of two-quadrant multipliers is an unbiased two-quadrant multiplier,
wherein, when the input signal value of the input signal to the transistors of the multipliers is at or near zero, the weight value of the transistors of the multipliers is at or near zero, or a combination thereof, each of the multiplier lines in the multipliers is grounded and draws a reduced amount of current, and
wherein sparsity in a neural network combined with a plurality of multipliers wired to ground reduces power consumption of the integrated circuit.
2. The integrated circuit of claim 1, wherein each of the multipliers does not draw current when an input signal value of an input signal to a transistor of the multiplier is zero, a weight value of a transistor of the multiplier is zero, or a combination thereof.
3. The integrated circuit of claim 1, wherein the weight value corresponds to a synaptic weight value between neural nodes in a neural network disposed in the integrated circuit.
4. The integrated circuit of claim 3, wherein the input signal values multiplied by the weight values provide output signal values that are combined to arrive at a decision for the neural network.
5. The integrated circuit of claim 1, wherein the transistor of the two-quadrant multiplier comprises a metal oxide semiconductor field effect transistor ("MOSFET").
6. The integrated circuit of claim 1, wherein each unbiased one of the two-quadrant multipliers has a differential structure configured to allow programmed compensation for overshoot if either of its two cells is set to a weight value beyond the target.
7. The integrated circuit of claim 1, wherein the integrated circuit is configured for one or more application-specific standard products ("ASSPs") selected from keyword spotting, speaker identification, one or more audio filters, gesture recognition, image recognition, video object classification and segmentation, and autonomous vehicles including drones.
8. The integrated circuit of claim 1, wherein the integrated circuit is configured to operate on battery power.
9. A method, comprising:
training a multi-layer neural network in an analog multiplier array of a plurality of two-quadrant multipliers disposed in a memory sector of an integrated circuit, wherein at least one or more of the plurality of two-quadrant multipliers is an unbiased two-quadrant multiplier;
wherein each of the multiplier lines is grounded and draws a first amount of current when an input signal value of an input signal to a transistor of the multiplier is zero or near zero, a weight value of a transistor of the multiplier is zero or near zero, or a combination thereof; and
the power consumption of the integrated circuit is reduced by encouraging sparsity in the neural network by training with a training algorithm configured to drive a plurality of input signal values, weight values, or combinations thereof toward zero or near zero for the multiplier.
10. The method according to claim 9, wherein:
when the input signal value of the input signal to the transistors of the multipliers is zero, the weight value of the transistors of the multipliers is zero, or a combination thereof, each of the multipliers does not draw current; and
each unbiased one of the two-quadrant multipliers has a differential structure configured to allow programmed compensation for overshoot if either of its two cells is set to a weight value beyond the target.
11. The method of claim 9, further comprising:
tracking the rate of change of the weight value of each of the multipliers during training; and
it is determined whether one or more weight values are tending to fall below a threshold or to zero, and how fast those one or more weight values are tending to fall below a threshold or to zero.
12. The method of claim 11, further comprising:
as part of encouraging sparsity in the neural network, the weight values are driven toward zero or near zero for those one or more weight values that are tending to be below a threshold or tending to zero during training.
13. The method of claim 11, wherein the weight value corresponds to a synaptic weight value between neural nodes in a neural network of the integrated circuit.
14. A method, comprising:
training a multi-layer neural network in an analog multiplier array of a plurality of two-quadrant multipliers disposed in a memory sector of an integrated circuit, wherein at least one or more of the plurality of two-quadrant multipliers is an unbiased two-quadrant multiplier;
wherein each of the multipliers is wired to ground and draws a first amount of current when an input signal value of an input signal to a transistor of the multiplier is zero or near zero, a weight value of a transistor of the multiplier is zero or near zero, or a combination thereof;
tracking the rate of change of the weight value of each of the multipliers during training;
determining whether one or more weight values are trending below a threshold or toward zero, and how fast those one or more weight values are trending below the threshold or toward zero; and
driving, for those one or more weight values that are trending below a threshold or toward zero, the weight values toward zero or near zero, thereby encouraging sparsity in the neural network.
15. The method according to claim 14, wherein:
when the input signal value of the input signal to the transistors of the multipliers is zero, the weight value of the transistors of the multipliers is zero, or a combination thereof, each of the multipliers does not draw current; and
each unbiased one of the two-quadrant multipliers has a differential structure configured to allow programmed compensation for overshoot if either of its two cells is set to a weight value beyond the target.
16. The method of claim 14, further comprising:
the subset of weight values is set to zero prior to training the neural network, thereby further encouraging sparsity in the neural network.
17. The method of claim 14, wherein training utilizes a training algorithm configured to drive a plurality of input signal values, weight values, or combinations thereof toward zero or near zero for the multipliers, thereby reducing the power consumption of the integrated circuit.
18. The method of claim 14, wherein training encourages sparsity in the neural network by minimizing a cost function that includes a count of the non-zero weight values.
19. The method of claim 14, further comprising:
minimizing a cost function with an optimization function, the optimization function comprising gradient descent, back propagation, or both gradient descent and back propagation;
wherein an estimate of the power consumption of the integrated circuit is used as an integral part of the cost function.
20. The method of claim 14, further comprising:
the integrated circuit is incorporated into one or more specialized standard products ("ASSPs") selected from keyword localization, speaker identification, one or more audio filters, gesture recognition, image recognition, video object classification and segmentation, and autonomous vehicles including unmanned aerial vehicles.
CN201880061175.XA 2017-07-21 2018-07-20 System and method for sparsity utilization Active CN111656360B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201762535705P 2017-07-21 2017-07-21
US62/535705 2017-07-21
PCT/US2018/043168 WO2019018811A1 (en) 2017-07-21 2018-07-20 Systems and methods of sparsity exploiting

Publications (2)

Publication Number Publication Date
CN111656360A CN111656360A (en) 2020-09-11
CN111656360B true CN111656360B (en) 2024-02-20

Family

ID=72348817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880061175.XA Active CN111656360B (en) 2017-07-21 2018-07-20 System and method for sparsity utilization

Country Status (1)

Country Link
CN (1) CN111656360B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5336937A (en) * 1992-08-28 1994-08-09 State University Of New York Programmable analog synapse and neural networks incorporating same
CN106796668A (en) * 2016-03-16 2017-05-31 香港应用科技研究院有限公司 For the method and system that bit-depth in artificial neural network is reduced
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN106650924A (en) * 2016-10-27 2017-05-10 中国科学院计算技术研究所 Processor based on time dimension and space dimension data flow compression and design method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yu-Hsin Chen et al., "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture, pp. 367-379. *
Brandon Reagen et al., "Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators," 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture, pp. 267-278. *

Also Published As

Publication number Publication date
CN111656360A (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN111742330B (en) Always-on keyword detector
US11373091B2 (en) Systems and methods for customizing neural networks
US11868876B2 (en) Systems and methods for sparsity exploiting
US20220157384A1 (en) Pulse-Width Modulated Multiplier
US20190065962A1 (en) Systems And Methods For Determining Circuit-Level Effects On Classifier Accuracy
US11423288B2 (en) Neuromorphic synthesizer
CN111656363A (en) Microcontroller interface for audio signal processing
US11880226B2 (en) Digital backed flash refresh
US20190026629A1 (en) Systems and Methods for Overshoot Compensation
US20240062056A1 (en) Offline Detector
CN111527502B (en) System and method for partial digital retraining
CN111656360B (en) System and method for sparsity utilization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant