WO1991019248A1 - Neural network using virtual-zero - Google Patents

Neural network using virtual-zero

Info

Publication number
WO1991019248A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
data
unit
address
zero
Prior art date
Application number
PCT/US1990/003067
Other languages
French (fr)
Inventor
Daniel W. Hammerstrom
Original Assignee
Adaptive Solutions, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Adaptive Solutions, Inc. filed Critical Adaptive Solutions, Inc.
Priority to JP2512682A priority Critical patent/JPH05501317A/en
Priority to EP19900913599 priority patent/EP0485522A4/en
Priority to PCT/US1990/003067 priority patent/WO1991019248A1/en
Publication of WO1991019248A1 publication Critical patent/WO1991019248A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Definitions

  • Address register 34 is operable to "map" memory to provide the location of the non-zero data blocks, as well as to provide "phantom" addresses for zero-value data blocks.
  • Referring to Fig. 6, the placement of non-zero data into memory 40 is depicted.
  • The first bit, 56, in register 36 is a zero, indicating that all of the data which would have gone into memory unit 40 was zero.
  • This data is "held" in virtual memory 54, which is really only an indicator that a bit in rotation register 36 is a zero.
  • Memory address unit 32 will generate zero data which will be sent to arithmetic unit 42 to be operated upon.
  • The second bit, 58, in register 36 contains a 1, which indicates that there is some non-zero data which is to be assigned to a corresponding section of memory 40. That data is conventionally stored in memory 40, and includes both zero and non-zero data. As an address from memory unit 40 is loaded into rotation register
  • The data, Vj, is retrieved from the memory unit, along with the weights, Wij, which are acted upon by arithmetic unit 42, resulting in Σ Wij · Vj.
  • One way of implementing the virtual-zero architecture is to take a neural-network processor which includes an array of general purpose registers and allocate certain registers in the general purpose register array as virtual-zero registers, to provide the function of address register 34, shift register 36 and counter 38.
  • This configuration allows the user to program the processor so that the number of bits in a data block may be varied to suit the particular application and to enable or to disable the virtual-zero feature.
  • the following code is a simplification of the code that describes the actual CMOS implementation of the Virtual PN in a neurocomputer chip.
  • the code shown below is in the C programming language embellished by certain predefined macros.
  • the code is used as a register transfer level description language in the actual implementation of the circuitry described here.
  • Bolded text indicates a signal, hardware or firmware component, or a phase or clock cycle.
  • The ph1 and ph2 variables simulate the two phases in the two-phase, non-overlapping clock used to implement dynamic MOS devices.
  • The post-fix "_D" on some signal names means a delayed version of the signal.
  • "_B" means a bus (more than one signal line).
  • "_1" means a dynamic signal that is only valid during ph1. These can be combined arbitrarily.
  • the virtual-zero registers are functionally a part of the register file and so reading and writing the register file to/from the virtual-zero register address (to/from the PN's internal buses) will provide access to these registers.
  • the data is read onto the internal bus 46, which is also designated as Abus.
  • reg_B[r_B] is the register address:

    if ((ph2) ANDb (vcval) ANDb
  • vcval indicates that the command signal, wtcnt_1, is valid, which indicates that the memory stride is to be added to the base (wtbse) register. Stride is the offset value for those increments which are gained through the memory unit:

    if ((ph1) ANDb (vz) ANDb (vcval) ANDb (wtinc_1)
  • The virtual-zero segment count is then decremented, as indicated by the "--" following vzreg_B. If the segment count goes to zero, the shift register is rotated and the count is reloaded:

    vzcnt_B = vzreg_B;
    tmp1 = (SIGNAL)(VZ3_B AND 0x1);
    VZ3_B = VZ3_B >> 1;
    tmp2 = (SIGNAL)(VZ2_B AND 0x1);
    ...
    VZ0_B = VZ0_B OR (BUS)(tmp1 << 15);
  • The virtual-zero registers are thus rotated during ph1, when vz is asserted and the segment count has gone to zero.
  • The preceding indicates whether the least significant bit in the virtual-zero shift register is 1 or 0.
  • vzlsb is tested by the memory base address, stride offset adder and the memory access unit. If vzlsb is not asserted (0), then the base address update is not performed and zeros are read from memory (or no write is performed) when wtinc_1 is asserted.
  • The wmUNIT contains the weight memory (and drivers and sense amps). Memory is read and written in ph2. The next address is computed in ph1 and, along with written data, is ph1 trapped. Note that the virtual-zero mechanism also works on writes: when the virtual-zero mechanism is on, the write simply doesn't occur.
  • The wtm_2 control signals that a write is to occur:

    if ((ph2) ANDb (vzoff) ANDb (vcval) ANDb (wtm_2)) {
        mp_B(wrbse_B) = abus_B2;
  • Both the high and low bytes are always read, and the LSB of the address is stored for the ph1 read-out.
  • Another function of the virtual-zero architecture structure relates to the architecture's ability to provide a selective input, during memory write, to the system incorporating the architecture.
  • the virtual-zero architecture enables a programmer to select a portion of an input for analysis.
  • Referring to Fig. 7, vector 60 includes segments 62-76. Each segment may comprise a predetermined number of bits, a word, etc. The size of the segment may be determined by a programmer, and only a portion of the input vector, comprising a specific number of segments, will be input to the array of PNs for analysis and processing. Exemplary portions of input vector 60 are indicated by brackets 78, 80 and 82. Bracket 78 indicates a specified portion of the input vector comprising three segments, while brackets 80 and 82 each encompass four segments. There may be a certain amount of overlapping between the specified portions; for instance, segment 68 is included in all of the specified portions.
  • The virtual-zero registers, 36a - 36d, are used to isolate the specified portions of input vector 60.
  • the isolated portions may have zero or non-zero values. If the portions are zero filled, the previously described virtual-zero mapping may be activated to conserve memory. Only the specified portion of input vector 60 will be stored in memory and subsequently operated on by the PNs.
  • This technique may be used, for instance, during certain types of image processing when it is desired to only look at a subset of the total input space. The subset may be selected by the virtual-zero write mechanism. Another use for this technique is to capture and store only a desired part of an input vector on each PN.
  • The invention thus provides a microcircuit architecture which conserves memory resources. In those cases where sparse connectivity is used, a number of zero connections are required.
  • a neural network processor provides a large number of processor nodes, which are relatively inexpensive to provide. Idle processor cycles are therefore not a major concern.
  • Memory, however, is relatively expensive, and large portions of memory with zero elements are neither efficient nor desirable.
  • the virtual-zero architecture provides a more efficient utilization of memory for those situations where connectivity is sparse or localized. Virtual-zeroes are intended to be used in neural network models where connection nodes have a limited receptive field size but where there is fairly total connectivity within the receptive field, or where there is sparse random connectivity.
  • the virtual-zero architecture creates an efficient sparse matrix organization.
  • the architecture assumes that zero weights indicate null connections. Therefore, during any weight update process, a test must be made to guarantee that the weight is not updated to be non-zero. This operation can be performed efficiently using conventional conditional execution techniques.
  • The architecture may also be used to provide a selective input to memory and processor nodes by restricting the input to memory of a selected portion of an input vector according to a selected program.
  • Industrial Application: Processors constructed according to the invention are useful in neural network systems which may be used to simulate human brain functions in analysis and decision-making applications.
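The selective-input mechanism described a few paragraphs above, in which only programmer-selected segments of an input vector are stored and processed, can be sketched in C. This is a behavioral model only: the segment layout, the 64-bit mask standing in for the virtual-zero registers, and the function name are assumptions of this sketch, not structures from the patent.

```c
#include <assert.h>
#include <stdint.h>

/* Store only the segments of the input vector whose mask bit is set
   (the role played by the virtual-zero registers during memory write).
   Returns the number of words actually stored. */
int select_segments(const int *in, int *out, int nseg, int segsz,
                    uint64_t mask) {
    int stored = 0;
    for (int s = 0; s < nseg; s++)
        if ((mask >> s) & 1)                        /* segment selected? */
            for (int i = 0; i < segsz; i++)
                out[stored++] = in[s * segsz + i];  /* keep its words    */
    return stored;
}
```

Only the selected words reach memory; deselected segments never occupy storage, which is the memory-conservation point made in the text.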

Abstract

A virtual-zero architecture is intended for use in a single instruction stream, multiple data stream (SIMD) processor which includes an input bus, an input unit, manipulation units, an output unit and an output bus. The virtual-zero architecture includes a memory unit (40) for storing data, an arithmetic unit (42) for mathematically operating on the data, a memory address generation unit (32) and an adder for computing a next memory address. The memory address generation unit (32) includes an address register (34) in the memory unit for identifying the address of a particular data block, a counter (38) for counting the number of memory addresses in a particular data block, and a rotation register (36) for providing a data-void address in the memory unit if and only if all of the entries in the data block are zero. The memory (40) and the address (32) units provide zero-value data blocks to the arithmetic unit (42) to simulate the data block having the data-void address during processing. The architecture may also be used to selectively handle input to a system.

Description

NEURAL NETWORK USING VIRTUAL-ZERO
Technical Field The instant invention relates to a computer processor architecture, and specifically to an architecture which provides a virtual-zero structure in the form of a selective data manipulation architecture to conserve memory usage in a sparsely filled matrix or to selectively handle input.
Background Art In neural networks, memory matrixes are defined as either full, which is generally considered to be between 20-30% and 100% filled with non-zero data, or sparse, which is defined as being less than 20-30% full. A computer memory matrix may be a 1,000 X 1,000 array, which is capable of holding 1,000,000 words of data. In the case of a sparse matrix, the memory may be only 5% full, containing only 50K words of actual data.
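The storage gap described above can be made concrete with a little arithmetic; the following C sketch uses only the example figures quoted in the passage (a 1,000 x 1,000 matrix at 5% density):

```c
#include <assert.h>

/* Illustrative storage arithmetic for the matrix sizes quoted in the
   text. All figures are the example numbers from the passage, not
   measurements of any real network. */
long dense_words(long rows, long cols) {
    return rows * cols;   /* every element, zero or not, occupies a word */
}

long sparse_words(long rows, long cols, long percent_full) {
    return rows * cols * percent_full / 100;   /* only the actual data */
}
```

At 5% density, 950,000 of the 1,000,000 words would hold nothing but zeros, which is the inefficiency the virtual-zero architecture targets.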
In the case of a neural network type of processor node, the data stored in memory is cycled on a clock, or cycle, basis, with every memory location being acted upon by processor units which manipulate data. Particularly in the case of a matrix that is only 5 to 10% full, a great deal of memory space is "filled" with zero information. From an efficiency standpoint, the occupation of hundreds of thousands of memory addresses with zero information is not acceptable. Memory is probably the most expensive component of computer technology.
There are several important practical problems that cannot be solved using existing, conventional algorithms executed by traditional, conventional computers. These problems are often incompletely specified and are characterized by many weak constraints requiring large search spaces.
The processing of primary cognitive information by computers, such as computer speech recognition, computer vision, and robotic control, falls into this category. Traditional computational models bog down to the point of failure under the computational load if they are tasked to solve these types of problems. Yet animals perform these tasks using neurons that are millions of times slower than transistors. Feldman's 100-step rule observes that a "human" cognitive process taking about 500 msec must be accomplished in roughly 100 steps, given a neuron switching time of about 5 msec. This implies that there are two vastly different computational models at work. It also suggests that in order to build computers that will do what nervous systems do, the computers should be structured more like nervous systems.
A nervous system, and a neurocomputational computer, is characterized by a continuous, non-symbolic, and massively parallel structure that is fault-tolerant of input noise and hardware failure. Representations, ie, the input, are distributed among groups of computing elements, which independently reach a result or conclusion, and which then generalize and interpolate information to reach a final output conclusion. Put another way, connectionist/neural networks search for "good" solutions using massively parallel computations of many small computing elements. The model is one of parallel hypothesis generation and relaxation to the dominant, or "most-likely", hypothesis. The search speed is more or less independent of the size of the search space. Learning is a process of incrementally changing the connection (synaptic) strengths, as opposed to allocating data structures. "Programming" in such a neural network is by example.
Disclosure of the Invention An object of the invention is to provide a processor architecture which eliminates the storage of superfluous zero-type data for neural network emulation.
Another object of the invention is to provide a processor which analyzes a data string for non-zero values and conventionally stores such a data string. A further object of the invention is to provide a processor which will generate "compressed" zero data strings to simulate zero-filled memory.
Still another object of the invention is to provide a computer architecture which will allow a manipulation unit to operate on selected portions of an input vector. The virtual-zero architecture of the invention is intended for use in a single instruction stream, multiple data stream (SIMD) processor which includes an input bus, an input unit, manipulation units, an output unit and an output bus. The virtual-zero architecture includes a memory unit for storing data, an arithmetic unit for mathematically operating on the data, a memory address generation unit with an adder for computing a next memory address. The memory address generation unit includes an address register in the memory unit for identifying the address of a particular data block, a counter for counting the number of memory addresses in a particular data block, and a shift register for providing a data-void address in the memory unit if and only if all of the entries in the data block are zero. The memory and the address unit provide zero-value data blocks to the arithmetic unit to simulate the data block having the data-void address during processing. These and other objects and advantages of the invention will become more fully apparent as the description which follows is read in conjunction with the drawings.
Brief Description of the Drawings Fig. 1 is a schematic diagram of a broadcast communication pattern of communication nodes contained within processor nodes of a SIMD architecture neural network.
Fig. 2 is a schematic, block diagram of a virtual-zero architecture which is part of a SIMD processor.
Fig. 3 is a block diagram of the various registers and manipulation units of the virtual-zero architecture of Fig. 1, shown in greater detail.
Fig. 4 represents a non-zero data block. Fig. 5 represents a zero-value data block. Fig. 6 is a block diagram of virtual-zero storage. Fig. 7 is a block diagram depicting a selective input of the virtual- zero architecture.
Best Mode For Carrying Out The Invention The virtual-zero architecture of the invention is primarily intended for use in single instruction stream, multiple data stream (SIMD) processors, which may be part of a neural computer for emulating a neural network. It should be understood that the virtual-zero architecture may be used in other types of processor units. The normal SIMD processor node includes an input unit, a logic unit, an addition unit, a multiplier unit, a register unit, an output unit, and a weight address memory unit, which are collectively referred to herein as manipulation units.
A single processor node (PN) may contain two or more connection nodes (CN) which provide data manipulation capabilities for the PN. A CN is a state associated with an emulated node in a neural network located in a PN. Each PN may have several CNs located therein. The PNs may broadcast to other PNs to transfer data and instructions.
Referring initially to Fig. 1, broadcast patterns in an array 10 of PNs which contain connection nodes 0-7 (12, 14, 16, 18, 20, 22, 24 and 26, respectively) are depicted. The CNs are arranged in "layers", with CN0 - CN3 comprising one layer, while CN4 - CN7 comprise a second layer. The array depicted would generally include four PNs, with CN0 and CN4 being located in a first PN, CN2 and CN5 being located in a second PN, etc. There may be more than two layers of connection nodes in any one processor node or in any array of processor nodes. The connection nodes operate in what is referred to as a broadcast hierarchy, wherein each of connection nodes 0-3 broadcast to each of connection nodes 4-7. An illustrative technique for arranging such a broadcast hierarchy is disclosed in U.S. Patent No. 4,796,199, NEURAL-MODEL INFORMATION-HANDLING ARCHITECTURE AND METHOD, to Hammerstrom et al., January 3, 1989, which is incorporated herein by reference.
Conceptually, the available processor nodes may be thought of as a "layer" of processors, each executing its function (multiply, accumulate, and increment weight index) for each input, on each clock, wherein one processor node broadcasts its output to all other processor nodes. By using the output processor node arrangement, it is possible to provide n² connections in n clocks using only a two-layer arrangement. Known, conventional SIMD structures may accomplish n² connections in n clocks, but require a three-layer configuration, or 50% more structure. The boundaries of the individual chips do not interrupt broadcast through processor node arrays, as the arrays may span as many chips as are provided in the architecture.
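The n² connections in n clocks claim follows directly from the broadcast schedule: on each of n clocks one PN broadcasts while all n PNs each perform one multiply-accumulate. A minimal counting sketch in C (the loop structure is illustrative, not the hardware's control logic):

```c
#include <assert.h>

/* Count the connections formed by the broadcast schedule described in
   the text: n clocks, one broadcast per clock, every PN consuming each
   broadcast, giving n * n connections with only two layers. */
int connections_in_n_clocks(int n) {
    int connections = 0;
    for (int clock = 0; clock < n; clock++)   /* one PN broadcasts       */
        for (int pn = 0; pn < n; pn++)        /* every PN multiplies and */
            connections++;                    /* accumulates its input   */
    return connections;
}
```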
In a sparse matrix, some of the connections between the CNs may not exist, and are represented by zero data. Referring now to Fig. 2, a virtual-zero architecture is depicted generally at 30. Architecture 30 includes a memory address generation unit 32 which further includes an address register 34, a rotation register 36 and a counter 38. Architecture 30 also includes a memory unit 40 and an arithmetic unit 42.
Information enters architecture 30 over an input bus 44.
Information moves between the memory address generation unit, the memory unit and the arithmetic unit on an internal bus 46. An output bus 48 is provided to transfer information from the virtual-zero architecture to other components of the processor node.
Referring now to Fig. 3, the architecture of Fig. 2 is shown in greater detail. Among the components in address register 34 are a read bus select module 34a and a write bus select module 34b, which determine whether rotation register 36 will be read or written to. In the preferred embodiment, rotation register 36 includes four 16-bit virtual-zero registers, which are designated VZ0_B, 36a, VZ1_B, 36b, VZ2_B, 36c, and VZ3_B, 36d. These virtual-zero registers are programmable to allow the user to set the size of a memory block which will be controlled by the virtual-zero register when the virtual-zero register is activated. At other times, the virtual-zero registers may function as would any conventional register. The most efficient way to divide memory using the virtual-zero architecture is to partition memory, for instance, into blocks of 64 words, each of which will be assigned to one bit of the rotation register. Given that, in the preferred embodiment, there are 4K words of memory and 64 bits in the four virtual-zero registers, the 64-word figure is a convenient division of the available memory. It should be appreciated that the available memory may be partitioned by the programmer in any desired configuration.
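The partitioning described above, 4K words split into 64 blocks of 64 words, each governed by one bit of the rotation register, implies a simple address-to-bit mapping. A hedged sketch (the function name and the assumption that blocks are laid out contiguously from address 0 are mine):

```c
#include <assert.h>

/* Map a memory address to the virtual-zero bit that governs it, assuming
   the preferred embodiment's figures: 4K words of memory, 64 bits across
   the four 16-bit VZ registers, hence 64-word blocks. */
enum { MEM_WORDS = 4096, VZ_BITS = 64, BLOCK_WORDS = MEM_WORDS / VZ_BITS };

int vz_bit_for_address(int addr) {
    return addr / BLOCK_WORDS;   /* which rotation-register bit rules addr */
}
```

Since the block size is programmable, a real configuration would substitute the programmer's chosen partition for BLOCK_WORDS.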
Counter 38 includes an increment counter, vzint, 38a, which operates with two 8-bit registers: vzcnt, 38b, and vzreg, 38c. The counter counts the size of the virtual-zero block and therefore determines the number of increments to be counted between rotations of rotation register 36. Put another way, it determines the number of memory references between rotations of register 36 and therefore determines the size of the virtual-zero block.
Memory unit 40 includes a weight offset unit 40a, which sets the stride of the virtual-zero architecture, and a weight base unit 40b, which determines the current address in memory. Figs. 4 and 5 represent exemplary data strings which, for purposes of illustration, each contain a sequence of four data words, each having 8 bits therein. Fig. 4 represents a non-zero data string, wherein at least some of the bits, such as bit 50, are non-zero values. The zero-value data block is exactly what the name implies: a data block that contains nothing but zero data bits, such as bit 52, in the words thereof.
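The interplay of the weight offset (stride) and weight base units amounts to simple stride addressing: each weight-base increment adds the offset to the current base to reach the next weight. A minimal sketch, with an illustrative function name of my own:

```c
#include <assert.h>

/* Stride-addressing sketch: the weight base (the role of unit 40b)
   advances by the stride held in the weight offset unit (40a) on each
   weight-base increment, yielding the address of the next weight. */
int next_weight_base(int wtbse, int wtoff) {
    return wtbse + wtoff;   /* new base address for the next weight */
}
```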
As previously noted, the virtual-zero mechanism has the effect of setting arbitrarily sized portions of memory to zero without actually using physical memory locations. When combined with zero weights, the virtual-zero architecture creates a sparse matrix organization. This does not eliminate the processors' repeated manipulation of zeros; rather, it reflects the judgment that simulating zero-filled memory is easier than solving the problem of idle processors.
The virtual-zero architecture effectively compresses zero-value data blocks by the following technique: During vector multiplication and accumulation, the SIMD program executes a weight base increment after each multiply-add. This operation adds the weight offset to the weight base address in memory unit 40 and creates a new base address for the next weight, Wij. When the virtual-zero function is "on", virtual-zero counter 38 is decremented at the same time as a weight base increment is performed. When counter 38 goes to 0, rotation register 36 is rotated and the virtual-zero counter (vzcnt 38b) is loaded with the size of the virtual-zero segment, vzreg 38c.
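The interplay of base increment, counter, and rotation can be modeled as a short C sketch. The structure and field names follow the prose description above and are illustrative only; they are not the chip's signal names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one weight-base-increment step with virtual-zero enabled.
 * rot models the 64-bit rotation register; its LSB governs the current block. */
typedef struct {
    uint64_t rot;     /* rotation register, LSB = current block flag */
    unsigned vzcnt;   /* countdown to the next rotation              */
    unsigned vzreg;   /* programmed segment size (reload value)      */
    unsigned wtbse;   /* weight base address                         */
    unsigned wtoff;   /* stride added on each increment              */
} vz_state;

static void wt_base_increment(vz_state *s)
{
    if (s->rot & 1)                       /* real block: advance the address   */
        s->wtbse += s->wtoff;
    if (--s->vzcnt == 0) {                /* end of segment: rotate and reload */
        s->rot = (s->rot >> 1) | (s->rot << 63);
        s->vzcnt = s->vzreg;
    }
}
```

During a zero segment (LSB 0) the base address stands still, so no physical weight storage is consumed for the segment.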
Each time rotation register 36 is rotated, the least significant bit (LSB - the right-most bit) of the lowest virtual-zero register, 36a - 36d, is checked. If the LSB is "1", the memory subsystem operates normally, i.e., the weight base is updated with the weight offset during each weight base increment operation, after each weight is read, and the weight memory reads or writes data normally.
If, however, the LSB of virtual-zero register 36 is "0", the memory subsystem creates virtual zeros, i.e., during a "read" function, the actual data read out of memory unit 40 is ignored, and zeros are placed on bus 46 and sent to arithmetic unit 42, just as if actual zeros had been stored in memory unit 40. During a "write" function, the data on bus 46, which is to be written to memory unit 40, is ignored and an actual "write" is not performed. Further, the normal update to the weight base is not executed during the weight base increment operation. Counter 38 is, however, decremented and rotation register 36 is still rotated.
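The gated read path just described can be sketched in one line of C. The function name and signature are hypothetical, for illustration only.

```c
#include <assert.h>

/* Gated read: when the governing LSB is 0, whatever the RAM array returns
 * is ignored and a zero is driven onto the bus instead. */
static int vz_read(const int *mem, unsigned addr, int vzlsb)
{
    return vzlsb ? mem[addr] : 0;
}
```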
The effect of this mechanism is to create blocks of zeros whose size is equal to the number of words in the virtual-zero segment. The existence of a virtual-zero segment is determined by the LSB of the lowest virtual-zero register.
If a non-zero-value data block, as depicted in Fig. 4, having any non-zero bit 50, is input over input bus 44, the non-zero values are detected by counter 38 and the entire block is stored in memory unit 40 as conventional data, including zero bits. Address register 34 is operable to "map" memory to provide the location of the non-zero data blocks, as well as to provide "phantom" addresses for zero-value data blocks.
Referring to Fig. 6, the placement of non-zero data into memory 40 is depicted. The first bit 56 in register 36 is a zero, indicating that all of the data which would have gone into the corresponding section of memory unit 40 was zero. This data is "held" in virtual memory 54, which is really only an indicator that a bit in rotation register 36 is a zero. As previously noted, during a read operation, memory address unit 32 will generate zero data which will be sent to arithmetic unit 42 to be operated upon.
The second bit 58 in register 36 contains a 1, which indicates that there is some non-zero data which is to be assigned to a corresponding section of memory 40. That data is conventionally stored in memory 40, and includes both zero and non-zero data. As an address from memory unit 40 is loaded into rotation register 36, it is analyzed for content. If the LSB is 0, and if the next bit is 1, counter 38 is set to 8 and then "reads" 8 zero words out of virtual memory - that memory which does not really exist. The arithmetic unit receives zero data and counter 38 is decremented by 1. When counter 38 reaches 0, rotation register 36 shifts; the new LSB, which is a 1, causes data to be read from real memory.
As the data blocks are processed, the data, Vj, is retrieved from the memory unit, along with the weights, Wij, which are acted upon by arithmetic unit 42, resulting in Σ Wij·Vj.
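A sketch of the resulting sparse accumulation follows. It is a software model under the assumption that only real (flagged-1) blocks occupy weight storage; the function and parameter names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Accumulate sum(Wij * Vj) over a vector whose weight blocks may be virtual
 * zeros: bit b of 'flags' == 0 means block b contributes nothing and consumes
 * no weight storage.  w_packed holds only the real blocks, back to back. */
static long vz_dot(const int *w_packed, const int *v, uint64_t flags,
                   unsigned blocks, unsigned block_len)
{
    long acc = 0;
    unsigned wi = 0;                      /* index into packed real weights */
    for (unsigned b = 0; b < blocks; ++b) {
        if (flags & (1ull << b)) {
            for (unsigned k = 0; k < block_len; ++k)
                acc += (long)w_packed[wi++] * v[b * block_len + k];
        }
        /* flag 0: skip - equivalent to multiplying by a block of zeros */
    }
    return acc;
}
```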
One way of implementing the virtual-zero architecture is in a neural-network processor which includes an array of general purpose registers, and allocating certain registers in the general purpose register array as virtual- zero registers, to provide the function of address register 34, shift register 36 and counter 38. This configuration allows the user to program the processor so that the number of bits in a data block may be varied to suit the particular application and to enable or to disable the virtual-zero feature.
The actual operation of the virtual-zero architecture may be described by the following instruction set, which, while presented in software form, would be incorporated into the physical design of the integrated circuit containing the virtual-zero architecture of the invention.
The following code is a simplification of the code that describes the actual CMOS implementation of the Virtual PN in a neurocomputer chip. The code shown below is in the C programming language embellished by certain predefined macros. The code is used as a register transfer level description language in the actual implementation of the circuitry described here. Bolded text indicates a signal, hardware or firmware component, or a phase or clock cycle.
The ph1 and ph2 variables simulate the two phases in the two-phase, non-overlapping clock used to implement dynamic MOS devices.
The post-fix "_D" on some signal names means a delayed version of the signal, "_B" means a bus (more than one signal line), and "_1" means a dynamic signal that is only valid during ph1. These can be combined arbitrarily. The virtual-zero registers are functionally a part of the register file, and so reading and writing the register file to/from the virtual-zero register addresses (to/from the PN's internal buses) will provide access to these registers.
The PN is instructed to read the register file:

if (ph1) rgrd_B = reg_B[r_B];

which, in reality, is an instruction to read virtual-zero shift register 36, which, as previously described, is a 64-bit register created out of four 16-bit registers.

if ((ph1) ANDb (r_B==F_VZ0)) rgrd_B = vz0_B;
if ((ph1) ANDb (r_B==F_VZ1)) rgrd_B = vz1_B;
if ((ph1) ANDb (r_B==F_VZ2)) rgrd_B = vz2_B;
if ((ph1) ANDb (r_B==F_VZ3)) rgrd_B = vz3_B;
if ((ph1) ANDb (r_B==F_VZCNT)) rgrd_B = vzcnt_B OR ((vzreg_B AND 0xFF) << 8);

When VZCNT is read, the lower byte (vzcnt_B) receives the region size, i.e., the number of base memory increment operations until the next shift, while the upper byte (vzreg_B) holds the current count until the next shift. The data is read onto the internal bus 46, which is also designated as Abus. As previously noted, there may be more than one bus provided in a given processor node, and, if so, the busses may be designated as Abus and Bbus, or some similar nomenclature.

if ((ph1) ANDb (vcval) ANDb (asrctl_B1==F_ABUSREG)) abus_B2 OR= rgrd_B;
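The byte packing used by the VZCNT read above can be illustrated with a plain-C pack/unpack pair. These helper names are hypothetical and exist only to demonstrate the layout.

```c
#include <assert.h>

/* VZCNT register image: low byte carries the segment size (vzcnt_B),
 * high byte the current count (vzreg_B). */
static unsigned vzcnt_pack(unsigned vzcnt, unsigned vzreg)
{
    return (vzcnt & 0xFF) | ((vzreg & 0xFF) << 8);
}

static void vzcnt_unpack(unsigned word, unsigned *vzcnt, unsigned *vzreg)
{
    *vzcnt = word & 0xFF;
    *vzreg = (word >> 8) & 0xFF;
}
```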
The register file is written to by first writing to the virtual-zero registers and then loading the actual register array, where reg_B[r_B] is the register address:

if ((ph2) ANDb (vcval) ANDb (rgctl_B2==F_RGABUS))
{
rgwr_B = abus_B2;
if (r_B==F_VZ0) vz0_B = rgwr_B;
if (r_B==F_VZ1) vz1_B = rgwr_B;
if (r_B==F_VZ2) vz2_B = rgwr_B;
if (r_B==F_VZ3) vz3_B = rgwr_B;
if (r_B==F_VZCNT) {
vzcnt_B = (rgwr_B AND 0xFF);
vzreg_B = (rgwr_B >> 8) AND 0xFF;
}
reg_B[r_B] = (rgwr_B AND 0xFFFF);
}

The preceding steps set up and initialize the virtual-zero registers. The data is available in the next clock only if the virtual-zero mode bit, vz, is enabled. vcval indicates that the command signal, wtinc_1, is valid, which indicates that the memory stride is to be added to the base (wtbse) register. Stride is the offset value for those increments which are gained through the memory unit:

if ((ph1) ANDb (vz) ANDb (vcval) ANDb (wtinc_1)) vzreg_B--;

The virtual-zero segment count is thus decremented, as indicated by the -- following vzreg_B. If the segment count goes to zero, the shift register is rotated:
if ((ph1) ANDb (vz) ANDb (vzreg_B==0)) {
vzreg_B = vzcnt_B;
tmp1 = (SIGNAL)(vz3_B AND 0x1);
vz3_B = vz3_B >> 1;
tmp2 = (SIGNAL)(vz2_B AND 0x1);
vz2_B = vz2_B >> 1;
vz2_B = vz2_B OR (BUS)(tmp1 << 15);
tmp1 = (SIGNAL)(vz1_B AND 0x1);
vz1_B = vz1_B >> 1;
vz1_B = vz1_B OR (BUS)(tmp2 << 15);
tmp2 = (SIGNAL)(vz0_B AND 0x1);
vz0_B = vz0_B >> 1;
vz0_B = vz0_B OR (BUS)(tmp1 << 15);
vz3_B = vz3_B OR (BUS)(tmp2 << 15);
}
vzlsb = (SIGNAL)(vz0_B AND 0x1);
The virtual-zero registers are rotated if ph1 is asserted, vz is set, and vzreg_B equals zero. vzlsb indicates whether the least significant bit in the virtual-zero shift register is 1 or 0. vzlsb is tested by the memory base address/stride offset adder and the memory access unit. If vzlsb is not asserted (0), then the base address update is not performed and zeros are read from memory (or no write is performed) when wtinc_1 is asserted.
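Viewed as a whole, the four-register shuffle above is a single 64-bit rotate. The following sketch models it on one uint64_t (vz3 in bits 63..48 down to vz0 in bits 15..0); the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One rotation of the 64-bit ring: every bit moves right by one position,
 * and the LSB of vz0 wraps around into the MSB of vz3. */
static uint64_t vz_rotate_right(uint64_t vz)
{
    return (vz >> 1) | (vz << 63);
}
```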
The waUNIT contains the weight memory address generation hardware. wtbse, the currently addressed memory location for the weight base, is updated only if there is no data on abus to load.

if (ph1) {
if ((vcval) ANDb (wtinc_1) ANDb NOTb(vz ANDb NOTb(vzlsb)))
{wtbse_B = wtoff_B + wtbse_B;}
}

However, if virtual-zero is on and vzlsb is clear, then wtbse is not updated by wtinc; that is, the stride (contained in wtoff_B) is not added. This keeps new indices from being generated during a zero segment (indicated by vzlsb==0). The wmUNIT contains the weight memory (and drivers and sense amps). Memory is read and written in ph2. The next address is computed in ph1 and, along with write data, is trapped in ph1. Note that the virtual-zero mechanism also works on writes: when the virtual-zero mechanism is on, the write simply does not occur. The virtual-zero condition is next evaluated:

vzoff = (vz XOR 1) OR vzlsb;

and written to memory, initially with the system in byte mode, and then in 2-byte mode. The wtm_2 control signals that a write is to occur.

if ((ph2) ANDb (vzoff) ANDb (vcval) ANDb (wtm_2)) { mpm_B(wtbse_B) = abus_B2; }
The following provides a read of memory, first in byte mode and then in 2-byte mode. A read, however, does not occur simultaneously with a write.

if (ph2) {
if (vzoff) wtmrd_B = mpm_B(wtbse_B);
else wtmrd_B = 0;
}
In ph2, both the high and low bytes are always read, and the LSB of the address is stored for read-out during ph1.
Another function of the virtual-zero architecture structure relates to the architecture's ability to provide a selective input, during memory write, to the system incorporating the architecture. In some situations, such as where only a portion of an input vector is desired to be analyzed, the virtual-zero architecture enables a programmer to select a portion of an input for analysis.
Referring now to Fig. 7, an input vector 60 is shown. Vector 60 includes segments 62-76. Each segment may comprise a predetermined number of bits, a word, etc. The size of the segment may be determined by a programmer, and only a portion of the input vector, comprising a specific number of segments, will be input to the array of PNs for analysis and processing. Exemplary portions of input vector 60 are indicated by brackets 78, 80 and 82. Bracket 78 indicates a specified portion of the input vector comprising three segments, while brackets 80 and 82 each encompass four segments. There may be a certain amount of overlap between the specified portions; for instance, segment 68 is included in all of the specified portions.
With the virtual-zero architecture in its write mode, and write bus select module 34b selected, the virtual-zero registers 36a - 36d are used to isolate the specified portions of input vector 60. The isolated portions may have zero or non-zero values. If the portions are zero filled, the previously described virtual-zero mapping may be activated to conserve memory. Only the specified portion of input vector 60 will be stored in memory and subsequently operated on by the PNs. This technique may be used, for instance, during certain types of image processing when it is desired to look at only a subset of the total input space. The subset may be selected by the virtual-zero write mechanism. Another use for this technique is to capture and store only a desired part of an input vector on each PN.
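The selective-write use can be sketched as follows, with a bit per segment standing in for the rotation-register contents. All names here are illustrative, not the chip's.

```c
#include <assert.h>
#include <stdint.h>

/* Selective write: only input segments whose select bit is 1 are stored;
 * the rest never reach memory.  Returns the number of words stored. */
static unsigned capture_segments(const int *in, int *stored, unsigned nseg,
                                 unsigned seg_len, uint64_t select)
{
    unsigned n = 0;
    for (unsigned s = 0; s < nseg; ++s)
        if (select & (1ull << s))
            for (unsigned k = 0; k < seg_len; ++k)
                stored[n++] = in[s * seg_len + k];
    return n;
}
```

With select = 0x5 and three two-word segments, only the first and third segments are kept, so memory holds four words rather than six.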
Thus, a microcircuit architecture has been disclosed which conserves memory resources. In those cases where sparse connectivity is used, a number of zero connections are required. A neural network processor provides a large number of processor nodes, which are relatively inexpensive to provide. Idle processor cycles are therefore not a major concern. However, memory is relatively expensive, and large portions of memory filled with zero elements are neither efficient nor desirable. The virtual-zero architecture provides a more efficient utilization of memory for those situations where connectivity is sparse or localized. Virtual-zeroes are intended to be used in neural network models where connection nodes have a limited receptive field size but fairly complete connectivity within the receptive field, or where there is sparse random connectivity. The effect of this mechanism is that arbitrarily sized portions of the weight memory space are set to zero without actually using zero memory locations. When combined with zero weights, the virtual-zero architecture creates an efficient sparse matrix organization. The architecture assumes that zero weights indicate null connections. Therefore, during any weight update process, a test must be made to guarantee that the weight is not updated to be non-zero. This operation can be performed efficiently using conventional conditional execution techniques. The architecture may also be used to provide a selective input to memory and processor nodes by restricting the input to memory of a selected portion of an input vector according to a selected program.
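The guarded weight update mentioned above can be sketched in one conditional. This is a minimal illustration of the invariant, not the processor's update rule; the function name is hypothetical.

```c
#include <assert.h>

/* Zero weights denote null connections, so a training update is guarded:
 * a zero weight must never be driven non-zero. */
static int update_weight(int w, int delta)
{
    return (w != 0) ? w + delta : 0;
}
```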
Although a preferred embodiment of the invention has been disclosed herein, it should be appreciated that variations and modifications may be made thereto without departing from the scope of the invention as defined in the appended claims.
Industrial Application

Processors constructed according to the invention are useful in neural network systems which may be used to simulate human brain functions in analysis and decision making applications.

Claims

WHAT I CLAIM IS:
1. In a single instruction stream, multiple data stream (SIMD) processor having an input unit and manipulation units, a selective data manipulation architecture for selecting portions of an input vector, containing data, to be manipulated comprising: a memory unit (40) for storing data; an arithmetic unit (42) for mathematically operating on the data; a memory address generation unit (32) including an address register (34) for identifying the address, in said memory unit (40), of a selected portion of a particular input vector; a counter (38) for counting the number of memory addresses in a particular input vector; a rotation register (36) for providing an address in said memory unit (40) of a selected portion of a particular input vector; and an adder for computing the next memory address; said memory address unit (32) providing the selected portion of particular input vectors to said arithmetic unit (42).
2. The architecture of claim 1 wherein said selected portions of the input vector comprise zero-value blocks (52) and wherein said rotation register (36) includes means for providing a data-void address in said memory unit if and only if all of the entries in the data block are zero.
3. The architecture of claim 2 wherein said memory address unit includes means for providing zero-value data blocks to said arithmetic unit to simulate the data blocks having the data void addresses during a memory read operation.
4. The architecture of claim 1 which includes means for predetermining which portions of the input vectors are selected for manipulation by the architecture.
5. The architecture of claim 1 wherein said memory unit includes a predetermined number of data storage blocks, and said rotation register (36) is partitioned into a like predetermined number of segments, each of which corresponds to a predetermined partition of said memory unit.
6. In a single instruction stream, multiple data stream (SIMD) processor having an input unit and manipulation units, a virtual-zero architecture for compressing zero-value data blocks comprising: a memory unit (40) for storing data; an arithmetic unit (42) for mathematically operating on the data; a memory address generation unit (32) including an address register (34) for identifying the address, in said memory unit, of a particular data block; a counter (38) for counting the number of memory addresses in a particular data block; a rotation register (36) for providing a data-void address in said memory unit if and only if all of the entries in the data block are zero; and an adder for computing the next memory address; said memory address unit (32) providing zero-value data blocks to said arithmetic unit (42) to simulate the data block having the data-void address.
7. In a single instruction stream, multiple data stream (SIMD) processor having an input unit and manipulation units, a selective data manipulation architecture for selecting portions of an input vector, containing data, to be manipulated comprising: a memory unit (40) for storing data; an arithmetic unit (42) for mathematically operating on the data; a memory address generation unit (32) including means for predetermining which portions of the input vectors are selected for manipulation by the architecture; an address register (34) for identifying the address, in said memory unit, of a selected portion of a particular input vector; a counter (38) for counting the number of memory addresses in a particular input vector; a rotation register (36) for providing an address in said memory unit (40) of a selected portion of a particular input vector; and an adder for computing the next memory address; said memory address unit (32) providing the selected portion of particular input vectors to said arithmetic unit.
8. The architecture of claim 7 wherein said memory unit (40) includes a predetermined number of data storage blocks, and said rotation register (36) is partitioned into a like predetermined number of segments, each of which corresponds to a predetermined partition of said memory unit.
PCT/US1990/003067 1990-05-30 1990-05-30 Neural network using virtual-zero WO1991019248A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2512682A JPH05501317A (en) 1990-05-30 1990-05-30 Neural network using virtual zero values
EP19900913599 EP0485522A4 (en) 1990-05-30 1990-05-30 Neural network using virtual-zero
PCT/US1990/003067 WO1991019248A1 (en) 1990-05-30 1990-05-30 Neural network using virtual-zero

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US1990/003067 WO1991019248A1 (en) 1990-05-30 1990-05-30 Neural network using virtual-zero

Publications (1)

Publication Number Publication Date
WO1991019248A1 true WO1991019248A1 (en) 1991-12-12

Family

ID=22220891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1990/003067 WO1991019248A1 (en) 1990-05-30 1990-05-30 Neural network using virtual-zero

Country Status (3)

Country Link
EP (1) EP0485522A4 (en)
JP (1) JPH05501317A (en)
WO (1) WO1991019248A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11816480B2 (en) 2016-10-27 2023-11-14 Google Llc Neural network compute tile
US11816045B2 (en) 2016-10-27 2023-11-14 Google Llc Exploiting input data sparsity in neural network compute units

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4192010A (en) * 1977-11-28 1980-03-04 Kerner William R Data reduction system
US4558302A (en) * 1983-06-20 1985-12-10 Sperry Corporation High speed data compression and decompression apparatus and method
US4807168A (en) * 1987-06-10 1989-02-21 The United States Of America As Represented By The Administrator, National Aeronautics And Space Administration Hybrid analog-digital associative neural network
US4907194A (en) * 1984-12-19 1990-03-06 Nec Corporation String comparator for searching for reference character string of arbitrary length

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3919534A (en) * 1974-05-17 1975-11-11 Control Data Corp Data processing system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4192010A (en) * 1977-11-28 1980-03-04 Kerner William R Data reduction system
US4558302A (en) * 1983-06-20 1985-12-10 Sperry Corporation High speed data compression and decompression apparatus and method
US4558302B1 (en) * 1983-06-20 1994-01-04 Unisys Corp
US4907194A (en) * 1984-12-19 1990-03-06 Nec Corporation String comparator for searching for reference character string of arbitrary length
US4807168A (en) * 1987-06-10 1989-02-21 The United States Of America As Represented By The Administrator, National Aeronautics And Space Administration Hybrid analog-digital associative neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP0485522A4 *


Also Published As

Publication number Publication date
EP0485522A1 (en) 1992-05-20
EP0485522A4 (en) 1993-08-04
JPH05501317A (en) 1993-03-11

Similar Documents

Publication Publication Date Title
US5369773A (en) Neural network using virtual-zero
US5175858A (en) Mechanism providing concurrent computational/communications in SIMD architecture
US7237091B2 (en) Multiprocessor computer architecture incorporating a plurality of memory algorithm processors in the memory subsystem
US5524175A (en) Neuro-computer system for executing a plurality of controlling algorithms
US5204938A (en) Method of implementing a neural network on a digital computer
EP3698313A1 (en) Image preprocessing for generalized image processing
EP0022622A1 (en) Programmable controller
Krikelis et al. Associative processing and processors
EP0223690A2 (en) Processor array with means to control cell processing state
Zahedi et al. Tile architecture and hardware implementation for computation-in-memory
JPS5926059B2 (en) control circuit
EP0578361B1 (en) Digital signal processing apparatus
WO1991019248A1 (en) Neural network using virtual-zero
Peroni et al. ALook: Adaptive lookup for GPGPU acceleration
Zhou et al. Dp-sim: A full-stack simulation infrastructure for digital processing in-memory architectures
Kerckhoffs et al. Speeding up backpropagation training on a hypercube computer
US5855010A (en) Data processing apparatus
Chow et al. A systolic array processor for biological information signal processing
Tavangarian Flag-oriented parallel associative architectures and applications
Gutiérrez et al. Hardware and software architecture for implementing membrane systems: A case of study to transition P systems
Siewiorek Introducing ISP
Brookes et al. Introduction to Occam 2 on the Transputer
Radivojevic et al. High-performance DSP architectures for intelligence and control applications
CA1271259A (en) Simulation system
EP0485594A1 (en) Mechanism providing concurrent computational/communications in simd architecture

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR CA CH DE DK ES FI GB HU JP KP KR LK LU MC MG MW NL NO RO SD SE SU US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BF BJ CF CG CH CM DE DK ES FR GA GB IT LU ML MR NL SE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 1990913599

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1990913599

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

WWR Wipo information: refused in national office

Ref document number: 1990913599

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1990913599

Country of ref document: EP