AU3616700A - Two architectures for integrated realization of sensing and processing in a single device - Google Patents

Two architectures for integrated realization of sensing and processing in a single device

Info

Publication number
AU3616700A
Authority
AU
Australia
Prior art keywords
array
cell
output
sensor
cells
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU36167/00A
Inventor
Gamze Erten
Fathi M. Salam
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Clarity LLC
Original Assignee
Clarity LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Clarity LLC
Publication of AU3616700A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Amplifiers (AREA)
  • Image Processing (AREA)
  • Transforming Light Signals Into Electric Signals (AREA)
  • Semiconductor Integrated Circuits (AREA)

Description

WO 00/52639 PCT/US00/05785

TWO ARCHITECTURES FOR INTEGRATED REALIZATION OF SENSING AND PROCESSING IN A SINGLE DEVICE

FIELD OF THE INVENTION

The present invention relates to an integrated sensing and processing device. More specifically, the present invention relates to an architecture for the integrated realization of sensing and processing within a single device.

BACKGROUND

Currently, Cellular Neural Networks (CNN) build a paradigm based on a set of canonical differential equations (Equations 1-3 below) that define a set of nonlinear dynamic interactions between cells in a (usually two-dimensional) grid. There are several degrees of freedom in setting up these interactions: the feedback and feedforward weights (the values A and B in Equation 1), the initial conditions of the states (x(t=0) in Equation 1), and the bias currents (I in Equation 3). A CNN is in fact programmed by manipulating these values. The nonlinearity comes from the nonlinear nature of the function f in Equation 2, which can take on different shapes and characteristics. Programming a particular function then reduces to finding the combination of A, B, x(t=0), and I that yields the desired outputs (y) given a pattern of inputs (u). This type of programmability is quite flexible.

CNN is a hybrid of Cellular Automata and Neural Networks (hence the name Cellular Neural Networks) and it incorporates the best features of both concepts. Like Neural Networks, its continuous-time feature allows real-time signal processing, and like Cellular Automata, its local interconnection feature makes physical realization in VLSI possible. Its grid-like structure is suitable for the solution of a high-order system of first-order nonlinear differential equations on-line and in real time. In summary, CNN can be viewed as an analog nonlinear dynamic processor array. The basic unit of CNN is called a cell.
Each cell receives input from its immediate neighbors (and from itself via feedback), and also from external sources (e.g., the sensor array points and/or previous layers). The canonical CNN equation summarizes these relationships:

$$\tau \dot{x}_{ij}(t) = -x_{ij}(t) + \sum_{kl \in N_r(ij)} A_{kl}\big(y_{kl}(t), y_{ij}(t)\big) + \sum_{kl \in N_r(ij)} B_{kl}\big(u_{kl}(t), u_{ij}(t)\big) + I_{ij}$$ (Equation 1)

$$y_{ij} = f(x_{ij})$$ (Equation 2)

and

$$I_{ij} = I$$ (Equation 3)

where u represents the input, x represents the state, and y represents a nonlinear function of the state associated with a cell (or neuron), and A and B represent the cloning templates.

In a typical CNN, local connections between the neighbors (feedback weights, or the entries of the matrix A in Equation 1), along with connections from the sensor array (input weights, or entries of the matrix B in Equation 1), form the programmable cloning templates. Cloning templates to perform numerous types of visual processing tasks have been developed. Each template is specific to a particular application, e.g., a cloning template for edge detection or binocular stereo. Cellular neural networks are attractive in image processing because of their programmability: one needs to change only the template to perform a different iconic task.

Despite such flexibility, critical problems arise when one tries to implement CNN in electronic circuits. The CNN model described in Equations (1-3) is not suitable for direct VLSI implementation. In an integrated circuit implementation of the CNN model, the summation is a current-based computation, as the circuit model in Figure 1 suggests. By Kirchhoff's current law, all currents coming into the node that defines the state of the cell (x) must add to zero. As the intrinsic resistance values (R) are very large and the capacitance values are very small, little charge is required to maintain a particular voltage.
This also means that the current required to alter the voltage value of the state is very small, so noise currents are large enough to make a significant difference. When one adds the fact that transistor characteristics can generally vary by as much as 20% within the same chip substrate, it becomes clear that the (x) node is highly likely to charge up or down to a power rail. One remedy is to add significant capacitance to each node. This is undesirable, since it consumes precious VLSI real estate and also increases the response time of the cell. The VLSI implementations of the current CNN models have not addressed this issue, and as such, the current CNN model described in Equations (1-3) is not suitable for direct VLSI implementation.

SUMMARY

An integrated sensing device comprises an array of sensor processor cells capable of being arranged into a detection array. Each sensor processor cell comprises a sensing medium; at least one transconductance amplifier configured for feedforward template multiplication; at least one transconductance amplifier configured for feedback template weights; a plurality of local dynamic memory cells; a data bus for data transfer; and a local logic unit. The array of sensor processor cells, by responding to data control signals, is capable of transforming, reshaping, and modulating the original sensed image into varied representations which include (and extend) traditional spatial and temporal processing transformations.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example in the following drawings in which like references indicate similar elements. The following drawings disclose various embodiments of the present invention for purposes of illustration only and are not intended to limit the scope of the invention.
Figure 1 (Prior Art) illustrates an existing embodiment of an architecture of a physical realization of the cellular neural network canonical equations.

Figure 2 illustrates an embodiment of an adaptation of a dynamic cellular computation model to the VLSI domain in accordance with the teachings of one embodiment of the present invention.

Figure 3 illustrates an alternate embodiment of an adaptation of a dynamic cellular computation model to the VLSI domain in accordance with the teachings of one embodiment of the present invention.

Figure 4 illustrates an alternate embodiment of an adaptation of a dynamic cellular computation model to the VLSI domain employing analog multipliers in accordance with the teachings of one embodiment of the present invention.

Figure 5 illustrates an embodiment of an operational process in flow chart format for a dynamic cellular architecture in accordance with the teachings of one embodiment of the present invention.

Figure 6 illustrates an embodiment of a cellular architecture element or cell in accordance with the teachings of one embodiment of the present invention.

Figure 7 illustrates an embodiment of the layout of a CMOS chip comprising several types of cellular units in accordance with the teachings of one embodiment of the present invention.

Figure 8 illustrates an embodiment of a one input - one output cell with initialization in accordance with the teachings of one embodiment of the present invention.

Figure 9 illustrates an embodiment of a programmable type feedback cell in accordance with the teachings of one embodiment of the present invention.

Figure 10 illustrates an embodiment of positive and negative feedback hardwired and state initializable cells in accordance with the teachings of one embodiment of the present invention.
Figure 11(a) illustrates an embodiment of two (+/-) hardwired feedback and floating state cells in accordance with the teachings of one embodiment of the present invention.

Figure 11(b) illustrates one embodiment of a one feedforward only cell in accordance with the teachings of one embodiment of the present invention.

Figure 12 illustrates an embodiment of digital input accommodation via parallel transistor cascading in accordance with the teachings of one embodiment of the present invention.

Figure 13 illustrates an embodiment of a feedforward cell in accordance with the teachings of one embodiment of the present invention.

Figure 14 illustrates an embodiment of the results from a three input one output cell with no feedback connections in accordance with the teachings of one embodiment of the present invention.

Figure 15 illustrates an embodiment of a dual distributed cellular architecture in accordance with the teachings of one embodiment of the present invention.

Figure 16 illustrates an embodiment of the layout of a signed sensor in accordance with the teachings of one embodiment of the present invention.

Figure 17 illustrates an embodiment of a dual output light sensing pixel in accordance with the teachings of one embodiment of the present invention.

Figure 18 illustrates an embodiment of a schematic diagram of a cell of a programmable convolution array (PCA) in accordance with the teachings of one embodiment of the present invention.

Figure 19 illustrates an embodiment of the layout of a single cell of the programmable convolution array (PCA) in accordance with the teachings of one embodiment of the present invention.

Figure 20 illustrates an embodiment of the outputs of the programmable convolution array (PCA) elements in accordance with the teachings of one embodiment of the present invention.
Figure 21 illustrates an embodiment of a 5 x 5 programmable convolution array (PCA) with I/O pads in accordance with the teachings of one embodiment of the present invention.

Figure 22 illustrates an embodiment of an operational process in flow chart format for a programmable convolution array (PCA) of the dual distributed architecture in accordance with the teachings of one embodiment of the present invention.

Figure 23 illustrates an embodiment of an element of the programmable cellular logic array (PCLA) in accordance with the teachings of one embodiment of the present invention.

Figure 24 illustrates an embodiment of an operational process in flow chart format for a programmable cellular logic array (PCLA) of the dual distributed architecture in accordance with the teachings of one embodiment of the present invention.

Figure 25 illustrates an embodiment of an operational process in flow chart format for using the programmable convolution array (PCA) and programmable cellular logic array (PCLA) of the dual distributed architecture cooperatively in accordance with the teachings of one embodiment of the present invention.

Figure 26 illustrates an embodiment of a computing environment in which the invention may be implemented in accordance with the teachings of one embodiment of the present invention.

Figure 27 illustrates an embodiment of a network environment in which the invention may be implemented in accordance with the teachings of one embodiment of the present invention.

DETAILED DESCRIPTION

The following detailed description sets forth numerous specific details to provide a thorough understanding of the invention. However, those of ordinary skill in the art will appreciate that the invention may be practiced without these specific details.
In other instances, well-known methods, procedures, protocols, components, algorithms, and circuits have not been described in detail so as not to obscure the invention.

In one embodiment, the steps or process of the present invention are embodied in machine-executable instructions, such as computer instructions. The instructions can be used to cause a general-purpose or special-purpose processor that is programmed with the instructions to perform the steps of the present invention. Alternatively, the steps of the present invention might be performed by specific hardware components that contain hardwired logic for performing the steps, or by any combination of programmed computer components and custom hardware components.

The present invention is generally directed to a new cellular network architecture that can be implemented successfully using electronic circuits and integrated microelectronic chips, and to a new distributed architecture comprising two separate arrays, namely a programmable convolution array (PCA) and a programmable logic array (PLA), for processing one-, two-, or three-dimensional array (physical) sensor data.

DYNAMIC CELLULAR ARCHITECTURE

One embodiment of the present invention introduces a self-normalizing feedback structure for the computation of the inputs and feedforward template. Similarly, the present invention also introduces the same structure to the feedback portion of the architecture. These modifications create a new set of equations, and correspondingly a new set of dynamics, and thus represent a new structure, form, and architecture. The feedback in the feedforward input weight multiplication process and the feedback in the feedback weight multiplication are employed to keep the state node (x) from quickly accumulating or losing charge and saturating to a power rail. Figure 2 illustrates one embodiment of the present invention implemented in circuitry.
Figure 2(a) illustrates one embodiment of a self-regulating mechanism, wherein rather than multiplying the input by B (as in the prior art), a function of the input minus the state is defined per node and that quantity is multiplied by B. As such, the stabilizing and self-scaling feedback elements around an amplifier keep the output of the amplifier from saturating. The gain of the amplifier is manipulated to obtain a factor which is a scaled (normalized) version of B. In this way, the output is consistently the input times a normalized version of B, and the plural (collective) inputs from the cell neighborhood will not cause the cell's aggregate state or output to saturate or be overdriven by slight fluctuations of the input. Figure 2(b) illustrates one embodiment of an adaptation of the dynamic cellular computation model to the VLSI domain using the concept shown in Figure 2(a).

In one embodiment, it is assumed that all input weights (all entries of B) are nonnegative and all feedback weights (all entries of A) are nonpositive. Within the circuit implementation shown in Figure 2(b): (1) both u (unsigned) and x (signed) are represented as analog voltages u and x; (2) template weights can be digitally stored and are programmable; and (3) the transfer function f( ), such as that illustrated in Equation 2, can in fact become a programmable tri-level saturation function f'( ) that produces zero output in a local region of the input (rather than only at a point). The particular equations that incorporate the characteristics of the circuit structure of Figure 2 are listed below.

$$C \dot{x}_{ij}(t) = -\frac{x_{ij}(t)}{R} + \sum_{kl \in N_r(ij)} A_{kl}\, y_{kl}(t) + \sum_{kl \in N_r(ij)} B_{kl} \tanh\left[\frac{q\kappa}{2kT}\big(u_{kl}(t) - x_{ij}(t)\big)\right] + I_{ij}$$ (Equation 4)

$$y_{ij} = f(x_{ij}) = \tanh\left(\frac{q\kappa}{2kT}\big(x_{ij} - V_{ref}\big)\right)$$ (Equation 5)

and

$$I_{ij} = I$$ (Equation 6)

where the hyperbolic tangent (tanh) describes a basic transconductance circuit.
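The self-normalizing behavior described for Equation 4 can be explored numerically. The following is a minimal sketch, not the circuit itself: it integrates the Equation 4 dynamics with a forward-Euler step in normalized units. The function name `simulate_cell_array`, the parameter `g` (standing in for the factor multiplying the voltage difference inside tanh), the zero-padded borders, and all numeric values are assumptions made for illustration only.

```python
import numpy as np


def simulate_cell_array(u, A, B, x0, bias=0.0, C=1.0, R=1.0, g=1.0,
                        v_ref=0.0, dt=0.01, steps=2000):
    """Forward-Euler integration of the Equation 4 dynamics on a 2-D grid.

    u, x0 : 2-D arrays of inputs and initial states.
    A, B  : square templates of odd size (feedback and feedforward weights).
    g     : assumed normalized transconductance factor inside tanh.
    Cells outside the array are treated as zero (an assumption).
    """
    x = x0.astype(float).copy()
    r = A.shape[0] // 2                      # neighborhood radius
    rows, cols = x.shape
    u_pad = np.pad(u.astype(float), r)       # zero padding at the borders
    for _ in range(steps):
        # Equation 5: output is a tanh of the state relative to v_ref.
        y_pad = np.pad(np.tanh(g * (x - v_ref)), r)
        dx = -x / R + bias                   # leak term plus bias current
        for dk in range(-r, r + 1):
            for dl in range(-r, r + 1):
                y_kl = y_pad[r + dk:r + dk + rows, r + dl:r + dl + cols]
                u_kl = u_pad[r + dk:r + dk + rows, r + dl:r + dl + cols]
                dx += A[dk + r, dl + r] * y_kl
                # Feedforward term: B weights a function of (input - state).
                dx += B[dk + r, dl + r] * np.tanh(g * (u_kl - x))
        x += (dt / C) * dx
    return x


# Illustrative run: uniform input, no feedback, nonnegative B.
u = np.full((5, 5), 0.5)
A = np.zeros((3, 3))
B = np.full((3, 3), 0.5)
x = simulate_cell_array(u, A, B, x0=np.zeros((5, 5)))
print(x.round(3))
```

In this run the states stay between zero and the input level even though the template entries sum to 4.5, which is the kind of bounded, non-saturating behavior the negative feedback structure is meant to provide.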
Moreover, the differential equations need to be initialized from the cells' initial conditions x_ij(0).

We observe that the capacitor C and the resistor R may be due to the parasitics of the micro-electronic implementation process. However, in cases where a larger capacitance than the parasitics is desired in order to slow down or reshape the temporal processing, capacitances on the order of picofarads may be implemented on chip. Any larger capacitance may be implemented off chip, with the likely consequent cost of limiting the size of the cellular array.

In one embodiment, the input (feedforward) weights (B) are implemented in a negative feedback structure. Accordingly, in this implementation, the feedback structure stabilizes the state (x) node, which otherwise has a tendency to charge all the way up or down to a rail limited by the power supply, and the structure is far more resilient to manufacturing-related issues such as transistor mismatches. In the embodiment of Figure 2, the modified dynamic cellular architecture model is shown, wherein some assumptions have been made about the polarity of inputs and weights, such as B being nonnegative and A being nonpositive.

Figure 3 illustrates an alternate embodiment, with a revision or modification that enhances the implementation and scales the contributions of the states as well. Accordingly, the particular equations that represent the circuit architecture, which is implementable in integrated micro-electronic chips, are:

$$C \dot{x}_{ij}(t) = -\frac{x_{ij}(t)}{R} + A_{ij}\, y_{ij}(t) + \sum_{kl \in N_r(ij)} A_{kl}\, y_{kl}(t) + \sum_{kl \in N_r(ij)} B_{kl} \tanh\left[\frac{q\kappa}{2kT}\big(u_{kl}(t) - x_{ij}(t)\big)\right] + I_{ij}$$ (Equation 7)

$$f(x_{ij}) = \tanh\left(\frac{q\kappa}{2kT}\big(V_{ref} - x_{ij}\big)\right)$$ (Equation 8)

$$y_{kl} = f(x_{kl}) = \tanh\left(\frac{q\kappa}{2kT}\big(x_{kl} - [\ldots]\big)\right)$$

and

$$I_{ij} = \tanh\left(\frac{q\kappa}{2kT}\big(V\,[\ldots]\big)\right)$$ (Equation 9)

where the hyperbolic tangent (tanh) represents a basic (or wide-range) transconductance amplifier circuit.
In one embodiment, the indices kl run over a neighborhood of the cell location ij for the feedback term A, and likewise over a neighborhood of the location ij in the feedforward term associated with the B parameters. As before, the input (feedforward) weights (B) are implemented in a negative state feedback structure. Accordingly, in this implementation, the feedback structure stabilizes the state (x) node, which otherwise has a tendency to charge all the way up or down to a rail limited by the power supply, and the present structure is far more resilient to manufacturing-related issues such as transistor mismatches. As such, Figure 3 illustrates this modified dynamic cellular architecture model, wherein some assumptions have been made about the polarity of inputs and weights, such as each element of B and A being nonnegative.

Figure 4 illustrates yet another embodiment that employs analog multipliers (rather than amplifiers with gain). As such, the feedback-connected amplifiers of the embodiment of Figure 3 (illustrated as Figure 4(a)) may be replaced by analog multipliers to implement weights of positive or negative sign, as illustrated in Figure 4(b). Moreover, digital gains can also be built by modifying the analog multiplier's gain with the digital input accommodation via parallel transistor cascading method shown in Figure 9. By adding a multiplier that is capable of sign representation, the restrictions on template weights are eliminated.

Dynamic Cellular Architecture Process

Figure 5 illustrates one embodiment of the operational process of the Dynamic Cellular Architecture illustrated above. In one embodiment, before the process begins, the necessary A, B, x(0), and bias values for all of the operations to be performed are determined.
Accordingly, in one embodiment, a program is made of the operations needed to complete the task, such that each step of the program specifies an operation, where each operation is defined by the pattern of A, B, x(0), and bias values to be applied to the cellular architecture. In one embodiment, it is possible to simply obtain these values from the results of previous operations. In one embodiment, the program may be stored externally or on the same chip where the architecture is implemented. In an alternate embodiment, each cell of the architecture may contain its own program memory.

Referring back to Figure 5, one embodiment of the operational process of the Dynamic Cellular Architecture illustrated above may be implemented through the following process:

1. START
2. If the previous operation is complete or if new data is needed, get new inputs. These inputs (values of the nodes labeled u) may come from continuous or discretely sampled data from sensors built into the architecture, or alternately, they can be scanned into the input nodes (u) from an external data source by employing an appropriate scanning mechanism.
3. Apply the A, B, and bias to the correct nodes, either externally or from internal stored program memory.
4. Apply the initial states (x(t=0)) to the state nodes.
5. Allow sufficient time for the computation to complete, i.e., for the electronic circuits to reach a point where the states have stopped changing or have sufficiently responded to the input stimulus. For active sensors, which do not produce a static output, there may be a specific optimum or desired time instant to sample their output. Another option is that the output itself is a time-variant waveform.
6. Read and store the states (x) or the output nodes (y). The state or output can be stored externally or internally per node.
7. If end of program, go to (END).
8. Go to (2).
9.
END

As such, the aforementioned process steps illustrate one embodiment of the operational process of the Dynamic Cellular Architecture.

Dynamic Cellular Architecture - Sample Implementation

Figure 6 illustrates one embodiment of a pragmatic implementation of the Dynamic Cellular Architecture, as a simple 5x5 array. Figure 6 illustrates an integrated light sensing and processing architecture implemented within a cellular chip based on the dynamic cellular architecture of this invention. It is understood that the inventive concepts of the present invention may be applied to a variety of different technologies and implemented in a variety of different physical architectures, and as such, the present invention is not meant to be limited to integrated light sensing and processing architectures.

In one embodiment, the integrated light sensing and processing architecture based on the dynamic cellular architecture of this invention comprises an array of identical sensor processor cells, each of which contains: (1) a photodiode and circuits for active light sensing; (2) transconductance amplifiers for feedforward template weight multiplication; (3) wide-range transconductance amplifiers for feedback template weights; (4) analog/single-bit digital local dynamic memory cells; (5) a data bus for transfers; (6) local programmable logic; and (7) read/write data controls.

In the embodiment of the structure illustrated by Figure 6, a variety of operations of the cellular paradigm may be performed using, for instance, a pair of 3 x 3 cloning templates, defined by the A and B of Equations (4) or (7). Generally, each operation can be completed in a short time, since it is carried out in each of the elements over the entire image simultaneously. This implies unprecedented processing rate improvements over sequential processors.
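The numbered operational process of Figure 5, applied to a 5 x 5 array with 3 x 3 templates, can be sketched in software as a loop over operations. This is a hedged illustration only: the names `Operation`, `derivative`, and `run_program`, the normalized dynamics (unit C, R, and tanh gain, zero reference voltage), the settling tolerance, and the example templates are all assumptions, not details taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Operation:
    """One program step: the pattern of A, B, x(0), and bias to apply."""
    A: np.ndarray                     # 3 x 3 feedback template
    B: np.ndarray                     # 3 x 3 feedforward template
    bias: float = 0.0
    x0: Optional[np.ndarray] = None   # None: reuse the previous result


def derivative(x, u, op):
    """Right-hand side of the Equation 4 dynamics in normalized units."""
    r = op.A.shape[0] // 2
    rows, cols = x.shape
    y_pad = np.pad(np.tanh(x), r)     # outputs, zero outside the array
    u_pad = np.pad(u, r)              # inputs, zero outside the array
    dx = -x + op.bias
    for k in range(2 * r + 1):
        for l in range(2 * r + 1):
            dx = dx + op.A[k, l] * y_pad[k:k + rows, l:l + cols]
            dx = dx + op.B[k, l] * np.tanh(u_pad[k:k + rows, l:l + cols] - x)
    return dx


def run_program(program, get_inputs, dt=0.01, tol=1e-6, max_steps=100_000):
    results = []
    x = None
    for op in program:                            # steps 7-8: loop to end
        u = get_inputs()                          # step 2: get new inputs
        if op.x0 is not None:                     # steps 3-4: apply templates
            x = op.x0.astype(float).copy()        # and the initial states
        elif x is None:
            x = np.zeros_like(u, dtype=float)
        for _ in range(max_steps):                # step 5: let states settle
            dx = derivative(x, u, op)
            x = x + dt * dx
            if np.max(np.abs(dx)) < tol:
                break
        results.append(x.copy())                  # step 6: read and store
    return results


# Illustrative 5 x 5 frame with one bright pixel and a two-step program.
frame = np.zeros((5, 5))
frame[2, 2] = 1.0
center_only = np.zeros((3, 3))
center_only[1, 1] = 1.0
program = [
    Operation(A=np.zeros((3, 3)), B=np.full((3, 3), 0.1), x0=np.zeros((5, 5))),
    Operation(A=-0.5 * center_only, B=center_only),   # starts from step 1 result
]
results = run_program(program, get_inputs=lambda: frame)
print(results[-1].round(3))
```

The second operation leaves `x0` unset so that its initial state is the stored result of the first, mirroring the remark that values can be obtained from the results of previous operations.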
In one embodiment, the operations include linear transformations of the input (convolution operations), plus connected component detection and other types of data manipulation allowed for by the feedback weights. For instance, some example operations are edge location; morphology operators such as dilation, thinning, and erosion; light adaptation; scratch removal; and texture, color, and shape analysis. In addition, in one embodiment, by initializing the states with previously obtained frames, it is also possible to use the two-template array to implement temporal operations, such as motion analysis, e.g., local velocity detection and motion direction detection.

In the embodiment of Figure 6, the overall cellular chip is illustrated as a system. In this embodiment, an external (or internal core) microcontroller generates the necessary command signals to the integrated sensor processor, such as sensor timing, row select, and program code. In an alternate embodiment, the microcontroller can be replaced by a processor or a digital signal processor, depending on the computing needs of the particular application at hand. Likewise, in one embodiment, the program memory can be internal, where a more compact address gets horizontally coded into bit-level micro-instructions and determines template values and data transfers within the architecture.

In the embodiment of Figure 6, the contents of a cell are illustrated in greater detail. For instance, the feedforward and feedback weights, which also integrate the transfer function, are shown in outline format. In addition, the local logic and memory functions are likewise illustrated in a similar fashion. In one embodiment, there is a main data bus across which data transfers can occur. As such, two-way connections can be made between the data bus and the two analog and four digital memory units, the state (x), and the input (u).
Also, connections can be made from the logic output and the reference voltage to the data bus. Additionally, the logic can be implemented by programming, similar to that of programmable logic arrays. Similarly, the emerging reconfigurable logic arrays represent another option or embodiment.

As mentioned above, it is understood that the inventive concepts of the present invention may be applied to a variety of different technologies and implemented in a variety of different physical architectures, and as such, the present invention is not meant to be limited to integrated light sensing and processing architectures. For example, the same architectures can be applied to any system, such as an acoustic or sound-based system, where there are multiple sensors arranged in a one-, two-, or three-dimensional grid. Each sensor element on the grid may then respond to the same single physical quantity (e.g., frequencies of the audible sound spectrum). As such, each sensor may sense a different attribute of the same signal (e.g., each sensor tuned to a different frequency of the audible sound spectrum). This will measure the frequency pattern of the sound incident on the array. Further, one can also imagine hybrids of these two, such as measuring a color pattern.

Implementation of Several Cellular Model Circuits

Figure 7 illustrates one embodiment of the layout of a CMOS chip comprising several types of cellular units (e.g., VLSI-adapted cellular dynamic architecture cell prototypes), in accordance with the concepts of the present invention. In one embodiment, the CMOS chip design may be manufactured using the MOSIS 2 micron ORBIT ANALOG process. Likewise, in one embodiment, the die is implemented in the "TINYCHIP" package, a 2.3 mm x 2.3 mm area bonded to a 40-pin DIP.
In one embodiment, as illustrated in Figure 7, the CMOS chip comprises eleven cells of five types: Four One Feedforward - One Feedback Cells with Initialization; Two Programmable Type (Positive or Negative) Feedback Cells; Two Hardwired Feedback and State Initializable Cells; Two (+/-) Hardwired Feedback and Floating State Cells; and One Feedforward only Cell. In one embodiment, all outputs are made available through wide-range followers at the outputs, and all programmable cell parameters, i.e., feedforward and feedback weights and bias, are at 3 bit resolution (plus sign bit). External inputs are available via pins to set these weights. Figures 8-11 illustrate these respective cells schematically.

Figure 8 illustrates one embodiment of the schematics of the four cells in a compact manner: Four One Feedforward - One Feedback Cells with Initialization. At the top of the chip of Figure 7, there are four one input - one output cells. The four versions of this simple cell arise from the four ways in which the positive and negative feedbacks can be combined. Four combinations of +/- terminals are possible and all of these may have been implemented. Moreover, all four cells include a state initialization mechanism to define x at the initial time instant (t = 0). This means that the state can be initialized at any voltage that can be carried on the specific microelectronic implementation. This is equivalent to defining an initialization point x(t=0) in Equations (4) and (7). Both the feedforward and the feedback amplifiers have three bit programmable gain in this specific implementation.

Figure 9 illustrates one embodiment of the schematics of the Two Programmable Type (Positive or Negative) Feedback Cells.
Two three-input one-output cells with a selectable kind (+ or -) of feedback may be implemented. One of these also allows for initialization of the state in a manner identical to the one illustrated in Figure 8. Figure 9 shows this type of cell without initialization. In one embodiment of the cell, as illustrated, there is an added delay of a pass-through transistor in the feedback of the state. The two signals (the state node x and Vref) are directed to the terminals of the feedback weight amplifier based on the feedback sign select bit. In one embodiment, all amplifiers have three bit programmable gain.

Figure 10 illustrates one embodiment of the schematics of the Two Hardwired Feedback and State Initializable Cells. The two three-input one-output cells illustrated in Figure 10 were included with initialization. These cells differ only slightly from the ones described in Figure 9. In one embodiment, the type of feedback is directly implemented rather than being programmable. Two combinations of +/- terminals are possible and both have been implemented. In one embodiment, both amplifiers have three bit programmable gain.

Figure 11(a) illustrates one embodiment of the schematics of the Two (+/-) Hardwired Feedback and Floating State Cells. In one embodiment, these cells are identical to the ones described in Figure 10 but do not contain state initialization circuits. The initial value of the state is thus undefined.

Figure 11(b) illustrates one embodiment of the schematics of the One Feedforward only Cell. This is a three-input one-output cell with programmable bias. In one embodiment, the feedback component is eliminated. Thus the cell diverges from the cellular neural network paradigm and instead is useful for feedforward operations only.

Figure 13 illustrates one embodiment of a Feedforward cell.
In one embodiment, the input u is always nonnegative but has a signed representation: positive entries of the B template select u+ and negative entries select u-.

Feedforward Weights and Image Convolution: Sample Operation

Convolution is a very common image processing step that precedes many vision tasks. It is often used to generate a second (different) image from the original in which desired features are enhanced and/or undesired characteristics (e.g., noise) are suppressed. In one embodiment, convolution can be described as a continuous spatial or temporal operation, but its application to sampled images is discrete and involves a convolution kernel with discrete values.

This discrete convolution operation, denoted by $\otimes$, between an image I and a kernel k can be described as:

$$ I \otimes k = \sum_{i=-N}^{N} \sum_{j=-N}^{N} I_{x+i,\,y+j}\, k_{ij} $$

Equation 10

where I is the input image, x and y are the two dimensional image coordinates, and k is a (2N+1) x (2N+1) square kernel.

One can easily see the similarity between the feedforward weight-input product sum

$$ \sum_{kl \in N_r(ij)} B_{kl}\big(u_{kl}(t),\, u_{ij}(t)\big) $$

in Equation (1) and the sum in Equation (10). Thus, if one sets I = 0 and A = 0, the steady state solution to Equation (1) is

$$ x_{ij}(t) = \sum_{kl \in N_r(ij)} B_{kl}\big(u_{kl}(t),\, u_{ij}(t)\big) $$

which is in fact equivalent to $I \otimes k$, where the second term is a normalized kernel.

The described VLSI adapted model performs a normalized operation, which replaces the summation of feedforward weights times input values in the canonical model of Equation (1),

$$ \sum_{kl \in N_r(ij)} B_{kl}\big(u_{kl}(t)\big) $$

with

$$ \sum_{kl \in N_r(ij)} B_{kl} \tanh\!\big[\, q\,\big(u_{kl}(t) - x_{ij}(t)\big) \big] $$

in Equation (4).

Normalized convolution, denoted by $\otimes_n$, is very similar to the conventional convolution operation, except that the output is normalized by the sum of kernel entries.
Because kernels are the same across the whole image, the result is essentially division by a normalization factor common to the entire image. The advantage is that the dynamic range of the resulting image is essentially the same as that of the input image.

$$ I \otimes_n k = \frac{\sum_{i=-N}^{N} \sum_{j=-N}^{N} I_{x+i,\,y+j}\, k_{ij}}{\sum_{i=-N}^{N} \sum_{j=-N}^{N} k_{ij}} $$

Equation 11

For the implementation of the normalized convolution described above, the inputs are loaded with voltages corresponding to the image, and the gains are set from the entries of the kernel. Since the conductances (or gain values) are necessarily positive, to implement negative kernel values with this approach one needs to be able to define negative input voltages and select negative polarity for the input to the transconductance amplifier for which a negative kernel value is desired. This cuts the dynamic range of images in half, although the range of kernel entries remains unaffected. One can view the cell arrangement in Figure 11(b) in this light, provided that the bias amplifier is ignored.

Sample Operation: For instance, in a sample operation, a four bit plus sign representation, shown in Figure 12 (digital input accommodation via parallel transistor cascading), may be used for entries of the B template in Equations (1) and (4), which then take on values in the range of -3.75 to 3.75 in increments of 0.25. In one embodiment, the cell is equivalent to the feedforward configuration shown in Figure 11(a), provided that the gains of the amplifiers are set by the B template. With more area and more bits, higher resolution and dynamic range would be possible.

In the sample operation, other parameters may be as follows:

0 < u < 1
u+ = Vref + u
u- = Vref - u
-3.75 < b < 3.75
u min < x < u max

Vref may be the representation of zero in this architecture. One suggested value for Vref is the midpoint between ground and the power supply voltage of the microelectronic implementation. Implementation of sign is based on selection of u+ or u-. Note that this also creates the opportunity to implement negative u. In other words, it is possible to accommodate u < 0 values in this structure as well.
In that case, the parameters may be as follows:

-1 < u < 1
u+ = Vref + u
u- = Vref - u

A specific realization of the -3.75 to 3.75 range, realizing different values of template B in the architecture, yields the following:

b        b sign (V)   b3 (MSB)   b2     b1     b0 (LSB)
-3.75    0 V          1 V        1 V    1 V    1 V
-2.00    0 V          1 V        0 V    0 V    0 V
-1.00    0 V          0 V        1 V    0 V    0 V
-0.50    0 V          0 V        0 V    1 V    0 V
-0.25    0 V          0 V        0 V    0 V    1 V
 0.25    5 V          0 V        0 V    0 V    1 V
 0.50    5 V          0 V        0 V    1 V    0 V
 1.00    5 V          0 V        1 V    0 V    0 V
 2.00    5 V          1 V        0 V    0 V    0 V
 3.75    5 V          1 V        1 V    1 V    1 V

where Power = 5 V and logic 1 = 5 V. We can select 1 V as the bias to operate the bias transistors around threshold. In one embodiment, the state node (x) is referenced to Vref and does not need a signed representation. Positive entry values for the feedback template "A" can be used to steer x to the + terminal, and negative entries steer it to the - terminal, of the wide range differential amplifier. In the four bit plus sign representation, A values may likewise take on values in the range of -3.75 to 3.75 in increments of 0.25. With more area and more bits, higher resolution and dynamic range would be possible. Other parameters are as follows: -3.75 < aij < 3.75 and 0 < x < Vdd. As such, the feedback configuration of the feedforward weights provides some safety against the tendency of active current computation units to be attracted to a power supply rail.

Two real images were captured using a CCD camera and presented to the chip three inputs at a time. The three weights for the inputs were set to represent the one dimensional vertical edge kernel of 1 0 -1. In the present sample operation, each pixel is presented in this fashion: the whole image was presented three pixels at a time along the horizontal direction to the model circuit shown in Figure 11(b) with the bias amplifier disabled. The result is shown in Figure 14.
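The weight encoding and sign selection described in this sample operation can be sketched in a few lines of Python. This is an idealized arithmetic model only, not the circuit: the function names `encode_weight` and `feedforward_sum` are hypothetical, Vref is assumed to be 2.5 V (the midpoint of a 5 V supply), the encoding of b = 0 is an assumption because the table does not list it, and the tanh nonlinearity and output normalization of the actual amplifiers are ignored.

```python
V_LOGIC_HIGH = 5.0  # sign level for positive weights (logic 1 = 5 V in the text)
V_BIAS_ON = 1.0     # level applied to an active magnitude bit (1 V bias in the text)
STEP = 0.25         # weight increment
MAX_CODE = 15       # four magnitude bits


def encode_weight(b):
    """Return (sign_V, b3_V, b2_V, b1_V, b0_V) for a template entry b."""
    code = round(abs(b) / STEP)
    if code > MAX_CODE or abs(code * STEP - abs(b)) > 1e-9:
        raise ValueError("weight must be a multiple of 0.25 within [-3.75, 3.75]")
    # Assumption: b == 0 is encoded with the sign line low and all bits off.
    sign_v = V_LOGIC_HIGH if b > 0 else 0.0
    bits = [(code >> k) & 1 for k in (3, 2, 1, 0)]  # MSB first
    return (sign_v, *[V_BIAS_ON * bit for bit in bits])


def feedforward_sum(weights, inputs, v_ref=2.5):
    """Idealized weighted sum using u+ / u- selection by the sign of each weight."""
    total = 0.0
    for b, u in zip(weights, inputs):
        selected = v_ref + u if b >= 0 else v_ref - u  # u+ for b >= 0, u- otherwise
        total += abs(b) * (selected - v_ref)           # gain magnitude is nonnegative
    return total


if __name__ == "__main__":
    print(encode_weight(-2.0))
    # Vertical edge kernel 1 0 -1 applied to three neighboring pixel values.
    print(feedforward_sum([1, 0, -1], [0.2, 0.5, 0.9]))
```

Because the gain magnitude is always nonnegative and the sign only chooses between u+ and u-, each term reduces to b times u, which is why the edge kernel behaves as a difference of the two outer pixels.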
DUAL DISTRIBUTED ARCHITECTURE

One embodiment of the present invention, as illustrated in Figure 15, introduces a Dual Distributed Architecture, illustrated as a 5x5 array. In one embodiment, the Dual Distributed Architecture contains two distinct structures: (i) PCA, a programmable convolution array, which is a sensing grid with convolution capability that may also include some short term memory; and (ii) PLA, a programmable logic array and memory area on which transformations or alternate representations of the sensory data can be recreated. In one embodiment, both areas compute in parallel and communicate with each other as needed in a serial but random access fashion. Also, it is primarily the programmable cellular logic array (PCLA) that communicates with the outside realm, which potentially includes a conventional digital signal processor, microprocessor, or microcontroller.

In view of the ideal of leveraging the speed of the implementation medium (most commonly silicon microelectronic circuits) against the large area requirements of a cellular processor, we view again the components of the cell of a specific implementation, which are listed in Dynamic Cellular Architecture above, namely: (1) a photodiode and circuits for active light sensing; (2) transconductance amplifiers for feedforward template weight multiplication; (3) wide range transconductance amplifiers for feedback template weights; (4) analog/single bit digital local dynamic memory cells; (5) a data bus for transfers; (6) local programmable logic; and (7) read/write data controls.
Practical real-world concerns and implementations for integrated cellular sensor-processor systems generally dictate consideration of the following: (1) the sensors should be manufacturable and embedded into the processing system; (2) the resolution of the sensory part of the system should be close to that available commercially; (3) the fill factor (ratio of sensor area to the total area of each cell) should be reasonable for the application; (4) the sensor performance should be maintained with shrinking technology size; and (5) there should be a set of operator sizes that take in data from a large neighborhood. These are simply factors to consider when implementing an integrated cellular sensor-processor system. They do not have to be specifically adhered to, and as such they are not intended to limit the scope of the present invention.

Nevertheless, taking a systematic approach to meeting the objectives itemized above, the following factors should likewise generally be noted: (1) to improve the resolution, one needs to shrink the cell; (2) to improve the fill factor, one needs to increase the light sensitive areas; (3) to keep the sensor operational, one may need to build the sensors (or the whole chip) using a larger technology, which again means that one needs to make the cell yet smaller; and (4) to increase the neighborhood size, one may need to build a yet larger cell that connects to more of its neighbors. Again, these are simply factors to consider when implementing an integrated cellular sensor-processor system. They do not have to be specifically adhered to, and as such they are not intended to limit the scope of the present invention.
In this manner, one can also implement (in the feedforward connections) neighborhoods of arbitrary size, or kernels, as they are commonly called in image processing, including a kernel that is as large as the whole sensor array itself, but only one such kernel at a time. This leverages the speed of the implementation medium against the area requirements of the processor.

The dual distributed architecture generally provides three significant advantages as compared to currently implemented or existing architectures.

First, the computational concept of the architecture is simpler and far more conventional than that of the prior art cellular neural network, which employs nonlinear circuit dynamics. Moreover, practicing engineers will need far less training to implement this architecture. Programming operations in the dual distributed architecture is far more straightforward than determining the correct A, B, x(0), and I parameters required by currently implemented or existing architectures.

Second, since the role of nonlinear dynamics is minimized, electronic circuit anomalies, such as mismatches or noise inherent to the substrate on which the architecture is built, have far less impact on the outcome of the computation. This makes the implementation considerably easier and more cost effective to manufacture.

Third, the dual distributed architecture, as the name implies, comprises two parts, and each of those parts serves a separate function. In other words, each part is functional on its own and can be implemented without the other. The first part of the dual distributed architecture is the programmable convolution array PCA, which in one embodiment comprises a sensing grid with convolution capability, which could also include some short term memory.
The programmable convolution array PCA part performs arbitrary linear transformations of the whole sensory data or parts of the sensory data. The desired weights of the individual elements can be directly programmed.

The second part of the dual distributed architecture is the Programmable Cellular Logic Array (PCLA). In one embodiment, the PCLA is cellular, which means that each of the elements of the array is connected to a set of its neighbors. The PCLA, however, is not a neural network and in one embodiment relies on conventional logic to perform its operations. In one embodiment, the actual logic can be implemented with conventional digital logic or non-conventional analog logic gates, such as but not limited to AND, OR, and XOR. Additionally, in one embodiment, there can be a state machine (i.e., a mini computing engine) at each of the elements. For stand-alone operation, each element of the PCLA could be equipped with a sensor, or data could be scanned in from an external source.

The Signed Output Sensor (SOS)

For some types of implementations of the convolution function, e.g., those using non-negative gain amplifier circuits, it may be necessary for each sensor to produce dual outputs. Depending on the nature of the sensor, the way such dual output is produced may differ. For sensors that measure non-negative values referenced to zero (e.g., light), there may need to be a (+) and a (-) pixel output. The positive (+) output is fed into the amplifier if the cell contains a positive kernel value. Correspondingly, the negative (-) pixel output may be used for a negative kernel value. A one dimensional example is given below.

Suppose that one needs to implement a one dimensional edge kernel of [1 0 -1] on a light sensitive grid. The equivalent arithmetic operation is the sum pixel[i-1] + (- pixel[i+1]).
Since current summation can be used, one needs to add the current for the positive representation of pixel[i-1] to the current for the negative representation of pixel[i+1].

Figure 16 illustrates one embodiment of an exemplary layout of such a single signed output active sensor. In the embodiment of Figure 16, the active area is a square of 100 λ x 100 λ and the whole cell covers a 195 λ x 130 λ area. The (grounded) METAL2 layer covering the areas of the cell that should not be exposed to light is not shown.

Likewise, the corresponding schematic of the signed sensor is illustrated in Figure 17, where the positive and negative pixel representations are marked. In one embodiment, the additional follower is employed because the photocurrent is very small and the photosensing node should not be perturbed.

Programmable Convolution Array (PCA)

Figure 18 illustrates one embodiment of a design of one element of a light sensing PCA. In one embodiment, each pixel cell can be addressed by the combination of row and column select signals. Weight bits can thus also be written to selected pixels. Pixel midpoint determines the "0" value, i.e., no light, of the signed pixel value representation. Reset pixel will charge the pixel output to that value. In one embodiment, a dedicated ground signal need not be routed, since a metal layer covering the circuits (with openings at the light sensitive areas) can be grounded. In the embodiment of Figure 18, each cell contains: (1) one signed output sensor (SOS); (2) four D flip flops for storage of the three bit weight and sign; (3) four multiplexers; (4) one wide range transconductance amplifier connected in the unity follower configuration; and (5) a three input AND gate.

In one embodiment, the size of the cell can be reduced significantly since, as the layout of the single cell in Figure 19 shows, there are some empty spaces.
Smaller feature sizes and added metal layers will no doubt compact the cell further. A significant area savings would result from a dynamic memory cell, since each kernel would be used for a short period of time.

Several data and control signals are generally routed to each cell. In one embodiment, the control signals common to all cells include: (1) clock weight, (2) reset photodiode (or sensor), and (3) sign. The addressing signals are row and column. In one embodiment, the data signals common to all cells include: (1) pixel reference, (2) inverter reference, (3) weight high, (4) weight low, (5) weight MSB, (6) weight MID, and (7) weight LSB. Furthermore, in one embodiment, as illustrated in Figure 20, elements of each row have a common row select signal and elements of each column have a common column select signal. Accordingly, the output of each cell may be summed onto a common line. Alternately, outputs can be combined along columns, rows, or blocks, or along alternate designations.

Figure 21 illustrates an example PCA layout of 5 x 5 pixels. In the embodiment of Figure 21, each of the 25 elements uses four weight bits (three magnitude and one sign), which are stored in D flip flops. The weight bits are clocked in only when both the row and the column are selected and the weight clock bit is active. The outputs of the weight flip flops then select between weight high and weight low. In one embodiment, the wide range transconductance amplifier at the output stage has three bias transistors with W/L ratios of 1/4, 1/2, and 1, each corresponding to one of the magnitude bits. If all weights are zero, the pixel does not contribute any current to the output. If any of the weights is non-zero, then, based on the sign, the pixel makes a positive or negative current contribution to the overall reading. The data signal weight high also determines the gain on the photosensor.
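The weight storage and common-line summation of the 5 x 5 example can be modeled with a short Python sketch. It is an idealized numerical model under stated assumptions, not the circuit itself: the class and method names (`PcaCell`, `PcaArray`, `write_weight`, `output`) are hypothetical, the three magnitude bits are assumed to be binary weighted (1 for the MSB, 1/2 for MID, 1/4 for the LSB), and each pixel value is taken as already signed relative to the pixel midpoint.

```python
from dataclasses import dataclass


@dataclass
class PcaCell:
    """One PCA element: a sign and three magnitude bits held in flip flops."""
    sign: int = 1  # +1 or -1
    msb: int = 0
    mid: int = 0
    lsb: int = 0

    def gain(self):
        # Assumed binary weighting of the three bias transistors.
        return self.sign * (1.0 * self.msb + 0.5 * self.mid + 0.25 * self.lsb)


class PcaArray:
    def __init__(self, rows, cols):
        self.cells = [[PcaCell() for _ in range(cols)] for _ in range(rows)]

    def write_weight(self, row, col, sign, msb, mid, lsb):
        """Model of clocking weight bits into the cell selected by row AND column."""
        self.cells[row][col] = PcaCell(sign, msb, mid, lsb)

    def output(self, pixels):
        """Sum of all cell contributions on the common output line."""
        return sum(
            cell.gain() * pixels[r][c]
            for r, row in enumerate(self.cells)
            for c, cell in enumerate(row)
        )


if __name__ == "__main__":
    array = PcaArray(5, 5)
    # Program the edge kernel 1 0 -1 into the middle row, columns 1..3.
    array.write_weight(2, 1, +1, 1, 0, 0)
    array.write_weight(2, 3, -1, 1, 0, 0)
    ramp = [[0.1 * c for c in range(5)] for _ in range(5)]  # horizontal ramp image
    print(array.output(ramp))
```

Cells whose magnitude bits are all zero have zero gain and so add nothing to the sum, mirroring the statement that such a pixel contributes no current to the output.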
In one embodiment, the D flip flops could be replaced with dynamic memory to save silicon area.

In one embodiment, the design may be implemented in a 2 micron CMOS technology with its pads configured to fit into an area of 2.3 mm x 2.3 mm, and embedded in a 40 pin DIP chip package. All 25 pixel outputs are summed on a common output line in this example implementation. One can select different signs and weights for each of the cells and thus perform arbitrary convolution operations. Arbitrary 4-bit kernels up to 5 x 5 can be programmed in this particular example implementation.

PCA Process

Figure 22 illustrates one embodiment of the operational process of the Programmable Convolution Array (PCA) illustrated above. In one embodiment, before the process begins, the weight patterns to be applied to the array for all of the operations to be performed are determined. Accordingly, in one embodiment, a program is made of the operations needed to complete the task, such that each step of the program specifies an operation, where each operation is defined by the pattern of weights to be applied to the array. In one embodiment, it is possible to obtain the weight pattern from the results of previous operations. In one embodiment, the program may be stored externally or on the same chip where the array is implemented. In an alternate embodiment, each cell of the array can have its own program memory.

Referring back to Figure 22, one embodiment of the operational process of the Programmable Convolution Array (PCA) illustrated above may be implemented through the following process:

1. START
2. If the previous operation is complete or if new data is needed, get new inputs. These inputs may come from continuous or discretely sampled data from sensors built into the array, or alternately, they can be scanned into the input nodes from an external data source by employing an appropriate scanning mechanism.
3. Apply the weight pattern of the instruction, either externally or from internal stored program memory.
4. Allow sufficient time for the computation to complete, i.e., for the electronic circuits to reach a point where the change of the value of the output of the array equals zero. For active sensors, which do not produce a static output, there may be a specific optimum time instant at which to sample their output. Another option is that the output itself is a time variant waveform.
5. Read and store the output(s) of the array. The output can be stored externally or internally. If the PCA is used in tandem with a PCLA, then the output can also be sent to the PCLA.
6. Wait for the next operation.
7. If end of program, go to (END).
8. Go to (2).
9. END.

Programmable Cellular Logic Array (PCLA)

The output of a convolution operation performed by the PCA is usually an analog current or voltage value. While this rapid, arbitrary size kernel convolution operation is extremely useful, there are many early vision operations that require further processing of the results of many convolution operations. The programmable cellular logic array (PCLA) is a way of implementing this stage.

In one embodiment, the PCLA is a processing device that processes binary or digital data using a set of fixed or programmable logic elements arranged on a cellular grid. If the PCLA is used with a PCA, the sizes of the two arrays could be the same or different.
In one embodiment, the PCLA grid can function as a "scratch-pad" for a set of iconic operations, such as shape detection, contour following, and pattern matching, all performed using the outputs of the PCA or external outputs, potentially at different resolutions than the data received.

Furthermore, each element of the PCLA can be addressed like a memory array to which the input (e.g., outputs from the PCA) can be applied or stored, and from which results can be retrieved. Added functionality would result from the ability to transfer bits between neighbors (analogous to shift operations) or to arbitrary locations of the cellular logic grid (random access transfers).

The embodiment of Figure 23 illustrates a simple implementation of the elements of the programmable cellular logic array (PCLA). In the embodiment illustrated in Figure 23, only the center cell connections are shown to prevent clutter. Also, in the embodiment shown, the cell output has been staged to prevent race conditions. In this embodiment, all operations are strictly local. This is not a limitation of the invention. In one embodiment, global operations such as matching with a global pattern, reset, set, and others can also be implemented in the PCLA. Furthermore, added functionality could be realized (1) with increased connectivity, where each cell receives more inputs, such as those from additional cells in the neighborhood, or from arbitrary cells on the array via memory-like addressing; and (2) with increased cell memory, where the results of previous operations can be stored in the cell. In one embodiment, the logic within the cell should be programmable. In fact, the program may change many times during the operation. Sample logic functions could be AND, OR, INVERT, XOR, etc. of a set of the available inputs.
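A strictly local, staged PCLA step can be sketched as follows. This is a minimal software model under stated assumptions, not the Figure 23 circuit: the function names `pcla_step` and `contour_logic` are hypothetical, each cell is assumed to see itself and its four nearest neighbors, and cells outside the grid are assumed to read as 0. Staging is modeled by computing every new value from the old grid before any value is replaced.

```python
def pcla_step(grid, cell_logic):
    """Apply one programmable logic function to every cell of a binary grid.

    cell_logic(center, north, south, west, east) returns the new cell value.
    All outputs are staged in a new grid, so no cell sees a neighbor's
    updated value during the same step.
    """
    rows, cols = len(grid), len(grid[0])

    def at(r, c):
        # Assumption: inputs from beyond the array boundary read as 0.
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [
            int(bool(cell_logic(at(r, c), at(r - 1, c), at(r + 1, c),
                                at(r, c - 1), at(r, c + 1))))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def contour_logic(center, north, south, west, east):
    """Keep a set cell only if at least one of its four neighbors is clear."""
    return center and not (north and south and west and east)


if __name__ == "__main__":
    # A filled 3 x 3 square inside a 5 x 5 grid.
    image = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
             for r in range(5)]
    for row in pcla_step(image, contour_logic):
        print(row)
```

Swapping `contour_logic` for another function of the same five inputs corresponds to reprogramming the cell logic between steps, which is the kind of program change the text describes.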
In one embodiment, the PCLA would benefit from the use of the emerging reconfigurable field programmable gate arrays (FPGAs). This is a relatively new field; however, researchers are keenly looking at ways to capitalize on the inherent flexibility of these devices to facilitate the building of a better computing paradigm. One specific concept in this domain, termed the plastic cell architecture, brings forth a new circuit type that is laid out as an array of identical computing elements, or cells, which could dynamically reconfigure themselves for specific problems.

This new computing paradigm offers a novel feature beyond the common reconfigurable FPGA concept, where so far it has been possible to reconfigure circuits only via software downloaded to one or more FPGAs, after which the chips directly execute the prescribed functions as hardwired circuits. The added feature is the ability of one circuit to dynamically configure another circuit. The resulting processing array is able to mimic the ability to create specialized cells, which in turn allows a cellular array like the PCLA to configure itself based on the outputs of its neighbors, or of itself. This level of data driven performance allows for the implementation of very complex functions from very simple rules.

PCLA Process

Figure 24 illustrates one embodiment of the operational process of the Programmable Cellular Logic Array (PCLA) illustrated above. In one embodiment, before the process begins, the necessary local and global logic functions are determined. These logic functions should already be implemented in each element of the array where they need to be performed, or it should be possible to program the logic contained in each element of the array to perform the logic function needed. In general, an operation is defined by the pattern of operations carried out by the individual elements of the array.
Accordingly, a program is made of the sequence of operations needed to complete the task, such that each step of the program specifies one or more functions to be performed by each element of the array. In one embodiment, at any given time, all elements may be performing the same function or different functions. Likewise, in one embodiment, it is also possible to make the next operation conditional upon, or based on, the results of one or more previous operations. In one embodiment, the program may be stored externally or on the same chip where the array is implemented. In an alternate embodiment, the program memory may be local, i.e., each cell of the array can have its own program memory.

Referring back to Figure 24, one embodiment of the operational process of the Programmable Cellular Logic Array (PCLA) illustrated above may be implemented through the following process:

1. START
2. If the previous operation is complete or if new data is needed, get new inputs. These inputs may come from continuous or discretely sampled data from sensors built into the array, or they can be scanned into the input nodes from an external data source, including, but not limited to, the PCA, by employing an appropriate scanning mechanism.
3. Carry out the specified operation. The operation code that specifies the operation can be applied from either an external source or from internal stored program memory.
4. Allow sufficient time for the operation to complete, i.e., for the electronic circuits to reach a point where the change of the value of the output of the array equals zero. The output of active sensors should have been digitized a priori.
5. Read and store the output(s) of the array. The output can be stored externally or internally. If the PCLA is used in tandem with a PCA, then the output can also be sent to the PCA.
6. Wait for the next operation.
7. If end of program, go to (END).
8. Go to (2).
9. END

Communication between the PCA and PCLA

It is evident that, in addition to power and ground signals, many other common data, control, and address signals need to be distributed all across the chip. As is the case with many photosensor chips, the PCA portion of the chip should be covered with a layer of metal with openings only at the photosensitive areas to allow for exposure to light. Typically, one would ground the layer of metal that covers the whole chip. A similar layer could also be used on the PCLA portion of the dual architecture to carry the output of the PCA to all points of the logic array.

The row select and column select signals of the PCA are reminiscent of memory addressing diagrams. A specific pixel is selected only when both signals are at logic high. These signals may be used to load the weights into the cell.

Elements of the PCLA can also be addressed as one would address memory cells. A row of cells could be thought of as a long word. A set of write and shift operations could replace the need to route multiple address signals across the PCLA.

PCA and PCLA Tandem Process

Figure 25 illustrates one embodiment of the operational process of using the Programmable Convolution Array (PCA) and the Programmable Cellular Logic Array (PCLA) in tandem. In one embodiment, before the process begins, the weight patterns to be applied to the PCA for all of the operations to be performed are determined. Likewise, the local and global logic functions that are needed for the PCLA are determined. In one embodiment, an operation is defined by the pattern of weights for the PCA and the pattern of logic functions carried out by the individual elements of the PCLA.
Accordingly, in one embodiment, a program is made of the sequence of operations needed to complete the task, such that each step of the program specifies one or more operations to be performed by either the PCA or the PCLA, or both. In one embodiment, the program may be stored externally or on the same chip or multichip module where the two arrays are implemented. In an alternate embodiment, the program memory may be central or local, i.e., each cell of the arrays can have its own program memory.

Referring back to Figure 25, one embodiment of the operational process using the Programmable Convolution Array (PCA) and the Programmable Cellular Logic Array (PCLA) in tandem may be implemented through the following process:

1. START
2. If the previous operation is complete or if new data is needed, get new inputs. These inputs may come from continuous or discretely sampled data from sensors built into one or more of the arrays, or they can be scanned into the input nodes from an external data source, including, but not limited to, from one array to the other, by employing an appropriate scanning mechanism.
3. Carry out the specified operation. The operation code that specifies the operation can be applied from either an external source or from internal stored program memory.
4. Allow sufficient time for the operation to complete, i.e., for the electronic circuits to reach a point where the change of the value of the output of the array equals zero or, by some other criterion, the operation is deemed to be complete.
5. Read and store the output(s) of the array(s). The output can be stored externally or internally. The output can also be sent from one array to the other.
6. Wait for the next operation.
7. If end of program, go to (END).
8. Go to (2).
9. END

Application Example: Single chip finger print lock control

The concepts of the present invention may be applied to fingerprint recognition systems. Fingerprint systems currently require a sensor, memory, a processor, and a large amount of software, which in turn limits their practical application. There would be a myriad of applications, however, for a single chip that can accomplish the same task with no external processor, memory, or program. Examples include a lock on a door or a drawer, or a power button, that has a fingerprint key. The latch remains in position until one of the "authorized" fingerprints is impressed on the lock.

In this example, since the image of the fingerprint is binary, it may be practical to equip the PCLA with binary sensors and use it on its own. The logic program to manipulate the image and decide whether or not this is the right fingerprint could be built directly into the device. Alternately, if a linear transformation of the fingerprint image suffices, one can build a stand alone PCA. The programs could be the combinations of the weights, which may be stored on the device. As another option, using both the PCLA and the PCA may enhance the features, increase the resolution, or offer more security.

Computer Environment

Figure 26 and the above description are intended to provide a general description of a suitable computing environment in which the invention may be implemented. Although not necessarily required, one embodiment of the present invention may be implemented in the general context of computer-executable instructions, such as program modules, being executed by a computer, such as a client workstation or a server. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types.
Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, communication devices, network PCs, minicomputers, mainframe computers and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

As shown in Figure 26, an exemplary general purpose computing system may include a conventional personal computer 20 or the like, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory 22 to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory 22 may include read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, may be stored in ROM 24. The personal computer 20 may further include a hard disk drive 27 for reading from and writing to a hard disk (not shown), a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM or other optical media.
The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 may be connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20.

Although the exemplary embodiment described herein may employ a hard disk, a removable magnetic disk 29, and a removable optical disk 31, or combination thereof, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs) and the like may also be used in the exemplary operating environment.

A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37 and program data 38. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone or microphones, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus 23, but may be connected by other interfaces, such as a parallel port, game port, or universal serial bus (USB). A monitor 47 or other type of display device may also be connected to the system bus 23 via an interface, such as a video adapter 48.
In addition to the monitor 47, personal computers may typically include other peripheral output devices (not shown), such as speakers and printers.

The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in Figure 26. The logical connections depicted in Figure 26 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, Intranets, and the Internet.

When used in a LAN networking environment, the personal computer 20 is connected to the LAN 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

It is further understood that different elements or components may be included or excluded from the general computing environment, or otherwise combined, to implement the concepts and teachings of the present invention as defined in the appended claims.
Network Environment

As noted, the general-purpose computer described above can be deployed as part of a computer network. In general, the above description applies to both server computers and client computers deployed in a network environment. Figure 27 illustrates one such exemplary network environment in which the present invention may be employed. As shown in Figure 27, a number of servers 10a, 10b, etc., are interconnected via a communications network 160 (which may be a LAN, WAN, Intranet or the Internet) with a number of client computers 20a, 20b, 20c, etc. In a network environment in which the communications network 160 is, e.g., the Internet, the servers 10 can be Web servers with which the clients 20 communicate via any of a number of known protocols such as, for instance, hypertext transfer protocol (HTTP). Each client computer 20 can be equipped with a browser 180 to gain access to the servers 10, and client application software 185. As shown in the embodiment of Figure 27, server 10a includes or is coupled to a dynamic database 12.

As shown, the database 12 may include database fields 12a, which contain information about items stored in the database 12. For instance, the database fields 12a can be structured in the database in a variety of ways. The fields 12a could be structured using linked lists, multi-dimensional data arrays, hash tables, or the like. This is generally a design choice based on ease of implementation, amount of free memory, the characteristics of the data to be stored, whether the database is likely to be written to frequently or instead is likely to be mostly read from, and the like. A generic field 12a is depicted on the left side. As shown, a field generally has sub-fields that contain various types of information associated with the field, such as an ID or header sub-field, type of item sub-field, sub-fields containing characteristics, and so on.
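One of the layouts mentioned above, a hash table of fields with ID, type, and characteristics sub-fields, could be sketched as follows. This is only an illustration: the class and attribute names (`ItemField`, `FieldStore`, `item_id`, `item_type`, `characteristics`) are hypothetical and do not appear in the specification, and a Python dictionary stands in for the hash table.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ItemField:
    """Generic field with the sub-fields named in the text (names are illustrative)."""
    item_id: str                                   # ID or header sub-field
    item_type: str                                 # type-of-item sub-field
    characteristics: Dict[str, str] = field(default_factory=dict)


class FieldStore:
    """Hash-table layout: constant-time lookup by ID, suited to read-mostly use."""

    def __init__(self) -> None:
        self._fields: Dict[str, ItemField] = {}

    def put(self, item: ItemField) -> None:
        # Insert a new field or overwrite the one with the same ID.
        self._fields[item.item_id] = item

    def get(self, item_id: str) -> ItemField:
        # Raises KeyError if no field with this ID has been stored.
        return self._fields[item_id]
```

A linked list or multi-dimensional array could replace the dictionary without changing the `ItemField` record; as the text notes, the choice depends on how the data is written and read.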
These database fields 12a are shown for illustrative purposes only, and as mentioned, the particular implementation of data storage in a database can vary widely according to preference.

Thus, the present invention can be utilized in a computer network environment having client computers for accessing and interacting with the network and a server computer for interacting with client computers and communicating with a database with stored inventory fields. Likewise, the present invention can be implemented with a variety of network-based architectures, and thus should not be limited to the examples shown.

From the above description and drawings, it will be understood by those of ordinary skill in the art that the particular embodiments shown and described are for purposes of illustration only and are not intended to limit the scope of the invention. Those of ordinary skill in the art will recognize that the invention may be embodied in other specific forms without departing from its spirit or essential characteristics. References to details of particular embodiments are not intended to limit the scope of the claims.

Claims (3)

1. An integrated sensing device comprising:
an array of sensor processor cells capable of being arranged into a detection array, each sensor processor cell comprising:
a sensing medium;
at least one transconductance amplifier configured for feedforward template multiplication;
at least one transconductance amplifier configured for feedback template weights;
a plurality of local dynamic memory cells;
a data bus for data transfer;
a local logic unit; and
wherein the array of sensor processor cells, by responding to data control signals, is capable of transforming, reshaping, and modulating the original sensed image into varied representations which include (and extend) traditional spatial and temporal processing transformations.

2. An integrated sensor processing cell device comprising:
a sensing medium configured to produce a signed pixel output;
at least one memory device configured for the storage of weight bits, wherein at least one memory device is configured to store the signed pixel output;
a plurality of multiplexers operatively associated with at least one of the memory device;
at least one transconductance amplifier operatively associated with at least one of the plurality of multiplexers, the at least one transconductance amplifier configured for operation in the unity follower configuration with variable gain;
a multiple input logic gate associated with the at least one of said memory devices that is configured to store the signed pixel output; and
wherein the integrated sensor processing cell, by responding to data control signals, is capable of transforming, reshaping, and modulating the original sensed image into varied representations which include (and extend) traditional spatial and temporal processing transformations.

3. An integrated sensing and imaging device comprising:
an array of sensor processor cells capable of being arranged into a detection array, each said sensor processor cell comprising:
a sensing medium configured to produce an electrically representable signal output;
a means of quantifying the said sensor output,
a means of storing the said sensor output,
a plurality of programmable logic elements arranged on a cellular grid, wherein each logic element is capable of receiving inputs from the output of neighboring cells and outputs from itself; and
wherein the integrated imaging device, by responding to internal or external data control signals, is capable of transforming, reshaping, and modulating the original sensed signals into varied representations which include (and extend) traditional spatial and temporal processing transformations.
AU36167/00A 1999-03-05 2000-03-06 Two architectures for integrated realization of sensing and processing in a single device Abandoned AU3616700A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12317799P 1999-03-05 1999-03-05
US60123177 1999-03-05
PCT/US2000/005785 WO2000052639A2 (en) 1999-03-05 2000-03-06 Two architectures for integrated realization of sensing and processing in a single device

Publications (1)

Publication Number Publication Date
AU3616700A true AU3616700A (en) 2000-09-21

Family

ID=22407147

Family Applications (1)

Application Number Title Priority Date Filing Date
AU36167/00A Abandoned AU3616700A (en) 1999-03-05 2000-03-06 Two architectures for integrated realization of sensing and processing in a single device

Country Status (7)

Country Link
EP (1) EP1175659A2 (en)
JP (1) JP2002538557A (en)
CN (1) CN1457471A (en)
AU (1) AU3616700A (en)
CA (1) CA2364182A1 (en)
HK (1) HK1039992A1 (en)
WO (1) WO2000052639A2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278377B1 (en) 1999-08-25 2001-08-21 Donnelly Corporation Indicator for vehicle accessory
WO2001037519A2 (en) 1999-11-19 2001-05-25 Gentex Corporation Vehicle accessory microphone
US7120261B1 (en) 1999-11-19 2006-10-10 Gentex Corporation Vehicle accessory microphone
ES2209642B1 (en) * 2002-11-04 2005-10-01 Innovaciones Microelectronicas, S.L. MIXED SIGNAL PROGRAMMED INTEGRATED CIRCUIT ARCHITECTURE FOR THE PERFORMANCE OF AUTONOMOUS VISION SYSTEMS OF A SINGLE CHIP AND / OR PRE-PROCESSING OF IMAGES IN HIGHER LEVEL SYSTEMS.
US10346944B2 (en) * 2017-04-09 2019-07-09 Intel Corporation Machine learning sparse computation mechanism
CN111368253B (en) * 2018-12-26 2023-09-26 兆易创新科技集团股份有限公司 Convolution operation method and device based on nonvolatile memory
CN111539178B (en) * 2020-04-26 2023-05-05 成都市深思创芯科技有限公司 Chip layout design method and system based on neural network and manufacturing method
CN111983629B (en) * 2020-08-14 2024-03-26 西安应用光学研究所 Linear array signal target extraction device and extraction method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5140670A (en) * 1989-10-05 1992-08-18 Regents Of The University Of California Cellular neural network
US5355528A (en) * 1992-10-13 1994-10-11 The Regents Of The University Of California Reprogrammable CNN and supercomputer

Also Published As

Publication number Publication date
EP1175659A2 (en) 2002-01-30
WO2000052639A2 (en) 2000-09-08
CN1457471A (en) 2003-11-19
HK1039992A1 (en) 2002-05-17
WO2000052639A3 (en) 2001-02-15
CA2364182A1 (en) 2000-09-08
JP2002538557A (en) 2002-11-12

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period