WO1995016234A1 - Apparatus and method for signal processing - Google Patents

Apparatus and method for signal processing

Info

Publication number
WO1995016234A1
WO1995016234A1 (PCT/US1994/014219)
Authority
WO
WIPO (PCT)
Prior art keywords
bit
write
letc
count
int
Prior art date
Application number
PCT/US1994/014219
Other languages
French (fr)
Inventor
Avidan Akerib
Original Assignee
Asp Solutions Usa, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from IL10799693A external-priority patent/IL107996A0/en
Priority claimed from IL10980194A external-priority patent/IL109801A0/en
Application filed by Asp Solutions Usa, Inc. filed Critical Asp Solutions Usa, Inc.
Priority to JP7516374A priority Critical patent/JPH09511078A/en
Priority to AU14334/95A priority patent/AU1433495A/en
Priority to EP95905890A priority patent/EP0733233A4/en
Publication of WO1995016234A1 publication Critical patent/WO1995016234A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Single instruction multiple data [SIMD] multiprocessors
    • G06F15/8023 Two dimensional arrays, e.g. mesh, torus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8038 Associative processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/955 Hardware or software architectures specially adapted for image or video understanding using specific electronic processors

Definitions

  • the present invention relates to methods and apparatus for signal processing.
  • the present invention seeks to provide improved methods and apparatus for signal processing.
  • ASP Associative Signal Processing
  • the ASP architecture is totally different.
  • the computation is carried out on an "intelligent memory" while the CPU is replaced by a simple controller that manages this "intelligent" memory.
  • each cell or word in this memory can identify its contents and change it according to instructions received from the controller.
  • This operation takes only 10 machine cycles in comparison to 1 - 3 million machine cycles with conventional serial computers. Using this basic instruction set of read, identify and write, all the arithmetical and logical operations can be performed.
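The read/identify/write model described above can be sketched in software. The following Python model is illustrative only (the class and method names are invented, not the patent's actual instruction set): COMPARE tags every word whose masked bits match a pattern, and WRITE updates the masked bits of all tagged words in one step, so a search-and-replace over the whole memory costs a fixed number of cycles regardless of the number of words.

```python
class AssociativeArray:
    """Toy model of an associative memory: an array of words plus a tag
    (responder) bit per word. Names are illustrative, not the patent's."""
    def __init__(self, words):
        self.words = list(words)
        self.tags = [False] * len(words)

    def compare(self, pattern, mask):
        # Tag every word equal to `pattern` on the bits selected by `mask`.
        self.tags = [(w & mask) == (pattern & mask) for w in self.words]

    def write(self, value, mask):
        # In every tagged word, overwrite the masked bits with `value`.
        self.words = [(w & ~mask) | (value & mask) if t else w
                      for w, t in zip(self.words, self.tags)]

# Replace every 0x3A in the array with 0x3B using one compare plus one
# write, i.e. at a cost independent of the number of words:
mem = AssociativeArray([0x3A, 0x17, 0x3A, 0x42])
mem.compare(0x3A, 0xFF)
mem.write(0x3B, 0xFF)
# mem.words is now [0x3B, 0x17, 0x3B, 0x42]
```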
  • associative signal processing apparatus for processing an incoming signal, the apparatus including an array of processors, each processor including a multiplicity of associative memory cells, each sample of an incoming signal being processed by at least one of the processors, a register array including at least one register operative to store responders arriving from the processors and to provide communication between processors, and an I/O buffer register for inputting and outputting a signal, wherein the processor array, the register array and the I/O buffer register are arranged on a single module.
  • associative signal processing apparatus including an array of processors, each processor including a multiplicity of associative memory cells, at least one of the processors being operative to process a plurality of samples of an incoming signal, a register array including at least one register operative to store responders arriving from the processors and to provide communication between processors, and an I/O buffer register for inputting and outputting a signal.
  • the processor array, the register array and the I/O buffer register are arranged on a single chip.
  • the register array is operative to perform at least one multicell shift operation.
  • signal processing apparatus including an array of associative memory words, each word including a processor, each sample of an incoming signal being processed by at least one of the processors, a register array including at least one register operative to provide communication between words and to perform at least one multicell shift operation, and an I/O buffer register for inputting and outputting a signal.
  • the register array is also operative to perform single cell shift operations.
  • the I/O buffer register and the processors are operative in parallel.
  • the word length of the I/O buffer register is increasable by decreasing the word length of the associative memory cells.
  • the apparatus is operative in video real time.
  • the signal includes an image.
  • At least one word in the array of words includes at least one nonassociative memory cell.
  • At least one word in the array of words includes at least one column of nonassociative memory cells.
  • the array, the register array and the I/O buffer register are arranged on a single module.
  • the module has a bus which receives instructions and also performs at least one multicell shift operation.
  • the module has a first bus which performs at least one multicell shift operation and a second bus which performs at least one single cell shift operation.
  • an array of processors which communicate by multicell and single cell shift operations, the array including a plurality of processors, a first bus connecting at least a pair of the processors which is operative to perform at least one multicell shift operation, and a second bus connecting at least a pair of the processors which is operative to perform single cell shift operations.
  • a signal processing method including:
  • counting includes generating a histogram.
  • the signal includes a color image.
  • At least one characteristic includes at least one of the following group of characteristics: intensity, noise, and color density.
  • the method also includes scanning a medium bearing the color image.
  • the image includes a color image.
  • an edge detection method including identifying a first plurality of edge pixels and a second plurality of candidate edge pixels, identifying, in parallel, all candidate edge pixels which are connected to at least one edge pixel as edge pixels, and repeating the second identifying step at least once.
  • a signal processing method including storing an indication that a first plurality of first samples has a first characteristic, storing, in parallel for all individual samples which are connected to at least one sample having the first characteristic, an indication that the connected samples have the first characteristic, and repeating the second step at least once.
  • the signal includes an image and the first characteristic of the first samples is that the first samples are edge pixels.
  • a feature labeling method in which a signal is inspected, the signal including at least one feature, the feature including a set of connected samples, the method including storing a plurality of indices for a corresponding plurality of samples, replacing, in parallel for each individual sample from among the plurality of samples, the stored index of the individual sample by an index of a sample connected thereto, if the index of the connected sample is ordered above the index of the individual sample, and repeating the replacing step at least once.
  • replacing is repeated until only a small number of indices are replaced in each iteration.
  • the signal includes an image.
  • the signal includes a color image.
  • the samples include pixels.
  • the first characteristic includes at least one color component and adjacency of pixels at least partly determines connectivity of samples.
  • the pixels form an image in which a boundary is defined and repeating is performed until the boundary is reached.
  • repeating is performed a predetermined number of times.
  • a method for image correction including computing a transformation for an output image imaged by a distorting lens, such as an HDTV lens, which compensates for the lens distortion, and applying the transformation in parallel to each of a plurality of pixels in the output image.
  • a distorting lens such as an HDTV lens
  • associative signal processing apparatus including a plurality of comparing memory elements each of which is operative to compare the contents of memory elements other than itself to respective references in accordance with a user-selected logical criterion, thereby to generate a responder if the comparing memory element complies with the criterion, and a register operative to store the responders.
  • the criterion includes at least one logical operand.
  • At least one logical operand includes a reference for at least one memory element other than the comparing memory element itself.
  • a plurality of memory elements may be respectively responsible for a corresponding plurality of pixels forming a color image.
  • the references may include three specific pixel values A, B and C, and the user-selected logical criterion may be that an individual pixel has a value of A, OR that its upper right neighbor has a value of B and its lower left neighbor has a value of C.
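The example criterion above can be evaluated for all pixels at once in a data-parallel style. The numpy sketch below is only an emulation of the associative compare (the image values and A, B, C are invented); border pixels whose neighbor lies outside the image never match.

```python
import numpy as np

# responder = (pixel == A) OR (upper-right neighbor == B AND
#                              lower-left neighbor == C), for every pixel.
A, B, C = 5, 7, 9
img = np.array([
    [1, 2, 7],
    [3, 4, 5],
    [9, 6, 8],
])

# ur[y, x] holds the upper-right neighbor of (y, x); ll[y, x] the
# lower-left one. Borders are padded with -1, which never matches.
ur = np.full_like(img, -1)
ur[1:, :-1] = img[:-1, 1:]
ll = np.full_like(img, -1)
ll[:-1, 1:] = img[1:, :-1]

responders = (img == A) | ((ur == B) & (ll == C))
# Pixel (1, 2) matches the first clause (value 5 == A); pixel (1, 1)
# matches the neighbor clause (upper-right 7 == B, lower-left 9 == C).
```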
  • each memory element includes at least one memory cell.
  • the plurality of comparing memory elements are operative in parallel to compare the contents of a memory element other than themselves to an individual reference.
  • an associative memory including an array of PEs (processor elements) including a plurality of PE's, wherein each PE includes a processor of variable size, and a word of variable size including an associative memory cell, wherein all of the associative memory cells from among the plurality of associative memory cells included in the plurality of PE's are arranged in the same location within the word and wherein the plurality of words included in the plurality of PE's together form a FIFO.
  • PEs processor elements
  • word of variable size including an associative memory cell
  • the word of variable size includes more than one associative memory cell.
  • a method for modifying contents of a multiplicity of memory cells including performing, once, an arithmetic computation on an individual value stored in a plurality of memory cells and storing the result of the arithmetic computation in a plurality of memory cells which contain the individual value.
  • storing is carried out in all memory cells in parallel.
  • Also described herein is a chip for multimedia and image processing applications. It is suitable for low-cost, low power consumption, small size and high-performance real-time image processing for consumer applications and high-end powerful image processing for multimedia and communication applications.
  • the chip is a general purpose, massively parallel processing chip, in which typically 1024 associative processors are crowded onto one chip, enabling the processing of 1024 digital words in one machine cycle of the computer clock.
  • the chip was designed to allow the performance of a wide range of image processing and multimedia applications in real-time video rate.
  • existing general purpose, serial computing chips and digital signal processing chips (DSPs) enable the processing of only 1 - 16 words in one machine cycle.
  • the chip's major instruction set is based on four basic commands that enable the performance of all arithmetic and logic instructions. This is another design advantage that allows more than a thousand processors to be crowded onto a single chip.
  • a single chip typically performs the equivalent of 500 - 2000 million instructions per second (MIPS).
  • MIPS million instructions per second
  • a system based on the chip's architecture can reach multimedia performance of high-end computers at only a small fraction of the price of typical high-end computers.
  • the chip is based on a modular architecture, and enables easy connection of more than one chip in order to gain high performance (in a linear ratio).
  • a large number of the chips can be connected in parallel in order to linearly increase overall performance to the level of the most sophisticated supercomputers.
  • the chip's architecture allows massively parallel processing in concurrence with data input and output transactions.
  • each of the 1024 processors has its own internal memory and data path.
  • the chip's data path architecture provides parallel loading of data into the internal processors, thereby eliminating the bottleneck between memory and CPU that can cause severe performance degradation in serial computers.
  • the chip uses an average of 1 watt to perform the equivalent of 500 MIPS which is 10 - 25 times better than existing general purpose and DSP chips.
  • Fig. 1 is a simplified functional block diagram of associative signal processing apparatus constructed and operative in accordance with a preferred embodiment of the present invention
  • Fig. 2 is a simplified flowchart of a preferred method for employing the apparatus of Fig. 1;
  • Fig. 3 is a simplified block diagram of associative signal processing apparatus for processing an incoming signal which is constructed and operative in accordance with a preferred embodiment of the present invention
  • Fig. 4 is a simplified block diagram of a preferred implementation of the apparatus of Fig. 1;
  • Fig. 5 is a simplified block diagram of an alternative preferred implementation of the apparatus of Fig. 1;
  • Fig. 6 is a simplified block diagram of a portion of the apparatus of Fig. 5;
  • Fig. 7 is a simplified block diagram of a portion of the apparatus of Fig. 6;
  • Fig. 8 is a simplified block diagram of another portion of the apparatus of Fig. 6;
  • Fig. 9 is a simplified flowchart illustrating the operation of the apparatus of Fig. 5;
  • Fig. 10 is a simplified pictorial diagram illustrating the operation of a portion of the apparatus of Fig. 5;
  • Fig. 11 is a simplified block diagram of associative real-time vision apparatus constructed and operative in accordance with an alternative preferred embodiment of the present invention.
  • Fig. 12 is a simplified pictorial illustration of the operation of the apparatus of Fig. 11 during compare and write commands;
  • Fig. 13 is a simplified pictorial illustration of interprocessor communication within a portion of the apparatus of Fig. 11;
  • Fig. 14 is a simplified block diagram illustrating chip interface and interconnections within a portion of the apparatus of Fig. 11;
  • Fig. 15 is a simplified pictorial illustration of an automaton used to evaluate the complexity of the apparatus of Fig. 11;
  • Fig. 16 is a simplified block diagram illustrating word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 17 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 18 is a simplified block diagram illustrating an additional word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 19 is a simplified block diagram illustrating an implementation of a method of thresholding utilizing the apparatus of Fig. 11;
  • Figs. 20A - 20F are simplified pictorial illustrations of test templates illustrating an implementation of a method of thinning utilizing the apparatus of Fig. 11;
  • Fig. 21 is a simplified block diagram illustrating an implementation of a method of matching utilizing the apparatus of Fig. 11;
  • Fig. 22 is a simplified block diagram illustrating still another word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 23 is a simplified block diagram illustrating an additional word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 24 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 25 is a graphical illustration of comparative execution time for alternative implementations of a stereo method utilizing the apparatus of Fig. 11;
  • Fig. 26 is a graphical illustration of comparative complexity for alternative implementations of a stereo method utilizing the apparatus of Fig. 11;
  • Fig. 27 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 28 is a simplified block diagram illustrating a portion of a method for edge detection utilizing the apparatus of Fig. 11;
  • Fig. 29 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 30 is a simplified pictorial illustration of pixels used within a method for processing an associative saliency network utilizing the apparatus of Fig. 11;
  • Fig. 31 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
  • Fig. 32 is a graphical illustration of normal parameterization of a line used within a method for computing a Hough transform utilizing the apparatus of Fig. 11;
  • Fig. 33 is a graphical illustration of a portion of a method for Convex Hull generation utilizing the apparatus of Fig. 11;
  • Fig. 34 is a simplified block diagram illustrating a method for processing an associative Voronoi diagram utilizing the apparatus of Fig. 11;
  • Appendix A is a listing of a subroutine called "sub.rtn" which is called in each of the listings of Appendices B - O;
  • Appendix B is a listing of a preferred associative signal processing method for generating a histogram
  • Appendix C is a listing of a preferred associative signal processing method for ID convolution
  • Appendix D is a listing of a preferred associative signal processing method for a low pass filter application of 2D convolution
  • Appendix E is a listing of a preferred associative signal processing method for a Laplacian filter application of 2D convolution
  • Appendix F is a listing of a preferred associative signal processing method for a Sobel filter application of 2D convolution
  • Appendix G is a listing of a preferred associative signal processing method for curve propagation
  • Appendix H is a listing of a preferred associative signal processing method for optical flow
  • Appendix I is a listing of a preferred associative signal processing method for performing an RGB to YUV transformation
  • Appendix J is a listing of a preferred associative signal processing method for corner and line detection
  • Appendix K is a listing of a preferred associative signal processing method for contour labeling;
  • Appendix L is a listing of a preferred associative signal processing method for saliency networking
  • Appendix M is a listing of a preferred associative signal processing method for performing a Hough transform on a signal which is configured as a line;
  • Appendix N is a listing of a preferred associative signal processing method for performing a Hough transform on a signal which is configured as a circle;
  • Appendix O is a listing of a preferred associative signal processing method for generating a Voronoi diagram signal.
  • FIG. 1 is a simplified functional block diagram of associative signal processing apparatus constructed and operative in accordance with a preferred embodiment of the present invention.
  • the apparatus of Fig. 1 includes a simultaneously accessible FIFO 10, or, more generally, any simultaneously accessible memory, which stores at least a portion of an incoming signal which arrives over a bus termed herein the DBUS.
  • the simultaneously accessible FIFO 10 feeds onto a PE (processor element) array 16 including a plurality of PE's 20 which feed onto a datalink 30 which preferably also serves as a responder memory. Alternatively, a separate responder memory may be provided.
  • Each PE includes at least one associative memory cell, more typically a plurality of associative memory cells such as, for example, 72 associative memory cells.
  • Each PE 20 stores and processes a subportion of the image, such that the subportions stored and processed by all of the PE's 20 together form the portion of the incoming signal stored at a single time in the simultaneously accessible FIFO 10.
  • the FIFO may, at a single time, store a block of 2048 pixels within the color image. If the processing task is so complex that two PEs are required to process each pixel, then the FIFO may, at a single time, store a smaller block of only 512 pixels within the color image.
  • a controller 40 which is typically connected in parallel to all the PE's.
  • Fig. 2 is a simplified flowchart of a preferred method for employing the apparatus of Fig. 1.
  • the first step of the method of Fig. 2 is step 54.
  • the system receives a user selected command sequence which is to be processed for each pixel of a current block of the color image.
  • the command sequence is stored in command sequence memory 50.
  • a command sequence comprises commands of some or all of the following types:
  • Multicell shift: the contents of each of one or more PE's shift, via the datalink 30, directly into one or more respective non-adjacent PE's,
  • in step 54, the first block of the incoming signal is received by the simultaneously accessible FIFO 10.
  • the command sequence is then processed, command by command, as shown in Fig. 2.
  • Fig. 3 is a simplified block diagram of associative signal processing apparatus for processing an incoming signal which is constructed and operative in accordance with a preferred embodiment of the present invention.
  • the signal processing apparatus of Fig. 3 includes the following elements, all of which are arranged on a single module 104 such as a single chip:
  • An array 110 of processors or PE's 114 of which, for simplicity, three are shown.
  • Each processor 114 includes a multiplicity of memory cells 120, of which, for simplicity, four are shown. From among each multiplicity of memory cells 120, at least one memory cell (exactly one, in the illustrated embodiment) is an associative memory cell 122.
  • the associative memory cell or cells 122 of each processor are all arranged in the same location or locations within their respective processors, as shown. As an example, there may be 1K processors 114, each including 72 memory cells 120, all of which are associative.
  • At least one of the processors is operative to process more than one sample of an incoming signal.
  • a responder memory 130 including one or more registers which are operative to store responders arriving from the processors 114 and, preferably, to serve as a datalink therebetween. Alternatively, a separate datalink between the processors may be provided.
  • the datalink function of memory 130 allows at least one multicell shift operation, such as a 16-cell per cycle shift operation, to be performed.
  • the datalink function of memory 130 also preferably performs single cell shift operations in which a shift from one cell to a neighboring cell or from one PE to a neighboring PE is performed in each cycle.
  • a simultaneously accessible FIFO 140 or, more generally a simultaneously accessible memory, which inputs and outputs a signal.
  • a responder counting unit 150 which is operative to count the number of "YES" responders in responder memory 130.
  • a command sequence memory 160 which may be similar to the command sequence memory 50 of Fig. 1, and a controller 170 are typically external to the module 104.
  • the controller 170 is operative to control the command sequence memory 160.
  • Mid-level associative signal processing methods: corner and line detection, contour labeling, saliency networking, Hough transform, and geometric tasks such as convex hull generation and Voronoi diagram generation.
  • Appendix B is a listing of one software implementation of a histogram generation method.
  • the method of Appendix B includes a very short loop repeated for each gray level.
  • a COMPARE instruction tags all the pixels of that level and a COUNTAG tallies them up. The count is automatically available at the controller, which accumulates the histogram in an external buffer.
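The histogram loop described above can be emulated directly: one COMPARE per gray level tags all the pixels of that level, and COUNTAG tallies the tags. In the numpy sketch below the two instruction names come from the text, while the pixel data and the reduced number of gray levels are invented for illustration.

```python
import numpy as np

pixels = np.array([0, 2, 1, 2, 2, 0, 1, 2], dtype=np.uint8)

histogram = []
for level in range(4):                 # 4 gray levels here, 256 in practice
    tags = (pixels == level)           # COMPARE: tag every pixel at this level
    histogram.append(int(tags.sum()))  # COUNTAG: tally the responders

print(histogram)  # [2, 2, 4, 0]
```

The controller-side accumulation into an external buffer corresponds to the `histogram` list here.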
  • Low level vision involves the application of various filters to the image, which are most conveniently executed by convolution.
  • the image may be considered as a simple vector of length N × M or as a concatenation of N row vectors, each of length M.
  • Convolution of an N-element data vector by a P-element filter results in a vector of length N+P-1, but only the central N-P+1 elements, representing the area of full overlap between the two vectors, are typically of interest.
  • the convolution filter vector [f], of length P and precision 8 is applied as an operand by the controller, one element at a time.
  • the result may, for example, be accumulated in a field [fd] of length 8+8+log2(P).
  • a "temp” bit is used for temporary storage of the carry that propagates through field [fd].
  • a "mark” bit serves to identify the area of complete overlap by the filter vector.
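The convolution scheme above — the controller broadcasting the filter one element at a time while every PE shifts and accumulates into its [fd] field — can be emulated at word level. The sketch below abstracts away the bit-serial "temp" carry and "mark" flags and uses invented data; as the text notes, only the first N - P + 1 positions represent full overlap.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5], dtype=np.int64)   # N = 5 samples, one per PE
filt = np.array([1, 0, -1], dtype=np.int64)        # P = 3 filter vector [f]

N, P = len(data), len(filt)
fd = np.zeros(N, dtype=np.int64)          # accumulator field [fd]
for k, f in enumerate(filt):              # one broadcast per filter element
    shifted = np.roll(data, -k)           # multicell shift by k positions
    fd += f * shifted                     # all PEs accumulate in parallel

# Only the first N - P + 1 positions hold full-overlap results (the
# wrapped-around tail of np.roll only touches the discarded positions).
result = fd[:N - P + 1]
print(result.tolist())  # [-2, -2, -2]
```

Note this computes the correlation form; flipping the filter gives true convolution.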
  • Curve propagation is useful in that it eliminates weak edges due to noise, but continues to trace strong edges as they weaken.
  • two thresholds on gradient magnitude may be computed - "low” and “high”. Edge candidates with gradient magnitude under “low” are eliminated, while those above “high” are considered edges. Candidates with values between “low” and “high” are considered edges if they can be connected to a pixel above "high” through a chain of pixels above “low”. All other candidates in this interval are eliminated.
  • each "L” candidate is examined to see if at least one of its 8-neighbors is an edge, in which case it is also declared an edge by setting "E”.
  • the two flags are compared to see if steady state has been reached, in which case the process terminates.
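The curve propagation scheme above amounts to hysteresis thresholding: each pass promotes every "L" candidate with an "E" 8-neighbor, and the process stops at steady state. The numpy emulation below uses invented gradient values and threshold levels.

```python
import numpy as np

grad = np.array([
    [0, 60, 0, 0],
    [0, 55, 0, 0],
    [0, 52, 90, 0],
    [0,  0, 0, 0],
])
low, high = 50, 80

edges = grad >= high                 # "E" flag: sure edges
cand = (grad >= low) & ~edges        # "L" flag: candidates

while True:
    # Does any 8-neighbor of each pixel carry the edge flag?
    p = np.pad(edges, 1)             # pads with False
    neigh = np.zeros_like(edges)
    H, W = edges.shape
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                neigh |= p[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    new_edges = edges | (cand & neigh)
    if (new_edges == edges).all():   # steady state reached
        break
    edges = new_edges
# The chain 52-55-60 survives because it connects to the 90 pixel.
```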
  • Optical flow assigns to every point in the image a velocity vector which describes its motion across the visual field.
  • the potential applications of optical flow include the areas of target tracking, target identification, moving image compression, autonomous robots and related areas.
  • the theory of computing optical flow is typically based on two constraints: the brightness of a particular point in the image remains constant, and the flow of brightness patterns varies smoothly almost everywhere.
  • Horn & Schunck derived an iterative process to solve the constrained minimization problem.
  • the flow velocity has two components (u,v).
  • a new set of velocities [u (n+1), v (n+1)] can be estimated from the average of the previous velocity estimates.
  • a method for implementing the Horn and Schunck method associatively is given in Appendix H.
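The iteration referred to above has a standard textbook form: the new velocities are the neighborhood averages, corrected along the brightness gradient. The numpy step below is that textbook update, not a transcription of Appendix H (which is not reproduced here); the 4-neighbor average is one common choice of smoothing kernel.

```python
import numpy as np

def horn_schunck_step(u, v, Ix, Iy, It, alpha=1.0):
    """One Horn & Schunck update: u' = u_bar - Ix*(Ix*u_bar + Iy*v_bar + It)
    / (alpha^2 + Ix^2 + Iy^2), and similarly for v."""
    avg = lambda f: (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                     np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0
    u_bar, v_bar = avg(u), avg(v)
    num = Ix * u_bar + Iy * v_bar + It
    den = alpha ** 2 + Ix ** 2 + Iy ** 2
    return u_bar - Ix * num / den, v_bar - Iy * num / den

# A uniform x-gradient with It = -0.5 drives a uniform flow u = 0.25:
u, v = np.zeros((4, 4)), np.zeros((4, 4))
Ix, Iy = np.ones((4, 4)), np.zeros((4, 4))
It = np.full((4, 4), -0.5)
u, v = horn_schunck_step(u, v, Ix, Iy, It)
```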
  • One of the major tasks in color image processing is to transform the 24 bit space of the conventional Red (R), Green (G) and Blue (B) color components to another space, such as a (Y,U,V) space, which is better suited for color image compression.
  • R Red
  • G Green
  • B Blue
  • Y,U,V Luminance (Y) and chrominance (U,V) components
  • a preferred associative method for color space transformation is set forth in Appendix I.
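For reference, the usual RGB to YUV transformation uses the CCIR 601 coefficients. Appendix I's exact (presumably fixed-point) coefficients are not reproduced here, so the standard floating-point definition stands in for it:

```python
import numpy as np

def rgb_to_yuv(rgb):
    """CCIR 601 RGB -> YUV on the last axis of an (..., 3) array."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    u = 0.492 * (b - y)                     # blue-difference chrominance
    v = 0.877 * (r - y)                     # red-difference chrominance
    return np.stack([y, u, v], axis=-1)

# A pure-gray pixel maps to Y = gray level, U = V = 0:
gray = rgb_to_yuv(np.array([[128.0, 128.0, 128.0]]))
```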
  • An important feature for middle and higher level processing is the ability to distinguish corners and line direction.
  • line orientation is generated during the process.
  • the M&H algorithm is not directional, and the edge bit-map it produces must be further processed to detect line orientation.
  • an edge bit-map of a 9 ⁇ 9 neighborhood around each pixel is used to distinguish segment direction. The resulting method can typically discriminate 120 different lines and corners.
  • Appendix J A program listing of this method is set forth as Appendix J.
  • a preparation step labels each contour point with its x,y coordinates.
  • the process is generally iterative and operates on a 3 ⁇ 3 neighborhood of all contour points in parallel. Every contour point looks at each one of its 8 neighbors in turn and adopts the neighbor's label if smaller than its own.
  • the circular sequence in which neighbors are handled appreciably enhances label propagation.
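The label-adoption process above can be emulated as minimum-label propagation. The sketch below takes all 8 neighbors at once per iteration rather than in the circular sequence the text describes (which propagates labels faster within one pass), and the contour bitmap is invented; each contour point starts with its flat index as a unique label.

```python
import numpy as np

contour = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
], dtype=bool)

H, W = contour.shape
SENTINEL = H * W                         # "infinity" for background points
labels = np.arange(H * W).reshape(H, W)
labels[~contour] = SENTINEL

while True:
    best = labels.copy()
    p = np.pad(labels, 1, constant_values=SENTINEL)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:                 # adopt the smallest 8-neighbor label
                best = np.minimum(best, p[1 + dy:1 + dy + H, 1 + dx:1 + dx + W])
    best[~contour] = SENTINEL            # background never participates
    if (best == labels).all():           # steady state: components labeled
        break
    labels = best
# Each connected contour gets the smallest flat index among its points.
```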
  • Salient structures in an image can be perceived at a glance without the need for an organized search or prior knowledge about their shape. Such a structure may stand out even when embedded in a cluttered background or when its elements are fragmented.
  • Sha'ashua & Ullman have proposed a global saliency measure for curves based on their length, continuity and smoothness.
  • the image is considered as a network of N × N grid points, with d orientation elements (segments or gaps) coming into each point from its neighbors, and as many going out to its neighbors.
  • a curve of length L in this image is a connected sequence of orientation elements p(i),p(i+1), ... ,p(i+L), each element representing a line-segment or a gap in the image.
  • the Hough transform detects a curve whose shape is described by a parametric equation, such as a straight line or a conic section, even if there are gaps in the curve.
  • a parametric equation such as a straight line or a conic section
  • Each point of the figure in image space is transformed to a locus in parameter space.
  • a histogram is generated giving the distribution of locus points in parameter space. Occurrence of the object curve is marked by a distinct peak in the histogram (intersection of many loci).
  • the x-y coordinates are given by 8 bits in absolute value and sign. Angle A from 0 to pi (3.14159) is given to a matching precision of 10 bits (excluding sign of gradient).
  • the sine and cosine are evaluated by table look-up. Preferably, the table size is reduced four-fold to take into account the symmetry of these functions. After comparing A, the histogram is evaluated and read out element by element using a "countag" command.
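The straight-line Hough transform in the normal parameterization rho = x·cos(A) + y·sin(A) can be sketched as a voting loop: every edge point votes for each (A, rho) pair it could lie on, and a peak in the accumulator marks a line. The points, accumulator resolution, and quantization below are invented (the real method uses the table look-up and "countag" described above).

```python
import numpy as np

# Three of the four edge points lie on the vertical line x = 2.
points = [(2, 0), (2, 1), (2, 3), (0, 1)]      # (x, y) edge points

n_angles, max_rho = 8, 6
accumulator = np.zeros((n_angles, 2 * max_rho), dtype=int)
angles = np.arange(n_angles) * np.pi / n_angles  # A in [0, pi)

for x, y in points:
    for a_idx, a in enumerate(angles):
        rho = int(round(x * np.cos(a) + y * np.sin(a)))
        accumulator[a_idx, rho + max_rho] += 1   # offset so rho can be < 0

# The distinct peak (3 votes) sits at A = 0, rho = 2: the line x = 2.
a_idx, r_idx = np.unravel_index(accumulator.argmax(), accumulator.shape)
```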
  • a histogram is generated for x0,y0.
  • An associative implementation of a preferred Hough transform is set forth in Appendix N.
  • Each of the given points acts as a source of "fire” that spreads uniformly in all directions.
  • the boundaries consist of those points at which fires from two (or three) sources meet.
  • Every point in the given set is initially marked with a different color, such as its own xy-coordinates.
  • Each point in the image looks at its 8-neighbors. A blank (uncolored) point that sees a colored neighbor will copy its color. If both are colored, the point will compare colors, marking itself as a Voronoi (boundary) point if the colors are different. This process is iterated until all points are colored.
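  • Purely as an illustration of this fire-spreading iteration, the following "C" sketch colors a small grid sequentially (the machine itself handles all points in parallel); the grid size VN and the seed colors are illustrative assumptions:

```c
#include <string.h>

#define VN 16   /* illustrative grid size */

static int color[VN][VN];    /* 0 = blank, otherwise a seed's color */
static int voronoi[VN][VN];  /* 1 = Voronoi (boundary) point */

/* One synchronous pass: a blank point copies the color of a colored
 * neighbor; a colored point seeing a differently colored neighbor
 * marks itself as a boundary point.  Returns 1 while colors spread. */
static int voronoi_pass(void) {
    int next[VN][VN], changed = 0;
    memcpy(next, color, sizeof next);
    for (int y = 0; y < VN; y++)
        for (int x = 0; x < VN; x++)
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = y + dy, nx = x + dx;
                    if ((dx || dy) && ny >= 0 && ny < VN &&
                        nx >= 0 && nx < VN && color[ny][nx]) {
                        if (!next[y][x]) { next[y][x] = color[ny][nx]; changed = 1; }
                        else if (color[y][x] && color[y][x] != color[ny][nx])
                            voronoi[y][x] = 1;
                    }
                }
    memcpy(color, next, sizeof color);
    return changed;
}

/* Iterate until all points are colored and boundaries are marked. */
void voronoi_grow(void) { while (voronoi_pass()) ; }
```

With two seeds the fronts meet midway and both meeting points are marked as Voronoi boundary points.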
  • any of the steps categorized as Group 1 may be carried out in parallel with any of the steps from Group 3. Also, any of the steps from Group 1, any of the steps from Group 2 and any of the steps from Group 4 may be carried out in parallel.
  • Appendices B to O may be run on any "C" language compiler such as the Borland "C++" compiler, using the CLASS function.
  • each of the methods of Appendices B to O includes the following steps:
  • One implementation of an associative signal processing chip is now described. The implementation is termed herein the ASP100.
  • the ASP100 is an associative processing chip. It is intended to serve as part of the Vision Associative Computer, most likely in an array of multiple ASP100 chips.
  • the ASP100 consists of an associative memory array of 1K × 72 bits, peripheral circuitry, an image FIFO I/O buffer, and control logic.
  • the ASP100 can be operated as a single chip (Single Chip Mode) or as part of an array of ASP100 chips (Array Mode).
  • the Single Chip Mode is shown in Figure 4. In this mode a single ASP100 is operated in conjunction with a controller.
  • the Array mode is shown in Figure 5.
  • An array of ASP100 chips is interconnected in parallel, constituting a linear array.
  • the ASP100 may be packaged in a 160 pin PQFP package. All output and I/O pins are 3-state, and can be disabled by CS. Following is the complete list of pins:
  • ARRAY is the main associative array.
  • FIFO is the image input/output fifo buffer.
  • SIDE is the side-array, consisting of the tag, the tag logic, the tag count, the select-first, the row drivers (of WriteLine and MatchLine) and sense amplifiers, and the shift units.
  • TOP consists of the mask and comparand registers, and the column drivers (of BitLine, InverseBitLine, and MaskLine).
  • BOTTOM contains the output register and sense amplifiers.
  • CONTROL is the control logic for the chip. Microcontrol is external in this version.
  • the Array consists of 1024 × 72 associative processing elements (APEs), organized in three columns, each 24 APEs wide, and physically split into three blocks of 342 × 72 APEs. This six-way split achieves a square aspect ratio of the layout and also helps contain the load of the vertical bus wires.
  • one 24 bit sector of the array is reconfigurable as follows (by means of the "CONFIFO" Configure Fifo instruction):
  • APE Associative Processing Element
  • CAM Content Addressable Memory
  • the Storage element consists of two cross coupled CMOS inverters.
  • the Write device implements the logical AND of MASK and WL, so that it can support the MASKED WRITE operation.
  • the Match device implements a dynamic EXCLUSIVE OR (XOR) logic. This technique allows area efficient and reliable implementation of the COMPARE operation.
  • the FIFO is designed to input and output image data in parallel with computations carried out on the ARRAY. It consists of a reconfigurable MATRIX of 1024 × [24, 16, or 8] APEs 190, three columns 192, each of 1024 bi-directional Switches, and an Address Generator 194.
  • the corresponding section of the Comparand register in TOP serves as the FIFO Input Register.
  • the corresponding section of the Output Register in BOTTOM serves as the FIFO Output Register.
  • the FIFO Controller FC resides in TOP.
  • the FIFO is configured by the CONFIFO instruction, where the three LSBs of the operand are:
  • the Address Generator 194 consists of a shift register and implements a sequential addressing mode. It selects a currently active FIFO word line.
  • the FIFO has two modes of operation, IMAGE I/O Mode and IMAGE EXCHANGE Mode.
  • the bi-directional Switches (one column of the three) disconnect the MATRIX from the ARRAY in IMAGE I/O Mode (see below) and connect the MATRIX to the ARRAY in IMAGE EXCHANGE Mode, creating a combined array of APEs.
  • the Input and Output Registers serve as buffer registers for the image I/O.
  • the FIFO Controller controls the FIFO as follows: Pixel I/O is synchronous with the CLK. External control input RSTFIFO resets (clears) the Address Generator 194. FENB (asserted for at least 2 CLK cycles) enables the input (and output) of the next pixel (on the positive edge of CLK). Once all pixels have been entered (and output), FFUL is asserted for 2 CLK cycles. This I/O activity is performed asynchronously with respect to the computation in the rest of the chip.
  • IMAGE I/O mode The basic operation of IMAGE I/O mode is carried out as follows.
  • the pixel at the VIN pins is entered into the FIFO Input Register (the FIFO section of the comparand register).
  • the Address Generator 194 enables exactly one word line.
  • the corresponding word is written into the FIFO Output Register (the FIFO section of the Output Register), and through it directly to the VOUT pins, in an operation similar to Read execution. Subsequently, the word in the FIFO Input Register is written into the same word, similar to a Write execution.
  • VOUT pins are 3-state. They are enabled and disabled internally as needed.
  • This sequence of operations is carried out in a loop 1024 times in order to fill the 1024 processors with data.
  • ASP100 chips can be chained together with a FENB/FFUL chain, where the first ASP100 receives the FENB from an external controller (true for 2 cycles), the FFUL of each ASP100 (true for 2 cycles) is connected directly to the FENB input of the next chip, and the last FFUL goes back to the controller.
  • a destination bit slice of the ARRAY is masked by MASK register and is then reset by a chain of SETAG; ClearComparand, WRITE operations (which can all be executed in one cycle).
  • a source bit slice of the FIFO MATRIX is masked by the MASK register. The contents of the bit slice are passed to the TAG register as a result of the COMPARE operation. The destination bit slice is masked again and then the contents of the TAG register are passed to the destination bit slice by a SetComparand, WRITE operation.
  • LMCC sector 0/1/2, destination ARRAY bit; SETAG; WRITE
  • IMAGE OUT This operation is carried out in exactly the same way as IMAGE IN, except that a destination bit slice is allocated in the FIFO MATRIX while a source bit slice is allocated in the ARRAY.
  • IMAGE EXCHANGE operation requires two different fields in the ARRAY (a first field for allocation of a new image, and a second one for temporary storage of the processed image). The two operations (IMAGE IN & OUT) can be combined in one loop.
  • Fig. 8 illustrates a preferred implementation of the SIDE block of Fig. 6.
  • the SIDE block is shown to include the TAG register, the NEAR neighbor connections, the FAR neighbor connections, the COUNT_TAG counter, the FIRST_RESPONDER circuit, the RSP circuit, and the required horizontal bus drivers and sense amplifiers.
  • the TAG register consists of a column of 1024 TAG_CELLs.
  • the TAG register is implemented by a D flip-flop with a set input and non-inverted output.
  • the input is selected by means of an 8-input multiplexer, with the following inputs: FarNorth, NearNorth, FarSouth, NearSouth, MatchLine (via sense amp), TAG (feedback loop), GND (for tag reset), and the FirstResponder output.
  • the mux is controlled by MUX [0 : 2].
  • the NEAR neighbor connections interconnect the TAG_CELLs in an up/down shift register manner to nearest neighbors. They are typically employed for neighborhood operations along an image line, since pixels are stored consecutively by lines.
  • the FAR connections interconnect TAG_CELLs 16 apart, for faster shifts of many steps. They are typically used for neighborhood operations between image lines.
  • the TAG register is affected by the following instructions and microcode signals: SETAG, SHUP, SHDN, LGUP, LGDN, the video load up and video load down microcode signals (termed herein VLUP and VLDN respectively), COMPARE, and FIRSEL.
  • the COUNT_TAG counter counts the number of TAG_CELLs containing '1'. It consists of three iterative logic arrays of 1 × 342 cells each. The side inputs of the counter are fed from the TAG register outputs. The counter operates in a bit-serial mode, starting with the least significant bits. In each cycle, the carry bits are retained in the FF (memory cell flip-flop) for the next cycle, and the sum is propagated down to the next stage. The counter is partitioned into pipeline stages. The outputs of all six columns are added by a summation stage, which generates the final result in a bit-serial manner. The serial output appears on the CTAG outputs, and signal CTAGVAL (CTAG valid) is generated by the controller. The COUNT_TAG counter is activated by the COUNTAG instruction.
  • the FIRST_RESPONDER circuit finds the first TAG CELL containing '1', and resets all other TAG_CELLs to '0'. It is activated by the FIRSEL instruction. The beginning of the chain is fed from a FIRSTIN input, wherein FIRSTIN is a microcode command according to which the first arriving datum is the first datum to enter the chip memory. If FIRSTIN is '0', then all TAG_CELLs are reset to '0'. This is intended to chain the FIRST_SELECT operation over all ASP100s interconnected together, and the OR of the RSP outputs of the lower-numbered ASP100 should be input into FIRSTIN.
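  • The FIRST_RESPONDER semantics can be illustrated with a small bit-level "C" model; the 64-bit slice width is an illustrative assumption, with bit 0 standing for the first TAG_CELL:

```c
#include <stdint.h>

/* Model of FIRSEL on a 64-tag slice: keep only the first '1' tag and
 * clear all others.  firstin models the FIRSTIN chaining input: when it
 * is '0', an earlier chip in the chain already holds the first
 * responder, so all tags here are reset. */
uint64_t firsel(uint64_t tags, int firstin) {
    if (!firstin) return 0;
    return tags & (~tags + 1);  /* two's-complement trick: isolate lowest set bit */
}

/* RSP: '1' when at least one tag is set. */
int rsp(uint64_t tags) { return tags != 0; }
```

The OR of the RSP outputs of lower-numbered chips would be fed into the next chip's firstin, as described above.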
  • the TAG outputs can be disconnected from the FIRST_RESPONDER and COUNT_TAG circuits, in order to save power, by pulling the FIRCNTEN control input to '0'.
  • the RSP circuit generates '1' on the RSP output pin after COMPARE instruction, if there is at least one '1' TAG value. This output is registered.
  • the TOP block consists of the COMPARAND and MASK registers and their respective logic, and the vertical bus drivers.
  • the COMPARAND register contains the word which is compared against the ARRAY. It is 72 bits long, and is partitioned according to FIFO configuration (see Section 4.3) It is affected by the following instructions: LETC, LETMC, LMCC, LMSC, LCSM. All these instructions affect only one of the three sectors at a time, according to the sector bits.
  • the FIFO section of the COMPARAND operates differently, as described in Section 4.3.
  • the MASK register masks (by '0') the bits of the COMPARAND which are to be ignored during comparison and write.
  • the BitLines and InverseBitLines of the masked bits are kept at '0' to conserve power.
  • the MASK register is affected by the following instructions: LETM, LETMC, LMCC, LMSC, LCSM, LMX, SMX. The former five instructions affect only one sector at a time, whereas LMX and SMX also clear the mask bits of the non-addressed sectors.
  • the FIFO section of the MASK operates differently, as described in Section 4.3.
  • the 24 bit data input from DBUS (the operand) are pipelined by three stages, so as to synchronize the operand and the corresponding control signals.
  • BOTTOM contains the BitLine and InverseBitLine sense amplifiers, the Output Register and its multiplexor, the DBUS multiplexors, and the DBUS I/O buffers. Since the ARRAY is physically organized in three columns, the output of the three sense amplifiers must be resolved. A logic unit selects which column actually generated the output, as follows:
  • READ Select the column whose RSP is true.
  • FIFO OUT Select the column in which the address token resides.
  • the Output Register is 72 bits long. 8 or 16 or 24 bits serve the FIFO and are connected to the VOUT pins. On READ operation, one of the three sectors (according to the sector bits) is connected to 24 bits of DBUS via a multiplexor.
  • DBUS multiplexors allow two configurations:
  • SHIFT Connects the south long shift lines (from rows 1008:1023) to DBUS[31:16] and the north long shift lines (from rows 0:15) to DBUS[15:0].
  • READ Connects bits [15:0] (bit 0 is LSB) of the Output Register to DBUS[15:0], and bits [23:16] of the Output Register to DBUS[23:16].
  • the DBUS I/O buffers control whether the DBUS is connected as input or output, and are controlled by the HIN and LIN control signals:
  • the ASP100 is controlled by means of an external microcoded state machine, and it receives the decoded control lines.
  • the external microcontroller allows horizontal microprogramming with parallel activities.
  • the combined operation of the ASP100, its microcontroller, and the external controller is organized in a five-stage instruction pipeline, consisting of the following stages: Fetch, Decode, microFetch, Comparand, and Execute.
  • In the Fetch stage, the instruction is fetched from the external program memory and is transferred over the system bus to the IR (instruction register).
  • In the Decode stage, the instruction (from the IR) is decoded by the microcontroller and stored in the μIR.
  • In the microFetch stage, the control codes are transferred from the external μIR, through the input pads, into the internal μIR.
  • In the Comparand stage, the parts of the execution which affect the Comparand register are carried out, and the control codes move from the internal μIR to the internal μIR2.
  • In the Execute stage, execution in the ARRAY and other parts takes place. See Figure 9.
  • Pipeline breaks occur during SHIFT and READ, when DBUS is used for data transfer rather than instructions, and as a result of branches. Branches are handled by the external controller, and are interpreted as NOPs by the ASP100 microcontroller. Similarly, operations belonging to the controller are treated as NOPs.
  • a single 50 MHz CLKIN clock is input into the ASP100.
  • a clock synchronization control signal, DCLKIN, is also input.
  • the CLKIN signal serves as the clock of the generator circuit.
  • DCLKIN is an input signal (abiding by the required setup and hold timing).
  • the circuit creates two clocks of, for example, 25 MHz, CLK and DCLK, delayed by 1/4 cycle relative to each other.
  • CLK is fed back into a clock-generating pad, to provide the required drive capability.
  • CLK, DCLK and their complements provide for four-phase clocking, as shown in Figure 10.
  • Instruction format A is used for Group 1 and READ instructions. It contains one bit for NOP, five OpCode bits, two sector bits, and 24 operand bits.
  • Instruction format B is used for all other groups. It contains one bit for NOP, seven OpCode bits, and 24 operand bits.
  • ARTVM Associative Real Time Vision Machine
  • the core of the machine is a basic, classical, associative processor that is parallel by bit as well as by word.
  • the main associative primitive is COMPARE.
  • the comparand register is matched against all words of memory simultaneously and agreement is indicated by setting the corresponding tag bit. The comparison is only carried out in the bits indicated by the mask register and the words indicated by the tag register. Status bit rsp signals that there was at least one match (Fig. 12).
  • the WRITE primitive operates in a similar manner. The contents of the comparand are simultaneously written into all words indicated by the tag and all bits indicated by the mask (Fig. 12).
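  • The COMPARE and WRITE primitives can be modeled in "C" over an array of words; the word count, 32-bit width and field layout below are illustrative assumptions, not part of the specification:

```c
#include <stdint.h>

#define WORDS 8  /* illustrative memory size */

typedef struct {
    uint32_t mem[WORDS];       /* associative memory words */
    uint32_t comparand, mask;  /* comparand and mask registers */
    uint8_t  tag[WORDS];       /* one tag bit per word */
    int      rsp;              /* at least one match */
} am_t;

/* COMPARE: match the comparand against all words simultaneously, in the
 * bits indicated by the mask, restricted to words whose tag is set. */
void am_compare(am_t *m) {
    m->rsp = 0;
    for (int w = 0; w < WORDS; w++) {
        m->tag[w] = m->tag[w] && ((m->mem[w] ^ m->comparand) & m->mask) == 0;
        m->rsp |= m->tag[w];
    }
}

/* WRITE: store the comparand into every tagged word, masked bits only. */
void am_write(am_t *m) {
    for (int w = 0; w < WORDS; w++)
        if (m->tag[w])
            m->mem[w] = (m->mem[w] & ~m->mask) | (m->comparand & m->mask);
}
```

A COMPARE selecting all words equal to the comparand, followed by a WRITE with a new comparand, rewrites exactly the matching words.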
  • the READ command is normally used to bring out a single word, the one pointed to by the tag.
  • the associative machine may be regarded as an array of simple processors, one for each word in memory.
  • ARTVM provides N × N words, one for each pixel in the image to be processed, and the pixels are arranged linearly, row after row.
  • the full machine instruction set is given in the following table.
  • Neighborhood operations which play an important role in vision algorithms, require bringing data from "neighboring" pixels.
  • Data communication is carried out a bit slice at a time via the tag register, by means of the SHIFTAG primitives, as shown in Fig. 13.
  • the number of shifts applied determines the distance or relation between source and destination. When this relation is uniform, communication between all processors is simultaneous. Fortunately, neighborhood algorithms only require a uniform communication pattern. Since the image is two-dimensional while the tag register is only one-dimensional, communication between neighbors in adjacent rows requires N shifts. To facilitate these long shifts a multiple shift primitive, SHIFTAG(±b), was implemented in hardware, where b is a sub-multiple of N. The time complexity in cycles for shifting an N × N image k places is given by
  • Loading data images into associative memory and outputting computed results could take up much valuable processor time. This can be avoided by distributing a frame buffer within associative memory and giving it access to the tag register [33]. That is the function of the I/O Buffer Array which consists of a 16-bit shift register attached to each word. A stereo image frame can be shifted into the buffer array as it is received and digitized, without interfering with associative processing. During vertical blanking, the stereo image frame is transferred into associative memory, a bit slice at a time, using
  • letmc d(i) write; /* load memory with bit slice from buffer */
  • Execution time is 64 machine cycles (under 2 μs), which is negligible compared to the vertical blanking period (1.8 ms). While the sample routine exchanges data between the buffer array and a continuous field in memory, it should be noted that the tagxch primitive is quite flexible: it can fetch data from one field and put to another, and both fields can be distributed.
  • Up to four operations may be done concurrently during a given memory cycle: SETAG or SHIFTAG; loading M (SETM, LETM); loading C (SETC, LETC); and COMPARE, READ or WRITE.
  • FIRSEL resolves multiple responses in 6 cycles.
  • COUNTAG is used to compile statistics and executes in 12 cycles. Control functions are given in the C language, and are carried out in parallel with the associative operations, hence do not contribute to the execution time.
  • associative memory contains two data vectors, A and B, each having J elements and M-bit precision.
  • the associative operation is carried out sequentially, a bit slice at a time, starting with the least significant bit.
  • each bit slice is processed by the statement [letm d(.); letc d(.); setag; compare], followed by a parallel replacement of B and C with the appropriate output combination by means of the statement [letc d(.); write].
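  • The effect of this bit-serial addition can be illustrated with a "C" sketch computing B := A + B over all words, one bit slice at a time; the word count AJ, precision AM and the direct truth-table update are illustrative (on the machine each slice costs a fixed number of COMPARE and WRITE operations):

```c
#include <stdint.h>

#define AJ 8  /* number of words J (illustrative) */
#define AM 8  /* bit precision M (illustrative) */

/* Bit-serial addition B := A + B over all AJ words "simultaneously",
 * starting with the least significant bit; C holds the carry slice. */
void bitserial_add(uint16_t A[AJ], uint16_t B[AJ]) {
    uint8_t C[AJ] = {0};                 /* carry bit per word */
    for (int i = 0; i < AM; i++)         /* one bit slice at a time */
        for (int w = 0; w < AJ; w++) {   /* all words in parallel on the machine */
            int a = (A[w] >> i) & 1, b = (B[w] >> i) & 1, c = C[w];
            int s = a ^ b ^ c;           /* sum bit */
            C[w] = (a & b) | (c & (a ^ b));
            B[w] = (uint16_t)((B[w] & ~(1u << i)) | ((unsigned)s << i));
        }
}
```

The cost is a fixed number of cycles per bit slice, independent of the number of words, which is the source of the machine's word parallelism.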
  • each word in memory acts as a simple processor so that memory and processor are indistinguishable. Input, output and processing go on simultaneously in different fields of the same word.
  • the field to be accessed or processed, is completely flexible, depending on application.
  • processing capabilities typical of vision algorithms
  • Ruhman and Scherson [34, 35] devised a static associative cell and used it to lay out an associative memory chip. After evaluating its performance by circuit level simulation, they conservatively estimated the area ratio between associative memory and static RAM at 4. Considering that 4 megabits of static RAM is now commercially available on a chip area of less than 100 mm², associative memory chip capacity becomes 1 Mbit. The proposed chip for ARTVM stores 4K words × 152 bits, which is only 59 percent of capacity. Conservative extrapolation of cycle time to a 0.5 micron technology yields 30 nanoseconds. This value was used in computing execution time of associative vision algorithms.
  • Fig. 14 describes the chip interface, and shows how 64 such chips can be interconnected to make up the associative memory of ARTVM. Considering the exponential growth of chip capacity by a factor of 10 every five years [36], the ARTVM may be reduced to 8 chips around 1995. Since the bulk of the machine is associative memory, upgrading is simple and inexpensive.
  • a control unit is required to generate a sequence of microinstructions and constants (mask and comparand) for associative memory, as well as to receive and test its outputs (count and rsp).
  • This unit may be realized using high speed bit-slice components, or may be optimized by the design of one or more VLSI chips. The functions of the control unit will become apparent from the associative algorithms that follow.
  • a simulator for the ARTVM was created, which enables the user to check out associative implementations of vision algorithms and to evaluate their performance. It was written in the "C” language and is referred to as "asslib.h".
  • the vision machine simulator consists of an associative instruction modeler and an execution time evaluator.
  • the main features are:
  • the load command initializes the array A[.][.] with data from the file ass.inp, while the save command writes into file ass.out the contents of memory array A[.][.] at the end of the application program.
  • the print-cycles command displays the number of machine cycles required to execute the simulated program.
  • Speed evaluation was achieved by modeling the machine as a simplified Finite Automaton (F.A.) in which a cost, in machine cycles, was assigned to each state transition.
  • the machine has only two states: S0 and S1.
  • Fig. 15 presents the transition table and diagram.
  • Speed is measured by initializing a cycle counter (called cycles) to zero, then incrementing it at each state transition by the assigned cost in cycles (which appears as an output in the diagram).
  • This speed model reflects the fact that any instruction from group I1 can execute simultaneously with one from group I2, and they can both be overlapped by an instruction from group I3.
  • the cost of countag (I4) is conservatively estimated at 12 cycles on the basis of implementing it on-chip by a pyramid of adders, and summing the partial counts off-chip in a two-dimensional array.
  • the cost of firsel (I5) is conservatively estimated at 6 cycles on the basis of implementing it by a pyramid of OR gates whose depth is log2(N) - 1, part of which is on-chip and the rest off-chip. In the worst case, the pyramid must be traversed twice and then a tag flip-flop must be reset.
  • the simplicity of the model for speed evaluation imposes a mild restriction on the programmer. Instructions that are permitted to proceed concurrently must be written in the order I1, I2, I3.
  • the data is arranged linearly in memory, in the order of the video scan, row after row of pixels, starting at the top left hand corner of the image.
  • the long shift instruction was provided for communication between rows, and its extent was denoted by b, where b is a submultiple of the row length, N.
  • T_hist = 0.5 + 13 · 2^M    (1)
  • where M is the gray level precision and is taken to be 8 bits in our model.
  • the histogram is executed in under 3330 machine cycles or nearly 100 ⁇ s.
  • histogram_array[gray_level] = countag;
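  • The histogram loop can be modeled sequentially in "C" as follows: one COMPARE selects all pixels at a given gray level and COUNTAG tallies them, repeated for all 2^M levels. The function names and the pixel array are illustrative stand-ins for the hardware:

```c
#define LEVELS 256  /* 2^M gray levels, M = 8 */

/* Stand-in for COMPARE + COUNTAG: the number of words whose gray value
 * matches the comparand level. */
static int countag_level(const unsigned char *img, int n, int level) {
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (img[i] == level);
    return count;
}

/* One COMPARE + COUNTAG per gray level, 2^M iterations in all. */
void histogram(const unsigned char *img, int n, int hist[LEVELS]) {
    for (int g = 0; g < LEVELS; g++)
        hist[g] = countag_level(img, n, g);
}
```

On the associative machine each of the 2^M iterations costs a fixed 13 cycles regardless of image size, which is what equation (1) expresses.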
  • Low level vision involves the application of various filters to the image, which are most conveniently executed by convolution.
  • the image may be considered as a simple vector of length N², or as a concatenation of N row vectors, each of length N.
  • Convolution of an N-element data vector by a P-element filter results in a vector of length N + P - 1, but only the central N - P + 1 elements, representing the area of full overlap between the two vectors, are of interest here.
  • the word format in associative memory is depicted in Fig. 16.
  • the convolution filter vector [f] of length P and precision 8 is applied as operand from the machine controller, one element at a time.
  • the result is accumulated in field [fd] of length 8 + 8 + log2(P).
  • Bit temp is used for temporary storage of the carry that propagates through field [fd]. The mark bit serves to identify the area of complete overlap by the filter vector.
  • for (bit_count = 0; bit_count < n; bit_count++)
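  • By way of illustration, the multiply-accumulate that this listing performs can be modeled sequentially in "C"; the sizes CN, CP, the direct integer multiply, and the correlation-style indexing are illustrative assumptions (the machine itself works bit-serially, a bit slice at a time):

```c
#define CN 16  /* number of data points (illustrative) */
#define CP 3   /* filter length P (illustrative) */

/* Sequential model of the 1-d convolution: each filter element f[j] is
 * broadcast by the controller in turn and multiplied into the
 * accumulator field fd of every word; only the central area of full
 * overlap is retained. */
void conv1d(const int x[CN], const int f[CP], int fd[CN]) {
    for (int i = 0; i < CN; i++) fd[i] = 0;
    for (int j = 0; j < CP; j++)            /* one filter element at a time */
        for (int i = 0; i <= CN - CP; i++)  /* area of full overlap */
            fd[i] += f[j] * x[i + j];
}
```

Each pass over the data adds one scaled copy of the image into [fd], which is exactly the P-fold accumulation counted in the complexity formulas below.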
  • T_1d = PM[α(t_a + T_p1d) + t_s]    (2) where t_a, t_s are the per-bit addition and shift complexity, and T_p1d is the complexity of carry propagation over field [fd]. Since addition and carry propagation are only executed for a ONE digit in the multiplier (filter element), time complexity is a function of α, the ratio of ONE digits in the filter vector elements, whose range is 0 to 1. From the program listing,
  • T_2d = αP²M(t_a + T_p2d) + M(P - 1)(t_sr + P·t_sc)    (3) where T_p2d is the average carry propagation for 2-d convolution.
  • Equation 11 can be written as:
  • step 1 of the program: evaluation of d · f0
  • step 1 of the loop: change from EXT+d to EXT-d.
  • T_m is the time (about 2.5 cycles) to generate the two partial products by table look-up
  • T p1 , T p2 are their carry propagation times following addition into field fd.
  • This algorithm [39] detects the Zero Crossings, ZC, of the Laplacian of the Gaussian filtered image, and can be written as:
  • the DOG filter has the following form: where σ_p and σ_n are the space constants of the positive and negative Gaussian respectively, and their ratio is chosen for closest agreement with the Laplacian of the Gaussian (∇²G)
  • T_dog = 2T_1d(P_p) + 2T_1d(P_n) + T_diff    (15)
  • P_p and P_n are the appropriate filter sizes for space constants σ_p and σ_n respectively
  • T diff is the complexity of M-bit subtraction and borrow propagation into the sign bit. The speed of associative subtraction is the same as that of addition, hence,
  • the second step of the M&H algorithm operates on a 3 ⁇ 3 neighborhood.
  • the center pixel is considered to be an edge point if one of the four directions (horizontal, vertical and the two diagonals) yields a change in sign. Specifically, we test if one item of a pair (about the center) exceeds a positive threshold T while the other is less than -T.
  • the associative implementation of ZC for each space filter is outlined below. 1. Compare all pixels (resulting from the DOG function) concurrently against the thresholds T and -T, and assign two bits in memory to indicate the results.
  • the associative algorithm to detect zero crossings shows a time complexity of 165 cycles or 4.95 microseconds. It should be noted that the M&H algorithm generates edge points without gradient direction. This parameter can be computed by operating on a larger neighborhood (9 ⁇ 9) around each edge point. An associative algorithm to detect 16 segment directions (and corners) was developed and is described below. Its time complexity is 1010 cycles or 30.3 microseconds.
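  • The pair test about the center can be sketched in "C" for a single pixel; the image size ZN and direct array access are illustrative (the machine tests all pixels concurrently):

```c
#define ZN 8  /* illustrative image size */

/* Zero-crossing test at pixel (y, x) of a DOG-filtered image d[][]:
 * an edge point if, in one of the four directions (horizontal, vertical
 * and the two diagonals), one member of the pair about the center
 * exceeds +T while the other is below -T. */
int is_zc(int d[ZN][ZN], int y, int x, int T) {
    static const int dir[4][2] = {{0, 1}, {1, 0}, {1, 1}, {1, -1}};
    for (int k = 0; k < 4; k++) {
        int y1 = y + dir[k][0], x1 = x + dir[k][1];
        int y0 = y - dir[k][0], x0 = x - dir[k][1];
        if (y1 < 0 || y1 >= ZN || x1 < 0 || x1 >= ZN) continue;
        if (y0 < 0 || y0 >= ZN || x0 < 0 || x0 >= ZN) continue;
        if ((d[y1][x1] > T && d[y0][x0] < -T) ||
            (d[y0][x0] > T && d[y1][x1] < -T)) return 1;
    }
    return 0;
}
```

The two threshold bits stored per pixel in step 1 above correspond to the two comparisons inside the if.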
  • the x and y derivatives of the Gaussian filter can be obtained by convolving the image with and respectively. Applying the enhanced multiplication method, the execution
  • Non-maximum suppression selects as edge candidates pixels for which the gradient magnitude is maximal. For optimum sensitivity, the test is carried out in the direction of the gradient. Since a 3 ⁇ 3 neighborhood provides only 8 directions, interpolation is used to double this number to 16. To determine if maximal, the gradient value at each pixel is compared with those on either side of it. Associative implementation requires somewhat fewer operations than that of zero-cross detection discussed earlier.
  • Thresholding with hysteresis eliminates weak edges that may be due to noise, but continues to trace strong edges as they become weaker.
  • two thresholds on gradient magnitude are computed - low and high. Edge candidates with gradient magnitude under low are eliminated, while those above high are considered edges. Candidates with values between low and high are considered edges if they can be connected to a pixel above high through a chain of pixels above low. All other candidates in this interval are eliminated.
  • the process involves propagation along curves.
  • Associative implementation uses three flags as shown in Fig. 19: E, which initially marks candidates above high threshold (unambiguous edge points), and at the end designates all selected edge points; OE (Old Edges) to keep track of confirmed edges at the last iteration; and L to designate candidates above low. At every iteration, each L candidate is examined to see if at least one of its 8-neighbors is an edge, in which case it is also declared an edge by setting E. Before moving E into OE, the two flags are compared to see if steady state has been reached, in which case the process terminates.
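  • The E/OE/L iteration can be sketched sequentially in "C" as follows; the grid size HN is an illustrative assumption, and the machine examines all pixels in parallel where this sketch loops:

```c
#include <string.h>

#define HN 8  /* illustrative image size */

/* One pass of the hysteresis propagation: every L candidate (above the
 * low threshold) that is not yet an edge becomes an edge if at least
 * one of its 8-neighbors was an edge at the previous iteration (OE).
 * Returns 1 until steady state is reached. */
int hysteresis_pass(unsigned char E[HN][HN], unsigned char L[HN][HN]) {
    unsigned char OE[HN][HN];
    memcpy(OE, E, sizeof OE);  /* old edges, for the steady-state test */
    for (int y = 0; y < HN; y++)
        for (int x = 0; x < HN; x++)
            if (L[y][x] && !E[y][x])
                for (int dy = -1; dy <= 1 && !E[y][x]; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        if ((dy || dx) && ny >= 0 && ny < HN &&
                            nx >= 0 && nx < HN && OE[ny][nx]) {
                            E[y][x] = 1;
                            break;
                        }
                    }
    return memcmp(OE, E, sizeof OE) != 0;
}
```

Iterating the pass until it reports no change propagates edge status along chains of above-low candidates, exactly the termination condition described above.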
  • T_pro = I(23.5 + N/b)    (19) where I is the number of iterations, 23.5 is the time in cycles to examine the state of the 8 neighbors, and N/b accounts for long shifts to bring in edge points from neighboring rows.
  • the upper bound of I, given by the longest propagation chain, is nearly N²/2, but for a representative value of 100 iterations the complexity of curve propagation becomes 3950 cycles or 119 microseconds.
  • a multipass thinning algorithm is proposed, consisting of a pre-thinning phase and an iterative thinning phase.
  • the pre-thinning phase fills a single gap by applying template (a) , and removes some border noise by clearing point P if one of templates b,c or d holds.
  • Multi-pass implies that the templates are applied first in the north direction, then in the south, east and west directions, in succession - except for template (a) which is fully symmetrical and need only be applied once. All templates are shown in the north direction and use an X to denote a "don't care" (ONE or ZERO).
  • the thinning phase tests templates e,f and g in each of the 4 directions successively, and clears point P when agreement is found.
  • This 4-pass sequence is iterated until there is no further change.
  • the quality of the skeleton produced by this simple local process compares well with that of skeletons based on the medial axis.
  • Davies and Plummer [41] proposed a very elaborate algorithm to produce such a skeleton, and chose 8 images for testing it.
  • Our thinning algorithm was applied to these images and produced interesting results: the skeletons agree virtually exactly with those of Davies and Plummer; any discrepancy, not at an end-point, occurs at a point of ambiguity and constitutes an equally valid result.
  • the removal of border noise in the prethinning phase prevents the formation of extraneous spurs in the skeleton. Time complexity of the algorithm is given by,
  • Stereo vision must solve the correspondence problem: for each feature point in the left image, find its corresponding point in the right image, and compute their disparity. Since stereo has been a major research topic in computer vision over the past decade, a large number of approaches have been proposed, too many to attempt to implement them all associatively. Instead, we concentrate on the Grimson [43] algorithm. This also has some similarity with the hierarchical structure of human vision, and it can use as input the edges produced by the M&H or Canny edge detection schemes discussed above.
  • edge detection was carried out on both the left and the right image, and the results are sitting side by side in memory.
  • the edge points are marked and their orientation given to a 4-bit precision over 2π radians, or a resolution of 22.5 degrees.
  • the stereo process uses the left image as reference, and matches edge points with gradient of equal sign and roughly the same orientation. Edge lines near the horizontal (within ±33.75 degrees) are excluded in order to minimize disparity error.
  • the Grimson algorithm consists of the following steps:
  • the associative memory word format for the matching process is given below.
  • the resulting value of disparity will be recorded in output field DISP.
  • the associative algorithm is outlined in Fig. 21.
  • Search the neighborhood of ±W pixels, divided into three pools, A, B and C, where pools A and C are equal in size and represent the divergent and convergent regions, respectively.
  • pool B is the region about zero disparity.
  • T_stereo = T_sh + T_mat + T_dis + T_or    (20) where T_sh accounts for shifting the right image, T_mat is the time to evaluate matches within the pools, T_dis the disambiguation time, and T_or the time to find and remove out-of-range disparity.
  • the shift time in cycles is given by,
  • the first term accounts for the initial and final W-place shift-up of fields DR, DIR-R (5 bits).
  • the second term covers the one-place shift-down between successive matching operations.
  • the last term is due to the generation and update of a border flag for handling row end effects.
  • T_mat = 8.3(2W + 1) + 13.5 (22), where the second term accounts for final processing of the comparison results.
  • the disambiguation process consists of the following steps:
  • T_dis = T_cnt + 3(T_cnt + T_gt + T_cpy + 3.5) (23), where T_cnt is the time to count labeled pixels over a neighborhood, T_gt the time to compare for greater-than, and T_cpy the time to copy disparity.
  • T cnt is the subject of the next section, while the other two are given by,
  • the disambiguation algorithm is listed below.
  • /* COUNT_P+2k-1 used as GTF */
  • next_bit_p = COUNT_P + bit_count
  • letmc d(DISP+bit_count); write;
  • T_or = 4.5 + T_cnt + T_lt + T_rm (26), where the constant term accounts for grouping unmatched edge points and labelling them in MR; T_cnt is the time to count the unmatched edge points over the neighborhood; T_lt covers comparison of this count to 1/4 the number of edge points in the neighborhood; and T_rm is the time to label and clear disparity of edge points in out-of-range neighborhoods.
  • Stereo matching is performed for each of the spatial frequency channels. From the Marr and Poggio [44] model of stereo vision,
  • every row of L labels was actually counted L times, once in each of the vertically overlapping neighborhoods.
  • the count may be carried out in two stages: first the neighboring labels within each row are tallied up, and then the vertically neighboring row sums are accumulated as they are entered in some convenient sequence. That requires an additional "rows" field of length log L, and yields the following program (LISTING 6: 2-d Summation Over L × L with L Odd), where the word format is as in Fig. 23:
  • count_flag_to_field(flag, field)
  • field_next = field + bit_count
  • letc d(rows_next) d(field_next); d(cy); setag; compare; letc d(cy); write; }
  • for (bit_count = rows; bit_count < rows + kr; bit_count++)
  • field_next = field + bit_count
  • cy = field + tree_step + 1;
  • cy = field + k + tree_step
  • letc d(cy); setag; compare; letc d(bit_count); write;
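In conventional sequential terms, the two-stage count above amounts to the following sketch; borders are handled by truncation (equivalent to zero padding for a sum), and the bit-serial field arithmetic of Listing 6 is not modelled:

```python
def count_over_window(flags, L):
    """Two-stage L x L neighborhood count with L odd, in the spirit of
    Listing 6: labels within each row are tallied first, then the
    vertically neighboring row sums are accumulated."""
    H, W = len(flags), len(flags[0])
    r = L // 2
    # stage 1: tally the labels within each row over columns x-r..x+r
    row_sums = [[sum(flags[y][max(0, x - r):x + r + 1]) for x in range(W)]
                for y in range(H)]
    # stage 2: accumulate the L vertically neighboring row sums
    return [[sum(row_sums[yy][x]
                 for yy in range(max(0, y - r), min(H, y + r + 1)))
             for x in range(W)] for y in range(H)]
```

Doing the horizontal tallies once per row and reusing them vertically is what removes the redundant recounting noted above.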
  • Count complexity is still seen to grow as L log L in equation 32, but the table indicates an improvement of 40 percent over 2-d summation (for the largest neighborhood), and a resulting improvement in stereo complexity of 27.5 percent.
  • the improvement stems mostly from the fact that at each level of the tree, arithmetic is carried out just to the precision required, which is known in advance.
  • Fig. 25 gives execution time as a function of neighborhood dimension for the three implementations described: the linear, the two-dimensional, and the two-dimensional tree.
  • Fig. 26 presents stereo complexity without neighborhood counts, and with neighborhood counts by each of the three methods, all as a function of neighborhood dimension.
  • Optical flow assigns to every point in the image a velocity vector which describes its motion across the visual field.
  • the potential applications of optical flow include the areas of target tracking, target identification, moving image compression, autonomous robots and related areas.
  • the theory of computing optical flow is based on two constraints: the brightness of a particular point in the image remains constant, and the flow of brightness patterns varies smoothly almost everywhere.
  • Horn & Schunck [45] derived an iterative process to solve the constrained minimization problem.
  • the flow velocity has two components (u, v). At each iteration, a new set of velocities (u^{n+1}, v^{n+1}) can be estimated from the average of the previous velocity estimates (ū^n, v̄^n) by,
  • the partial derivatives E_x, E_y and E_t in (33) are estimated as an average of four first differences taken over adjacent measurements in a cube,
  • E i,j,k is the pixel value at the intersection of row i and column j in frame k. Indices i, j increase from top to bottom and left to right, respectively.
  • Local averages ū and v̄ in (33) are defined as follows:
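For reference, one Horn & Schunck update of the form in equation 33 can be sketched as below; the unweighted 8-neighbor average used here is a simplification of the weighted mask in [45]:

```python
import numpy as np

def horn_schunck_step(u, v, Ex, Ey, Et, alpha=1.0):
    """One Horn-Schunck iteration: the new velocities are the local
    averages, corrected along the brightness gradient (equation 33)."""
    def local_avg(a):
        p = np.pad(a, 1, mode='edge')
        # unweighted mean of the 8 neighbors (Horn & Schunck use a
        # weighted mask; this is a hedged simplification)
        return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
                + p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]) / 8.0
    ub, vb = local_avg(u), local_avg(v)
    t = (Ex * ub + Ey * vb + Et) / (alpha**2 + Ex**2 + Ey**2)
    return ub - Ex * t, vb - Ey * t
```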
  • the memory word was partitioned into multiple fields, each holding input data, output data, or intermediate results.
  • the format is given in Fig. 27.
  • the flow is computed from two successive video frames: E n and E n1 .
  • Each frame contains 512 × 512 pixels whose grey level is given to 8-bit precision.
  • the current image is moved to E n1 , and a new image from the I/O buffer array (Fig. 15) is written into field E n .
  • During the frame time, one or more iterations of the algorithm are executed, enough to obtain a reasonable approximation of the optical flow for use with the next frame.
  • Equation 33 can be rewritten as,
  • the first stage computes the partial derivatives E x , E y and E t .
  • Shift E_n, E_n1 one column to the left (one word up) and accumulate as shown in the above table, steps 7, 8.
  • Shift E_n, E_n1 to their original position (one word down).
  • I denotes the number of iterations required for the flow to converge. Evaluating the above expression, the fixed part takes 266 μs, and each iteration requires an additional 196 μs.
  • the execution time for different values of I is given in the following table:
  • the sector evaluation function is here defined as the logical OR of its edge point indicators, and a 24-bit field was assigned to the sector values. Evaluation is carried out by shifting in neighboring edge point indicators and OR'ing them directly into the corresponding sector values.
  • Sector partitioning was based on π/8 angular resolution, and defines 16 equally spaced segments or rays around the circle. Each segment (direction) is characterized by a required code in a prescribed subset of the sector values, and a maximum Hamming distance of 1 is permitted. The sector value field is now compared against each of the 16 codes and the results registered in a 16-bit field to mark the presence of each segment direction.
  • the 16-bit segment field can now be tested for any pair of segments representing a given line or corner.
  • the sample program selects all pairs without distinction.
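The segment test can be paraphrased as follows; the direction codes themselves are not given in the text, so the `codes` mapping below is a placeholder:

```python
def match_segments(sector_bits, codes):
    """Compare the sector-value bits against each direction code,
    accepting a Hamming distance of at most 1 over the code's
    prescribed bit positions.  codes: direction -> {position: bit}."""
    present = []
    for direction, code in codes.items():
        dist = sum(sector_bits[p] != b for p, b in code.items())
        if dist <= 1:
            present.append(direction)
    return present
```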
  • a preparation step labels each contour point with its x-y coordinates.
  • the main process is iterative and operates on a 3 x 3 neighborhood of all contour points in parallel. Every contour point looks at each one of its 8 neighbors in turn and adopts the neighbor's label if smaller than its own.
  • the circular sequence in which neighbors are handled appreciably enhances label propagation. Iteration stops when all labels remain unchanged, leaving each contour identified by its lowest coordinates. The point of lowest coordinates in each contour is the only one to retain its original label. These points were kept track of and are now counted to obtain the number of contours in the image.
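Stripped of the associative mechanics, the propagation step amounts to this sketch (labels update in place within a sweep, which, like the circular neighbor sequence above, speeds propagation):

```python
def label_contours(edge):
    """Iterative contour labelling: every contour point starts with its
    own (y, x) coordinate as label and repeatedly adopts the smallest
    label among its 8-connected contour neighbors, until no change.
    Each contour ends up identified by its lowest coordinates."""
    H, W = len(edge), len(edge[0])
    label = {(y, x): (y, x) for y in range(H) for x in range(W) if edge[y][x]}
    changed = True
    while changed:
        changed = False
        for (y, x) in label:
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    n = (y + dy, x + dx)
                    if n in label and label[n] < label[(y, x)]:
                        label[(y, x)] = label[n]
                        changed = True
    return label
```

The points whose label never changes are the contour starting points; counting them yields the number of contours.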
  • Listing 8 presents the program in associative memory.
  • the input fields are [xy_coord] to specify pixel position and [edge] to identify contour points.
  • the output fields are contour [label] and contour starting point [mr].
  • the word format is in Fig. 29.
  • a list of contours, giving label and length (in pixels) may be generated in relatively short order (24 cycles per contour).
  • Salient structures in an image can be perceived at a glance without the need for an organized search or prior knowledge about their shape. Such a structure may stand out even when embedded in a cluttered background or when its elements are fragmented.
  • Sha'ashua & Ullman [46] have proposed a global saliency measure for curves based on their length, continuity and smoothness.
  • a curve of length L in this image is a connected sequence of orientation elements p_i, p_{i+1}, ..., p_{i+L}, each element representing a line-segment or a gap in the image, and the saliency measure of the curve is defined as,
  • the attenuation factor ρ approaches unity for an active element and is appreciably less than unity (here taken to be 0.7) for a virtual element.
  • the first factor, c i,j is a discrete approximation to a bounded measure of the inverse of total curvature
  • α_k denotes the difference in orientation from the k-th element to its successor
  • ΔS is the length of an orientation element
  • E_j is the state variable of p_j, one of the d possible neighbors of p_i; the superscript of E is the iteration number; and f_{i,j} is the inverse curvature factor from p_i to p_j. After L iterations the state variable becomes equivalent to the saliency measure defined earlier,
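A sketch of one network iteration, under the assumption that the update has the usual Sha'ashua-Ullman form E_i ← σ_i + ρ_i · max_j f_{i,j} E_j, with local gain σ_i = 1 for active elements and 0 for gaps:

```python
def saliency_step(E, active, neighbors, f, rho_active=1.0, rho_virtual=0.7):
    """One iteration of the saliency network.  E[i] is the state of
    element p_i; after L iterations E approximates the saliency of the
    best curve of length L starting at p_i.  f[i][j] is the inverse
    curvature coupling; neighbors[i] lists the d successors of p_i."""
    new_E = []
    for i in range(len(E)):
        rho = rho_active if active[i] else rho_virtual
        best = max((f[i][j] * E[j] for j in neighbors[i]), default=0.0)
        sigma = 1.0 if active[i] else 0.0   # local contribution
        new_E.append(sigma + rho * best)
    return new_E
```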
  • In Fig. 30 the following notation is employed:
  • the Hough transform can detect a curve whose shape is described by a parametric equation, such as the straight line or conic section, even if there are gaps in the curve.
  • Each point of the figure in image space is transformed to a locus in parameter space.
  • a histogram is generated giving the distribution of locus points in parameter space. Occurrence of the object curve is marked by a distinct peak in the histogram (intersection of many loci).
  • x cos θ + y sin θ = ρ, which specifies the line by θ and ρ, and the histogram includes a straight line in every direction θ. But if the candidate points are the result of edge detection by a method that yields direction, then θ is known. Following O'Gorman & Clowes [50], this information was applied to effect a major reduction of both hardware (word-length) and time complexity. For a 511 × 511 image with the origin at its center, the x-y coordinates are given by 9 bits in absolute value and sign. Angle θ from 0 to π is given to a matching precision of 10 bits (excluding sign of gradient). The sine and cosine are evaluated by table look-up. Advantage is taken of the symmetry of these functions to reduce the table size four-fold. After computing ρ, the histogram is evaluated and read out element by element using the COUNTAG primitive. This algorithm requires a 52-bit word length and has a time complexity of,
  • T_l = 1870 + 13t(r - 1) machine cycles, where t, r are the resolutions of θ and ρ, respectively, in the histogram.
  • the second term accounts for histogram evaluation and dominates T_l at t, r ≥ 32.
  • the execution time is only 150 μs per frame, and grows to just 6.4 ms at a resolution of 128.
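The direction-aware line Hough transform can be paraphrased sequentially as below; each edge point votes at a single (θ, ρ) cell instead of along a full sinusoid. The binning of θ and ρ into histogram cells is an illustrative choice:

```python
import math
from collections import Counter

def directed_hough(edges, t, r, rho_max):
    """Line Hough transform exploiting the known gradient direction
    (after O'Gorman & Clowes).  edges: (x, y, theta) triples with theta
    in [0, pi); t, r: histogram resolutions in theta and rho."""
    hist = Counter()
    for x, y, theta in edges:
        rho = x * math.cos(theta) + y * math.sin(theta)
        ti = min(int(theta / math.pi * t), t - 1)
        ri = min(int((rho + rho_max) / (2 * rho_max) * r), r - 1)
        hist[(ti, ri)] += 1      # one vote per point, not one per theta
    return hist
```

A distinct peak in the histogram marks the occurrence of a line, exactly as in the full transform, but the work per point is constant.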
  • T_c = 1550 + 26 r_x r_y, (46)
  • r_x, r_y are the range resolutions of x_0, y_0 in the histogram.
  • the execution time is 10.8 ms per frame.
  • Mixed circles that are partly black on white, partly white on black, may be detected by summing the two histograms (in the host) before thresholding. If the search is restricted to bright circles on a dark background (or vice-versa) , the complexity reduces to
  • T_c = 1280 + 13 r_x r_y, (47) and the execution time drops to 6.4 ms.
  • the approach chosen for associative implementation is known as the package-wrapping method [51]. Starting with a point guaranteed to be on the convex hull, say the lowest one in the set (smallest y coordinate), take a horizontal ray in the positive direction and swing it upward (counter-clockwise) until it hits another point in the set; this point must also be on the hull. Then anchor the ray at this point and continue swinging to the next points, until the starting point is reached, when the package is fully wrapped.
  • P_j, P_k's are collinear, and the point to be chosen is the most distant from P_j, the one having maximum
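A plain sequential rendering of the package-wrapping method, including the collinear tie-break (take the point most distant from the anchor):

```python
def package_wrap(points):
    """Package-wrapping (gift-wrapping) convex hull [51]: start at the
    lowest point and repeatedly swing the ray counter-clockwise to the
    next hull point until the start point is reached again."""
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    start = min(points, key=lambda p: (p[1], p[0]))   # smallest y, then x
    hull, p = [], start
    while True:
        hull.append(p)
        q = points[0] if points[0] != p else points[1]
        for c in points:
            if c == p:
                continue
            turn = cross(p, q, c)
            # take c if it lies further counter-clockwise, or if it is a
            # more distant collinear point (the tie-break described above)
            if turn < 0 or (turn == 0 and
                            (c[0]-p[0])**2 + (c[1]-p[1])**2 >
                            (q[0]-p[0])**2 + (q[1]-p[1])**2):
                q = c
        p = q
        if p == start:
            return hull
```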
  • Each of the given points acts as a source of fire that spreads uniformly in all directions. The boundaries consist of those points at which fires from two (or three) sources meet. Every point in the given set is initially marked with a different color - actually its xy-coordinates. Each point in the image looks at its 8-neighbors. A blank (uncolored) point that sees a colored neighbor will copy its color. If both are colored, the point will compare colors, marking itself as a Voronoi (boundary) point if the colors are different. This process is iterated until all points are colored.
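The fire-spreading construction can be sketched as follows (a sequential simulation; when a blank point sees several colors in one sweep, the conflict is resolved arbitrarily here):

```python
def voronoi_fire(seeds, H, W):
    """Brushfire Voronoi sketch: colors spread one pixel per sweep from
    each seed (a point's color is its seed's xy-coordinates); a colored
    pixel with a differently colored neighbor is a boundary point."""
    color = {s: s for s in seeds}
    boundary = set()
    changed = True
    while changed:
        changed = False
        new_color = dict(color)
        for y in range(H):
            for x in range(W):
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        n = (y + dy, x + dx)
                        if n in color:
                            if (y, x) not in color:
                                new_color[(y, x)] = color[n]  # copy color
                                changed = True
                            elif color[n] != color[(y, x)]:
                                boundary.add((y, x))          # fires meet
        color = new_color
    return color, boundary
```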

Abstract

Associative signal processing apparatus for processing an incoming signal including a plurality of samples, the apparatus including a two-dimensional array of processors, each processor (20) including a multiplicity of content addressable memory cells (122), each sample of an incoming signal being processed by at least one of the processors, and a register array including at least one register operative to store responders arriving from the processors and to provide communication, within a single cycle, between non-adjacent processors.

Description

APPARATUS AND METHOD FOR SIGNAL PROCESSING
FIELD OF THE INVENTION
The present invention relates to methods and apparatus for signal processing.
BACKGROUND OF THE INVENTION
State of the art computer vision systems and methods and state of the art associative processing methods are described in the following publications, the disclosure of which is hereby incorporated by reference:
C. C. Foster, "Content addressable parallel processors", Van Nostrand Reinhold Co., 1976, chs. 2 and 5,
S. Ruhman, I. Scherson, "Associative processor for tomographic image reconstruction", Proc. Medcomp 82 (1982 IEEE Comp. Soc. Int. Conf. on Medical Comp. Sc./Computational Medicine), pp. 353 - 358,
C. Weems, E. Riseman, A. Hanson, A. Rosenfeld, "IU parallel processing benchmark", Proc. Comp. Vision & Pattern Recogn., pp. 673 - 688, 1988,
I. Scherson, "Multioperand associative processing and application to tomography and computer graphics", Ph.D. Thesis, Computer Sciences, Weizman Institute of Science, 1983,
Canny, J. "Computational approach to edge detection", IEEE Trans. on pattern analysis and machine Int., Nov. 1986, pp. 679 - 698, and
Sha'ashua, A. and S. Ullman, "Structural saliency: the detection of global salient structures using a local connected network", Proc. ICCV Conf., pp, 321 - 327, Florida, 1988, and
Akerib, A. J. and Shmil Ruhman, "Associative contour processing", MVP 1990 IAPR Workshop on Machine Vision Applications, Nov. 28 - 30, 1990, Tokyo.
Image processing techniques and other subject matter useful for associative signal processing are described in the following references:
References
[1] W.D. Hillis, "The Connection Machine", MIT Press, 1985.
[2] C. C. Weems, Jr., "Image Processing on a Content Addressable Array Parallel Processor", Ph.D. Thesis, Computer and Information Sciences, University of Massachusetts at Amherst, 1984.
[3] C. C. Foster, "Content Addressable Parallel Processors", Van Nostrand Reinhold Co., 1976, chs. 2 & 5.
[4] S. Ruhman, I. Scherson, "Associative Processor for Tomographic Image Reconstruction", Proc. Medcomp 82 (1982 IEEE Comp. Soc. Int. Conf. on Medical Comp. Sc./Computational Medicine), pp 353-358.
[5] S. Ruhman, I. Scherson, "Feasibility Study of Associative Radar Signal Processing", Internal Report, Weizmann Institute of Science, October 1984.
[6] S. Ruhman, I. Scherson, "Associative Processor Particularly Useful for Tomographic Image Reconstruction", U.S. Patent 4,491,932, filed Oct. 1, 1981, issued Jan. 1, 1985.
[7] C. Weems, E. Riseman, A. Hanson, A. Rosenfeld, "IU Parallel Processing Benchmark", Proc. Comp. Vision & Pattern Recogn. pp 673-688, 1988.
[8] D.C. Marr, E. Hildreth, "Theory of Edge Detection", Proc. R. Soc. London, Vol. B 207, pp 187-217, 1980.
[9] W.E.L. Grimson, "A Computer Implementation of a Theory of Human Stereo Vision", Phil. Trans. Royal Soc. of London, Vol. B 292, pp 217-253, 1981.
[10] B.K.P. Horn, B.G. Schunck, "Determining Optical Flow", Artificial Intelligence Vol. 17, pp 185-203, 1981.
[11] V. Cantoni, C. Guerra and S. Levialdi, "Towards an Evaluation of an Image Processing System". In: Computing Structures for Image Processing, Academic Press, 1983, ch. 4.
[12] M.J. Flynn, "Very high-speed computer systems", Proc. IEEE, Vol. 54, pp 1901-1909, 1966.
[13] B. Lindskog "PICAP, An SIMD architecture for multi-dimensional signal processing", Dissertation No. 176, Dept. of El. Eng., Linkoping University, Sweden, 1988.
[14] S.S. Wilson, "One dimensional SIMD architectures - the AIS-5000", Multicomputer Vision, Academic Press, pp. 131-149.
[15] D. Juvini, J.L. Basille, H. Essafi, J.Y. Latil, "Sympati 2, a 1.5D processor array for image application", EUSIPCO IV, Theories and applications, North-Holland, pp. 311- 314, 1988.
[16] I.N. Robinson and W.R. Moore, "A parallel processor array architecture and its implementation in silicon", Proc. IEEE Custom Integrated Circuit Conf. pp 41-45, 1982.
[17] M.J.B. Duff, "CLIP4: A large scale integrated circuit array parallel processor", Proc. IJCPR, pp. 728-733, 1976.
[18] K.E. Batcher, "Design of a massively parallel processor", IEEE Trans. Comp., Vol. C-29, pp. 836-840, 1980.
[19] S.F. Reddaway, "DAP - A flexible number cruncher", Proc. LASL Workshop on Vector and Parallel Processors, pp. 233-234, 1978.
[20] D.E. Shaw, "The NON-VON supercomputer", Computer Science Dept., Columbia University, New York, NY, 1982.
[21] V. Cantoni and S. Levialdi, "PAPIA: A case history", Parallel Computer Vision, L. Uhr, Ed., New York, NY: Academic Press, 1987, pp. 3-13.
[22] L.I. Hungwen and M. Maresca. "Polymorphic-Torus Architecture for Computer Vision", IEEE Trans. PAMI, Vol. 11, No. 3, 1989.
[23] S.R. Sternberg, "Biomedical image processing", Computer, Vol. 16, pp. 22-35, 1983.
[24] E.W. Kent, M.O. Shneier, and R. Lumia, "PIPE: Pipeline image processing engine", Parallel and Distributed Computing, Vol. 2, pp. 50-78, 1985.
[25] Max Video product literature, Datacube Corp., Peabody, MA, 1987.
[26] T. Gross, H.T. Kung, M.S. Lam, and J.A. Webb, " Warp as a machine for low-level vision", Proc. of the 1985 Int. Conf. on Robotics and Automation, 1985.
[27] W.W. Wilcke, R.C. Booth, D.A. Brown, D.G. Shea, and F.T. Tong, "Design and application of an experimental multiprocessor", IBM RC 12604, 1987.
[28] C. Whitby-Strevens, "The Transputer", Proc. 12th ACM Int. Symp. on Computer Architectures, pp. 292-300, 1985.
[29] L. Erman, "The Hearsay-II Speech-Understanding System: integrating knowledge to resolve uncertainty", Computing Surveys Vol. 12, pp. 213-253, 1980.
[30] H.P. Nii, "The blackboard model of problem solving and the evaluation of blackboard architectures", AI Magazine, Vol. 7, No. 2, pp. 38-53, 1986.
[31] B.A. Draper, R.T. Collins, J. Brolio, J. Griffith, A.R. Hanson, and E.M. Riseman, "Tools and experiments in the knowledge-directed interpretation of road scenes", Proc. Image Understanding Workshop, Morgan Kaufmann: Los Altos, CA, 1987.
[32] NCube Corp, Promotional literature, Beaverton, OR, 1985.
[33] C. Fernstrom, I. Kruzela, B. Svensson, "LUCAS Associative Array Processor", Springer-Verlag, 1985, ch. 2.
[34] S. Ruhman, I. Scherson, "Associative Memory Cell and Memory Unit Including Same", U.S. Patent 4,404,653, filed Oct. 1, 1981, issued Sep. 13, 1983.
[35] I. Scherson, "Multioperand Associative Processing and Application to Tomography and Computer Graphics", Ph.D. thesis, Computer Sciences, Weizmann Institute of Science, 1983.
[36] J. Worlton, "Some Patterns of Technology Change in High-Performance Computers", Proc. Supercomputing, pp 312-320, Nov. 1988.
[37] I. Scherson, S. Ruhman, "Multi-Operand Associative Arithmetic", Proc. 6th Symp. on Computer Arithmetic, Aarhus 1983, pp 124-128.
[38] I. Scherson, S. Ruhman, "Multi-Operand Arithmetic in a Partitioned Associative Architecture", J. of Parallel & Distr. Comp., Vol. 5, pp 655-668, 1988.
[39] D.C. Marr, E. Hildreth, "Theory of Edge Detection", Proc. R. Soc. London, Vol. B 207, pp 187-217, 1980.
[40] John Canny, "Computational Approach to Edge Detection", IEEE Trans, on Pattern Analysis & Machine Int., Nov. 1986, pp. 679-698.
[41] E.R. Davis and A.P.N. Plummer, Thinning Algorithms: A Critique and New Methodology, Pattern Recognition 14: 53-63, 1981.
[42] R.T. Chin, H.K. War, D.L. Stover, and R.D. Iverson, A One-Pass Thinning Algorithm and its Parallel Implementation, Comp. Vision, Graph. & Image Proc. 40: 30-40, 1987.
[43] W.E.L. Grimson, "A Computer Implementation of a Theory of Human Stereo Vision", Phil. Trans. Royal Soc. of London, Vol. B 292, pp 217-253, 1981.
[44] D. Marr, T. Poggio, " A Computational Theory of Human Stereo Vision", Proc. Royal Soc. London B 204, pp 301-324, 1979.
[45] B.K.P. Horn, B.G. Schunck, Determining Optical Flow, Artificial Intelligence, Vol. 17, pp 185-203, 1981.
[46] Amnon Sha'ashua and Shimon Ullman, "Structural Saliency: The Detection of Global Salient Structures Using a Local Connected Network", Proc. ICCV Conf., pp. 321-327, Florida, 1988.
[47] Amnon Sha'ashua, "Structural Saliency: The detection of globally salient structures using a locally connected network", M.Sc thesis, Dept. of Applied Math., Weizmann Inst. of Science, 1988.
[48] Amnon Sha'ashua, AI Lab., MIT, Private Communication, May 1991.
[49] Duda, R.O. and Hart, P.E., "Use of the Hough transformation to detect lines and curves in pictures", Comm. ACM Vol. 15, Num. 1, pp. 11-15, Jan. 1972.
[50] O'Gorman, F. and M.B. Clowes, "Finding picture edges through collinearity of feature points", Proc. Int. Joint Conf. on Artif. Intel., pp. 543-55, 1973.
[51] A. Rosenfeld, A.C. Kak, "Digital Picture Processing", Academic Press, 1982.
[52] Robert Sedgewick, "Algorithms", Addison-Wesley, 1988.
[53] Preston, K., "The Abingdon Cross benchmark survey", IEEE Computer 22(7) , pp 9-18, 1989.
[54] S. Siegel and S. Waitings, "MaxVideo's Performance of the Abingdon Cross Benchmark", Proc. Vision '87, Society of Manufacturing Engineers, Dearborn, Mich., pp 2-17, 1987.
[55] W.B. Teeuw and R.P.W. Duin, "An algorithm for benchmarking an SIMD pyramid with the Abingdon Cross", Pattern Recognition Letters 11, pp 501-506, 1990.
[56] K. Preston Jr., "Benchmark Results - The Abingdon Cross", Evaluation of Multicomputer for Image Processing, L. Uhr et al., eds., Academic Press, Cambridge, Mass., 1986.
[57] A.J. Akerib, S. Ruhman, S. Ullman, "Real Time Associative Vision Machine", Proc. 7th Israel Conf. on Artif. Intel., Vision & Pattern Recog. Elsevier, Dec. 1990.
[58] Avidan J. Akerib and Smil Ruhman, "Associative Contour Processing", MVP'90 IAPR Workshop on Machine Vision Applications, Nov. 28-30,1990, Tokyo.
[59] Avidan J. Akerib and Smil Ruhman "Associative Array and Tree Algorithms in Stereo Vision", Proc. 8th Israel Conf. on Artif. Intel. Vision & Pattern Recog., Elsevier, Dec. 1991.
All of the above references and publications cited therein are hereby incorporated by reference. Numbers in square brackets in the foregoing text are references to the above documents.
SUMMARY OF THE INVENTION
The present invention seeks to provide improved methods and apparatus for signal processing.
The Associative Signal Processing (ASP) approach is now compared to a conventional serial computer's structure which includes a memory and a CPU. The CPU is responsible for computation, while the memory is a simple device which only stores data.
The ASP architecture is totally different. The computation is carried out on an "intelligent memory" while the CPU is replaced by a simple controller that manages this "intelligent" memory. In addition to its capability to read and write, each cell or word in this memory can identify its contents and change it according to instructions received from the controller.
For example, assume an array of a million numbers between 1 and 5. The requirement is to add 3 to the array.
In conventional serial computers, a number is transferred from the addressable memory to the CPU, 3 is then added, and the result is returned to the memory. The process takes 1 - 3 machine cycles for each specific number, for a total of 1 - 3 million machine cycles for the whole array.
In the associative approach the one million numbers are stored in the "intelligent memory". The controller has to ask five questions and generate five answers as follows: "Who is 5? Please identify yourself." This takes one machine cycle. The controller instructs all those who identified themselves to become "8". The controller continues to ask "Who is 4?" and instruct "You are 7!" and so on, until it covers all the combinations.
This operation takes only 10 machine cycles in comparison to 1 - 3 million machine cycles with conventional serial computers. Using this basic instruction set of read, identify and write, all the arithmetical and logical operations can be performed.
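The dialogue above maps directly onto a compare/multiwrite loop; this simulation counts one cycle per compare and one per parallel write, so the cycle count depends on the value range, not the array size. Scanning the values downward (5, 4, ..., 1), as in the text, ensures a word already updated to v+3 is never matched a second time:

```python
def associative_add(memory, constant, value_range):
    """Simulate the associative add: for each possible value v the
    controller broadcasts a compare ("Who is v?") and a parallel write
    ("You are v + constant!").  All words respond simultaneously."""
    cycles = 0
    for v in value_range:
        tags = [w == v for w in memory]             # one compare cycle
        memory = [w + constant if t else w
                  for w, t in zip(memory, tags)]    # one multiwrite cycle
        cycles += 2
    return memory, cycles
```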
There is thus provided, in accordance with a preferred embodiment of the present invention, associative signal processing apparatus for processing an incoming signal, the apparatus including an array of processors, each processor including a multiplicity of associative memory cells, each sample of an incoming signal being processed by at least one of the processors, a register array including at least one register operative to store responders arriving from the processors and to provide communication between processors, and an I/O buffer register for inputting and outputting a signal, wherein the processor array, the register array and the I/O buffer register are arranged on a single module.
There is also provided, in accordance with a preferred embodiment of the present invention, associative signal processing apparatus including an array of processors, each processor including a multiplicity of associative memory cells, at least one of the processors being operative to process a plurality of samples of an incoming signal, a register array including at least one register operative to store responders arriving from the processors and to provide communication between processors, and an I/O buffer register for inputting and outputting a signal.
Further in accordance with a preferred embodiment of the present invention, the processor array, the register array and the I/O buffer register are arranged on a single chip.
Still further in accordance with a preferred embodiment of the present invention, the register array is operative to perform at least one multicell shift operation.
There is also provided, in accordance with a preferred embodiment of the present invention, signal processing apparatus including an array of associative memory words, each word including a processor, each sample of an incoming signal being processed by at least one of the processors, a register array including at least one register operative to provide communication between words and to perform at least one multicell shift operation, and an I/O buffer register for inputting and outputting a signal.
Further in accordance with a preferred embodiment of the present invention, the register array is also operative to perform single cell shift operations.
Still further in accordance with a preferred embodiment of the present invention, the I/O buffer register and the processors are operative in parallel.
Additionally in accordance with a preferred embodiment of the present invention, the word length of the I/O buffer register is increasable by decreasing the word length of the associative memory cells.
Further in accordance with a preferred embodiment of the present invention, the apparatus is operative in video real time.
Still further in accordance with a preferred embodiment of the present invention, the signal includes an image.
Further in accordance with a preferred embodiment of the present invention, at least one word in the array of words includes at least one nonassociative memory cell.
Still further in accordance with a preferred embodiment of the present invention, at least one word in the array of words includes at least one column of nonassociative memory cells.
Further in accordance with a preferred embodiment of the present invention, the array, the register array and the I/O buffer register are arranged on a single module.
Still further in accordance with a preferred embodiment of the present invention, the module has a bus which receives instructions and also performs at least one multicell shift operation.
Additionally in accordance with a preferred embodiment of the present invention, the module has a first bus which performs at least one multicell shift operation and a second bus which performs at least one single cell shift operation.
There is also provided, in accordance with a preferred embodiment of the present invention, an array of processors which communicate by multicell and single cell shift operations, the array including a plurality of processors, a first bus connecting at least a pair of the processors which is operative to perform at least one multicell shift operation, and a second bus connecting at least a pair of the processors which is operative to perform single cell shift operations.
There is additionally provided, in accordance with a preferred embodiment of the present invention, a signal processing method including:
for each consecutive pair of first and second signal characteristics within a sequence of signal characteristics, counting the number of samples having the first signal characteristic, and
subsequently, counting the number of samples having the second signal characteristic.
Further in accordance with a preferred embodiment of the present invention, counting includes generating a histogram.
Still further in accordance with a preferred embodiment of the present invention, the signal includes a color image.
Still further in accordance with a preferred embodiment of the present invention, at least one characteristic includes at least one of the following group of characteristics: intensity, noise, and color density.
Further in accordance with a preferred embodiment of the present invention, the method also includes scanning a medium bearing the color image.
Still further in accordance with a preferred embodiment of the present invention, the image includes a color image.
There is also provided, in accordance with a preferred embodiment of the present invention, an edge detection method including identifying a first plurality of edge pixels and a second plurality of candidate edge pixels, identifying, in parallel, all candidate edge pixels which are connected to at least one edge pixel as edge pixels, and repeating the second identifying step at least once.
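The repeated identification step of this edge detection method can be sketched sequentially (in the associative apparatus, each pass is a single parallel operation over all candidates):

```python
def grow_edges(edge, candidate):
    """Hysteresis-style edge growing: in each pass, every candidate
    pixel 8-connected to an edge pixel becomes an edge pixel; passes
    repeat until no candidate is promoted."""
    H, W = len(edge), len(edge[0])
    edge = [row[:] for row in edge]   # do not mutate the caller's map
    changed = True
    while changed:
        changed = False
        promote = []
        for y in range(H):
            for x in range(W):
                if candidate[y][x] and not edge[y][x]:
                    if any(edge[y + dy][x + dx]
                           for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                           if 0 <= y + dy < H and 0 <= x + dx < W):
                        promote.append((y, x))
        for y, x in promote:          # promote after the full sweep
            edge[y][x] = 1
            changed = True
    return edge
```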
There is additionally provided, in accordance with a preferred embodiment of the present invention, a signal processing method including storing an indication that a first plurality of first samples has a first characteristic, storing, in parallel for all individual samples which are connected to at least one sample having the first characteristic, an indication that the connected samples have the first characteristic, and repeating the second step at least once.
Further in accordance with a preferred embodiment of the present invention, the signal includes an image and the first characteristic of the first samples is that the first samples are edge pixels.
There is also provided, in accordance with a preferred embodiment of the present invention, a feature labeling method in which a signal is inspected, the signal including at least one feature, the feature including a set of connected samples, the method including storing a plurality of indices for a corresponding plurality of samples, replacing, in parallel for each individual sample from among the plurality of samples, the stored index of the individual sample by an index of a sample connected thereto, if the index of the connected sample is ordered above the index of the individual sample, and repeating the replacing step at least once.
Further in accordance with a preferred embodiment of the present invention, replacing is repeated until only a small number of indices are replaced in each iteration.
Still further in accordance with a preferred embodiment of the present invention, the signal includes an image.
Additionally in accordance with a preferred embodiment of the present invention, the signal includes a color image.
Still further in accordance with a preferred embodiment of the present invention, the samples include pixels, the first characteristic includes at least one color component and adjacency of pixels at least partly determines connectivity of samples.
Additionally in accordance with a preferred embodiment of the present invention, the pixels form an image in which a boundary is defined and repeating is performed until the boundary is reached.
Further in accordance with a preferred embodiment of the present invention, repeating is performed a predetermined number of times.
There is also provided, in accordance with a preferred embodiment of the present invention, a method for image correction including computing a transformation for an output image imaged by a distorting lens, such as an HDTV lens, which compensates for the lens distortion, and applying the transformation in parallel to each of a plurality of pixels in the output image.
There is also provided associative signal processing apparatus including a plurality of comparing memory elements each of which is operative to compare the contents of memory elements other than itself to respective references in accordance with a user-selected logical criterion, thereby to generate a responder if the comparing memory element complies with the criterion, and a register operative to store the responders.
Further in accordance with a preferred embodiment of the present invention, the criterion includes at least one logical operand.
Still further in accordance with a preferred embodiment of the present invention, at least one logical operand includes a reference for at least one memory element other than the comparing memory element itself. For example, a plurality of memory elements may be respectively responsible for a corresponding plurality of pixels forming a color image. The references may include three specific pixel values A, B and C and the user-selected logical criterion may be that an individual pixel have a value of A, OR that its upper right neighbor has a value of B and its lower left neighbor has a value of C.
Further in accordance with a preferred embodiment of the present invention, each memory element includes at least one memory cell.
Still further in accordance with a preferred embodiment of the present invention, the plurality of comparing memory elements are operative in parallel to compare the contents of a memory element other than themselves to an individual reference.
There is also provided, in accordance with a preferred embodiment of the present invention, an associative memory including an array of PEs (processor elements) including a plurality of PE's, wherein each PE includes a processor of variable size, and a word of variable size including an associative memory cell, wherein all of the associative memory cells from among the plurality of associative memory cells included in the plurality of PE's are arranged in the same location within the word and wherein the plurality of words included in the plurality of PE's together form a FIFO.
Further in accordance with a preferred embodiment of the present invention, the word of variable size includes more than one associative memory cell.
There is also provided, in accordance with a preferred embodiment of the present invention, a method for modifying contents of a multiplicity of memory cells and including performing, once, an arithmetic computation on an individual value stored in a plurality of memory cells and storing the result of the arithmetic computation in a plurality of memory cells which contain the individual value.
Further in accordance with a preferred embodiment of the present invention, storing is carried out in all memory cells in parallel.
Also described herein is a chip for multimedia and image processing applications. It is suitable both for low-cost, low-power, small-size, high-performance real-time image processing for consumer applications and for high-end, powerful image processing for multimedia and communication applications.
The chip is a general purpose, massively parallel processing chip, in which typically 1024 associative processors are crowded onto one chip, enabling the processing of 1024 digital words in one machine cycle of the computer clock.
The chip was designed to allow the performance of a wide range of image processing and multimedia applications in real-time video rate. In comparison, existing general purpose, serial computing chips and digital signal processing chips (DSPs) enable the processing of only 1 - 16 words in one machine cycle.
The chip's major instruction set is based on four basic commands that enable the performance of all arithmetic and logic instructions. This is another design advantage that allows more than a thousand processors to be crowded onto a single chip.
A single chip typically performs the equivalent of 500 - 2000 million instructions per second (MIPS). A system based on the chip's architecture can reach multimedia performance of high-end computers at only a small fraction of the price of typical high-end computers.
The chip is based on a modular architecture, and enables easy connection of more than one chip in order to gain high performance (in a linear ratio). Thus, a large number of the chips can be connected in parallel in order to linearly increase overall performance to the level of the most sophisticated supercomputers.
Existing CPU chips and DSPs require a dedicated operating system when more than one chip is connected in parallel. The performance increases in ratio to the square root of the number of chips connected together. Connecting more than two chips requires the architecture of a supercomputer.
The chip's architecture allows massively parallel processing in concurrence with data input and output transactions. As an associative processor, each of the chip's 1024 processors has its own internal memory and data path. The chip's data path architecture provides parallel loading of data into the internal processors, thereby eliminating the bottleneck between memory and CPU that can cause severe performance degradation in serial computers.
The chip uses an average of 1 watt to perform the equivalent of 500 MIPS which is 10 - 25 times better than existing general purpose and DSP chips.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be understood and appreciated from the following detailed description, taken in conjunction with the drawings in which:
Fig. 1 is a simplified functional block diagram of associative signal processing apparatus constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 2 is a simplified flowchart of a preferred method for employing the apparatus of Fig. 1;
Fig. 3 is a simplified block diagram of associative signal processing apparatus for processing an incoming signal which is constructed and operative in accordance with a preferred embodiment of the present invention;
Fig. 4 is a simplified block diagram of a preferred implementation of the apparatus of Fig. 1;
Fig. 5 is a simplified block diagram of an alternative preferred implementation of the apparatus of Fig. 1;
Fig. 6 is a simplified block diagram of a portion of the apparatus of Fig. 5;
Fig. 7 is a simplified block diagram of a portion of the apparatus of Fig. 6;
Fig. 8 is a simplified block diagram of another portion of the apparatus of Fig. 6;
Fig. 9 is a simplified flowchart illustrating the operation of the apparatus of Fig. 5;
Fig. 10 is a simplified pictorial diagram illustrating the operation of a portion of the apparatus of Fig. 5;
Fig. 11 is a simplified block diagram of associative real-time vision apparatus constructed and operative in accordance with an alternative preferred embodiment of the present invention;
Fig. 12 is a simplified pictorial illustration of the operation of the apparatus of Fig. 11 during compare and write commands;
Fig. 13 is a simplified pictorial illustration of interprocessor communication within a portion of the apparatus of Fig. 11;
Fig. 14 is a simplified block diagram illustrating chip interface and interconnections within a portion of the apparatus of Fig. 11;
Fig. 15 is a simplified pictorial illustration of an automaton used to evaluate the complexity of the apparatus of Fig. 11;
Fig. 16 is a simplified block diagram illustrating word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 17 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 18 is a simplified block diagram illustrating an additional word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 19 is a simplified block diagram illustrating an implementation of a method of thresholding utilizing the apparatus of Fig. 11;
Figs. 20A - 20F are simplified pictorial illustrations of test templates illustrating an implementation of a method of thinning utilizing the apparatus of Fig. 11;
Fig. 21 is a simplified block diagram illustrating an implementation of a method of matching utilizing the apparatus of Fig. 11;
Fig. 22 is a simplified block diagram illustrating still another word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 23 is a simplified block diagram illustrating an additional word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 24 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 25 is a graphical illustration of comparative execution time for alternative implementations of a stereo method utilizing the apparatus of Fig. 11;
Fig. 26 is a graphical illustration of comparative complexity for alternative implementations of a stereo method utilizing the apparatus of Fig. 11;
Fig. 27 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 28 is a simplified block diagram illustrating a portion of a method for edge detection utilizing the apparatus of Fig. 11;
Fig. 29 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 30 is a simplified pictorial illustration of pixels used within a method for processing an associative saliency network utilizing the apparatus of Fig. 11;
Fig. 31 is a simplified block diagram illustrating another word format of associative memory within a portion of the apparatus of Fig. 11;
Fig. 32 is a graphical illustration of normal parameterization of a line used within a method for computing a Hough transform utilizing the apparatus of Fig. 11;
Fig. 33 is a graphical illustration of a portion of a method for Convex Hull generation utilizing the apparatus of Fig. 11;
Fig. 34 is a simplified block diagram illustrating a method for processing an associative Voronoi diagram utilizing the apparatus of Fig. 11;
Attached herewith are the following appendices which aid in the understanding and appreciation of one preferred embodiment of the invention shown and described herein:
Appendix A is a listing of a subroutine called "sub.rtn" which is called in each of the listings of Appendices B - O;
Appendix B is a listing of a preferred associative signal processing method for generating a histogram;
Appendix C is a listing of a preferred associative signal processing method for 1D convolution;
Appendix D is a listing of a preferred associative signal processing method for a low pass filter application of 2D convolution;
Appendix E is a listing of a preferred associative signal processing method for a Laplacian filter application of 2D convolution;
Appendix F is a listing of a preferred associative signal processing method for a Sobel filter application of 2D convolution;
Appendix G is a listing of a preferred associative signal processing method for curve propagation;
Appendix H is a listing of a preferred associative signal processing method for optical flow;
Appendix I is a listing of a preferred associative signal processing method for performing an RGB to YUV transformation;
Appendix J is a listing of a preferred associative signal processing method for corner and line detection;
Appendix K is a listing of a preferred associative signal processing method for contour labeling;
Appendix L is a listing of a preferred associative signal processing method for saliency networking;
Appendix M is a listing of a preferred associative signal processing method for performing a Hough transform on a signal which is configured as a line;
Appendix N is a listing of a preferred associative signal processing method for performing a Hough transform on a signal which is configured as a circle; and
Appendix O is a listing of a preferred associative signal processing method for generating a Voronoi diagram signal.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Reference is now made to Fig. 1 which is a simplified functional block diagram of associative signal processing apparatus constructed and operative in accordance with a preferred embodiment of the present invention.
The apparatus of Fig. 1 includes a simultaneously accessible FIFO 10, or, more generally, any simultaneously accessible memory, which stores at least a portion of an incoming signal which arrives over a bus termed herein the DBUS. The simultaneously accessible FIFO 10 feeds onto a PE (processor element) array 16 including a plurality of PE's 20 which feed onto a datalink 30 which preferably also serves as a responder memory. Alternatively, a separate responder memory may be provided.
Each PE includes at least one associative memory cell, more typically a plurality of associative memory cells such as, for example, 72 associative memory cells. Each PE 20 stores and processes a subportion of the image, such that the subportions stored and processed by all of the PE's 20 together form the portion of the incoming signal stored at a single time in the simultaneously accessible FIFO 10.
For example, there may be 1024 PE's 20. If the processing task is simple enough to allow each PE to process 2 pixels at a time, then the FIFO may, at a single time, store a block of 2048 pixels within the color image. If the processing task is so complex that two PEs are required to process each pixel, then the FIFO may, at a single time, store a smaller block of only 512 pixels within the color image.
The PE's 20 are controlled by a controller 40 which is typically connected in parallel to all the PE's. Reference is also made to Fig. 2 which is a simplified flowchart of a preferred method for employing the apparatus of Fig. 1. The first step of the method of Fig. 2 is step 54. In step 54, the system receives a user selected command sequence which is to be processed for each pixel of a current block of the color image. The command sequence is stored in command sequence memory 50. Typically, a command sequence comprises commands of some or all of the following types:
a. Compare ╌ Each of one or more PE's compares its contents to a comparand and generates an output indicating whether or not its contents is equivalent to the comparand.
b. Write ╌ Each of one or more PE's, if its own contents and/or the contents of other PE's in its vicinity comply with a logical criterion preceding the write command, changes its contents in accordance with a provided writing operand.
c. Single cell shift ╌ The contents of each of one or more PE's shifts, via the datalink 30, into respectively adjacent one or more PE's.
d. Multicell shift ╌ The contents of each of one or more PE's shifts, via the datalink 30, directly into respectively non-adjacent one or more PE's.
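By way of illustration, the Compare and Write command types may be modeled in conventional "C" as a serial simulation of the parallel array. All names, word sizes and the array length below are illustrative only and do not correspond to the chip's actual instruction set or to the appendix listings.

```c
#include <stdint.h>
#include <stddef.h>

#define NPE 8  /* small PE array, for illustration only */

/* COMPARE: each PE tags itself if its masked contents equal the comparand.
   The loop is conceptually parallel over all PE's. */
static void pe_compare(const uint8_t pe[NPE], uint8_t mask, uint8_t comparand,
                       int tag[NPE])
{
    for (size_t i = 0; i < NPE; i++)
        tag[i] = ((pe[i] & mask) == (comparand & mask));
}

/* WRITE: each tagged (responding) PE replaces its masked bits with the
   write operand, again conceptually in parallel. */
static void pe_write(uint8_t pe[NPE], uint8_t mask, uint8_t operand,
                     const int tag[NPE])
{
    for (size_t i = 0; i < NPE; i++)
        if (tag[i])
            pe[i] = (pe[i] & ~mask) | (operand & mask);
}
```

The mask register restricts both the comparison and the write to selected bit positions, which is how bit-serial arithmetic is built from these two commands.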
Also in step 54, the first block of the incoming signal is received by simultaneously accessible FIFO 10.
The command sequence is then processed, command by command, as shown in Fig. 2.
Reference is now made to Fig. 3 which is a simplified block diagram of associative signal processing apparatus for processing an incoming signal which is constructed and operative in accordance with a preferred embodiment of the present invention. The signal processing apparatus of Fig. 3 includes the following elements, all of which are arranged on a single module 104 such as a single chip:
a. An array 110 of processors or PE's 114, of which, for simplicity, three are shown. Each processor 114 includes a multiplicity of memory cells 120, of which, for simplicity, four are shown. From among each multiplicity of memory cells 120, at least one memory cell (exactly one, in the illustrated embodiment) is an associative memory cell 122. The associative memory cell or cells 122 of each processor are all arranged in the same location or locations within their respective processors, as shown. As an example, there may be 1K processors 114 each including 72 memory cells 120, all of which are associative.
Preferably, at least one of the processors is operative to process more than one sample of an incoming signal.
b. A responder memory 130 including one or more registers which are operative to store responders arriving from the processors 114 and, preferably, to serve as a datalink therebetween. Alternatively, a separate datalink between the processors may be provided. Preferably, the datalink function of memory 130 allows at least one multicell shift operation, such as a 16-cell per cycle shift operation, to be performed. The datalink function of memory 130 also preferably performs single cell shift operations in which a shift from one cell to a neighboring cell or from one PE to a neighboring PE is performed in each cycle.
c. A simultaneously accessible FIFO 140, or, more generally a simultaneously accessible memory, which inputs and outputs a signal.
d. A responder counting unit 150 which is operative to count the number of "YES" responders in responder memory 130.
e. Comparand, mask and write operand registers 180, as described above with reference to Fig. 1.
A command sequence memory 160, which may be similar to the command sequence memory 50 of Fig. 1, and a controller 170 are typically external to the module 104. The controller 170 is operative to control the command sequence memory 160.
Preferred methods for associative signal processing are now described, including:
a. Low level associative signal processing methods ╌ histogram generation, 1D and 2D convolution, curve propagation, optical flow, and transformations between color spaces such as RGB to YUV transformations; and
b. Mid-level associative signal processing methods ╌ corner and line detection; contour labeling, saliency networking, Hough transform, and geometric tasks such as convex hull generation and Voronoi diagram generation.
Each of the above associative signal processing methods are now described.
Histogram generation:
A preferred histogram generation method is now described with reference to Appendix B which is a listing of one software implementation of a histogram generation method. The method of Appendix B includes a very short loop repeated for each gray level. A COMPARE instruction tags all the pixels of that level and a COUNTAG tallies them up. The count is automatically available at the controller, which accumulates the histogram in an external buffer.
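The loop structure may be sketched serially in conventional "C" as follows; the inner loop stands in for the parallel COMPARE/COUNTAG pair, and the names are illustrative rather than taken from the Appendix B listing.

```c
#include <stddef.h>

/* Histogram by per-level counting, mirroring the COMPARE/COUNTAG loop:
   for each gray level, tag all matching pixels and tally the responders. */
static void histogram(const unsigned char *pixels, size_t n,
                      unsigned long hist[256])
{
    for (int level = 0; level < 256; level++) {   /* one COMPARE per level */
        unsigned long count = 0;
        for (size_t i = 0; i < n; i++)            /* conceptually parallel */
            if (pixels[i] == level)
                count++;                          /* COUNTAG tally */
        hist[level] = count;                      /* read by the controller */
    }
}
```

On the associative array the inner loop collapses to a single machine cycle, so the whole histogram takes on the order of one COMPARE and one COUNTAG per gray level, independent of image size.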
Convolution:
Low level vision, particularly edge-detection, involves the application of various filters to the image, which are most conveniently executed by convolution. The image may be considered as a simple vector of length N X M or as a concatenation of N row vectors, each of length M. Convolution of an N-element data vector by a P-element filter results in a vector of length N+P-1, but only the central N-P+1 elements, representing the area of full overlap between the two vectors, are typically of interest.
The convolution filter vector [f], of length P and precision 8, is applied as an operand by the controller, one element at a time. The result may, for example, be accumulated in a field [fd] of length 8+8+log2(P). A "temp" bit is used for temporary storage of the carry that propagates through field [fd]. A "mark" bit serves to identify the area of complete overlap by the filter vector.
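A serial "C" sketch of the full-overlap computation follows, assuming integer data and filter; it is not the Appendix C listing, merely an illustration of which N-P+1 outputs are retained.

```c
#include <stddef.h>

/* 1D convolution, keeping only the central n-p+1 fully-overlapped outputs
   (the "mark"ed area of complete overlap). Assumes p <= n; out must have
   room for n-p+1 elements. */
static void convolve_valid(const int *data, size_t n,
                           const int *filt, size_t p, int *out)
{
    for (size_t i = 0; i + p <= n; i++) {
        int acc = 0;
        for (size_t j = 0; j < p; j++)
            acc += data[i + j] * filt[p - 1 - j];  /* filter applied reversed */
        out[i] = acc;
    }
}
```

In the associative version the controller broadcasts one filter element per step and all output positions accumulate in parallel, so the cost grows with P rather than with N.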
An example of a row convolution method is provided in Appendix C.
An example of a 2D convolution method which implements a low pass filter is provided in Appendix D.
An example of a 2D convolution method which implements a Laplacian filter is given in Appendix E.
An example of a 2D convolution method which implements the Sobel method of edge detection is given in Appendix F.
Curve Propagation:
Curve propagation is useful in that it eliminates weak edges due to noise, but continues to trace strong edges as they weaken. On the basis of signal statistics and an estimate of noise in the image, two thresholds on gradient magnitude may be computed - "low" and "high". Edge candidates with gradient magnitude under "low" are eliminated, while those above "high" are considered edges. Candidates with values between "low" and "high" are considered edges if they can be connected to a pixel above "high" through a chain of pixels above "low". All other candidates in this interval are eliminated.
The process involves propagation along curves. Associative implementation, a method for which is set forth in detail in Appendix G, uses three flags:
i. "E", which initially marks candidates above "high" threshold (unambiguous edge points), and eventually designates all selected edge points;
ii. "OE" (Old Edges) ╌ to keep track of confirmed edges at the last iteration; and
iii. "L" to designate candidates above "low".
At every iteration, each "L" candidate is examined to see if at least one of its 8-neighbors is an edge, in which case it is also declared an edge by setting "E". Before moving "E" into "OE", the two flags are compared to see if steady state has been reached, in which case the process terminates.
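The iteration above may be sketched serially as follows. Note that this raster-order version lets a newly set "E" flag propagate within a single pass, unlike the synchronous parallel version which compares "E" against "OE" between iterations; the array layout and names are illustrative.

```c
#include <stddef.h>

/* One pass of curve propagation (hysteresis) on a w x h image.
   e[] holds confirmed edges (the "E" flag), l[] holds candidates above the
   "low" threshold (the "L" flag). Returns the number of newly confirmed
   edges; iterate until it returns 0 (steady state). */
static int propagate(unsigned char *e, const unsigned char *l, int w, int h)
{
    int changed = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!l[y * w + x] || e[y * w + x])
                continue;
            for (int dy = -1; dy <= 1; dy++)         /* scan the 8-neighbors */
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = y + dy, nx = x + dx;
                    if ((dx || dy) && ny >= 0 && ny < h && nx >= 0 && nx < w
                        && e[ny * w + nx]) {
                        e[y * w + x] = 1;            /* set "E": now an edge */
                        changed++;
                        dy = dx = 2;                 /* leave both loops */
                    }
                }
        }
    return changed;
}
```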
Optical Flow:
Optical flow assigns to every point in the image a velocity vector which describes its motion across the visual field. The potential applications of optical flow include the areas of target tracking, target identification, moving image compression, autonomous robots and related areas. The theory of computing optical flow is typically based on two constraints: the brightness of a particular point in the image remains constant, and the flow of brightness patterns varies smoothly almost everywhere.
Horn & Schunck derived an iterative process to solve the constrained minimization problem. The flow velocity has two components (u,v). At each iteration, a new set of velocities [u(n+1), v(n+1)] can be estimated from the average of the previous velocity estimates. A method for implementing the Horn and Schunck method associatively is given in Appendix H.
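A single-pixel form of the update is sketched below, assuming the usual Horn & Schunck formulation with brightness derivatives Ex, Ey, Et, neighborhood-averaged velocities ubar, vbar, and smoothness weight alpha; the symbol names are illustrative and do not correspond to the Appendix H listing.

```c
/* One Horn & Schunck update step for a single pixel:
   u(n+1) = ubar - Ex*(Ex*ubar + Ey*vbar + Et) / (alpha^2 + Ex^2 + Ey^2)
   and symmetrically for v. */
static void hs_update(float ex, float ey, float et,
                      float ubar, float vbar, float alpha,
                      float *u, float *v)
{
    float num = ex * ubar + ey * vbar + et;          /* constraint residual */
    float den = alpha * alpha + ex * ex + ey * ey;   /* never zero if alpha>0 */
    *u = ubar - ex * num / den;
    *v = vbar - ey * num / den;
}
```

In the associative implementation this update is evaluated bit-serially but for all pixels at once, one iteration per pass over the array.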
Color Space Transformation:
One of the major tasks in color image processing is to transform the 24 bit space of the conventional Red (R), Green (G) and Blue (B) color components to another space, such as a (Y,U,V) space, which is more suited for color image compression. A preferred associative method for color space transformation is set forth in Appendix I.
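As an illustration, a transformation using the common CCIR-601 coefficients is sketched below; these coefficients are one typical choice and are not necessarily those used in the Appendix I listing.

```c
/* RGB -> YUV using the common CCIR-601 luminance weights
   (0.299, 0.587, 0.114) and the standard U/V scale factors.
   Inputs and outputs are normalized to [0, 1]-ish ranges. */
static void rgb_to_yuv(float r, float g, float b,
                       float *y, float *u, float *v)
{
    *y = 0.299f * r + 0.587f * g + 0.114f * b;  /* luminance */
    *u = 0.492f * (b - *y);                     /* blue color difference */
    *v = 0.877f * (r - *y);                     /* red color difference */
}
```

Associatively, each multiply-by-constant becomes a short sequence of masked compare/write steps applied to all pixels in parallel.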
Detection of Corners and Line Direction:
An important feature for middle and higher level processing is the ability to distinguish corners and line direction. In Canny edge detection, line orientation is generated during the process. On the other hand, the M&H algorithm is not directional, and the edge bit-map it produces must be further processed to detect line orientation. According to a preferred embodiment of the present invention, an edge bit-map of a 9 × 9 neighborhood around each pixel is used to distinguish segment direction. The resulting method can typically discriminate 120 different lines and corners. A program listing of this method is set forth as Appendix J.
Contour Tracing and Labeling:
A preparation step labels each contour point with its x,y coordinates. The process is generally iterative and operates on a 3 × 3 neighborhood of all contour points in parallel. Every contour point looks at each one of its 8 neighbors in turn and adopts the neighbor's label if smaller than its own. The circular sequence in which neighbors are handled appreciably enhances label propagation.
Iteration stops when all labels remain unchanged, leaving each contour identified by its lowest coordinates. The point of lowest coordinates in each contour is the only one to retain its original label. The lowest coordinate points have been kept track of and are now counted to obtain the number of contours in the image. A preferred associative method for performing this method is set forth in Appendix K.
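A serial sketch of the label-adoption pass follows; the associative version examines all contour points in parallel, while this raster-order form also lets labels propagate within a single pass. The layout and names are illustrative.

```c
#include <stddef.h>

/* One labeling pass: every contour point adopts the smallest label among
   itself and its 8 contour neighbors. Returns the number of labels changed;
   iterate until 0, after which each contour carries the label of its
   lowest-coordinate point. */
static int relabel(int *label, const unsigned char *contour, int w, int h)
{
    int changed = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!contour[y * w + x])
                continue;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w)
                        continue;
                    if (contour[ny * w + nx]
                        && label[ny * w + nx] < label[y * w + x]) {
                        label[y * w + x] = label[ny * w + nx];
                        changed++;
                    }
                }
        }
    return changed;
}
```

Counting the points whose final label equals their initial label then yields the number of distinct contours.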
Associative Saliency Networking:
Salient structures in an image can be perceived at a glance without the need for an organized search or prior knowledge about their shape. Such a structure may stand out even when embedded in a cluttered background or when its elements are fragmented. Sha'ashua & Ullman have proposed a global saliency measure for curves based on their length, continuity and smoothness.
The image is considered as a network of N × N grid points, with d orientation elements (segments or gaps) coming into each point from its neighbors, and as many going out to its neighbors. A curve of length L in this image is a connected sequence of orientation elements p(i), p(i+1), ..., p(i+L), each element representing a line-segment or a gap in the image. An associative implementation of this method is set forth in Appendix L.
Hough Transform:
The Hough transform detects a curve whose shape is described by a parametric equation, such as a straight line or a conic section, even if there are gaps in the curve. Each point of the figure in image space is transformed to a locus in parameter space. After splitting the parameters into suitable ranges, a histogram is generated giving the distribution of locus points in parameter space. Occurrence of the object curve is marked by a distinct peak in the histogram (intersection of many loci).
In the case of a straight line, the normal parameterization may be used, given by: Xcos(A) + Ysin(A) = R, which specifies the line by R and the angle A; the histogram then includes a straight line in every direction of A. But if the candidate points are the result of edge detection by a method that yields direction, then the angle A is known. An associative implementation of this method is set forth in Appendix M.
Example: The major steps required to perform a Hough transform operation for a sample image of 256 × 256 pixels are now described: The 256 × 256 image, with the origin at its center, is arranged linearly, pixel after pixel and row after row, across multiple chips, one pixel per processor. For example, if each chip includes 1K processors, 64 chips may be used to hold the 256 × 256 = 64K pixels. The x-y coordinates are given by 8 bits in absolute value and sign. Angle A, from 0 to pi (3.1415...), is given to a matching precision of 10 bits (excluding the sign of the gradient). The sine and cosine are evaluated by table look-up. Preferably, the table size is reduced four-fold to take into account the symmetry of these functions. After comparing A, the histogram is evaluated and read out element by element using a "countag" command.
Example: A circle with given radius R and center x0,y0 is to be detected. The direction of the gradient is employed to simplify the process: differentiating the circle equation gives dy/dx = -(x-x0)/(y-y0) = tan(T - pi/2), where T is the gradient direction. Solving for x0 and y0:
x0 = x +/- R sin (T - pi/2); and
y0 = y +/- R cos (T - pi/2).
A histogram is generated for x0,y0. An associative implementation of a preferred Hough transform is set forth in Appendix N.
Voronoi Diagram:
This type of diagram is useful for proximity analysis. Starting with a given set of L points in the plane, P(i), i=1,2,...,L, the Voronoi diagram surrounds each point P(i) by a region, R(i), such that every point in R(i) is closer to P(i) than to any other point in the set, P(j), j=1,2,...,L and i not equal to j. The boundaries of all these regions, R(i), constitute the Voronoi diagram.
An associative method based on the "brush fire" technique is presented in Appendix O. Each of the given points acts as a source of "fire" that spreads uniformly in all directions. The boundaries consist of those points at which fires from two (or three) sources meet. Every point in the given set is initially marked with a different color, such as its own xy-coordinates. Each point in the image looks at its 8-neighbors. A blank (uncolored) point that sees a colored neighbor will copy its color. If both are colored, the point will compare colors, marking itself as a Voronoi (boundary) point if the colors are different. This process is iterated until all points are colored.
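A serial sketch of one brush-fire step follows. It double-buffers the coloring so that every point sees the previous iteration's colors, and it marks a point as a boundary when it sees a neighbor of a different color, a simplification of the meeting-fires test; the layout and names are illustrative.

```c
#include <stddef.h>

/* One brush-fire step on a w x h grid. color[] holds 0 for blank points or a
   nonzero source color; boundary[] is set where two different colors meet.
   next[] receives the updated coloring; iterate until this returns 0. */
static int brushfire(const int *color, int *next, unsigned char *boundary,
                     int w, int h)
{
    int changed = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int c = color[y * w + x];
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w)
                        continue;
                    int nc = color[ny * w + nx];
                    if (!nc)
                        continue;
                    if (!c) {                 /* blank point copies a color */
                        c = nc;
                        changed++;
                    } else if (c != nc) {     /* two fires meet: boundary */
                        boundary[y * w + x] = 1;
                    }
                }
            next[y * w + x] = c;
        }
    return changed;
}
```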
The following are basic associative signal processing steps which are frequently used in the methods of Appendices B to O. Any of the steps categorized as Group 1 may be carried out in parallel with any of the steps from Group 3. Also, any of the steps from Group 1, any of the steps from Group 2 and any of the steps from Group 4 may be carried out in parallel.
Group 1

Number  Step (abbreviation)  Step (full description)
1   LETM     Load mask register
2   LETC     Load comparand register
3   LETMC    Load mask and comparand
4   LMCC     Load mask clear comparand
5   LMCCXX   Load mask clear comparand exclusive
6   LCSM     Load comparand set mask
7   LMX      Load mask exclusive
8   LCX      Load comparand exclusive
9   LMSC     Load mask set comparand
10  SMX      Set mask exclusive
11  SCX      Set comparand exclusive

Group 2

12  SETAG    Set all responders to (yes)

Group 3

13  SHUP     Single cell shift up
14  SHDN     Single cell shift down
15  LGUP     Multiple cell shift up
16  LGDN     Multiple cell shift down

Group 4

17  COMPARE  Search for a specific value
18  WRITE    Write a value to one or more processors simultaneously

Group 5

19  READ     Read data from one processor
20  COUNTAG  Count all responders
21  FIRSEL   Mark the first responder
22  CONFIGURE FIFO  Select I/O bit(s) for FIFO
23  NOP      No operation
To employ any of the methods described in any of Appendices B to O, the listings of Appendices B to O may be run on any "C" language compiler such as the Borland "C++" compiler, using the CLASS function.
As already described, each of the methods of Appendices B to O includes the following steps:
a. a basic memory size defining step;
b. a basic associative word length defining step;
c. a step in which the subroutine of Appendix A is called; and
d. steps which are specific to the individual application.
It is appreciated that the particular embodiment described in the Appendices is intended only to provide an extremely detailed disclosure of the present invention and is not intended to be limiting.
One implementation of an associative signal processing chip is now described. The implementation is termed herein ASP 100.
1. Introduction
The ASP100 is an associative processing chip. It is intended to serve as part of the Vision Associative Computer, most likely in an array of multiple ASP100 chips.
The ASP100 consists of an associative memory array of 1K×72 bits, peripheral circuitry, image FIFO I/O buffer, and control logic.
2. Modes of Operation
The ASP100 can be operated as a single chip (Single Chip Mode) or as part of an array of ASP100 chips (Array Mode).
2.1 Single Chip Mode The Single Chip Mode is shown in Figure 4. In this mode a single ASP100 is operated in conjunction with a controller.
2.2 Array Mode
The Array mode is shown in Figure 5. An array of ASP100 chips are interconnected in parallel, constituting a linear array. A single controller chip, or a suitable circuit, controls the array.
3. Pinout
3.1 Basic Pinout
The ASP100 may be packaged in a 160 pin PQFP package. All output and I/O pins are 3-state, and can be disabled by CS. Following is the complete list of pins:
(Table of pin assignments not reproduced in this text.)
3.2 Test Pinout
The following is a list of the pins which serve as test points. They can be ignored when packaging the production version:
(Table of test pins not reproduced in this text.)
4. Architecture and Operation
4.1 Top-Level View
The top-level view is shown in Figure 6. ARRAY is the main associative array. FIFO is the image input/output fifo buffer. SIDE is the side-array, consisting of the tag, the tag logic, the tag count, the select-first, the row drivers (of WriteLine and MatchLine) and sense amplifiers, and the shift units.
TOP consists of the mask and comparand registers, and the column drivers (of BitLine, InverseBitLine, and MaskLine). BOTTOM contains the output register and sense amplifiers. CONTROL is the control logic for the chip. Microcontrol is external in this version.
4.2 ARRAY Architecture
The Array consists of 1024×72 associative processing elements (APEs), organized in three columns, each 24 APEs wide, and physically split into three blocks of 342×72 APEs. This six-way split achieves a square aspect ratio of the layout and also helps contain the load of the vertical bus wires.
As is explained in Section 4.3 below, one 24 bit sector of the array is reconfigurable as follows (by means of the "CONFIFO" Configure Fifo instruction):
1. All 24 bits serve as FIFO (total ARRAY width is 48).
2. 16 bits FIFO, 8 bits ARRAY (total ARRAY width is 56).
3. 8 bits FIFO, 16 bits ARRAY (total ARRAY width is 64).
The Associative Processing Element (APE) is a Content Addressable Memory (CAM) cell. It consists of three components:
Storage element;
Write device;
Match device.
There are three vertical incident buses and four horizontal incident buses in the APE:
Vertical: Bit Line (BL)
Inverse Bit Line (IL)
Mask Line (MASK)
Horizontal: Write Line (WL)
Match Line (ML)
VDD VSS (GND)
The Storage element consists of two cross coupled CMOS inverters.
The Write device implements the logical AND of MASK and WL, so that it can support the MASKED WRITE operation.
The Match device implements a dynamic EXCLUSIVE OR (XOR) logic. This technique allows area efficient and reliable implementation of the COMPARE operation.
4.3 FIFO Architecture
Reference is now made to Fig. 7. The FIFO is designed to input and output image data in parallel with computations carried out on the ARRAY. It consists of a reconfigurable MATRIX of 1024 × [24 or 16 or 8] APEs 190, three columns 192 each of 1024 bi-directional Switches, and Address Generator 194. The corresponding section of the Comparand register, in TOP, serves as the FIFO input register, and the corresponding section of the Output Register, in BOTTOM, serves as the FIFO Output Register. The FIFO Controller FC resides in TOP.
The FIFO is configured by the CONFIFO instruction, where the three LSBs of the operand are:
6 for 8 bit FIFO,
5 for 16 bit FIFO,
3 for 24 bit FIFO, and
7 for no FIFO.
When the FIFO is 8 bits wide, inputs VIN[0:7] and outputs VOUT[0:7] are routed to bits [0:7] of the FIFO, where bit 0 is the "leftmost" bit. When the FIFO is 16 bits wide, inputs VIN[0:15] and outputs VOUT[0:15] are routed to FIFO bits [0:15], such that the least significant byte [0:7] uses the same routing as in the previous case. Similarly, when the FIFO is 24 bits wide, the least significant two bytes are routed in the same way as in the case of a 16-bit FIFO.
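By way of illustration only, the CONFIFO encoding above can be captured in a small C helper. The function name confifo_width and the -1 error value are illustrative, not part of the chip specification.

```c
#include <assert.h>

/* Illustrative helper: decode the three LSBs of a CONFIFO operand
   into the resulting FIFO width in bits (-1 for an unused code). */
int confifo_width(unsigned operand)
{
    switch (operand & 7u) {      /* only the three LSBs matter */
    case 6:  return 8;           /* 8-bit FIFO  (ARRAY width 64) */
    case 5:  return 16;          /* 16-bit FIFO (ARRAY width 56) */
    case 3:  return 24;          /* 24-bit FIFO (ARRAY width 48) */
    case 7:  return 0;           /* no FIFO     (ARRAY width 72) */
    default: return -1;          /* not a defined configuration  */
    }
}
```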
The Address Generator 194 consists of a shift register and implements a sequential addressing mode. It selects a currently active FIFO word line.
The FIFO has two modes of operation, IMAGE I/O Mode and IMAGE EXCHANGE Mode. The bi-directional Switches (one column of the three) disconnect the MATRIX from the ARRAY in IMAGE I/O Mode (see below) and connect the MATRIX to the ARRAY in IMAGE EXCHANGE Mode, creating a combined array of APEs. The Input and Output Registers serve as buffer registers for the image I/O.
IMAGE I/O Mode
In IMAGE I/O Mode, a new image is read into the FIFO, while the processed (preceding) image is written out. The FIFO Controller (FC) controls the FIFO as follows: Pixel I/O is synchronous with the CLK. External control input RSTFIFO resets (clears) the Address Generator 194. FENB (asserted for at least 2 CLK cycles) enables the input (and output) of the next pixel (on the positive edge of CLK). Once all pixels have been entered (and output), FFUL is asserted for 2 CLK cycles. This I/O activity is performed asynchronously with respect to the computation in the rest of the chip.
The basic operation of IMAGE I/O mode is carried out as follows. The pixel at the VIN pins is entered into the FIFO Input Register (the FIFO section of the comparand register). The Address Generator 194 enables exactly one word line. The corresponding word is written into the FIFO Output Register (the FIFO section of the Output Register), and through it directly to the VOUT pins, in an operation similar to Read execution. Subsequently, the word in the FIFO Input Register is written into the same word, similar to a Write execution.
Note that VOUT pins are 3-state. They are enabled and disabled internally as needed.
This sequence of operations is carried out in a loop 1024 times in order to fill the 1024 processors with data.
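The fill/drain loop described above may be modeled, purely for illustration, by the following C sketch. It is a functional software model of the data movement only, not of the hardware timing (the FENB/FFUL handshake and clocking are omitted), and all names are illustrative; a toy word count stands in for the 1024 word lines.

```c
#include <assert.h>

#define WORDS 4   /* toy size; the chip scans 1024 word lines */

/* Toy model of one IMAGE I/O pass: at each step the word selected by
   the address generator is copied to the FIFO Output Register (old
   image out, toward VOUT) and then overwritten from the FIFO Input
   Register (new image in, from VIN). */
void fifo_image_io(unsigned char fifo[WORDS],
                   const unsigned char new_image[WORDS],
                   unsigned char old_image_out[WORDS])
{
    for (int addr = 0; addr < WORDS; addr++) {  /* sequential addressing */
        old_image_out[addr] = fifo[addr];       /* read: word -> VOUT */
        fifo[addr] = new_image[addr];           /* write: VIN -> word */
    }
}
```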
Multiple ASP100 chips can be chained together with a FENB/FFUL chain, where the first ASP100 receives the FENB from an external controller (true for 2 cycles), the FFUL of each ASP100 (true for 2 cycles) is connected directly to the FENB input of the next chip, and the last FFUL goes back to the controller.
IMAGE EXCHANGE Mode
In IMAGE EXCHANGE Mode, the image previously loaded into the FIFO is transferred into the ARRAY for subsequent processing, and the previously processed image from the ARRAY is transferred to the FIFO for subsequent output. These transfers are carried out via the TAG register of the SIDE block by a sequence of COMPARE and WRITE operations, as follows:
IMAGE IN
A destination bit slice of the ARRAY is masked by the MASK register and is then reset by a chain of SETAG; ClearComparand, WRITE operations (which can all be executed in one cycle). A source bit slice of the FIFO MATRIX is masked by the MASK register. The contents of the bit slice are passed to the TAG register as a result of the COMPARE operation. The destination bit slice is masked again and then the contents of the TAG register are passed to the destination bit slice by a SetComparand, WRITE operation. In summary, the following five cycles are employed:
FOR (all bit slices of FIFO):
LMCC ( sector 0/1/2, destination ARRAY bit); SETAG; WRITE
/* reset destination bit slice in the proper array sector */
LETMC(sector 2, source FIFO MATRIX bit); COMPARE /* copy source slice (FIFO sector) to TAG */
LETMC(sector 0/1/2, destination ARRAY bit); WRITE /* copy TAG to destination in the proper sector */
IMAGE OUT
This operation is carried out in exactly the same way as IMAGE IN, except that a destination bit slice is allocated in the FIFO MATRIX while a source bit slice is allocated in the ARRAY. Note that the IMAGE EXCHANGE operation requires two different fields in the ARRAY (a first field for allocation of a new image, and a second one for temporary storage of the processed image). The two operations (IMAGE IN & OUT) can be combined in one loop.
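For illustration, the combined IN & OUT exchange of a single bit slice can be modeled functionally in C as follows. The model abstracts away masking, sectors and cycle counts, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define WORDS 4   /* toy size; the chip has 1024 words */

/* Toy model of one combined IMAGE IN/OUT bit-slice step: TAG first
   captures the ARRAY slice (processed image) and writes it to the
   FIFO slice, while the FIFO slice (new image) replaces the ARRAY
   slice - a swap of the two slices routed through the TAG register. */
void exchange_slice(bool fifo_slice[WORDS], bool array_slice[WORDS])
{
    bool tag[WORDS];
    int w;
    for (w = 0; w < WORDS; w++) tag[w] = array_slice[w];        /* COMPARE: ARRAY -> TAG */
    for (w = 0; w < WORDS; w++) array_slice[w] = fifo_slice[w]; /* COMPARE, WRITE: FIFO -> ARRAY */
    for (w = 0; w < WORDS; w++) fifo_slice[w] = tag[w];         /* WRITE: TAG -> FIFO */
}
```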
4.4 SIDE Architecture
Reference is now made to Fig. 8 which illustrates a preferred implementation of the SIDE block of Fig. 6. The SIDE block is shown to include the TAG register, the NEAR neighbor connections, the FAR neighbor connections, the COUNT_TAG counter, the FIRST_RESPONDER circuit, the RSP circuit, and the required horizontal bus drivers and sense amplifiers.
The TAG register consists of a column of 1024 TAG_CELLs. Each TAG_CELL is implemented by a D flip-flop with a set input and non-inverted output. The input is selected by means of an 8-input multiplexer, with the following inputs: FarNorth, NearNorth, FarSouth, NearSouth, MatchLine (via sense amp), TAG (feedback loop), GND (for tag reset), and the FirstResponder output. The mux is controlled by MUX[0:2].
The NEAR neighbor connections interconnect the TAG_CELLs in an up/down shift register manner to nearest neighbors. They are typically employed for neighborhood operations along an image line, since pixels are stored consecutively by lines. The FAR connections interconnect TAG_CELLs 16 apart, for faster shifts of many steps. They are typically used for neighborhood operations between image lines.
The following instructions affect TAG: SETAG, SHUP, SHDN, LGUP, LGDN, video load up and video load down microcode signals termed herein VLUP and VLDN respectively, COMPARE, FIRSEL.
The COUNT_TAG counter counts the number of TAG_CELLs containing '1'. It consists of three iterative logic arrays of 1×342 cells each. The side inputs of the counter are fed from the TAG register outputs. The counter operates in a bit-serial mode, starting with the least significant bits. In each cycle, the carry bits are retained in the FF (memory cell flip-flop) for the next cycle, and the sum is propagated down to the next stage. The counter is partitioned into pipeline stages. The outputs of all six columns are added by a summation stage, which generates the final result in a bit-serial manner. The serial output appears on the CTAG outputs, and the signal CTAGVAL (CTAG valid) is generated by the controller. The COUNT_TAG counter is activated by the COUNTAG instruction.
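Functionally, and ignoring the bit-serial pipeline, COUNTAG simply delivers the population count of the TAG register, as the following illustrative C model shows (the function name is illustrative).

```c
#include <assert.h>
#include <stdbool.h>

/* Functional view of COUNTAG: whatever the adder columns and the
   summation stage do internally, the value delivered on CTAG is the
   number of TAG_CELLs holding '1'. */
int countag_model(const bool tag[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (tag[i]) count++;   /* one per responder */
    return count;
}
```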
The FIRST_RESPONDER circuit finds the first TAG_CELL containing '1', and resets all other TAG_CELLs to '0'. It is activated by the FIRSEL instruction. The beginning of the chain is fed from a FIRSTIN input, wherein FIRSTIN is a microcode command according to which the first arriving datum is the first datum to enter the chip memory. If FIRSTIN is '0', then all TAG_CELLs are reset to '0'. This is intended to chain the FIRST_SELECT operation over all interconnected ASP100s; the OR of the RSP outputs of the lower-numbered ASP100s should be input into FIRSTIN.
The TAG outputs can be disconnected from the FIRST_RESPONDER and COUNT_TAG circuits, in order to save power, by pulling the FIRCNTEN control input to '0'.
The RSP circuit generates '1' on the RSP output pin after COMPARE instruction, if there is at least one '1' TAG value. This output is registered.
4.5 TOP Architecture
The TOP block consists of the COMPARAND and MASK registers and their respective logic, and the vertical bus drivers. The COMPARAND register contains the word which is compared against the ARRAY. It is 72 bits long, and is partitioned according to FIFO configuration (see Section 4.3). It is affected by the following instructions: LETC, LETMC, LMCC, LMSC, LCSM. All these instructions affect only one of the three sectors at a time, according to the sector bits. The FIFO section of the COMPARAND operates differently, as described in Section 4.3.
The MASK register masks (by '0') the bits of the COMPARAND which are to be ignored during comparison and write. The BitLines and InverseBitLines of the masked bits are kept at '0' to conserve power. It is affected by the following instructions: LETM, LETMC, LMCC, LMSC, LCSM, LMX, SMX. The former five instructions affect only one sector at a time, whereas LMX and SMX also clear the mask bits of the non-addressed sectors. The FIFO section of the MASK operates differently, as described in Section 4.3.
The 24-bit data input from DBUS (the operand) is pipelined by three stages, so as to synchronize the operand with the corresponding control signals.
4.6 BOTTOM Architecture
BOTTOM contains the BitLine and InverseBitLine sense amplifiers, the Output Register and its multiplexor, the DBUS multiplexors, and the DBUS I/O buffers. Since the ARRAY is physically organized in three columns, the output of the three sense amplifiers must be resolved. A logic unit selects which column actually generated the output, as follows:
READ: Select the column whose RSP is true.
FIFO OUT: Select the column in which the address token resides.
The Output Register is 72 bits long. 8 or 16 or 24 bits serve the FIFO and are connected to the VOUT pins. On READ operation, one of the three sectors (according to the sector bits) is connected to 24 bits of DBUS via a multiplexor.
DBUS multiplexors allow two configurations:
1. SHIFT: Connects the south long shift lines (from rows 1008:1023) to DBUS[31:16] and the north long shift lines (from rows 0:15) to DBUS[15:0].
2. READ: Connects bits [15:0] (bit 0 is LSB) of the Output Register to DBUS[15:0], and bits [23:16] of the OR to DBUS[23:16].
The DBUS I/O buffers control whether the DBUS is connected as input or output, and are controlled by the HIN and LIN control signals:
Instruction LIN HIN
SHUP, LGUP, VLUP 1 0
SHDN, LGDN, VLDN 0 1
READ 0 0
other 1 1
Since, in SHIFT mode, lines DBUS[31:16] of one ASP100 chip must be connected to lines DBUS[15:0] of the next ASP100 chip, the external connections are not simply bused but should be switched as necessary.
4.7 CONTROL Architecture
The ASP100 is controlled by means of an external microcoded state machine, and it receives the decoded control lines. The external microcontroller allows horizontal microprogramming with parallel activities.
The combined operation of the ASP100, its microcontroller, and the external controller is organized in a five-stage instruction pipeline, consisting of the following stages: Fetch, Decode, microFetch, Comparand, and Execute. In the Fetch stage, the instruction is fetched from the external program memory, and is transferred over the system bus to the IR (instruction register).
In the Decode stage, the instruction (from the IR) is decoded by the microcontroller and stored in the μIR. In the μFetch stage, the control codes are transferred from the external μIR, through the input pads, into the internal μIR. In the Comparand stage, the parts of the execution which affect the Comparand register are carried out, and the control codes move from the internal μIR to the internal μIR2. In the Execute stage, execution in the ARRAY and other parts takes place. See Figure 9.
Note that additional pipeline stages separate the ASP100 from the External Program Memory; these are necessary for instruction transfer from the memory via the bus to the IR inside the microcontroller.
Pipeline breaks occur during SHIFT and READ, when DBUS is used for data transfer rather than instructions, and as a result of branches. Branches are handled by the external controller, and are interpreted as NOP by the ASP100 microcontroller. Similarly, operations belonging to the controller are treated as NOPs.
4.8 Initialization
On Reset (active low), all ASP100 internal registers are set to zero, except for TAG and the memory array.
4.9 Operation Notes
1. Write is designed so that the sequence WRITE; COMPARE; WRITE; ... can be executed, but two consecutive WRITEs or two consecutive COMPAREs cannot.
2. In all instructions which include SETs, such as SETAG, the corresponding control signal LETM and/or LETC must be set high. This is required since the set and reset of the comparand and mask registers are synchronous.
4.10 Clock Generator
Preferably, a single 50 MHz CLKIN clock is input into the ASP100. In addition, a clock synchronization control signal, DCLKIN, is also input. The CLKIN signal serves as the clock of the generator circuit. DCLKIN is an input signal (abiding by the required setup and hold timing). The circuit creates two clocks of, for example, 25 MHz, CLK and DCLK, delayed by 1/4 cycle relative to each other. CLK is fed back into a clock-generating pad, to provide the required drive capability. CLK, DCLK and their complements provide for four-phase clocking, as shown in Figure 10.
5. Programming Model
5.1 Instruction Set
An example of an instruction set is now described. Two instruction formats are employed. Instruction format A is used for Group 1 and READ instructions, and instruction format B is used for all other groups. Format A contains one bit for NOP, five OpCode bits, two sector bits, and 24 operand bits.
Instruction format B contains one bit for NOP, seven OpCode bits, and 24 operand bits.
As presented here, these formats do not allow parallel execution of multiple instructions. It is appreciated that, in an alternative implementation of the instruction set format, parallel execution of multiple instructions could be included. In the following table, d(n) is an n-bit argument, n≤24, and s(2) is a 2-bit sector number.
(Instruction set table not reproduced in this text.)
Parallel ("Horizontal") Execution
1. Groups 1,3 instructions can be carried out in parallel.
2. Groups 1,2,4 instructions can be carried out in parallel.
It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
Associative real time vision research findings of Dr. Avidan Akerib, which were submitted to the Weizmann Institute of Science, Rehovot, Israel, in the framework of a doctoral thesis, are now described with reference to Figs. 11 - 34.
The ARTVM Architecture
The Associative Real Time Vision Machine (ARTVM) was endowed with a number of characteristics which enable it to meet the vision requirements. In describing the structure of the machine and its primitive operations these attributes will be emphasized.
Referring to Fig. 11, the core of the machine is a basic, classical, associative processor that is parallel by bit as well as by word. The main associative primitive is COMPARE. The comparand register is matched against all words of memory simultaneously and agreement is indicated by setting the corresponding tag bit. The comparison is only carried out in the bits indicated by the mask register and the words indicated by the tag register. Status bit rsp signals that there was at least one match (Fig. 12). The WRITE primitive operates in a similar manner. The contents of the comparand are simultaneously written into all words indicated by the tag and all bits indicated by the mask (Fig. 12). The READ command is normally used to bring out a single word, the one pointed to by the tag. Since the combination COMPARE-WRITE is of the type "if condition then action", all logical and arithmetic functions can be executed [3]. Hence the associative machine may be regarded as an array of simple processors, one for each word in memory. ARTVM provides N × N words, one for each pixel in the image to be processed, and the pixels are arranged linearly, row after row. The full machine instruction set is given in the following table.
Neighborhood operations, which play an important role in vision algorithms, require bringing data from "neighboring" pixels. Data communication is carried out a bit slice at a time via the tag register, by means of the SHIFTAG primitives, as shown in Fig. 13. The number of shifts applied determines the distance or relation between source and destination. When this relation is uniform, communication between all processors is simultaneous. Fortunately, neighborhood algorithms only require a uniform communication pattern. Since the image is two-dimensional while the tag register is only one-dimensional, communication between neighbors in adjacent rows requires N shifts. To facilitate these long shifts a multiple shift primitive, SHIFTAG(±b), was implemented in hardware, where b is a sub-multiple of N. The time complexity in cycles for shifting an N × N image k places is given by
(Shift time formula not reproduced in this text.)
where M is the precision and b is the extent of the multiple-shift primitive.
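Although the cycle formula itself is not reproduced in this text, one plausible decomposition of a k-place shift is floor(k/b) long shifts of extent b plus (k mod b) single-place shifts. The following C function counts SHIFTAG operations under that assumption only; the function name is illustrative.

```c
#include <assert.h>

/* Assumed decomposition of a k-place tag shift into SHIFTAG
   primitives: k/b long shifts of extent b plus k%b single shifts.
   This counts primitive operations, not machine cycles. */
int shiftag_ops(int k, int b)
{
    return k / b + k % b;
}
```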
Loading data images into associative memory and outputting computed results could take up much valuable processor time. This can be avoided by distributing a frame buffer within associative memory and giving it access to the tag register [33]. That is the function of the I/O Buffer Array which consists of a 16-bit shift register attached to each word. A stereo image frame can be shifted into the buffer array as it is received and digitized, without interfering with associative processing. During vertical blanking, the stereo image frame is transferred into associative memory, a bit slice at a time, using
the TAGXCH (TAG eXCHange) primitive. Under this command, the buffer array is rotated right circularly via the tag register. If results of computation on the previous frame are available for output, they can be entered into the tag by a compare instruction before executing the tagxch, which will now put out a bit-slice at the same time that it brings one in - hence the name tag exchange for this primitive. For a full stereo image this operation must be repeated 16 times. During the next frame time, both input and output proceed in parallel with processing without interference. The following routine will exchange the contents of the buffer array with those of a 16-bit field in associative memory starting at bit position i0:
for(i=i0; i<i0+16; i++)
{
    letmc d(i); setag; compare;  /* load tag with memory bit slice */
    letc; write;                 /* clear bit slice in memory */
    tagxch;                      /* exchange bit slice with buffer array */
    letmc d(i); write;           /* load memory with bit slice from buffer */
}
Execution time is 64 machine cycles (under 2 μs), which is negligible compared to the vertical blanking period (1.8 ms). While the sample routine exchanges data between the buffer array and a contiguous field in memory, it should be noted that the tagxch primitive is quite flexible: it can fetch data from one field and store to another, and both fields can be distributed.
Up to four operations may be done concurrently during a given memory cycle: SETAG or SHIFTAG; loading M (SETX, LETX); loading C (SETX, LETX); and COMPARE, READ or WRITE. FIRSEL resolves multiple responses in 6 cycles. COUNTAG is used to compile statistics and executes in 12 cycles. Control functions are given in the C language, and are carried out in parallel with the associative operations, hence do not contribute to the execution time.
To illustrate parallel processing capability, suppose that associative memory contains two data vectors, A and B, each having J elements and M-bit precision. We wish to replace vector field B by the sum A + B. The associative operation is carried out sequentially, a bit slice at a time, starting with the least significant bit. In each step, the three slices Ai, Bi and C (i = 0, 1, . . . , M - 1, and C is the carry slice) are compared in parallel to an input combination of the applicable truth table by means of the statement [letm d(.); letc d(.); setag; compare], followed by a parallel replacement of B and C with the appropriate output combination by means of the statement [letc d(.); write]. A full description of the routine is given in The Machine Simulator. The execution time of addition in machine cycles is 8.5M, which is seen to be independent of vector size J. It follows that for a 512 × 512 image (J = 2¹⁸) and a machine cycle of 30 nanoseconds (see VLSI Chip Implementation and Interconnection), the machine executes 125 billion 8-bit additions per second. Associative subtraction operates on the same principle and also takes 8.5M cycles. It is easy to extend addition to multiplication (8.5M² cycles), and subtraction to division (15.5M² cycles). Multiplication techniques will be discussed in detail in Vision Algorithms. ARTVM was seen to be microprogrammed, hence it can operate at just the precision needed in each phase and produce just the significant bits required in the result. In many vision algorithms precision is quite low, and this gives the ARTVM an additional speed advantage over conventional machines.
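As a purely scalar illustration of the bit-serial principle (the associative routine itself is given in The Machine Simulator), the following C sketch replaces B by A + B one bit slice at a time, maintaining a per-word carry slice C exactly as the COMPARE/WRITE truth-table scan does. The toy sizes J and M are assumptions; on ARTVM the inner loop over words runs simultaneously in all processors.

```c
#include <assert.h>

#define J 4   /* toy vector size; ARTVM operates on all 2^18 words at once */
#define M 8   /* operand precision in bits */

/* Word-parallel, bit-serial addition model: process slices LSB-first;
   in each pass compute the sum slice and the new carry slice, then
   replace slice Bi in place. */
void assoc_add(const unsigned a[J], unsigned b[J])
{
    unsigned carry[J] = {0};                   /* carry slice C, initially 0 */
    for (int i = 0; i < M; i++) {              /* one bit slice per pass */
        for (int j = 0; j < J; j++) {          /* simultaneous on ARTVM */
            unsigned ai = (a[j] >> i) & 1, bi = (b[j] >> i) & 1;
            unsigned sum = ai ^ bi ^ carry[j];
            carry[j] = (ai & bi) | (ai & carry[j]) | (bi & carry[j]);
            b[j] = (b[j] & ~(1u << i)) | (sum << i);  /* replace slice Bi */
        }
    }
}
```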
As mentioned earlier, each word in memory acts as a simple processor so that memory and processor are indistinguishable. Input, output and processing go on simultaneously in different fields of the same word. The field to be accessed or processed, is completely flexible, depending on application. Hence, processing capabilities (families of vision algorithms) may be expanded by increasing word length K. It will be shown in Vision Algorithms that for K = 152 all the vision algorithms we considered will run in real time, while K = 32 is sufficient for many simple image processing applications (such as histogram evaluation, convolution, morphological operations etc.).
VLSI Chip Implementation and Interconnection
Ruhman and Scherson [34, 35] devised a static associative cell and used it to lay out an associative memory chip. After evaluating its performance by circuit level simulation, they conservatively estimated the area ratio between associative memory and static RAM at 4. Considering that 4 megabits of static RAM is now commercially available on a chip area less than 100 mm², associative memory chip capacity becomes 1 M bits. The proposed chip for ARTVM stores 4K words × 152 bits, which is only 59 percent of capacity. Conservative extrapolation of cycle time to a 0.5 micron technology yields 30 nanoseconds. This value was used in computing execution time of associative vision algorithms.
Loading an entire comparand or mask word would require a bus 152 bits wide. Fortunately, associative algorithms only operate on one or two short fields and on a number of flag bits at a time. Hence the word was partitioned into four 32-bit sectors and an 8-bit flag field. Buses are provided for simultaneous access to the flag field and one sector. Fig. 14 describes the chip interface, and shows how 64 such chips can be interconnected to make up the associative memory of ARTVM. Considering the exponential growth of chip capacity by a factor of 10 every five years [36], the ARTVM may be reduced to 8 chips around 1995. Since the bulk of the machine is associative memory, upgrading is simple and inexpensive.
As indicated in Fig. 11, a control unit is required to generate a sequence of microin-structions and constants (mask and comparand) for associative memory as well as to receive and test its outputs (count and rsp). This unit may be realized using high speed bit-slice components, or may be optimized by the design of one or more VLSI chips. The functions of the control unit will become apparent from the associative algorithms that follow.
See fig. 14, for the notation employed.
The Machine Simulator
A simulator for the ARTVM was created, which enables the user to check out associative implementations of vision algorithms and to evaluate their performance. It was written in the "C" language and is referred to as "asslib.h". The vision machine simulator consists of an associative instruction modeler and an execution time evaluator.
Associative Instruction Modeler
The main features are:
• The dimensions of associative memory have been defined as variables MEM_SIZE and WORD_LENGTH, and hence must be initialized in the application program by #define commands.
• The contents of associative memory and its registers are defined as external parameters called:
A[MEM_SIZE][WORD_LENGTH]
mask[WORD_LENGTH]
comparand[WORD_LENGTH]
tag[MEM_SIZE]
All are members of a structure called parameters.
• The associative instructions defined in The ARTVM Architecture were implemented as "C" functions which get their inputs from, and write their results to, the external structure parameters.
• Three instructions were added: load, save and print_cycles. The load command initializes the array A[.][.] with data from the file ass.inp, while the save command writes into file ass.out the contents of memory array A[.][.] at the end of the application program. The print_cycles command displays the number of machine cycles required to execute the simulated program.
• Program control functions are written directly in "C".
• The general format of an application program is as follows:
#define MEM_SIZE mmm
#define WORD_LENGTH www
#include "asslib.h"
main( )
{
load;
BODY
save;
print_cycles;
}
Execution Time Evaluator
Speed evaluation was achieved by modeling the machine as a simplified Finite Automaton (F.A.) in which a cost, in machine cycles, was assigned to each state transition. The machine has only two states: S0 and S1. The input alphabet was selected by grouping the instructions into five categories called I1, . . . , I5, where:
(Instruction grouping table not reproduced in this text.)
Fig. 15 presents the transition table and diagram. At initialization we reset a cycle counter (called cycles) to zero, then increment it at each state transition by the assigned cost in cycles (appears as output in the diagram).
This speed model reflects the fact that any instruction from group I1 can execute simultaneously with one from group I2, and they can both be overlapped by an instruction from group I3. The cost of countag (I4) is conservatively estimated at 12 cycles on the basis of implementing it on-chip by a pyramid of adders, and summing the partial counts off-chip in a two-dimensional array. The cost of firsel (I5) is conservatively estimated at 6 cycles on the basis of implementing it by a pyramid of OR gates whose depth is log2 N - 1, part of which is on-chip and the rest off-chip. In the worst case, the pyramid must be traversed twice and then a tag flip-flop must be reset. The simplicity of the model for speed evaluation imposes a mild restriction on the programmer. Instructions that are permitted to proceed concurrently must be written in the order I1, I2, I3.
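Since the exact transition table of Fig. 15 is not reproduced in this text, the following C sketch encodes only the rules stated above: instructions written in the order I1, I2, I3 may share one machine cycle, countag (I4) costs 12 cycles, and firsel (I5) costs 6. It is an assumption-laden illustration, not the Fig. 15 automaton, and all names are illustrative.

```c
#include <assert.h>

enum instr_group { I1 = 1, I2, I3, I4, I5 };

/* Illustrative cycle counter: an I1 may be followed by an I2 and then
   an I3 within one machine cycle (the required writing order), so only
   the first instruction of such a run opens a new cycle; I4 and I5
   are charged 12 and 6 cycles respectively and end the current run. */
long count_cycles(const enum instr_group *prog, int n)
{
    long cycles = 0;
    int last = 0;                       /* last group in the current cycle, 0 = none */
    for (int i = 0; i < n; i++) {
        enum instr_group g = prog[i];
        if (g == I4)      { cycles += 12; last = 0; }
        else if (g == I5) { cycles += 6;  last = 0; }
        else if (last != 0 && g > last) last = g;    /* overlaps into same cycle */
        else              { cycles += 1;  last = g; } /* opens a new cycle */
    }
    return cycles;
}
```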
To illustrate the instruction set and assembly language used by the simulator, we present below a listing of the vector addition program discussed earlier (The ARTVM Architecture).
(Vector addition listing not reproduced in this text.)
Vision Algorithms
To test the flexibility and speed of the proposed associative architecture ARTVM, a broad range of vision functions were implemented. They included low level algorithms such as histogram generation, convolution, edge detection, thinning, stereo matching and optical flow. Mid-level functions were also implemented, including contour tracing and labeling, Hough transforms, saliency mapping, and such geometric tasks as the convex hull and Voronoi diagram. Our simulator was used to test the associative algorithms and to verify their complexity.
Before describing the associative algorithms it might be helpful to recall briefly the configuration of the machine, the value of some parameters and the data structure employed. The image resolution is taken to be 512 × 512 (N=512), hence associative memory capacity is 256K words (one word per pixel). The data is arranged linearly in memory, in the order of the video scan, row after row of pixels, starting at the top left hand corner of the image. Incoming data is given to 8-bit precision (M = 8), and although the processing is all fixed-point, the algorithms are designed to retain the full inherent accuracy. The long shift instruction was provided for communication between rows, and its extent was denoted by b, where b is a submultiple of the row length, N. In our model of the machine, the extent of long shift is taken to be 32 places (b = 32). Algorithm complexity will be expressed in cycle times; the machine cycle was conservatively estimated in VLSI Chip Implementation and Interconnection at 30 ns, and this value will be used to compute execution time.
Low Level Vision
Histogram
The associative nature of ARTVM and its relatively fast countag instruction facilitate histogram evaluation. The program is given in Listing 1. It consists of a very short loop repeated for each gray level: a compare instruction tags all the pixels of that level, and a countag tallies them up. The count is automatically available at the controller, which accumulates the histogram in an external buffer. Accordingly, the time complexity in machine cycles is given by,
Thist = 0.5 + 13·2^M (1)
where M is the gray level precision and is taken to be 8 bits in our model. Hence the histogram is executed in under 3330 machine cycles, or nearly 100μs.
LISTING 1: Associative Computation of Image Histogram
main()
{
  int gray_level;
  long int histogram_array[256];

  letm dseq(0,7);                  /* sets the first 8 bits in the mask */
  for(gray_level=0; gray_level<256; gray_level++)
  {
    letc dvar(0,7,gray_level); setag; compare;
    histogram_array[gray_level] = countag;
  }
}
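The per-level compare-and-count loop of Listing 1 can be modeled in ordinary sequential code. The sketch below (Python, an illustration only, not the ARTVM instruction set) treats each pixel as one associative word: `compare` tags every word matching the broadcast gray level, and `countag` tallies the tags.

```python
def associative_histogram(pixels, levels=256):
    """Model of Listing 1: one compare + countag pass per gray level."""
    histogram = [0] * levels
    for gray_level in range(levels):              # controller loop, 2^M passes
        tags = [p == gray_level for p in pixels]  # compare: tag matching words
        histogram[gray_level] = sum(tags)         # countag: tally tagged words
    return histogram

image = [0, 3, 3, 255, 7, 3]
hist = associative_histogram(image)
```

Note that each pass costs a fixed number of associative steps regardless of image size; the sequential model hides that word-parallelism.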
Convolution
Low level vision, particularly edge-detection, involves the application of various filters to the image, which are most conveniently executed by convolution. The image may be considered as a simple vector of length N2 or as a concatenation of N row vectors, each of length N. Convolution of an N-element data vector by a P-element filter results in a vector of length N + P - 1, but only the central N - P + 1 elements, representing the area of full overlap between the two vectors, are of interest here. We developed several techniques to implement convolution associatively, depending on filter characteristics such as dimensional separability, and symmetry. We start with the multiply-and-shift approach given by Ruhman & Scherson [6, 37, 38]. The word format in associative memory is depicted in Fig. 16.
The convolution filter vector [f] of length P and precision 8 is applied as operand from the machine controller, one element at a time. The result is accumulated in field [fd] of length 8 + 8 + log2(P). Bit temp is used for temporary storage of the carry that propagates through field [fd]. The mark bit serves to identify the area of complete overlap by the filter vector.
The program for row convolution is given in Listing 2. Field [d] is multiplied by successive elements of vector [f] and the results are accumulated in field [fd]. Between multiplications, data field [d] is shifted down one word position for row convolution, or down N word positions for column convolution. The f-element acts as multiplier; each of its bits is tested by the controller and, if set, causes field [d] to be added into field [fd] starting at bit position add_offset. After each addition the carry is propagated to the highest bit of [fd]. For column convolution, only the last program line must be changed, replacing "shiftag(1)" by "for(i=0;i<N/b;i++) shiftag(b)".
LISTING 2: 1-d Image Convolution
/********************** ASSOCIATIVE CONVOLUTION PROGRAM *******************/
main()
{
  /* ... declarations */
  for(f_index=0; f_index<f_size; f_index++)
  {
    /************************* Summed Multiplication *************************/
    /* add [d] to [fd] if the bit at position add_offset of f[f_index] is ONE */
    for(add_offset=0; add_offset<n; add_offset++)
      if( BIT(add_offset) OF_WORD f[f_index] )        /* test if bit is ONE */
      {
        /************ add ***************/
        for(bit_count=0; bit_count<n; bit_count++)
        {
          fd_bitcnt = add_offset + bit_count;
          letm d(fd_bitcnt) d(temp) d(d_offset+bit_count) d(mark);
          letc d(temp) d(mark); setag; compare;
          letc d(fd_bitcnt) d(mark); write;
          letc d(fd_bitcnt) d(temp) d(mark); setag; compare;
          letc d(temp) d(mark); write;
          letc d(fd_bitcnt) d(d_offset+bit_count) d(mark); setag; compare;
          letc d(temp) d(d_offset+bit_count) d(mark); write;
          letc d(d_offset+bit_count) d(mark); setag; compare;
          letc d(fd_bitcnt) d(d_offset+bit_count) d(mark); write;
        }
        /****** propagate carry *******/
        letc d(mark) d(temp);
        while (fd_bitcnt < 2*n+3)
        {
          letm d(fd_bitcnt) d(mark) d(temp); setag; compare;
          letc d(fd_bitcnt) d(mark); write;
          letc d(fd_bitcnt) d(mark) d(temp); setag; compare;
          letc d(mark) d(temp); write;
          fd_bitcnt++;
        }
      }
    /********* shift [d] field down *********/
    for(bit_count=mark+1; bit_count<=mark+n; bit_count++)
    {
      letmc d(bit_count); setag; compare; letc; write;
      letc d(bit_count); shiftag(1); write;
    }
  }
}
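The word-parallel multiply-and-shift scheme of Listing 2 can be modeled sequentially: every word accumulates the broadcast filter element times its current data field, then the data field slides one word position before the next filter element is applied. A Python sketch (an illustration of the data movement only, not of the bit-serial arithmetic):

```python
def multiply_and_shift_conv(d, f):
    """Model of Listing 2's row convolution: broadcast one filter element
    at a time, multiply-accumulate in every word, then shift [d] down."""
    n, p = len(d), len(f)
    fd = [0] * n                      # accumulator field [fd] of every word
    data = list(d)                    # data field [d]
    for j in range(p):
        for m in range(n):
            fd[m] += f[j] * data[m]   # summed multiplication step
        data = [0] + data[:-1]        # shift [d] down one word position
    return fd[p - 1:]                 # words p-1..n-1 hold full-overlap results
```

Replacing the one-place shift by N word positions gives column convolution, exactly as in the last line of the listing.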
The time complexity for 1-d convolution in machine cycles is given by,
T1d = PM[α(M·ta + Tp1d) + ts]    (2)

where ta, ts are the per-bit addition and shift complexity, and Tp1d is the complexity of carry propagation over field [fd]. Since addition and carry propagation are only executed for a ONE digit in the multiplier (filter element), time complexity is a function of α, the ratio of ONE digits in the filter vector elements, whose range is 0 ≤ α ≤ 1. From the program listing, addition time ta is 8.5 cycles. Carry propagation takes 4 cycles per bit; the average propagation distance, and hence Tp1d, are given by expressions not reproduced in the source. The program time to shift a field, excluding shiftag(), is 3 cycles per bit. Therefore, to move a pixel field over to the next column takes tc_s = 3 cycles per bit, and to the next row tr_s = 3 + N/b cycles per bit.
bit. Extending the above algorithm to 2-d gives,
T2d = αP^2·M(ta + Tp2d) + M(P - 1)(tr_s + P·tc_s)    (3)

where Tp2d, the average carry propagation for 2-d convolution, is given by an expression not reproduced in the source. By substitution of ta, Tp2d and ts into equation 3, we get:

[equation not reproduced in source]
This simple algorithm is quite efficient for 1-d convolution, particularly if the filters used are inherently of low precision (α << 1). But for a wide 2-d filter, say 31 × 31, execution time may still exceed half a video frame (20 ms). Several approaches will be considered for reducing the time complexity.
Some 2-d filters are separable into two 1-d convolutions. Thus the 2-d Gaussian can be effected by successive application of a 1-d filter in the two coordinate directions. Reduction to 1-d convolution leads to a drastic improvement in execution time. Equation 3 can be written as:

[equation not reproduced in source]

and by substitution of ta, Tp1d and ts it reduces to,

[equation not reproduced in source]
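The separability argument can be checked with a small sequential model: applying a 1-d kernel along the rows and then along the columns of the result reproduces 2-d convolution with the kernel's outer product. A Python sketch (illustrative; the binomial kernel below is an arbitrary choice, not taken from the text):

```python
def conv1d(v, f):
    """Full-overlap 1-d convolution (output length len(v) - len(f) + 1)."""
    p = len(f)
    return [sum(f[j] * v[m - j] for j in range(p)) for m in range(p - 1, len(v))]

def separable_conv2d(img, f):
    """Apply f along every row, then along every column of the result."""
    rows = [conv1d(r, f) for r in img]           # row pass
    cols = zip(*rows)                            # transpose
    out = [conv1d(list(c), f) for c in cols]     # column pass
    return [list(r) for r in zip(*out)]          # transpose back
```

On a 3 × 3 block of ones, the kernel [1, 2, 1] applied in both directions yields (Σf)² = 16 at the single fully-overlapped position, matching the 2-d convolution with its outer product.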
The filters we deal with all have symmetry about the origin. The Gaussian has even symmetry, while the derivative of the Gaussian has odd symmetry. This brings up the interesting question of how far one can exploit symmetry to improve execution time. Consider first convolution in one dimension with a filter of even symmetry and length P = 2L + 1 applied at point dm,
(f * d)_m = f_0·d_m + Σ_{i=1..L} f_i (d_{m-i} + d_{m+i})
where the filter elements fi, on either side of the center element f0, are equal. Using the following word format, the algorithm for 1-d convolution with an even function is given in Fig. 17.
For convolution in the x-direction, all shifts are one-place, "shiftag(±1)" ; in the y-direction each one is N/b long shifts, "for(i=0; i< N/b;i++) shiftag(±b);". Applying this algorithm to a separable 2-d filter with even symmetry, such as the Gaussian, the time complexity becomes,
[equation not reproduced in source]
For odd symmetry, f_0 = 0, and f_i is the absolute value of the filter elements on either side of it. Hence the convolution formula reduces to,

(f * d)_m = Σ_{i=1..L} f_i (d_{m-i} - d_{m+i})

and two modifications are required in the algorithm: remove step 1 of the program (evaluation of d · f_0), and change step 1 of the loop from NEXT+d to NEXT-d. Time complexity of the odd symmetry filter, applied in both directions, becomes,

[equation not reproduced in source]
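The pairing trick behind both symmetric cases can be sketched directly: each distinct coefficient f_i multiplies the sum (even case) or difference (odd case) of the two data words at distance i, so only L + 1 multiplications are needed instead of 2L + 1. A Python model (illustrative only):

```python
def symmetric_conv(d, f_half, odd=False):
    """f_half = [f0, f1, ..., fL]; each fi is applied once to the paired
    combination d[m-i] +/- d[m+i] of the two equidistant neighbors."""
    L = len(f_half) - 1
    sign = -1 if odd else 1                     # odd symmetry subtracts
    out = []
    for m in range(L, len(d) - L):
        acc = 0 if odd else f_half[0] * d[m]    # f0 = 0 in the odd case
        for i in range(1, L + 1):
            acc += f_half[i] * (d[m - i] + sign * d[m + i])
        out.append(acc)
    return out
```

With f_half = [2, 1] the even case reproduces ordinary convolution by the filter [1, 2, 1]; with f_half = [0, 1] the odd case computes the central difference d[m-1] - d[m+1].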
The methods for reducing convolution time considered so far all take advantage of particular properties of the filter: symmetry, element bit statistics, and separability of the 2-d filter. We now consider an enhancement that applies to the most general filter, treats the pixel data as multiplier, and speeds up multiplication by applying 4 bits of the multiplier at a time. Referring to the algorithm and word format illustrated in Fig. 18, the 8-bit data field is partitioned into a high (dh) and a low (dL) nibble. A 12-bit field, md, is provided for the partial product of the current filter element by the current data nibble. Filling in all the partial products by table look-up requires 15 compare-write cycles (no action is required for the all-ZERO nibble).
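The nibble look-up idea can be modeled in a few lines: each 4-bit nibble selects its partial product from a 16-entry table (the 15 compare-write cycles fill these entries associatively; the all-ZERO entry needs none), and the high nibble's product is added in shifted left 4 bits. A Python sketch (illustrative, not the associative implementation):

```python
def nibble_multiply(data, filt):
    """Multiply an 8-bit data value by a filter element, 4 bits at a time."""
    table = {n: n * filt for n in range(16)}  # partial products by look-up
    lo, hi = data & 0xF, (data >> 4) & 0xF    # dL and dh nibbles
    return table[lo] + (table[hi] << 4)       # add the two partial products

# the look-up reproduces ordinary multiplication for all 8-bit data values
assert all(nibble_multiply(d, f) == d * f for d in range(256) for f in (0, 7, 255))
```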
Using the enhanced multiplication algorithm, the time complexity of general 2-d image convolution becomes,

[equation not reproduced in source]

and that of 1-d convolution in both directions,

[equation not reproduced in source]

Tm is the time (2 × 15 × 2.5 cycles) to generate the two partial products by table look-up, and Tp1, Tp2 are their carry propagation times following addition into field fd. Substituting for Tm, ta and ts, and evaluating Tp1, Tp2, the time complexities reduce to,

[equation not reproduced in source]

and

[equation not reproduced in source]
The following table compares all the convolution methods discussed, for filters of size 7 × 7, 15 × 15 and 31 × 31. Enhanced multiplication, Tenh, is seen to be the fastest method to compute general convolution, but achieves this at a 12-bit increase in word length. For particular filter characteristics, other methods offer a modest advantage, but those exploiting symmetry require an even larger increment in word length (17 bits).

[table not reproduced in source]
Marr & Hildreth Edge Detection
This algorithm [39] detects the Zero Crossings, ZC, of the Laplacian of the Gaussian filtered image, and can be written as:
ZC(∇2Gσ * I)    (13)

where I is the original image, Gσ is the 2-d Gaussian filter of scale σ, and ∇2 is the Laplacian operator. The M&H algorithm consists of two steps:
1. Approximate ∇2G * I by the Difference Of two Gaussians, DOG(I).

2. Detect the ZC in the DOG(I) output.

The DOG filter has the following form:

DOG(r) = (1/2πσp^2) exp(-r^2/2σp^2) - (1/2πσn^2) exp(-r^2/2σn^2)    (14)

where σp and σn are the space constants of the positive and negative Gaussians respectively, and their ratio σn/σp ≈ 1.6 gives closest agreement with the Laplacian of the Gaussian (∇2G) operator.
Implementation of the DOG filter requires four 1-d convolutions, a row and a column convolution for each space constant. The complexity of associative DOG execution is,
Tdog = 2T1d (Pp) + 2T1d(Pn) + Tdiff (15) where Pp and Pn are the appropriate filter sizes for space constants σp and σn respectively, and Tdiff is the complexity of M-bit subtraction and borrow propagation into the sign bit. The speed of associative subtraction is the same as that of addition, hence,
Tdiff = 8.5M + 2. (16)
Assuming the fastest general convolution method (enhanced multiplication), we substitute its 1-d convolution time and eq. 16 with M = 8 in eq. 15 to obtain,

[equation not reproduced in source]
Pp, Pn and DOG complexity (in cycles and milliseconds) are tabulated below for the three filters corresponding to σ = 0.5, 1 and 2 respectively, where Pp and Pn are the filter vector lengths.
[table not reproduced in source]
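The DOG step itself is easy to model numerically: filter with a narrow and a wide Gaussian (space-constant ratio about 1.6) and subtract. In the sketch below (Python; kernel half-width and normalization are illustrative choices, not taken from the text) a step edge produces the expected sign change in the DOG output, which the next step detects as a zero crossing.

```python
import math

def gauss_kernel(sigma, half_width):
    """Normalized, truncated 1-d Gaussian."""
    k = [math.exp(-x * x / (2.0 * sigma * sigma))
         for x in range(-half_width, half_width + 1)]
    s = sum(k)
    return [v / s for v in k]

def conv_same(v, f):
    """'Same'-size 1-d convolution with zero padding."""
    L = len(f) // 2
    n = len(v)
    return [sum(f[L + j] * v[m - j] for j in range(-L, L + 1) if 0 <= m - j < n)
            for m in range(n)]

def dog(v, sigma_p=1.0, ratio=1.6, half_width=4):
    """Difference of two Gaussians with space constants sigma_p, sigma_n."""
    pos = conv_same(v, gauss_kernel(sigma_p, half_width))
    neg = conv_same(v, gauss_kernel(sigma_p * ratio, half_width))
    return [a - b for a, b in zip(pos, neg)]

step = [0.0] * 8 + [10.0] * 8
out = dog(step)   # negative just left of the edge, positive just right
```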
The second step of the M&H algorithm, zero cross detection, operates on a 3 × 3 neighborhood. The center pixel is considered to be an edge point if one of the four directions (horizontal, vertical and the two diagonals) yields a change in sign. Specifically, we test if one item of a pair (about the center) exceeds a positive threshold T while the other is less than -T. The associative implementation of ZC for each space filter is outlined below.

1. Compare all pixels (resulting from the DOG function) concurrently against the thresholds T and -T, and assign two bits in memory to indicate the results.
2. Shift and write all pairs of indicator bits concurrently from the 8 neighbors of each pixel into a 16-bit field of its word in memory.
3. Use the 16-bit indicator field to test all 4 directions for a ZC and mark the edge-points.
The associative algorithm to detect zero crossings shows a time complexity of 165 cycles or 4.95 microseconds. It should be noted that the M&H algorithm generates edge points without gradient direction. This parameter can be computed by operating on a larger neighborhood (9 × 9) around each edge point. An associative algorithm to detect 16 segment directions (and corners) was developed and is described below. Its time complexity is 1010 cycles or 30.3 microseconds.
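The zero-cross test can be sketched sequentially as follows (Python; the associative version performs the same four-direction test for all pixels at once via the 16 shifted indicator bits):

```python
def zero_crossings(dog_img, T):
    """Mark pixels where some direction pair straddles +T / -T."""
    h, w = len(dog_img), len(dog_img[0])
    dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]   # horizontal, vertical, diagonals
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            for dy, dx in dirs:
                a = dog_img[y - dy][x - dx]    # one neighbor of the pair
                b = dog_img[y + dy][x + dx]    # the opposite neighbor
                if (a > T and b < -T) or (b > T and a < -T):
                    edges[y][x] = True
                    break
    return edges
```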
Canny Edge Detection
Canny's algorithm [40] has three stages:
1. Directional derivatives of Gaussian filter (∇G * I ).
2. Non-maximum suppression.
3. Threshold with hysteresis.
The general form of the Gaussian derivative of scale σ in direction n is:
[equation not reproduced in source]
The x and y derivatives of the Gaussian filter can be obtained by convolving the image with ∂G/∂x and ∂G/∂y respectively. Applying the enhanced multiplication method, the execution time for a typical set of filter sizes becomes:

[table not reproduced in source]
Non-maximum suppression selects as edge candidates pixels for which the gradient magnitude is maximal. For optimum sensitivity, the test is carried out in the direction of the gradient. Since a 3 × 3 neighborhood provides only 8 directions, interpolation is used to double this number to 16. To determine if maximal, the gradient value at each pixel is compared with those on either side of it. Associative implementation requires somewhat fewer operations than that of zero-cross detection discussed earlier.
Thresholding with hysteresis eliminates weak edges that may be due to noise, but continues to trace strong edges as they weaken. On the basis of signal statistics and an estimate of noise in the image, two thresholds on gradient magnitude are computed, low and high. Edge candidates with gradient magnitude under low are eliminated, while those above high are considered edges. Candidates with values between low and high are considered edges only if they can be connected to a pixel above high through a chain of pixels above low; all other candidates in this interval are eliminated. The process involves propagation along curves.
Associative implementation ( Listing 3 ) uses three flags as shown in Fig. 19: E, which initially marks candidates above high threshold (unambiguous edge points), and at the end designates all selected edge points; OE (Old Edges) to keep track of confirmed edges at the last iteration; and L to designate candidates above low. At every iteration, each L candidate is examined to see if at least one of its 8-neighbors is an edge, in which case it is also declared an edge by setting E. Before moving E into OE, the two flags are compared to see if steady state has been reached, in which case the process terminates.
LISTING 3: Curve Propagation
main( )
{
/* ... declarations */
letm d(OE); letc; setag; write; /* clear OE */
letmc d(E); setag; compare; letmc d(OE); write; /* copy E into OE */
while (rsp)    /* repeat until steady state */
{
letmc d(E); setag; compare; /* load Tag with E */
/* OR the three northern unambiguous edges into E */
shiftag(-b); shiftag(-1); write;
shiftag(1); write; shiftag(1); write;
save_new_edges();
/* OR the left and right unambiguous edges into E */
shiftag(1); write; shiftag(-1); shiftag(-1); write; save_new_edges() ;
/* OR the three southern unambiguous edges into E */
shiftag(b); shiftag(1); write;
shiftag(-1); write; shiftag(-1); write;
save_new_edges();
letm d(OE); letc; compare; setc; write;
}
}
/* Find new edges by ANDing L into E */
save_new_edges()
{
setag; compare; letc; write;
letmc d(L); compare; letmc d(E); write;
}
Program time complexity is given by,
Tpro = I(23.5 + N/b)    (19)

where I is the number of iterations, 23.5 is the time to examine the state of the 8 neighbors, and N/b accounts for the long shifts that bring in edge points from neighboring rows. The upper bound of I, given by the longest propagation chain, is nearly N^2/2, but for a representative value of 100 iterations the complexity of curve propagation becomes 3950 cycles or 119 microseconds.
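The propagation loop of Listing 3 can be modeled sequentially: E marks confirmed edges, L marks candidates above the low threshold, and each sweep ORs neighboring E flags into the L candidates until nothing changes. A Python sketch (illustrative; the associative version updates all words of a sweep in a fixed number of word-operations):

```python
def hysteresis(grad, low, high):
    """Threshold with hysteresis by iterative edge propagation."""
    h, w = len(grad), len(grad[0])
    L = [[g >= low for g in row] for row in grad]    # candidates above low
    E = [[g >= high for g in row] for row in grad]   # unambiguous edges
    changed = True
    while changed:                                   # the while(rsp) loop
        changed = False
        for y in range(h):
            for x in range(w):
                if L[y][x] and not E[y][x] and any(
                        E[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                        if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w):
                    E[y][x] = changed = True
    return E
```

A weak chain attached to a strong seed is traced until it drops below low; an isolated weak candidate is discarded.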
Thinning
The propagation algorithm presented above produces a curve that may not be thin. A multipass thinning algorithm is proposed, consisting of a pre-thinning phase and an iterative thinning phase. Referring to Fig. 20A-20E, the pre-thinning phase fills a single gap by applying template (a), and removes some border noise by clearing point P if one of templates (b), (c) or (d) holds. Multi-pass implies that the templates are applied first in the north direction, then in the south, east and west directions, in succession, except for template (a), which is fully symmetrical and need only be applied once. All templates are shown in the north direction and use an X to denote a "don't care" (ONE or ZERO). Similarly, the thinning phase tests templates (e), (f) and (g) in each of the 4 directions successively, and clears point P when agreement is found. This 4-pass sequence is iterated until there is no further change.

Particularly worthy of note is the quality of the skeleton produced by this simple local process. The most precise definition of a skeleton is based on the medial axis. Davies and Plummer [41] proposed a very elaborate algorithm to produce such a skeleton, and chose 8 images for testing it. Our thinning algorithm was applied to these images and produced interesting results: the skeletons agree virtually exactly with those of Davies and Plummer; any discrepancy, not at an end-point, occurs at a point of ambiguity and constitutes an equally valid result. The removal of border noise in the pre-thinning phase prevents the formation of extraneous spurs in the skeleton. Time complexity of the algorithm is given by,
[equation not reproduced in source]
where the first two terms account for the prethinning phase. Execution times are 150 cycles (4.5 μs) for prethinning and 214 cycles (6.4μs) per thinning iteration. For edge thinning 3 iterations will suffice, giving a completion time of 24 microseconds.
Single-pass thinning was also considered but found to be problematic: the algorithm proposed by Chin et al. [42] appears to be optimal, yet it does not yield an ideal skeleton, and applying their own preliminary noise-trimming phase leads to amputation of some main branches during thinning.
Stereo Vision
Stereo vision must solve the correspondence problem: for each feature point in the left image, find its corresponding point in the right image, and compute their disparity. Since stereo has been a major research topic in computer vision over the past decade, a large number of approaches have been proposed, too many to attempt to implement them all associatively. Instead, we concentrate on the Grimson [43] algorithm. It bears some similarity to the hierarchical structure of human vision, and it can take as input the edges produced by the M&H or Canny edge detection schemes discussed above.
Assume edge detection was carried out on both the left and the right image, and the results are sitting side by side in memory. The edge points are marked and their orientation given to a 4-bit precision over 2π radians or a resolution of 22.5 degrees. The stereo process uses the left image as reference, and matches edge points with gradient of equal sign and roughly the same orientation. Edge lines near the horizontal (within ±33.75 degrees) are excluded in order to minimize disparity error. The Grimson algorithm consists of the following steps:
• Locate an edge-point (of acceptable orientation) in the left image.
• Divide the region about the corresponding point in the right image into three pools.
• Assign a match to the edge-point based on the potential matches within the pools.
• Disambiguate any ambiguous matches.
• Assign disparity values.
The associative memory word format for the matching process is given below. Input fields DL, DR label edge points of the left and right images, respectively, while DIR_L, DIR_R give their orientations. The resulting value of disparity will be recorded in output field DISP. The associative algorithm is outlined in Fig. 21.
• Search the neighborhood of ±W pixels divided into three pools, A, B and C, where pools A and C are equal in size and represent the divergent and convergent regions, respectively. The smaller pool B, is the region about zero disparity.
• Shift fields DIR_R and DR (of the right image) W word positions down (corresponding to a shift in the right image of W pixels to the right).
• For edge points (DR,DL = 1), compare field DIR_R against DIR_L within a tolerance of ±1. After each comparison, shift the right image fields up one word position, until position + W is reached. The results of the comparison are registered in fields PX and TX (X= A,B or C). A PX value of 00 indicates no match, 01 indicates one match, and 11 more than one match in pool X. The TX fields temporarily store disparity in each pool for use in case of ambiguity. Note that disparity is the net shift of the right image at the time the match was found.
• Edge points with unambiguous disparity (01 in some PX, 00 in all PY where Y≠ X) are selected and their disparity assigned in field DISP (DISP := TX).
• An attempt is made to disambiguate matches in more than one pool by use of the dominant disparity in the neighborhood. If there is a dominant pool in the neighborhood, and the ambiguous point has a potential match in the same pool, then that is chosen as the match. Otherwise, the match at that point remains ambiguous. To test a neighborhood for dominant disparity we start by counting all its edge points (field DL) into the COUNT field. Then, for each pool in turn, we count into field COUNT_P just its unambiguous matches over the same neighborhood. If COUNT_P > COUNT/2 the pool is dominant and a match in the same pool will have its disparity copied from TX into DISP. If no pool is dominant or there is no match in the dominant pool, DISP will not be updated and the point in question will stay marked as ambiguous in bit MR.
• The last step tests whether disparity is within range. Marr & Poggio show that for a region within range, more than 70 percent of its edge points will be matched. Unmatched edge points are labeled in MR and counted over the neighborhood into field COUNT_P. If COUNT_P > COUNT/4, all edge points in the neighborhood are labeled as unmatched and their disparity cleared.
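The pool bookkeeping of the steps above can be sketched sequentially for a single left-image edge point. The pool boundaries used below (a narrow zero band, with the divergent/convergent split by disparity sign) are an illustrative assumption; the text leaves the exact pool widths to the Grimson model.

```python
def pool_matches(left_edges, right_edges, x, W, zero_band=1):
    """Bin candidate disparities for the left edge point at column x into
    pools A (divergent), B (near zero) and C (convergent)."""
    pools = {'A': [], 'B': [], 'C': []}
    if not left_edges[x]:
        return pools
    for disp in range(-W, W + 1):
        xr = x + disp
        if 0 <= xr < len(right_edges) and right_edges[xr]:
            key = 'B' if abs(disp) <= zero_band else ('A' if disp > 0 else 'C')
            pools[key].append(disp)
    return pools

def unambiguous_disparity(pools):
    """A single match in exactly one pool (01 in one PX, 00 in the rest)."""
    nonempty = [d for d in pools.values() if d]
    if len(nonempty) == 1 and len(nonempty[0]) == 1:
        return nonempty[0][0]
    return None                       # no match, or ambiguous (needs step 4)
```

Ambiguous points (matches in several pools, or several in one pool) fall through to the dominant-disparity disambiguation described above.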
The time complexity of the associative stereo algorithm is given by,
Tstereo = Tsh + Tmat + Tdis + Tor    (20)

where Tsh accounts for shifting the right image, Tmat is the time to evaluate matches within the pools, Tdis the disambiguation time, and Tor the time to find and remove out-of-range disparity. The shift time in cycles is given by,

[equation not reproduced in source]

The first term accounts for the initial and final W-place shift-up of fields DR, DIR_R (5 bits). The second term covers the one-place shift-down between successive matching operations. And the last term is due to the generation and update of a border flag for handling row end effects. Consider now the matching complexity Tmat. It requires an 8-cycle comparison for every disparity (2W + 1) and every orientation (10) with a tolerance of ±1. Hence

Tmat = 8 × 3 × 10(2W + 1) + 13.5    (22)

where the second term accounts for final processing of the comparison results.
The disambiguation process consists of the following steps:
• Count the edge points over every L × L neighborhood.
• For each of three pools:
  - Count the unambiguous matches over the same neighborhoods.
  - Compare this result to half the edge count.
  - Copy disparity from a match in the dominant pool.
Hence we may write,
Tdis = Tcnt + 3 (Tcnt + Tgt + Tcpy + 3.5) (23) where Tcnt is the time to count labeled pixels over a neighborhood, Tgt to compare for greater than, and Tcpy to copy disparity, respectively. Tcnt is the subject of the next section, while the other two are given by,
[equations not reproduced in source]
The disambiguation algorithm is listed below.
LISTING 4: Attempt to Resolve Ambiguous Matches
count_flag_to_field(DL, COUNT); /* count edge points in L X L neighborhood */
for(pool=PA; pool<= PA+4; pool+=2)
{ /* Flag unambiguous points in TEMP */
letm d(TEMP); letc; setag; write; /* clear TEMP */
letm dseq(PA,PA+5); letc d(pool); setag; compare;
letmc d(TEMP); write;
count-flag_to_field(TEMP,COUNT_P);
/*********** Test if COUNT_P > COUNT/2 ****************/
letm d(TEMP); letc; setag; write; /* clear TEMP */
/* Starting with COUNT_P+2k-2 , compare bit i of COUNT_P to bit i+1 of
COUNT. COUNT_P+2k-1 used as GTF */
for(bit_count=2*k-2; bit_count>=0; bit_count--)
{ next_bit = COUNT + bit_count + 1;
next_bit_p = COUNT_P + bit_count;
letm d(MARK) d(TEMP) d(GTF) d(next_bit) d(next_bit_p);
letc d(MARK) d(next_bit); setag; compare;
letc d(MARK) d(TEMP) d(next_bit); write;
letc d(MARK) d(next_bit_p); setag; compare;
letc d(MARK) d(GTF) d(next_bit_p) ; write;
}
/********** Copy disparity of dominant pool to DISP **************/
letmc d(GTF); setag; compare;
letm dseq(DISP,DISP+k-1); letc; write; /* clear DISP field marked by GTF */ letmc d (GTF) d (MARK) d (pool ) ; setag ; compare ;
letc d (GTF) d (pool) ; write ; /* clear MARK of disambiguated points */
for(bit_count=0 ; bit_count<k ; bit_count++)
{ letmc d (GTF) d (2* (pool-PA)+TA+bit_count) ; setag ; compare ;
letmc d (DISP+bit_count) ; write ;
}
}
Lastly, the time to test and register out-of-range disparity is given by,
Tor = 4.5 + Tcnt + Tlt + Trm (26) where the constant term accounts for grouping unmatched edge points and labelling them in MR; Tcnt is the time to count the unmatched edge points over the neighborhood; Tlt covers comparison of this count to 1/4 the number of edge points in the neighborhood; and Trm is the time to label and clear disparity of edge points in out-of-range neighborhoods.
[equations not reproduced in source]
Stereo matching is performed for each of the spatial frequency channels. From the Marr and Poggio [44] model of stereo vision,
W = P and L = 2W + 1

where P is the filter size (vector length) of the channel. If we choose P of the form 2^i - 1, then L has the same form,

L = 2^(i+1) - 1 = 2^k - 1, with k = log2(L + 1).
Substituting equations (21) through (28) into equation (20), using the relations between P, W and L, and applying the above definition of k, we obtain,

[equation not reproduced in source]
The algorithm complexity excluding neighborhood counts, is tabulated below in cycles and milliseconds for three channels of spatial frequency.
[table not reproduced in source]
We now consider the problem of counting labeled pixels over a sizable neighborhood, a function that is executed five times during stereo evaluation, hence could come to dominate its time complexity.
Linear Summation
The straightforward approach to count labeled pixels over a neighborhood about each pixel would be to provide a count field in each word and let the neighboring labels increment it as they are entered in some convenient sequence. For an L × L neighborhood the maximum count is L2, and for L odd, the count field length is 2 log2 L. The program listing follows, where "flag" labels the pixels to be counted, and "field" points to the beginning (LSB) of the count field. The word format is in Fig. 22.
LISTING 5: Linear Summation Over L × L with L Odd

count_flag_to_field(flag,field)
int flag, field;
{ /* ... declarations */
letm d(TEMP) d(CY) dseq(field,field-1+ceiling(Log2(L*L)));
letc; setag; write; /*clear count field, TEMP & Carry*/
/* Shift flag (L-1)/2 lines and columns down and enter in TEMP */
letmc d(flag); setag; compare;
for(index=0; index< (L-1)/2; index++) {shiftag(1); shiftag(b);}
letmc d(TEMP) d(CY); write; /* move flag to TEMP & Carry*/
for(line_count=0; line_count<L; line_count++)
{ for(pixel_count=0; pixel_count<L; pixel_count++)
{ for(bit=field; bit<field+ceiling(Log2(L*L)); bit++)
{ letm d(bit) d(CY);                               /* increment COUNT */
        letc d(CY); setag; compare; letc d(bit); write;
        letc d(bit) d(CY); setag; compare; letc d(CY); write;
      }
      if (pixel_count < L-1)
      { letmc d(TEMP); setag; compare; letc; write;
        if(BIT(0) OF_WORD line_count) shiftag(1); else shiftag(-1);
        letmc d(TEMP) d(CY); write;
      }
    }
    if (line_count<L-1)
    { letmc d(TEMP); setag; compare; letc; write;
      shiftag(-b); letmc d(TEMP) d(CY); write;
    }
  }
}
Execution time in machine cycles is given by,
[equation not reproduced in source]
where k = log2 (L + 1), hence it grows as L2 log L. Incorporation of this program in the stereo algorithm completely swamps its execution time, which, for the coarsest channel, now exceeds video frame time. This is illustrated in the following table, which gives all times in milliseconds. For the sake of comparison, stereo execution time excluding neighborhood counts is repeated here under Tst-cnts .
[table not reproduced in source]
2-d Summation
In the linear summation program, every row of L labels was actually counted L times, once in each of the vertically overlapping neighborhoods. By taking advantage of the two-dimensional structure of the neighborhood, the count may be carried out in two stages: first the neighboring labels within each row are tallied up, and then the vertically neighboring row sums are accumulated as they are entered in some convenient sequence. That requires an additional "rows" field of length log L, and yields the following program, where the word format is as in Fig. 23 and
[definitions of kr and kc not reproduced in source]
LISTING 6: 2-d Summation Over L × L with L Odd

count_flag_to_field(flag,field)
int flag, field;
{ /* ... declarations */
  letm d(cy) dseq(rows,flag-1); letc; setag; write;  /* clear cy, rows & count fields */
  /* Shift flag (L-1)/2 lines & columns down into "field" */
letmc d(flag); setag; compare;
for(index=0; index<(L-1)/2; index++) {shiftag(1); shiftag(b);}
tmp_cnt=field+kc-1; /* pointer */
/* Move flag to "tmp_cnt", "rows" and "field" to avoid the first row summation */
letmc d(tmp_cnt) d(field) d(rows); write;
/******************** Rows Summation *****************/
for(shift_count=0; shift_count<L-1; shift_count++)
{ /* Shift "tmp_cnt" up and enter into "tmp_cnt" and "cy" */
letmc d(tmp_cnt); setag; compare; letc; write;
letmc d(tmp_cnt) d(cy); shiftag(-1); write;
for(bit_count=0; bit_count<kr; bit_count++)
{ /* Increment rows and count fields */
rows_next=rows+bit_count;
field_next=field+bit_count;
letm d(rows_next) d(field_next) d(cy);
letc d(cy); setag; compare; letc d(rows_next) d(field_next); write;
letc d(rows_next) d(field_next) d(cy); setag; compare; letc d(cy); write;
}
}
/******************** Columns Summation *****************/
letm d(tmp_cnt); letc; setag; write; /* clear "tmp_cnt" */
for(shift_count=0; shift_count<L-1; shift_count++)
{
for(bit_count=rows; bit_count<rows+kr; bit_count++)
{ letmc d(bit_count); setag; compare; letc; write;
letc d(bit_count); shiftag(-b); write;
}
/* count field <╌ count field + rows field */
for(bit_count=0; bit_count<kr; bit_count++)
{ rows_next=rows+bit_count;
field_next=field+bit_count;
letm d(field_next) d(cy) d(rows_next);
letc d(cy); setag; compare; letc d(field_next); write;
letc d(field_next) d(cy); setag; compare; letc d(cy); write;
letc d(field_next) d(rows_next); setag; compare; letc d(cy) d(rows_next); write;
letc d(rows_next); setag; compare; letc d(field_next) d(rows_next); write;
}
/****** propagate carry *******/
letc d(cy);
while (++field_next < field+kc )
{ letm d(field_next) d(cy); setag; compare; letc d(field_next); write;
letc d(field_next) d(cy); setag; compare; letc d(cy); write;
}
}
}

Execution time in machine cycles is given by,
[equation not reproduced in source]
For our case, in which the filter vector lengths were chosen to be of the form 2^i - 1, it was shown earlier that L is also of the form 2^k - 1. Hence we may write kr = k = log2(L + 1) and kc = 2k, and Tcnt2 becomes,
[equation not reproduced in source]
Execution time now grows as L log L. This count algorithm brings stereo complexity well within real-time video limits, as indicated by the following table, listing execution times in milliseconds.
[table not reproduced in source]
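The saving of the two-stage scheme over the naive one can be checked with a small sequential model: both compute the same neighborhood counts, but the naive version performs L × L additions per window while the row-then-column version performs about 2L. (Python, illustrative; boundary windows are simply clipped here.)

```python
def count_linear(flags, L):
    """Naive count: sum the whole L x L window at every pixel."""
    h, w, r = len(flags), len(flags[0]), L // 2
    return [[sum(flags[y + dy][x + dx]
                 for dy in range(-r, r + 1) for dx in range(-r, r + 1)
                 if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)] for y in range(h)]

def count_2d(flags, L):
    """Two-stage count: row sums first, then accumulate them down columns."""
    h, w, r = len(flags), len(flags[0]), L // 2
    rows = [[sum(flags[y][x + dx] for dx in range(-r, r + 1) if 0 <= x + dx < w)
             for x in range(w)] for y in range(h)]             # stage 1: rows
    return [[sum(rows[y + dy][x] for dy in range(-r, r + 1) if 0 <= y + dy < h)
             for x in range(w)] for y in range(h)]             # stage 2: columns
```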
2-d Tree Summation
Although 2-d summation reduces stereo time complexity sufficiently to meet real-time requirements, the question arises whether this result is optimal or can be improved significantly. While retaining the two-dimensional summation approach (first by rows and then by columns), consider handling each dimension in binary tree fashion: summing elements in pairs, then pairing results and adding again until the dimension is covered. Since L is of the form 2^k - 1, an extra row and column are added to complete the tree. The extra element, denoted by "tail", is remembered and subtracted from the sum; hence a k-bit "tail" field is provided for temporary storage of the extra row sum. The program listing is given below. The word format is in Fig. 24.

LISTING 7: 2-d Tree Summation Over L × L with L Odd

count_flag_to_field(flag,field)
int flag, field;
{ /* ... declarations */
letm dseq(tail,flag-1); letc; setag; write; /* clear count & tail fields */
/* Shift flag (L+1)/2 rows & columns down into "tail" and "field" */
letmc d(flag); setag; compare;
for(index=0; index< (L+1)/2; index++) {shiftag(1); shiftag(b);}
letmc d(tail) d(field); write; /* enter flag in "tail" and "field" */
/********************** Rows Summation *******************/
tree_step=0;
for(shift_count=1; shift_count<=(L+1)/2; shift_count*=2)
{ cy = field+tree_step+1;
for(bit_count=field; bit_count<cy; bit_count++)
{ letm d(tmp); letc; setag; write; /* clear tmp */
letmc d(bit_count); setag; compare;
for(index=0; index<shift_count; index++) shiftag(-1);
letmc d(tmp); write; /* move flag to tmp */
add_bits (tmp,bit_count, cy);
}
tree_step++;
}
/*** Subtract "tail" from row sum, write results into tail field ***/
letm d(tmp); letc; setag; write; /* clear tmp (use it as borrow) */
letm d(temp) d(tail); letc d(temp); write;
for(bit_count=0; bit_count<k+1; bit_count++)
{ letmc d(tmp) d(field+bit_count); setag; compare; letc; write;
letc d(tmp); setag; compare; letc d(tmp) d(field+bit_count); write; letmc d(field+bit_count); setag; compare; letmc d(tail+bit_count); write;
}
/******************** Columns Summation *****************/
tree_step=0;
for(shift_count=1; shift_count<=(L+1)/2; shift_count*=2)
{ cy = field+k+tree_step;
for(bit_count=field; bit_count<cy; bit_count++)
{ letm d(tmp); letc; setag; write; /* clear tmp */
letmc d(bit_count); setag; compare;
for(index=0; index<shift_count; index++) shiftag(-b);
letmc d(tmp); write; /* move flag to tmp */
add_bits (tmp , bit_count , cy) ;
}
tree_step++;
}
/*** Subtract tail field from summation ***/
letm d(tmp); letc; setag; write; /* clear tmp */
for(bit_count=0; bit_count<k; bit_count++)
{ letm d(tail+bit_count) d(field+bit_count) d(tmp);
letc d(tail+bit_count); setag; compare;
letc d(tail+bit_count) d(field+bit_count) d(tmp); write;
letc d(field+bit_count) d(tmp); setag; compare; letc; write;
letc d(field+bit_count) d(tail+bit_count); setag; compare;
letc d(tail+bit_count); write; letc d(tmp); setag; compare;
letc d(field+bit_count) d(tmp); write; }
/** propagate borrow **/
for(; bit_count<2*k; bit_count++)
{ letmc d(field+bit_count) d(tmp); setag; compare; letc ; write;
letc d(tmp); setag; compare; letc d(field+bit_count) d(tmp); write;
}
}
/******* ADD BITS FUNCTION *******/
add_bits (tmp,bit_count, cy)
int tmp, bit_count, cy;
{ letm d(bit_count) d(tmp) d(cy);
letc d(cy); setag; compare; letc d(bit_count); write;
letc d(cy) d(bit_count); setag; compare; letc d(cy); write;
letc d(tmp) d(bit_count); setag; compare; letc d(cy) d(tmp); write;
letc d(tmp); setag; compare; letc d(tmp) d(bit_count); write;
}
Execution time is given by,
[Equation image: tree-summation count time in machine cycles.]
and since
[Equation images: substituting L = 2^k - 1 yields equation 32.]
The results are tabulated below for the three channels defined earlier.
[Table image: tree-summation execution times for the three channels.]
Count complexity is still seen to grow as L log L in equation 32, but the table indicates an improvement of 40 percent over 2-d summation (for the largest neighborhood), and a resulting improvement in stereo complexity of 27.5 percent. The improvement stems mostly from the fact that at each level of the tree, arithmetic is carried out just to the precision required, which is known in advance.
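For readers more at home in conventional sequential code, the row-then-column tree summation of Listing 7 may be sketched in NumPy as follows. This is an illustration only: the helper names are hypothetical, and `np.roll` handles boundaries circularly, unlike the non-circular shifts of the associative tag register.

```python
import numpy as np

def tree_sum(a, w, axis):
    # Sum w = 2**k consecutive elements (circularly) along `axis`
    # in log2(w) doubling steps, mirroring the binary-tree passes.
    s = a.copy()
    shift = 1
    while shift < w:
        s = s + np.roll(s, -shift, axis=axis)
        shift *= 2
    return s

def neighborhood_count(flags, L):
    # Count set flags over an L x L window (L odd, L = 2**k - 1):
    # tree-sum L+1 elements per row, subtract the extra "tail" element,
    # then repeat the same pass down the columns.
    w = L + 1
    rows = tree_sum(flags, w, axis=1) - np.roll(flags, -L, axis=1)
    return tree_sum(rows, w, axis=0) - np.roll(rows, -L, axis=0)
```

Each `tree_sum` pass costs log2(L + 1) shift-and-add steps per bit, which is the source of the L log L growth noted above.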
Discussion of Stereo Results
Real time stereo vision by associative processing was considered and the problematic function was identified as the counting of labeled pixels over a large neighborhood. This function occurs four times in the resolution of ambiguous matches, and once in the handling of out-of-range disparity. Implementation of the count function in a straightforward (linear) manner pushes stereo execution time beyond real-time (video) limits. It was shown how associative array and tree techniques can be applied to great advantage; indeed, they bring stereo complexity well within real-time limits. To illustrate the results graphically, we present two comparative plots. Fig. 25 gives execution time as a function of neighborhood dimension for the three implementations described: the linear, the two-dimensional, and the two-dimensional tree. Fig. 26 presents stereo complexity without neighborhood counts, and with neighborhood counts by each of the three methods, all as a function of neighborhood dimension.
Optical Flow
Optical flow assigns to every point in the image a velocity vector which describes its motion across the visual field. The potential applications of optical flow include target tracking, target identification, moving-image compression, autonomous robots and related areas. The theory of computing optical flow is based on two constraints: the brightness of a particular point in the image remains constant, and the flow of brightness patterns varies smoothly almost everywhere. Horn & Schunck [45] derived an iterative process to solve the constrained minimization problem. The flow velocity has two components (u, v). At each iteration, a new set of velocities (u^{n+1}, v^{n+1}) can be estimated from the local averages of the previous velocity estimates (ū^n, v̄^n) by,
u^{n+1} = ū^n - Ex(Ex ū^n + Ey v̄^n + Et)/(α² + Ex² + Ey²)
v^{n+1} = v̄^n - Ey(Ex ū^n + Ey v̄^n + Et)/(α² + Ex² + Ey²)    (33)
where α is a weighting factor dependent on noise in the measurement. The partial derivatives Ex, Ey and Et in (33) are estimated as averages of four first differences taken over adjacent measurements in a cube,
Ex ≈ (1/4){Ei,j+1,k - Ei,j,k + Ei+1,j+1,k - Ei+1,j,k + Ei,j+1,k+1 - Ei,j,k+1 + Ei+1,j+1,k+1 - Ei+1,j,k+1}
Ey ≈ (1/4){Ei+1,j,k - Ei,j,k + Ei+1,j+1,k - Ei,j+1,k + Ei+1,j,k+1 - Ei,j,k+1 + Ei+1,j+1,k+1 - Ei,j+1,k+1}
Et ≈ (1/4){Ei,j,k+1 - Ei,j,k + Ei+1,j,k+1 - Ei+1,j,k + Ei,j+1,k+1 - Ei,j+1,k + Ei+1,j+1,k+1 - Ei+1,j+1,k}
Ei,j,k is the pixel value at the intersection of row i and column j in frame k. Indices i, j increase from top to bottom and left to right, respectively. Local averages ū and v̄ in (33) are defined as follows:
ūi,j = (1/6){ui-1,j + ui,j+1 + ui+1,j + ui,j-1} + (1/12){ui-1,j-1 + ui-1,j+1 + ui+1,j+1 + ui+1,j-1}    (44)
and similarly for v̄.
There is the practical matter of how to interlace the iterations with the time steps. A good initial guess for the optical flow velocities is usually available from the previous time step (video frame time) . In the case of high speed motion, one iteration per time step may not be enough to get a stabilized value of optical flow, and there is a necessity to iterate several times before advancing to the next frame.
To implement the above equations associatively, the memory word was partitioned into multiple fields, each holding input data, output data, or intermediate results. The format is given in Fig. 27.
The flow is computed from two successive video frames: En and En1. Each frame contains 512 × 512 pixels whose grey level is given to 8-bit precision. During vertical blanking (the time interval between frames), the current image is moved to En1, and a new image from the I/O buffer array (Fig. 15) is written into field En. During the frame time, one or more iterations of the algorithm are executed, enough to obtain a reasonable approximation of the optical flow for use with the next frame.
Equation 33 can be rewritten as,
u^{n+1} = ū^n - Dx(Ex ū^n + Ey v̄^n + Et)
v^{n+1} = v̄^n - Dy(Ex ū^n + Ey v̄^n + Et)
where Dx = Ex/P, Dy = Ey/P and P = α² + Ex² + Ey². It will be noted that the partial derivatives Ex, Ey and Et, as well as Dx and Dy, remain fixed during a given frame and are not involved in the iterative process between frames. Hence computation of these parameters is referred to as the "fixed part" of the algorithm.
Fixed Part
The first stage computes the partial derivatives Ex, Ey and Et.
• Clear fields Ex, Ey and Et.
• Shift En, En1 one row up and one column to the left (N + 1 words up) to furnish Ei+1,j+1,n and Ei+1,j+1,n+1.
• Enter En+1 into Ex, Ey, Et; add or subtract En into the derivative fields as indicated in the following table, step 2.
• Shift En, En1 one column to the right (one word down), to furnish Ei+1,j,n and Ei+1,j,n+1.
• Accumulate En, En1 as shown in the following table, steps 3,4:
[Table image: accumulation signs for Ex, Ey and Et at steps 1-8.]
• Shift En, En1 one row (N words) down, and accumulate as shown in the above table, steps 5,6.
• Shift En, En1 one column to the left (one word up) and accumulate as shown in the above table, steps 7,8.
• Shift En, En1 to their original position (one word down).
The next stage is to evaluate Dx and Dy:
• Use a look-up table to compute α² + Ex², enter it in field Sc.
• Use another look-up table to compute Ey², enter it in field Ac.
• Add field Ac into field Sc to obtain P.
• Divide field Ex by field Sc, placing the quotient in field U.
• Divide field Ey by field Sc, placing the quotient in field V.
• Copy Dx from field U to the right side (9 least significant bits) of field Sc.
• Copy Dy from field V to the left side of field Sc.
Before executing each division, the dividend Ex or Ey is zero-extended to the left for proper scaling with respect to divisor P. It then follows from α > 0 that Dx, Dy < 1; hence the quotient will not overflow. The complexity of the "fixed part" was evaluated as,
[Equation image: machine-cycle complexity of the fixed part.]
Iterative Part
• Compute local average of u component (field U) as prescribed in equation 44, and place the result (ū) in field Uav. The neighboring values of u are accessed by shifting the U- field up and down in an optimal sequence and summing into field Ac. The 4-neighbors are given double weight by adding them in one place to the left. The value accumulated in Ac is divided by 3 and the result placed in field Uav.
• Similarly, compute the local average of the v component (field V) as prescribed in equation 44 and place the result in field Vav.
[Equation image: local average, as in equation 44.]
• Copy Et into field Ac with sign extension.
• Multiply Ex by Uav, placing the product in field U.
• Add field U into field Ac.
• Multiply Ey by Vav, placing the product in field U.
• Add field U into field Ac to obtain the term Ex ū + Ey v̄ + Et.
• Multiply field Ac by the right side of field Sc (Dx) to obtain Dx(Ex ū + Ey v̄ + Et). Place the result in field U.
• Multiply field Ac by the left side of field Sc (Dy) to obtain Dy(Ex ū + Ey v̄ + Et). Place the result in field V.
• Compute the u component of optical flow by evaluating Uav - U into field U.
• Compute the v component of optical flow by evaluating Vav - V into field V.
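The iterative step above amounts to the following NumPy sketch (hypothetical function names; boundary handling is circular here for brevity, unlike the machine's shifts):

```python
import numpy as np

def local_average(f):
    # Equation 44: 4-neighbours at weight 1/6, diagonals at weight 1/12.
    n4 = (np.roll(f, 1, 0) + np.roll(f, -1, 0)
          + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 6.0
    nd = (np.roll(np.roll(f, 1, 0), 1, 1) + np.roll(np.roll(f, 1, 0), -1, 1)
          + np.roll(np.roll(f, -1, 0), 1, 1) + np.roll(np.roll(f, -1, 0), -1, 1)) / 12.0
    return n4 + nd

def flow_iteration(u, v, Ex, Ey, Et, alpha):
    # One Horn & Schunck update using the precomputed "fixed part":
    # P = alpha^2 + Ex^2 + Ey^2, Dx = Ex/P, Dy = Ey/P.
    P = alpha**2 + Ex**2 + Ey**2
    Dx, Dy = Ex / P, Ey / P
    u_av, v_av = local_average(u), local_average(v)
    t = Ex * u_av + Ey * v_av + Et
    return u_av - Dx * t, v_av - Dy * t
```

On the associative machine, each arithmetic line of this sketch corresponds to a bit-serial field operation executed over all 256K words in parallel.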
The complexity of this part in machine cycles per iteration is given by,
[Equation image: machine cycles per iteration of the iterative part.]
Hence the overall complexity of optical flow can now be evaluated as,
[Equation image: overall optical-flow complexity as a function of I.]
where I denotes the number of iterations required for the flow to converge. Evaluating the above expression, the fixed part takes 266 μs., and each iteration requires an additional 196 μs. The execution time for different values of I is given in the following table:
[Table image: optical-flow execution time for different values of I.]
Mid-Level Vision
Detection of Corners and Line Direction
An important feature for middle and higher level processing is the ability to distinguish corners and line direction. In the case of Canny Edge Detection, line orientation was generated during the process. On the other hand, the M&H algorithm is not directional, and the edge bit-map it produces must be further processed to detect line orientation. We proposed to use the edge bit-map of a 9 × 9 neighborhood around each pixel to distinguish segment direction. The resulting algorithm can discriminate 120 different lines and corners. The approach is outlined below.
1. Since a 9 × 9 neighborhood involves too many patterns to handle directly, it was partitioned into 24 sectors as shown in Fig. 28. To maintain angular accuracy, sector size increases with distance from the center. The sector evaluation function is here defined as the logical OR of its edge point indicators, and a 24-bit field was assigned to the sector values. Evaluation is carried out by shifting in neighboring edge point indicators and OR'ing them directly into the corresponding sector values.
2. Sector partitioning was based on π/8 angular resolution, and defines 16 equally spaced segments or rays around the circle. Each segment (direction) is characterized by a required code in a prescribed subset of the sector values, and a maximum Hamming distance of 1 is permitted. The sector value field is now compared against each of the 16 codes and the results registered in a 16-bit field to mark the presence of each segment direction.
3. Simplification of segment characterization and testing introduces a measure of ambiguity in each triplet about one of the 8 main compass directions. Additional sector values are now used to resolve this uncertainty and the few cases of inherent ambiguity are settled arbitrarily.
4. The 16-bit segment field can now be tested for any pair of segments representing a given line or corner. The sample program selects all pairs without distinction.
From simulation, the complexity of this algorithm was found to be 1010 cycles or 30.3 μs. It should be noted that the concept of this algorithm can be extended to a broad range of functions depending on the choice of sector boundaries, of the evaluation function, and of pattern characterization.
Contour Tracing and Labeling
A preparation step labels each contour point with its x-y coordinates. The main process is iterative and operates on a 3 × 3 neighborhood of all contour points in parallel. Every contour point looks at each one of its 8 neighbors in turn and adopts the neighbor's label if smaller than its own. The circular sequence in which neighbors are handled appreciably enhances label propagation. Iteration stops when all labels remain unchanged, leaving each contour identified by its lowest coordinates. The point of lowest coordinates in each contour is the only one to retain its original label. These points are tracked and are now counted to obtain the number of contours in the image. Listing 8 presents the program in associative memory. The input fields are [xy_coord] to specify pixel position and [edge] to identify contour points. The output fields are contour [label] and contour starting point [mr]. The word format is in Fig. 29.
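For reference, the label-propagation loop may be sketched sequentially in NumPy (a hypothetical helper; `np.roll` wraps at the image borders, which the machine's shifts do not):

```python
import numpy as np

def label_contours(edge):
    # Each contour point starts with its own coordinate label (row*M + col)
    # and repeatedly adopts the smallest label among its 8 edge neighbours.
    N, M = edge.shape
    blank = N * M                          # larger than any real label
    label = np.where(edge, np.arange(N * M).reshape(N, M), blank)
    changed = True
    while changed:
        changed = False
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == 0 and dj == 0:
                    continue
                nb = np.roll(np.roll(label, di, 0), dj, 1)
                better = edge & (nb < label)
                if better.any():
                    label[better] = nb[better]
                    changed = True
    label[~edge] = -1                      # non-contour points
    return label
```

The number of contours is then the number of distinct labels surviving among the edge points, mirroring the final COUNTAG step of Listing 8.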
LISTING 8: Contour Tracing and Labeling
main()
{
... declarations
/** Clear working fields **/
letm dseq(label,mark); letc; setag; write;
/*** Mark and label all edge points ***/
letmc d(edge); setag; compare;
letmc d(mark); write;
for(bit_count=0; bit_count<label_size; bit_count++)
{
letmc d(xy_coord+bit_count) d(edge); setag; compare;
letmc d(label+bit_count); write;
}
while ( new_condition > growth_threshold)
{
letm d(sf); letc; setag; write; /* clear switch flag */
/****** CONNECTIVITY TESTING ******/
for(window_index=0; window_index<8 ; window_index++)
{
/* Shift "edge" and "label" into "temp" and "operand" */
letm dseq(temp,operand+label_size-1); letc; setag; write;
for(bit_count=0; bit_count<label_size+1; bit_count++) {
letmc d(edge+bit_count); setag; compare; letmc d(temp+bit_count);
general_shift (window_index); write;
}
/** Test if "operand" < "label" **/
letm d(gt) d(lt); letc; setag; write; /* clear greater & less than flags */
for (bit_count=label_size-1; bit_count>=0; bit_count--) {
letm d(edge) d(temp) d(gt) d(lt) d(operand+bit_count) d(label+bit_count);
letc d(edge) d(temp) d(operand+bit_count); setag; compare;
letc d(edge) d(temp) d(gt) d(operand+bit_count); write;
letc d(edge) d(temp) d(label+bit_count); setag; compare;
letc d(edge) d(temp) d(lt) d(label+bit_count); write; }
letmc d(lt); setag; compare;
/* clear "label" and "mark", set switch flag */
letm dseq(label,label+label_size-1) d(mark) d(sf);
letc d(sf); write;
/* copy "operand" into "label" */
for(bit_count=0; bit_count<label_size; bit_count++) {
letmc d(operand+bit_count) d(lt); setag; compare;
letmc d(label+bit_count); write; }
}
}
/** Test for termination **/
letmc d(sf); setag; compare;
new_condition = countag;
}
/** Find number of contours **/
letmc d(mark); setag; compare; countag;
}
general_shift (index)
int index;
{
if (index<=2) shiftag(b);
if (index>=4 && index<=6) shiftag(-b);
if (index==0 || index==6 || index==7) shiftag(-1);
if (index>=2 && index<=4) shiftag(1);
}
The time complexity of the algorithm (in machine cycles) is given by,
[Equation image: contour-labeling time complexity in machine cycles.]
The upper bound of I is nearly N2/2, but for a representative value of 100 iterations, execution time becomes 218 kilocycles or 6.6 milliseconds. A good approximation to the time complexity is,
[Equation image: approximate time complexity.]
A list of contours, giving label and length (in pixels) , may be generated in relatively short order (24 cycles per contour).
Associative Saliency Network
Salient structures in an image can be perceived at a glance without the need for an organized search or prior knowledge about their shape. Such a structure may stand out even when embedded in a cluttered background or when its elements are fragmented. Sha'ashua & Ullman [46] have proposed a global saliency measure for curves based on their length, continuity and smoothness. Consider the image as a network of N × N grid points, with d orientation elements (segments or gaps) coming into each point from its neighbors, and as many going out to its neighbors. A curve of length L in this image is a connected sequence of orientation elements p_i, p_{i+1}, ..., p_{i+L}, each element representing a line segment or a gap in the image, and the saliency measure of the curve is defined as,
[Equation image: curve saliency as a sum of local saliencies σ weighted by the attenuation ρ_{i,j} and curvature factor c_{i,j}.]
where the local saliency σ is assigned the value unity for an active element (real segment), and zero for a virtual element (gap) . The attenuation function ρi,j provides a penalty for gaps,
[Equation image: gap-penalty attenuation function ρ_{i,j}.]
where the attenuation factor ρ approaches unity for an active element and is appreciably less than unity (here taken to be 0.7) for a virtual element. The first factor, ci,j, is a discrete approximation to a bounded measure of the inverse of total curvature,
[Equation image: discrete inverse-curvature measure c_{i,j}.]
where αk denotes the difference in orientation from the k-th element to its successor, and ΔS, the length of an orientation element.
It will be noted that this measure is global in nature, being evaluated over a curve of L orientation elements, any number of which may be gaps. Hence, to find the maximum value at a given segment, one must evaluate all dL possible curves starting with this segment, where d is the (discrete) number of directions considered at each point. This exponential complexity cannot be reduced by pyramid techniques, since parts of a salient curve are not necessarily salient. Nevertheless, Sha'ashua and Ullman reduced complexity to order dL by maximizing repeatedly over shorter curves. Let Ei be a state variable associated with element ρi, then the iterative process is defined by,
E_i^{(n+1)} = σ_i + ρ_i max_j { E_j^{(n)} f_{i,j} }
where Ej is the state variable of pj, one of the d possible neighbors of pi; the superscript of E is the iteration number; and fi, j is the inverse curvature factor from pi to ρj. After L iterations the state variable becomes equivalent to the saliency measure defined earlier,
[Equation image: after L iterations, E_i^{(L)} equals the saliency measure defined earlier.]
The proof is sketched in [46] and detailed in [47]. The final state variables of all the orientation elements (segments or gaps) in the N × N grid constitute a saliency map of the image.
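The iterative update (matching the listing's comment "SIGi + ROi*MAX(Ej*Fij) --> Ei") may be sketched sequentially as follows; the function name and the dense coupling matrix f are illustrative assumptions, since the machine stores only the three significant successors per element.

```python
import numpy as np

def saliency(sigma, rho, f, iters):
    # Iterative saliency: E_i <- sigma_i + rho_i * max_j(E_j * f_ij).
    # sigma: 1 for active elements, 0 for gaps; rho: attenuation factor;
    # f[i, j]: inverse-curvature coupling from element i to successor j.
    E = sigma.astype(float).copy()
    for _ in range(iters):
        E = sigma + rho * (f * E[np.newaxis, :]).max(axis=1)
    return E
```

After L iterations the state variable of each element reflects the most salient curve of length L starting there.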
In our associative architecture, the pixels constitute the grid points, and d = 8 orientation elements connect each pixel to its neighbors (Fig. 30). In Fig. 30, the following notation is employed:
• solid line: Ei element for which saliency is computed
• broken line: Successors used in computation
• dotted line: Successors ignored in computation
For d discrete directions, angle α is given in increments of 360/d = 45 degrees, and f_{i,j} takes on the following values:
[Table image: values of f_{i,j} for each angle increment α.]
It can be seen that only three values of α are significant, -45, 0, and 45; hence only successors with these values of α will be considered in the computation. Since the initial image is given as edge points, a preprocessing step is required to identify active elements as any pair of edge points that are 8-neighbors. The program outline is given in Listing 9. The memory word format is in Fig. 31.
LISTING 9: Associative Saliency Network
main()
{
. . . declarations
for(i=0 ; i<8 ; i++) { /* updates Ei with first iteration */
letmc d(Sig+i); setag; compare; letmc d(E[i]+3); write; }
for (iteration=1 ; iteration<MaxIteration; iteration++) {
for(i=0; i<8; i++)
{
letm dseq(T1,T2+8); letc; setag; write; /* clear "T1" and "T2" */
shift_and_do(i,0,T2); /* perform Ej*Fij --> T2 for j=0 */
for(j=1; j<3; j++)
{
shift_and_do(i,j,T2); /* Ej*Fij --> T2 */
max_field(T2,T1); /* maximum(T2,T1) --> T1 */
}
Sum_Acc(i); /* SIGi + ROi*MAX(Ej*Fij) --> Ei */
}
}
}
It may be worth noting that as active ρ approaches unity, it becomes possible to distinguish between different degrees of high saliency, but a larger number of iterations is required. The algorithm requires a word length of 90 bits and its time complexity in cycles is given by,
[Equation image: saliency time complexity in cycles as a function of I.]
where I denotes the number of iterations. Evaluating the expression in parentheses, execution time becomes 0.4 ms. per iteration. An execution time of 500 ms. per iteration was reported for the Connection Machine [48].
Hough Transform
The Hough transform can detect a curve whose shape is described by a parametric equation, such as the straight line or conic section, even if there are gaps in the curve. Each point of the figure in image space is transformed to a locus in parameter space. After splitting the parameters into suitable ranges, a histogram is generated giving the distribution of locus points in parameter space. Occurrence of the object curve is marked by a distinct peak in the histogram (intersection of many loci).
In the case of a straight line (Fig.32), we use the normal parameterization suggested by Duda & Hart [49]:
x cos θ + y sin θ = ρ
which specifies the line by ρ and θ; the histogram includes a straight line in every direction of θ. But if the candidate points are the result of edge detection by a method that yields direction, then θ is known. Following O'Gorman & Clowes [50], this information was applied to effect a major reduction of both hardware (word-length) and time complexity. For a 511 × 511 image with the origin at its center, the x-y coordinates are given by 9 bits in absolute value and sign. Angle θ from 0 to π is given to a matching precision of 10 bits (excluding sign of gradient). The sine and cosine are evaluated by table look-up, taking advantage of the symmetry of these functions to reduce the table size four-fold. After computing ρ, the histogram is evaluated and read out element by element using the COUNTAG primitive. This algorithm requires a 52-bit word length and has a time complexity of,
Tl = 1870 + 13t(r - 1)
machine cycles, where t, r are the resolutions of ρ, θ respectively in the histogram. The second term accounts for histogram evaluation and dominates Tl at t, r ≥ 32. At a resolution of 16 in both θ and ρ (t, r = 16), the execution time is only 150 μs per frame, and grows to just 6.4 ms at a resolution of 128.
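Since θ is supplied by the edge detector, each point votes into a single (ρ, θ) cell. A sequential sketch of this voting scheme follows (hypothetical function name and uniform binning assumed):

```python
import numpy as np

def hough_lines(xs, ys, thetas, t, r, rho_max):
    # Each edge point contributes one vote: theta is known from the
    # gradient, and rho = x cos(theta) + y sin(theta) follows directly.
    rho = xs * np.cos(thetas) + ys * np.sin(thetas)
    r_idx = np.clip(((rho + rho_max) * t / (2 * rho_max)).astype(int), 0, t - 1)
    t_idx = np.clip((thetas * r / np.pi).astype(int), 0, r - 1)
    hist = np.zeros((t, r), dtype=int)
    np.add.at(hist, (r_idx, t_idx), 1)   # unbuffered accumulation
    return hist
```

A collinear set of edge points with consistent gradient direction piles all of its votes into one histogram cell, which is the peak sought.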
Consider now detection of a circle with given radius R. Its equation may be written as:
(x - x0)² + (y - y0)² = R²
where x0, y0 are the coordinates of the center. As in the case of the straight line, we wish to use the direction of the gradient to simplify the process. Differentiating the circle equation we obtain,
tan θ = (y - y0)/(x - x0)
where θ is the gradient direction. Solving for x0, y0,
x0 = x ± R cos θ;  y0 = y ± R sin θ
These equations are implemented and a histogram generated for x0, y0. The algorithm uses gradient polarity to distinguish between bright circles on a dark background and dark circles on a light background, generating a separate histogram for each case. Assuming R less than 32 pixels, the word length required is 62 bits and the time complexity in cycles is given by,
Tc = 1550 + 26 rx ry,    (46)
where rx, ry are the range resolutions of x0, y0 in the histogram. For a resolution of 128 in both x0 and y0, the execution time is 10.8 ms per frame. Mixed circles that are partly black on white, partly white on black, may be detected by summing the two histograms (in the host) before thresholding. If the search is restricted to bright circles on a dark background (or vice versa), the complexity reduces to
Tc = 1280 + 13 rx ry,    (47)
and the execution time drops to 6.4 ms.
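The center computation may be sketched as follows (hypothetical function name; one candidate center per gradient polarity, corresponding to the two histograms):

```python
import numpy as np

def circle_centers(xs, ys, thetas, R):
    # With gradient direction theta known at each edge point, the center
    # lies at distance R along or against the gradient:
    #   x0 = x -/+ R cos(theta),  y0 = y -/+ R sin(theta),
    # one sign per polarity (bright-on-dark vs. dark-on-bright).
    bright = (xs - R * np.cos(thetas), ys - R * np.sin(thetas))
    dark = (xs + R * np.cos(thetas), ys + R * np.sin(thetas))
    return bright, dark
```

Each candidate pair is then binned into the (x0, y0) histogram exactly as in the straight-line case.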
Geometric Problems
Convex Hull
It is sometimes interesting or useful to find the boundary of a set of points in the image. When looking at such a point set, one has little trouble distinguishing the inside points from those on the boundary. These natural boundary points are the vertices of the convex hull, which is defined mathematically as the smallest convex polygon containing the point set. Equivalently, the convex hull is the (unique) convex polygon containing the point set, whose vertices belong to the point set. Hence it is also the shortest path surrounding the point set.
The approach chosen for associative implementation is known as the package-wrapping method [51]. Starting with a point guaranteed to be on the convex hull, say the lowest one in the set (smallest y coordinate), take a horizontal ray in the positive direction and swing it upward (counter-clockwise) until it hits another point in the set; this point must also be on the hull. Then anchor the ray at this point and continue swinging to the next points, until the starting point is reached, when the package is fully wrapped.
For convenience we choose the coordinate system so that the entire image (and point set) is in the first quadrant. The lowest point Pj is located by searching for the minimum y-coordinate in the set; it is on the hull and is labeled as such. We take as reference the extension of segment PiPj, such that xi = 0 and yi = yj, and consider the angle θ it forms with every PjPk, where Pk is any one of the other points in the set. The next point on the convex hull is the one for which θ is a minimum (Fig. 33). Denoting vector PiPj by V1 and vector PjPk by V2, their scalar product becomes,
V1 · V2 = |V1||V2| cos θ.    (48)
Hence,
cos θ = (a1a2 + b1b2) / (√(a1² + b1²) √(a2² + b2²))    (49)
where
a1 = xj - xi;  a2 = xk - xj;
b1 = yj - yi;  b2 = yk - yj.
To avoid taking square roots, we use cos2 θ,
cos² θ = (a1a2 + b1b2)² / ((a1² + b1²)(a2² + b2²))    (50)
Since θ ranges from 0 to π, before squaring the numerator a1a2 + b1b2 we test for any positive values. If there are, we mark them and look for the maximum cos² θ value among them. Otherwise, if all numerators of cos θ are negative, we look for the minimum value of cos² θ. The Pk corresponding to the selected θ is on the hull and is labeled as such. To continue the process, Pj becomes the new Pi, and the chosen Pk becomes the new Pj (Fig. 33). The process terminates when it returns to the initial (lowest) point.
Two special cases may arise. In the first step, when looking for the lowest point, we may find two or more points having the same minimum y-coordinate. We choose among them the point with highest x-coordinate as Pj, the one with lowest x-coordinate as Pi, and PiPj becomes the first reference segment. During the iterative part, when looking for the point that makes θ a minimum, we may find two or more points, Pk1, Pk2 . . . Pks, yielding the same minimum value. Clearly, lines PjPk1, PjPk2, . . . , PjPks are collinear, and the point to be chosen is the most distant from Pj, the one having maximum |xk - xj |. Should all xk - xj equal zero, the choice is made on the basis of maximum |yk - yj|.
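The package-wrapping procedure may be sketched sequentially as follows. For clarity this sketch measures the swing angle with atan2 rather than the square-root-free cos² comparison used in the text, and it assumes at least three distinct points; collinear ties are resolved by taking the most distant candidate, as described above.

```python
import math

def convex_hull(points):
    # Package-wrapping (gift-wrapping) over a list of distinct (x, y) tuples.
    start = min(points, key=lambda p: (p[1], -p[0]))  # lowest, then rightmost
    hull, cur = [start], start
    prev_dir = 0.0  # initial reference ray: horizontal, pointing +x
    while True:
        best, best_turn, best_d2 = None, None, -1.0
        for p in points:
            if p == cur:
                continue
            ang = math.atan2(p[1] - cur[1], p[0] - cur[0])
            turn = (ang - prev_dir) % (2 * math.pi)   # swing angle theta
            d2 = (p[0] - cur[0]) ** 2 + (p[1] - cur[1]) ** 2
            # smallest swing wins; on a collinear tie, take the farthest point
            if (best is None or turn < best_turn - 1e-12
                    or (abs(turn - best_turn) <= 1e-12 and d2 > best_d2)):
                best, best_turn, best_d2 = p, turn, d2
        prev_dir = math.atan2(best[1] - cur[1], best[0] - cur[0])
        cur = best
        if cur == start:
            break
        hull.append(cur)
    return hull
```

On the associative machine the inner minimization over all Pk is a single parallel min-search, which is why the running time depends only on the number of hull vertices.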
Analysis of the algorithm implementation on ARTVM gives execution time in machine cycles as,
TCHull = 60 + 105V    (51)
where V is the number of vertices of the convex hull. Hence time complexity is independent of the number of points in the set. For 1000 vertices in the hull, execution time becomes 3.15 ms.
Voronoi Diagram
This is a classical mathematical object that has become an important tool in computational geometry for dealing with proximity problems. Starting with a given set of L points in the plane, Pi, i = 1, 2, . . . L, the Voronoi diagram surrounds each point Pi by a region, Ri, such that every point in Ri is closer to Pi than to any other point in the set, Pj, j = 1, 2, . . . L and i≠ j. The boundaries of all these regions, Ri, constitute the Voronoi diagram.
An associative algorithm based on the brush fire technique is presented in Listing 10. Each of the given points acts as a source of fire that spreads uniformly in all directions. The boundaries consist of those points at which fires from two (or three) sources meet. Every point in the given set is initially marked with a different color - actually its xy-coordinates. Each point in the image looks at its 8-neighbors. A blank (uncolored) point that sees a colored neighbor will copy its color. If both are colored, the point will compare colors, marking itself as a Voronoi (boundary) point if the colors are different. This process is iterated until all points are colored.
One more cycle of color comparison with the neighbors is necessary to complete demarcation of the boundaries. The order of processing the 8-neighbors was chosen to optimize boundary precision. Not unexpectedly, it alternates between opposite compass directions: N,S,E,W,NE,SW,NW,SE. Analysis of the algorithm yields the time complexity in cycles as,
[Equation image: Voronoi time complexity in cycles per iteration.]
or an execution time of 75 μs per iteration. The regions grow diagonally at a rate of 2 pixels per iteration, hence up to 103 iterations may be required. But for a representative value of I = 20, execution time is 1.5 ms. The algorithm yields boundaries that are almost thin. One round of thinning in all 4 directions (south, north, west, east) with the following template will suffice:
X 1 X
1 P 0
X 0 0
Note that the template is shown in its initial southern orientation and that the order within each pair of opposite directions has been reversed. This reversal appears essential for maintaining boundary precision. A pixel removed from the boundary is recolored by examining its 4-neighbors in turn and copying the first one that is not a boundary point. Since thinning and recoloring are not iterative processes, they do not affect execution time significantly. Fig. 34 is useful in understanding Listing 10.
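The brush-fire process may be sketched sequentially as follows. This is an illustration under stated assumptions: the function name is hypothetical, `np.roll` wraps at the borders (unlike the machine's tag shifts), and the boundary produced here is the unthinned, roughly two-pixel-wide meeting front described above.

```python
import numpy as np

def voronoi_brush_fire(seeds, shape):
    # Seed colours spread one step per iteration in all 8 directions;
    # wherever two different colours meet, the pixel is marked as a
    # Voronoi (boundary) point.
    color = -np.ones(shape, dtype=int)          # -1 = uncolored
    for k, (i, j) in enumerate(seeds):
        color[i, j] = k
    boundary = np.zeros(shape, dtype=bool)
    # N, S, E, W, NE, SW, NW, SE -- the order used to optimize precision
    offs = [(-1, 0), (1, 0), (0, 1), (0, -1), (-1, 1), (1, -1), (-1, -1), (1, 1)]
    spreading = True
    while spreading:
        spreading = False
        new = color.copy()
        for di, dj in offs:
            nb = np.roll(np.roll(color, di, 0), dj, 1)   # neighbour's colour
            take = (new < 0) & (nb >= 0)
            new[take] = nb[take]
            spreading |= bool(take.any())
            clash = (new >= 0) & (nb >= 0) & (new != nb)
            boundary |= clash
        color = new
    return color, boundary
```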
LISTING 10: Associative Voronoi Diagram
main()
{
/* ... declarations */
/** Mark the seed points as colored **/
letm d(CL) d(VL); letc; setag; write;
letmc d(S); setag; compare; letmc d(CL); write;
/****** BRUSH FIRE ******/
while (rsp)
{
for(window_index=0; window_index<8; window_index++)
{
/** Bring in CN and VN **/
letm d(CN) d(VN); letc; setag; write;
letmc d(CL); setag; compare;
letmc d(CN); general_shift (window_index); write;
letmc d(VL); setag; compare;
letmc d(VN); general_shift (window_index); write;
process ( ) ;
}
letm d(CL); letc; setag; compare;
}
process();
}
process()
{
for(i=0 ; i<colour_size; i++)
{ letm d(TM); letc; setag; write;
letmc d(color+i); setag; compare;
letmc d(TM); general_shift (window_index); write;
letm dseq(CL,TM) d(color+i); letc d(CL) d(CN) d(TM); setag; compare;
letmc d(VL); write;
letm dseq(CL,TM) d(color+i); letc d(CL) d(CN) d(color+i); setag; compare;
letmc d(VL); write;
letm dseq(CL,TM) d(color+i); letc d(CL) d(TM); setag; compare;
letmc d(color+i); write;
}
}
general_shift (index) /* 6 0 4 */
int index; /* 3 P 2 */
{ /* 5 1 7 */
if (index == 4 || index == 0 || index == 6) shiftag(b);
if (index == 5 || index == 1 || index == 7) shiftag(-b);
if (index == 6 || index == 3 || index == 5) shiftag(1);
if (index == 4 || index == 2 || index == 7) shiftag(-1);
}
The associative Voronoi algorithm was designed for quick access of statistical data. Thus it would take only 13 machine cycles to read out the length (in pixels) of the Voronoi Diagram or the area (in pixels) of any Voronoi region identified by its seed coordinates.
Word Length
This section estimates the associative memory word length (K) required to compute vision algorithms. Consider the machine model described, for monochrome computer stereo vision in three channels. Its input is M bits for each of the incoming images, left and right. The machine generates parameters in three channels for use in higher-level processes. The parameters are: disparity, of bit length ⌈log2(2Wi + 1)⌉; slope orientation and edge designation (of length 4 and 1 respectively) for the left and right images; and a one-bit match label. Assuming input data is retained for further processing, the final word size is given by,
[Equation image: final word size K.]
for M = 8 and W = P = 7, 15, 31. Additional word space is required for temporary storage of intermediate results, and this fluctuates dynamically during execution of the various algorithms. The maximum word length depends on the order of execution. The best order in our case is to compute each channel in turn, starting with the coarsest. Examination of the various processing phases indicates that maximum word length is reached during computation of disparity for the last channel. Accordingly, the maximum word length is expressed by,
(Equation for the maximum word length Kmax is not reproduced in the available text.)
where the first term is for the input data, and the second term accounts for the results of the first two channels.
Ksp is the working space to compute the last channel disparity and is given by (see Stereo Vision) ,
Ksp = 3 × 2 + 2⌈log2((2W1 + 1)²)⌉ + 5⌈log2(2W1 + 1)⌉ = 42    (56)

where Ksp does not include flag bits. Hence Kmax becomes 91 bits.
Let us expand our model to include most of the visual algorithms implemented above. As before, the minimum word length required depends on the order of execution. The recommended order is:
• Optical Flow.
• Edge detection and contour processing.
• Hough transform, saliency mapping.
• Stereo matching.
The critical process appears to be optical flow with 132 bits (including an additional byte of pixel data for stereo). By sharing or reusing the fields Enl, Uav and Vav, the required word length drops to 106. Providing some spare capacity for new algorithms, the ARTVM word length was fixed at 128 (four 32-bit sectors), plus an 8-bit flag field, or 136 bits in all. This counts only associative storage; if the 16-bit image buffer is included, the total word length becomes 152.

Results and Conclusions of the Akerib Thesis
A low cost, general purpose vision architecture was proposed here which could carry out any vision algorithm at video rate. The proposed machine is a classical associative structure adapted to computer vision and VLSI implementation. It is designated Associative Real Time Vision Machine (ARTVM), and uses an up-down shift mechanism in the tag register to enhance operations on a local neighborhood. An internal frame buffer virtually eliminates computer I/O time, and permits simultaneous input, output and computation. To reduce chip interface without materially affecting speed, the word is partitioned into four sectors, only one of which can be accessed at a time, and a flag field that is always accessible. The major hardware complement to handle a 512 × 512 image is shown to consist of 256K words × 152 bits of associative memory. Extrapolation of earlier experiments to 0.5 micron technology yields a capacity of 1M bits of associative memory on a chip area of 100 mm² and a cycle time of 30 nanoseconds. The proposed chip stores 4K words × 152 bits, which is 59 percent of capacity, and 64 of these chips make up the associative memory.
A simulator of ARTVM was generated in the C language for use in developing associative micro-software and evaluating its time complexity. Convolution in the x and y directions with a 15-element filter takes 0.34 ms, hence Canny edge detection executes in 0.5 ms, and the Marr & Hildreth method runs nearly twice as long. Likewise computation of stereo disparity by the Grimson method over a range of ±15 pixels, including disambiguation and out-of-range test, completes in under one ms. This stereo performance was only attained by virtue of an array algorithm for counting labeled pixels over a neighborhood. Optical flow by the Horn & Schunck method executes in less than 0.5 ms. Curve propagation, thinning and contour tracing take 1.5, 6.4 and 66 μs per iteration, respectively. The linear Hough transform takes 150 μs for a resolution of 16 in direction and distance from the origin. An interesting result was obtained for the global saliency mapping of Sha'ashua & Ullman. It takes 0.4 ms per iteration, which is three orders of magnitude faster than the Connection Machine. Geometric problems were also implemented: the convex hull takes 3.15 μs per vertex, and the Voronoi diagram executes in 0.15 ms per iteration by the brush-fire technique.
Two methods were selected for comparative evaluation of ARTVM performance. In the first we compared our architecture to an SIMD array of up to 256 high performance processors (Inmos T800/Intel i860), and found it to have a speed advantage of 2-3 orders of magnitude. The speed advantage was lowest for neighborhood arithmetic operations of higher precision, such as convolution (factor of 97), and reached a peak for neighborhood logic operations such as curve propagation (factor of 2500). The second method was the Abingdon cross benchmark for which test results were available on several of the better known vision architectures. The ARTVM was found to lead by 2-6 orders of magnitude in price-performance.
The ARTVM configuration used throughout this investigation assumed a long shift of 32 places (b = 32) . This added 64 pins to the associative chip, for a total interface of 160 pins. Reducing b to 16 would save 32 pins at the expense of a 17% loss in average speed. It was indicated earlier that the architecture is flexible with respect to advances in technology and can take full advantage of higher chip density. There is equal flexibility in image resolution if memory chip count is varied linearly with this parameter. Thus for a 1024 × 1024 image, the chip count will grow by a factor of 4, with a small loss in speed and, perhaps, a minor increase in word length.
The vision algorithms described above are ones in which the data base is inherently pixel oriented, or in which such orientation offers a decided advantage. Exceptions are the Hough transforms and the convex hull. For higher level vision functions, dealing with more complex image elements, an associative architecture is expected to offer even greater advantages.
This work has important commercial implications.
The apparatus and methods shown and described above are useful in a variety of applications, such as but not limited to: video telephony implementing the H.261 standard; video teleconferencing for CIF resolution and QCIF resolution; compression and decompression of video games; color image enhancement and manipulation for desk top publishing; optical character recognition (OCR); virtual reality; image animation such as computer generated cartoons; 2 or 3 dimensional B/W or color image inspection and processing; video detecting for traffic control; medical imaging such as 3D reconstructions and back projection filtering; real time normalized gray scale correlation; TV tracking of more than one object such as tracking of vehicles for traffic control purposes; other traffic applications such as identification of license numbers; inspection of manufactured objects such as agricultural produce; wood and metal products and microelectronic products; acceleration of computer arithmetics; neural network applications; fuzzy logic applications; post processing the quality of compressed images; photography with video, digital or analog cameras, with or without image compression, with or without special effects such as autofocus, gamma correction, photomontages, bluescreen, aperture correction, exposure correction, real time morphing, and correction of geometrical distortions; television applications such as HDTV (high definition television), satellite TV, cable TV; infotainment; speech recognition; kiosks for banking, travel, shopping and other purposes; special effects and enhancement of automated office equipment such as facsimile machines, printers, scanners, and photocopiers; compression applications such as compression of faces, fingerprints or other information onto smart cards for driver licenses, ID cards and membership cards; communication applications such as digital filtering, Viterbi decoding and dynamic programming; and training, educational, and entertainment 
applications.
Video and picture editing applications include acceleration of desk top publishing functions such as blurring, sharpening, rotations and other geometrical transformations, and median filtering.
CD ROM (compact disc read-only-memory) uses include compression such as MPEG-I, MPEG-II, JPEG, fractal compression, and wavelet compression, with or without enhancement such as video sharpening, for a wide variety of applications such as archiving images for medical, real estate, travel, research, and journalistic purposes.
Examples of facsimile applications include canceling out a colored background so as to sharpen the appearance of a text superimposed on the colored background, filling in gaps in letters such as Kanji characters, OCR, facsimile data compression.
An example of a photocopier application is automatically superimposing a template stored in memory, such as a logo, onto a photocopy.
Security applications for home, workplace, banks, and receptacles for valuables and proprietary information, include the following: recognition of personnel, as by face recognition, fingerprint recognition, iris recognition, voice recognition, and handwriting recognition such as signature recognition.
Camera features which may be accelerated by use of the embodiments shown and described herein include the following:
1. Gamma correction: A LUT (look up table) is employed which includes 256 cells, respectively including a value i (i = 0, ..., 255) raised to the power of gamma. Gamma may, for example, be 0.36 or 0.45. The same LUT may be employed for all three color components (R, G and B) and gamma correction may be performed in parallel for all three components.
2. Rapid color-base conversions such as color transformation in which luminance and chrominance are separated before further processing. For example, it is often desirable to transform RGB values or CMYK values into YCrCb values which can be compressed by reducing the number of bits devoted to the Cr and Cb components. Eventually, the compressed YCrCb values are transformed back into RGB or CMYK.
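The text does not fix the conversion coefficients, so the sketch below uses one common choice (the ITU-R BT.601 matrix) purely as an illustration of the RGB-to-YCrCb step:

```c
/* Illustrative RGB -> Y, Cb, Cr conversion using BT.601 coefficients
   (an assumption; the source does not specify a matrix).  Cb and Cr are
   offset by 128 so that zero chrominance maps to mid-range. */
void rgb_to_ycbcr(unsigned char r, unsigned char g, unsigned char b,
                  unsigned char *y, unsigned char *cb, unsigned char *cr)
{
    double yf  = 0.299 * r + 0.587 * g + 0.114 * b;  /* luminance */
    double cbf = 0.564 * (b - yf) + 128.0;           /* blue chrominance */
    double crf = 0.713 * (r - yf) + 128.0;           /* red chrominance */
    *y  = (unsigned char)(yf + 0.5);
    *cb = (unsigned char)(cbf + 0.5);
    *cr = (unsigned char)(crf + 0.5);
}
```

Compression then spends fewer bits on Cb and Cr, and the inverse matrix restores RGB (or CMYK) afterwards.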
3. Low pass filtering of luminance chrominance signals with 5 - 15 tap filters.
4. Aperture correction. For example, the following stages may be performed:
a. Extract a high frequency component of the original signal representing the photographed scene;
b. Apply two separable filters, each of which may, for example, be: [-0.25 0.5 -0.25] to the columns and rows respectively.
c. Generate a correction signal by shifting the horizontal point signal by K=0, 1, 2 or 4 pixels.
d. Generate corrected output by adding the correction signal to the original signal.
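A one-dimensional sketch of stages a-d, applying the quoted [-0.25 0.5 -0.25] filter along a row (a sketch under our own assumptions: boundary samples are passed through unchanged, and the K-pixel shift of stage c is taken as zero):

```c
/* 1-D aperture correction: extract a high-frequency component with the
   [-0.25 0.5 -0.25] filter and add it back to the original signal. */
void aperture_correct(const double *in, double *out, int n)
{
    int i;
    out[0] = in[0];          /* boundaries copied unchanged (assumption) */
    out[n - 1] = in[n - 1];
    for (i = 1; i < n - 1; i++) {
        double hf = -0.25 * in[i - 1] + 0.5 * in[i] - 0.25 * in[i + 1];
        out[i] = in[i] + hf; /* corrected output = original + correction */
    }
}
```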
5. Autofocus and autoexposure computations:
For example, the focus of the camera may be adjusted by a predetermined amount in a first direction. Then, the proportion of high frequency components may be computed using the embodiments shown and described above to determine whether this proportion has increased or decreased as a result of the adjustment. If there is an increase, the focus is again adjusted by a predetermined amount in the first direction. If there is a decrease, the focus is adjusted by a predetermined amount in the second direction.
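The hill-climbing loop just described can be sketched as follows. The `measure_sharpness` function below is a hypothetical stand-in for the high-frequency-energy computation (here a simple quadratic with a known peak, so the loop's behavior can be checked):

```c
/* Hypothetical sharpness measure: peaks at BEST_FOCUS, falls off quadratically. */
#define BEST_FOCUS 10
static double measure_sharpness(int focus_pos)
{
    double d = focus_pos - BEST_FOCUS;
    return -d * d;
}

/* Hill-climbing autofocus: step the focus; if sharpness drops, reverse direction. */
int autofocus(int focus, int step, int iterations)
{
    double prev = measure_sharpness(focus);
    int dir = 1;  /* first direction */
    int i;
    for (i = 0; i < iterations; i++) {
        focus += dir * step;
        double cur = measure_sharpness(focus);
        if (cur < prev)   /* proportion of high frequencies decreased */
            dir = -dir;   /* adjust in the second direction */
        prev = cur;
    }
    return focus;
}
```

Once the peak is reached the loop simply dithers around it, which is the usual termination behavior of this scheme.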
6. Auto color correction computations, such as auto gain control and auto white balance. For example, the following stages may be performed:
a. Adjust black levels until the darkest parts of each R, G or B signal reaches a predetermined level.
b. Adjust differential gains so as to equalize the means of the three signals.
c. Adjust overall gain such that the most positive of the peaks of the three waveforms just reaches white level. This may be done by computing the maximum, minimum and mean levels of each color.
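Stage b above (equalizing the means of the three signals) can be sketched directly; the helper below is illustrative and, as an assumption not stated in the text, returns red and blue gains relative to the green channel:

```c
/* Differential gain computation for auto white balance, stage (b):
   scale red and blue so their channel means match the green mean.
   Assumes n > 0 and non-black red/blue channels (non-zero sums). */
void balance_gains(const unsigned char *r, const unsigned char *g,
                   const unsigned char *b, int n,
                   double *gain_r, double *gain_b)
{
    double sr = 0, sg = 0, sb = 0;
    int i;
    for (i = 0; i < n; i++) { sr += r[i]; sg += g[i]; sb += b[i]; }
    *gain_r = sg / sr;  /* mean ratios; the n divisors cancel */
    *gain_b = sg / sb;
}
```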
7. Chroma-keying:
a. Key generator: X = Cb cos(z) + Cr sin(z); Z = Cr cos(z) - Cb sin(z)
K = X - a|Z|, K = 0 if X < a|Z|, a = 1/2, 1, 2, 4
b. Foreground suppression:
Cb' = Cb - K cos(z)
Cr' = Cr - K sin(z)
Y' = Y - Ys K, Y' = 0 if Ys K > Y, where Ys is an adjustable constant.
c. Key processing:
Kbg = 0 for wanted foreground, Kbg = 1 for wanted background picture.
Kbg = (K-K1)Kg for the transition band,
d. Mixer
8. Noise Reduction
Weighted averaging of two successive images with weights 1/K, 1-1/K, K=2,4,8
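In integer form the weighted average above becomes a simple recursive filter; this sketch (our naming) updates the previous frame in place with rounding:

```c
/* Temporal noise reduction: prev = cur * (1/K) + prev * (1 - 1/K),
   computed in integers with rounding; K = 2, 4 or 8. */
void temporal_filter(unsigned char *prev, const unsigned char *cur, int n, int k)
{
    int i;
    for (i = 0; i < n; i++)
        prev[i] = (unsigned char)((cur[i] + (k - 1) * prev[i] + k / 2) / k);
}
```

Larger K averages over more frames, suppressing more noise at the cost of more smear on moving objects, which is what item 9 (movement protection) addresses.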
9. Movement Protection (Avoid smearing of moving objects).
10. Creation of composite and S-video signals:
Chrominance modulation and alternating lines:
U sin(wt) + V cos(wt) and U sin(wt) - V cos(wt) on alternating lines, done for frequencies of 13.5 MHz.

APPENDIX A
/*****************  ASSOCIATIVE PROCESSOR SIMULATOR  *****************/

#include <stdio.h>

#define setag       set_tag()
#define firsel      fir_sel()
#define countag     coun_tag()
#define letm        let_m();
#define letc        let_c();
#define letmc       let_mc();
#define clearc      let_c();
#define setc        set_c();
#define c_to_m      mov_cm()
#define m_to_c      mov_mc()
#define d(x)        del(x);
#define dseq(x1,x2) ((x2>x1) ? dseq1(x1,x2) : dseq1(x2,x1));
#define dvar(x1,x2,dat) (cons(x1,x2,dat));
#define load        r_disk()
#define save        w_disk()
#define compare     comp()
#define write       wr()
#define read        rd()
#define BIT(k)      (1<<(k))
#define OF_WORD     &
#define TRUE        1
#define FALSE       0
#define EVEN_WORD(x) ((x&1)?0:1)
#define SIGN(x)     ((x<0)?-1:1)
#define print_cycles printf(" %3.1f CYCLES = %4.3f usec\n", CYCLES, CYCLES*.05);

float CYCLES = 0;
int PULSE = FALSE;
int FLAG_CYCLES = FALSE;
int space_arr[WORD_LENGTH];

struct
{
    int COUNTS, RSP;
    unsigned A[WORD_LENGTH][MEM_SIZE];
    unsigned OUT_DAT[WORD_LENGTH];
    struct { unsigned BOOLEAN : 1; }
        M_OR_C_FLAG,
        M_AND_C_FLAG,
        COMP[WORD_LENGTH], MASK[WORD_LENGTH], TAG[MEM_SIZE];
} PAR;
set_tag()
{
    extern struct PAR;
    int i;
    CYCLES += 0.5;
    PULSE = FALSE;
    FLAG_CYCLES = TRUE;
    for (i = 0; i < MEM_SIZE; i++)
        PAR.TAG[i].BOOLEAN = 1;
}

shiftag(b)
int b;
{
    extern struct PAR;
    int i;
    if (abs(b) > 1) CYCLES += 8;
    else CYCLES += 0.5;
    PULSE = FALSE;
    FLAG_CYCLES = TRUE;
    if (b < 0)
    {
        for (i = 0; i < MEM_SIZE + b; i++)
            PAR.TAG[i].BOOLEAN = PAR.TAG[i-b].BOOLEAN;
        for (i = MEM_SIZE + b; i < MEM_SIZE; i++)
            PAR.TAG[i].BOOLEAN = 0;
    }
    else
    {
        for (i = MEM_SIZE-1; i >= b; i--)
            PAR.TAG[i].BOOLEAN = PAR.TAG[i-b].BOOLEAN;
        for (i = 0; i < b; i++)
            PAR.TAG[i].BOOLEAN = 0;
    }
}

coun_tag()
{
    extern struct PAR;
    int i;
    PULSE = FALSE;
    CYCLES += 10;
    PAR.COUNTS = 0;
    for (i = 0; i < MEM_SIZE; i++)
        PAR.COUNTS = PAR.COUNTS + PAR.TAG[i].BOOLEAN;
    return (PAR.COUNTS);
}

fir_sel()
{
    extern struct PAR;
    int i;
    int j = 0;
    PULSE = FALSE;
    CYCLES += 6;
    while (!PAR.TAG[j++].BOOLEAN);
    for (i = j; i < MEM_SIZE; i++)
        PAR.TAG[i].BOOLEAN = FALSE;
}

let_c()
{
    extern struct PAR;
    int i;
    FLAG_CYCLES = FALSE;
    PAR.M_OR_C_FLAG.BOOLEAN = TRUE;
    PAR.M_AND_C_FLAG.BOOLEAN = FALSE;
    for (i = 0; i < WORD_LENGTH; i++)
        PAR.COMP[i].BOOLEAN = 0;
}
let_m()
{
    extern struct PAR;
    int i;
    FLAG_CYCLES = FALSE;
    PAR.M_OR_C_FLAG.BOOLEAN = FALSE;
    PAR.M_AND_C_FLAG.BOOLEAN = FALSE;
    for (i = 0; i < WORD_LENGTH; i++)
        PAR.MASK[i].BOOLEAN = 0;
}

let_mc()
{
    extern struct PAR;
    int i;
    FLAG_CYCLES = FALSE;
    PAR.M_AND_C_FLAG.BOOLEAN = TRUE;
    for (i = 0; i < WORD_LENGTH; i++)
    {
        PAR.MASK[i].BOOLEAN = 0;
        PAR.COMP[i].BOOLEAN = 0;
    }
}

set_c()
{
    int i;
    for (i = 0; i < WORD_LENGTH; i++)
        PAR.COMP[i].BOOLEAN = 1;
}

mov_cm()
{
    extern struct PAR;
    int i;
    for (i = 0; i < WORD_LENGTH; i++)
        PAR.MASK[i].BOOLEAN = PAR.COMP[i].BOOLEAN;
}

mov_mc()
{
    extern struct PAR;
    int i;
    for (i = 0; i < WORD_LENGTH; i++)
        PAR.COMP[i].BOOLEAN = PAR.MASK[i].BOOLEAN;
}

del(n)
int n;
{
    extern struct PAR;
    int i;
    if (!FLAG_CYCLES && PULSE) CYCLES += 0.5;
    PULSE = FLAG_CYCLES = TRUE;
    if (PAR.M_OR_C_FLAG.BOOLEAN || PAR.M_AND_C_FLAG.BOOLEAN)
        PAR.COMP[n].BOOLEAN = TRUE;
    if (!PAR.M_OR_C_FLAG.BOOLEAN || PAR.M_AND_C_FLAG.BOOLEAN)
        PAR.MASK[n].BOOLEAN = TRUE;
}

cons(start, stop, var)
int start, stop;
unsigned var;
{
    extern struct PAR;
    unsigned i;
    if (!FLAG_CYCLES && PULSE) CYCLES += 0.5;
    PULSE = FLAG_CYCLES = TRUE;
    for (i = start; i <= stop; i++)
    {
        if (PAR.M_OR_C_FLAG.BOOLEAN || PAR.M_AND_C_FLAG.BOOLEAN)
            if (BIT(i-start) OF_WORD var)
                PAR.COMP[i].BOOLEAN = TRUE;
        if (!PAR.M_OR_C_FLAG.BOOLEAN || PAR.M_AND_C_FLAG.BOOLEAN)
            if (BIT(i-start) OF_WORD var)
                PAR.MASK[i].BOOLEAN = TRUE;
    }
}

dseq1(n1, n2)
int n1, n2;
{
    extern struct PAR;
    int i;
    if (!FLAG_CYCLES && PULSE) CYCLES += 0.5;
    PULSE = FLAG_CYCLES = TRUE;
    if (PAR.M_OR_C_FLAG.BOOLEAN || PAR.M_AND_C_FLAG.BOOLEAN)
        for (i = n2; i >= n1; i--)
            PAR.COMP[i].BOOLEAN = TRUE;
    if (!PAR.M_OR_C_FLAG.BOOLEAN || PAR.M_AND_C_FLAG.BOOLEAN)
        for (i = n2; i >= n1; i--)
            PAR.MASK[i].BOOLEAN = TRUE;
}

r_disk()
{
    extern struct PAR;
    int k, j;
    char c1;
    FILE *input_file;
    input_file = fopen("ass.inp", "r");
    for (k = 0; k < WORD_LENGTH-1; k++)
        space_arr[k] = 0;
    for (j = 0; j < MEM_SIZE; j++)
        for (k = WORD_LENGTH-1; k >= 0; k--)
        {
            while ((c1 = getc(input_file)) != '0' && c1 != '1')
                space_arr[k] = 1;
            if (c1 == '0')
                PAR.A[k][j] = 0;
            if (c1 == '1')
                PAR.A[k][j] = 1;
        }
}

w_disk()
{
    extern struct PAR;
    int k, j;
    FILE *output_file;
    output_file = fopen("ass.out", "w");
    for (k = WORD_LENGTH-1; k >= 0; k--)
    {
        if (space_arr[k])
            fprintf(output_file, " ");
        fprintf(output_file, "%d", PAR.COMP[k].BOOLEAN);
    }
    fprintf(output_file, "\n");
    for (k = WORD_LENGTH-1; k >= 0; k--)
    {
        if (space_arr[k])
            fprintf(output_file, " ");
        fprintf(output_file, "%d", PAR.MASK[k].BOOLEAN);
    }
    fprintf(output_file, "\n\n");
    for (j = 0; j < MEM_SIZE; j++)
    {
        for (k = WORD_LENGTH-1; k >= 0; k--)
        {
            if (space_arr[k])
                fprintf(output_file, " ");
            fprintf(output_file, "%d", PAR.A[k][j]);
        }
        fprintf(output_file, "%3d", PAR.TAG[j].BOOLEAN);
        fprintf(output_file, "%5d\n", j);
    }
    fprintf(output_file, "\n\n %3.1f CYCLES\n", CYCLES);
}

comp()
{
    extern struct PAR;
    int k, j;
    unsigned or_exp;
    if (PULSE || !FLAG_CYCLES) CYCLES += 1;
    else CYCLES += 0.5;
    PULSE = FALSE;
    for (j = 0; j < MEM_SIZE; j++)
    {
        or_exp = FALSE;
        for (k = 0; k < WORD_LENGTH; k++)
            or_exp = (PAR.MASK[k].BOOLEAN) &
                     (PAR.A[k][j] ^ PAR.COMP[k].BOOLEAN) | or_exp;
        PAR.TAG[j].BOOLEAN = PAR.TAG[j].BOOLEAN & (!or_exp);
    }
    PAR.RSP = FALSE;
    for (j = 0; j < MEM_SIZE; j++)
        if (PAR.TAG[j].BOOLEAN) PAR.RSP = TRUE;
}

wr()
{
    extern struct PAR;
    int k, j;
    if (PULSE || !FLAG_CYCLES) CYCLES += 1;
    else CYCLES += 0.5;
    PULSE = FALSE;
    for (j = 0; j < MEM_SIZE; j++)
        for (k = 0; k < WORD_LENGTH; k++)
        {
            PAR.A[k][j] =
                ((!PAR.TAG[j].BOOLEAN) & (PAR.A[k][j])) |
                (PAR.TAG[j].BOOLEAN &
                 ((PAR.MASK[k].BOOLEAN & PAR.COMP[k].BOOLEAN) |
                  ((!PAR.MASK[k].BOOLEAN) & PAR.A[k][j])));
        }
}

rd()
{
    extern struct PAR;
    int k, j;
    if (PULSE || !FLAG_CYCLES) CYCLES += 1;
    else CYCLES += 0.5;
    PULSE = FALSE;
    for (k = 0; k < WORD_LENGTH; k++)
    {
        PAR.OUT_DAT[k] = FALSE;
        for (j = 0; j < MEM_SIZE; j++)
        {
            PAR.OUT_DAT[k] = PAR.OUT_DAT[k] |
                (PAR.TAG[j].BOOLEAN & PAR.A[k][j]);
        }
    }
}
APPENDIX B
/****  ASSOCIATIVE COMPUTATION OF IMAGE HISTOGRAM  ****/

/***  MEMORY FORMAT  ***/
/*  7 6 5 4 3 2 1 0  */
/* |- - - - - - - -| */
/* |- - - - - - - -| */

#include "stdlib.h"

main()
{
    int p;
    int hist[256];
    letm dseq(0,7);
    for (p = 0; p < 256; p++)
    {
        letc dvar(0,7,p); setag; compare;
        hist[p] = countag;
    }
}
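For comparison, the same histogram computed sequentially on a conventional processor looks as follows (our own sketch, not part of the appendix); the associative version above replaces the inner pass over pixels by one compare/countag pair per gray level:

```c
/* Conventional (sequential) equivalent of the associative histogram:
   one pass over all n pixels, one counter per 8-bit gray level. */
void histogram256(const unsigned char *img, int n, int hist[256])
{
    int i;
    for (i = 0; i < 256; i++)
        hist[i] = 0;
    for (i = 0; i < n; i++)
        hist[img[i]]++;
}
```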
APPENDIX C
/**  ASSOCIATIVE CONVOLUTION PROGRAM  **/
/****************************************/
/*          THE MEMORY FORMAT           */
/****************************************/
/* |- - - d - - -|- -|- -|- - - fd - - -| */
/*               mark temp               */

#define MEM_SIZE 32
#define WORD_LENGTH 29
#include "asslib.h"

int f[7] = { 1,2,4,16,4,2,1 };  /* define the filter */

main()
{
    int f_size = 7;      /* filter size is 7 words */
    int n = 8;           /* data word length */
    int temp = 19;       /* temp is in bit position 19 */
    int mark = 20;       /* mark is in bit position 20 */
    int d_offset = 21;   /* [d] field first bit position */
    int add_offset, bit_count, f_index;
    load;

    /*******  summed multiplication  *******/
    /* add [d] to [fd] if the bit at       */
    /* position bit_count of f[f_index]    */
    /* is 1                                */
    /****************************************/
    for (f_index = 0; f_index < f_size; f_index++)
    {
        for (add_offset = 0; add_offset < n; add_offset++)
            if (BIT(add_offset) OF_WORD f[f_index])  /* test if bit is 1 */
            {
                /************  add  ***************/
                for (bit_count = 0; bit_count < n; bit_count++)
                {
                    letm d(add_offset+bit_count) d(temp)
                         d(d_offset+bit_count) d(mark);
                    letc d(temp) d(mark); setag; compare;
                    letc d(add_offset+bit_count) d(mark); write;
                    letc d(add_offset+bit_count) d(temp) d(mark);
                    setag; compare;
                    letc d(temp) d(mark); write;
                    letc d(add_offset+bit_count)
                         d(d_offset+bit_count)
                         d(mark); setag; compare;
                    letc d(temp) d(d_offset+bit_count) d(mark); write;
                    letc d(d_offset+bit_count) d(mark);
                    setag; compare;
                    letc d(add_offset+bit_count)
                         d(d_offset+bit_count)
                         d(mark); write;
                }
                /*******  propagate carry  ********/
                for (bit_count = 0; add_offset+bit_count < n+3;
                     bit_count++)
                {
                    letm d(add_offset+n+bit_count) d(mark) d(temp);
                    letc d(mark) d(temp); setag; compare;
                    letc d(add_offset+n+bit_count) d(mark);
                    write;
                    letc d(add_offset+n+bit_count) d(mark) d(temp);
                    setag; compare;
                    letc d(mark) d(temp); write;
                }
            }
        /******  shift [d] field and marker down  ******/
        for (bit_count = mark; bit_count <= mark+n; bit_count++)
        {
            letm d(bit_count); letc d(bit_count); setag; compare;
            letc; write;
            letc d(bit_count); shiftag(1); write;
        }
    }
    save;
    print_cycles;
}
APPENDIX D

/*
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  Lpf.C
  This program demonstrates a low pass filter
  operating on a 24-bit bitmap.
  The calculation is done for the Red, Green and Blue
  colors separately.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
#define Pixel 0
#define PIXELSIZE 8
#define PixelSign Pixel + PIXELSIZE
#define Array_Res 17
#define TempNig 39

shift_2 (int pos, int len, int res, int shift_factor);
sign (int pos, int len, int res, int temp);
void neighbour (int SrcPos, int DstPos, int len, int NbrName);
void fifoxchng (int SrcPos, int DstPos, int len, int NbrName);
add (int src, int src_len, int dst, int dst_len, int c_bit);

enum color {RED, GREEN, BLUE};
int lpf_filter[] = {4,2,2,2,2,1,1,1,1};
main (int argc, char *argv[])
{
    int j;
    int i;
    int cl;
    confifo (0);
letm dseq(0,23) ;letc ;setag; write;
letm dseq(24,47) ;letc ;setag;write;
letm dseq(48,71) ;letc ;setag; write;
printf ("Running Asp Simulator : %s \n\n", argv[0]);
load(24,48);
for (cl = RED ; cl <= BLUE ; cl++ )
{
fifoxchng (48+8*cl, Pixel,
PIXELSIZE, 0);
shift_2 (Pixel, PIXELSIZE, Array_Res, 2);   /* multiply the array by 4; result in Array_Res */
shift_2 (Pixel, PIXELSIZE, PixelSign, 1);   /* multiply the array by 2; result in PixelSign */
for (j = 1; j <= 8; j++)
{
/* copy the neighbour */
if ( j <=4)
{
neighbour (PixelSign, TempNig
, 9, j) ;
add (TempNig, PIXELSIZE+1, Array_Res, PIXELSIZE+1+j,
Array_Res+PIXELSIZE+1+j);
}
else
{
neighbour (Pixel, TempNig, 8, j);
add (TempNig, PIXELSIZE, Array_Res,
PIXELSIZE+1+j,
Array_Res+PIXELSIZE+1+j);
} }
fifoxchng (Array_Res+4
,48+8*cl, PIXELSIZE, 0);
}
// confifo (24);
save (24,48);
return 0;
} add (int src , int src_len,int dst ,int dst_len,int c_bit)
{
int cnt ;
/* clear carry bit */
letm d(c_bit); setag;
letc;
write; for(cnt=0; cnt<src_len; cnt++)
{
letm d(dst+cnt) d( c_bit) d( cnt+src); letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letc d(dst+cnt) d( c_bit); setag;
compare;
letc d(c_bit);
write; letc d(dst+cnt) d( cnt+src); setag;
compare;
letc d(c_bit) d( cnt+src) ;
write; letc d(cnt+src); setag;
compare;
letc d(dst+cnt) d( cnt+src);
write; }
/* carry propagation */
for (; cnt < dst_len; cnt++)
{
letm d(dst+cnt) d( c_bit) ;
letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letmc d(dst+cnt) d( c_bit); setag; compare;
letc d(c_bit);
write;
} return 0;
} void neighbour (int SrcPos,int DstPos,int len , int NbrName)
{
int i;
/* clear destination */
letm dseq(DstPos, DstPos+len-1); setag; letc;
write;
for (i = 0; i < len; i++)
{
    letmc d(SrcPos + i); setag; compare;
letmc d(DstPos + i) ;
switch (NbrName)
{
case 0:
break;
case 1 :
shiftag(1) ;
break;
case 2:
shiftag(-1);
break;
case 3:
shiftag(128); break;
case 4:
shiftag(-128); break;
case 5:
shiftag(127) ; break;
case 6:
shiftag(-127); break;
case 7:
shiftag(129); break;
case 8:
shiftag(-129); break;
} write ;
} return;
}
void fifoxchng (int SrcPos,int DstPos,int len , int
NbrName)
{
int i;
/* clear destination */
letm dseq(DstPos, DstPos+len-1); setag; letc;
write ;
for (i = 0; i < len ; i++)
{ letmc d(SrcPos + i) ; setag;
compare;
letmc d(DstPos+len-1 - i) ;
switch (NbrName)
{
case 0:
break;
case 1 :
shiftag(1) ;
break;
case 2:
shiftag(-1);
break;
case 3:
shiftag(128);
break; case 4:
shiftag (-128);
break;
case 5:
shiftag (127) ;
break;
case 6:
shiftag(-127);
break;
case 7:
shiftag(129);
break;
case 8:
shiftag (-129);
break;
} write ;
} return;
} shift_2 ( int pos , int len , int res , int shift_factor)
{
int i ;
/* clean the result operand */ letm dseq(res, res+ len + shift_factor-1) ; setag;
letc ;
write ;
/* copy the operand from pos to res
+shift_factor using the tag register */ for ( i = 0; i < len ; i++ )
{
letmc d( pos + i) ; setag ;
compare ; letmc d( res + shift_factor + i); write ;
}
return 0 ;
} sign (int pos , int len , int res , int temp )
{
int i ;
/* clean the result operand */ letm dseq(res, res+ len-1) d(temp); setag; letc ;
write ;
/* copy the operand from pos to res +shift_factor using the tag register */ for ( i = 0; i < len ; i++ )
{
letm d(pos + i) d(temp); setag ;
letc d(pos + i) ;
compare ;
letmc d( res + i) d(temp) ;
write ; letm d( pos +i) d(temp); setag; letc d(temp);
compare ;
letmc d(res+i);
write ; }
return 0 ; }
APPENDIX E

/*
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  Laplacian.C
  This program demonstrates a laplacian filter
  operating on a 24-bit bitmap.
  The calculation is done for the Red, Green and Blue
  colors separately.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
#define Pixel 0
#define PIXELSIZE 8
#define PixelSign Pixel + PIXELSIZE
#define Array_Res 24
#define TempNig 40 shift_2 ( int pos , int len , int res , int shift_factor); sign (int pos , int len , int res , int temp );
void neighbour (int SrcPos,int DstPos,int NbrName); void fifoxchng (int SrcPos,int DstPos,int NbrName); add (int src , int src_len,int dst , int dst_len, int c_bit); enum color {RED, GREEN, BLUE }; main (int , char *argv[ ])
{
int j ;
color cl;
printf ("Running Asp Simulator : %s \n\n", argv[0]);
/* insert data to fifo */
confifo (0);
letm dseq (0, 23); letc; setag; write;
letm dseq (24, 47); write;
letm dseq (48, 71); letc; setag; write;
// load (24,48);
confifo(24); for (cl = RED ; cl <= BLUE ; cl++ )
{
fifoxchng(48+8*cl, 0, 0);
shift_2 ( Pixel , PIXELSIZE , Array_Res , 2 ); /* multply the array * 4 and put result in */
sign ( Pixel , PIXELSIZE , PixelSign , PixelSign + PIXELSIZE ); for ( j = 1 ; j <= 4 ; j++)
{
/* copy the neighbour */
neighbour (PixelSign, TempNig, j);
add (TempNig, PIXELSIZE, Array_Res, PIXELSIZE+1+j,
Array_Res+PIXELSIZE+1+j);
}
fifoxchng (Array_Res+2 , 48+8*cl, 0);
}
save (24,48);
return 0;
} add (int src , int src_len, int dst , int dst_len, int c_bit)
{
int cnt;
/* clear carry bit */
letm d(c_bit); setag;
letc;
write; for(cnt=0; cnt<src_len; cnt++)
{
letm d(dst+cnt) d( c_bit) d( cnt+src); letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letc d(dst+cnt) d( c_bit); setag;
compare;
letc d(c_bit);
write; letc d(dst+cnt) d( cnt+src); setag; compare;
letc d(c_bit) d( cnt+src) ;
write; letc d(cnt+src); setag;
compare;
letc d(dst+cnt) d( cnt+src);
write; }
/* carry propagation */
for (; cnt < dst_len; cnt++)
{
letm d(dst+cnt) d( c_bit) ; letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letmc d(dst+cnt) d( c_bit); setag; compare;
letc d(c_bit);
write;
} return 0;
} void neighbour (int SrcPos,int DstPos,int NbrName)
{
int i;
/* clear destination */
letm dseq(DstPos, DstPos+PIXELSIZE-1); setag;
letc ;
write ;
for (i = 0; i < PIXELSIZE ; i++)
{ letmc d(SrcPos + i) ; setag;
compare;
switch (NbrName)
{
case 0:
break;
case 1 :
shiftag(1) ; break;
case 2:
shiftag(-1);
break;
case 3:
shiftag(128);
break;
case 4:
shiftag(-128);
break;
} letmc d(DstPos + i) ;
write ;
} return;
}
void fifoxchng (int SrcPos,int DstPos,int NbrName)
{
int i;
/* clear destination */
letm dseq(DstPos, DstPos+PIXELSIZE-1); setag;
letc ;
write ;
for (i = 0; i < PIXELSIZE ; i++)
{ letmc d(SrcPos+ i) ; setag;
compare;
switch (NbrName)
{
case 0:
break;
case 1 :
shiftag(1) ;
break;
case 2:
shiftag(-1);
break;
case 3:
shiftag(128);
break;
case 4:
shiftag(-128);
break;
} letmc d(DstPos + PIXELSIZE-1-i) ;
write ;
} return;
} shift_2 ( int pos , int len , int res , int shift_factor)
{
int i ;
/* clean the result operand */ letm dseq (res, res+ len + shift_factor-1) ; setag;
letc ;
write ;
/* copy the operand from pos to res
+shift factor using the tag register */ for ( i = 0; i < len ; i++ )
{
letmc d( pos + i) ; setag ;
compare ; letmc d( res + shift_factor + i); write ;
}
return 0 ;
} sign (int pos , int len , int res , int temp )
{
int i ;
/* clean the result operand */ letm dseq(res, res+ len-1) d(temp); setag; letc ;
write ;
/* copy the operand from pos to res +shift_factor using the tag register */ for ( i = 0; i < len ; i++ )
{
letm d(pos + i) d(temp); setag ; letc d(pos + i) ;
compare ;
letmc d( res + i) d(temp) ;
write ; letm d( pos +i) d(temp); setag; letc d(temp);
compare ;
letmc d(res+i);
write ; }
return 0 ; }
APPENDIX F
/* Sobel filter demonstration
*/
#define Sx 0
#define Sy 12
#define Pixel 24
#define PIXELSIZE 8
#define Temp 32
#define TempNig 40
#define TRESHOLD 44
#define GT 56 shift_2 ( int pos , int len , int res , int shift_factor); sign (int pos , int len , int res , int temp );
add (int src , int src_len, int dst , int dst_len,int c_bit);
void neighbour (int SrcPos,int DstPos,int len , int
NbrName);
int sqr(int src, int len, int dst, int cont);
sub (int dst , int src, int len , int c_bit);
abs (int src, int len, int temp, int flag);
int cut_at_treshold (int treshold);
int histogram(int src);
cal_angel (int x, int y, int result);
main (int , char *argv[ ])
{
int j=0 ;
int i;
printf ("Running Asp Simulator : %s \n\n", argv[0]);
/* reset the array */ letm dseq (0,23) ; letc ; setag; write; letm dseq (24,47); letc ; setag; write; letm dseq (48,71); letc ; setag; write; load (8L,64L);
for (i = 0; i < PIXELSIZE; i++)
{
letmc d(71-i) ; setag; compare;
letmc d(Pixel+i); write;
}
neighbour (Pixel, Sx+1, PIXELSIZE, 1);
/* Sx = 2* A[1] */
neighbour (Pixel, Temp, PIXELSIZE, 2);
/* Temp = A[2] */
add (Temp, PIXELSIZE, Sx+1,
PIXELSIZE, Sx+1+PIXELSIZE ); /* Sx = 2*A[1]+2*A[2] */ neighbour (Pixel, Sy+1, PIXELSIZE, 3);
/* Sy = 2* A[3] */
neighbour (Pixel, Temp, PIXELSIZE, 4);
/* Temp = A[4] */
add (Temp, PIXELSIZE, Sy+1, PIXELSIZE,
Sy+1+PIXELSIZE ); /* Sx =
2*A[3]+2*A[4] */
/* Calculate the fixed part of the filter */ /* 1 0 1 */
/ * temp = 0 0 0 */
/ * 1 0 1
*/ neighbour (Pixel, TempNig, PIXELSIZE, 5);
/* Temp = A[5] */ for ( j = 6 ; j <= 8 ; j++)
{
/* copy the neighbour */
neighbour (Pixel, Temp, PIXELSIZE, j);
add (Temp, PIXELSIZE, TempNig, PIXELSIZE+2, TempNig+PIXELSIZE+2);
} sub (Sx, TempNig, 11, Sx+11);
sub (Sy, TempNig, 11, Sy+11);
cal_angel (Sx+11, Sy+11, Pixel); histogram (Pixel);
abs (Sx, 12, Temp, Temp+1);
abs (Sy, 12, Temp, Temp+1);
//histogram(Sx);
add(Sy, 11, Sx, 11, Sx+12);
cut_at_treshold(60);
letmc d(Pixel) d(Pixel+1); setag; compare; letmc d(5); write;
letm d(Pixel) d(Pixel+1); letc
d(Pixel); setag; compare;
letmc d(3);write;
letm d(Pixel) d(Pixel+1); letc
d(Pixel+1); setag; compare;
letmc d(4); write;
letm d(Pixel) d(Pixel+1); letc; setag; compare; letmc d(2); write;
histogram(0);
letm dseq (64, 71); letc; setag; write;
for (i = 0 ; i< 8 ; i++)
{
letmc d(Sx+i); setag; compare;
letmc d(71-i); write;
} save (8L, 64L);
return 0;
} cal_angel (int x,int y,int result)
{
letm dseq (result, result+1); letc ; setag ; write;
/* Sx and Sy are negative */ letmc d(x) d(y) ; setag; compare;
letmc dvar (result, result+1, 2) ; write;
/* Sx is negative and Sy positive: second quadrant */
letm d(x) d(y); letc d(x); setag; compare;
letm dseq (result, result+1); letc
dvar (result, result+1, 3); write;
/* Sy is negative and Sx positive: fourth quadrant */
letm d(x) d(y); letc d(y); setag; compare;
letm dseq (result, result+1); letc
dvar (result, result+1, 1); write; return 0 ;
} abs (int src, int len, int temp, int flag)
{
int i ;
int sign; sign = src+len-1;
/* clean the result operand */ letm d(temp) d(flag);
letc d(temp); setag ; write ;
/* copy the operand from pos to res+shift_factor using the tag register */
for ( i = 0; i < len-1 ; i++ )
{
/* if flag is high then invert the bits */
letmc d(src + i) d(flag) d(sign) d(temp); setag ; compare;
letc d(flag) d(sign) ; write;
letm d(sign) d(flag) d(temp) d(src+i); letc d(sign) d(flag) d(temp) ; setag ; compare;
letmc d(src+i) ;write;
letmc d(temp); setag; write;
/* find the first bit that is high */
letmc d(src+i) d(sign) ; setag; compare; letmc d(flag); write;
}
letm d(sign) ; letc; setag ; write;
return 0 ;
}
add (int src , int src_len, int dst , int dst_len, int c_bit)
{
int cnt ;
/* clear carry bit */
letm d(c_bit) ; setag ; letc; write;
for(cnt=0; cnt<src_len; cnt++)
{
letm d(dst+cnt) d( c_bit) d( cnt+src); letc d(c_bit); setag; compare;
letc d(dst+cnt); write; letc d(dst+cnt) d( c_bit); setag; compare;
letc d(c_bit); write; letc d(dst+cnt) d( cnt+src); setag; compare; letc d(c_bit) d( cnt+src) ; write; letc d(cnt+src); setag; compare;
letc d(dst+cnt) d( cnt+src); write; }
/* carry propagation */
for (; cnt < dst_len ; cnt++)
{
letm d(dst+cnt) d( c_bit) ; letc d(c_bit); setag; compare;
letc d(dst+cnt); write; letmc d(dst+cnt) d( c_bit); setag; compare; letc d(c_bit); write;
}
return 0;
}
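The four compare/write passes inside the loop above realize, word-parallel, the full-adder truth table on (destination bit, source bit, carry). As a cross-check, here is a conventional C sketch (not from the patent; names are illustrative) of the arithmetic each word performs:

```c
/* Bit-serial addition as performed word-parallel by add():
   per bit position, update the destination bit and the carry
   flag according to the full-adder truth table. */
unsigned bit_serial_add(unsigned src, unsigned dst,
                        int src_len, int dst_len)
{
    unsigned carry = 0;                        /* the c_bit column    */
    unsigned result = dst;
    for (int cnt = 0; cnt < dst_len; cnt++) {
        unsigned s = (cnt < src_len) ? (src >> cnt) & 1u : 0u;
        unsigned d = (result >> cnt) & 1u;
        unsigned sum = d ^ s ^ carry;          /* new destination bit */
        carry = (d & s) | (carry & (d ^ s));   /* carry out           */
        result = (result & ~(1u << cnt)) | (sum << cnt);
    }
    return result;
}
```

The trailing loop of add() corresponds to the iterations with cnt >= src_len here, where only the carry ripples into the remaining destination bits.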
/* dst = dst - src */
sub (int dst , int src , int len, int c_bit)
{
int cnt ;
/* clear carry bit */
letm d(c_bit) ; setag ; letc; write;
for(cnt=0; cnt<len; cnt++)
{
letm d(dst+cnt) d( c_bit) d( cnt+src);
letc d(src+cnt) ; setag; compare; letc d(c_bit) d(dst+cnt) d(src+cnt); write; letc d(dst+cnt) d( c_bit); setag; compare; letc ; write; letc d(c_bit); setag; compare;
letc d(dst+cnt) d(c_bit); write;
letc d(src+cnt) d(dst+cnt); setag; compare; letc d(src+cnt); write;
}
return 0;
}
void neighbour (int SrcPos, int DstPos, int len , int NbrName)
{
int i;
/* clear destination */
letm dseq(DstPos , DstPos+ len - 1) ; setag ; letc ;
write ;
for (i = 0; i < len ; i++)
{ letmc d(SrcPos + i) ; setag;
compare;
letmc d(DstPos + i) ;
switch (NbrName)
{
case 0:
break;
case 1 :
shiftag(1) ;
break;
case 2:
shiftag(-1);
break;
case 3:
shiftag(128);
break;
case 4: shiftag(-128);
break;
case 5:
shiftag(127) ;
break;
case 6:
shiftag(-127);
break;
case 7:
shiftag(129);
break;
case 8:
shiftag(-129);
break;
} write ;
} return;
}
shift_2 ( int pos , int len , int res , int shift_factor)
{
int i ;
/* clean the result operand */ letm dseq(res, res+ len + shift_factor-1) ; setag;
letc ;
write ;
/* copy the operand from pos to res+shift_factor using the tag register */
for ( i = 0; i < len ; i++ ) {
letmc d( pos + i) ; setag ;
compare ; letmc d( res + shift_factor + i);
write ;
}
return 0 ;
}
sign (int pos , int len , int res , int temp )
{
int i ;
/* clean the result operand */ letm dseq(res, res+ len-1) d(temp); setag; letc ; write ;
/* copy the operand from pos to res using the tag register */
for ( i = 0; i < len ; i++ )
{
letm d(pos + i) d(temp); setag ;
letc d(pos + i) ; compare ;
letmc d( res + i) d(temp) ;write ; letm d( pos +i) d(temp); setag;
letc d(temp); compare ;
letmc d(res+i); write ;
}
return 0 ;
}
int sqr(int src, int len, int dst, int cont)
{
int i , cnt ;
/* clear the result area */
letm dseq (dst, dst+2*len); letc ; setag;
write;
for ( i = 0 ; i < len ; i++)
{
/* clear the control bit*/
letm d(cont) ; letc ; setag; write;
/* set the cont bit if the current bit is on */
letmc d(src+i); setag; compare;
letmc d (cont) ; write;
/* clear carry bit */
letm d(dst+len+i) ; setag ;
letc;
write;
for ( cnt = 0 ; cnt < len ; cnt++ )
{
letm d(dst+cnt+i) d( dst+len+i) d( cnt+src) d(cont);
letc d(dst+len+i) d(cont); setag; compare; letc d(dst+cnt+i) d(cont); write; letc d(dst+cnt+i) d( dst+len+i) d(cont); setag; compare;
letc d(dst+len+i) d(cont); write; letc d(dst+cnt+i) d( cnt+src) d(cont) ; setag; compare;
letc d(dst+len+i) d( cnt+src) d(cont); write; letc d(cnt+src) d(cont); setag; compare; letc d(dst+cnt+i) d( cnt+src) d(cont);
write;
}
}
return 0;
}
int cut_at_treshold (int treshold)
{
int cmpr(int F, int S, int size, int gt, int le, int cont);
int i;
letm dseq (TRESHOLD, TRESHOLD+11);
letc dvar (TRESHOLD, TRESHOLD+11, treshold); setag; write;
letmc d(GT+2); setag; write;
cmpr(Sx, TRESHOLD, 12, GT, GT+1, GT+2);
letmc d(GT); setag; compare; /* find all numbers that are greater than the threshold */
letm dseq (0,7) ; letc d(7) d(1); write;
printf(" the number of pixels above is = %d the threshold is = %d\n", countag, treshold);
letmc d(GT+1); setag; compare; /* find all numbers that are less than the threshold */
letm dseq(0,7) ; letc; write;
printf(" the number of pixels below is = %d the threshold is = %d\n", countag, treshold);
return 0;
}
/*********************************************************/
/* Procedure for comparison of fields.                   */
/* if F > S then gt-bit set;                             */
/* if F <= S then le-bit set; cont - candidate bit       */
/*********************************************************/
cmpr(int F, int S, int size, int gt, int le, int cont)
{
int i;
letmc d(cont); setag; compare;
letm d(gt) d(le); letc; write; /*Clear */
for (i=size-1; i>=0; i--)
{
letm d(cont) d(gt) d(le) d(F+i) d(S+i);
letc d(cont) d(F+i); setag; compare; /* 10 */ letc d(cont) d(F+i) d(gt); write;
letc d(cont) d(S+i); setag; compare; /* 01 */ letc d(cont) d(S+i) d(le); write;
}
return 0;
}
int histogram(int src)
{
int i , t;
letm dseq (src, src+7);
for (i = 0 ; i < 256 ; i++)
{
letc dvar (src, src+7, i); setag; compare; t = countag;
if (t > 0 ) printf(" color is = %d and histogram = %d \n",i,t);
}
return 0;
}
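histogram() above loops over the 256 gray levels, compares each level against the 8-bit field of every word at once, and reads the responder count. A sequential sketch of what that computes (illustrative names):

```c
/* For each gray level, count the pixels that match it; the inner
   loop stands in for one associative compare + countag. */
void assoc_histogram(const unsigned char *pixels, int n, int hist[256])
{
    for (int value = 0; value < 256; value++) {
        int t = 0;
        for (int i = 0; i < n; i++)
            if (pixels[i] == value)
                t++;
        hist[value] = t;
    }
}
```

Note that the associative version costs 256 compare cycles regardless of image size.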
APPENDIX G
/******* CURVES PROPAGATION BY ASSOCIATIVE PROCESSOR *******/
/***********************************************************/
/***************** MEMORY ORGANIZATION *********************/
/* |--|--------------|--|       */
/* |--|--------------|--|       */
/* MARK   not used    E0        */
/* bit-7           bit-0        */
/******************** MAIN PROGRAM *************************/
#define MEM_SIZE 100 /* 10 × 10 image size example */
#define WORD_LENGTH 8
#include "asslib.h"
main ( )
{
int growth;
int growth_threshold = 0;
int old_condition = 0;
int new_condition = 1;
int MARK = 7;
int E0 = 0;
int b = 10; /* long shift constant */
load;
letmc d(E0); setag; compare; /* TAG <-- E0 */
while ( (growth = (new_condition - old_condition)) > growth_threshold)
{ shiftag(-b); shiftag(-1); write; shiftag(1); write; shiftag (1); write;
setag; compare;
letc; write; /* clear E0 */
letmc d(MARK); compare;
letmc d(E0); write; /* new edges in E0 */ shiftag (1); write; shiftag (-1); shiftag (-1); write; setag; compare;
letc; write; /* clear E0 */
letmc d(MARK); compare;
letmc d(E0); write; /* new edges in E0 */ shiftag (b); shiftag (1); write; shiftag (-1); write; shiftag (-1); write;
setag; compare;
letc; write; /* clear E0 */
letmc d(MARK); compare;
letmc d(E0); write; /* new edges in E0 */
old_condition = new_condition;
new_condition = countag;
printf("%d %d\n", growth, new_condition );
}
save;
print_cycles; }
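One pass of the while loop dilates the current edge set E0 into its 8-neighbourhood (through the tag shifts) and keeps only the positions whose MARK bit allows a curve point; iteration stops when countag stops growing. A plain C sketch of one such step on a small grid (illustrative, not the asslib shift mechanics):

```c
#include <stdlib.h>

/* One propagation step: every set E0 cell plants new E0 cells in its
   8-neighbourhood, but only where MARK allows; returns the growth. */
int propagate_once(const unsigned char *mark, unsigned char *e0,
                   int w, int h)
{
    unsigned char *next = calloc((size_t)(w * h), 1);
    int grew = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            if (!e0[y * w + x]) continue;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    int nidx = ny * w + nx;
                    if (mark[nidx] && !e0[nidx] && !next[nidx]) {
                        next[nidx] = 1;   /* new edge in E0 */
                        grew++;
                    }
                }
        }
    for (int i = 0; i < w * h; i++) e0[i] |= next[i];
    free(next);
    return grew;
}
```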
APPENDIX H
/***********************************************************/
/* ASSOCIATIVE "HORN & SCHUNCK" OPTICAL FLOW (O.F.) ALGORITHM */
/***********************************************************/
/************** INPUT ***************/
/* Gray level images of n & n+1 frames. */
/************** OUTPUT **************/
/* Optical flow patterns. */
/************** PROCESS *************/
/* The memory organization: */
/*  1  1  14    14   10   10    20     18    11   11   10   8    8   */
/* |-|-|-----|-----|----|----|--------|------|----|----|----|---|---| */
/* |-|-|-----|-----|----|----|--------|------|----|----|----|---|---| */
/* mark temp  V     U    Vav  Uav   Ac      Sc    Et   Ey   Ex  En1  En */
/* En  : Gray level of frame n.                                          */
/* En1 : Gray level of frame n+1.                                        */
/* Ex  : Derivative estimation in X direction.                           */
/* Ey  : Derivative estimation in Y direction.                           */
/* Et  : Time rate of change in intensity.                               */
/* Sc  : Accumulator to compute the scale factor D where                 */
/*       D = alfa*alfa + Ex*Ex + Ey*Ey.                                  */
/*       It is also used to store Ex/D and Ey/D.                         */
/* Ac  : Accumulator to compute Ex*Uav + Ey*Vav + Et.                    */
/* Uav : Local average of U. (It also computes Ex/D at first iteration)  */
/* Vav : Local average of V. (It also computes Ey/D at first iteration)  */
/* U   : X component of O.F. (Also used as intermediate results storage) */
/* V   : Y component of O.F. (Also used as intermediate results storage) */
/* temp, mark : Flags.                                                   */
/******************** MAIN PROGRAM *************************/
#define MEM_SIZE 64 /* 8 × 8 image size */
#define WORD_LENGTH 137
#include "asslib.h"
/**** FIRST BIT POSITIONS DEFINITION ****/
#define En 0
#define En1 8
#define Ex 16
#define Ey 27
#define Et 38
#define Sc 49
#define Ac 67
#define Uav 87
#define Vav 97
#define U 107
#define V 121
#define temp 135
#define mark 136
#define b 8 /* Long shift length */
main ( )
{
int bit_count, index, i, cy, Ey_sqr, iterations;
int alfa = 1;
int no_of_iterations=1;
load;
/*** COMPUTE Ex Ey & Et ***/
/* We compute the derivatives by shift and add according to the */
/* following table:                                             */
/* | DATA           | Ex | Ey | Et | Order Of Execution |       */
/* |----------------|----|----|----|--------------------|       */
/* | Ei+1, j+1, n+1 | +  | +  | +  | 1                  |       */
/* | Ei+1, j+1, n   | +  | +  | -  | 2                  |       */
/* | Ei+1, j,   n   | -  | +  | -  | 3                  |       */
/* | Ei+1, j,   n+1 | -  | +  | +  | 4                  |       */
/* | Ei,   j,   n   | -  | -  | -  | 5                  |       */
/* | Ei,   j,   n+1 | -  | -  | +  | 6                  |       */
/* | Ei,   j+1, n   | +  | -  | -  | 7                  |       */
/* | Ei,   j+1, n+1 | +  | -  | +  | 8                  |       */
/* |----------------|----|----|----|--------------------|       */
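The table assigns each of the eight cube corners E(i..i+1, j..j+1, n..n+1) a sign per derivative: + on the far column for Ex, the far row for Ey, and frame n+1 for Et (the Horn and Schunck first-difference estimates, up to the constant 1/4 factor). A C sketch of the sums the eight add/sub passes accumulate:

```c
/* Signed eight-corner sums per the table; E[di][dj][dn] with
   di = row offset, dj = column offset, dn = frame offset. */
void cube_derivatives(const int E[2][2][2], int *Ex, int *Ey, int *Et)
{
    *Ex = *Ey = *Et = 0;
    for (int di = 0; di < 2; di++)
        for (int dj = 0; dj < 2; dj++)
            for (int dn = 0; dn < 2; dn++) {
                int v = E[di][dj][dn];
                *Ex += dj ? v : -v;   /* + on column j+1 */
                *Ey += di ? v : -v;   /* + on row i+1    */
                *Et += dn ? v : -v;   /* + on frame n+1  */
            }
}
```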
/*** shift up fields En & En1 b+1 places ***/
shift_fields(En, 16, b+1, -1);
/*** perform op. 1 ***/
printf ("op1\n");
/** write field En1 into Ex Ey & Et **/
for (bit_count=0; bit_count<8; bit_count++)
{
letm d(En1+bit_count) d(Ex+bit_count) d(Ey+bit_count) d(Et+bit_count);
letc d(En1+bit_count); setag; compare;
letc d(En1+bit_count) d(Ex+bit_count) d(Ey+bit_count) d(Et+bit_count);
write;
}
/*** perform op. 2 ***/
printf ("op2\n");
add (En, Ex, 8, 3, mark);
add (En, Ey, 8, 3, mark);
sub (En, Et, 8, 3, mark);
/*** perform op. 3 & op. 4 ***/
printf ("op3,4\n");
shift_fields (En, 16, 1, 1); /* shift En & En1 one place down */
sub (En, Ex, 8, 3, mark);
add (En, Ey, 8, 3, mark);
sub (En, Et, 8, 3, mark);
sub (En1, Ex, 8,3, mark);
add (En1, Ey, 8, 3, mark);
add (En, Et, 8, 3, mark);
/*** perform op. 5 & op. 6 ***/
printf ("op5,6\n");
shift_fields (En, 16, b, 1); /* shift En & En1 b places down */
sub (En, Ex, 8, 3, mark);
sub (En, Ey, 8, 3, mark);
sub (En, Et, 8, 3 , mark);
sub (En1, Ex, 8, 3, mark);
sub (En1, Ey, 8, 3, mark);
add (En, Et, 8, 3, mark);
/*** perform op. 7 & op. 8 ***/
printf ("op7,8\n");
shift_fields (En, 16, 1, -1); /* shift En & En1 1 place up */
add (En, Ex, 8, 3, mark);
sub (En, Ey, 8, 3, mark);
sub (En, Et, 8, 3, mark);
add (En1, Ex, 8, 3, mark);
sub (En1, Ey, 8, 3, mark);
add (En, Et, 8, 3, mark);
shift_fields (En, 16, 1, 1); /* shift back */
/******* convert Ex Ey & Et to absolute value and sign *******/
convert_to_abs_and_sign (Ex+2, 9);
convert_to_abs_and_sign (Ey+2, 9);
convert_to_abs_and_sign (Et+2, 9);
printf ("Computing D\n");
/*********** Compute D = alfa*alfa+Ex*Ex+Ey*Ey ***********/
/*** 1. Compute alfa*alfa+Ex*Ex by look-up-table, into field "Sc" ***/
/*** 2. Compute Ey*Ey by look-up-table and sum accumulate in "Sc" ***/
printf ("ok\n");
/** Step 1 **/
for(index=0; index<256; index++)
{
letm dseq(Ex+2, Ex+9); letc dvar (Ex+2, Ex+9, index); setag; compare;
letm dseq(Sc, Sc+17); letc
dvar (Sc, Sc+17, alfa*alfa+index*index);
write;
}
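Step 1 replaces multiplication with a 256-entry broadcast table: each possible 8-bit magnitude is compared against the Ex field of all words at once, and the matching words write alfa*alfa + index*index into Sc. A sequential C sketch of the same idea (illustrative names):

```c
/* Look-up-table squaring: 256 broadcast/compare/write rounds,
   independent of the number of words. */
void lut_square_step(const unsigned char *ex, int n, int *sc, int alfa)
{
    for (int index = 0; index < 256; index++)
        for (int i = 0; i < n; i++)
            if (ex[i] == index)                      /* compare round */
                sc[i] = alfa * alfa + index * index; /* masked write  */
}
```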
/** Step 2 **/
printf ("ok\n");
for(index=0; index<256; index++)
{
/** Compute Ey*Ey in Ac **/
Ey_sqr = index*index;
letm dseq(Ey+2,Ey+9); letc dvar (Ey+2, Ey+9, index);
setag; compare;
letm dseq(Ac, Ac+16); letc dvar (Ac, Ac+16, Ey_sqr);
write;
}
add (Ac, Sc, 16, 1, mark);
/**************** COMPUTE Ex/Sc and Ey/Sc ****************/
/* To compute Ex/Sc (or Ey/Sc), Ex (Ey) is moved to an unused buffer */
/* (Vav+Uav). This buffer, called "D", is used to subtract Ex from   */
/* Sc in the division process. The results are saved in "U" ("V").   */
/* It should be noted that the Ex (Ey) value starts at bit position  */
/* Ex+1 (Ey+1). The sign bit is in bit position Ex+9 (Ey+9).         */
/* The division process will be carried out only on the magnitude,   */
/* and the sign will be copied to the suitable position at the end.  */
/***** compute Ex/Sc and Ey/Sc *****/
div(Ex, U); /* Compute Ex/Sc into U */
div(Ey, V); /* Compute Ey/Sc into V */
/** Move Ex/Sc (Ey/Sc) magnitude and sign into the first (last) 9 bits of Sc */
letm dseq(Sc, Sc+17); letc; setag; write; /* clear Sc field */
for (bit_count=0; bit_count<8; bit_count++)
{
letmc d(U+bit_count); setag; compare; /** move magnitude of Ex/Sc **/
letmc d(Sc+bit_count); write;
letmc d(V+bit_count); setag; compare; /** move magnitude of Ey/Sc **/
letmc d(Sc+bit_count+9); write;
}
letmc d(Ex+9); setag; compare; /* move sign of Ex */
letmc d(Sc+8); write;
letmc d(Ey+9); setag; compare; /* move sign of Ey */
letmc d(Sc+17); write;
/*************** ITERATIVE STEP ***************/
for (iterations=0; iterations<no_of_iterations;
iterations++)
{
/** Computing Uav and Vav **/
laplace (U, Uav); /* convolve U and output to Uav */
laplace (V, Vav); /* convolve V and output to Vav */
/** Computing Ex*Uav+Ey*Vav+Et in field Ac **/
/*********************************************/
letm dseq(U, U+17) dseq (Ac, Ac+19); letc; setag; write; /* clear working fields */
/* 1. Copy Et with sign extension into field Ac. */
for(i=0; i<9; i++)
{
letmc d(Et+2+i); setag; compare;
letmc d(Ac+i); write;
}
letmc dseq(Ac+10, Ac+19); write;
/* 2. Computing Ex*Uav in the working field U */
convert_to_abs_and_sign (Uav, 9); /* express Uav as a mag. & sign. */
mul (Ex, 9, Uav, 9, U, mark);
/*perform U<╌Ex*Uav (product is given as mag. & sign). */
convert_to_abs_and_sign (U, 17); /*express the product as 2's complement*/
/* 3. Perform Ac <-- Ac+U = Et+Ex*Uav */
add (U, Ac, 17, 2, mark);
/* 4. Computing Ey*Vav in the working field U */
convert_to_abs_and_sign (Vav, 9); /* express Vav as a mag. & sign. */
mul (Ey, 9, Vav, 9, U, mark);
/*perform U<╌Ey*Vav (product is given as mag.& sign). */
convert_to_abs_and_sign (U, 17); /*express the product as 2's complement*/
/* 5. Sum the products in Ac */
add (U, Ac, 17, 2, mark);
/* 6. Express results as mag. & sign */
convert_to_abs_and_sign (Ac, 19);
/* 7. Convert back Uav and Vav to 2's complement */
convert_to_abs_and_sign (Uav, 9);
convert_to_abs_and_sign (Vav, 9);
/** COMPUTING U COMPONENT OF MOTION **/
letm dseq(U, U+27); letc; setag; write; /* clear working field */
mul (Sc, 9, Ac, 20, U, mark); /* U <-- Sc(x)*Ac */
sub (U+20, Uav, 9, 0, mark);  /* Uav <-- Uav - Sc(x)*Ac; save results in Uav
                                 (rounded result starts at U+20) */
/** COMPUTING V COMPONENT OF MOTION **/
letm dseq (U, U+27); letc; setag; write; /* clear working field */
mul (Sc+9, 9, Ac, 20, V, mark); /* V <-- Sc(y)*Ac */
sub(U+20, Vav, 9, 0, mark);     /* Vav <-- Vav - Sc(y)*Ac; save results in Vav */
/* Move -(-U) and -(-V) to "U" and "V" */
letm dseq(U,U+27); letc; setag; write; /* clear "U" and "V"*/
sub (Uav, U, 9, 0, mark);
sub (Vav, V, 9, 0,mark);
}
save; print_cycles;
}
/******* CONVERT TO ABSOLUTE AND SIGN ********/
/* This procedure converts "nc" bits starting at bit position "field" */
/* to absolute value and sign, only for words marked by the sign bit  */
convert_to_abs_and_sign (field, nc)
int field, nc;
{
int sign, bit_count;
bit_count = field;
sign = field + nc - 1;
letm d(temp); letc; setag; write; /* clear temp */
letmc d(mark); setag; compare;
letm d(temp); write;
letm d(mark) d(temp) d(sign) d(bit_count); while (bit_count < sign)
{
letc d(mark) d(sign) d(bit_count); setag; compare; letc d(mark) d(temp) d(sign) d(bit_count); write; bit_count++ ; letm d(mark) d(temp) d(sign) d(bit_count);
letc d(mark) d(temp) d(sign); setag; compare;
letc d(mark) d(sign) d(bit_count); write;
letc d(mark) d(temp) d(sign) d(bit_count); setag; compare;
letc d(mark) d(temp) d(sign); write;
}
}
/************ ADD ***************/
/* This procedure adds "nc" bits of the field starting at bit position "so" */
/* into the field starting at bit position "de",                            */
/* and propagates the carry "pc" bits.                                      */
/* The addition process is carried out only on candidate words              */
/* marked by "mr".                                                          */
add (so, de, nc, pc, mr)
int so, de, nc, pc, mr;
{
int bit_count;
for (bit_count= 0; bit_count<nc; bit_count++)
{
letm d(mr) d(de+bit_count) d(temp) d(so+bit_count); letc d(mr) d(temp); setag; compare;
letc d(mr) d(de+bit_count); write; letc d(mr) d (de+bit_count) d(temp); setag; compare; letc d(mr) d(temp) ; write; letc d(mr) d(de+bit_count) d(so+bit_count); setag; compare;
letc d(mr) d(temp) d(so+bit_count); write; letc d(mr) d(so+bit_count); setag; compare;
letc d(mr) d(de+bit_count) d(so+bit_count); write;
}
/*** clear "temp" ***/
letm d(temp); letc; setag; write;
/****** propagate carry ******/
for (; bit_count< nc+pc; bit_count++)
{
letm d(mr) d(de+bit_count) d(temp);
letc d(mr) d(temp); setag; compare;
letc d(mr) d(de+bit_count); write;
letc d(mr) d(de+bit_count) d(temp); setag; compare;
letc d(mr) d(temp); write;
}
}
/*********************** SUB ***********************/
/* This procedure subtracts the "so" field from "de", each of length "nc", */
/* and outputs to "de". The last bit propagates "pb" bits.                 */
/* The subtraction is carried out only on candidate words marked by "mr"   */
sub (so, de, nc, pb, mr)
int so, de, nc, pb, mr;
{
int bit_count;
letm d(temp); letc; setag; write; /* clear temp */
for (bit_count=0; bit_count<nc; bit_count++)
{
letm d(mr) d(so+bit_count) d(de+bit_count) d(temp); letc d(mr) d(so+bit_count); setag; compare;
letc d(mr) d(so+bit_count) d(de+bit_count) d(temp); write; letc d(mr) d(de+bit_count) d(temp); setag;
compare;
letc d(mr); write; letc d(mr) d(so+bit_count) d(de+bit_count); setag; compare;
letc d(mr) d(so+bit_count); write; letc d(mr) d(temp); setag; compare;
letc d(mr) d(de+bit_count) d(temp); write;
}
/** propagate borrow **/
for(; bit_count<nc+pb; bit_count++)
{
letmc d(mr) d(de+bit_count) d(temp); setag;
compare;
letc d(mr); write;
letc d(mr) d(temp); setag; compare;
letc d(mr) d(de+bit_count) d(temp); write;
}
}
/************** DIVISION ****************/
/* This function computes en/Sc and outputs the result to rs */
div (en, rs)
int en, rs;
{
int nc=8; /* result field length */
int D = 87;
int bit_count, le;
/* clear working fields */
letm dseq(D,D+17) dseq(rs, rs+nc-1); letc; setag;
write;
/* move magnitude of en to D+9+1 */
/* (Moving to D+9 expands the numerator word length (8-bit) to the
   denominator's (18-bit). Because the numerator's value is less
   than the denominator's, it is shifted one bit more to the left.) */
for (bit_count=0; bit_count<nc; bit_count++)
{
letmc d(en+2+bit_count); setag; compare;
letmc d(D+10+bit_count); write;
}
/* Perform rs <╌ D/Sc */
for (bit_count=0; bit_count<nc; bit_count++)
{
le = rs+nc-1-bit_count;
compare_field (Sc, D, nc+1, le, temp, mark);
letm d(temp) d(le); letc; setag; compare;
letmc d(le); write;
sub (Sc, D, nc, 1, le);
shift_left(D, nc+1);
}
}
/**************** COMPARE FIELD ****************/
/* This function tests field f1 against f2, each of length nc.        */
/* lt is a condition flag to indicate if f1 < f2, while gt indicates  */
/* if f1 > f2. Comparison is carried out only on words marked by "mr" */
compare_field(f1, f2, nc, lt, gt, mr)
int f1, f2, nc, lt, gt, mr;
{
int bit_count;
letm d(gt) d(lt); letc; setag; write;
/* clear greater & less than flags */
for (bit_count=nc-1; bit_count>=0; bit_count--)
{
letm d(mr) d(gt) d(lt) d(f1+bit_count)
d(f2+bit_count);
letc d(mr) d(f1+bit_count); setag; compare;
letc d(mr) d(gt) d(f1+bit_count); write;
letc d(mr) d(f2+bit_count); setag; compare;
letc d(mr) d(lt) d(f2+bit_count); write;
}
}
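Taken together, div() drives compare_field and sub as a restoring divider: nc quotient bits are produced most-significant first, and the divisor is subtracted only where it does not exceed the shifting remainder. A C sketch of the integer analogue (the patent's version instead produces fractional bits, since the numerator is pre-shifted below the denominator):

```c
/* Restoring division, MSB first: set a quotient bit whenever the
   scaled divisor fits into the remainder, then subtract it. */
unsigned restoring_div(unsigned num, unsigned den, int nc)
{
    unsigned q = 0, rem = num;
    for (int bit = nc - 1; bit >= 0; bit--) {
        unsigned trial = den << bit;
        if (trial <= rem) {   /* compare_field outcome */
            rem -= trial;     /* sub */
            q |= 1u << bit;   /* write the quotient bit */
        }
    }
    return q;
}
```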
/*********** SHIFT LEFT ***********/
/* This function shifts the operand "op" of size "nc" one position to the left */
shift_left (op, nc)
int op, nc;
{
int bit_count;
for (bit_count=op+nc-2; bit_count>=op; bit_count--)
{
letmc d(bit_count); setag; compare; letc; write;
letmc d(bit_count+1); write;
}
}
/*** shift fields ***/
/*** This procedure shifts "nc" bits starting at bit-position "s0" ***/
/*** "pl" places down if dir=1 or up if dir=-1 ***/
shift_fields (s0, nc, pl, dir)
int s0, nc, pl, dir;
{
int bit_count, shift_count, long_shift_count, short_shift_count;
long_shift_count = pl/b;
short_shift_count = pl - long_shift_count*b;
for (bit_count=s0; bit_count<s0+nc; bit_count++)
{
letm d(bit_count);
letc d(bit_count); setag; compare;
letc; write;
letc d(bit_count);
for(shift_count=0; shift_count<long_shift_count; shift_count++)
shiftag (dir*b);
for (shift_count=0; shift_count<short_shift_count; shift_count++)
shiftag (dir);
write;
}
}
/*********** Estimating the Laplacian ***********/
/* This is done by convolving U and V with the kernel
     1/4 1/2 1/4
     1/2  0  1/2
     1/4 1/2 1/4
*/
laplace (in, out)
int in, out;
{
int bit_count, shift_index;
letm dseq (out, out+9); letc; setag; write; /* clear output field */
shift_fields (in, 9, 1, 1);  /* shift "in" field one place down */
add(in+1, out, 8, 2, mark);  /* Compute The West term */
shift_fields(in, 9, 2, -1);  /* shift "in" field two places up */
add(in+1, out, 8, 2, mark);  /* Compute The East term */
shift_fields(in, 9, b, 1);   /* shift "in" field b places down */
add(in+2, out, 7, 3, mark);  /* Compute The S-E term */
shift_fields (in, 9, 1, -1); /* shift "in" field one place up */
add (in+1, out, 8, 2, mark); /* Compute The S term */
shift_fields (in, 9, 1, -1); /* shift "in" field one place up */
add(in+1, out, 8, 2, mark);  /* Compute The S-W term */
shift_fields(in, 9, b, -1);  /* shift "in" field b places up */
add (in+2, out, 7, 3, mark); /* Compute The N-W term */
shift_fields (in, 9, 1, 1);  /* shift "in" field one place down */
add (in+1, out, 8, 2, mark); /* Compute N term */
shift_fields(in, 9, 1, 1);   /* shift "in" field one place down */
add(in+2, out, 7, 3, mark);  /* Compute NE term */
}
/***************************** MUL *****************************/
/* This function performs integer multiplication of multiplicand      */
/* "mcd" of length "s_mcd" bits by multiplier "mlr" of length "s_mlr" */
/* bits, and outputs the product into "prd", for all words marked by  */
/* "mr".                                                              */
mul (mcd, s_mcd, mlr, s_mlr, prd, mr)
int mcd, s_mcd, mlr, s_mlr, prd, mr;
{
int add_offset, bit_count, f_index;
for (add_offset=0; add_offset<s_mlr; add_offset++)
/** add and propagate s_mlr-add_offset positions, only where mlr+add_offset is set. **/
add (mcd, prd, s_mcd, s_mlr-add_offset, mlr+add_offset);
/**** sign calculation *****/
letmc d(mcd+s_mcd); setag; compare;
letmc d(prd+s_mcd+s_mlr-1); write;
letmc d(mlr+s_mlr); setag; compare;
letmc d(prd+s_mcd+s_mlr-1); write;
}
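mul() thus forms the product as a sum of partial products: for each multiplier bit position the multiplicand is added at that offset, with the multiplier bit itself (mlr+add_offset) acting as the mark that enables the add in each word; the product's sign bit is then set from either operand's sign. A C sketch of the per-word magnitude arithmetic:

```c
/* Shift-and-add multiplication of magnitudes: each set multiplier
   bit contributes the multiplicand shifted by its position. */
unsigned shift_add_mul(unsigned mcd, unsigned mlr, int s_mlr)
{
    unsigned prd = 0;
    for (int add_offset = 0; add_offset < s_mlr; add_offset++)
        if ((mlr >> add_offset) & 1u)  /* word tagged by this bit */
            prd += mcd << add_offset;  /* add() at this offset    */
    return prd;
}
```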
APPENDIX I
/*
--------------------------------------------------------
yyuv.c
This program demonstrates the RGB -> YUV operation
and displays the Y component of the 128 × 128 image.
--------------------------------------------------------
*/
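The Y component the program builds with three look-up tables and two adds is the standard weighted sum; a direct C sketch of the per-pixel value (truncating each term exactly as the dvar writes below do):

```c
/* Y = 0.299*R + 0.587*G + 0.114*B, each term truncated to an
   integer before summing, as in the three table loops. */
int rgb_to_y(int r, int g, int b)
{
    return (int)(0.299 * r) + (int)(0.587 * g) + (int)(0.114 * b);
}
```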
#define Pixel 0
#define PIXELSIZE 8
#define PixelSign Pixel + PIXELSIZE
#define Yr 0
#define Yg Yr+8
#define Yb Yg+8
#define RedPixel 24
#define GreenPixel RedPixel+8
#define BluePixel GreenPixel+8
addb(int src , int src_len, int dst , int dst_len, int c_bit);
add(int src , int src_len, int dst , int dst_len, int c_bit);
enum color {RED, GREEN, BLUE };
main(int argc, char *argv[ ])
{
int j, i ;
int color ;
confifo (0);
letm dseq (0, 23); letc; setag; write;
letm dseq (24, 47); write;
letm dseq (48, 71); write;
printf ("Running Asp Simulator : %s \n\n", argv [0]);
load (24L, 48L);
for ( i = 0 ; i < 24 ; i++)
{
letmc d(71-i) ; setag; compare;
letmc d(24+i); write;
}
for (color = 0 ; color < 256 ; color++ )
{
letm dseq ( RedPixel, RedPixel+PIXELSIZE-
1);
letc dvar (RedPixel, RedPixel+PIXELSIZE- 1, color); setag;
compare; letm dseq ( Yr, Yr+PIXELSIZE-1);
letc dvar ( Yr, Yr+PIXELSIZE-1, (int) (0.299 * color));
write;
}
for (color = 0 ; color < 256 ; color++ )
{
letm dseq(
GreenPixel, GreenPixel+PIXELSIZE-1);
letc dvar (GreenPixel, GreenPixel+PIXELSIZE-1, color); setag;
compare;
letm dseq ( Yg, Yg+PIXELSIZE-1);
letc dvar (Yg, Yg+PIXELSIZE-1, (int) (0.587 * color));
write;
}
for (color = 0 ; color < 256 ; color++ )
{
letm dseq(
BluePixel,BluePixel+PIXELSIZE-1);
letc dvar
(BluePixel, BluePixel+PIXELSIZE-1, color); setag;
compare; letm dseq ( Yb, Yb+PIXELSIZE-1);
letc dvar ( Yb, Yb+PIXELSIZE-1, (int) (0.114 * color));
write;
}
addb(Yg, PIXELSIZE, Yr, PIXELSIZE, 8);
add (Yb, PIXELSIZE, Yr, 9, 9);
letm dseq (48, 71); letc; setag; write;
for (i= 0 ; i< 8 ; i++)
{
letmc d(i); setag; compare;
letmc d(55-i) d(63-i) d(71-i); write;
}
save(24L, 48L);
return 0;
}
addb(int src , int src_len, int dst , int dst_len, int c_bit)
{
int cnt ;
/* clear carray bit */
// letm d(c_bit) ; setag ;
// letc;
// write;
letm d(dst) d( src);
letc d(src); setag;
compare;
letc d(dst);
write; letc d(dst) d( src); setag;
compare;
letc d(c_bit);
write; for(cnt=1; cnt<src_len; cnt++)
{
letm d(dst+cnt) d( c_bit) d( cnt+src); letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letc d(dst+cnt) d( c_bit); setag;
compare;
letc d(c_bit);
write; letc d(dst+cnt) d( cnt+src); setag;
compare; letc d(c_bit) d( cnt+src) ;
write; letc d(cnt+src); setag;
compare;
letc d(dst+cnt) d( cnt+src);
write; }
/* carry propagation */
for (; cnt < dst_len ; cnt++)
{
letm d(dst+cnt) d( c_bit) ;
letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letmc d(dst+cnt) d( c_bit); setag;
compare;
letc d(c_bit);
write;
}
return 0;
} add(int src , int src_len, int dst , int dst_len, int c_bit)
{
int cnt ;
/* clear carry bit */
letm d(c_bit) ; setag ;
letc;
write; for(cnt=0; cnt<src_len; cnt++)
{
letm d(dst+cnt) d( c_bit) d( cnt+src); letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letc d(dst+cnt) d( c_bit); setag;
compare;
letc d(c_bit);
write; letc d(dst+cnt) d( cnt+src); setag; compare;
letc d(c_bit) d( cnt+src) ;
write; letc d(cnt+src); setag;
compare;
letc d(dst+cnt) d( cnt+src);
write; }
/* carry propagation */
for (; cnt < dst_len ; cnt++)
{
letm d(dst+cnt) d( c_bit) ;
letc d(c_bit); setag;
compare;
letc d(dst+cnt);
write; letmc d(dst+cnt) d( c_bit); setag;
compare;
letc d(c_bit);
write;
}
return 0;
}
shift_2 ( int pos , int len , int res , int shift_factor)
{
int i ;
/* clean the result operand */ letm dseq (res, res+ len + shift_factor-1) ; setag;
letc ;
write ;
/* copy the operand from pos to res+shift_factor using the tag register */
for ( i = 0; i < len ; i++ )
{
letmc d( pos + i) ; setag ;
compare ; letmc d( res + shift_factor + i);
write ;
}
return 0 ;
}
APPENDIX J
/* ASSOCIATIVE PROCESSOR IMPLEMENTATION, TO FIND   */
/* LINE DIRECTIONS IN A 9X9 NEIGHBORHOOD.          */
/***************************************************/
/*************** MEMORY ORGANIZATION ***************/
/* WE ORGANIZE THE MEMORY AS FOLLOWS:                                   */
/* |--|--|-------------------------------|------------------------|--| */
/* |--|--|-------------------------------|------------------------|--| */
/*  M  T  G23 G22 . . . G1 G0             L15 L14 . . . L1 L0      D   */
/*  A  E                                                                */
/*  R  M                                                                */
/*  K  P                                                                */
/* BIT 0 IS THE EDGE FLAG "D" (D = 1: EDGE AT THIS POINT)        */
/* BITS 1-16 ARE THE 16 ADDRESS BITS OF THE LINES PRESENTED IN   */
/* TABLE 1.                                                      */
/* BITS 17-40 ARE THE 24 BITS OF THE REDUCTION LOGIC.            */
/* BIT 41 IS TEMP (TEMPORARY)                                    */
/* BIT 42 IS MARK                                                */
/*****************************************************************/
/********************* MAIN PROGRAM ******************************/
#define MEM_SIZE 400 /* 20 × 20 image size example*/
#define WORD_LENGTH 43
#include "asslib.h"
#define NW2 17 /* group-0 bit position */
#define N2 18 /* group-1 bit position */
#define NE2 19 /* group-2 bit position */
#define NW1 20 /* group-3 bit position */
#define N1 21 /* group-4 bit position */
#define NE1 22 /* group-5 bit position */
#define NW 23 /* group-6 bit position */
#define N 24 /* group-7 bit position */
#define NE 25 /* group-8 bit position */
#define W2 26 /* group-9 bit position */
#define W1 27 /* group-10 bit position */
#define W 28 /* group-11 bit position */
#define E 29 /* group-12 bit position */
#define E1 30 /* group-13 bit position */
#define E2 31 /* group-14 bit position */
#define SW 32 /* group-15 bit position */
#define S 33 /* group-16 bit position */
#define SE 34 /* group-17 bit position */
#define SW1 35 /* group-18 bit position */
#define S1 36 /* group-19 bit position */
#define SE1 37 /* group-20 bit position */
#define SW2 38 /* group-21 bit position */
#define S2 39  /* group-22 bit position */
#define SE2 40 /* group-23 bit position */
main ( )
{
int b = 20; /* long shift constant */
int index, bit_count, seg_address, i1,i2;
int seg_count = 0;
int D = 0; /* edge data bit position*/
int L0 = 1;  /* first segment address bit position */
int G0 = 17; /* "G" field first bit position */
int TEMP = 41;
int MARK = 42;
static int seg[24][3] =
{
{ E, E1, E2 } , { E, NE1, E2 }, { NE, NE1, NE2 }, { NE, NE1, N2},
{ N, N1, N2 }, { N, NW1, N2 }, { NW, NW1, NW2 }, { NW, NW1, W2 },
{ W, W1, W2 }, { W, SW1, W2 }, { SW, SW1, SW2 }, { SW, SW1, S2 },
{ S, S1, S2 }, { S, SE1, S2 }, { SE, SE1, SE2 }, { SE, SE1, E2 },
{ NE, NE1, E2 }, { N, NE1, N2 }, { NW, NW1, N2 }, { W, NW1, W2 },
{ SW, SW1, W2 }, { S, SW1, S2 }, { SE, SE1, S2 }, { E, SE1, E2 }
};
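Each seg[] row names the three group bits that must all be present for one of the 24 line templates through the centre pixel; when a row matches, the corresponding address bit starting at L0 is raised. A C sketch of that test, with the word's G field modelled as a bit mask over the same positions (illustrative, not the tag machinery):

```c
#include <stdint.h>

/* Raise line-address bit "row" when all three group bits of the
   template are set in the word's G field. */
uint32_t classify_segments(uint64_t g_field, const int seg[24][3])
{
    uint32_t lines = 0;
    for (int row = 0; row < 24; row++) {
        int hit = 1;
        for (int k = 0; k < 3; k++)
            if (!((g_field >> seg[row][k]) & 1ull))
                hit = 0;
        if (hit)
            lines |= (uint32_t)1 << row;
    }
    return lines;
}
```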
load;
/********************* update "G" field **********************/
/**** CLEAR ****/
letm dseq (L0, TEMP); letc; setag; write;
/**** move "D" to "TAG" and shift 4 LINES AND 4 PIXELS DOWN ****/
letmc d(D) d(MARK); setag; compare;
for(index=0; index<4; index++)
shiftag (b);
for (index=0; index<4; index++)
shiftag (1);
/****** GROUPING ******/
letmc d(NW2); write; shiftag (-1); write;
for (index=2; index<7; index++) { letmc d(N2); shiftag (-1); write; }
letmc d(NE2); shiftag (-1); write; shiftag (-1); write; shiftag (-1);
shiftag (-b);
for (index=8; index>5; index--) { letmc d(NE2); shiftag (1); write; }
for (; index>2; index--) { letmc d(N2); shiftag (1); write; }
for (; index>-1; index--) { letmc d(NW2); shiftag (1); write; }
shiftag (-b); letmc d(W2); write;
letmc d(NW2); shiftag (-1); write;
letmc d(NW1); shiftag (-1); write;
letmc d(NW1); shiftag (-1); write;
letmc d(N1); shiftag (-1); write;
letmc d(NE1); shiftag (-1); write;
letmc d(NE1); shiftag (-1); write;
letmc d(NE2); shiftag (-1); write;
letmc d(E2); shiftag (-1); write; shiftag (-b); letmc d(E2); write;
letmc d(E2); shiftag (1); write;
letmc d(NE1); shiftag (1); write;
letmc d(NE); shiftag (1); write;
letmc d(N); shiftag (1); write;
letmc d(NW); shiftag (1); write;
letmc d(NW1); shiftag (1); write;
letmc d(W2); shiftag (1); write;
letmc d(W2); shiftag (1); write; shiftag (-b); letmc d(W2); write;
letmc d(W2); shiftag (-1); write;
letmc d(W1); shiftag (-1); write;
letmc d(W); shiftag (-1); write;
letmc d(E); shiftag (-1); shiftag (-1); write; letmc d(E1); shiftag (-1); write;
letmc d(E2); shiftag (-1); write;
letmc d(E2); shiftag (-1); write; shiftag (-b); letmc d(E2); write;
letmc d(E2); shiftag (1); write;
letmc d(SE1); shiftag (1); write;
letmc d(SE); shiftag (1); write;
letmc d(S); shiftag (1); write;
letmc d(SW); shiftag (1); write;
letmc d(SW1); shiftag (1); write;
letmc d(W2); shiftag (1); write;
letmc d(W2); shiftag (1); write; shiftag (-b); letmc d(W2); write;
letmc d(SW2); shiftag (-1); write;
letmc d(SW1); shiftag (-1); write;
letmc d(SW1); shiftag (-1); write;
letmc d(S1); shiftag (-1); write;
letmc d(SE1); shiftag (-1); write;
letmc d(SE1); shiftag (-1); write;
letmc d(SE2); shiftag (-1); write;
letmc d(E2); shiftag (-1); write; shiftag (-1);
shiftag (-b); for (index=0; index<3; index++) { letmc d(SE2);
shiftag (1); write;}
for (; index<6; index++) { letmc d(S2); shiftag (1); write;
}
for (;index<9;index++) { letmc d(SW2); shiftag (1);
write; } shiftag (-b); letmc d(SW2); write; shiftag (-1); write;
for (index=6; index>1; index--) { letmc d(S2); shiftag (-1); write; }
letmc d(SE2); shiftag (-1); write; shiftag (-1); write;
/********************* SEGMENTS CLASSIFICATION *********************/
for (seg_count=0; seg_count<24; seg_count++)
{
if (seg_count>15)
seg_address = L0 + 2*(seg_count - 15) - 1;
else
seg_address = L0 + seg_count;
/**** test if hamming_distance <= 1 ****/
/***************************************/
letm d(D) d(seg[seg_count][0]) d(seg[seg_count][1]) d(seg_address) d(MARK);
letc d(D) d(seg[seg_count] [0]) d(seg[seg_count] [1]) d(MARK);
setag; compare;
letc d(D) d(seg[seg_count][0]) d(seg[seg_count][1]) d(seg_address) d(MARK); write;
letm d(D) d(seg[seg_count][0]) d(seg[seg_count][2]) d(seg_address) d(MARK);
letc d(D) d(seg[seg_count] [0]) d(seg[seg_count] [2]) d(MARK);
setag; compare;
letc d(D) d(seg[seg_count][0]) d(seg[seg_count][2]) d(seg_address) d(MARK); write;
letm d(D) d(seg[seg_count][1]) d(seg[seg_count][2]) d(seg_address) d(MARK);
letc d(D) d(seg[seg_count] [1]) d(seg[seg_count] [2]) d(MARK);
setag; compare;
letc d(D) d(seg[seg_count] [1]) d(seg[seg_count] [2]) d(seg_address) d(MARK); write;
}
/** process segments **/
/**********************/
for (i1=L0; i1<L0+16; i1+=2)
{
i2 = i1 - 1;
letmc d(i2<L0 ? 15 : i2) d(i1) d(i1+1) d(MARK); setag; compare;
letc d(i1) d(MARK); write;
letmc d(i2<L0 ? 15 : i2) d(i1) d(MARK); setag; compare;
letc d(i1) d(MARK); write;
letmc d(i1) d(i1+1) d(MARK); setag; compare; letc d(i1) d(MARK); write;
}
/** set TEMP if lines or corners were found **/
/*********************************************/
for (i1=L0; i1<L0+16; i1++)
    for (i2=L0; i2<L0+16; i2++)
        if (i1 != i2)
{
letm dseq(L0,L0+15) d(TEMP) d(MARK);
letc d(i1) d(i2) d(MARK); setag; compare; letmc d(TEMP); write;
}
save;
print_cycles;
}
APPENDIX K
/****************************************************************************/
/*              CONTOUR LABELING BY ASSOCIATIVE PROCESSOR                   */
/****************************************************************************/

/******************** MEMORY ORGANIZATION ********************/
/* Image size is N x N pixels, i.e. coordinate fields are 2Log2(N)      */
/* bits in length.                                                      */
/*                                                                      */
/*   2Log2(N)     2Log2(N)    1    1    1    1    1     2Log2(N)    1   */
/* |------------|-----------|----|----|----|----|----|------------|----|*/
/* |--xy_coord--|--operand--|temp|mark| sf | lt | gt |---label----|edge|*/
/* INPUT FIELDS :                                               */
/*  1) xy_coord gives position of all pixels.                   */
/*  2) edge indicates edge points.                              */
/* INTERMEDIATE (WORKING) FIELDS :                              */
/*  1) label gives x-y coordinates of all edge points.          */
/*  2) operand gives label of connected edge to be tested.      */
/*  3) lt, gt indicate if operand is less than or greater       */
/*     than the label field.                                    */
/*  4) sf switch-flag indicates if label was changed in         */
/*     current iteration.                                       */
/*  5) mark indicates if label was ever exchanged.              */
/*  6) temp holds edge flag of neighbor under test.             */
/* OUTPUT FIELDS :                                              */
/*  1) label gives label of contours.                           */
/*  2) mark marks contour starting points.                      */
/****************************************************************/
#define MEM_SIZE 64 /* 8 × 8 image size example */
#define WORD_LENGTH 24
#include "asslib.h"

main()
{
/** Initialize bit positions **/
int edge = 0;
int label = 1;
int gt = 7;
int lt = 8;
int sf = 9;
int mark = 10;
int temp = 11;
int operand = 12;
int xy_coord = 18;
int label_size = 6;
int growth_threshold = 0;
int new_condition = MEM_SIZE;
int bit_count, growth, window_index;

load;
/** Clear working fields **/
letm dseq (label, mark); letc; setag; write;
/*** Mark and label all edge points ***/
letmc d(edge); setag; compare;
letmc d(mark); write;
for (bit_count=0; bit_count<label_size; bit_count++)
{
letmc d(xy_coord+bit_count) d(edge); setag;
compare;
letmc d (label+bit_count); write;
}

while (new_condition > growth_threshold)
{
printf ("%d\n", new_condition);
letm d(sf); letc; setag; write; /* clear switch flag */

/****** CONNECTIVITY TESTING ******/
for (window_index=0; window_index<8; window_index++) {
/** Shift "edge" and "label" into "temp" and
"operand" **/
letm dseq (temp, operand+label_size-1); letc; setag; write;
for (bit_count=0; bit_count<label_size+1;
bit_count++)
{
letmc d(edge+bit_count); setag; compare; letmc d(temp+bit_count);
general_shift (window_index); write;
}
/** Test if "operand" < "label" **/
letm d(gt) d(lt); letc; setag; write; /* clear greater & less than flags */
for (bit_count=label_size-1; bit_count>=0; bit_count--)
{
letm d(edge) d(temp) d(gt) d(lt)
d(operand+bit_count) d(label+bit_count); letc d(edge) d(temp) d(operand+bit_count); setag; compare;
letc d(edge) d(temp) d(gt)
d(operand+bit_count); write;
letc d(edge) d(temp) d(label+bit_count); setag; compare;
letc d(edge) d(temp) d(lt)
d(label+bit_count); write;
}
/** At all points where "lt" is set, do the following:
    1. Clear mark.
    2. Set switch flag "sf".
    3. Copy "operand" into "label".
**/
letmc d(lt); setag; compare;
/* clear "label" and "mark", set switch flag */
letm dseq (label, label+label_size-1) d(mark) d(sf); letc d(sf); write;
/* exchange */
for (bit_count=0; bit_count<label_size; bit_count++)
{
letmc d(operand+bit_count) d(lt); setag; compare;
letmc d(label+bit_count); write;
}
}
/** Test for termination **/
letmc d(sf); setag; compare;
new_condition = countag;
}
letmc d(mark); setag; compare;
/*** Display the number of contours ***/
printf(" THERE ARE %d CONTOURS IN THE IMAGE\n", countag);
save;
print_cycles;
}

/**************** GENERAL SHIFT IN 3 X 3 NEIGHBORHOOD ****************/
/* This function shifts the Tag register according to the following*/
/* table:
*/
/* [Figure imgf000205_0001: table of shifts for the 3 x 3 neighborhood] */
/* where the pixel index 0-7 points to the required shift given in */
/* parentheses.
*/
general_shift(index)
int index;
{
int b = 8; /* long shift */

if (index<=2) shiftag (b);
if(index>=4 && index<=6) shiftag (-b);
if(index==0 || index==6 || index==7) shiftag(-1);
if(index>=2 && index<=4) shiftag (1);
}
/* [Figure imgf000206_0001: memory organization]
   |--------|--|--------|--|
   mrk tmp T2 T1 E8 sig8 ... E2 sig2 E1 sig1 */
#define MEM_SIZE 64
#define WORD_LENGTH 90
#include "asslib.h"

main()
{
int branch_cnt, curv_cnt;
static int sig[8] = {0, 9, 18, 27, 36, 45, 54, 63}; /* sigma bit position*/
static int E[8] = {1, 10, 19, 28, 37, 46, 55, 64}; /* saliency measures
bit positions */
static int BranchConnect[8][5] = {
    { 1, 0, 8, 1, 2 },  /* neighboring branches, connecting to branch 1 */
    { 1,-1, 1, 2, 3 },  /* neighboring branches, connecting to branch 2 */
    { 0,-1, 2, 3, 4 },  /* neighboring branches, connecting to branch 3 */
    {-1,-1, 3, 4, 5 },  /* neighboring branches, connecting to branch 4 */
    {-1, 0, 4, 5, 6 },  /* neighboring branches, connecting to branch 5 */
    {-1, 1, 5, 6, 7 },  /* neighboring branches, connecting to branch 6 */
    { 0, 1, 6, 7, 8 },  /* neighboring branches, connecting to branch 7 */
    { 1, 1, 7, 8, 1 }   /* neighboring branches, connecting to branch 8 */
};
int T1 = 72; /* 8-bit temporary */
int T2 = 80; /* 8-bit temporary */
int tmp = 88;
int mrk = 89;
/*************** clear Temporary Fields ***************/
letm dseq(T1,T1+8); letc; setag; write;

/** Compute E, the saliency measure, for every branch connected to P **/
for (branch_cnt=0; branch_cnt<8; branch_cnt++)
{
shift_and_do (branch_cnt, 0, T1); /* perform Ej*Fij --> T1 for curv_cnt=0 */
for (curv_cnt=1; curv_cnt<2; curv_cnt++)
{
shift_and_do (branch_cnt, curv_cnt, T1); /* Ej*Fij --> T1 */
max_field(T1,T2); /* maximum (T1,T2) --> T2 */
}
Sum_Acc (branch_cnt, T2, E); /* SIGi + ROUi*MAX(Ej*Fij) --> Ei */
}
}
/*************** SHIFT_AND_DO ***************/
shift_and_do (bc, cc, x0)
int bc, cc, x0;
{
int bit_cnt, long_shift, short_shift;
int START;
int b=8;

if (cc==0 || cc==2) START=1;
if (cc==1) START=0;
for (bit_cnt=START; bit_cnt<8; bit_cnt++)
{
letmc d(E[BranchConnect[bc][cc+2]]+bit_cnt); setag; compare;
long_shift = b*BranchConnect[bc][0];
short_shift = BranchConnect[bc][1];
if (long_shift != 0 ) shiftag (long_shift );
if (short_shift != 0 ) shiftag (short_shift);
letmc d (x0+bit_cnt); write;
}
}
/* * * * * * * * * * * * MAX_FIELD * * * * * * * * * * * * * /
max_field(t0, t1)
int t0,t1;
{
int bit_count, next_bit_t0, next_bit_t1;
/********* Set "tmp" if t0 > t1 *********/
letm d(tmp); letc; setag; write; /* clear "tmp" */
for (bit_count=7; bit_count>=0; bit_count--)
{
next_bit_t0 = t0 + bit_count;
next_bit_t1 = t1 + bit_count;
letm d (tmp) d (next_bit_t0) d (next_bit_t1);
letc d (next_bit_t0); setag; compare; letc d(tmp); write;
}
/********* Copy t0 to t1 if "tmp" = 1 *********/
letmc d(tmp); setag; compare;
letm dseq (t1, t1+7); letc; write; /* clear "t1" field */
for (bit_count=0; bit_count<8; bit_count++)
{
letmc d (t0+bit_count); setag; compare;
letmc d (t1+bit_count); write;
}
}
/** Shift the Suitable Branches given in BranchConnect Array, to T1 **/
/** Compute: T1 <-- T1*Fij **/

APPENDIX M

/*** HOUGH TRANSFORM FOR LINES ***/
#define WORD_LENGTH 53
#define MEM_SIZE 4
#define IFILE "h8.inp"
#define OFILE "h8.out"
#define thmax 8 /* pow2 (outbitth) */
#define romax 8 /* pow2 (outbitro) */
#include "asslib.h"
#include <math.h>
#include "funlib5.h"

int bitcount, carry, theta, ro, mth1, place;
int outbitth=3; /* Length of theta in the output */
int outbitro=3; /* Length of ro in the output */
int spsin=2;    /* Starting bit of sin(th) */
int spcos=14; /* Starting bit of cos(th) */
int spro=15; /* Starting bit of ro */
int spy=24; /* Starting bit of y, absolute value & sign */
int spx=33; /* Starting bit of x, absolute value & sign */
int spth=42;  /* Starting bit of theta */
int temp=23;  /* Position of temporary bit */
int cand=52;  /* Position of edge point flag */
int lenth=10; /* Length of theta */
int lenx=9; /* Length of x and y */
int lensin=9; /* Length of sine and cosine */
int lenro=9; /* Length of ro, x*cos(th) and y*sin(th) */
int histogram[thmax][romax]; /* The accumulator array */
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * /
/*
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -| */
/* | 1 | 10 | 9 | 9 | 1 | 8 | 1 | 2 | 1 |
6 | 2 | 1 | 1 | 1 | */
/*
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -| */
/* |cand|theta| x | y |temp| cos(th) | guard | sin(th) | | */
/*
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -| */
/* |cand|theta| x | y |temp| x* cos(th) | guard | sin(th) | guard | */
/*
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -| */ /* |cand|theta| x | y | x*cos(th) 2's c |
|y*sin(th) 2's| | temp | */
/*
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -| */
/* |cand|theta| x | y | ro | |
| temp | */
/*
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -| */
/* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * / main( )
{
int spysin=(spsin+1) +lensin-lenx; /* Starting bit of y*sin(th) */
int spxcos=spcos+lensin-lenx; /* Starting bit of x*cos(th) */

load;
letm dseq (0, spy-1); letc; setag; write;
/* Lookup table with modified theta sharing the sine field */
lookup2 (spsin, spcos, spth, lenth, temp, spcos-1);
letm d(temp) dseq ( spsin+lensin, spcos-1); letc; setag; write;
/* Truncated multiplication of x by cos(th) */
tr_multiplication(spx, lenx, spcos, lensin, temp);
/* Make all zero values positive */
letm dseq (spxcos-3, spxcos+lenro-2); letc; setag;
compare;
letm d (spxcos+lenro-1); write; /* Reset temporary and last guard bit */
letm d(temp) d(spxcos-3); letc; setag; write;
/* Move sin(th) to its proper position */
for (bitcount=lensin-2; bitcount>-1; bitcount--)
{
letm d(spsin+bitcount) d(spysin+bitcount); letc d(spsin+bitcount); setag; compare;
letc d(spysin+bitcount); write;
}
/* Truncated multiplication of y by sin(th) */
tr_multiplication(spy, lenx, spsin+1, lensin, temp);
/* Make all zero values positive */
letm dseq (spysin-3, spysin+lenro-2); letc; setag; compare;
letm d (spysin+lenro-1); write;
/* Reset temporary bit */
letm d(temp); letc; setag; write;
twoscomp (spxcos-2, lenro+2, temp);
letm d(temp); letc; setag; write;
twoscomp (spysin-3, lenro+3, temp);
letm d(temp); letc; setag; write;
temp=0;
/* Copy sign bit */
letmc d(spxcos+lenro-1); setag; compare;
letmc d(spxcos+lenro); write;
/* Add to obtain ro */
addition (spysin-2, spxcos-2, lenro+2, temp, cand);
/* Complete the addition */
letm d(spro+lenro-1) d(temp) d(spysin+lenro-1);
letc d(spysin+lenro-1) d(spro+lenro-1); setag; compare; letc d(temp) d(spysin+lenro-1); write;
letc d(spysin+lenro-1); setag; compare;
letc d (spro+lenro-1) d(spysin+lenro-1); write;
letc d(temp); setag; compare;
letc d(spro+lenro-1); write;
letc d(temp) d(spro+lenro-1); setag; compare;
letc d(temp); write;
/* Correct the addition */
letm d(spro-1) d(spro-2);
letc d(spro-1); setag; compare;
letc d(spro-2); write;
for (bitcount=0; bitcount<lenro; bitcount++)
{
letm d(spro+bitcount) d(spro-2);
letc d(spro-2); setag; compare;
letc d(spro+bitcount); write;
letc d(spro-2) d(spro+bitcount); setag; compare; letc d(spro-2); write;
}
/* Accumulate and read out histogram */
/* If upper bit of ro not significant shift down one place */
place=1;
letm d(spro+lenro-1) d(spro+lenro-2) d(cand);
letc d(spro+lenro-1) d(cand); setag; compare;
if (PAR.RSP) place=0;
letc d(spro+lenro-2) d(cand); setag; compare;
if (PAR.RSP) place=0;
letm dseq (spth+lenth-outbitth, spth+lenth-1)
    dseq (spro+lenro-outbitro-place, spro+lenro-1-place) d(cand);
for (theta=0; theta<thmax; theta++)
{
for (ro=0; ro<romax; ro++)
{
if (ro != pow2 (outbitro-1)) {
letc dvar (spth+lenth-outbitth, spth+lenth-1, theta)
dvar (spro+lenro-outbitro-place, spro+lenro-1-place,ro) d(cand);
setag; compare;
histogram[theta][ro] = countag;
}
}
}
save;
}
APPENDIX N

/** HOUGH TRANSFORM FOR CIRCLES **/
#define WORD_LENGTH 63
#define MEM_SIZE 4
#define IFILE "cirl.inp"
#define OFILE "cirl.out"
#define xmax 128 /* pow2 (outbitx0) */
#define ymax 128 /* pow2 (outbity0) */
#include "asslib.h"
#include <math.h>
#include "funlib6.h"

int x, y, bitcount;
int R=29;   /* The radius value */
int lenR=5; /* Length of R, Rcos(th), Rsin(th) */
int spy0=0; /* Starting bit of y0 */
int spx0=9;   /* Starting bit of x0 */
int spsin=15; /* Starting bit of sine */
int spRsin=18; /* Starting bit of Rsin(th) */
int spcos=21; /* Starting bit of cosine */
int spRcos=24; /* Starting bit of Rcos(th) */
int spy=30;  /* Starting bit of y in 2's complement */
int spx=39;  /* Starting bit of x in 2's complement */
int spth=48; /* Starting bit of theta */
int temp=59; /* Position of temporary flag */
int over1=60; /* Position of the first overflow flag */
int over2=61; /* Position of the second overflow flag */
int cand=62;  /* Position for the edge point flag */
int lenth=11; /* Length of theta, in absolute value and sign */
int lenx=9; /* Length of x, y, x0, y0 in 2's complement */
int lencos=9;   /* Length of cos, sin */
int outbitx0=7; /* Length of x0 in the output */
int outbity0=7; /* Length of y0 in the output */
int histogram[xmax][ymax]; /* Accumulator array */

/**********************************************************************/
/*--------------------------------------------------------------------*/
/*| 1| 1| 1| 1|  11 |  9 |  9 |    6    | 3 |    3    | 2| 1|  6 |  9 |*/
/*| c|01|02| T|theta|  x |  y | cos(th) |   | sin(th) |  |  |    |    |*/
/*| c|01|02| T|theta|  x |  y | Rcos(th)| sin(th)    |    |  |    |   |*/
/*| c|01|02| T|theta|  x |  y | Rcos(th)| Rsin(th)   | x0    | y0 |   |*/
/*--------------------------------------------------------------------*/
/* c  : (cand)  flag for edge point */
/* 01 : (over1) overflow for x0     */
/* 02 : (over2) overflow for y0     */
/* T  : (temp)  temporary flag      */
/************************************************************/

main ( )
{
load;
letm dseq (0, spy-1) d(temp); letc; setag; write;
/* Load cosine and sine */
lookup2 (spx0, spcos, spth, lenth-1, temp, spx0-1);
letm d(temp) d(spx0-1); letc; setag; write;
/* Multiplication cos(th) by R */
r_multiplication (spcos, lencos, R, lenR, temp);
letm dseq (spx0+lencos, spRcos-1) d(temp); letc; setag; write;
/* Transfer sine to its proper position */
for (bitcount=lencos-2; bitcount>-1; bitcount--)
{
letm d(bitcount+spx0) d(spsin+bitcount);
letc d(bitcount+spx0); setag; compare;
letc d(spsin+bitcount); write;
}
/* Multiplication sin(th) by R */
r_multiplication (spsin, lencos, R, lenR, temp);
/**********************************************************/
/*                                                        */
/*            Find centers of white circles               */
/*                                                        */
/**********************************************************/
letm dseq(spy0, spRsin-1) d(temp); letc; setag; write;
/* Copy R*cos(th) and R*sin(th) to x0 and y0,
respectively, right justified */
for (bitcount=0;bitcount<lenR; bitcount++)
{
letmc d(bitcount+spRcos); setag; compare;
letmc d(bitcount+spx0); write;
letmc d(bitcount+spRsin); setag; compare;
letmc d(bitcount+spy0); write;
}
/* Copy sign of R*cosine and R*sine inverting when theta<0 */
letmc d(spth+lenth-1); setag; compare;
letmc dseq(spy0+lenR, spy0+lenx-1); write;
letm d(spth+lenth-1) d(spRcos+lenR)
dseq(spx0+lenR,spx0+lenx-1);
letc d(spth+lenth-1); setag; compare;
letc d(spth+lenth-1) dseq(spx0+lenR, spx0+lenx-1);
write;
letc d(spRcos+lenR); setag; compare;
letc d(spRcos+lenR) dseq(spx0+lenR, spx0+lenx-1); write;
/* Make every zero value positive */
letm dseq(spx0, spx0+lenR-1); letc; setag; compare;
letm dseq(spx0+lenR, spx0+lenx-1); write;
letm dseq(spy0, spy0+lenR-1); setag; compare;
letm dseq(spy0+lenR, spy0+lenx-1); write;
twoscomp(spx0, lenR+1, temp);
letm d(temp); letc; setag; write;
twoscomp (spy0, lenR+1, temp);
letm d(temp) d(over1); letc; setag; write;
/* Look for the possible overflows of x0 and mark them */
letm d(spx+lenx-1) d(spx0+lenx-1) d(over1);
letc d(spx+lenx-1) d(spx0+lenx-1); setag; compare;
letc d(over1) d(spx+lenx-1) d(spx0+lenx-1); write;
letc; setag; compare;
letc d(over1); write;
addition (spx, spx0, lenx, temp, cand);
/* Find the overflows of x0 */
letm d(over1) d(temp) d(spx0+lenx-1);
letc d(over1) d(temp) d(spx0+lenx-1); setag; compare;
letc d(spx0+lenx-1); write;
letc d(over1); setag; compare;
letc; write;
/* Look for the possible overflows of y0 and mark them */
letm d(temp) d(over2); letc; setag; write;
letm d(spy+lenx-1) d(spy0+lenx-1) d(over2);
letc d(spy+lenx-1) d(spy0+lenx-1); setag; compare;
letc d(over2) d(spy+lenx-1) d(spy0+lenx-1); write;
letc; setag; compare;
letc d(over2); write;
addition (spy, spy0, lenx, temp, cand);
/* Find the overflows of y0 */
letm d(over2) d(temp) d(spy0+lenx-1);
letc d(over2) d(temp) d(spy0+lenx-1); setag; compare; letc d(spy0+lenx-1); write;
letc d(over2); setag; compare;
letc; write;
/* Histogram of white circle location */
letm dseq(spx0+lenx-outbitx0, spx0+lenx-1)
dseq(spy0+lenx-outbity0, spy0+lenx-1) d(over1) d(over2) d(cand);
for (x=0; x<xmax; x++)
{
for (y=0; y<ymax; y++)
{
letc dvar(spx0+lenx-outbitx0, spx0+lenx-1,x) dvar(spy0+lenx-outbity0, spy0+lenx-1,y) d(cand);
setag; compare;
histogram[x] [y]=countag;
}
}
/**********************************************************/
/*                                                        */
/*            Find centers of black circles               */
/*                                                        */
/**********************************************************/
letm dseq(spy0, spRsin-1) d(temp); letc; setag; write;
/* Copy R*cos(th) and R*sin(th) to x0 and y0,
respectively, right justified */
for (bitcount=0; bitcount<lenR; bitcount++)
{
letmc d(bitcount+spRcos); setag; compare;
letmc d(bitcount+spx0); write;
letmc d(bitcount+spRsin); setag; compare;
letmc d(bitcount+spy0); write;
}
/* Copy sign of R*cosine and R*sine inverting when theta>0 */
letm d(spth+lenth-1) dseq(spy0+lenR, spy0+lenx-1);
letc; setag; compare;
letc dseq(spy0+lenR, spy0+lenx-1); write;
letm d(spth+lenth-1) d(spRcos+lenR) dseq(spx0+lenR, spx0+lenx-1);
letc; setag; compare;
letc dseq(spx0+lenR, spx0+lenx-1); write;
letc d(spRcos+lenR) d(spth+lenth-1); setag; compare; letc d(spRcos+lenR) d(spth+lenth-1)
dseq(spx0+lenR, spx0+lenx-1); write;
/* Make every zero value positive */
letm dseq(spx0, spx0+lenR-1); letc; setag; compare;
letm dseq(spx0+lenR, spx0+lenx-1); write;
letm dseq(spy0, spy0+lenR-1); setag; compare;
letm dseq(spy0+lenR, spy0+lenx-1); write;
twoscomp (spx0, lenR+1, temp);
letm d(temp); letc; setag; write;
twoscomp (spy0, lenR+1, temp);
letm d(temp) d(over1); letc; setag; write;
/* Look for the possible overflows of x0 and mark them */
letm d(spx+lenx-1) d(spx0+lenx-1) d(over1);
letc d(spx+lenx-1) d(spx0+lenx-1); setag; compare;
letc d(over1) d (spx+lenx-1) d(spx0+lenx-1); write;
letc; setag; compare;
letc d(over1); write;
addition (spx, spx0, lenx, temp, cand);
/* Find the overflows of x0 */
letm d(over1) d(temp) d(spx0+lenx-1);
letc d(over1) d(temp) d(spx0+lenx-1); setag; compare;
letc d(spx0+lenx-1); write;
letc d(over1); setag; compare;
letc; write;
/* Look for the possible overflows of y0 and mark them */
letm d(temp) d(over2); letc; setag; write;
letm d(spy+lenx-1) d(spy0+lenx-1) d(over2);
letc d(spy+lenx-1) d(spy0+lenx-1); setag; compare;
letc d(over2) d(spy+lenx-1) d(spy0+lenx-1); write;
letc; setag; compare;
letc d(over2); write;
addition (spy, spy0, lenx, temp, cand);
/* Find the overflows of y0 */
letm d(over2) d(temp) d(spy0+lenx-1);
letc d(over2) d(temp) d(spy0+lenx-1); setag; compare; letc d(spy0+lenx-1); write;
letc d(over2); setag; compare;
letc; write;
/* Histogram of black circle location */
letm dseq(spx0+lenx-outbitx0, spx0+lenx-1)
dseq(spy0+lenx-outbity0, spy0+lenx-1) d(over1) d(over2) d(cand);
for (x=0; x<xmax; x++)
{
for (y=0;y<ymax;y++)
{
letc dvar(spx0+lenx-outbitx0, spx0+lenx-1, x)
    dvar(spy0+lenx-outbity0, spy0+lenx-1, y) d(cand);
setag; compare;
histogram[x][y] = countag;
}
}
save;
}
APPENDIX O

/*** VORONOI ***/
#define COLOR_SIZE 8
#define color 0
#define TM color+COLOR_SIZE
#define CN TM+1
#define CL CN+1
#define VN CL+1
#define VL VN+1
#define S VL+1
#define b 128
process(int window_index);
general_shift (int index);

main (int argc, char *argv[])
{
int i;
int window_index;

printf ("Running asp simulator - %s\n", argv[0]);

/* here starts the ASP program */
// confifo (0);
letm dseq (0,23) ; letc ; setag ; write ;
/*clear the output area */
letm dseq(24,47) ; letc ; setag ; write ;
/*clear the output area */
letm dseq(48,71) ; letc ; setag ; write ;
/*clear the output area */
// save(8L, 64); load(8L, 64L);
// confifo (8);
/* insert a PICTURE */
// for (i=0;i<2054; i++) nop; confifo (0);
for (i = 0 ; i < COLOR_SIZE; i++)
{
letmc d(71-i) ; setag; compare; letmc d(i) ; write;
}
// confifo (8);
letmc d(S); setag; write;
letmc dseq (color, color+COLOR_SIZE-1); setag; compare;
letm d(S); letc; write;
letm d(CL) d(VL); letc; setag; write;
letmc d(S); setag; compare;
letmc d(CL); write;
letm d(VL); letc; setag; write;
letmc d(VL); setag; firsel; write;
for(i=0; i<128; i++)
{
shiftag(128); write;
}
letmc d(VL); setag; compare;
shiftag (-1); write;
letm dseq (0,7); letc; setag; compare;
letmc d(VL); write;
// while (rsp)
for (i = 0 ; i < 14 ; i++ )
{
for (window_index=0; window_index < 8; window_index++)
{
letm d(CN) d(VN); letc; setag; write;
letmc d (CL); setag;
compare;
letmc d(CN); general_shift (window_index); write;
letmc d(VL); setag; compare;
letmc d(VN); general_shift (window_index); write;
process (window_index);
}
// letm d(CL) ;letc; setag; compare;
}
// save (8,0);
// letmc d(VL); setag; compare;
// letm dseq (color, color+COLOR_SIZE-1); letc; write;
confifo (0);
for ( i =0; i< COLOR_SIZE ; i++)
{
letm d(TM); letc; setag; write;
letmc d(i) ; setag; compare;
letm d(TM) d(i);letc d(TM); write;
letmc d(71-i); setag; compare;
letm d(i) d(71-i); letc d(i); write;
letmc d(TM); setag; compare;
letmc d(71-i); write;
}
save (8, 64);
// load (8, 64);
// confifo (8);
// for (i=0; i<2054; i++) nop;
// print_cycles;
return 0;
}

process (int window_index)
{
int i;
for (i=0 ; i < COLOR_SIZE ; i++)
{
letm d(TM) ; letc ; setag; write;
letmc d(color+i); setag ; compare;
letmc d(TM); general_shift (window_index); write;
letm dseq(TM,VL) d(color+i); letc d(CL) d(CN) d(TM); setag; compare;
letmc d(VN); write;
letm dseq(TM,VL) d(color+i);
letc d(CL) d(CN)
d(color+i); setag; compare;
letmc d(VN); write; letm dseq(TM,VL) d(color+i);
letc d(CN) d(color+i); setag; compare;
letm d(color+i); letc; write;
letm dseq(TM,VL) d(color+i);
letc d(CN) d(TM); setag; compare;
letmc d(color+i); write;
}
letm d(CL) d(CN) d(VL);
letc d(CN); setag;
compare;
letmc d(CL);
write;
return 0;
}

general_shift (int index)
{
if (index==4 || index==0 || index==6) shiftag(b);
if (index==5 || index==1 || index==7) shiftag(-b);
if (index==6 || index==3 || index==5) shiftag(1);
if (index==4 || index==2 || index==7) shiftag(-1);
return 0;
}
It will be appreciated by persons skilled in the art that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention is defined only by the claims that follow:

Claims

1. Associative signal processing apparatus for processing an incoming signal comprising a plurality of samples, the apparatus comprising:
a two-dimensional array of processors, each processor including a multiplicity of content addressable memory cells, each sample of an incoming signal being processed by at least one of the processors; and
a register array including at least one register operative to store responders arriving from the processors and to provide communication, within a single cycle, between non-adjacent processors.
2. Associative signal processing apparatus comprising:
an array of processors, each processor including a multiplicity of associative memory cells, at least one of the processors being operative to process a plurality of samples of an incoming signal;
a register array including at least one register operative to store responders arriving from the processors and to provide communication between processors; and
an I/O buffer register operative to input an incoming signal and to output an outgoing signal.
3. Apparatus according to claim 2 wherein the processor array, the register array and the I/O buffer register are arranged on a single chip.
4. Apparatus according to claim 1 wherein the register array is operative to perform at least one multicell shift operation.
5. Apparatus according to either of claims 2 or 3 wherein the register array is operative to perform at least one multicell shift operation.
6. Associative apparatus comprising:
a plurality of comparing memory elements each of which is operative to compare the contents of memory elements other than itself to respective references in accordance with a user-selected logical criterion, thereby to generate a responder if the comparing memory element complies with the criterion; and
a register operative to store the responders.
7. Apparatus according to claim 6 wherein the criterion comprises at least one logical operand.
8. Apparatus according to any of the preceding claims 2, 3, or 5 wherein said I/O buffer register and said processors are operative in parallel.
9. Apparatus according to any of the preceding claims 2, 3, or 5 wherein the word length of the I/O buffer register is increasable by decreasing the word length of the associative memory cells.
10. Apparatus according to any of claims 1, 2, 4 or 5 which is operative in video real time.
11. Apparatus according to any of claims 1, 2, 4, 5 or 9 wherein the signal comprises an image.
12. Apparatus according to claim 7 wherein said at least one logical operand comprises a reference for at least one memory element other than the comparing memory element itself.
13. Apparatus according to claim 6 wherein each memory element comprises at least one memory cell.
14. Apparatus according to any of claims 6 - 7 or 12 - 13 wherein the plurality of comparing memory elements are operative in parallel to compare the contents of a memory element other than themselves to an individual reference.
15. A method for image correction comprising:
computing a transformation for an output image imaged by a distorting lens which compensates for the lens distortion; and
applying the transformation in parallel to each of a plurality of pixels in the output image.
16. A method according to claim 15 wherein the distorting lens comprises an HDTV lens.
17. An array of processors which communicate by multicell and single cell shift operations, the array comprising:
a plurality of processors;
a first bus connecting at least a pair of the processors which bus is operative to perform at least one multicell shift operation; and
a second bus connecting at least a pair of the processors which bus is operative to perform single cell shift operations.
18. A signal processing method for processing a signal comprising:
for each consecutive pair of first and second signal characteristics within a sequence of signal characteristics, counting in parallel the number of samples having the first signal characteristic; and
subsequently, counting in parallel the number of samples having the second signal characteristic.
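Claim 18 describes the associative way of building a histogram: iterate over the characteristic values (e.g. grey levels) one at a time, and for each value count all matching samples in a single parallel compare-and-count. A minimal sketch, with `parallel_histogram` as a hypothetical name and the generator expression standing in for the parallel count:

```python
def parallel_histogram(samples, levels):
    """One pass per characteristic value; within each pass, every
    sample is compared against the value at once (sequential over
    levels, parallel over samples)."""
    return {level: sum(1 for s in samples if s == level)
            for level in levels}
```

For a 256-level image this costs 256 parallel passes regardless of the number of pixels, which is the point of the claimed ordering.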
19. A method according to claim 18 wherein the counting comprises generating a histogram.
20. A method according to any of the preceding claims 18 - 19 wherein the signal comprises a color image.
21. A method according to claim 20 wherein at least one characteristic comprises at least one of the following group of characteristics:
intensity;
noise; and
color density.
22. A method according to claim 21 and also comprising scanning a medium bearing the color image.
23. Apparatus according to claim 11 wherein the image comprises a color image.
24. An edge detection method comprising:
identifying a first plurality of edge pixels and a second plurality of candidate edge pixels;
identifying, in parallel, all candidate edge pixels which are connected to at least one edge pixel as edge pixels; and
repeating the identifying in parallel at least once.
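Claim 24 is an iterative edge-growing (hysteresis-style) scheme: candidate pixels touching an edge pixel are promoted, in parallel over the whole image, and the step repeats. A minimal NumPy sketch under the assumption of 8-connectivity and iteration to convergence (both illustrative choices; `grow_edges` is a hypothetical name):

```python
import numpy as np

def grow_edges(strong, candidate):
    """Promote every candidate pixel that is 8-connected to an edge
    pixel; repeat until the edge map stops changing.  `strong` and
    `candidate` are boolean images of the same shape."""
    edges = strong.copy()
    h, w = edges.shape
    while True:
        padded = np.pad(edges, 1, constant_values=False)
        neigh = np.zeros_like(edges)
        # OR of the 8 shifted neighbour maps (one parallel step each).
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                neigh |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        grown = edges | (candidate & neigh)
        if np.array_equal(grown, edges):
            return edges
        edges = grown
```

Each whole-image pass is one parallel "identifying" step of the claim; isolated candidates never touching the growing edge set are discarded.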
25. A feature labeling method in which a signal is inspected, the signal including at least one feature, the feature comprising a set of connected samples, the method comprising:
storing a plurality of indices for a corresponding plurality of samples;
in parallel for each individual sample from among the plurality of samples, replacing the stored index of the individual sample by an index of a sample connected thereto, if the index of the connected sample is ordered above the index of the individual sample; and
repeating the replacing at least once.
26. A method according to claim 25 wherein the replacing is repeated until only a small number of indices are replaced in each iteration.
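Claims 25 - 26 describe connected-component labeling by parallel index propagation: each sample starts with its own index and repeatedly adopts a connected neighbour's index when that index is ordered before its own, until the labels (almost) stop changing. A minimal NumPy sketch, assuming 4-connectivity, "ordered above" meaning numerically smaller, and iteration to full convergence (all illustrative assumptions; `label_features` is a hypothetical name):

```python
import numpy as np

def label_features(mask):
    """Label the connected foreground regions of a boolean image.
    Every foreground sample starts with its own raster index; each
    pass takes, in parallel, the minimum over the sample and its
    4-connected foreground neighbours.  Background cells return -1."""
    h, w = mask.shape
    big = h * w                      # sentinel above any real index
    labels = np.where(mask, np.arange(h * w).reshape(h, w), big)
    while True:
        padded = np.pad(labels, 1, constant_values=big)
        new = labels.copy()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            new = np.minimum(new, padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w])
        new = np.where(mask, new, big)   # background never labelled
        if np.array_equal(new, labels):
            return np.where(mask, labels, -1)
        labels = new
```

After convergence every sample in a feature carries the smallest index in that feature, so the label doubles as a unique feature identifier; claim 26's stopping rule would simply cut the loop off once few indices change per pass.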
27. A method according to any of claims 1 - 5, 8 - 11, 18 - 23, 25, and 26 wherein the signal comprises an image.
28. A method according to claim 26 wherein the signal comprises a color image.
29. Image correction apparatus comprising:
a transformation computer operative to compute a transformation for an output image imaged by a distorting lens which transformation compensates for the lens distortion; and
an in-parallel transformer operative to apply the transformation in parallel to each of a plurality of pixels in the output image.
30. An associative memory comprising:
an array of PEs including a plurality of PEs, wherein each PE includes:
a processor of variable size; and
a word of variable size including an associative memory cell,
wherein all of the associative memory cells from among the plurality of associative memory cells included in the plurality of PEs are arranged in the same location within the word and wherein the plurality of words included in the plurality of PEs together form a FIFO.
31. Apparatus according to claim 30 wherein the word of variable size includes more than one associative memory cell.
32. A method for modifying contents of a multiplicity of memory cells and comprising:
performing, once, an arithmetic computation on an individual value stored in a plurality of memory cells;
storing the result of the arithmetic computation in a plurality of memory cells which contain the individual value.
33. A method according to claim 32 wherein the storing is carried out in all memory cells in parallel.
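Claims 32 - 33 amount to an associative write-back: perform the arithmetic on a value exactly once, then store the result into every cell currently holding that value (in hardware, into all matching cells in parallel). An illustrative sketch, with `modify_matching_cells` as a hypothetical name:

```python
def modify_matching_cells(cells, value, compute):
    """Evaluate `compute(value)` a single time, then write the result
    to every cell equal to `value`; the comprehension stands in for
    the parallel multi-write of claim 33."""
    result = compute(value)          # the one-time arithmetic step
    return [result if c == value else c for c in cells]
```

This saves one arithmetic evaluation per matching cell, which is significant when the same value recurs across many cells (e.g. a grey level repeated over an image).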
34. A method for constructing associative signal processing apparatus for processing an incoming signal, the method comprising:
arranging, on a module, an array of processors, each processor including a multiplicity of associative memory cells, each sample of an incoming signal being processed by at least one of the processors;
arranging, on the same module, a register array including at least one register operative to store responders arriving from the processors and to provide communication between processors; and
arranging, on the same module, an I/O buffer register for inputting and outputting a signal.
35. Apparatus according to claim 1 wherein at least one sample is processed by two or more of the processors.
36. Apparatus according to claim 1 wherein at least one of the processors processes more than one sample.
37. Apparatus according to claim 1 wherein the register array comprises a plurality of registers.
38. Apparatus according to claim 2 wherein the order in which the I/O buffer inputs an image differs from the row/column order of the image.
39. Apparatus according to claim 2 wherein the order in which the I/O buffer inputs the samples differs from the order of the samples within the incoming signal.
40. Apparatus according to claim 1 wherein the register array includes a plurality of registers operative to store responders arriving from the processors.
41. Apparatus according to claim 1 wherein the at least one register provides communication between the processors.
42. Apparatus according to claim 41 wherein the at least one register provides communication between processors which are processing non-adjacent samples.
43. Apparatus according to claim 1 and also comprising an I/O buffer register operative to input and output a signal.
44. Apparatus according to claim 43 wherein the processor array, the register array and the I/O buffer register are arranged on a single module.
45. Apparatus according to claim 43 wherein the processor array, the register array and the I/O buffer register are arranged on a single silicon die.
46. Apparatus according to claim 45 wherein the I/O buffer register includes a plurality of buffer register cells whose number is at least equal to the number of processors in said two-dimensional processor array.
PCT/US1994/014219 1993-12-12 1994-12-09 Apparatus and method for signal processing WO1995016234A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP7516374A JPH09511078A (en) 1993-12-12 1994-12-09 Signal processing method and apparatus
AU14334/95A AU1433495A (en) 1993-12-12 1994-12-09 Apparatus and method for signal processing
EP95905890A EP0733233A4 (en) 1993-12-12 1994-12-09 Apparatus and method for signal processing

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IL10799693A IL107996A0 (en) 1993-12-12 1993-12-12 Apparatus and method for signal processing
IL109,801 1994-05-26
IL107,996 1994-05-26
IL10980194A IL109801A0 (en) 1994-05-26 1994-05-26 Apparatus and method for signal processing

Publications (1)

Publication Number Publication Date
WO1995016234A1 true WO1995016234A1 (en) 1995-06-15

Family

ID=26322747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1994/014219 WO1995016234A1 (en) 1993-12-12 1994-12-09 Apparatus and method for signal processing

Country Status (5)

Country Link
US (3) US5809322A (en)
EP (1) EP0733233A4 (en)
JP (1) JPH09511078A (en)
AU (1) AU1433495A (en)
WO (1) WO1995016234A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0773502A1 (en) * 1995-11-10 1997-05-14 Nippon Telegraph And Telephone Corporation Two-dimensional associative processor and data transfer method

Families Citing this family (137)

Publication number Priority date Publication date Assignee Title
AU1433495A (en) * 1993-12-12 1995-06-27 Asp Solutions Usa, Inc. Apparatus and method for signal processing
US6507362B1 (en) * 1994-12-09 2003-01-14 Neomagic Israel Ltd. Digital image generation device for transmitting digital images in platform-independent form via the internet
JP3785700B2 (en) * 1995-12-18 2006-06-14 ソニー株式会社 Approximation method and apparatus
RU2110089C1 (en) * 1995-12-22 1998-04-27 Бурцев Всеволод Сергеевич Computer system
US7286695B2 (en) * 1996-07-10 2007-10-23 R2 Technology, Inc. Density nodule detection in 3-D digital images
JP3211676B2 (en) * 1996-08-27 2001-09-25 日本電気株式会社 Image processing method and apparatus
JPH1173509A (en) * 1997-08-29 1999-03-16 Advantest Corp Device and method for recognizing image information
DE69808798T2 (en) * 1997-12-19 2003-09-18 Bae Systems Plc Farnborough DIGITAL SIGNAL FILTER USING UNWEIGHTED NEURAL TECHNIQUES
WO1999033019A1 (en) * 1997-12-19 1999-07-01 Bae Systems Plc Neural networks and neural memory
US6304333B1 (en) * 1998-08-19 2001-10-16 Hewlett-Packard Company Apparatus and method of performing dithering in a simplex in color space
US6591004B1 (en) * 1998-09-21 2003-07-08 Washington University Sure-fit: an automated method for modeling the shape of cerebral cortex and other complex structures using customized filters and transformations
US6266442B1 (en) * 1998-10-23 2001-07-24 Facet Technology Corp. Method and apparatus for identifying objects depicted in a videostream
US6266443B1 (en) * 1998-12-22 2001-07-24 Mitsubishi Electric Research Laboratories, Inc. Object boundary detection using a constrained viterbi search
US6282317B1 (en) * 1998-12-31 2001-08-28 Eastman Kodak Company Method for automatic determination of main subjects in photographic images
EP1164544B1 (en) * 1999-03-16 2011-11-02 Hamamatsu Photonics K.K. High-speed vision sensor
ITMI990737A1 (en) * 1999-04-09 2000-10-09 St Microelectronics Srl PROCEDURE TO INCREASE THE EQUIVALENT CALCULATION ACCURACY IN ANALOGUE MEMORY MEMORY
WO2000068882A1 (en) * 1999-05-10 2000-11-16 Sony Corporation Image processing apparatus, robot apparatus and image processing method
JP2001134539A (en) * 1999-11-01 2001-05-18 Sony Computer Entertainment Inc Plane computer and arithmetic processing method of plane computer
WO2001080068A1 (en) * 2000-04-14 2001-10-25 Mobileye, Inc. Generating a model of the path of a roadway from an image recorded by a camera
US6567775B1 (en) * 2000-04-26 2003-05-20 International Business Machines Corporation Fusion of audio and video based speaker identification for multimedia information access
US6674878B2 (en) * 2001-06-07 2004-01-06 Facet Technology Corp. System for automated determination of retroreflectivity of road signs and other reflective objects
US6891960B2 (en) 2000-08-12 2005-05-10 Facet Technology System for road sign sheeting classification
US6763127B1 (en) * 2000-10-06 2004-07-13 Ic Media Corporation Apparatus and method for fingerprint recognition system
US6741250B1 (en) * 2001-02-09 2004-05-25 Be Here Corporation Method and system for generation of multiple viewpoints into a scene viewed by motionless cameras and for presentation of a view path
US7113637B2 (en) * 2001-08-24 2006-09-26 Industrial Technology Research Institute Apparatus and methods for pattern recognition based on transform aggregation
CA2360295A1 (en) * 2001-10-26 2003-04-26 Jaldi Semiconductor Corp. System and method for image warping
JP4143302B2 (en) * 2002-01-15 2008-09-03 キヤノン株式会社 Image processing apparatus, image processing method, control program, and recording medium
US7030845B2 (en) * 2002-01-20 2006-04-18 Shalong Maa Digital enhancement of streaming video and multimedia system
JP2003274374A (en) 2002-03-18 2003-09-26 Sony Corp Device and method for image transmission, device and method for transmission, device and method for reception, and robot device
US9170812B2 (en) * 2002-03-21 2015-10-27 Pact Xpp Technologies Ag Data processing system having integrated pipelined array data processor
DE10233117B4 (en) * 2002-07-20 2010-09-16 Robert Bosch Gmbh Method and device for converting and / or regulating image characterization quantities
CN100401778C (en) * 2002-09-17 2008-07-09 弗拉迪米尔·切佩尔科维奇 Fast CODEC with high compression ratio and minimum required resources
JP4014486B2 (en) * 2002-10-25 2007-11-28 松下電器産業株式会社 Image processing method and image processing apparatus
WO2004044843A2 (en) * 2002-11-06 2004-05-27 Digivision, Inc. Systems and methods for image enhancement in multiple dimensions
JP4542308B2 (en) * 2002-12-16 2010-09-15 株式会社ソニー・コンピュータエンタテインメント Signal processing device and information processing device
GB0229368D0 (en) * 2002-12-17 2003-01-22 Aspex Technology Ltd Improvements relating to parallel data processing
US7174052B2 (en) * 2003-01-15 2007-02-06 Conocophillips Company Method and apparatus for fault-tolerant parallel computation
JP2004236110A (en) * 2003-01-31 2004-08-19 Canon Inc Image processor, image processing method, storage medium and program
US7275147B2 (en) 2003-03-31 2007-09-25 Hitachi, Ltd. Method and apparatus for data alignment and parsing in SIMD computer architecture
US6941236B2 (en) * 2003-03-31 2005-09-06 Lucent Technologies Inc. Apparatus and methods for analyzing graphs
US20040252547A1 (en) * 2003-06-06 2004-12-16 Chengpu Wang Concurrent Processing Memory
TWI220849B (en) * 2003-06-20 2004-09-01 Weltrend Semiconductor Inc Contrast enhancement method using region detection
US7162573B2 (en) * 2003-06-25 2007-01-09 Intel Corporation Communication registers for processing elements
US7268788B2 (en) * 2003-09-03 2007-09-11 Neomagic Israel Ltd. Associative processing for three-dimensional graphics
US20050065263A1 (en) * 2003-09-22 2005-03-24 Chung James Y.J. Polycarbonate composition
EP1544792A1 (en) * 2003-12-18 2005-06-22 Thomson Licensing S.A. Device and method for creating a saliency map of an image
US20070210183A1 (en) * 2004-04-20 2007-09-13 Xerox Corporation Environmental system including a micromechanical dispensing device
US7590310B2 (en) 2004-05-05 2009-09-15 Facet Technology Corp. Methods and apparatus for automated true object-based image analysis and retrieval
TWI244339B (en) * 2004-10-20 2005-11-21 Sunplus Technology Co Ltd Memory managing method and video data decoding method
US8090424B2 (en) * 2005-01-10 2012-01-03 Sti Medical Systems, Llc Method and apparatus for glucose level detection
WO2006121986A2 (en) 2005-05-06 2006-11-16 Facet Technology Corp. Network-based navigation system having virtual drive-thru advertisements integrated with actual imagery from along a physical route
US7786898B2 (en) 2006-05-31 2010-08-31 Mobileye Technologies Ltd. Fusion of far infrared and visible images in enhanced obstacle detection in automotive applications
US9867530B2 (en) 2006-08-14 2018-01-16 Volcano Corporation Telescopic side port catheter device with imaging system and method for accessing side branch occlusions
US7809210B2 (en) * 2006-12-12 2010-10-05 Mitsubishi Digital Electronics America, Inc. Smart grey level magnifier for digital display
JP5524835B2 (en) 2007-07-12 2014-06-18 ヴォルカノ コーポレイション In vivo imaging catheter
WO2009009802A1 (en) 2007-07-12 2009-01-15 Volcano Corporation Oct-ivus catheter for concurrent luminal imaging
US9596993B2 (en) 2007-07-12 2017-03-21 Volcano Corporation Automatic calibration systems and methods of use
WO2009031143A2 (en) * 2007-09-06 2009-03-12 Zikbit Ltd. A memory-processor system and methods useful in conjunction therewith
US7965564B2 (en) * 2007-09-18 2011-06-21 Zikbit Ltd. Processor arrays made of standard memory cells
US7760135B2 (en) * 2007-11-27 2010-07-20 Lockheed Martin Corporation Robust pulse deinterleaving
US9990674B1 (en) 2007-12-14 2018-06-05 Consumerinfo.Com, Inc. Card registry systems and methods
ES2306616B1 (en) 2008-02-12 2009-07-24 Fundacion Cidaut PROCEDURE FOR DETERMINATION OF THE LUMINANCE OF TRAFFIC SIGNS AND DEVICE FOR THEIR REALIZATION.
US8200022B2 (en) * 2008-03-24 2012-06-12 Verint Systems Ltd. Method and system for edge detection
US9513905B2 (en) 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8312033B1 (en) 2008-06-26 2012-11-13 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US8060424B2 (en) 2008-11-05 2011-11-15 Consumerinfo.Com, Inc. On-line method and system for monitoring and reporting unused available credit
US8498982B1 (en) * 2010-07-07 2013-07-30 Openlogic, Inc. Noise reduction for content matching analysis results for protectable content
KR101638919B1 (en) * 2010-09-08 2016-07-12 엘지전자 주식회사 Mobile terminal and method for controlling the same
US11141063B2 (en) 2010-12-23 2021-10-12 Philips Image Guided Therapy Corporation Integrated system architectures and methods of use
US11040140B2 (en) 2010-12-31 2021-06-22 Philips Image Guided Therapy Corporation Deep vein thrombosis therapeutic methods
US9197248B2 (en) 2011-05-30 2015-11-24 Mikamonu Group Ltd. Low density parity check decoder
US9483606B1 (en) 2011-07-08 2016-11-01 Consumerinfo.Com, Inc. Lifescore
WO2013033489A1 (en) 2011-08-31 2013-03-07 Volcano Corporation Optical rotary joint and methods of use
US9106691B1 (en) 2011-09-16 2015-08-11 Consumerinfo.Com, Inc. Systems and methods of identity protection and management
US8738516B1 (en) 2011-10-13 2014-05-27 Consumerinfo.Com, Inc. Debt services candidate locator
US9853959B1 (en) 2012-05-07 2017-12-26 Consumerinfo.Com, Inc. Storage and maintenance of personal data
US10070827B2 (en) 2012-10-05 2018-09-11 Volcano Corporation Automatic image playback
US9286673B2 (en) 2012-10-05 2016-03-15 Volcano Corporation Systems for correcting distortions in a medical image and methods of use thereof
US9324141B2 (en) 2012-10-05 2016-04-26 Volcano Corporation Removal of A-scan streaking artifact
US9367965B2 (en) 2012-10-05 2016-06-14 Volcano Corporation Systems and methods for generating images of tissue
US9858668B2 (en) 2012-10-05 2018-01-02 Volcano Corporation Guidewire artifact removal in images
US9292918B2 (en) 2012-10-05 2016-03-22 Volcano Corporation Methods and systems for transforming luminal images
US11272845B2 (en) 2012-10-05 2022-03-15 Philips Image Guided Therapy Corporation System and method for instant and automatic border detection
JP2015532536A (en) 2012-10-05 2015-11-09 デイビッド ウェルフォード, System and method for amplifying light
US10568586B2 (en) 2012-10-05 2020-02-25 Volcano Corporation Systems for indicating parameters in an imaging data set and methods of use
US9307926B2 (en) 2012-10-05 2016-04-12 Volcano Corporation Automatic stent detection
US9840734B2 (en) 2012-10-22 2017-12-12 Raindance Technologies, Inc. Methods for analyzing DNA
US9654541B1 (en) 2012-11-12 2017-05-16 Consumerinfo.Com, Inc. Aggregating user web browsing data
US9916621B1 (en) 2012-11-30 2018-03-13 Consumerinfo.Com, Inc. Presentation of credit score factors
CA2894403A1 (en) 2012-12-13 2014-06-19 Volcano Corporation Devices, systems, and methods for targeted cannulation
CA2895502A1 (en) 2012-12-20 2014-06-26 Jeremy Stigall Smooth transition catheters
CA2895770A1 (en) 2012-12-20 2014-07-24 Jeremy Stigall Locating intravascular images
US10942022B2 (en) 2012-12-20 2021-03-09 Philips Image Guided Therapy Corporation Manual calibration of imaging system
US10939826B2 (en) 2012-12-20 2021-03-09 Philips Image Guided Therapy Corporation Aspirating and removing biological material
US11406498B2 (en) 2012-12-20 2022-08-09 Philips Image Guided Therapy Corporation Implant delivery system and implants
CA2895989A1 (en) 2012-12-20 2014-07-10 Nathaniel J. Kemp Optical coherence tomography system that is reconfigurable between different imaging modes
US9612105B2 (en) 2012-12-21 2017-04-04 Volcano Corporation Polarization sensitive optical coherence tomography system
JP2016508233A (en) 2012-12-21 2016-03-17 ナサニエル ジェイ. ケンプ, Power efficient optical buffering using optical switches
EP2934323A4 (en) 2012-12-21 2016-08-17 Andrew Hancock System and method for multipath processing of image signals
US10058284B2 (en) 2012-12-21 2018-08-28 Volcano Corporation Simultaneous imaging, monitoring, and therapy
EP2936626A4 (en) 2012-12-21 2016-08-17 David Welford Systems and methods for narrowing a wavelength emission of light
JP2016508757A (en) 2012-12-21 2016-03-24 ジェイソン スペンサー, System and method for graphical processing of medical data
WO2014100606A1 (en) 2012-12-21 2014-06-26 Meyer, Douglas Rotational ultrasound imaging catheter with extended catheter body telescope
WO2014100530A1 (en) 2012-12-21 2014-06-26 Whiseant Chester System and method for catheter steering and operation
US9486143B2 (en) 2012-12-21 2016-11-08 Volcano Corporation Intravascular forward imaging device
EP2934280B1 (en) 2012-12-21 2022-10-19 Mai, Jerome Ultrasound imaging with variable line density
US10226597B2 (en) 2013-03-07 2019-03-12 Volcano Corporation Guidewire with centering mechanism
EP2965263B1 (en) 2013-03-07 2022-07-20 Bernhard Sturm Multimodal segmentation in intravascular images
CN105228518B (en) 2013-03-12 2018-10-09 火山公司 System and method for diagnosing coronal microvascular diseases
US11154313B2 (en) 2013-03-12 2021-10-26 The Volcano Corporation Vibrating guidewire torquer and methods of use
CN105120759B (en) 2013-03-13 2018-02-23 火山公司 System and method for producing image from rotation intravascular ultrasound equipment
US9301687B2 (en) 2013-03-13 2016-04-05 Volcano Corporation System and method for OCT depth calibration
US11026591B2 (en) 2013-03-13 2021-06-08 Philips Image Guided Therapy Corporation Intravascular pressure sensor calibration
US10219887B2 (en) 2013-03-14 2019-03-05 Volcano Corporation Filters with echogenic characteristics
US9406085B1 (en) 2013-03-14 2016-08-02 Consumerinfo.Com, Inc. System and methods for credit dispute processing, resolution, and reporting
EP2967606B1 (en) 2013-03-14 2018-05-16 Volcano Corporation Filters with echogenic characteristics
US10102570B1 (en) 2013-03-14 2018-10-16 Consumerinfo.Com, Inc. Account vulnerability alerts
US10292677B2 (en) 2013-03-14 2019-05-21 Volcano Corporation Endoluminal filter having enhanced echogenic properties
US10685398B1 (en) 2013-04-23 2020-06-16 Consumerinfo.Com, Inc. Presenting credit score information
US9477737B1 (en) 2013-11-20 2016-10-25 Consumerinfo.Com, Inc. Systems and user interfaces for dynamic access of multiple remote databases and synchronization of data based on user rules
US9558423B2 (en) * 2013-12-17 2017-01-31 Canon Kabushiki Kaisha Observer preference model
RU2549150C1 (en) * 2014-02-27 2015-04-20 Федеральное государственное бюджетное учреждение "Московский научно-исследовательский институт глазных болезней имени Гельмгольца" Министерства здравоохранения Российской Федерации Fractal flicker generator for biomedical investigations
US9819841B1 (en) * 2015-04-17 2017-11-14 Altera Corporation Integrated circuits with optical flow computation circuitry
JP6344353B2 (en) * 2015-09-25 2018-06-20 京セラドキュメントソリューションズ株式会社 Image forming apparatus, color conversion program, and color conversion method
DE102016120775A1 (en) 2015-11-02 2017-05-04 Cognex Corporation System and method for detecting lines in an image with a vision system
US10937168B2 (en) 2015-11-02 2021-03-02 Cognex Corporation System and method for finding and classifying lines in an image with a vision system
US9984305B2 (en) * 2016-04-19 2018-05-29 Texas Instruments Incorporated Efficient SIMD implementation of 3x3 non maxima suppression of sparse 2D image feature points
US10147445B1 (en) 2017-11-28 2018-12-04 Seagate Technology Llc Data storage device with one or more detectors utilizing multiple independent decoders
KR102649657B1 (en) * 2018-07-17 2024-03-21 에스케이하이닉스 주식회사 Data Storage Device and Operation Method Thereof, Storage System Having the Same
US20200074100A1 (en) 2018-09-05 2020-03-05 Consumerinfo.Com, Inc. Estimating changes to user risk indicators based on modeling of similarly categorized users
US11315179B1 (en) 2018-11-16 2022-04-26 Consumerinfo.Com, Inc. Methods and apparatuses for customized card recommendations
US11269629B2 (en) * 2018-11-29 2022-03-08 The Regents Of The University Of Michigan SRAM-based process in memory system
CN109675315B (en) * 2018-12-27 2021-01-26 网易(杭州)网络有限公司 Game role model generation method and device, processor and terminal
US11238656B1 (en) 2019-02-22 2022-02-01 Consumerinfo.Com, Inc. System and method for an augmented reality experience via an artificial intelligence bot
US11941065B1 (en) 2019-09-13 2024-03-26 Experian Information Solutions, Inc. Single identifier platform for storing entity data
US11562555B2 (en) * 2021-06-02 2023-01-24 The Nielsen Company (Us), Llc Methods, systems, articles of manufacture, and apparatus to extract shape features based on a structural angle template
CN114859300B (en) * 2022-07-07 2022-10-04 中国人民解放军国防科技大学 Radar radiation source data stream processing method based on graph connectivity

Family Cites Families (26)

Publication number Priority date Publication date Assignee Title
US3828323A (en) * 1972-05-18 1974-08-06 Little Inc A Data recording and printing apparatus
US3970993A (en) * 1974-01-02 1976-07-20 Hughes Aircraft Company Cooperative-word linear array parallel processor
DE2627885A1 (en) * 1976-06-22 1978-01-05 Philips Patentverwaltung ARRANGEMENT FOR DETERMINING THE SPATIAL DISTRIBUTION OF THE ABSORPTION OF RADIATION IN ONE PLANE OF A BODY
US4491932A (en) * 1981-10-01 1985-01-01 Yeda Research & Development Co. Ltd. Associative processor particularly useful for tomographic image reconstruction
US4404653A (en) * 1981-10-01 1983-09-13 Yeda Research & Development Co. Ltd. Associative memory cell and memory unit including same
US4482902A (en) * 1982-08-30 1984-11-13 Harris Corporation Resonant galvanometer scanner system employing precision linear pixel generation
US4964040A (en) * 1983-01-03 1990-10-16 United States Of America As Represented By The Secretary Of The Navy Computer hardware executive
US4580215A (en) * 1983-03-08 1986-04-01 Itt Corporation Associative array with five arithmetic paths
US4546428A (en) * 1983-03-08 1985-10-08 International Telephone & Telegraph Corporation Associative array with transversal horizontal multiplexers
US4686691A (en) * 1984-12-04 1987-08-11 Burroughs Corporation Multi-purpose register for data and control paths having different path widths
FR2583602B1 (en) * 1985-06-18 1988-07-01 Centre Nat Rech Scient INTEGRATED RETINA WITH PROCESSOR NETWORK
GB2180714B (en) * 1985-08-22 1989-08-16 Rank Xerox Ltd Image apparatus
US4733393A (en) * 1985-12-12 1988-03-22 Itt Corporation Test method and apparatus for cellular array processor chip
JPH077444B2 (en) * 1986-09-03 1995-01-30 株式会社東芝 Connected component extractor for 3D images
GB2211638A (en) * 1987-10-27 1989-07-05 Ibm Simd array processor
US5268856A (en) * 1988-06-06 1993-12-07 Applied Intelligent Systems, Inc. Bit serial floating point parallel processing system and method
GB8825780D0 (en) * 1988-11-03 1988-12-07 Microcomputer Tech Serv Digital computer
JPH02273878A (en) * 1989-04-17 1990-11-08 Fujitsu Ltd Noise eliminating circuit
US5181261A (en) * 1989-12-20 1993-01-19 Fuji Xerox Co., Ltd. An image processing apparatus for detecting the boundary of an object displayed in digital image
DE4014019A1 (en) * 1990-05-02 1991-11-07 Zeiss Carl Fa METHOD FOR MEASURING A PHASE-MODULATED SIGNAL
JPH0420809A (en) * 1990-05-15 1992-01-24 Lock:Kk Method for measuring area of face of slope
US5239596A (en) * 1990-06-08 1993-08-24 Xerox Corporation Labeling pixels of an image based on near neighbor attributes
JP3084866B2 (en) * 1991-12-24 2000-09-04 松下電工株式会社 Lens distortion correction method
US5282177A (en) * 1992-04-08 1994-01-25 Micron Technology, Inc. Multiple register block write method and circuit for video DRAMs
JPH0695879A (en) * 1992-05-05 1994-04-08 Internatl Business Mach Corp <Ibm> Computer system
AU1433495A (en) * 1993-12-12 1995-06-27 Asp Solutions Usa, Inc. Apparatus and method for signal processing

Non-Patent Citations (5)

Title
IEEE EXPERT, October 1991, WEEMS et al., "Parallel Processing in the DARPA Strategic Computing Vision Program", pages 23-38. *
IEEE JOURNAL OF SOLID STATE CIRCUITS, April 1988, JONES et al., "A 9-kbit Associative Memory for High-Speed Parallel Processing Applications", pages 543-548. *
IEEE, 1990, SHU et al., "A Multiple-Level Heterogeneous Architecture for Image Understanding", pages 615-627. *
IEEE, 1992, "Saliency Mapping in Associative Vision Machine", AKERIB et al., pages 889-893. *
See also references of EP0733233A4 *

Cited By (3)

Publication number Priority date Publication date Assignee Title
EP0773502A1 (en) * 1995-11-10 1997-05-14 Nippon Telegraph And Telephone Corporation Two-dimensional associative processor and data transfer method
US5854760A (en) * 1995-11-10 1998-12-29 Nippon Telegraph And Telephone Corporation Two-dimensional PE array, content addressable memory, data transfer method and mathematical morphology processing method
US6154809A (en) * 1995-11-10 2000-11-28 Nippon Telegraph & Telephone Corporation Mathematical morphology processing method

Also Published As

Publication number Publication date
EP0733233A4 (en) 1997-05-14
US5809322A (en) 1998-09-15
AU1433495A (en) 1995-06-27
US5974521A (en) 1999-10-26
JPH09511078A (en) 1997-11-04
EP0733233A1 (en) 1996-09-25
US6460127B1 (en) 2002-10-01

Similar Documents

Publication Publication Date Title
US6460127B1 (en) Apparatus and method for signal processing
Little et al. Algorithmic techniques for computer vision on a fine-grained parallel machine
Li et al. FPGA-based hardware design for scale-invariant feature transform
JPH07104948B2 (en) Image understanding machine and image analysis method
Pettersson et al. Online stereo calibration using FPGAs
Fung Computer Vision on the GPU
Sun et al. A 42fps full-HD ORB feature extraction accelerator with reduced memory overhead
Meribout et al. A parallel algorithm for real-time object recognition
Shu et al. Image understanding architecture and applications
Persoon A pipelined image analysis system using custom integrated circuits
Belmessaoud et al. FPGA implementation of feature detection and matching using ORB
Ruetz et al. An image-recognition system using algorithmically dedicated integrated circuits
Dallaire et al. Mixed-signal VLSI architecture for real-time computer vision
Ibrahim Image understanding algorithms on fine-grained tree-structured simd machines (computer vision, parallel architectures)
White Using FPGAs to perform embedded image registration
Meribout et al. Hough transform algorithm for three-dimensional segment extraction and its parallel hardware implementation
Meribout et al. A real-time image segmentation on a massively parallel architecture
Little Integrating vision modules on a fine-grained parallel machine
Francis Parallel architectures for image analysis
Burt Multiresolution Pyramid Architectures for Real-Time Motion Analysis.
Lougheed Advanced image-processing architectures for machine vision
Biancardi et al. Morphological operators on a massively parallel fine grained architecture
Edgcombe Hardware Acceleration in Image Stitching: GPU vs FPGA
Méribout et al. Real-time reprogrammable low-level image processing: edge detection and edge tracking accelerator
Viitanen et al. SIMD parallel calculation of distance transformations

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU JP KE KG KP KR KZ LK LR LT LU LV MD MG MN MW NL NO NZ PL PT RO RU SD SE SI SK TJ TT UA US UZ VN

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE MW SD SZ AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1995905890

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1995905890

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1995905890

Country of ref document: EP