WO2024097128A1 - Neuromorphic programmable multiple pathways event-based sensors - Google Patents

Neuromorphic programmable multiple pathways event-based sensors

Info

Publication number
WO2024097128A1
WO2024097128A1 (PCT/US2023/036273)
Authority
WO
WIPO (PCT)
Prior art keywords
event
level
vision sensor
vision
pixel
Prior art date
Application number
PCT/US2023/036273
Other languages
French (fr)
Inventor
Rajkumar Chinnakonda KUBENCRAN
Benjamin BENOSMAN
Original Assignee
University Of Pittsburgh - Of The Commonwealth System Of Higher Education
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University Of Pittsburgh - Of The Commonwealth System Of Higher Education
Publication of WO2024097128A1 publication Critical patent/WO2024097128A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/47Image sensors with pixel address output; Event-driven image sensors; Selection of pixels to be read out based on image data

Definitions

  • the present technology relates generally to Retinal Vision Sensors (RVSs), and more specifically, to neuromorphic programmable multiple pathways event-based sensors.
  • CMOS imagers have high pixel density but use frame scanning, at a fixed clock rate, to continuously stream out pixel intensity data, which results in a high data rate and hence high power consumption.
  • ganglion cells that extract different information from a scene and provide the visual cortex with rich and low bandwidth precisely timed information.
  • a general-purpose, programmable, multiple pathways event-based neuromorphic vision sensor can have a transformative impact on society, by impacting critical areas like healthcare, Internet of things (IoT), edge computing, and industrial automation.
  • the RVS can provide efficient, robust, resilient, and autonomous bio-inspired vision.
  • the Retinal Vision Sensor is an event-based camera that can support multiple modes for visual feature detection and extraction and achieve extreme energy efficiency while being highly versatile.
  • the RVS hardware may include a hybrid event scanning scheme that is globally asynchronous and locally synchronous (GALS), a multi-modal tunable pixel design that supports multiple pathway readout, staggered array design of pixels, processing elements that are integrated hierarchically to operate on individual pixels or blocks of pixels, and smart, adaptive readout of visually relevant processed data that significantly reduces communication bandwidth.
  • the ultra-low-power operation and activity-based output streaming offer a versatile platform ideally suited for myriad applications in security surveillance, drone navigation, and other domains requiring rapid tracking and logging of visual events.
  • Various embodiments of the present disclosure relate to a computing system (which may be, or may comprise, one or more computing devices) comprising one or more processors configured to employ or otherwise implement any of the processes disclosed herein.
  • Various embodiments of the present disclosure relate to a non-transitory computer- readable storage medium with instructions configured to cause one or more processors of a computing system (which may be, or may comprise, one or more computing devices) to employ or otherwise implement any of the processes disclosed herein.
  • FIG. 1 provides an illustration of Plenoptic structures, according to various example embodiments.
  • FIG. 2 depicts temporal contrast (TC) sensing (top) and spatial contrast (SC) sensing (bottom), according to various example embodiments.
  • FIG. 3 depicts a Retinal Vision Sensor (RVS) block diagram according to various example embodiments.
  • a globally asynchronous and locally synchronous (GALS) system architecture for visual event routing is depicted on the left, and a local neighborhood of pixels with multiply-and-accumulate (MAC) compute and event detection is depicted on the right.
  • FIG. 4A depicts a planar (two-dimensional (2D)) organization of an RVS pipeline comprising 3 layers, according to various example embodiments: L1 for sensing, L2 for compute, and L3 for communication.
  • FIG. 4B depicts a stacked (three-dimensional (3D)) organization of an RVS pipeline comprising 3 layers: L1 for sensing, L2 for compute, and L3 for communication.
  • FIGs. 5A - 5D represent a retinal vision sensor pipeline according to various example embodiments.
  • Level 3 covers multiple Level 1 and Level 2 tiles.
  • FIG. 5B is a close-up of the left panel in FIG. 5A
  • FIG. 5C is a close-up of the middle panel in FIG. 5A
  • FIG. 5D is a close-up of the right panel in FIG. 5A.
  • FIG. 6 depicts a traditional sensor or camera pipeline, in contrast to example embodiments of an RVS system pipeline.
  • the output of such an event-based sensor is a time-continuous stream of pixel data, delivered at unprecedented temporal resolution, containing zero redundancy.
  • Event-based cameras provide a well-suited solution for computer-assisted driving, owing to their intrinsic low latency and low power nature in the post-processing of the data.
  • existing event-based cameras use a common serial communication bus (USB) to send their data to a computer, which negates the benefits of the large bandwidth and the low power consumption of the sensor.
  • the output of the neuromorphic cameras needs to be interfaced with data processing systems that will exploit the information they generate. While using dedicated classical hardware is possible, this approach requires a conversion of the output to a format that can be handled by conventional CMOS systems, at the cost of much higher power consumption and reduced speed. In other words, the intrinsic advantages of the spiking camera are lost in such approaches.
  • the function of the pixel is fixed, which is to sample light intensity at discrete timestamps for APS and to detect level crossing at fixed thresholds for DVS.
  • the use of high frame rates for APS (ranging from 100 Hz up to 1 kHz) or high temporal contrast sensitivity for DVS (around 10% for standard applications and below 1% for advanced cases) results in high output data rates of several Gigabits per second (Gbps) that need to be transmitted to the processing stage. This makes the whole process power hungry and energy inefficient while introducing a transmission bottleneck in the processing chain.
  • vision sensors developed so far have limited and narrow functionality, to either sample and quantize absolute pixel intensity at periodic timestamps that are streamed synchronously or continuously monitor, detect, and transmit asynchronously temporal/spatial contrast changes as events with polarity.
  • Biological vision applies a variety of spatiotemporal filters, feature extraction, and encoding mechanisms to acquire the rich and dynamic information present in the visual scene captured in the receptive field of view.
  • Various embodiments focus on translating these fantastic properties of the retina onto electronic systems to enhance computer vision significantly.
  • the disclosed Retinal Vision Sensor will map more tunable functionality at the sensor node or a group of pixels. This will enable visual processing and feature extraction mechanisms to operate immediately at the sensor site, instead of offloading them to a processor. This will result in significant energy savings, by avoiding a communication channel at high speed and reducing the form factor of the visual perception pipeline.
  • FIG. 1 illustrates the Plenoptic structures which act as a set of basis functions to extract relevant but orthogonal information about the visual scene.
  • Some of the Plenoptic functions include temporal contrast (TC), spatial contrast (SC), temporal row vector (TRV), temporal column vector (TCV), and spatial diagonal vector (SDV).
  • FIG. 2 depicts the RVS architecture according to various potential embodiments.
  • the design is organized as a tiled array with tightly integrated blocks to measure light intensity, amplify and/or filter the response, convert them to events and stream the events using digital readout circuitry.
  • Each tile has three building blocks - pixel unit, MAC compute (can be analog/digital or mixed-signal) and local synchronous digital compute.
  • the fundamental building block is the pixel unit, which may comprise, or may consist of, the photodiode to convert incident light to voltage and the CMOS circuitry to amplify the photovoltage and reset the pixel, when necessary.
  • the output of the pixel unit is fed into the MAC compute block that implements the Plenoptic functions using an energy-efficient multiply- and-accumulate (MAC) unit.
  • the MAC unit can be implemented using switched capacitor circuits for analog compute or using standard logic gates for digital compute.
  • the MAC compute block can also access the photovoltage of the neighboring pixel units as well to perform more advanced computations that are required to calculate the Plenoptic structures.
  • This MAC unit performs matrix multiplication of the input voltage vector, v, with the programmable weight matrix, W, which will be specific to each pixel unit.
  • the product of the matrices is then added to a bias vector, b, which can also be programmed.
  • the final result from the MAC compute block can now be converted to a multi-polarity event by comparing the result with windowed threshold values.
  • the spatial contrast can be calculated for a 3x3 neighborhood using a predetermined matrix to compute vW+b and compared to a high/low threshold to generate an ON/OFF event, indicating an increase/decrease in spatial contrast, respectively.
  • the operational pipeline in the local event detection block allows multipath (TC, SC, SDV etc.) events to be generated, scanned, and transmitted in a synchronous digital scheme.
  • a local clock source will be used to read out the multipath events in parallel from the local neighborhood of pixels synchronously by connecting all the digital blocks together as shown in FIG. 2. These events are then sent to the global asynchronous digital mesh for handshaking and streaming out.
  • the GALS event readout scheme is inspired by the dendritic computation in the optic nerve and enables seamless, low-latency, activity-dependent event throughput for further processing in the downstream pipeline for higher-level visual perception and cognition tasks.
  • various embodiments of the RVS demonstrate the most advanced bio-inspired neuromorphic camera, optimized for low-latency, energy-efficient, adaptive data throughput by putting a processor next to every pixel.
  • a pixel unit (“Level 1”) may be one photodiode that captures light and converts it into an electrical voltage that can be stored, buffered, or reset. With light intensity converted into voltage, the device can perform computations (“Level 2”), which may be an analog compute block. The compute block may manipulate the voltage (e.g., to obtain differences, sums, differentiations, integrations, and/or higher-order matrix-vector multiplications).
  • Level 1 may be one photodiode that captures light and converts it into an electrical voltage that can be stored, buffered, or reset.
  • Level 2 may be an analog compute block.
  • the compute block may manipulate the voltage (e.g., to obtain differences, sums, differentiations, integrations, and/or higher-order matrix-vector multiplications).
  • traditional cameras only have a pixel unit, and the voltage recorded in the pixel unit is simply sent out.
  • Various embodiments introduce analog compute adjacent to the pixel unit: the compute block performs processing operations right next to the pixel. Consequently, each pixel in a sense has a computer sitting next to it trying to manipulate and read different factors that are being measured, with a locally synchronous transmission in combination with globally asynchronous transmission.
  • synchronous transmission involves a clock to capture the data, and the global stage (“Level 3”) is asynchronous and thus without a clock: transmission occurs via handshaking.
  • the plenoptic functions, inspired by the biological retina and the kinds of information it extracts, are a set of mathematical functions that can be computed in the analog domain.
  • Various embodiments compute the plenoptic functions.
  • Once computed, the device generates visual events. Visual events could be defined as deemed suitable, such as an indication of something moving, something changing shape, or something appearing or disappearing. Visual events may then be transmitted out using the GALS system architecture.
  • the functions may be used to determine whether certain voltage changes are considered a visual event.
  • the hardware of the compute level may, in various implementations, be transistor-based circuitry.
  • each “Level 2 Compute” is able to receive input not just from the pixel unit right next to it, but from the neighbors of that pixel as well, providing four voltage inputs for each distinct (separate) analog compute block. Compute blocks also receive inputs from neighboring compute blocks. In various embodiments, event detection occurs at “Level 3” based on voltages detected at Level 2 Compute blocks.
  • each compute block may directly measure or otherwise receive a voltage from an adjacent pixel unit, and indirectly receive the voltages of the three other pixel units that have been buffered or otherwise maintained in memory of a neighboring compute block. That is, in various embodiments, each compute block measures the pixel voltage of its pixel unit and buffers the voltage value so that it can be read by neighboring compute blocks.
  • event detection is local. That is, each Level 3 digital block detects events that are local to that digital block (i.e., the analog compute block next to the digital block). The digital block does not read or detect events from the neighboring units. Which events actually matter to the next level is decided at the digital block.
  • the parameters for determining which events matter and which events do not matter are programmable based on the configurations of units and their components (the programmable features of the chip). Each chip can be programmed with respect to which events are detected and passed through to the next layer. Events may be determined according to plenoptic functions (see FIG. 1).
  • the 2x2 array of tiles includes a mesh running through the array and serving as a bus covering the entire set of four units.
  • the tiles can be arranged in a two-dimensional (2D) plane, and the digital block may be one layer “up.”
  • the pixel unit and analog compute may lie in one chip, and the local synchronous digital units may be on a different chip, with the two chips stacked on top of each other.
  • the layers can be fabricated separately and stacked on each other in a manner that lines up the components. This can be a 2.5D hedge tree implementation.
  • features of what is being imaged can be processed at the hardware level, as part of the chip.
  • Conventional camera technology uses postimaging processing.
  • the plenoptic functions are “general purpose,” allowing for different computations through programming.
  • the chip can be programmed, for example, so as to function like other cameras, such as a dynamic vision sensor (DVS) camera. But it can also be programmed to detect more than just temporal contrast, attaining more of the capabilities of the biological retina, such as spatial contrast, high pass filtering, low pass filtering, edge detection, and velocity estimation.
  • a programmable “computer” is situated adjacent to each pixel unit and can be programmed and reprogrammed (configured and reconfigured) to function in different “modes.” That is, the matrix is reprogrammable to achieve different sets of functionalities. One mode may be to function like a standard camera, whereas other modes enhance the functionality of standard cameras.
  • the voltages of the pixel units can be factored as a 2x2 matrix V, which can be multiplied with a weight matrix W that is also a 2x2 array.
  • the result of the multiplication may be sent to an activation function to generate events.
  • the result of the multiplication can be compared to a threshold. If the result exceeds the threshold, it can be deemed to be an “event,” and otherwise, it would not be deemed to be an event.
  • the W matrix itself may be programmed and reprogrammed for different “modes.” For example, one weight matrix W (e.g., W1) may provide for temporal contrast, another weight matrix W (e.g., W2) may provide for spatial contrast, and yet another weight matrix W (e.g., W3) may provide for high pass filtering.
  • the chip could perform the multiplication of the V matrix by the user-set W matrix.
  • the 2x2 size is discussed for illustrative purposes, and in various embodiments, the matrices may be configured to have different sizes (e.g., 3x3, 4x4, 5x5, 6x6, etc.).
  • multiple weight matrices may be employed.
  • via multiple pathways, a chip can perform V multiplied by W1, V multiplied by W2, and V multiplied by W5 in parallel. This would allow the chip to provide, for example, just temporal contrast, just spatial contrast, or both temporal contrast and spatial contrast at the same time, adding additional modes.
  • the GALS architecture can provide such data throughput in implementing multiple weight matrices and modes.
  • the architecture of the chip may be designed to provide multiple pathways for different quadrants of an image, for example, when one is interested in one region of an image in which there is more “action” (changes in what is being observed).
  • the quadrants that are of lower interest (with less action) can be programmed to only provide temporal contrast, whereas the region of interest may be programmed to provide, for example, spatial contrast, optical flow, velocity estimation, etc.
  • This is analogous to how biological vision systems function: an entire visual scene is not processed at the same time. Rather, biological systems apply visual attention to a particular region of a scene, and that region is where compute power is focused so as to extract maximum information.
  • This configuration can be reprogrammed in real time, such that what information is captured (observed) and in which regions can be modified to fit the current circumstances. This would allow for focus (attention) to dynamically change depending on activity.
  • event detection may define “polarities.” In an example, whether there is an event or there is not an event would be a binary polarity. In other embodiments, there can be multiple (binary or non-binary) polarities. For example, a polarity can have four levels instead of a binary polarity with 2 levels. Four levels (polarities) could include, for example, a positive event, a negative event, or no event. This provides for detection of multi-polarity events. In various embodiments, the hardware may be programmed for, for example, six polarities.
  • the W matrix can provide for filters, such as low pass filters, high pass filters, or band pass filters. Voltages can be manipulated with respect to neighbors as well. Within a pixel unit, we can know the voltage of the pixel at the current time as well as what the voltage was at a previous time, providing a delta V, or change in voltage over a particular time delta T. That can provide for a derivative dV, for example, useful for low pass filtering that may filter out high frequency components.
  • This signal processing can be implemented as matrix multiplications in the digital domain, but in various embodiments, this processing can be performed in an analog domain next to the pixel units. This provides for the capabilities of a computer at this level.
  • At the top of FIG. 6 is depicted a design of a traditional sensor/camera and digital signal post-processing pipeline, where the pixels are scanned synchronously frame by frame and high throughput redundant data is provided by the sensor for further processing.
  • the RVS comprises a photodiode array that provides a photocurrent readout to a local tile scanning pixel array unit, which synchronously provides an analog voltage readout to an analog to digital conversion unit in Level 1.
  • the conversion provides tile digital data to a programmable MAC kernel array and event detection unit in Level 2.
  • Multi-polarity events are provided to an asynchronous priority sorting unit, and prioritized events at a grid-level are provided to a smart feature extraction unit in Level 3.
  • Low throughput meaningful features at much lower bandwidth are then provided to digital signal post-processing.
  • Dynamic Vision Sensors are fully asynchronous, leading to excellent temporal resolution (e.g., ~1 µs) and high throughput (e.g., >1 Giga-events per second (Geps)) but are difficult to scale for higher resolutions (e.g., >1 MegaPixel (MP)) due to the complex pixel design and higher static power consumption.
  • Traditional CMOS imagers called Active Pixel Sensors (APS) employing synchronous frame scanning methods can scale to high resolutions (e.g., >100 MegaPixels) but have severely limited temporal resolution due to the fixed frame rate, typically 120 frames per second (FPS).
  • Embodiments of the approach disclosed above provide a novel event-based camera implementing a Globally Asynchronous Locally Synchronous (GALS) architecture that can, for example, guarantee an equivalent frame rate of around 10,000 FPS for 1 MP resolution and target throughput of 10 Geps, breaking the barriers of conventional APS and DVS cameras.
  • Embodiments of the disclosed architecture combine the APS frame scanning technique locally for low-resolution synchronous tiles, and asynchronous readout of the generated events, similar to DVS cameras, globally at the grid level (array of tiles).
  • The RVS camera can produce higher frame rates of, for example, 10,000 FPS for a 1 MP image size, making it effective for potential applications of motion detection and object tracking at extreme speeds, such as hypersonic missiles.
  • the operation of RVS will be asynchronous with the priority of event readout depending on the application and given to the pixels/group of pixels that satisfy the criteria.
  • circuit may include hardware structured to execute the functions described herein.
  • each respective “circuit” may include machine-readable media for configuring the hardware to execute the functions described herein.
  • the circuit may be embodied as one or more circuitry components including, but not limited to, processing circuitry, network interfaces, peripheral devices, input devices, output devices, sensors, etc.
  • a circuit may take the form of one or more analog circuits, electronic circuits (e.g., integrated circuits (IC), discrete circuits, system on a chip (SOC) circuits), telecommunication circuits, hybrid circuits, and any other type of “circuit.”
  • the “circuit” may include any type of component for accomplishing or facilitating achievement of the operations described herein.
  • a circuit as described herein may include one or more transistors, logic gates (e.g., NAND, AND, NOR, OR, XOR, NOT, XNOR), resistors, multiplexers, registers, capacitors, inductors, diodes, wiring, and so on.
  • the “circuit” may also include one or more processors communicatively coupled to one or more memory or memory devices.
  • the one or more processors may execute instructions stored in the memory or may execute instructions otherwise accessible to the one or more processors.
  • the one or more processors may be embodied in various ways.
  • the one or more processors may be constructed in a manner sufficient to perform at least the operations described herein.
  • the one or more processors may be shared by multiple circuits (e.g., circuit A and circuit B may comprise or otherwise share the same processor, which, in some example implementations, may execute instructions stored, or otherwise accessed, via different areas of memory).
  • the one or more processors may be structured to perform or otherwise execute certain operations independent of one or more co-processors.
  • two or more processors may be coupled via a bus to enable independent, parallel, pipelined, or multi-threaded instruction execution.
  • Each processor may be implemented as one or more general-purpose processors, ASICs, FPGAs, GPUs, TPUs, digital signal processors (DSPs), or other suitable electronic data processing components structured to execute instructions provided by memory.
  • the one or more processors may take the form of a single core processor, multi-core processor (e.g., a dual core processor, triple core processor, or quad core processor), microprocessor, etc.
  • the one or more processors may be external to the apparatus, in a non-limiting example, the one or more processors may be a remote processor (e.g., a cloud-based processor). Alternatively or additionally, the one or more processors may be internal or local to the apparatus. In this regard, a given circuit or components thereof may be disposed locally (e.g., as part of a local server, a local computing system) or remotely (e.g., as part of a remote server such as a cloud based server). To that end, a “circuit” as described herein may include components that are distributed across one or more locations.
  • An exemplary system for implementing the overall system or portions of the implementations might include general-purpose computing devices in the form of computers, including a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit.
  • Each memory device may include non-transient volatile storage media, non-volatile storage media, non-transitory storage media (e.g., one or more volatile or non-volatile memories), etc.
  • the non-volatile media may take the form of ROM, flash memory (e.g., flash memory such as NAND, 3D NAND, NOR, 3D NOR), EEPROM, MRAM, magnetic storage, hard discs, optical discs, etc.
  • the volatile storage media may take the form of RAM, TRAM, ZRAM, etc. Combinations of the above are also included within the scope of machine- readable media.
  • machine-executable instructions comprise, in a non-limiting example, instructions and data, which cause a general-purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
  • Each respective memory device may be operable to maintain or otherwise store information relating to the operations performed by one or more associated circuits, including processor instructions and related data (e.g., database components, object code components, script components), in accordance with the example implementations described herein.
  • input devices may include any type of input device including, but not limited to, a keyboard, a keypad, a mouse, a joystick, or other input devices performing a similar function.
  • output devices may include any type of output device including, but not limited to, a computer monitor, printer, facsimile machine, or other output devices performing a similar function.
  • references to implementations or elements or acts of the systems and methods herein referred to in the singular may also embrace implementations including a plurality of these elements, and any references in plural to any implementation or element or act herein may also embrace implementations including only a single element.
  • References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements to single or plural configurations.
  • References to any act or element being based on any information, act, or element may include implementations where the act or element is based at least in part on any information, act, or element.
  • any implementation disclosed herein may be combined with any other implementation, and references to “an implementation,” “some implementations,” “an alternate implementation,” “various implementation,” “one implementation,” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation may be included in at least one implementation. Such terms as used herein are not necessarily all referring to the same implementation. Any implementation may be combined with any other implementation, inclusively or exclusively, in any manner consistent with the aspects and implementations disclosed herein.
  • references to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms.
  • a range includes each individual member.
  • a group having 1-3 cells refers to groups having 1, 2, or 3 cells.
  • a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

Disclosed is a new class of sensors, called Retinal Vision Sensors (RVSs), that can provide efficient, robust, resilient, and autonomous bio-inspired vision. The RVS can be an event-based camera that can support multiple modes for visual feature detection and extraction and achieve extreme energy efficiency while being highly versatile. The RVS hardware may include a hybrid event scanning scheme that is globally asynchronous and locally synchronous (GALS), a multi-modal tunable pixel design that supports multiple pathway readout, staggered array design of pixels, processing elements that are integrated hierarchically to operate on individual pixels or blocks of pixels, and smart, adaptive readout of visually relevant processed data that significantly reduces communication bandwidth. The ultra-low-power operation and activity-based output streaming offer a versatile platform ideally suited for myriad applications in security surveillance, drone navigation, and other domains requiring rapid tracking and logging of visual events.

Description

NEUROMORPHIC PROGRAMMABLE MULTIPLE PATHWAYS EVENT-BASED SENSORS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/420,820 filed October 31, 2022, the entirety of which is incorporated herein by reference.
TECHNICAL FIELD
[0002] The present technology relates generally to Retinal Vision Sensors (RVSs), and more specifically, to neuromorphic programmable multiple pathways event-based sensors.
BACKGROUND
[0003] Traditional complementary metal-oxide semiconductor (CMOS) imagers have high pixel density but use frame scanning, at a fixed clock rate, to continuously stream out pixel intensity data, which results in a high data rate and hence high power consumption.
SUMMARY
[0004] Disclosed is a solution to dramatically push forward vision sensing with a new class of sensors, called Retinal Vision Sensors (RVSs), that can incorporate new features and enable applications, beyond Active Pixel Sensor (APS) cameras and event-based Dynamic Vision Sensors (DVS).
[0005] With RVS, it becomes possible to extend the notion of events to a family of spatiotemporal filters that can act on a single or group of neighboring pixels that can then be output as different pathways. This is similar to biological retinas and their family of different types of ganglion cells that extract different information from a scene and provide the visual cortex with rich and low bandwidth precisely timed information.
[0006] This concept applied to pixels introduces more functionalities within single pixels and therefore reduces bandwidth and computation load, as it extracts more elaborate and compute-adapted features, going beyond the simple concept of temporal contrast changes.
[0007] A general-purpose, programmable, multiple pathways event-based neuromorphic vision sensor can have a transformative impact on society, by impacting critical areas like healthcare, Internet of things (IoT), edge computing, and industrial automation. The RVS can provide efficient, robust, resilient, and autonomous bio-inspired vision.
[0008] In various embodiments, the Retinal Vision Sensor (RVS) is an event-based camera that can support multiple modes for visual feature detection and extraction and achieve extreme energy efficiency while being highly versatile. The RVS hardware may include a hybrid event scanning scheme that is globally asynchronous and locally synchronous (GALS), a multi-modal tunable pixel design that supports multiple pathway readout, staggered array design of pixels, processing elements that are integrated hierarchically to operate on individual pixels or blocks of pixels, and smart, adaptive readout of visually relevant processed data that significantly reduces communication bandwidth. The ultra-low-power operation and activity-based output streaming offer a versatile platform ideally suited for myriad applications in security surveillance, drone navigation, and other domains requiring rapid tracking and logging of visual events.
[0009] Various embodiments of the present disclosure relate to architectures, devices, and processes for implementing the approach disclosed in the present disclosure.
[0010] Various embodiments of the present disclosure relate to a computing system (which may be, or may comprise, one or more computing devices) comprising one or more processors configured to employ or otherwise implement any of the processes disclosed herein.
[0011] Various embodiments of the present disclosure relate to a non-transitory computer- readable storage medium with instructions configured to cause one or more processors of a computing system (which may be, or may comprise, one or more computing devices) to employ or otherwise implement any of the processes disclosed herein.
[0012] Various embodiments of the disclosure relate to processes performed using architectures, devices, and/or systems disclosed herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] FIG. 1 provides an illustration of Plenoptic structures, according to various example embodiments.
[0014] FIG. 2 depicts temporal contrast (TC) sensing (top) and spatial contrast (SC) sensing (bottom), according to various example embodiments.
[0015] FIG. 3 depicts a Retinal Vision Sensor (RVS) block diagram according to various example embodiments. A globally asynchronous and locally synchronous (GALS) system architecture for visual event routing is depicted on the left, and a local neighborhood of pixels with multiply-and-accumulate (MAC) compute and event detection is depicted on the right.
[0016] FIG. 4A depicts a planar (two-dimensional (2D)) organization of an RVS pipeline comprising 3 layers, according to various example embodiments: L1 for sensing, L2 for compute, and L3 for communication. FIG. 4B depicts a stacked (three-dimensional (3D)) organization of an RVS pipeline comprising 3 layers: L1 for sensing, L2 for compute, and L3 for communication.
[0017] FIGs. 5A - 5D represent a retinal vision sensor pipeline according to various example embodiments. Level 3 covers multiple Level 1 and Level 2 tiles. FIG. 5B is a close-up of the left panel in FIG. 5A, FIG. 5C is a close-up of the middle panel in FIG. 5A, and FIG. 5D is a close-up of the right panel in FIG. 5A.
[0018] FIG. 6 depicts a traditional sensor or camera pipeline, in contrast to example embodiments of an RVS system pipeline.
[0019] The foregoing and other features of the present disclosure will become apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings.
DETAILED DESCRIPTION
[0020] It is to be appreciated that certain aspects, modes, embodiments, variations and features of the present methods are described below in various levels of detail in order to provide a substantial understanding of the present technology. It is to be understood that the present disclosure is not limited to particular uses, methods, devices, or systems, each of which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
[0021] In contrast to traditional CMOS imagers, dynamic vision sensors (DVS) and similar silicon retina chips use an asynchronous methodology to stream out visual events only when they occur, therefore enabling a much lower output data rate. However, typical DVS incur a large area and power overhead to handle event requests and acknowledge handshaking within each pixel, thus resulting in very low pixel density and consuming substantial power despite the savings resulting from the efficiency of visual event coding. The query-driven DVS (qDVS) architecture presented a query-based approach to visual event coding that substantially improves the achievable density and energy efficiency of dynamic vision sensing. This method uses clocked time-division multiplexing to continuously scan the array, querying each pixel for threshold change events in pixel intensity. However, the function of any DVS camera has been limited to temporal contrast (TC) sensing, whereas the biological retina is far more versatile in function and performance.
[0022] Conventional image sensors suffer from significant limitations imposed by their intrinsic principle of operation. For example, acquiring visual information frame-by-frame limits the temporal resolution of the data, leading to distorted information for fast-moving objects and hence resulting in vast amounts of redundancy. This unnecessarily inflates the volume of the data and yields a relatively poor intra-scene dynamic range, especially in a rapidly changing intensity environment such as exiting from a tunnel. The neuromorphic event-based approach to vision and image sensing offers promising solutions to all these problems in existing technologies. As in biology, neuromorphic vision systems are driven by events happening within the scene in view. This is contrary to conventional image sensors, which follow artificially created time stamps and control signals. As a result, each pixel in an event-based sensor array samples its visual input individually and adapts its sampling rate to the dynamics of the input signal, significantly improving latency and energy efficiency.
[0023] The output of such an event-based sensor is a time-continuous stream of pixel data, delivered at unprecedented temporal resolution, containing zero redundancy. Event-based cameras provide a well-suited solution for computer-assisted driving, owing to their intrinsic low latency and low power nature in the post-processing of the data. However, existing event-based cameras use a common serial communication bus (USB) to send their data to a computer, which negates the benefits of the large bandwidth and the low power consumption of the sensor. The output of the neuromorphic cameras needs to be interfaced with data processing systems that will exploit the information they generate. While using dedicated classical hardware is possible, this approach requires a conversion of the output to a format that can be handled by conventional CMOS systems, at the cost of much higher power consumption and reduced speed. In other words, the intrinsic advantages of the spiking camera are lost in such approaches.
[0024] In both cases, the function of the pixel is fixed, which is to sample light intensity at discrete timestamps for APS and to detect level crossing at fixed thresholds for DVS. Moreover, the use of high frame rates for APS (ranging from 100 Hz up to 1 kHz) or high temporal contrast sensitivity for DVS (around 10% for standard applications and below 1% for advanced cases) results in high output data rates of several Gigabits per second (Gbps) that need to be transmitted to the processing stage. This makes the whole process power hungry and energy inefficient while introducing a transmission bottleneck in the processing chain. More importantly, in the case of DVS, there is currently no efficient hardware to process, in real time, the massive amounts of Giga events per second output by pixels, for resolutions above Video Graphics Array (VGA) or megapixels. On the contrary, a hybrid sensor called query-driven DVS (qDVS) outputs sparse high temporal resolution "frames of events," which enables the sensor to achieve higher energy efficiency than both APS and DVS and is directly compatible with standard artificial intelligence (AI) hardware like graphics processing units (GPUs), but has higher latency compared to DVS, due to lower and fixed temporal resolution.
[0025] Moreover, vision sensors developed so far have limited and narrow functionality, to either sample and quantize absolute pixel intensity at periodic timestamps that are streamed synchronously or continuously monitor, detect, and transmit asynchronously temporal/spatial contrast changes as events with polarity. Biological vision applies a variety of spatiotemporal filters, feature extraction, and encoding mechanisms to acquire the rich and dynamic information present in the visual scene captured in the receptive field of view. Various embodiments focus on translating these fantastic properties of the retina onto electronic systems to enhance computer vision significantly.
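To make the bandwidth pressure concrete, the short calculation below estimates the raw output rate of a frame-scanned APS readout; the resolution and bit depth are illustrative assumptions rather than figures from this application.

```python
# Rough data-rate estimate for a frame-scanned APS readout.
# The resolution and bit depth below are illustrative assumptions, not disclosed values.
width, height = 1280, 1024        # roughly 1.3 MP sensor
bits_per_pixel = 10               # assumed ADC depth
for frame_rate_hz in (100, 1_000):
    bits_per_second = width * height * bits_per_pixel * frame_rate_hz
    print(f"{frame_rate_hz:>5} Hz frame scan -> {bits_per_second / 1e9:.2f} Gbps raw output")
```

At 1 kHz this simple scan already exceeds 13 Gbps of raw pixel data, which is the transmission bottleneck the text describes.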
[0026] The disclosed Retinal Vision Sensor will map more tunable functionality at the sensor node or a group of pixels. This will enable visual processing and feature extraction mechanisms to operate immediately at the sensor site, instead of offloading them to a processor. This will result in significant energy savings, by avoiding a communication channel at high speed and reducing the form factor of the visual perception pipeline.
[0027] FIG. 1 illustrates the Plenoptic structures, which act as a set of basis functions to extract relevant but orthogonal information about the visual scene. Some of the Plenoptic functions include temporal contrast (TC), spatial contrast (SC), temporal row vector (TRV), temporal column vector (TCV), and spatial diagonal vector (SDV). The Plenoptic function values collected individually from each pixel can then be transmitted with significantly lower bandwidth. The receiver side can recreate the visual scene faithfully by using these values. The Plenoptic functions can be computed using a unique matrix that can be predetermined and programmed externally.
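As a loose software illustration of how plenoptic-style measures could be derived from raw pixel voltages, the sketch below computes a temporal contrast and a spatial contrast value for the center pixel of a small array. The specific formulas, normalization, and 3x3 neighborhood are assumptions chosen for clarity, not the functions defined in this disclosure.

```python
import numpy as np

def temporal_contrast(v_now, v_prev, eps=1e-6):
    """Relative change of a pixel voltage between two sample times (assumed definition)."""
    return (v_now - v_prev) / (v_prev + eps)

def spatial_contrast(patch, eps=1e-6):
    """Center pixel relative to the mean of its 3x3 neighborhood (assumed definition)."""
    center = patch[1, 1]
    neighborhood_mean = (patch.sum() - center) / 8.0
    return (center - neighborhood_mean) / (neighborhood_mean + eps)

prev = np.array([[0.50, 0.52, 0.49],
                 [0.51, 0.50, 0.48],
                 [0.50, 0.49, 0.51]])
curr = prev.copy()
curr[1, 1] = 0.65                      # the center pixel brightens

print("TC:", temporal_contrast(curr[1, 1], prev[1, 1]))
print("SC:", spatial_contrast(curr))
```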
[0028] FIG. 2 depicts the RVS architecture according to various potential embodiments. The design is organized as a tiled array with tightly integrated blocks to measure light intensity, amplify and/or filter the response, convert them to events and stream the events using digital readout circuitry. Each tile has three building blocks - pixel unit, MAC compute (can be analog/digital or mixed-signal) and local synchronous digital compute.
[0029] The fundamental building block is the pixel unit, which may comprise, or may consist of, the photodiode to convert incident light to voltage and the CMOS circuitry to amplify the photovoltage and reset the pixel, when necessary. The output of the pixel unit is fed into the MAC compute block that implements the Plenoptic functions using an energy-efficient multiply-and-accumulate (MAC) unit. The MAC unit can be implemented using switched capacitor circuits for analog compute or using standard logic gates for digital compute. The MAC compute block can also access the photovoltage of the neighboring pixel units to perform more advanced computations that are required to calculate the Plenoptic structures. This MAC unit performs matrix multiplication of the input voltage vector, v, with the programmable weight matrix, W, which will be specific to each pixel unit. The product of the matrices is then added to a bias vector, b, which can also be programmed. The final result from the MAC compute block can now be converted to a multi-polarity event by comparing the result with windowed threshold values. For example, the spatial contrast can be calculated for a 3x3 neighborhood using a predetermined matrix to compute vW+b and compared to a high/low threshold to generate an ON/OFF event, indicating an increase/decrease in spatial contrast, respectively.
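The MAC-and-threshold step can be pictured with the toy computation below, which uses an assumed Laplacian-like 3x3 kernel as the programmable weight matrix and compares the result against illustrative high/low thresholds to emit an ON/OFF event; the actual kernels, bias, and thresholds would be programmed per application.

```python
import numpy as np

# Assumed 3x3 spatial-contrast kernel (Laplacian-like); the real weight matrix is programmable.
W = np.array([[-1, -1, -1],
              [-1,  8, -1],
              [-1, -1, -1]], dtype=float)
b = 0.0                                # programmable bias (illustrative value)
theta_high, theta_low = 0.4, -0.4      # windowed thresholds (illustrative values)

def mac_event(patch, W, b):
    """Compute v.W + b over a 3x3 neighborhood and map it to an ON/OFF/no event."""
    result = float(np.sum(patch * W)) + b
    if result > theta_high:
        return "ON"     # spatial contrast increased
    if result < theta_low:
        return "OFF"    # spatial contrast decreased
    return None         # no event, so nothing is transmitted

patch = np.array([[0.50, 0.51, 0.49],
                  [0.52, 0.62, 0.50],   # bright center relative to neighbors
                  [0.49, 0.50, 0.51]])
print(mac_event(patch, W, b))          # prints "ON"
```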
[0030] The operational pipeline in the local event detection block allows multipath (TC, SC, SDV, etc.) events to be generated, scanned, and transmitted in a synchronous digital scheme. A local clock source will be used to read out the multipath events in parallel from the local neighborhood of pixels synchronously by connecting all the digital blocks together as shown in FIG. 2. These events are then sent to the global asynchronous digital mesh for handshaking and streaming out. In various embodiments, the GALS event readout scheme is inspired by the dendritic computation in the optic nerve and enables seamless, low-latency, activity-dependent event throughput for further processing in the downstream pipeline for higher-level visual perception and cognition tasks. Hence, various embodiments of the RVS demonstrate the most advanced bio-inspired neuromorphic camera, optimized for low-latency, energy-efficient, adaptive data throughput by putting a processor next to every pixel.
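The hybrid readout can be sketched in software as follows: each tile runs its own locally clocked scan loop, and only tiles that actually produced events push them onto a shared output queue that stands in for the global asynchronous mesh. The tile count, clock period, and event probability are arbitrary assumptions used only to illustrate the control flow, not the disclosed circuit.

```python
import queue
import random
import threading
import time

event_bus = queue.Queue()          # stands in for the global asynchronous mesh

def tile_scanner(tile_id, n_ticks=5):
    """Locally synchronous scan: step through the tile's pixels on a local clock."""
    for tick in range(n_ticks):
        time.sleep(0.001)          # local clock period (illustrative)
        if random.random() < 0.3:  # only some ticks produce an event
            event_bus.put((tile_id, tick, "ON"))

threads = [threading.Thread(target=tile_scanner, args=(t,)) for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()

# Global readout: events arrive activity-dependently, not frame by frame.
while not event_bus.empty():
    print(event_bus.get())
```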
[0031] With reference to FIGs. 3, 4A, 4B, and 5A - 5D, various aspects of potential embodiments will now be discussed. In certain embodiments, a pixel unit (“Level 1”) may be one photodiode that captures light and converts it into an electrical voltage that can be stored, buffered, or reset. With light intensity converted into voltage, the device can perform computations (“Level 2”), which may be an analog compute block. The compute block may manipulate the voltage (e.g., to obtain differences, sums, differentiations, integrations, and/or higher-order matrix-vector multiplications). By contrast, traditional cameras only have a pixel unit, and the voltage recorded in the pixel unit is simply sent out. Various embodiments introduce analog compute adjacent to the pixel unit: the compute block performs processing operations right next to the pixel. Consequently, each pixel in a sense has a computer sitting next to it trying to manipulate and read different factors that are being measured, with a locally synchronous transmission in combination with globally asynchronous transmission. As used herein, synchronous transmission involves a clock to capture the data, and the global stage (“Level 3”) is asynchronous and thus without a clock: transmission occurs via handshaking.
[0032] The plenoptic functions, inspired by the biological retina and the kinds of information it extracts, are a set of mathematical functions that can be computed in the analog domain. Various embodiments compute the plenoptic functions. Once computed, the device generates visual events. Visual events could be defined as deemed suitable, such as an indication of something moving, something changing shape, or something appearing or disappearing. Visual events may then be transmitted out using the GALS system architecture. The functions may be used to determine whether certain voltage changes are considered a visual event. The hardware of the compute level may, in various implementations, be transistor-based circuitry.
[0033] Referring to FIG. 3, each “Level 2 Compute” is able to receive input not just from the pixel unit right next to it, but from the neighbors of that pixel as well, providing four voltage inputs for each distinct (separate) analog compute block. Compute blocks also receive inputs from neighboring compute blocks. In various embodiments, event detection occurs at “Level 3” based on voltages detected at Level 2 Compute blocks.
[0034] As depicted in FIG. 3, there may be four separate local synchronous digital units able to communicate with each other, similar to the ability of each of the four separate analog compute units to communicate with the four pixel units. An analog compute unit may directly measure or otherwise receive a voltage from an adjacent pixel unit, and indirectly receive the voltages of the three other pixel units that have been buffered or otherwise maintained in memory of a neighboring compute block. That is, in various embodiments, each compute block measures the pixel voltage of its pixel unit and buffers the voltage value so that it can be read by neighboring compute blocks.
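A small data-structure sketch of this neighbor-sharing arrangement is given below: each compute block samples its own pixel directly and reads the other voltages from buffers held by neighboring blocks. The class name, quadrant labels, and fully connected 2x2 topology are assumptions made for illustration.

```python
class ComputeBlock:
    """Toy model of a Level 2 compute block that buffers its pixel voltage for neighbors."""
    def __init__(self, name):
        self.name = name
        self.buffered_voltage = None   # last sampled voltage, visible to neighbors
        self.neighbors = []            # other compute blocks in the 2x2 tile

    def sample(self, pixel_voltage):
        self.buffered_voltage = pixel_voltage

    def gather_inputs(self):
        """Own voltage measured directly; neighbor voltages read from their buffers."""
        return [self.buffered_voltage] + [n.buffered_voltage for n in self.neighbors]

blocks = {n: ComputeBlock(n) for n in ("NW", "NE", "SW", "SE")}
for name, block in blocks.items():
    block.neighbors = [b for other, b in blocks.items() if other != name]

for voltage, block in zip((0.50, 0.52, 0.47, 0.61), blocks.values()):
    block.sample(voltage)

print(blocks["NW"].gather_inputs())    # four voltage inputs for one compute block
```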
[0035] In various embodiments, event detection is local. That is, each Level 3 digital block detects events that are local to that digital block (i.e., the analog compute block next to the digital block). The digital block does not read or detect events from the neighboring units. Which events actually matter to the next level is decided at the digital block. The parameters for determining which events matter and which events do not matter are programmable based on the configurations of units and their components (the programmable features of the chip). Each chip can be programmed with respect to which events are detected and passed through to the next layer. Events may be determined according to plenoptic functions (see FIG. 1).
[0036] In FIG. 3, the 2x2 array of tiles includes a mesh running through the array and serving as a bus covering the entire set of four units. In various embodiments, the tiles can be arranged in a two-dimensional (2D) plane, and the digital block may be one layer “up.” The pixel unit and analog compute may lie in one chip, and the local synchronous digital units may be on a different chip, with the two chips stacked on top of each other. The layers can be fabricated separately and stacked on each other in a manner that lines up the components. This can be a 2.5D hedge tree implementation.
[0037] In various embodiments, features of what is being imaged, such as linear motion in the X axis and in the Y axis (displacement), linear velocity, wavelength of the light (color), etc., can be processed at the hardware level, as part of the chip. Conventional camera technology uses post-imaging processing. The plenoptic functions are “general purpose,” allowing for different computations through programming. The chip can be programmed, for example, so as to function like other cameras, such as a dynamic vision sensor (DVS) camera. But it can also be programmed to detect more than just temporal contrast, attaining more of the capabilities of the biological retina, such as spatial contrast, high pass filtering, low pass filtering, edge detection, and velocity estimation.
[0038] In various embodiments, a programmable “computer” is situated adjacent to each pixel unit and can be programmed and reprogrammed (configured and reconfigured) to function in different “modes.” That is, the matrix is reprogrammable to achieve different sets of functionalities. One mode may be to function like a standard camera, whereas other modes enhance the functionality of standard cameras.
[0039] In various embodiments, the voltages of the pixel units (e.g., V1, V2, V3, and V4) can be factored as a 2x2 matrix V, which can be multiplied with a weight matrix W that is also a 2x2 array. The result of the multiplication may be sent to an activation function to generate events. The result of the multiplication can be compared to a threshold. If the result exceeds the threshold, it can be deemed to be an “event,” and otherwise, it would not be deemed to be an event. The W matrix itself may be programmed and reprogrammed for different “modes.” For example, one weight matrix W (e.g., W1) may provide for temporal contrast, another weight matrix W (e.g., W2) may provide for spatial contrast, and yet another weight matrix W (e.g., W3) may provide for high pass filtering. The chip could perform the multiplication of the V matrix by the user-set W matrix. The 2x2 size is discussed for illustrative purposes, and in various embodiments, the matrices may be configured to have different sizes (e.g., 3x3, 4x4, 5x5, 6x6, etc.).
[0040] In various embodiments, multiple weight matrices may be employed. For example, via multiple pathways, a chip can perform V multiplied by W1, V multiplied by W2, and V multiplied by W5 in parallel. This would allow the chip to provide, for example, just temporal contrast, just spatial contrast, or both temporal contrast and spatial contrast at the same time, adding additional modes. The GALS architecture can provide such data throughput in implementing multiple weight matrices and modes.
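The multi-pathway idea can be sketched as follows: the same 2x2 voltage matrix V is multiplied by several programmable weight matrices, each pathway reduces its product to a response and thresholds it independently, and the pathways can conceptually run in parallel on chip. The weight values, reduction, and thresholds below are placeholders for illustration, not coefficients from this disclosure.

```python
import numpy as np

V = np.array([[0.50, 0.52],
              [0.48, 0.70]])            # 2x2 tile of pixel voltages

# Placeholder weight matrices for different "modes"; the actual values are programmable.
pathways = {
    "W1 (e.g., temporal contrast)": (np.array([[ 1.0, 0.0], [0.0, -1.0]]), 0.15),
    "W2 (e.g., spatial contrast)":  (np.array([[-1.0, 1.0], [1.0, -1.0]]), 0.10),
}

def evaluate(V, W, threshold):
    """Multiply V by W, reduce to a scalar response, and threshold it into an event."""
    response = float(np.sum(V @ W))
    return response, abs(response) > threshold

for mode, (W, threshold) in pathways.items():   # pathways evaluated "in parallel" on chip
    response, is_event = evaluate(V, W, threshold)
    print(f"{mode}: response={response:+.3f}, event={is_event}")
```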
[0041] In various embodiments, the architecture of the chip may be designed to provide multiple pathways for different quadrants of an image. For example, one may be interested in one region of an image in which there is more “action” (changes in what is being observed): the quadrants that are of lower interest (with less action) can be programmed to only provide temporal contrast, whereas the region of interest may be programmed to provide, for example, spatial contrast, optical flow, velocity estimation, etc. This is analogous to how biological vision systems function: an entire visual scene is not processed at the same time. Rather, biological systems apply visual attention to a particular region of a scene, and that region is where compute power is focused so as to extract maximum information. This configuration can be reprogrammed in real time, such that what information is captured (observed) and in which regions can be modified to fit the current circumstances. This would allow for focus (attention) to dynamically change depending on activity.
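The attention-like behavior can be thought of as a per-region configuration table that is rewritten at run time; the sketch below uses assumed quadrant names and pathway labels to show how the region of interest could be reprogrammed on the fly.

```python
# Illustrative per-quadrant pathway configuration for a 2x2 grid of image regions.
config = {
    "top_left":     ["temporal_contrast"],                       # low-interest region
    "top_right":    ["temporal_contrast"],
    "bottom_left":  ["temporal_contrast"],
    "bottom_right": ["temporal_contrast", "spatial_contrast",    # region of interest:
                     "optical_flow", "velocity_estimation"],     # extra pathways enabled
}

def refocus(config, new_roi):
    """Reprogram the grid at run time so that attention follows the activity."""
    for quadrant in config:
        config[quadrant] = ["temporal_contrast"]
    config[new_roi] = config[new_roi] + ["spatial_contrast", "optical_flow",
                                         "velocity_estimation"]
    return config

print(refocus(config, "top_left")["top_left"])
```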
[0042] In various embodiments, event detection may define “polarities.” In an example, whether there is an event or there is not an event would be a binary polarity. In other embodiments, there can be multiple (binary or non-binary) polarities. For example, a polarity can have four levels instead of a binary polarity with 2 levels. Four levels (polarities) could include, for example, a positive event, a negative event, or no event. This provides for detection of multi-polarity events. In various embodiments, the hardware may be programmed for, for example, six polarities.
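One plausible way to realize multi-polarity events is to quantize the MAC response against a small ladder of programmable thresholds; the four-level mapping below, including the strong/weak split and the labels, is an assumed illustration rather than the disclosure's definition.

```python
def classify_event(response, strong=0.5, weak=0.1):
    """Map a MAC response onto four polarities using assumed threshold levels."""
    if response >= strong:
        return "+2"   # strong positive event
    if response >= weak:
        return "+1"   # weak positive event
    if response <= -weak:
        return "-1"   # negative event
    return "0"        # no event

for r in (0.7, 0.2, 0.02, -0.3):
    print(r, "->", classify_event(r))
```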
[0043] In various embodiments, the W matrix can provide for filters, such as low pass filters, high pass filters, or band pass filters. Voltages can be manipulated with respect to neighbors as well. Within a pixel unit, we can know the voltage of the pixel at the current time as well as what the voltage was at a previous time, providing a delta V, or change in voltage over a particular time delta T. That can provide for a derivative dV, for example, useful for low pass filtering that may filter out high frequency components. This signal processing can be implemented as matrix multiplications in the digital domain, but in various embodiments, this processing can be performed in the analog domain next to the pixel units. This provides for the capabilities of a computer at this level.
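As a software analogue of this per-pixel temporal processing, the sketch below keeps the previous voltage sample to form a finite difference (delta V over delta T) and also maintains a simple first-order low-pass of the voltage; the class name and smoothing factor are arbitrary illustrative choices.

```python
class PixelTemporalFilter:
    """Toy per-pixel temporal processing: finite difference plus a first-order low-pass."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha            # smoothing factor (illustrative)
        self.prev_v = None
        self.lowpass = None

    def update(self, v, dt=1.0):
        dv_dt = 0.0 if self.prev_v is None else (v - self.prev_v) / dt
        self.lowpass = v if self.lowpass is None else (
            self.alpha * v + (1.0 - self.alpha) * self.lowpass)
        self.prev_v = v
        return dv_dt, self.lowpass

f = PixelTemporalFilter()
for v in (0.50, 0.52, 0.80, 0.51):     # a brief flicker on one pixel
    print(f.update(v))
```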
[0044] Referring to FIG. 6, at the top is depicted a design of a traditional sensor/camera and digital signal post-processing pipeline, where the pixels are scanned synchronously frame by frame and high throughput redundant data is provided by the sensor for further processing. At the bottom is depicted, for comparison, an example RVS system pipeline according to example embodiments of the disclosed approach. The RVS comprises a photodiode array that provides a photocurrent readout to a local tile scanning pixel array unit, which synchronously provides an analog voltage readout to an analog to digital conversion unit in Level 1. The conversion provides tile digital data to a programmable MAC kernel array and event detection unit in Level 2. Multi-polarity events are provided to an asynchronous priority sorting unit, and prioritized events at a grid-level are provided to a smart feature extraction unit in Level 3. Low throughput meaningful features at much lower bandwidth are then provided to digital signal post-processing.
[0045] As discussed above with respect to various example embodiments, Dynamic Vision Sensors (DVS) are fully asynchronous, leading to excellent temporal resolution (e.g., ~1 μs) and high throughput (e.g., >1 Giga-events per second (Geps)) but are difficult to scale to higher resolutions (e.g., >1 MegaPixel (MP)) due to the complex pixel design and higher static power consumption. Traditional CMOS imagers called Active Pixel Sensors (APS), employing synchronous frame scanning methods, can scale to high resolutions (e.g., >100 MegaPixels) but have severely limited temporal resolution due to the fixed frame rate, typically 120 frames per second (FPS). Embodiments of the approach disclosed above provide a novel event-based camera implementing a Globally Asynchronous Locally Synchronous (GALS) architecture that can, for example, guarantee an equivalent frame rate of around 10,000 FPS at 1 MP resolution and a target throughput of 10 Geps, breaking the barriers of conventional APS and DVS cameras. Embodiments of the disclosed architecture combine the APS frame scanning technique locally, for low-resolution synchronous tiles, with asynchronous readout of the generated events, similar to DVS cameras, globally at the grid level (array of tiles).
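A back-of-the-envelope check connects the quoted figures: a 1 MP array scanned at an equivalent 10,000 FPS would, in the worst case where every pixel produces an event, generate 10 giga-events per second, which matches the stated throughput target. The short calculation below only restates that arithmetic; it is not a performance measurement.

```python
pixels = 1_000_000        # 1 MP resolution
equiv_fps = 10_000        # equivalent frame rate
peak_events_per_second = pixels * equiv_fps
print(peak_events_per_second / 1e9, "Geps")   # 10.0 Geps
```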
[0046] The RVS camera can produce higher frame rates of, for example, 10,000 FPS for a 1 MP image size, making it effective for potential applications such as motion detection and object tracking at extreme speeds, for example tracking hypersonic missiles. At the global level, the operation of the RVS will be asynchronous, with priority of event readout depending on the application and given to the pixels or groups of pixels that satisfy the criteria.

[0047] The implementations described herein have been described with reference to drawings. The drawings illustrate certain details of specific implementations that implement the systems, methods, and programs described herein. Describing the implementations with drawings should not be construed as imposing on the disclosure any limitations that may be present in the drawings.
[0048] It should be understood that no claim element herein is to be construed under the provisions of 35 U.S.C. § 112(f), unless the element is expressly recited using the phrase “means for.”
[0049] As used herein, the term “circuit” may include hardware structured to execute the functions described herein. In some implementations, each respective “circuit” may include machine-readable media for configuring the hardware to execute the functions described herein. The circuit may be embodied as one or more circuitry components including, but not limited to, processing circuitry, network interfaces, peripheral devices, input devices, output devices, sensors, etc. In some implementations, a circuit may take the form of one or more analog circuits, electronic circuits (e.g., integrated circuits (IC), discrete circuits, system on a chip (SOC) circuits), telecommunication circuits, hybrid circuits, and any other type of “circuit.” In this regard, the “circuit” may include any type of component for accomplishing or facilitating achievement of the operations described herein. In a non-limiting example, a circuit as described herein may include one or more transistors, logic gates (e.g., NAND, AND, NOR, OR, XOR, NOT, XNOR), resistors, multiplexers, registers, capacitors, inductors, diodes, wiring, and so on.
[0050] The “circuit” may also include one or more processors communicatively coupled to one or more memory or memory devices. In this regard, the one or more processors may execute instructions stored in the memory or may execute instructions otherwise accessible to the one or more processors. In some implementations, the one or more processors may be embodied in various ways. The one or more processors may be constructed in a manner sufficient to perform at least the operations described herein. In some implementations, the one or more processors may be shared by multiple circuits (e.g., circuit A and circuit B may comprise or otherwise share the same processor, which, in some example implementations, may execute instructions stored, or otherwise accessed, via different areas of memory). Alternatively or additionally, the one or more processors may be structured to perform or otherwise execute certain operations independent of one or more co-processors.
[0051] In other example implementations, two or more processors may be coupled via a bus to enable independent, parallel, pipelined, or multi-threaded instruction execution. Each processor may be implemented as one or more general-purpose processors, ASICs, FPGAs, GPUs, TPUs, digital signal processors (DSPs), or other suitable electronic data processing components structured to execute instructions provided by memory. The one or more processors may take the form of a single core processor, multi-core processor (e.g., a dual core processor, triple core processor, or quad core processor), microprocessor, etc. In some implementations, the one or more processors may be external to the apparatus; in a non-limiting example, the one or more processors may be a remote processor (e.g., a cloud-based processor). Alternatively or additionally, the one or more processors may be internal or local to the apparatus. In this regard, a given circuit or components thereof may be disposed locally (e.g., as part of a local server, a local computing system) or remotely (e.g., as part of a remote server such as a cloud-based server). To that end, a “circuit” as described herein may include components that are distributed across one or more locations.
[0052] An exemplary system for implementing the overall system or portions of the implementations might include general purpose computing devices in the form of computers, including a processing unit, a system memory, and a system bus that couples various system components including the system memory to the processing unit. Each memory device may include non-transient volatile storage media, non-volatile storage media, non-transitory storage media (e.g., one or more volatile or non-volatile memories), etc. In some implementations, the non-volatile media may take the form of ROM, flash memory (e.g., flash memory such as NAND, 3D NAND, NOR, 3D NOR), EEPROM, MRAM, magnetic storage, hard discs, optical discs, etc. In other implementations, the volatile storage media may take the form of RAM, TRAM, ZRAM, etc. Combinations of the above are also included within the scope of machine-readable media. In this regard, machine-executable instructions comprise, in a non-limiting example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions. Each respective memory device may be operable to maintain or otherwise store information relating to the operations performed by one or more associated circuits, including processor instructions and related data (e.g., database components, object code components, script components), in accordance with the example implementations described herein.
[0053] It should also be noted that the term “input devices,” as described herein, may include any type of input device including, but not limited to, a keyboard, a keypad, a mouse, a joystick, or other input devices performing a similar function. Comparatively, the term “output device,” as described herein, may include any type of output device including, but not limited to, a computer monitor, printer, facsimile machine, or other output devices performing a similar function.
[0054] It should be noted that although the diagrams herein may show a specific order and composition of method steps, it is understood that the order of these steps may differ from what is depicted. In a non-limiting example, two or more steps may be performed concurrently or with partial concurrence. Also, some method steps that are performed as discrete steps may be combined, steps being performed as a combined step may be separated into discrete steps, the sequence of certain processes may be reversed or otherwise varied, and the nature or number of discrete processes may be altered or varied. The order or sequence of any element or apparatus may be varied or substituted according to alternative implementations. Accordingly, all such modifications are intended to be included within the scope of the present disclosure as defined in the appended claims. Such variations will depend on the machine-readable media and hardware systems chosen and on designer choice. It is understood that all such variations are within the scope of the disclosure. Likewise, software and web implementations of the present disclosure could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various database searching steps, correlation steps, comparison steps, and decision steps.
[0055] While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of the systems and methods described herein. Certain features that are described in this specification in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[0056] In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.
[0057] Having now described some illustrative implementations and implementations, it is apparent that the foregoing is illustrative and not limiting, having been presented by way of example. In particular, although many of the examples presented herein involve specific combinations of method acts or system elements, those acts and those elements may be combined in other ways to accomplish the same objectives. Acts, elements, and features discussed only in connection with one implementation are not intended to be excluded from a similar role in other implementations.
[0058] The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” “having,” “containing,” “involving,” “characterized by,” “characterized in that,” and variations thereof herein, is meant to encompass the items listed thereafter, equivalents thereof, and additional items, as well as alternate implementations consisting of the items listed thereafter exclusively. In one implementation, the systems and methods described herein consist of one, each combination of more than one, or all of the described elements, acts, or components.

[0059] Any references to implementations or elements or acts of the systems and methods herein referred to in the singular may also embrace implementations including a plurality of these elements, and any references in plural to any implementation or element or act herein may also embrace implementations including only a single element. References in the singular or plural form are not intended to limit the presently disclosed systems or methods, their components, acts, or elements to single or plural configurations. References to any act or element being based on any information, act, or element may include implementations where the act or element is based at least in part on any information, act, or element.
[0060] Any implementation disclosed herein may be combined with any other implementation, and references to “an implementation,” “some implementations,” “an alternate implementation,” “various implementations,” “one implementation,” or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the implementation may be included in at least one implementation. Such terms as used herein are not necessarily all referring to the same implementation. Any implementation may be combined with any other implementation, inclusively or exclusively, in any manner consistent with the aspects and implementations disclosed herein.
[0061] References to “or” may be construed as inclusive so that any terms described using “or” may indicate any of a single, more than one, and all of the described terms.
[0062] Where technical features in the drawings, detailed description or any claim are followed by reference signs, the reference signs have been included for the sole purpose of increasing the intelligibility of the drawings, detailed description, and claims. Accordingly, neither the reference signs nor their absence have any limiting effect on the scope of any claim elements.
[0063] The foregoing description of implementations has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from this disclosure. The implementations were chosen and described in order to explain the principles of the disclosure and its practical application to enable one skilled in the art to utilize the various implementations, with various modifications as are suited to the particular use contemplated. Other substitutions, modifications, changes, and omissions may be made in the design, operating conditions, and implementation of the implementations without departing from the scope of the present disclosure as expressed in the appended claims.
EQUIVALENTS
[0064] The present technology is not to be limited in terms of the particular embodiments described in this application, which are intended as single illustrations of individual aspects of the present technology. Many modifications and variations of this present technology can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the present technology, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the present technology. It is to be understood that this present technology is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
[0065] As will be understood by one skilled in the art, for any and all purposes, particularly in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third, and upper third, etc. As will also be understood by one skilled in the art, all language such as “up to,” “at least,” “greater than,” “less than,” and the like includes the number recited and refers to ranges which can be subsequently broken down into subranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 cells refers to groups having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.

[0066] All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.

Claims

1. A vision system comprising a device for visual event routing, the device having a plurality of pixel units at a first level, a plurality of in-pixel compute units at a second level, and a plurality of event sorting units at a third level, wherein the first level is communicably coupled to the second level and the second level is communicably coupled to the third level.
2. The vision system of claim 1, wherein the system has a globally asynchronous and locally synchronous (GALS) architecture.
3. The system of claim 1, wherein the first level includes at least four pixel units, and wherein each pixel unit is configured to (i) directly communicate with one of the compute units of the second level, and (ii) indirectly communicate with other compute units of the second level.
4. The system of claim 1, wherein the second level implements plenoptic functions.
5. The system of claim 1, wherein the second level performs analog computing functions.
6. The system of claim 1, wherein the second level performs multiply-and-accumulate (MAC) operations.
7. The system of claim 1, wherein the third level performs functions digitally.
8. The system of claim 1, wherein each pixel unit comprises a photodiode.
9. The system of claim 1, wherein the second level implements weighting functions for voltages from pixel units of the first level.
10. The system of claim 9, wherein the weighting functions extract one or more of temporal contrast, spatial contrast, high pass filtering, low pass filtering, edge detection, contour extraction, and/or velocity estimation.
11. The system of claim 1, wherein three two-dimensional layers are stacked to obtain a three-dimensional system.
12. A retina-inspired vision sensor employing visual spatiotemporal filtering methods.
13. The vision sensor of claim 12, comprising a programmable and reconfigurable multimodal processor immediately adjacent to a sensor site.
14. The vision sensor of claim 13, wherein the processor performs computing functions, and wherein the sensor sites are pixels of the vision sensor.
15. The vision sensor of claim 12, with event detection implemented via multiple polarities.
16. The vision sensor of claim 15, wherein event detection is implemented via multiple pathways from a block of pixels, analogous to the organization of the retina.
17. The vision sensor of claim 16, wherein a first event corresponds to a first polarity being fired for contour extraction, and a second event corresponds to a second polarity being fired for edge detection.
18. The vision sensor of claim 12, configured for programmable selection of filters.
19. The vision sensor of claim 18, designed to implement selectivity of event filtering by a configurable minimal set of parameters.
20. The vision sensor of claim 12, configured to implement an efficient and adaptive event scanning methodology.
21. The vision sensor of claim 12, comprising a 2.5D H-Tree asynchronous event readout without any arbitration of events.
22. The vision sensor of claim 21, wherein the vision sensor does not employ arbitration schemes used in asynchronous event-based vision sensors as they are prone to errors and inefficient for scanning event streams with variable activity and delay.
23. The vision sensor of claim 12, configured to implement adaptive event scanning depending on statistics of the input scene.
24. The vision sensor of claim 12, configured to implement adaptive scanning of events depending on local velocity or event density within a region of interest in the visual scene.
25. The vision sensor of claim 12, comprising a fine-tuned hardware design for optimized scanning of events to achieve extreme energy efficiency.
PCT/US2023/036273 2022-10-31 2023-10-30 Neuromorphic programmable multiple pathways event-based sensors WO2024097128A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263420820P 2022-10-31 2022-10-31
US63/420,820 2022-10-31

Publications (1)

Publication Number Publication Date
WO2024097128A1 true WO2024097128A1 (en) 2024-05-10

Family

ID=90931304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/036273 WO2024097128A1 (en) 2022-10-31 2023-10-30 Neuromorphic programmable multiple pathways event-based sensors

Country Status (1)

Country Link
WO (1) WO2024097128A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140355861A1 (en) * 2011-08-25 2014-12-04 Cornell University Retinal encoder for machine vision
WO2021128531A1 (en) * 2019-12-24 2021-07-01 清华大学 Bimodal bionic vision sensor with retinal cone and retinal rod
US20220247953A1 (en) * 2021-02-04 2022-08-04 Egis Technology Inc. Image sensor chip and sensing method thereof


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23886579

Country of ref document: EP

Kind code of ref document: A1