USRE49461E1 - Graphic processor based accelerator system and method - Google Patents
- Publication number
- USRE49461E1 (application US17/136,343)
- Authority
- US
- United States
- Prior art keywords
- computations
- memory
- neural network
- artificial neural
- stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/509—Offload
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Definitions
- GPUs Graphics Processing Units
- PCs personal computers
- GPUs have included general purpose programmability into the GPU architecture leading to the increased popularity of using GPUs for highly parallelizable and computationally expensive algorithms outside of the computer graphics domain.
- GPGPU general purpose GPU
- these general purpose GPU (GPGPU) applications are not able to achieve optimal performance, however.
- Numerical simulations (e.g., finite element analysis) of large systems of similar elements (e.g., neural networks, genetic algorithms, particle systems, mechanical systems) are one example of an application that can benefit from GPGPU computation.
- Disk and user input/output can be performed independently of computation because these two processes require interactions with peripheral hardware (disk, screen, keyboard, mouse, etc.) and put relatively low load on the central processing unit/system (CPU).
- CPU central processing unit/system
- Complete independence is not desirable, however; user input might affect how the computation is performed and even interrupt it if necessary.
- the user output and the disk output are dependent on the results of the computation.
- A reasonable solution would be to separate input/output into threads, so that interaction with hardware occurs in parallel with the computation. In this case, whatever CPU processing is required for input/output should be designed so that it provides the synchronization with computation.
- the central processing unit establishes communication and synchronization between peripherals.
- peripherals are preferably controlled by a dedicated thread that is executed in parallel with minimal interactions and dependencies on the other threads.
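- By way of illustration only, the separation of output into a dedicated thread can be sketched in software as follows. This is a minimal sketch using Python's standard `threading` and `queue` modules; the names `compute_step`, `output_worker`, and `run_simulation` are hypothetical and do not come from the described system. The queue is the single point of synchronization between the computation and the output thread.

```python
import queue
import threading

def compute_step(state):
    """Stand-in for one simulation step (hypothetical toy update)."""
    return [x + 1 for x in state]

def output_worker(results, sink):
    """Dedicated output thread: drains results until it sees the None sentinel."""
    while True:
        item = results.get()
        if item is None:
            break
        sink.append(item)  # a real system would write to disk or screen here

def run_simulation(steps):
    results = queue.Queue()   # the queue is the only synchronization point
    sink = []
    worker = threading.Thread(target=output_worker, args=(results, sink))
    worker.start()
    state = [0, 0, 0]
    for step in range(steps):
        state = compute_step(state)
        results.put((step, list(state)))  # hand off a copy; computation continues
    results.put(None)         # signal the end of the simulation
    worker.join()
    return sink

log = run_simulation(3)
```

- Because the computation only enqueues results and never waits for the output device, the output thread has minimal dependencies on the computational thread, as the text prefers.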
- a GPU on a conventional video card is usually controlled through OpenGL, DirectX, or similar graphic application programming interfaces (APIs).
- APIs establish the context of graphic operations, within which all calls to the GPU are made. This context only works when initialized within the same thread of execution that uses it. As a result, in a preferred embodiment, the context is initialized within a computational thread. This creates complications, however, in the interaction between the user interface thread that changes parameters of simulations and the computational thread that uses these parameters.
- a solution as proposed here is an implementation of the computational stream of execution in hardware, so that thread and context initialization are replaced by hardware initialization.
- This hardware implementation includes an expansion card comprising a printed circuit board having (a) one or more graphics processing units, (b) two or more associated memory banks that are logically or physically partitioned, (c) a specialized controller, and (d) a local bus providing signal coupling compatible with the PCI industry standards (this includes but is not limited to PCI-Express, PCI-X, USB 2.0, or functionally similar technologies).
- the controller handles most of the primitive operations needed to set up and control GPU computation. As a result, the CPU is freed from this function and is dedicated to other tasks.
- the invention features a computer system.
- This system comprises a central processing unit, main memory accessed by the central processing unit, and a video system for driving a video monitor in response to the central processing unit as is common.
- the computer system further comprises an accelerator that uses input data from and provides output data to the central processing unit.
- This accelerator comprises at least one graphics processing unit, accelerator memory for the graphic processing unit, and an accelerator controller that moves the input data into the at least one graphics processing unit and the accelerator memory to generate the output data.
- the central processing unit transfers the input data for a simulation to the accelerator, after which the accelerator executes simulation computations to generate the output data, which is transferred to the central processing unit.
- the accelerator controller dictates an order of execution of instructions to the at least one graphics processing unit. The use of the separate controller enables data transfer during execution such that the accelerator controller transfers output data from the accelerator memory to main memory of the central processing unit.
- the accelerator controller comprises an interface controller that enables the accelerator to communicate over a bus of the computer system with the central processing unit.
- the invention also features an accelerator system for a computer system, which comprises at least one graphics processing unit, accelerator memory for the graphic processing unit and an accelerator controller for moving data between the at least one graphics processing unit and the accelerator memory.
- the invention also features a method for performing numerical simulations in a computer system.
- This method comprises a central processing unit loading input data into an accelerator system from main memory of the central processing unit and an accelerator controller transferring the input data to a graphics processing unit with instructions to be performed on the input data.
- the accelerator controller then transfers output data generated by the graphic processing unit to the central processing unit as output data.
- FIG. 1 is a schematic diagram illustrating a computer system including the GPU accelerator according to an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating the architecture for the GPU accelerator according to an embodiment of the present invention.
- FIG. 3 is a block/flow diagram illustrating an exemplary implementation of the top level control of the GPU accelerator system.
- FIG. 4 is a flow diagram illustrating an exemplary implementation of the bottom level control of the GPU accelerator system that is used to execute the target computation.
- FIG. 5 is an example population of nine computational elements arranged in a 3×3 square and a potential packing scheme for texture pixels, according to an implementation of the present invention.
- FIG. 1 shows a computer system 100 that has been constructed according to the principles of the present invention.
- the computer system 100 in one example is a standard personal computer (PC).
- PC personal computer
- Suitable computing environments for this invention include, but are not limited to, workstations, server computers, supercomputers, notebook computers, hand-held electronic devices such as cell phones, mp3 players, or personal digital assistants (PDAs), multiprocessor systems, programmable consumer electronics, networks of any of the above-mentioned computing devices, and distributed computing environments that include any of the above-mentioned computing devices.
- The GPU accelerator is implemented as an expansion card 180 that includes connections with the motherboard 110, on which the one or more CPUs 120 are installed along with main, or system, memory 130 and mass/non-volatile data storage 140, such as a hard drive or redundant array of independent drives (RAID) array, for the computer system 100.
- the expansion card 180 communicates to the motherboard 110 via a local bus 190 .
- This local bus 190 could be PCI, PCI Express, PCI-X, or any other functionally similar technology (depending upon the availability on the motherboard 110 ).
- An external version of the GPU accelerator is also a possible implementation.
- the external GPU accelerator is connected to the motherboard 110 through USB-2.0, IEEE 1394 (Firewire), or similar external/peripheral device interface.
- the CPU 120 and the system memory 130 on the motherboard 110 and the mass data storage system 140 are preferably independent of the expansion card 180 and only communicate with each other and the expansion card 180 through the system bus 200 located in the motherboard 110 .
- System buses 200 in current generations of computers have bandwidths from 3.2 GB/s (Pentium 4 with AGTL+, Athlon XP with EV6) to around 15 GB/s (Xeon Woodcrest with AGTL+, Athlon 64/Opteron with HyperTransport), while the local bus has maximal peak data transfer rates of 4 GB/s (PCI Express x16) or 2 GB/s (PCI-X 2.0).
- the system memory 130 is referred to as the main random-access memory (RAM) in the description herein. However, this is not intended to limit the system memory 130 to only RAM technology.
- RAM main random-access memory
- Other possible computer storage media include, but are not limited to ROM, EEPROM, flash memory, or any other memory technology.
- the GPU accelerator system is implemented on an expansion card 180 on which the one or more GPU's 240 are mounted. It should be noted that the GPU accelerator system GPU 240 is separate from and independent of any GPU on the standard video card 150 or other video driving hardware such as integrated graphics systems. Thus the computations performed on the expansion card 180 do not interfere with graphics display (including but not limited to manipulation and rendering of images).
- Various brands of GPU are relevant. Under current technology, these include GPUs based on the GeForce series from NVIDIA Corporation or the Catalyst series from ATI/Advanced Micro Devices, Inc.
- the output to a video monitor 170 is preferably through the video card 150 and not the GPU accelerator system 180 .
- the video card 150 is dedicated to the transfer of graphical information and connects to the motherboard 110 through a local bus 160 that is sometimes physically separate from the local bus 190 that connects the expansion card 180 to the motherboard 110 .
- FIG. 2 is a block diagram illustrating the general architecture of the GPU accelerator system and specifically the expansion card 180 in which at least one GPU 240 and associated memories 210 and 250 are mounted. Electrical (signal) and mechanical coupling with a local bus 190 provides signal coupling compatible with the PCI industry standards (this includes but is not limited to PCI, PCI-X, PCI Express, or functionally similar technology).
- the GPU accelerator further preferably comprises one specifically designed accelerator controller 220 .
- The accelerator controller 220 is field-programmable gate array (FPGA) logic, or a custom-built application-specific integrated circuit (ASIC) chip, mounted in the expansion card 180 and in mechanical and signal coupling with the GPU 240 and the associated memories 210 and 250.
- FPGA field programmable gate array
- ASIC application-specific integrated circuit
- the controller 220 commands the storage and retrieval of arrays of data (on a conventional video card the arrays of data are represented as textures, hence the term ‘texture’ in this document refers to a data array unless specified otherwise and each element of the texture is a pixel of color information), execution of GPU programs (on a conventional video card these programs are called shaders, hence the term ‘shader’ in this document refers to a GPU program unless specified otherwise), and data transfer between the system bus 200 and the expansion card 180 through the local bus 190 which allows communication between the main CPU 120 , RAM 130 , and disk 140 .
- Two memory banks 210 and 250 are mounted on the expansion card 180 .
- These memory banks are separated in the hardware, as shown, or alternatively implemented as a single, logically partitioned memory component.
- The reason to separate the memory into two partitions 210 and 250 stems from the nature of the computations to which the GPU accelerator system is applied.
- The elements of computation are characterized by a single output variable. Such computational elements often include one or more equations. Computational elements are the same or similar within a large population and are computed in parallel. An example of such a population is a layer of neurons in an artificial neural network (ANN), where all neurons are described by the same equation.
- ANN artificial neural network
- One memory, the shader memory bank 210, stores the GPU programs (shaders) used in the computation.
- the texture memory bank 250 is used to store all the necessary data that are specific for every computational element (including, but not limited to, input data, output data, intermediate results, and parameters) and is coupled with both the controller 220 and the GPU 240 .
- the texture memory bank 250 is preferably further partitioned into four sections.
- the first partition 250 a is designed to hold the external input data patterns.
- the second partition 250 b is designed to hold the data textures representing internal variables.
- the third partition 250 c is designed to hold the data textures used as input at a particular computation step on the GPU 240 .
- the fourth partition 250 d holds the data textures used to accommodate the output of a particular computational step on the GPU 240 .
- This partitioning scheme can be done logically and does not require hardware implementation. The partitioning scheme can also be altered based on new designs or the needs of the algorithms being employed. The reason for this partitioning is further explained in the Data Organization section, below.
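- As an illustration of logical partitioning, a single flat memory can be divided into the four sections by recording a base offset and size for each. The sketch below is an assumption-laden example: the `Partition` type, the `partition_bank` helper, the partition names, and the sizes are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    name: str
    base: int   # offset of the first cell in the flat memory
    size: int   # number of cells in this partition

def partition_bank(total_size, sizes):
    """Split one flat memory of total_size cells into named logical partitions."""
    if sum(sizes.values()) > total_size:
        raise ValueError("partitions exceed the size of the memory bank")
    layout, base = {}, 0
    for name, size in sizes.items():
        layout[name] = Partition(name, base, size)
        base += size
    return layout

# Hypothetical sizes; the four names mirror partitions 250a through 250d.
layout = partition_bank(1024, {
    "external_input": 256,   # 250a: external input data patterns
    "internal": 256,         # 250b: internal variables
    "step_input": 256,       # 250c: input of the current computation step
    "step_output": 256,      # 250d: output of the current computation step
})
```

- Altering the scheme for a different algorithm then amounts to passing a different table of sizes, with no change to the hardware.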
- a local bus interface 230 on the controller 220 serves as a driver that allows the controller 220 to communicate through the local bus 190 with the system bus 200 and thus the CPU 120 and RAM 130 .
- This local bus interface 230 is not intended to be limited to PCI related technology. Other drivers can be used to interface with comparable technology as a local bus 190 .
- Each computational element discussed above has output variables that affect the rest of the system. For example in the case of a neural network it is the output of a neuron.
- a computational element also usually has several internal variables that are used to compute output variables, but are not exposed to the rest of the system, not even to other elements of the same population, typically. Each of these variables is represented as a texture. The important difference between output variables and internal variables is their access.
- Output variables are usually accessed by any element in the system during every time step.
- the value of the output variable that is accessed by other elements of the system corresponds to the value computed on the previous, not the current, time step.
- This is realized by dedicating two textures to output variables—one holds the value computed during the previous time step and is accessible to all computational elements during the current time step, another is not accessible to other elements and is used to accumulate new values for the variable computed during the current time step. In-between time steps these two textures are switched, so that newly accumulated values serve as accessible input during the next time step, while the old input is replaced with new values of the variable.
- This switch is implemented by swapping the address pointers to respective textures as described in the System and Framework section.
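- The two-texture arrangement can be sketched in software as follows. This is a minimal illustration only: the class `OutputVariable`, the function `step`, and the toy update rule (each element takes the mean of its two ring neighbours) are hypothetical. The point shown is that every element reads values from the previous time step while new values accumulate separately, and that the switch exchanges references rather than copying data.

```python
class OutputVariable:
    """Two textures for one output variable: one readable, one being written."""

    def __init__(self, initial):
        self.previous = list(initial)            # visible to all elements
        self.current = [0.0] * len(initial)      # accumulates new values

    def swap(self):
        # Exchange references only; no element data is copied.
        self.previous, self.current = self.current, self.previous

def step(var):
    """Toy update: each element becomes the mean of its two ring neighbours."""
    n = len(var.previous)
    for i in range(n):
        left = var.previous[(i - 1) % n]
        right = var.previous[(i + 1) % n]
        var.current[i] = (left + right) / 2.0
    var.swap()

v = OutputVariable([1.0, 0.0, 0.0, 0.0])
step(v)
```

- After the swap, the old input array becomes the accumulation target for the next time step and is overwritten there.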
- Textures can have up to four color components that are all processed in parallel on a GPU.
- designating one texture pixel per element is ineffective because internal variables require one texture and output variables require two textures.
- different element types have different numbers of variables and unless this number is precisely a multiple of four, texture memory can be wasted.
- An example population of nine computational elements arranged in a 3×3 square (FIG. 5a) can be packed by element (FIG. 5b), by row (FIG. 5c), or by square (FIG. 5d).
- Packing by element means that elements 1, 2, 3, 4 go into the first pixel; 5, 6, 7, 8 go into the second pixel; 9 goes into the third pixel. This is the most compact scheme, but it is not convenient because the geometrical relationship is not preserved during packing and its extraction depends on the size of the population.
- Packing by row means that elements 1, 2, 3 go into pixel (1, 1); 4, 5, 6 go into pixel (2, 1); 7, 8, 9 go into pixel (3, 1).
- the element's y coordinate in the population is the pixel's y coordinate
- the element's x coordinate in the population is the pixel's x coordinate times four plus the index of color component.
- A five-by-five population in this case will use a 2×5 texture, or 10 pixels. Five of these pixels will use only one out of four components, so 15 of the 40 components (37.5% of the texture) are wasted. A 25×1 population will use a 7×1 texture (seven pixels, since six pixels hold only 24 elements) and will waste 3 of 28 components, about 10.7%.
- Packing by square means that elements 1, 2, 4, 5 go into pixel (1, 1); 3, 6 go into pixel (1, 2); 7, 8 go into pixel (2, 1); and 9 goes into pixel (2, 2).
- Both the row and the column of the element are determined from the row (column) of the pixel times two plus the second (first) bit of the color component index.
- A five-by-five population in this case will use a 3×3 texture, or 9 pixels. Four of these pixels will use only two out of four components, and one will use only one component, so 11 of the 36 components (about 30.6% of the texture) are wasted. This is more advantageous than packing by row, since the texture is smaller and the waste is also lower. A 25×1 population, on the other hand, will use a 13×1 texture (thirteen pixels) and waste more than 50% of it, which is much worse than packing by row.
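- The row and square packing schemes can be expressed as short index calculations. The sketch below is illustrative only: the function names are hypothetical, coordinates are zero-based (the text counts from one), and it assumes that the "first bit" of the color component index is the least significant bit. It computes the texture size, the fraction of unused color components, and the location of an element within the texture.

```python
import math

def row_packing(width, height):
    """Texture size (in pixels) when four x-adjacent elements share a pixel."""
    return math.ceil(width / 4), height

def square_packing(width, height):
    """Texture size (in pixels) when each 2x2 block of elements shares a pixel."""
    return math.ceil(width / 2), math.ceil(height / 2)

def waste(width, height, tex_size):
    """Fraction of color components left unused by a width x height population."""
    components = tex_size[0] * tex_size[1] * 4
    return 1 - (width * height) / components

def row_locate(x, y):
    """Zero-based element (x, y) -> (pixel_x, pixel_y, component)."""
    return x // 4, y, x % 4

def square_locate(x, y):
    """Zero-based element (x, y) -> (pixel_x, pixel_y, component).

    The component's low bit is taken from x and its high bit from y.
    """
    return x // 2, y // 2, (y % 2) * 2 + (x % 2)

# Compare both schemes for a square and an elongated population.
for shape in [(5, 5), (25, 1)]:
    for scheme in (row_packing, square_packing):
        size = scheme(*shape)
        print(shape, scheme.__name__, size, round(waste(*shape, size), 3))
```

- Running the comparison makes the trade-off concrete: square packing suits populations that are roughly square, while row packing suits long, thin ones.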
- FIG. 3 shows an exemplary implementation of the top level system and method that is used to control the computation. It is a representation of one of several ways in which a system and method for processing numerical techniques can be implemented in the invention described herein and so the implementation is not intended to be limited to the following description and accompanying figure.
- the method presented herein includes two execution streams that run on the CPU 120 —User Interaction Stream 302 and Data Output Stream 301 . These two streams preferably do not interact directly, but depend on the same data accumulated during simulations. They can be implemented as separate threads with shared memory access and executed on different CPUs in the case of multi-CPU computing environment.
- the Computational Stream 303 interacts with the User Interaction Stream and the Data Output Stream through synchronization procedures during simulations.
- the crucial feature of the interaction between the User Interaction Stream 302 and the Computational Stream 303 is the shift of priorities. Outside of the simulation, the system 100 is driven by the user input, thus the User Interaction Stream 302 has the priority and controls the data exchange 304 between streams. After the user starts the simulation, the Computational Stream 303 takes the priority and controls the data exchange between streams until the simulation is finished or interrupted 350 .
- the user starts 300 the framework through the means of an operating system and interacts with the software through the user interaction section 305 of the graphic user interface 306 executed on the CPU 120 .
- the start 300 of the implementation begins with a user action that causes a GUI initialization 307 , Disk input/output initialization 308 on the CPU 120 , and controller initialization 320 of the GPU accelerator on the expansion card 180 .
- GUI initialization includes opening of the main application window and setting the interface tools that allow the user to control the framework.
- Disk I/O initialization can be performed at the start of the framework, or at the start of each individual simulation.
- the user interaction 305 controls the setting and editing of the computational elements, parameters, and sources of external inputs. It specifies which equations should have their output saved to disk and/or displayed on the screen. It allows the user to start and stop the simulation. And it performs standard interface functions such as file loading and saving, interactive help, general preferences and others.
- the user interaction 305 directs the CPU 120 to acquire the new external input textures needed (this includes but is not limited to loading from disk 140 or receiving them in real time from a recording device), parses them if necessary 309 , and initializes their transfer to the expansion card 180 , where they are stored 325 in the texture memory bank 250 by the controller 220 .
- the user interaction 305 also directs the CPU 120 to parse populations of elements that will be used in the simulation, convert them to GPU programs (shaders), compile them 310 , and initializes their transfer to the expansion card 180 , where they are stored 326 in the shader memory bank 210 by the controller 220 .
- This operation is accompanied by the upload 309 of the initial data into the input partition of the texture memory bank 250 , and stores the shader order of execution in the controller 220 .
- the user can perform operations 309 and 310 as many times as necessary prior to starting the simulation or between simulations.
- the editing of the system between simulations is difficult to accomplish without the hardware implementation of the computational thread suggested herein.
- the system of equations is represented by textures that track variables plus shaders that define processing algorithms.
- textures, shaders and other graphics related constructs can only be initialized within the rendering context, which is thread specific. Therefore textures and shaders can only be initialized in the computational thread.
- Network editing is a user-interactive process, which according to the scheme suggested above happens in the User Interaction Stream 302 .
- the simulation software thus has to take the new parameters from the User Interaction Stream 302 , communicate them to the Computational Stream 303 and regenerate the necessary shaders and textures. This is hard to accomplish without a hardware implementation of the Computational Stream 303 .
- the Computational Stream 303 is forked from the User Interaction Stream and it can access the memory of the parent thread, but the reverse communication is harder to achieve.
- the controller 220 allows operations 309 and 310 to be performed as many times as necessary by providing the necessary communication to the User Interaction Stream 302 .
- After the input parser texture generation 309 and the population parser shader generator and compiler 310 have been executed at least once, the user has the option to initialize the simulation 311.
- the main control of the framework is transferred to the GPU accelerator system's accelerator controller 220 and computation 330 is started (see FIG. 4 ; 420 ).
- the user retains the ability to interrupt the simulation, change the input, or to change the display properties of the framework, but these interactions are queued to be performed at times determined by the controller-driven data exchange 314 and 316 to avoid the corruption of the data.
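- The queuing of user interactions during a running simulation can be sketched as follows. This is a single-threaded illustration under stated assumptions: the class `Simulation`, its `request` method, the request names `set_gain` and `interrupt`, and the toy computation are all hypothetical. It shows only that requests are recorded when issued and applied solely at exchange points chosen by the computation.

```python
from collections import deque

class Simulation:
    def __init__(self, steps, exchange_every):
        self.steps = steps
        self.exchange_every = exchange_every
        self.pending = deque()      # user requests wait here during the run
        self.gain = 1.0
        self.value = 0.0
        self.interrupted = False

    def request(self, name, payload=None):
        """Called from the user side; never touches simulation data directly."""
        self.pending.append((name, payload))

    def _exchange(self):
        """Controller-driven exchange point: the only place requests are applied."""
        while self.pending:
            name, payload = self.pending.popleft()
            if name == "set_gain":
                self.gain = payload
            elif name == "interrupt":
                self.interrupted = True

    def run(self):
        """Run the simulation and return the number of completed steps."""
        for step in range(1, self.steps + 1):
            self.value += self.gain           # stand-in for the real computation
            if step % self.exchange_every == 0:
                self._exchange()
                if self.interrupted:
                    return step
        return self.steps

sim = Simulation(steps=10, exchange_every=5)
sim.request("set_gain", 2.0)   # takes effect only at the next exchange point
completed = sim.run()
```

- In this sketch the new gain is not seen by steps 1 through 5, so no step ever observes a parameter that changed partway through its computation.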
- the progress monitor 312 is not necessary for performance, but adds convenience. It displays the percentage of completed time steps of the simulation and allows the user to plan the schedule using the estimates of the simulation wall clock times. Controller-driven data exchange 314 updates the display of the results 313 . Online screen output for the user selected population allows the user to monitor the activity and evaluate the qualitative behavior of the network. Simulations with unsatisfactory behavior can be terminated early to change parameters and restart. Controller-driven data exchange 314 also drives the output of the results to disk 317 . Data output to disk for convenience can be done on an element per file basis.
- a suggested file format includes a leftmost column that displays a simulated time for each of the simulation steps and subsequent columns that display variable values during this time step in all elements with identical equations (e.g. all neurons in a layer of a neural network).
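- A writer for the suggested file format might look like the sketch below. The function name `write_population`, the tab separator, the number formatting, and the convention that the first recorded step has simulated time zero are assumptions made for illustration; the text specifies only the column layout.

```python
import io

def write_population(stream, dt, history):
    """Write one row per time step: simulated time, then one value per element."""
    for step, values in enumerate(history):
        time = step * dt
        columns = [f"{time:.3f}"] + [f"{v:.6g}" for v in values]
        stream.write("\t".join(columns) + "\n")

# Two time steps of a two-element population, written to an in-memory file.
buffer = io.StringIO()
write_population(buffer, 0.1, [[0.0, 1.0], [0.5, 0.25]])
text = buffer.getvalue()
```

- Each population is written through its own stream, which matches the element-per-file organization mentioned above.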
- Controller-driven data exchange or input parser texture generator 316 allows the user to change input that is generated on the fly during the simulation. This allows the framework to monitor input that is coming from a recording device (video camera, microphone, cell recording electrode, etc.) in real time. Similar to the initial input parser 309, it preprocesses the input into a universal data array format suitable for texture generation and generates textures. Unlike the initial parser 309, here the textures are transferred to hardware not whenever ready but upon the request of the controller 220.
- The controller 220 also drives the conditional testing 315 and 318, which informs the CPU-bound streams whether the simulation is finished. If so, control returns to the User Interaction Stream. The user can then change parameters or inputs (309 and 310), restart the simulation (311), or quit the framework (390).
- SANNDRA Session Node Network Distributed Runtime Algorithm
- SANNDRA (http://www.kinness.net/Docs/SANNDRA/html) was developed to accelerate and optimize processing of numerical integration of large non-homogeneous systems of differential equations.
- This library is fully reworked in its version 2.x.x to support multiple computational backends including those based on multicore CPUs, GPUs and other processing systems.
- GPU based backend for SANNDRA-2.x.x can serve as an example practical software implementation of the method and architecture described above and pictorially represented in FIG. 3 .
- The first step is to create a TSimulator object, either directly or through inheritance. This object will handle global simulation properties and control the User Interaction Stream, Data Output Stream, and Computational Stream.
- Through TSimulator::timestep(), TSimulator::outfileInterval(), and TSimulator::outmode(), the application can set the time step of the simulation, the time step of disk output, and the mode of the disk output.
- The external input pattern should be packed into a TPattern object and bound to the simulation object through the TSimulator::resetInputs() method.
- TSimulator::simLength() sets the length of the simulation.
- The second step is to create at least one population of equations (TPopulation object).
- The population holds one equation object, TEquation.
- This object contains only a formula and does not hold element-specific data, so all elements of the population can share a single TEquation.
- the TEquation object is converted to a GPU program before execution.
- GPU programs have to be executed within a graphical context, which is stream specific.
- TSimulator creates this context within the Computational Stream; therefore all programs and data arrays that are necessary for computation have to be initialized within the Computational Stream.
- The constructor of TPopulation is called from the User Interaction Stream, so no GPU-related objects can be initialized in this constructor.
- TPopulation::fillElements() is a virtual method designed to overcome this difficulty. It is called from within the Computational Stream after TSimulator::networkCreate() is called in the User Interaction Stream. A user has to override TPopulation::fillElements() to create TEquation and other computation-related objects, both element-independent and element-specific. Element-independent objects include sub-components of TEquation and objects that describe how to handle interdependencies between variables, implemented through derivatives of the TGate class.
- Element-specific data is held in TElement objects. These objects hold references to TEquation and a set of TGate objects. There is one TElement per population, but the size of data arrays within this object corresponds to population size. All TElement objects have to be added to the TSimulator list of elements by calling the TSimulator::addUnit() method from TPopulation::fillElements().
- TPopulation::fillElements() should contain a set of TElement::add*Dependency() calls for each element. Each of these calls sets a corresponding dependency for every TGate object.
- The TGate object holds the element-independent part of the dependency, while TElement::add*Dependency() sets the element-specific details.
- TPopulation handles the output of computational elements, both when they need to exchange the data and when they need to output it to disk.
- User implementation of TPopulation derivative can add screen output.
- Listing 1 is an example code of the user program that uses a recurrent competitive field (RCF) equation:
- FIG. 4 is a detailed flow diagram illustrating part of an exemplary implementation of the bottom-level system and method performed during computation on the GPU accelerator of the expansion card 180; it is a more detailed view of the computational box 330 in FIG. 3.
- FIG. 4 represents one of several ways in which a system and method for processing numerical techniques can be implemented.
- ID swapping is equivalent to swapping the base memory addresses of two partitions of the texture memory bank 250. The partitions are swapped 485 during synchronization (485, 430, and 455) so that the data transfer 445 and the computation 435-487 can proceed immediately and in parallel, as shown in FIG. 4.
- A hardware solution allows this parallelism through the access of the controller 220 to the onboard texture memory bank 250.
- The main computation and data exchange are executed by the controller 220. It runs three parallel substreams of execution: Computational Substream 403, Data Output Substream 402, and Data Input Substream 404. These substreams are synchronized with each other during the swap of pointers 485 to the input and output texture memory partitions of the texture memory bank 250 and the check for the last iteration 487. Algorithmically, these two operations are a single atomic operation, but the block diagram shows them as two separate blocks for clarity.
- The Computational Substream 403 performs a computational cycle consisting of the sequential execution of all shaders stored in the shader memory bank 210, using the appropriate input and output textures.
- The controller 220 initializes three execution substreams 403, 402, and 404.
- The Computational Substream 403 determines which textures the GPU 240 will need to perform the computations and initiates their upload 435 onto the GPU 240.
- The GPU 240 can communicate directly with the texture memory bank 250 to upload the appropriate texture for the computations.
- The controller 220 also pulls the first shader (known from the stored order) from the shader memory bank 210 and uploads 450 it onto the GPU 240.
- The GPU 240 executes the following operations in this order: it performs the computation (execution of the shader) 470; tells the controller 220 that it is done with the computations for the current shader; and, after all shaders for this particular equation are executed, sends 480 the output textures to the output portion of the texture memory bank 250. This cycle continues through all of the equations based on the branching step 482.
- The shader in Listing 2 can be executed on a conventional video card. Using the controller 220, however, this code can be optimized further. Since the integration step does not change during the simulation, the step itself, as well as the half step and 1/6 of the step, can be computed once per simulation and updated in all shaders by the shader update procedures 310, 326 discussed above.
- The main execution substream 403 on the controller 220 can switch 485 the reference pointers of the input and output portions of the texture memory bank 250.
- The Data Input Substream 404 controls 440 the input of additional data from the CPU 120. This is necessary in cases where the simulation is monitoring a changing input, for example input from a video camera or other recording device in real time.
- This substream uploads new external input from the CPU 120 to the texture memory bank 250 so that it can be used by the main computational substream 403 on the next computational step, and then waits for the next iteration 475.
- The Data Output Substream 402 controls 445 the output of simulation results to the CPU 120 if requested by the user. This substream uploads the results of the previous step to the main RAM 130 so that the CPU 120 can save them on disk 140 or show them on the results display 313, and then waits for the next iteration 460.
- The Computational Substream 403 determines the timing of the input 440 and output 445 data transfers; the transfers themselves are driven by the controller 220.
- The controller 220 initiates a transfer only after selected computational steps. For example, if the experimental data being simulated was recorded every 10 milliseconds (msec) and the simulation, for better precision, was computed every 1 msec, then only every tenth result has to be transferred to match the experimental frequency.
- This solution stores two copies of the output data, one in the expansion card texture memory bank 250 and another in the system RAM 130.
- The copy in the system RAM 130 is accessed twice: for disk I/O and for screen visualization 313.
- An alternative solution would be to provide the CPU 120 with direct read access to the onboard texture memory bank 250 by mapping the memory of the hardware onto a global memory space.
- The alternative solution would double the communication through the local bus 190. Since the goal discussed herein is to reduce the information transfer through the local bus 190, the former solution is favored.
- The main substream 403 determines whether this is the last iteration 487. If it is, the controller 220 waits for all of the execution substreams to finish 490 and then returns control to the CPU 120; otherwise it begins the next computational cycle.
- The CPU 120 is only used for user input, sending information to the controller 220, receiving output after each computational cycle (or less frequently, as defined by the user), writing this output to disk 140, and displaying this output on the monitor 170. This frees the CPU 120 to execute other applications and allows the expansion card to run at its full capacity without being slowed down by extensive interactions with the CPU 120.
- Shaders will initially be stored in the shader memory bank 210 on the expansion card 180 and will be sent to the GPU 240 for execution by the general purpose controller 220 located on the expansion card.
- The GPU 240 is inherently parallel and is well suited to performing parallel computations.
- The controller 220 uploads the data from the previous calculation into main memory 130.
- At the same time, the CPU 120 saves the uploaded previous results onto disk 140 and displays them on the screen through the system bus 200.
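The pointer swap 485 and the selective output transfer described above can be illustrated with a minimal C++ sketch. It is not the patented implementation: it runs sequentially on the host, whereas the description has the controller 220 run the substreams in parallel, and the names `computeCycle`, `runSimulation`, and `outputEvery` are hypothetical. Two buffers stand in for the input and output partitions of the texture memory bank 250. After each cycle only the pointers are exchanged, and a result is recorded for transfer only after every `outputEvery` cycles.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one computational cycle: reads the input
// partition and writes the output partition (here, a simple increment).
void computeCycle(const std::vector<float>& in, std::vector<float>& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] + 1.0f;
    }
}

// Runs `steps` cycles over two partitions and returns the snapshots that
// would be transferred to the host: one after every `outputEvery` cycles.
std::vector<std::vector<float>> runSimulation(std::vector<float> initial,
                                              std::size_t steps,
                                              std::size_t outputEvery) {
    std::vector<float> partitionA = std::move(initial);
    std::vector<float> partitionB(partitionA.size());
    std::vector<float>* input = &partitionA;
    std::vector<float>* output = &partitionB;
    std::vector<std::vector<float>> transferred;

    for (std::size_t step = 1; step <= steps; ++step) {
        computeCycle(*input, *output);
        // Swap the roles of the partitions: no data is copied, only the
        // two pointers are exchanged.
        std::swap(input, output);
        // After the swap, *input holds the newest results.
        if (outputEvery != 0 && step % outputEvery == 0) {
            transferred.push_back(*input);
        }
    }
    return transferred;
}
```

With 30 cycles and `outputEvery` set to 10, the sketch records three snapshots, which mirrors the example of a 1 msec simulation step matched to data recorded every 10 msec.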
Description
LISTING 1
#include <cstdint>
#include <cstdlib>
#include <iostream>
// The framework headers that declare TPopulation, TSimulator, TGate, and the
// related classes are also required; the source does not name them.

uint16_t w = 3, h = 3;
static float m_compet = 0.5;
static float m_persist = 1.0;

class TCablePopRCF : public TPopulation
{
    TEq_RCF* m_equation;
    TGate* m_gate1;
    TGate* m_gate2;

    // Element-independent part of the dependencies.
    void createGatingStructure()
    {
        m_gate1 = new TGate(0);
        m_gate2 = new TGate(1);
    }

    // Element-specific dependency details.
    void createUnitStructure(TBasicUnit* u)
    {
        u->addO2OPInputDependency(m_gate1, 0., 0., 0.004, 0., 0, 0);
        u->addFullDependency(m_gate2, population());
    }

public:
    // Pointers start as null so that the destructor is safe even if
    // fillElements() was never called.
    TCablePopRCF() : TPopulation("compCPU RCF", w, h, true),
        m_equation(nullptr), m_gate1(nullptr), m_gate2(nullptr) { }
    ~TCablePopRCF()
    {
        if(m_equation) delete m_equation;
        if(m_gate1) delete m_gate1;
        if(m_gate2) delete m_gate2;
    }
    bool fillElements(TSimulator* sim);
};

bool TCablePopRCF::fillElements(TSimulator* sim)
{
    m_equation = new TEq_RCF(this, m_compet, m_persist);
    createGatingStructure();
    for(size_t i = 0; i < xSize(); ++i)
        for(size_t j = 0; j < ySize(); ++j)
        {
            TElement* u = new TCPUElement(this, m_equation, i, j);
            sim->addUnit(u);
            createUnitStructure(u);
        }
    return true;
}

int main()
{
    // Input pattern generation (309 in FIG. 3)
    uint32_t* pat = new uint32_t[w*h];
    TRandom<float> randGen(0);
    for(uint32_t i = 0; i < w*h; ++i)
        pat[i] = randGen.random();
    Tpattern* p = new Tpattern(pat, w, h);
    // Setting up the simulation
    TSimulator* cableSim = new TSimulator("data"); //(308 and 320 in FIG. 3)
    cableSim->timestep(0.05); //(320 in FIG. 3)
    cableSim->resetInputs(p); //(325 in FIG. 3)
    cableSim->outfileInterval(0.1); //(308 in FIG. 3)
    cableSim->outmode(SANNDRA::timefunc); //(308 in FIG. 3)
    cableSim->simLength(60.0); //(320 in FIG. 3)
    // Preparing the population
    TPopulation* cablePop = new TCablePopRCF(); //(310 in FIG. 3)
    cableSim->networkCreate(); //(326 in FIG. 3)
    uint16_t user = 1;
    while(user)
    {
        if(!cableSim->simulationStart(true, 1)) //(311 in FIG. 3)
            exit(1);
        std::cout << "Repeat?\n"; //(305 in FIG. 3)
        std::cin >> user; //(305 in FIG. 3)
        if(user == 1)
            cableSim->networkReset(); //(305 in FIG. 3)
    }
    if(cableSim)
        delete cableSim; // Also deletes cablePop and its internals
    exit(0);
}
LISTING 2
uniform sampler2DRect Variable;
uniform float integration_step;
float halfstep = integration_step*0.5;
float fl_6step = integration_step/6.0;
vec4 output = texture2DRect(Variable, gl_TexCoord[0].st);
// define equation( ) here
vec4 rungekutta4(vec4 x)
{
    const vec4 k1 = equation(x);
    const vec4 k2 = equation(x + halfstep*k1);
    const vec4 k3 = equation(x + halfstep*k2);
    const vec4 k4 = equation(x + integration_step*k3);
    return fl_6step*(k1 + 2.0*(k2 + k3) + k4);
}
void main(void)
{
    output += rungekutta4(output);
    gl_FragColor = output;
}
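The fourth-order Runge-Kutta update in Listing 2, together with the idea of computing the step, the half step, and one sixth of the step only once per simulation, can be checked on the host with a small C++ sketch. This is an illustration under stated assumptions rather than part of the described system: `StepConstants`, `makeStepConstants`, `rk4Increment`, and `integrate` are hypothetical names, the state is a single scalar instead of a `vec4` texel, and the right-hand side passed as `f` stands in for the `equation( )` that Listing 2 leaves undefined.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>

// Step-derived constants, computed once per simulation because the
// integration step does not change while the simulation runs.
struct StepConstants {
    double step;
    double half;
    double sixth;
};

StepConstants makeStepConstants(double step) {
    return {step, step * 0.5, step / 6.0};
}

// Same arithmetic as rungekutta4() in Listing 2, for a scalar state and a
// right-hand side f that depends only on the state.
double rk4Increment(const std::function<double(double)>& f, double x,
                    const StepConstants& c) {
    const double k1 = f(x);
    const double k2 = f(x + c.half * k1);
    const double k3 = f(x + c.half * k2);
    const double k4 = f(x + c.step * k3);
    return c.sixth * (k1 + 2.0 * (k2 + k3) + k4);
}

// Advances x0 by `steps` integration steps, reusing the same constants.
double integrate(const std::function<double(double)>& f, double x0,
                 double step, std::size_t steps) {
    const StepConstants c = makeStepConstants(step);
    double x = x0;
    for (std::size_t i = 0; i < steps; ++i) {
        x += rk4Increment(f, x, c);
    }
    return x;
}
```

As a sanity check, integrating the hypothetical test equation dx/dt = -x from x = 1 for 20 steps of 0.05 should land close to exp(-1).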
Claims (41)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/136,343 USRE49461E1 (en) | 2006-09-25 | 2020-12-29 | Graphic processor based accelerator system and method |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US82689206P | 2006-09-25 | 2006-09-25 | |
US11/860,254 US8648867B2 (en) | 2006-09-25 | 2007-09-24 | Graphic processor based accelerator system and method |
US14/147,015 US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method |
US15/808,201 USRE48438E1 (en) | 2006-09-25 | 2017-11-09 | Graphic processor based accelerator system and method |
US17/136,343 USRE49461E1 (en) | 2006-09-25 | 2020-12-29 | Graphic processor based accelerator system and method |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/147,015 Reissue US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE49461E1 (en) | 2023-03-14 |
Family
ID=39416485
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/860,254 Active 2030-07-20 US8648867B2 (en) | 2006-09-25 | 2007-09-24 | Graphic processor based accelerator system and method |
US14/147,015 Ceased US9189828B2 (en) | 2006-09-25 | 2014-01-03 | Graphic processor based accelerator system and method |
US15/808,201 Active USRE48438E1 (en) | 2006-09-25 | 2017-11-09 | Graphic processor based accelerator system and method |
US17/136,343 Active USRE49461E1 (en) | 2006-09-25 | 2020-12-29 | Graphic processor based accelerator system and method |
Country Status (1)
Country | Link |
---|---|
US (4) | US8648867B2 (en) |
Families Citing this family (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8648867B2 (en) * | 2006-09-25 | 2014-02-11 | Neurala Llc | Graphic processor based accelerator system and method |
JP2011524049A (en) * | 2008-06-04 | 2011-08-25 | エヌイーシー ラボラトリーズ アメリカ インク | System and method for parallelizing and speeding up training and classification of learning machines using massively parallel accelerators |
US20100238188A1 (en) * | 2009-03-20 | 2010-09-23 | Sean Miceli | Efficient Display of Virtual Desktops on Multiple Independent Display Devices |
US8922566B2 (en) * | 2010-06-28 | 2014-12-30 | Nvidia Corporation | Rechargeable universal serial bus external graphics device and method |
CN103106637A (en) * | 2011-11-11 | 2013-05-15 | 辉达公司 | Standard central processing unit (CPU) module, system containing CPU module and method for driving system |
US20130163195A1 (en) * | 2011-12-22 | 2013-06-27 | Nvidia Corporation | System, method, and computer program product for performing operations on data utilizing a computation module |
CN102541804B (en) * | 2011-12-26 | 2014-04-02 | 中国人民解放军信息工程大学 | Multi-GPU (graphic processing unit) interconnection system structure in heterogeneous system |
CN103049421B (en) * | 2012-12-11 | 2019-08-27 | 百度在线网络技术(北京)有限公司 | Data transmission method and device between a kind of CPU and coprocessor |
WO2014204615A2 (en) | 2013-05-22 | 2014-12-24 | Neurala, Inc. | Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence |
EP2999940A4 (en) | 2013-05-22 | 2017-11-15 | Neurala Inc. | Methods and apparatus for early sensory integration and robust acquisition of real world knowledge |
CN103678234B (en) * | 2013-12-06 | 2016-11-02 | 福建鑫诺通讯技术有限公司 | A kind of interface circuit of the multiple usb type of compatibility |
EP3120300A4 (en) | 2014-03-19 | 2017-11-22 | Neurala Inc. | Methods and apparatus for autonomous robotic control |
US9626566B2 (en) | 2014-03-19 | 2017-04-18 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
US9870333B1 (en) | 2014-09-12 | 2018-01-16 | Keysight Technologies, Inc. | Instrumentation chassis including integrated accelerator module |
CN104267940A (en) * | 2014-09-17 | 2015-01-07 | 武汉狮图空间信息技术有限公司 | Quick map tile generation method based on CPU+GPU |
CN104331271A (en) * | 2014-11-18 | 2015-02-04 | 李桦 | Parallel computing method and system for CFD (Computational Fluid Dynamics) |
CN105740493A (en) * | 2014-12-12 | 2016-07-06 | 鸿富锦精密工业(武汉)有限公司 | Simulation model and simulation method for obtaining cooling flow of expansion card |
US11157800B2 (en) | 2015-07-24 | 2021-10-26 | Brainchip, Inc. | Neural processor based accelerator system and method |
WO2017049583A1 (en) * | 2015-09-25 | 2017-03-30 | Intel Corporation | Gpu-cpu two-path memory copy |
US10354692B2 (en) * | 2015-10-02 | 2019-07-16 | Twitter, Inc. | Gapless video looping |
WO2017203096A1 (en) * | 2016-05-27 | 2017-11-30 | Picturall Oy | A computer-implemented method for reducing video latency of a computer video processing system and computer program product thereto |
US11238334B2 (en) | 2017-04-04 | 2022-02-01 | Hailo Technologies Ltd. | System and method of input alignment for efficient vector operations in an artificial neural network |
US11544545B2 (en) | 2017-04-04 | 2023-01-03 | Hailo Technologies Ltd. | Structured activation based sparsity in an artificial neural network |
US11551028B2 (en) | 2017-04-04 | 2023-01-10 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network |
US10387298B2 (en) | 2017-04-04 | 2019-08-20 | Hailo Technologies Ltd | Artificial neural network incorporating emphasis and focus techniques |
US11615297B2 (en) | 2017-04-04 | 2023-03-28 | Hailo Technologies Ltd. | Structured weight based sparsity in an artificial neural network compiler |
US10838902B2 (en) * | 2017-06-23 | 2020-11-17 | Facebook, Inc. | Apparatus, system, and method for performing hardware acceleration via expansion cards |
CN107729283B (en) * | 2017-10-10 | 2021-07-13 | 惠州Tcl移动通信有限公司 | Method, system and storage medium for controlling CPU extension based on mobile terminal |
US10360214B2 (en) | 2017-10-19 | 2019-07-23 | Pure Storage, Inc. | Ensuring reproducibility in an artificial intelligence infrastructure |
US10671434B1 (en) | 2017-10-19 | 2020-06-02 | Pure Storage, Inc. | Storage based artificial intelligence infrastructure |
US11455168B1 (en) | 2017-10-19 | 2022-09-27 | Pure Storage, Inc. | Batch building for deep learning training workloads |
US11861423B1 (en) | 2017-10-19 | 2024-01-02 | Pure Storage, Inc. | Accelerating artificial intelligence (‘AI’) workflows |
US11494692B1 (en) | 2018-03-26 | 2022-11-08 | Pure Storage, Inc. | Hyperscale artificial intelligence and machine learning infrastructure |
US10564989B2 (en) | 2017-11-28 | 2020-02-18 | Microsoft Technology Licensing | Thread independent parametric positioning for rendering elements |
US10424041B2 (en) * | 2017-12-11 | 2019-09-24 | Microsoft Technology Licensing, Llc | Thread independent scalable vector graphics operations |
RU199766U1 (en) * | 2019-12-23 | 2020-09-21 | Общество с ограниченной ответственностью "Эверест" | PCIe EXPANSION CARD FOR CONTINUOUS PERFORMANCE (INFERENCE) OF NEURAL NETWORKS |
US11221929B1 (en) | 2020-09-29 | 2022-01-11 | Hailo Technologies Ltd. | Data stream fault detection mechanism in an artificial neural network processor |
US11263077B1 (en) | 2020-09-29 | 2022-03-01 | Hailo Technologies Ltd. | Neural network intermediate results safety mechanism in an artificial neural network processor |
US11811421B2 (en) | 2020-09-29 | 2023-11-07 | Hailo Technologies Ltd. | Weights safety mechanism in an artificial neural network processor |
US11237894B1 (en) | 2020-09-29 | 2022-02-01 | Hailo Technologies Ltd. | Layer control unit instruction addressing safety mechanism in an artificial neural network processor |
US11874900B2 (en) | 2020-09-29 | 2024-01-16 | Hailo Technologies Ltd. | Cluster interlayer safety mechanism in an artificial neural network processor |
US20230418343A1 (en) * | 2020-10-07 | 2023-12-28 | Hewlett-Packard Development Company, L.P. | Holders for computing components |
Citations (65)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5063603A (en) | 1989-11-06 | 1991-11-05 | David Sarnoff Research Center, Inc. | Dynamic method for recognizing objects and image processing system therefor |
US5136687A (en) | 1989-10-10 | 1992-08-04 | Edelman Gerald M | Categorization automata employing neuronal group selection with reentry |
US5142665A (en) | 1990-02-20 | 1992-08-25 | International Business Machines Corporation | Neural network shell for application programs |
US5172253A (en) | 1990-06-21 | 1992-12-15 | International Business Machines Corporation | Neural network model for reaching a goal state |
US5388206A (en) | 1992-11-13 | 1995-02-07 | The University Of North Carolina | Architecture and apparatus for image generation |
US6018696A (en) | 1996-12-26 | 2000-01-25 | Fujitsu Limited | Learning type position determining device |
US20010010034A1 (en) | 2000-01-20 | 2001-07-26 | Burton John Mark | Simulation of data processing apparatus |
US6336051B1 (en) | 1997-04-16 | 2002-01-01 | Carnegie Mellon University | Agricultural harvester with robotic control |
US20020046271A1 (en) | 2000-04-03 | 2002-04-18 | Huang James Ching-Liang | Single switch image for a stack of switches |
US20020050518A1 (en) | 1997-12-08 | 2002-05-02 | Roustaei Alexander R. | Sensor array |
US20020064314A1 (en) | 2000-09-08 | 2002-05-30 | Dorin Comaniciu | Adaptive resolution system and method for providing efficient low bit rate transmission of image data for distributed applications |
US20020168100A1 (en) | 2001-05-10 | 2002-11-14 | Woodall Roger L. | Spatial image processor |
US20030026588A1 (en) | 2001-05-14 | 2003-02-06 | Elder James H. | Attentive panoramic visual sensor |
US20030078754A1 (en) | 2001-10-22 | 2003-04-24 | Honeywell International Inc. | Multi-sensor information fusion technique |
US6647508B2 (en) | 1997-11-04 | 2003-11-11 | Hewlett-Packard Development Company, L.P. | Multiprocessor computer architecture with multiple operating system instances and software controlled resource allocation |
US20040015334A1 (en) | 2002-07-19 | 2004-01-22 | International Business Machines Corporation | Method and apparatus to manage multi-computer demand |
EP1224622B1 (en) | 1999-09-24 | 2004-11-10 | Sun Microsystems, Inc. | Method and apparatus for rapid visualization of three-dimensional scenes |
US20050166042A1 (en) | 2002-01-16 | 2005-07-28 | Microsoft Corporation | Secure video card methods and systems |
US20060129506A1 (en) * | 2004-07-15 | 2006-06-15 | Neurosciences Research Foundation, Inc. | Mobile brain-based device having a simulated nervous system based on the hippocampus |
US20060184273A1 (en) | 2003-03-11 | 2006-08-17 | Tsutomu Sawada | Robot device, Behavior control method thereof, and program |
US7119810B2 (en) | 2003-12-05 | 2006-10-10 | Siemens Medical Solutions Usa, Inc. | Graphics processing unit for simulation or medical diagnostic imaging |
US20070052713A1 (en) | 2005-08-09 | 2007-03-08 | Samsung Electronics Co., Ltd. | Systems and methods for storing and fetching texture data using bank interleaving |
US7219085B2 (en) * | 2003-12-09 | 2007-05-15 | Microsoft Corporation | System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit |
US20070198222A1 (en) | 2006-02-23 | 2007-08-23 | Rockwell Automation Technologies, Inc. | System and method to combine and weight multiple sensors with overlapping sensing range to create a measurement system utilized in a high integrity or safety environment |
US20070279429A1 (en) | 2006-06-02 | 2007-12-06 | Leonhard Ganzer | System and method for rendering graphics |
US20080033897A1 (en) | 2006-08-02 | 2008-02-07 | Lloyd Kenneth A | Object Oriented System and Method of Graphically Displaying and Analyzing Complex Systems |
US20080066065A1 (en) | 2006-09-07 | 2008-03-13 | Samsung Electronics Co., Ltd. | Software robot apparatus |
US20080258880A1 (en) | 2007-01-10 | 2008-10-23 | Smith Richard A | Information Collecting and Decision Making Via Tiered Information Network Systems |
US7477256B1 (en) * | 2004-11-17 | 2009-01-13 | Nvidia Corporation | Connecting graphics adapters for scalable performance |
US20090080695A1 (en) | 2007-09-24 | 2009-03-26 | New Span Opto-Technology, Inc. | Electro-optical Foveated Imaging and Tracking System |
US20090089030A1 (en) | 2007-09-28 | 2009-04-02 | Rockwell Automation Technologies, Inc. | Distributed simulation and synchronization |
US7525547B1 (en) * | 2003-08-12 | 2009-04-28 | Nvidia Corporation | Programming multiple chips from a command buffer to process multiple images |
US20090116688A1 (en) | 2007-11-05 | 2009-05-07 | California Institute Of Technology | Synthetic foveal imaging technology |
US20100048242A1 (en) | 2008-08-19 | 2010-02-25 | Rhoads Geoffrey B | Methods and systems for content processing |
US7765029B2 (en) | 2005-09-13 | 2010-07-27 | Neurosciences Research Foundation, Inc. | Hybrid control device |
US7861060B1 (en) * | 2005-12-15 | 2010-12-28 | Nvidia Corporation | Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior |
US20110004341A1 (en) | 2009-07-01 | 2011-01-06 | Honda Motor Co., Ltd. | Panoramic Attention For Humanoid Robots |
US7873650B1 (en) | 2004-06-11 | 2011-01-18 | Seisint, Inc. | System and method for distributing data in a parallel processing system |
US20110173015A1 (en) | 2006-03-03 | 2011-07-14 | Inrix, Inc. | Determining road traffic conditions using data from multiple data sources |
US20110279682A1 (en) | 2009-11-12 | 2011-11-17 | Le Li | Methods for Target Tracking, Classification and Identification by Using Foveal Sensors |
US20120072215A1 (en) | 2010-09-21 | 2012-03-22 | Microsoft Corporation | Full-sequence training of deep structures for speech recognition |
US20120089552A1 (en) | 2008-12-22 | 2012-04-12 | Shih-Fu Chang | Rapid image annotation via brain state decoding and visual pattern mining |
US20120197596A1 (en) | 2011-01-31 | 2012-08-02 | Raytheon Company | System And Method For Distributed Processing |
US20120316786A1 (en) | 2011-06-10 | 2012-12-13 | International Business Machines Corporation | Rtm seismic imaging using incremental resolution methods |
US8392346B2 (en) | 2008-11-04 | 2013-03-05 | Honda Motor Co., Ltd. | Reinforcement learning system |
US20130131985A1 (en) | 2011-04-11 | 2013-05-23 | James D. Weiland | Wearable electronic image acquisition and enhancement system and method for image acquisition and visual enhancement |
US20130126703A1 (en) | 2007-12-05 | 2013-05-23 | John Caulfield | Imaging Detecting with Automated Sensing of an Object or Characteristic of that Object |
US8510244B2 (en) | 2009-03-20 | 2013-08-13 | ISC8 Inc. | Apparatus comprising artificial neuronal assembly |
US20140019392A1 (en) | 2012-06-01 | 2014-01-16 | Brain Corporation | Intelligent modular robotic apparatus and methods |
US20140032461A1 (en) | 2012-07-25 | 2014-01-30 | Board Of Trustees Of Michigan State University | Synapse maintenance in the developmental networks |
US8648867B2 (en) * | 2006-09-25 | 2014-02-11 | Neurala Llc | Graphic processor based accelerator system and method |
US20140052679A1 (en) | 2011-09-21 | 2014-02-20 | Oleg Sinyavskiy | Apparatus and methods for implementing event-based updates in spiking neuron networks |
US20140089232A1 (en) | 2012-06-01 | 2014-03-27 | Brain Corporation | Neural network learning and collaboration apparatus and methods |
WO2014204615A2 (en) | 2013-05-22 | 2014-12-24 | Neurala, Inc. | Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence |
US20150127149A1 (en) | 2013-11-01 | 2015-05-07 | Brain Corporation | Apparatus and methods for online training of robots |
US9031692B2 (en) | 2010-08-24 | 2015-05-12 | Shenzhen Institutes of Advanced Technology Chinese Academy of Science | Cloud robot system and method of integrating the same |
US20150134232A1 (en) | 2011-11-22 | 2015-05-14 | Kurt B. Robinson | Systems and methods involving features of adaptive and/or autonomous traffic control |
US20150224648A1 (en) | 2014-02-13 | 2015-08-13 | GM Global Technology Operations LLC | Robotic system with 3d box location functionality |
WO2015143173A2 (en) | 2014-03-19 | 2015-09-24 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
WO2016014137A2 (en) | 2014-05-06 | 2016-01-28 | Neurala, Inc. | Apparatuses, methods, and systems for defining hardware-agnostic brains for autonomous robots |
US20160075017A1 (en) | 2014-09-17 | 2016-03-17 | Brain Corporation | Apparatus and methods for removal of learned behaviors in robots |
US20160082597A1 (en) | 2013-05-22 | 2016-03-24 | Neurala, Inc. | Methods and apparatus for early sensory integration and robust acquisition of real world knowledge |
US20160096270A1 (en) | 2014-10-02 | 2016-04-07 | Brain Corporation | Feature detection apparatus and methods for training of robotic navigation |
US9626566B2 (en) | 2014-03-19 | 2017-04-18 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
WO2019000208A1 (en) | 2017-06-27 | 2019-01-03 | 张志兰 | High-efficiency water-circulating freezer |
Patent Citations (74)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5136687A (en) | 1989-10-10 | 1992-08-04 | Edelman Gerald M | Categorization automata employing neuronal group selection with reentry |
US5063603A (en) | 1989-11-06 | 1991-11-05 | David Sarnoff Research Center, Inc. | Dynamic method for recognizing objects and image processing system therefor |
US5142665A (en) | 1990-02-20 | 1992-08-25 | International Business Machines Corporation | Neural network shell for application programs |
US5172253A (en) | 1990-06-21 | 1992-12-15 | International Business Machines Corporation | Neural network model for reaching a goal state |
US5388206A (en) | 1992-11-13 | 1995-02-07 | The University Of North Carolina | Architecture and apparatus for image generation |
US6018696A (en) | 1996-12-26 | 2000-01-25 | Fujitsu Limited | Learning type position determining device |
US6336051B1 (en) | 1997-04-16 | 2002-01-01 | Carnegie Mellon University | Agricultural harvester with robotic control |
US6647508B2 (en) | 1997-11-04 | 2003-11-11 | Hewlett-Packard Development Company, L.P. | Multiprocessor computer architecture with multiple operating system instances and software controlled resource allocation |
US20020050518A1 (en) | 1997-12-08 | 2002-05-02 | Roustaei Alexander R. | Sensor array |
EP1224622B1 (en) | 1999-09-24 | 2004-11-10 | Sun Microsystems, Inc. | Method and apparatus for rapid visualization of three-dimensional scenes |
US20010010034A1 (en) | 2000-01-20 | 2001-07-26 | Burton John Mark | Simulation of data processing apparatus |
US20020046271A1 (en) | 2000-04-03 | 2002-04-18 | Huang James Ching-Liang | Single switch image for a stack of switches |
US20020064314A1 (en) | 2000-09-08 | 2002-05-30 | Dorin Comaniciu | Adaptive resolution system and method for providing efficient low bit rate transmission of image data for distributed applications |
US20020168100A1 (en) | 2001-05-10 | 2002-11-14 | Woodall Roger L. | Spatial image processor |
US20030026588A1 (en) | 2001-05-14 | 2003-02-06 | Elder James H. | Attentive panoramic visual sensor |
US20030078754A1 (en) | 2001-10-22 | 2003-04-24 | Honeywell International Inc. | Multi-sensor information fusion technique |
US20050166042A1 (en) | 2002-01-16 | 2005-07-28 | Microsoft Corporation | Secure video card methods and systems |
US20040015334A1 (en) | 2002-07-19 | 2004-01-22 | International Business Machines Corporation | Method and apparatus to manage multi-computer demand |
US20060184273A1 (en) | 2003-03-11 | 2006-08-17 | Tsutomu Sawada | Robot device, Behavior control method thereof, and program |
US7525547B1 (en) * | 2003-08-12 | 2009-04-28 | Nvidia Corporation | Programming multiple chips from a command buffer to process multiple images |
US7119810B2 (en) | 2003-12-05 | 2006-10-10 | Siemens Medical Solutions Usa, Inc. | Graphics processing unit for simulation or medical diagnostic imaging |
US7219085B2 (en) * | 2003-12-09 | 2007-05-15 | Microsoft Corporation | System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit |
US7873650B1 (en) | 2004-06-11 | 2011-01-18 | Seisint, Inc. | System and method for distributing data in a parallel processing system |
US20060129506A1 (en) * | 2004-07-15 | 2006-06-15 | Neurosciences Research Foundation, Inc. | Mobile brain-based device having a simulated nervous system based on the hippocampus |
US7477256B1 (en) * | 2004-11-17 | 2009-01-13 | Nvidia Corporation | Connecting graphics adapters for scalable performance |
US20070052713A1 (en) | 2005-08-09 | 2007-03-08 | Samsung Electronics Co., Ltd. | Systems and methods for storing and fetching texture data using bank interleaving |
US8583286B2 (en) | 2005-09-13 | 2013-11-12 | Neurosciences Research Foundation, Inc. | Hybrid control device |
US7765029B2 (en) | 2005-09-13 | 2010-07-27 | Neurosciences Research Foundation, Inc. | Hybrid control device |
US7861060B1 (en) * | 2005-12-15 | 2010-12-28 | Nvidia Corporation | Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior |
US20070198222A1 (en) | 2006-02-23 | 2007-08-23 | Rockwell Automation Technologies, Inc. | System and method to combine and weight multiple sensors with overlapping sensing range to create a measurement system utilized in a high integrity or safety environment |
US20110173015A1 (en) | 2006-03-03 | 2011-07-14 | Inrix, Inc. | Determining road traffic conditions using data from multiple data sources |
US20070279429A1 (en) | 2006-06-02 | 2007-12-06 | Leonhard Ganzer | System and method for rendering graphics |
US20080033897A1 (en) | 2006-08-02 | 2008-02-07 | Lloyd Kenneth A | Object Oriented System and Method of Graphically Displaying and Analyzing Complex Systems |
US20080066065A1 (en) | 2006-09-07 | 2008-03-13 | Samsung Electronics Co., Ltd. | Software robot apparatus |
US9189828B2 (en) | 2006-09-25 | 2015-11-17 | Neurala, Inc. | Graphic processor based accelerator system and method |
US8648867B2 (en) * | 2006-09-25 | 2014-02-11 | Neurala Llc | Graphic processor based accelerator system and method |
USRE48438E1 (en) * | 2006-09-25 | 2021-02-16 | Neurala, Inc. | Graphic processor based accelerator system and method |
US20080258880A1 (en) | 2007-01-10 | 2008-10-23 | Smith Richard A | Information Collecting and Decision Making Via Tiered Information Network Systems |
US20090080695A1 (en) | 2007-09-24 | 2009-03-26 | New Span Opto-Technology, Inc. | Electro-optical Foveated Imaging and Tracking System |
US20090089030A1 (en) | 2007-09-28 | 2009-04-02 | Rockwell Automation Technologies, Inc. | Distributed simulation and synchronization |
US20090116688A1 (en) | 2007-11-05 | 2009-05-07 | California Institute Of Technology | Synthetic foveal imaging technology |
US20130126703A1 (en) | 2007-12-05 | 2013-05-23 | John Caulfield | Imaging Detecting with Automated Sensing of an Object or Characteristic of that Object |
US20100048242A1 (en) | 2008-08-19 | 2010-02-25 | Rhoads Geoffrey B | Methods and systems for content processing |
US8392346B2 (en) | 2008-11-04 | 2013-03-05 | Honda Motor Co., Ltd. | Reinforcement learning system |
US20120089552A1 (en) | 2008-12-22 | 2012-04-12 | Shih-Fu Chang | Rapid image annotation via brain state decoding and visual pattern mining |
US8510244B2 (en) | 2009-03-20 | 2013-08-13 | ISC8 Inc. | Apparatus comprising artificial neuronal assembly |
US20110004341A1 (en) | 2009-07-01 | 2011-01-06 | Honda Motor Co., Ltd. | Panoramic Attention For Humanoid Robots |
US20110279682A1 (en) | 2009-11-12 | 2011-11-17 | Le Li | Methods for Target Tracking, Classification and Identification by Using Foveal Sensors |
US9031692B2 (en) | 2010-08-24 | 2015-05-12 | Shenzhen Institutes of Advanced Technology Chinese Academy of Science | Cloud robot system and method of integrating the same |
US20120072215A1 (en) | 2010-09-21 | 2012-03-22 | Microsoft Corporation | Full-sequence training of deep structures for speech recognition |
US20120197596A1 (en) | 2011-01-31 | 2012-08-02 | Raytheon Company | System And Method For Distributed Processing |
US20130131985A1 (en) | 2011-04-11 | 2013-05-23 | James D. Weiland | Wearable electronic image acquisition and enhancement system and method for image acquisition and visual enhancement |
US20120316786A1 (en) | 2011-06-10 | 2012-12-13 | International Business Machines Corporation | Rtm seismic imaging using incremental resolution methods |
US20140052679A1 (en) | 2011-09-21 | 2014-02-20 | Oleg Sinyavskiy | Apparatus and methods for implementing event-based updates in spiking neuron networks |
US20150134232A1 (en) | 2011-11-22 | 2015-05-14 | Kurt B. Robinson | Systems and methods involving features of adaptive and/or autonomous traffic control |
US9177246B2 (en) | 2012-06-01 | 2015-11-03 | Qualcomm Technologies Inc. | Intelligent modular robotic apparatus and methods |
US20140019392A1 (en) | 2012-06-01 | 2014-01-16 | Brain Corporation | Intelligent modular robotic apparatus and methods |
US20140089232A1 (en) | 2012-06-01 | 2014-03-27 | Brain Corporation | Neural network learning and collaboration apparatus and methods |
US20140032461A1 (en) | 2012-07-25 | 2014-01-30 | Board Of Trustees Of Michigan State University | Synapse maintenance in the developmental networks |
WO2014204615A2 (en) | 2013-05-22 | 2014-12-24 | Neurala, Inc. | Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence |
US20160082597A1 (en) | 2013-05-22 | 2016-03-24 | Neurala, Inc. | Methods and apparatus for early sensory integration and robust acquisition of real world knowledge |
US20160198000A1 (en) | 2013-05-22 | 2016-07-07 | Neurala, Inc. | Methods and apparatus for iterative nonspecific distributed runtime architecture and its application to cloud intelligence |
US20150127149A1 (en) | 2013-11-01 | 2015-05-07 | Brain Corporation | Apparatus and methods for online training of robots |
US20150224648A1 (en) | 2014-02-13 | 2015-08-13 | GM Global Technology Operations LLC | Robotic system with 3d box location functionality |
WO2015143173A2 (en) | 2014-03-19 | 2015-09-24 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
US9626566B2 (en) | 2014-03-19 | 2017-04-18 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
US10083523B2 (en) | 2014-03-19 | 2018-09-25 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
US20170024877A1 (en) | 2014-03-19 | 2017-01-26 | Neurala, Inc. | Methods and Apparatus for Autonomous Robotic Control |
US20170193298A1 (en) | 2014-03-19 | 2017-07-06 | Neurala, Inc. | Methods and apparatus for autonomous robotic control |
WO2016014137A2 (en) | 2014-05-06 | 2016-01-28 | Neurala, Inc. | Apparatuses, methods, and systems for defining hardware-agnostic brains for autonomous robots |
US20170076194A1 (en) | 2014-05-06 | 2017-03-16 | Neurala, Inc. | Apparatuses, methods and systems for defining hardware-agnostic brains for autonomous robots |
US20160075017A1 (en) | 2014-09-17 | 2016-03-17 | Brain Corporation | Apparatus and methods for removal of learned behaviors in robots |
US20160096270A1 (en) | 2014-10-02 | 2016-04-07 | Brain Corporation | Feature detection apparatus and methods for training of robotic navigation |
WO2019000208A1 (en) | 2017-06-27 | 2019-01-03 | 张志兰 | High-efficiency water-circulating freezer |
Non-Patent Citations (139)
Title |
---|
Adelson, E. H., Anderson, C. H., Bergen, J. R., Burt, P. J., & Ogden, J. M. (1984). Pyramid methods in image processing. RCA Engineer, 29(6), 33-41. |
Aggarwal, Charu C, Hinneburg, Alexander, and Keim, Daniel A. On the surprising behavior of distance metrics in high dimensional space. Springer, 2001. 15 pages. |
Al-Kaysi, A. M. et al., A Multichannel Deep Belief Network for the Classification of EEG Data, from Ontology-based Information Extraction for Residential Land Use Suitability: A Case Study of the City of Regina, Canada, DOI 10.1007/978-3-319-26561-2_5, 8 pages (Nov. 2015). |
Ames, H., Versace, M., Gorchetchnikov, A., Chandler, B., Livitz, G., Léveillé, J., Mingolla, E., Carter, D., Abdalla, H., and Snider, G. (2012) Persuading computers to act more like brains. In Advances in Neuromorphic Memristor Science and Applications, Kozma, R., Pino, R., and Pazienza, G. (eds), Springer Verlag. 25 pages. |
Ames, H., Mingolla, E., Sohail, A., Chandler, B., Gorchetchnikov, A., Leveille, J., Livitz, G. and Versace, M. (2012) The Animat. IEEE Pulse, Feb. 2012, 3(1), 47-50. |
Apolloni, B. et al., Training a network of mobile neurons, Proceedings of International Joint Conference on Neural Networks, San Jose, CA, doi: 10.1109/IJCNN.2011.6033427, pp. 1683-1691 (Jul. 31-Aug. 5, 2011). |
Artificial Intelligence as a Service. Invited talk, Defrag, Broomfield, CO, Nov. 4-6, 2013. 22 pages. |
Aryananda, L. 2006. Attending to learn and learning to attend for a social robot. Humanoids 06, pp. 618-623. |
Baraldi, A. and Alpaydin, E. (1998). Simplified Art: A new class of Art algorithms. International Computer Science Institute, Berkeley, CA, TR-98-004, 1998. 42 pages. |
Baraldi, A. and Alpaydin, E. (2002). Constructive feedforward Art clustering networks—Part I. IEEE Transactions on Neural Networks 13(3), 645-661. |
Baraldi, A. and Parmiggiani, F. (1997). Fuzzy combination of Kohonen's and ART neural network models to detect statistical regularities in a random sequence of multi-valued input patterns. In International Conference on Neural Networks, IEEE. 6 pages. |
Baraldi, Andrea and Alpaydin, Ethem. Constructive feedforward ART clustering networks—part II IEEE Transactions on Neural Networks, 13(3):662-677, May 2002. ISSN 1045-9227. doi: 10.1109/tnn.2002.1000131. URL http://dx.doi.org/10.1109/tnn.2002.1000131. |
Bengio, Y., Courville, A., & Vincent, P. Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35 Issue 8, Aug. 2013. pp. 1798-1828. |
Berenson, D et al., A robot path planning framework that learns from experience, 2012 International Conference on Robotics and Automation, 2012, 9 pages [retrieved from the internet] URL:http://users.wpi.edu/-dberenson/lightning.pdf. |
Bernhard, F., and Keriven, R. 2005. Spiking Neurons on GPUs. Tech. Rep. 05-15, Ecole Nationale des Ponts et Chaussées, 8 pages. |
Besl, P. J., & Jain, R. C. (1985). Three-dimensional object recognition. ACM Computing Surveys (CSUR), 17(1), 75-145. |
Boddapati, V., Classifying Environmental Sounds with Image Networks, Thesis, Faculty of Computing Blekinge Institute of Technology, 37 pages (Feb. 2017). |
Bohn, C.-A. 1998. Kohonen Feature Mapping Through Graphics Hardware. In Proceedings of 3rd Int. Conference on Computational Intelligence and Neurosciences, 4 pages. |
Bradski, G., & Grossberg, S. (1995). Fast-learning Viewnet architectures for recognizing three-dimensional objects from multiple two-dimensional views. Neural Networks, 8 (7-8), 1053-1080. |
Canny, J. (1986). A Computational Approach to Edge Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):679-698. |
Carpenter, G.A. and Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing 37, 54-115. |
Carpenter, G.A., and Grossberg, S. (1995). Adaptive resonance theory (ART). In M. Arbib (Ed.), The handbook of brain theory and neural networks, (pp. 79-82). Cambridge, M.A.: MIT press. |
Carpenter, G.A., Grossberg, S. and Rosen, D.B. (1991). Fuzzy Art: Fast stable learning and categorization of analog patterns by an adaptive resonance system Neural Networks 4, 759-771. |
Carpenter, Gail A and Grossberg, Stephen. The art of adaptive pattern recognition by a self-organizing neural network. Computer, 21(3):77-88, 1988. |
Coifman, R.R. and Maggioni, M. Diffusion wavelets. Applied and Computational Harmonic Analysis, 21(1):53-94, 2006. |
Coifman, R.R., Lafon, S., Lee, A.B., Maggioni, M., Nadler, B., Warner, F., and Zucker, S.W. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences of the United States of America, 102(21):7426, 2005. 21 pages. |
Cornwall et al., Automatically translating a general purpose C++ image processing library for GPUs. Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, 2006, 8 pages. |
Davis, C. E. 2005. Graphic Processing Unit Computation of Neural Networks. Master's thesis, University of New Mexico, Albuquerque, NM, 121 pages. |
Dosher, B.A., and Lu, Z.L. (2010). Mechanisms of perceptual attention in precuing of location. Vision Res., 40(10-12). 1269-1292. |
Ellias, S. A., and Grossberg, S. 1975. Pattern formation, contrast control and oscillations in the short term memory of shunting on-center off-surround networks Biol Cybern 20, pp. 69-98. |
Extended European Search Report and Written Opinion dated Jun. 1, 2017 from European Application No. 14813864.7, 10 pages. |
Extended European Search Report and Written Opinion dated Oct. 12, 2017 from European Application No. 14800348.6, 12 pages. |
Extended European Search Report and Written Opinion dated Oct. 23, 2017 from European Application No. 15765396.5, 3 pages. |
Fazl, A., Grossberg, S., and Mingolla, E. (2009). View-invariant object category learning, recognition, and search: How spatial and object attention are coordinated using surface-based attentional shrouds. Cognitive Psychology 58, 1-48. |
Földiák, P. (1990). Forming sparse representations by local anti-Hebbian learning, Biological Cybernetics, vol. 64, pp. 165-170. |
Friston K., Adams R., Perrinet L., & Breakspear M. (2012). Perceptions as hypotheses: saccades as experiments. Frontiers in Psychology, 3 (151), 1-20. |
Galbraith, B.V, Guenther, F.H., and Versace, M. (2015) A neural network-based exploratory learning and motor planning system for co-robots Frontiers in Neuroscience, in press. 10 pages. |
George, D. and Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology 5(10), 1-26. |
Georgii, J., and Westermann, R. 2005. Mass-spring systems on the GPU. Simulation Modelling Practice and Theory 13, pp. 693-702. |
Gorchetchnikov, A. 2017. An Approach to a Biologically Realistic Simulation of Natural Memory. Master's thesis, Middle Tennessee State University, Murfreesboro, TN, 70 pages. |
Gorchetchnikov A., Hasselmo M. E. (2005). A biophysical implementation of a bidirectional graph search algorithm to solve multiple goal navigation tasks. Connection Science, 17(1-2), pp. 145-166. |
Gorchetchnikov A., Hasselmo M. E. (2005). A simple rule for spike-timing-dependent plasticity: local influence of AHP current. Neurocomputing, 65-66, pp. 885-890. |
Gorchetchnikov A., Versace M., Hasselmo M. E. (2005). A Model of STDP Based on Spatially and Temporally Local Information: Derivation and Combination with Gated Decay. Neural Networks, 18, pp. 458-466. |
Gorchetchnikov A., Versace M., Hasselmo M. E. (2005). Spatially and temporally local spike-timing-dependent plasticity rule. In: Proceedings of the International Joint Conference on Neural Networks, No. 1568 in IEEE CD-ROM Catalog No. 05CH37662C, pp. 390-396. |
Grossberg, S. (1973). Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics 52, 213-257. |
Grossberg, S., and Huang, T.R. (2009). Artscene: A neural system for natural scene classification. Journal of Vision, 9 (4), 6.1-19. doi:10.1167/9.4.6. |
Grossberg, S., and Versace, M. (2008) Spikes, synchrony, and attentive learning by laminar thalamocortical circuits. Brain Research, 1218C, 278-312 [Authors listed alphabetically]. |
Hagen, T. R., Hjelmervik, J., Lie, K.-A., Natvig, J., and Ofstad Henriksen, M. 2005. Visual simulation of shallow-water waves. Simulation Modelling Practice and Theory 13, pp. 716-726. |
Hasselt, Hado Van. Double q-learning. In Advances in Neural Information Processing Systems, pp. 2613-2621, 2010. |
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554. |
Hodgkin, A. L., and Huxley, A. F. 1952. Quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117, pp. 500-544. |
Hopfield, J. 1982. Neural networks and physical systems with emergent collective computational abilities. In Proc Natl Acad Sci USA, vol. 79, pp. 2554-2558. |
Ilie, A. 2002. Optical character recognition on graphics hardware. Tech. Rep. integrative paper, UNCCH, Department of Computer Science, 9 pages. |
International Preliminary Report on Patentability dated Nov. 8, 2016 from International Application No. PCT/US2015/029438, 7 pages. |
International Preliminary Report on Patentability in related PCT Application No. PCT/US2014/039162 filed May 22, 2014, dated Nov. 24, 2015, 7 pages. |
International Preliminary Report on Patentability in related PCT Application No. PCT/US2014/039239 filed May 22, 2014, dated Nov. 24, 2015, 8 pages. |
International Search Report and Written Opinion dated Feb. 18, 2015 from International Application No. PCT/US2014/039162, 12 pages. |
International Search Report and Written Opinion dated Feb. 23, 2016 from International Application No. PCT/US2015/029438, 11 pages. |
International Search Report and Written Opinion dated Jul. 6, 2017 from International Application No. PCT/US2017/029866, 12 pages. |
International Search Report and Written Opinion dated Nov. 26, 2014 from International Application No. PCT/US2014/039239, 14 pages. |
International Search Report and Written Opinion dated Sep. 15, 2015 from International Application No. PCT/US2015/021492, 9 pages. |
Itti, L., and Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2 (3), 194-203. |
Itti, L., Koch, C., and Niebur, E. (1998). A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, 1-6. |
Jarrett, K., Kavukcuoglu, K., Ranzato, M. A., & LeCun, Y. (Sep. 2009). What is the best multi-stage architecture for object recognition? In Computer Vision, 2009 IEEE 12th International Conference on (pp. 2146-2153). IEEE. |
Khaligh-Razavi, S.-M et al., Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation, PLoS Computational Biology, vol. 10, Issue 11, 29 pages (Nov. 2014). |
Kim, S., Novel approaches to clustering, biclustering and algorithms based on adaptive resonance theory and Intelligent control, Doctoral Dissertations, Missouri University of Science and Technology, 125 pages (2016). |
Kipfer, P., Segal, M., and Westermann, R. 2004. UberFlow: A GPU-Based Particle Engine. In Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004, pp. 115-122. |
Kolb, A., L. Latta, and C. Rezk-Salama. 2004. "Hardware-Based Simulation and Collision Detection for Large Particle Systems." In Proceedings of the SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004, pp. 123-131. |
Kompella, Varun Raj, Luciw, Matthew, and Schmidhuber, Jurgen. Incremental slow feature analysis: Adaptive low-complexity slow feature updating from high-dimensional input streams Neural Computation, 24(11):2994-3024, 2012. |
Kowler, E. (2011). Eye movements: The past 25 years. Vision Research, 51(13), 1457-1483. doi:10.1016/j.visres.2010.12.014. |
Larochelle H., & Hinton G. (2012). Learning to combine foveal glimpses with a third-order Boltzmann machine. NIPS 2010, 1243-1251. |
LeCun, Y., Kavukcuoglu, K., & Farabet, C. (May 2010). Convolutional networks and applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on (pp. 253-256). IEEE. |
Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788-791. |
Lee, D. D., and Seung, H. S. (1997). "Unsupervised learning by convex and conic coding." Advances in Neural Information Processing Systems, 9. |
Legenstein, R., Wilbert, N., and Wiskott, L. Reinforcement learning on slow features of high-dimensional input streams. PLoS Computational Biology, 6(8), 2010. ISSN 1553-734X. 13 pages. |
Léveillé, J., Ames, H., Chandler, B., Gorchetchnikov, A., Mingolla, E., Patrick, S., and Versace, M. (2010) Learning in a distributed software architecture for large-scale neural modeling. BIONETICS10, Boston, MA, USA. 8 pages. |
Livitz G., Versace M., Gorchetchnikov A., Vasilkoski Z., Ames H., Chandler B., Leveille J. and Mingolla E. (2011) Adaptive, brain-like systems give robots complex behaviors, The Neuromorphic Engineer, doi: 10.2417/1201101.003500, Feb. 2011. 3 pages. |
Livitz, G., Versace, M., Gorchetchnikov, A., Vasilkoski, Z., Ames, H., Chandler, B., Leveille, J., Mingolla, E., Snider, G., Amerson, R., Carter, D., Abdalla, H., and Qureshi, S. (2011) Visually-Guided Adaptive Robot (ViGuAR). Proceedings of the International Joint Conference on Neural Networks (IJCNN) 2011, San Jose, CA, USA. 9 pages. |
Lowe, D.G. (2004). Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, vol. 60, 2, 91-110. |
Lu, Z.L., Liu, J., and Dosher, B.A. (2010) Modeling mechanisms of perceptual learning with augmented Hebbian reweighting. Vision Research, 50(4), 375-390. |
Luo et al., "Artificial neural network computation on graphic process unit." Neural Networks, 2005. IJCNN'05. Proceedings 2005 IEEE International Joint Conference on, vol. 1, IEEE, 2005, pp. 622-626. |
Mahadevan, S. Proto-value functions: Developmental reinforcement learning. In Proceedings of the 22nd international conference on Machine learning, pp. 553-560. ACM, 2005. |
Meuth, J.R. and Wunsch, D.C. (2007) A Survey of Neural Computation On Graphics Processing Hardware. 22nd IEEE International Symposium on Intelligent Control, Part of IEEE Multi-conference on Systems and Control, Singapore, Oct. 1-3, 2007, 5 pages. |
Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, Feb. 25, 2015. |
Mishkin M, Ungerleider LG. (1982). "Contribution of striate inputs to the visuospatial functions of parieto-preoccipital cortex in monkeys," Behav Brain Res, 6 (1): 57-77. |
Montrym et al., The GeForce 6800, in IEEE Micro, vol. 25, No. 2, pp. 41-51, March-Apr. 2005. |
Moore, Andrew W and Atkeson, Christopher G. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103-130, 1993. |
Najemnik, J., and Geisler, W. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research. 49, 1286-1294. |
Non-Final Office Action dated Jan. 4, 2018 from U.S. Appl. No. 15/262,637, 23 pages. |
Non-Final Office Action dated May 31, 2018 from U.S. Appl. No. 14/947,516, 16 pages. |
Notice of Allowance dated May 22, 2018 from U.S. Appl. No. 15/262,637, 6 pages. |
Notice of Allowance dated Dec. 16, 2016 from U.S. Appl. No. 14/662,657. |
Notice of Allowance dated Jul. 27, 2016 from U.S. Appl. No. 14/662,657. |
Oh, K.-S., and Jung, K. 2004. GPU implementation of neural networks. Pattern Recognition 37, pp. 1311-1314. |
Oja, E. (1982). Simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15(3), 267-273. |
Partial Supplementary European Search Report dated Jul. 4, 2017 from European Application No. 14800348.6, 13 pages. |
Perumalla, "Discrete-event execution alternatives on general purpose graphical processing units (GPGPUs)." Proceedings of the 20th Workshop on Principles of Advanced and Distributed Simulation. IEEE Computer Society, 2006. 8 pages. |
Raijmakers, M.E.J., and Molenaar, P. (1997). Exact Art: A complete implementation of an ART network Neural networks 10 (4), 649-669. |
Ranzato, M. A., Huang, F. J., Boureau, Y. L., & Lecun, Y. (2007, June). Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on (pp. 1-8). IEEE. |
Raudies, F., Eldridge, S., Joshi, A., and Versace, M. (Aug. 20, 2014). Learning to navigate in a virtual world using optic flow and stereo disparity signals. Artificial Life and Robotics, DOI 10.1007/10015-014-0153-1. 15 pages. |
Ren, Y et al., Ensemble Classification and Regression—Recent Developments, Applications and Future Directions, in IEEE Computational Intelligence Magazine, 10.1109/MCI.2015.2471235, 14 pages (2016). |
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2 (11), 1019-1025. |
Riesenhuber, M., & Poggio, T. (2000). Models of object recognition. Nature neuroscience, 3, 1199-1204. |
Rolfes, T. 2004. Artificial Neural Networks on Programmable Graphics Hardware. In Game Programming Gems 4, A. Kirmse, Ed. Charles River Media, Hingham, MA, pp. 373-378. |
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In IEEE International Conference on Computer Vision (ICCV) 2011, 2564-2571. |
Ruesch, J et al. 2008. Multimodal Saliency-Based Bottom-Up Attention a Framework for the Humanoid Robot iCub. 2008 IEEE International Conference on Robotics and Automation, pp. 962-965. |
Rumelhart D., Hinton G., and Williams, R. (1986). Learning internal representations by error propagation. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, MIT Press. 45 pages. |
Rumpf, M. and Strzodka, R. Graphics processor units: New prospects for parallel computing. In Are Magnus Bruaset and Aslak Tveito, editors, Numerical Solution of Partial Differential Equations on Parallel Computers, vol. 51 of Lecture Notes in Computational Science and Engineering, pp. 89-134. Springer, 2005. |
Salakhutdinov, R., & Hinton, G. E. (2009). Deep boltzmann machines. In International Conference on Artificial Intelligence and Statistics (pp. 448-455). |
Schaul, Tom, Quan, John, Antonoglou, Ioannis, and Silver, David. Prioritized experience replay. arXiv preprint arXiv: 1511.05952, Nov. 18, 2015. 21 pages. |
Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990-2010). Autonomous Mental Development, IEEE Transactions on, 2(3), 230-247. |
Schmidhuber, Jürgen. Curious model-building control systems. In Neural Networks, 1991. 1991 IEEE International Joint Conference on, pp. 1458-1463. IEEE, 1991. |
Seibert, M., & Waxman, A.M. (1992). Adaptive 3-D Object Recognition from Multiple Views. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (2), 107-124. |
Setoain et al., "Parallel hyperspectral image processing on commodity graphics hardware." Parallel Processing Workshops, 2006. ICPP 2006 Workshops 2006 International Conference on. IEEE, 2006. 8 pages. |
Sherbakov, L. and Versace, M. (2014) Computational principles for an autonomous active vision system. Ph.D., Boston University, http://search.proquest.com/docview/1558856407. 194 pages. |
Sherbakov, L. et al. 2012. CogEye: from active vision to context identification, youtube, retrieved from the Internet on Oct. 10, 2017: URL://www.youtube.com/watch?v=i5PQk962B1k, 1 page. |
Sherbakov, L. et al. 2013. CogEye: system diagram module brain area function algorithm approx # neurons, retrieved from the Internet on Oct. 12, 2017: URL: http://www-labsticc.univ-ubs.fr/~coussy/neucomp2013/index_fichiers/material/posters/NeuComp2013_final56x36.pdf, 1 page. |
Sherbakov, L., Livitz, G., Sohail, A., Gorchetchnikov, A., Mingolla, E., Ames, H., and Versace, M (2013b) A computational model of the role of eye-movements in object disambiguation. Cosyne, Feb. 28-Mar. 3, 2013. Salt Lake City, UT, USA. 2 pages. |
Sherbakov, L., Livitz, G., Sohail, A., Gorchetchnikov, A., Mingolla, E., Ames, H., and Versace, M. (2013a) CogEye: An online active vision system that disambiguates and recognizes objects. NeuComp 2013. 2 pages. |
Smolensky, Paul. Information processing in dynamical systems: Foundations of harmony theory. No. CU-CS-321-86. Colorado Univ At Boulder Dept of Computer Science, 1986. 88 pages. |
Snider, Greg, et al. "From synapses to circuitry: Using memristive memory to explore the electronic brain." IEEE Computer, vol. 44(2). (2011): 21-28. |
Spratling, M. W. (2008). Predictive coding as a model of biased competition in visual attention. Vision Research, 48(12):1391-1408. |
Spratling, M. W. (2012). Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function. Neural Computation, 24(1):60-103. |
Spratling, M. W., De Meyer, K., and Kompass, R. (2009). Unsupervised learning of overlapping image components using divisive input modulation. Computational intelligence and neuroscience. 20 pages. |
Sprekeler, H. On the relation of slow feature analysis and laplacian eigenmaps. Neural Computation, pp. 1-16, 2011. |
Sun, Z et al., Recognition of SAR target based on multilayer auto-encoder and SNN, International Journal of Innovative Computing, Information and Control, vol. 9, No. 11, pp. 4331-4341, Nov. 2013. |
Sutton, R. S., and Barto, A. G. (1998). Reinforcement learning: An introduction (vol. 1, No. 1). Cambridge: MIT press. 10 pages. |
Tong, F., Ze-Nian Li, (1995). Reciprocal-wedge transform for space-variant sensing, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 17, No. 5, pp. 500-551 doi: 10.1109/34.391393. |
Torralba, A., Oliva, A., Castelhano, M.S., Henderson, J.M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review, 113(4), 766-786. |
Van Hasselt, Hado, Guez, Arthur, and Silver, David. Deep reinforcement learning with double q-learning. arXiv preprint arXiv: 1509.06461, Sep. 22, 2015. 7 pages. |
Versace, Brain-inspired computing. Invited keynote address, Bionetics 2010, Boston, MA, USA. 1 page. |
Versace, M. (2006) From spikes to interareal synchrony: how attentive matching and resonance control learning and Information processing by laminar thalamocortical circuits. NSF Science of Learning Centers PI Meeting, Washington, DC, USA. 1 page. |
Versace, M., (2010) Open-source software for computational neuroscience: Bridging the gap between models and behavior. In Horizons in Computer Science Research, vol. 3 43 pages. |
Versace, M., Ames, H., Léveillé, J., Fortenberry, B., and Gorchetchnikov, A. (2008) KInNeSS: A modular framework for computational neuroscience. Neuroinformatics, 2008 Winter; 6(4):291-309. Epub Aug. 10, 2008. |
Versace, M., and Chandler, B. (2010) MoNeta: A Mind Made from Memristors. IEEE Spectrum, Dec. 2010. 8 pages. |
Versace, TEDx Fulbright, Invited talk, Washington DC, Apr. 5, 2014. 30 pages. |
Webster, Bachevalier, Ungerleider (1994). Connections of IT areas TEO and TE with parietal and frontal cortex in macaque monkeys. Cerebral Cortex, 4(5), 470-483. |
Wiskott, Laurenz and Sejnowski, Terrence. Slow feature analysis: Unsupervised learning of invariances. Neural Computation, 14(4):715-770, 2002. |
Wu, Yan & J. Cai, H. (2010). A Simulation Study of Deep Belief Network Combined with the Self-Organizing Mechanism of Adaptive Resonance Theory. 10.1109/CISE.2010.56//265, 4 pages. |
Also Published As
Publication number | Publication date |
---|---|
US20140192073A1 (en) | 2014-07-10 |
US8648867B2 (en) | 2014-02-11 |
USRE48438E1 (en) | 2021-02-16 |
US20080117220A1 (en) | 2008-05-22 |
US9189828B2 (en) | 2015-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE49461E1 (en) | Graphic processor based accelerator system and method | |
US9978115B2 (en) | Sprite graphics rendering system | |
Blythe | Rise of the graphics processor | |
US8099584B2 (en) | Methods for scalably exploiting parallelism in a parallel processing system | |
US7750915B1 (en) | Concurrent access of data elements stored across multiple banks in a shared memory resource | |
KR101590734B1 (en) | Memory sharing in graphics processing unit | |
CN109643291A (en) | Method and apparatus for the effective use graphics process resource in virtualization performing environment | |
US7058945B2 (en) | Information processing method and recording medium therefor capable of enhancing the executing speed of a parallel processing computing device | |
US11526964B2 (en) | Deep learning based selection of samples for adaptive supersampling | |
Sun et al. | Evaluating performance tradeoffs on the radeon open compute platform | |
JP2020537785A (en) | Multi-layer neural network processing with a neural network accelerator using merged weights and a package of layered instructions to be hosted | |
CN103309786A (en) | Methods and apparatus for interactive debugging on a non-pre-emptible graphics processing unit | |
GB2489526A (en) | Representing and calculating with sparse matrixes in simulating incompressible fluid flows. | |
US9513923B2 (en) | System and method for context migration across CPU threads | |
CN113610697A (en) | Scalable sparse matrix multiplication acceleration using systolic arrays with feedback inputs | |
US20060100835A1 (en) | Software package definition for PPU enabled system | |
US9465666B2 (en) | Game engine and method for providing an extension of the VSIPL++ API | |
US7523264B1 (en) | Apparatus, system, and method for dependent computations of streaming multiprocessors | |
US8219372B1 (en) | Computer-readable medium, method and computing device for N-body computations using parallel computation systems | |
Mihai et al. | Implementing high performance system simulators using modern graphics rendering devices: Implementing system control algorithms on graphics hardware | |
US20240111925A1 (en) | Hardware power optimization via e-graph based automatic rtl exploration | |
US9542192B1 (en) | Tokenized streams for concurrent execution between asymmetric multiprocessors | |
Kejnar | Generating rendering commands on graphics cards | |
JP2023004864A (en) | Use of sparsity metadata for reducing systolic array power consumption | |
Hawick et al. | Parallel Acceleration with GPUs for High Performance Applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
AS | Assignment |
Owner name: NEURALA, INC., MASSACHUSETTS Free format text: MERGER AND CHANGE OF NAME;ASSIGNORS:NEURALA LLC;NEURALA, INC.;REEL/FRAME:055279/0655 Effective date: 20130221 Owner name: NEURALA LLC, MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GORCHETCHNIKOV, ANATOLI;AMES, HEATHER MARIE;VERSACE, MASSIMILIANO;AND OTHERS;REEL/FRAME:055227/0127 Effective date: 20071012 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY Year of fee payment: 8 |