EP2725498B1 - DMA vector buffer - Google Patents

Info

Publication number
EP2725498B1
Authority
EP
European Patent Office
Prior art keywords
memory
dma
guard
data
instruction
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP13189405.7A
Other languages
German (de)
English (en)
French (fr)
Other versions
EP2725498A2 (en)
EP2725498A3 (en)
Inventor
Andrew J. Higham
Michael S. Allen
John Redford
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Analog Devices Global ULC
Original Assignee
Analog Devices Global ULC

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Definitions

  • the present disclosure relates generally to computer processors and, more particularly, to a direct memory access buffer.
  • Parallel processing is often implemented by a processor to optimize processing applications, for example, by a digital signal processor (DSP) to optimize digital signal processing applications.
  • a processor can operate as a single instruction, multiple data (SIMD), or data parallel, processor to achieve parallel processing.
  • in SIMD operations, a single instruction is sent to a number of processing elements of the processor, where each processing element can perform the same operation on different data.
  • stride refers to the incremental step size of each element, which may or may not be the same as the element size.
  • an array of 32-bit (4 byte) elements may have a stride of 4 bytes, particularly on a processor with a 32-bit data word size. This is referred to as a unity stride.
  • a non-unity stride occurs when one item is accessed for every N elements. For example, with a stride of four, every fourth WORD is accessed.
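As a small illustration of the stride definitions above, the following Python sketch lists the byte offsets touched for a unity and a non-unity stride. The function name `element_offsets` and its parameters are hypothetical and chosen for this example only; the stride is expressed here in elements, which is an assumption made for the sketch.

```python
def element_offsets(count, elsize, stride_elems=1):
    """Byte offsets of `count` accessed elements.

    elsize       -- element size in bytes
    stride_elems -- step between accessed elements, in elements
                    (1 = unity stride; 4 = every fourth element)
    """
    return [i * stride_elems * elsize for i in range(count)]


# Unity stride over 32-bit (4-byte) elements: consecutive words.
unity = element_offsets(count=4, elsize=4)
# Stride of four: every fourth 32-bit word is accessed.
strided = element_offsets(count=4, elsize=4, stride_elems=4)
```

With a unity stride the accessed bytes are contiguous; with a stride of four, three out of every four words in the underlying array are skipped.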
  • EP 1708090 A2 discloses a method and an apparatus for direct input and output in a virtual machine environment.
  • a direct memory access (DMA) engine comprising logic configured to receive a DMA request directed to a memory block; start a DMA transfer; and as the DMA transfer progresses, update guards associated with portions of the memory block for which the DMA transfer is complete.
  • a processor comprising circuitry to provide a memory instruction directed to a memory block, the instruction configured to test a guard associated with the memory block, wherein the guard comprises a guard bit; if the guard is set, stall the instruction; and if the guard is not set: identify a free DMA channel; and send a DMA request for the memory block to a DMA engine.
  • a computer-implemented method comprising receiving a memory access request directed to an addressed memory region; setting at least one guard on the memory region; identifying a free memory channel to service the memory access request; initiating a data transfer to service the memory access request; and after completing at least a portion of the data transfer, releasing the guard associated with the completed portion.
  • processors, including for example central processing units (CPUs) and digital signal processors (DSPs), continue to increase in speed and complexity at a rate greater than memory technologies. Because increased abilities also mean that a processor can handle more data in a single time increment, the apparent divergence of processor speed versus memory speed is further exacerbated. This may become a limiting factor in the number of useful operations per second (OPS) performed. For example, if a fast processor relies on a slow memory, it may spend most of its time idle, waiting for data operands to be written into registers, or for old computation results to be written out from registers. Additionally, memory that runs at or near the speed of the processor may be orders of magnitude more expensive than memory that is slow relative to the processor.
  • a solution is to provide one or more levels of local, on-chip or near-chip memory such as cache or local L1 memory.
  • Local memory runs at or near the speed of the processor, and thus can provide data nearly instantly from the processor's perspective.
  • Cache holds copies of data that have a home location in slower main memory, and provides a table to track the data currently in local memory and their consistency with the same data in main memory.
  • the processor may address a datum by its main memory address, but may receive a copy from local memory if a copy is stored there.
  • local L1 memory may be directly addressable.
  • Another aspect of memory architecture that affects performance is data placement.
  • the paths between memory and the processor can be implemented more efficiently if there are restrictions on how data move across them. For example, each processing element of a vector processor might be restricted to accessing only certain data, such as those with a particular address alignment. Therefore, algorithms may be more efficient if data are arranged in a particular way, which may not be a simple linear block.
  • the task of selecting data for loading into the cache or local memory may be handled by separate hardware, which employs certain known algorithms to select memory to pre-load, often fetching large contiguous blocks of memory, as it is common to operate on contiguous blocks.
  • in the case of a cache "miss," where the requested data are not already prefetched into cache, the processor "stalls," sometimes for as many as tens or hundreds of clock cycles, while useful data are fetched from main memory.
  • a programmer may remain agnostic of the operation of cache.
  • the programmer merely addresses data according to their main memory address, and movement of data into and out of main memory is managed entirely by hardware.
  • cache space will be wasted on useless data, and some cache misses are inevitable, resulting in processor stalls.
  • Non-explicit cache controllers also do not address the issue of non-linear data placement.
  • DMA controllers may also be programmed to reorganize data after they are moved, to address the data placement issue.
  • Non-cache or cache-plus-L1 memory processors may rely on DMA engines to efficiently copy data into or out of the processor's memory.
  • Some DMA architectures are not synchronized (or at most, are loosely coupled) to processor instruction execution, and are therefore difficult to program so that data arrive (or are written out) just in time. Since DMA engines are effectively separate processors operating in parallel with the core, data movement may be arranged to avoid overwriting memory required by the core before it has used it, and vice versa.
  • vector processors may execute most efficiently in statically scheduled, predictably looping code, which may efficiently consume and produce long, contiguous vectors.
  • vector processors are often programmed to do only a small set of fixed, repetitive tasks.
  • vector processors may rely on local "L1" memory. Data buffers for vector processors also may not be organized in contiguous vectors outside of L1 memory.
  • a processor architecture facilitates synchronizing DMA data movement with a processing core.
  • a method for providing explicit, synchronized data pre-load/post-store to an instruction level processor is disclosed, and may be embodied in a "primitive."
  • a primitive in this context means a basic or primitive operation that may be used to build higher-level operations in conjunction with other primitives or higher-level operations, and may be, by way of non-limiting example, a user-accessible hardware instruction, a non-user-accessible operation performed as part of another hardware instruction, a user-accessible software procedure, or a non-user-accessible software procedure performed as part of a different user-accessible software procedure.
  • FIG. 1 is a block diagram of a digital signal processing system 100 according to an example embodiment of the present disclosure.
  • a system bus 220 arbitrates communication between several subsystems including for example a core 300, local L1 memory 120, DMA engine 212, main memory 320, and I/O devices 310.
  • DMA engine 212 is configured to transfer data (such as operands) from main memory 320 (or some other I/O device 310) to L1 memory 120.
  • Core 300 operates on these data to produce results in L1 memory 120, and then DMA engine 212 transfers the results to main memory 320 or I/O devices 310.
  • FIG. 2 is a schematic block diagram of a memory subsystem 200 according to various aspects of the present disclosure.
  • Memory subsystem 200 communicates with core 300, which may include one or more processing elements, and with system bus 220.
  • Memory subsystem 200 includes local L1 memory 120 and DMA engine 212, which includes DMA channels 210. In some embodiments, DMA engine 212 may be dedicated specifically to servicing processing elements PE of compute array 130 ( FIG. 3 ), or may include DMA channels specifically dedicated to the same.
  • Memory subsystem 200 may interconnect with input/output (I/O) devices 310, other devices, or combinations thereof via system bus 220.
  • Local L1 memory 120 may be a fast, small memory that in some embodiments is integrated with compute array 130 on a single chip, while main memory 320 ( FIG. 1 ) may be a larger, relatively slow off-chip memory.
  • An example guard mechanism may be provided by comparators 230. For example, when core 300 executes a LOAD instruction, its address is compared with all the target start and end addresses, and if it falls in any of the ranges, the load is stalled until the start address has been incremented past the LOAD instruction's address. Similarly, STORE instructions are stalled while their addresses fall within any of the source ranges.
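The comparator-based guard can be modelled in a few lines of Python. This is a behavioural sketch only: the class `GuardedRange`, the function `must_stall`, and the choice of a half-open range (end address exclusive) are assumptions made for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class GuardedRange:
    """Address range of one in-flight transfer (hypothetical model)."""
    start: int  # next address the DMA engine has yet to process
    end: int    # one past the last address of the transfer

    def advance(self, nbytes):
        # The DMA engine bumps the start address as data arrive.
        self.start = min(self.start + nbytes, self.end)

    def covers(self, addr):
        return self.start <= addr < self.end


def must_stall(addr, ranges):
    """True if an access to `addr` falls inside any guarded range."""
    return any(r.covers(addr) for r in ranges)


targets = [GuardedRange(start=0x100, end=0x140)]
stalled_before = must_stall(0x108, targets)   # still being filled
targets[0].advance(0x10)                      # DMA wrote 16 bytes
stalled_after = must_stall(0x108, targets)    # start moved past 0x108
```

A LOAD would be checked against the target ranges in this way, and a STORE against the source ranges; the same comparison serves both cases.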
  • FIG. 3 is a schematic block diagram of an example digital signal processor (DSP) core 300 according to various aspects of the present disclosure, showing L1 memory 120 in situ with core 300.
  • FIG. 1 has been simplified for the sake of clarity and to better understand some of the novel concepts of the present disclosure. Additional features may be added to core 300 or to DSP system 100 overall, and some of the features described below may be replaced or eliminated in other embodiments of DSP core 300.
  • DSP system 100 is provided as only one example embodiment of a processor that may benefit from the present disclosure.
  • Other types of processors, including central processing units and other programmable devices, may be used, and in a general sense, the disclosure of this specification may be used in connection with any machine meeting the well-known von Neumann architecture.
  • Program sequencer 114 provides instruction addresses to program memory 116 for instruction fetches.
  • Program memory 116 stores programs that core 300 implements to process data (such as that stored in memory 120) and can also store process data.
  • Programs include instruction sets having one or more instructions, and core 300 implements the programs by fetching the instructions, decoding the instructions, and executing the instructions.
  • programs may include instruction sets for implementing various DSP algorithms.
  • Core 300 may be configured to perform various parallel operations. For example, during a single cycle, processing elements PE may access an instruction (via interconnection network 142) and access N data operands from memory (via interconnection network 144) for synchronous processing. In SIMD mode, core 300 may process multiple data streams in parallel. For example, when in SIMD mode, core 300 in a single cycle may dispatch a single instruction to each or a plurality of processing elements PE via interconnection network 142; load N data sets from memory (memory 120, program memory 116, other memory, or combination thereof) via interconnection network 144, one data set for each processing element PE (in an example, each data set may include two data operands); execute the single instruction synchronously in processing elements PE; and store data results from the synchronous execution in memory 120.
  • FIG. 4 illustrates an example data flow that can result during operation of core 300 according to various aspects of the present disclosure.
  • This operation can be optimized by allocating multiple buffers in L1 memory 120 so that DMA engine 212 transfers and core operations may happen in parallel, where L1 memory 120 includes multiple buffers (for example, four buffers (buffer 1, buffer 2, buffer 3, and buffer 4), as shown in FIG. 5 ).
  • the block diagram of FIG. 4 shows the overall data flow, wherein DMA engine 212 transfers data directly from main memory 320 into L1 buffer 1 120-1. Simultaneously, DMA engine 212 may be able to handle a write from L1 memory buffer 120-2 into main memory 320.
  • the buffers may be switched again, as follows:
  • one example method to ensure memory buffer availability is to implement a software handshake so that both core 300 and DMA engine 212 are aware when one has finished writing so that the other can start reading.
  • guards 610, such as guard bits, comparators, or a similar flagging mechanism, may be provided to facilitate synchronization between core 300 and DMA engine 212. This may enable safe memory operation while consuming fewer memory buffers 510 than are required for a software handshake architecture as shown in FIG. 6 .
  • Guards 610 prevent core 300 from accessing parts of a memory buffer 510 that DMA engine 212 is using. When a memory operation starts, the entire memory buffer 510 is guarded, and as the transfer progresses, the guarded region is reduced to just that portion of memory buffer 510 left to process. Memory transfers that are guarded may be referred to in some embodiments as "protected transfers."
  • core 300 and DMA engine 212 may safely use the same buffers 510, as illustrated in FIG. 7 .
  • core 300 writes results to memory buffer 510-3 while DMA engine 212 transfers the results out to DMA buffer 210-1, and core 300 reads operands from memory buffer 510-4 while DMA engine 212 transfers data in to DMA buffer 210-2.
  • MEMCPY_OUT memory copy out
  • DMA engine 212 initiates a DMA transfer, which starts in block 870 with a partial transfer, meaning that the first portion of the requested memory operation is completed. For example if the DMA request is to load four words from memory, DMA engine 212 may finish loading the first word on the first clock cycle. In block 882, DMA engine 212 may then clear any guards associated with the first word, such as a guard bit associated with the first word. Guard bits are used as an example guard mechanism, but any of the guards disclosed in this specification may be used, and those with skill in the art will recognize other possible guard mechanisms. Furthermore, although this embodiment "clears" the guard, those with skill in the art will recognize that a guard may be updated in various ways that indicate that the memory region has become available.
  • After clearing the guard bit, in decision block 880, DMA engine 212 checks whether the full block memory access has been completed. If it is incomplete, the method returns to block 870 to transfer the next portion of the requested memory. If it is complete, then in block 890, the DMA terminates.
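The transfer loop described above (blocks 870, 882, 880, and 890) can be sketched as a Python generator. This is a simplified behavioural model under stated assumptions: one guard bit per word, a word-at-a-time transfer, and the hypothetical name `dma_transfer`; real hardware would differ in granularity and timing.

```python
def dma_transfer(src, dst, guards):
    """Copy src into dst one word at a time, clearing guards as it goes.

    Yields after each partial transfer so a caller can interleave work.
    """
    assert len(src) == len(dst) == len(guards)
    for i, word in enumerate(src):
        dst[i] = word        # partial transfer (block 870)
        guards[i] = False    # clear guard for that word (block 882)
        yield i              # then check for completion (block 880)
    # generator exhaustion corresponds to termination (block 890)


src = [10, 20, 30, 40]
dst = [0] * 4
guards = [True] * 4          # whole block guarded when transfer starts

engine = dma_transfer(src, dst, guards)
next(engine)                 # first word transferred
after_first = list(guards)
for _ in engine:             # run to completion
    pass
```

After the first step only the first word is unguarded, so a consumer could already read it while the remaining three words are still in flight.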
  • FIG. 9 is a flow chart of an example method for performing memory load and store operations from the perspective of core 300.
  • the method may be performed while DMA engine 212 continues to service previous memory requests, and may be performed as part of a MEMCPY primitive or in conjunction with a MEMCPY primitive. It is expressly intended that certain operations be considered together, even if core 300 is configured to treat them as separate instructions or primitives.
  • core 300 may provide a separate MEMCPY primitive to separately set guard bits as discussed above, while the load and store primitives are provided separately.
  • the load and store primitives themselves may be configured to set guard bits without substantially altering the spirit of the method disclosed here.
  • core 300 issues in block 910 a read ("load") or write ("store") access request.
  • core 300 checks to see whether there is a guard set over the requested memory block. If all or part of the requested block is guarded, then the MEMCPY instruction stalls and in block 930 enters a wait state. Once guard bits for the requested memory region are cleared, then in block 960, core 300 issues a block memory access request, which may be directed to DMA engine 212. After issuing the DMA request, in block 970, the MEMCPY primitive may terminate without waiting for DMA engine 212 to complete the memory operation.
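From the core's side, one attempt to issue a guarded block copy might be modelled as below. The function name `memcpy_issue`, the list-of-booleans representation of guards and channels, and the decision to also stall when no DMA channel is free are assumptions of this sketch, not details confirmed by the disclosure.

```python
def memcpy_issue(guards, first, count, channels):
    """One attempt to issue a guarded block copy.

    Returns the index of the DMA channel that accepted the request,
    or None if the primitive must stall and be retried.
    """
    region = range(first, first + count)
    if any(guards[i] for i in region):
        return None                      # guarded: stall / wait state
    free = [c for c, busy in enumerate(channels) if not busy]
    if not free:
        return None                      # assumption: no free channel also stalls
    for i in region:
        guards[i] = True                 # guard the whole region
    channels[free[0]] = True             # hand the request to DMA
    return free[0]                       # return without waiting


guards = [False] * 8
channels = [False, False]
first = memcpy_issue(guards, 0, 4, channels)      # accepted
overlap = memcpy_issue(guards, 2, 4, channels)    # partly guarded
disjoint = memcpy_issue(guards, 4, 4, channels)   # accepted
```

The overlapping request stalls because part of its region is still guarded, while the disjoint request proceeds on the second channel without waiting for the first transfer to finish.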
  • MEMCPY primitive may have additional parameters, for example, specifying strides through external or internal memory in the manner of a two-dimensional DMA.
  • guarding may also be provided in other embodiments. For example, upon an access request from core 300, the requested address may be compared directly with the bounds of active MEMCPY operations using comparators, as seen in FIG. 2 . This method limits the MEMCPY primitives to simple block data operations, but also involves less delay at the initiation stage. It also may allow regions outside L1 memory 120 to be guarded.
  • readers are caused to stall via a standard hardware handshake protocol used to ensure the integrity of data transfers to fast memory.
  • a DMA controller 212 reading L1 memory 120 via an Advanced Extensible Interface (AXI) slave port may be stalled by a potentially long delay between the slave asserting ARREADY, to indicate the address has been accepted, and asserting RVALID, to indicate data are available.
  • a hardware flag generates interrupts to DMA engine 212 upon a VSTORE event, which initiates a memory transaction instead of blindly initiating the transfer and stalling, as in FIG. 9 .
  • the flag state transition may initiate a fast thread swap (in the case of a hardware-threaded processor). This embodiment requires a mechanism to map valid bits to DMA channel 210 interrupts, if multiple DMA channels 210 are allowed.
  • buffers could be allocated just for the duration of loops with alignment constraints while an application is being developed, and only once it is working have their lifetimes adjusted to maximize the concurrent operation of core 300 and DMA engine 212. Even after an application has been tuned, guard bits 610 ensure correct operation should a reader catch up with a writer under extraordinary system conditions.
  • the MEMCPY primitive may be implemented in software and accessed via an API, or in hardware and accessed via hardware instructions.
  • a MEMZERO primitive is also added, which zeroes out the destination range.
  • a hardware queue is added for protected transfers so that several protected transfers can be queued, and each will finish in first-in-first-out (FIFO) order. If a MEMCPY is issued when the queue is full, it will be stalled.
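The queueing behaviour can be illustrated with a bounded FIFO. The class name `ProtectedTransferQueue`, its methods, and the queue depth of two are hypothetical; the sketch only shows in-order completion and the stall-when-full condition.

```python
from collections import deque


class ProtectedTransferQueue:
    """Bounded FIFO of pending protected transfers (illustrative)."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()

    def issue(self, transfer):
        """Queue a transfer; return False if the issuer must stall."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(transfer)
        return True

    def complete_next(self):
        """Retire the oldest transfer, in first-in-first-out order."""
        return self.pending.popleft()


q = ProtectedTransferQueue(depth=2)
accepted = [q.issue("A"), q.issue("B"), q.issue("C")]
finished = q.complete_next()
retry = q.issue("C")
```

The third MEMCPY is refused while the queue is full and succeeds when retried after the oldest transfer has finished.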
  • the MEMCPY primitive could stall for an unacceptably long time if it is waiting for a previous MEMCPY primitive to finish. So as not to delay interrupts indefinitely, in an example embodiment, the MEMCPY primitive is configured to handle interrupts. In particular, because the MEMCPY primitive did not start execution, the interrupt mechanism will return to and re-execute the MEMCPY primitive, after interrupt processing completes.
  • the MEMCPY primitive may assert an interrupt bit.
  • the interrupt bit may be masked so that no interrupt actually happens.
  • a WAIT instruction may also be added to stall until a particular interrupt bit is asserted. This can be pre-empted by other un-masked interrupts, but after interrupt processing completes, it will return to the WAIT. Only the specified interrupt will cause the WAIT instruction to complete.
  • a timeout counter control register is added so that bad transfers do not cause infinite waits.
  • This example register may only count when a WAIT instruction is active. After a certain threshold time, for example 3,000 cycles, the register may be configured to force the WAIT to its off state.
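A WAIT with a timeout counter might behave like the following Python sketch. The function `wait_for_bit` and the callable `read_bit` are hypothetical stand-ins for the instruction and the interrupt bit; the default threshold of 3,000 cycles follows the example figure in the text.

```python
def wait_for_bit(read_bit, threshold=3000):
    """Spin until read_bit() is true or `threshold` cycles elapse.

    Returns (completed, cycles). The counter only advances while
    the wait is active, mirroring the register described above.
    """
    cycles = 0
    while cycles < threshold:
        if read_bit():
            return True, cycles
        cycles += 1
    return False, cycles        # timeout forces the WAIT off


ticks = iter(range(10))
ok, waited = wait_for_bit(lambda: next(ticks) >= 5)   # bit set on 6th poll
stuck, spent = wait_for_bit(lambda: False)            # bad transfer
```

The second call models a transfer that never completes: the wait ends when the counter reaches the threshold rather than hanging indefinitely.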
  • WAIT may be implemented with "valid" or "dirty" bits protecting buffers 510 in L1 cache 120.
  • the dirty bit may be set whenever data is written into a buffer 510 to indicate that the buffer 510 has been modified.
  • an example MEMCPY primitive may also mark the target buffer 510 as invalid or "dirty." Once DMA engine 212 moves data into or out of a part of buffer 510, it marks that part of buffer 510 as valid or "clean."
  • MEMCPY primitives and DMA transfers may be configured to operate in cacheable memory regions. This may require, for example, interacting with a cache controller to gain ownership of one or more cache lines. Guard bits may then be synthesized for synchronizing core 300 and DMA engine 212 from the "standard" valid and dirty bits of the write back cache controller.
  • the standard bits may have definitions including:
  • If DMA engine 212 is moving data to the cache memory, i.e., writing to the buffer, it should interact with the cache controller to prevent the controller from inadvertently attempting a line fill if core 300 attempts to access the memory address first, for example through a load instruction.
  • the valid bit may be set after a DMA write to the cache line portion of a buffer has completed (thus resolving the stall condition, such that a load instruction to that cache line region will progress).
  • If DMA engine 212 is moving data from cache to main memory 320, it may need to communicate with the cache controller to take control of the lines that map to the buffer. DMA engine 212 may then wait for each cache line of the buffer region to be marked both valid and dirty, as set by the cache control logic after write operations from core 300. DMA engine 212 may then clear the valid and dirty bits after the DMA writes from cache to main memory 320 are complete.
  • writes to dirty regions are also delayed until DMA engine 212 has marked them clean.
  • the use of dirty bits enables a MEMCPY primitive that has the same semantics as the standard C "memcpy()" call, so code may be written in a portable fashion.
  • a double-buffered routine in L1 memory is represented by the code below, where there is one input buffer (dram_buffer0) and one output buffer (dram_buffer1), where a and b are the two buffers in L1 memory. This code does at least three passes, where the first prefetches the input, the middles do the work, and the last saves the final output.
  • This whole sequence may be provided as a macro, or the kernel routine can be marked as in-lined.
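A double-buffered routine of this general shape might look like the following Python sketch. It is not the original listing: slice assignments stand in for MEMCPY transfers, no guards are modelled, and the `kernel` function and `block` parameter are hypothetical. The names `dram_buffer0`, `dram_buffer1`, `a`, and `b` follow the description.

```python
def kernel(buf):
    """Stand-in compute step: double every element in place."""
    for i, x in enumerate(buf):
        buf[i] = 2 * x


def double_buffered(dram_buffer0, block):
    """Process dram_buffer0 block by block into a new dram_buffer1."""
    nblocks = len(dram_buffer0) // block
    dram_buffer1 = [0] * (nblocks * block)
    a, b = [0] * block, [0] * block      # the two L1 buffers
    for p in range(nblocks + 1):
        if p < nblocks:                  # prefetch block p into a
            a[:] = dram_buffer0[p * block:(p + 1) * block]
        if p > 0:                        # work on block p-1 held in b
            kernel(b)
            dram_buffer1[(p - 1) * block:p * block] = b
        a, b = b, a                      # swap roles for the next pass
    return dram_buffer1


result = double_buffered([1, 2, 3, 4], block=2)
```

With two input blocks the loop runs three passes: the first only prefetches, the middle pass both prefetches and computes, and the last computes and saves the final output. On real hardware the prefetch and the kernel within a pass would overlap in time.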
  • a two-dimensional MEMCPY primitive may be provided. This version of the primitive may receive three additional parameters: source line stride, destination line stride, and row count. In this case, a single primitive may require up to six operands total.
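A two-dimensional copy with the six operands named above can be sketched as follows. The function `memcpy2d` operates on a flat Python list standing in for memory, with indices standing in for addresses; these representation choices are assumptions of the sketch.

```python
def memcpy2d(mem, dst, src, row_len, src_stride, dst_stride, rows):
    """Copy `rows` rows of `row_len` items within flat memory `mem`.

    dst, src   -- start indices of the first destination/source row
    *_stride   -- distance between the starts of consecutive rows
    """
    for r in range(rows):
        s = src + r * src_stride
        d = dst + r * dst_stride
        mem[d:d + row_len] = mem[s:s + row_len]


# Gather a 2x2 tile out of a 4-wide image into a packed buffer.
mem = list(range(16)) + [0] * 4      # 4x4 image, then 4 spare slots
memcpy2d(mem, dst=16, src=5, row_len=2,
         src_stride=4, dst_stride=2, rows=2)
tile = mem[16:20]
```

Using a destination stride equal to the row length packs the tile contiguously, which is one way such a primitive could reorganize data while moving it.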
  • an application programming interface (API) is provided for a vector buffer.
  • *vbuf allocates a buffer of count*elsize bytes.
  • *vbuf2d allocates a buffer of ycount*xcount*elsize bytes.
  • This case may use the cache line valid bit granularity, even though non-unity stride does not map to a contiguous cache line region.
  • a more general API may fully specify slow and fast (external/internal), direct memory accesses (start address, count, stride, elsize) independently. This may use elsize-granularity valid bits.
  • DMA may also have its own (or walk the processor's) page table to find the virtual to physical mapping, though in some cases this may require more cycles and more hardware.
  • a method may be provided to return a vector buffer handle for use in other API calls, as follows.
  • this method may invalidate any buffer previously associated with VBuf, start a DMA transfer into the buffer using parameters from VBuf, and return a pointer to start of the vector buffer.
  • a method may be provided to initiate a vector store operation as follows.
  • this method invalidates any buffer previously associated with VBuf, starts a DMA transfer from the buffer, which stalls until data become available (or generates a first DMA interrupt after the first valid bit set), and returns a pointer to the start of the vector buffer.
  • a method is provided to release a pointer to a vector buffer as follows.
  • this method waits until any outstanding DMA transfers are complete, and releases the buffer and all associated resources.
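The load, store, and release operations described above might be modelled as in the Python sketch below. The names `vload`, `vstore`, and `vrelease` follow the text, but every signature, the `VBuf` class, and its fields are hypothetical. DMA is modelled as an eager copy, and the outbound transfer is deferred until release, whereas the described design would stream data as valid bits are set.

```python
class VBuf:
    """Toy vector-buffer handle; DMA is modelled as an eager copy."""

    def __init__(self, memory, addr, count, stride=1):
        self.memory, self.addr = memory, addr
        self.count, self.stride = count, stride
        self.data = None             # local buffer, None when released
        self.store_pending = False


def vload(vb):
    """Discard any old buffer, gather from main memory, return buffer."""
    vb.data = [vb.memory[vb.addr + i * vb.stride] for i in range(vb.count)]
    vb.store_pending = False
    return vb.data


def vstore(vb):
    """Discard any old buffer and arm a transfer back to main memory."""
    vb.data = [0] * vb.count
    vb.store_pending = True
    return vb.data


def vrelease(vb):
    """Finish outstanding transfers, then free the buffer."""
    if vb.store_pending:
        for i, x in enumerate(vb.data):
            vb.memory[vb.addr + i * vb.stride] = x
    vb.data, vb.store_pending = None, False


main = list(range(8))
src = VBuf(main, addr=0, count=4, stride=2)   # even elements
dst = VBuf(main, addr=1, count=4, stride=2)   # odd elements
x = vload(src)
loaded = list(x)
y = vstore(dst)
for i in range(4):
    y[i] = x[i] + 100
vrelease(src)
vrelease(dst)
```

The local buffers are contiguous even though both vectors have a non-unity stride in main memory, which is the reorganization the vector buffer is meant to provide.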
  • the API may also include routines to provide finer grained control of buffer states, as follows.
  • An example of use with local allocation of buffers includes:
  • the data movement can be optimized by hoisting the vbuf and vload calls higher above the loop and sinking the vrelease further below the loop.
  • valid bits may be used to enable readers and writers to access the same buffer concurrently, which may improve the performance of the application by reducing the time spent waiting for one to complete.
  • valid bits themselves should be set to their invalid state before a DMA transfer is initiated.
  • the number of valid bits that need to be written can be reduced by increasing the number of data bytes each bit describes. However, this also reduces the options for buffer size and alignment.
  • One example solution is to set a data width similar to cache line length, for example between 32 and 128 bytes.
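The trade-off between valid-bit count and granularity is simple arithmetic, shown here for an assumed 4 KiB buffer. The function name `valid_bits_needed` and the buffer size are illustrative choices only.

```python
def valid_bits_needed(buffer_bytes, bytes_per_bit):
    """Valid bits required to cover a buffer (ceiling division)."""
    return -(-buffer_bytes // bytes_per_bit)


# A 4 KiB buffer tracked at different granularities.
per_word = valid_bits_needed(4096, 4)       # one bit per 32-bit word
per_line32 = valid_bits_needed(4096, 32)    # one bit per 32 bytes
per_line128 = valid_bits_needed(4096, 128)  # one bit per 128 bytes
```

Moving from word granularity to 128-byte granularity cuts the number of bits to initialize by a factor of 32, at the cost of requiring buffers to be sized and aligned in 128-byte units.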
  • the number of valid bits that can be written in parallel can be increased by packing them into the same sub-bank of physical memory.
  • core 300 waits until valid bits are set without disabling interrupts by employing two instructions.
  • the first instruction starts a state machine that runs asynchronously to the processing element PE that set the bit.
  • the second instruction which is interruptible, waits for the state machine to complete.
  • An API may use a small number of DMA channels for a larger number of buffers. This can be achieved with a descriptor-based DMA controller 212, by linking a descriptor for each transaction into the list that is being processed by DMA controller 212. To ensure transfers into buffers are not stalled behind transfers out of buffers, separate channels should be used for each direction.
  • L1 memory 120 may be configured as cache, so cache tags containing valid bits are already present.
  • a combined scheme that allows parts of L1 to be used as vector buffers according to the present disclosure, and parts to be used as cache might reuse the same valid bits for both purposes.
  • any capacitors, clocks, DFFs, dividers, inductors, resistors, amplifiers, switches, digital core, transistors, and/or other components can readily be replaced, substituted, or otherwise modified in order to accommodate particular circuitry needs.
  • complementary electronic devices, hardware, software, etc. offer an equally viable option for implementing the teachings of the present disclosure.
  • any number of electrical circuits of the FIGURES may be implemented on a board of an associated electronic device.
  • the board can be a general circuit board that can hold various components of the internal electronic system of the electronic device and, further, provide connectors for other peripherals. More specifically, the board can provide the electrical connections by which the other components of the system can communicate electrically.
  • Any suitable processors (inclusive of digital signal processors, microprocessors, supporting chipsets, etc.), memory elements, etc. can be suitably coupled to the board based on particular configuration needs, processing demands, computer designs, etc.
  • Other components such as external storage, additional sensors, controllers for audio/video display, and peripheral devices may be attached to the board as plug-in cards, via cables, or integrated into the board itself.
  • the electrical circuits of the FIGURES may be implemented as stand-alone modules (e.g., a device with associated components and circuitry configured to perform a specific application or function) or implemented as plug-in modules into application specific hardware of electronic devices.
  • An SOC (system on chip) represents an IC that integrates components of a computer or other electronic system into a single chip. It may contain digital, analog, mixed-signal, and often radio frequency functions, all of which may be provided on a single chip substrate.
  • MCM multi-chip-module
  • ASICs Application Specific Integrated Circuits
  • FPGAs Field Programmable Gate Arrays
  • the features discussed herein can be applicable to medical systems, scientific instrumentation, wireless and wired communications, radar, industrial process control, audio and video equipment, current sensing, instrumentation (which can be highly precise), and other digital-processing-based systems.
  • certain embodiments discussed above can be provisioned in digital signal processing technologies for medical imaging, patient monitoring, medical instrumentation, and home healthcare. This could include pulmonary monitors, accelerometers, heart rate monitors, pacemakers, etc. Other applications can involve automotive technologies for safety systems (e.g., stability control systems, driver assistance systems, braking systems, infotainment and interior applications of any kind). Furthermore, power train systems (for example, in hybrid and electric vehicles) can use high-precision data conversion products in battery monitoring, control systems, reporting controls, maintenance activities, etc.
  • the teachings of the present disclosure can be applicable in the industrial markets that include process control systems that help drive productivity, energy efficiency, and reliability.
  • the teachings of the signal processing circuits discussed above can be used for image processing, auto focus, and image stabilization (e.g., for digital still cameras, camcorders, etc.).
  • Other consumer applications can include audio and video processors for home theater systems, DVD recorders, and high-definition televisions.
  • Yet other consumer applications can involve advanced touch screen controllers (e.g., for any type of portable media device).
  • such technologies could readily be part of smart phones, tablets, security systems, PCs, gaming technologies, virtual reality, simulation training, etc.
  • references to various features (e.g., elements, structures, modules, components, steps, operations, characteristics, etc.) are intended to mean that any such features are included in one or more embodiments of the present disclosure, but may or may not necessarily be combined in the same embodiments.
  • a system that can include any suitable circuitry, dividers, capacitors, resistors, inductors, ADCs, DFFs, logic gates, software, hardware, links, etc.
  • a circuit board coupled to a plurality of electronic components.
  • the system can include means for clocking data from the digital core onto a first data output of a macro using a first clock, the first clock being a macro clock; means for clocking the data from the first data output of the macro into the physical interface using a second clock, the second clock being a physical interface clock; means for clocking a first reset signal from the digital core onto a reset output of the macro using the macro clock, the first reset signal output used as a second reset signal; means for sampling the second reset signal using a third clock, which provides a clock rate greater than the rate of the second clock, to generate a sampled reset signal; and means for resetting the second clock to a predetermined state in the physical interface in response to a transition of the sampled reset signal.
  • the 'means for' in these instances can include (but is not limited to) using any suitable component discussed herein, along with any suitable software, circuitry, hub, computer code, logic, algorithms, hardware, controller, interface, link, bus, communication pathway, etc.
  • the system includes memory that further comprises machine-readable instructions that when executed cause the system to perform any of the activities discussed above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Memory System (AREA)
  • Multi Processors (AREA)
EP13189405.7A 2012-10-23 2013-10-18 DMA vector buffer Active EP2725498B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261717564P 2012-10-23 2012-10-23
US14/040,367 US9092429B2 (en) 2012-10-23 2013-09-27 DMA vector buffer

Publications (3)

Publication Number Publication Date
EP2725498A2 (en) 2014-04-30
EP2725498A3 (en) 2015-08-12
EP2725498B1 (en) 2017-12-13

Family

ID=49447990

Family Applications (1)

Application Number Title Priority Date Filing Date
EP13189405.7A Active EP2725498B1 (en) 2012-10-23 2013-10-18 DMA vector buffer

Country Status (4)

Country Link
US (1) US9092429B2 (zh)
EP (1) EP2725498B1 (zh)
KR (1) KR101572204B1 (zh)
CN (1) CN103777923B (zh)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9342306B2 (en) 2012-10-23 2016-05-17 Analog Devices Global Predicate counter
US9201828B2 (en) 2012-10-23 2015-12-01 Analog Devices, Inc. Memory interconnect network architecture for vector processor
JP6155723B2 (ja) * 2013-03-18 2017-07-05 富士通株式会社 Radar device and program
CN107250995B (zh) * 2014-11-25 2021-11-16 领特投资两合有限公司 Memory management device
US9563572B2 (en) 2014-12-10 2017-02-07 International Business Machines Corporation Migrating buffer for direct memory access in a computer system
DE102015104776B4 (de) * 2015-03-27 2023-08-31 Infineon Technologies Ag Method and device for processing radar signals
CN105335130B (zh) * 2015-09-28 2018-06-26 深圳市中兴微电子技术有限公司 Processor and method for processing tasks thereof
US20170155717A1 (en) * 2015-11-30 2017-06-01 Intel Corporation Direct memory access for endpoint devices
US10929339B2 (en) * 2016-10-17 2021-02-23 Yokogawa Electric Corporation Generation of multiple worksheet exportation
US10503434B2 (en) 2017-04-12 2019-12-10 Micron Technology, Inc. Scalable low-latency storage interface
US10503552B1 (en) * 2017-04-28 2019-12-10 Ambarella, Inc. Scheduler for vector processing operator readiness
US10783240B2 (en) * 2017-09-29 2020-09-22 Stmicroelectronics, Inc. Secure environment in a non-secure microcontroller
US10515036B2 (en) * 2017-10-25 2019-12-24 Microchip Technology Incorporated Bit manipulation capable direct memory access
DE102017126723B4 (de) 2017-11-14 2024-10-10 Infineon Technologies Ag Device and method for processing radar signals
JP2020004247A (ja) * 2018-06-29 2020-01-09 ソニー株式会社 Information processing device, information processing method, and program
US10606775B1 (en) 2018-12-28 2020-03-31 Micron Technology, Inc. Computing tile
US11301295B1 (en) * 2019-05-23 2022-04-12 Xilinx, Inc. Implementing an application specified as a data flow graph in an array of data processing engines
TWI773106B (zh) * 2021-01-28 2022-08-01 華邦電子股份有限公司 Memory device with computing function and operation method thereof
US20240168762A1 (en) * 2022-11-21 2024-05-23 Nvidia Corporation Application programming interface to wait on matrix multiply-accumulate

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5247689A (en) 1985-02-25 1993-09-21 Ewert Alfred P Parallel digital processor including lateral transfer buses with interrupt switches to form bus interconnection segments
DE68928980T2 (de) 1989-11-17 1999-08-19 Texas Instruments Inc. Multiprocessor with crossbar switch between processors and memories
US5355508A (en) 1990-05-07 1994-10-11 Mitsubishi Denki Kabushiki Kaisha Parallel data processing system combining a SIMD unit with a MIMD unit and sharing a common bus, memory, and system controller
WO1995009399A1 (fr) 1993-09-27 1995-04-06 Ntt Mobile Communications Network Inc. Multiprocessor
US6782440B2 (en) * 2000-07-26 2004-08-24 T.N.S. Holdings, Inc. Resource locking and thread synchronization in a multiprocessor environment
GB2409063B (en) 2003-12-09 2006-07-12 Advanced Risc Mach Ltd Vector by scalar operations
US20060090025A1 (en) 2004-10-25 2006-04-27 Tufford Robert C 9U payload module configurations
JP5748935B2 (ja) 2004-11-03 2015-07-15 インテル コーポレイション Programmable data processing circuit that supports SIMD instructions
US7451249B2 (en) * 2005-03-21 2008-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for direct input and output in a virtual machine environment containing a guest operating system
US7664889B2 (en) * 2005-09-29 2010-02-16 Intel Corporation DMA descriptor management mechanism
US20110022754A1 (en) 2007-12-06 2011-01-27 Technion Research & Development Foundation Ltd Bus enhanced network on chip
US20100082894A1 (en) * 2008-09-26 2010-04-01 Mediatek Inc. Communication system and methos between processors
US20120054379A1 (en) * 2010-08-30 2012-03-01 Kafai Leung Low power multi-touch scan control system
US9146743B2 (en) 2012-07-11 2015-09-29 International Business Machines Corporation Generalized bit manipulation instructions for a computer processor
US20140115278A1 (en) 2012-10-23 2014-04-24 Analog Devices, Inc. Memory architecture
US9201828B2 (en) 2012-10-23 2015-12-01 Analog Devices, Inc. Memory interconnect network architecture for vector processor
US9342306B2 (en) 2012-10-23 2016-05-17 Analog Devices Global Predicate counter

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
EP2725498A2 (en) 2014-04-30
KR101572204B1 (ko) 2015-11-26
CN103777923A (zh) 2014-05-07
KR20140051797A (ko) 2014-05-02
CN103777923B (zh) 2017-09-05
US20140115195A1 (en) 2014-04-24
US9092429B2 (en) 2015-07-28
EP2725498A3 (en) 2015-08-12

Similar Documents

Publication Publication Date Title
EP2725498B1 (en) DMA vector buffer
JP2966085B2 (ja) Microprocessor with a last-in first-out stack, microprocessor system, and method of operating a last-in first-out stack
US7590774B2 (en) Method and system for efficient context swapping
EP2725497A1 (en) Memory arbitration circuit and method
JP4934356B2 (ja) Video processing engine and video processing system including the same
KR101642646B1 (ko) Interruptible store exclusive
US9460016B2 (en) Cache way prediction
CN107957965B (zh) Quality of service ordinal modification
EP3685275B1 (en) Configurable hardware accelerators
US9910801B2 (en) Processor model using a single large linear registers, with new interfacing signals supporting FIFO-base I/O ports, and interrupt-driven burst transfers eliminating DMA, bridges, and external I/O bus
US8397005B2 (en) Masked register write method and apparatus
JP2024523339A (ja) Providing atomicity for compound operations using near-memory computing
CN114341805A (zh) Pure functional language neural network accelerator system and architecture
US6785743B1 (en) Template data transfer coprocessor
EP2804102B1 (en) Parallel atomic increment
US20110247018A1 (en) API For Launching Work On a Processor
US8706923B2 (en) Methods and systems for direct memory access (DMA) in-flight status
JP2023509813A (ja) SIMT instruction processing method and device
US7107478B2 (en) Data processing system having a Cartesian Controller
JP2002278753A (ja) Data processing system
JP2006285719A (ja) Information processing device and information processing method
JP2005301589A (ja) Data processing device

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20131018

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: ANALOG DEVICES GLOBAL

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 13/28 20060101AFI20150703BHEP

17P Request for examination filed

Effective date: 20160211

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

17Q First examination report despatched

Effective date: 20160613

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

INTG Intention to grant announced

Effective date: 20170717

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 955025

Country of ref document: AT

Kind code of ref document: T

Effective date: 20171215

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602013030703

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20171213

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180313

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 955025

Country of ref document: AT

Kind code of ref document: T

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180314

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180313

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20180413

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602013030703

Country of ref document: DE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20180914

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20181031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181018

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181031

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181031

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181031

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181018

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181018

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20171213

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20131018

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20171213

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20211014 AND 20211020

REG Reference to a national code

Ref country code: DE

Ref legal event code: R081

Ref document number: 602013030703

Country of ref document: DE

Owner name: ANALOG DEVICES INTERNATIONAL UNLIMITED COMPANY, IE

Free format text: FORMER OWNER: ANALOG DEVICES GLOBAL, HAMILTON, BM

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20230920

Year of fee payment: 11

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20240919

Year of fee payment: 12