US8711160B1 - System and method for efficient resource management of a signal flow programmed digital signal processor code - Google Patents


Info

Publication number
US8711160B1
Authority
US
United States
Prior art keywords
algorithm
processing
memory
buffer
dummy
Prior art date
Legal status
Active
Application number
US13/691,696
Other languages
English (en)
Inventor
Mohammed Chalil
John Joseph
Current Assignee
Analog Devices Inc
Original Assignee
Analog Devices Inc
Priority date
Filing date
Publication date
Application filed by Analog Devices Inc
Priority to US13/691,696
Assigned to ANALOG DEVICES, INC.; assignment of assignors interest (see document for details). Assignors: CHALIL, MOHAMMED; JOSEPH, JOHN
Application granted
Publication of US8711160B1
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Definitions

  • This disclosure relates in general to the field of digital processing systems and, more particularly, to a system and method for efficient resource management of a signal flow programmed digital signal processor code.
  • Signal processing deals with operations on, or analysis of, measurements of time-varying or spatially varying signals (e.g., sound; images; sensor data, for example biological data such as electrocardiograms; control system signals; telecommunication transmission signals; etc.).
  • Digital signal processing involves processing digitized, discrete-time sampled signals by general-purpose computers or by digital circuits such as application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or specialized digital signal processors (DSPs).
  • ASICs: application specific integrated circuits
  • DSPs: specialized digital signal processors
  • Arithmetic (e.g., fixed-point and floating-point, real-valued and complex-valued, multiplication and addition)
  • Signal processing algorithms (e.g., fast Fourier transform (FFT), finite impulse response (FIR) filter, infinite impulse response (IIR) filter, etc.)
  • FFT: fast Fourier transform
  • FIR: finite impulse response
  • IIR: infinite impulse response
  • FIG. 1 is a simplified block diagram illustrating an example embodiment of a system for efficient resource management of a signal flow programmed digital signal processor code.
  • FIG. 2 is a simplified block diagram illustrating example details that may be associated with an embodiment of the system.
  • FIG. 3 is a simplified diagram illustrating other example details associated with an embodiment of the system.
  • FIGS. 4A-4B are simplified diagrams illustrating yet other example details associated with an embodiment of the system.
  • FIG. 5 is a simplified flow diagram illustrating example operations that may be associated with an embodiment of the system.
  • FIG. 6 is a simplified flow diagram illustrating further example operations that may be associated with an embodiment of the system.
  • FIG. 7 is a simplified block diagram illustrating example details of the system in accordance with an embodiment.
  • FIG. 8 is a simplified diagram illustrating example details of the system according to the embodiment.
  • FIG. 9 is a simplified flow diagram illustrating example operations that may be associated with an embodiment of the system.
  • FIG. 10 is a simplified block diagram illustrating example details of the system according to the embodiment.
  • FIG. 11 is a simplified block diagram illustrating example details of an embodiment of the system.
  • FIG. 12 is a simplified diagram illustrating further example details of the embodiment.
  • FIG. 13 is a simplified flow diagram illustrating example operations that may be associated with an embodiment of the system.
  • A method includes determining a connection sequence of a plurality of algorithm elements in a schematic of a signal flow for an electronic circuit, the connection sequence indicating connections between the algorithm elements and a sequence of processing the algorithm elements according to the connections; determining a buffer sequence indicating an order of using a plurality of memory buffers to process the plurality of algorithm elements according to the connection sequence; and reusing at least some of the plurality of memory buffers according to the buffer sequence.
  • Determining the buffer sequence includes numbering the connections, the algorithm elements and the memory buffers in an order. For each connection, a first algorithm element that generates an output on the connection before any other algorithm element may be identified. A second algorithm element that receives the output as an input on the connection after all other algorithm elements may also be identified. The first algorithm elements of all the connections may be arranged in an allocation order, including an ascending order of first algorithm element numbers. A buffer index for each connection may be generated according to the allocation order, the buffer index for the connection being the same as another buffer index for a re-use connection. The second algorithm element of the re-use connection may be the same as the first algorithm element of the connection.
  • The buffer sequence may include the buffer index for all connections arranged according to the allocation order.
  • Determining the buffer sequence may include constructing a memory life matrix (MLM) that includes information about the algorithm elements and the connection sequence.
  • MLM: memory life matrix
  • The MLM may include N rows, representing N algorithm elements, and M columns, representing M connections between the algorithm elements.
  • The method may include other features in various embodiments.
  • FIG. 1 is a simplified block diagram illustrating a system 10.
  • System 10 includes a graphical emulator 12 that can be used to design a signal flow and program it on an electronic circuit, such as a digital signal processor (DSP).
  • An example schematic (e.g., graphical representation) 13 (generally indicated by an arrow) of a signal flow for an electronic circuit is displayed on graphical emulator 12.
  • Schematic 13 includes one or more algorithm elements (AEs) 14(1)-14(7) (e.g., AE 14(1) (S1), AE 14(2) (S2), ..., AE 14(7) (S7)).
  • AEs: algorithm elements
  • Each AE 14(1)-14(7) may represent an emulation (e.g., a match, copy, imitation, mimicry, or reproduction of actions, functions, etc.) of a functional electronic component, for example, an audio input, a filter, a dynamic processor, a frequency modulator, or an oscillator, configured to execute (e.g., process, implement) a specific algorithm.
  • A user may generate schematic 13 manually on graphical emulator 12, for example, by building schematic 13 using available AEs and other graphical artifacts.
  • The user can associate AEs 14(1)-14(7) with signal processing algorithms (SPAs) pre-configured in graphical emulator 12, or generate custom SPAs as desired.
  • SPAs: signal processing algorithms
  • AEs 14(1)-14(7) may be connected with each other through connections 16(1)-16(6) to realize a specific signal processing algorithm (SPA).
  • Connections 16(1)-16(6) may indicate inputs to and outputs from each AE 14(1)-14(7).
  • Connections 16(1)-16(6) may represent a connection sequence (CS) that simulates signal flow through schematic 13.
  • The term “connection sequence” includes a sequence (e.g., order, progression, string, evolution, etc.) in which to process AEs in the schematic according to their corresponding connections.
  • AE 14(1) receives an input signal, processes it, and provides an output signal on connection 16(1).
  • The output signal on connection 16(1) from AE 14(1) may be input to AEs 14(2) and 14(6).
  • AEs 14(2) and 14(6) consequently cannot be processed until after AE 14(1) has been processed.
  • The output signal on connection 16(2) from AE 14(2) may be input to AEs 14(3) and 14(4).
  • AE 14(3) cannot be processed until after AEs 14(4) and 14(5) have been processed, as the output from AE 14(5) is an input to AE 14(3).
  • The output signal on connection 16(3) from AE 14(4) may be input to AE 14(5).
  • The output signal on connection 16(4) from AE 14(5) may be input to AE 14(3).
  • AE 14(3) may generate an output signal on connection 16(5), which may be input to AE 14(6).
  • The output signal on connection 16(6) from AE 14(6) may be input to AE 14(7), which may generate an output.
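The dependencies listed for FIG. 1 can be turned into a concrete processing order with a topological sort. The following is a minimal sketch, not the method described in this patent. It assumes the connection sequence is acyclic and stored in a hypothetical mapping from each connection number to its producing AE and consuming AEs. To derive an order, it uses Kahn's algorithm and breaks ties by choosing the lowest-numbered ready AE; that tie-breaking rule is our own choice.

```python
import heapq

# FIG. 1 connection sequence: connection number c (for 16(c)) -> (producing AE, consuming AEs),
# where AE number k stands for AE 14(k). Hypothetical data layout for illustration.
FIG1_CONNECTIONS = {
    1: (1, [2, 6]),
    2: (2, [3, 4]),
    3: (4, [5]),
    4: (5, [3]),
    5: (3, [6]),
    6: (6, [7]),
}


def processing_order(connections):
    """Kahn's algorithm: each AE is emitted only after every AE that feeds it."""
    aes, indegree, successors = set(), {}, {}
    for producer, consumers in connections.values():
        aes.add(producer)
        for consumer in consumers:
            aes.add(consumer)
            successors.setdefault(producer, []).append(consumer)
            indegree[consumer] = indegree.get(consumer, 0) + 1
    ready = [ae for ae in aes if indegree.get(ae, 0) == 0]  # AEs with no pending inputs
    heapq.heapify(ready)                                    # lowest AE number first
    order = []
    while ready:
        ae = heapq.heappop(ready)
        order.append(ae)
        for nxt in successors.get(ae, []):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(aes):
        raise ValueError("connection sequence contains a cycle")
    return order


print(processing_order(FIG1_CONNECTIONS))  # [1, 2, 4, 5, 3, 6, 7]
```

In the resulting order, AEs 14(4) and 14(5) run before AE 14(3), and AE 14(6) runs only after AE 14(3). This matches the constraints stated above.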
  • MLM module 20 may interact with a memory element 22, which can include one or more memory buffers 24 (e.g., buffers 24(1)-24(4)).
  • Memory buffers 24(1)-24(4) may be used to store values of signals on connections 16(1)-16(6) during processing of AEs 14(1)-14(7). “Using” a memory buffer can include reading from the buffer and/or writing to the buffer.
  • MLM module 20 may also interact with a processor 26 as appropriate.
  • MLM module 20 may facilitate emulating schematics (e.g., schematic 13) on graphical emulator 12 in a memory-efficient manner.
  • MLM module 20 may receive (from graphical emulator 12) information about connections 16(1)-16(6) and corresponding AEs 14(1)-14(7).
  • The specific CS and memory requirements of AEs 14(1)-14(7) may be used by MLM module 20 to generate an optimum memory allocation scheme, for example, by re-using memory buffers 24(1)-24(4) for AEs 14(1)-14(7) and connections 16(1)-16(6) without affecting the functionality of the signal flow represented in graphical emulator 12.
  • MLM module 20 may use buffers 24(1)-24(4) appropriately to reduce the amount of memory used by embodiments of system 10 during processing of AEs 14(1)-14(7).
  • Schematic 13 is an example, shown merely for ease of illustration, and is not a limitation. Virtually any number of AEs may be connected in any fashion to generate an appropriate schematic using graphical emulator 12.
  • The schematic may relate to part of an electronic circuit, associated with a programmable DSP, that performs fast Fourier transforms (FFTs) or audio processing, such as volume control, toning, etc.
  • FFTs: fast Fourier transforms
  • System 10 may be used to generate a target code for implementation on a DSP, such that signals input to the DSP are processed according to the SPA defined by system 10.
  • Graphical block diagram emulation tools with interactive plotting and visualization capabilities can accelerate DSP signal processing design.
  • Several different methods can be used for graphical DSP programming, such as simulation and systems modeling; limited real-time development on a computer; simulation with subsequent source code generation and final cross-compilation to a DSP; and direct DSP object code generation.
  • Some DSP programming methods use block diagrams for developing DSP applications. The block diagram design is implemented on a host computer and allows the designer to develop the DSP application with or without generating a DSP executable program.
  • Another method for developing a DSP application from a graphical approach is to use sound cards and video cameras that allow limited real-time DSP applications to be constructed and implemented on a computer.
  • Yet another method for DSP programming via graphical means is to use a computer based block diagram, such as the example schematic of FIG. 1 , to construct a DSP algorithm that executes on a host computer. After the DSP algorithm has been constructed and the simulation yields the desired results, the entire block diagram design can be used to generate a target code that implements the simulated design in a specific target (e.g., DSP).
  • An example graphical tool for DSP programming is Analog Devices SigmaStudio™.
  • The SigmaStudio graphical development tool can program, develop, and tune certain DSP software.
  • Audio processing blocks can be wired together in a schematic, and the compiler can generate DSP-ready code and a control surface for setting and tuning parameters.
  • SigmaStudio includes an extensive library of algorithms, including basic low-level DSP functions and control blocks, and advanced audio processing functions such as filtering, mixing, and dynamics processing.
  • The AEs available for each processor are displayed in a ‘ToolBox’, and an AE can be dragged and dropped into the schematic.
  • Each AE can contain one or more pins (e.g., input pin, output pin, control pin) to connect the AEs together.
  • Output pins can connect to input pins and vice versa.
  • Algorithms may be added to (e.g., associated with), or removed from (e.g., de-associated from), AEs as appropriate. After the schematic is created, clicking a “compile” button can cause the tool to emulate the signal flow according to the user input and generate the target code.
  • The objective of the SPA represented by the schematic, using AEs and connections in a CS, is to process a finite number of input channels through the various AEs to produce a finite number of output channels.
  • The graphical tool captures the SPA as a signal flow.
  • The complexity of the SPA that can be handled is typically limited by the resources of the target DSP on which the target code is to be run. For example, the maximum memory, maximum Central Processing Unit (CPU) Million Instructions Per Second (MIPS), and maximum resource time of the target DSP may limit the maximum complexity that can be handled by a particular computing system.
  • SigmaStudio uses pointer-based linking to manage memory requirements for processing the SPAs.
  • The value of each signal is saved into a distinct memory buffer using a unique pointer.
  • Buffers are used, in general, to pass data to and from processes and to store information locally.
  • The memory buffer's life cycle spans the time from when the buffer is created to when it is deleted. If schematic 13 of FIG. 1 were to be processed in such typical graphical tools, the value of the input signal to AE 14(1) would be saved in a buffer Buff[0], which would be accessed via a pointer Ptr[0].
  • Graphical emulator 12 would process the input signal to AE 14(1) according to the associated algorithm, and write the value of the output signal on connection 16(1) into another buffer Buff[1], accessible via another pointer Ptr[1].
  • The value saved into Buff[1] would be used as input to the algorithm specified by AE 14(2), and the corresponding output signal on connection 16(2) would be written to yet another buffer Buff[2], accessible via yet another pointer Ptr[2], and so on.
  • Each signal would be associated with a unique buffer, accessible using a unique pointer.
  • The unique pointer may be derived by successively adding the size of each buffer to a BasePointer value.
  • Ptr[0] may be the same as BasePointer;
  • Ptr[1] may equal the sum of BasePointer and the size of Buff[0];
  • Ptr[2] may equal the sum of BasePointer and the sizes of Buff[0] and Buff[1]; and so on.
  • An offset buffer may additionally be used, which can contain offsets to the actual buffers. Each offset obtained from the offset buffer is multiplied by the size of the buffer (BlockSize) to give the differential pointer to each buffer.
  • The actual pointer can be obtained by adding the BasePointer to the resulting value.
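Both pointer schemes described above come down to simple arithmetic. This sketch assumes that buffer sizes and BlockSize use the same address units as BasePointer. The function names and the 0x1000 base address are hypothetical and do not come from SigmaStudio.

```python
from itertools import accumulate


def unique_pointers(base_pointer, buffer_sizes):
    """One buffer per signal: Ptr[k] = BasePointer + size(Buff[0]) + ... + size(Buff[k-1])."""
    starts = [0, *accumulate(buffer_sizes)][:-1]  # exclusive prefix sums of the sizes
    return [base_pointer + start for start in starts]


def pointers_from_offsets(base_pointer, offset_buffer, block_size):
    """Offset-buffer variant: Ptr = BasePointer + offset * BlockSize."""
    return [base_pointer + offset * block_size for offset in offset_buffer]


BASE = 0x1000  # hypothetical BasePointer value
print([hex(p) for p in unique_pointers(BASE, [64, 64, 128])])
# ['0x1000', '0x1040', '0x1080']
print([hex(p) for p in pointers_from_offsets(BASE, [0, 1, 2, 1], 64)])
# ['0x1000', '0x1040', '0x1080', '0x1040']
```

If the same offset appears twice in the offset buffer, two signals resolve to the same address. This follows from the arithmetic alone; it says nothing about how any particular tool fills the offset buffer.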
  • AE_j may perform a predefined algorithm A_j using m_j inputs to produce n_j outputs, consuming p_j MIPS and engaging r_j resources. Focusing on buffer memory alone, AE_j can be represented as a function of m_j and n_j as A_j(m_j, n_j).
  • A may represent a subset of U (A ⊆ U) indicating the set of algorithms used by the specific schematic being analyzed.
  • The total memory for output buffers, M_t, may be obtained from the following equation.
  • The CS information may be known a priori, before processing any AEs.
  • The CS information and the details of all AEs in the schematic can be used to derive the memory resource readiness and life requirements of the SPA.
  • The maximum memory of the computing device (e.g., target DSP) processing the SPA is denoted by MaxMem.
  • MaxCpuMips: maximum CPU MIPS
  • MaxResTime: maximum resource time
  • Each AE_j has a finite memory requirement called Element Memory requirement, denoted as EMem_j (e.g., under sub-categories such as state, scratch, Input-Output, external, internal, etc.).
  • Each AE_j has a finite CPU load requirement denoted as ECpu_j.
  • Each AE_j has a finite resource requirement denoted as EResTime_j.
  • The typical graphical tool can convert the signal flow into target code if and only if:
  • MLM module 20 may determine a sequence of operations such that memory buffers 24(1)-24(4) can be re-used for connections 16(1)-16(6), while keeping the memory size of the buffers, M_1t, much smaller than M_t (M_1t ≪ M_t).
  • MLM module 20 may be applicable to scenarios involving external memory overlay, and/or to offloading tasks to accelerators and offloaders that use direct memory access (DMA) in the background.
  • DMA: direct memory access
  • Embodiments of system 10 may determine a connection sequence of a plurality of AEs (e.g., AEs 14(1)-14(7)) in a schematic (e.g., schematic 13) of an electronic circuit, where the connection sequence indicates connections between the algorithm elements and a sequence of processing the algorithm elements according to the connections.
  • MLM module 20 may determine a buffer sequence. At least some of the plurality of memory buffers 24(1)-24(4) may be reused according to the buffer sequence.
  • The term “buffer sequence” includes an order of using the plurality of memory buffers (e.g., 24(1)-24(4)) to process the plurality of algorithm elements (e.g., AEs 14(1)-14(7)) according to the connection sequence.
  • The buffer sequence can comprise a numbered list of memory buffers 24(1)-24(4), arranged according to the sequence in which each of the outputs from AEs 14(1)-14(7) is written to each memory buffer, where repeated memory buffer numbers in the buffer sequence indicate reuse of the corresponding memory buffers.
  • Buffers 24(1)-24(4) may be indicated by a buffer sequence {0, 1, 2, 3} representing, respectively, buffers 24(1), 24(2), 24(3) and 24(4) in that order.
  • The buffer sequence {0, 1, 2, 3, 2, 1, 2} may also represent buffers 24(1), 24(2), 24(3) and 24(4); additionally, this buffer sequence may indicate that buffers 24(3) and 24(2) may be reused (e.g., written to more than once) in the order specified in the buffer sequence. Values stored in buffers 24(3) and 24(2) may be overwritten when reused.
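Reading a buffer sequence is mechanical. The number of distinct indices equals the number of buffers needed, and any index that has already appeared marks an overwrite. Below is a small sketch using a hypothetical helper name, applied to the second example above, where index i stands for buffer 24(i+1).

```python
def describe_buffer_sequence(seq):
    """Summarize a buffer sequence; entry k is the buffer index written by the k-th output."""
    seen = set()
    overwrites = []                    # (position in sequence, buffer index) of each reuse
    for pos, buf in enumerate(seq):
        if buf in seen:
            overwrites.append((pos, buf))
        seen.add(buf)
    return {
        "writes": len(seq),            # buffers needed if every output got its own buffer
        "buffers": len(seen),          # distinct buffers actually needed
        "overwrites": overwrites,
    }


print(describe_buffer_sequence([0, 1, 2, 3, 2, 1, 2]))
# {'writes': 7, 'buffers': 4, 'overwrites': [(4, 2), (5, 1), (6, 2)]}
```

The overwrites at positions 4 to 6 hit indices 2 and 1, that is, buffers 24(3) and 24(2). Four buffers therefore serve seven writes.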
  • At least one input is received at the algorithm element and at least one output is generated by the algorithm element.
  • An input algorithm element receives inputs from a user or other signal sources (e.g., an analog-to-digital converter, a music player, etc.) rather than from another AE; an output algorithm element generates outputs that may be displayed on screen, played out on speakers (in the case of audio signals), or sent out of graphical emulator 12, rather than to another AE.
  • Input on connection 16(1) is received at AE 14(2), and output on connection 16(2) is generated from AE 14(2).
  • The output can be another input to another algorithm element.
  • Output on connection 16(2) may be input to AEs 14(4) and 14(3).
  • One connection may provide the input to the algorithm element, and another connection may accept the output from the algorithm element and provide the output as another input to another algorithm element.
  • The connection providing the input is different from the connection accepting the output; the connection accepting the output can provide the output as inputs to one or more other algorithm elements.
  • Each connection may be associated with one of memory buffers 24(1)-24(4).
  • The output on connection 16(2) may be written to the memory buffer and read as inputs by AEs 14(4) and 14(3).
  • The connections, the algorithm elements and the memory buffers may be numbered in an order. For each connection, a first algorithm element that generates an output on the connection before any other algorithm element may be identified. Further, for that connection, a second algorithm element that receives the output as an input on the connection after all other algorithm elements may also be identified.
  • The first algorithm elements of all the connections may be arranged in an allocation order.
  • The allocation order indicates an order (e.g., ascending or descending) of the first algorithm element numbers.
  • A buffer index may be generated according to the allocation order for each connection, where the buffer index for the connection may be the same as another buffer index for a re-use connection.
  • A “re-use connection” is a connection whose corresponding memory buffer may be overwritten with output values of another connection.
  • The second algorithm element of the re-use connection may be the same as the first algorithm element of the connection.
  • The buffer sequence can comprise the buffer index for all connections arranged according to the allocation order.
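The allocation procedure above can be sketched as a greedy pass over the AEs in processing order. This is an illustrative reading, not the patent's implementation. The data layout, the lowest-free-index reuse policy, and the `in_place` switch are all assumptions.

With `in_place=True`, a buffer is released as soon as its last reader (the "second" AE) is reached, so that AE's own output may take the buffer. This matches the literal re-use condition stated above. The default, `in_place=False`, releases buffers only after the AE's outputs have been allocated, so an AE never overwrites an input it may still be reading. For FIG. 1, the default reproduces the example in which the buffer of connection 16(3) is reused for connection 16(5).

```python
import heapq

# FIG. 1 connection sequence: connection 16(c) -> (producing AE, consuming AEs), AE k = AE 14(k).
FIG1_CONNECTIONS = {
    1: (1, [2, 6]),
    2: (2, [3, 4]),
    3: (4, [5]),
    4: (5, [3]),
    5: (3, [6]),
    6: (6, [7]),
}
FIG1_ORDER = [1, 2, 4, 5, 3, 6, 7]  # a valid processing order for FIG. 1


def buffer_sequence(connections, order, in_place=False):
    """Assign a buffer index to each connection, reusing buffers whose last reader has run.

    Returns (indices in allocation order, number of distinct buffers used).
    """
    position = {ae: i for i, ae in enumerate(order)}
    # Second AE of each connection: the reader processed last (None if nobody reads it).
    last_reader = {
        c: max(readers, key=position.__getitem__, default=None)
        for c, (_, readers) in connections.items()
    }
    # Allocation order: connections sorted by the position of their first (producing) AE.
    alloc_order = sorted(connections, key=lambda c: (position[connections[c][0]], c))
    free_pool = []   # min-heap of buffer indices available for reuse
    assigned = {}    # connection -> buffer index
    next_new = 0     # next never-used buffer index

    def release(ae):
        # Buffers whose last reader is `ae` become free once `ae` has read them.
        for c, reader in last_reader.items():
            if reader == ae:
                heapq.heappush(free_pool, assigned[c])

    def allocate(ae):
        nonlocal next_new
        for c in alloc_order:
            if connections[c][0] == ae:
                if free_pool:
                    assigned[c] = heapq.heappop(free_pool)  # reuse the lowest free index
                else:
                    assigned[c] = next_new                  # open a new buffer
                    next_new += 1

    for ae in order:
        if in_place:   # literal rule: an AE may overwrite its own input buffer
            release(ae)
            allocate(ae)
        else:          # conservative: outputs never alias the AE's own inputs
            allocate(ae)
            release(ae)
    return [assigned[c] for c in alloc_order], next_new


print(buffer_sequence(FIG1_CONNECTIONS, FIG1_ORDER))                 # ([0, 1, 2, 3, 2, 1], 4)
print(buffer_sequence(FIG1_CONNECTIONS, FIG1_ORDER, in_place=True))  # ([0, 1, 2, 2, 1, 0], 3)
```

Without reuse, the six connections of FIG. 1 would need six buffers. The conservative pass needs four, consistent with buffers 24(1)-24(4), and the in-place pass needs three.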
  • Processing of AEs 14(1)-14(7) may follow the connection sequence, for example, based on the availability of input signals for initiating the applicable algorithms.
  • The algorithm represented by AE 14(1) may be processed before the algorithm represented by AE 14(2) or AE 14(6), as the output signal on connection 16(1) from AE 14(1) may feed as inputs into AEs 14(2) and 14(6).
  • Processing for AEs 14(2), 14(3), 14(4) and 14(5) may have to be completed before AE 14(6) can be processed, as the input signal to AE 14(6) on connection 16(5) may be obtained only after processing AE 14(3), which can be processed only after AE 14(5), which in turn can be processed only after AE 14(4), and so on. Consequently, the input signal on connection 16(1) may be retained in its corresponding buffer until processing of AE 14(6). On the other hand, the input signal on connection 16(3) is used only for processing AE 14(5).
  • The buffer used to store the input signal on connection 16(3) may therefore be reused after processing AE 14(5), for example, to store the output signal on connection 16(5), which AE 14(3) generates and which serves as the input signal to AE 14(6).
  • Reusing memory buffers may reduce the overall memory and other resource requirements, enabling embodiments of system 10 to process more complicated schematics.
  • MLM module 20 may construct an MLM comprising a relationship between AEs 14(1)-14(7) and connections 16(1)-16(6).
  • The MLM may indicate the sequence of writing to, and reading from, buffers 24(1)-24(4) as various AEs 14(1)-14(7) are processed by embodiments of system 10.
  • The MLM may be manipulated to present a specific sequence of writing to, and reading from, buffers 24(1)-24(4), such that buffers 24(1)-24(4) may be re-used during the processing of AEs 14(1)-14(7), thereby reducing the memory size requirements to merely those buffers that are actively used in parallel to process AEs 14(1)-14(7).
  • System 10 may be implemented on any suitable computing device (e.g., server, desktop computer, laptop computer, smart phone, etc.) equipped with appropriate hardware (e.g., display screen, monitor, etc.) to facilitate the operations thereof.
  • System 10 may interface with the hardware (e.g., display monitors) to perform the operations described herein.
  • Graphical emulator 12 may be rendered on a display screen visible to the user, and may be associated with other hardware (e.g., mouse, joystick, touch-screen, keyboard) through which the user can manipulate schematic 13 appropriately.
  • In some embodiments, system 10 may be located on a single device. In other embodiments, system 10 may be distributed across multiple devices on a network, which can include any number of interconnected servers, virtual machines, switches, routers, and other nodes. Elements of FIG. 1 may be coupled to one another through one or more interfaces employing any suitable connection (wired or wireless), which provides a viable pathway for electronic communications. Additionally, any one or more of these elements may be combined or removed from the architecture based on particular configuration needs.
  • System 10 may include applications and hardware that operate together to perform the operations described herein.
  • A portion of system 10 may be implemented in hardware, and another portion may be implemented in software, for example, as an application.
  • An “application” can be inclusive of an executable file comprising instructions that can be understood and processed on a computer, and may further include library modules loaded during execution, object files, system files, hardware logic, software logic, or any other executable modules.
  • Graphical emulator 12 may include other interface artifacts (such as drop-down menus, windows, multiple pages, etc.) that can facilitate generating schematic 13 according to the user's needs.
  • System 10 may interface with a target device, such as a DSP, to offload a target code generated using features of system 10.
  • MLM module 20 shown and described herein can also be used in a wide variety of other analytical tools, where a finite number of inputs are processed by AEs connected in a specific CS to generate a finite number of outputs.
  • The computing device implementing MLM module 20 may be of any suitable architecture, including DSPs and other processors.
  • The memory management algorithms implemented by MLM module 20 may be embedded into processors, such as DSPs, as appropriate and based on particular needs.
  • The memory reuse scheme implemented by MLM module 20 may be implemented in a DSP that executes algorithms according to the target code generated by system 10.
  • Memory buffers of the DSP may be reused appropriately as described herein when functional blocks (corresponding to AEs in the respective schematic) process the actual input signals and generate output signals.
  • MLM module 20 may be implemented on a computing device (e.g., computer, smart phone, etc.) that also hosts graphical emulator 12.
  • The output generated by the computing device may be a target code that enables signal processing by a DSP according to the signal flow captured on graphical emulator 12. Details of memory buffers 24 to be used by each AE 14(1)-14(7) may be included in the target code.
  • The computing device may determine the memory to be used by each AE 14(1)-14(7) according to MLM module 20 running on the computing device.
  • Signal processing using the signal flow captured on graphical emulator 12 may be performed by a target DSP (which may be separate from the computing device) according to the target code generated by the computing device.
  • Memory reuse algorithms by MLM module 20 may be incorporated into the target code and used to optimize memory use on the target DSP.
  • FIG. 2 is a simplified block diagram illustrating another example schematic 28 processed in an embodiment of system 10.
  • Example schematic 28 is used herein to explain further aspects of embodiments of system 10 in certain subsequent figures.
  • Schematic 28 includes AEs 14(1)-14(10) connected by connections 16(1)-16(10) in a suitable CS to realize a specific SPA.
  • The output signal from AE 14(1) over connection 16(1) may comprise inputs to AEs 14(3) and 14(4).
  • The output signal from AE 14(2) over connection 16(2) may comprise inputs to AEs 14(3) and 14(6).
  • AE 14(3) may thus receive two inputs, and provide three outputs: (i) over connection 16(3) to AE 14(7); (ii) over connection 16(3) to AE 14(5); and (iii) over connection 16(5) to AE 14(4).
  • AE 14 ( 6 ) may also provide an output signal over connection 16 ( 4 ) to AE 14 ( 7 ), which may provide outputs to AEs 14 ( 5 ) and 14 ( 8 ) over connection 16 ( 7 ).
  • AE 14 ( 5 ) may receive inputs from AEs 14 ( 3 ) and 14 ( 7 ) and provide outputs to AEs 14 ( 4 ) and 14 ( 8 ) over connections 16 ( 6 ) and 16 ( 8 ), respectively.
  • AE 14 ( 4 ) may process three input signals from AEs 14 ( 1 ), 14 ( 3 ) and 14 ( 5 ), and provide an output over connection 16 ( 9 ) to AEs 14 ( 8 ) and 14 ( 10 ).
  • AE 14 ( 8 ) may receive three inputs (from AE 14 ( 5 ), 14 ( 7 ) and 14 ( 4 )) and provide an output signal over connection 16 ( 10 ) to AE 14 ( 9 ).
  • FIG. 3 is a simplified diagram illustrating example details of constructing an MLM matrix for schematic 28 according to embodiments of system 10 . It may be noted that the example details presented herein depict an analytical (e.g., logical) approach to constructing the MLM matrix, and do not represent a physical implementation thereof.
  • Schematic 28 comprises 10 AEs 14 ( 1 )- 14 ( 10 ) and 10 connections 16 ( 1 )- 16 ( 10 ).
  • Output AEs 14 ( 9 ) and 14 ( 10 ) do not consume any buffers (e.g., as their outputs are not processed by other AEs), and can be disregarded in generating the MLM matrix.
  • the CS of schematic 28 for purposes of buffer management may be represented as a matrix 30 including 8 rows and 10 columns corresponding to the 8 AEs 14 ( 1 )- 14 ( 8 ) and 10 connections 16 ( 1 )- 16 ( 10 ), respectively. Rows may be named according to AEs 14 ( 1 )- 14 ( 8 ) and columns may be named according to connections 16 ( 1 )- 16 ( 10 ).
  • Matrix 30 may be modified to matrix 32 by marking an ‘x’ in a cell if the corresponding connection to the corresponding AE represents an output and marking an ‘o’ if the corresponding connection to the corresponding AE represents an input.
  • connection 16 ( 1 ) represents an output from AE 14 ( 1 ) and may be represented by an ‘x’ in the cell at the intersection of column 1 and row S1.
  • Connection 16 ( 1 ) also represents an input to AEs 14 ( 3 ) and 14 ( 4 ) and may be represented as ‘o’ in the cells at the intersection of column 1, and S3 and S4, respectively.
  • connection 16 ( 2 ) represents an output from AE 14 ( 2 ) and may be represented by an ‘x’ in the cell at the intersection of column 2 and row S2.
  • Connection 16 ( 2 ) also represents an input to AEs 14 ( 3 ) and 14 ( 6 ) and may be represented as ‘o’ in the cells at the intersection of column 2, and S3 and S6, respectively.
  • the cells in matrix 32 may be appropriately filled according to the CS in schematic 28 .
  • Matrix 32 may be modified to matrix 34 by changing the order of rows so that in any given column, ‘x’ appears above ‘o’. For example, moving rows S4 and S5 to below S7 in the order {S1, S2, S3, S6, S7, S5, S4, S8} results in matrix 34 , as shown. The last ‘o’ in each column may be marked to be distinct from others (e.g., by coloring it a different color). Information related to MLM matrix 34 may be extracted into buffers 36 , represented as an ALLOC buffer 38 and a FREE buffer 40 .
  • ALLOC buffer 38 may include the row number corresponding to entry ‘x’ for each column of MLM matrix 34
  • FREE buffer 40 may include the highest row number corresponding to entry ‘o’ in each column of MLM matrix 34
  • Buffers 36 may be modified at 42 by rearranging the columns of ALLOC buffer 38 to ALLOC buffer 44 in an ascending order.
  • Corresponding columns of FREE buffer 40 and MLM 34 may also be rearranged accordingly to obtain FREE buffer 46 and MLM 48 , respectively.
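The FIG. 3 steps can be sketched in Python as follows. This is an illustrative sketch and not the MLM module's implementation. The dictionary layout and the function names (`build_mlm`, `row_order`, `alloc_free`) are assumptions. So is the use of Kahn's topological sort to satisfy the "‘x’ above ‘o’" rule. The connection list is transcribed from the description of schematic 28, with AE 14(5)'s outputs read as connection 16(5) to AE 14(8) and connection 16(6) to AE 14(4), as the FIG. 4 walk-through implies.

```python
from heapq import heapify, heappop, heappush

# Connection sequence of schematic 28 as read from the text:
# link number -> (driving AE, [AEs reading the link]). AEs 9 and 10 are outputs.
CONNECTIONS = {
    1: (1, [3, 4]), 2: (2, [3, 6]), 3: (3, [7, 5]), 4: (6, [7]),
    5: (5, [8]), 6: (5, [4]), 7: (7, [5, 8]), 8: (3, [4]),
    9: (4, [8, 10]), 10: (8, [9]),
}
PROCESSING_AES = range(1, 9)  # output AEs 14(9) and 14(10) get no row


def build_mlm(connections, aes):
    """Matrix 32: 'x' where the row's AE drives the link, 'o' where it reads it."""
    mlm = {ae: {link: "" for link in connections} for ae in aes}
    for link, (src, dsts) in connections.items():
        mlm[src][link] = "x"
        for dst in dsts:
            if dst in mlm:  # output AEs have no row
                mlm[dst][link] = "o"
    return mlm


def row_order(connections, aes):
    """Reorder rows so each column's 'x' is above its 'o's. This sketch uses
    Kahn's topological sort, breaking ties by the lowest AE number."""
    indegree = {ae: 0 for ae in aes}
    for _, dsts in connections.values():
        for dst in dsts:
            if dst in indegree:
                indegree[dst] += 1
    ready = [ae for ae in aes if indegree[ae] == 0]
    heapify(ready)
    order = []
    while ready:
        ae = heappop(ready)
        order.append(ae)
        for src, dsts in connections.values():
            if src != ae:
                continue
            for dst in dsts:
                if dst in indegree:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        heappush(ready, dst)
    if len(order) != len(indegree):
        raise ValueError("feedback loop: no row order puts every 'x' first")
    return order


def alloc_free(connections, order):
    """ALLOC: row of each link's 'x'. FREE: last row with an 'o' (None when only
    output AEs read the link). Columns are then sorted by ALLOC (stable)."""
    row = {ae: i for i, ae in enumerate(order)}
    alloc, free = {}, {}
    for link, (src, dsts) in connections.items():
        alloc[link] = row[src]
        readers = [row[d] for d in dsts if d in row]
        free[link] = max(readers) if readers else None
    links = sorted(connections, key=lambda link: alloc[link])
    return links, [alloc[k] for k in links], [free[k] for k in links]


mlm = build_mlm(CONNECTIONS, PROCESSING_AES)
order = row_order(CONNECTIONS, PROCESSING_AES)
links, alloc, free = alloc_free(CONNECTIONS, order)
print(order)  # [1, 2, 3, 6, 7, 5, 4, 8]
print(links)  # [1, 2, 3, 8, 4, 7, 5, 6, 9, 10]
print(alloc)  # [0, 1, 2, 2, 3, 4, 5, 5, 6, 7]
print(free)   # [6, 3, 5, 6, 4, 7, 7, 6, 7, None]
```

With lowest-number tie-breaking, the sort reproduces the row order {S1, S2, S3, S6, S7, S5, S4, S8} used for matrix 34. Other topological orders would also satisfy the rule. The sorted columns match the ALLOC and FREE values used in the FIG. 4 walk-through.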
  • FIGS. 4A-4B are simplified diagrams illustrating example details of memory reuse operations according to an embodiment of system 10 .
  • a buffer for each Y found in FREE buffer 46 may be freed and a buffer for each Y found in ALLOC buffer 44 may be allocated and assigned to the corresponding connection.
  • a first entry ( 0 , corresponding to AE 14 ( 1 ) over connection 16 ( 1 )) in ALLOC buffer 44 may be checked.
  • a link index 53 may represent the connection corresponding to each column of ALLOC buffer 44 .
  • a buffer index 54 may indicate the location of the actual buffer, represented as table 52 in the FIGURE.
  • Link 1 may indicate a value of the signal over connection 16 ( 1 ) of example schematic 28 of FIG. 2 .
  • Link 1 may be saved into BBuff[0], and accessed via the buffer index value of 0 in buffer index 54 .
  • the next entry ( 1 , corresponding to AE 14 ( 2 ) over connection 16 ( 2 )) in ALLOC buffer 44 may be checked.
  • Buffer index 54 may indicate the location of the actual buffer, namely BBuff[1] where Link 2, the value of the signal over connection 16 ( 2 ) of example schematic 28 , may be stored.
  • next two entries (both having values 2, corresponding to AE 14 ( 3 ) over connections 16 ( 3 ) and 16 ( 8 )) in ALLOC buffer 44 may be assigned to buffers BBuff[2] and BBuff[3].
  • Link 3 and Link 8 corresponding to the values over connections 16 ( 3 ) and 16 ( 8 ), respectively, may be stored in respective buffers BBuff[2] and BBuff[3].
  • the next entry in ALLOC buffer 44 is 3 (corresponding to AE 14 ( 6 ) over connection 16 ( 4 )), and the same value may be found in FREE buffer 46 corresponding to AE 14 ( 6 ) over connection 16 ( 2 ), associated with buffer index 54 having value 1. Consequently, Link 4, the value of connection 16 ( 4 ), may be over-written on the preceding value in BBuff[1], and the corresponding buffer may be reused for connection 16 ( 4 ) at AE 14 ( 6 ).
  • the next entry in ALLOC buffer 44 is 4 (corresponding to AE 14 ( 7 ) over connection 16 ( 7 )), and the same value may be found in FREE buffer 46 corresponding to AE 14 ( 7 ) over connection 16 ( 4 ), associated with buffer index 54 having value 1. Consequently, Link 7, the value of connection 16 ( 7 ), may be over-written on the preceding value in BBuff[1], and the corresponding buffer may be reused for connection 16 ( 7 ) at AE 14 ( 7 ).
  • next entries (both 5, corresponding to AE 14 ( 5 ) over connections 16 ( 5 ) and 16 ( 6 )) in ALLOC buffer 44 may also be found in FREE buffer 46 corresponding to AE 14 ( 5 ) over connection 16 ( 3 ), associated with buffer index 54 having value 2. Consequently, Link 5, the value of connection 16 ( 5 ), may be over-written on the preceding value in BBuff[2], and the corresponding buffer may be reused for connection 16 ( 5 ) at AE 14 ( 5 ). Because Link 5 has already been written to BBuff[2], BBuff[2] may not be reused for Link 6 simultaneously. Therefore, Link 6 may be written to BBuff[4], and buffer index 54 accordingly updated.
  • the next entry in ALLOC buffer 44 is 6 (corresponding to AE 14 ( 4 ) over connection 16 ( 9 )), and the same value may be found in FREE buffer 46 corresponding to AE 14 ( 4 ) over connections 16 ( 1 ), 16 ( 8 ) and 16 ( 6 ), associated with buffer index 54 having values 0, 3 and 4, respectively. Consequently, Link 9, the value of connection 16 ( 9 ), may be over-written on any one of the buffers, say BBuff[0], and the other available buffers may be made free (or available) for further reuse.
  • the next entry in ALLOC buffer 44 is 7 (corresponding to AE 14 ( 8 ) over connection 16 ( 10 )), and the same value may be found in FREE buffer 46 corresponding to AE 14 ( 8 ) over connections 16 ( 7 ), 16 ( 5 ) and 16 ( 9 ), associated with buffer index 54 having values 1, 2 and 0, respectively. Consequently, Link 10, the value of connection 16 ( 10 ), may be over-written on any one of the buffers, say BBuff[0], and the other available buffers may be made free (or available) for further reuse.
  • embodiments of system 10 may use merely 5 buffers (BBuff[0] through BBuff[4]) for the 10 connections without sacrificing any performance.
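The reuse walk-through above can be replayed with a short Python sketch. It takes ALLOC buffer 44 and FREE buffer 46 as the walk-through uses them. At each row it releases buffers before it allocates new ones, so an AE may overwrite an input it reads for the last time, as Link 4 replaces Link 2 in BBuff[1]. It also assumes the lowest-numbered free buffer is picked, which matches the "say BBuff[0]" choices. `assign_buffers` is a hypothetical name.

```python
from heapq import heappop, heappush

# ALLOC buffer 44 and FREE buffer 46 for schematic 28 (columns sorted by ALLOC).
# None marks a link read only by output AEs, so it is never freed here.
LINKS = [1, 2, 3, 8, 4, 7, 5, 6, 9, 10]
ALLOC = [0, 1, 2, 2, 3, 4, 5, 5, 6, 7]
FREE = [6, 3, 5, 6, 4, 7, 7, 6, 7, None]


def assign_buffers(links, alloc, free):
    """For each row index Y: release the buffers of links whose FREE entry is
    Y, then give each link whose ALLOC entry is Y the lowest-numbered released
    buffer, creating a new BBuff entry only when none is free."""
    buffer_of = {}  # link -> BBuff index (the role of buffer index 54)
    pool = []       # min-heap of released BBuff indices
    created = 0     # number of distinct BBuff entries created so far
    for y in range(max(alloc) + 1):
        for link, f in zip(links, free):
            if f == y:
                heappush(pool, buffer_of[link])
        for link, a in zip(links, alloc):
            if a == y:
                if pool:
                    buffer_of[link] = heappop(pool)
                else:
                    buffer_of[link] = created
                    created += 1
    return buffer_of, created


buffers, count = assign_buffers(LINKS, ALLOC, FREE)
for link in LINKS:
    print(f"Link {link} -> BBuff[{buffers[link]}]")
print(count, "buffers")  # 5 buffers
```

For these ten links the sketch creates five distinct buffers, BBuff[0] through BBuff[4], and it reproduces the link-to-buffer assignments listed above.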
  • FIG. 5 is a simplified flow diagram illustrating example operations that may be associated with embodiments of system 10 .
  • Operations 80 include 82 , at which AEs 14 ( 1 )- 14 ( 8 ) in a graphical simulator schematic (e.g., schematic 28 ) may be determined.
  • a connection sequence between AEs 14 ( 1 )- 14 ( 8 ) may be determined, for example, by identifying connections 16 ( 1 )- 16 ( 10 ) between AEs 14 ( 1 )- 14 ( 8 ).
  • an MLM (e.g., MLM 48) may be constructed.
  • MLM 48 may include information related to AEs 14 ( 1 )- 14 ( 8 ) and the corresponding CS of schematic 28 .
  • the minimum amount of memory buffers with memory re-use to support algorithm execution by AEs 14 ( 1 )- 14 ( 8 ) may be determined.
  • the memory buffers may be re-used accordingly.
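For a fixed processing order, the "minimum amount of memory buffers" step can be checked by counting how many links hold live data at once. The Python sketch below is illustrative. It assumes the same in-place reuse as the FIG. 4 walk-through: a link whose last reader is row Y is released before row Y's outputs are written. Each link's lifetime is then an interval of rows, so allocating in ascending ALLOC order, as the walk-through does, is known to reach this peak. `peak_live_links` is a hypothetical name.

```python
# ALLOC and FREE entries for schematic 28, one pair per link, as in the
# FIG. 4 walk-through. None marks a link that only output AEs read.
ALLOC = [0, 1, 2, 2, 3, 4, 5, 5, 6, 7]
FREE = [6, 3, 5, 6, 4, 7, 7, 6, 7, None]


def peak_live_links(alloc, free):
    """Most links holding live data at once for this row order.

    At row y a link is live if it was written at or before y and is still
    read after y (or is never freed). A link whose last reader is row y is
    released before row y's outputs are written.
    """
    peak = 0
    for y in range(max(alloc) + 1):
        live = sum(1 for a, f in zip(alloc, free)
                   if a <= y and (f is None or f > y))
        peak = max(peak, live)
    return peak


print(peak_live_links(ALLOC, FREE))  # 5
```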
  • FIG. 6 is a simplified flow diagram illustrating example operations that may be associated with embodiments of system 10 .
  • Operations 100 include 102 , at which processing AEs, numbering N, are identified and numbered 0 to N−1. Processing AEs include AEs whose output may be written to a memory buffer.
  • the connections may be numbered for all AEs.
  • the MLM may be constructed with N rows and M columns, where M is the number of connections.
  • an ‘o’ may be marked in each cell if the algorithm element corresponding to the row of the cell receives an input on the connection corresponding to the column of the cell.
  • an ‘x’ may be marked in each cell if the algorithm element corresponding to the row of the cell provides an output on the connection corresponding to the column of the cell.
  • rows may be rearranged such that ‘x’ is on the top of every column.
  • columns in the MLM, FREE and ALLOC buffers may be rearranged in an allocation order indicating an ascending order of elements in the ALLOC buffer.
  • each entry (Y) in the ALLOC buffer may be checked.
  • a memory buffer corresponding to the connection index (i.e., link index) for each Y in the FREE buffer may be freed.
  • a memory buffer for each Y in the ALLOC buffer may be allocated and assigned to the connection corresponding to that Y.
  • FIG. 7 is a simplified block diagram illustrating example details of an embodiment of system 10 .
  • overlaying includes replacing a block of stored instructions (or data) with another block.
  • Memory overlays can provide support for applications whose entire program instructions and data do not fit in the internal memory of the processor (e.g., processor 26 ).
  • Program instructions and data may be partitioned and stored in off-chip memory until they are required for program execution.
  • the partitions are referred to as memory overlays and the routines that call and execute them as “memory overlay managers.”
  • overlays are a “many to one” memory mapping system. Several overlays may be stored in unique locations in off-chip memory, and they run (or execute) in a common location in on-chip memory.
  • MLM module 20 may interact with an on-chip memory buffer 128 , which may include an input/output (I/O) memory 130 and a state memory 132 .
  • I/O memory 130 may store input and output values of connections and state memory 132 may store state of AEs being processed by system 10 .
  • a portion of state memory 132 may be offloaded to an off-chip memory 134 , for example, in cases where the target memory is not sufficient to store state for all AEs in the SPA. Appropriate states may be read from and written to off-chip memory 134 before and after processing AEs, respectively.
  • ROM data (e.g., in the form of tables) may be off-loaded to off-chip memory 134 in addition to state memory 132 .
  • ROM data may not be written back to off-chip memory 134 after processing the AEs, for example, because the ROM data table may not be modified by AEs during processing.
  • a memory overlay manager 136 may facilitate overlaying off-chip memory 134 on state memory 132 .
  • overlay manager 136 can be a user-defined function responsible for ensuring that a function or data within an overlay on off-chip memory 134 is in state memory 132 when the function or data is needed.
  • the transfer of memory between on-chip state memory 132 and off-chip memory 134 can occur using direct memory access (DMA) capability of processor 26 .
  • Overlay manager 136 may also handle more advanced functionality such as checking if the requested overlay is already in run time memory, executing another function while loading an overlay, and tracking recursive overlay function calls.
  • on-chip memory buffer 128 may be integrated with processor 26 on the same semiconductor chip and can include instruction cache, data cache, ROM, on-chip static random access memory (SRAM), and on-chip dynamic random access memory (DRAM).
  • the instruction and data cache may be fast local memory serving as an interface between processor 26 and off-chip memory 134 .
  • the on-chip SRAM may be mapped into an address space disjoint from off-chip memory 134 but connected to the same address and data buses. Both the cache and SRAM may allow fast access to their data, whereas access to off-chip memory (e.g., DRAM) 134 may require relatively longer access times.
  • access to off-chip memory 134 may be effected by processor 26 through a suitable cache in on-chip memory 128 .
  • Off-chip memory 134 may be used in situations with limited on-chip memory.
  • Off-chip memory 134 can include DRAM, flash RAM, SRAM, synchronous dynamic random access memory (SDRAM), hard disk drive, and any other forms of memory elements that may be implemented outside the chip having processor 26 .
  • a portion of on-chip memory buffer 128 may be overlaid with off-chip memory 134 so that effective memory availability can be increased.
  • DMA may be used for moving memory blocks between on-chip memory buffer 128 and off-chip memory 134 according to a sequence based on the MLM generated by MLM module 20 .
  • processor 26 may initiate a read/write transfer, perform other operations while the transfer is in progress, and receive an interrupt from the DMA controller when the transfer is done.
  • the memory transfer may be scheduled in the background (e.g., in parallel with other processing) so that processor wait time can be minimized.
  • Embodiments of system 10 may use memory requirement details and processing time requirements of the different AEs in the SPA being analyzed for placing automatic DMA requests.
  • the processing time of substantially all AEs in the SPA being analyzed may be considered for placing the DMA requests; in other embodiments, the processing time of only certain AEs in the SPA may be considered for placing the DMA requests.
  • Embodiments of system 10 may facilitate increasing effective on-chip memory availability using overlay mechanisms. Due to automatic scheduling of DMAs, memory transfer may be completed in the background, and may increase effective processing power.
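A memory overlay manager's core check can be sketched as below: it makes sure an overlay is resident before use and writes back the overlay it evicts. This is a hypothetical, synchronous Python model. The class and method names are illustrative, plain lists stand in for on-chip state memory 132 and off-chip memory 134, and each "DMA" is just a copy recorded in a log. A real manager would post DMA requests and let processor 26 continue with other AEs while they complete.

```python
class OverlayManager:
    """Hypothetical overlay manager: several off-chip overlays share one
    on-chip state slot ("many to one" mapping)."""

    def __init__(self, off_chip):
        self.off_chip = off_chip  # overlay name -> saved state (off-chip)
        self.resident = None      # name of the overlay now in the on-chip slot
        self.slot = None          # contents of the on-chip slot
        self.transfers = []       # log of the simulated DMA transfers

    def request(self, name):
        """Make overlay `name` resident and return the on-chip slot."""
        if self.resident == name:  # already in run-time memory: no transfer
            return self.slot
        if self.resident is not None:  # write back the evicted overlay
            self.off_chip[self.resident] = list(self.slot)
            self.transfers.append(("save", self.resident))
        self.slot = list(self.off_chip[name])
        self.resident = name
        self.transfers.append(("load", name))
        return self.slot


mgr = OverlayManager({"ostat2": [0, 0], "ostat5": [0, 0]})
state = mgr.request("ostat2")  # AE 14(2) runs with its state in stat5
state[0] = 7
state = mgr.request("ostat5")  # AE 14(5) reuses the same on-chip slot
state[1] = 3
mgr.request("ostat5")          # already resident: nothing is transferred
print(mgr.transfers)  # [('load', 'ostat2'), ('save', 'ostat2'), ('load', 'ostat5')]
print(mgr.off_chip["ostat2"])  # [7, 0]
```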
  • FIG. 8 is a simplified diagram illustrating example details of an embodiment of system 10 that uses overlay memory management.
  • overlay memory management according to various embodiments of system 10 is explained herein with reference to schematic 13 .
  • a plurality of state buffers (e.g., stat1, stat2, stat3, etc.) may be created in state memory 132 as appropriate.
  • a matrix 140 may be generated by MLM module 20 comprising rows corresponding to AEs 14 ( 1 )- 14 ( 8 ) and columns corresponding to buffers (e.g., stat1, stat2, etc.) in state memory 132 with an ‘x’ indicating creation of a buffer, an ‘o’ indicating a reading of the buffer, and an ‘s’ indicating writing to the buffer.
  • an on-chip buffer (e.g., “stat5”) may be created in state memory 132 , and two disparate off-chip memory overlay buffers (e.g., ostat2 and ostat5 respectively) may be created in off-chip memory 134 for AEs 14 ( 2 ) and 14 ( 5 ).
  • Stat5 may be used (e.g., read from or written to) first by AE 14 ( 2 ).
  • memory overlay manager 136 may post a DMA request, represented by dummy AE 142 (D 1 i ), to save the state recorded in stat5 to ostat2 in off-chip memory 134 so that AE 14 ( 5 ) can also use the same state buffer stat5.
  • dummy AE refers to an AE generated by MLM module 20 , rather than by a user.
  • the dummy AE's purpose includes writing to and reading from memory buffers and the associated algorithm may indicate such functions (e.g., read from memory; write to memory; etc.), rather than any specific electronic component functionality.
  • Memory overlay manager 136 may post another DMA request to fill stat5 with values from ostat5.
  • processing of other AEs can occur substantially simultaneously as dummy AE 142 writes to memory buffer stat5 from off-chip memory ostat5 (e.g., DMA operation may be implemented in the background).
  • stat5 may be used by AE 14 ( 5 ).
  • memory overlay manager 136 may post yet another DMA request, represented by dummy AE 144 (D 1 o ), to save stat5 to ostat5 in off-chip memory 134 so that AE 14 ( 2 ) can also use the same state stat5 in the next processing round (if needed).
  • DMA operation may be implemented in the background, and processing of other AEs (e.g., AEs 14 ( 4 ), 14 ( 8 )) may occur simultaneously (or otherwise).
  • Both AE 14 ( 2 ) and 14 ( 5 ) may use stat5 with help of memory overlay manager 136 .
  • AE 14 ( 2 ) may use an off-chip location ostat2 of size M2;
  • AE 14 ( 5 ) may use another off-chip location ostat5 of size M5.
  • the time required to transfer a memory block of size M5 (at the available memory bandwidth) may be less than the combined processing times of AEs 14 ( 3 ), 14 ( 6 ) and 14 ( 7 ).
  • dummy AE 142 (D 1 i ) may be positioned (e.g., processed) to bring in stat5 buffer before AE 14 ( 3 ) so that stat5 may be available for use immediately after AE 14 ( 7 ).
  • Dummy AE 144 (D 1 o ) may be positioned to save the state back to off-chip memory 134 immediately after processing AE 14 ( 5 ).
  • the location of dummy AEs 142 and 144 may be based on the MLM generated by MLM module 20 for the SPA under analysis.
  • the effective on-chip memory size may equal the sum of all on-chip state memory 132 and off-chip memory 134 used for processing the SPA, with a zero wait time for DMA completion.
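The placement rule for dummy AEs 142 and 144 can be expressed as a small scheduling function. In this Python sketch, the names are hypothetical and the processing times are in arbitrary units. D1i is moved earlier one AE at a time until the processing time it overlaps covers the transfer time. It is never placed ahead of the AE that last used the shared buffer (AE 14(2) here). D1o goes immediately after the AE that uses the overlay. The save to ostat2 and the fill from ostat5 are lumped into one `transfer_time`. The processing order is assumed to be one in which AEs 14(3), 14(6) and 14(7) run between AE 14(2) and AE 14(5), as the text describes.

```python
# Hypothetical per-AE processing times (arbitrary units) and processing order.
PROC_TIME = {1: 10, 2: 10, 3: 20, 6: 15, 7: 15, 5: 30, 4: 10, 8: 10}
ORDER = [1, 2, 3, 6, 7, 5, 4, 8]


def place_overlay_dmas(order, proc_time, prev_user, user, transfer_time):
    """Insert dummy AEs around `user`, which shares an on-chip state buffer
    with `prev_user`.

    D1i is moved back one AE at a time until the AEs it overlaps cover
    `transfer_time`, but never before `prev_user`; D1o follows `user`.
    Returns the new sequence and the expected stall (0 if fully hidden).
    """
    pos = order.index(user)
    earliest = order.index(prev_user) + 1
    start, hidden = pos, 0
    while start > earliest and hidden < transfer_time:
        start -= 1
        hidden += proc_time[order[start]]
    stall = max(0, transfer_time - hidden)
    seq = order[:start] + ["D1i"] + order[start:pos + 1] + ["D1o"] + order[pos + 1:]
    return seq, stall


seq, stall = place_overlay_dmas(ORDER, PROC_TIME, prev_user=2, user=5,
                                transfer_time=40)
print(seq)    # [1, 2, 'D1i', 3, 6, 7, 5, 'D1o', 4, 8]
print(stall)  # 0
```

With a transfer time of 40 units, D1i lands just before AE 14(3), as in FIG. 8, and the stall is zero. With 100 units, the transfer cannot be hidden after AE 14(2), and the sketch reports a 50-unit stall.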
  • FIG. 9 is a simplified flow diagram illustrating example operations 150 that may be associated with embodiments of system 10 with memory overlay management.
  • Operations 150 may include 152 , at which the MLM for the SPA under analysis may be generated by MLM module 20 .
  • AEs whose state memory uses off-chip memory 134 may be determined.
  • dummy AEs may be generated as appropriate. For example, if a single on-chip state memory buffer is being used, two dummy AEs may be generated; if more than one on-chip state memory buffer is used, additional dummy AEs may be generated as appropriate.
  • buffer sequence of the MLM may be modified to include the dummy AEs based on the transfer time and processing time of AEs in the SPA being analyzed.
  • off-chip memory 134 may be used as desired and based upon particular needs.
  • FIG. 10 is a simplified block diagram illustrating another example embodiment of system 10 that uses processor offloading.
  • Processing offloader 162 may include a processor, a hardware accelerator, or other processing device that can process AEs in SPAs under analysis.
  • processing offloader includes a processor, such as a Central Processing Unit (CPU), service processor, hardware accelerator or other processing device used in addition to (and in conjunction with) another processor.
  • processing offloader 162 may passively receive data from memory and immediately process the data; processing offloader 162 may not actively request data using memory addresses. Thus, data may be pushed to processing offloader 162 , in contrast to processor 26 , which may pull data using appropriate memory addresses.
  • buffers have to be appropriately loaded and ready to be read by processing offloader 162 at the appropriate time.
  • MLM module 20 may interface with processing offloader 162 , memory element 22 and processor 26 .
  • time for transferring data across memory blocks, processing time of AEs, and the buffer sequence may be used to determine whether and when offloading may be performed.
  • the time for transferring data across memory blocks may depend on the size of the memory blocks being transferred; the processing time of AEs may depend on the specific algorithm used therein; and the buffer sequence may be determined from the MLM generated by MLM module 20 .
  • Processor 26 may offload processing of certain AEs to processing offloader 162 .
  • Processor 26 may read and write registers to control the AE processing using an appropriate port.
  • Processing offloader 162 may read data from appropriate input buffers in memory element 22 and write results to appropriate output buffers in memory element 22 .
  • Performance may be improved by pipelining algorithms appropriately using, for example, suitable DMA post modules.
  • the algorithms may use details of available/allocated processing offloaders 162 , processing cycle requirement and data transfer overhead, among other parameters.
  • the configuration may result in an effective processing capacity approximately equal to a sum of the processor MIPS and the processing offloader MIPS, with a minimal increase in processor wait for DMA completion.
  • FIG. 11 is a simplified diagram illustrating another example schematic 163 for describing offloading using MLMs according to an embodiment of system 10 .
  • AEs 14 ( 1 )- 14 ( 8 ) are processed by processor 26 , except AE 14 ( 5 ), which is processed by processing offloader 162 .
  • AE 14 ( 5 ) may be processed in the background, for example, as processing offloader 162 and processor 26 may run different processes (e.g., AEs) in parallel.
  • Data may be read from memory element 22 , processed, and written back to memory element 22 as appropriate.
  • AE 14 ( 5 ) takes an input from AE 14 ( 3 ), and generates an output each for AE 14 ( 4 ) and AE 14 ( 8 ). Thus, AE 14 ( 5 ) may not be processed until after AE 14 ( 3 ).
  • FIG. 12 is a simplified diagram illustrating example details that may be associated with embodiments of system 10 for processor offloading as applied to schematic 163 .
  • processing offloader 162 takes P5 MIPS to execute the task and the offloading DMA overhead is OH5.
  • a dummy AE 166 (D 2 i ) may be generated and positioned to load the buffer from an interface input buffer to processing offloader 162 , such that state memory (e.g., stat5) is available for processing AE 14 ( 5 ).
  • processing of AEs 14 ( 6 ) and 14 ( 7 ) may be performed substantially simultaneously with processing of AE 14 ( 5 ).
  • Processing of AE 14 ( 5 ) may have to be completed before the outputs from AE 14 ( 5 ) are used by AEs (e.g., AE 14 ( 4 ) and AE 14 ( 8 )) that are subsequently processed according to the connection sequence.
  • Another dummy AE 168 (D 2 o ) may be positioned to read the result buffer from processing offloader 162 to the interface output buffer of AE 14 ( 5 ).
  • output from AE 14 ( 5 ) may be made available suitably to AEs 14 ( 4 ) and 14 ( 8 ) before they are processed according to the connection sequence.
  • Dummy AEs 166 and 168 may be created by MLM module 20 to facilitate the operations described herein, and may have little functionality apart from using memory buffers.
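The timing argument for dummy AEs 166 and 168 can be checked with a similar Python sketch. It assumes P5 and OH5 have been converted into a single time figure (`offload_time`), in the same arbitrary units as the hypothetical per-AE processing times. It places D2i right after the AE that produces the offloaded AE's input and D2o right before the first AE that consumes its output. It then reports how long processor 26 would wait for the result. `plan_offload` is a hypothetical name.

```python
# Hypothetical per-AE processing times on processor 26 (arbitrary units).
PROC_TIME = {1: 10, 2: 10, 3: 20, 6: 15, 7: 15, 5: 30, 4: 10, 8: 10}
ORDER = [1, 2, 3, 6, 7, 5, 4, 8]


def plan_offload(order, proc_time, offloaded, producer, consumers, offload_time):
    """Processor-side sequence when `offloaded` runs on the offloader.

    D2i (load the offloader's input) goes right after `producer`; D2o (read
    the result back) goes right before the first AE in `consumers`. The
    processor waits only if the offload outlasts the AEs between them.
    """
    seq = [ae for ae in order if ae != offloaded]
    start = seq.index(producer) + 1
    end = min(seq.index(c) for c in consumers)
    overlap = sum(proc_time[ae] for ae in seq[start:end])
    wait = max(0, offload_time - overlap)
    return seq[:start] + ["D2i"] + seq[start:end] + ["D2o"] + seq[end:], wait


seq, wait = plan_offload(ORDER, PROC_TIME, offloaded=5, producer=3,
                         consumers=[4, 8], offload_time=25)
print(seq)   # [1, 2, 3, 'D2i', 6, 7, 'D2o', 4, 8]
print(wait)  # 0
```

Here AEs 14(6) and 14(7) provide 30 units of overlap. An offload taking 25 units therefore costs the processor no wait, while one taking 40 units leaves a 10-unit wait.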
  • FIG. 13 is a simplified flow diagram illustrating example operations that may be associated with embodiments of system 10 that use processor offloading.
  • Operations 170 include 172 , at which MLM may be generated by MLM module 20 for the SPA under analysis.
  • the AEs to be processed using processing offloader 162 may be determined.
  • dummy AEs as appropriate may be generated, based on the number of AEs to be processed by processing offloader 162 . For example, for processing a single AE, two dummy AEs may be generated; for processing more than one AE, additional dummy AEs may be generated as appropriate.
  • buffer sequence of MLM may be modified to include the dummy AEs based on the processing time (and DMA overhead) of the AEs processed by processing offloader 162 .
  • processing offloader 162 may be used as desired.
  • references to various features are intended to mean that any such features are included in one or more embodiments of the present disclosure, but may or may not necessarily be combined in the same embodiments.
  • the term “optimally efficient” refers to improvements in speed and/or efficiency of a specified outcome and does not purport to indicate that a process for achieving the specified outcome has achieved, or is capable of achieving, an “optimal” or perfectly speedy/perfectly efficient state.
  • the schematics (e.g., schematics 13, 28, 163) shown and described herein are merely examples, and are not limitations of any embodiment of system 10 . Any number of AEs and connections may be included in the schematic within the broad scope of the embodiments.
  • the methods described herein may be implemented in any suitable manner on a computing device (including a DSP or other processor) comprising appropriate processors and memory elements.
  • the CS indicating the buffer sequence may be expressed in any suitable arrangement including only rows, only columns, rows and columns arranged in various different patterns, etc.
  • At least some portions of the activities outlined herein may be implemented in software in, for example, MLM module 20 .
  • one or more of these features may be implemented in hardware, provided external to these elements, or consolidated in any appropriate manner to achieve the intended functionality.
  • MLM module 20 may include software (or reciprocating software) that can coordinate in order to achieve the operations as outlined herein.
  • these elements may include any suitable algorithms, hardware, software, components, modules, interfaces, or objects that facilitate the operations thereof.
  • system 10 described and shown herein may also include suitable interfaces for receiving, transmitting, and/or otherwise communicating data or information to hardware components (e.g., computer monitors, display devices) and network devices (e.g., client devices) in a network environment.
  • some of the processors and memory elements associated with the various nodes may be removed, or otherwise consolidated such that a single processor and a single memory element are responsible for certain activities.
  • the arrangements depicted in the FIGURES may be more logical in their representations, whereas a physical architecture may include various permutations, combinations, and/or hybrids of these elements. It is imperative to note that countless possible design configurations can be used to achieve the operational objectives outlined here. Accordingly, the associated infrastructure has a myriad of substitute arrangements, design choices, device possibilities, hardware configurations, software implementations, equipment options, etc.
  • one or more memory elements can store data used for the operations described herein. This includes the memory element being able to store instructions (e.g., software, logic, code, etc.) in non-transitory media, such that the instructions are executed to carry out the activities described in this Specification.
  • a processor can execute any type of instructions associated with the data to achieve the operations detailed herein in this Specification.
  • processors could transform an element or an article (e.g., data) from one state or thing to another state or thing.
  • the activities outlined herein may be implemented with fixed logic or programmable logic (e.g., software/computer instructions executed by a processor) and the elements identified herein could be some type of a programmable processor, programmable digital logic (e.g., a field programmable gate array (FPGA), an erasable programmable read only memory (EPROM), an electrically erasable programmable read only memory (EEPROM)), an ASIC that includes digital logic, software, code, electronic instructions, flash memory, optical disks, CD-ROMs, DVD ROMs, magnetic or optical cards, other types of machine-readable mediums suitable for storing electronic instructions, or any suitable combination thereof.
  • components in system 10 can include one or more memory elements (e.g., memory element 22 ) for storing information to be used in achieving operations as outlined herein. These devices may further keep information in any suitable type of non-transitory storage medium (e.g., random access memory (RAM), read only memory (ROM), field programmable gate array (FPGA), EPROM, EEPROM, etc.), software, hardware, or in any other suitable component, device, element, or object where appropriate and based on particular needs.
  • the information being tracked, sent, received, or stored in system 10 could be provided in any database, register, table, cache, queue, control list, or storage structure, based on particular needs and implementations, all of which could be referenced in any suitable timeframe.
  • any of the memory items discussed herein should be construed as being encompassed within the broad term ‘memory element.’
  • any of the potential processing elements, modules, and machines described in this Specification should be construed as being encompassed within the broad term ‘processor.’

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Logic Circuits (AREA)
US13/691,696 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code Active US8711160B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/691,696 US8711160B1 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/691,670 US8941674B2 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code
US13/691,696 US8711160B1 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/691,670 Continuation US8941674B2 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code

Publications (1)

Publication Number Publication Date
US8711160B1 true US8711160B1 (en) 2014-04-29

Family

ID=49683568

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/691,684 Active US8681166B1 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code
US13/691,696 Active US8711160B1 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code
US13/691,670 Active 2033-04-27 US8941674B2 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/691,684 Active US8681166B1 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/691,670 Active 2033-04-27 US8941674B2 (en) 2012-11-30 2012-11-30 System and method for efficient resource management of a signal flow programmed digital signal processor code

Country Status (4)

Country Link
US (3) US8681166B1 (zh)
EP (1) EP2738675B1 (zh)
KR (1) KR101715986B1 (zh)
CN (1) CN103870335B (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8941674B2 (en) 2012-11-30 2015-01-27 Analog Devices, Inc. System and method for efficient resource management of a signal flow programmed digital signal processor code

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
US9697005B2 (en) 2013-12-04 2017-07-04 Analog Devices, Inc. Thread offset counter
FR3026945B1 (fr) * 2014-10-10 2017-12-15 Oreal Cosmetic composition for coating keratin fibres
KR102581470B1 (ko) * 2017-11-22 2023-09-21 삼성전자주식회사 Method and apparatus for processing image data
US10824927B1 (en) * 2018-09-21 2020-11-03 Enernet Global, LLC Systems, methods and computer readable medium for management of data buffers using functional paradigm
DE102019211856A1 (de) * 2019-08-07 2021-02-11 Continental Automotive Gmbh Data structure, control system for reading in such a data structure, and method
CN112163184B (zh) * 2020-09-02 2024-06-25 深聪半导体(江苏)有限公司 Apparatus and method for implementing FFT

Citations (1)

Publication number Priority date Publication date Assignee Title
US20050097140A1 (en) * 2002-03-22 2005-05-05 Patrik Jarl Method for processing data streams divided into a plurality of process steps

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
AU4219693A (en) * 1992-09-30 1994-04-14 Apple Computer, Inc. Inter-task buffer and connections
US20080147915A1 (en) * 2006-09-29 2008-06-19 Alexander Kleymenov Management of memory buffers for computer programs
US8341604B2 (en) * 2006-11-15 2012-12-25 Qualcomm Incorporated Embedded trace macrocell for enhanced digital signal processor debugging operations
US8681166B1 (en) 2012-11-30 2014-03-25 Analog Devices, Inc. System and method for efficient resource management of a signal flow programmed digital signal processor code

Non-Patent Citations (18)

Title
"EE-66 Engineer-to-Engineer Note," Analog Devices, Inc., [retrieved and printed Nov. 29, 2012] http://www.analog.com/static/imported-files/application-notes/EE-66.pdf; 38 pages.
"Sigmastudio Graphical Development Tool," Analog Devices, Inc., [retrieved and printed on Feb. 8, 2012] http://222.analog.com/en/processors-dsp/sigmadsp/products/CU-over-SigmaStudio-graphical-dev-tool-overview/fca.html; 2 pages.
Altera Corporation, "24. DMA Controller Core," Quartus II Handbook Version 9.1, vol. 5: Embedded Peripherals, © Nov. 2009 Altera Corporation. 10 pages.
E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing, IEEE Transactions on Computers, vol. C-36, No. 1, Jan. 1987, pp. 24-35. *
Final Office Action for U.S. Appl. No. 13/691,684 mailed Sep. 6, 2013.
Girbal, Sylvain, et al., "A Memory Interface for Multi-Purpose Multi-Stream Accelerators," CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, Oct. 24-29, 2010, pp. 107-116, © 2010 ACM 978-1-60558-903-9/10/10; 9 pages.
Hyperception, Inc., Building DSP Applications via Graphical Design: "Does a picture 'cost' a thousand words?" © Hyperception, Inc., 2001 All Rights Reserved Worldwide; 30 pages.
Lyons, Michael J., et al., "The Accelerator Store Framework for High-performance, Low-power Accelerator-Based Systems," IEEE Computer Architecture Letters, Jul.-Dec. 2010 Issue, 4 pages.
Marchal, P., et al., "SDRAM-Energy-Aware Memory Allocation for Dynamic Multi-Media Applications on Multi-Processor Platforms," Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE'03), pp. 1-6.
Notice of Allowance for U.S. Appl. No. 13/691,684 mailed Nov. 22, 2013.
Oh, Hyunok, et al., "Data Memory Minimization by Sharing Large Size Buffers," Design Automation Conference, 2000, pp. 491-496.
Panda, Preeti Ranjan, et al., "On-Chip vs. Off-Chip Memory: The Data Partitioning Problem in Embedded Processor-Based Systems," ACM Transactions on Design Automation of Electronic Systems, vol. 5, No. 3, Jul. 2000, pp. 682-704. © 2000 ACM 1084-4309/00/0070-0682; 23 pages.
Powersoft SigmaStudio User Guide, © 2011 Powersoft, 19 pages.
Response to Final Office Action for U.S. Appl. No. 13/691,684, filed Nov. 5, 2013.
Response to Non-Final Office Action for U.S. Appl. No. 13/691,684, filed Jul. 12, 2013.
U.S. Appl. No. 13/691,670, filed Nov. 30, 2012, entitled "System and Method for Efficient Resource Management of a Signal Flow Programmed Digital Signal Processor Code," Inventor(s) Mohammed Chalil, et al.
U.S. Appl. No. 13/691,684, filed Nov. 30, 2012, entitled "System and Method for Efficient Resource Management of a Signal Flow Programmed Digital Signal Processor Code," Inventor(s) Mohammed Chalil, et al.
USPTO Apr. 26, 2013 Non-Final Office Action from U.S. Appl. No. 13/691,684.

Also Published As

Publication number Publication date
KR20140070493A (ko) 2014-06-10
US8681166B1 (en) 2014-03-25
EP2738675A3 (en) 2016-06-01
EP2738675B1 (en) 2019-03-27
KR101715986B1 (ko) 2017-03-13
CN103870335B (zh) 2017-05-17
EP2738675A2 (en) 2014-06-04
CN103870335A (zh) 2014-06-18
US20140152680A1 (en) 2014-06-05
US8941674B2 (en) 2015-01-27

Similar Documents

Publication Publication Date Title
US8711160B1 (en) System and method for efficient resource management of a signal flow programmed digital signal processor code
CN110515739B (zh) Deep learning neural network model load calculation method, apparatus, device, and medium
EP3754496B1 (en) Data processing method and related products
US8725486B2 (en) Apparatus and method for simulating a reconfigurable processor
US10761822B1 (en) Synchronization of computation engines with non-blocking instructions
US20210158131A1 (en) Hierarchical partitioning of operators
US11275661B1 (en) Test generation of a distributed system
US11494326B1 (en) Programmable computations in direct memory access engine
US20220188073A1 (en) Data-type-aware clock-gating
CN113313247B (zh) Operation method for a sparse neural network based on a dataflow architecture
US11500802B1 (en) Data replication for accelerator
US11175919B1 (en) Synchronization of concurrent computation engines
CN113886162A (zh) Computing device performance test method, computing device, and storage medium
US20130013283A1 (en) Distributed multi-pass microarchitecture simulation
US10922146B1 (en) Synchronization of concurrent computation engines
US10310823B2 (en) Program development support system and program development support software
US11748622B1 (en) Saving intermediate outputs of a neural network
US11372677B1 (en) Efficient scheduling of load instructions
US11468304B1 (en) Synchronizing operations in hardware accelerator
US11275875B2 (en) Co-simulation repeater with former trace data
CN111950219B (zh) Method, apparatus, device, and medium for implementing a simulator
US11061654B1 (en) Synchronization of concurrent computation engines
US11983128B1 (en) Multidimensional and multiblock tensorized direct memory access descriptors
US11775299B1 (en) Vector clocks for highly concurrent execution engines
US11630667B2 (en) Dedicated vector sub-processor system

Legal Events

Date Code Title Description
AS Assignment

Owner name: ANALOG DEVICES, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHALIL, MOHAMMED;JOSEPH, JOHN;SIGNING DATES FROM 20121129 TO 20121203;REEL/FRAME:029404/0147

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551)

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8