US20020066088A1 - System and method for software code optimization - Google Patents
- Publication number
- US20020066088A1 (application US09/765,916)
- Authority
- US
- United States
- Prior art keywords
- implementation
- code
- target
- application
- software program
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06F8/443—Optimisation
- G01R31/318342—Generation of test inputs, e.g. test vectors, patterns or sequences by preliminary fault modelling, e.g. analysis, simulation
- G01R31/318357—Simulation
- G06F30/30—Circuit design
- G06F30/33—Design verification, e.g. functional simulation or model checking
- G06F30/3308—Design verification, e.g. functional simulation or model checking using simulation
- G06F30/39—Circuit design at the physical level
Definitions
- FIG. 1 is a flow diagram generally depicting steps in a process of optimizing a software program for a target processor representing a preferred embodiment of the invention.
- FIG. 2 is a flow diagram depicting a preferred embodiment of a simplified representation of the main steps of the generic implementation process represented as one step in FIG. 1.
- FIG. 3 is a flow diagram depicting a preferred embodiment of detailed steps of the generic implementation process depicted in FIG. 2.
- FIG. 4 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-independent optimization process.
- FIG. 5 is a flow diagram depicting a preferred embodiment of a simplified representation of the main steps of the specific implementation process represented as one step in FIG. 1.
- FIG. 6 is a flow diagram depicting a preferred embodiment of detailed steps of the specific implementation process depicted in FIG. 5.
- FIG. 7 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-dependent optimization process.
- FIG. 8 is a flow diagram depicting a preferred embodiment of steps of the fully dedicated implementation process represented as one step in FIG. 1.
- Each successive basic step generally results in code that is more closely dedicated to a particular target.
- Once the performance objectives are met, the optimization process is terminated.
- The evaluation of performance, and thus of whether the performance meets the stated, predefined objectives, preferably accounts for several factors.
- A key performance measure is the real-time speed of the application when operating on the specified target.
- Another performance measure is the accuracy and/or quality of the output.
- Another factor that may be integrated into the evaluation is the binary code size. Although the application must fit in the target processor's memory, this factor is generally becoming less important as memories grow in capacity and become smaller, cheaper and less power-consuming.
- One major step in the optimization process is to fix the initial constraints that apply to the development and optimization of the software application.
- These constraints are preferably used to quantitatively evaluate the overall performance of the application implementation and to help determine, at each development stage, whether porting the application to a specific target is feasible.
- These measures preferably integrate the processing performance characteristics of the target processor, including its clock frequency, which determines the number of cycles available to execute the application.
- Another set of parameters that is preferably calculated is the global I/O data flow, to determine whether the memory accesses (read/write) are achievable on the specified target.
- This set of parameters integrates elements such as the aggregated data flow over the internal and external buses (data exchanges).
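The cycle-budget and data-flow constraints above reduce to simple arithmetic: the cycles available per frame are the clock frequency times the frame duration, and the data rate is the bytes moved per frame divided by the frame duration. The C sketch below is a minimal go/no-go check under those assumptions; the structure names, the fields and the single aggregated bus-bandwidth figure are hypothetical simplifications, not part of the described method.

```c
#include <stdbool.h>

/* Hypothetical description of the target; all names are illustrative. */
typedef struct {
    double clock_hz;         /* target DSP clock frequency */
    double frame_period_s;   /* elementary time duration (one frame) */
    double bus_bytes_per_s;  /* sustainable aggregated bus bandwidth */
} target_spec;

/* Hypothetical measured or estimated application figures. */
typedef struct {
    double cycles_per_frame; /* cycles consumed per frame */
    double bytes_per_frame;  /* data read + written per frame */
} app_profile;

/* Cycles available in one frame = clock frequency x frame duration. */
double cycle_budget(const target_spec *t)
{
    return t->clock_hz * t->frame_period_s;
}

/* Bytes per second the application moves over the buses. */
double data_rate(const target_spec *t, const app_profile *a)
{
    return a->bytes_per_frame / t->frame_period_s;
}

/* Go/no-go: both the cycle budget and the I/O data flow must fit. */
bool fits_target(const target_spec *t, const app_profile *a)
{
    return a->cycles_per_frame <= cycle_budget(t)
        && data_rate(t, a) <= t->bus_bytes_per_s;
}
```

For example, a 200 MHz target with 20 ms frames has about 4,000,000 cycles available per frame; an application needing 3.5 million cycles and moving 1 MB per frame (50 MB/s) fits a bus that sustains 100 MB/s, while one moving 3 MB per frame (150 MB/s) does not.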
- FIG. 1 is a flow diagram generally depicting steps in a process of optimizing a software program for a target processor.
- As shown in FIG. 1, an optimization process 100 comprises three optimization steps.
- In a first step 102, the software for the DSP target is written in a high-level language such as C, C++ or Ada.
- Preferably, the programming language is one that is completely portable between all probable DSP targets.
- Optimization techniques particular to the language are preferably used.
- The code optimization in this step 102 preferably does not employ any optimization tools that depend on the processor that is meant to host the application.
- Upon completion of step 102, the new implementation is evaluated, as a next step 104, to determine whether the performance goals have been reached. If the performance objectives are achieved by the step 102 of target-independent optimization, the overall optimization process 100 successfully terminates 112. If the performance requirements for the application have not been achieved, then in a next optimization step 106, certain portions of the software code are re-implemented in the high-level language to take advantage of the specific processing capabilities of the DSP target.
- Even after step 106, the software may remain partially portable for a number of reasons.
- The modified code is preferably selected from portions of code that are short in terms of lines of source code but are executed repeatedly, and that therefore account for a relatively significant percentage of the processing load.
- The amount of code that must be modified is thus minimized, and it may additionally be flagged in the source file to indicate that it is target-specific code. If the target processor later changes, only these identified portions need to be addressed for optimization. Further, the previously unoptimized code, corresponding in functionality to the portion of code optimized in this step 106, may remain in the source file.
- This original unoptimized code may be used as a starting point for optimizing the same portion of code for any subsequent target processor.
- Another benefit is that although the coding is specific to the DSP target for the application, the code preferably remains in the high-level language. By remaining in a high-level language (versus being re-coded in a low-level language such as an assembly language), the resulting code is inherently much easier to revisit and comprehend should modifications be necessary.
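Flagging target-specific code while keeping the original version beside it, as described above, can be done with conditional compilation. In this minimal sketch, `TARGET_DSP_X` is a hypothetical flag, and the "target-specific" branch is only an illustration (two independent accumulators so that a dual-MAC target could pair the multiplies); both branches stay in C.

```c
#include <stdint.h>

/* Dot product of two Q15 vectors of length n. */
int64_t dot_q15(const int16_t *a, const int16_t *b, int n)
{
#ifdef TARGET_DSP_X  /* hypothetical flag: TARGET-SPECIFIC CODE */
    /* Target-specific C: two independent accumulators so that a
       (hypothetical) dual-MAC target can issue both multiplies in
       parallel. Assumes n is even, a constraint of this target only. */
    int64_t acc0 = 0, acc1 = 0;
    for (int i = 0; i < n; i += 2) {
        acc0 += (int32_t)a[i] * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
    }
    return acc0 + acc1;
#else
    /* Original generic version, kept in the source as the starting
       point for porting to the next target. */
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
#endif
}
```

Building without `-DTARGET_DSP_X` selects the generic branch; when the target changes, only the flagged branch needs revisiting. The 64-bit accumulator stands in for the wide accumulator of a typical DSP.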
- Preferably, software-profiling tools are applied to identify the portions of code that meet these criteria, so that they can then be optimized for the particular DSP target as necessary.
- In step 108, the implementation is again evaluated to determine whether the performance objectives on the target processor have been met.
- As with step 104, if the performance objectives are achieved through the step 106 of target-dependent optimization using the high-level language, the overall optimization process 100 successfully terminates 112.
- Otherwise, the optimization process 100 proceeds to a third optimization step 110.
- In step 110, the software is configured to be fully dedicated to the architecture and processing benefits of the target processor.
- Various coding techniques that are particular to the target processor may be employed, including executing instructions in parallel and using pipeline processing or other specialized processing capabilities.
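The control flow of process 100 (optimize, evaluate, stop as soon as the objectives are met) can be sketched in C as a loop over stages. This is a minimal model, not the described system: each stage is a hypothetical callback that transforms the implementation and returns its measured performance, and the toy stages simply scale a cycle count by invented factors.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { double cycles_per_frame; double code_bytes; } perf;
typedef struct { double max_cycles_per_frame; double max_code_bytes; } objectives;

/* A stage optimizes the implementation in place and returns the
   performance measured afterwards (hypothetical interface). */
typedef perf (*stage_fn)(void *impl);

static bool meets(perf p, objectives o)
{
    return p.cycles_per_frame <= o.max_cycles_per_frame
        && p.code_bytes <= o.max_code_bytes;
}

/* Runs the stages in order (steps 102, 106, 110) and stops at the first
   one whose result meets the objectives. Returns how many stages ran,
   or -1 if the objectives were never met. */
int optimize(void *impl, const stage_fn *stages, size_t n, objectives o)
{
    for (size_t i = 0; i < n; i++) {
        perf p = stages[i](impl);      /* optimize, then evaluate */
        if (meets(p, o))
            return (int)(i + 1);       /* objectives met: terminate */
    }
    return -1;
}

/* Toy implementation state and stages; the factors are invented. */
typedef struct { double cycles; double code_bytes; } toy_impl;

static perf run_toy(void *impl, double factor)
{
    toy_impl *t = impl;
    t->cycles *= factor;
    return (perf){ t->cycles, t->code_bytes };
}
perf stage_generic(void *impl)   { return run_toy(impl, 0.8); }
perf stage_specific(void *impl)  { return run_toy(impl, 0.7); }
perf stage_dedicated(void *impl) { return run_toy(impl, 0.5); }
```

Starting from one million cycles with a 600,000-cycle objective, the loop stops after the second (target-dependent C) stage, and the fully dedicated stage is never reached.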
- In one embodiment, the method is performed automatically after the software code has been initially developed in a high-level language.
- The system is provided with the performance parameters desired for the application, as well as the architectural specification of the target processor. Given these inputs, the system processes the high-level language source code, compiles and simulates the code's execution, and tests the code against the specified performance requirements. If the performance requirements are not met, the system profiles the code and then optimizes the portions that are the best candidates for optimization.
- The system preferably comprises a software-optimizing processor, in conjunction with memory, that automatically performs the code profiling operations, code generation on the portions of code determined to be candidates for optimization, and subsequent performance analysis.
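Choosing the best candidates from a profile can follow the criteria given for step 106: short code that accounts for a large share of the cycles. The sketch below assumes a profiler report already reduced to a name, a source-line count and a cycle count per function; the `profile_entry` type and both thresholds are hypothetical parameters, not values from the description.

```c
#include <stddef.h>

/* One row of a (hypothetical) profiler report. */
typedef struct {
    const char *name;
    int source_lines;        /* size of the function in source lines */
    unsigned long cycles;    /* cycles attributed to it by the profiler */
} profile_entry;

/* Writes to out_idx the indices of functions that are at most max_lines
   long and account for at least min_share of all cycles. out_idx must
   have room for n entries. Returns the number of candidates found. */
size_t select_candidates(const profile_entry *p, size_t n,
                         int max_lines, double min_share,
                         size_t *out_idx)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i].cycles;
    if (total == 0)
        return 0;

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double share = (double)p[i].cycles / (double)total;
        if (p[i].source_lines <= max_lines && share >= min_share)
            out_idx[count++] = i;
    }
    return count;
}
```

With a 50-line limit and a 10% share threshold, a 25-line filter kernel taking 70% of the cycles is selected, while large control functions and tiny helpers with a small cycle share are not.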
- The software-optimizing processor may comprise any type of computer, with processing characteristics that depend on, for example, the processing requirements of the code generation, profiling and performance assessment operations. It may be, e.g., a workstation such as those manufactured by Sun Microsystems, a mainframe computer, or a personal computer such as those manufactured by IBM or Apple.
- A computer executing optimization software is preferably used as the software-optimizing processor, owing to the utility and flexibility of a computer for programming, modifying software, and observing software performance.
- The software-optimizing processor may be implemented using any type of processor or processors that can perform the code optimization process described herein.
- The term “processor” refers to a wide variety of computational devices or means including, for example, multiple processors that perform different processing tasks or that share the same tasks.
- The processor(s) may be general-purpose CPUs or special-purpose processors such as those conventionally used in digital signal processing systems. Further, multiple processors may be arranged in a server-client or other network configuration, as a pipeline array of processors, etc. Some or all of the processing may alternatively be implemented with hard-wired circuitry such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other logic device.
- The term “memory” refers to any storage medium accessible to a processor that meets the memory storage needs of a system for optimizing software.
- Preferably, the memory buffer is random access memory (RAM) that is directly accessed by the software-optimizing processor for ease in manipulating and processing selected portions of data.
- Preferably, the memory store comprises a hard disk or other nonvolatile memory device or component.
- As used herein, the term “generic” means target-independent.
- In the generic implementation, the high-level language source code is normally C code.
- The portability of the application is maintained, and some optimization is integrated into the application at a high level without using assembly language code.
- FIG. 2 is a flow diagram depicting a simplified representation of the main steps of the generic implementation process 200 represented as step 102 in FIG. 1. Preferred detailed steps of this process 200 are depicted in the task flow diagram of FIG. 3. Preferably, as shown in FIG. 2, there are four main steps that take as input the mathematical theory related to a signal processing algorithm and lead to an implementation that is later used by the specific implementation process.
- The floating-point implementation step takes as input the theoretical solution of a process and transforms it into a structured-language implementation.
- A main purpose of this step is to reflect as much of the mathematics of the theory as possible in the implementation.
- The floating-point implementation then transitions to a fixed-point implementation matched to the precision that the DSP can handle.
- For example, the DSP may require a 16-bit precision implementation.
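For a 16-bit DSP, a common fixed-point format is Q15, in which a 16-bit signed integer r represents the value r / 2^15 in [-1, 1). The C sketch below shows a float-to-Q15 conversion with saturation and a Q15 multiply; Q15 is our example choice, since the description only mentions 16-bit precision.

```c
#include <stdint.h>

/* Q15: value = raw / 32768, representable range [-1, 1 - 2^-15]. */
int16_t q15_from_float(double x)
{
    double s = x * 32768.0;
    s += (s >= 0.0) ? 0.5 : -0.5;        /* round half away from zero */
    if (s > 32767.0)  s = 32767.0;       /* saturate rather than wrap */
    if (s < -32768.0) s = -32768.0;
    return (int16_t)s;                   /* cast truncates toward zero */
}

double q15_to_float(int16_t q)
{
    return q / 32768.0;
}

/* Q15 x Q15: the 32-bit product is Q30; round and shift back to Q15.
   Assumes >> on a negative int32_t is an arithmetic shift, as on
   common compilers. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;          /* Q30 product */
    p = (p + (1 << 14)) >> 15;           /* round to nearest, back to Q15 */
    if (p > 32767)
        p = 32767;                       /* only (-1) * (-1) overflows */
    return (int16_t)p;
}
```

The precision lost in these conversions is exactly what the later processing-qualification step has to measure against the floating-point reference.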
- Typically, the group developing the floating-point application is not the group developing the fixed-point implementation.
- In one approach, the theoretical implementation is made with no consideration of precision. In that case, the implementation is oriented toward processing quality and pushes the precision problem to the fixed-point porting.
- In another approach, a target precision is involved at an early stage of development and affects the quality of the processing. This yields a fully precision-oriented implementation; however, the implementation must be entirely redone if the target architecture is changed.
- The implementation architecture is preferably also considered. If an implementation involves hundreds of function calls, real-time execution at the end of the implementation flow suffers. For this reason, two different steps in the implementation activity are utilized.
- A common method of implementing signal processing is to take the floating-point implementation and port it to a specific target.
- Another method is to port an existing fixed-point implementation to a new target.
- The mechanisms of these two methods are quite different because, in the latter, a first implementation is already available; the work is then more an adaptation of an existing application than a new implementation.
- The advantage of the latter is that it shortens the development process by reusing the code already written for another target.
- The goal of processing qualification is to obtain an implementation that preferably provides the best trade-off in output quality for a given precision.
- One tool that can accelerate the completion of this step is the Cierto™ signal processing worksystem. This tool provides the capability to validate, compare, and qualify a process against the floating-point reference implementation.
- Fixing the derivation criteria depends primarily on the application category. For image processing comparison, information like texture, edge, contrast and distortion is considered. For voice processing, the same elements may be taken into account, but spectral analysis, tone, volume and saturation, etc. may also be considered. Depending on the application domain, the criteria can be completely different. Furthermore, within a given domain, the criteria can change. Radar can be used in military or agricultural activities but the measures made for those two applications of radar image may be quite different.
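As one generic example of such a criterion, the fixed-point output can be compared with the floating-point reference through a signal-to-noise ratio. The sketch below is our illustration, not the Cierto tool's method: it computes the SNR as a linear power ratio (10·log10 of it gives dB) and qualifies the implementation against a threshold chosen for the application, e.g. 1e4 for 40 dB. Domain-specific criteria such as edge or spectral measures would replace it in practice.

```c
#include <float.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signal-to-noise power ratio (linear) of a Q15 output against the
   floating-point reference; 10*log10(ratio) gives the SNR in dB. */
double snr_linear(const double *ref, const int16_t *test, size_t n)
{
    double sig = 0.0, noise = 0.0;
    for (size_t i = 0; i < n; i++) {
        double err = ref[i] - test[i] / 32768.0;   /* Q15 -> real */
        sig   += ref[i] * ref[i];
        noise += err * err;
    }
    return noise == 0.0 ? DBL_MAX : sig / noise;   /* bit-exact: no noise */
}

/* Qualification: accept the fixed-point version if its SNR reaches an
   application-chosen threshold, expressed as a linear ratio. */
bool qualifies(const double *ref, const int16_t *test, size_t n,
               double min_ratio)
{
    return snr_linear(ref, test, n) >= min_ratio;
}
```

The linear form keeps the sketch free of `log10`; comparing against a ratio threshold gives the same decision as comparing dB values.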
- Once the processing has been qualified, the first sizing of the algorithm can be addressed.
- The information gathered includes the real-time data flow, the implementation structure and architecture, the profiling of instructions and cycles, and the performance of the target DSP. These elements help determine whether the code can fit inside the target.
- The goal of the real-time data flow analysis is to understand the different I/Os related to the algorithm that is to be integrated into the DSP.
- At one level, the global data flow indicates the availability of the raw and processed data.
- At this level, the developer identifies the processing delays, which provide a basic characteristic of the application's data flow.
- However, the real behavior of the data coming in and out on the system's data bus is not necessarily clear at this level.
- The programmer may therefore have to zoom in on the elementary time duration (selected for the global data flow representation) to characterize how the implementation behaves when confronted with the interrupts coming from the devices involved in the process.
- This “elementary time duration” can be very different from one application to another. It can be the duration of, for example, an image frame, an image line, an audio frame, or a dedicated time dictated by control software or the processor.
- Application cadence may impact all future decisions for the application. For example, in an interrupt-driven architecture, as in most real-time DSP developments, clear design choices become possible, such as using a first-in first-out (FIFO) buffer for the data. This option provides a more flexible way to manage bus I/Os because it allows better use of the available bandwidth. It generally makes for a more expensive system design, but it is recommended for processing that involves large amounts of data, such as image processing.
- Alternatively, a designer may choose not to use a FIFO. This means that each piece of data produced is either saved immediately or lost. This is the most constrained way of implementing a signal processing application, but it is cheaper and well suited to processing that involves little data, such as voice processing. This example shows the impact of the application cadence on application development.
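The FIFO option above is often realized in software as a ring buffer between an input interrupt (producer) and the processing loop (consumer). This is a minimal single-producer, single-consumer sketch assuming a single core; the capacity and names are hypothetical, and a real driver may need target-specific memory barriers or interrupt masking.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 8u   /* illustrative capacity; must be a power of two */

typedef struct {
    int16_t data[FIFO_SIZE];
    volatile uint32_t head;   /* advanced only by the producer (e.g. an ISR) */
    volatile uint32_t tail;   /* advanced only by the consumer */
} fifo;

/* Producer side. Returns false when the FIFO is full, in which case the
   sample is lost, as it would be without any buffering. The unsigned
   difference head - tail stays correct across counter wrap-around. */
bool fifo_put(fifo *f, int16_t sample)
{
    if (f->head - f->tail == FIFO_SIZE)
        return false;                              /* full: sample lost */
    f->data[f->head & (FIFO_SIZE - 1u)] = sample;
    f->head++;
    return true;
}

/* Consumer side. Returns false when there is nothing to read. */
bool fifo_get(fifo *f, int16_t *sample)
{
    if (f->head == f->tail)
        return false;                              /* empty */
    *sample = f->data[f->tail & (FIFO_SIZE - 1u)];
    f->tail++;
    return true;
}
```

The buffer absorbs bursts up to `FIFO_SIZE` samples, which is what gives the processing loop freedom in scheduling bus accesses; the cost is the extra memory and copying that make this the more expensive design.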
- Implementation organization preferably considers the implementation structure, the architecture of the implementation, and the behavior of the implementation.
- Knowing the implementation structure generally means that the developer knows the number of functions implemented and the number of times each is called, split if possible into low-level and high-level functions. This first measure can be made manually or with tools. One difficulty is finding a tool that indicates how many times a function is called; this matters because context switching can be expensive if it occurs too often. Free profiling tools such as gprof provide part of the necessary information, and other tools such as Sparcworks (Sun Microsystems) provide the call graph.
- Knowing the architecture of the implementation generally means knowing the overall behavior of the application, to determine whether the algorithm construction may need to be revisited to emphasize real-time issues. Given a specific processing algorithm that produces a signal processing development, the requirements can be formalized as follows:
- The first step in evaluating the feasibility of the application is to determine the global data flow, in order to fix the limits of the input/output size and the precision (8, 16 or 32 bits).
- The first output of the data flow analysis indicates whether the I/Os can be sustained; it also gives an indication about the algorithm structure.
- AD: Architectural Delay.
- The objective of high-level profiling is to provide a first indication of the number of cycles consumed by the implementation and of the binary code size. If necessary, a simulator can also produce an instruction profile.
- One difficulty is fixing the comparison criteria that show whether the application fits in the targeted DSP.
- Benchmarks are generally provided by the DSP vendors. This means that the cycle counts obtained at the higher level must be correlated with the benchmarked performance of the target DSP to establish a go/no-go process.
- DSP vendors provide appropriate benchmarks for this purpose.
- DSPs are compared using kernel functions written in C, including MAC (multiply-accumulate), Vect (simple vector multiply), Fir (FIR filter with redundant load elimination), Lat (lattice synthesis), Iir (IIR filter) and Jpeg (JPEG discrete cosine transform).
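A kernel of the "Fir" kind is small enough to show in full. The sketch below is a plain direct-form FIR over Q15 data with a 64-bit accumulator; it is our illustration of the kernel type, not any vendor's benchmark code, and it does not attempt the redundant load elimination mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Direct-form FIR over a block of Q15 samples. x must hold
   n_out + n_taps - 1 samples (oldest history first). The inner loop
   is one multiply-accumulate (MAC) per tap. Assumes >> on a negative
   int64_t is an arithmetic shift, as on common compilers. */
void fir_q15(const int16_t *x, const int16_t *h, size_t n_taps,
             int16_t *y, size_t n_out)
{
    for (size_t i = 0; i < n_out; i++) {
        int64_t acc = 0;                         /* wide accumulator */
        for (size_t k = 0; k < n_taps; k++)
            acc += (int32_t)h[k] * x[i + n_taps - 1 - k];   /* MAC */
        acc >>= 15;                              /* Q30 -> Q15, truncating */
        if (acc > INT16_MAX) acc = INT16_MAX;    /* saturate */
        if (acc < INT16_MIN) acc = INT16_MIN;
        y[i] = (int16_t)acc;
    }
}
```

Running such a kernel through the target's simulator gives a cycle count that can be set against the vendor's published figure for the same kernel type, which is the correlation the go/no-go process needs.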
- After generic code qualification, this implementation becomes the reference to be optimized at the C level. Several implementation rules that foster processing-time reduction are then applied to this code.
- A primary objective is to establish a test process that guarantees the integrity of the processing: the goal is to reduce cycle consumption, not to change the result of the processing. A specific test script can be established to validate the optimization, and/or tools can be used to compare the processing results.
- A script makes it easier to run several tests and allows the programmer to gather information (traces) on the application's behavior.
- The tools allow specific comparisons to be made on the processed data.
- For example, the Cadence Cierto Signal Processing Worksystem (SPW) is capable of such a task and can speed the development cycle.
- The optimization preferably uses techniques such as loop reduction, loop merging, test reduction, pointer usage, and in-line functions or macros to reduce context switching. These techniques are generic and can be used with most, if not all, high-level languages.
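A small before/after pair shows several of these generic techniques at once: two loops merged into one pass, a per-sample function call replaced by a macro, and pointer walking instead of indexing. The functions and the gain/offset computation are hypothetical; because the arithmetic is unchanged, the optimized version must reproduce the reference output exactly, which is the integrity requirement stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: a function call per sample and two passes over the data. */
static int32_t scale(int32_t v, int32_t g) { return (v * g) >> 8; }

void gain_offset_ref(const int32_t *in, int32_t *out, size_t n,
                     int32_t gain, int32_t offset)
{
    for (size_t i = 0; i < n; i++)
        out[i] = scale(in[i], gain);
    for (size_t i = 0; i < n; i++)
        out[i] += offset;
}

/* After: one merged loop, the call replaced by a macro, pointers walked.
   Same arithmetic, so the output must be bit-identical. */
#define SCALE(v, g) (((v) * (g)) >> 8)

void gain_offset_opt(const int32_t *in, int32_t *out, size_t n,
                     int32_t gain, int32_t offset)
{
    const int32_t *end = in + n;
    while (in < end) {
        int32_t v = *in++;   /* local copy so the macro argument has no side effects */
        *out++ = SCALE(v, gain) + offset;
    }
}
```

A test script would run both versions on the same inputs and check element-wise equality before accepting the change.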
- Another optimization step that can be integrated at this level is development-chain optimization, addressing the specific options of the pre-processor, compiler, assembler and/or linker. This may be useful if the implementation is initially done in the target development environment. Generally, however, applications are initially developed on PCs or workstations; in that case, tuning for the generic (host) compiler is not useful and can lead to bad decisions in terms of performance and code size.
- Here, the developer assumes that the target DSP is fixed and that a simulator is available. Many C optimizations known in the art are possible at this language level.
- Preferably, at least three parameters are tracked: the global effort, in time, needed to integrate a new optimization step; the processing-time reduction that can be evaluated; and the evolution of the code size. These parameters are preferably correlated with the time dedicated to the project and with whether or not the application is mandatory to system functionality.
- A goal of these measures is to understand the impact of each modification.
- Another goal is to set a time limit for the different optimization steps.
- One rule, for example, may be to require a cycle reduction of more than five percent between two successive steps.
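The five-percent rule can be written as a one-line test on the cycle counts measured before and after a step. A minimal sketch; the function name is hypothetical and the threshold is passed in (0.05 for the rule above).

```c
#include <stdbool.h>

/* Returns true if the step removed more than min_gain (a fraction, e.g.
   0.05 for five percent) of the cycles measured before it. */
bool step_worthwhile(double cycles_before, double cycles_after,
                     double min_gain)
{
    if (cycles_before <= 0.0)
        return false;                    /* nothing meaningful to compare */
    double reduction = (cycles_before - cycles_after) / cycles_before;
    return reduction > min_gain;
}
```

Going from 1,000,000 to 900,000 cycles (a 10% reduction) passes; going to 960,000 (4%) suggests the effort of that step is not paying off.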
- Some instructions that allow the use of DSP-specific characteristics are preferably integrated into the C or other high-level language implementation. Many of them can be addressed with pragmas placed in the code to take advantage of caches or internal RAM, loop counters, and multiply-accumulate (MAC) and multiply-subtract (MSU) capabilities. Other specific characteristics, such as splittable ALUs or multipliers, parallel instruction execution, and pipeline effects, are addressed at the assembly level; for some DSPs, the assembly level is the only place these characteristics can be handled. Furthermore, this step requires the developer to perform the least amount of tuning on the code needed to comply with the DSP's features.
- Another task of this stage is to implement the high-level language (e.g., C) code and examine its effect on the generated assembly code.
- The goal is not to modify the assembly code but to write the C code in such a way that the compiler generates optimized assembly code.
- The assumption is made that some specific C constructs affect the generated assembly code in the same way for many compilers.
- Examples are the “do-while” loop and MAC integration. However, this is mainly true for second- and third-generation DSPs.
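The do-while example can be made concrete with two forms of the same multiply-accumulate loop. This is a sketch: whether a particular compiler turns the second form into a zero-overhead hardware loop with a MAC instruction is a target-specific assumption that has to be confirmed by inspecting the generated assembly, as described above.

```c
#include <stdint.h>

/* Generic form: the test runs before the first iteration, so the
   compiler must also handle n == 0. */
int64_t mac_for(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}

/* do-while form with a down-counter: the body runs at least once (the
   caller guarantees n > 0) and the exit test is a decrement-and-branch,
   a shape many DSP compilers can map onto a hardware loop and a MAC. */
int64_t mac_do_while(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;
    do {
        acc += (int32_t)(*x++) * (*h++);
    } while (--n);
    return acc;
}
```

The price of the second form is the n > 0 precondition, which the caller must document or check once outside the loop.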
- FIG. 5 is a flow diagram depicting a simplified representation of the main steps of the specific implementation process 500 represented as step 106 in FIG. 1. Detailed steps of the specific implementation process are depicted in the task flow diagram of FIG. 6.
- A key objective is to integrate specific pragmas and intrinsics into the code.
- The pragmas allow the use of cache or internal RAM memories and the integration of loop counters to optimize loop branches.
- The other aspect of this optimization concerns implementation modifications that take advantage of the specific capabilities of the target DSP, including multiply-accumulate, multiply-subtract, splittable multiply-add, and post-register modification.
- The goal is to generate the assembly code and observe what can be modified in the C implementation so that the compiler translates it differently.
- Some methods of accomplishing this include, for example, removing code that is not used, avoiding the overhead introduced by recursive calls, moving loop-invariant expressions out of loops, and reducing the scope of variables (using macros integrates this concept naturally).
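Two of these methods, removing unused code and moving a loop-invariant expression out of the loop, are shown below on a hypothetical gain function. An optimizing compiler may already do both on its own; writing them explicitly avoids depending on that, which matters when the generated assembly is being inspected.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: the gain (num << 8) / den is recomputed on every iteration
   although it does not depend on i, and a value is computed but never
   used (the (void) cast only silences the warning). */
void apply_gain_ref(int32_t *buf, size_t n, int32_t num, int32_t den)
{
    for (size_t i = 0; i < n; i++) {
        int32_t unused = buf[i] * 3;     /* dead code */
        (void)unused;
        buf[i] = buf[i] * ((num << 8) / den) >> 8;
    }
}

/* After: dead code removed and the invariant computed once, in the
   narrowest scope that covers the loop. */
void apply_gain_opt(int32_t *buf, size_t n, int32_t num, int32_t den)
{
    const int32_t g = (num << 8) / den;  /* loop invariant, hoisted */
    for (size_t i = 0; i < n; i++)
        buf[i] = buf[i] * g >> 8;
}
```

With num = 3 and den = 2, both versions compute a Q8 gain of 384 and scale 100 to 150.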
- The developer preferably encapsulates the DSP-specific instructions by integrating flags related to the target compiler into the code, and by using a source versioning system to handle the various target DSPs. Integrating such flags makes it possible to enable specific code parts depending on those flags.
- FIG. 7 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-dependent optimization process. As shown in FIG. 7, the size decreases slowly because specific points of the application are addressed, but the impact on the performance can be impressive.
- A fully dedicated implementation process is the lowest stage of the development process. Trade-offs on the application are preferably made by removing some processing passes wherever possible. Assembly-specific optimization is also integrated to finally reach the target performance.
- FIG. 8 is a task flow diagram depicting steps of a dedicated implementation process 800 represented as step 110 in FIG. 1.
- The dedicated implementation process includes two main steps: manual assembly optimization and feature tuning/cutting.
- One work-around is to drop specific parts of the application that have little impact on the quality of the processed data.
- For example, functions such as a ring subtraction, a high-pass filter on the input signal, or the compression rate could be dropped without a significant loss of performance.
- Although the gain in performance may not be high, cutting the compression rate can remove enough cycles to reach the target performance.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
- Semiconductor Integrated Circuits (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
A method is provided of optimizing a software program for a target processor in order to meet specific performance objectives and yet maintain portability, where the software program is initially coded in a high-level language. The method includes a first step of optimizing the software program in the high-level language, using optimizations that are substantially independent of the target processor to host the application. If the performance objectives are met after the completion of this step, the process preferably terminates successfully. If the performance objectives are not met, the method preferably proceeds to a second step. In the second step, the initially optimized form of the software program is again optimized in the high-level language, although target processor-dependent optimizations are used. If the performance objectives are met after completing this second step, then the process preferably terminates. If the performance objectives are not met, then the process proceeds to a third step. In the third step, the twice-optimized software program is optimized using a low-level language of the target processor on key portions of the code, such that although the software implementation becomes target-dependent, it remains relatively portable.
Description
- This application claims priority to a U.S. Provisional Application entitled “System-on-a-Chip-1,” having Ser. No. 60/216,746 and filed on Jul. 3, 2000, and which is hereby incorporated by reference into this application as though fully set forth herein.
- 1. Field of the Invention
- The present invention relates to the field of software development, and particularly, to system and methods for software code optimization.
- 2. Background
- In the design of software for digital signal processing (DSP) and other applications, programmers take advantage of the low-level but high-speed capabilities that the particular target processor (e.g., DSP processor, microcontroller) offers in order to achieve the performance requirements of the applications. However, using these capabilities early in the development process leads to a program that may be unportable should a different target processor subsequently be used to host it. Code that is not portable from one target processor to another may result in significant redesign and development costs for the same basic application. Even if the program is portable, the most significant issue is often the cost in time and resources of porting the application to a new host such that its performance requirements on the new host are met.
- A need exists therefore for a system and method that minimizes the likelihood that a development process for software will result in a program that is unportable from one target processor to another.
- The present invention, in one aspect, provides systems and methods for optimizing software for execution on a specific host processor.
- In one embodiment, a method is provided of optimizing a software program for a target processor in order to meet specific performance objectives, where the software program is coded in a high-level language. The method includes the steps of first optimizing the software program in the high-level language, using optimizations that are substantially independent of the target processor to host the application. Preferably, if the performance objectives are met after the completion of this step, then the process successfully exits. If the performance objectives are not met, then the method preferably proceeds to a second step.
- In the second step, the initially optimized form of the software program is again optimized in the high-level language, although target processor-dependent optimizations are used. If the performance objectives are met after completing this second step, then the process preferably terminates. If the performance objectives are not met, then the process proceeds to a third step.
- In the third step, the twice-optimized software program is optimized using a low-level language of the target processor on key portions of the code, such that although the software implementation becomes target-dependent, it remains relatively portable. Preferably, in evaluating whether the performance objectives have been achieved, performance profiles are determined for the intermediate forms of the optimized software program. These performance profiles are then preferably quantitatively compared to the previously defined performance objectives.
- FIG. 1 is a flow diagram generally depicting steps in a process of optimizing a software program for a target processor representing a preferred embodiment of the invention.
- FIG. 2 is a flow diagram depicting a preferred embodiment of a simplified representation of the main steps of the generic implementation process represented as one step in FIG. 1.
- FIG. 3 is a flow diagram depicting a preferred embodiment of detailed steps of the generic implementation process depicted in FIG. 2.
- FIG. 4 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-independent optimization process.
- FIG. 5 is a flow diagram depicting a preferred embodiment of a simplified representation of the main steps of the specific implementation process represented as one step in FIG. 1.
- FIG. 6 is a flow diagram depicting a preferred embodiment of detailed steps of the specific implementation process depicted in FIG. 5.
- FIG. 7 is a graph depicting examples of curves of the evolution of a software application with respect to its performance and size in a target-dependent optimization process.
- FIG. 8 is a flow diagram depicting a preferred embodiment of steps of the fully dedicated implementation process represented as one step in FIG. 1.
- In a preferred embodiment of a software code optimization method (or process) comprising multiple basic steps, each successive basic step generally results in code that is closer to being dedicated to operate on a particular target. Thus, to promote portability, if the performance goals of the application are reached after the completion of any step in the process, then the optimization process is terminated.
- The evaluation of performance, and thereby of whether the performance meets stated, predefined objectives, preferably accounts for several factors. Preferably, a key performance measure is the real-time speed of the application when operating on the specified target. Another performance measure is the accuracy and/or quality of the output. Another factor that may be integrated into the process evaluation is the binary code size. Although the application must fit in the target processor's memory, this factor is generally becoming less important as memories grow in capacity and become smaller, cheaper, and less power-consuming.
- Thus, one major step in the optimization process is to fix the initial constraints that are applied to the development and optimization of the software application. These constraints are preferably used to quantitatively evaluate the application implementation's performance in an overall sense, and to facilitate determining the feasibility of porting the application to a specific target at each of the development stages. These measures preferably inherently integrate processing performance characteristics of the target processor, including its clock frequency, which relates to the number of cycles available to execute the application.
- Another set of parameters that preferably are calculated is the global I/O data flow, to determine whether the memory accesses (read/write) are achievable for the specified target. This set of parameters integrates elements such as the aggregated data flow over the internal and external buses (data exchanges).
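- As a rough illustration of how the clock frequency turns into a cycle budget, the sketch below assumes a frame-based application. The cycles available per frame (or per any other "elementary time duration") are the clock frequency divided by the frame rate, and a measured cycle count is compared against that budget after reserving some headroom. The function names and the headroom parameter are assumptions for illustration only; they are not part of the described method.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: cycles available to process one frame on the target. */
uint64_t cycle_budget_per_frame(uint64_t clock_hz, uint32_t frames_per_s)
{
    return clock_hz / frames_per_s;
}

/* Returns true if the measured cycles per frame fit in the budget while
 * leaving `headroom_pct` percent spare (e.g., for interrupt handling). */
bool meets_cycle_objective(uint64_t measured_cycles, uint64_t clock_hz,
                           uint32_t frames_per_s, uint32_t headroom_pct)
{
    uint64_t budget = cycle_budget_per_frame(clock_hz, frames_per_s);
    uint64_t usable = budget - (budget * headroom_pct) / 100u;
    return measured_cycles <= usable;
}
```

For example, a 200 MHz target processing 50 frames per second has 4,000,000 cycles per frame. With 10% headroom reserved, 3,600,000 cycles remain for the application.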
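- The aggregated data flow can be approximated by summing the byte rate of every stream crossing a bus (raw input, processed output, external data and code fetches) and comparing the total with the bus's peak transfer rate. The sketch below is a static, first-order check under that assumption. The type and function names are hypothetical, and real buses need extra margin for arbitration and wait states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One data stream crossing a bus: input samples, output samples,
 * external variables or code fetches (hypothetical model). */
typedef struct {
    uint32_t bytes_per_access;   /* 1, 2 or 4 for 8-, 16- or 32-bit data */
    uint64_t accesses_per_s;     /* read or write accesses per second */
} stream_t;

/* Aggregated data flow over all streams, in bytes per second. */
uint64_t aggregate_bytes_per_s(const stream_t *s, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += (uint64_t)s[i].bytes_per_access * s[i].accesses_per_s;
    return total;
}

/* Static check: can a bus of the given width and clock sustain the flow? */
bool bus_can_sustain(const stream_t *s, size_t n,
                     uint32_t bus_width_bytes, uint64_t bus_clock_hz)
{
    return aggregate_bytes_per_s(s, n) <= (uint64_t)bus_width_bytes * bus_clock_hz;
}
```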
- FIG. 1 is a flow diagram generally depicting steps in a process of optimizing a software program for a target processor. In the embodiment represented in FIG. 1, an optimization process 100 comprises three optimization steps. In a first optimization step 102, the software for the DSP processor target is written in a high-level language such as C, C++ or Ada. Preferably, the programming language is one that is completely portable between all probable DSP targets. In coding the software, optimization techniques particular to the language are preferably used. The code optimization in this step 102 preferably does not employ any optimization tools that depend on the processor that is meant to host the application.
- Upon completion of this step 102, the new implementation is evaluated, as a next step 104, to determine whether the performance goals have been reached. If the performance objectives are achieved by completing the step 102 of target-independent optimization, then the overall optimization process 100 successfully terminates 112. If the performance requirements for the application have not been achieved, then in a next optimization step 106, certain portions of the software code are re-implemented in the high-level language to take advantage of the specific processing capabilities of the DSP target.
- While the code after this step 106 is less portable than the code that results after the previous step 102, the software may remain partially portable for a number of reasons. One reason is that the modified code is preferably selected from a portion of code that is short in terms of lines of source code, but is repeatedly executed and is thus responsible for a relatively significant percentage of the processing overhead. By first modifying the code that fits these criteria, the amount of code that must be modified is minimized, and that code may additionally be flagged in the source file to indicate that it is target-specific. If the target processor later changes, only these identified portions need be addressed for optimization. Further, the previously unoptimized code, corresponding in functionality to the portion of code that is optimized in this step 106, may remain in the source file. This original unoptimized code may be used as a starting point for optimizing the same portion of code for any subsequent target processor. Another benefit is that although the coding is specific to the DSP target for the application, the code preferably remains in the high-level language. By remaining in a high-level language (versus being re-coded in a low-level language such as an assembly language), the resulting code is inherently much easier to revisit and comprehend should modifications be necessary.
- Preferably, software-profiling tools are applied to readily identify the portions of code that fit the criteria required to be preferred candidates for optimization, so that they then can be optimized for the particular DSP target as necessary.
- Once a section of code has been optimized for a particular DSP target, other portions of code that meet the criteria to be candidates for optimization may also be optimized for the DSP target, if the performance criteria for the application have still not been achieved.
- If the portions of code that are candidates for optimization have been optimized, then in a next step 108, the implementation is again evaluated to determine whether the performance objectives on the target processor have been met. As in step 104, if the performance objectives are achieved through the step 106 of target-dependent optimization using the high-level language, then the overall optimization process 100 successfully terminates 112. However, if the performance goals of the application have not been met, then the optimization process 100 proceeds to a third optimization step 110. In this step 110, the software is configured to be fully dedicated to the architecture and processing benefits of the target processor. Various coding techniques that are particular to the target processor for the application may be employed. Some of these techniques include executing instructions in parallel or using pipeline processing or other specialized processing capabilities. Further, tradeoffs may be made between performance and throughput in order to meet pre-stated objectives of the application. The result of the process is an efficiently created program for a DSP, microcontroller, or other computing target processor that meets pre-stated performance objectives and is optimal, or close to optimal, with respect to its portability, thereby minimizing future software development efforts for the same application.
- In one embodiment of a system for performing the optimization method, the method is performed automatically after the software code has been initially developed in a high-level language. Preferably, the system is provided the performance parameters that are desired for the application, as well as the architectural specification of the target processor. Given these inputs, the system then processes the high-level language source code, compiles and simulates the code's execution, and tests the code against the specified performance requirements. If the performance requirements are not met, the system profiles the code and then optimizes the portions that are the best candidates for optimization.
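- The decision nodes of FIG. 1 (steps 104 and 108) can be expressed as a small transition function. After a step is applied and profiled, the process either terminates or moves to the next, less portable step. The sketch below assumes that cycles per unit of work are the only objective being checked. The enum values reuse the figure's reference numerals, and the names are illustrative only.

```c
#include <stdint.h>

/* Steps of FIG. 1, ordered by increasing target dependence. */
typedef enum {
    STEP_GENERIC_HLL   = 102, /* target-independent, high-level language */
    STEP_SPECIFIC_HLL  = 106, /* target-dependent, still high-level      */
    STEP_DEDICATED_LLL = 110, /* key portions in the target's assembly   */
    STEP_DONE          = 112  /* termination                             */
} step_t;

/* Decision after `completed` has been applied and profiled: stop as soon
 * as the measured cycles fit the budget, so that the code stays as
 * portable as the objectives allow; otherwise go one step further. */
step_t next_step(step_t completed, uint64_t measured, uint64_t budget)
{
    if (measured <= budget)
        return STEP_DONE;
    switch (completed) {
    case STEP_GENERIC_HLL:  return STEP_SPECIFIC_HLL;
    case STEP_SPECIFIC_HLL: return STEP_DEDICATED_LLL;
    default:                return STEP_DONE;  /* flow ends after step 110 */
    }
}
```

A fuller version would also check output quality and code size, the other measures named above.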
- Preferably, the system comprises a software-optimizing processor in conjunction with memory that automatically performs the code profiling operations, code generation operating on portions of code that are determined to be candidates for optimization, and subsequent performance analysis. The software-optimizing processor may comprise any type of computer, and has processing characteristics dependent upon, for example, the processing requirements for the code generation, profiling and performance assessment operations. It may comprise, e.g., a workstation such as those manufactured by Sun Microsystems, a mainframe computer, or a personal computer such as those manufactured by IBM or Apple. A computer executing optimization software is preferably used for the software-optimizing processor, due to the utility and flexibility of a computer in programming, modifying software, and observing software performance. More generally, the software-optimizing processor may be implemented using any type of processor or processors that may perform the code optimization process as described herein.
- Thus, the term “processor,” in its use herein, refers to a wide variety of computational devices or means including, for example, using multiple processors that perform different processing tasks or have the same tasks distributed between processors. The processor(s) may be general purpose CPUs or special purpose processors such as are often conventionally used in digital signal processing systems. Further, multiple processors may be implemented in a server-client or other network configuration, as a pipeline array of processors, etc. Some or all of the processing is alternatively implemented with hard-wired circuitry such as an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other logic device. In conjunction with the term “processor,” the term “memory” refers to any storage medium that is accessible to a processor that meets the memory storage needs for a system for optimizing software. Preferably, the memory buffer is random access memory (RAM) that is directly accessed by the software-optimizing processor for ease in manipulating and processing selected portions of data. Preferably, the memory store comprises a hard disk or other nonvolatile memory device or component.
- Preferred embodiments of the method for performing each of the basic steps illustrated in FIG. 1 are now provided.
- As used herein, the term generic means target-independent. In the DSP domain, in a target-independent implementation, the high-level language source code, normally C code, uses no function calls, pragmas or macros dedicated to the target. With a target-independent implementation, the portability of the application is maintained and some optimization is integrated into the application at a high level, without using assembly language code.
- FIG. 2 is a flow diagram depicting a simplified representation of the main steps of the generic implementation process 200 represented as step 102 in FIG. 1. Preferred detailed steps of this process 200 are depicted in the task flow diagram of FIG. 3. Preferably, as shown in FIG. 2, there are four main steps that take as input the mathematical theory related to a signal processing algorithm and lead to an implementation that is later used by the specific implementation process.
- The floating-point implementation step takes as input the theoretical solution of a process and transforms the solution into a structured-language implementation. A main purpose of this step is to carry as much of the mathematics of the theory as possible into the implementation.
- Precision in the calculations is important in the floating-point implementation. In general, these implementations are done in double-precision floating point with tools like the Cadence® Cierto™ signal processing worksystem or MATLAB. Such tools provide representation of the processed data, allow graphical representation and comparison, and extract errors so that an implementation can be qualified.
- For an integer DSP, the floating-point implementation transitions to a fixed-point implementation linked to the precision that the DSP can handle. For example, the DSP may need a 16-bit precision implementation. However, the group developing the floating-point application is typically not the group developing the fixed-point implementation. This means that there are at least two approaches. In the first approach, the theoretical implementation is made with no consideration of precision. In that case, the implementation is oriented to processing quality and pushes the precision problem to the fixed-point porting. In the second approach, a target precision is involved at an early stage of the development and impacts the quality of the processing. This provides a fully precision-oriented implementation. However, this implementation must be entirely redone if the target architecture is changed. Details regarding floating-point formats and related implementation issues are provided in several references, including Morgan, Don, “Practical DSP Modeling, Techniques, and Programming in C,” John Wiley & Sons, 1995, pp. 263-298, and Lapsley, P., Bier, J., Shoham, A., and Lee, Edward A., “DSP Processor Fundamentals: Architectures and Features,” IEEE Press Series on Signal Processing, 1997, pp. 1-30, which are hereby incorporated by reference as though fully set forth herein.
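- To make the 16-bit case concrete, the sketch below assumes the common Q15 format, in which a signed 16-bit integer represents a value in [-1, 1) scaled by 2^15. The format and the function names are assumptions, and a given DSP may use another Q format. The sketch shows conversion with saturation and a fixed-point multiply that rounds the 32-bit Q30 product back to Q15.

```c
#include <stdint.h>

/* Convert a double in [-1, 1) to Q15, rounding to nearest and
 * saturating instead of wrapping on overflow. */
int16_t float_to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return 32767;
    if (scaled <= -32768.0) return -32768;
    return (int16_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}

double q15_to_float(int16_t q)
{
    return (double)q / 32768.0;
}

/* Q15 x Q15 -> Q30 in 32 bits, then round back to Q15.
 * Assumes arithmetic right shift of negative values, as on common
 * compilers (implementation-defined in ISO C). */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p = (p + (1 << 14)) >> 15;
    if (p > 32767) p = 32767;    /* only (-1) * (-1) overflows */
    return (int16_t)p;
}
```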
- Stage G2: Fixed Point Implementation
- In the stage of deriving the fixed-point implementation, trade-offs relating to precision may be made. The extent of these trade-offs primarily depends on the target DSP capability. If the target processor is a 16-bit precision DSP, the accepted deviation of the output result will be greater than on a 32-bit DSP.
- However, depending on the complexity of the algorithm, another factor, the implementation architecture, is preferably considered. If an implementation involves hundreds of function calls, the real-time execution at the end of the implementation flow is impacted. For this reason, two different steps in the implementation activity are utilized.
- Another consideration at this level is inheritance. A common method of implementing signal processing is to take the floating-point implementation and port it to a specific target. Another method is to port an existing fixed-point implementation to a new target. The mechanisms are quite different because in the second case a first implementation is already available, so the work is more an adaptation of an existing application than a new implementation. The advantage is that it shortens the development process by reusing the existing code written for another target.
- The goal of processing qualification is to obtain an implementation that preferably provides the best trade-off on the output result for a given precision. One of the tools that can accelerate the completion of this step is the Cierto™ signal processing worksystem. This tool provides the capability to validate, compare, and qualify a process with a reference to the floating-point implementation.
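- Where a dedicated tool is not available, a minimal qualification check can compare the fixed-point output, converted back to floating point, sample by sample against the floating-point reference. The sketch below assumes maximum absolute error and mean squared error as the metrics, with caller-chosen tolerances. These are illustrative choices only; domain-specific criteria such as those discussed next would replace them in practice.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    double max_abs_err;  /* worst-case deviation from the reference */
    double mse;          /* mean squared error over all samples     */
} deviation_t;

/* Compares a fixed-point result (already converted to double) against
 * the floating-point reference implementation. */
deviation_t compare_to_reference(const double *ref, const double *test, size_t n)
{
    deviation_t d = {0.0, 0.0};
    for (size_t i = 0; i < n; ++i) {
        double e = test[i] - ref[i];
        double a = e < 0.0 ? -e : e;
        if (a > d.max_abs_err)
            d.max_abs_err = a;
        d.mse += e * e;
    }
    if (n > 0)
        d.mse /= (double)n;
    return d;
}

/* Accept the implementation if both metrics are within tolerance. */
bool qualifies(deviation_t d, double max_abs_tol, double mse_tol)
{
    return d.max_abs_err <= max_abs_tol && d.mse <= mse_tol;
}
```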
- Fixing the derivation criteria depends primarily on the application category. For image processing comparison, information like texture, edge, contrast and distortion is considered. For voice processing, the same elements may be taken into account, but spectral analysis, tone, volume and saturation, etc. may also be considered. Depending on the application domain, the criteria can be completely different. Furthermore, within a given domain, the criteria can change. Radar can be used in military or agricultural activities but the measures made for those two applications of radar image may be quite different.
- With a qualified algorithm in terms of quality and precision, the first sizing of the algorithm can be addressed. Preferably, the information gathered includes the real-time data flow, the implementation structure and architecture, the profiling of instructions and cycles, and the performances of the target DSP. These elements help determine if the code can fit inside the target.
- The goal of real-time data flow is to understand the different I/Os related to the algorithm that are to be integrated into the DSP. On one level is the global data flow that globally indicates the availability of the raw and the processed data. With the global data flow, the developer identifies the processing delays that are going to provide a basic characteristic of the application relating to data flow.
- However, because the global data flow is a simplified representation, the real behavior of the data coming in and out on the data bus of the system is not necessarily clear. The programmer may have to zoom in on the elementary time duration (selected for the global data flow representation) to characterize the behavior of the implementation when confronted with the interrupts coming from the devices involved in the process. This “elementary time duration” can be very different from one application to another. It can be the duration of, for example, an image frame, an image line, an audio frame, or a dedicated time dictated by control software or the processor.
- Another data flow consideration is application cadence, which may impact all future decisions for the application. For example, in an interrupt-driven architecture, as in most real-time DSP developments, it is possible to make clear design choices such as using a first-in first-out (FIFO) buffer for data. This option provides a more flexible way to manage bus I/Os because it allows better optimization of bandwidth usage. It generally makes the system design more expensive, but it is recommended for processing that involves large amounts of data, such as image processing.
- Alternatively, a designer may choose not to use a FIFO. This means that each piece of data produced is either immediately saved or eventually lost. This is the most constrained way of implementing a signal processing application, but is cheaper and well suited for processing that involves little data such as voice processing. This example shows the impact of the application cadence criteria on application development.
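- For the FIFO option, a typical software counterpart is a single-producer, single-consumer ring buffer: the input interrupt pushes samples and the processing loop pops them. The sketch below assumes a single-core processor where the interrupt handler is the only producer. The size and names are hypothetical, and on multi-core or weakly ordered systems the `volatile` indices would need proper memory barriers or atomics.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 64u   /* power of two, so the index wrap is a cheap mask */

/* Indices grow freely; the number of buffered samples is head - tail. */
typedef struct {
    int16_t data[FIFO_SIZE];
    volatile uint32_t head;   /* written only by the producer (interrupt) */
    volatile uint32_t tail;   /* written only by the consumer (processing) */
} fifo_t;

/* Called from the input interrupt; returns false if the sample is lost. */
bool fifo_put(fifo_t *f, int16_t sample)
{
    if (f->head - f->tail == FIFO_SIZE)
        return false;                         /* full */
    f->data[f->head & (FIFO_SIZE - 1u)] = sample;
    f->head++;
    return true;
}

/* Called from the processing loop; returns false if nothing is buffered. */
bool fifo_get(fifo_t *f, int16_t *sample)
{
    if (f->head == f->tail)
        return false;                         /* empty */
    *sample = f->data[f->tail & (FIFO_SIZE - 1u)];
    f->tail++;
    return true;
}
```

Without a FIFO, the `false` return of `fifo_put` corresponds to the "saved or lost" situation described above, but with no buffering at all.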
- Another factor is bandwidth. A bandwidth study should exhaustively integrate external data variables and code fetches. However, a strict and exact representation of the detailed I/Os is generally impossible, because all such representations reflect only a static point of view. A real I/O study preferably shows how temporal drift affects the overall bandwidth throughout the application processing.
- Also affecting implementation sizing is implementation organization. Implementation organization preferably considers the implementation structure, the architecture of the implementation, and the behavior of the implementation.
- The implementation structure generally means that the developer knows the number of functions implemented and the number of times they are called, split if possible into low-level and high-level functions, and so on. This first measure can be made manually or by using tools. One difficulty is identifying a tool that indicates the number of times a function is called, which matters because context switching can be expensive if it occurs too many times. For this purpose, one can use free tools like gprof that provide part of the necessary information. Other tools, like Sparcworks (Sun Microsystems), provide the call graph.
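- When a profiler cannot run on the target, call counts can also be gathered with lightweight manual instrumentation. In the sketch below, each instrumented function increments its own counter on entry, and the counting compiles away when `COUNT_CALLS` is 0. The macro, the function identifiers, and the example functions are all hypothetical.

```c
#include <stddef.h>

#ifndef COUNT_CALLS
#define COUNT_CALLS 1            /* set to 0 to compile the counters away */
#endif

enum { FN_FIR, FN_SCALE, FN_COUNT };   /* one id per instrumented function */
static unsigned long call_counts[FN_COUNT];

#if COUNT_CALLS
#define CALLED(id) (call_counts[(id)]++)
#else
#define CALLED(id) ((void)0)
#endif

/* Example instrumented functions. */
static int scale(int x)
{
    CALLED(FN_SCALE);
    return x >> 1;
}

int fir_like(const int *x, size_t n)
{
    CALLED(FN_FIR);
    int acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += scale(x[i]);      /* one call per sample: a candidate for in-lining */
    return acc;
}

unsigned long calls_of(int id) { return call_counts[id]; }
```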
- The architecture of the implementation generally means knowing the overall behavior of the application to know if it may be necessary to revisit the algorithm construction to emphasize real-time issues. Given a specific processing algorithm that produces a signal processing development, the requirements can be formalized as follows:
- 1) Obtain an “elementary” signal sample. This can be an audio value, an entire image, etc.
- 2) Process the sample using the development made.
- The first step in evaluating the feasibility of the application includes determining the global data flow to fix the limit of the input/output size and the precision (8, 16 or 32 bits). The first output of the data flow indicates if it is possible to sustain the I/Os, but another indication concerns the algorithm structure.
- Effectively analyzing the processing flow in conjunction with the data flow can indicate that some steps of the processing cannot be done before others. In that case, it may be possible to conclude that some delay constraints cannot be met or simply that the algorithm cannot run in real time.
- There are several definitions of delay. There is the intrinsic delay of any processing, called the real processing delay: in a data process, there is always a processing delay needed to perform the data transformation. But there is also the Architectural Delay (AD), which is related to the structure of the algorithm. This delay stems from the algorithm architecture itself and therefore cannot be reduced by optimizing the implementation.
- A majority of applications integrate some computations that use static correspondence tables or lookup tables to transform the signal. However, depending on the calculation results, the tables that are used will not be the same. If, for the same computation, the signal processing uses two different conversion tables that have different sizes, then the application is non-deterministic. Thus, this part of an implementation is preferably clearly identified so that all of the steps that follow result in useful measures that can be accurately correlated with the performance increase of the application.
- The objective of high-level profiling is to provide a first indication of the number of cycles consumed by the implementation and of the binary code size. If necessary, a simulator can also produce an instruction profile.
- One difficulty is fixing the comparison criteria so that it is known whether the application fits in the targeted DSP. However, it is possible to get a ratio based on the different benchmarks run on the DSP, which are generally provided by the DSP vendors. This means that the cycle counts measured at a higher level must be correlated with the performance of the target DSP to establish a go/no-go process.
- As an example, several DSP providers supply appropriate benchmarks, in which several DSPs are compared on C-written kernel functions including MAC (multiply-accumulate), Vect (simple vector multiply), Fir (FIR filter with redundant load elimination), Lat (lattice synthesis), Iir (IIR filter), and Jpeg (JPEG discrete cosine transform).
- From the application point of view, if the programmer takes the average cycle reduction ratio, it is possible to obtain a value of 2.8. From the DSP point of view, one can get a 2.78 gain factor. This is one indicator.
- On one hand, such benchmarks do not reflect the fact that a complete application merges many kinds of functions, which means that the optimization is less efficient for part of the implemented algorithm. Furthermore, the application integrates several function calls that add, in some cases, significant overhead.
- On the other hand, such benchmarks do not exploit the maximum potential of the DSPs, because further gains are available by going down to assembly optimization. The gain factor is therefore preferably measured to obtain effective comparison criteria for the go/no-go decision.
- Thus, if the generic C implementation cycle count indicates more than five to six times the number of targeted cycles, the developer may consider that the real-time application is not reachable in a reasonable amount of time.
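- The go/no-go screen above reduces to comparing the generic C cycle count with the target budget multiplied by a tolerated ratio. The sketch below also projects the post-optimization cycle count from a benchmark-derived gain factor such as the averages of about 2.8 mentioned earlier. The function names are hypothetical, and the ratio and gain are caller-supplied rather than fixed values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Go if the generic C implementation needs at most `max_ratio` times the
 * target cycle budget (the text suggests a limit of about five to six). */
bool generic_c_is_go(uint64_t generic_cycles, uint64_t target_cycles, double max_ratio)
{
    return (double)generic_cycles <= max_ratio * (double)target_cycles;
}

/* Rough projection of cycles after C-level optimization, assuming the
 * application behaves like the vendor benchmarks it was compared with. */
double projected_cycles(uint64_t generic_cycles, double bench_gain)
{
    return (double)generic_cycles / bench_gain;
}
```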
- After generic code qualification, this implementation becomes the reference to be optimized at the C level. Based on this code, several rules concerning the method of implementing the application are applied to reduce processing time.
- A primary objective is to establish a test process that guarantees the integrity of the processing. The goal is to reduce the cycle consumption and not to transform the result of the processing. It is also possible to establish a specific test script to validate the optimization and/or use tools to compare the processing results.
- The script makes it easier to run several tests and allows the programmer to gather information (traces) on the application behavior. The tools allow the creation of specific comparisons on the processed data. The Cadence Cierto Signal Processing Worksystem (SPW) is capable of such a task and can speed the development cycle.
- For the high-level language implementation, optimization preferably uses tricks such as loop reduction, loop merging, test reduction, pointer usage, and in-line functions or macros, to reduce context switching. These tricks are generic and can be used for most if not all high-level languages.
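- The sketch below applies three of these generic rules to a small, hypothetical gain-and-offset routine: two loops merged into one pass, a helper call replaced by an inline function, and array indexing replaced by a walking pointer. Both versions produce identical results. Whether the second is actually faster depends on the compiler and target, which is why each change is measured as described below.

```c
#include <stddef.h>
#include <stdint.h>

static int16_t clip(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Before: two passes over the data and a function call per operation. */
void gain_then_offset(int16_t *buf, size_t n, int16_t gain, int16_t offset)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = clip((int32_t)buf[i] * gain);
    for (size_t i = 0; i < n; ++i)
        buf[i] = clip((int32_t)buf[i] + offset);
}

static inline int16_t clip_inline(int32_t v)
{
    return (int16_t)(v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : v);
}

/* After: merged loop, inline helper, pointer walk. Same results. */
void gain_offset_merged(int16_t *buf, size_t n, int16_t gain, int16_t offset)
{
    int16_t *p = buf;
    int16_t *const end = buf + n;
    while (p < end) {
        int16_t g = clip_inline((int32_t)*p * gain);
        *p++ = clip_inline((int32_t)g + offset);
    }
}
```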
- Another optimization step that can be integrated at this level is development chain optimization, addressing the specific options of the pre-processor, compiler, assembler, and/or linker. This may be useful if the implementation is initially done with the target development environment. Generally, however, applications are initially developed on PCs or workstations; in that case, tuning for the generic host compiler is not useful and can lead to bad decisions in terms of performance and code size.
- At the implementation level, the developer assumes that the target DSP is fixed and that a simulator is available. Many C optimizations as are known in the art are possible at this language level.
- Each time a specific implementation is applied globally and validated, the result is preferably benchmarked and, if possible, frozen to facilitate further optimization. Preferably, at least three parameters are integrated: the global effort in terms of time to integrate a new optimization step, the processing time reduction that can be evaluated, and the code size evolution. These parameters preferably are correlated with the time dedicated to the project and with whether or not the application is mandatory to system functionality.
- In most cases, the gain in processing time follows an $x^{-1}$ law. In-line functions and loop reduction and/or unrolling produce significant gains; integrating pointers is normally less significant. A curve like that presented in FIG. 4, which groups the measures made for the generic implementation process, may be obtained.
- A goal of these measures is to understand the impact of a modification. There is no generic rule for reducing the number of cycles that applies to all code and all applications; modifications that appear to optimize the cycle count can actually increase it. Another goal is to fix a limit for the different optimization steps in terms of time. One rule, for example, may be to require more than five percent of cycle reduction between two steps.
- Measuring code size is necessary for embedded applications, which do not have several Mbytes of memory available on the final system. Experiments have shown that the generic C optimization process considerably increases the code size, whereas a fully dedicated C optimization process generally decreases it. In either case, the programmer preferably guarantees that the code size does not exceed the available memory size of the target system.
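- One possible reading of the five-percent rule and the memory constraint, sketched below, is a per-step test. An optimization step is worth keeping, and optimizing at the current level is worth continuing, only if it cuts cycles by at least the chosen percentage and the code still fits in the target memory. The struct, the function name, and this interpretation of the rule are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Benchmark recorded after an optimization step (hypothetical). */
typedef struct {
    uint64_t cycles;
    uint32_t code_bytes;
} bench_t;

/* True if the step from `before` to `after` reduced cycles by at least
 * `min_gain_pct` percent and the code still fits in `mem_bytes`. */
bool step_pays_off(bench_t before, bench_t after, unsigned min_gain_pct, uint32_t mem_bytes)
{
    if (after.code_bytes > mem_bytes)
        return false;                      /* does not fit the target */
    if (after.cycles >= before.cycles)
        return false;                      /* no gain, or a regression */
    return (before.cycles - after.cycles) * 100u >= (uint64_t)min_gain_pct * before.cycles;
}
```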
- In this process, some instructions that allow the use of DSP-specific characteristics preferably are integrated into the C or other high-level language implementation. Many of these instructions may be addressed by placing pragma instructions in the code to take advantage of caches or internal RAM, loop counters, multiply-accumulate capabilities (MAC), and multiply-subtract capabilities (MSU). Other specific characteristics, like splittable ALUs or multipliers, parallel instruction execution, and pipeline effects, are addressed at the assembly level. For some DSPs, the only way to use these characteristics is to handle them at the assembly level. Furthermore, this step requires that the developer perform only the minimum amount of tuning on the code needed to comply with the DSP's features.
- Although pragmas and intrinsics tend to detract from portability, those parts of the code may be encapsulated and isolated. With “#if defined” or other such conditional compilation directives, target-compiler-dependent flags can be integrated into the code so that the same application can be recompiled for all the targets to be addressed. However, this method of implementation requires a clear and structured versioning system as well as clear coding rules. One of the main issues arises from the need to support more than three or four different targets.
- Another task of this stage is to implement the high-level language (e.g., C) code and look at the effect obtained on the generated assembly code. The goal is not to modify assembly code but to write C code in a way that leads the compiler to generate optimized assembly code. The assumption is made that some specific C constructs affect the generated assembly code in the same way for many compilers; examples are the “do {} while” loop and MAC integration. However, this is mainly true for second- and third-generation DSPs. Another example is post-register modification: if the developer has converted the implementation to use pointers, the position of the pointer increment in the code determines whether or not post-register modification is generated in the assembly.
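- As an illustration of writing C that exposes the MAC capability, the sketch below accumulates 16x16-bit products into a wide accumulator, the pattern many DSP C compilers can map onto a multiply-accumulate instruction. The pattern is only a hint to the compiler, and whether it is honored depends on the compiler and its options. No pragma is shown because pragma names are compiler-specific. The 64-bit accumulator is a portable stand-in, assumed here, for a DSP's guard-bit accumulator.

```c
#include <stdint.h>

/* Q15 dot product as a plain multiply-accumulate loop.
 * Assumes arithmetic right shift of negative values (implementation-
 * defined in ISO C, but the behavior of common compilers). */
int32_t dot_q15(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;                     /* wide accumulator */
    for (int i = 0; i < n; ++i)
        acc += (int32_t)x[i] * h[i];     /* Q15 * Q15 = Q30, one MAC per tap */
    acc >>= 15;                          /* back to Q15 */
    if (acc > INT32_MAX) acc = INT32_MAX;
    if (acc < INT32_MIN) acc = INT32_MIN;
    return (int32_t)acc;
}
```

A multiply-subtract (MSU) variant would differ only in using `acc -=` in the loop body.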
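- The encapsulation described above might look like the following sketch. The target-specific path is selected by a compile-time flag, and the generic C version stays in the source as the reference and as the starting point for the next target. `TARGET_DSP_A` and `dsp_a_sat_add()` are hypothetical placeholders for a real target flag and compiler intrinsic. Only the generic branch is compiled unless that flag is defined.

```c
#include <stdint.h>

#if defined(TARGET_DSP_A)
/* Target-specific version (hypothetical flag and intrinsic). */
static inline int16_t sat_add16(int16_t a, int16_t b)
{
    return dsp_a_sat_add(a, b);
}
#else
/* Generic, portable C version: saturating 16-bit addition. */
static inline int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}
#endif
```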
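- The sketch below combines the two constructs just mentioned, a `do {} while` loop and a pointer post-increment, in a simple accumulation. The `do {} while` form omits the test before the first pass, which assumes the caller guarantees `n >= 1`. Placing the increment on the load (`*x++`) is the form that compilers often turn into post-modify addressing. Whether a given compiler does so should be confirmed by inspecting its generated assembly. The function name is hypothetical.

```c
#include <stdint.h>

/* Sum of n >= 1 samples (n == 0 is not allowed with do {} while). */
int32_t sum_q15(const int16_t *x, unsigned n)
{
    int32_t acc = 0;
    do {
        acc += *x++;          /* load, then advance the pointer */
    } while (--n);
    return acc;
}
```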
- FIG. 5 is a flow diagram depicting a simplified representation of the main steps of the specific implementation process 500, represented as step 106 in FIG. 1. Detailed steps of the specific implementation process are depicted in the task flow diagram of FIG. 6.
- A key objective is to integrate specific pragmas and intrinsics into the code. The pragmas allow the use of cache or internal RAM memories and the integration of loop counters to optimize loop branches. The other aspect of this optimization concerns implementation modifications that take advantage of the specific capabilities of the target DSP, including multiply-accumulate, multiply-subtract, splittable multiply-add, and post-register modification.
- The goal is to generate the assembly code and observe which changes to the C implementation lead the compiler to translate it differently.
- Depending on the implementation structure, it may be necessary to tune specific functions that are used intensively. Examples include removing unused code, avoiding the overhead of recursive calls, moving loop-invariant expressions out of loops, and reducing the scope of variables (using macros builds in this last concept naturally).
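Two of these tunings are sketched below: hoisting a loop-invariant expression and replacing recursion with a loop. All function names and the gain formula are hypothetical. An optimizing compiler may already hoist the invariant on its own, so the benefit should be confirmed by profiling on the target.

```c
#include <stdint.h>

/* Before: (num * k) / den does not depend on i, yet it is written
 * inside the loop and may be recomputed on every iteration. */
void apply_gain_naive(int32_t *y, const int32_t *x, int n,
                      int32_t num, int32_t den, int32_t k)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] * ((num * k) / den);
}

/* After: the loop-invariant gain is computed once, outside the loop. */
void apply_gain_hoisted(int32_t *y, const int32_t *x, int n,
                        int32_t num, int32_t den, int32_t k)
{
    const int32_t g = (num * k) / den;  /* hoisted invariant */
    for (int i = 0; i < n; i++)
        y[i] = x[i] * g;
}

/* Before: recursion pays call/return overhead for every element. */
int64_t sum_recursive(const int32_t *x, int n)
{
    return (n == 0) ? 0 : x[n - 1] + sum_recursive(x, n - 1);
}

/* After: the same sum as a plain loop, with no call overhead. */
int64_t sum_iterative(const int32_t *x, int n)
{
    int64_t s = 0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}
```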
- While the above-mentioned improvements may be viewed as non-portable, they are portable in the sense that the overall architecture of the implementation can be ported and reused. To achieve this, the developer preferably encapsulates the DSP-specific instructions in two ways: by integrating flags related to the target compiler into the code, and by using a source versioning system to handle the various target DSPs. Note that these flags can be used to enable specific code parts selectively.
- FIG. 7 is a graph depicting example curves of the evolution of a software application's performance and size during a target-dependent optimization process. As shown in FIG. 7, the size decreases slowly because only specific points of the application are addressed, but the impact on performance can be impressive.
- A fully dedicated implementation process is the lowest stage of the development process. Trade-offs on the application are preferably made by removing some processing passes, wherever possible. Assembly-specific optimization is also integrated to finally reach the target performance.
- A key challenge is to determine, after profiling, whether the performance goal has been reached. FIG. 8 is a task flow diagram depicting steps of a dedicated implementation process 800, represented as step 108 in FIG. 1. The dedicated implementation process includes two main steps: manual assembly optimization and feature tuning/cutting.
- At this point, very low-level assembly language optimization is integrated. The key characteristics exploited in this implementation generally come from three sources: parallel instructions, pipeline effects, and assembly code that is not fully optimized. Regarding parallel instructions, some DSPs are able to execute several instructions in the same clock cycle, and some can perform a load, an operation, and a store in a single instruction. The main objective is to account for the pipeline effects that determine when the processed data becomes available.
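The real work at this stage happens in assembly. The C sketch below only illustrates the scheduling idea behind using parallel load/operate/store slots. The loop is software-pipelined by one stage: the load for element i+1 is issued while element i is multiplied and stored, so on a DSP that can issue these in the same cycle, the load latency is hidden. The kernel, its name, and the Q15 gain are hypothetical. The sketch assumes arithmetic right shift of negative values, as on mainstream compilers. A compiler may reschedule this code anyway.

```c
#include <stdint.h>

/* Scales x[i] by a Q15 gain g, software-pipelined by hand:
 * prologue loads x[0]; each kernel iteration loads the next element
 * while the current one is multiplied and stored; the epilogue
 * finishes the last element. Precondition: n >= 1. */
void scale_q15_pipelined(int16_t *y, const int16_t *x, int n, int16_t g)
{
    int16_t cur = x[0];                               /* prologue: first load */
    for (int i = 0; i < n - 1; i++) {
        int16_t next = x[i + 1];                      /* load for iteration i+1 */
        y[i] = (int16_t)(((int32_t)cur * g) >> 15);  /* multiply + store for i */
        cur = next;
    }
    y[n - 1] = (int16_t)(((int32_t)cur * g) >> 15);  /* epilogue */
}
```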
- With respect to pipeline effects, it is mainly around branches and calls that specific instructions can be coded to take advantage of the pipeline delay slots. This optimization can be useful for loop-intensive applications. Handling these pipeline effects is mandatory for parallel instruction optimization.
- For computationally intensive parts where the generated assembly code is not fully optimized, it may be necessary to reorganize that code and use the accumulators and registers more efficiently.
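One common reorganization is splitting a single accumulator into independent ones, shown here in C for readability. With one accumulator, every MAC depends on the previous result. With two, even and odd elements can proceed in parallel on a DSP with two MAC units or a deep MAC pipeline. This is a sketch under that assumption, and the function name is hypothetical.

```c
#include <stdint.h>

/* Signal energy (sum of squares) with two independent accumulators.
 * acc0 takes even elements and acc1 odd ones, breaking the serial
 * dependency chain; the partial sums are combined once at the end. */
int64_t energy_two_acc(const int16_t *x, int n)
{
    int64_t acc0 = 0, acc1 = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += (int32_t)x[i] * x[i];
        acc1 += (int32_t)x[i + 1] * x[i + 1];
    }
    if (i < n)                      /* odd length: one element left over */
        acc0 += (int32_t)x[i] * x[i];
    return acc0 + acc1;
}
```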
- Depending on the capacities of the DSP used, it may be necessary to re-adapt the application because it does not fit. If such a decision has to be made, the high-level steps of the process were not adapted or were neglected. The application's processing behavior must then be re-evaluated, which is normally a high-level task of the process (for example, converting a floating-point implementation to fixed point).
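The floating-to-fixed-point conversion mentioned above can be sketched with the Q15 format, a common but here assumed choice for 16-bit DSPs. Values in [-1, 1) map to 16-bit integers scaled by 2^15. The helper names are hypothetical. The multiply assumes arithmetic right shift of negative values, as on mainstream compilers.

```c
#include <stdint.h>

/* Convert a float in [-1, 1) to Q15 with round-to-nearest and saturation. */
int16_t float_to_q15(float v)
{
    float s = v * 32768.0f;
    s += (s >= 0.0f) ? 0.5f : -0.5f;       /* round half away from zero */
    if (s >= 32767.0f)  return INT16_MAX;  /* +1.0 is not representable */
    if (s <= -32768.0f) return INT16_MIN;
    return (int16_t)s;                     /* truncation completes the rounding */
}

/* Q15 x Q15 -> Q15 multiply with rounding and saturation. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;            /* Q30 product */
    p = (p + (1 << 14)) >> 15;             /* round, then rescale to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* only (-1.0) * (-1.0) overflows */
    return (int16_t)p;
}
```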
- One work-around is to drop specific parts of the application that have little impact on the quality of the processed data. For example, in an audio processing application, functions such as a ring subtraction, a high-pass filter on the input signal, or the compression rate could be dropped without a significant loss of output quality. Although the performance gain from such a cut may not be large, cutting the compression rate can save enough cycles to reach the target performance.
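Cutting a stage such as the input high-pass filter can be sketched as a compile-time switch. When the switch is off, the stage's cycles disappear from the build without touching the rest of the processing chain. The flag `ENABLE_INPUT_HPF`, the state struct, and the first-order DC-blocking filter (coefficient 0.95 in Q15) are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Hypothetical build flag: set to 0 to cut the input high-pass filter. */
#ifndef ENABLE_INPUT_HPF
#define ENABLE_INPUT_HPF 1
#endif

typedef struct {
    int32_t x_prev;   /* previous input sample  */
    int32_t y_prev;   /* previous output sample */
} hpf_state;

/* First-order DC-blocking high-pass filter, applied in place:
 *     y[n] = x[n] - x[n-1] + a * y[n-1],  a = 0.95 in Q15 (31130). */
static void input_hpf(int16_t *buf, int n, hpf_state *st)
{
    const int32_t a_q15 = 31130;
    for (int i = 0; i < n; i++) {
        int32_t x = buf[i];
        int32_t y = x - st->x_prev
                  + (int32_t)(((int64_t)a_q15 * st->y_prev) >> 15);
        if (y > INT16_MAX) y = INT16_MAX;   /* saturate to 16 bits */
        if (y < INT16_MIN) y = INT16_MIN;
        st->x_prev = x;
        st->y_prev = y;
        buf[i] = (int16_t)y;
    }
}

/* Per-frame chain: when the flag is 0, the filter is compiled out. */
void process_frame(int16_t *buf, int n, hpf_state *st)
{
#if ENABLE_INPUT_HPF
    input_hpf(buf, n, st);
#else
    (void)buf; (void)n; (void)st;
#endif
    /* remaining processing stages would follow here */
}
```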
- Reaching this step of the dedicated implementation process may mean that the application was not evaluated correctly. If so, some of the higher levels of the process may optionally be re-addressed.
- While preferred embodiments of the invention have been described herein, and are further explained in the accompanying materials, many variations are possible which remain within the concept and scope of the invention. Such variations would become clear to one of ordinary skill in the art after inspection of the specification and the drawings. The invention therefore is not to be restricted except within the spirit and scope of any appended claims.
Claims (3)
1. A method of optimizing a software program for a target processor to meet performance objectives, where the software program is coded in a high-level language, the method comprising the steps of:
(a) optimizing the software program such that a resulting first optimized form of the software program is substantially independent of the target processor and is substantially coded in the high-level language;
(b) optimizing the first optimized form of the software program such that a resulting second optimized form of the software program is substantially dependent on the target processor and is substantially coded in the high-level language; and
(c) optimizing the second optimized form of the software program such that a resulting third optimized form of the software program is substantially dependent on the target processor and includes portions coded in a low-level language of the target processor.
2. The method of claim 1, further comprising steps of:
(a1) determining a first performance profile for the first optimized form of the software program, and comparing the first performance profile with the performance objectives; and
(b1) determining a second performance profile for the second optimized form of the software program, and comparing the second performance profile with the performance objectives.
3. The method of claim 2, wherein steps (b), (b1), and (c) are not performed if the performance objectives are met after completing step (a), and step (c) is not performed if the performance objectives are met after completing step (b).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/765,916 (US20020066088A1) | 2000-07-03 | 2001-01-18 | System and method for software code optimization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US21674600P | 2000-07-03 | 2000-07-03 | |
US09/765,916 (US20020066088A1) | 2000-07-03 | 2001-01-18 | System and method for software code optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
US20020066088A1 | 2002-05-30 |
Family ID: 22808341
Family Applications (3)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US09/765,916 | Abandoned | US20020066088A1 | 2000-07-03 | 2001-01-18 | System and method for software code optimization |
US09/765,917 | Expired - Fee Related | US7100124B2 | 2000-07-03 | 2001-01-18 | Interface configurable for use with target/initiator signals |
US11/408,858 | Expired - Lifetime | US7594205B2 | 2000-07-03 | 2006-04-21 | Interface configurable for use with target/initiator signals |
Family Applications After (2)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
US09/765,917 | Expired - Fee Related | US7100124B2 | 2000-07-03 | 2001-01-18 | Interface configurable for use with target/initiator signals |
US11/408,858 | Expired - Lifetime | US7594205B2 | 2000-07-03 | 2006-04-21 | Interface configurable for use with target/initiator signals |
Country Status (3)
Country | Link |
---|---|
US (3) | US20020066088A1 |
EP (1) | EP1299826A1 |
WO (1) | WO2002005144A1 |
2001
- 2001-01-18 US: US09/765,916 (US20020066088A1), Abandoned
- 2001-01-18 EP: EP01984186A (EP1299826A1), Ceased
- 2001-01-18 WO: PCT/US2001/001820 (WO2002005144A1), Application Filing
- 2001-01-18 US: US09/765,917 (US7100124B2), Expired - Fee Related
2006
- 2006-04-21 US: US11/408,858 (US7594205B2), Expired - Lifetime
US6247174B1 (en) * | 1998-01-02 | 2001-06-12 | Hewlett-Packard Company | Optimization of source code with embedded machine instructions |
US6067644A (en) * | 1998-04-15 | 2000-05-23 | International Business Machines Corporation | System and method monitoring instruction progress within a processor |
US6345384B1 (en) * | 1998-04-22 | 2002-02-05 | Kabushiki Kaisha Toshiba | Optimized program code generator, a method for compiling a source text and a computer-readable medium for a processor capable of operating with a plurality of instruction sets |
US6164841A (en) * | 1998-05-04 | 2000-12-26 | Hewlett-Packard Company | Method, apparatus, and product for dynamic software code translation system |
US6327696B1 (en) * | 1998-05-05 | 2001-12-04 | Lsi Logic Corporation | Method and apparatus for zero skew routing from a fixed H trunk |
US6102961A (en) * | 1998-05-29 | 2000-08-15 | Cadence Design Systems, Inc. | Method and apparatus for selecting IP Blocks |
US6367051B1 (en) * | 1998-06-12 | 2002-04-02 | Monterey Design Systems, Inc. | System and method for concurrent buffer insertion and placement of logic gates |
US6305001B1 (en) * | 1998-06-18 | 2001-10-16 | Lsi Logic Corporation | Clock distribution network planning and method therefor |
US6269467B1 (en) * | 1998-09-30 | 2001-07-31 | Cadence Design Systems, Inc. | Block based design methodology |
US6347395B1 (en) * | 1998-12-18 | 2002-02-12 | Koninklijke Philips Electronics N.V. (Kpenv) | Method and arrangement for rapid silicon prototyping |
US6311313B1 (en) * | 1998-12-29 | 2001-10-30 | International Business Machines Corporation | X-Y grid tree clock distribution network with tunable tree and grid networks |
US6622300B1 (en) * | 1999-04-21 | 2003-09-16 | Hewlett-Packard Development Company, L.P. | Dynamic optimization of computer programs using code-rewriting kernal module |
US6845489B1 (en) * | 1999-04-30 | 2005-01-18 | Matsushita Electric Industrial Co., Ltd. | Database for design of integrated circuit device and method for designing integrated circuit device |
US6367060B1 (en) * | 1999-06-18 | 2002-04-02 | C. K. Cheng | Method and apparatus for clock tree solution synthesis based on design constraints |
US7072817B1 (en) * | 1999-10-01 | 2006-07-04 | Stmicroelectronics Ltd. | Method of designing an initiator in an integrated circuit |
US20030005419A1 (en) * | 1999-10-12 | 2003-01-02 | John Samuel Pieper | Insertion of prefetch instructions into computer program code |
US6654952B1 (en) * | 2000-02-03 | 2003-11-25 | Sun Microsystems, Inc. | Region based optimizations using data dependence graphs |
US6738967B1 (en) * | 2000-03-14 | 2004-05-18 | Microsoft Corporation | Compiling for multiple virtual machines targeting different processor architectures |
US6477691B1 (en) * | 2000-04-03 | 2002-11-05 | International Business Machines Corporation | Methods and arrangements for automatic synthesis of systems-on-chip |
US6643630B1 (en) * | 2000-04-13 | 2003-11-04 | Koninklijke Philips Electronics N.V. | Apparatus and method for annotating an intermediate representation of an application source code |
US6701474B2 (en) * | 2000-06-28 | 2004-03-02 | Cadence Design Systems, Inc. | System and method for testing integrated circuits |
US7100124B2 (en) * | 2000-07-03 | 2006-08-29 | Cadence Design Systems, Inc. | Interface configurable for use with target/initiator signals |
US20020100029A1 (en) * | 2000-07-20 | 2002-07-25 | Matt Bowen | System, method and article of manufacture for compiling and invoking C functions in hardware |
US6751723B1 (en) * | 2000-09-02 | 2004-06-15 | Actel Corporation | Field programmable gate array and microcontroller system-on-a-chip |
US20040068707A1 (en) * | 2002-10-03 | 2004-04-08 | International Business Machines Corporation | System on a chip bus with automatic pipeline stage insertion for timing closure |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030204386A1 (en) * | 2002-04-24 | 2003-10-30 | Glenn Colon-Bonet | Class-based system for circuit modeling |
US20040006761A1 (en) * | 2002-07-05 | 2004-01-08 | Anand Minakshisundaran B. | System and method for automating generation of an automated sensor network |
US8271937B2 (en) | 2002-07-05 | 2012-09-18 | Cooper Technologies Company | System and method for automating generation of an automated sensor network |
US7346891B2 (en) * | 2002-07-05 | 2008-03-18 | Eka Systems Inc. | System and method for automating generation of an automated sensor network |
US20050289519A1 (en) * | 2004-06-24 | 2005-12-29 | Apple Computer, Inc. | Fast approximation functions for image processing filters |
EP1648113A2 (en) * | 2004-10-14 | 2006-04-19 | Agilent Technologies, Inc. - a Delaware corporation - | Probe apparatus and method therefor |
US20060083179A1 (en) * | 2004-10-14 | 2006-04-20 | Kevin Mitchell | Probe apparatus and metod therefor |
EP1648113A3 (en) * | 2004-10-14 | 2008-06-04 | Agilent Technologies, Inc. | Probe apparatus and method therefor |
US9535679B2 (en) * | 2004-12-28 | 2017-01-03 | International Business Machines Corporation | Dynamically optimizing applications within a deployment server |
US20060143601A1 (en) * | 2004-12-28 | 2006-06-29 | International Business Machines Corporation | Runtime optimizing applications for a target system from within a deployment server |
US7895585B2 (en) * | 2005-09-09 | 2011-02-22 | Oracle America, Inc. | Automatic code tuning |
US20070061785A1 (en) * | 2005-09-09 | 2007-03-15 | Sun Microsystems, Inc. | Web-based code tuning service |
US20070061784A1 (en) * | 2005-09-09 | 2007-03-15 | Sun Microsystems, Inc. | Automatic code tuning |
US20070074008A1 (en) * | 2005-09-28 | 2007-03-29 | Donofrio David D | Mixed mode floating-point pipeline with extended functions |
US20070234147A1 (en) * | 2006-01-11 | 2007-10-04 | Tsuyoshi Nakamura | Circuit analysis device |
US7624362B2 (en) * | 2006-01-11 | 2009-11-24 | Panasonic Corporation | Circuit analysis device using processor information |
US20090037824A1 (en) * | 2007-07-30 | 2009-02-05 | Oracle International Corporation | Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications |
US8572593B2 (en) * | 2007-07-30 | 2013-10-29 | Oracle International Corporation | Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications |
US8689194B1 (en) * | 2007-08-20 | 2014-04-01 | The Mathworks, Inc. | Optimization identification |
US9934004B1 (en) | 2007-08-20 | 2018-04-03 | The Mathworks, Inc. | Optimization identification |
US20090158263A1 (en) * | 2007-12-13 | 2009-06-18 | Alcatel-Lucent | Device and method for automatically optimizing composite applications having orchestrated activities |
US8601454B2 (en) * | 2007-12-13 | 2013-12-03 | Alcatel Lucent | Device and method for automatically optimizing composite applications having orchestrated activities |
US20090293051A1 (en) * | 2008-05-22 | 2009-11-26 | Fortinet, Inc., A Delaware Corporation | Monitoring and dynamic tuning of target system performance |
US9329846B1 (en) * | 2009-11-25 | 2016-05-03 | Parakinetics Inc. | Cooperative program code transformation |
US9383978B2 (en) * | 2010-03-19 | 2016-07-05 | Samsung Electronics Co., Ltd. | Apparatus and method for on-demand optimization of applications |
US20110231813A1 (en) * | 2010-03-19 | 2011-09-22 | Seo Sun Ae | Apparatus and method for on-demand optimization of applications |
US9092568B2 (en) * | 2012-04-30 | 2015-07-28 | Nec Laboratories America, Inc. | Method and system for correlated tracing with automated multi-layer function instrumentation localization |
US20130290936A1 (en) * | 2012-04-30 | 2013-10-31 | Nec Laboratories America, Inc. | Method and System for Correlated Tracing with Automated Multi-Layer Function Instrumentation Localization |
US9134981B2 (en) * | 2012-06-22 | 2015-09-15 | Altera Corporation | OpenCL compilation |
US20130346953A1 (en) * | 2012-06-22 | 2013-12-26 | Altera Corporation | Opencl compilation |
US20220121677A1 (en) * | 2019-06-25 | 2022-04-21 | Sisense Sf, Inc. | Method for automated query language expansion and indexing |
US11954113B2 (en) * | 2019-06-25 | 2024-04-09 | Sisense Sf, Inc. | Method for automated query language expansion and indexing |
Also Published As
Publication number | Publication date |
---|---|
US7100124B2 (en) | 2006-08-29 |
EP1299826A1 (en) | 2003-04-09 |
US20020016706A1 (en) | 2002-02-07 |
US20060230369A1 (en) | 2006-10-12 |
WO2002005144A1 (en) | 2002-01-17 |
US7594205B2 (en) | 2009-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20020066088A1 (en) | | System and method for software code optimization |
US6113650A (en) | | Compiler for optimization in generating instruction sequence and compiling method |
Lee et al. | | Mediabench: A tool for evaluating and synthesizing multimedia and communications systems |
US5819064A (en) | | Hardware extraction technique for programmable reduced instruction set computers |
JP4745341B2 (en) | | System, method and apparatus for dependency chain processing |
Menard et al. | | Automatic floating-point to fixed-point conversion for DSP code generation |
US6035123A (en) | | Determining hardware complexity of software operations |
US8387026B1 (en) | | Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling |
Brunie et al. | | Code generators for mathematical functions |
Sandberg et al. | | Faster WCET flow analysis by program slicing |
Hong et al. | | An integrated environment for rapid prototyping of DSP Algorithms using matlab and Texas instruments’ TMS320C30 |
Djoudi et al. | | Exploring application performance: a new tool for a static/dynamic approach |
US6256776B1 (en) | | Digital signal processing code development with fixed point and floating point libraries |
Aamodt et al. | | Compile-time and instruction-set methods for improving floating-to fixed-point conversion accuracy |
EP3997593B1 (en) | | A streaming compiler for automatic adjoint differentiation |
Liveris et al. | | A code transformation-based methodology for improving I-cache performance of DSP applications |
Ayub et al. | | PEAL: Probabilistic error analysis methodology for low-power approximate adders |
Spadini et al. | | Characterization of repeating dynamic code fragments |
Bloch et al. | | Performance estimation of high-level dataflow program on heterogeneous platforms |
Coors et al. | | Integer code generation for the TI TMS320C62X |
Miomandre et al. | | Approximate buffers for reducing memory requirements: Case study on SKA |
Miomandre et al. | | Approximate buffers for reducing memory requirements in the ska |
Aamodt | | Floating-point to fixed-point compilation and embedded architectural support |
Varnagirytė et al. | | A practical approach to DSP code optimization using compiler/architecture |
JPH02176938A (en) | | Machine language instruction optimizing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: CADENCE DESIGN SYSTEMS, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CANUT, FREDERIC;DERRAS, MUSTAPHA;REEL/FRAME:012233/0459. Effective date: 20010907 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |