US20080098208A1 - Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems - Google Patents
- Publication number: US20080098208A1 (application US11/898,360)
- Authority: US (United States)
- Prior art keywords: program, data, sections, computer program, code
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- All codes below fall under G (PHYSICS) > G06 (COMPUTING; CALCULATING OR COUNTING) > G06F (ELECTRIC DIGITAL DATA PROCESSING) > G06F11/00 (Error detection; Error correction; Monitoring)
- G06F11/362: Software debugging (under G06F11/36)
- G06F11/28: Error detection; Error correction; Monitoring by checking the correct order of processing
- G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/36: Preventing errors by testing or debugging software
- G06F11/3636: Software debugging by tracing the execution of the program (under G06F11/36 > G06F11/362)
- G06F11/3466: Performance evaluation by tracing or monitoring (under G06F11/30 Monitoring > G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment)
Definitions
- The field of the invention relates to data processing and, in particular, to improving the performance of program execution.
- Such complex systems may comprise a number of different processing or execution units and they may be heterogeneous or asymmetric, with specialised processing units being used to increase energy efficiency and lower gate count.
- Programming embedded systems, with their hardware restrictions, demand for efficiency and ever-decreasing time to market, is becoming a real problem.
- Decoupling programs to produce a number of threads communicating via FIFO pipelines has been used many times before: Smith (James E. Smith, “Decoupled access/execute computer architectures”, ACM Transactions on Computer Systems, 2(4), 289-308, 1984) applies the technique manually to Cray assembly code; Palacharla and Smith (S. Palacharla and J. E. Smith, “Decoupling integer execution in superscalar processors”, in MICRO 28: Proc. of International Symposium on Microarchitecture, 285-290, 1995) describe the use of program slicing to automate the separation. These uses of decoupling were aimed at hiding memory latency by having one thread perform all load-store operations while the other thread performs all arithmetic operations.
- SoC: system on chip
- A first aspect of the present invention provides a method of transforming a portion of a computer program comprising a list of sequential instructions comprising control code and data processing code and a program separation indicator indicating a point where said sequential instructions may be divided to form separate sections that are capable of being separately executed and that each comprise different data processing code, said method comprising the steps of: (i) analysing said portion of said program to determine if said sequential instructions can be divided at said point indicated by said program separation indicator and, in response to determining that it can: (iia) providing data communication between said separate sections indicated by said program separation indicator, such that said separate sections can be decoupled from each other, such that at least one of said sections is capable of being separately executed by an execution mechanism that is separate from an execution mechanism executing another of said separate sections, said at least one of said sections being capable of generating data and communicating said data to at least one other of said separate sections; and, in response to determining that it cannot: (iib) not performing step (iia).
- The present method provides a tool for analysing a portion of the program to determine whether the instructions can be divided at a point indicated by a separation indicator.
- Separation indicators are provided within at least a section of the program and indicate where it is desirable to divide the program.
- The division of the program is thus determined to some degree by these separation indicators and can therefore be controlled by a programmer.
- The method of the present invention analyses a program that actually includes the separation indicators and decides whether the program can indeed be separated at these points. If it decides that it can, it provides data communication between the two sections to allow them to be decoupled from each other.
- In this way the program can be split into sections suitable for separate execution, allowing it to be processed efficiently by a variety of different, often complex, devices. If the method decides that the program cannot be divided at this point, it does not perform the data communication step.
- In some embodiments, a warning indicating an error in the computer program is output. Providing the programmer with a warning may be the most appropriate action if the separation indicators are not in the correct position.
- In some embodiments, said step (iib) comprises amending said computer program such that said sequential instructions can be divided at said point, and then performing step (iia).
- In this case, the method can amend the computer program so that the sequential instructions can be divided at this point, and data communication can then be provided between the different sections. It may be a relatively simple matter to amend the computer program so that it can be divided at the point indicated; if so, the method can perform this step rather than outputting a warning.
- In some embodiments, said step of amending said computer program comprises inserting data transfer instructions at said point indicated by said program separation indicator.
- The step required to amend the computer program may be one of inserting data transfer instructions at the point indicated by the program separation indicator.
- In some embodiments, said step (iib) comprises merging said two sections together and removing said program separation indicator.
- In some embodiments, said program separation indicator comprises at least one data transfer instruction, said data communication between said separate sections being provided in dependence upon said at least one data transfer instruction.
- Although program separation indicators can take a number of forms, it is quite efficient if they take the form of data transfer instructions.
- Providing program separation indicators in the form of data transfer instructions may facilitate separation, because these instructions provide the data communication required.
- In some embodiments, said step (iia) of providing data communication comprises inserting at least one “put data into a data store” instruction and at least one “get data from said data store” instruction into said instruction stream, and dividing said computer program between said put and get instructions to form said at least one separate section.
- The step of providing data communication can be done by inserting “put data into a data store” and “get data from said data store” instructions into the instruction stream. This allows data to be output from one section of the program and then input into the other section via a data store. Thus, the two sections are in effect decoupled from each other, but data can travel between them via this data store.
- In some embodiments, said data store comprises a FIFO buffer.
- The data store may comprise a FIFO buffer, as this is clearly the simplest arrangement: the first data to exit from a section of the program is the first data to enter the next section. However, the data may not be required in a particular order, or indeed not all of the data generated by one section may be required by the other.
- A variety of different data stores and arrangements can therefore be used in some embodiments. For example, a stack, which has last-in first-out semantics, could be used; one advantage of this is that a stack is simple to implement.
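A minimal sketch of the put/get data store described above, assuming a hypothetical `DataStore` class whose `order` parameter selects FIFO or stack (LIFO) behaviour. The two sections run one after the other here and communicate only through the store; `producer_section` and `consumer_section` are illustrative stand-ins, not code from the patent.

```python
from collections import deque


class DataStore:
    """Hypothetical data store placed between two program sections.

    order="fifo": the first data put is the first data got (FIFO buffer).
    order="lifo": the last data put is the first data got (stack).
    """

    def __init__(self, order="fifo"):
        if order not in ("fifo", "lifo"):
            raise ValueError("order must be 'fifo' or 'lifo'")
        self._order = order
        self._items = deque()

    def put(self, value):
        # "put data into a data store"
        self._items.append(value)

    def get(self):
        # "get data from said data store"
        if self._order == "fifo":
            return self._items.popleft()
        return self._items.pop()


def producer_section(store, n):
    # Section 1: generates data and puts it into the store.
    for i in range(n):
        store.put(i * i)


def consumer_section(store, n):
    # Section 2: gets data from the store and processes it.
    return [store.get() + 1 for _ in range(n)]
```

With a FIFO store the consumer sees the values in production order ([1, 2, 5, 10] for n = 4); with a stack it sees them reversed, which is acceptable only when the consuming section does not depend on order.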
- In some embodiments, said step (iia) comprises providing cyclic data communication between said separate sections.
- The decoupling of threads can be further extended to cases where communication between threads is cyclic. Cyclic thread dependencies can lead to deadlock; that is, two threads may not run in parallel because of data dependencies between them, and thus, in devices of the prior art, decoupling is limited to acyclic thread dependencies. Embodiments of the present invention address this problem and support cyclic dependencies. This may be done, for example, by using put and get instructions without requiring the number of puts to equal the number of gets. This is in contrast to the prior art, where put and get operations are always inserted in corresponding places in each thread. Allowing put operations to be inserted in places that do not correspond to get operations in other threads means that code such as that illustrated in FIG. 4 can be produced.
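One way to see how unmatched puts let a cycle run without deadlock is the following sketch, which is an illustration under stated assumptions and not the code of FIG. 4. Each iteration of `f` needs the previous result of `g`, so the two threads form a cycle through two one-slot channels. Thread B performs one extra "priming" put of the initial value before its loop, so it makes n + 1 puts on the B-to-A channel while thread A makes only n gets. The functions `f` and `g` are hypothetical placeholders.

```python
import queue
import threading


def f(y):
    return y + 1  # hypothetical data processing step


def g(x):
    return 2 * x  # hypothetical data processing step


def sequential(y0, n):
    # Original loop: each call of f needs the previous result of g.
    out, y = [], y0
    for _ in range(n):
        x = f(y)
        y = g(x)
        out.append(y)
    return out


def decoupled(y0, n):
    ab = queue.Queue(maxsize=1)  # thread A -> thread B
    ba = queue.Queue(maxsize=1)  # thread B -> thread A (closes the cycle)
    out = []

    def thread_a():
        for _ in range(n):
            y = ba.get()
            ab.put(f(y))

    def thread_b():
        ba.put(y0)  # extra put with no matching get: primes the cycle
        for _ in range(n):
            y = g(ab.get())
            out.append(y)
            ba.put(y)  # n + 1 puts on ba in total; thread A gets only n

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Without the priming put, both threads would begin by waiting on each other's channel and deadlock. With it, the decoupled version returns the same list as the sequential loop, for example [2, 6, 14] for y0 = 0 and n = 3.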
- In some embodiments, said separate sections comprise the same control code.
- The control code is the same in the two sections because the computer program is divided such that different data processing steps are performed under the same control in each divided section. Duplicating control code in this way enables the program to be divided.
- In other embodiments, the control code will be different. This is because it may occasionally be advantageous to modify the control code in one of the sections slightly so that, for example, conditional code that is no longer required is not present.
- In some embodiments, said portion of said computer program comprises a plurality of program separation indicators, each indicating a point where said sequential instructions may be divided to form separate sections, each of said separate sections being capable of being separately executed and comprising different data processing code, said method providing data communication between said separate sections indicated by said plurality of program separation indicators.
- An instruction loop having several data processing steps, for example, can be divided by embodiments of the present invention into two sections by allowing the different sections to have different data processing code. This can increase the performance of a system significantly. Generally this is done by duplicating the control code and in effect performing two loops, one performing one or more of the data processing steps of the original loop and the other performing the rest of the steps.
- In some embodiments, said transformed computer program is suitable for execution upon respective execution mechanisms of a heterogeneous system having a complex asymmetric memory hierarchy and a plurality of execution mechanisms.
- Although applicable to symmetric systems, embodiments of the present invention are particularly valuable in asymmetric heterogeneous systems, where it is often difficult to execute sections of a program separately, particularly where at least a portion of the program is written sequentially.
- In some embodiments, a section of code is executed by a single execution mechanism.
- In some embodiments, said control code of at least one of said sections is operable to be processed by a processor of said heterogeneous system, and said data processing code of said section is operable to be processed by an execution mechanism under control of said control code processed by said processor.
- An execution mechanism may be a simple mechanism designed for a particular function, such as a memory transfer unit (colloquially known as a “DMA engine”), and in such cases the control code may be performed on a separate processor while the data processing operations are performed on the simpler mechanism.
- DMA engine: memory transfer unit
- The plurality of execution mechanisms can take a number of forms, including a general purpose processor, a direct memory access unit, a coprocessor, a VLIW processor, a digital signal processor and a hardware accelerator unit.
- In some embodiments, said method comprises an initial step, performed before step (i), of defining said portion of said computer program by marking said computer program with indications delimiting said portion of said sequential instructions within which said at least two sections are to be located.
- In some embodiments, said computer program comprises said portion having a number of instructions to be executed sequentially and at least one further portion having instructions to be performed in parallel with each other.
- A computer program may have different portions, some for sequential execution and some already written for parallel processing. In such a case, it is the portion that has the instructions for sequential execution that is analysed to see whether it can be divided into sections for separate execution. It should be noted that a portion to be analysed may be within a section that is to be executed in parallel. Furthermore, a portion to be analysed may also contain two or more sections that are to be executed in parallel.
- In some embodiments, said portion of said computer program comprises an instruction loop comprising at least two data processing instructions, and said at least two sections each comprise said instruction loop, each section comprising at least one of said at least two data processing instructions, said at least two sections comprising different data processing instructions.
- An instruction loop having several data processing steps can be divided into two sections, thereby increasing the performance of a system significantly.
- The present method is able to duplicate the control code and in effect perform two loops, one performing one or more of the data processing steps of the initial loop and the other performing the rest of the steps.
- In some embodiments, said portion of said computer program comprises a whole computer program.
- A second aspect of the present invention provides a computer-readable storage medium comprising a computer program for controlling a computer to perform the method of the first aspect of the present invention.
- A third aspect of the invention provides a computer executing a computer program to perform the method of the first aspect of the present invention.
- A further aspect of the present invention provides a method of transforming a portion of a computer program comprising a list of sequential instructions and a program separation indicator indicating a point where said sequential instructions may be divided to form separate sections that are capable of being separately executed and that each comprise different data processing code, said list of instructions comprising control code and data processing code, said method comprising the step of:
- If the program comprises program separation indicators indicating points where it may be divided, then the program can be transformed by providing data communication between the separate sections at the points indicated, so that they can be decoupled from each other. This allows the program to be split into sections suitable for separate execution and allows it to be processed efficiently by a variety of different, often complex, devices. It also keeps later analysis of the program by a programmer relatively straightforward while still enabling the program to execute efficiently on a parallel system.
- In some embodiments, said method comprises a further initial step (0), performed before step (i), of analysing said portion of said computer program in response to said program separation indicator and determining which of said sequential instructions should be in which of said separate sections prior to providing said data communication.
- FIGS. 1a to 1c show flow diagrams of methods according to embodiments of the present invention
- FIGS. 2a to 2d schematically show the splitting of a computer program into separately executable sections according to an embodiment of the present invention
- FIGS. 3a and 3b schematically show a method of splitting and then merging sections of a computer program
- FIG. 4 schematically shows data communication between two sections of a program
- FIG. 5a shows a simple computer program annotated according to an embodiment of the present invention
- FIG. 5b shows the maximal set of threads for the program of FIG. 5a
- FIG. 6 schematically illustrates an asymmetric multiprocessing apparatus with an asymmetric memory hierarchy
- FIG. 7 illustrates an architectural description
- FIG. 8 illustrates a communication requirement
- FIG. 9 illustrates communication support
- FIG. 1a shows a flow diagram illustrating a method according to an embodiment of the present invention.
- In a first step, a portion of a computer program comprising a list of sequential instructions and a program separation indicator, indicating a point where the sequential instructions may be divided to form separate sections that are capable of being separately executed, is analysed. The analysis determines whether the sequential instructions can be split at the point indicated by the separation indicator into separate sections that can be processed on different execution mechanisms. If it determines that they can, the sequential instructions are divided into the separate sections at the point indicated by the program separation indicator. If it determines that they cannot be separated at this point, a warning is output to the programmer to indicate an error in the program.
- FIG. 1b illustrates an alternative embodiment in which, rather than outputting a warning if the program cannot be decoupled and separated at the indicated point, the program is amended by inserting data communication instructions into the list of sequential instructions; these data communication instructions enable the different sections to be decoupled and thus separated. The separation can then be performed.
- FIG. 1c provides an alternative embodiment in which, in response to determining that the program cannot be separated at the indicated point, the two sections are merged together and the program separation indicator is removed.
- The three embodiments provide different responses to an analysis that determines that a program cannot be separated into sections at the point indicated. Different embodiments may be used in the same analysis of a program for different separation indicators, depending on circumstances. Thus, the preferred course of action may be to amend the program to make it separable at the indicated point; if this cannot be done, the two portions may be merged; or, if this would result in an unacceptably large portion, a warning may be output.
- FIG. 2a shows a portion of a computer program comprising a loop in which data items are processed: function f operates on the data items, function g operates on the data items output by function f, and function h then operates on the items output by g. These functions are performed n times in a row, for values of i from 1 to n.
- Control flow can be seen as following the solid arrows, while data flow follows the dotted arrows.
- Decouple (program separation) indications are inserted into the data flow where it is desirable to split the portion into sections that are decoupled from each other and can thus be executed on separate execution mechanisms.
- Here, a decouple indication is provided between the data processing operations f and g. This is equivalent to inserting a buffer in the data flow: the two sections are decoupled by providing a data store between them, so that function f can produce its results, which can then be accessed at a different time by function g.
- The program is then analysed to see whether it can indeed be decoupled at the point indicated by the separation indicators. If it can, the method proceeds to FIG. 2c. If it cannot, a warning may be output to the programmer, the program may be amended to enable it to be decoupled at this point, or the decouple indication may be removed from the program and the two sections merged.
- FIG. 2c shows how the separate sections of the program are decoupled by the insertion of “put” and “get” instructions into the data stream. As a result, the data generated by function f is put into a data store, from which it is retrieved by the get instruction to be processed by function g.
- The two sections of the program are thus in effect decoupled from each other and can be executed on separate execution mechanisms.
- For example, one of the functions may be suitable for processing by an accelerator, in which case it can be directed to the accelerator while the other portion is processed by, say, the CPU of the apparatus.
- Splitting the program results in the control code being duplicated in both sections, while the data processing code is different in each section.
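The split of FIG. 2 can be sketched as follows, assuming Python threads stand in for the separate execution mechanisms and `queue.Queue` stands in for the data store. The bodies of `f`, `g` and `h` are hypothetical placeholders. The loop control is duplicated in both sections; the put follows f, and the get precedes g.

```python
import queue
import threading


def f(v):
    return v + 1  # hypothetical stand-in for function f


def g(v):
    return v * 3  # hypothetical stand-in for function g


def h(v):
    return v - 2  # hypothetical stand-in for function h


def original(data):
    out = []
    for i in range(len(data)):      # control code
        a = f(data[i])              # data processing code
        b = g(a)
        out.append(h(b))
    return out


def split(data):
    fifo = queue.Queue()            # data store inserted at the decouple point
    out = []

    def section_1():
        for i in range(len(data)):  # duplicated control code
            fifo.put(f(data[i]))    # "put" after f

    def section_2():
        for _ in range(len(data)):  # duplicated control code
            a = fifo.get()          # "get" before g
            out.append(h(g(a)))

    threads = [threading.Thread(target=section_1),
               threading.Thread(target=section_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Because the FIFO preserves order and each section runs the same loop, `split(data)` returns the same list as `original(data)`, for example [4, 7, 10] for [1, 2, 3].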
- The put and get operations used in FIG. 2c can be used in programs for both scalar and non-scalar values, but they are inefficient for large (non-scalar) values because they require a memory copy.
- A different embodiment of the invention applies this idea to the channel interface by replacing the simple ‘put’ operation with two functions: put_begin obtains the address of the next free buffer in the channel, and put_end makes this buffer available to readers of the channel.
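A minimal sketch of the put_begin/put_end idea, assuming a hypothetical single-producer, single-consumer `ZeroCopyChannel` with preallocated buffers. In Python, "the address of the next free buffer" becomes a reference to a `bytearray`. The reader-side `get_begin`/`get_end` pair is an assumption added for symmetry and is not named in the text.

```python
import threading


class ZeroCopyChannel:
    """Hypothetical single-producer/single-consumer channel whose buffers
    are preallocated, so values are written in place instead of copied."""

    def __init__(self, nbufs, size):
        self._bufs = [bytearray(size) for _ in range(nbufs)]
        self._free = threading.Semaphore(nbufs)  # buffers the writer may fill
        self._full = threading.Semaphore(0)      # buffers ready for the reader
        self._write = 0
        self._read = 0

    def put_begin(self):
        # Wait for a free buffer, then hand out the buffer itself
        # (the Python analogue of "the address of the next free buffer").
        self._free.acquire()
        return self._bufs[self._write]

    def put_end(self):
        # Make the buffer just filled available to readers of the channel.
        self._write = (self._write + 1) % len(self._bufs)
        self._full.release()

    def get_begin(self):
        # Hypothetical reader-side counterpart: wait for a filled buffer.
        self._full.acquire()
        return self._bufs[self._read]

    def get_end(self):
        # Hypothetical reader-side counterpart: return the buffer for reuse.
        self._read = (self._read + 1) % len(self._bufs)
        self._free.release()
```

A producer calls `put_begin()`, fills the returned bytearray in place and calls `put_end()`. The consumer's `get_begin()` then returns that same bytearray object, so no memory copy of the payload is made.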
- Sequences of code such as:
- FIGS. 3a and 3b schematically illustrate the program code shown in FIG. 2.
- Here, a data store is provided to decouple functions f and g, but one is not provided between g and h.
- Analysis of the program to decouple it is performed automatically, and several potential sections are provided; in this case these are loops containing functions f, g and h. The automatic analysis then checks that each loop can be executed separately and, in this case, identifies a missing data path between functions g and h. These two functions are therefore remerged to provide two sections with a data path between them.
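The check-and-remerge step for FIG. 3 can be pictured with the following sketch. It assumes each operation is already labelled with a proposed section, and each data dependency is marked according to whether it passes through a data store. Any dependency that crosses sections without a store is a missing data path, so the two sections are merged. `check_and_merge` and its input format are hypothetical.

```python
def check_and_merge(section_of, deps):
    """section_of: maps each operation to a proposed section id.
    deps: list of (producer, consumer, through_store) data dependencies.
    Returns a new mapping in which sections joined by a dependency that
    does not pass through a data store have been merged."""
    section_of = dict(section_of)
    changed = True
    while changed:
        changed = False
        for src, dst, through_store in deps:
            s, d = section_of[src], section_of[dst]
            if s != d and not through_store:
                # Missing data path between two sections: remerge them.
                for op, sec in section_of.items():
                    if sec == d:
                        section_of[op] = s
                changed = True
    return section_of
```

For FIG. 3, proposing one loop each for f, g and h, with a store only between f and g, leaves f on its own and merges h into g's section. This gives the two sections the text describes.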
- A further example of a code fragment that can be split by an embodiment of the present invention is provided below. Since communication lies at the boundaries between threads, the compiler's job is to “color in” the code that lies between the boundaries. This is done through a dependency analysis that decides which operations are on the “producer” side of a channel and which are on the “consumer” side. The compiler then partitions the operations according to that analysis and generates a separate thread for each equivalence class.
- The PIPELINE annotation on line 1 identifies the region of code to be split into threads.
- The FIFO annotation on line 5 identifies that the communication between threads is to be performed between f and g.
- The compiler performs a data and control flow analysis to determine that the call to g has a data dependency on the FIFO operation and also has control dependencies on the if statement (line 4) and the for loop (line 2). This results in the following thread:
- The data and control flow analysis also determines that the FIFO operation (line 5) has a data dependency on the call to f (line 3) and also has control dependencies on the if statement (line 4) and the for loop (line 2). This results in the following thread:
- Decoupling must make two essential decisions: “What variables and operations to replicate?” and “What operations to place in the same thread?”.
- The task of decoupling is to split the region of code into as many threads as possible, without introducing timing-dependent behaviour, using channels to communicate between threads.
- The generated threads do not strictly partition the statements in the original code: some variables and operations (principally those used for control) are duplicated between threads.
- The choice of what to duplicate is an essential part of the transformation: if too much code or data is duplicated, the transformed program can run slower and use much more memory than the original program. While these decisions could be controlled using annotations on every variable and statement, some simple default rules can be used that give the programmer control without requiring excessive annotation.
- The variables to be duplicated are determined by the location of their declaration (variables declared inside the PIPELINE annotation may be duplicated) and by their size (scalar variables may be duplicated). Operations other than function calls may be duplicated unless they have side-effects or modify a non-duplicable variable.
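The default duplication rules above can be written as two small predicates. This sketch assumes that a variable must satisfy both conditions (declared inside PIPELINE and scalar) to be duplicable. The `Variable` and `Operation` records are hypothetical stand-ins for a compiler's intermediate representation.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    declared_in_pipeline: bool
    is_scalar: bool


@dataclass
class Operation:
    text: str
    is_function_call: bool = False
    has_side_effects: bool = False
    modifies: list = field(default_factory=list)  # names of variables written


def duplicable_variables(variables):
    # Declared inside the PIPELINE region and scalar (assumed: both needed).
    return {v.name for v in variables if v.declared_in_pipeline and v.is_scalar}


def is_duplicable(op, dup_vars):
    # Not a function call, no side effects, and writes only duplicable variables.
    return (not op.is_function_call
            and not op.has_side_effects
            and all(name in dup_vars for name in op.modifies))
```

Under these rules a loop counter update such as `i++` on a scalar declared in the region is duplicated into every thread. A function call, or a write to an array or to a variable declared outside the region, stays in exactly one thread.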
- The dependency analysis stage forms a large number of “prethreads” by computing a backward data and control slice (see Mark Weiser, “Program slicing”, in ICSE '81: Proc. of International Conference on Software Engineering, 439-449, 1981) from each unduplicated operation, ignoring data dependencies on FIFO operations but including all duplicable and unduplicable operations in the slice. That is, we repeatedly apply rules (1) and (2) to form prethreads. In our running example, there are three prethreads: one each for f( ), FIFO(2,x) and g(x).
- The prethread for f( ) is:
- The merging stage combines prethreads by merging those that contain the same non-duplicable operation or variable. For example, the prethread for f( ) is merged with the prethread for FIFO(2,x) because they both contain the operation f( ), resulting in the prethread:
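The slicing and merging stages can be sketched over an explicit dependency graph. This is an illustration, not the compiler's implementation. Rules (1) and (2) are not reproduced in this text, so a plain transitive backward slice stands in for them. The small graph used in the usage note is a hypothetical example in which the if-condition does not depend on f( ); it is not the patent's running example, whose code is not shown here.

```python
def backward_slice(op, deps, ignored):
    """Operations that `op` transitively depends on (data or control),
    including `op` itself, skipping the dependency edges in `ignored`."""
    seen, stack = {op}, [op]
    while stack:
        cur = stack.pop()
        for dep in deps.get(cur, ()):
            if (cur, dep) not in ignored and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def form_threads(ops, deps, duplicable, fifo_data_edges):
    # Dependency analysis: one prethread per unduplicated operation,
    # ignoring data dependencies on FIFO operations.
    prethreads = [backward_slice(op, deps, fifo_data_edges)
                  for op in ops if op not in duplicable]
    # Merging: combine prethreads that share a non-duplicable operation,
    # repeating until no two prethreads overlap outside `duplicable`.
    merged = True
    while merged:
        merged = False
        for i in range(len(prethreads)):
            for j in range(i + 1, len(prethreads)):
                if (prethreads[i] & prethreads[j]) - duplicable:
                    other = prethreads.pop(j)
                    prethreads[i] |= other
                    merged = True
                    break
            if merged:
                break
    return prethreads
```

With operations `for`, `if`, `f()`, `FIFO(2,x)` and `g(x)`, where `for` and `if` are duplicable and the data edge from `g(x)` to `FIFO(2,x)` is ignored, the three prethreads collapse into two threads: {for, if, f(), FIFO(2,x)} and {for, if, g(x)}. The loop and if control appear in both.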
- The thread production stage converts prethreads to threads by inserting channel declarations and initialization, privatizing duplicable variables, replacing FIFO operations with fifo_put operations, and inserting a fifo_get operation in every thread that contains an operation dependent on a FIFO operation. If multiple threads contain operations dependent on the same FIFO operation, a separate channel has to be introduced for each fifo_get operation, and the FIFO operation is replaced with a fifo_put operation on each channel.
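The last rule can be sketched as a fan-out, assuming two hypothetical consumer threads (here called g and k) that both depend on the same FIFO operation. Each consumer gets its own channel, and the single FIFO operation becomes a fifo_put on every channel. With one shared channel, each value would be removed by whichever consumer called get first, so the other consumer would miss it.

```python
import queue
import threading


def fifo_put_each(channels, value):
    # The original FIFO operation becomes one fifo_put per consumer channel.
    for ch in channels:
        ch.put(value)


def run(values):
    to_g = queue.Queue()  # channel for the hypothetical thread that calls g
    to_k = queue.Queue()  # channel for the hypothetical thread that calls k
    results = {"g": [], "k": []}

    def producer():
        for v in values:
            fifo_put_each([to_g, to_k], v)

    def consumer(name, channel, fn):
        for _ in values:  # duplicated loop control
            results[name].append(fn(channel.get()))  # this thread's fifo_get

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer, args=("g", to_g, lambda x: x + 1)),
        threading.Thread(target=consumer, args=("k", to_k, lambda x: x * 10)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each consumer sees every value in order, for example `run([1, 2, 3])` yields {"g": [2, 3, 4], "k": [10, 20, 30]}.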
- The problem can be fixed by moving the FIFO operation to before the if-statement, or by arranging to pass the if-condition through a channel by changing the if-statement to if (FIFO(2,x>0)) { ... }.
- Decoupling can be used with all channel types except non-blocking FIFOs.
- FIG. 4 shows a further example of how an original piece of code can be split into two threads, to be executed in parallel, using put and get instructions.
- Parallelizing at a coarse granularity allows more control code to be duplicated between threads, which reduces and simplifies inter-thread communication and allows distributed schedules to be generated. That is, we can distribute the control code across multiple processors both by putting each control thread on a different processor and by putting different parts of a single control thread onto different processors.
- The transfer of data may be done by writing the data to a particular buffer, such as a FIFO. Alternatively, it may simply be done by providing the other section of the program with information about where the data has been stored.
- The way of transferring the data depends on the system the program is executing on.
- If the architecture does not have shared memory, it is necessary to insert DMA copies from a buffer in one memory to a buffer in a different memory. This can lead to many changes in the code: declaring both buffers, performing the copy, etc.
- An analysis is therefore performed to determine which buffers need to be replicated in multiple memory regions and exactly which form of copy should be used.
- DMA copies are also inserted automatically, subject to some heuristics, when the benefit of having the programmer make the decision is too small.
- Atomic channels provide atomic access to an element: an atomic_get operation acquires a copy of the element and makes the element unavailable to other threads (i.e., it “locks” the variable) and an atomic_put operation makes the variable available for use by other threads (i.e., it “unlocks” the variable).
- Atomic channels are equivalent to a fifo channel of maximum length 1.
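Taking the equivalence above literally, an atomic channel can be sketched as a one-slot queue. This assumes the element starts out available; the text does not say how it is initialised. While one thread holds the element between `atomic_get` and `atomic_put`, the slot is empty and other `atomic_get` calls block, which gives the lock/unlock behaviour. The class and helper names are hypothetical.

```python
import queue
import threading


class AtomicChannel:
    """Atomic channel modelled as a FIFO channel of maximum length 1."""

    def __init__(self, initial):
        self._slot = queue.Queue(maxsize=1)
        self._slot.put(initial)  # assumption: the element starts out available

    def atomic_get(self):
        # Take the element; until it is put back, other threads calling
        # atomic_get block ("lock").
        return self._slot.get()

    def atomic_put(self, value):
        # Make the (possibly updated) element available again ("unlock").
        self._slot.put(value)


def parallel_increment(n_threads, times):
    # Hypothetical usage: several threads update a shared counter safely.
    counter = AtomicChannel(0)

    def worker():
        for _ in range(times):
            value = counter.atomic_get()
            counter.atomic_put(value + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.atomic_get()
```

Because every read-modify-write happens while the slot is empty, no increment is lost. Four threads doing 1000 increments each end at 4000.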
- Nonblocking put and get channels are a variant on fifo channels where the nbpfifo_put operation returns an error code if the channel is full instead of blocking as fifo channels do. These channels are for use in interrupt handlers since it is possible to block a thread but not an interrupt. We also provide channels that provide a non-blocking nbgfifo_get operation.
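A sketch of the nonblocking put, assuming hypothetical error codes `OK` and `FULL` (the text says only that an error code is returned). The put side, which would be called from an interrupt handler, uses `put_nowait` and reports a full channel instead of blocking. The ordinary thread on the get side keeps a blocking get.

```python
import queue

OK, FULL = 0, -1  # hypothetical error codes


class NbpFifo:
    """Sketch of a FIFO channel whose put operation never blocks."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def nbpfifo_put(self, value):
        # Safe to call from an interrupt handler: never blocks, reports FULL.
        try:
            self._q.put_nowait(value)
            return OK
        except queue.Full:
            return FULL

    def fifo_get(self):
        # Ordinary blocking get, used by a normal thread.
        return self._q.get()
```

A caller that receives `FULL` must decide what to do with the data, for example drop it or retry on the next interrupt, since it cannot wait for the reader.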
- Timed channels provide time-indexed access to data.
- With timed channels, a timestamp is specified on each put and get:
- The ts_get operation returns the entry whose timestamp is closest to the one specified. All ts_put operations must use strictly increasing times, and all ts_get operations must use strictly increasing times. This restriction allows entries to be discarded once they can no longer be accessed.
- Timed channels allow for more parallelism between threads since, after the first ts_put is performed, ts_get operations never block because there is always an entry with a closest timestamp.
- The cost of this increased performance is less precise synchronization between threads than with FIFO channels: applications that use timed channels are unlikely to give deterministic results.
- Timed channels are useful for handling time-sensitive information where it is important to use current data.
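The timed-channel rules can be sketched as a single-threaded `TimedChannel` class (a hypothetical name). A real channel would block in `ts_get` until the first `ts_put`; this sketch raises instead. Ties between two equally close entries are assumed to go to the later entry. The discard step relies on the strictly increasing get times: once an entry is returned, any older entry is farther from every later query time.

```python
import bisect


class TimedChannel:
    """Sketch of a timed channel: entries are indexed by timestamp and
    ts_get returns the entry whose timestamp is closest to the one given."""

    def __init__(self):
        self._times = []   # strictly increasing put timestamps
        self._values = []
        self._last_put = None
        self._last_get = None

    def ts_put(self, t, value):
        if self._last_put is not None and t <= self._last_put:
            raise ValueError("ts_put times must be strictly increasing")
        self._last_put = t
        self._times.append(t)
        self._values.append(value)

    def ts_get(self, t):
        if self._last_get is not None and t <= self._last_get:
            raise ValueError("ts_get times must be strictly increasing")
        if not self._times:
            # A real channel would block here until the first ts_put.
            raise LookupError("no entry has been put yet")
        self._last_get = t
        i = bisect.bisect_left(self._times, t)
        if i == len(self._times):
            k = i - 1
        elif i == 0:
            k = 0
        elif self._times[i] - t <= t - self._times[i - 1]:
            k = i  # ties go to the later, more current entry
        else:
            k = i - 1
        # Later ts_get times are larger, so entries older than the one
        # returned can never be closest again: discard them.
        del self._times[:k]
        del self._values[:k]
        return self._values[0]
```

After the first `ts_put`, `ts_get` always finds some entry, which is why reads never wait. The same entry can also be returned to several consecutive gets, which is the source of the non-determinism noted above.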
- For example, mobile telephones implementing the “3rd generation” W-CDMA protocol use rake receivers to increase the bandwidth of links subject to multipath fading (i.e., where the signal contains delayed echoes of the transmitted signal, typically due to reflections off buildings).
- Rake receivers estimate the strength and delay of these reflections and use these estimates to combine delayed copies of the received signal to maximize the amount of received signal energy.
- The best estimate to use is the one closest in time to the arrival of the data, which may be different from the next estimate generated.
- Timed channels are an example of a channel type which makes sense in some domains or applications but not in others. Rather than fix the set of channel types in the language, our compiler allows new channel types to be added using annotations to identify channel types and put and get operations.
- The only properties on which SoC-C relies are that channel operations are atomic, directional and obey copy semantics. That is, put operations atomically copy data into a channel and get operations atomically copy data out of a channel.
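The three properties can be illustrated with a small channel in Python. This is not SoC-C's annotation mechanism for declaring channel types, which is not shown in this text. The hypothetical `CopyFifo` makes each operation atomic with a condition variable, is directional (put on one end, get on the other) and stores its own deep copy of each value. As a result, the producer's later changes to its data are not visible to the consumer.

```python
import copy
import threading
from collections import deque


class CopyFifo:
    """Sketch of a channel with the three properties: atomic (each operation
    holds a lock), directional (put on one end, get on the other) and copy
    semantics (the channel keeps its own copy of every value)."""

    def __init__(self):
        self._items = deque()
        self._ready = threading.Condition()

    def put(self, value):
        # Copy in: later changes by the producer do not affect the channel.
        snapshot = copy.deepcopy(value)
        with self._ready:
            self._items.append(snapshot)
            self._ready.notify()

    def get(self):
        with self._ready:
            while not self._items:
                self._ready.wait()
            # The stored copy is private to the channel and is removed here,
            # so handing it over is equivalent to copying it out.
            return self._items.popleft()
```

Any new channel type that preserves these three properties, such as the timed or nonblocking variants, can be used in the same way.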
- FIG. 5 a shows a simple computer program annotated according to an embodiment of the present invention. An analysis of this program is performed initially. In this embodiment, parts of the program are identified by programmer annotation, although they could be identified by some other analysis, including static analysis, profile-driven feedback, etc. The parts identified are as follows:
- the “replicatable objects”, that is, variables and operations which it is acceptable to replicate.
- a simple rule of thumb is that scalar variables (i.e., not arrays) which are not used outside the scope, scalar operations which only depend on and only modify replicatable variables, and control flow operations should be replicated but more sophisticated policies are possible.
- the algorithm splits the operations in the scope into a number of threads whose execution will produce the same result as the original program under any scheduling policy that respects the FIFO access ordering of the channels used to communicate between threads.
- the particular decoupling algorithm used generates a maximal set of threads such that the following properties hold:
- FIG. 5 b shows the maximal set of threads for the program of FIG. 5 a .
- One way to generate the set of threads shown in FIG. 5 b is as follows:
- Each replicatable variable must be initialized at the start of each thread with the value of the original variable before entering the scope and one of the copies of each replicatable variable should be copied back into the master copy on leaving the scope. (Executing all these protothreads is highly unlikely to give the same answer as the original program, because it lacks the necessary synchronization between threads. This is fixed by the next steps.)
- Another way is to pick an operation and identify all the operations which must be in the same thread as that operation, by repeatedly adding operations which would be merged (in step 2 above). Then pick the next operation not yet assigned to a thread and add all operations which must be in the same thread as that operation. Repeat until every non-replicatable operation has been assigned to a thread. This is just one possible way of tackling the problem. Essentially, we are forming equivalence classes based on a partial order, and there are many other known ways to do this.
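- One of the "other known ways" of forming these equivalence classes is a union-find (disjoint-set) structure. The Python sketch below assumes that the "must be in the same thread" pairs have already been derived by the merge rules. Deriving those pairs is not shown, and the operation names in the usage check are hypothetical.

```python
def group_into_threads(operations, same_thread_pairs):
    """Partition operations into threads (equivalence classes) using union-find.

    operations: hashable ids of the non-replicatable operations.
    same_thread_pairs: (a, b) pairs that must share a thread (assumed given).
    """
    parent = {op: op for op in operations}

    def find(op):
        # Path halving keeps the union-find trees shallow.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in same_thread_pairs:
        union(a, b)

    threads = {}
    for op in parent:            # dicts preserve insertion order
        threads.setdefault(find(op), []).append(op)
    return list(threads.values())
```

Each resulting group would become one thread. An operation that appears in no pair ends up in a thread of its own.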
- the above method splits a program into a number of sections which can be executed in parallel. There are many possible mechanisms that can be used to accomplish this task.
- FIG. 6 shows a flow diagram of a method according to an embodiment of the present invention.
- An initial step of the method comprises analysing the computer program, which contains sequential code and program separation indicators. For each program separation indicator, the program is analysed to determine how it can be divided into separate sections around that indicator. First, it is checked that it is reasonable to divide the program there. If it can be divided at this point, data communication between the two sections is provided. This may be done in a number of ways, including the insertion of put and get instructions as discussed earlier. The program is then analysed to determine whether there are further program separation indicators. If there are, the program is analysed at each such point to determine whether it can be divided there. If it cannot be divided in its present state, it is checked to determine whether further data transfer instructions are required to divide it.
- Program separation indicators can take a number of forms. They may simply be a marker, such as a split, indicating that the program needs to be divided there. Alternatively, they may be data transfer functions which themselves provide data communication between the two sections. Thus, they may be fifo instructions indicating that data should be sent through a FIFO between the two sections, as discussed earlier, or they may be put and get instructions. Several variables may need to be transferred between the two sections while the program does not contain sufficient data transfer instructions for all of them. In such a case, further data transfer instructions can be inserted for the variables that have not been addressed. Data communication between the two sections is then provided, and the sections can be separated for execution on separate execution mechanisms.
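- The Python sketch below illustrates the step described above of inserting missing data transfers around a single split indicator. It assumes a toy program representation (a hypothetical `Stmt` record with def/use sets) and a hypothetical `SPLIT` marker. It sends each variable over its own hypothetical channel `ch_<var>`. Any variable written before the split and read after it gets a put/get pair, unless an explicit put for it already exists. The check that dividing at this point is reasonable is omitted. Treating every use after the split as needing the value is conservative.

```python
from dataclasses import dataclass

SPLIT = "SPLIT"  # hypothetical program separation indicator


@dataclass
class Stmt:
    text: str
    defs: frozenset = frozenset()   # variables written by this statement
    uses: frozenset = frozenset()   # variables read by this statement


def split_at_indicator(program):
    """Split at the SPLIT marker and insert put/get pairs for live variables."""
    idx = next(i for i, s in enumerate(program) if s.text == SPLIT)
    first, second = program[:idx], program[idx + 1:]
    # Variables the programmer already sends with an explicit put.
    already_sent = {v for s in first if s.text.startswith("put(") for v in s.uses}
    defined = set().union(*(s.defs for s in first))
    needed = set().union(*(s.uses for s in second))
    missing = sorted((defined & needed) - already_sent)
    puts = [Stmt(f"put(ch_{v}, {v})", uses=frozenset({v})) for v in missing]
    gets = [Stmt(f"{v} = get(ch_{v})", defs=frozenset({v})) for v in missing]
    return first + puts, gets + second
```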
- the compilation tools can take a program that is either sequential or contains few threads and map it onto the available hardware, introducing parallelism in the process.
- the task of programming a SoC is to map different parts of an application onto different parts of the hardware.
- blocks of code must be mapped onto processors, data engines, accelerators, etc. and data must be mapped onto various memories.
- the mapping process is both tedious and error-prone because the mappings must be consistent with each other and with the capabilities of the hardware. We reduce these problems using program analysis which:
- the number of legal mappings is usually large but once the programmer has made a few choices, the number of legal options usually drops significantly so it is feasible to ask the programmer to make a few key choices and then have the tool fill in the less obvious choices automatically.
- Programmable accelerators may have limited program memory so it is desirable to upload new code while old code is running. For correctness, we must guarantee that the new code is uploaded (and I-caches made consistent) before we start running it.
- Our compiler uses program analysis to check this and/or to schedule uploading of code at appropriate places.
- For applications with highly variable load, it is desirable to have multiple mappings of an application and to switch dynamically between different mappings.
- annotations which provide the semantics we want.
- the primary annotations are on data and on code. If a tag is repeated, it indicates alternative mappings.
- the tags associated with data include:
- the tags associated with code include:
- processor P1 is to execute fft followed by P1.
- the semantics is similar to that of a synchronous remote procedure call: when control reaches this code, free variables are marshalled and sent to processor P1, processor P1 starts executing the code and the program continues when the code finishes executing.
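- The synchronous remote-procedure-call semantics can be modelled in Python as follows. This is only a model, not the SoC-C implementation. A hypothetical `Processor` is represented by a worker thread with a request queue. `copy.deepcopy` stands in for marshalling the free variables, and the caller blocks until the code has finished running.

```python
import copy
import queue
import threading


class Processor:
    """Hypothetical processing element that executes code on request."""

    def __init__(self, name):
        self.name = name
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, name=name, daemon=True).start()

    def _serve(self):
        while True:
            func, args, reply = self._requests.get()
            try:
                reply.put((True, func(*args)))
            except Exception as exc:       # forward errors to the caller
                reply.put((False, exc))

    def call(self, func, *args):
        """Synchronous RPC: marshal args, run func remotely, wait for the result."""
        reply = queue.Queue(maxsize=1)
        # deepcopy stands in for marshalling free variables to the processor.
        self._requests.put((func, copy.deepcopy(args), reply))
        ok, value = reply.get()            # caller blocks until func finishes
        if not ok:
            raise value
        return value
```

Because the arguments are copied, code running on the processor cannot modify the caller's data. This reflects the copy semantics described for channels.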
- the tags associated with functions are:
- An error such as mapping a piece of code to a fixed-function accelerator that does not support that function should probably just be reported as an error that the programmer must fix.
- Errors such as omitting synchronization can sometimes be fixed by automatically inserting synchronization. Errors such as mapping more variables to a memory bank than will fit can be solved, to some extent, using overlay techniques. Errors such as mapping an overly large variable to a memory can be resolved using software-managed paging. This may need hardware support, or it may require that the kernel be compiled with software paging turned on (note: software paging is fairly unusual, so it has to be implemented before it can be turned on). Errors such as omitting memory barriers, cache flush/invalidate operations or DMA transfers can always be fixed automatically. However, heuristics may be needed to insert these operations efficiently, and in some cases it is more appropriate to ask the programmer to fix the problem.
- Our compiler uses information about the SoC architecture, extracted from the architecture description, to determine how to implement the communication requirements specified within the program. This enables it to generate the glue code necessary for communication to occur efficiently and correctly. This can include generation of memory barriers, cache maintenance operations, DMA transfers and synchronisation on different processing elements.
- This automation reduces programming complexity, increases reliability and flexibility, and provides a useful mechanism for extended debugging options.
- RPCs Remote Procedure Calls
- RPC abstraction can be expressed as functions mapped to particular execution mechanisms:
- This provides a simple mechanism to express invocation of functions, and the associated resourcing, communication and synchronisation requirements.
- Code can be translated to target the selected processing elements, providing the associated synchronisation and communication. For example, this could include checking the resource is free, configuring it, starting it and copying the results on completion.
- the compiler can select appropriate glue mechanisms based on the source and target of the function call. For example, an accelerator is likely to be invoked primarily by glue on a processor using a mechanism specific to the accelerator.
- the glue code may be generated automatically based on a high level description of the accelerator or the programmer may write one or more pieces of glue by hand.
- the processor on which the operation runs can be determined statically or dynamically. For example, if there are two identical DMA engines, one might indicate that the operation can be mapped onto either engine, depending on which is available first.
- the compiler optimisations based on the desired RPC interface can range from a dynamically linked interface to inter-procedural specialisation of the particular RPC interface.
- RPC calls may be synchronous or asynchronous.
- Asynchronous calls naturally introduce parallelism, while synchronous calls are useful as a simpler function call model, and may be used in conjunction with fork-join parallelism. In fact, parallelism is not necessary for efficiency; a synchronous call alone can get the majority of the gain when targeting accelerators. Manually and automatically selecting between asynchronous and synchronous options can benefit debugging, tracing and optimisation.
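- The difference between synchronous and asynchronous calls, and their use with fork-join parallelism, can be sketched with Python's `concurrent.futures`. Each hypothetical processing element (`dsp`, `accel`) is modelled as a single-worker executor. The kernels `scale` and `checksum` are illustrative placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Single-worker executors: calls to one element run one at a time, in order.
dsp = ThreadPoolExecutor(max_workers=1, thread_name_prefix="dsp")
accel = ThreadPoolExecutor(max_workers=1, thread_name_prefix="accel")


def scale(block, k):
    return [k * x for x in block]


def checksum(block):
    return sum(block) % 256


def call_sync(element, func, *args):
    """Synchronous RPC: behaves like an ordinary function call."""
    return element.submit(func, *args).result()


def call_async(element, func, *args):
    """Asynchronous RPC: returns a Future at once, introducing parallelism."""
    return element.submit(func, *args)


def process(block):
    scaled = call_sync(dsp, scale, block, 2)
    # Fork: two independent calls may run concurrently on different elements...
    f1 = call_async(accel, checksum, scaled)
    f2 = call_async(dsp, max, scaled)
    # ...join: wait for both results.
    return f1.result(), f2.result()
```

Changing a `call_async` followed immediately by `.result()` into a `call_sync` does not change the result. This is one reason switching between the two modes can be useful when debugging or tracing.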
- RPC calls may be re-entrant or non-reentrant, and these decisions can be made implicitly, explicitly or through program analysis to provide benefit such as optimisation where appropriate.
- This mechanism enables a particular function to have a number of different execution targets within a program, but each of those targets can be associated back to the original function; debugging and trace can exploit this information.
- This enables a user to set a breakpoint on a particular function and have the debug and trace mechanisms arranged so that the breakpoint is caught wherever the function executes, or only on a restricted subset (e.g. a particular processing element).
- RPC interface implementation can be abstracted away in some debugging views.
- the datatypes are often bulk datastructures such as arrays of data, multimedia data, signal processing data, network packets, etc. and the operations may be executed with some degree of parallelism on a coprocessor, DSP processor, accelerator, etc. It is therefore possible to view programs as a series of often quite coarse-grained operations applied to quite large data structures instead of the conventional view of a program as a sequence of ‘scalar’ operations (like ‘32 bit add’) applied to ‘scalar’ values like 32-bit integers or the small sets of values found in SIMD within a register (SWAR) processing such as that found in NEON. It is also advantageous to do so because this coarse-grained view can be a good match for accelerators found in modern SoCs.
- SWAR register
- optimization techniques known to work on fine-grained operations and data can be adapted to operate on coarse-grained operations and data.
- Our compiler understands the semantics associated with the data structures and their use within the system, and can manipulate them and the program to perform transformations and optimisations to enable and optimise execution of the program.
- this might be generated automatically from a precise description of the operation (including the implementation of the operation) or it might be generated from an approximate description of the main effects of the operation or it might be provided as a direct annotation.
- compilers do something similar for scalar variables: the value of a scalar variable ‘x’ might sometimes live on the stack, sometimes in register 3, sometimes in register 6, etc. and the compiler keeps track of which copies currently contain the live value.
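- The same bookkeeping can be applied to coarse-grained data structures that may have copies in several memories. The Python sketch below is illustrative only. The memory names and the `CopyTracker` class are hypothetical. A write makes one memory the sole holder of the live value, and a read from a memory without a valid copy records a transfer (for example, a DMA) that the compiler would have to emit.

```python
class CopyTracker:
    """Track which memories hold an up-to-date copy of each data structure."""

    def __init__(self):
        self.valid = {}          # data structure name -> set of memory names
        self.transfers = []      # (name, src, dst) copies the compiler must emit

    def write(self, name, memory):
        # A write leaves exactly one location holding the live value.
        self.valid[name] = {memory}

    def read(self, name, memory):
        copies = self.valid.get(name)
        if not copies:
            raise KeyError(f"{name} has no live value")
        if memory not in copies:
            src = sorted(copies)[0]      # pick a valid copy deterministically
            self.transfers.append((name, src, memory))
            copies.add(memory)           # the destination is now valid too
```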
- the compiler can provide improved memory allocation through memory reuse because it can identify opportunities to place two different variables in the same memory location. Indeed, one can use many algorithms normally used for register allocation (where the registers contain scalar values) to perform allocation of data structures. One modification required is that one must handle the varying size of buffers whereas, typically, all scalar registers are the same size.
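- The Python sketch below shows one way to handle varying buffer sizes. It uses a simple greedy first-fit packing of live intervals, in the spirit of linear-scan register allocation. It is an assumption for illustration, not the compiler's actual allocator. Buffers whose lifetimes do not overlap may share the same offsets.

```python
def allocate(buffers):
    """Assign offsets so buffers with overlapping lifetimes never overlap in memory.

    buffers: list of (name, size, start, end); a buffer is live in [start, end).
    Returns (offsets, total_bytes_needed).
    """
    placed = []    # (start, end, offset, size) of buffers already assigned
    offsets = {}
    for name, size, start, end in sorted(buffers, key=lambda b: b[2]):
        # Memory regions used by placed buffers whose lifetimes overlap this one.
        busy = sorted((off, off + sz) for s, e, off, sz in placed
                      if s < end and start < e)
        offset = 0
        for lo, hi in busy:        # first fit: lowest gap that is big enough
            if offset + size <= lo:
                break
            offset = max(offset, hi)
        offsets[name] = offset
        placed.append((start, end, offset, size))
    total = max((off + sz for _, _, off, sz in placed), default=0)
    return offsets, total
```

For buffers `a` (100 bytes, live in [0, 2)), `b` (50 bytes, [1, 3)) and `c` (80 bytes, [2, 4)), buffer `c` reuses the space of `a`. The total therefore falls from 230 to 150 bytes.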
- Compiler books list many other standard transformations that can be performed on scalar code. Some of the mapping and optimisation techniques that can be applied at the coarse grain we discuss include value splitting, spilling, coalescing, dead variable removal, recomputation, loop hoisting and common subexpression elimination (CSE).
- Data structures will be passed as arguments, possibly as part of an ABI. Optimisations such as specialisation and not conforming to the ABI when it is not exposed can be applied.
- Our compiler supports a variety of code generation strategies which allow the parallelized control code to run on a control processor in a real time operating system, in interrupt handlers or in a polling loop (using ‘wait for event’ if available to reduce power) and it also supports distributed scheduling where some control code runs on one or more control processors, some control code runs on programmable accelerators, some simple parts of the code are implemented using conventional task-chaining hardware mechanisms. It is also possible to design special ‘scheduler devices’ which could execute some parts of the control code. The advantage of not running all the control code on the control processor is that it can greatly decrease the load on the control processor.
- the basic decoupling algorithm splits a block of code into a number of threads that pass data between each other via FIFO channels.
- the algorithm requires us to identify (by programmer annotation or by some other analysis including static analysis, profile driven feedback, etc.) the following parts of the program:
- the algorithm splits the operations in the scope into a number of threads whose execution will produce the same result as the original program under any scheduling policy that respects the FIFO access ordering of the channels used to communicate between threads.
- the put and get operations used when decoupling can be used for both scalar and non-scalar values (i.e., both for individual values (scalars) and for arrays of values (non-scalars)). However, they are inefficient for large non-scalar values because they require a memory copy. Therefore, for coarse-grained decoupling, it is desirable to use an optimized mechanism to pass data between threads.
- the get operation is split into a get_begin and get_end pair
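- The Python sketch below shows how a split get avoids the memory copy. The channel circulates preallocated buffers by reference. The consumer reads a buffer in place between get_begin and get_end and then returns it for reuse. The producer-side put_begin/put_end pair and the class name `BufferChannel` are assumptions made by analogy.

```python
import queue


class BufferChannel:
    """FIFO channel that passes preallocated buffers by reference (no copying).

    put_begin/put_end are assumed here by analogy with get_begin/get_end.
    """

    def __init__(self, nbuffers, size):
        self._free = queue.Queue()
        self._full = queue.Queue()
        for _ in range(nbuffers):
            self._free.put(bytearray(size))

    def put_begin(self):
        return self._free.get()     # blocks until a buffer is free to fill

    def put_end(self, buf):
        self._full.put(buf)         # publish the filled buffer, in FIFO order

    def get_begin(self):
        return self._full.get()     # blocks until a filled buffer is available

    def get_end(self, buf):
        self._free.put(buf)         # consumer is done; recycle the buffer
```

The producer writes into the buffer returned by `put_begin`, and the consumer reads the same object. The only data moved through the queues are references.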
- the modified decoupling algorithm treats the puts and gets in much the same way that the standard algorithm treats data boundaries. Specifically, it constructs the maximal set of threads such that:
- the modified decoupling algorithm will produce:
- Writing code using explicit puts can also be performed as a preprocessing step. For example, we could transform:
- a First-In First-Out (FIFO) channel preserves the order of values that pass through it: the first value inserted is the first value extracted, the second value inserted is the second value extracted, etc.
- Other kinds of channel are possible including:
- Exclusive access can be arranged in several ways. For example, one may ‘acquire’ (aka ‘lock’) a ‘lock’ (aka ‘mutex’) before starting to access the resource and ‘release’ (aka ‘unlock’) the lock after using the resource. Exclusive access may also be arranged by disabling pre-emption (such as interrupts) while in a critical section (i.e., a section in which exclusive access is required). In some circumstances, one might also use a ‘lock free’ mechanism where multiple users may use a resource but at some point during use (in particular, at the end), they will detect the conflict, clean up and retry.
- Some examples of wanting exclusive access include having exclusive access to a hardware accelerator, exclusive access to a block of memory or exclusive access to an input/output device. Note that in these cases, it is usually not necessary to preserve the order of accesses to the resource.
- the basic decoupling algorithm avoids introducing race conditions by preserving all ordering dependencies on statements that access non-replicated resources. Where locks have been inserted into the program, the basic decoupling algorithm is modified as follows:
- Decoupling can be applied to any sequential section of a parallel program. If the section communicates with the parallel program, we must determine any ordering dependencies that apply to operations within the section (a safe default is that the order of such operations should be preserved). In other words, one of the nice properties of decoupling is that it interacts well with other forms of parallelization, including manual parallelization.
- the decoupling algorithm generates sections of code that are suitable for execution on separate processors but can be executed on a variety of different execution engines by modifying the “back end” of the compiler. That is, by applying a further transformation to the code after decoupling to better match the hardware or the context we wish it to run in.
- the most straightforward execution model is to execute each separate section in the decoupled program on a separate processor or, on a processor that supports multiple hardware contexts (i.e., threads), to execute each separate section on a separate thread.
- SoC System on Chip
- DSPs digital signal processors
- GPUs graphics processing units
- DMA direct memory access
- data engines programmable accelerators or fixed-function accelerators.
- This data processing can be modelled as a synchronous remote procedure call.
- a memory copy operation on a DMA engine can be modelled as a function call to perform a memory copy.
- the thread will typically:
- This mode of execution can be especially effective because one ‘control processor’ can keep a number of accelerators busy, with the control processor possibly doing little more than deciding which accelerator to start next and on what data. This mode of execution can be usefully combined with all of the following forms of execution.
- a thread library, such as an operating system (OS) or real time operating system (RTOS) running on one or more processors, can be used to execute the threads introduced by decoupling.
- OS operating system
- RTOS real time operating system
- transformations can be viewed as a way of transforming a thread into a state machine with each context switch point representing a state and the code that continues execution from each context switch point viewed as a transition function to determine the next state.
- Execution of transformed threads can be viewed as having been transformed to an event-based model where all execution occurs in response to external events such as responses from input/output devices or from accelerators. It is not necessary to transform all threads: event-based execution can coexist with threaded execution.
- Transforming threads as described above is also a good match for polling-based execution where the control processor tests for completion of tasks on a set of accelerators by reading a status register associated with each accelerator. This is essentially the same as interrupt-driven execution except that the state of the accelerators is updated by polling and the polling loop executes until all threads complete execution.
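- The Python sketch below shows a thread transformed into a state machine and driven by a polling loop. The thread `y = A(x); z = B(y)` has one state per context-switch point (waiting on A, waiting on B), and `step` is the transition function. `FakeAccelerator` is a hypothetical simulation: `tick` stands in for time passing in hardware, and `done` stands in for reading its status register.

```python
class FakeAccelerator:
    """Hypothetical accelerator with a pollable status register."""

    def __init__(self, name, latency):
        self.name, self.latency = name, latency
        self.remaining, self.result = 0, None

    def start(self, func, data):
        self.remaining, self.result = self.latency, func(data)

    def tick(self):                  # simulate hardware time passing
        if self.remaining:
            self.remaining -= 1

    def done(self):                  # the "status register"
        return self.remaining == 0


class PipelineThread:
    """The thread 'y = A(x); z = B(y)' rewritten as a state machine."""

    def __init__(self, acc_a, acc_b, x):
        self.acc_a, self.acc_b = acc_a, acc_b
        self.acc_a.start(lambda v: v + 1, x)
        self.state, self.output = "WAIT_A", None

    def step(self):
        # Transition function: run the code that follows the current wait point.
        if self.state == "WAIT_A" and self.acc_a.done():
            self.acc_b.start(lambda v: v * 2, self.acc_a.result)
            self.state = "WAIT_B"
        elif self.state == "WAIT_B" and self.acc_b.done():
            self.output, self.state = self.acc_b.result, "DONE"


def polling_loop(threads, accelerators):
    """Poll status registers until every transformed thread reaches DONE."""
    polls = 0
    while any(t.state != "DONE" for t in threads):
        for acc in accelerators:
            acc.tick()
        for t in threads:
            t.step()
        polls += 1
    return polls
```

In an interrupt-driven variant, `step` would be called from the completion interrupt handler instead of from the polling loop.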
- Distributed scheduling can be done in various ways. Some part of a program may be simple enough to be implemented as a simple state machine that schedules one invocation of an accelerator after another accelerator completes. Alternatively, a control processor can hand over execution of a section within a thread to another processor. In both cases, this can be viewed as an RPC-like mechanism (“{ foo( ); bar( )@P0; }@P1”). In the first case, one way to implement it is to first transform the thread to event-based form. One then opportunistically spots that a sequence of system states can be mapped onto a simple state machine, possibly applying transformations to make it map better.
- When a system has to meet a set of deadlines and the threads within the system share resources such as processors, it is common to use a priority mechanism to select which thread to run next.
- priorities might be static or they may depend on dynamic properties such as the time until the next deadline or how full/empty input and output queues are.
- a long-standing problem of parallelizing compilers is that it is hard to relate the view of execution seen by debug mechanisms to the view of execution the programmer expects from the original sequential program.
- Our tools can take an execution trace obtained from running a program on parallel hardware and reorder it to obtain a sequential trace that matches the original program. This is especially applicable to but not limited to the coarse-grained nature of our parallelization method.
- partial reconstruction can be achieved by using points in the program that synchronize with each other to guide the matching process.
- the resulting trace will not be sequential but will be easier to understand.
- a useful application is to make it simpler to understand a trace of a program written using an event-based programming style (e.g., a GUI, interrupt handlers, device drivers, etc.)
- Partial reconstruction could also be used to simplify parallel programs running on systems that use release consistency. Such programs must use explicit memory barriers at all synchronization points so it will be possible to simplify traces to reduce the degree of parallelism the programmer must consider.
- HP has been looking at using trace to enable performance debugging of distributed protocols. Their focus is on data mining and performance not reconstructing a sequential trace.
- each section is a node in a directed graph and there is an edge from a node M to a node N if the section corresponding to M writes to an address x and the section corresponding to N reads from address x and, in the original trace, no write to x happens between M's write to x and N's read from x.
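- The Python sketch below builds the graph from a memory trace. The trace format, a list of `(section, op, address)` tuples in observed order, is an assumption. The most recent writer of each address is tracked, so a read by N adds the edge M -> N exactly when M made the last write to that address before the read. Self-edges, where a section reads its own writes, are omitted here.

```python
def dataflow_graph(trace):
    """Build dataflow edges M -> N from a memory trace.

    trace: sequence of (section, op, address), op "W" (write) or "R" (read),
    in the order observed.
    """
    last_writer = {}
    edges = set()
    for section, op, addr in trace:
        if op == "W":
            last_writer[addr] = section
        elif op == "R":
            writer = last_writer.get(addr)
            if writer is not None and writer != section:
                edges.add((writer, section))
    return edges
```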
- This directed dataflow graph shows how different sections communicate with each other and can be used for a variety of purposes:
- the first section talks about what you need for the general case of a program that has been parallelized and you would like to serialize trace from a run of the parallel program based on some understanding of what transformations were done during parallelization (i.e., you know how different bits of the program relate to the original program).
- the second part talks about how you would specifically do this if the parallelization process included decoupling.
- the sketch describes the simplest case in which it can work but it is possible to relax these restrictions significantly.
- Conditions 10 onwards relate mainly to what decoupling aims to achieve. However, some conditions, such as conditions 5 and 6, are relevant here because, in practice, it is useful to be able to relax them slightly.
- (5) says that kernels have exclusive access to buffers but it is obviously ok to have multiple readers of the same buffer and it would also be ok (in most real programs) for two kernels to (atomically) invoke ‘malloc’ and ‘free’ in the middle of the kernels even though the particular heap areas returned will depend on the precise interleaving of those calls and it may even be ok for debugging printfs from each kernel to be ordered.
- Consequences of (1)-(4) We can identify each transaction with a kernel instance and we can see all transactions a kernel performs.
- Consequences of (1)-(6) Given a trace consisting of the interleaved transactions of a set of kernel instances, we can reorder the transactions such that all transactions of a kernel are contiguous and the resulting trace satisfies all read after write data dependencies. That is, we can construct a sequentially consistent view of the transactions as though kernels executed atomically and sequentially.
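- The Python sketch below performs this reordering under stated assumptions. Transactions are `(kernel, op, address)` tuples in observed order, and the conditions above (such as exclusive buffer access) hold, so only read-after-write dependencies between kernels need to be respected. It uses Kahn's topological sort, breaking ties by when each kernel first appeared, and then emits each kernel's transactions contiguously in their original relative order.

```python
import heapq


def serialize_trace(trace):
    """Reorder an interleaved trace so each kernel's transactions are contiguous."""
    first_seen, per_kernel = {}, {}
    for i, (kernel, op, addr) in enumerate(trace):
        first_seen.setdefault(kernel, i)
        per_kernel.setdefault(kernel, []).append((kernel, op, addr))

    # Read-after-write edges: last writer of an address -> later reader.
    succs = {k: set() for k in first_seen}
    indegree = {k: 0 for k in first_seen}
    last_writer = {}
    for kernel, op, addr in trace:
        if op == "W":
            last_writer[addr] = kernel
        else:
            w = last_writer.get(addr)
            if w is not None and w != kernel and kernel not in succs[w]:
                succs[w].add(kernel)
                indegree[kernel] += 1

    # Kahn's algorithm; ties broken by each kernel's first appearance.
    ready = [(first_seen[k], k) for k, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    result = []
    while ready:
        _, k = heapq.heappop(ready)
        result.extend(per_kernel[k])
        for s in succs[k]:
            indegree[s] -= 1
            if indegree[s] == 0:
                heapq.heappush(ready, (first_seen[s], s))
    if len(result) != len(trace):
        raise ValueError("cyclic dependencies: no sequential order exists")
    return result
```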
- Consequences of (7)-(9) Given a trace of the state transitions and synchronizations, we can reorder them into any of the set of legal transitions those state machines could have made where a transition is legal if it respects synchronization dependencies.
- Consequences of (1)-(9) Given a trace of all kernel transactions and all state transitions and synchronizations, we can reorder them into any legal trace which respects the same synchronization dependencies and data dependencies.
- Consequences of (1)-(10) We can reorder any trace to match a sequential version of the same program.
- To show that decoupling gives us property (10) (i.e., that any trace of the decoupled program can be reordered to give a trace of the original program, and how to do that reordering), we need to establish a relationship between the parallel state machine and the master state machine (i.e., the original program). This relationship is an “embedding” (i.e., a mapping between states in the parallel and the master machines such that the transitions map to each other in the obvious way). It is probably easiest to prove this by considering what happens when we decouple a single state machine (i.e., a program) into two parallel state machines.
- Extensions of decoupling allow the programmer to indicate that two operations can be executed in either order even though there is a data dependency between them (e.g., both increment a variable atomically). This mostly needs us to relax the definition of what trace reconstruction is meant to do. One major requirement is that the choice of order doesn't have any knock-on effects on control flow.
- a sufficient (and almost necessary) condition is that a put and a get on the same channel must not be inside corresponding critical sections (in different threads):
- a useful and safe special case is that all initialization code does N puts, a loop then contains only put_get pairs and then finalization code does at most N gets. It should be possible to prove that this special case is ok.
- a task may have a deadline or it may require that it receive 2 seconds of CPU in every 10 second interval but tasks rarely require that they receive a particular pattern of scheduling.
- the idea is to use the flexibility that the system provides to explore different sequences from those that a traditional scheduler would provide.
- schedulers in common use are ‘work conserving schedulers’: if the resources needed to run a task are available and the task is due to execute, the task is started. In contrast, a non-work-conserving scheduler might choose to leave a resource idle for a short time even though it could be used. Non-work-conserving schedulers are normally used to improve efficiency where there is a possibility that a better choice of task will become available if the scheduler delays for a short time.
- a non-work-conserving scheduler can be used for testing concurrent systems because such schedulers provide more flexibility over the precise timing of different tasks than a work-conserving scheduler does.
- the modification of the schedule is probably done within the constraints of the real-time requirements of the tasks. For example, when a task becomes runnable, one might establish how much ‘slack’ there is in the schedule and then choose to delay the task for at most that amount. In particular, when exploring different phases, if the second event doesn't happen within that period of slack, then the first event must be sent to the system and we will hope to explore that phase the next time the event triggers.
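- The Python sketch below shows slack-bounded delays for a deliberately simplified case: one processor, non-preemptive tasks run in a fixed order, and tasks given as hypothetical `(release, duration, deadline)` tuples. Before each task starts, the scheduler computes the slack as the smallest margin by which the remaining tasks, run as soon as possible, would meet their deadlines. It then idles for a random time no longer than that slack. Delaying one task shifts later finish times by at most the delay, so deadlines that the as-soon-as-possible schedule meets are still met. Different seeds explore different timings.

```python
import random


def asap_finishes(tasks, now):
    """Finish times if tasks (in list order) run back to back as early as possible."""
    finishes = []
    for release, duration, deadline in tasks:
        now = max(now, release) + duration
        finishes.append(now)
    return finishes


def explore_schedule(tasks, seed):
    """Non-work-conserving test scheduler: insert random idle time within the slack.

    tasks: list of (release, duration, deadline), executed in list order.
    Returns the chosen start time of each task.
    """
    rng = random.Random(seed)
    now, starts = 0, []
    for i, (release, duration, deadline) in enumerate(tasks):
        rest = tasks[i:]
        finishes = asap_finishes(rest, now)
        slack = min(d - f for (_, _, d), f in zip(rest, finishes))
        delay = rng.uniform(0, max(slack, 0))   # leave the processor idle a while
        start = max(now, release) + delay
        starts.append(start)
        now = start + duration
    return starts
```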
- the scheduler can choose to execute the thread that would run that piece of code. (Again, it may be necessary to insert instrumentation into the code to help the scheduler figure out the status of each thread so that it can execute them in the correct order.)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Debugging And Monitoring (AREA)
- Test And Diagnosis Of Digital Computers (AREA)
- Multi Processors (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Devices For Executing Special Programs (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/898,360 US20080098208A1 (en) | 2006-10-24 | 2007-09-11 | Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US85375606P | 2006-10-24 | 2006-10-24 | |
US11/898,360 US20080098208A1 (en) | 2006-10-24 | 2007-09-11 | Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080098208A1 true US20080098208A1 (en) | 2008-04-24 |
Family
ID=38219318
Family Applications (5)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/898,360 Abandoned US20080098208A1 (en) | 2006-10-24 | 2007-09-11 | Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems |
US11/898,363 Abandoned US20080098207A1 (en) | 2006-10-24 | 2007-09-11 | Analyzing diagnostic data generated by multiple threads within an instruction stream |
US11/907,881 Active 2029-01-22 US7809989B2 (en) | 2006-10-24 | 2007-10-18 | Performing diagnostic operations upon an asymmetric multiprocessor apparatus |
US11/976,315 Active 2031-02-07 US8190807B2 (en) | 2006-10-24 | 2007-10-23 | Mapping a computer program to an asymmetric multiprocessing apparatus |
US11/976,314 Active 2031-05-19 US8250549B2 (en) | 2006-10-24 | 2007-10-23 | Variable coherency support when mapping a computer program to a data processing apparatus |
Country Status (11)
Country | Link |
---|---|
US (5) | US20080098208A1 |
EP (1) | EP2076837B1 |
JP (1) | JP5054115B2 |
KR (1) | KR101325229B1 |
CN (1) | CN101529391B |
DE (1) | DE602007009857D1 |
GB (1) | GB2443277B |
IL (1) | IL197314A |
MY (1) | MY144449A |
TW (1) | TWI407374B |
WO (1) | WO2008050076A1 |
Cited By (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090199189A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Parallel Lock Spinning Using Wake-and-Go Mechanism |
US20090199028A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism with Data Exclusivity |
US20090199197A1 (en) * | 2008-02-01 | 2009-08-06 | International Business Machines Corporation | Wake-and-Go Mechanism with Dynamic Allocation in Hardware Private Array |
US20090199184A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism With Software Save of Thread State |
US20090199030A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Hardware Wake-and-Go Mechanism for a Data Processing System |
US20090199029A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism with Data Monitoring |
US20100162217A1 (en) * | 2008-12-22 | 2010-06-24 | Microsoft Corporation | Debugging System Using Static Analysis |
US20100241828A1 (en) * | 2009-03-18 | 2010-09-23 | Microsoft Corporation | General Distributed Reduction For Data Parallel Computing |
US20100268791A1 (en) * | 2009-04-16 | 2010-10-21 | International Business Machines Corporation | Programming Idiom Accelerator for Remote Update |
US20100268790A1 (en) * | 2009-04-16 | 2010-10-21 | International Business Machines Corporation | Complex Remote Update Programming Idiom Accelerator |
US20100268915A1 (en) * | 2009-04-16 | 2010-10-21 | International Business Machines Corporation | Remote Update Programming Idiom Accelerator with Allocated Processor Resources |
US20100287341A1 (en) * | 2008-02-01 | 2010-11-11 | Arimilli Ravi K | Wake-and-Go Mechanism with System Address Bus Transaction Master |
US20100293341A1 (en) * | 2008-02-01 | 2010-11-18 | Arimilli Ravi K | Wake-and-Go Mechanism with Exclusive System Bus Response |
US20110016293A1 (en) * | 2009-07-15 | 2011-01-20 | Comm. a l' ener. atom. et aux energies alter. | Device and method for the distributed execution of digital data processing operations |
US20110072006A1 (en) * | 2009-09-18 | 2011-03-24 | Microsoft Corporation | Management of data and computation in data centers |
US20110173423A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Look-Ahead Hardware Wake-and-Go Mechanism |
US20110173625A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Wake-and-Go Mechanism with Prioritization of Threads |
US20110173417A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Programming Idiom Accelerators |
US20110173630A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Central Repository for Wake-and-Go Mechanism |
US20110173631A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Wake-and-Go Mechanism for a Data Processing System |
US20110173593A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Compiler Providing Idiom to Idiom Accelerator |
US20110173419A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Look-Ahead Wake-and-Go Engine With Speculative Execution |
US20110173632A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Hardware Wake-and-Go Mechanism with Look-Ahead Polling |
US8145849B2 (en) | 2008-02-01 | 2012-03-27 | International Business Machines Corporation | Wake-and-go mechanism with system bus response |
GB2486485A (en) * | 2010-12-16 | 2012-06-20 | Imagination Tech Ltd | Completing execution of one phase of a computer program before scheduling any instructions for the next phase |
US8230201B2 (en) | 2009-04-16 | 2012-07-24 | International Business Machines Corporation | Migrating sleeping and waking threads between wake-and-go mechanisms in a multiple processor data processing system |
US20130010150A1 (en) * | 1997-07-15 | 2013-01-10 | Kia Silverbrook | Portable handheld device with multi-core image processor |
US8533720B2 (en) | 2011-02-25 | 2013-09-10 | International Business Machines Corporation | Offloading work from one type to another type of processor based on the count of each type of service call instructions in the work unit |
US8566831B2 (en) | 2011-01-26 | 2013-10-22 | International Business Machines Corporation | Execution of work units in a heterogeneous computing environment |
US8612952B2 (en) | 2010-04-07 | 2013-12-17 | International Business Machines Corporation | Performance optimization based on data accesses during critical sections |
US20130339923A1 (en) * | 2012-06-19 | 2013-12-19 | Charles Chen Xu | Data Handling Among Actors in a Dataflow Programming Environment |
US8621430B2 (en) | 2011-03-03 | 2013-12-31 | International Business Machines Corporation | Method for code transformation supporting temporal abstraction of parameters |
US8725992B2 (en) | 2008-02-01 | 2014-05-13 | International Business Machines Corporation | Programming language exposing idiom calls to a programming idiom accelerator |
US20140157229A1 (en) * | 2012-12-04 | 2014-06-05 | International Business Machines Corporation | Streamlining Hardware Initialization Code |
US20140195834A1 (en) * | 2013-01-04 | 2014-07-10 | Microsoft Corporation | High throughput low latency user mode drivers implemented in managed code |
US8789939B2 (en) | 1998-11-09 | 2014-07-29 | Google Inc. | Print media cartridge with ink supply manifold |
US8823823B2 (en) | 1997-07-15 | 2014-09-02 | Google Inc. | Portable imaging device with multi-core processor and orientation sensor |
US8866923B2 (en) | 1999-05-25 | 2014-10-21 | Google Inc. | Modular camera and printer |
US8896724B2 (en) | 1997-07-15 | 2014-11-25 | Google Inc. | Camera system to facilitate a cascade of imaging effects |
US8902333B2 (en) | 1997-07-15 | 2014-12-02 | Google Inc. | Image processing method using sensed eye position |
US8910137B2 (en) | 2012-04-13 | 2014-12-09 | International Business Machines Corporation | Code profiling of executable library for pipeline parallelization |
US8908075B2 (en) | 1997-07-15 | 2014-12-09 | Google Inc. | Image capture and processing integrated circuit for a camera |
US8936196B2 (en) | 1997-07-15 | 2015-01-20 | Google Inc. | Camera unit incorporating program script scanner |
US8949809B2 (en) | 2012-03-01 | 2015-02-03 | International Business Machines Corporation | Automatic pipeline parallelization of sequential code |
US9055221B2 (en) | 1997-07-15 | 2015-06-09 | Google Inc. | Portable hand-held device for deblurring sensed images |
US9323543B2 (en) | 2013-01-04 | 2016-04-26 | Microsoft Technology Licensing, Llc | Capability based device driver framework |
US20160313991A1 (en) * | 2013-06-16 | 2016-10-27 | President And Fellows Of Harvard College | Methods and apparatus for parallel processing |
US9652817B2 (en) | 2015-03-12 | 2017-05-16 | Samsung Electronics Co., Ltd. | Automated compute kernel fusion, resizing, and interleave |
US20170236244A1 (en) * | 2016-02-12 | 2017-08-17 | Arm Limited | Graphics processing systems |
US9811319B2 (en) | 2013-01-04 | 2017-11-07 | Microsoft Technology Licensing, Llc | Software interface for a hardware device |
US10296340B2 (en) | 2014-03-13 | 2019-05-21 | Arm Limited | Data processing apparatus for executing an access instruction for N threads |
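CN110199269A (zh) * | 2017-01-23 | 2019-09-03 | Samsung Electronics Co., Ltd. | Method and electronic device for data processing between multiple processors |
CN110998540A (zh) * | 2017-08-01 | 2020-04-10 | Microsoft Technology Licensing, Llc | Focused execution of traced code in a debugger |
CN111476264A (zh) * | 2019-01-24 | 2020-07-31 | International Business Machines Corporation | Testing the adversarial robustness of access-limited systems |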
US10732982B2 (en) | 2017-08-15 | 2020-08-04 | Arm Limited | Data processing systems |
US10869108B1 (en) | 2008-09-29 | 2020-12-15 | Calltrol Corporation | Parallel signal processing system and method |
US20210271666A1 (en) * | 2018-09-28 | 2021-09-02 | Marc Brandis Ag | Analyzing a processing engine of a transaction-processing system |
US11604752B2 (en) | 2021-01-29 | 2023-03-14 | Arm Limited | System for cross-routed communication between functional units of multiple processing units |
US20230195321A1 (en) * | 2021-12-17 | 2023-06-22 | Samsung Electronics Co., Ltd. | Storage device and operating method thereof |
Families Citing this family (207)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7614037B2 (en) * | 2004-05-21 | 2009-11-03 | Microsoft Corporation | Method and system for graph analysis and synchronization |
US8079019B2 (en) * | 2007-11-21 | 2011-12-13 | Replay Solutions, Inc. | Advancing and rewinding a replayed program execution |
GB2443277B (en) * | 2006-10-24 | 2011-05-18 | Advanced Risc Mach Ltd | Performing diagnostics operations upon an asymmetric multiprocessor apparatus |
US8341604B2 (en) * | 2006-11-15 | 2012-12-25 | Qualcomm Incorporated | Embedded trace macrocell for enhanced digital signal processor debugging operations |
US8370806B2 (en) * | 2006-11-15 | 2013-02-05 | Qualcomm Incorporated | Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor |
US8533530B2 (en) * | 2006-11-15 | 2013-09-10 | Qualcomm Incorporated | Method and system for trusted/untrusted digital signal processor debugging operations |
US8380966B2 (en) * | 2006-11-15 | 2013-02-19 | Qualcomm Incorporated | Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging |
US10353797B2 (en) * | 2006-12-29 | 2019-07-16 | International Business Machines Corporation | Using memory tracking data to inform a memory map tool |
US8484516B2 (en) * | 2007-04-11 | 2013-07-09 | Qualcomm Incorporated | Inter-thread trace alignment method and system for a multi-threaded processor |
WO2008144960A1 (en) * | 2007-05-31 | 2008-12-04 | Intel Corporation | Method and apparatus for mpi program optimization |
CN101329638B (zh) * | 2007-06-18 | 2011-11-09 | International Business Machines Corporation | Method and system for analyzing the parallelism of program code |
US20090007115A1 (en) * | 2007-06-26 | 2009-01-01 | Yuanhao Sun | Method and apparatus for parallel XSL transformation with low contention and load balancing |
US8548777B2 (en) * | 2007-09-28 | 2013-10-01 | Rockwell Automation Technologies, Inc. | Automated recommendations from simulation |
US7801710B2 (en) * | 2007-09-28 | 2010-09-21 | Rockwell Automation Technologies, Inc. | Simulation controls for model variability and randomness |
US20090089031A1 (en) * | 2007-09-28 | 2009-04-02 | Rockwell Automation Technologies, Inc. | Integrated simulation of controllers and devices |
US8181165B2 (en) * | 2007-10-30 | 2012-05-15 | International Business Machines Corporation | Using annotations to reuse variable declarations to generate different service functions |
IL187038A0 (en) * | 2007-10-30 | 2008-02-09 | Sandisk Il Ltd | Secure data processing for unaligned data |
US8402438B1 (en) | 2007-12-03 | 2013-03-19 | Cadence Design Systems, Inc. | Method and system for generating verification information and tests for software |
US8156474B2 (en) * | 2007-12-28 | 2012-04-10 | Cadence Design Systems, Inc. | Automation of software verification |
US8468504B2 (en) * | 2007-12-28 | 2013-06-18 | Streaming Networks (Pvt.) Ltd. | Method and apparatus for interactive scheduling of VLIW assembly code |
US9063778B2 (en) * | 2008-01-09 | 2015-06-23 | Microsoft Technology Licensing, Llc | Fair stateless model checking |
GB2456813B (en) | 2008-01-24 | 2012-03-07 | Advanced Risc Mach Ltd | Diagnostic context construction and comparison |
JP5278336B2 (ja) * | 2008-02-15 | 2013-09-04 | NEC Corporation | Program parallelization device, program parallelization method, and program parallelization program |
US8615647B2 (en) | 2008-02-29 | 2013-12-24 | Intel Corporation | Migrating execution of thread between cores of different instruction set architecture in multi-core processor and transitioning each core to respective on / off power state |
EP2257874A4 (en) | 2008-03-27 | 2013-07-17 | Rocketick Technologies Ltd | DESIGN SIMULATION ON THE BASIS OF PARALLEL PROCESSORS |
US8776030B2 (en) * | 2008-04-09 | 2014-07-08 | Nvidia Corporation | Partitioning CUDA code for execution by a general purpose processor |
US9678775B1 (en) * | 2008-04-09 | 2017-06-13 | Nvidia Corporation | Allocating memory for local variables of a multi-threaded program for execution in a single-threaded environment |
GB0808576D0 (en) * | 2008-05-12 | 2008-06-18 | Xmos Ltd | Compiling and linking |
FR2931269A1 (fr) * | 2008-05-16 | 2009-11-20 | Ateji Soc Par Actions Simplifi | Method and system for developing parallel programs |
US9223677B2 (en) | 2008-06-11 | 2015-12-29 | Arm Limited | Generation of trace data in a multi-processor system |
WO2009153619A1 (en) * | 2008-06-19 | 2009-12-23 | Freescale Semiconductor, Inc. | A system, method and computer program product for debugging a system |
WO2009153621A1 (en) * | 2008-06-19 | 2009-12-23 | Freescale Semiconductor, Inc. | A system, method and computer program product for scheduling processor entity tasks in a multiple-processing entity system |
WO2009153620A1 (en) * | 2008-06-19 | 2009-12-23 | Freescale Semiconductor, Inc. | A system, method and computer program product for scheduling a processing entity task |
US8572577B2 (en) * | 2008-06-20 | 2013-10-29 | International Business Machines Corporation | Monitoring changes to data within a critical section of a threaded program |
US8332825B2 (en) * | 2008-06-26 | 2012-12-11 | Microsoft Corporation | Dynamically monitoring application behavior |
JP5733860B2 (ja) * | 2008-07-10 | 2015-06-10 | Rocketick Technologies Ltd | Efficient parallel computation of dependency problems |
US9032377B2 (en) * | 2008-07-10 | 2015-05-12 | Rocketick Technologies Ltd. | Efficient parallel computation of dependency problems |
JP2010026851A (ja) * | 2008-07-22 | 2010-02-04 | Panasonic Corp | Compiler-based optimization method |
US8028113B2 (en) * | 2008-08-15 | 2011-09-27 | International Business Machines Corporation | Methods and systems for deadlock-free allocation of memory |
WO2010020828A1 (en) * | 2008-08-18 | 2010-02-25 | Telefonaktiebolaget L M Ericsson (Publ) | Data sharing in chip multi-processor systems |
US8230442B2 (en) | 2008-09-05 | 2012-07-24 | International Business Machines Corporation | Executing an accelerator application program in a hybrid computing environment |
US8504344B2 (en) * | 2008-09-30 | 2013-08-06 | Cadence Design Systems, Inc. | Interface between a verification environment and a hardware acceleration engine |
US20100095286A1 (en) * | 2008-10-10 | 2010-04-15 | Kaplan David A | Register reduction and liveness analysis techniques for program code |
US8418146B2 (en) * | 2008-11-26 | 2013-04-09 | Microsoft Corporation | Sampling techniques for dynamic data-race detection |
GB0823329D0 (en) * | 2008-12-22 | 2009-01-28 | Geotate Bv | Position signal sampling method and apparatus |
KR101511273B1 (ko) * | 2008-12-29 | 2015-04-10 | Samsung Electronics Co., Ltd. | Method and system for 3D graphics rendering using a multi-core processor |
US8527734B2 (en) | 2009-01-23 | 2013-09-03 | International Business Machines Corporation | Administering registered virtual addresses in a hybrid computing environment including maintaining a watch list of currently registered virtual addresses by an operating system |
US9286232B2 (en) * | 2009-01-26 | 2016-03-15 | International Business Machines Corporation | Administering registered virtual addresses in a hybrid computing environment including maintaining a cache of ranges of currently registered virtual addresses |
US8843880B2 (en) * | 2009-01-27 | 2014-09-23 | International Business Machines Corporation | Software development for a hybrid computing environment |
US8255909B2 (en) | 2009-01-28 | 2012-08-28 | International Business Machines Corporation | Synchronizing access to resources in a hybrid computing environment |
US20100191923A1 (en) * | 2009-01-29 | 2010-07-29 | International Business Machines Corporation | Data Processing In A Computing Environment |
US9170864B2 (en) * | 2009-01-29 | 2015-10-27 | International Business Machines Corporation | Data processing in a hybrid computing environment |
KR20110124309A (ko) * | 2009-02-16 | 2011-11-16 | Inchron GmbH | Method for analyzing the real-time performance of a system |
US8205117B2 (en) * | 2009-02-25 | 2012-06-19 | Hewlett-Packard Development Company, L.P. | Migratory hardware diagnostic testing |
US20100242014A1 (en) * | 2009-03-17 | 2010-09-23 | Xiaohan Zhu | Symmetric multi-processor operating system for asymmetric multi-processor architecture |
JP5316128B2 (ja) | 2009-03-17 | 2013-10-16 | Toyota Motor Corporation | Fault diagnosis system, electronic control unit, and fault diagnosis method |
US8843927B2 (en) * | 2009-04-23 | 2014-09-23 | Microsoft Corporation | Monitoring and updating tasks arrival and completion statistics without data locking synchronization |
US8413108B2 (en) * | 2009-05-12 | 2013-04-02 | Microsoft Corporation | Architectural data metrics overlay |
US8719831B2 (en) * | 2009-06-18 | 2014-05-06 | Microsoft Corporation | Dynamically change allocation of resources to schedulers based on feedback and policies from the schedulers and availability of the resources |
US9378062B2 (en) * | 2009-06-18 | 2016-06-28 | Microsoft Technology Licensing, Llc | Interface between a resource manager and a scheduler in a process |
DE102009025572A1 (de) * | 2009-06-19 | 2010-12-23 | Wolfgang Pree Gmbh | A method for developing guaranteed-correct real-time systems |
US8914799B2 (en) * | 2009-06-30 | 2014-12-16 | Oracle America Inc. | High performance implementation of the OpenMP tasking feature |
JP5452125B2 (ja) * | 2009-08-11 | 2014-03-26 | Clarion Co., Ltd. | Data processing device and data processing method |
US8566804B1 (en) * | 2009-08-13 | 2013-10-22 | The Mathworks, Inc. | Scheduling generated code based on target characteristics |
US8990783B1 (en) | 2009-08-13 | 2015-03-24 | The Mathworks, Inc. | Scheduling generated code based on target characteristics |
US8381194B2 (en) | 2009-08-19 | 2013-02-19 | Apple Inc. | Methods and apparatuses for selective code coverage |
US9430353B2 (en) * | 2009-10-26 | 2016-08-30 | Microsoft Technology Licensing, Llc | Analysis and visualization of concurrent thread execution on processor cores |
US9594656B2 (en) | 2009-10-26 | 2017-03-14 | Microsoft Technology Licensing, Llc | Analysis and visualization of application concurrency and processor resource utilization |
US8359588B2 (en) * | 2009-11-25 | 2013-01-22 | Arm Limited | Reducing inter-task latency in a multiprocessor system |
US8392929B2 (en) * | 2009-12-15 | 2013-03-05 | Microsoft Corporation | Leveraging memory isolation hardware technology to efficiently detect race conditions |
US8826234B2 (en) * | 2009-12-23 | 2014-09-02 | Intel Corporation | Relational modeling for performance analysis of multi-core processors |
WO2011083459A1 (en) * | 2010-01-08 | 2011-07-14 | Daniel Geist | Utilizing temporal assertions in a debugger |
US8516467B2 (en) * | 2010-01-29 | 2013-08-20 | Nintendo Co., Ltd. | Method and apparatus for enhancing comprehension of code time complexity and flow |
US9417905B2 (en) * | 2010-02-03 | 2016-08-16 | International Business Machines Corporation | Terminating an accelerator application program in a hybrid computing environment |
US8578132B2 (en) * | 2010-03-29 | 2013-11-05 | International Business Machines Corporation | Direct injection of data to be transferred in a hybrid computing environment |
US8959496B2 (en) * | 2010-04-21 | 2015-02-17 | Microsoft Corporation | Automatic parallelization in a tracing just-in-time compiler system |
US9015443B2 (en) | 2010-04-30 | 2015-04-21 | International Business Machines Corporation | Reducing remote reads of memory in a hybrid computing environment |
US8756590B2 (en) * | 2010-06-22 | 2014-06-17 | Microsoft Corporation | Binding data parallel device source code |
US8972995B2 (en) * | 2010-08-06 | 2015-03-03 | Sonics, Inc. | Apparatus and methods to concurrently perform per-thread as well as per-tag memory access scheduling within a thread and across two or more threads |
US9652365B2 (en) * | 2010-08-24 | 2017-05-16 | Red Hat, Inc. | Fault configuration using a registered list of controllers |
US20120240224A1 (en) * | 2010-09-14 | 2012-09-20 | Georgia Tech Research Corporation | Security systems and methods for distinguishing user-intended traffic from malicious traffic |
US8990551B2 (en) | 2010-09-16 | 2015-03-24 | Microsoft Technology Licensing, Llc | Analysis and visualization of cluster resource utilization |
US20120096445A1 (en) * | 2010-10-18 | 2012-04-19 | Nokia Corporation | Method and apparatus for providing portability of partially accelerated signal processing applications |
US8656496B2 (en) * | 2010-11-22 | 2014-02-18 | International Business Machines Corporation | Global variable security analysis |
US8832659B2 (en) * | 2010-12-06 | 2014-09-09 | University Of Washington Through Its Center For Commercialization | Systems and methods for finding concurrency errors |
US8959501B2 (en) * | 2010-12-14 | 2015-02-17 | Microsoft Corporation | Type and length abstraction for data types |
US20120160272A1 (en) * | 2010-12-23 | 2012-06-28 | United Microelectronics Corp. | Cleaning method of semiconductor process |
KR101785116B1 (ko) * | 2010-12-24 | 2017-10-17 | Industry-University Cooperation Foundation Hanyang University | Software-defined radio terminal device for radio applications independent of modem hardware |
US8856764B2 (en) * | 2011-01-25 | 2014-10-07 | International Business Machines Corporation | Distributed static analysis of computer software applications |
US8726245B2 (en) * | 2011-01-28 | 2014-05-13 | International Business Machines Corporation | Static analysis of computer software applications having a model-view-controller architecture |
DE102011004363B4 (de) * | 2011-02-18 | 2023-10-05 | Airbus Operations Gmbh | Control device for controlling network participants, method for operating a computer network, and computer network |
US9189283B2 (en) * | 2011-03-03 | 2015-11-17 | Hewlett-Packard Development Company, L.P. | Task launching on hardware resource for client |
GB2489278B (en) * | 2011-03-24 | 2019-12-25 | Advanced Risc Mach Ltd | Improving the scheduling of tasks to be performed by a non-coherent device |
US8650542B1 (en) * | 2011-03-25 | 2014-02-11 | The Mathworks, Inc. | Hierarchical, self-describing function objects |
WO2012134322A1 (en) * | 2011-04-01 | 2012-10-04 | Intel Corporation | Vectorization of scalar functions including vectorization annotations and vectorized function signatures matching |
US9128748B2 (en) * | 2011-04-12 | 2015-09-08 | Rocketick Technologies Ltd. | Parallel simulation using multiple co-simulators |
US8949777B2 (en) * | 2011-04-22 | 2015-02-03 | Intel Corporation | Methods and systems for mapping a function pointer to the device code |
US8855194B2 (en) * | 2011-05-09 | 2014-10-07 | Texas Instruments Incorporated | Updating non-shadow registers in video encoder |
US9043363B2 (en) * | 2011-06-03 | 2015-05-26 | Oracle International Corporation | System and method for performing memory management using hardware transactions |
US9069545B2 (en) * | 2011-07-18 | 2015-06-30 | International Business Machines Corporation | Relaxation of synchronization for iterative convergent computations |
US8918770B2 (en) * | 2011-08-25 | 2014-12-23 | Nec Laboratories America, Inc. | Compiler for X86-based many-core coprocessors |
GB2495959A (en) | 2011-10-26 | 2013-05-01 | Imagination Tech Ltd | Multi-threaded memory access processor |
US8909696B1 (en) * | 2011-11-02 | 2014-12-09 | Google Inc. | Redundant data requests with redundant response cancellation |
US9043765B2 (en) * | 2011-11-09 | 2015-05-26 | Microsoft Technology Licensing, Llc | Simultaneously targeting multiple homogeneous and heterogeneous runtime environments |
US8615614B2 (en) * | 2011-11-30 | 2013-12-24 | Freescale Semiconductor, Inc. | Message passing using direct memory access unit in a data processing system |
US9367687B1 (en) * | 2011-12-22 | 2016-06-14 | Emc Corporation | Method for malware detection using deep inspection and data discovery agents |
US9686152B2 (en) | 2012-01-27 | 2017-06-20 | Microsoft Technology Licensing, Llc | Techniques for tracking resource usage statistics per transaction across multiple layers of protocols |
KR101885211B1 (ko) * | 2012-01-27 | 2018-08-29 | Samsung Electronics Co., Ltd. | Method and apparatus for GPU resource allocation |
US8793697B2 (en) * | 2012-02-23 | 2014-07-29 | Qualcomm Incorporated | Method and system for scheduling requests in a portable computing device |
US9928109B2 (en) | 2012-05-09 | 2018-03-27 | Nvidia Corporation | Method and system for processing nested stream events |
US8838861B2 (en) | 2012-05-09 | 2014-09-16 | Qualcomm Incorporated | Methods and apparatuses for trace multicast across a bus structure, and related systems |
DE102012011584A1 (de) * | 2012-06-13 | 2013-12-19 | Robert Bosch Gmbh | Resource management system for automation installations |
RU2012127578A (ru) | 2012-07-02 | 2014-01-10 | LSI Corporation | Software module applicability analyzer for developing and testing software for multiprocessor environments |
RU2012127581A (ru) * | 2012-07-02 | 2014-01-10 | LSI Corporation | Source code generator for developing and testing software for multiprocessor environments |
EP2706420B1 (de) * | 2012-09-05 | 2015-03-18 | Siemens Aktiengesellschaft | Method for operating an automation device |
CN109240704B (zh) * | 2012-11-06 | 2022-06-14 | Coherent Logix, Inc. | Multiprocessor programming toolkit for design reuse |
CN108717387B (zh) * | 2012-11-09 | 2021-09-07 | Coherent Logix, Inc. | Real-time analysis and control for multiprocessor systems |
CN104781803B (zh) * | 2012-12-26 | 2018-06-15 | Intel Corporation | Thread migration support for architecturally different cores |
US9519568B2 (en) | 2012-12-31 | 2016-12-13 | Nvidia Corporation | System and method for debugging an executing general-purpose computing on graphics processing units (GPGPU) application |
US9207969B2 (en) * | 2013-01-25 | 2015-12-08 | Microsoft Technology Licensing, Llc | Parallel tracing for performance and detail |
US8762916B1 (en) * | 2013-02-25 | 2014-06-24 | Xilinx, Inc. | Automatic generation of a data transfer network |
US8924193B2 (en) * | 2013-03-14 | 2014-12-30 | The Mathworks, Inc. | Generating variants from file differences |
US9471456B2 (en) * | 2013-05-15 | 2016-10-18 | Nvidia Corporation | Interleaved instruction debugger |
US10802876B2 (en) * | 2013-05-22 | 2020-10-13 | Massachusetts Institute Of Technology | Multiprocessor scheduling policy with deadline constraint for determining multi-agent schedule for a plurality of agents |
GB2514618B (en) * | 2013-05-31 | 2020-11-11 | Advanced Risc Mach Ltd | Data processing systems |
IL232836A0 (en) * | 2013-06-02 | 2014-08-31 | Rocketick Technologies Ltd | Efficient parallel computation of dependency problems |
US9292419B1 (en) * | 2013-06-04 | 2016-03-22 | The Mathworks, Inc. | Code coverage and confidence determination |
US9697003B2 (en) * | 2013-06-07 | 2017-07-04 | Advanced Micro Devices, Inc. | Method and system for yield operation supporting thread-like behavior |
US9075624B2 (en) | 2013-06-24 | 2015-07-07 | Xilinx, Inc. | Compilation of system designs |
WO2015016907A1 (en) * | 2013-07-31 | 2015-02-05 | Hewlett Packard Development Company, L.P. | Data stream processing using a distributed cache |
US10372590B2 (en) * | 2013-11-22 | 2019-08-06 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US20150195383A1 (en) * | 2014-01-08 | 2015-07-09 | Cavium, Inc. | Methods and systems for single instruction multiple data programmable packet parsers |
US9733981B2 (en) * | 2014-06-10 | 2017-08-15 | Nxp Usa, Inc. | System and method for conditional task switching during ordering scope transitions |
US10061592B2 (en) | 2014-06-27 | 2018-08-28 | Samsung Electronics Co., Ltd. | Architecture and execution for efficient mixed precision computations in single instruction multiple data/thread (SIMD/T) devices |
US10061591B2 (en) | 2014-06-27 | 2018-08-28 | Samsung Electronics Company, Ltd. | Redundancy elimination in single instruction multiple data/thread (SIMD/T) execution processing |
US9182990B1 (en) * | 2014-07-01 | 2015-11-10 | Google Inc. | Method and apparatus for detecting execution of unsupported instructions while testing multiversioned code |
US9672029B2 (en) * | 2014-08-01 | 2017-06-06 | Vmware, Inc. | Determining test case priorities based on tagged execution paths |
US10148547B2 (en) | 2014-10-24 | 2018-12-04 | Tektronix, Inc. | Hardware trigger generation from a declarative protocol description |
US9338076B1 (en) | 2014-10-24 | 2016-05-10 | Tektronix, Inc. | Deriving hardware acceleration of decoding from a declarative protocol description |
US20160170767A1 (en) * | 2014-12-12 | 2016-06-16 | Intel Corporation | Temporary transfer of a multithreaded ip core to single or reduced thread configuration during thread offload to co-processor |
US9280389B1 (en) * | 2014-12-30 | 2016-03-08 | Tyco Fire & Security Gmbh | Preemptive operating system without context switching |
US9996354B2 (en) | 2015-01-09 | 2018-06-12 | International Business Machines Corporation | Instruction stream tracing of multi-threaded processors |
US20160224327A1 (en) * | 2015-02-02 | 2016-08-04 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Linking a Program with a Software Library |
US11119903B2 (en) * | 2015-05-01 | 2021-09-14 | Fastly, Inc. | Race condition testing via a scheduling test program |
US10102031B2 (en) | 2015-05-29 | 2018-10-16 | Qualcomm Incorporated | Bandwidth/resource management for multithreaded processors |
US9910760B2 (en) * | 2015-08-07 | 2018-03-06 | Nvidia Corporation | Method and apparatus for interception of synchronization objects in graphics application programming interfaces for frame debugging |
US10067878B2 (en) | 2015-09-23 | 2018-09-04 | Hanan Potash | Processor with logical mentor |
US9977693B2 (en) | 2015-09-23 | 2018-05-22 | Hanan Potash | Processor that uses plural form information |
US10140122B2 (en) | 2015-09-23 | 2018-11-27 | Hanan Potash | Computer processor with operand/variable-mapped namespace |
US10095641B2 (en) | 2015-09-23 | 2018-10-09 | Hanan Potash | Processor with frames/bins structure in local high speed memory |
US10061511B2 (en) | 2015-09-23 | 2018-08-28 | Hanan Potash | Computing device with frames/bins structure, mentor layer and plural operand processing |
WO2017062612A1 (en) * | 2015-10-09 | 2017-04-13 | Arch Systems Inc. | Modular device and method of operation |
US10534697B2 (en) * | 2015-10-27 | 2020-01-14 | Sap Se | Flexible configuration framework |
US9678788B2 (en) * | 2015-11-10 | 2017-06-13 | International Business Machines Corporation | Enabling poll/select style interfaces with coherent accelerators |
US10860499B2 (en) | 2016-03-22 | 2020-12-08 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd | Dynamic memory management in workload acceleration |
US10203747B2 (en) | 2016-03-22 | 2019-02-12 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd. | Workload placement based on heterogeneous compute performance per watt |
US10884761B2 (en) | 2016-03-22 | 2021-01-05 | Lenovo Enterprise Solutions (Singapore) Pte. Ltd | Best performance delivery in heterogeneous computing unit environment |
US11093286B2 (en) * | 2016-04-26 | 2021-08-17 | Hanan Potash | Computing device with resource manager and civilware tier |
US10303466B1 (en) * | 2016-09-07 | 2019-05-28 | Amazon Technologies, Inc. | Semantic annotations in source code |
US10177795B1 (en) | 2016-12-29 | 2019-01-08 | Amazon Technologies, Inc. | Cache index mapping |
US20180285241A1 (en) * | 2017-03-28 | 2018-10-04 | Carnegie Mellon University | Energy-interference-free debugger for intermittent energy-harvesting systems |
US20180331973A1 (en) * | 2017-05-09 | 2018-11-15 | Microsoft Technology Licensing, Llc | Increasing virtual machine availability during server updates |
US10282274B2 (en) * | 2017-06-14 | 2019-05-07 | Microsoft Technology Licensing, Llc | Presenting differences between code entity invocations |
US10635108B2 (en) * | 2017-07-03 | 2020-04-28 | Baidu Usa Llc | Centralized scheduling system using global store for operating autonomous driving vehicles |
US10732634B2 (en) | 2017-07-03 | 2020-08-04 | Baidu Usa Llc | Centralized scheduling system using event loop for operating autonomous driving vehicles |
US20190057017A1 (en) * | 2017-08-16 | 2019-02-21 | Microsoft Technology Licensing, Llc | Correlation Of Function Calls To Functions In Asynchronously Executed Threads |
US10474600B2 (en) | 2017-09-14 | 2019-11-12 | Samsung Electronics Co., Ltd. | Heterogeneous accelerator for highly efficient learning systems |
US11489773B2 (en) | 2017-11-06 | 2022-11-01 | Pensando Systems Inc. | Network system including match processing unit for table-based actions |
WO2019118628A1 (en) * | 2017-12-12 | 2019-06-20 | Arch Systems Inc. | System and method for physical machine monitoring and analysis |
JP6955163B2 (ja) * | 2017-12-26 | 2021-10-27 | 富士通株式会社 | Information processing apparatus, information processing method, and program |
CN112074808A (zh) * | 2018-02-22 | 2020-12-11 | 思想系统公司 | Programmable computer I/O device interface |
US10636112B2 (en) * | 2018-03-28 | 2020-04-28 | Intel Corporation | Graphics processor register data re-use mechanism |
JP7236811B2 (ja) * | 2018-03-30 | 2023-03-10 | 株式会社デンソー | Information processing device |
US11237946B2 (en) * | 2018-05-03 | 2022-02-01 | Sap Se | Error finder tool |
US11468338B2 (en) | 2018-09-11 | 2022-10-11 | Apple Inc. | Compiling models for dedicated hardware |
US11354254B2 (en) * | 2018-10-19 | 2022-06-07 | Nippon Telegraph And Telephone Corporation | Data processing system, central arithmetic processing apparatus, and data processing method |
US11126532B1 (en) * | 2018-11-14 | 2021-09-21 | Teledyne Lecroy, Inc. | Method and apparatus for a parallel, metadata-based trace analytics processor |
US10824538B2 (en) | 2019-01-22 | 2020-11-03 | Oracle International Corporation | Scalable incremental analysis using caller and callee summaries |
US11169886B2 (en) * | 2019-01-29 | 2021-11-09 | Sap Se | Modification of temporary database pages |
WO2020185752A1 (en) | 2019-03-12 | 2020-09-17 | Arch Systems Inc. | System and method for network communication monitoring |
US11782816B2 (en) * | 2019-03-19 | 2023-10-10 | Jens C. Jenkins | Input/output location transformations when emulating non-traced code with a recorded execution of traced code |
US11281560B2 (en) * | 2019-03-19 | 2022-03-22 | Microsoft Technology Licensing, Llc | Input/output data transformations when emulating non-traced code with a recorded execution of traced code |
US11657162B2 (en) * | 2019-03-22 | 2023-05-23 | Intel Corporation | Adversarial training of neural networks using information about activation path differentials |
US11036546B1 (en) | 2019-04-16 | 2021-06-15 | Xilinx, Inc. | Multi-threaded shared memory functional simulation of dataflow graph |
US11204745B2 (en) * | 2019-05-23 | 2021-12-21 | Xilinx, Inc. | Dataflow graph programming environment for a heterogenous processing system |
US11138019B1 (en) | 2019-05-23 | 2021-10-05 | Xilinx, Inc. | Routing in a compilation flow for a heterogeneous multi-core architecture |
US10802807B1 (en) | 2019-05-23 | 2020-10-13 | Xilinx, Inc. | Control and reconfiguration of data flow graphs on heterogeneous computing platform |
US10860766B1 (en) | 2019-05-23 | 2020-12-08 | Xilinx, Inc. | Compilation flow for a heterogeneous multi-core architecture |
US11727265B2 (en) * | 2019-06-27 | 2023-08-15 | Intel Corporation | Methods and apparatus to provide machine programmed creative support to a user |
US11516234B1 (en) * | 2019-07-08 | 2022-11-29 | Cisco Technology, Inc. | In-process correlation through class field injection |
US11068364B2 (en) * | 2019-07-12 | 2021-07-20 | Intelliflash By Ddn, Inc. | Predictable synchronous data replication |
US10949332B2 (en) | 2019-08-14 | 2021-03-16 | Microsoft Technology Licensing, Llc | Data race analysis based on altering function internal loads during time-travel debugging |
US11216446B2 (en) * | 2019-08-29 | 2022-01-04 | Snowflake Inc. | Identifying software regressions based on query retry attempts in a database environment |
US11016849B2 (en) * | 2019-09-04 | 2021-05-25 | Red Hat, Inc. | Kernel software raid support for direct-access file systems |
CN111427816A (zh) * | 2020-03-04 | 2020-07-17 | 深圳震有科技股份有限公司 | Inter-core communication method for an AMP system, computer device, and storage medium |
US11216259B1 (en) * | 2020-03-31 | 2022-01-04 | Xilinx, Inc. | Performing multiple functions in single accelerator program without reload overhead in heterogenous computing system |
US11693795B2 (en) * | 2020-04-17 | 2023-07-04 | Texas Instruments Incorporated | Methods and apparatus to extend local buffer of a hardware accelerator |
US20210382888A1 (en) * | 2020-06-08 | 2021-12-09 | Mongodb, Inc. | Hedged reads |
US11611588B2 (en) * | 2020-07-10 | 2023-03-21 | Kyndryl, Inc. | Deep learning network intrusion detection |
US11360918B1 (en) * | 2020-12-21 | 2022-06-14 | Otis Elevator Company | Real-time processing system synchronization in a control system |
DE102021102460A1 (de) | 2021-02-03 | 2022-08-04 | Ford Global Technologies, Llc | Method for performing a simulation |
US20230004365A1 (en) * | 2021-06-24 | 2023-01-05 | Marvell Asia Pte Ltd | Multistage compiler architecture |
US11467811B1 (en) * | 2021-06-24 | 2022-10-11 | Marvell Asia Pte Ltd | Method and apparatus for generating metadata by a compiler |
US11537457B2 (en) * | 2021-06-25 | 2022-12-27 | Intel Corporation | Low latency remoting to accelerators |
US11941291B2 (en) * | 2021-09-02 | 2024-03-26 | Micron Technology, Inc. | Memory sub-system command fencing |
US20220191003A1 (en) * | 2021-12-10 | 2022-06-16 | Tamas Mihaly Varhegyi | Complete Tree Structure Encryption Software |
CN114398019B (zh) * | 2022-01-24 | 2024-02-23 | 广州文石信息科技有限公司 | Method and apparatus for processing screen update requests, and electronic ink screen device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5088034A (en) * | 1988-01-29 | 1992-02-11 | Hitachi, Ltd. | Compiling method for determining programs to be executed parallelly by respective processors in a parallel computer which transfer data with a data identifier to other processors |
US5414849A (en) * | 1992-10-30 | 1995-05-09 | Hitachi, Ltd. | Evaluating method of data division patterns and a program execution time for a distributed memory parallel computer system, and parallel program producing method using such an evaluating method |
US5799142A (en) * | 1994-09-12 | 1998-08-25 | Nec Corporation | Debugging method and debugging system for multi-task programs |
US6170051B1 (en) * | 1997-08-01 | 2001-01-02 | Micron Technology, Inc. | Apparatus and method for program level parallelism in a VLIW processor |
US6611956B1 (en) * | 1998-10-22 | 2003-08-26 | Matsushita Electric Industrial Co., Ltd. | Instruction string optimization with estimation of basic block dependence relations where the first step is to remove self-dependent branching |
US20040078779A1 (en) * | 2002-10-22 | 2004-04-22 | Bala Dutt | Inducing concurrency in software code |
US20050188364A1 (en) * | 2004-01-09 | 2005-08-25 | Johan Cockx | System and method for automatic parallelization of sequential code |
US20060005179A1 (en) * | 2004-06-30 | 2006-01-05 | Nec Corporation | Program parallelizing apparatus, program parallelizing method, and program parallelizing program |
US7047395B2 (en) * | 2001-11-13 | 2006-05-16 | Intel Corporation | Reordering serial data in a system with parallel processing flows |
US20080046875A1 (en) * | 2006-08-16 | 2008-02-21 | Gad Haber | Program Code Identification System and Method |
US20080127200A1 (en) * | 2006-07-04 | 2008-05-29 | Iti Scotland Limited | Techniques for program execution |
US20090150872A1 (en) * | 2006-07-04 | 2009-06-11 | George Russell | Dynamic code update |
Family Cites Families (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US117274A (en) * | 1871-07-25 | | | Improvement in barrel-heads |
EP0396833A1 (en) * | 1989-05-12 | 1990-11-14 | International Business Machines Corporation | Trace facility for use in a multiprocessing environment |
US5692193A (en) | 1994-03-31 | 1997-11-25 | Nec Research Institute, Inc. | Software architecture for control of highly parallel computer systems |
US6539339B1 (en) * | 1997-12-12 | 2003-03-25 | International Business Machines Corporation | Method and system for maintaining thread-relative metrics for trace data adjusted for thread switches |
US6115763A (en) | 1998-03-05 | 2000-09-05 | International Business Machines Corporation | Multi-core chip providing external core access with regular operation function interface and predetermined service operation services interface comprising core interface units and masters interface unit |
US20060117274A1 (en) * | 1998-08-31 | 2006-06-01 | Tseng Ping-Sheng | Behavior processor system and method |
US20040154027A1 (en) * | 1998-10-14 | 2004-08-05 | Jean-Jacques Vandewalle | Method and means for managing communications between local and remote objects in an object oriented client server system in which a client application invokes a local object as a proxy for a remote object on the server |
US6480818B1 (en) * | 1998-11-13 | 2002-11-12 | Cray Inc. | Debugging techniques in a multithreaded environment |
US6636950B1 (en) * | 1998-12-17 | 2003-10-21 | Massachusetts Institute Of Technology | Computer architecture for shared memory access |
JP2000293498A (ja) * | 1999-04-05 | 2000-10-20 | Nec Corp | Remote debugging system in a distributed environment, and recording medium |
US20020065864A1 (en) | 2000-03-03 | 2002-05-30 | Hartsell Neal D. | Systems and method for resource tracking in information management environments |
US6748583B2 (en) * | 2000-12-27 | 2004-06-08 | International Business Machines Corporation | Monitoring execution of an hierarchical visual program such as for debugging a message flow |
US6857084B1 (en) * | 2001-08-06 | 2005-02-15 | Lsi Logic Corporation | Multiprocessor system and method for simultaneously placing all processors into debug mode |
US6862694B1 (en) * | 2001-10-05 | 2005-03-01 | Hewlett-Packard Development Company, L.P. | System and method for setting and executing breakpoints |
US7318164B2 (en) * | 2001-12-13 | 2008-01-08 | International Business Machines Corporation | Conserving energy in a data processing system by selectively powering down processors |
US6941492B1 (en) * | 2002-02-05 | 2005-09-06 | Emc Corporation | Debugging tool for efficient switching between targets in a multi-processor environment |
US7080283B1 (en) * | 2002-10-15 | 2006-07-18 | Tensilica, Inc. | Simultaneous real-time trace and debug for multiple processing core systems on a chip |
US7243264B2 (en) * | 2002-11-01 | 2007-07-10 | Sonics, Inc. | Method and apparatus for error handling in networks |
US7222343B2 (en) * | 2003-01-16 | 2007-05-22 | International Business Machines Corporation | Dynamic allocation of computer resources based on thread type |
US7444546B2 (en) * | 2003-04-17 | 2008-10-28 | Arm Limited | On-board diagnostic circuit for an integrated circuit |
US7114042B2 (en) * | 2003-05-22 | 2006-09-26 | International Business Machines Corporation | Method to provide atomic update primitives in an asymmetric heterogeneous multiprocessor environment |
US7743382B2 (en) * | 2003-11-03 | 2010-06-22 | Ramal Acquisition Corp. | System for deadlock condition detection and correction by allowing a queue limit of a number of data tokens on the queue to increase |
US7721069B2 (en) * | 2004-07-13 | 2010-05-18 | 3Plus1 Technology, Inc | Low power, high performance, heterogeneous, scalable processor architecture |
WO2006028520A1 (en) * | 2004-09-07 | 2006-03-16 | Starent Networks, Corp. | Migration of tasks in a computing system |
GB0420442D0 (en) * | 2004-09-14 | 2004-10-20 | Ignios Ltd | Debug in a multicore architecture |
US7437581B2 (en) * | 2004-09-28 | 2008-10-14 | Intel Corporation | Method and apparatus for varying energy per instruction according to the amount of available parallelism |
JP2006227706A (ja) * | 2005-02-15 | 2006-08-31 | Matsushita Electric Ind Co Ltd | Program development apparatus and program development program |
US7665073B2 (en) * | 2005-04-18 | 2010-02-16 | Microsoft Corporation | Compile time meta-object protocol systems and methods |
US7689867B2 (en) * | 2005-06-09 | 2010-03-30 | Intel Corporation | Multiprocessor breakpoint |
US7827551B2 (en) * | 2005-09-21 | 2010-11-02 | Intel Corporation | Real-time threading service for partitioned multiprocessor systems |
US7793278B2 (en) * | 2005-09-30 | 2010-09-07 | Intel Corporation | Systems and methods for affine-partitioning programs onto multiple processing units |
US8490065B2 (en) * | 2005-10-13 | 2013-07-16 | International Business Machines Corporation | Method and apparatus for software-assisted data cache and prefetch control |
US9081609B2 (en) * | 2005-12-21 | 2015-07-14 | Xerox Corporation | Image processing system and method employing a threaded scheduler |
US8533680B2 (en) * | 2005-12-30 | 2013-09-10 | Microsoft Corporation | Approximating finite domains in symbolic state exploration |
US9038040B2 (en) * | 2006-01-25 | 2015-05-19 | International Business Machines Corporation | Method for partitioning programs between a general purpose core and one or more accelerators |
US20070250820A1 (en) * | 2006-04-20 | 2007-10-25 | Microsoft Corporation | Instruction level execution analysis for debugging software |
GB2443277B (en) * | 2006-10-24 | 2011-05-18 | Advanced Risc Mach Ltd | Performing diagnostics operations upon an asymmetric multiprocessor apparatus |
US9229726B2 (en) * | 2006-10-26 | 2016-01-05 | International Business Machines Corporation | Converged call flow and web service application integration using a processing engine |
US20080108899A1 (en) * | 2006-11-06 | 2008-05-08 | Nahi Halmann | Hand-held ultrasound system with single integrated circuit back-end |
EP2006784A1 (en) * | 2007-06-22 | 2008-12-24 | Interuniversitair Microelektronica Centrum vzw | Methods for characterization of electronic circuits under process variability effects |
US9223677B2 (en) * | 2008-06-11 | 2015-12-29 | Arm Limited | Generation of trace data in a multi-processor system |
2007
- 2007-05-11 GB GB0709182A patent/GB2443277B/en active Active
- 2007-08-24 KR KR1020097010591A patent/KR101325229B1/ko active IP Right Grant
- 2007-08-24 EP EP07789311A patent/EP2076837B1/en active Active
- 2007-08-24 CN CN2007800396948A patent/CN101529391B/zh active Active
- 2007-08-24 MY MYPI20091066A patent/MY144449A/en unknown
- 2007-08-24 DE DE602007009857T patent/DE602007009857D1/de active Active
- 2007-08-24 WO PCT/GB2007/003223 patent/WO2008050076A1/en active Application Filing
- 2007-08-24 JP JP2009533925A patent/JP5054115B2/ja not_active Expired - Fee Related
- 2007-09-06 TW TW096133294A patent/TWI407374B/zh active
- 2007-09-11 US US11/898,360 patent/US20080098208A1/en not_active Abandoned
- 2007-09-11 US US11/898,363 patent/US20080098207A1/en not_active Abandoned
- 2007-10-18 US US11/907,881 patent/US7809989B2/en active Active
- 2007-10-23 US US11/976,315 patent/US8190807B2/en active Active
- 2007-10-23 US US11/976,314 patent/US8250549B2/en active Active
2009
- 2009-02-26 IL IL197314A patent/IL197314A/en not_active IP Right Cessation
Cited By (142)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8902340B2 (en) | 1997-07-12 | 2014-12-02 | Google Inc. | Multi-core image processor for portable device |
US9544451B2 (en) | 1997-07-12 | 2017-01-10 | Google Inc. | Multi-core image processor for portable device |
US9338312B2 (en) | 1997-07-12 | 2016-05-10 | Google Inc. | Portable handheld device with multi-core image processor |
US8947592B2 (en) | 1997-07-12 | 2015-02-03 | Google Inc. | Handheld imaging device with image processor provided with multiple parallel processing units |
US9179020B2 (en) | 1997-07-15 | 2015-11-03 | Google Inc. | Handheld imaging device with integrated chip incorporating on shared wafer image processor and central processor |
US8953060B2 (en) | 1997-07-15 | 2015-02-10 | Google Inc. | Hand held image capture device with multi-core processor and wireless interface to input device |
US9584681B2 (en) | 1997-07-15 | 2017-02-28 | Google Inc. | Handheld imaging device incorporating multi-core image processor |
US9560221B2 (en) | 1997-07-15 | 2017-01-31 | Google Inc. | Handheld imaging device with VLIW image processor |
US9432529B2 (en) | 1997-07-15 | 2016-08-30 | Google Inc. | Portable handheld device with multi-core microcoded image processor |
US9237244B2 (en) | 1997-07-15 | 2016-01-12 | Google Inc. | Handheld digital camera device with orientation sensing and decoding capabilities |
US9219832B2 (en) | 1997-07-15 | 2015-12-22 | Google Inc. | Portable handheld device with multi-core image processor |
US9197767B2 (en) | 1997-07-15 | 2015-11-24 | Google Inc. | Digital camera having image processor and printer |
US9191530B2 (en) | 1997-07-15 | 2015-11-17 | Google Inc. | Portable hand-held device having quad core image processor |
US8908075B2 (en) | 1997-07-15 | 2014-12-09 | Google Inc. | Image capture and processing integrated circuit for a camera |
US9191529B2 (en) | 1997-07-15 | 2015-11-17 | Google Inc. | Quad-core camera processor |
US9185246B2 (en) | 1997-07-15 | 2015-11-10 | Google Inc. | Camera system comprising color display and processor for decoding data blocks in printed coding pattern |
US9185247B2 (en) | 1997-07-15 | 2015-11-10 | Google Inc. | Central processor with multiple programmable processor units |
US9168761B2 (en) | 1997-07-15 | 2015-10-27 | Google Inc. | Disposable digital camera with printing assembly |
US9148530B2 (en) | 1997-07-15 | 2015-09-29 | Google Inc. | Handheld imaging device with multi-core image processor integrating common bus interface and dedicated image sensor interface |
US9143636B2 (en) | 1997-07-15 | 2015-09-22 | Google Inc. | Portable device with dual image sensors and quad-core processor |
US9143635B2 (en) | 1997-07-15 | 2015-09-22 | Google Inc. | Camera with linked parallel processor cores |
US9137398B2 (en) | 1997-07-15 | 2015-09-15 | Google Inc. | Multi-core processor for portable device with dual image sensors |
US8908051B2 (en) | 1997-07-15 | 2014-12-09 | Google Inc. | Handheld imaging device with system-on-chip microcontroller incorporating on shared wafer image processor and image sensor |
US9131083B2 (en) | 1997-07-15 | 2015-09-08 | Google Inc. | Portable imaging device with multi-core processor |
US9124737B2 (en) | 1997-07-15 | 2015-09-01 | Google Inc. | Portable device with image sensor and quad-core processor for multi-point focus image capture |
US9124736B2 (en) | 1997-07-15 | 2015-09-01 | Google Inc. | Portable hand-held device for displaying oriented images |
US9060128B2 (en) | 1997-07-15 | 2015-06-16 | Google Inc. | Portable hand-held device for manipulating images |
US9055221B2 (en) | 1997-07-15 | 2015-06-09 | Google Inc. | Portable hand-held device for deblurring sensed images |
US8953178B2 (en) | 1997-07-15 | 2015-02-10 | Google Inc. | Camera system with color display and processor for reed-solomon decoding |
US8823823B2 (en) | 1997-07-15 | 2014-09-02 | Google Inc. | Portable imaging device with multi-core processor and orientation sensor |
US8953061B2 (en) | 1997-07-15 | 2015-02-10 | Google Inc. | Image capture device with linked multi-core processor and orientation sensor |
US8947679B2 (en) | 1997-07-15 | 2015-02-03 | Google Inc. | Portable handheld device with multi-core microcoded image processor |
US8936196B2 (en) | 1997-07-15 | 2015-01-20 | Google Inc. | Camera unit incorporating program script scanner |
US8937727B2 (en) * | 1997-07-15 | 2015-01-20 | Google Inc. | Portable handheld device with multi-core image processor |
US8934053B2 (en) | 1997-07-15 | 2015-01-13 | Google Inc. | Hand-held quad core processing apparatus |
US8934027B2 (en) | 1997-07-15 | 2015-01-13 | Google Inc. | Portable device with image sensors and multi-core processor |
US8928897B2 (en) | 1997-07-15 | 2015-01-06 | Google Inc. | Portable handheld device with multi-core image processor |
US8922791B2 (en) | 1997-07-15 | 2014-12-30 | Google Inc. | Camera system with color display and processor for Reed-Solomon decoding |
US8922670B2 (en) | 1997-07-15 | 2014-12-30 | Google Inc. | Portable hand-held device having stereoscopic image camera |
US20130010150A1 (en) * | 1997-07-15 | 2013-01-10 | Kia Silverbrook | Portable handheld device with multi-core image processor |
US8913151B2 (en) | 1997-07-15 | 2014-12-16 | Google Inc. | Digital camera with quad core processor |
US8913182B2 (en) | 1997-07-15 | 2014-12-16 | Google Inc. | Portable hand-held device having networked quad core processor |
US8913137B2 (en) | 1997-07-15 | 2014-12-16 | Google Inc. | Handheld imaging device with multi-core image processor integrating image sensor interface |
US8908069B2 (en) | 1997-07-15 | 2014-12-09 | Google Inc. | Handheld imaging device with quad-core image processor integrating image sensor interface |
US9137397B2 (en) | 1997-07-15 | 2015-09-15 | Google Inc. | Image sensing and printing device |
US8902324B2 (en) | 1997-07-15 | 2014-12-02 | Google Inc. | Quad-core image processor for device with image display |
US8902333B2 (en) | 1997-07-15 | 2014-12-02 | Google Inc. | Image processing method using sensed eye position |
US8902357B2 (en) | 1997-07-15 | 2014-12-02 | Google Inc. | Quad-core image processor |
US8896724B2 (en) | 1997-07-15 | 2014-11-25 | Google Inc. | Camera system to facilitate a cascade of imaging effects |
US8896720B2 (en) | 1997-07-15 | 2014-11-25 | Google Inc. | Hand held image capture device with multi-core processor for facial detection |
US8866926B2 (en) | 1997-07-15 | 2014-10-21 | Google Inc. | Multi-core processor for hand-held, image capture device |
US8836809B2 (en) | 1997-07-15 | 2014-09-16 | Google Inc. | Quad-core image processor for facial detection |
US8789939B2 (en) | 1998-11-09 | 2014-07-29 | Google Inc. | Print media cartridge with ink supply manifold |
US8866923B2 (en) | 1999-05-25 | 2014-10-21 | Google Inc. | Modular camera and printer |
US8341635B2 (en) | 2008-02-01 | 2012-12-25 | International Business Machines Corporation | Hardware wake-and-go mechanism with look-ahead polling |
US20110173632A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Hardware Wake-and-Go Mechanism with Look-Ahead Polling |
US20090199029A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism with Data Monitoring |
US8788795B2 (en) | 2008-02-01 | 2014-07-22 | International Business Machines Corporation | Programming idiom accelerator to examine pre-fetched instruction streams for multiple processors |
US20090199028A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism with Data Exclusivity |
US8732683B2 (en) | 2008-02-01 | 2014-05-20 | International Business Machines Corporation | Compiler providing idiom to idiom accelerator |
US8725992B2 (en) | 2008-02-01 | 2014-05-13 | International Business Machines Corporation | Programming language exposing idiom calls to a programming idiom accelerator |
US8640142B2 (en) | 2008-02-01 | 2014-01-28 | International Business Machines Corporation | Wake-and-go mechanism with dynamic allocation in hardware private array |
US8640141B2 (en) | 2008-02-01 | 2014-01-28 | International Business Machines Corporation | Wake-and-go mechanism with hardware private array |
US8880853B2 (en) | 2008-02-01 | 2014-11-04 | International Business Machines Corporation | CAM-based wake-and-go snooping engine for waking a thread put to sleep for spinning on a target address lock |
US20090199197A1 (en) * | 2008-02-01 | 2009-08-06 | International Business Machines Corporation | Wake-and-Go Mechanism with Dynamic Allocation in Hardware Private Array |
US20100287341A1 (en) * | 2008-02-01 | 2010-11-11 | Arimilli Ravi K | Wake-and-Go Mechanism with System Address Bus Transaction Master |
US20110173423A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Look-Ahead Hardware Wake-and-Go Mechanism |
US8612977B2 (en) | 2008-02-01 | 2013-12-17 | International Business Machines Corporation | Wake-and-go mechanism with software save of thread state |
US20090199189A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Parallel Lock Spinning Using Wake-and-Go Mechanism |
US20090199183A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism with Hardware Private Array |
US20110173625A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Wake-and-Go Mechanism with Prioritization of Threads |
US20110173417A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Programming Idiom Accelerators |
US8171476B2 (en) | 2008-02-01 | 2012-05-01 | International Business Machines Corporation | Wake-and-go mechanism with prioritization of threads |
US20100293341A1 (en) * | 2008-02-01 | 2010-11-18 | Arimilli Ravi K | Wake-and-Go Mechanism with Exclusive System Bus Response |
US20110173630A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Central Repository for Wake-and-Go Mechanism |
US8516484B2 (en) | 2008-02-01 | 2013-08-20 | International Business Machines Corporation | Wake-and-go mechanism for a data processing system |
US8452947B2 (en) | 2008-02-01 | 2013-05-28 | International Business Machines Corporation | Hardware wake-and-go mechanism and content addressable memory with instruction pre-fetch look-ahead to detect programming idioms |
US20110173631A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Wake-and-Go Mechanism for a Data Processing System |
US8386822B2 (en) | 2008-02-01 | 2013-02-26 | International Business Machines Corporation | Wake-and-go mechanism with data monitoring |
US20090199030A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Hardware Wake-and-Go Mechanism for a Data Processing System |
US20110173593A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Compiler Providing Idiom to Idiom Accelerator |
US8316218B2 (en) | 2008-02-01 | 2012-11-20 | International Business Machines Corporation | Look-ahead wake-and-go engine with speculative execution |
US8312458B2 (en) | 2008-02-01 | 2012-11-13 | International Business Machines Corporation | Central repository for wake-and-go mechanism |
US8250396B2 (en) | 2008-02-01 | 2012-08-21 | International Business Machines Corporation | Hardware wake-and-go mechanism for a data processing system |
US20110173419A1 (en) * | 2008-02-01 | 2011-07-14 | Arimilli Ravi K | Look-Ahead Wake-and-Go Engine With Speculative Execution |
US8015379B2 (en) | 2008-02-01 | 2011-09-06 | International Business Machines Corporation | Wake-and-go mechanism with exclusive system bus response |
US20090199184A1 (en) * | 2008-02-01 | 2009-08-06 | Arimilli Ravi K | Wake-and-Go Mechanism With Software Save of Thread State |
US8225120B2 (en) | 2008-02-01 | 2012-07-17 | International Business Machines Corporation | Wake-and-go mechanism with data exclusivity |
US8127080B2 (en) | 2008-02-01 | 2012-02-28 | International Business Machines Corporation | Wake-and-go mechanism with system address bus transaction master |
US8145849B2 (en) | 2008-02-01 | 2012-03-27 | International Business Machines Corporation | Wake-and-go mechanism with system bus response |
US10869108B1 (en) | 2008-09-29 | 2020-12-15 | Calltrol Corporation | Parallel signal processing system and method |
US9274930B2 (en) * | 2008-12-22 | 2016-03-01 | Microsoft Technology Licensing, Llc | Debugging system using static analysis |
US20100162217A1 (en) * | 2008-12-22 | 2010-06-24 | Microsoft Corporation | Debugging System Using Static Analysis |
US8239847B2 (en) * | 2009-03-18 | 2012-08-07 | Microsoft Corporation | General distributed reduction for data parallel computing |
US20100241828A1 (en) * | 2009-03-18 | 2010-09-23 | Microsoft Corporation | General Distributed Reduction For Data Parallel Computing |
US8886919B2 (en) | 2009-04-16 | 2014-11-11 | International Business Machines Corporation | Remote update programming idiom accelerator with allocated processor resources |
US20100268791A1 (en) * | 2009-04-16 | 2010-10-21 | International Business Machines Corporation | Programming Idiom Accelerator for Remote Update |
US20100268790A1 (en) * | 2009-04-16 | 2010-10-21 | International Business Machines Corporation | Complex Remote Update Programming Idiom Accelerator |
US8082315B2 (en) * | 2009-04-16 | 2011-12-20 | International Business Machines Corporation | Programming idiom accelerator for remote update |
US20100268915A1 (en) * | 2009-04-16 | 2010-10-21 | International Business Machines Corporation | Remote Update Programming Idiom Accelerator with Allocated Processor Resources |
US8230201B2 (en) | 2009-04-16 | 2012-07-24 | International Business Machines Corporation | Migrating sleeping and waking threads between wake-and-go mechanisms in a multiple processor data processing system |
US8145723B2 (en) * | 2009-04-16 | 2012-03-27 | International Business Machines Corporation | Complex remote update programming idiom accelerator |
US9569272B2 (en) * | 2009-07-15 | 2017-02-14 | Commissariat a l'energie atomique et aux alternatives | Device and method for the distributed execution of digital data processing operations |
US20110016293A1 (en) * | 2009-07-15 | 2011-01-20 | Comm. a l' ener. atom. et aux energies alter. | Device and method for the distributed execution of digital data processing operations |
US8392403B2 (en) | 2009-09-18 | 2013-03-05 | Microsoft Corporation | Management of data and computation in data centers |
US20110072006A1 (en) * | 2009-09-18 | 2011-03-24 | Microsoft Corporation | Management of data and computation in data centers |
US8612952B2 (en) | 2010-04-07 | 2013-12-17 | International Business Machines Corporation | Performance optimization based on data accesses during critical sections |
US9304812B2 (en) | 2010-12-16 | 2016-04-05 | Imagination Technologies Limited | Multi-phased and multi-threaded program execution based on SIMD ratio |
US11947999B2 (en) | 2010-12-16 | 2024-04-02 | Imagination Technologies Limited | Multi-phased and multi-threaded program execution based on SIMD ratio |
GB2486485A (en) * | 2010-12-16 | 2012-06-20 | Imagination Tech Ltd | Completing execution of one phase of a computer program before scheduling any instructions for the next phase |
GB2486485B (en) * | 2010-12-16 | 2012-12-19 | Imagination Tech Ltd | Method and apparatus for scheduling the issue of instructions in a microprocessor using multiple phases of execution |
US10585700B2 (en) | 2010-12-16 | 2020-03-10 | Imagination Technologies Limited | Multi-phased and multi-threaded program execution based on SIMD ratio |
US8566831B2 (en) | 2011-01-26 | 2013-10-22 | International Business Machines Corporation | Execution of work units in a heterogeneous computing environment |
US8533720B2 (en) | 2011-02-25 | 2013-09-10 | International Business Machines Corporation | Offloading work from one type to another type of processor based on the count of each type of service call instructions in the work unit |
US8621430B2 (en) | 2011-03-03 | 2013-12-31 | International Business Machines Corporation | Method for code transformation supporting temporal abstraction of parameters |
US8949809B2 (en) | 2012-03-01 | 2015-02-03 | International Business Machines Corporation | Automatic pipeline parallelization of sequential code |
US10452369B2 (en) | 2012-04-13 | 2019-10-22 | International Business Machines Corporation | Code profiling of executable library for pipeline parallelization |
US9619360B2 (en) | 2012-04-13 | 2017-04-11 | International Business Machines Corporation | Code profiling of executable library for pipeline parallelization |
US8910137B2 (en) | 2012-04-13 | 2014-12-09 | International Business Machines Corporation | Code profiling of executable library for pipeline parallelization |
US20130339923A1 (en) * | 2012-06-19 | 2013-12-19 | Charles Chen Xu | Data Handling Among Actors in a Dataflow Programming Environment |
US8904371B2 (en) * | 2012-06-19 | 2014-12-02 | Telefonaktiebolaget L M Ericsson (Publ) | Data handling among actors in a dataflow programming environment |
US9158537B2 (en) * | 2012-12-04 | 2015-10-13 | International Business Machines Corporation | Streamlining hardware initialization code |
US20140157229A1 (en) * | 2012-12-04 | 2014-06-05 | International Business Machines Corporation | Streamlining Hardware Initialization Code |
US9021426B2 (en) * | 2012-12-04 | 2015-04-28 | International Business Machines Corporation | Streamlining hardware initialization code |
US20140157230A1 (en) * | 2012-12-04 | 2014-06-05 | International Business Machines Corporation | Streamlining Hardware Initialization Code |
US20140195834A1 (en) * | 2013-01-04 | 2014-07-10 | Microsoft Corporation | High throughput low latency user mode drivers implemented in managed code |
US9811319B2 (en) | 2013-01-04 | 2017-11-07 | Microsoft Technology Licensing, Llc | Software interface for a hardware device |
US9323543B2 (en) | 2013-01-04 | 2016-04-26 | Microsoft Technology Licensing, Llc | Capability based device driver framework |
US10949200B2 (en) * | 2013-06-16 | 2021-03-16 | President And Fellows Of Harvard College | Methods and apparatus for executing data-dependent threads in parallel |
US20160313991A1 (en) * | 2013-06-16 | 2016-10-27 | President And Fellows Of Harvard College | Methods and apparatus for parallel processing |
US10296340B2 (en) | 2014-03-13 | 2019-05-21 | Arm Limited | Data processing apparatus for executing an access instruction for N threads |
US9652817B2 (en) | 2015-03-12 | 2017-05-16 | Samsung Electronics Co., Ltd. | Automated compute kernel fusion, resizing, and interleave |
US10475147B2 (en) * | 2016-02-12 | 2019-11-12 | Arm Limited | Multiple GPU graphics processing system |
US20170236244A1 (en) * | 2016-02-12 | 2017-08-17 | Arm Limited | Graphics processing systems |
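CN110199269A (zh) * | 2017-01-23 | 2019-09-03 | Samsung Electronics Co., Ltd. | Method and electronic device for data processing between multiple processors |
CN110998540A (zh) * | 2017-08-01 | 2020-04-10 | Microsoft Technology Licensing, LLC | Focused execution of traced code in a debugger |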
US10732982B2 (en) | 2017-08-15 | 2020-08-04 | Arm Limited | Data processing systems |
US20210271666A1 (en) * | 2018-09-28 | 2021-09-02 | Marc Brandis Ag | Analyzing a processing engine of a transaction-processing system |
CN111476264A (zh) * | 2019-01-24 | 2020-07-31 | International Business Machines Corporation | Testing adversarial robustness of systems with limited access |
US11836256B2 (en) | 2019-01-24 | 2023-12-05 | International Business Machines Corporation | Testing adversarial robustness of systems with limited access |
US11604752B2 (en) | 2021-01-29 | 2023-03-14 | Arm Limited | System for cross-routed communication between functional units of multiple processing units |
US20230195321A1 (en) * | 2021-12-17 | 2023-06-22 | Samsung Electronics Co., Ltd. | Storage device and operating method thereof |
Also Published As
Publication number | Publication date |
---|---|
TW200821938A (en) | 2008-05-16 |
CN101529391A (zh) | 2009-09-09 |
JP5054115B2 (ja) | 2012-10-24 |
MY144449A (en) | 2011-09-30 |
US7809989B2 (en) | 2010-10-05 |
JP2010507855A (ja) | 2010-03-11 |
WO2008050076A1 (en) | 2008-05-02 |
GB2443277B (en) | 2011-05-18 |
US8250549B2 (en) | 2012-08-21 |
US20080215768A1 (en) | 2008-09-04 |
US20080098262A1 (en) | 2008-04-24 |
CN101529391B (zh) | 2011-06-15 |
KR20090082254A (ko) | 2009-07-29 |
US20080098207A1 (en) | 2008-04-24 |
TWI407374B (zh) | 2013-09-01 |
IL197314A (en) | 2012-12-31 |
KR101325229B1 (ko) | 2013-11-04 |
GB0709182D0 (en) | 2007-06-20 |
US20080114937A1 (en) | 2008-05-15 |
IL197314A0 (en) | 2009-12-24 |
DE602007009857D1 (de) | 2010-11-25 |
GB2443277A (en) | 2008-04-30 |
EP2076837A1 (en) | 2009-07-08 |
EP2076837B1 (en) | 2010-10-13 |
US8190807B2 (en) | 2012-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20080098208A1 (en) | | Analyzing and transforming a computer program for executing on asymmetric multiprocessing systems |
US10430190B2 (en) | | Systems and methods for selectively controlling multithreaded execution of executable code segments |
Stratton et al. | | Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs |
Ottoni et al. | | Automatic thread extraction with decoupled software pipelining |
ElTantawy et al. | | MIMD synchronization on SIMT architectures |
Grossman et al. | | CnC-CUDA: declarative programming for GPUs |
Arnold et al. | | Power aware heterogeneous MPSoC with dynamic task scheduling and increased data locality for multiple applications |
Reid et al. | | SoC-C: efficient programming abstractions for heterogeneous multicore systems on chip |
Sorensen et al. | | Specifying and testing GPU workgroup progress models |
Sorensen et al. | | GPU schedulers: how fair is fair enough? |
Ročkai | | Model checking software |
Salcic et al. | | GALS-HMP: A heterogeneous multiprocessor for embedded applications |
Bernard et al. | | On the compilation of a language for general concurrent target architectures |
US20140223419A1 (en) | | Compiler, object code generation method, information processing apparatus, and information processing method |
Lankamp | | Developing a reference implementation for a microgrid of microthreaded microprocessors |
Kumar et al. | | A Modern Parallel Register Sharing Architecture for Code Compilation |
Rutgers | | Programming models for many-core architectures: a co-design approach |
Stavrou et al. | | Hardware budget and runtime system for data-driven multithreaded chip multiprocessor |
Bari | | Achieving Resilience and Maintaining Performance in OpenSHMEM+ X Applications |
Traulsen | | Reactive processing for synchronous languages and its worst case reaction time analysis |
Berkovich | | Parallel Run-Time Verification |
Kreiliger | | Time-predictable GPU execution |
Baudisch | | Synthesis of Synchronous Programs to Parallel Software Architectures |
de Oliveira Jr et al. | | CML: C modeling language |
Naji | | Timing analysis for time-predictable architectures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: REGENTS OF THE UNIVERSITY OF MICHIGAN, THE, MICHIG. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LIN, YUAN;REEL/FRAME:020240/0435. Effective date: 20071029 |
| | AS | Assignment | Owner name: ARM LIMITED, UNITED KINGDOM. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:REID, ALASTAIR DAVID;FORD, SIMON ANDREW;REEL/FRAME:020232/0590. Effective date: 20071003 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |