JP4066838B2 - Shared resource conflict detector and shared resource conflict detection method - Google Patents

Shared resource conflict detector and shared resource conflict detection method

Info

Publication number
JP4066838B2
Authority
JP
Japan
Prior art keywords
event
logical
shared resource
counter
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2003041575A
Other languages
Japanese (ja)
Other versions
JP2004252670A (en)
Inventor
耕一 久門
周史 山村
Original Assignee
富士通株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士通株式会社
Priority to JP2003041575A
Publication of JP2004252670A
Application granted
Publication of JP4066838B2
Active legal status
Anticipated expiration

Description

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a detector and a detection method for finding contention for shared resources between logical CPUs in a multi-thread processor composed of a plurality of logical CPUs.
[0002]
[Prior art]
In general, a program area called a “critical region” is often provided so that resources shared between processors in a multiprocessor system are accessed exclusively. A critical region is a program area that only one processor can execute at a time. Confining accesses to shared resources to this area maintains data consistency during the execution of parallel programs.
[0003]
In practice, a system with multiple processors provides a variable called a “spinlock variable”, which is used to determine whether a processor may enter the critical region, that is, whether another processor is currently running in the critical region.
For example, the spinlock variable is set to:
・ "1" if a processor is running in the critical region
・ "0" if no processor is running in the critical region
[0004]
In this implementation, when one processor is running in the critical region and another processor tries to enter it, the second processor must repeatedly check the spinlock variable until its value becomes “0”. This loop is called a “spin loop”. The spin loop is frequently used in software running on multiprocessor systems as a simple way to implement exclusive control.
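The spin loop described above can be sketched in software as follows. This is only an illustrative model, not code from the cited documents: the names `SpinLock`, `run_workers`, `UNLOCKED` and `LOCKED` are hypothetical, the value used for "held" is an assumption, and a `threading.Lock` stands in for the atomic test-and-set instruction that real hardware would provide.

```python
import threading

UNLOCKED, LOCKED = 0, 1  # hypothetical names; "0" means no processor is in the region


class SpinLock:
    """Toy spinlock: `value` plays the role of the spinlock variable."""

    def __init__(self):
        self.value = UNLOCKED
        self._atomic = threading.Lock()  # stands in for an atomic test-and-set
        self.spins = 0                   # failed checks, kept only for illustration

    def _test_and_set(self):
        # Atomically read the old value and mark the lock as held.
        with self._atomic:
            old = self.value
            self.value = LOCKED
            return old

    def acquire(self):
        # The spin loop: keep re-checking until the variable is observed as "0".
        while self._test_and_set() != UNLOCKED:
            self.spins += 1

    def release(self):
        with self._atomic:
            self.value = UNLOCKED


def run_workers(n_threads=2, iterations=500):
    """Increment a shared total inside the critical region from several threads."""
    lock = SpinLock()
    total = [0]

    def worker():
        for _ in range(iterations):
            lock.acquire()
            total[0] += 1      # critical region: one thread at a time
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.spins
```

Every failed check in `acquire` is work that produces no progress, which is why a spinning thread is costly when it shares execution resources with a thread doing useful computation.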
[0005]
However, the spin loop causes a serious problem on a multi-thread processor (see Non-Patent Document 1 for multi-thread processors). A thread spinning on one logical CPU consumes the computation resources shared between the logical CPUs, so the effective performance of other threads performing computation is greatly reduced (see, for example, Non-Patent Documents 2 and 3).
[0006]
Further, in a multi-thread processor, performance is degraded not only by contention for computation resources, as with a spin lock, but also by contention for other resources shared between logical CPUs. For example, in the Intel Xeon processor (see Non-Patent Document 4), the primary/secondary cache memory and the TLB (Translation Look-aside Buffer) are shared between logical CPUs, and contention for them can also cause performance degradation (see, for example, Non-Patent Document 5).
[0007]
Next, conventional techniques for finding where resource contention occurs will be described.
In the execution of a program, the work of collecting statistical information such as “where in the program the most time was consumed” is called performance profiling. The most basic and widely used technique for performing performance profiling is PC sampling (see Non-Patent Document 6, for example).
[0008]
PC (program counter) sampling records, at fixed intervals, which part of the program is being executed; performance profiling is then done by statistically processing the sampled data after the program has run. In practice, PC sampling is realized on an existing processor by combining an event measurement counter and a counter overflow interrupt.
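As a rough illustration of PC sampling driven by counter overflow, the sketch below replays a recorded trace instead of using real hardware counters. The function names `pc_sample` and `profile`, the trace format, and the `symbol_of` lookup are all assumptions made for this example.

```python
from collections import Counter


def pc_sample(trace, period):
    """Record the program counter each time the event counter overflows.

    trace  -- iterable of (pc, events) pairs: the PC of an executed instruction
              and how many monitored events (cycles, retired instructions, ...)
              it produced
    period -- number of events between samples (the overflow threshold)
    """
    samples = []
    counter = 0
    for pc, events in trace:
        counter += events
        while counter >= period:      # counter overflow, i.e. the "interrupt"
            samples.append(pc)        # the handler records the interrupted PC
            counter -= period
    return samples


def profile(samples, symbol_of):
    """Turn raw PC samples into per-function fractions (the statistical step)."""
    if not samples:
        return {}
    hist = Counter(symbol_of(pc) for pc in samples)
    total = sum(hist.values())
    return {name: count / total for name, count in hist.items()}


# Hypothetical trace: 40 one-cycle instructions in function A, then 20
# three-cycle instructions in function C, sampled every 10 cycles.
trace = [(0x100, 1)] * 40 + [(0x200, 3)] * 20
samples = pc_sample(trace, period=10)
shares = profile(samples, lambda pc: "A" if pc < 0x200 else "C")
```

Choosing what `events` counts (cycles or completed instructions) changes which functions dominate the profile, which is the distinction the later sampling comparison relies on.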
[0009]
For example, the performance monitoring counter mounted on Intel processors (see, for example, Non-Patent Document 5) is an event measurement counter with these functions. With a conventional event measurement counter, however, sampling can be based only on a single specific event (for example, the time base or the number of executed instructions); it cannot measure an event produced by the combined operation of multiple instructions, such as a spin loop.
[0010]
Event measurement counters for multiprocessors and multi-thread processors have also been proposed (see, for example, Patent Documents 1 and 2). However, these proposals only enable counting for each thread running on the processor, or record the total running time of all threads. Sampling with such functions may identify the parts of the program in which all threads are active, but what matters in performance profiling is to determine what operation (for example, a spin lock) the threads were performing while they were all active. In this regard, each of the above methods can only count occurrences of a single event, such as the number of instructions executed or the number of cache misses, and is insufficient for examining the relationship between logical CPUs.
[0011]
In addition to the above method, a method called “ProfileMe” (see, for example, Non-Patent Document 7) that profiles an instruction itself has also been proposed. However, in this method, an identifier is set for each instruction to measure the execution delay of the instruction itself, and it is not possible to check a loop process composed of a plurality of instructions such as a spin loop.
[0012]
Furthermore, a function called a “watchdog timer”, which checks the operating state of the processor at fixed intervals, is known. Applying this function might make it possible to locate where a spin loop occurs. With this method, however, it is difficult to distinguish a spin loop from other loops in the program, and even when a spin loop is detected, only the single location where it occurred is identified, so the method cannot be applied to statistical processing such as performance profiling. In addition, the method can be used only for loop processing and is difficult to apply to detecting shared resource contention, which is the problem addressed by the present invention.
[0013]
Here, the logical CPU is briefly defined. In order to control multiple independent instruction streams, a multi-thread processor contains:
(1) an instruction control unit and a register group that holds the instruction execution state; and
(2) arithmetic units and the like shared among the units in (1).
The combination of (1) and (2) needed to execute one independent instruction stream is called a logical CPU. The physical processor as a whole, on the other hand, is called a “physical CPU”.
[0014]
The “thread” is a series of execution instruction sequences having an execution context that can be recognized by the OS or hardware.
[0015]
[Non-Patent Document 1]
"Simultaneous Multithreading: Maximizing On-Chip Parallelism", Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy, In Proc. of 22nd Annual International Symposium on Computer Architecture, pp. 392-403, June 1995.
[0016]
[Non-Patent Document 2]
"Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor Version 2.1", May 2001, Order Number 248674-002.
[0017]
[Non-Patent Document 3]
"Introduction to Next Generation Multiprocessing: Hyper-Threading Technology", http://www.intel.com/technology/hyperthread/intro nexgen /.
[0018]
[Non-Patent Document 4]
"Hyper-Threading Technology Architecture and Microarchitecture",
Deborah T. Marr, et al., Intel Technology Journal, Volume.6, Issue.1, February 2002.
[0019]
[Non-Patent Document 5]
"IA-32 Intel Architecture Software Developer's Manual Volume 3 System Programming Guide", September, 2002, Order Number 245472-009, p.7-40.
[0020]
[Non-Patent Document 6]
"Measuring Computer Performance A Practitioner's Guide", David J. Lilja, Cambridge University Press, New York, NY, 2000.
[0021]
[Non-Patent Document 7]
"ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors", Jeffrey Dean, James E. Hicks, Carl A. Waldspurger, William E. Weihl, George Chrysos, International Symposium on Microarchitecture, 1997.
[0022]
[Patent Document 1]
Japanese Patent Laid-Open No. 10-275100 (first page, FIG. 1)
[0023]
[Patent Document 2]
JP-A-9-237203 (first page)
[0024]
[Problems to be solved by the invention]
As described above, in a multi-thread processor, the execution speed of a program on a certain logical processor is greatly influenced by the operation status of the program on another logical processor. In particular, when a spin loop is executed on one logical processor, the effective performance of the program on the other logical processor may be greatly reduced. However, it has been considered difficult to detect when and where such a spin loop occurs during program execution.
[0025]
Further, when logical CPUs frequently contend for a shared resource (for example, cache memory) provided in the multi-thread processor, performance may be significantly degraded. However, it has also been difficult to identify where such contention is likely to occur.
In order to bring out the full performance of a multi-thread processor, the problem is to find, easily and accurately, the spin loops and the shared resource contention that cause the performance degradation described above.
[0026]
[Means for Solving the Problems]
In order to solve the above problems, the shared resource conflict detector in the multi-thread processor of the present invention is configured as follows.
(1) First invention: The principle of the first invention will be described with reference to the principle diagram. The detector of the present invention comprises event acquisition means 1, count means 2, and interrupt means 3.
[0027]
The event acquisition means 1 acquires the events (the types of events) that occur as instructions are executed by the plurality of CPUs (logical CPUs) running in the multi-thread processor.
The count means 2 counts up a counter when the acquired events match a previously registered event pattern. For example, when the event pattern is registered in the order event A, event B, event C, the counter is incremented by one if the events acquired by the event acquisition means 1 have the same types and occur in the same order as the registered pattern.
[0028]
When the count value counted up by the count means 2 reaches a predetermined value, the interrupt means 3 determines that shared resource contention has occurred and interrupts the CPU that generated the event.
The first invention detects contention by exploiting the fact that a characteristic event pattern occurs when shared resource contention arises between logical CPUs. The configuration is as described above, and it can also be illustrated schematically in the drawings. The instructions of the program being executed by the logical CPU x in FIG. 2 are, for example, as shown in FIG. 3A, written using the instruction set architecture of existing Intel processors. The events in the registered event pattern in FIG. 2 are likewise written using the performance monitoring events of the Intel processor. The events generated by executing the program in FIG. 3A are shown in FIG. 3B. These events are compared with the registered event pattern, and if they match, the counter is incremented. When the counter value reaches a predetermined value, an interrupt is raised on the CPU #x that generated the events.
[0029]
According to the first aspect, it is possible to detect the occurrence of shared resource contention in the multithread processor.
(2) Second invention: The event pattern to be registered consists of events and the event generation source associated with each event (for example, the logical CPU generating a spin loop). This makes it possible to detect contention between logical CPUs. For example, as shown in FIG. 4A, when the logical CPU #1 generates a cache miss immediately after the logical CPU #0 has generated a cache miss, accesses to the cache memory are likely to be in contention. Identifying the event source in this way makes it possible to determine in which part of program execution the contention for the shared resource has an adverse effect. For example, the logical CPU #0 and the logical CPU #1 are expected to generate the events shown in FIG. 4B, and this sequence is registered as an event pattern.
(3) Third invention: When the counter reaches a predetermined value, that is, when contention for the shared resource is detected, the interrupt means interrupts a logical CPU and samples the value of that logical CPU's program counter. This enables profiling by PC sampling to determine which part of the program was running when a spin loop occurred on the other logical CPU.
(4) Fourth invention: When the counter reaches a predetermined value, that is, when contention for the shared resource is detected, the interrupt means schedules a running thread other than the thread causing the contention with priority. Alternatively, execution of the thread causing the contention is suspended (stopped). This suppresses the occurrence of shared resource contention.
(5) Fifth invention: The shared resource contention detection method of the present invention comprises an event acquisition procedure, a count procedure, and an interrupt procedure. The event acquisition procedure acquires the events that occur as instructions are executed by the plurality of logical CPUs running in the multi-thread processor. The count procedure counts up a counter when the acquired events match a previously registered event pattern. The interrupt procedure interrupts the CPU that generated the events when the count value counted up by the count procedure reaches a predetermined value. As a result, the occurrence of shared resource contention in the multi-thread processor can be detected.
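To make the role of the event generation source in the second invention concrete, the following sketch counts matches of a registered pattern in a recorded event stream, once with source-tagged events and once with event names only. The function names, the event name `"CACHE_MISS"`, and the rule for restarting after a mismatch are assumptions for illustration; the description does not specify how a mismatch is handled.

```python
def count_pattern_matches(events, pattern):
    """Count complete, in-order, back-to-back occurrences of `pattern`."""
    matches = 0
    index = 0                      # position in the registered pattern
    for ev in events:
        if ev == pattern[index]:
            index += 1
        else:
            # Assumption: restart, letting this event begin a new match.
            index = 1 if ev == pattern[0] else 0
        if index == len(pattern):  # whole pattern seen: count one match
            matches += 1
            index = 0
    return matches


def should_interrupt(events, pattern, threshold):
    """True when the match count reaches the predetermined value."""
    return count_pattern_matches(events, pattern) >= threshold


# Pattern tagged with the source: a miss on CPU #0 followed by one on CPU #1.
tagged_pattern = [(0, "CACHE_MISS"), (1, "CACHE_MISS")]
alternating = [(0, "CACHE_MISS"), (1, "CACHE_MISS")] * 3   # cross-CPU misses
single_cpu = [(0, "CACHE_MISS")] * 6                       # misses on one CPU only

cross = count_pattern_matches(alternating, tagged_pattern)
local = count_pattern_matches(single_cpu, tagged_pattern)

# Without source tags the two streams become indistinguishable.
untagged = count_pattern_matches(["CACHE_MISS"] * 6, ["CACHE_MISS"] * 2)
```

In this toy run only the alternating stream matches the tagged pattern, while an untagged pattern would count both streams the same way, which is the motivation for registering the source together with the event.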
[0030]
DETAILED DESCRIPTION OF THE INVENTION
Next, embodiments of the present invention will be described with reference to the drawings.
(Embodiment 1)
Embodiment 1 concerns a multi-thread processor composed of two logical CPUs, in which one logical CPU executes a program composed of five functions (A, B, C, D, E). It shows an example of detecting which part of that program is affected by a spin loop executed on the logical CPU #1.
[0031]
FIG. 5 shows a basic configuration of a multi-thread processor having a resource contention detection function according to the present invention. This example is a multi-thread processor having two logical CPUs. The components are an instruction fetch unit 11 that fetches instructions from the program 40, an instruction sequencer 12 that controls a thread, an SU 13 that selects an arithmetic unit, an ALU 14 that is an arithmetic/logical unit, an FPA 15 that is a floating-point adder, a multiplier FPM 16, a divider FPD 17, a load/store unit LD/ST 18, a register set REG 20 corresponding to the instruction sequencer 12, a Retirement Unit 19 that performs instruction completion processing, and an event comparison unit 30 that detects resource contention. Since the core of the present invention lies in the event comparison unit 30, the description of the other parts of the multi-thread processor is omitted.
[0032]
The event comparison unit 30 has registers PTRN REGISTERS 35 for storing event patterns, in which the event pattern to be detected is registered. In this example, a sequence of up to six events can be detected, and each entry registers an event together with its generation source. The unit also contains a PTRN INDEX REGISTER 34, which indicates which entry of PTRN REGISTERS 35 is currently being compared. A generated event enters the event comparison unit through the event fetch unit 31 and is compared with the registered event pattern by the comparator 32. If they match, the counter 33 is counted up; when the counter 33 overflows, it is determined that contention has occurred and an interrupt signal is generated. The counter 33 is, for example, 40 bits wide.
[0033]
FIG. 5 also shows an example of a program 40 that performs a spin loop. Assume that this program is executed by the logical CPU #1; events then occur in a sequence like the event generation pattern shown in the figure. In this embodiment, as shown in FIG. 6, the counter overflow interrupt caused by the events on the logical CPU #1 is raised on the logical CPU #0. PC sampling is then performed inside the interrupt handler that runs when the logical CPU #0 is interrupted. In parallel, PC sampling is also performed based on the time base (clock base) and on the number of execution-completed instructions, both of which can be realized with conventional technology.
[0034]
Next, the processing flow of Embodiment 1 will be described with reference to the flowchart. First, the count value CNT of the counter 33 and the item number I of the registered event pattern are initialized to “0”. The CPU #1 executes an instruction of the program 40, and the event that occurs with its execution is acquired through the event fetch unit 31 (S11-S14).
[0035]
The item number I is counted up, and the acquired event is checked against the I-th entry of the registered event pattern. If it matches, it is checked whether this is the sixth event. If so, the acquired events match the registered pattern of six events, and the counter value CNT is counted up. If the counter has not overflowed, the process returns to S12 and event acquisition is repeated. If the event is not the sixth, the process likewise returns to S12 (S15-S19).
[0036]
When the counter 33 overflows, it is determined that contention has occurred; the logical CPU #0 is interrupted, and the value of the program counter (PC), which is one of the registers in the register set REG 20, is sampled (S20).
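The flow of steps S11 to S20 can be modelled in software roughly as below. This is a sketch of the behaviour described for the event comparison unit, not the circuit itself: the class and function names, the event names in the usage lines, the handling of a mismatch, and the lockstep pairing of CPU #1 events with CPU #0 program counter values are all assumptions. The counter width is a parameter so that overflow can be observed with a small value instead of the 40 bits given as an example.

```python
class EventComparisonUnit:
    """Software model of the event comparison unit (reference numerals in comments)."""

    def __init__(self, pattern, counter_bits=40):
        if not 1 <= len(pattern) <= 6:
            raise ValueError("the example unit holds at most six pattern entries")
        self.ptrn_registers = list(pattern)   # PTRN REGISTERS 35
        self.ptrn_index = 0                   # PTRN INDEX REGISTER 34 (item number I)
        self.counter = 0                      # counter 33 (CNT), initialised to 0
        self.counter_limit = 1 << counter_bits

    def fetch(self, event):
        """Feed one event (S12-S14); return True when the counter overflows."""
        # S15-S19: compare with the current pattern entry (comparator 32).
        if event == self.ptrn_registers[self.ptrn_index]:
            self.ptrn_index += 1
        else:
            # Assumption: a mismatch restarts the comparison.
            self.ptrn_index = 1 if event == self.ptrn_registers[0] else 0
        if self.ptrn_index < len(self.ptrn_registers):
            return False                      # not yet the last event of the pattern
        self.ptrn_index = 0
        self.counter = (self.counter + 1) % self.counter_limit
        return self.counter == 0              # wrapped around: overflow


def sample_on_contention(cpu1_events, cpu0_pcs, unit):
    """S20: sample the PC of logical CPU #0 whenever the counter overflows."""
    samples = []
    for event, pc in zip(cpu1_events, cpu0_pcs):
        if unit.fetch(event):
            samples.append(pc)
    return samples


# Hypothetical six-event spin-loop pattern and a 1-bit counter for the demo.
spin_pattern = ["LOAD", "CMP", "BRANCH", "PAUSE", "LOAD", "CMP"]
unit = EventComparisonUnit(spin_pattern, counter_bits=1)
pcs = [0x1000 + 4 * k for k in range(24)]
samples = sample_on_contention(spin_pattern * 4, pcs, unit)
```

With a 1-bit counter every second complete match overflows, so the 24-event demo stream yields two samples of the CPU #0 program counter.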
FIG. 8 shows an example of the result of PC sampling, performed for 1 second on the logical CPU #0 of a CPU operating at 1 GHz. It shows the appearance ratio of each function obtained by PC sampling when sampling is based on (a) the time base (clock base), (b) the number of execution-completed instructions, and (c) the spin lock detection event. For simplicity, it is assumed that a sample is taken each time the respective event occurs.
[0037]
(a) and (b) are profiling results obtained with a conventional event measurement counter. A software developer would usually conclude from profiling result (a) that function C consumes the most time and is the bottleneck for improving performance. In this case, the CPI (Cycles Per Instruction) of function C is
[0038]
[Expression 1]
[0039]
Here, (c) shows that the number of spin locks that occurred on the logical CPU #1 while the logical CPU #0 was executing function C is
[Expression 2]
[0041]
Since one detected spin lock consists of six instructions, a total of
[Expression 3]
[0043]
instructions were executed for the spin lock on the logical CPU #1. Roughly speaking, if the spin lock processing had not been performed on the logical CPU #1, the number of instructions executed on the logical CPU #0 could be expected to increase by that amount. The CPI of function C excluding the influence of the spin lock is therefore
[0044]
[Expression 4]
[0045]
Thus, a speedup of about 32% can be expected for function C.
As described above, according to the present invention, it is possible to detect a portion where the execution time is significantly increased due to the influence of the spin lock.
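The reasoning in the CPI estimate above can be written out as a small calculation. The numeric expressions of the original are not reproduced in this text, so the sketch below is only an interpretation under stated assumptions: the instructions spent spinning on the logical CPU #1 are treated as instructions the logical CPU #0 could have executed in the same number of cycles, and speedup is taken as the ratio of the two CPI values minus one. The function names and the input values in the usage lines are hypothetical and are not the figures from the patent.

```python
def cpi(cycles, instructions):
    """Cycles per instruction as measured."""
    return cycles / instructions


def cpi_without_spin(cycles, instructions, spin_detections, instrs_per_spin=6):
    """Estimated CPI if the spin-lock instructions had gone to useful work.

    spin_detections -- number of detected spin-lock patterns
    instrs_per_spin -- instructions per detected pattern (six in the description)
    """
    reclaimed = spin_detections * instrs_per_spin
    return cycles / (instructions + reclaimed)


def expected_speedup(cycles, instructions, spin_detections, instrs_per_spin=6):
    """Relative speedup implied by the two CPI values (assumed definition)."""
    before = cpi(cycles, instructions)
    after = cpi_without_spin(cycles, instructions, spin_detections, instrs_per_spin)
    return before / after - 1.0


# Illustrative inputs only: 400 cycles, 100 instructions, 5 detected spin locks.
gain = expected_speedup(400, 100, 5)
```

The estimate is deliberately coarse: it ignores that reclaimed issue slots may not translate one-for-one into completed instructions on the other logical CPU.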
(Embodiment 2)
An example of a system that improves the effective performance of a processor by detecting a spin loop being executed on a logical CPU and performing thread scheduling in response to the detection in a multi-thread processor composed of two logical CPUs will be described.
[0046]
FIG. 9 shows a configuration example of a multi-thread processor that performs thread scheduling. The components in FIG. 9 are the same as those in FIG. 5, and a data path is provided for transferring the comparison result from the event comparison unit 30 to the scheduling unit 13 and the instruction sequencer 12.
Here, it is assumed that a thread that performs normal calculation processing is executed in the logical CPU # 0, and a thread that performs spin lock is executed in the logical CPU # 1. At this time, the instructions included in each thread are issued from the corresponding instruction sequencers A and B.
[0047]
Normally, the scheduling unit 13 feeds the instructions issued from the two instruction sequencers to the execution units with equal priority. When the event comparison unit 30 detects a spin lock, the other thread is scheduled with priority over the thread executing the spin lock. That is, instructions issued from the instruction sequencer A are sent to the execution units in preference to instructions issued from the instruction sequencer B, which is performing the spin lock.
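A minimal sketch of this priority switch is shown below. It assumes that "equal priority" means simple alternation between the two sequencers and that, once a spin lock is detected on sequencer B, instructions from B are issued only when A has nothing to issue. The function name `schedule` and the instruction labels are hypothetical.

```python
from collections import deque


def schedule(queue_a, queue_b, spin_detected_on_b, slots):
    """Issue up to `slots` instructions from instruction sequencers A and B."""
    a, b = deque(queue_a), deque(queue_b)
    issued = []
    turn_a = True
    while len(issued) < slots and (a or b):
        if spin_detected_on_b:
            # Prioritise the computing thread; B fills slots only when A is empty.
            source = a if a else b
        else:
            # Equal priority, modelled here as alternation (an assumption).
            source = a if (turn_a and a) or not b else b
            turn_a = not turn_a
        issued.append(source.popleft())
    return issued


normal = schedule(["a1", "a2", "a3"], ["s1", "s2", "s3"], False, 4)
boosted = schedule(["a1", "a2", "a3"], ["s1", "s2", "s3"], True, 4)
```

In the alternating case half of the issue slots go to the spinning thread, whereas after detection the computing thread receives all slots it can use.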
[0048]
According to the present invention, a dynamic instruction scheduling apparatus corresponding to the program execution state as described above can be realized, and the instruction execution performance of the multithread processor can be improved.
(Supplementary note 1) A contention detector for a shared resource of a multi-thread processor having a plurality of CPUs,
Event acquisition means for acquiring an event generated by execution of a command by the plurality of CPUs;
A count means for comparing the acquired event with a pre-registered event pattern and counting up a counter when they match,
A shared resource conflict detector, comprising: interrupt means for interrupting a CPU that has generated the event when a count value of the counter reaches a predetermined value.
[0049]
(Supplementary Note 2) The shared resource contention detector according to Supplementary Note 1, wherein the registered event pattern consists of an event and an event generation source associated with the event, and
when the count value of the counter reaches a predetermined value, the interrupt means interrupts the CPU that generated the event or the CPU registered as the generation source of the event.
[0050]
(Supplementary Note 3) The shared resource contention detector according to Supplementary Note 1 or 2, wherein when the count value of the counter reaches a predetermined value, the interrupt means interrupts the CPU that generated the event and samples the value of the program counter.
(Supplementary Note 4) The shared resource contention detector according to Supplementary Note 1 or 2, wherein the interrupting unit performs thread scheduling when the count value of the counter reaches a predetermined value.
[0051]
(Supplementary Note 5) A method for detecting contention for a shared resource of a multi-thread processor having a plurality of CPUs,
An event pattern acquisition procedure for acquiring event occurrence patterns generated by execution of commands by the plurality of CPUs;
A count procedure for comparing the acquired event occurrence pattern with a registered event pattern registered in advance and counting up a counter when they match,
A shared resource conflict detection method, comprising: an interrupt procedure for interrupting a CPU that has generated the event occurrence pattern when a count value of the counter reaches a predetermined value.
[0052]
(Supplementary Note 6) The shared resource conflict detector according to Supplementary Note 1 or 2, wherein the registered event pattern is an event and an occurrence time interval of the event.
(Supplementary Note 7) The shared resource contention detector according to Supplementary Note 1, 2, or 6, wherein when the count value of the counter reaches a predetermined value, the interrupt means interrupts the CPU that has generated the event.
[0053]
[Effects of the Invention]
By using a counter capable of recognizing event occurrence patterns in a multi-thread processor, the present invention makes it possible to efficiently identify where contention for resources shared between logical CPUs occurs, such as contention for computation resources caused by a spin loop, which has a characteristic event occurrence pattern, and contention for use of the cache memory. This information supports techniques for optimizing programs that run on multi-thread processors.
[Brief description of the drawings]
FIG. 1 is a principle diagram of the present invention.
FIG. 2 is a schematic diagram of the first invention.
FIG. 3 is an example of a spin loop implementation program and an example of an event pattern that has occurred.
FIG. 4 is an example of contention detection in cache memory access.
FIG. 5 is a configuration example of Embodiment 1;
FIG. 6 is an example of conflict detection by identification of an event generation source.
FIG. 7 is a flow example of Embodiment 1;
FIG. 8 is an example of PC sampling.
FIG. 9 is a configuration example of Embodiment 2;
[Explanation of symbols]
1: Event acquisition means
2: Count means
3: Interrupt means
10: Multithread processor
11: Instruction fetch unit
12: Instruction sequencer
13: Selection unit
14: Arithmetic / logical operation unit
15: Floating point adder
16: Multiplier
17: Divider
18: Load / store unit
19: Retirement unit
20: Register set
30: Event comparison unit
31: Event fetch unit
32: Comparator
33: Counter
34: Pattern index register
35: Pattern register

Claims (4)

  1. A shared resource contention detector for a multi-thread processor having a plurality of logical CPUs, comprising:
    event acquisition means for acquiring an event generated by execution of an instruction by the plurality of logical CPUs;
    count means for comparing the acquired event with a pre-registered event pattern that appears when a spin lock is executed, and counting up a counter when they match; and
    interrupt means for performing thread scheduling processing on one or more logical CPUs when a count value of the counter reaches a predetermined value.
  2. The shared resource contention detector according to claim 1, wherein the registered event pattern consists of an event and an event generation source associated with the event, and
    when the count value of the counter reaches a predetermined value, the interrupt means interrupts the logical CPU that has generated the event.
  3. The shared resource contention detector according to claim 1 or 2, wherein the interrupt means interrupts one or more logical CPUs when the count value of the counter reaches a predetermined value, and samples the value of a program counter corresponding to the logical CPU.
  4. A method for detecting contention for a shared resource of a multi-thread processor having a plurality of logical CPUs, comprising:
    an event pattern acquisition procedure for acquiring an event occurrence pattern generated by execution of an instruction by the plurality of logical CPUs;
    a count procedure for comparing the acquired event occurrence pattern with a pre-registered event pattern that appears when a spin lock is executed, and counting up a counter when they match; and
    an interrupt procedure for performing thread scheduling processing on one or more logical CPUs when the count value of the counter reaches a predetermined value.
JP2003041575A 2003-02-19 2003-02-19 Shared resource conflict detector and shared resource conflict detection method Active JP4066838B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003041575A JP4066838B2 (en) 2003-02-19 2003-02-19 Shared resource conflict detector and shared resource conflict detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003041575A JP4066838B2 (en) 2003-02-19 2003-02-19 Shared resource conflict detector and shared resource conflict detection method

Publications (2)

Publication Number Publication Date
JP2004252670A JP2004252670A (en) 2004-09-09
JP4066838B2 true JP4066838B2 (en) 2008-03-26

Family

ID=33025116

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003041575A Active JP4066838B2 (en) 2003-02-19 2003-02-19 Shared resource conflict detector and shared resource conflict detection method

Country Status (1)

Country Link
JP (1) JP4066838B2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4276201B2 (en) 2005-03-31 2009-06-10 富士通株式会社 Billing processing apparatus for SMT processor, billing processing method, and billing processing program
US7992146B2 (en) 2006-11-22 2011-08-02 International Business Machines Corporation Method for detecting race conditions involving heap memory access
JP5397544B2 (en) 2010-06-25 2014-01-22 富士通株式会社 Multi-core system, multi-core system scheduling method, and multi-core system scheduling program
JP6447320B2 (en) * 2015-04-01 2019-01-09 株式会社デンソー Performance monitoring device

Also Published As

Publication number Publication date
JP2004252670A (en) 2004-09-09

Similar Documents

Publication Publication Date Title
US20160188475A1 (en) Concurrent Execution of Critical Sections by Eliding Ownership of Locks
US10073719B2 (en) Last branch record indicators for transactional memory
Yu et al. Maple: a coverage-driven testing tool for multithreaded programs
JP5615990B2 (en) Acquisition of power profile information with reduced overhead
Li et al. Timing analysis of concurrent programs running on shared cache multi-cores
JP2015111439A (en) Primitives to enhance thread-level speculation
Chen et al. The Jrpm system for dynamically parallelizing Java programs
US8615619B2 (en) Qualifying collection of performance monitoring events by types of interrupt when interrupt occurs
Austin et al. Dynamic dependency analysis of ordinary programs
US10296346B2 (en) Parallelized execution of instruction sequences based on pre-monitoring
Devietti et al. DMP: deterministic shared memory multiprocessing
Cintra et al. Toward efficient and robust software speculative parallelization on multiprocessors
DE602004006858T2 (en) Billing method and circuit for determining pro-thread use of processorial actuators in a simultaneous multithread processor (smt)
TW559732B (en) Method and apparatus for maintaining processor ordering
US8042102B2 (en) Method and system for autonomic monitoring of semaphore operations in an application
US7765547B2 (en) Hardware multithreading systems with state registers having thread profiling data
US5835702A (en) Performance monitor
Zaparanuks et al. Accuracy of performance counter measurements
US6374367B1 (en) Apparatus and method for monitoring a computer system to guide optimization
JP4034363B2 (en) Performance monitoring method and system for operating system based program
US6675192B2 (en) Temporary halting of thread execution until monitoring of armed events to memory location identified in working registers
US7496918B1 (en) System and methods for deadlock detection
US7487502B2 (en) Programmable event driven yield mechanism which may activate other threads
Zhou et al. HARD: Hardware-assisted lockset-based race detection
US6658654B1 (en) Method and system for low-overhead measurement of per-thread performance information in a multithreaded environment

Legal Events

Date Code Title Description
RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20040610

RD02 Notification of acceptance of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7422

Effective date: 20040610

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20060215

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20070807

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20070828

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20071017

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20071218

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20071231

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

Ref document number: 4066838

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110118

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120118

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130118

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140118

Year of fee payment: 6