CN105683906B - Adaptive processing of data sharing by selecting between lock elision and locking - Google Patents
- Publication number
- CN105683906B, CN201480053800.8A
- Authority
- CN
- China
- Prior art keywords
- hle
- transaction
- lock
- failure
- count
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F9/30087—Synchronisation or serialisation instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
Abstract
A technique is provided for predictively determining, in a hardware lock elision (HLE) environment, whether an HLE transaction should actually acquire a lock and execute non-transactionally. It includes: based on encountering an HLE lock-acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally; based on the HLE predictor predicting elision, adding the address of the lock to the read set of the transaction, suppressing any write to the lock by the lock-acquire instruction, and proceeding in HLE transactional execution mode until either an xrelease instruction, which releases the lock, is encountered or the HLE transaction encounters a transactional conflict; and based on the HLE predictor predicting no elision, treating the HLE lock-acquire instruction as a non-HLE lock-acquire instruction and proceeding in non-transactional mode.
Description
Technical field
The present disclosure relates generally to transactional memory systems, and more particularly to methods, computer program products, and computer systems for adaptively sharing data by selecting between lock elision and locking.
Background
The number of central processing unit (CPU) cores on a chip, and the number of CPU cores connected to a shared memory, continue to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software. Implementing finer-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies the latencies of hardware interconnects are limited by the physical dimensions of the chips and systems and by the speed of light.
Implementations of hardware transactional memory (HTM, or in this discussion simply TM) have been introduced, in which a group of instructions, called a transaction, operates atomically on a data structure in memory as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as "block concurrent" or "serialized" in other literature). The transaction executes optimistically without obtaining a lock, but it may need to abort and retry the transactional execution if an operation of the executing transaction on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory implementations were proposed to support software TM. However, hardware TM can provide improved performance and ease of use over software TM.
U.S. Patent Application Publication No. 2004/0044850, titled "Method and apparatus for the synchronization of distributed caches" and filed on August 28, 2002, the contents of which are incorporated herein by reference, teaches a method and apparatus for synchronizing distributed caches. More particularly, the embodiments provided relate to cache memory systems, and in particular to a hierarchical caching protocol suitable for use with distributed caches, including use within a caching input/output (I/O) hub.
U.S. Patent 5,586,297, titled "Partial cache line write transactions in a computing system with a write back cache" and filed on March 24, 1994, the contents of which are incorporated herein by reference, teaches a computing system that includes memory, an input/output adapter, and a processor. The processor includes a write-back cache in which dirty data may be stored. When a coherent write is performed from the input/output adapter to memory, a block of data is written from the input/output adapter to a memory location in memory. The block of data contains less data than a full cache line in the write-back cache. The write-back cache is searched to determine whether it contains data for the memory location. When the search determines that the write-back cache contains data for the memory location, the full cache line containing that data is purged.
Summary of the invention
A method is provided for predictively determining, in a hardware lock elision (HLE) environment, whether an HLE transaction should actually acquire a lock and execute non-transactionally. According to an embodiment of the present disclosure, the method may include: based on encountering an HLE lock-acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally; based on the HLE predictor predicting elision, adding the address of the lock to the read set of the HLE transaction, suppressing any write to the lock by the lock-acquire instruction, and proceeding in HLE transactional execution mode until either an xrelease instruction (which releases the lock) is encountered or the HLE transaction encounters a transactional conflict; and based on the HLE predictor predicting no elision, treating the HLE lock-acquire instruction as a non-HLE lock-acquire instruction and proceeding in non-transactional mode.
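The decision step can be pictured with a small software model. This is a sketch, not necessarily the predictor described in this patent: the table of 2-bit saturating counters, its 64-entry size, indexing by lock address, and the names `hle_predictor`, `pred_should_elide` and `pred_update` are all hypothetical. Each counter moves toward "elide" when an elided transaction commits and toward "acquire" when it aborts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical HLE predictor: a table of 2-bit saturating counters indexed
   by lock address. Table size, hashing and initial bias are assumptions. */
#define PRED_ENTRIES 64u

typedef struct {
    uint8_t counter[PRED_ENTRIES]; /* 0..3; values >= 2 predict "elide" */
} hle_predictor;

static unsigned pred_index(uintptr_t lock_addr) {
    /* Drop the offset within an (assumed) 64-byte line, then fold into the table. */
    return (unsigned)((lock_addr >> 6) % PRED_ENTRIES);
}

static void pred_init(hle_predictor *p) {
    for (unsigned i = 0; i < PRED_ENTRIES; i++)
        p->counter[i] = 2; /* start weakly biased toward elision */
}

/* Consulted when an HLE lock-acquire instruction is encountered:
   true  -> elide the lock and continue as an HLE transaction;
   false -> treat it as a plain lock acquire and run non-transactionally. */
static bool pred_should_elide(const hle_predictor *p, uintptr_t lock_addr) {
    return p->counter[pred_index(lock_addr)] >= 2;
}

/* Trained with the outcome of an elided execution: commit or abort. */
static void pred_update(hle_predictor *p, uintptr_t lock_addr, bool committed) {
    uint8_t *c = &p->counter[pred_index(lock_addr)];
    if (committed) {
        if (*c < 3) (*c)++;
    } else {
        if (*c > 0) (*c)--;
    }
}
```

The sketch leaves out how a lock that is currently predicted "acquire" gets a second chance; non-transactional executions produce no commit/abort outcome to learn from, so a real design would need some retraining path (for example, periodically resetting counters).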
In another embodiment of the present disclosure, a computer program product may be provided for predictively determining, in a hardware lock elision (HLE) environment, whether an HLE transaction should actually acquire a lock and execute non-transactionally. The computer program product may include a computer-readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising: based on encountering an HLE lock-acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally; based on the HLE predictor predicting elision, adding the address of the lock to the read set of the transaction, suppressing any write to the lock by the lock-acquire instruction, and proceeding in HLE transactional execution mode until either an xrelease instruction (which releases the lock) is encountered or the HLE transaction encounters a transactional conflict; and based on the HLE predictor predicting no elision, treating the HLE lock-acquire instruction as a non-HLE lock-acquire instruction and proceeding in non-transactional mode.
In another embodiment of the present disclosure, a computer system is provided for predictively determining, in a hardware lock elision (HLE) environment, whether an HLE transaction should actually acquire a lock and execute non-transactionally. The computer system may include a memory and a processor in communication with the memory, wherein the computer system is configured to perform a method comprising: based on encountering an HLE lock-acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally; based on the HLE predictor predicting elision, adding the address of the lock to the read set of the transaction, suppressing any write to the lock by the lock-acquire instruction, and proceeding in HLE transactional execution mode until either an xrelease instruction (which releases the lock) is encountered or the HLE transaction encounters a transactional conflict; and based on the HLE predictor predicting no elision, treating the HLE lock-acquire instruction as a non-HLE lock-acquire instruction and proceeding in non-transactional mode.
Brief description of the drawings
The features and advantages of the embodiments disclosed herein will become apparent from the following detailed description of illustrative embodiments of the present disclosure, read in conjunction with the accompanying drawings. The various features of the drawings are not to scale, as the illustrations are for clarity in helping those skilled in the art understand the disclosure together with the detailed description. In the drawings:
Fig. 1 and Fig. 2 depict an example multicore transactional memory environment according to embodiments of the present disclosure;
Fig. 3 depicts example components of an example CPU according to embodiments of the present disclosure;
Fig. 4 is a flowchart of a method for adaptively sharing data by selecting between lock elision and locking, according to an exemplary hardware or software implementation;
Fig. 5 is a flowchart of implementing a conflict predictor, also referred to as an HLE predictor or hardware lock virtual machine, in an environment with HLE support;
Fig. 6 is a flowchart of a method for adaptively sharing data by selecting between lock elision and locking, according to an exemplary embodiment without additional hardware capabilities;
Fig. 7 is a flowchart of a method for adaptively sharing data by selecting between lock elision and locking, according to an exemplary embodiment with hardware lock monitoring;
Figs. 8 and 9 show exemplary flows for adaptively sharing data; and
Fig. 10 is a schematic block diagram of the hardware and software of a computer environment according to at least one exemplary embodiment of the methods of Figs. 4 to 7.
Detailed description
Historically, a computer system or processor had only a single processor (also known as a processing unit or central processing unit). The processor included an instruction processing unit (IPU), a branch unit, a memory control unit, and the like. Such processors were capable of executing a single thread of a program at a time. Operating systems were developed that could time-share a processor by dispatching a program to execute on the processor for a period of time, and then dispatching another program to execute on the processor for another period of time. As technology evolved, memory subsystem caches were often added to the processor, as was complex dynamic address translation including translation lookaside buffers (TLBs). The IPU itself was often referred to as a processor. As technology continued to evolve, an entire processor could be packaged as a single semiconductor chip or die; such a processor was referred to as a microprocessor. Then processors were developed that incorporated multiple IPUs; such processors were often referred to as multiprocessors. Each such processor of a multiprocessor computer system may include individual or shared caches, memory interfaces, a system bus, an address translation mechanism, and the like. Virtual machine and instruction set architecture (ISA) emulators added a layer of software to a processor that provided the virtual machine with multiple "virtual processors" (also known as processors) by time-slicing a single IPU in a single hardware processor. As technology further evolved, multithreaded processors were developed, enabling a single hardware processor with a single multithreaded IPU to execute threads of different programs simultaneously; each thread of a multithreaded processor thus appeared to the operating system as a processor. As technology evolved further still, it became possible to put multiple processors (each having an IPU) on a single semiconductor chip or die. These processors were referred to as processor cores or just cores. Thus terms such as processor, central processing unit, processing unit, multiprocessor, core, processor core, processor thread, and thread are often used interchangeably. Aspects of the embodiments herein may be practiced by any or all of the processors described above without departing from the teachings herein. Where the term "thread" or "processor thread" is used herein, it is expected that particular advantages of the embodiment may be obtained in a processor thread implementation.
Transactional execution in Intel®-based embodiments
In "Intel® Architecture Instruction Set Extensions Programming Reference", 319433-012A, February 2012, which is incorporated herein by reference in its entirety, Chapter 8 teaches, in part, that multithreaded applications can take advantage of increasing numbers of CPU cores to achieve higher performance. However, writing multithreaded applications requires programmers to understand and take into account data sharing among the threads. Access to shared data typically requires synchronization mechanisms. These mechanisms ensure that multiple threads update shared data by serializing the operations applied to it, often through the use of a critical section protected by a lock. Since serialization limits concurrency, programmers try to limit the overhead due to synchronization.
Intel® Transactional Synchronization Extensions (Intel® TSX) allow the processor to dynamically determine whether threads need to be serialized through lock-protected critical sections, and to perform that serialization only when required. This lets the processor expose and exploit concurrency that is hidden in an application because of dynamically unnecessary synchronization.
With Intel® TSX, programmer-specified code regions (also referred to as "transactional regions" or just "transactions") are executed transactionally. If the transactional execution completes successfully, all memory operations performed within the transactional region appear to have occurred instantaneously when viewed from other processors. The processor makes the memory operations performed within the transactional region visible to other processors only when a successful commit occurs, i.e., when the transaction successfully completes execution. This process is often referred to as an atomic commit.
Intel® TSX provides two software interfaces for specifying code regions for transactional execution. Hardware Lock Elision (HLE) is a legacy-compatible instruction set extension (comprising the XACQUIRE and XRELEASE prefixes) for specifying transactional regions. Restricted Transactional Memory (RTM) is a new instruction set interface (comprising the XBEGIN, XEND, and XABORT instructions) that lets programmers define transactional regions more flexibly than is possible with HLE. HLE is for programmers who prefer the backward compatibility of the conventional mutual-exclusion programming model and would like to run HLE-enabled software on legacy hardware, while also taking advantage of the new lock elision capabilities on hardware with HLE support. RTM is for programmers who prefer a flexible interface to the transactional execution hardware. In addition, Intel® TSX provides an XTEST instruction, which allows software to query whether the logical processor is transactionally executing in a transactional region identified by either HLE or RTM.
Since a successful transactional execution ensures an atomic commit, the processor executes the code region optimistically without explicit synchronization. If synchronization was unnecessary for that specific execution, the execution can commit without any cross-thread serialization. If the processor cannot commit atomically, the optimistic execution fails. When this happens, the processor rolls back the execution, a process referred to as a transactional abort. On a transactional abort, the processor discards all updates performed in the memory region used by the transaction, restores architectural state so that it appears as if the optimistic execution never occurred, and resumes execution non-transactionally.
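To make the commit/abort behaviour concrete, here is a single-threaded toy model in C. It is only an analogy for what the hardware does: the fixed write buffer (`TX_MAX_WRITES`, standing in for implementation-specific capacity) and the names `tx_begin`, `tx_store`, `tx_load`, `tx_abort` and `tx_end` are hypothetical, and nothing here provides atomicity between real threads. Stores are buffered privately, become visible only when `tx_end` commits them, and are simply discarded on abort.

```c
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded toy model of atomic commit and abort. TX_MAX_WRITES stands
   in for the implementation-specific transactional capacity (an assumption). */
#define TX_MAX_WRITES 8

typedef struct { int *addr; int value; } tx_entry;

typedef struct {
    tx_entry writes[TX_MAX_WRITES];
    size_t n;
    bool aborted;
} tx;

static void tx_begin(tx *t) { t->n = 0; t->aborted = false; }

/* Buffer a store; nothing reaches memory yet. */
static void tx_store(tx *t, int *addr, int value) {
    if (t->aborted) return;
    for (size_t i = 0; i < t->n; i++)
        if (t->writes[i].addr == addr) { t->writes[i].value = value; return; }
    if (t->n == TX_MAX_WRITES) { t->aborted = true; return; } /* capacity abort */
    t->writes[t->n++] = (tx_entry){ addr, value };
}

/* A load sees the transaction's own buffered stores first. */
static int tx_load(const tx *t, const int *addr) {
    for (size_t i = 0; i < t->n; i++)
        if (t->writes[i].addr == addr) return t->writes[i].value;
    return *addr;
}

/* Explicit abort: drop every buffered store. */
static void tx_abort(tx *t) { t->aborted = true; t->n = 0; }

/* Commit publishes all buffered stores; an aborted transaction publishes none. */
static bool tx_end(tx *t) {
    bool ok = !t->aborted;
    if (ok)
        for (size_t i = 0; i < t->n; i++) *t->writes[i].addr = t->writes[i].value;
    t->n = 0;
    return ok;
}
```

Every update becomes visible in one step at `tx_end`, or not at all, which is the all-or-nothing property the hardware provides for a real transactional region.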
A processor can perform a transactional abort for numerous reasons. A primary reason is conflicting memory accesses between the transactionally executing logical processor and another logical processor; such conflicting accesses may prevent a successful transactional execution. Memory addresses read from within a transactional region constitute the read set of the transactional region, and addresses written to within the transactional region constitute its write set. Intel® TSX maintains the read and write sets at the granularity of a cache line. A conflicting memory access occurs if another logical processor either reads a location that is part of the transactional region's write set, or writes a location that is part of either the read set or the write set of the transactional region. A conflicting access typically means that serialization is required for this code region. Since Intel® TSX detects data conflicts at the granularity of a cache line, unrelated data locations placed in the same cache line will be detected as conflicts, resulting in transactional aborts.
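The read-set/write-set conflict rule can be written down directly. This sketch assumes 64-byte cache lines and small fixed-size sets; `tx_sets`, `tx_read`, `tx_write` and `conflicts` are illustrative names, not an Intel interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write sets tracked per cache line. The 64-byte line (shift 6) and
   the fixed set size are assumptions for illustration. */
#define LINE_SHIFT 6
#define SET_MAX 16

typedef struct { uintptr_t lines[SET_MAX]; size_t n; } line_set;
typedef struct { line_set read_set, write_set; } tx_sets;

static uintptr_t line_of(uintptr_t addr) { return addr >> LINE_SHIFT; }

static bool set_has(const line_set *s, uintptr_t line) {
    for (size_t i = 0; i < s->n; i++)
        if (s->lines[i] == line) return true;
    return false;
}

/* Returns false when the set is full, i.e. a capacity abort would occur. */
static bool set_add(line_set *s, uintptr_t line) {
    if (set_has(s, line)) return true;
    if (s->n == SET_MAX) return false;
    s->lines[s->n++] = line;
    return true;
}

static bool tx_read(tx_sets *t, uintptr_t addr)  { return set_add(&t->read_set,  line_of(addr)); }
static bool tx_write(tx_sets *t, uintptr_t addr) { return set_add(&t->write_set, line_of(addr)); }

/* Does an access by another logical processor conflict with transaction t?
   A remote read conflicts with t's write set; a remote write conflicts with
   t's read set or write set. */
static bool conflicts(const tx_sets *t, uintptr_t addr, bool remote_is_write) {
    uintptr_t line = line_of(addr);
    if (set_has(&t->write_set, line)) return true;
    return remote_is_write && set_has(&t->read_set, line);
}
```

Because conflicts are judged per line, a remote access to a different variable that happens to share a line still counts. Conversely, two threads that only read the same line never conflict, which is the property hardware lock elision relies on when several threads elide the same lock.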
Transactional aborts may also occur because of limited transactional resources. For example, the amount of data accessed in the region may exceed an implementation-specific capacity. Additionally, some instructions and system events may cause transactional aborts. Frequent transactional aborts waste cycles and reduce efficiency.
Hardware lock elision
Hardware Lock Elision (HLE) provides a legacy-compatible instruction set interface for programmers to use transactional execution. HLE provides two new instruction prefix hints: XACQUIRE and XRELEASE.
With HLE, a programmer adds the XACQUIRE prefix in front of the instruction used to acquire the lock that protects the critical section. The processor treats the prefix as a hint to elide the write associated with the lock-acquire operation. Even though the lock acquire has an associated write operation to the lock, the processor neither adds the address of the lock to the transactional region's write set nor issues any write requests to the lock. Instead, the address of the lock is added to the read set, and the logical processor enters transactional execution. If the lock was available before the XACQUIRE-prefixed instruction, all other processors will continue to see it as available afterwards. Because the transactionally executing logical processor neither added the address of the lock to its write set nor performed externally visible writes to the lock, other logical processors can read the lock without causing a data conflict. This allows other logical processors to also enter and concurrently execute the critical section protected by the lock. The processor automatically detects any data conflicts that occur during transactional execution and performs a transactional abort if necessary.
Even though the eliding processor does not perform any external write operations to the lock, the hardware ensures program order of operations on the lock. If the eliding processor itself reads the value of the lock in the critical section, it will appear as if the processor had acquired the lock, i.e., the read returns the non-elided value. This behavior makes an HLE execution functionally equivalent to an execution without the HLE prefixes.
An XRELEASE prefix can be added in front of the instruction used to release the lock protecting a critical section. Releasing the lock involves a write to the lock. If the instruction restores the value of the lock to the value it had before the XACQUIRE-prefixed lock-acquire operation on the same lock, the processor elides the external write request associated with the release and does not add the address of the lock to the write set. The processor then attempts to commit the transactional execution.
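The hypothetical model below captures the two rules just described for a single elided lock: the eliding thread's own reads of the lock see the acquired value while memory (and therefore every other processor) still holds the free value, and an XRELEASE store can be elided only if it writes back the pre-acquire value. The struct and function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-lock state kept by an eliding processor. */
typedef struct {
    const volatile uint32_t *lock; /* lock word; never written while elided */
    uint32_t value_before_acquire; /* value seen by the XACQUIRE access, e.g. 0 = free */
    uint32_t local_value;          /* value this thread's own reads return, e.g. 1 */
} elided_lock;

/* XACQUIRE-prefixed acquire: read the lock (it joins the read set only) and
   remember the value the store would have written; no external write occurs. */
static void elide_acquire(elided_lock *e, const volatile uint32_t *lock,
                          uint32_t acquired_value) {
    e->lock = lock;
    e->value_before_acquire = *lock;
    e->local_value = acquired_value;
}

/* Reads of the lock inside the critical section see the non-elided value. */
static uint32_t elided_read(const elided_lock *e) { return e->local_value; }

/* An XRELEASE store is elidable only if it restores the pre-acquire value. */
static bool release_can_be_elided(const elided_lock *e, uint32_t release_value) {
    return release_value == e->value_before_acquire;
}
```

With a conventional 0 = free, 1 = held lock, a release that stores 0 qualifies; a release that stores anything else would have to become a real write to the lock.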
With HLE, if multiple threads execute critical sections protected by the same lock but do not perform any conflicting operations on each other's data, the threads can execute concurrently and without serialization. Even though the software uses lock-acquire operations on a common lock, the hardware recognizes this, elides the lock, and executes the critical sections on the two threads without requiring any communication through the lock, provided such communication was dynamically unnecessary.
If the processor is unable to execute the region transactionally, it executes the region non-transactionally and without elision. HLE-enabled software has the same forward-progress guarantees as the underlying non-HLE lock-based execution. For successful HLE execution, the lock and the critical-section code must follow certain guidelines. These guidelines affect only performance; not following them will not cause a functional failure. Hardware without HLE support ignores the XACQUIRE and XRELEASE prefix hints and performs no elision, because these prefixes correspond to the REPNE/REPE IA-32 prefixes, which are ignored on the instructions for which XACQUIRE and XRELEASE are valid. Importantly, HLE is compatible with the existing lock-based programming model. Improper use of the hints will not cause functional bugs, though it may expose latent bugs already present in the code.
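At the source level, GCC exposes HLE through its `__atomic` builtins: on x86 targets it documents the `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE` flags, which are OR-ed into the memory order to request XACQUIRE/XRELEASE prefixes. The spinlock below is a minimal sketch under that assumption; the `#ifdef` makes the flags 0 where they are not defined, so the same code is an ordinary test-and-set lock, much as non-HLE hardware ignores the prefixes. `hle_lock` and `hle_unlock` are hypothetical names.

```c
/* Test-and-set spinlock using GCC/Clang __atomic builtins. Where the x86
   HLE memory-order flags exist they are OR-ed in to request XACQUIRE/XRELEASE;
   elsewhere they are 0 and this is a plain spinlock. */
#ifdef __ATOMIC_HLE_ACQUIRE
#define HLE_ACQ __ATOMIC_HLE_ACQUIRE
#define HLE_REL __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQ 0
#define HLE_REL 0
#endif

static void hle_lock(int *lock) {
    /* XACQUIRE-prefixed exchange; when elided, the store of 1 stays invisible
       to other processors, while this thread still reads the lock as 1. */
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE | HLE_ACQ)) {
        /* Lock looked held: wait with plain loads until it looks free. */
        while (__atomic_load_n(lock, __ATOMIC_RELAXED))
            ;
    }
}

static void hle_unlock(int *lock) {
    /* XRELEASE-prefixed store of 0, the pre-acquire value, so the release
       write itself can be elided. */
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE | HLE_REL);
}
```

Whether elision actually happens depends on the processor and compiler; functionally the lock behaves the same either way, which is the backward-compatibility property described above.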
Restricted Transactional Memory (RTM) provides a flexible software interface for transactional execution. RTM provides three new instructions, XBEGIN, XEND, and XABORT, that programmers use to start, commit, and abort a transactional execution.
The programmer uses the XBEGIN instruction to specify the start of a transactional code region and the XEND instruction to specify its end. The XBEGIN instruction takes an operand that provides a relative offset to the fallback instruction address, used if the RTM region cannot be successfully executed transactionally.
A processor may abort RTM transactional execution for many reasons. In many cases, the hardware automatically detects transactional abort conditions and restarts execution from the fallback instruction address, with the architectural state corresponding to that present at the start of the XBEGIN instruction and the EAX register updated to describe the abort status.
The XABORT instruction allows programmers to abort the execution of an RTM region explicitly. XABORT takes an 8-bit immediate argument that is loaded into the EAX register and is thus available to software after the RTM abort. RTM instructions do not have any data memory location associated with them. Although the hardware provides no guarantee as to whether an RTM region will ever commit transactionally, most transactions that follow the recommended guidelines are expected to commit successfully. However, programmers must always provide an alternative code sequence in the fallback path to guarantee forward progress. This may be as simple as acquiring a lock and executing the specified code region non-transactionally. Further, a transaction that always aborts on a given implementation may complete transactionally on a future implementation. Therefore, programmers must ensure that both the code path for the transactional region and the alternative code sequence are tested.
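A common shape for the fallback path is a bounded retry loop that eventually takes a real lock. The sketch below models only the control flow: `tx_attempt_fn` stands in for an XBEGIN ... XEND attempt, `RTM_MAX_RETRIES` is an arbitrary assumption, and `run_critical` plus the `demo_*` helpers are hypothetical. In real RTM code the transactional path would also read the fallback lock inside the transaction and abort if it is held, so that transactional and locked executions exclude each other.

```c
#include <stdbool.h>

/* Control-flow sketch of an RTM region with a lock-based fallback.
   RTM_MAX_RETRIES is an arbitrary assumption. */
#define RTM_MAX_RETRIES 3

/* Stands in for XBEGIN ... XEND: returns true on commit; on abort returns
   false and sets *may_retry (e.g. derived from the abort status). */
typedef bool (*tx_attempt_fn)(void *ctx, bool *may_retry);
/* The alternative code sequence: take the lock, run non-transactionally. */
typedef void (*locked_fn)(void *ctx);

/* Returns true if the work committed transactionally, false if it ran on
   the fallback path, which is what guarantees forward progress. */
static bool run_critical(void *ctx, tx_attempt_fn attempt, locked_fn fallback) {
    for (int i = 0; i < RTM_MAX_RETRIES; i++) {
        bool may_retry = false;
        if (attempt(ctx, &may_retry))
            return true;
        if (!may_retry)
            break; /* the abort cause suggests retrying will not help */
    }
    fallback(ctx);
    return false;
}

/* Demo helpers (hypothetical): abort a scripted number of times, then commit. */
typedef struct { int aborts_left; bool retryable; int attempts; int fallbacks; } demo_ctx;

static bool demo_attempt(void *p, bool *may_retry) {
    demo_ctx *d = p;
    d->attempts++;
    if (d->aborts_left > 0) {
        d->aborts_left--;
        *may_retry = d->retryable;
        return false;
    }
    return true;
}

static void demo_fallback(void *p) { ((demo_ctx *)p)->fallbacks++; }
```

Because a transaction that always aborts on one implementation may commit on another, both branches of `run_critical` are live code and, as the text notes, both need testing.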
Detection of HLE support
A processor supports HLE execution if CPUID.07H.EBX.HLE [bit 4] = 1. However, an application can use the HLE prefixes (XACQUIRE and XRELEASE) without checking whether the processor supports HLE. Processors without HLE support ignore these prefixes and execute the code without entering transactional execution.
Detection of RTM support
A processor supports RTM execution if CPUID.07H.EBX.RTM [bit 11] = 1. An application must check whether the processor supports RTM before it uses the RTM instructions (XBEGIN, XEND, XABORT). These instructions generate a #UD exception when used on a processor that does not support RTM.
Detection of the XTEST instruction
A processor supports the XTEST instruction if it supports either HLE or RTM. An application must check either of these feature flags before using the XTEST instruction. This instruction generates a #UD exception when used on a processor that supports neither HLE nor RTM.
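The three feature checks above reduce to bit tests on EBX from CPUID leaf 07H (sub-leaf 0). This sketch decodes a given EBX value rather than executing CPUID itself; on x86, the `<cpuid.h>` header shipped with GCC and Clang provides `__get_cpuid_count`, which is one possible way to obtain that value. `tsx_features` and `decode_tsx` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* TSX feature bits in EBX of CPUID leaf 07H, sub-leaf 0, as given in the text. */
#define CPUID7_EBX_HLE (UINT32_C(1) << 4)
#define CPUID7_EBX_RTM (UINT32_C(1) << 11)

typedef struct { bool hle, rtm, xtest; } tsx_features;

static tsx_features decode_tsx(uint32_t ebx) {
    tsx_features f;
    f.hle = (ebx & CPUID7_EBX_HLE) != 0;  /* informational: prefixes are safe anyway */
    f.rtm = (ebx & CPUID7_EBX_RTM) != 0;  /* must be true before XBEGIN/XEND/XABORT */
    f.xtest = f.hle || f.rtm;             /* XTEST raises #UD unless HLE or RTM */
    return f;
}
```

Only the RTM and XTEST results gate instruction use; HLE-prefixed code can run unconditionally, since older processors simply ignore the prefixes.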
Querying transactional execution status
The XTEST instruction can be used to determine the transactional status of a transactional region specified by HLE or RTM. Note that while the HLE prefixes are ignored on processors that do not support HLE, the XTEST instruction generates a #UD exception when used on processors that support neither HLE nor RTM.
Requirements for HLE locks
For an HLE execution to commit transactionally, the lock must satisfy certain properties and access to the lock must follow certain guidelines.
An XRELEASE-prefixed instruction must restore the value of the elided lock to the value it had before the lock acquisition. This allows the hardware to elide locks safely by not adding them to the write set. The data size and data address of the lock-release (XRELEASE-prefixed) instruction must match those of the lock-acquire (XACQUIRE-prefixed) instruction, and the lock must not cross a cache-line boundary.
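Two of these requirements are simple arithmetic checks on addresses and sizes. The helpers below are a sketch assuming a 64-byte cache line; they are hypothetical and only restate the rules in code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size for this sketch. */
#define CACHE_LINE 64u

/* True if the lock [addr, addr + size) lies entirely within one cache line. */
static bool within_one_line(uintptr_t addr, size_t size) {
    return size > 0 && (addr / CACHE_LINE) == ((addr + size - 1) / CACHE_LINE);
}

/* The XRELEASE access must use the same address and size as the XACQUIRE access. */
static bool release_matches_acquire(uintptr_t acq_addr, size_t acq_size,
                                    uintptr_t rel_addr, size_t rel_size) {
    return acq_addr == rel_addr && acq_size == rel_size;
}
```

A naturally aligned 4-byte lock can never straddle a line, which is why aligning the lock variable is usually enough to satisfy the first check.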
Software should not write to the elided lock inside a transactional HLE region with any instruction other than an XRELEASE-prefixed instruction; otherwise such a write may cause a transactional abort. In addition, recursive locks (where a thread acquires the same lock multiple times without first releasing it) may also cause a transactional abort. Note that software can observe the result of the elided lock acquire inside the critical section: such a read returns the value of the write to the lock.
The processor automatically detects violations of these guidelines and safely transitions to non-transactional execution without elision. Since Intel® TSX detects conflicts at the granularity of a cache line, writes to data located on the same cache line as the elided lock may be detected as data conflicts by other logical processors eliding the same lock.
Transactional nesting
Both HLE and RTM support nested transactional regions. However, a transactional abort restores state to the operation that started transactional execution: either the outermost XACQUIRE-prefixed HLE-eligible instruction or the outermost XBEGIN instruction. The processor treats all nested transactions as one transaction.
HLE nesting and elision
Programmers can nest HLE regions up to an implementation-specific depth of MAX_HLE_NEST_COUNT. Each logical processor tracks the nesting count internally, but this count is not available to software. An XACQUIRE-prefixed HLE-eligible instruction increments the nesting count, and an XRELEASE-prefixed HLE-eligible instruction decrements it. The logical processor enters transactional execution when the nesting count goes from 0 to 1, and attempts to commit only when the nesting count becomes 0. A transactional abort may occur if the nesting count exceeds MAX_HLE_NEST_COUNT.
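The nesting rules amount to a counter with two transition points. This toy model assumes a `MAX_HLE_NEST_COUNT` of 8 (the real value is implementation-specific and not visible to software) and treats exceeding it as an abort that rolls back to the outermost acquire, which the text permits but does not require. The type and function names are hypothetical.

```c
/* Toy model of the internal HLE nesting counter; the limit is an assumption. */
#define MAX_HLE_NEST_COUNT 8

typedef enum { HLE_NONE, HLE_ENTER_TX, HLE_NESTED, HLE_TRY_COMMIT, HLE_ABORT } hle_event;

typedef struct { int nest; } hle_state;

/* XACQUIRE-prefixed HLE-eligible instruction. */
static hle_event on_xacquire(hle_state *s) {
    if (++s->nest > MAX_HLE_NEST_COUNT) {
        s->nest = 0; /* abort rolls back to the outermost XACQUIRE */
        return HLE_ABORT;
    }
    return s->nest == 1 ? HLE_ENTER_TX : HLE_NESTED; /* 0 -> 1 starts the transaction */
}

/* XRELEASE-prefixed HLE-eligible instruction. */
static hle_event on_xrelease(hle_state *s) {
    if (s->nest == 0) return HLE_NONE; /* not inside an HLE region */
    s->nest--;
    return s->nest == 0 ? HLE_TRY_COMMIT : HLE_NESTED; /* commit only at 0 */
}
```

Only the outermost pair matters for entering and committing; inner pairs merely move the counter.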
In addition to supporting nested HLE regions, the processor can also elide multiple nested locks. The processor tracks a lock for elision from the XACQUIRE-prefixed HLE-eligible instruction for that lock until the XRELEASE-prefixed HLE-eligible instruction for the same lock. At any one time, the processor can track up to MAX_HLE_ELIDED_LOCKS locks. For example, if the implementation supports a MAX_HLE_ELIDED_LOCKS value of 2 and the programmer nests three HLE-identified critical sections on three distinct locks (by executing XACQUIRE-prefixed HLE-eligible instructions on them without an intervening XRELEASE-prefixed HLE-eligible instruction on any of the locks), then the first two locks will be elided but the third will not (it will be added to the transaction's write set). However, execution will still continue transactionally. Once an XRELEASE for one of the two elided locks is encountered, a subsequent lock acquired through an XACQUIRE-prefixed HLE-eligible instruction will be elided.
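The example above can be replayed with a small model. `MAX_HLE_ELIDED_LOCKS` is set to 2 to match it; the tracker, its use of address 0 as an empty-slot marker, and the function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Matches the example in the text; real values are implementation-specific. */
#define MAX_HLE_ELIDED_LOCKS 2

typedef struct {
    uintptr_t slot[MAX_HLE_ELIDED_LOCKS]; /* 0 marks a free slot (lock addresses are nonzero) */
} elision_tracker;

/* Returns true if this XACQUIRE is elided. False means the lock write really
   happens (the lock joins the write set), but execution stays transactional. */
static bool track_acquire(elision_tracker *t, uintptr_t lock) {
    for (size_t i = 0; i < MAX_HLE_ELIDED_LOCKS; i++)
        if (t->slot[i] == 0) { t->slot[i] = lock; return true; }
    return false;
}

/* XRELEASE on an elided lock frees its slot for a later acquire. */
static bool track_release(elision_tracker *t, uintptr_t lock) {
    for (size_t i = 0; i < MAX_HLE_ELIDED_LOCKS; i++)
        if (t->slot[i] == lock) { t->slot[i] = 0; return true; }
    return false; /* this lock was never elided */
}
```

Running the scenario: locks A and B are elided, C is not, and after A is released a fourth lock D is elided again.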
The processor attempts to commit the HLE execution when all elided XACQUIRE and XRELEASE pairs have been matched, the nesting count has reached 0, and the locks have satisfied the requirements. If the execution cannot commit atomically, it transitions to non-transactional execution without elision, as if the first instruction did not have an XACQUIRE prefix.
RTM nesting
Programmers can nest RTM regions up to an implementation-specific MAX_RTM_NEST_COUNT. The logical processor tracks the nesting count internally, but this count is not available to software. An XBEGIN instruction increments the nesting count, and an XEND instruction decrements it. The logical processor attempts to commit only when the nesting count becomes 0. A transactional abort occurs if the nesting count exceeds MAX_RTM_NEST_COUNT.
Nesting HLE and RTM
HLE and RTM provide two alternative software interfaces to a shared transactional execution capability. When HLE and RTM are nested together, that is, HLE inside RTM or RTM inside HLE, the transactional behavior is implementation specific. In all cases, however, the implementation maintains HLE and RTM semantics. An implementation may choose to ignore HLE hints used inside RTM regions, and may cause a transactional abort when RTM instructions are used inside HLE regions. In the latter case, the transition from transactional to non-transactional execution occurs seamlessly, because the processor re-executes the HLE region without actually performing elision and then executes the RTM instructions.
Abort status definition
RTM uses the EAX register to communicate the abort status to software. Following an RTM abort, the EAX register has the following definition.
Table 1
The EAX abort status for RTM only provides the cause of an abort; it does not by itself encode whether the RTM region aborted or committed. The value of EAX can be 0 following an RTM abort. For example, a CPUID instruction used inside an RTM region causes a transactional abort but may not satisfy the requirements for setting any of the EAX bits, which results in an EAX value of 0.
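The bit layout of Table 1 is not reproduced in this text. Assuming Intel's published RTM abort-status definition (the same bits that compiler headers expose as the _XABORT_* macros), a decoder might look like the following sketch; decode_abort_status and ABORT_BITS are hypothetical names.

```python
# EAX bits after an RTM abort, assuming Intel's published definition.
ABORT_BITS = {
    0: "explicit",  # abort caused by an XABORT instruction
    1: "retry",     # may succeed on retry (always clear if bit 0 is set)
    2: "conflict",  # another logical processor conflicted with a transactional address
    3: "capacity",  # an internal buffer overflowed
    4: "debug",     # a debug breakpoint was hit
    5: "nested",    # the abort occurred during a nested transaction
}


def decode_abort_status(eax):
    """Return (set of cause names, XABORT code or None) for a 32-bit EAX value."""
    causes = {name for bit, name in ABORT_BITS.items() if eax & (1 << bit)}
    # Bits 31:24 carry the XABORT argument; they are only valid if bit 0 is set.
    code = (eax >> 24) & 0xFF if "explicit" in causes else None
    return causes, code


if __name__ == "__main__":
    print(decode_abort_status(0x42000001))  # ({'explicit'}, 66): XABORT with code 0x42
    print(decode_abort_status(0))           # (set(), None): e.g. CPUID inside RTM
```

Note that an all-zero status carries no cause at all, which is exactly the CPUID case described above.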
RTM memory ordering
A successful RTM commit causes all memory operations in the RTM region to appear to execute atomically. A successfully committed RTM region consisting of an XBEGIN followed by an XEND, even with no memory operations inside the region, has the same ordering semantics as a LOCK-prefixed instruction.
The XBEGIN instruction does not have fencing semantics. However, if an RTM execution aborts, all memory updates from within the RTM region are discarded and never made visible to any other logical processor.
RTM-enabled debugger support
By default, any debug exception inside an RTM region causes a transactional abort and redirects control flow to the fallback instruction address, with the architectural state recovered and bit 4 of EAX set. However, to allow software debuggers to intercept execution on debug exceptions, the RTM architecture provides an additional capability.
If bit 11 of DR7 and bit 15 of the IA32_DEBUGCTL_MSR are both 1, any RTM abort caused by a debug exception (#DB) or breakpoint exception (#BP) causes execution to roll back and restart from the XBEGIN instruction rather than from the fallback address. In this scenario, the EAX register is also restored to its state at the XBEGIN instruction.
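The restart rule for debug-induced aborts reduces to a two-bit check. The function below is a hypothetical sketch of that decision only; register contents and addresses are passed in as plain integers and no real debug hardware is accessed.

```python
DR7_RTM_BIT = 11       # bit 11 of DR7
DEBUGCTL_RTM_BIT = 15  # bit 15 of IA32_DEBUGCTL_MSR


def resume_after_debug_abort(dr7, debugctl, xbegin_addr, fallback_addr):
    """Return (resume address, EAX bit 4 reported) after an RTM abort due to #DB/#BP."""
    dr7_on = (dr7 >> DR7_RTM_BIT) & 1
    debugctl_on = (debugctl >> DEBUGCTL_RTM_BIT) & 1
    if dr7_on and debugctl_on:
        # Roll back and restart at XBEGIN so the debugger sees the exception;
        # EAX is restored to its value at the XBEGIN instruction.
        return xbegin_addr, False
    # Default: continue at the fallback address with bit 4 of EAX set.
    return fallback_addr, True
```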
Programming considerations
Typical programmer-identified regions are expected to execute transactionally and commit successfully. However, TSX does not provide any such guarantee: a transactional execution can abort for many reasons. To take full advantage of the transactional capabilities, programmers should follow certain guidelines to increase the probability that their transactional execution commits successfully.
This section discusses various events that can cause transactional aborts. The architecture ensures that updates performed within a transaction that subsequently aborts will never become visible. Only committed transactional executions update the architectural state. Transactional aborts never cause functional failures and only affect performance.
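Because an abort never changes architectural state, software typically pairs a transactional attempt with a conventional lock as a fallback. The Python sketch below assumes a hypothetical try_transaction(body) callable standing in for an RTM region (returning None on commit or an EAX-style status on abort), a hypothetical retry budget of 3, and Intel's definition of EAX bit 1 ("may succeed on retry"). A real elision path must also read the fallback lock inside the transaction so that a lock holder aborts concurrent transactions; that detail is folded into try_transaction here.

```python
import threading

MAX_RETRIES = 3     # hypothetical retry budget
RETRY_BIT = 1 << 1  # EAX bit 1 (assumed Intel definition): may succeed on retry


def run_critical_section(try_transaction, lock, body):
    """Run `body` transactionally if possible, else under `lock`.

    `try_transaction(body)` is a hypothetical stand-in for an RTM region: it
    returns None if the region committed, or an EAX-style abort status.
    Since an aborted attempt leaves no visible updates, falling back is safe.
    """
    for _ in range(MAX_RETRIES):
        status = try_transaction(body)
        if status is None:
            return "transactional"
        if not status & RETRY_BIT:
            break  # retry bit clear (e.g. EAX == 0 after CPUID): stop retrying
    with lock:  # non-transactional path: ordinary mutual exclusion
        body()
    return "locked"


if __name__ == "__main__":
    hits = []
    always_abort = lambda b: 0  # every attempt aborts with EAX == 0
    print(run_critical_section(always_abort, threading.Lock(), lambda: hits.append(1)))  # locked
```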
Instruction-based considerations
Programmers can safely use any instruction inside a transaction (HLE or RTM) and can use transactions at any privilege level. However, some instructions always abort transactional execution and cause execution to transition seamlessly and safely to a non-transactional path.
TSX allows most common instructions to be used inside transactions without causing aborts. The following operations inside a transaction do not typically cause an abort:
·Operations on the instruction pointer register, the general-purpose registers (GPRs) and the status flags (CF, OF, SF, PF, AF and ZF); and
·Operations on the XMM and YMM registers and the MXCSR register.
However, programmers must be careful when mixing SSE and AVX operations inside a transactional region: intermixing SSE instructions that access XMM registers with AVX instructions that access YMM registers may cause transactions to abort. Programmers may use REP/REPNE-prefixed string operations inside transactions, but long strings may cause aborts. Also, the CLD and STD instructions may cause aborts if they change the value of the DF flag. However, if DF is 1, the STD instruction will not cause an abort; similarly, if DF is 0, the CLD instruction will not cause an abort.
Instructions not enumerated here as causing aborts will typically not cause a transaction to abort when used inside it (examples include, but are not limited to, MFENCE, LFENCE, SFENCE, RDTSC and RDTSCP).
The following instructions will abort transactional execution on any implementation:
·XABORT
·CPUID
·PAUSE
In addition, in some implementations the following instructions may always cause transactional aborts. These instructions are not expected to be commonly used inside typical transactional regions. However, programmers must not rely on these instructions to force a transactional abort, because whether they cause one is implementation dependent.
·Operations on the x87 and MMX architectural state. This includes all MMX and x87 instructions, including the FXRSTOR and FXSAVE instructions.
·Updates to the non-status portion of EFLAGS: CLI, STI, POPFD, POPFQ and CLTS.
·Instructions that update segment registers, debug registers and/or control registers: MOV to DS/ES/FS/GS/SS, POP DS/ES/FS/GS/SS, LDS, LES, LFS, LGS, LSS, SWAPGS, WRFSBASE, WRGSBASE, LGDT, SGDT, LIDT, SIDT, LLDT, SLDT, LTR, STR, Far CALL, Far JMP, Far RET, IRET, MOV to DRx, MOV to CR0/CR2/CR3/CR4/CR8 and LMSW.
·Ring transitions: SYSENTER, SYSCALL, SYSEXIT and SYSRET.
·TLB and cacheability control: CLFLUSH, INVD, WBINVD, INVLPG, INVPCID, and memory instructions with a non-temporal hint (MOVNTDQA, MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS and MOVNTQ).
·Processor state save: XSAVE, XSAVEOPT and XRSTOR.
·Interrupts: INTn, INTO.
·IO: IN, INS, REP INS, OUT, OUTS, REP OUTS and their variants.
·VMX: VMPTRLD, VMPTRST, VMCLEAR, VMREAD, VMWRITE, VMCALL, VMLAUNCH, VMRESUME, VMXOFF, VMXON, INVEPT and INVVPID.
·SMX: GETSEC.
·UD2, RSM, RDMSR, WRMSR, HLT, MONITOR, MWAIT, XSETBV, VZEROUPPER, MASKMOVQ and V/MASKMOVDQU.
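The instruction rules above can be summarised as a lookup. The sketch below is hypothetical and deliberately incomplete: MAY_ABORT holds only a sample of the implementation-dependent list, and the labels it returns are illustrative categories, not guarantees about any particular processor.

```python
# Instructions that abort transactional execution on any implementation.
ALWAYS_ABORT = {"XABORT", "CPUID", "PAUSE"}

# Hypothetical, non-exhaustive sample of the implementation-dependent list.
MAY_ABORT = {
    "FXSAVE", "FXRSTOR", "CLI", "STI", "POPFD", "POPFQ", "SYSCALL",
    "SYSENTER", "CLFLUSH", "WBINVD", "MOVNTI", "XSAVE", "XRSTOR", "INTO",
    "IN", "OUT", "VMCALL", "GETSEC", "HLT", "MWAIT", "VZEROUPPER",
}


def abort_risk(mnemonic, df=0):
    """Classify an instruction inside a transaction: 'always', 'may', 'no'
    or 'typically-no'. `df` is the current value of the DF flag."""
    m = mnemonic.upper()
    if m in ALWAYS_ABORT:
        return "always"
    if m in ("CLD", "STD"):
        new_df = 1 if m == "STD" else 0
        # CLD/STD only risk an abort when they actually change DF.
        return "may" if new_df != df else "no"
    if m in MAY_ABORT:
        return "may"  # implementation dependent: do not rely on it either way
    # Anything else (e.g. MFENCE, RDTSC) typically does not abort.
    return "typically-no"
```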
Runtime considerations
In addition to the instruction-based considerations, runtime events can cause transactional execution to abort. These may be due to data access patterns or micro-architectural implementation features. The following list is not a comprehensive discussion of all abort causes.
Any fault or trap in a transaction that must be exposed to software will be suppressed. Transactional execution will abort and execution will transition to non-transactional execution, as if the fault or trap had never occurred. If an exception is not masked, that unmasked exception will result in a transactional abort, and the state will appear as if the exception had never occurred.
Synchronous exception events (#DE, #OF, #NP, #SS, #GP, #BR, #UD, #AC, #XF, #PF, #NM, #TS, #MF, #DB, #BP/INT3) that occur during transactional execution may prevent the execution from committing transactionally and require a non-transactional execution. These events are suppressed as if they had never occurred. With HLE, because the non-transactional code path is identical to the transactional code path, these events will typically reappear when the instruction that caused the exception is re-executed non-transactionally, so the associated synchronous events are delivered appropriately in the non-transactional execution. Asynchronous events (NMI, SMI, INTR, IPI, PMI, etc.) occurring during transactional execution may cause the transactional execution to abort and transition to non-transactional execution. The asynchronous events are pended and handled after the transactional abort is processed.
Transactions only support operations on the write-back cacheable memory type. A transaction may always abort if it includes operations on any other memory type. This includes instruction fetches from the UC memory type.
Memory accesses within a transactional region may require the processor to set the Accessed and Dirty flags of the referenced page table entry. How the processor handles this is implementation specific. Some implementations may allow the updates to these flags to become externally visible even if the transactional region subsequently aborts. Some TSX implementations may choose to abort the transactional execution if these flags need to be updated. Further, a processor's page-table walk may generate accesses to its own transactionally written but uncommitted state. Some TSX implementations may choose to abort the execution of a transactional region in such situations. In any case, the architecture ensures that, if the transactional region aborts, the transactionally written state will not be made architecturally visible through the behavior of structures such as TLBs.
Executing self-modifying code transactionally may also cause transactional aborts. Programmers must continue to follow the Intel recommended guidelines for writing self-modifying and cross-modifying code even when using HLE and RTM. While an implementation of RTM and HLE will typically provide sufficient resources for executing common transactional regions, implementation constraints and excessive sizes of transactional regions may cause a transactional execution to abort and transition to non-transactional execution. The architecture provides no guarantee about the amount of resources available for transactional execution and does not guarantee that a transactional execution will ever succeed.
Conflicting requests to a cache line accessed within a transactional region may prevent the transaction from executing successfully. For example, if logical processor P0 reads line A in a transactional region and another logical processor P1 writes line A (either inside or outside a transactional region), then P0 may abort if P1's write interferes with P0's ability to execute transactionally.
Similarly, if P0 writes line A in a transactional region and P1 reads or writes line A (either inside or outside a transactional region), then P0 may abort if P1's access to line A interferes with P0's ability to execute transactionally. In addition, other coherence traffic may at times appear as conflicting requests and may cause aborts. While these false conflicts may happen, they are expected to be uncommon. The conflict resolution policy that determines whether P0 or P1 aborts in the above scenarios is implementation specific.
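The two conflict scenarios can be expressed as set operations on cache-line addresses. The hypothetical helper below reports the lines on which P1's accesses could interfere with P0's transaction; it does not decide which processor aborts (that policy is implementation specific) and ignores false conflicts from other coherence traffic.

```python
def conflicting_lines(p0_reads, p0_writes, p1_reads, p1_writes):
    """Return the lines on which P1's accesses may make P0's transaction abort.

    P1 writing a line that P0 read or wrote can interfere, and so can P1
    reading a line that P0 wrote. P1 may be inside or outside a transaction.
    """
    read_or_written = p0_reads | p0_writes
    return (read_or_written & p1_writes) | (p0_writes & p1_reads)


if __name__ == "__main__":
    # P0 read line A; P1 writes A: potential conflict on A.
    print(conflicting_lines({"A"}, set(), set(), {"A"}))  # {'A'}
    # Two readers of A never conflict.
    print(conflicting_lines({"A"}, set(), {"A"}, set()))  # set()
```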
Generic transactional execution embodiments:
According to "ARCHITECTURES FOR TRANSACTIONAL MEMORY", a dissertation submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy, by Austen McDonald, June 2009, three mechanisms are required to implement an atomic and isolated transactional region: versioning, conflict detection, and contention management. The full content of the dissertation is incorporated herein by reference.
For a transactional code region to appear atomic, all the modifications performed by that region must be stored and kept isolated from other transactions until commit time. The system does this by implementing a versioning policy. Two versioning paradigms exist: eager and lazy. An eager versioning system stores newly generated transactional values in place and stores the previous memory values on the side, in what is called an undo log. A lazy versioning system stores new values temporarily, in what is called a write buffer, and copies them to memory only on commit. In either system, the cache is used to optimize the storage of new versions.
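A minimal sketch contrasting the two versioning policies, with a Python dict standing in for memory. EagerVersioning and LazyVersioning are hypothetical names; caches, concurrency and conflict detection are not modelled.

```python
_MISSING = object()  # marks an address that did not exist before the transaction


class EagerVersioning:
    """New values are written in place; old values go to an undo log."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = {}

    def write(self, addr, value):
        # Preserve the pre-transaction value only on the first write.
        self.undo_log.setdefault(addr, self.memory.get(addr, _MISSING))
        self.memory[addr] = value

    def read(self, addr):
        return self.memory.get(addr)

    def commit(self):
        self.undo_log.clear()  # memory already holds the new versions

    def abort(self):
        # Unroll the log, restoring every overwritten value.
        for addr, old in self.undo_log.items():
            if old is _MISSING:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.undo_log.clear()


class LazyVersioning:
    """New values are buffered privately and copied to memory on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def read(self, addr):
        # A transaction must see its own buffered writes first.
        return self.write_buffer.get(addr, self.memory.get(addr))

    def commit(self):
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()  # memory was never touched
```

The trade-off mirrors the text: the eager policy makes commit trivial but abort requires unrolling the log, while the lazy policy makes abort trivial but commit must copy the buffered values.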
To ensure that transactions appear to be performed atomically, conflicts must be detected and resolved. Both systems, that is, eager and lazy versioning, detect conflicts by implementing a conflict detection policy, either optimistic or pessimistic. An optimistic system executes transactions in parallel, checking for conflicts only when a transaction commits. A pessimistic system checks for conflicts at each load and store. As with versioning, conflict detection also uses the cache, marking each line as part of the read-set, part of the write-set, or both. The two systems resolve conflicts by implementing a contention management policy. Many contention management policies exist; some are more appropriate for optimistic conflict detection and some for pessimistic conflict detection. Some example policies are described below.
Since each transactional memory (TM) system needs both versioning and conflict detection, these options give rise to four distinct TM designs: Eager-Pessimistic (EP), Eager-Optimistic (EO), Lazy-Pessimistic (LP) and Lazy-Optimistic (LO). Table 2 briefly describes all four designs.
FIGS. 1 and 2 show an example of a multicore TM environment. FIG. 1 shows many TM-enabled CPUs (CPU1 114a, CPU2 114b, etc.) on one die 100, connected by an interconnect 122 under the management of interconnect control 120a, 120b. Each CPU 114a, 114b (also referred to as a processor) may have a split cache consisting of an instruction cache 116a, 116b for caching instructions from memory to be executed, and a data cache 118a, 118b with TM support for caching data (operands) of memory locations to be operated on by the CPU 114a, 114b. In an implementation, the caches of multiple dies 100 are interconnected to support cache coherency between the caches of the multiple dies 100. In an implementation, a single cache, rather than a split cache, is used to hold both instructions and data. In implementations, the CPU cache is one level of caching in a hierarchical cache structure. For example, each die 100 may use a shared cache 124 shared among all the CPUs 114a, 114b on the die 100. In another implementation, each die 100 may have access to a shared cache 124 shared among all the processors of all the dies 100.
FIG. 2 shows the details of an example transactional CPU 114, including additions to support TM. The transactional CPU (processor) 114 may include hardware for supporting register checkpoints 126 and special TM registers 128. The transactional CPU cache may have the MESI bits 130, tags 140 and data 142 of a conventional cache, but also, for example, R bits 132 indicating that a line has been read by the CPU 114 while executing a transaction and W bits 138 indicating that a line has been written by the CPU 114 while executing a transaction.
A key detail for programmers in any TM system is how non-transactional accesses interact with transactions. By design, transactional accesses are screened from each other using the mechanisms above. However, the interaction between a regular non-transactional load and a transaction containing a new value for that address must still be considered. In addition, the interaction between a non-transactional store and a transaction that has read that address must also be explored. These are issues of the database concept of isolation.
A TM system is said to implement strong isolation, sometimes also called strong atomicity, when every non-transactional load and store acts like an atomic transaction. Therefore, non-transactional loads cannot see uncommitted data, and non-transactional stores cause atomicity violations in any transaction that has read that address. A system where this is not the case is said to implement weak isolation, sometimes also called weak atomicity.
Strong isolation is often more desirable than weak isolation because strong isolation is relatively easy to conceptualize and implement. Additionally, if a programmer has forgotten to surround some shared memory references with transactions, causing bugs, then with strong isolation the programmer will often detect that oversight using a simple debug interface, because the programmer will see a non-transactional region causing atomicity violations. Also, programs written in one model may work differently on another model.
Further, strong isolation is often easier to support in hardware TM than weak isolation. With strong isolation, since the coherence protocol already manages load and store communication between processors, transactions can detect non-transactional loads and stores and act appropriately. To implement strong isolation in software transactional memory (TM), non-transactional code must be modified to include read and write barriers, potentially crippling performance. Although great effort has been spent on removing many unneeded barriers, such techniques are often complex, and their performance is typically far lower than that of HTMs.
Table 2
Table 2 shows the basic design space of transactional memory (versioning and conflict detection).
Eager-pessimistic (EP)
The first TM design described below is known as Eager-Pessimistic. An EP system stores its write-set "in place" (hence the name "eager") and, to support rollback, stores the old values of overwritten lines in an "undo log". Processors use the W 138 and R 132 cache bits to track the read- and write-sets, and detect conflicts when receiving snooped load requests. Perhaps the most notable examples of EP systems in the known literature are LogTM and UTM.
Beginning a transaction in an EP system is much like beginning a transaction in other systems: tm_begin() takes a register checkpoint and initializes any status registers. An EP system also requires initializing the undo log, the details of which depend on the log format, but often involve initializing a log base pointer to a region of pre-allocated, thread-private memory and clearing a log bounds register.
Versioning: In EP, because of the way eager versioning is designed to work, the MESI 130 state transitions (the cache line indicators corresponding to the Modified, Exclusive, Shared and Invalid states) are left mostly unchanged. Outside a transaction, the MESI 130 state transitions are left completely unchanged. When a line is read inside a transaction, the standard coherence transitions apply (S (Shared) → S, I (Invalid) → S, or I → E (Exclusive)), issuing a load miss as needed, but the R 132 bit is also set. Likewise, writing a line applies the standard transitions (S → M, E → M, I → M), issuing a miss as needed, but also sets the W (Written) bit 138. The first time a line is written, the old version of the entire line is loaded and then written to the undo log to preserve it in case the current transaction aborts. The newly written data is then stored "in place", over the old data.
Conflict detection: Pessimistic conflict detection uses the coherence messages exchanged on misses, or upgrades, to look for conflicts between transactions. When a read miss occurs within a transaction, other processors receive a load request, but they ignore the request if they do not have the needed line. If another processor has the needed line non-speculatively, or has the line R 132 (Read), it downgrades that line to S and, in certain cases, issues a cache-to-cache transfer if it holds the line in the MESI 130 M or E state. However, if the cache has the line W 138, a conflict is detected between the two transactions and additional action must be taken.
Similarly, when a transaction seeks to upgrade a line from shared to modified (on a first write), the transaction issues an exclusive load request, which is also used to detect conflicts. If a receiving cache has the line non-speculatively, the line is invalidated and, in certain cases, a cache-to-cache transfer (M or E state) is issued. But if the line is R 132 or W 138, a conflict is detected.
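The receiver-side rules for the two kinds of coherence request can be sketched as one function. The function snoop and its dict-based line representation are hypothetical; cache-to-cache transfers are not modelled.

```python
def snoop(line, request):
    """Receiver-side handling of a coherence request in an EP system (toy model).

    `line` is None if the receiving cache does not hold the line, else a dict
    with keys 'mesi', 'R' and 'W'. `request` is 'load' (read miss) or
    'load_exclusive' (first write / upgrade). Returns the action taken.
    """
    if line is None:
        return "ignore"  # the receiver does not have the needed line
    if request == "load":
        if line["W"]:
            return "conflict"  # the receiver's transaction wrote the line
        line["mesi"] = "S"  # non-speculative or R line: downgrade to Shared
        return "downgrade"
    if request == "load_exclusive":
        if line["R"] or line["W"]:
            return "conflict"  # the line is in the receiver's read- or write-set
        line["mesi"] = "I"  # non-speculative copy: invalidate it
        return "invalidate"
    raise ValueError(f"unknown request {request!r}")
```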
Validation: Because conflict detection is performed on every access, a transaction always has exclusive access to its own write-set. Therefore, validation does not require any extra work.
Commit: Because eager versioning stores the new versions of data items in place, the commit process simply clears the W 138 and R 132 bits and discards the undo log.
Abort: When a transaction rolls back, the original version of each cache line in the undo log must be restored, a process called "unrolling" or "applying" the log. This is done during tm_discard() and must be atomic with respect to other transactions. Specifically, the write-set must still be used to detect conflicts: this transaction has the only correct version of the lines in its undo log, and requesting transactions must wait for the correct version to be restored from the log. Such a log can be applied using a hardware state machine or a software abort handler.
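Putting the EP versioning, commit and abort rules together gives the toy model below. EpCacheLine and EpTransaction are hypothetical names; conflict detection and the atomicity of log unrolling relative to other transactions are not modelled.

```python
# Standard MESI transitions on a transactional write (EP leaves them unchanged).
WRITE_TRANSITION = {"S": "M", "E": "M", "I": "M", "M": "M"}


class EpCacheLine:
    """A cache line with MESI state plus the transactional R and W bits."""

    def __init__(self, data, mesi="I"):
        self.data = data
        self.mesi = mesi
        self.R = False  # read inside the current transaction
        self.W = False  # written inside the current transaction


class EpTransaction:
    """Toy eager-pessimistic transaction over a dict of cache lines."""

    def __init__(self, lines):
        self.lines = lines
        self.undo_log = {}  # address -> old contents of the whole line

    def read(self, addr, shared_elsewhere=True):
        line = self.lines[addr]
        if line.mesi == "I":
            # Read miss: I -> S if another cache holds the line, else I -> E.
            line.mesi = "S" if shared_elsewhere else "E"
        line.R = True
        return line.data

    def write(self, addr, value):
        line = self.lines[addr]
        if not line.W:
            self.undo_log[addr] = line.data  # first write: save the old line
        line.mesi = WRITE_TRANSITION[line.mesi]
        line.W = True
        line.data = value  # the new data goes "in place"

    def commit(self):
        # Nothing to copy: clear the R/W bits and discard the undo log.
        self._clear()

    def abort(self):
        # "Unroll" the log to restore every overwritten line, then clear.
        for addr, old in self.undo_log.items():
            self.lines[addr].data = old
        self._clear()

    def _clear(self):
        for line in self.lines.values():
            line.R = line.W = False
        self.undo_log.clear()
```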
Eager-Pessimistic has the following characteristics: commit is simple and, since it is in place, very fast. Similarly, validation is a no-op. Pessimistic conflict detection detects conflicts early, thereby reducing the number of "doomed" transactions. For example, if two transactions are involved in a write-after-read dependency, that dependency is detected immediately under pessimistic conflict detection, whereas under optimistic conflict detection such a conflict is not detected until the writer commits.
Eager-Pessimistic also has the following characteristics: as described above, the first time a cache line is written, the old value must be written to the log, incurring extra cache accesses. Aborts are expensive because they require undoing the log: for each cache line in the log, a load must be issued, possibly going as far as main memory, before continuing to the next line. Pessimistic conflict detection also prevents certain serializable schedules from existing.
Additionally, because conflicts are handled as they occur, there is a potential for livelock, and careful contention management mechanisms must be employed to guarantee forward progress.
Lazy-optimistic (LO)
Another popular TM design is Lazy-Optimistic (LO), which stores its write-set in a "write buffer" or "redo log" and detects conflicts at commit time (still using the R and W bits).
Versioning: Just as in the EP system, the MESI protocol of the LO design is enforced outside of transactions. Once inside a transaction, reading a line incurs the standard MESI transitions but also sets the R 132 bit. Likewise, writing a line sets the W 138 bit of the line, but the MESI transitions of the LO design are handled differently from those of the EP design. First, with lazy versioning, the new versions of written data are stored in the cache hierarchy until commit, while other transactions have access to the old versions available in memory or other caches. To keep the old versions available, dirty lines (M lines) must be evicted when first written by a transaction. Second, no upgrade misses are needed because of optimistic conflict detection: if a transaction has a line in the S state, it can simply write to it and upgrade the line to the M state without communicating the change to other transactions, because conflict detection is done at commit time.
Conflict detection and validation: To validate a transaction and detect conflicts, LO communicates the addresses of speculatively modified lines to other transactions only when it is preparing to commit. On validation, the processor sends one, potentially large, network packet containing all the addresses in the write-set. Data is not sent, but is left in the committer's cache and marked dirty (M). To build this packet without searching the cache for lines marked W, a simple bit vector called a "store buffer" is used, with one bit per cache line to track these speculatively modified lines. Other transactions use this address packet to detect conflicts: if an address is found in the cache and the R 132 and/or W 138 bits are set, a conflict is initiated. If the line is found but neither R 132 nor W 138 is set, the line is simply invalidated, which is similar to processing an exclusive load.
To support transaction atomicity, these address packets must be handled atomically, that is, no two address packets containing the same addresses may exist at once. In an LO system, this can be achieved by simply acquiring a global commit token before sending the address packet. Alternatively, a two-phase commit scheme could be employed: first send out the address packet, collect the responses, enforce an ordering protocol (perhaps oldest transaction first), and commit once all responses are satisfactory.
Commit: Once validation has occurred, commit needs no special treatment: simply clear the W 138 and R 132 bits and the store buffer. The transaction's writes are already marked dirty in the cache, and other caches' copies of these lines have been invalidated via the address packet. Other processors can then access the committed data through the regular coherence protocol.
Abort: Rollback is equally easy: because the write-set is contained within the local cache, these lines can be invalidated, and then the W 138 and R 132 bits and the store buffer are cleared. The store buffer allows the W lines to be found and invalidated without searching the cache.
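The LO flow (buffer writes locally, broadcast the write-set addresses under a global commit token, then commit or abort through the store buffer) can be sketched as follows. LoCache is a hypothetical name, a threading.Lock stands in for the global commit token, refetching committed data is not modelled, and the two-phase alternative is not shown.

```python
import threading

COMMIT_TOKEN = threading.Lock()  # global commit token: one address packet at a time


class LoCache:
    """Toy lazy-optimistic cache: speculative writes stay local until commit."""

    def __init__(self, n_lines):
        self.data = [None] * n_lines
        self.valid = [True] * n_lines
        self._clear()

    def _clear(self):
        n = len(self.data)
        self.R = [False] * n
        self.W = [False] * n
        self.store_buffer = [False] * n  # one bit per line: speculatively written
        self.doomed = False

    def tx_read(self, i):
        self.R[i] = True
        return self.data[i]

    def tx_write(self, i, value):
        # No upgrade miss is needed: conflicts are only checked at commit time.
        self.data[i] = value
        self.W[i] = self.store_buffer[i] = True

    def receive_packet(self, addresses):
        """Handle another committer's address packet (addresses only, no data)."""
        for i in addresses:
            if self.R[i] or self.W[i]:
                self.doomed = True  # conflict: this transaction must abort
            else:
                self.valid[i] = False  # like processing an exclusive load

    def commit(self, others):
        """Validate by broadcasting write-set addresses, then commit locally."""
        with COMMIT_TOKEN:
            if self.doomed:
                self.abort()
                return False
            packet = [i for i, bit in enumerate(self.store_buffer) if bit]
            for other in others:
                other.receive_packet(packet)
            self._clear()  # the writes stay in this cache, already dirty
        return True

    def abort(self):
        # The store buffer locates the W lines without searching the cache.
        for i, bit in enumerate(self.store_buffer):
            if bit:
                self.valid[i] = False
        self._clear()
```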
Lazy-Optimistic has the following characteristics: aborts are very fast, requiring no additional loads or stores and making only local changes. More serializable schedules can exist than in EP, which allows an LO system to speculate more aggressively that transactions are independent, which can yield higher performance. Finally, the late detection of conflicts can increase the likelihood of forward progress.
Lazy-Optimistic also has the following characteristics: validation takes global communication time proportional to the size of the write-set. Doomed transactions can waste work, since conflicts are detected only at commit time.
Lazy-pessimistic (LP)
Lazy-Pessimistic (LP) represents a third TM design option, sitting somewhere between EP and LO: newly written lines are stored in a write buffer, but conflicts are detected on a per-access basis.
Versioning: Versioning is similar but not identical to that of LO: reading a line sets its R bit 132, writing a line sets its W bit 138, and a store buffer is used to track W lines in the cache. Also, dirty (M) lines must be evicted when first written by a transaction, just as in LO. However, since conflict detection is pessimistic, load exclusives must be performed when upgrading a transactional line from I or S to M, which is unlike LO.
Conflict detection: LP's conflict detection operates the same way as EP's: coherence messages are used to look for conflicts between transactions.
Validation: As in EP, pessimistic conflict detection ensures that at any point a running transaction has no conflicts with any other running transaction, so validation is a no-op.
Commit: Commit needs no special treatment: as in LO, simply clear the W 138 and R 132 bits and the store buffer.
Abort: Rollback is also like that of LO: simply invalidate the write-set using the store buffer, and clear the W 138 and R 132 bits and the store buffer.
Lazy-Pessimistic has the following characteristics: like LO, aborts are very fast. Like EP, the use of pessimistic conflict detection reduces the number of "doomed" transactions. Also like EP, some serializable schedules are not allowed, and conflict detection must be performed on each cache miss.
Eager-optimistic (EO)
The final combination of versioning and conflict detection is Eager-Optimistic (EO). EO may not be the best choice for HTM systems: since new transactional versions are written in place, other transactions have no choice but to notice conflicts as they occur (that is, when cache misses occur). But since EO waits until commit time to detect conflicts, those transactions become "zombies": they continue to execute, wasting resources, yet are "doomed" to abort.
EO has proven to be useful in STMs and is implemented by Bartok-STM and McRT. A lazy versioning STM needs to check its write buffer on each read to ensure that it is reading the most recent value. Since the write buffer is not a hardware structure, this is expensive, hence the preference for write-in-place eager versioning. Additionally, since checking for conflicts is also expensive in an STM, optimistic conflict detection offers the advantage of performing this operation in bulk.
Contention management
How a transaction rolls back once the system has decided to abort it has been described above. However, since a conflict involves two transactions, the questions of which transaction should abort, how that abort should be initiated, and when the aborted transaction should be retried still need to be explored. These are the questions addressed by contention management (CM), a key component of transactional memory. Described below are policies regarding how systems initiate aborts and the various established methods of managing which transaction should abort in a conflict.
Contention management policies
A contention management (CM) policy is a mechanism that determines which transaction involved in a conflict should abort and when the aborted transaction should be retried. For example, it is often the case that retrying an aborted transaction immediately does not lead to the best performance. Conversely, employing a back-off mechanism, which delays the retry of an aborted transaction, can yield better performance. STMs were the first to grapple with finding the best contention management policies, and many of the policies outlined below were originally developed for STMs.
CM policies draw on a number of measures to make decisions, including the ages of the transactions, the sizes of the read- and write-sets, the number of previous aborts, and so on. The combinations of measures for making such decisions are endless, but certain combinations are described below, roughly in order of increasing complexity.
To establish some nomenclature, first note that there are two sides in a conflict: the attacker and the defender. The attacker is the transaction requesting access to a shared memory location. In pessimistic conflict detection, the attacker is the transaction issuing the load or load exclusive. In optimistic conflict detection, the attacker is the transaction attempting to validate. In both cases, the defender is the transaction receiving the attacker's request.
An Aggressive CM policy immediately and always retries either the attacker or the defender. In LO, Aggressive means that the attacker always wins, so Aggressive is sometimes called committer-wins. Such a policy was used in the earliest LO systems. In the case of EP, Aggressive can be either defender-wins or attacker-wins.
Restarting a conflicting transaction that will immediately experience another conflict is bound to waste work, namely the interconnect bandwidth spent refilling cache misses. A Polite CM policy employs exponential backoff (although linear backoff could also be used) before restarting conflicting transactions. To prevent starvation (a situation in which a process is not allocated resources by the scheduler), exponential backoff greatly increases the odds of transaction success after some n retries.
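The Polite policy's delay computation can be sketched as follows. This is a minimal Python sketch, not a specific system's implementation: the function name `polite_backoff`, the `base` and `cap` values, and the uniform random jitter inside each window are all illustrative assumptions.

```python
import random
from typing import Optional


def polite_backoff(retry: int, base: float = 1e-6, cap: float = 1e-3,
                   rng: Optional[random.Random] = None) -> float:
    """Return a delay in seconds to wait before retry number `retry`.

    The backoff window doubles with every retry (exponential backoff) and is
    capped at `cap`; the delay is drawn uniformly from [0, window] so that two
    conflicting transactions are unlikely to restart in lockstep.
    `base`, `cap`, and the random jitter are illustrative assumptions.
    """
    if retry < 0:
        raise ValueError("retry must be non-negative")
    rng = rng or random.Random()
    # Clamp the exponent so 2**retry never overflows a float conversion.
    window = min(cap, base * 2 ** min(retry, 60))
    return rng.uniform(0.0, window)
```

Replacing `2 ** min(retry, 60)` with `(retry + 1)` would give the linear variant the text also mentions.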
Another approach to conflict resolution is to abort either the attacker or the defender at random (a policy called Randomized). Such a policy may be combined with a randomized backoff scheme to avoid unneeded contention.
However, choosing the transaction to abort at random can abort a transaction that has already completed "a lot of work", which wastes resources. To avoid such waste, the amount of work a transaction has completed can be taken into account when deciding which transaction to abort. One measure of work is a transaction's age. Other methods include Oldest, Bulk TM, Size Matters, Karma, and Polka. Oldest is a simple timestamp method that aborts the younger transaction in a conflict; Bulk TM uses this scheme. Size Matters is like Oldest, but instead of transaction age it uses the number of words read and written as the priority, reverting to Oldest after a fixed number of aborts. Karma is similar, using the size of the write set as the priority; rollback then proceeds after backing off for a fixed amount of time. Aborted transactions keep their priorities after being aborted (hence the name Karma). Polka works like Karma, but instead of backing off for a predefined amount of time, it backs off exponentially longer each time.
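The victim-selection rules of Oldest, Size Matters, and Karma can be compared side by side in a small Python sketch. Only the choice of which transaction aborts is modeled; the backoff that follows is not. The `Tx` fields, the `max_aborts` threshold of 3, and breaking ties with Oldest are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    name: str
    start: int             # start timestamp; a smaller value means older
    rw_words: int = 0      # words read or written so far (Size Matters)
    write_words: int = 0   # current write-set size (Karma)
    karma: int = 0         # priority carried over from earlier aborted attempts
    aborts: int = 0


def oldest(a: Tx, b: Tx) -> Tx:
    """Oldest: the younger transaction (later timestamp) is the victim."""
    return a if a.start > b.start else b


def size_matters(a: Tx, b: Tx, max_aborts: int = 3) -> Tx:
    """Size Matters: the transaction with fewer read/written words is the
    victim; revert to Oldest after `max_aborts` aborts (hypothetical value)
    or on a tie (assumption)."""
    if max(a.aborts, b.aborts) >= max_aborts or a.rw_words == b.rw_words:
        return oldest(a, b)
    return a if a.rw_words < b.rw_words else b


def karma(a: Tx, b: Tx) -> Tx:
    """Karma: priority is the write-set size plus karma kept from earlier
    aborts; the lower-priority transaction is the victim (ties: Oldest)."""
    pa, pb = a.karma + a.write_words, b.karma + b.write_words
    if pa == pb:
        return oldest(a, b)
    return a if pa < pb else b


def record_abort(victim: Tx) -> None:
    """An aborted transaction keeps the priority it had already earned."""
    victim.karma += victim.write_words
    victim.write_words = 0
    victim.rw_words = 0
    victim.aborts += 1
```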
Since aborting wastes work, it is logical to argue that stalling the attacker until the defender has finished its transaction would lead to better performance. Unfortunately, such a simple scheme easily leads to deadlock.
Deadlock-avoidance techniques can be used to solve this problem. Greedy uses two rules to avoid deadlock. The first rule is that if a first transaction, T1, has lower priority than a second transaction, T0, or if T1 is waiting for another transaction, then T1 aborts when it conflicts with T0. The second rule is that if T1 has higher priority than T0 and is not waiting, then T0 waits until T1 commits, aborts, or starts waiting (in which case the first rule applies). Greedy provides some guarantees about time bounds for executing a set of transactions. One EP design (LogTM) uses a CM policy similar to Greedy to achieve stalling with conservative deadlock avoidance.
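The two Greedy rules translate almost directly into a conflict-resolution function. In this minimal Python sketch, the `Txn` type and the `greedy` name are hypothetical, and priorities are assumed to be distinct (for example, start timestamps), because the rules as stated do not cover equal priorities.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    priority: int          # higher value means higher priority
    waiting: bool = False  # True while stalled behind another transaction


def greedy(t1: Txn, t0: Txn) -> tuple:
    """Resolve a conflict between T1 and T0 with the two Greedy rules.

    Rule 1: if T1 has lower priority than T0, or T1 is itself waiting,
            T1 aborts.
    Rule 2: if T1 has higher priority than T0 and is not waiting, T0 waits
            until T1 commits, aborts, or starts waiting.
    Priorities are assumed to be distinct.
    """
    if t1.priority < t0.priority or t1.waiting:
        return ("abort", t1.name)
    t0.waiting = True
    return ("wait", t0.name)
```

Because a waiting transaction always aborts when it conflicts again (rule 1), no cycle of transactions waiting on each other can form.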
Example MESI coherency rules provide four possible states in which a cache line of a multiprocessor cache system may reside, M, E, S, and I, defined as follows:
Modified (M): The cache line is present only in the current cache and is dirty; it has been modified from the value in main memory. The cache is required to write the data back to main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Exclusive state.
Exclusive (E): The cache line is present only in the current cache, but is clean; it matches main memory. It may be changed to the Shared state at any time in response to a read request. Alternatively, it may be changed to the Modified state when it is written.
Shared (S): Indicates that this cache line may be stored in other caches of the machine and is clean; it matches main memory. The line may be discarded (changed to the Invalid state) at any time.
Invalid (I): Indicates that this cache line is invalid (unused).
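The transitions named in the four state definitions can be collected into a lookup table. This is a deliberately partial Python sketch: it contains only the transitions described above, and the event names (`write_back`, `remote_read`, `local_write`, `discard`) are labels chosen for illustration. Any other transition raises an error instead of being guessed.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


# (current state, event) -> next state, limited to the transitions described above.
TRANSITIONS = {
    (State.M, "write_back"): State.E,   # dirty data copied back to main memory
    (State.E, "remote_read"): State.S,  # another cache reads the line
    (State.E, "local_write"): State.M,  # the line becomes dirty
    (State.S, "discard"): State.I,      # a shared line may be dropped at any time
}


def next_state(state: State, event: str) -> State:
    """Return the next state, rejecting transitions this sketch does not model."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"transition not modeled: {state.name} on {event}") from None
```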
TM coherency status indicators (R 132, W 138) may be provided for each cache line, in addition to, or encoded in, the MESI coherency bits. An R 132 indicator indicates that the current transaction has read from the data of the cache line, and a W 138 indicator indicates that the current transaction has written to the data of the cache line.
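The per-line R and W indicators are what a conflict check consults when another processor accesses the line. The text defines only the bits. The conflict rule in this Python sketch, where a remote write conflicts with any transactional access and a remote read conflicts only with a transactional write, is the usual read/write conflict definition and is included here as an assumption. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TxLine:
    """Cache line carrying the per-line TM status indicators."""
    tag: int
    r: bool = False  # R indicator: current transaction has read the line
    w: bool = False  # W indicator: current transaction has written the line

    def local_access(self, is_write: bool) -> None:
        if is_write:
            self.w = True
        else:
            self.r = True

    def conflicts_with_remote(self, remote_is_write: bool) -> bool:
        """Assumed rule: a remote write conflicts with any transactional
        access; a remote read conflicts only with a transactional write."""
        return self.w or (remote_is_write and self.r)

    def clear(self) -> None:
        """Reset both indicators when the transaction commits or aborts."""
        self.r = self.w = False
```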
In another aspect of TM design, a system is designed using transactional store buffers. U.S. Patent No. 6,349,361, entitled "Methods and Apparatus for Reordering and Renaming Memory References in a Multiprocessor Computer System", filed March 31, 2000 and incorporated herein by reference in its entirety, teaches a method for reordering and renaming memory references in a multiprocessor computer system having at least a first and a second processor. The first processor has a first private cache and a first buffer, and the second processor has a second private cache and a second buffer. For each of a plurality of gated store requests received by the first processor to store a datum, the method includes the steps of exclusively acquiring, by the first private cache, a cache line that contains the datum, and storing the datum in the first buffer. When the first buffer receives a load request from the first processor to load a particular datum, the particular datum is provided to the first processor from among the data stored in the first buffer, based on an in-order sequence of load and store operations. When the first cache receives a load request from the second cache for a given datum, an error condition is indicated, and, when the load request for the given datum corresponds to the data stored in the first buffer, the current state of at least one of the processors is reset to an earlier state.
The main implementation components of one such transactional memory facility are a transaction-backup register file for holding pre-transaction GR (general register) content, a cache directory to track the cache lines accessed during the transaction, a store cache to buffer stores until the transaction ends, and firmware routines to perform various complex functions. A detailed implementation is described in this section.
IBM zEnterprise EC12 enterprise server embodiment
The IBM zEnterprise EC12 enterprise server introduces transactional execution (TX) in transactional memory. It is described in part in the paper "Transactional Memory Architecture and Implementation for IBM System z", Proceedings pages 25-36, presented at MICRO-45, 1-5 December 2012, Vancouver, British Columbia, Canada, available from IEEE Computer Society Conference Publishing Services (CPS), which is incorporated herein by reference in its entirety.
Table 3 shows an example transaction. Transactions started with TBEGIN are not guaranteed ever to complete successfully with TEND, since they can experience an aborting condition at every attempted execution, for example, due to repeated conflicts with other CPUs. This requires the program to support a fallback path that performs the same operation non-transactionally, for example, by using conventional locking schemes. This places a significant burden on programming and software verification teams, especially where the fallback path is not generated automatically by a reliable compiler.
Table 3
The requirement to provide a fallback path for aborted transactional-execution (TX) transactions can be onerous. Many transactions operating on shared data structures are expected to be short, touch only a few distinct memory locations, and use only simple instructions. For those transactions, the IBM zEnterprise EC12 introduces the concept of constrained transactions: under normal conditions, the CPU 114 guarantees that constrained transactions eventually end successfully, although without giving a strict limit on the number of retries needed. A constrained transaction starts with a TBEGINC instruction and ends with a regular TEND. Implementing a task as a constrained or a non-constrained transaction typically results in very comparable performance, but constrained transactions simplify software development by removing the need for a fallback path. IBM's transactional-execution architecture is further described in z/Architecture, Principles of Operation, Tenth Edition, SA22-7832-09, published by IBM in September 2012, which is incorporated herein by reference in its entirety.
A constrained transaction starts with the TBEGINC instruction. A transaction started with TBEGINC must follow a list of programming constraints; otherwise, the program takes a non-filterable constraint-violation interruption. Exemplary constraints may include, but are not limited to, the following: the transaction can execute at most 32 instructions, and all instruction text must lie within 256 consecutive bytes of memory; the transaction contains only forward-pointing relative branches (that is, no loops or subroutine calls); the transaction can access at most 4 aligned octowords (an octoword is 32 bytes) of memory; and the instruction set is restricted to exclude complex instructions such as decimal or floating-point operations. The constraints are chosen so that many common operations, such as doubly linked list insert/delete operations, can be performed, including the very powerful concept of an atomic compare-and-swap targeting up to 4 aligned octowords. At the same time, the constraints were chosen conservatively, so that future CPU implementations can guarantee transaction success without needing to adjust the constraints, since doing so would otherwise lead to software incompatibility.
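The exemplary TBEGINC constraints can be expressed as a static check over a decoded instruction sequence. This Python sketch is illustrative only. The `Insn` record, the category labels in `RESTRICTED`, and the representation of memory operands as `(address, size)` pairs are hypothetical. Real hardware enforces the constraints at run time rather than with a pre-pass like this.

```python
from dataclasses import dataclass
from typing import Optional

OCTOWORD = 32        # bytes
MAX_INSNS = 32
MAX_TEXT_BYTES = 256
MAX_OCTOWORDS = 4
RESTRICTED = {"decimal", "float", "call"}  # hypothetical category labels


@dataclass
class Insn:
    addr: int                            # address of the instruction text
    length: int                          # instruction length in bytes
    kind: str = "simple"                 # hypothetical category label
    branch_target: Optional[int] = None  # target of a relative branch, if any
    accesses: tuple = ()                 # ((address, size), ...) memory operands


def check_constrained(insns: list) -> list:
    """Return the constraint violations found in a TBEGINC body (empty if none)."""
    problems = []
    if len(insns) > MAX_INSNS:
        problems.append("more than 32 instructions")
    if insns:
        lo = min(i.addr for i in insns)
        hi = max(i.addr + i.length for i in insns)
        if hi - lo > MAX_TEXT_BYTES:
            problems.append("instruction text spans more than 256 bytes")
    touched = set()
    for i in insns:
        if i.kind in RESTRICTED:
            problems.append(f"restricted instruction at {i.addr:#x}")
        if i.branch_target is not None and i.branch_target <= i.addr:
            problems.append(f"backward branch at {i.addr:#x}")
        for addr, size in i.accesses:
            # Every 32-byte aligned octoword the operand overlaps counts.
            touched.update(range(addr // OCTOWORD, (addr + size - 1) // OCTOWORD + 1))
    if len(touched) > MAX_OCTOWORDS:
        problems.append("more than 4 octowords accessed")
    return problems
```

A branch to its own address counts as backward here, because it would form a loop.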
TBEGINC mostly behaves like XBEGIN in TSX or TBEGIN on IBM's zEC12 servers, except that the floating-point register (FPR) control and program interruption filtering fields do not exist and the controls are considered to be zero. On a transaction abort, the instruction address is set back directly to the TBEGINC instead of to the following instruction, reflecting the immediate retry and the absence of an abort path for constrained transactions.
Nested transactions are not allowed within constrained transactions, but if a TBEGINC occurs within a non-constrained transaction, it is treated as opening a new non-constrained nesting level, just as TBEGIN would be. This can occur, for example, if a non-constrained transaction calls a subroutine that uses a constrained transaction internally.
Since interruption filtering is implicitly off, all exceptions during a constrained transaction lead to an interruption into the operating system (OS). Eventual successful completion of the transaction relies on the ability of the OS to page in the at most 4 pages touched by any constrained transaction. The OS must also ensure time slices long enough to allow the transaction to complete.
Table 4
Table 4 shows a constrained-transactional implementation of the code in Table 3, assuming that the constrained transactions do not interact with other lock-based code. Lock testing is therefore not shown, but it could be added if constrained transactions and lock-based code were mixed.
When failures repeat, software emulation is performed using millicode as part of the system firmware. Advantageously, constrained transactions have desirable properties because of the burden they remove from programmers.
The IBM zEnterprise EC12 processor introduced the transactional-execution facility. The processor can decode 3 instructions per clock cycle; simple instructions are dispatched as single micro-ops, and more complex instructions are cracked into multiple micro-ops 232b. The micro-ops (Uops 232b, shown in Fig. 3) are written into a unified issue queue 216, from which they can be issued out of order. Up to two fixed-point, one floating-point, two load/store, and two branch instructions can execute every cycle. A global completion table (GCT) 232 holds every micro-op and the transaction nesting depth (TND) 232a. The GCT 232 is written in order at decode time, tracks the execution status of each micro-op, and completes instructions when all micro-ops 232b of the oldest instruction group have executed successfully.
The level-1 (L1) data cache 240 (Fig. 3) is a 96 KB (kilobyte) 6-way associative cache with 256-byte cache lines and a 4-cycle use latency, coupled to a private 1 MB (megabyte) 8-way associative level-2 (L2) data cache 268 (Fig. 3) with a 7-cycle use-latency penalty for L1 misses. The L1 cache 240 (Fig. 3) is the cache closest to the processor, and an Ln cache is a cache at the nth level of caching. Both the L1 240 (Fig. 3) and L2 268 (Fig. 3) caches are store-through. The six cores on each central processor (CP) chip share a 48 MB level-3 store-in cache, and six CP chips are connected to an off-chip 384 MB level-4 cache; they are packaged together on a glass-ceramic multi-chip module (MCM). Up to 4 multi-chip modules (MCMs) can be connected into a coherent symmetric multiprocessor (SMP) system with up to 144 cores (not all cores are available to run customer workloads).
Coherency is managed with a variant of the MESI protocol. Cache lines can be owned read-only (shared) or exclusive; the L1 240 (Fig. 3) and L2 268 (Fig. 3) are store-through and therefore do not contain dirty lines. The L3 and L4 caches are store-in and track dirty states. Each cache is inclusive of all of its connected lower-level caches.
Coherency requests are called "cross-interrogates" (XIs) and are sent hierarchically from higher-level to lower-level caches, and between the L4s. When a core misses the L1 240 (Fig. 3) and L2 268 (Fig. 3) and requests the cache line from its local L3, the L3 checks whether it owns the line and, if necessary, sends an XI to the currently owning L2 268 (Fig. 3)/L1 240 (Fig. 3) under that L3 to ensure coherency before it returns the cache line to the requestor. If the request also misses the L3, the L3 sends a request to the L4, which enforces coherency by sending XIs to all necessary L3s under that L4 and to the neighboring L4s. The L4 then responds to the requesting L3, which forwards the response to the L2 268 (Fig. 3)/L1 240 (Fig. 3).
Note that, due to the inclusivity rule of the cache hierarchy, cache lines are sometimes XI'ed from lower-level caches because of evictions in higher-level caches caused by associativity overflows from requests to other cache lines. These XIs can be called "LRU XIs", where LRU stands for least recently used.
Other types of XI requests are as follows. Demote-XIs transition cache ownership from exclusive to read-only state, and exclusive-XIs transition cache ownership from exclusive to invalid state. Demote-XIs and exclusive-XIs require a response back to the XI sender. The target cache can "accept" the XI or, if it first needs to evict dirty data before accepting the XI, send a "reject" response. The L1 240 (Fig. 3)/L2 268 (Fig. 3) caches are store-through, but they may reject demote-XIs and exclusive-XIs if they have stores in their store queues that need to be sent to the L3 before the exclusive state is downgraded. A rejected XI is repeated by the sender. Read-only-XIs are sent to caches that own the line read-only; no response is needed for such XIs, since they cannot be rejected. The details of the SMP protocol are similar to those described for the IBM z10 by P. Mak, C. Walters, and G. Strait in "IBM System z10 processor cache subsystem microarchitecture", IBM Journal of Research and Development, Vol. 53, No. 1, 2009, which is incorporated herein by reference in its entirety.
Transactional instruction execution
Fig. 3 depicts exemplary components of an example CPU. The instruction decode unit (IDU) 208 keeps track of the current transaction nesting depth (TND) 212. When the IDU 208 receives a TBEGIN instruction, the nesting depth is incremented; conversely, it is decremented on a TEND instruction. The nesting depth is written into the GCT 232 for every dispatched instruction. When a TBEGIN or TEND is decoded on a speculative path that is later flushed, the nesting depth of the IDU 208 is refreshed from the youngest GCT 232 entry that is not flushed. The transactional state is also written into the issue queue 216 for consumption by the execution units, mostly by the load/store unit (LSU) 280. The TBEGIN instruction may specify a transaction diagnostic block (TDB) for recording status information in case the transaction aborts before reaching a TEND instruction.
Similar to the nesting depth, the IDU 208/GCT 232 collaboratively track the access-register/floating-point-register (AR/FPR) modification masks through the transaction nest; the IDU 208 can place an abort request into the GCT 232 when an AR/FPR-modifying instruction is decoded and the modification mask blocks it. When the instruction becomes next-to-complete, completion is blocked and the transaction aborts. Other restricted instructions are handled similarly, including a TBEGIN that is decoded while in a constrained transaction or that would exceed the maximum nesting depth.
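The decode-time nesting-depth bookkeeping, including refreshing the depth after a flush and raising abort requests for restricted TBEGINs, can be modeled in a few lines. This is a toy Python sketch. The class name, the list standing in for the GCT, and the `MAX_DEPTH` value of 16 are assumptions, not figures from the text. The flush logic also assumes that no GCT entry has retired yet.

```python
class NestingDepthTracker:
    """Toy model of transaction-nesting-depth (TND) tracking at decode time.

    A list stands in for the GCT: every decoded instruction records the depth
    after decode, and a flush refreshes the depth from the youngest surviving
    entry. MAX_DEPTH is a hypothetical limit, not a value from the text.
    """
    MAX_DEPTH = 16

    def __init__(self):
        self.depth = 0
        self.constrained = False  # inside a TBEGINC transaction?
        self.gct = []             # (depth, constrained, abort_requested) per instruction

    def decode(self, insn: str) -> bool:
        """Decode one instruction; return True if an abort request is placed."""
        abort = False
        if insn in ("TBEGIN", "TBEGINC"):
            if self.constrained or self.depth >= self.MAX_DEPTH:
                abort = True  # restricted: the transaction aborts when this completes
            else:
                # TBEGINC inside a non-constrained transaction opens a
                # non-constrained nesting level, like TBEGIN.
                if insn == "TBEGINC" and self.depth == 0:
                    self.constrained = True
                self.depth += 1
        elif insn == "TEND" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.constrained = False
        self.gct.append((self.depth, self.constrained, abort))
        return abort

    def flush(self, keep: int) -> None:
        """Flush every instruction after the oldest `keep` ones.

        Assumes no entry has retired yet, so an empty table means depth 0.
        """
        del self.gct[keep:]
        if self.gct:
            self.depth, self.constrained, _ = self.gct[-1]
        else:
            self.depth, self.constrained = 0, False
```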
An outermost TBEGIN is cracked into multiple micro-ops depending on the GR-save-mask; each micro-op is executed by one of the two fixed-point units (FXUs) 202 to save a pair of GRs 228 into a special transaction-backup register file 224, which is used to restore the GR 228 content later in case of a transaction abort. In addition, if a TDB is specified, the TBEGIN spawns micro-ops 226b to perform an accessibility test for the TDB; the address is saved in a special-purpose register for later use in the abort case. When an outermost TBEGIN is decoded, the instruction address and the instruction text of the TBEGIN are also saved in special-purpose registers for potential abort processing later on.
TEND and NTSTG are single-micro-op 232b instructions. NTSTG (non-transactional store) is handled like a normal store, except that it is marked as non-transactional in the issue queue so that the LSU 280 can treat it appropriately. TEND is a no-op at execution time; the end of the transaction is performed when TEND completes.
As mentioned above, instructions within a transaction are marked as such in the issue queue 216 but otherwise execute mostly unchanged; the LSU 280 performs isolation tracking as described in the next section.
Since decoding is in order, and since the IDU 208 keeps track of the current transactional state and writes it into the issue queue 216 along with every instruction from the transaction, TBEGIN, TEND, and the instructions before, within, and after the transaction can be executed out of order. The effective address calculator 236 is included in the LSU 280. It is even possible (though unlikely) for TEND to execute first, followed by the entire transaction, and finally the TBEGIN. Program order is restored through the GCT 232 at completion time. The length of a transaction is not limited by the size of the GCT 232, since the general registers (GRs) 228 can be restored from the backup register file 224.
During execution, program event recording (PER) events are filtered based on the Event Suppression Control, and a PER TEND event is detected if enabled. Similarly, while in transactional mode, a pseudo-random generator may cause random aborts as enabled by the Transaction Diagnostic Control.
Tracking for transactional isolation
The load/store unit tracks the cache lines accessed during transactional execution and triggers an abort if an XI from another CPU (or an LRU-XI) conflicts with the footprint. If the conflicting XI is an exclusive or demote XI, the LSU rejects the XI back to the L3 in the hope of finishing the transaction before the L3 repeats the XI. This "stiff-arming" is very efficient in highly contended transactions. To prevent hangs when two CPUs stiff-arm each other, an XI-reject counter is implemented, which triggers a transaction abort when a threshold is met.
The L1 cache directory 240 is traditionally implemented with static random access memories (SRAMs). For the transactional memory implementation, the valid bits 244 (64 rows × 6 ways) of the directory have been moved into normal logic latches and are supplemented with two more bits per cache line: the TX-read 248 and TX-dirty 252 bits.
The TX-read 248 bits are reset when a new outermost TBEGIN is decoded (which is interlocked against a prior, still-pending transaction). The TX-read 248 bit is set at execution time by every load instruction that is marked "transactional" in the issue queue. Note that this can lead to over-marking if speculative loads are executed, for example, on a mispredicted branch path. The alternative of setting the TX-read bit at load completion time was too expensive in silicon area, because multiple loads can complete at the same time, which would require many read ports on the load queue.
Stores execute in the same way as in non-transactional mode, but a transaction mark is placed in the store queue (STQ) 260 entry of the store instruction. At write-back time, when the data from the STQ 260 is written into the L1 240, the TX-dirty bit 252 in the L1 directory 256 is set for the written cache line. A store is written back into the L1 240 only after the store instruction has completed, and at most one store is written back per cycle. Before completion and write-back, loads can access the data from the STQ 260 by means of store forwarding; after write-back, the CPU 114 (Fig. 2) can access the speculatively updated data in the L1 240. If the transaction ends successfully, the TX-dirty bits 252 of all cache lines are cleared, and the TX marks of stores not yet written are also cleared in the STQ 260, effectively turning the pending stores into normal stores.
On a transaction abort, all pending transactional stores are invalidated from the STQ 260, even those already completed. All cache lines in the L1 240 that were modified by the transaction (that is, that have the TX-dirty bit 252 on) have their valid bits turned off, effectively removing them from the L1 240 instantaneously.
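How the TX-read and TX-dirty bits and the STQ transaction marks behave on commit and abort can be captured in a small model. This Python sketch is hypothetical throughout. Dictionaries stand in for the directory latches, and list entries of the form `[address, data, tx_mark]` stand in for STQ entries. Store forwarding and the TX-read-based conflict check are omitted.

```python
class TxL1:
    """Sketch of the L1 directory TX bits and the STQ on commit and abort."""

    def __init__(self):
        self.lines = {}  # line address -> {"valid", "tx_read", "tx_dirty"}
        self.stq = []    # pending stores: [address, data, tx_mark], oldest first

    def _line(self, addr):
        return self.lines.setdefault(
            addr, {"valid": True, "tx_read": False, "tx_dirty": False})

    def begin_outermost(self):
        for bits in self.lines.values():
            bits["tx_read"] = False           # reset on a new outermost TBEGIN

    def tx_load(self, addr):
        self._line(addr)["tx_read"] = True    # set at load execution time

    def tx_store(self, addr, data):
        self.stq.append([addr, data, True])   # transaction mark in the STQ entry

    def write_back_one(self):
        """Write the oldest STQ entry into the L1 (at most one per cycle)."""
        addr, _, tx = self.stq.pop(0)
        if tx:
            self._line(addr)["tx_dirty"] = True

    def commit(self):
        for bits in self.lines.values():
            bits["tx_dirty"] = False
        for entry in self.stq:
            entry[2] = False                  # pending stores become normal stores

    def abort(self):
        self.stq = [e for e in self.stq if not e[2]]  # drop transactional stores
        for bits in self.lines.values():
            if bits["tx_dirty"]:
                bits["valid"] = False         # removes the line from L1 at once
                bits["tx_dirty"] = False
```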
The architecture requires that the isolation of the transaction read set and write set be maintained before a new instruction completes. This isolation is ensured by stalling instruction completion at appropriate times when XIs are pending; speculative out-of-order execution is allowed, optimistically assuming that the pending XIs are to different addresses and do not actually cause a transaction conflict. This design fits very naturally with the XI-versus-completion interlocks that are implemented on prior systems to ensure the strong memory ordering that the architecture requires.
When the L1 240 receives an XI, it accesses the directory to check the validity of the XI'ed address in the L1 240, and if the TX-read bit 248 is active on the XI'ed line and the XI is not rejected, the LSU 280 triggers an abort. When a cache line with an active TX-read bit 248 is LRU'ed from the L1 240, a special LRU-extension vector remembers, for each of the 64 rows of the L1 240, that a TX-read line existed on that row. Since no precise address tracking exists for the LRU extensions, any non-rejected XI that hits a valid extension row causes the LSU 280 to trigger an abort. Provided that conflicts of other CPUs 114 (Fig. 2) with the imprecise LRU-extension tracking do not cause aborts, the LRU extension effectively increases the read-footprint capability from the L1 size to the L2 size and associativity.
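The precise-versus-imprecise read tracking can be illustrated by pairing a set of TX-read lines with a 64-entry row vector. This Python sketch assumes plain modulo row indexing (`(addr // 256) % 64`), which the text does not specify, and the class and method names are hypothetical. The last check in the usage shows the false conflict that the imprecise extension can cause.

```python
LINE_BYTES = 256
ROWS = 64


def row_of(addr: int) -> int:
    """Congruence class (row) of an address; plain modulo indexing is assumed."""
    return (addr // LINE_BYTES) % ROWS


class ReadFootprint:
    def __init__(self):
        self.tx_read_lines = set()     # lines in L1 with the TX-read bit set
        self.lru_ext = [False] * ROWS  # LRU-extension vector, one bit per row

    def evict(self, line_addr: int) -> None:
        """LRU eviction of an L1 line; remember the row if it was TX-read."""
        if line_addr in self.tx_read_lines:
            self.tx_read_lines.discard(line_addr)
            self.lru_ext[row_of(line_addr)] = True

    def xi_triggers_abort(self, addr: int, rejected: bool = False) -> bool:
        if rejected:
            return False
        line = addr - addr % LINE_BYTES
        # Precise check against the L1 lines, imprecise row-level check
        # against the LRU extension.
        return line in self.tx_read_lines or self.lru_ext[row_of(addr)]
```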
The store footprint is limited by the store cache size (the store cache is discussed in more detail below) and thus implicitly by the L2 size and associativity. No LRU-extension action needs to be performed when a TX-dirty cache line is LRU'ed from the L1.
Store cache
In prior systems, since the L1 240 and L2 268 are store-through caches, every store instruction causes an L3 store access. With now 6 cores per L3 and further improved performance of each core, the store rate for the L3 (and, to a lesser extent, for the L2) becomes problematic for certain workloads. To avoid store-queuing delays, a gathering store cache had to be added, which combines stores to neighboring addresses before sending them to the L3.
For transactional memory performance, it is acceptable to invalidate every TX-dirty cache line from the L1 240 on a transaction abort, because the L2 cache 268 is very close (a 7-cycle L1 miss penalty) for bringing back the clean lines. However, it would be unacceptable for performance (and for the silicon area needed for tracking) to have transactional stores write the L2 268 before the transaction ends and then invalidate all dirty L2 cache lines on abort (or, even worse, on the shared L3).
The two problems of store bandwidth and transactional memory store handling can both be addressed with the gathering store cache 264. The cache 264 is a circular queue of 64 entries, each entry holding 128 bytes of data with byte-precise valid bits. In non-transactional operation, when a store is received from the LSU 280, the store cache 264 checks whether an entry exists for the same address, and if so, gathers the new store into the existing entry. If no entry exists, a new entry is written into the queue, and if the number of free entries falls below a threshold, the oldest entries are written back to the L2 268 and L3 caches.
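Non-transactional gathering can be sketched with an `OrderedDict` standing in for the 64-entry circular queue, with keys kept in age order. This is a Python sketch under stated assumptions: the free-entry threshold of 8 is hypothetical, and stores are assumed not to cross a 128-byte entry boundary.

```python
from collections import OrderedDict

ENTRY_BYTES = 128
NUM_ENTRIES = 64


class GatheringStoreCache:
    """Non-transactional gathering store cache (sketch).

    An OrderedDict stands in for the 64-entry circular queue: keys are
    128-byte block addresses in age order, values map byte offsets to data
    (the byte-precise valid bits). `free_threshold` is a hypothetical value.
    """

    def __init__(self, free_threshold: int = 8):
        if not 0 <= free_threshold < NUM_ENTRIES:
            raise ValueError("free_threshold out of range")
        self.entries = OrderedDict()
        self.written_back = []       # blocks sent to L2/L3, oldest first
        self.free_threshold = free_threshold

    def store(self, addr: int, data: bytes) -> None:
        block = addr - addr % ENTRY_BYTES
        if addr + len(data) > block + ENTRY_BYTES:
            raise ValueError("store crosses a 128-byte entry boundary")
        entry = self.entries.get(block)
        if entry is None:            # no entry for this address: allocate one
            entry = self.entries[block] = {}
            # Write back the oldest entries while too few entries are free.
            while NUM_ENTRIES - len(self.entries) < self.free_threshold:
                oldest, _ = self.entries.popitem(last=False)
                self.written_back.append(oldest)
        for i, byte in enumerate(data):
            entry[addr - block + i] = byte   # gather into the existing entry
```

With the default threshold, the cache holds at most 56 entries. Each further allocation writes back the oldest block.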
When a new outermost transaction begins, all existing entries in the store cache 264 are marked closed so that no new stores can be gathered into them, and eviction of those entries to the L2 268 and L3 is started. From that point on, the transactional stores coming out of the LSU 280 STQ 260 allocate new entries or gather into existing transactional entries. Write-back of those stores into the L2 268 and L3 is blocked until the transaction ends successfully; at that point, subsequent (post-transaction) stores can continue to gather into existing entries until the next transaction closes those entries again.
The store cache 264 is queried on every exclusive or demote XI and causes an XI reject if the XI compares to any active entry. If the core is not completing further instructions while continuously rejecting XIs, the transaction is aborted at a certain threshold to avoid hangs.
The LSU 280 requests a transaction abort when the store cache overflows. The LSU 280 detects this condition when it tries to send a new store that cannot merge into an existing entry and the entire store cache 264 is filled with stores from the current transaction. The store cache 264 is managed as a subset of the L2 268: while transactionally dirty lines can be evicted from the L1 240, they have to stay resident in the L2 268 throughout the transaction. The maximum store footprint is thus limited to the store cache size of 64 × 128 bytes, and it is also limited by the associativity of the L2 268. Since the L2 268 is 8-way associative and has 512 rows, it is typically large enough not to cause transaction aborts.
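The transactional side of the store cache covers closing entries at an outermost TBEGIN, blocking write-back until commit, and aborting on overflow. It can be sketched separately from the data path. This Python sketch is a simplification. Entries are tracked per 128-byte block without data, closed entries are drained lazily when space is needed rather than eagerly, and neither the NTSTG doubleword marks nor the L2 associativity limit is modeled. All names are hypothetical.

```python
class TransactionalStoreCache:
    """Transactional behaviour of the gathering store cache (sketch).

    Entries are tracked per 128-byte block only (no data); closed entries are
    drained lazily when space is needed, and the NTSTG doubleword marks and
    the L2 associativity limit are not modelled. All names are hypothetical.
    """
    NUM_ENTRIES = 64
    ENTRY_BYTES = 128

    def __init__(self):
        self.entries = []   # oldest first: {"block": int, "tx": bool, "closed": bool}
        self.drained = []   # blocks written back to L2/L3
        self.in_tx = False

    def begin_outermost(self) -> None:
        self.in_tx = True
        for e in self.entries:
            e["closed"] = True   # no more gathering; eviction to L2/L3 may start

    def store(self, addr: int) -> str:
        """Return "ok", or "abort" when the cache overflows with this transaction's stores."""
        block = addr - addr % self.ENTRY_BYTES
        for e in self.entries:
            if e["block"] == block and not e["closed"] and e["tx"] == self.in_tx:
                return "ok"      # gathered into an existing entry
        if len(self.entries) == self.NUM_ENTRIES:
            victim = next((e for e in self.entries if not e["tx"]), None)
            if victim is None:
                return "abort"   # every entry holds current-transaction stores
            self.entries.remove(victim)
            self.drained.append(victim["block"])
        self.entries.append({"block": block, "tx": self.in_tx, "closed": False})
        return "ok"

    def commit(self) -> None:
        self.in_tx = False
        for e in self.entries:
            e["tx"] = False      # write-back unblocked; post-transaction stores may gather

    def abort(self) -> None:
        self.in_tx = False
        self.entries = [e for e in self.entries if not e["tx"]]
```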
If a transaction aborts, the store cache is notified and all entries holding transactional data are invalidated. The store cache also has a mark per doubleword (8 bytes) indicating whether the entry was written by an NTSTG instruction; those doublewords stay valid across transaction aborts.
Millicode-implemented functions
Traditionally, IBM mainframe server processors contain a layer of firmware called millicode, which performs complex functions such as certain CISC instruction executions, interruption handling, system synchronization, and RAS. Millicode includes machine-dependent instructions as well as instructions of the instruction set architecture (ISA) that are fetched from memory and executed, similar to the instructions of application programs and the operating system (OS). The firmware resides in a restricted area of main memory that customer programs cannot access. When the hardware detects a situation that requires invoking millicode, the instruction fetch unit 204 switches into "millicode mode" and starts fetching at the appropriate location in the millicode memory area. Millicode may be fetched and executed in the same way as instructions of the ISA, and may include ISA instructions.
For transactional memory, millicode is involved in various complex situations. Every transaction abort invokes a dedicated millicode subroutine to perform the necessary abort steps. The transaction-abort millicode starts by reading special-purpose registers (SPRs) holding the hardware-internal abort reason, potential exception reasons, and the aborted instruction address, which millicode then uses to store a TDB if one is specified. The TBEGIN instruction text is loaded from an SPR to obtain the GR-save-mask, which millicode needs in order to know which GRs 228 to restore.
The CPU 114 (Fig. 2) supports a special millicode-only instruction to read out the backup GRs and copy them into the main GRs. The TBEGIN instruction address is also loaded from an SPR to set the new instruction address in the PSW, so that execution continues after the TBEGIN once the millicode abort subroutine finishes. That PSW may later be saved as the program-old PSW if the abort was caused by a non-filtered program interruption.
The TABORT instruction may be implemented in millicode; when the IDU 208 decodes TABORT, it instructs the instruction fetch unit to branch into TABORT's millicode, from which millicode branches into the common abort subroutine.
The Extract Transaction Nesting Depth (ETND) instruction may also be millicoded, since it is not performance-critical; millicode loads the current nesting depth from a special hardware register and places it into a GR 228. The PPA instruction is millicoded; it performs the optimal delay based on the current abort count, which software supplies to PPA as an operand, and also based on other hardware-internal state.
For constrained transactions, millicode may keep track of the number of aborts. The counter is reset to 0 on successful TEND completion, or if an interruption into the OS occurs (since it is not known whether or when the OS will return to the program). Depending on the current abort count, millicode can invoke certain mechanisms to improve the chance of success for the subsequent transaction retry. The mechanisms include, for example, successively increasing random delays between retries and reducing the amount of speculative execution, to avoid encountering aborts caused by speculative accesses to data that the transaction does not actually use. As a last resort, millicode can broadcast to other CPUs to stop all conflicting work and retry the local transaction before releasing the other CPUs to continue normal processing. Multiple CPUs must be coordinated so as not to cause deadlocks, so some serialization between millicode instances on different CPUs is required.
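The escalation sequence, which moves from random delays to reduced speculation and finally to stopping the other CPUs, can be written as a retry loop. This is only a Python sketch. The source names the mechanisms but not a schedule, so the thresholds (`escalate_after=6`, with speculation curbed from half that count), the delay formula, and the `attempt(limit_speculation, exclusive)` callback are all hypothetical. Delays are accumulated and returned rather than actually slept.

```python
import random
from typing import Callable, Optional, Tuple


def run_constrained(attempt: Callable[[bool, bool], bool], *,
                    escalate_after: int = 6, base_delay: int = 1,
                    rng: Optional[random.Random] = None) -> Tuple[int, int, bool]:
    """Retry a constrained transaction with escalating countermeasures.

    `attempt(limit_speculation, exclusive)` performs one try and returns True
    on success. The escalation schedule and parameters are hypothetical.
    Returns (attempts, total_delay, used_exclusive_mode).
    """
    rng = rng or random.Random()
    aborts = 0          # fresh count per call, mirroring the reset on successful TEND
    total_delay = 0
    while True:
        exclusive = aborts >= escalate_after           # last resort: stop other CPUs
        limit_spec = aborts >= escalate_after // 2     # curb speculative execution
        if attempt(limit_spec, exclusive):
            return aborts + 1, total_delay, exclusive
        aborts += 1
        # Successively larger random delay before the next retry.
        total_delay += rng.randint(0, base_delay * 2 ** min(aborts, 16))
```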
Referring now to Fig. 4, reference numeral 400 generally illustrates an exemplary embodiment of a method for adaptively sharing data, which may be implemented in hardware or in software.
In current implementations, two lock-based methods of synchronizing data access can generally be implemented. In locking, also referred to as data-structure locking or true locking, a program may want to be guaranteed exclusive access to a memory region, also referred to as shared data, within a critical section of code. In this case, the program can protect the shared data with a lock, which acts like a flag marking the shared data as unavailable to competing programs for the time being. However, such a locking mechanism controls access to the shared data strictly: in lightly contended memory regions, competing programs may wait unnecessarily, which degrades performance. For example, in the sample code below, while thread 1 holds the lock on the structure hash_tbl, thread 2 waits to execute, even though the two threads update different parts of the structure and could execute in parallel.
Table 5
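As a hypothetical Python sketch of the scenario just described (it is not the code of Table 5), a single coarse lock guards every bucket of a stand-in `hash_tbl`, so two threads that update different buckets still run one at a time. The bucket indices, the sleep that simulates work, and the `max_inside` instrumentation are illustrative assumptions.

```python
import threading
import time


class HashTable:
    """Hypothetical stand-in for hash_tbl: one lock guards all buckets."""

    def __init__(self, nbuckets=16):
        self.buckets = [0] * nbuckets
        self.lock = threading.Lock()  # single coarse-grained lock
        self._inside = 0              # threads currently in the critical section
        self.max_inside = 0           # highest concurrency observed

    def update(self, bucket, delta):
        with self.lock:               # every update serializes here
            self._inside += 1
            self.max_inside = max(self.max_inside, self._inside)
            time.sleep(0.01)          # simulate work inside the critical section
            self.buckets[bucket] += delta
            self._inside -= 1


hash_tbl = HashTable()
t1 = threading.Thread(target=hash_tbl.update, args=(1, 5))  # thread 1: bucket 1
t2 = threading.Thread(target=hash_tbl.update, args=(9, 7))  # thread 2: bucket 9
t1.start(); t2.start()
t1.join(); t2.join()
```

`max_inside` stays at 1 even though the two updates touch disjoint buckets; this unnecessary serialization is what lock elision aims to remove.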
The HLE described above allows programs written with traditional lock-based code to take advantage of hardware that implements transactional execution. However, in highly contended memory areas, if there is a conflict, the processor may abort the transaction and re-execute the critical section using pessimistic locking behavior. In one embodiment, any lock that crosses a cache line is not elided and automatically triggers re-execution without HLE. Consequently, for critical sections that are known to fail repeatedly as transactions, executing transactionally by default and then restarting with the lock can degrade performance.
At 410, when the processor, i.e., CPU 114 (Fig. 2), starts a code sequence that accesses a memory area, CPU 114 (Fig. 2) invokes a conflict predictor (i.e., an HLE predictor or hardware lock virtualizer), which may be implemented in hardware or in software, to predict whether lock elision is likely to succeed or locking should be used instead. As discussed below, the conflict predictor can operate in a variety of hardware and software environments. In embodiments where the conflict predictor performs conflict prediction in an HLE environment, it is also referred to as an HLE predictor. In one embodiment, a simple count of successful transactional executions is maintained, for example in a hardware register or in a memory location that is either per-thread or shared by all threads. When the count of successful transactional executions passes a threshold, the conflict predictor can predict at 410 that the transactional execution path, i.e., lock elision, is more efficient than the non-transactional (locking) path at 455, because interference is unlikely. In at least one embodiment, the counter is initialized so that it initially favors the more efficient execution path, which preferably corresponds to transactional execution based on lock elision. In another embodiment, the relative cost of executing as a transaction versus executing with a lock can be estimated, either in hardware or by instructions inserted into the program flow. Based on the computed relative cost, the conflict predictor can predict whether the transactional path or the non-transactional path is more efficient, for example because the predicted path has a lower execution cost or is less likely to encounter interference. In another embodiment, the compiler can implicitly insert behavior hints for the conflict predictor, so that at 410 it selects either the transactional execution path at 420 or the locking path at 455.
CPU 114 (Fig. 2) can begin executing the critical section as a transaction at 420, updating data as needed at 425. At the end of the transaction at 430, but before committing its results, CPU 114 (Fig. 2) determines at 435 whether interference that would cause the transaction to abort has been detected, i.e., two or more code sequences operating in parallel on the same data. If no interference is detected, the transaction can commit its results successfully at 440, and those results can then be used by other transactions. If, however, CPU 114 (Fig. 2) detects interference at 435, execution restarts using locking at 455. At 460, the critical section must explicitly acquire the lock that protects the memory area to be accessed. The lock requester may be forced to wait, in an action known as spinning, until the competing process releases the lock. When the lock is finally acquired at 460, the critical section can proceed. When the lock-protected data has been updated at 470, the critical section is complete and the lock can be released at 475.
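The Fig. 4 flow can be summarized as a hypothetical Python sketch in which the transactional attempt and the locked body are supplied as callables. `ConflictPredictor`, `run_critical_section`, the threshold value, and the rule that interference resets the success count are assumptions for illustration; the text itself only specifies a success count compared against a threshold.

```python
import threading


class ConflictPredictor:
    """Hypothetical model of the Fig. 4 conflict predictor: a simple count of
    successful transactional executions compared against a threshold."""

    def __init__(self, threshold=4):
        self.threshold = threshold
        # Start at the threshold so the (usually cheaper) transactional path
        # is preferred initially.
        self.successes = threshold

    def predict_transactional(self):
        return self.successes >= self.threshold


def run_critical_section(predictor, lock, try_transaction, locked_body):
    """410: consult the predictor.
    420-440: try_transaction() runs the critical section as a transaction and
             returns True if it committed (no interference detected at 435).
    455-475: otherwise acquire the lock (blocking stands in for spinning),
             run locked_body(), and release the lock."""
    if predictor.predict_transactional():
        if try_transaction():
            predictor.successes += 1
            return "transaction"
        # Assumption: interference resets the count, steering later calls
        # to the locking path.
        predictor.successes = 0
    with lock:
        locked_body()
    return "lock"
```

Because this sketch never raises the count while on the locking path, it would stay on that path indefinitely; a periodic reset such as the time window of Fig. 6 is one way to let the predictor retry elision.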
Referring to Fig. 5, reference numeral 500 generally illustrates an exemplary embodiment of implementing the conflict predictor (i.e., hardware lock virtualizer) in an environment that supports HLE. As described above, HLE is a legacy-compatible instruction set extension, comprising XACQUIRE and XRELEASE, that lets programs written with traditional lock-based code take advantage of hardware that implements transactional execution without substantially modifying the code. In this embodiment, the HLE predictor is a particular instance of the conflict predictor.
At 505, CPU 114 (Fig. 2) executes an XACQUIRE-prefixed instruction to begin an HLE sequence with a transaction that acquires the associated lock. In one embodiment, the sequence may be denoted by XACQUIRE followed by the lock-acquiring transaction. In some implementations, the XACQUIRE prefix may be ignored. In other implementations, the XACQUIRE sequence is optionally executed. After the HLE initiation sequence starts, the conflict predictor (i.e., the HLE predictor) is invoked at 510. Based on the prediction, the lock is either elided or acquired. Once the choice between eliding and acquiring the lock has been predicted, processing can continue substantially as described for 420 to 475 of Fig. 4.
Referring to Fig. 6, reference numeral 600 generally illustrates a flowchart of a method for adaptively sharing data by selecting between lock elision and locking, according to an exemplary embodiment that requires no additional hardware capabilities. In this exemplary embodiment, hints can be provided to the conflict predictor, for example by the operating system, in the code flow of the application program, or by hardware. For example, in one embodiment a programmer can explicitly insert one or more instructions, or a compiler can implicitly insert behavior hints, for the conflict predictor. The conflict predictor can maintain a history vector or counts that track both successful predictions and unsuccessful predictions (i.e., mispredictions) over some time period, such as 1 second. Then, at 610, the conflict predictor can compare the count of mispredictions during the time window with a threshold number of failures. When mispredictions during the time window exceed the threshold number of failures, the conflict predictor can default to executing with the lock (i.e., non-transactional mode) for the remainder of the time window. During such a window, the memory area may be highly contended because of workload characteristics in which multiple transactions concurrently update overlapping data. By temporarily making the lock the default, the conflict predictor avoids having to restart failed transactions and improves throughput by avoiding transaction aborts. Once the time window expires, however, contention for the memory area may have eased, and the conflict predictor can try transactional execution again. In one embodiment, the conflict predictor is implemented in software: an algorithm executed in software decides whether to elide or acquire the lock, and transfers control either to a first version of the code, which implements lock elision, or to a second version of the code, which acquires the lock. In other embodiments, the determination at 610 is made using alternative tests, for example based on a history of interference, in response to software indicating the particular items to be updated, or reflecting expected interference or non-interference associated with the fields that the updating transaction targets.
At 655, the critical section must explicitly acquire the lock that protects the memory area to be accessed. The lock requester may be forced to wait, in an action referred to as spinning, until the competing program releases the lock. When the lock is finally acquired at 660, the critical section can proceed. When the lock-protected data has been updated at 670, the critical section completes at 675 and the lock can be released. At 680, CPU 114 (Fig. 2) can check whether the time window has expired. If the time window has not expired, processing ends at 680. If the time window has expired, however, the counts of failed and successful transactional executions can be reset at 685, which effectively resets the time window and starts retraining the conflict predictor.
When mispredictions during the time window do not exceed the threshold number of failures, the conflict predictor may select lock elision at 610, i.e., an HLE transaction, or a transaction that implements lock elision by explicitly reading the lock word instead of acquiring the lock. When HLE transactional execution is selected (or a transaction that implements lock elision as a software transaction whose read set includes the lock word), CPU 114 (Fig. 2) can increment the count of successful transactional executions at 615. The HLE transaction at 620 can update data as needed at 625. At the end of the transaction at 630, but before committing its results, CPU 114 (Fig. 2) determines at 635 whether interference that would cause the transaction to abort has been detected (i.e., two or more code sequences operating in parallel on the same data). If no interference is detected, the HLE transaction (or another transaction implementing lock elision) can commit its results successfully at 640, and those results can then be used by other processes. If, however, CPU 114 (Fig. 2) detects interference at 635, the count of failed transactional executions is incremented at 650, because the failed transaction can be treated as a misprediction and used to train the conflict predictor to predict more accurately in the future. At 655 and 660, CPU 114 (Fig. 2) then attempts to lock the memory area and restarts the critical section non-transactionally (i.e., using the lock). When the lock-protected data has finally been updated at 670, processing of the critical section is complete and the lock can be released at 675. At 680, CPU 114 (Fig. 2) can check whether the time window has expired. If it has not, processing ends at 680. When the time window has expired, the counts of failed and successful transactional executions can be reset at 685, which effectively starts retraining the conflict predictor.
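One pass through the Fig. 6 flow can be traced as a hypothetical Python sketch. `Counters` and `fig6_critical_section` are illustrative names, and the HLE attempt, critical-section body, and window check are passed in as callables. Following the text, the success count is incremented at 615 before the transaction runs, and the window check at 680 happens only after the lock path.

```python
import threading


class Counters:
    """Hypothetical per-window counters for the Fig. 6 flow."""

    def __init__(self, fail_threshold):
        self.fail_threshold = fail_threshold
        self.successes = 0
        self.failures = 0


def fig6_critical_section(c, lock, try_hle, body, window_expired):
    """One pass through the Fig. 6 flow (a sketch, not the patented logic).

    try_hle()        -> True if the HLE transaction committed (no interference at 635)
    body()           -> the critical-section update performed under the lock
    window_expired() -> True if the time window has expired (checked at 680)
    """
    path = "lock"
    if c.failures <= c.fail_threshold:  # 610: elision allowed
        c.successes += 1                # 615: counted before the transaction runs
        if try_hle():                   # 620-640: update and commit
            return "hle"
        c.failures += 1                 # 650: treat the abort as a misprediction
        path = "restart"
    with lock:                          # 655-675: acquire (possibly spinning), update, release
        body()
    if window_expired():                # 680
        c.successes = c.failures = 0    # 685: start retraining
    return path
```

Because the success count is bumped at 615 before the outcome is known, an aborted HLE attempt is recorded in both counts; the decision at 610 depends only on the failure count, so this ordering does not affect the choice of path.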
Referring now to Fig. 7, reference numeral 700 generally illustrates a flowchart of an exemplary embodiment of a method for adaptively sharing data that may include a facility for monitoring locks in hardware during execution. In Fig. 7, the processing of HLE transactions (i.e., 710 to 750) is substantially similar to the processing of HLE transactions in the embodiment of Fig. 6 (i.e., 610 to 650). However, Fig. 7 introduces a hardware lock monitoring facility on the path where the critical section executes non-transactionally. In this embodiment, while the critical section is allowed to execute in the locked memory area, the hardware lock monitoring facility tries to minimize mispredictions by predicting the outcome as if the critical section were actually executing as an HLE transaction. Once the lock has been acquired successfully at 760 and 765, the hardware lock monitoring facility can begin monitoring the state of the lock at 770. The critical section updates the data in the locked memory area at 775 and completes execution by releasing the lock at 780. If, during execution, the hardware lock monitoring facility detects at 785 that another process is examining the state of the lock flag, then that access would have caused interference and transaction failure had the critical section executed as a transaction rather than non-transactionally. In one embodiment, only the lock is monitored. In another embodiment, the data updated as part of the locked region is monitored. In either case, the hardware lock monitoring facility can then increment the count of failed transactional executions at 790.
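The monitoring idea can be modeled in software as a hypothetical Python sketch. `MonitoredLock`, `probe`, and the owner ids are illustrative: explicit ids replace real threads so that the behavior is deterministic, whereas real hardware would watch accesses to the lock's cache line. In this model, any access by a non-owner while the lock is held and monitored counts as a would-be transactional failure.

```python
class MonitoredLock:
    """Hypothetical software model of the Fig. 7 hardware lock monitoring
    facility. Owners are explicit ids rather than real threads so the
    behavior is deterministic."""

    def __init__(self):
        self.owner = None
        self.monitoring = False
        self.failed_tx = 0  # count of failed (would-be) transactional executions

    def probe(self, who):
        """Another process examines the lock flag (785)."""
        if self.monitoring and who != self.owner:
            # 790: had the holder run as an HLE transaction, this access
            # would have caused interference, so count it as a failure.
            self.failed_tx += 1
        return self.owner is not None

    def acquire(self, who):
        """760/765: acquire the lock, then start monitoring it (770)."""
        if self.probe(who):
            return False  # held by someone else: caller spins and retries
        self.owner = who
        self.monitoring = True
        return True

    def release(self, who):
        """780: release the lock and stop monitoring."""
        if self.owner != who:
            raise RuntimeError("lock released by a non-owner")
        self.owner = None
        self.monitoring = False
```

A failed acquire attempt also reads the lock word, so this sketch counts it through `probe` as well; whether real hardware would count each such access separately is not specified in the text.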
In another embodiment, the hardware lock monitoring facility can monitor all attempted data accesses within the locked memory area. If another process attempts to access data in that region, the hardware lock monitoring facility can count the attempt at 790 as interference and a potential transaction failure. The conflict predictor can therefore learn to predict more accurately whether transactional or non-transactional execution is more likely to succeed.
In another embodiment, a restart flag can be set at 750 when the count of failed transactional executions is incremented. Then, when the count of successful transactional executions is incremented, the restart flag can be reset at 755. The restart flag can improve prediction accuracy by preventing the count of failed transactional executions from being incremented twice (once at 750, when the critical section fails as an HLE transaction, and again at 755, when it restarts using the lock).
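The double-counting problem and its fix can be shown with a hypothetical Python sketch. `RestartAwareCounters` and its method names are illustrative. The sketch assumes that the second, unwanted increment would come from the lock monitor flagging interference during the lock-based restart.

```python
class RestartAwareCounters:
    """Hypothetical model of the restart flag. An HLE abort (750) and the
    lock-based restart of the same critical section (755) should add one
    failure, not two."""

    def __init__(self):
        self.successes = 0
        self.failures = 0
        self.restart = False

    def on_hle_abort(self):
        self.failures += 1   # 750: count the failed HLE transaction
        self.restart = True  # remember that a lock-based restart follows

    def on_monitor_interference(self):
        # While restarting under the lock, the lock monitor may flag an
        # access it would otherwise count; skip it if already counted.
        if not self.restart:
            self.failures += 1

    def on_success(self):
        self.successes += 1
        self.restart = False  # reset the flag
```

With the flag set, an abort followed by monitored interference during the restart adds one failure instead of two; once a success resets the flag, later monitored interference is counted normally.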
Referring now to Fig. 8, in an embodiment in a hardware lock elision (HLE) environment, predictively determining whether an HLE transaction should actually acquire the lock and execute non-transactionally (810) includes: based on encountering an HLE lock-acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally (820); based on the HLE predictor predicting elision, setting the address of the lock in the read set of the HLE transaction, suppressing any write to the lock by the lock-acquire instruction, and proceeding in HLE transactional execution mode until an xrelease instruction is encountered or the HLE transaction encounters a transactional conflict (830), wherein the xrelease instruction releases the lock; and based on the HLE predictor predicting no elision, treating the HLE lock-acquire instruction as a non-HLE lock-acquire instruction and proceeding in non-transactional mode (840).
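The Fig. 8 decision at the lock-acquire instruction can be modeled as a hypothetical Python sketch, with memory represented as a dictionary of lock words. `Memory`, `HLETransaction`, `on_hle_lock_acquire`, `on_conflict`, and the `"busy"` result are illustrative names; real HLE tracks the read set in hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    words: dict = field(default_factory=dict)  # address -> value (0 = free)


@dataclass
class HLETransaction:
    read_set: set = field(default_factory=set)
    active: bool = True


def on_hle_lock_acquire(mem, lock_addr, predict_elide):
    """820-840 of Fig. 8 as a sketch. Returns ("hle", tx) when the lock is
    elided, or ("locked"/"busy", None) on the ordinary lock-acquire path."""
    if predict_elide:
        tx = HLETransaction()
        tx.read_set.add(lock_addr)  # 830: lock address joins the read set
        # The write to the lock word is suppressed: mem is left unchanged,
        # so other CPUs still see the lock as free.
        return "hle", tx
    if mem.words.get(lock_addr, 0) != 0:
        return "busy", None         # non-transactional path would spin here
    mem.words[lock_addr] = 1        # 840: ordinary (non-HLE) lock acquire
    return "locked", None


def on_conflict(tx, addr):
    # A write by another CPU to any address in the read set (for example,
    # a real acquire of the lock) is a transactional conflict: abort.
    if addr in tx.read_set:
        tx.active = False
```

Putting the lock word in the read set is what makes elision safe here: any CPU that actually acquires the lock writes that word, which aborts the elided transaction.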
Referring now to Fig. 9, in an embodiment, the HLE predictor is updated based on the success of its predictions for HLE transactions. Based on encountering an HLE transaction having a lock address for the first time, a count of successful HLE transaction executions associated with the lock address is initialized to zero; based on aborting any subsequent HLE transaction having the lock address, a count of failed HLE transaction executions associated with the lock address of the HLE transaction is incremented in the HLE predictor, wherein a high count indicates that aborts are likely (920). In non-transactional mode, attempts by another process to access the lock are monitored, and when an attempted access by another process is detected, the count of failed HLE transaction executions is incremented (950). The count of successful HLE transaction executions and the count of failed HLE transaction executions within a time window are tracked, and based on the count of failed HLE transaction executions exceeding a threshold number of failures, execution defaults to non-transactional mode for the remainder of the time window (970). Based on expiration of the time window, the count of successful HLE transaction executions and the count of failed HLE transaction executions are reset to zero (960).
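The per-lock-address bookkeeping of Fig. 9 can be sketched in Python as follows. `HLEPredictor` and its method names are hypothetical. The sketch also assumes that the threshold test at 970 is applied per lock address, which the text leaves open.

```python
class HLEPredictor:
    """Hypothetical per-lock-address HLE predictor following Fig. 9."""

    def __init__(self, fail_threshold):
        self.fail_threshold = fail_threshold
        self.success = {}  # lock address -> successful HLE executions
        self.failed = {}   # lock address -> failed HLE executions

    def on_first_encounter(self, addr):
        self.success.setdefault(addr, 0)  # initialized to zero
        self.failed.setdefault(addr, 0)

    def on_abort(self, addr):
        # 920: a high failure count indicates that aborts are likely.
        self.failed[addr] = self.failed.get(addr, 0) + 1

    def on_complete(self, addr):
        self.success[addr] = self.success.get(addr, 0) + 1

    def on_monitored_access(self, addr):
        # 950: another process touched the lock during non-transactional mode.
        self.failed[addr] = self.failed.get(addr, 0) + 1

    def predict_elide(self, addr):
        # 970: past the threshold, default to non-transactional mode until
        # the window expires.
        return self.failed.get(addr, 0) <= self.fail_threshold

    def on_window_expired(self):
        # 960: reset both counts for every tracked lock address.
        for counts in (self.success, self.failed):
            for addr in counts:
                counts[addr] = 0
```

Keying the counts by lock address lets a heavily contended lock fall back to locking without disabling elision for unrelated locks, such as `0x80` in a scenario where only `0x40` is contended.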
Referring now to Fig. 10, computing device 1000 may include respective sets of internal components 800 and external components 900. Each set of internal components 800 includes one or more processors 820; one or more computer-readable RAMs 822; one or more computer-readable ROMs 824 on one or more buses 826; one or more operating systems 828; one or more software applications that execute the methods of Figs. 5 to 7; and one or more computer-readable tangible storage devices 830. The one or more operating systems are stored on one or more of the respective computer-readable tangible storage devices 830 for execution by one or more of the respective processors 820 via one or more of the respective RAMs 822 (which typically include cache memory). In the embodiment illustrated in Fig. 10, each of the computer-readable tangible storage devices 830 is a magnetic disk storage device of an internal hard drive. Alternatively, each of the computer-readable tangible storage devices 830 is a semiconductor storage device such as ROM 824, EPROM, or flash memory, or any other computer-readable tangible storage device that can store a computer program and digital information.
Each set of internal components 800 also includes an R/W drive or interface 832 to read from and write to one or more computer-readable tangible storage devices 936, such as a thin provisioning storage device, CD-ROM, DVD, SSD, memory stick, magnetic tape, magnetic disk, optical disk, or semiconductor storage device. The R/W drive or interface 832 can be used to load device driver 840 firmware, software, or microcode to tangible storage device 936 to facilitate communication with components of computing device 1000.
Each set of internal components 800 may also include network adapters (or switch port cards) or interfaces 836, such as TCP/IP adapter cards, wireless Wi-Fi interface cards, 3G or 4G wireless interface cards, or other wired or wireless communication links. The operating system 828 associated with computing device 1000 can be downloaded to computing device 1000 from an external computer (for example, a server) via a network (for example, the Internet, a local area network, or a wide area network) and the respective network adapters or interfaces 836. From the network adapters (or switch port adapters) or interfaces 836, the operating system 828 associated with computing device 1000 is loaded into the respective hard drive 830 and network adapter 836. The network may comprise copper wires, optical fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
Each set of external components 900 can include a computer display monitor 920, a keyboard 930, and a computer mouse 934. External components 900 can also include touch screens, virtual keyboards, touch pads, pointing devices, and other human interface devices. Each set of internal components 800 also includes device drivers 840 that interface with computer display monitor 920, keyboard 930, and computer mouse 934. The device drivers 840, R/W drive or interface 832, and network adapter or interface 836 comprise hardware and software (stored in storage device 830 and/or ROM 824).
Various embodiments of the present disclosure may be implemented in a data processing system suitable for storing and/or executing program code that includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements include, for example, local memory employed during actual execution of the program code, bulk storage, and cache memory, which provides temporary storage of at least some program code to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including, but not limited to, keyboards, displays, pointing devices, DASD, tape, CDs, DVDs, thumb drives, and other storage media) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or to remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the available types of network adapters.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium (or media) having computer-readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer-readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer-readable program instructions described herein can be downloaded to respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
Computer-readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++, or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable storage medium having instructions stored therein comprises an article of manufacture including instructions that implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions that execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
Although preferred embodiments have been described in detail herein, it will be apparent to those skilled in the relevant art that various modifications, additions, and substitutions can be made without departing from the spirit of the disclosure, and these are therefore considered to be within the scope of the disclosure as defined in the following claims.
Claims (20)
1. A method in a hardware lock elision (HLE) environment for predictively determining whether an HLE transaction should actually acquire a lock and execute non-transactionally, the method comprising:
based on encountering an HLE lock-acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally;
comparing a count of failed HLE transaction executions with a threshold number of failures;
based on the count of failed HLE transaction executions not exceeding the threshold number of failures, proceeding in HLE transactional execution mode until an xrelease instruction is encountered or interference is detected, wherein the xrelease instruction releases the lock, and, in response to detecting interference, incrementing the count of failed HLE transaction executions; and
based on the count of failed HLE transaction executions exceeding the threshold number of failures, treating the HLE lock-acquire instruction as a non-HLE lock-acquire instruction and proceeding in non-transactional mode.
2. The method according to claim 1, further comprising:
updating the HLE predictor based on the success of predictions for HLE transactions, wherein the HLE predictor predicts whether an HLE transaction is likely to abort.
3. The method according to claim 1, further comprising:
based on encountering an HLE transaction having a lock address for the first time, initializing to zero a count of successful HLE transaction executions associated with the lock address;
based on aborting any subsequent HLE transaction having the lock address, incrementing, in the predictor, the count of failed HLE transaction executions associated with the lock address of the HLE transaction; and
based on completing any subsequent HLE transaction having the lock address, incrementing, in the HLE predictor, the count of successful HLE transaction executions associated with the lock address of the HLE transaction.
4. The method according to claim 1, further comprising:
monitoring, in non-transactional mode, attempts by another process to access the lock; and
based on detecting an attempted access by another process, incrementing the count of failed HLE transaction executions.
5. The method according to claim 1, further comprising:
tracking a count of successful HLE transaction executions and the count of failed HLE transaction executions within a time window;
comparing the count of failed HLE transaction executions during the time window with the threshold number of failures; and
based on the count of failed HLE transaction executions exceeding the threshold number of failures, defaulting to non-transactional mode for the remainder of the time window.
6. The method according to claim 5, further comprising:
based on expiration of the time window, resetting the count of successful HLE transaction executions and the count of failed HLE transaction executions to zero.
7. A computer readable storage medium, readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising:
based on encountering an HLE lock acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally;
comparing a count of failed HLE transaction executions with a failure threshold number;
based on the count of failed HLE transaction executions not exceeding the failure threshold number, proceeding in HLE transactional execution mode until an xrelease instruction is encountered or interference is detected, wherein the xrelease instruction releases the lock, and incrementing the count of failed HLE transaction executions in response to detecting interference; and
based on the count of failed HLE transaction executions exceeding the failure threshold number, treating the HLE lock acquire instruction as a non-HLE lock acquire instruction and proceeding in non-transactional mode.
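The decision flow of claim 7 for a single HLE lock acquire can be modelled as a function over a stream of events. This is a hedged sketch, not the claimed hardware. It folds the predictor step into the failure-count comparison and represents the critical section as hypothetical event strings ("work", "interference", "xrelease"). The claim does not say how an aborted section is re-executed, so the function only reports the outcome.

```python
from typing import Iterable, Tuple


def execute_hle_region(
    events: Iterable[str], failures: int, max_failures: int
) -> Tuple[str, int]:
    """Hypothetical model of claim 7 for one HLE lock-acquire instruction.

    Returns (outcome, updated failure count).
    """
    if failures > max_failures:
        # Too many failures: treat the HLE acquire as a plain lock acquire
        # and run the critical section non-transactionally.
        return "non_transactional", failures
    for event in events:
        if event == "interference":
            # A detected conflict ends the transaction; the failure count grows.
            return "aborted", failures + 1
        if event == "xrelease":
            # The xrelease instruction releases the elided lock and ends the
            # transaction successfully.
            return "committed", failures
    raise ValueError("critical section ended without an xrelease")


if __name__ == "__main__":
    print(execute_hle_region(["work", "xrelease"], failures=0, max_failures=3))
    # ('committed', 0)
```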
8. The computer readable storage medium according to claim 7, wherein the method performed by the processing circuit executing the instructions further comprises:
updating the HLE predictor based on the success of its predictions for HLE transactions, wherein the HLE predictor predicts whether the HLE is likely to abort.
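Claim 8 does not specify how the predictor is structured. One familiar way to train a predictor on its own hits and misses is the 2-bit saturating counter borrowed from branch prediction. The sketch below swaps in that technique per lock address purely as an illustrative assumption. `AbortPredictor` and its methods are hypothetical.

```python
from typing import Dict


class AbortPredictor:
    """Hypothetical per-lock 2-bit saturating counter (an assumed design).

    Counter values 0-1 predict "commit"; values 2-3 predict "abort".
    """

    def __init__(self) -> None:
        self.counters: Dict[int, int] = {}

    def predicts_abort(self, lock_addr: int) -> bool:
        return self.counters.get(lock_addr, 0) >= 2

    def update(self, lock_addr: int, aborted: bool) -> None:
        # Nudge the counter toward the observed outcome, saturating at 0 and 3,
        # so one mispredicted run does not flip a strongly held prediction.
        count = self.counters.get(lock_addr, 0)
        self.counters[lock_addr] = min(count + 1, 3) if aborted else max(count - 1, 0)


if __name__ == "__main__":
    predictor = AbortPredictor()
    for outcome in (True, True, True, False):
        predictor.update(0x10, aborted=outcome)
    print(predictor.predicts_abort(0x10))  # True: one commit after three aborts
```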
9. The computer readable storage medium according to claim 7, wherein the method performed by the processing circuit executing the instructions further comprises:
monitoring, in the non-transactional mode, attempts by another process to access the lock; and
based on detecting an access attempt by the other process, incrementing the count of failed HLE transaction executions.
10. The computer readable storage medium according to claim 7, wherein the method performed by the processing circuit executing the instructions further comprises:
monitoring, in the non-transactional mode, attempts by another process to access the memory region protected by the lock; and
based on detecting an access attempt by the other process, incrementing the count of failed HLE transaction executions.
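Claim 10 widens the monitored area from the lock word to the data the lock protects. This is a minimal sketch, assuming the protected region can be described as one contiguous half-open address range and that accesses are reported with a process identifier. `ProtectedRegionMonitor` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProtectedRegionMonitor:
    """Hypothetical model of claim 10: [base, base + size) is the memory
    guarded by the lock, held non-transactionally by process `owner_id`."""
    base: int
    size: int
    owner_id: int
    failures: int = 0

    def covers(self, addr: int) -> bool:
        # Half-open range check: base is inside, base + size is not.
        return self.base <= addr < self.base + self.size

    def observe_access(self, requester_id: int, addr: int) -> None:
        # A foreign access to data the lock protects is booked as a failed
        # HLE execution, like contention on the lock word itself.
        if requester_id != self.owner_id and self.covers(addr):
            self.failures += 1


if __name__ == "__main__":
    monitor = ProtectedRegionMonitor(base=0x1000, size=0x40, owner_id=7)
    for requester, addr in ((8, 0x1000), (8, 0x103F), (8, 0x1040), (7, 0x1010)):
        monitor.observe_access(requester, addr)
    print(monitor.failures)  # 2
```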
11. The computer readable storage medium according to claim 7, wherein the method performed by the processing circuit executing the instructions further comprises:
based on encountering an HLE transaction having a lock address for the first time, initializing the counts associated with the lock address to zero;
based on any subsequent HLE transaction having the lock address aborting, incrementing, in the predictor, the count of failed HLE transaction executions associated with the lock address of the HLE transaction; and
based on any subsequent HLE transaction having the lock address completing, incrementing, in the predictor, the count of successful HLE transaction executions associated with the lock address of the HLE transaction.
12. The computer readable storage medium according to claim 7, wherein the method performed by the processing circuit executing the instructions further comprises:
tracking a count of successful HLE transaction executions and a count of failed HLE transaction executions within a time window;
comparing the count of failed HLE transaction executions during the time window with the failure threshold number; and
based on the count of failed HLE transaction executions exceeding the failure threshold number, defaulting to non-transactional mode for the remainder of the time window.
13. The computer readable storage medium according to claim 12, wherein the method performed by the processing circuit executing the instructions further comprises:
based on the time window expiring, resetting the count of successful HLE transaction executions and the count of failed HLE transaction executions to zero.
14. A computer system in a hardware lock elision (HLE) environment, the computer system for predictively determining whether an HLE transaction should actually acquire the lock and execute non-transactionally, the computer system comprising:
a memory; and
a processor in communication with the memory, wherein the computer system is configured to perform a method, the method comprising:
based on encountering an HLE lock acquire instruction, determining, based on an HLE predictor, whether to elide the lock and proceed as an HLE transaction or to acquire the lock and proceed non-transactionally;
comparing a count of failed HLE transaction executions with a failure threshold number;
based on the count of failed HLE transaction executions not exceeding the failure threshold number, proceeding in HLE transactional execution mode until an xrelease instruction is encountered or interference is detected, wherein the xrelease instruction releases the lock, and incrementing the count of failed HLE transaction executions in response to detecting interference; and
based on the count of failed HLE transaction executions exceeding the failure threshold number, treating the HLE lock acquire instruction as a non-HLE lock acquire instruction and proceeding in non-transactional mode.
15. The computer system according to claim 14, wherein the method performed by the computer system further comprises:
updating the HLE predictor based on the success of its predictions for HLE transactions, wherein the HLE predictor predicts whether the HLE is likely to abort.
16. The computer system according to claim 14, wherein the method performed by the computer system further comprises:
monitoring, in the non-transactional mode, attempts by another process to access the lock; and
based on detecting an access attempt by the other process, incrementing the count of failed HLE transaction executions.
17. The computer system according to claim 14, wherein the method performed by the computer system further comprises:
monitoring, in the non-transactional mode, attempts by another process to access the memory region protected by the lock; and
based on detecting an access attempt by the other process, incrementing the count of failed HLE transaction executions.
18. The computer system according to claim 14, wherein the method performed by the computer system further comprises:
based on encountering an HLE transaction having a lock address for the first time, initializing the counts associated with the lock address to zero;
based on any subsequent HLE transaction having the lock address aborting, incrementing, in the predictor, the count of failed HLE transaction executions associated with the lock address of the HLE transaction; and
based on any subsequent HLE transaction having the lock address completing, incrementing, in the predictor, the count of successful HLE transaction executions associated with the lock address of the HLE transaction.
19. The computer system according to claim 14, wherein the method performed by the computer system further comprises:
tracking a count of successful HLE transaction executions and a count of failed HLE transaction executions within a time window;
comparing the count of failed HLE transaction executions during the time window with the failure threshold number; and
based on the count of failed HLE transaction executions exceeding the failure threshold number, defaulting to non-transactional mode for the remainder of the time window.
20. The computer system according to claim 19, wherein the method performed by the computer system further comprises:
based on the time window expiring, resetting the count of successful HLE transaction executions and the count of failed HLE transaction executions to zero.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201314052960A | 2013-10-14 | 2013-10-14 | |
US14/052,960 | 2013-10-14 | ||
US14/191,581 | 2014-02-27 | ||
US14/191,581 US9524195B2 (en) | 2014-02-27 | 2014-02-27 | Adaptive process for data sharing with selection of lock elision and locking |
PCT/CN2014/087692 WO2015055083A1 (en) | 2013-10-14 | 2014-09-28 | Adaptive process for data sharing with selection of lock elision and locking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105683906A (en) | 2016-06-15 |
CN105683906B (en) | 2018-11-23 |
Family ID: 52827651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480053800.8A (Active, CN105683906B) | Adaptive process for data sharing with selection of lock elision and locking | 2013-10-14 | 2014-09-28 |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP6642806B2 (en) |
CN (1) | CN105683906B (en) |
WO (1) | WO2015055083A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6468053B2 (en) * | 2015-04-28 | 2019-02-13 | Fujitsu Limited | Information processing apparatus, parallel processing program, and shared memory access method |
JP6895719B2 (en) * | 2016-06-24 | 2021-06-30 | Hitachi Astemo, Ltd. | Vehicle control device |
US11868818B2 (en) * | 2016-09-22 | 2024-01-09 | Advanced Micro Devices, Inc. | Lock address contention predictor |
JP6943030B2 (en) | 2017-06-16 | 2021-09-29 | Fujitsu Limited | Information processing equipment, information processing methods and programs |
EP3462308B1 (en) * | 2017-09-29 | 2022-03-02 | ARM Limited | Transaction nesting depth testing instruction |
JP6839126B2 (en) * | 2018-04-12 | 2021-03-03 | Nippon Telegraph and Telephone Corporation | Control processing device, control processing method and control processing program |
US10860388B1 (en) * | 2019-07-09 | 2020-12-08 | Micron Technology, Inc. | Lock management for memory subsystems |
WO2021026938A1 (en) * | 2019-08-15 | 2021-02-18 | Qi An Xin Security Technology (Zhuhai) Co., Ltd. | Shellcode detection method and apparatus |
CN110781016B (en) * | 2019-10-30 | 2021-04-23 | Alipay (Hangzhou) Information Technology Co., Ltd. | Data processing method, device, equipment and medium |
CN112199391B (en) * | 2020-09-30 | 2024-02-23 | Shenzhen Qianhai WeBank Co., Ltd. | Data locking detection method, equipment and computer readable storage medium |
CN114791899A (en) * | 2021-01-25 | 2022-07-26 | Huawei Technologies Co., Ltd. | Database management method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102103485A (en) * | 2009-12-16 | 2011-06-22 | Intel Corporation | Two-stage commit (TSC) region for dynamic binary optimization in X86 |
CN102722418A (en) * | 2007-11-07 | 2012-10-10 | Intel Corporation | Late lock acquire mechanism for hardware lock elision (hle) |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7529914B2 (en) * | 2004-06-30 | 2009-05-05 | Intel Corporation | Method and apparatus for speculative execution of uncontended lock instructions |
US8190859B2 (en) * | 2006-11-13 | 2012-05-29 | Intel Corporation | Critical section detection and prediction mechanism for hardware lock elision |
US8914620B2 (en) * | 2008-12-29 | 2014-12-16 | Oracle America, Inc. | Method and system for reducing abort rates in speculative lock elision using contention management mechanisms |
US8244988B2 (en) * | 2009-04-30 | 2012-08-14 | International Business Machines Corporation | Predictive ownership control of shared memory computing system data |
US20130159653A1 (en) * | 2011-12-20 | 2013-06-20 | Martin T. Pohlack | Predictive Lock Elision |
2014
- 2014-09-28: JP2016521660A (JP6642806B2), Active
- 2014-09-28: CN201480053800.8A (CN105683906B), Active
- 2014-09-28: PCT/CN2014/087692 (WO2015055083A1), Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP6642806B2 (en) | 2020-02-12 |
JP2016537709A (en) | 2016-12-01 |
CN105683906A (en) | 2016-06-15 |
WO2015055083A1 (en) | 2015-04-23 |
Similar Documents
Publication | Title |
---|---|
CN105683906B (en) | Adaptive process for data sharing with selection of lock elision and locking |
US10585697B2 (en) | Dynamic prediction of hardware transaction resource requirements |
CN106030534B (en) | Method and system for saving partially executed hardware transactions |
US9361031B2 (en) | Software indications and hints for coalescing memory transactions |
US9430276B2 (en) | Coalescing memory transactions |
US9262207B2 (en) | Using the transaction-begin instruction to manage transactional aborts in transactional memory computing environments |
US9619383B2 (en) | Dynamic predictor for coalescing memory transactions |
CN106133705B (en) | Method and system for coherence protocol enhancement indicating transaction status |
US9690556B2 (en) | Code optimization to enable and disable coalescing of memory transactions |
US9864690B2 (en) | Detecting cache conflicts by utilizing logical address comparisons in a transactional memory |
US10235201B2 (en) | Dynamic releasing of cache lines |
US9852014B2 (en) | Deferral instruction for managing transactional aborts in transactional memory computing environments |
US9442776B2 (en) | Salvaging hardware transactions with instructions to transfer transaction execution control |
US10876228B2 (en) | Enabling end of transaction detection using speculative look ahead |
US20160357595A1 (en) | Alerting hardware transactions that are about to run out of space |
US10996982B2 (en) | Regulating hardware speculative processing around a transaction |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |