CN102460376B - Optimizations for an Unbounded Transactional Memory (UTM) System - Google Patents

Publication number
CN102460376B
Authority
CN
China
Prior art keywords
address
metadata
data
abstract
transaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200980160097.XA
Other languages
Chinese (zh)
Other versions
CN102460376A (en
Inventor
G·谢弗
J·格雷
B·史密斯
A-R·阿德-塔巴塔巴伊
R·杰瓦
V·巴辛
D·卡拉汉
Y·倪
B·萨哈
M·泰列费尔
S·赖金
K·山田
L·王
A·基尚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of CN102460376A
Application granted
Publication of CN102460376B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1045 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1036 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/109 Address translation for multiple virtual address spaces, e.g. segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30087 Synchronisation or serialisation instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30185 Instruction operation extension or modification according to one or more bits in the instruction, e.g. prefix, sub-opcode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30189 Instruction operation extension or modification according to execution mode, e.g. mode flag
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824 Operand accessing
    • G06F9/3834 Maintaining memory consistency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858 Result writeback, i.e. updating the architectural state or memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • G06F9/528 Mutual exclusion algorithms by using speculative mechanisms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/40 Specific encoding of data in memory or cache
    • G06F2212/401 Compressed data

Abstract

A method and apparatus for optimizing an unbounded transactional memory (UTM) system are described herein. Hardware support for monitors, buffering, and metadata is provided, where orthogonal abstract address spaces for metadata may be separately associated with threads and/or with software subsystems within threads. In addition, hardware may hold metadata in a compressed manner with respect to data, transparently to software. Furthermore, in response to metadata access instructions/operations, hardware can support a forced metadata value to enable multiple modes of transactional execution. However, if a loss of or conflict on monitors, buffered data, metadata, or other information is detected, hardware provides variations of a loss instruction, which can poll a transaction status register for the loss or conflict and, in response to detecting it, jump execution to a label. Similarly, multiple variations of a commit instruction are provided to allow software to define commit conditions and the information to be cleared upon commit. Moreover, hardware provides support for suspending and resuming transactions upon ring level transitions.

Description

Optimizations for an Unbounded Transactional Memory (UTM) System
Technical Field
This invention relates to the field of processor execution and, in particular, to the execution of groups of instructions.
Background
Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple cores and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.
The ever-increasing number of cores and logical processors on integrated circuits enables more software threads to be executed concurrently. However, the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among the software threads. One common solution for accessing shared data in multiple-core or multiple-logical-processor systems is to use locks to guarantee mutual exclusion across multiple accesses to shared data. However, the ever-increasing ability to execute multiple software threads potentially results in false contention and a serialization of execution.
For example, consider a hash table holding shared data. With a lock system, a programmer may lock the entire hash table, allowing one thread to access the entire hash table. However, the throughput and performance of other threads are potentially adversely affected, since they cannot access any entry in the hash table until the lock is released. Alternatively, each entry in the hash table may be locked. Either way, after extrapolating this simple example to a large scalable program, it is apparent that the complexity of lock contention, serialization, fine-grain synchronization, and deadlock avoidance becomes an extremely cumbersome burden for programmers.
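The two locking strategies in the hash table example can be sketched as follows. This is a minimal illustration, not code from the patent: the class names `CoarseLockedTable` and `StripedLockedTable` are hypothetical, and the second class locks per bucket rather than per entry, which is an assumed simplification of "locking each entry".

```python
import threading


class CoarseLockedTable:
    """One lock guards the whole table, so every access is serialized."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def put(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)


class StripedLockedTable:
    """One lock per bucket (assumed granularity): threads that touch
    different buckets do not contend with each other."""

    def __init__(self, buckets=16):
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._buckets = [{} for _ in range(buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)
```

The finer-grained table reduces contention but multiplies the number of locks the programmer must reason about, which is the burden the passage describes.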
Another recent data synchronization technique is the use of transactional memory (TM). Transactional execution often includes executing a grouping of multiple micro-operations, operations, or instructions. In the example above, two threads both execute within the hash table, and their memory accesses are monitored/tracked. If both threads access/alter the same entry, conflict resolution may be performed to ensure data validity. One type of transactional execution is Software Transactional Memory (STM), where tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, often without hardware support.
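The access tracking and conflict detection described above might look like the following sketch. The `AccessLog` class and the `conflicts` function are hypothetical names, and the rule used here (a conflict exists when one transaction writes an entry the other reads or writes) is a common convention assumed for illustration rather than a rule stated in the text.

```python
class AccessLog:
    """Per-transaction record of which entries were read and written
    (hypothetical structure for illustration)."""

    def __init__(self, name):
        self.name = name
        self.reads = set()
        self.writes = set()

    def read(self, table, key):
        self.reads.add(key)       # track the read
        return table.get(key)

    def write(self, key):
        self.writes.add(key)      # track the write; the value is ignored here


def conflicts(a, b):
    """Assumed rule: two transactions conflict when one writes an entry
    that the other has read or written."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)
```

Two transactions that only read the same entry do not conflict under this rule, while a read in one and a write in the other to the same entry do.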
Another type of transactional execution is a Hardware Transactional Memory (HTM) system, where hardware is included to support access tracking, conflict resolution, and other transactional tasks. Previously, physical memory data arrays were extended with additional bits to hold information, such as hardware attributes for tracking reads, writes, and buffering, so that this information travels with the data from the processor to memory. Because the information travels with the data throughout the memory hierarchy, it is often referred to as persistent, i.e. it is not lost upon a cache eviction. Yet this persistency imposes more overhead throughout the memory hierarchy.
In addition, previous hardware transactional memory (HTM) systems have been inefficient in a number of respects. As a first example, these HTMs provide no efficient method for transitioning from unbuffered, or buffered and unmonitored, states to a buffered and monitored state to ensure consistency before a transaction commits. As another example, the HTM interface to software has multiple inefficiencies. Specifically, hardware does not provide mechanisms to properly accelerate software memory access barriers that take into account the different forms of strong and weak atomicity between transactional and non-transactional operations. Additionally, hardware does not provide any help in determining, during an attempt to commit a transaction, whether the transaction should abort or commit based on a loss of monitoring, buffering, and/or other attribute information. Similarly, the instruction sets of these previous HTMs do not provide commit instructions that define the information to be held or cleared upon commit of a transaction. Other exemplary inefficiencies include the lack of instructions in HTMs for efficiently vectoring or jumping execution upon detecting a conflict or loss of information, and the inability of current HTMs to handle ring level priority transitions during execution of a transaction.
Brief Description of the Drawings
The present invention is illustrated by way of example and is not limited by the figures of the accompanying drawings.
Fig. 1 illustrates an embodiment of a processor including multiple processing elements capable of executing multiple software threads concurrently.
Fig. 2 illustrates an embodiment of associating metadata with a data item.
Fig. 3 illustrates an embodiment of multiple orthogonal abstract address spaces for separate software subsystems within multiple processing elements.
Fig. 4 illustrates an embodiment of compression of metadata to data.
Fig. 5 illustrates an embodiment of a flow diagram for a method of accessing metadata.
Fig. 6 illustrates an embodiment of a metadata storage element that supports acceleration of transactions in strong and weak atomicity environments.
Fig. 7 illustrates an embodiment of a flow diagram for accelerating non-transactional operations while maintaining atomicity in a transactional environment.
Fig. 8 illustrates an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before commit of a transaction.
Fig. 9 illustrates an embodiment of hardware that supports a loss instruction to jump to a destination label based on a status value in a transaction status register.
Fig. 10 illustrates an embodiment of a flow diagram for a method of executing a loss instruction to jump to a destination label based on a conflict or a loss of specific information.
Fig. 11 illustrates an embodiment of hardware that supports defining commit conditions and clear controls in a commit instruction.
Fig. 12 illustrates an embodiment of a flow diagram for a method of executing a commit instruction that defines commit conditions and clear controls.
Fig. 13 illustrates an embodiment of hardware that supports handling privilege level transitions during execution of transactions.
Detailed Description
In the following description, numerous specific details are set forth, such as examples of specific hardware structures for transactional execution, specific types and implementations of access monitors, specific types of cache coherency models for detecting access conflicts, specific data granularities, and specific types of memory accesses and locations, in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that these specific details need not be employed to practice the present invention. In other instances, well-known components or methods, such as coding of transactions in software, insertion of operations by a compiler to perform the recited functions, demarcation of transactions, specific and alternative multi-core and multi-threaded processor architectures, specific compiler methods/implementations, and specific operational details of microprocessors, have not been described in detail in order to avoid unnecessarily obscuring the present invention.
The method and apparatus described herein are for optimizing hardware and software for unbounded transactional memory (UTM) systems. Specifically, the optimizations are primarily discussed with reference to supporting a UTM system. However, the methods and apparatus described herein are not so limited; they may be used in any form of transactional memory system that differs in implementation from a UTM system, such as in hardware that supports or accelerates a Software Transactional Memory (STM) system, in a purely Hardware Transactional Memory (HTM) system, or in a hybrid of the two.
Referring to Fig. 1, an embodiment of a processor capable of executing multiple threads concurrently is illustrated. Note that processor 100 may include hardware support for hardware transactional execution. Either in conjunction with hardware transactional execution or separately, processor 100 may also provide hardware support for hardware acceleration of a Software Transactional Memory (STM), for separate execution of an STM, or for a combination thereof, such as a hybrid transactional memory (TM) system. Processor 100 includes any processor, such as a microprocessor, an embedded processor, a digital signal processor (DSP), a network processor, or another device that executes code. Processor 100, as illustrated, includes a plurality of processing elements.
In one embodiment, a processing element refers to a thread unit, a process unit, a context, a logical processor, a hardware thread, a core, and/or any other element capable of holding a state for a processor, such as an execution state or architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.
A core often refers to logic located on an integrated circuit capable of maintaining an independent architectural state, where each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to logic located on an integrated circuit capable of maintaining an independent architectural state, where the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the line between the nomenclature of a hardware thread and a core overlaps. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to schedule operations on each logical processor individually.
Physical processor 100, as illustrated in Fig. 1, includes two cores, core 101 and core 102, which share access to higher-level cache 110. Although processor 100 may include asymmetric cores, i.e. cores with different configurations, functional units, and/or logic, symmetric cores are illustrated. As a result, core 102, which is illustrated as identical to core 101, is not discussed in detail, to avoid repetitive discussion. In addition, core 101 includes two hardware threads 101a and 101b, while core 102 includes two hardware threads 102a and 102b. Therefore, software entities, such as an operating system, potentially view processor 100 as four separate processors, i.e. four logical processors or processing elements capable of executing four software threads concurrently.
Here, a first thread is associated with architecture state registers 101a, a second thread is associated with architecture state registers 101b, a third thread is associated with architecture state registers 102a, and a fourth thread is associated with architecture state registers 102b. As illustrated, architecture state registers 101a are replicated in architecture state registers 101b, so individual architecture states/contexts can be stored for logical processor 101a and logical processor 101b. Other smaller resources, such as instruction pointers and the renaming logic in rename allocator logic 130, may also be replicated for threads 101a and 101b. Some resources, such as the re-order buffers in reorder/retirement unit 135, I-TLB 120, load/store buffers, and queues, may be shared through partitioning. Other resources, such as general-purpose internal registers, the page-table base register, the low-level data cache and data TLB 115, execution unit(s) 140, and portions of out-of-order unit 135, are potentially fully shared.
Processor 100 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements. Fig. 1 illustrates an embodiment of a purely exemplary processor with illustrative functional units/resources. Note that a processor may include or omit any of these functional units, and may include any other known functional units, logic, or firmware not depicted.
As illustrated, processor 100 includes bus interface module 105 to communicate with devices external to processor 100, such as system memory 175, a chipset, a northbridge, or another integrated circuit. Memory 175 may be dedicated to processor 100 or shared with other devices in a system. Higher-level or further-out cache 110 caches recently fetched elements. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution unit(s). In one embodiment, higher-level cache 110 is a second-level data cache. However, higher-level cache 110 is not so limited, as it may be associated with or include an instruction cache. A trace cache, i.e. a type of instruction cache, may instead be coupled after decoder 125 to store recently decoded traces. Module 120 also potentially includes a branch target buffer to predict branches to be executed/taken, and an instruction translation buffer (I-TLB) to store address translation entries for instructions.
Decode module 125 is coupled to fetch unit 120 to decode fetched elements. In one embodiment, processor 100 is associated with an Instruction Set Architecture (ISA), which defines/specifies the instructions executable on processor 100. Here, machine code instructions recognized by the ISA often include a portion of the instruction referred to as an opcode, which references/specifies an instruction or operation to be performed.
In one example, allocator and renamer block 130 includes an allocator to reserve resources, such as register files, to store instruction processing results. However, threads 101a and 101b are potentially capable of out-of-order execution, in which case allocator and renamer block 130 also reserves other resources, such as reorder buffers, to track instruction results. Unit 130 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 100. Reorder/retirement unit 135 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and the later in-order retirement of instructions executed out of order.
Scheduler and execution unit(s) block 140, in one embodiment, includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating point instruction is scheduled on a port of an execution unit that has an available floating point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.
Lower-level data cache and data translation buffer (D-TLB) 150 are coupled to execution unit(s) 140. The data cache stores recently used/operated-on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB stores recent virtual/linear to physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.
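As a rough software model of what a D-TLB does, the sketch below caches recent virtual-to-physical page translations in front of a page table. Everything here is assumed for illustration: the `TinyTLB` class, the 4096-byte page size, the flat dictionary page table, and the least-recently-used replacement policy are not taken from the patent.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class TinyTLB:
    """Toy translation buffer (hypothetical): caches recent
    virtual-page -> physical-frame translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table   # virtual page number -> physical frame number
        self.capacity = capacity
        self.entries = OrderedDict()   # most recently used entry is last
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)             # refresh recency
        else:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]  # stand-in for a page walk
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)      # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset
```

Repeated accesses to the same page hit in the buffer and skip the page table lookup, which is the point of keeping recent translations close to the execution units.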
In one embodiment, processor 100 is capable of hardware transactional execution, software transactional execution, or a combination or hybrid thereof. A transaction, which may also be referred to as a critical or atomic section of code, includes a grouping of instructions, operations, or micro-operations to be executed as an atomic group. For example, instructions or operations may be used to demarcate a transaction or a critical section. In one embodiment, described in more detail below, these instructions are part of a set of instructions, such as an Instruction Set Architecture (ISA), that are recognizable by hardware of processor 100, such as the decoders described above. Often, once compiled from a high-level language to hardware-recognizable assembly language, these instructions include operation codes (opcodes), or other portions of the instructions, that decoders recognize during a decode stage.
Typically, during execution of a transaction, updates to memory are not made globally visible until the transaction is committed. As an example, a transactional write to a location is potentially visible to the local thread, yet in response to a read from another thread, the write data is not forwarded until the transaction including the transactional write is committed. While the transaction is still pending, data items/elements loaded from and written to memory are tracked, as discussed in more detail below. Once the transaction reaches a commit point, if no conflicts have been detected for the transaction, the transaction is committed and the updates made during the transaction are made globally visible.
However, if the transaction is invalidated while it is pending, the transaction is aborted and potentially restarted without making its updates globally visible. As a result, pendency of a transaction, as used herein, refers to a transaction that has begun execution and has not yet been committed or aborted, i.e. is pending.
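The commit and abort behavior described above can be modeled with a private write buffer and per-location version numbers. This is a single-threaded sketch under stated assumptions: the `Memory` and `Transaction` classes are hypothetical, version counters stand in for whatever conflict detection a real system uses, and nothing here reflects the hardware mechanisms of the patent.

```python
class Memory:
    """Globally visible state plus a version counter per location (assumed)."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def load(self, addr):
        return self.values.get(addr, 0), self.versions.get(addr, 0)


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_versions = {}   # addr -> version seen at first read
        self.write_buffer = {}    # tentative writes, private to this transaction
        self.status = "pending"

    def read(self, addr):
        if addr in self.write_buffer:          # the local thread sees its own writes
            return self.write_buffer[addr]
        value, version = self.memory.load(addr)
        self.read_versions.setdefault(addr, version)
        return value

    def write(self, addr, value):
        self.write_buffer[addr] = value        # not globally visible yet

    def commit(self):
        # Validate: abort if any location read has changed since it was read.
        for addr, seen in self.read_versions.items():
            if self.memory.versions.get(addr, 0) != seen:
                self.abort()
                return False
        # No conflict detected: make the updates globally visible.
        for addr, value in self.write_buffer.items():
            self.memory.values[addr] = value
            self.memory.versions[addr] = self.memory.versions.get(addr, 0) + 1
        self.status = "committed"
        return True

    def abort(self):
        self.write_buffer.clear()              # tentative updates are discarded
        self.status = "aborted"
```

A transaction that read a location before another transaction committed a write to it fails validation and aborts, and none of its buffered writes ever reach `Memory`.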
Software transactional memory (STM) system typically refers in software or at least partially in softwareCarry out access track, conflict solution, or other transaction memory tasks. In one embodiment, locateReason device 100 can be carried out compiler and carry out compiler code, to support affairs to carry out. Here compiling,Device can update, call, function and other codes, to make it possible to carry out affairs.
Compiler generally includes program or procedure set, for source text/code is converted to target text/Code. Conventionally the compiling of, with compiler, programs/applications program code being carried out is with multiple stages and multipassRealize, high-level programming language code is converted to rudimentary machine or assembler language code. But,Still can be by single pass compiler for simple compiling. Compiler can use any known technique of compiling,And carry out any known compiler operation, and for example, morphological analysis, pretreatment, syntactic analysis, semantemeAnalysis, code generation, code conversion and code optimization.
Although larger compiler generally includes multiple stages, these stages are modal is to compriseIn two generic phase: (1) front end, generally can carry out syntax treatment, semanteme that is thereinProcess and some conversion/optimizations; And (2) rear end, that is, generally analyze therein, change,Optimize and code generation. Some compilers relate to middle-end, between the front-end and back-end of its explanation compilerDefine fuzzy. Therefore, the insertion of related compiler, association, generation or other operations are passableIn any other known multiple stages of aforementioned multiple stages or multipass and compiler or multipassIn any one, carry out. As illustrated examples, compiler can be by transaction operation, call, functionDeng inserting in the one or more stages of compiling, for example, in the front-end phase of compiling, insert and call/graspDo, then during transaction memory translate phase, will call/operate and convert low level code to.
Nevertheless, regardless of the execution environment and the dynamic or static nature of a compiler, the compiler, in one embodiment, compiles program code to enable transactional execution. Therefore, a reference to execution of program code, in one embodiment, refers to: (1) execution of a compiler program, either dynamically or statically, to compile main program code, to maintain transactional structures, or to perform other transaction-related operations; (2) execution of main program code that includes transactional operations/calls; (3) execution of other program code, such as libraries, associated with the main program code; or (4) a combination thereof.
Often, within a software transactional memory (STM) system, a compiler is used to insert some operations, calls, and other code inline with the application code being compiled, while other operations, calls, functions, and code are provided separately within libraries. This potentially allows library distributors to optimize and update the libraries without having to recompile the application code. As a specific example, a call to a commit function may be inserted inline within application code at the commit point of a transaction, while the commit function itself is provided separately in an updatable library. In addition, the choice of where to place specific operations and calls may affect the efficiency of the application code. For example, if a filter operation, which is discussed in more detail with regard to access barriers in reference to Fig. 6, is inserted inline with the code, the filter operation may be performed before vectoring execution to a barrier, instead of inefficiently vectoring to the barrier and then performing the filter operation.
In one embodiment, processor 100 is capable of executing transactions using hardware/logic, that is, within a hardware transactional memory (HTM) system. Numerous specific implementation details exist, from both an architectural and a microarchitectural perspective, when implementing an HTM; most of them are not discussed herein to avoid unnecessarily obscuring the invention. Some structures and implementations are nevertheless disclosed for illustrative purposes. It should be noted that these structures and implementations are not required, and may be augmented and/or replaced with other structures having different implementation details.
As a combination, processor 100 may be capable of executing transactions within an unbounded transactional memory (UTM) system, which attempts to take advantage of the benefits of both STM and HTM systems. For example, an HTM is often fast and efficient for executing small transactions, because it does not rely on software to perform all of the access tracking, conflict detection, validation, and commit for transactions. However, HTMs are usually only able to handle small transactions, while STMs are able to handle transactions of unbounded size. Therefore, in one embodiment, a UTM system uses hardware to execute smaller transactions and software to execute transactions that are too large for the hardware. As can be seen from the discussion below, hardware may be used to assist and accelerate software even while software is handling transactions. Furthermore, it is important to note that the same hardware may also be used to support and accelerate a pure STM system.
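The hardware-first, software-fallback policy can be sketched roughly as follows. This is a minimal Python model under stated assumptions: the capacity limit `HW_CAPACITY`, the `CapacityExceeded` exception, and all function names are hypothetical, and the text does not say how an overflow of hardware resources is detected.

```python
# Hypothetical sketch of a UTM policy: try the bounded "hardware" path first,
# fall back to the unbounded "software" path. All names are illustrative.
HW_CAPACITY = 4  # assumed number of distinct locations hardware can buffer


class CapacityExceeded(Exception):
    """Raised when the modeled hardware cannot buffer another location."""


def run_in_hardware(writes, memory):
    buffered = {}
    for addr, value in writes:
        if addr not in buffered and len(buffered) >= HW_CAPACITY:
            raise CapacityExceeded(addr)  # buffered updates are discarded
        buffered[addr] = value
    memory.update(buffered)  # commit: all updates become visible together
    return "hardware"


def run_in_software(writes, memory):
    # Stand-in for an STM path; it has no size limit in this model.
    for addr, value in writes:
        memory[addr] = value
    return "software"


def run_transaction(writes, memory):
    try:
        return run_in_hardware(writes, memory)
    except CapacityExceeded:
        return run_in_software(writes, memory)
```

In this model a transaction with ten distinct writes overflows the assumed four-entry hardware buffer and is re-executed in software, while a two-write transaction commits in hardware.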
As stated above, transactions include transactional memory accesses to data items by local processing elements within processor 100, and potentially by other processing elements as well. Without safety mechanisms in a transactional memory system, some of these accesses could result in invalid data and execution, that is, a write to data that invalidates a read, or a read of invalid data. As a result, processor 100 may include logic to track or monitor memory accesses to and from data items in order to identify potential conflicts, such as the read monitors and write monitors discussed below.
A data item or data element may include data at any granularity level, as defined by hardware, software, or a combination thereof. A non-exhaustive list of examples of data, data elements, data items, or references thereto includes: a memory address, a data object, a class, a field of a type of dynamic language code, a type of dynamic language code, a variable, an operand, a data structure, and an indirect reference to a memory address. However, any known grouping of data may be referred to as a data element or data item. A few of the examples above, such as a field of a type of dynamic language code and a type of dynamic language code, refer to data structures of dynamic language code. To illustrate, dynamic language code such as Java™ from Sun Microsystems is a strongly typed language. Each variable has a type that is known at compile time. The types are divided into two categories: primitive types (boolean and numeric, for example int and float) and reference types (classes, interfaces, and arrays). The values of reference types are references to objects. In Java™, an object, which consists of fields, may be a class instance or an array. Given an object a of class A, it is customary to use the notation A::x to refer to the field x of type A, and a.x to refer to the field x of object a of class A. For example, an expression may be written as a.x = a.y + a.z. Here, field y and field z are loaded to be added, and the result is written to field x.
Therefore, monitoring/buffering of memory accesses to data items may be performed at any data-level granularity. For example, in one embodiment, memory accesses to data are monitored at the type level. Here, a transactional write to field A::x and a non-transactional load of field A::y may be monitored as accesses to the same data item, that is, type A. In another embodiment, memory access monitoring/buffering is performed at field-level granularity. Here, a transactional write to A::x and a non-transactional load of A::y are not monitored as accesses to the same data item, because they are accesses to separate fields. Note that other data structures or programming techniques may be taken into account when tracking memory accesses to data items. As an example, assume that fields x and y of an object of class A (that is, A::x and A::y) point to objects of class B, are initialized to newly allocated objects, and are never written to after initialization. In one embodiment, a transactional write to field B::z of the object pointed to by A::x is not monitored as a memory access to the same data item as a non-transactional load of field B::z of the object pointed to by A::y. Extrapolating from these examples, monitors may perform monitoring/buffering at any data granularity level.
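The difference between type-level and field-level monitoring can be illustrated with a small Python sketch. The function names and the tuple-based key are assumptions made only for this illustration.

```python
# Hypothetical sketch: the key under which an access is monitored depends on
# the chosen granularity. Names are illustrative, not taken from the source.
def monitor_key(type_name, field_name, granularity):
    if granularity == "type":
        return (type_name,)              # every field of A maps to data item A
    if granularity == "field":
        return (type_name, field_name)   # A::x and A::y are separate data items
    raise ValueError(f"unknown granularity: {granularity}")


def same_data_item(access_a, access_b, granularity):
    """Each access is a (type_name, field_name) pair."""
    return (monitor_key(*access_a, granularity)
            == monitor_key(*access_b, granularity))
```

With these definitions, a write to A::x and a load of A::y count as accesses to the same data item at type granularity, but not at field granularity.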
In one embodiment, processor 100 includes monitors to detect or track accesses, and potential subsequent conflicts, associated with data items. As one example, hardware of processor 100 includes read monitors and write monitors to track, respectively, loads and stores that are determined to be monitored. As an example, hardware read monitors and write monitors monitor data items at the granularity of the data items, regardless of the granularity of the underlying storage structures. In one embodiment, a data item is bounded by tracking mechanisms associated at the granularity of the storage structures, to ensure that at least the entire data item is monitored appropriately.
As a specific illustrative example, read and write monitors include attributes associated with cache locations, such as locations within lower-level data cache 150, to monitor loads from and stores to the addresses associated with those locations. Here, a read attribute for a cache location of data cache 150 is set upon a read event to an address associated with that cache location, to monitor for potentially conflicting writes to the same address. Write attributes operate in a similar manner for write events, to monitor for potentially conflicting reads and writes to the same address. To continue this example, hardware is capable of detecting conflicts based on snoops for reads and writes to cache locations whose read and/or write attributes are set to indicate that the locations are monitored. Inversely, in one embodiment, setting read and write monitors, or updating a cache location to a buffered state, results in snoops, such as read requests or read-for-ownership requests, which allow conflicts with addresses monitored in other caches to be detected.
Therefore, depending on the design, different combinations of cache coherency requests and monitored coherency states of cache lines result in potential conflicts, such as a cache line holding a data item in a shared, read-monitored state and a snoop indicating a write request to that data item. Inversely, a cache line holding a data item in a buffered write state and an external snoop indicating a read request to that data item may be considered potentially conflicting. In one embodiment, to detect such combinations of access requests and attribute states, snoop logic is coupled to conflict detection/reporting logic, such as monitors and/or logic for conflict detection/reporting, as well as to status registers that report the conflicts.
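The interaction between monitor attributes and incoming snoops can be sketched as follows. This is a simplified Python model, assuming one read-monitor bit and one write-monitor bit per cache line; the class and function names are hypothetical, and real hardware would also consider coherency state.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    read_monitor: bool = False   # set on a monitored (transactional) load
    write_monitor: bool = False  # set on a monitored (transactional) store


def on_local_access(line, is_write):
    """Set the attribute matching the kind of local monitored access."""
    if is_write:
        line.write_monitor = True
    else:
        line.read_monitor = True


def snoop_conflicts(line, snoop_is_write):
    """Return True if an external snoop conflicts with the line's monitors."""
    if snoop_is_write:
        # An external write conflicts with a local monitored read or write.
        return line.read_monitor or line.write_monitor
    # An external read conflicts only with a local monitored write.
    return line.write_monitor
```

A read-monitored line therefore reports a conflict for an external write but not for an external read, while a write-monitored line reports a conflict for both.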
However, any combination of conditions and scenarios may be considered invalidating for a transaction, and these may be defined by an instruction such as a commit instruction, which is discussed in more detail below with reference to Figs. 11-12. Examples of factors that may be considered for not committing a transaction include: detecting a conflict on a transactionally accessed memory location, losing monitor information, losing buffered data, losing metadata associated with a transactionally accessed data item, and detecting another invalidating event, such as an interrupt, a ring transition, or an explicit user instruction.
In one embodiment, hardware of processor 100 holds transactional updates in a buffered manner. As stated above, transactional writes are not made globally visible until the transaction commits. However, the local software thread associated with the transactional writes is able to access the transactional updates for subsequent transactional accesses. As a first example, a separate buffer structure is provided in processor 100 to hold the buffered updates; it is able to provide the updates to the local thread and not to other, external threads. Including a separate buffer structure, however, is potentially expensive and complex.
In contrast, as another example, a cache memory such as data cache 150 is used to buffer the updates while providing the same transactional functionality. Here, cache 150 is capable of holding data items in a buffered coherency state; in one case, a new buffered coherency state is added to a cache coherency protocol, such as the Modified Exclusive Shared Invalid (MESI) protocol, to form a MESIB protocol. In response to local requests for a buffered data item (a data item held in the buffered coherency state), cache 150 provides the data item to the local processing element to ensure internal transactional sequential ordering. In response to external access requests, however, a miss response is provided to ensure that the transactionally updated data item is not made globally visible until commit. Furthermore, when a line of cache 150 is held in the buffered coherency state and is selected for eviction, the buffered update is not written back to higher-level cache memories: the buffered update is not propagated through the memory system, that is, not made globally visible, until after commit. Upon commit, the buffered lines are transitioned to the modified state to make the data item globally visible.
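The behavior of a buffered coherency state can be modeled in a few lines. The following Python sketch is an illustration only: the `State`, `Line`, and `MISS` names are hypothetical, and it covers just the four behaviors named above (local hit, external miss, eviction without write-back, and transition to modified on commit).

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"
    BUFFERED = "B"  # the additional state that extends MESI to MESIB


class Line:
    def __init__(self, data, state):
        self.data = data
        self.state = state


MISS = object()  # sentinel standing in for a miss response


def lookup(line, is_local):
    """Local requests see buffered data; external requests get a miss."""
    if line.state is State.INVALID:
        return MISS
    if line.state is State.BUFFERED and not is_local:
        return MISS
    return line.data


def evict(line, higher_level, addr):
    """Write back modified data only; a buffered update is simply dropped."""
    if line.state is State.MODIFIED:
        higher_level[addr] = line.data
    line.state = State.INVALID


def commit(line):
    """On commit, a buffered line becomes modified and globally visible."""
    if line.state is State.BUFFERED:
        line.state = State.MODIFIED
```

In this model, dropping a buffered line on eviction means the update is lost, which is one of the conditions that may prevent a transaction from committing.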
Note that the terms internal and external are usually relative to the perspective of a thread associated with execution of a transaction, or of the processing elements that share a cache. For example, a first processing element that executes a software thread associated with execution of a transaction is referred to as the local thread. Therefore, in the discussion above, if a store to, or load from, an address previously written by the first thread is received (the earlier write having caused the cache line for that address to be held in the buffered coherency state), the buffered version of the cache line is provided to the first thread, because it is the local thread. In contrast, a second thread may be executing on another processing element within the same processor, but is not associated with execution of the transaction responsible for the cache line being held in the buffered state (an external thread); therefore, a load or store from the second thread to that address misses the buffered version of the cache line, and normal cache replacement is used to retrieve the unbuffered version of the cache line from higher-level memory.
Here, the internal/local and external/remote threads are executed on the same processor and, in some embodiments, may be executed on separate processing elements within the same core of a processor that share access to the cache. The use of these terms is not so limited, however. As stated above, local may refer to multiple threads sharing access to a cache, rather than being specific to a single thread associated with execution of the transaction, while external or remote may refer to threads that do not share access to the cache.
As stated above in the initial reference to Fig. 1, the architecture of processor 100 is purely illustrative and serves the purpose of discussion. Similarly, the specific examples of translating data addresses in order to reference metadata are also exemplary, since any method of associating data with metadata held in separate entries of the same memory may be used.
Abstract address space for metadata
Metadata
Turning to Fig. 2, an embodiment of holding metadata for a data item in a processor is illustrated. As depicted, metadata 217 for data item 216 is held locally in memory 215. Metadata includes any property or attribute associated with data item 216, such as transactional information relating to data item 216. Some illustrative examples of metadata are given below; the disclosed examples are purely illustrative and do not form an exhaustive list. In addition, metadata location 217 may hold any combination of the examples discussed below and of other attributes of data item 216 that are not specifically discussed.
As a first example, metadata 217 includes a reference to a backup or buffer location for transactionally written data item 216, if data item 216 has previously been accessed, buffered, and/or backed up within a transaction. Here, in some implementations, a backup copy of a previous version of data item 216 is held in a different location, and as a result metadata 217 includes an address of, or other reference to, the backup location. Alternatively, metadata 217 itself may act as the backup or buffer location for data item 216.
As another example, metadata 217 includes a filter value to accelerate repeated transactional accesses to data item 216. Often, during execution of a transaction using software, access barriers are performed at transactional memory accesses to ensure consistency and data validity. For example, before a transactional load operation, a read barrier is executed to perform read barrier operations, such as testing whether data item 216 is unlocked, determining whether the current read set of the transaction is still valid, updating a filter value, and logging version values in the read set of the transaction to enable later validation. However, if a read of that location has already been performed during execution of the transaction, the same read barrier operations are potentially unnecessary.
As a result, one solution uses a read filter that holds a first, default value to indicate that data item 216, or its address, has not been read during execution of the transaction, and a second, accessed value to indicate that data item 216, or its address, has already been accessed during the pendency of the transaction. Essentially, the second, accessed value indicates whether the read barrier should be accelerated. In this case, if a transactional load operation is received and the read filter value in metadata location 217 indicates that data item 216 has already been read, then, in one embodiment, the read barrier is elided (not executed), which accelerates transactional execution by not performing unnecessary, redundant read barrier operations. Note that a write filter value may operate in the same manner with regard to write operations. Separate filter values are purely illustrative, however; in one embodiment, a single filter value is used to indicate whether an address has already been accessed, whether written or read. Here, the metadata access operations that check metadata 217 of data item 216 for both loads and stores use the single filter value, in contrast to the example above in which metadata 217 includes a separate read filter value and write filter value. As a specific illustrative embodiment, four bits of metadata 217 are allocated to: a read filter that indicates whether a read barrier is to be accelerated for the associated data item, a write filter that indicates whether a write barrier is to be accelerated for the associated data item, an undo filter that indicates whether undo operations are to be accelerated, and a miscellaneous filter that software may use as a filter value in any manner.
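A read filter of this kind can be sketched in Python as follows. The bit positions assigned to the four filters, the dictionary used as a metadata store, and the function names are assumptions made for illustration; the text only states that four bits are allocated.

```python
# Assumed bit layout for the four filter bits; the source does not fix one.
READ_FILTER = 0b0001
WRITE_FILTER = 0b0010
UNDO_FILTER = 0b0100
MISC_FILTER = 0b1000  # available to software for any purpose

barrier_calls = []  # records which addresses ran the full read barrier


def full_read_barrier(addr):
    # Stand-in for the lock test, read-set validation, and version logging.
    barrier_calls.append(addr)


def transactional_load(addr, memory, metadata):
    """Run the read barrier only on the first read of addr in a transaction."""
    if not (metadata.get(addr, 0) & READ_FILTER):
        full_read_barrier(addr)
        metadata[addr] = metadata.get(addr, 0) | READ_FILTER
    return memory[addr]
```

Because the filter check is a single bit test, it is cheap enough to be placed inline before the barrier call, so that a repeated load of the same address skips the barrier entirely.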
A few other examples of metadata include an indication of, a representation of, or a reference to: the address of a handler, either generic or specific to a transaction associated with data item 216; the irrevocable/obstinate nature of a transaction associated with data item 216; a loss of data item 216; a loss of monitoring information for data item 216; a conflict detected for data item 216; the address of a read set, or of a read entry within a read set, associated with data item 216; a previously logged version of data item 216; a current version of data item 216; a lock for allowing access to data item 216; a version value for data item 216; a transaction descriptor for the transaction associated with data item 216; and other known transaction-related descriptive information. Furthermore, as described above, the use of metadata is not limited to transactional information. As a corollary, metadata 217 may also include information, properties, attributes, or states that are associated with data item 216 but do not involve a transaction.
Continuing the discussion of illustrative examples of metadata, the hardware monitors and buffered coherency states described above are also considered metadata in some embodiments. The monitors indicate whether a location is to be monitored for external read requests or external read-for-ownership requests, while the buffered coherency state indicates whether the associated data cache line holding a data item is buffered. In the examples above, however, monitors are maintained as attribute bits that are appended to, or otherwise directly associated with, cache lines, while the buffered coherency state is added to the cache line coherency state bits. As a result, in that case, hardware monitors and buffered coherency states are part of the cache line structure and are not held in a separate abstract address space, as the illustrated metadata 217 is. In other embodiments, however, monitors may be held as metadata 217 in a memory location separate from data item 216, and similarly, metadata 217 may include a reference indicating that data item 216 is a buffered data item. Inversely, instead of an update-in-place architecture, in which data item 216 is updated and held in a buffered state as described above, metadata 217 may hold the buffered data item while the globally visible version of data item 216 is maintained in its original location. Here, upon commit, the buffered update held in metadata 217 replaces data item 216.
Lossy metadata
Similar to the discussion above of buffered cache coherency states, metadata 217, in one embodiment, is lossy: it is local information that is not provided to external requests from outside the domain of memory 215. Assuming, for one embodiment, that memory 215 is a shared cache memory, a miss in response to a metadata access operation is not serviced outside the domain of cache memory 215. Essentially, because lossy metadata 217 is held only locally within the cache domain and does not exist as persistent data throughout the memory subsystem, there is no reason to forward the miss externally in order to service the request from a higher-level memory. As a result, misses to lossy metadata are potentially serviced in a quick and efficient fashion; memory in the processor may be allocated immediately, without waiting for an external request for the metadata to be generated or serviced.
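The following Python sketch models lossy metadata held only in a local cache domain. It is a hedged illustration: the class name, the oldest-first eviction policy, and the choice to return a default value on a miss are assumptions, not details given in the text.

```python
class LossyMetadataCache:
    """Hypothetical model: metadata exists only inside this cache domain."""

    def __init__(self, capacity, default=0):
        self.capacity = capacity
        self.default = default       # assumed value of freshly allocated metadata
        self.entries = {}            # metadata address -> value
        self.external_requests = 0   # stays 0: misses are never forwarded

    def read(self, md_addr):
        if md_addr not in self.entries:
            self._allocate(md_addr)  # miss: allocate locally, immediately
        return self.entries[md_addr]

    def write(self, md_addr, value):
        if md_addr not in self.entries:
            self._allocate(md_addr)
        self.entries[md_addr] = value

    def _allocate(self, md_addr):
        if len(self.entries) >= self.capacity:
            victim = next(iter(self.entries))  # oldest inserted entry
            del self.entries[victim]           # dropped, never written back
        self.entries[md_addr] = self.default
```

Reading an address whose metadata was evicted returns the default value rather than the value last written, which is what makes the metadata lossy; software relying on it has to tolerate, or detect, that loss.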
Abstract address space
As the illustrated embodiment depicts, metadata 217 is held in a memory location separate from data item 216, at a distinct address, which results in a separate abstract address space for metadata. The abstract address space is orthogonal to the data address space: a metadata access operation to the abstract address space does not hit or modify a physical data entry. However, in an embodiment where metadata is held in the same memory as data, such as memory 215, the abstract address space may affect the data address space through competition for allocation in memory 215. As an example, data item 216 is cached in one entry of memory 215, while metadata 217 for data item 216 is held in another entry of the cache. Here, a subsequent metadata operation may result in the memory location of data item 216 being selected for eviction and replaced with metadata for a different data item. As a result, operations associated with the address of metadata 217 do not hit data item 216, but the metadata address of a metadata element may replace physical data, such as data item 216, within memory 215.
Even though, in this example, metadata potentially competes with data for space in the cache memory, the ability to hold metadata locally can provide efficient support for metadata without the expensive overhead of propagating persistent metadata throughout the memory hierarchy. As the assumption in this example implies, metadata is held in the same memory, memory 215. In an alternative embodiment, however, metadata 217 for, or associated with, data item 216 is held in a separate memory structure. Here, the addresses of metadata and data may be the same, while the abstract portion of the metadata address indexes into the separate metadata storage structure instead of the data storage structure.
With a one-to-one ratio of metadata to data, the abstract address space shadows the data address space, yet remains orthogonal to it as discussed above. In contrast, as described below, metadata may be compressed with respect to the physical data. In that case, the abstract address space for metadata does not shadow the data address space in size, but it still remains orthogonal.
Abstract address translation
Continuing the discussion of the abstract address space, any method may be used to translate a data address within the data address space, such as the address of data item 216, into an abstract address within the abstract address space, such as the address of metadata 217. In one embodiment, an address such as data address 200 is translated into a metadata address using abstract translation logic 210. Address 200, as described, includes an address associated with, or referencing, data item 216. Normal data translation, such as translation between a physical or linear address and a virtual address, may be used to index to data item 216 within memory 215. Associating metadata 217 with data item 216 involves a similar translation of address 200, which references data item 216, into another, distinct address that references metadata 217. Therefore, translating address 200 to a data address with data translation logic 205 and translating address 200 to a different abstract address with abstract translation 210 result in separate accesses that do not affect each other, which creates the orthogonal nature of the two address spaces. As discussed in more detail below, in one embodiment, whether data translation 205 or abstract translation 210 is used depends on the type of operation that references address 200: a normal data access operation that accesses data item 216 uses data translation 205, while a metadata access operation that accesses metadata 217 uses abstract translation 210. The type of operation may be identified through a portion of the operation code (opcode) of the instruction/operation.
In another embodiment, an instruction identified by its opcode may access both data and metadata for a given metadata address and thus perform a complex operation, such as a conditional store to data based on the metadata. For example, an instruction is decoded into a test-and-set metadata operation, which tests the metadata and sets it to a value, together with an additional operation that sets the data to a value if the metadata test succeeded. As another example, a data item may be moved to a matching metadata address based on the data read from data memory.
Examples of translating data address 200 into a metadata address for metadata 217 follow. As a first example, translating a data address into a metadata address includes using abstract translation logic 210 to take a physical address or a virtual address and add an arbitrary value, after normal data translation 205 where applicable, in order to separate data addresses from metadata addresses. Where translation uses a virtual address, abstract translation logic 210 includes logic to combine the virtual address with an arbitrary value. Where normal virtual-to-physical address translation is used, normal data translation 205 is used to obtain a translated address from address 200, and abstract translation logic 210 then includes logic to combine the translated address with an arbitrary value to form the metadata address. As another example, data address 200 may be translated using separate translation structures, tables, and/or logic within abstract translation 210 to obtain a distinct metadata address. Here, in comparison with data translation logic 205, abstract translation logic 210 may mirror it or may include separate logic (logic that combines address 200 with an arbitrary value), but abstract translation logic 210 includes page table information to translate address 200 into a different, distinct metadata address. As can be seen, whether a metadata address is obtained by adding information to a data address, by extending a data address with appended information, by replacing information in a data address, or by translating a data address, the resulting distinct metadata address is associated with the data item through the algorithm used for the addition, extension, replacement, or translation, while still remaining orthogonal so as to prevent the data item from being incorrectly updated or read.
A few specific illustrative examples of translating a data address into a metadata address, or in other words, of determining a metadata address from or based on a data address, are described below:
(1) translate a first data address into a second data address using normal virtual-to-physical address translation, and add, append, or include an arbitrary value to or within that data address to form the metadata address;
(2) do not perform virtual-to-physical address translation of the data address, and add, append, or include an arbitrary value to or within the data address to form the metadata address;
(3) translate the data address into a translated metadata address using abstract translation table logic, which may also include, but is not required to include, adding, appending, or including an arbitrary value to or within the translated metadata address to form the metadata address.
In addition, any of the translation techniques mentioned above may incorporate, that is, be based on, the compression ratio of data to metadata, so that metadata is stored separately for each compression ratio.
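The three options listed above can be sketched in Python as follows. Several details are assumptions made for the sake of a runnable example: 4 KiB pages, page tables modeled as dictionaries from page number to frame number, and an "arbitrary value" chosen as a single tag bit above the data address range so that metadata addresses never collide with data addresses.

```python
PAGE_SHIFT = 12         # assumed 4 KiB pages
METADATA_TAG = 1 << 48  # assumed "arbitrary value": a bit above data addresses


def virt_to_phys(vaddr, page_table):
    """Normal data translation: look up the page frame, keep the offset."""
    frame = page_table[vaddr >> PAGE_SHIFT]
    return (frame << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


def md_addr_from_physical(vaddr, page_table):
    """Option (1): translate first, then combine with the arbitrary value."""
    return virt_to_phys(vaddr, page_table) | METADATA_TAG


def md_addr_from_virtual(vaddr):
    """Option (2): skip translation and tag the virtual address directly."""
    return vaddr | METADATA_TAG


def md_addr_from_table(vaddr, md_page_table):
    """Option (3): translate through a separate metadata page table."""
    return virt_to_phys(vaddr, md_page_table)
```

In options (1) and (2) the tag bit alone keeps the two spaces disjoint; in option (3) disjointness depends on how the separate metadata page table is populated.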
Here, an address may be modified for translation and/or compression in the following example ways: ignoring specific bits of the address, removing specific bits of the address, translating the bits of the address that are used to select data of different sizes, translating specific bits, and adding metadata-related information to specific bits or replacing specific bits with such information. Compression is discussed in more detail below with reference to Fig. 4.
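One way to picture compression through bit removal is the sketch below. It is only an assumption-laden illustration, since the actual scheme is described with Fig. 4: low-order address bits are dropped so that several data bytes share one metadata location, and the ratio is encoded in assumed bit positions so that each ratio gets its own region. Data addresses are assumed to fit in 44 bits.

```python
TAG = 1 << 48     # assumed marker separating metadata from data addresses
RATIO_SHIFT = 44  # assumed position of the bits that encode the ratio


def compressed_md_addr(data_addr, log2_ratio):
    """Map 2**log2_ratio consecutive data bytes to one metadata address."""
    return TAG | (log2_ratio << RATIO_SHIFT) | (data_addr >> log2_ratio)
```

With `log2_ratio=3`, data addresses 0x1000 through 0x1007 share one metadata address and 0x1008 starts the next one.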
Multiple abstract address spaces
Turning to Fig. 3, an embodiment that supports multiple abstract address spaces is illustrated. In one embodiment, each processing element is associated with an abstract address space, so that metadata for each processing element can be maintained independently. Four processing elements 301-304 are depicted. As stated above, a processing element may include any of the elements described above with reference to Fig. 1. As a first example, a processing element includes a core of a processor. However, as an illustrative example discussed further below, processing elements 301-304 are discussed with reference to hardware threads (threads) within a processor; each hardware thread executes a software thread and potentially multiple software subsystems.
Therefore, it may be advantageous to allow individual threads among threads 301-304 to maintain separate metadata. In one embodiment, abstract translation logic 310 associates accesses from the different threads 301-304 with their appropriate abstract address spaces. As an example, a thread identifier (ID), used in conjunction with the address referenced by a metadata access operation, indexes to the correct abstract address space.
To illustrate, assume that a metadata access operation is received that is associated with thread 302 and with data address 300, which references data item 316. Any translation method described above may be used to translate the data address of data item 316 into a metadata address. However, the translation additionally includes a combination with thread ID 302, which may be obtained, for example, from a control register for thread 302 or from the opcode of the instruction received from thread 302. The combination may include appending thread ID 302 to the address, replacing bits in the address, or another known method of associating a thread ID with an address. As a result, abstract translation logic 310 is able to select, or index to, the metadata for data item 316 within the abstract address space associated with processing element 302.
Extrapolating from this example, each processing element 301-304 is able to maintain independent metadata for data item 316 by using the thread IDs of threads 301-304 as part of the translation to the abstract address. A programmer does not need to manage the individual abstract address spaces, however, because hardware is able to keep the abstract address spaces separate by using thread IDs in a manner that is transparent to software. Furthermore, the abstract address spaces are orthogonal: a metadata access from one thread does not access metadata from another thread, because each metadata access is associated with a separate set of addresses that includes a reference to a unique thread ID.
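Per-thread abstract address spaces can be sketched by folding a thread ID into the metadata address. In the Python illustration below, the bit positions, the `MD_TAG` marker, the default metadata value of 0, and the class name are assumptions; thread IDs are small integers rather than the reference numerals 301-304.

```python
TID_SHIFT = 48     # assumed: thread ID bits sit above the data address bits
MD_TAG = 1 << 63   # assumed marker separating metadata from data addresses


def abstract_addr(data_addr, thread_id):
    """Combine a data address with a thread ID to form a metadata address."""
    return MD_TAG | (thread_id << TID_SHIFT) | data_addr


class MetadataStore:
    def __init__(self):
        self.entries = {}

    def write(self, data_addr, thread_id, value):
        self.entries[abstract_addr(data_addr, thread_id)] = value

    def read(self, data_addr, thread_id):
        return self.entries.get(abstract_addr(data_addr, thread_id), 0)
```

Software passes only the data address; the thread ID is supplied by the model in the same way hardware would supply it from a control register, so two threads writing metadata for the same data address never see each other's values.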
However, as described below with regard to instructions/operations that access metadata, there may be some cases in which a metadata access from one thread is provided access to the metadata of another thread. In other words, in some implementations, access across PEIDs and/or MDIDs (as described below) may be advantageous. For example, a thread may need to check, modify, or clear the metadata of other threads associated with data item 316: to determine whether hardware has detected a conflict, to check monitor metadata from another thread to determine whether the associated data item is monitored by that thread, to clear the metadata of other threads, or to determine a commit condition.
Here, the particular opcode of an operation that accesses another thread's metadata is recognized, and abstract address translation logic 310 therefore translates address 300 into all of the metadata addresses of the metadata to be accessed. As a specific illustrative example, assume that four bits are appended to address 300, each bit representing one of processing elements 301-304, and that a metadata access operation (such as a clear operation) is to clear all metadata for data item 316; abstract address translation logic 310 then sets each of the four bits to access all of metadata 317. Here, the lookup logic of memory 315 may be designed so that a single access with all four bits set accesses all of metadata 317, or abstract address translation logic 310 may generate four separate accesses, each with a different one of the four thread ID bits set, to access all of metadata 317. As an illustrative example, a mask may be applied to the address value to allow one thread to hit the metadata of another thread.
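The second alternative above, in which one access with several thread bits set is expanded into one access per thread, may be sketched in software as follows. This is a minimal illustration under assumed parameters (a 48-bit data address, one-hot bits placed directly above it); the function names are hypothetical.

```python
NUM_PES = 4      # processing elements 301-304
ADDR_BITS = 48   # assumed data address width for this sketch

def one_hot_address(data_addr, pe_index):
    """Append four bits to the address, one bit per processing element."""
    return ((1 << pe_index) << ADDR_BITS) | data_addr

def expand_accesses(masked_addr):
    """Split an access with several PE bits set into one access per set bit."""
    data_addr = masked_addr & ((1 << ADDR_BITS) - 1)
    pe_bits = masked_addr >> ADDR_BITS
    return [one_hot_address(data_addr, i)
            for i in range(NUM_PES) if (pe_bits >> i) & 1]

def clear_all(store, data_addr):
    """Clear the metadata that every PE holds for one data item."""
    all_pes = (((1 << NUM_PES) - 1) << ADDR_BITS) | data_addr
    for addr in expand_accesses(all_pes):
        store.pop(addr, None)
```

Metadata held for other data items is untouched, since only the addresses derived from the referenced data address are generated.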
In addition, as shown, each processing element 301-304 may be associated with multiple abstract address spaces, so that multiple contexts or software subsystems within a single thread are interleaved into multiple metadata address spaces. For example, in some cases it may be advantageous to allow multiple software subsystems within a single processing element to maintain independent sets of metadata. Therefore, in one example, orthogonal metadata address spaces may be provided at multiple processing element levels (such as the core level, the hardware thread level, and/or the software subsystem level). In this example, each processing element 301-304 is associated with two abstract address spaces, where each of the two abstract address spaces is associated with a software subsystem to be executed on the processing element.
A software subsystem includes any task or code to be executed on a processing element that can use a separate abstract address space. As an illustrative example, four subsystems that may be associated with separate abstract address spaces include a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, and a software translation subsystem, all of which may execute on a single processing element. Here, each software subsystem may control the processing element at a different time. As another example, a software subsystem includes an individual transaction executing on a single processing element. In fact, it may be desirable to associate nested transactions executing on the same thread with separate abstract address spaces. To illustrate, a filter test for an access to a data item in an outer transaction may fail, yet it may be advantageous to provide a second, unique filter for the same data item in the inner nested transaction, which can succeed separately and accelerate accesses within the inner transaction. In addition, to ensure that the metadata of the outer transaction is maintained when the nested inner transaction aborts, each nested transaction (that is, each subsystem) is associated with a unique metadata space, so that clearing the metadata of the inner nested transaction does not affect the metadata of the outer transaction. However, a software subsystem is not so limited; it may be any task or code capable of managing metadata.
In one embodiment, as noted above, to provide orthogonal abstract address spaces at the software subsystem level, the address is combined with a processing element ID (PEID) and, in addition, with a metadata ID (MDID) or context ID. Separate metadata can therefore be uniquely identified for a subsystem within a processing element. Using the example above, assume that processing elements 301-304 are hardware threads and that thread 302 is executing an outer transaction and an inner transaction nested within the outer transaction. For the outer transaction, metadata 317c is associated with data item 316 by abstract address translation 310, which translates data address 300 of data item 316 into an address that adds the thread ID (TID) and the metadata ID (MDID) of the outer transaction, where that address references metadata 317c.
As a purely illustrative example, metadata 317c includes four filter values (a read filter value, a write filter value, an undo filter value, and a compound filter value); a pointer or other reference to a backup location for data item 316; a monitor value indicating whether monitoring of data item 316 has been lost; a transaction descriptor value; and a version of data item 316. Similarly, the inner transaction is associated with metadata 317d for data item 316, which includes the same metadata fields as metadata 317c. As above, abstract address translation 310 translates data address 300 of data item 316 into an address combined with the thread ID and the metadata ID of the inner transaction, and that address references metadata 317d.
Here, the only difference between the metadata address referencing metadata 317c and the metadata address referencing metadata 317d may be the metadata IDs of the outer and inner transactions. That address difference nevertheless ensures that the address spaces are disjoint/orthogonal: an access to metadata from the inner transaction does not affect the metadata of the outer transaction, because the MDID for an access from the inner transaction differs from that of the outer transaction. As noted above, this may be advantageous for rolling back nested transactions or for keeping different metadata values for transactions at different levels. In particular, if the inner transaction aborts, the backup data for data item 316 held in metadata 317d may be cleared, or used to roll data item 316 back to the point of entry before the inner transaction, without clearing or affecting the backup data of the outer transaction held in metadata 317c.
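The isolation between the metadata of an outer transaction and that of a nested inner transaction may be illustrated with the following minimal software sketch. The dictionary keyed by (thread ID, MDID, data address), the field names, and the numeric values are hypothetical and serve only to show that clearing one metadata space leaves the other intact.

```python
# Metadata keyed by (thread ID, MDID, data address); all names are hypothetical.
metadata = {}

def md_set(tid, mdid, addr, **fields):
    metadata.setdefault((tid, mdid, addr), {}).update(fields)

def md_clear_space(tid, mdid):
    """Clear every entry in one (thread, MDID) space, e.g. on an inner abort."""
    for key in [k for k in metadata if k[:2] == (tid, mdid)]:
        del metadata[key]

TID, OUTER, INNER, ITEM = 2, 0, 1, 0x300
md_set(TID, OUTER, ITEM, backup=10, write_filter=1)  # outer transaction
md_set(TID, INNER, ITEM, backup=20, write_filter=1)  # nested inner transaction
md_clear_space(TID, INNER)                           # inner transaction aborts
```

After the inner abort, the outer transaction's backup value and filter for the data item remain available under its own MDID.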
Note that the metadata ID (MDID) used to separate abstract software subsystem address spaces may be of any size and may come from multiple sources. As a simple illustrative example, for four processing elements (PEs) 301-304, the PEID may be a combination of two bits: 00, 01, 10, 11. Similarly, if four separate abstract address spaces are supported, a two-bit MDID (00, 01, 10, 11) can likewise distinguish among four subsystems. To illustrate, the value representing processing element 302 and subsystem 2 within PE 302 is 0101 (the first two bits, 01, for PE 302; the last two bits, 01, for the second subsystem). In this example, the abstract address translation logic combines this value with data address 300, or a translation thereof, to reference PE 302 MDID 01, which includes metadata location 317d.
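The two-bit PEID and two-bit MDID example above reduces to a simple concatenation, which may be sketched as follows. The function name and the default MDID width are assumptions for illustration only.

```python
def space_id(peid, mdid, mdid_bits=2):
    """Concatenate a PEID (high bits) with an MDID (low bits)."""
    return (peid << mdid_bits) | mdid

# In the text, PE 302 has PEID 01 and its second subsystem has MDID 01.
value = space_id(0b01, 0b01)
print(format(value, "04b"))  # prints 0101
```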
However, both the thread ID and the MDID may be more complex. For example, assume that threads 301-302 share access to memory 315, while threads 303-304 are remote processing elements that do not share access to memory 315. In addition, assume that each of threads 301-302 supports two software subsystems, for a total of four orthogonal address spaces for threads 301-302: the PE 301 MD0, PE 301 MD1, PE 302 MD0, and PE 302 MD1 address spaces. In this case, the combined thread ID and MDID value used to obtain a metadata address may come from an opcode, a control register, or a combination thereof. To illustrate, the opcode provides one bit for the context/MDID, a control register provides one bit for the processing element ID (PEID), assuming only two processing elements, and a metadata control register such as MDCR 320 provides four bits that identify the specific software subsystem/context at a finer granularity. Therefore, when a metadata access operation referencing address 300 of data item 316 is received from the second thread, PE 302, one bit from the opcode (a first bit of 1, indicating the second context) and one bit from the control register for processing element 302 (a second bit of 1, indicating processing element 302) are combined with the MDID from the metadata control register (MDCR) 320 associated with the second thread. The MDCR, previously updated with the MDID (0010) of the subsystem currently controlling the second thread, identifies the appropriate subsystem associated with the received operation. The abstract address translation logic takes the combined value, such as 110010, and further combines it with the referenced data address 300, or a translation thereof, to obtain the metadata address.
Yet, the 110010 portion of the metadata address is unique to the subsystem from which the access operation originates; the access therefore hits or modifies only metadata 317d in memory 315, and cannot hit or affect metadata 317a, b, c, e, f, g, h, which belong to the orthogonal abstract address spaces of other subsystems in the second thread and in other threads.
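The composition of the value 110010 from its three sources may be sketched as follows. The bit ordering (opcode bit first, then PEID bit, then the four MDCR bits) follows the example in the text; the 48-bit address width and the function names are assumptions made for this illustration.

```python
ADDR_BITS = 48  # assumed data address width for this sketch

def combined_id(opcode_ctx_bit, peid_bit, mdcr_mdid):
    """Concatenate 1 opcode bit, 1 PEID bit, and a 4-bit MDID from the MDCR."""
    return (opcode_ctx_bit << 5) | (peid_bit << 4) | (mdcr_mdid & 0xF)

def metadata_address(data_addr, ident):
    """Place the combined identifier above the data address bits."""
    return (ident << ADDR_BITS) | data_addr

ident = combined_id(1, 1, 0b0010)
print(format(ident, "06b"))  # prints 110010
```

Any access carrying a different identifier yields a different metadata address for the same data address, which is what keeps the spaces orthogonal in this model.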
As a specific illustrative example, a discussion of one particular form of MDCR is included. In some embodiments, the ISA may be extended with a per-thread metadata identifier register (MDID register), which serves as the source of the MDID for MDID-sensitive metadata load/store/test/set instructions. In some embodiments, it is convenient to have multiple such registers. For example, the MDCR (metadata control register) is a 32-bit read-write register that contains the current metadata context ID (MDID). It may be updated by CRMOV. Exemplary bit fields are defined as follows:
Table A: Exemplary embodiment of the bits of the MDCR
MDID0 and MDID1 are the metadata IDs that the instruction set can access concurrently. The actual number of bits used in these fields is the MDID size which, in one embodiment, may be read at any privilege level and is specified by the processor design. In other embodiments, however, different privilege levels may be able to modify this size. There may be no hardware check ensuring that the bits allocated fit the MDID size. In one embodiment, MDID0 and MDID1 may be written and read at any privilege level. A special MDID value may also designate a special metadata space that always reads as 0 or 1. Software can use this to force metadata values, so that all metadata tests in a block are forced true or false, in a manner similar to the discussion of registers with reference to Figs. 6 and 7.
However, in another example, as noted above, abstract address translation logic 310, in combination with a decoder (not shown), may recognize a metadata access operation from thread 302 that is intended to access metadata from the metadata address space of thread 301, and may allow those specific instructions/operations to read or modify the metadata of thread 301.
Metadata-to-data compression
The discussion above covered a one-to-one mapping of data to metadata (uncompressed metadata). In some cases, however, it is more efficient to use a smaller amount of metadata relative to the data, that is, compressed metadata, where the metadata is smaller than the data. Note that abstract address translation logic 210 and 310 of Figs. 2-3 may take compression into account when translating and modifying addresses, so as to reference the compressed metadata accordingly. Referring to Fig. 4, an embodiment of modifying an address to achieve metadata compression is shown; in particular, an embodiment with a data-to-metadata compression ratio of 8 is depicted. Control logic, such as abstract address translation logic 210 and 310 of Figs. 2-3, receives data address 400 referenced by a metadata access operation. As an example, compression includes shifting or removing log2(N) bits in or from address 400, where N is the data-to-metadata compression ratio. In the example shown, for a compression ratio of 8, three bits are shifted down and removed to form metadata address 405. In essence, address 400, which comprises 64 bits referencing a particular data byte in memory, is truncated by three bits to form metadata byte address 405, which references metadata in memory at byte granularity; the three bits previously removed from the address are then used to select a bit within that metadata byte.
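The 8:1 address modification just described may be expressed as the following small sketch. The function name is hypothetical, and the sketch models only the shift and the bit selection, not the replacement of the vacated high-order bits.

```python
def compress_8to1(data_addr):
    """Map a data byte address to (metadata byte address, bit within byte)."""
    shift = 3                        # log2(8)
    md_byte_addr = data_addr >> shift
    bit_select = data_addr & 0b111   # the three bits removed by the shift
    return md_byte_addr, bit_select

# Data byte 0x1005 maps to bit 5 of metadata byte 0x200.
md_byte, md_bit = compress_8to1(0x1005)
```

Eight consecutive data bytes therefore share one metadata byte, with one metadata bit per data byte.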
In one embodiment, the shifted/removed bits are replaced with other bits. As shown, the high-order bits of address 400 are replaced with zeros after the shift. However, the removed/shifted bits may be replaced with other data or information, such as a processing element ID, context identifier (ID), and/or metadata ID (MDID) associated with the metadata access operation. Although the lowest-numbered bits are removed in this example, bits at any position may be removed and replaced based on any number of factors, such as cache organization, cache circuit timing, the locality of metadata to data, and minimizing conflicts between data and metadata.
For example, the data address may not be shifted by log2(N); instead, address bits 0:2 are zeroed. As a result, the bits that are identical between the physical address and the virtual address are not shifted as in the example above, which allows pre-selection of a set and bank using the unmodified bits (such as bits 11:3).
Note that the discussion of translation may be combined with compression. In other words, the compression ratio may be an input to abstract address translation logic 210 and 310 of Figs. 2-3, and the translation logic uses the compression ratio together with a PEID, CID, MDID, an arbitrary value, or other information to translate the data address into a metadata address. The metadata address is then used to access the memory holding the metadata. As stated above, because metadata is a local, lossy construct, a miss to memory based on a metadata address can be serviced quickly and efficiently: a memory location is allocated without generating an outside miss service request and without waiting for an external request to be serviced. Here, an entry is allocated for the metadata in the normal manner. For example, an entry, such as entry 217 of Fig. 2, is selected, allocated, and initialized to the metadata default value based on metadata address 405 and a cache replacement algorithm such as the least recently used (LRU) algorithm. As a result, metadata may compete with regular data for space, yet it remains compressed and disjoint from other software subsystems/processing elements.
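The local, lossy servicing of a metadata miss may be modeled in software as follows. This is only a sketch: the class name, the fixed capacity, and the default value of 0 are assumptions, and a real cache would also hold regular data and be organized into sets and ways.

```python
from collections import OrderedDict

class MetadataCache:
    """Toy LRU store in which a miss is serviced locally with a default value."""

    def __init__(self, capacity, default=0):
        self.capacity, self.default = capacity, default
        self.entries = OrderedDict()

    def access(self, md_addr):
        if md_addr in self.entries:
            self.entries.move_to_end(md_addr)      # mark as most recently used
        else:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict LRU; its metadata is lost
            self.entries[md_addr] = self.default   # allocate with no outside request
        return self.entries[md_addr]

    def store(self, md_addr, value):
        self.access(md_addr)
        self.entries[md_addr] = value
```

An evicted entry simply reads back as the default value on its next access, which is the loss behavior the text describes.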
Note that a compression ratio of 8 is purely illustrative, and any compression ratio may be used. As another example, a compression ratio of 512:1 is used, so that one bit of metadata represents 64 bytes of data. Similar to above, the data address is translated/modified to form the metadata address by shifting it down by log2(512) bits, that is, 9 bits. Here, instead of bits 0:2, bits 6:8 are used to select the bit, effectively creating the compression by selecting at a 512-bit granularity. Because the data address has been shifted by 9 bits, the high-order portion of the address has 9 open bit positions in which to hold information. In one embodiment, these 9 bits hold an identifier, such as a context ID, thread ID, and/or MDID. In addition, an abstract space value may also be held in these bits, or the address may be extended by an arbitrary value.
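The 512:1 case, including the use of the nine vacated high-order bits for an identifier, may be sketched as follows. A 64-bit address is assumed, and the function and parameter names are hypothetical.

```python
def compress_512to1(data_addr, ident):
    """Return (metadata byte address with ID in the top 9 bits, bit select)."""
    shift = 9                                # log2(512)
    assert 0 <= data_addr < (1 << 64)
    assert 0 <= ident < (1 << shift)         # the ID must fit the 9 open bits
    md_byte_addr = data_addr >> shift
    bit_select = (data_addr >> 6) & 0b111    # bits 6:8 pick the metadata bit
    return (ident << (64 - shift)) | md_byte_addr, bit_select
```

All 64 bytes of one aligned line select the same metadata bit, while accesses carrying different identifiers produce different metadata addresses for the same data.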
In one embodiment, multiple concurrent compression ratios are supported by hardware. Here, a representation of the compression ratio is held as part of the arbitrary value combined with the data address to obtain the metadata address. As a result, the compression ratio is taken into account during a lookup of memory with a data address, and an address for one compression ratio does not match an address for a different compression ratio. In addition, software can rely on hardware not to forward stored information to a load for a different compression ratio.
In one embodiment, although hardware is implemented using a single compression ratio, other hardware support is included to present multiple compression ratios to software. As an example, assume the cache hardware is implemented using an 8:1 compression ratio, as shown in Fig. 4. A metadata access operation that accesses metadata at a different granularity is nevertheless decoded to include a micro-operation that reads a default amount of metadata and a test micro-operation that tests the appropriate portion of the metadata read. As an example, the default amount of metadata read is 32 bits. A test operation for a granularity/compression different from 8:1 then tests the correct bits of the 32 bits of metadata read, which may be selected based on a certain number of address bits, such as a number of LSBs of the metadata address, and/or a context ID.
As an illustration, in a scheme that supports one metadata bit per byte of data for unaligned data, a single bit is selected from the least significant 8 bits of the 32 bits of metadata read, based on the three LSBs of the metadata address. For a word of data, two consecutive metadata bits are selected from the least significant 16 bits of the 32 bits of metadata read, based on the three LSBs of the address, continuing in the same manner up to 16 bits for a 128-bit data size.
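One possible reading of this selection, offered only as an interpretation of the passage above, is that the three address LSBs give a starting bit position and the operand size in bytes gives the number of consecutive metadata bits to test. The function below is a hypothetical sketch of that reading.

```python
def select_md_bits(md32, addr, size_bytes):
    """Pick one metadata bit per data byte out of a 32-bit metadata read."""
    start = addr & 0b111            # three LSBs of the address
    mask = (1 << size_bytes) - 1    # 1 bit for a byte, 2 for a word, and so on
    return (md32 >> start) & mask
```

With a 16-byte operand the selected field spans at most bits 7 through 22, so it always fits within the 32 bits read.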
Metadata access instructions/operations
Turning to Fig. 5, a flow chart of a method of accessing metadata associated with data is shown. Although the flows of Fig. 5 are shown in a substantially serial manner, the flows may be performed at least partially in parallel, and potentially in a different order.
In flow 505, a metadata operation referencing a data address of a given data item is encountered. The discussion above mentioned that metadata instructions/operations may be supported in hardware to read, modify, and/or clear metadata. In other words, operation codes (opcodes) may be supported in a processor's instruction set architecture (ISA) that allow the processor's decoder to recognize an instruction and, correspondingly, logic to perform the access. Note that use of the term instruction may also refer to an operation. Some processors use the concept of a macro-instruction, which can be decoded into multiple micro-operations that perform individual tasks. An example is a metadata test-and-set macro-instruction, which is decoded into a metadata test operation/micro-operation to test the metadata and, if the correct Boolean value is obtained as the result of the test, a set operation that updates the metadata to a specific value.
However, metadata access operations are not limited to explicit software instructions that access metadata; they may also include implicit micro-operations decoded as part of a larger, more complex instruction that includes an access to a data item associated with the metadata. Here, a data access instruction may be decoded into multiple operations, such as an access to the data item and an implicit update of the associated metadata.
As discussed previously, in one embodiment the physical mapping of metadata to data in hardware is not directly visible to software. As a result, in this example, a metadata access operation references a data address and relies on hardware to perform the correct translation, that is, mapping, to access the metadata appropriately. However, depending on the thread, context, and/or software subsystem from which it originates, each metadata access operation may reference a separate abstract address space. Memory can therefore hold metadata for data items in a manner transparent to software. When hardware detects a metadata access operation, either through an explicit operation code (the opcode of an instruction) or through decoding of an instruction into a metadata access micro-operation, hardware performs the necessary translation of the data address referenced by the access operation in order to access the metadata.
As shown in this example, a program may include separate operations (for example, a data access operation and a metadata access operation) that reference the address of the same data item (such as data items 216 and 316 of Figs. 2-3), and hardware may map those accesses to different address spaces, such as a physical address space and an abstract address space. In some embodiments, the ISA may be extended with instructions to load, store, test, and set metadata for a given virtual address, MDID, compression ratio, and operand width. Any of these parameters may be an explicit instruction operand, may be encoded in the opcode, or may be obtained from a separate control register. An instruction may combine a metadata load/store operation with other operations, for example loading some data, testing certain bits of that data, and setting a condition code for a subsequent conditional jump. Instructions may also flush all metadata, or only the metadata for a specific MDID. A number of illustrative metadata access operations are listed below. Note that some of the illustrative instructions refer to a specific 64x compression ratio; similar instructions may be used for different compression ratios and for uncompressed metadata, but they are not specifically disclosed.
Metadata bit test and set (MDLT)
The metadata load and test instruction (MDLT) has 2 arguments: the data address with which the metadata is associated (as the source operand) and a register (the destination operand), into which the byte, word, double word, quad word, or other size of metadata containing the bit is written. The value of the tested metadata bit is written into this register. Programmers should not assume any knowledge of the data stored in the destination register by the MDLT instruction, and should not manipulate this register. The register is used only as the source operand of a metadata store and set instruction (MDSS) for the same address. In one embodiment, the MDLT instruction combines the test and set operations but squashes the set operation if the test succeeds.
Metadata store and set (MDSS)
The metadata store and set instruction (MDSS) has 2 arguments: the data address with which the metadata is associated and a register (the source operand), from which the byte, word, double word, quad word, or other size of metadata containing the bit is stored to memory. The MDSS instruction sets the correct bit in the value from its source operand.
Metadata store and reset (MDSR)
The MDSR instruction has 2 source arguments: the data address with which the metadata is associated (as a source operand) and a register (a source operand), from which the byte, word, double word, quad word, or other size of metadata containing the bit is reset. The MDSR instruction resets the correct bit in the value from its source operand.
The metadata address is determined from the referenced data address. Examples of determining a metadata address are included in the abstract address translation and multiple abstract address spaces sections above. Note, however, that the translation may incorporate (that is, be based on) the data-to-metadata compression ratio, so that metadata is stored separately for each compression ratio.
Test metadata (CMDT)
Table B: Illustrative embodiment of a test metadata operation
The CMDT instruction converts the memory data address into a memory metadata address using an implementation-dependent compression mapping function, and tests whether the metadata bit corresponding to that memory metadata address is set. As an example, the compression ratio CR is 1 bit to 8 bytes. The metadata address calculation incorporates a context ID from the MDCR register to provide a unique MD set for each individual context ID, addressing MDBLK[CR][MDCR.MDID[MDIDnumber]].META. The instruction aligns the address "mem" to the specified data size, thereby forcing alignment. The instruction tests whether the metadata is set.
Exemplary pseudocode for CMDT is included below (the ZF flag is set to indicate a metadata value of 0; all other flags are cleared):
Compressed metadata store (CMDS)
Table C: Illustrative embodiment of a store metadata operation
The CMDS instruction converts the memory data address into a memory metadata address using an implementation-dependent compression mapping function. The compression ratio is 1 bit of metadata to 8 bytes of data. The imm8 value is encoded as follows: bit 0 → MDvalue, the value to store into the MD; bits 7:1 → reserved, not used.
Exemplary pseudocode associated with CMDS is included below:
Implementation note: the instruction performs a read, set-bit, write operation on the metadata.
Flags affected: none
Protected mode and compatibility mode exceptions
#UD if CR4.OSTM[bit 15] = 0
No #PF
64-bit mode exceptions
#GP(0) if the memory address is in non-canonical form.
Compressed metadata clear (CMDCLR)
Table D: Illustrative embodiment of a clear metadata operation
The CMDCLR instruction resets all MDBLK[CR][MDCR.MDID[MDIDnumber]].META corresponding to any data within the span of MBLK(mem).
Exemplary pseudocode associated with CMDCLR is included below:
Implementation note: in a first implementation that supports a 64:1 CR, 1 byte will be cleared.
Flags affected: none
Protected mode and compatibility mode exceptions
#UD if CR4.OSTM[bit 15] = 0
64-bit mode exceptions: #GP(0) if the memory address is in non-canonical form.
Next, in flow 510, the metadata address is determined from the data address referenced in the metadata access operation, based on a compression ratio, processing element ID, context ID, MDID, arbitrary value, operand size, and/or other values related to abstract address space translation. Any of the methods described above (for example, no translation of the data address, a normal translation of the data address, or a translation of the data address to a separate abstract address in combination with ID values) may be used to obtain the appropriate metadata address.
In addition, as noted above, in some cases a version of a test, set, clear, or other instruction is provided that allows one thread or metadata context to test, set, or clear the metadata of another thread or metadata context. As a result, translation to a metadata address may include modifying the address, such as by applying a mask, so that an access from one thread or context ID can reach another thread or context ID.
In flow 515, the metadata referenced by the metadata address is accessed. In the normal case, the disjoint location of the metadata associated with the local requesting thread or context ID is accessed and the appropriate operation, such as a test, set, or clear, is performed. In the second case, however, as described above, the metadata of another thread or context ID may also be accessed in this flow.
Abstraction
An embodiment of an abstraction for software is included here. A given CR is a power of 2 that indicates how many data bits map to one bit of metadata. Which CR values (if any) may be used is implementation defined. CR > 1 denotes compressed metadata. CR = 1 denotes uncompressed metadata.
MDBLK[CR][*] is ceil(CR/8) bytes in size and is naturally aligned. An MDBLK is associated with physical data rather than with its linear virtual address. All valid physical addresses A with the same value of floor(A / MDBLK[CR][*]_SIZE) designate the same set of MDBLKs.
For a given CR, there may be any number of distinct MDIDs, each designating a unique instance of metadata. The metadata for a given CR and MDID is distinct from the metadata for any other CR or MDID. For example, for Thd#0, assuming addr is QWORD-aligned, the metadata block referenced by MDBLK[CR=64][MDID=3](addr) is the same as MDBLK[CR=64][MDID=3](addr+7), but it is of course distinct from MDBLK[CR=64][MDID=4](addr) and MDBLK[CR=512][MDID=3](addr).
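The identity rule stated above, ceil(CR/8) bytes per MDBLK and floor(A / size) as the block index, may be checked with a short sketch. Representing a block as a (CR, MDID, index) tuple is an assumption made purely for illustration.

```python
from math import ceil

def mdblk(cr, mdid, addr):
    """Identify a metadata block by (CR, MDID, block index)."""
    size = ceil(cr / 8)           # MDBLK size in bytes
    return (cr, mdid, addr // size)

addr = 0x2000  # a QWORD-aligned address chosen for the example
same_block = mdblk(64, 3, addr) == mdblk(64, 3, addr + 7)
```

For CR = 64 the block size is 8 bytes, so addr and addr+7 fall in the same block, whereas a different MDID or a different CR always yields a different block identity.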
A given implementation may support multiple concurrent contexts, where the number of contexts depends on the CR and on configuration information specific to the particular system of which the processor is a part. For uncompressed metadata, there is a QWORD of metadata for each QWORD of physical data.
Metadata is interpreted only by software. Software may set, reset, or test the META of a specific MDBLK[CR][MDID], reset the META of all MDBLK[*][*] of all Thds, or reset the META of all MDBLK[CR][MDID] of all Thds that may intersect a given MBLK(addr).
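The software-visible operations listed above may be modeled as follows. This is a hedged sketch only: the 64-byte MBLK size is an assumption (the size of MBLK is not stated here), and the dictionary layout and function names are hypothetical.

```python
from math import ceil

MBLK_SIZE = 64   # assumed size in bytes of the memory block MBLK(addr)
meta = {}        # (thd, cr, mdid, block index) -> META bit

def _key(thd, cr, mdid, addr):
    return (thd, cr, mdid, addr // ceil(cr / 8))

def md_set(thd, cr, mdid, addr):
    meta[_key(thd, cr, mdid, addr)] = 1

def md_reset(thd, cr, mdid, addr):
    meta.pop(_key(thd, cr, mdid, addr), None)

def md_test(thd, cr, mdid, addr):
    return meta.get(_key(thd, cr, mdid, addr), 0)

def reset_all():
    """Reset the META of every MDBLK[*][*] of every thread."""
    meta.clear()

def reset_intersecting(cr, mdid, addr):
    """Reset META of every thread's MDBLK[cr][mdid] overlapping MBLK(addr)."""
    size = ceil(cr / 8)
    lo = addr - addr % MBLK_SIZE
    hi = lo + MBLK_SIZE
    for k in [k for k in meta if k[1:3] == (cr, mdid)
              and k[3] * size < hi and (k[3] + 1) * size > lo]:
        del meta[k]
```

The intersecting reset crosses thread boundaries but stays within one CR and MDID, matching the third operation in the list above.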
Metadata is lossy. Any META property of a Thd may spontaneously reset to 0, producing a metadata loss event.
Forced metadata values
Referring to Fig. 6, an embodiment of providing hardware support for forced metadata values is shown. An STM typically ensures consistency between memory access operations by using access barriers. For example, before a memory access to a data item, a metadata location or lock location associated with the data item is checked to determine whether the data item is available. Other possible barrier operations include acquiring a lock (such as a read lock, a write lock, or another lock) on the data item in a metadata or lock location; logging/storing a version of the data item in the transaction's read or write set; determining whether the transaction's read set is still valid at that point; buffering or backing up the value of the data item; setting a monitor; updating a filter value; and any other transactional operation.
Typically, however, subsequent accesses to the same data item within a transaction incur the overhead of executing the associated transactional barrier each time the data item is accessed. To illustrate, three writes to address A within a transaction result in three separate executions of the write barrier, each attempting to acquire a write lock on address A. However, the lock on address A was already acquired by the write barrier executed at the first transactional write, so the two subsequent executions of the write barrier before the last two transactional writes are unnecessary: the lock on address A need not be acquired again.
Therefore, in one embodiment, hardware holds filter values to accelerate execution associated with these barriers. A filter value may be included in a cache as annotation bits, such as for read and write monitors, or held in a metadata location in an abstract address space, as described previously. Using the example above, when the first write barrier is encountered, it updates the write filter value from an un-accessed value to an accessed value, indicating that the write barrier for address A has already been encountered within the transaction. When the two subsequent transactional writes occur in the transaction, the write filter value for address A is therefore checked before execution is directed to the write barrier. Here, the filter value holds the accessed value, which indicates that the write barrier does not need to be executed because it has already been executed within the transaction. As a result, execution is not directed to the write barrier for the last two write operations. In other words, the filter value accelerates transaction execution: compared with the previous example, which does not use a filter, execution of the write barrier for the last two accesses is elided.
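The effect of the write filter on the three-write example may be illustrated with the following software sketch. The set used as the filter, the list that records barrier executions, and the function names are all hypothetical stand-ins; in the described embodiment the filter is held by hardware rather than by a software set.

```python
write_filter = set()   # addresses whose write filter holds the accessed value
barrier_calls = []     # records each execution of the (expensive) write barrier
memory = {}

def write_barrier(addr):
    barrier_calls.append(addr)   # stand-in for acquiring the write lock
    write_filter.add(addr)       # un-accessed value -> accessed value

def tx_write(addr, value):
    if addr not in write_filter:  # the filter test precedes the barrier
        write_barrier(addr)
    memory[addr] = value

for v in (1, 2, 3):               # three transactional writes to address A
    tx_write(0xA, v)
```

Only the first write executes the barrier; the second and third find the accessed value in the filter and proceed directly to the store.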
Note that a read filter for loads/reads, an undo filter for undo operations, and a compound filter for general filter operations may be used in the same manner as the write filter for write/store operations above.
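The filter behaviour described above can be sketched with a small software model. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the filter is modelled as a Python set rather than as cache annotation bits or metadata in the abstract address space.

```python
class FilteredWriteBarrier:
    """Toy model: a per-address write filter lets repeat writes skip the barrier."""

    def __init__(self):
        self.write_filter = set()   # addresses whose filter holds the "accessed" value
        self.locks_held = set()     # write locks acquired by the transaction
        self.barrier_calls = 0      # how many times the full barrier actually ran

    def write_barrier(self, addr):
        # Full (slow) barrier: acquire the write lock, then mark the filter.
        self.barrier_calls += 1
        self.locks_held.add(addr)
        self.write_filter.add(addr)

    def tx_write(self, memory, addr, value):
        # Test the filter first; vector to the barrier only on the first access.
        if addr not in self.write_filter:
            self.write_barrier(addr)
        memory[addr] = value


memory = {}
tx = FilteredWriteBarrier()
for v in (1, 2, 3):              # three transactional writes to address A
    tx.tx_write(memory, "A", v)
print(tx.barrier_calls)          # the barrier ran for the first write only
```

A read, undo, or compound filter could be modelled the same way with one set per filter kind.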
Another concept associated with transactional barriers is strong versus weak atomicity, which concerns the isolation of transactional operations from non-transactional operations. Just as a transactional write to a memory location that another transaction has loaded is a potential conflict, a transactional write to a memory location that has been loaded non-transactionally is a potential conflict that may cause the non-transactional load to use invalid data. In a weak atomicity system, no barriers, or only minimal barriers, are inserted at non-transactional operations, so a weak atomicity system runs the risk of invalid execution. In contrast, in a strong atomicity system, transactional barriers are also inserted at non-transactional operations; this provides protection and isolation between transactional and non-transactional operations, but at the cost of executing a transactional barrier at every non-transactional operation.
Therefore, in one embodiment, the filters described above may be combined with strong atomicity barriers at non-transactional operations to support different modes of operation: strong atomicity and weak atomicity. To illustrate, a simplified exemplary embodiment is shown in Fig. 6. Here, metadata 610 is held in hardware for data 605, as discussed above. A metadata access 600 is received to access metadata 610. In one embodiment, the metadata access includes a test metadata operation to test a filter, such as a read filter, write filter, undo filter, or compound filter.
The test metadata operation that tests a filter may originate from a transactional or a non-transactional access operation. In one embodiment, when compiling application code, a compiler inserts test filter operations inline in the application code as a condition for calling the transactional barrier at both transactional and non-transactional accesses. Therefore, within a transaction, the filter operation is executed before the barrier is called, and if it returns success, the call to the transactional barrier is not executed, which provides the acceleration discussed above.
However, for non-transactional operations, in one embodiment the hardware can operate in a weak atomicity mode, in which transactional barriers are not executed at non-transactional operations, and in a strong atomicity mode, in which transactional barriers are executed.
A mode of operation, or control 625, may be set in a metadata control register (MDCR) 615, which may be combined with the MDCR described above for holding a version of the MDID or may be a separate control register. In another embodiment, control 625 for the mode of operation may be held in a general transaction control register or status register. Here, a first execution mode is the strong atomicity mode, in which transactional barriers are executed at non-transactional operations. In this case, control 625 holds a first value, such as 00, to indicate the strong atomicity, non-transactional mode of operation. In response, logic 620, illustrated as an exemplary multiplexer, selects the hardware-maintained metadata 610 associated with data address A as the metadata value to be provided to destination register 650 for metadata access 600. In essence, in strong atomicity mode, barriers are accelerated based on the metadata actually held in hardware. Alternatively, during a second execution mode, such as the weak atomicity, non-transactional mode indicated by control 625 holding a second value (such as 01), a fixed or forced value from the MDCR, rather than hardware-maintained metadata 610, is provided to destination register 650 in response to metadata access 600.
In essence, in weak atomicity mode, the forced value is provided to destination register 650 in response to test filter operation 600 to ensure that the filter test always succeeds, so that the transactional barrier is not called before the non-transactional memory access. Note that this description assumes the test filter operation returns a Boolean value indicating that the filter test succeeded (the barrier is not executed) or failed (the barrier is executed). As a result, the same filter software that accelerates transactions by eliding barriers based on filter values is used to provide two modes of operation: one in which all barriers at non-transactional operations are elided (weak atomicity), and a second in which barriers at non-transactional operations are executed or accelerated based on hardware-maintained metadata (strong atomicity). In another embodiment, a different forced value may be provided for each mode. Here, in strong atomicity mode the forced value ensures that the test filter operation fails, so the barrier is always executed, while in weak atomicity mode the forced value ensures that the test filter operation succeeds, so the barrier is not executed.
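The selection made by logic 620 can be sketched as follows. This is a minimal behavioural model, not the hardware itself: the function names are hypothetical, the control encodings 00 and 01 are the example values given in the text, and the assumption that the barrier marks the filter after it runs is made only for illustration.

```python
STRONG_NON_TX = 0b00  # example control 625 value for strong atomicity, non-transactional
WEAK_NON_TX = 0b01    # example control 625 value for weak atomicity, non-transactional

def read_filter_value(control, hw_metadata, addr, forced_value=True):
    """Model of the multiplexer: pick what lands in the destination register.

    True means the filter test succeeds, so the barrier is skipped.
    """
    if control == WEAK_NON_TX:
        return forced_value               # forced MDCR value; hardware metadata ignored
    return hw_metadata.get(addr, False)   # hardware-maintained metadata for addr

def non_tx_access(control, hw_metadata, addr, trace):
    """A non-transactional access guarded by the inlined filter test."""
    if not read_filter_value(control, hw_metadata, addr):
        trace.append("barrier")           # strong atomicity barrier runs
        hw_metadata[addr] = True          # assumed: the barrier marks the filter
    trace.append("access")

weak_trace, strong_trace = [], []
for _ in range(2):
    non_tx_access(WEAK_NON_TX, {}, "A", weak_trace)
strong_md = {}
for _ in range(2):
    non_tx_access(STRONG_NON_TX, strong_md, "A", strong_trace)
```

In this model the same inlined test serves both modes: only the value fed to the destination register changes.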
Although providing a forced or fixed value from a control register, such as MDCR 615, based on control information, such as control 625, has been described in terms of supplying either a fixed/forced value or a metadata value depending on the mode of operation, a forced or fixed value may be provided for any general use of metadata, such as allowing constant behavior of the data in use for debugging, and general monitoring of memory accesses that can be enabled on demand.
Turning to Fig. 7, an embodiment of a flow diagram for a method of accelerating non-transactional operations while maintaining atomicity in a transactional environment is depicted. In flow 705, a metadata (MD) access operation referencing a data address is encountered. As a specific illustrative example, the MD access operation includes a test operation previously inserted inline with application code by a compiler, so that the transactional barrier at a non-transactional memory access is elided when the test returns one value (success) and executed when the test returns a second value (failure). However, a test MD operation is not so limited and may include any test operation that returns a Boolean success or failure value.
In flow 710, the mode of operation is determined. Here, examples of modes of operation are transactional or non-transactional combined with strong or weak atomicity. Therefore, one or two separate registers may hold the mode of operation, with a first bit indicating transactional or non-transactional and a second bit indicating the strong atomicity or weak atomicity mode of operation.
If the mode of operation is transactional, or non-transactional with strong atomicity, the hardware-maintained metadata value is provided to the metadata access operation: the hardware-maintained value is placed in the destination register specified by the MD access operation. In contrast, if the mode of operation is non-transactional with weak atomicity, the forced MDCR fixed value, rather than the hardware-maintained MD value, is provided to the MD access operation. As a result, during strong atomicity mode barriers are accelerated, or not, based on the hardware-maintained MD value, while in weak atomicity mode barriers are accelerated based on the forced MDCR value.
Efficient transition to buffered and monitored states
Turning next to Fig. 8, an embodiment of a flow diagram for a method of efficiently transitioning a block of data to a buffered and monitored state before committing a transaction is shown. As described above, memory blocks, such as cache lines holding data items or metadata, may be buffered and/or monitored. For example, the coherency bits of a cache line include a representation of the buffered state, and attribute bits of the cache line indicate whether the cache line is unmonitored, read monitored, or write monitored.
In some embodiments, a cache line is buffered but not monitored, which means that the data held in the cache line is lossy and that conflicts to the cache line are not detected, because no monitoring is applied. For example, data that is local to a transaction and is not to be committed, such as metadata, may be held in the buffered and unmonitored state.
To detect conflicts between buffered data and writes to the same address, read monitoring may be applied to the data. The cache line then moves to the buffered and read monitored state; however, to reach this state, a read request is sent to external processing elements to force all other copies to transition to the shared state. These external read requests may cause conflicts with another processing element that holds write monitoring on the same block or cache line.
Similarly, to detect conflicts between buffered data and reads of the same memory block, write monitoring is applied to the cache line. The line is then moved to the buffered and write monitored state, which is achieved by sending a read-for-ownership request to other processing elements to force all other copies to transition to the invalid state. In the same way, conflicts with any processing element that holds read or write monitoring on the same memory block are detected.
However, to minimize transactional conflicts, memory blocks that a transaction needs to update but will not ultimately commit are held in the buffered and unmonitored state, as described above. If it is then determined that a block held in the buffered and unmonitored state is to be committed, in one embodiment an efficient path from the buffered and unmonitored state to a committable state is provided, as shown in Fig. 8.
As an example, in flow 805, a buffered update to a memory block (that is, to the cache line holding the block) is received. Before the buffered update, or concurrently with it, read monitoring is applied to the block. For example, the read attribute of the cache line is set to a read monitored value to indicate that the block is read monitored. However, to apply read monitoring, a read request is first sent out to other processing elements in flow 815. In response to receiving this read request, in flow 820 the other processing elements either detect a conflict because they hold the line in a write monitored state, or transition their copies to the shared state. In flow 825, if there is no conflict, the cache line is transitioned to the buffered and read monitored state: the cache line's coherency bits are updated to the buffered coherency state, and the read monitor attribute is set.
In flow 830, conflicting writes to the cache line are detected based on the read monitoring. In one embodiment, the read attribute is coupled to snoop logic, so that an external read-for-ownership request to the cache line detects a conflict with the read monitor set on the cache line.
Subsequently, when the block is to be committed as part of the transactional state in flow 835, write monitoring is applied in flow 840. Here, in flow 845, a read-for-ownership request is sent to the other processing elements, which in flow 850 either detect a conflict in response to holding the cache line in a read or write monitored state, or transition their copies to the invalid state. As a result, detecting conflicts at the read-for-ownership request allows any conflict to be detected at this point, which essentially places the line in a committable state.
Therefore, transitioning a buffered and unmonitored block to a committable state in two stages, in flow 810 and flow 840, is advantageous. Deferring the acquisition of ownership by going through a read monitoring stage and then a write monitoring stage allows multiple concurrent transactions to update the same block while reducing conflicts among those transactions. If for any reason a transaction does not reach the commit stage, updating the block in the buffered and read monitored manner does not cause another transaction that does reach the commit stage to abort unnecessarily. In addition, deferring the acquisition of exclusive ownership of a block until the commit stage is one way to obtain higher concurrency between threads without sacrificing data validity.
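The two-stage transition can be illustrated with a toy model of two processing elements sharing one cache line. This is a sketch under stated assumptions: the class and function names are hypothetical, a return value of False simply means that a conflict was detected, and how the conflict is then resolved is left open, as it is in the text.

```python
class LineCopy:
    """One processing element's copy of a cache line (toy model)."""

    def __init__(self):
        self.buffered = False
        self.read_mon = False
        self.write_mon = False
        self.valid = True

def stage1_read_monitor(me, others):
    """Buffered update: send a read request; only a remote write monitor conflicts."""
    if any(o.write_mon for o in others):
        return False
    me.buffered = True
    me.read_mon = True
    return True

def stage2_write_monitor(me, others):
    """Commit time: send read-for-ownership; any remote monitor conflicts."""
    if any(o.valid and (o.read_mon or o.write_mon) for o in others):
        return False
    for o in others:
        o.valid = False              # remote copies transition to invalid
    me.write_mon = True
    return True

def abandon(me):
    """A transaction that never reaches commit drops its buffering and monitors."""
    me.buffered = me.read_mon = me.write_mon = False

p0, p1 = LineCopy(), LineCopy()
both_buffered = stage1_read_monitor(p0, [p1]) and stage1_read_monitor(p1, [p0])
blocked = stage2_write_monitor(p0, [p1])    # p1 still read monitors the line
abandon(p1)                                 # p1 gives up before its commit stage
committable = stage2_write_monitor(p0, [p1])
```

Both elements can buffer updates to the same line concurrently in stage 1, and a conflict surfaces only when one of them requests ownership while the other still monitors the line.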
Table E below shows an embodiment of the conflict conditions between two processing elements, P0 and P1. For example, a line held by P1 in the buffered and read monitored state (as indicated by the R-B column) conflicts with P0 holding the cache line in any write monitored state (as indicated by -W-, RW-, WB, and RWB), as denoted by an x in the intersecting cells.
Table E: Embodiment of conflict conditions between two processing elements
In addition, Table F below shows the loss of the associated properties in processing element P1 in response to the operations listed under P0. For example, if P1 holds a line in the buffered and read monitored state, as indicated by the R-B column, and a store or a set write monitor operation occurs on P0, then P1 loses both read monitoring and buffering of the line, as indicated by the x-x at the intersection of the store/set WM row and the R-B column.
Table F: Embodiment of loss of properties as a result of operations
Branch instruction for conflict or loss of transactional information (JLOSS)
Turning to Fig. 9, an embodiment of hardware supporting a loss instruction that jumps to a destination label based on a status value in a transaction status register is shown. In one embodiment, hardware provides an accelerated way to check a transaction's consistency. As an example, hardware may support consistency checking by providing mechanisms that track the loss of monitored or buffered data from a cache (the eviction of buffered or monitored lines) or that track potentially conflicting accesses to such data (monitors detecting conflicting snoops, such as read-for-ownership requests to monitored lines).
In addition, in one embodiment, hardware provides architectural interfaces that allow software to access these mechanisms based on the status of monitored or buffered data. Two such interfaces are: (1) instructions to read or write a status register, which allow software to poll the register explicitly during execution; and (2) an interface that allows software to set up a handler that is invoked whenever the status register indicates a potential loss of consistency.
In another embodiment, hardware supports a new instruction, referred to as JLOSS, that performs a conditional branch based on the status of hardware-monitored or buffered data. The JLOSS instruction branches to a label if hardware detects the potential loss of any monitored or buffered data from the cache, or detects a potential conflict to any such data. The label may be any destination, such as the address of a handler or other code to be executed as a result of the detected loss of data or conflict.
As an illustrative embodiment, Fig. 9 depicts decoder 910, which recognizes JLOSS as part of the processor's ISA and decodes the instruction so that the processor's logic can perform the conditional branch based on transaction status. As an example, the transaction status is held in transaction status register (TSR) 915. The transaction status register may represent the status of a transaction, such as when hardware detects a loss of data or a conflict, referred to herein as a loss event. To illustrate, a conflict flag in TSR 915 is set when a monitor indicates that an address is monitored and a snoop to the monitored address is detected; the conflict flag in TSR 915 indicates that a conflict has been detected. Similarly, a loss flag is set upon a loss of data, such as the eviction of a line that holds transactional data or metadata.
Thus, when decoded and executed, JLOSS tests the status register flags, and if a loss event (a loss and/or a conflict) has occurred, logic 925 provides the label referenced by JLOSS to execution resources 930 as the jump destination address. As a result, with a single instruction, software can determine the status of a transaction and, based on that status, direct execution to the label specified by that same instruction. Because JLOSS checks for consistency, reporting false conflicts is acceptable: JLOSS may conservatively report that a conflict has occurred.
In one embodiment, software such as a compiler inserts JLOSS instructions in program code to poll for consistency. Although JLOSS may be used inline with main application code, JLOSS instructions are typically used within read and write barriers to determine consistency on demand, and such barriers are usually provided in libraries. Therefore, execution of program code may include a compiler inserting JLOSS in code, executing JLOSS from program code, or any other form of inserting or executing the instruction. Polling with JLOSS is expected to be much faster than an explicit read of the status register, because the JLOSS instruction requires no additional register: no destination register is needed to receive the status information of an explicit read. There are multiple embodiments of the instruction, in which the condition checked for consistency is either provided explicitly in the instruction or provided implicitly in a separate control register.
As an example, transaction status register 915 or another storage element holds specific conflict and loss status information, such as whether a read monitored location has been written by another agent (a read conflict), whether a write monitored location has been read or written by another agent (a write conflict), and the loss of physical transactional data or of metadata. Therefore, different versions of the JLOSS instruction may be used. For example, a JLOSS.rm <label> instruction branches to its label if any read monitored location may have been written by another agent. A hardware accelerated STM (HASTM) may use this JLOSS.rm instruction to accelerate consistency checking: whenever read set consistency must be ensured (such as after each transactional load in a native code TM system), JLOSS.rm quickly checks for conflicting updates to the read set. In this case, the read set may be validated with JLOSS in the read barrier, so the JLOSS instruction is inserted in the barrier within a library, or inline with main application code after load operations. Similar to the JLOSS.rm instruction for detecting writes to read monitored locations, a JLOSS.wm instruction may be used to detect any read or write to a write monitored location. As another example, in a processor that can buffer locations, a JLOSS.buf instruction may be used to determine whether buffered data has been lost and, as a result, jump to the specified label.
The pseudocode below, labeled Pseudo Code A, shows a native code STM read barrier that provides read set consistency and uses JLOSS. The setrm(void* address) function sets read monitoring on the given address, and the jloss_rm() function is an intrinsic for the JLOSS instruction that returns true if any conflicting access to a read monitored location may have occurred. This pseudocode monitors the loaded data, but could instead monitor the transaction records (ownership records). An instruction that combines setting the read monitor with loading the data may be used, for example a movxm instruction that loads and monitors the data. This instruction may also be used in a read barrier that performs filtering in addition to monitoring, and in an STM system that uses only hardware monitoring for read set validation (an STM system that performs no software read logging and no software validation).
Pseudo Code A: Native code STM read barrier for an in-place update STM with optimistic reads
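Since the listing for Pseudo Code A is not reproduced here, the following is only a rough sketch of the kind of read barrier the paragraph describes, modelled in Python. The names setrm and jloss_rm come from the text; the HwModel class, the remote_write helper, the TxAbort exception, and the choice to abort on a reported loss are assumptions made for illustration.

```python
class TxAbort(Exception):
    """Raised when the read set may no longer be consistent (assumed policy)."""

class HwModel:
    """Toy stand-in for hardware read monitors and the TSR loss flag."""

    def __init__(self):
        self.read_monitored = set()
        self.loss_rm = False           # models a read monitor loss/conflict flag

    def setrm(self, addr):
        self.read_monitored.add(addr)  # set read monitoring on addr

    def jloss_rm(self):
        return self.loss_rm            # true if any monitored location may have been written

    def remote_write(self, memory, addr, value):
        # A write by another agent; it trips the flag if addr is read monitored.
        memory[addr] = value
        if addr in self.read_monitored:
            self.loss_rm = True

def stm_read(hw, memory, addr):
    hw.setrm(addr)            # monitor first, so later remote writes are caught
    value = memory[addr]      # load the data
    if hw.jloss_rm():         # some monitored location was possibly written
        raise TxAbort(addr)
    return value

hw = HwModel()
memory = {"x": 1, "y": 2}
first = stm_read(hw, memory, "x")     # consistent read
hw.remote_write(memory, "x", 10)      # another agent updates a monitored location
try:
    stm_read(hw, memory, "y")
    aborted = False
except TxAbort:
    aborted = True
```

The second read aborts even though "y" itself was untouched, because the earlier read of "x" is no longer known to be consistent with it.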
Similarly, an STM system that does not maintain read set consistency, such as an STM for managed code, can avoid infinite loops or other incorrect control flow (exceptions) caused by inconsistency by inserting JLOSS.rm instructions at loop back-edges or at other critical control flow points, such as instructions that may raise exceptions.
The pseudocode below, labeled Pseudo Code B, shows another native code read barrier that provides consistency. This version of the TM system uses a cache resident write set, which uses buffered updates for writes inside a transaction. A read from a location that was previously buffered but has since been lost causes inconsistency, so to maintain consistency this read barrier avoids reading from any lost buffered location. The COMMIT_LOCKING flag is true if the STM uses commit time locking for buffered locations. When commit time locking is not used, reads from previously locked locations are checked with jloss_buf(); otherwise, all reads are checked with jloss_buf().
Pseudo Code B: Native code STM read barrier for an in-place update STM
As described above, a TM system may combine read monitoring with buffering and write monitoring, and therefore also checks for conflicts to monitored or buffered lines in order to maintain consistency. To accommodate such systems, different embodiments may also provide JLOSS flavors that branch on different logical combinations of monitoring and buffering events, such as JLOSS.rm.buf (a conflict on a read monitored or buffered location), JLOSS.rm.wm (a conflict on a read or write monitored location), or JLOSS.* (a conflict on a read monitored, write monitored, or buffered location).
In an alternative embodiment, the architectural interface decouples the JLOSS instruction from the condition on which it branches by allowing software to set that condition (a conflict on read or write monitored lines, or on buffered lines) in a separate control register. This embodiment requires only a single JLOSS instruction encoding and can scale to support an expanded set of events on which JLOSS should branch.
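The two ways of supplying the branch condition can be contrasted in a short model. This is an illustrative sketch only: the TSR bit layout, the jloss_cond control register name, and the use of integers for the label and fall-through addresses are all assumptions, while the variant names are the ones used in the text.

```python
LOSS_RM, LOSS_WM, LOSS_BUF = 0b001, 0b010, 0b100   # hypothetical TSR bit layout

# Condition encoded in the instruction variant itself.
VARIANTS = {
    "JLOSS.rm": LOSS_RM,
    "JLOSS.wm": LOSS_WM,
    "JLOSS.buf": LOSS_BUF,
    "JLOSS.rm.buf": LOSS_RM | LOSS_BUF,
    "JLOSS.rm.wm": LOSS_RM | LOSS_WM,
    "JLOSS.*": LOSS_RM | LOSS_WM | LOSS_BUF,
}

def jloss(tsr, mask, label, next_pc):
    """Return the next program counter: label on a selected loss event, else fall through."""
    return label if tsr & mask else next_pc

def jloss_variant(name, tsr, label, next_pc):
    return jloss(tsr, VARIANTS[name], label, next_pc)

class Core:
    """Decoupled form: one JLOSS encoding, condition held in a control register."""

    def __init__(self):
        self.tsr = 0
        self.jloss_cond = 0      # hypothetical control register written by software

    def jloss(self, label, next_pc):
        return jloss(self.tsr, self.jloss_cond, label, next_pc)

tsr = LOSS_BUF                                     # a buffered line was lost
pc_rm = jloss_variant("JLOSS.rm", tsr, 100, 4)     # not selected: falls through
pc_rm_buf = jloss_variant("JLOSS.rm.buf", tsr, 100, 4)

core = Core()
core.tsr = LOSS_BUF
core.jloss_cond = LOSS_WM
pc_before = core.jloss(100, 4)
core.jloss_cond |= LOSS_BUF                        # software widens the condition
pc_after = core.jloss(100, 4)
```

With the control register form, adding a new event only needs a new mask bit rather than a new instruction encoding.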
Turning to Fig. 10, an embodiment of a flow diagram for a method of executing a loss instruction to jump to a destination label based on the conflict or loss of specific information is depicted. In one embodiment, a JLOSS instruction is received in flow 1005. As described above, the JLOSS instruction may be inserted by a programmer or a compiler in main code, such as after load operations to ensure read set consistency, or in barriers, such as read or write barriers. The JLOSS instruction discussed above, and its variants, are in one embodiment recognizable as part of the processor's ISA. Here, decoders are capable of decoding the opcode of the JLOSS instruction.
In flow 1010, it is determined whether a conflict or loss of information has occurred. In one embodiment, the type of conflict or loss depends on the variant of the JLOSS instruction. For example, if the received JLOSS instruction is a JLOSS.rm instruction, it is determined whether a read monitored line has been conflictingly accessed by an external write. However, as described above, any variant of JLOSS may be received, including a JLOSS instruction that allows the user to specify the conditions in a control register.
Therefore, once the conditions are established, whether from a control register or from the type of JLOSS instruction, it is determined whether those conditions are satisfied. As a first example, information in a transaction status register (for example, TSR 915) is used to determine whether the conditions have been met. Here, TSR 915 may include a read monitor conflict flag, which is set by default to a no-conflict value and is updated to a conflict value to indicate that a conflict has occurred. However, a status register is not the only way to determine whether a conflict has occurred; any known method for determining a loss or conflict may be used.
In response to no conflict being detected, such as when the read monitor conflict flag in TSR 915 is still set to its default value, false is returned in flow 1025 and execution continues normally. However, if a conflict or loss is detected, such as when the read monitor conflict flag is set, JLOSS returns true in flow 1015, and in flow 1020 execution is directed to jump to the label defined by the received JLOSS instruction.
Hardware support for transactional memory commit
As discussed previously, hardware-supported transactions can accelerate software version management by buffering transactional writes in the cache without making them globally visible. In this case, a simple commit instruction may be used that makes buffered values visible to all processors but fails if any buffered line has been lost. However, hardware that also provides metadata that software can use for acceleration (such as filters for eliminating or filtering redundant barriers) may want the commit instruction to fail when hardware has detected any conflict. In addition, upon commit it may be desirable to clear various combinations of the information held in hardware for the transaction, such as metadata, monitors, and buffered lines.
Therefore, in one embodiment, hardware supports multiple forms of commit instruction, allowing the commit instruction to specify both the conditions for commit and the information to clear upon commit. Referring to Fig. 11, an embodiment of hardware supporting the general case of defining commit conditions and clear controls in the commit instruction is depicted.
As shown, commit instruction 1105 includes opcode 1110, which is recognizable as part of the processor's ISA: decoder 1115 is capable of decoding opcode 1110. In the example shown, opcode 1110 includes two parts: commit conditions 1111 and clear controls 1112. Commit conditions 1111 specify the conditions for committing the transaction, and clear controls 1112 specify the information to be cleared upon committing the transaction.
In one embodiment, each of the two parts includes four values: read monitor (RM), write monitor (WM), buffering (Buf), and metadata (MD). In essence, if any of these four values is set in part 1111 (that is, it holds a value indicating that the associated attribute or property is a commit condition), the corresponding property is a condition for commit. In other words, if the first bit of conditions 1111, corresponding to read monitor information, is set, then the loss of any read monitored data from monitors 1135 associated with the transaction results in an abort: no commit occurs because the condition specified by the commit instruction failed. Similarly, if a value in 1112 is set, the corresponding property is cleared upon commit. Continuing the example, if RM in part 1112 is set, the read monitor information in monitors 1135 for the transaction is cleared when the transaction commits. Therefore, in this example there may be four commit conditions and four clear controls, which yields 256 possible combinations as variants of the commit instruction. In one embodiment, by allowing the commit conditions to be specified in the opcode, hardware can support all of these variants. Nevertheless, a few variants are discussed below to further illustrate the different types of commit instructions and how they may be used.
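The general commit form, with four condition bits and four clear-control bits, can be sketched as below. This is a behavioural illustration under stated assumptions: the bit positions, the dictionary-based transaction state, and the decision to clear nothing when the commit fails are all hypothetical choices, not details taken from the text.

```python
RM, WM, BUF, MD = 0b0001, 0b0010, 0b0100, 0b1000   # hypothetical bit positions
NAMES = {RM: "read_monitors", WM: "write_monitors", BUF: "buffered", MD: "metadata"}

def commit(tx_state, lost, conditions, clears):
    """Commit if none of the properties named in conditions was lost.

    tx_state maps a property name to the set of lines tracked for the
    transaction; lost is a bitmask of properties that suffered a loss.
    Returns True on commit and False on abort. In this sketch nothing is
    cleared on abort (an assumption).
    """
    if lost & conditions:
        return False
    for bit, name in NAMES.items():
        if clears & bit:
            tx_state[name].clear()
    return True

def fresh_state():
    return {name: {"line0"} for name in NAMES.values()}

# Every (conditions, clears) pair is one variant of the commit instruction.
variants = [(c, k) for c in range(16) for k in range(16)]

state_a = fresh_state()
ok = commit(state_a, lost=RM, conditions=WM, clears=WM | BUF)   # RM loss is not a condition

state_b = fresh_state()
failed = commit(state_b, lost=RM, conditions=RM | WM, clears=WM | BUF)
```

Enumerating the two four-bit fields gives the 256 variants mentioned above.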
TXCOMWM
As a first example, the Txcomwm instruction is discussed. This instruction ends a transaction and makes all write monitored buffered data globally visible if no write monitored data has been lost (success); otherwise, if write monitored data has been lost, it fails. Txcomwm sets (or resets) a flag to indicate success (or failure). On success, Txcomwm clears the buffered state of all write monitored data. Txcomwm does not affect read or write monitoring state, which allows software to reuse that state in subsequent transactions; it also does not affect the state of locations that are buffered but not write monitored, which allows software to retain the information held in those locations. The pseudocode labeled Pseudo Code C below gives an algorithmic description of Txcomwm. When TSR.LOSS_WM is 0, the BF property of all write monitored buffered BBLKs is cleared atomically, and all such buffered data becomes visible to other agents. TCR.IN_TX is cleared. Buffered blocks lacking WM are unaffected and remain buffered. The CF flag is set upon completion. When TSR.LOSS_WM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeds and to 0 if it fails. The OF, SF, ZF, AF, and PF flags are set to 0.
Pseudo Code C: Embodiment of algorithmic operation for Txcomwm
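Because the listing for Pseudo Code C is not reproduced here, the following Python model is offered only as a sketch of the behaviour described in the preceding paragraph. The register and flag names (TSR.LOSS_WM, TCR.IN_TX, BF, WM, CF, OF, SF, ZF, AF, PF) come from the text; the dictionary layout of the CPU state and the global_memory argument are assumptions made for illustration.

```python
def txcomwm(cpu, global_memory):
    """Model of Txcomwm: commit write monitored buffered blocks, return the flags."""
    flags = {"OF": 0, "SF": 0, "ZF": 0, "AF": 0, "PF": 0}
    if cpu["TSR"]["LOSS_WM"] == 0:
        for addr, blk in cpu["cache"].items():
            if blk["BF"] and blk["WM"]:
                global_memory[addr] = blk["data"]   # data becomes visible to other agents
                blk["BF"] = False                   # buffered property is cleared
        flags["CF"] = 1                             # success
    else:
        flags["CF"] = 0                             # write monitored data was lost
    cpu["TCR"]["IN_TX"] = 0                         # the transaction ends either way
    return flags

def make_cpu(loss_wm):
    return {
        "TSR": {"LOSS_WM": loss_wm},
        "TCR": {"IN_TX": 1},
        "cache": {
            "A": {"BF": True, "WM": True, "data": 7},    # write monitored and buffered
            "B": {"BF": True, "WM": False, "data": 9},   # buffered only
        },
    }

mem_ok, cpu_ok = {}, make_cpu(0)
flags_ok = txcomwm(cpu_ok, mem_ok)

mem_fail, cpu_fail = {}, make_cpu(1)
flags_fail = txcomwm(cpu_fail, mem_fail)
```

In the success case block B stays buffered and invisible, which matches the statement that buffered blocks lacking WM are unaffected; monitor state is likewise left untouched.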
The pseudocode labeled Pseudo Code D below shows how an HASTM system may use the Txcomwm instruction to commit a transaction that uses hardware write buffering to avoid undo logging in an in-place update STM. The CACHE_RESIDENT_WRITES flag indicates this execution mode. An embodiment of HASTM is illustrated.
Pseudo Code D: Embodiment of pseudocode using the Txcomwm instruction
TXCOMWMRM
The txcomwmrm variant extends the Txcomwm instruction so that it also fails if any read monitored location has been lost. This variant is useful for transactions that use only hardware to detect read set conflicts. The pseudocode labeled Pseudo Code E below gives an algorithmic description of Txcomwmrm. When TSR.LOSS_WM and TSR.LOSS_RM are both 0, the BF property of all write monitored buffered BBLKs is cleared atomically, and all such buffered data becomes visible to other agents. TCR.IN_TX is cleared. Buffered blocks lacking WM are unaffected and remain buffered. The CF flag is set upon completion. When TSR.LOSS_WM or TSR.LOSS_RM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeds and cleared to 0 if it fails. The OF, SF, ZF, AF, and PF flags are set to 0.
Pseudo Code E: Embodiment of algorithmic description of Txcomwmrm
The next pseudocode, Pseudo Code F, shows a commit algorithm for an STM system that uses the txcomwmrm instruction, with hardware buffering transactional writes and detecting read set conflicts. The HW_READ_MONITORING flag indicates whether the algorithm uses only hardware for read set conflict detection.
Pseudo Code F: Embodiment of pseudocode using the txcomwmrm instruction
TXCOMWMIRMC
A third variant is shown in the algorithmic description of Pseudo Code F below. When TSR.LOSS_WM and TSR.LOSS_IRM are both 0, the BF property of all write monitored buffered BBLKs is cleared atomically, and all such buffered data becomes visible to other agents. RM, WM, and IRM, as well as TCR.IN_TX, are cleared. Buffered blocks lacking WM are unaffected and remain buffered. The CF flag is set upon completion. When TSR.LOSS_WM or TSR.LOSS_IRM is 1, the CF flag is cleared and TCR.IN_TX is cleared. The CF flag is set to 1 if the operation succeeds and cleared to 0 if it fails. The OF, SF, ZF, AF, and PF flags are set to 0.
Pseudo Code F: Embodiment of algorithmic description of the Txcomwmirmc instruction
With reference to Figure 12, show the embodiment of the flow chart of carrying out the method for submitting instruction to, wherein submit toInstruction definition submission condition and remove control. In flow process 1205, receive and submit instruction to. As above instituteState, compiler can will submit to instruction to be inserted in program code. As a specific illustrative realityExecute example, will calling of function of submission be inserted into main code, and such as being included in false code in the aboveIn those submission function be provided in storehouse; Compiler can also will submit to instruction to be inserted in storehouseSubmission function in.
After receiving submission instruction, decoder can be to submitting to instruction carry out decoding. According to decodingInformation is determined by the condition of submitting to the command code of instruction to specify in flow process 1210. As mentioned above, behaviourWork code can arrange some mark and other mark of resetting uses those conditions with instruction for submission.If satisfied condition, return to vacation, and abort transaction respectively. But, if for carryingThe condition of handing over (reads such as nothing and monitors loss, monitors loss without writing, loses and/or nothing without metadataAny combination of buffer loss) meet,, in flow process 1215, determine cleared condition/control. AsExample, determine to remove for affairs read supervisions, write supervision, metadata and/or buffering timesMeaning combination. As a result, in flow process 1225, remove and determine the information that will remove.
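The flow of Figure 12 can be sketched as a table-driven routine in which each commit opcode selects a set of commit conditions and a set of clear controls. The sketch below is illustrative only: the `COMMIT_OPCODES` table, the use of Python sets of block addresses, and the function name `execute_commit` are assumptions, and the two table rows encode only what the text states for txcomwmrm and txcomwmirmc.

```python
# mnemonic -> (loss bits that must all be 0, monitor sets cleared on success)
# The table layout is hypothetical; the text does not give an encoding.
COMMIT_OPCODES = {
    "txcomwmrm":   (("LOSS_WM", "LOSS_RM"),  ()),
    "txcomwmirmc": (("LOSS_WM", "LOSS_IRM"), ("RM", "WM", "IRM")),
}


def execute_commit(mnemonic, tsr, tx):
    """Model flows 1210-1225 for a decoded commit instruction.

    tsr: dict of loss bits, e.g. {"LOSS_WM": 0, "LOSS_RM": 0}
    tx:  dict with "IN_TX" plus sets of block addresses under
         "BF", "WM", "RM", and "IRM"
    """
    # Flow 1210: the opcode determines which conditions gate the commit.
    conditions, clear_controls = COMMIT_OPCODES[mnemonic]
    tx["IN_TX"] = 0  # the text clears IN_TX on success and on failure
    if any(tsr.get(bit, 0) for bit in conditions):
        return False  # the caller aborts the transaction separately
    # Publish write-monitored buffered blocks; others remain buffered.
    tx["BF"] -= tx["BF"] & tx["WM"]
    # Flows 1215 and 1225: determine and apply the clear controls.
    for name in clear_controls:
        tx[name].clear()
    return True
```

A table keeps the commit conditions and the clear controls independent of each other, which mirrors the way the text describes them as separate choices made by the opcode.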
Optimized memory management for UTM
As stated above, the Unbounded Transactional Memory (UTM) architecture and its hardware implementation extend the processor architecture by introducing the following properties: monitoring, buffering, and metadata. In combination, these give software the means to implement a variety of sophisticated algorithms, including a wide spectrum of transactional memory designs. Each property may be implemented in hardware either by extending existing cache protocols in the cache implementation or by allocating independent new hardware resources.
With the UTM properties implemented in hardware, the UTM architecture and its hardware implementation can deliver a performance improvement over a software-only solution (STM), provided that they effectively avoid and minimize unexpected events such as UTM transaction aborts and the transaction retry operations that follow. One of the main causes of hardware transaction aborts is frequent ring transitions caused by external interrupts, system call events, and page faults.
A suspend mechanism based on the current privilege level (CPL) keeps a hardware transaction active (that is, it supports hardware-accelerated transactions with UTM properties such as buffering and monitoring, with the ejection mechanism enabled) while the processor runs at privilege level 3 (user mode). Any ring transition away from ring 3 automatically suspends the currently active transaction (generation of UTM properties stops and the ejection mechanism is disabled). Similarly, any ring transition back to ring 3 automatically resumes the previously suspended hardware transaction, if it was active. A potential drawback of this approach is that it largely rules out the use of hardware transactional memory resources in kernel code or at any ring level other than ring 3.
Another approach is to introduce duplicated TM control resources, such as a transaction control register (TxCR) for ring 0, so that the remaining TM resources can still be used to enable hardware transactions in ring 0. However, this approach may lack an efficient solution for handling nested interrupts and exceptions during ring 0 transactional operation.
Accordingly, Figure 13 shows an embodiment of hardware that supports handling privilege-level transitions during the execution of a transaction. This enables ring 0 transactions on top of user-mode (ring 3) transactions, and it also allows an OS and a hypervisor (such as a virtual machine monitor (VMM)) to handle unlimited levels of nested interrupts and NMI cases in the presence of ring 0 transactions.
A storage element, such as EFLAGS register 1310, includes a transaction enable field (TEF) 1311. When TEF 1311 holds an active value, it indicates that a transaction is currently active and enabled; when TEF 1311 holds an inactive value, it indicates that the transaction is suspended.
In one embodiment, a transaction begin operation, or another operation at the start of a transaction, sets TEF 1311 to the active value. After a ring-level transition event (such as an interrupt, an exception, a system call, a virtual machine exit, or a virtual machine entry) occurs at flow 1300, the state of EFLAGS register 1310 is pushed onto kernel stack 1320 in flow 1301. At flow 1302, TEF 1311 is cleared/updated to the inactive value to suspend the transaction. The ring-level transition event is then handled or serviced as appropriate while the transaction is suspended. Once a return event is detected at flow 1303, the state of EFLAGS register 1310 that was pushed onto the stack at flow 1301 is popped in flow 1304, restoring EFLAGS register 1310 to its previous state. Restoring the previous state returns TEF 1311 to the active value and resumes the transaction as active and enabled.
Particular examples of how illustrative ring-level transition events are processed are listed below. On an interrupt or exception, the processor pushes the EFLAGS register onto the kernel stack and clears the "transaction enable" bit if it was set, thereby suspending the previously enabled transaction. On IRET, the processor restores from the kernel stack the entire EFLAGS register state of the interrupted thread, including the "transaction enable" bit, thereby resuming the transaction if it was previously enabled.
On SYSCALL, the processor pushes the EFLAGS register and clears the "transaction enable" bit if it was set, thereby suspending the previously enabled transaction. On SYSRET, the processor restores from the kernel stack the entire EFLAGS register state of the interrupted thread, including the "transaction enable" bit, thereby resuming the transaction if it was previously enabled.
On VM-exit, the processor saves the guest's EFLAGS register, including the "transaction enable" bit state, into the virtual machine control structure (VMCS) and loads the host's EFLAGS register state, in which the "transaction enable" bit is cleared, thereby suspending the guest's previously enabled transaction, if any.
On VM-entry, the processor restores the guest's EFLAGS register, including the "transaction enable" bit state, from the VMCS, thereby resuming the guest's previously enabled transaction, if any.
This enables kernel-mode (ring 0) hardware-accelerated UTM transactions on top of user-mode (ring 3) hardware-accelerated UTM transactions, and it also gives the OS and the VMM a way to handle unlimited levels of nested interrupts and NMI cases while ring 0 transactions are running. No prior art provides this mechanism.
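The push, clear, and pop sequence of Figure 13 can be illustrated with a small software model. This is a sketch under assumptions: the `Cpu` class and its method names are hypothetical, and the bit position chosen for the transaction enable field is arbitrary, since the text does not specify one.

```python
TEF = 1 << 31  # hypothetical bit position for the transaction enable field


class Cpu:
    """Toy model of EFLAGS 1310, TEF 1311, and kernel stack 1320."""

    def __init__(self):
        self.eflags = 0
        self.kernel_stack = []

    def tx_begin(self):
        # A transaction begin operation sets TEF to the active value.
        self.eflags |= TEF

    def ring_transition(self):
        # Interrupt, exception, or system call (flows 1300-1302).
        self.kernel_stack.append(self.eflags)  # flow 1301: push EFLAGS
        self.eflags &= ~TEF                    # flow 1302: suspend

    def ring_return(self):
        # IRET or SYSRET (flows 1303-1304): restore the saved EFLAGS.
        self.eflags = self.kernel_stack.pop()

    @property
    def tx_enabled(self):
        return bool(self.eflags & TEF)
```

Because every transition saves the whole register and every return restores it, nesting works to any depth: a user-mode transaction is suspended by an interrupt, the handler may begin its own ring 0 transaction, a nested interrupt suspends that one in turn, and each return resumes exactly the transaction state that was active when the corresponding transition occurred.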
A module, as used herein, refers to any hardware, software, firmware, or combination thereof. Module boundaries that are illustrated as separate often vary and may overlap. For example, a first and a second module may share hardware, software, firmware, or a combination thereof, while possibly retaining some independent hardware, software, or firmware. In one embodiment, use of the term logic includes hardware, such as transistors, registers, or other hardware such as programmable logic devices. In another embodiment, however, logic also includes software or code integrated with hardware, such as firmware or microcode.
A value, as used herein, includes any known representation of a number, a state, a logical state, or a binary logical state. Often, logic levels or logical values are also referred to as 1s and 0s, which simply represent binary logic states. For example, a 1 refers to a high logic level and a 0 refers to a low logic level. In one embodiment, a storage cell, such as a transistor or flash cell, may be capable of holding a single logical value or multiple logical values. However, other representations of values in computer systems have been used. For example, the decimal number ten may also be represented as the binary value 1010 and the hexadecimal letter A. Therefore, a value includes any representation of information capable of being held in a computer system.
Moreover, states may be represented by values or portions of values. As an example, a first value, such as a logical one, may represent a default or initial state, while a second value, such as a logical zero, may represent a non-default state. In addition, in one embodiment, the terms reset and set refer to a default and an updated value or state, respectively. For example, a default value may include a high logical value, that is, reset, while an updated value may include a low logical value, that is, set. Note that any combination of values may be used to represent any number of states.
The embodiments of methods, hardware, software, firmware, or code set forth above may be implemented via instructions or code stored on a machine-accessible or machine-readable medium and executable by a processing element. A machine-accessible/readable medium includes any mechanism that provides (that is, stores and/or transmits) information in a form readable by a machine, such as a computer or electronic system. For example, a machine-accessible medium includes random access memory (RAM), such as static RAM (SRAM) or dynamic RAM (DRAM); ROM; magnetic or optical storage media; flash memory devices; electrical storage devices, optical storage devices, acoustical storage devices, or other forms of propagated-signal (for example, carrier waves, infrared signals, digital signals) storage devices; and so on. For example, a machine may access a storage device by receiving a propagated signal, such as a carrier wave, from a medium capable of holding the information to be transmitted on the propagated signal.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made to the invention without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. Furthermore, the foregoing use of "embodiment" and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as possibly the same embodiment.

Claims (14)

1. An apparatus for a transactional memory system, comprising:
a plurality of processing elements, wherein a processing element of the plurality of processing elements is associated with a plurality of software subsystems; and
a logic element to associate a metadata access operation with an abstract address space associated with a first software subsystem of the plurality of software subsystems, based at least on a data address and a metadata identifier associated with the first software subsystem, wherein the metadata access operation is associated with the first software subsystem and references the data address, and wherein the abstract address space is used only to store metadata,
wherein the logic element comprises an abstract translation logic element to translate the data address, based at least on the metadata identifier, into a metadata address in the abstract address space associated with the first software subsystem.
2. The apparatus of claim 1, wherein the abstract address space associated with the first software subsystem is orthogonal to a data address space that includes the data address and to at least one other abstract address space associated with a second software subsystem of the plurality of software subsystems.
3. The apparatus of claim 2, wherein each of the plurality of software subsystems is individually selected from the group consisting of: a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software mapping subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.
4. The apparatus of claim 1, further comprising a decoder to decode the metadata access operation, wherein the metadata access operation includes an opcode that the decoder recognizes as one of a plurality of supported operations.
5. The apparatus of claim 1, wherein the abstract translation logic element translates the data address into the metadata address in the abstract address space associated with the first software subsystem based also on a processing element identifier associated with the processing element.
6. The apparatus of claim 5, wherein the abstract translation logic element translates the data address into the metadata address in the abstract address space associated with the first software subsystem based also on a compression ratio of data to metadata.
7. The apparatus of claim 5, further comprising a register modifiable by the first software subsystem, wherein the register, in response to a write from the first software subsystem, holds the metadata identifier to indicate that the first software subsystem is currently executing on the processing element, and wherein the abstract translation logic element translating the data address, based on the processing element identifier and the metadata identifier, into the metadata address in the abstract address space associated with the first software subsystem comprises: the abstract translation logic element combining a representation of the data address with the processing element identifier and the metadata identifier.
8. The apparatus of claim 7, wherein the abstract translation logic element combines the representation of the data address with the processing element identifier and the metadata identifier by performing at least one of the following operations: adding the processing element identifier and the metadata identifier to the data address to form the metadata address; translating the data address into a translated data address using a normal data translation table and adding the processing element identifier and the metadata identifier to the translated address to form the metadata address; and translating the data address into a translated metadata address using an abstract translation table for metadata that is separate from the normal data translation table and adding the processing element identifier and the metadata identifier to the translated metadata address to form the metadata address.
9. A method for a transactional memory system, comprising:
detecting a metadata operation that references a data address, the data address being located in a data address space and associated with a data item held in a data entry of a cache memory;
determining a metadata address in an abstract address space that is disjoint from the data address space, based on the data address, a processing element identifier of a processing element associated with the metadata operation, and a metadata identifier for a software subsystem associated with the processing element, wherein the abstract address space is used only to store metadata; and
accessing a metadata entry of the cache memory based on the metadata address.
10. The method of claim 9, wherein the abstract address space is also disjoint from another abstract address space, the other abstract address space being associated with another software subsystem that is also associated with the processing element.
11. The method of claim 9, wherein the software subsystem is selected from the group consisting of: a transactional runtime subsystem, a garbage collection runtime subsystem, a memory protection subsystem, a software mapping subsystem, an outer transaction of a nested group of transactions, and an inner transaction of a nested group of transactions.
12. The method of claim 9, further comprising:
writing the metadata identifier to a control register associated with the processing element in response to detecting a write operation from the software subsystem to the control register, as a response to the software subsystem currently executing on the processing element; and
determining the metadata identifier from the control register.
13. The method of claim 12, further comprising: determining the processing element identifier from a portion of an opcode of the metadata operation.
14. The method of claim 12, wherein determining the metadata address from the data address, the processing element identifier, and the metadata identifier comprises combining the data address, the processing element identifier, and the metadata identifier by performing at least one of the following operations: adding the processing element identifier and the metadata identifier to the data address to form the metadata address; translating the data address into a translated data address using a normal data translation table and adding the processing element identifier and the metadata identifier to the translated address to form the metadata address; and translating the data address into a translated metadata address using an abstract translation table for metadata that is separate from the normal data translation table and adding the processing element identifier and the metadata identifier to the translated metadata address to form the metadata address.
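For illustration only, and not as part of the claims, the first alternative recited in claims 8 and 14 can be sketched together with the compression ratio of claim 6. The sketch assumes that "adding" the identifiers means placing them in bit fields above the address; the field widths, the default ratio of 8, and the function name are hypothetical.

```python
ADDR_BITS = 48  # assumed width of a data address
PEID_BITS = 4   # assumed width of the processing element identifier


def metadata_address(data_addr, peid, mdid, compression_ratio=8):
    """Form a metadata address from a data address, PEID, and MDID.

    compression_ratio is the number of data bytes that share one
    metadata location (a data-to-metadata ratio of 8:1 by default).
    """
    base = data_addr // compression_ratio
    # Identifiers occupy bits above the address, so metadata for
    # different subsystems or processing elements never collides.
    return (mdid << (ADDR_BITS + PEID_BITS)) | (peid << ADDR_BITS) | base
```

With these assumptions, data addresses 0x1000 through 0x1007 map to one metadata location, while the same data address under a different metadata identifier or processing element identifier maps to a different one, which is the disjointness that the abstract address spaces are meant to provide.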
CN200980160097.XA 2009-06-26 2009-06-26 The optimization of Unbounded transactional memory (UTM) system Expired - Fee Related CN102460376B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2009/048947 WO2010151267A1 (en) 2009-06-26 2009-06-26 Optimizations for an unbounded transactional memory (utm) system

Publications (2)

Publication Number Publication Date
CN102460376A CN102460376A (en) 2012-05-16
CN102460376B true CN102460376B (en) 2016-05-18

Family

ID=43386805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200980160097.XA Expired - Fee Related CN102460376B (en) 2009-06-26 2009-06-26 The optimization of Unbounded transactional memory (UTM) system

Country Status (7)

Country Link
JP (1) JP5608738B2 (en)
KR (1) KR101370314B1 (en)
CN (1) CN102460376B (en)
BR (1) BRPI0925055A2 (en)
DE (1) DE112009005006T5 (en)
GB (1) GB2484416B (en)
WO (1) WO2010151267A1 (en)

Also Published As

Publication number Publication date
KR101370314B1 (en) 2014-03-05
GB2484416A (en) 2012-04-11
JP2012530960A (en) 2012-12-06
WO2010151267A1 (en) 2010-12-29
CN102460376A (en) 2012-05-16
BRPI0925055A2 (en) 2015-07-28
GB201119084D0 (en) 2011-12-21
KR20130074726A (en) 2013-07-04
DE112009005006T5 (en) 2013-01-10
JP5608738B2 (en) 2014-10-15
GB2484416B (en) 2015-02-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160518

Termination date: 20180626
