CN101425052B - Method for implementing transactional memory


Info

Publication number
CN101425052B
CN101425052B (application CN2008102390105A)
Authority
CN
China
Prior art keywords
message
state
instruction
transactional memory
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008102390105A
Other languages
Chinese (zh)
Other versions
CN101425052A (en)
Inventor
范彬
吴承勇
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN2008102390105A
Publication of CN101425052A
Application granted
Publication of CN101425052B
Expired - Fee Related
Anticipated expiration

Abstract

The invention provides a method for implementing transactional memory. The method comprises the steps of: compiling a segment of program statements into bytecode; identifying and extracting the transaction-related bytecode from the bytecode and marking the shared objects therein; and compiling the transaction-related bytecode into native code of a transactional version, appending, according to the semantics of the bytecode, TxLoad instructions, TxStore instructions, or calls to a software transactional memory library interface after the compiled results. The invention adopts a TMSI protocol to accelerate read/write interception and conflict detection, effectively reducing the overhead of pure-software transactional memory; and compared with a pure-hardware approach, the hardware complexity is lower because the full functionality of transactional memory need not be implemented in hardware.

Description

Method for implementing transactional memory
Technical field
The present invention relates to the processing of transactions in a multi-threaded environment, and in particular to a method for implementing transactional memory.
Background art
With the development and popularization of multi-core architectures, the traditional serial programming model no longer fits their needs. The problem facing the designers of programming languages and programming tools is how to provide a programming model that is less complex and suited to multi-core architectures. Transactional memory was proposed against this background; the concept was first put forward in reference 1 "M. Herlihy, J. Eliot B. Moss: Transactional Memory: Architectural Support for Lock-Free Data Structures. Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993". Transactional memory is the general term for a class of parallel programming models and their implementations. Its basic idea is to provide a coarse-grained programming language construct, the transaction, and to guarantee that the memory accesses contained in a transaction have the property of atomicity. Atomicity here is a property of transactions: for example, if two threads Ta and Tb each contain a group of memory accesses, Ma and Mb respectively (each of which may contain many access instructions), then while Ta and Tb execute in parallel, the memory image seen by Ma during its execution reflects either all of Mb's results or none of them, and a similar guarantee holds for the execution of Mb.
When coding with the transactional memory model, the programmer only needs to declare the groups of memory accesses that may conflict as transactions, and the atomicity of those groups of accesses is then guaranteed. Atomicity is the key property for ensuring the correctness of concurrent programs, and transactional memory lets programmers express this inherent property of concurrent programs in a way that is very easy to understand. Besides correctness, parallel performance is also an important concern for programmers, and the quality of a transactional memory implementation is closely related to application performance. An efficient transactional memory implementation lets applications written with this model achieve good single-thread performance and good scalability in a multi-threaded environment.
The implementation of transactional memory generally comprises several parts: metadata management, conflict detection, and conflict resolution. In terms of implementation approach, transactional memory can be divided into pure-software implementations, pure-hardware implementations, and hybrid implementations. In a pure-software implementation, all of the above parts are realized by software (usually provided as a library); correspondingly, in a hardware implementation they are all realized by hardware (usually by providing machine instructions with the corresponding functionality). A hybrid implementation divides the work between software and hardware, realizing the different parts of transactional memory according to their respective functional strengths. In general, software implementations are more flexible than hardware implementations and can support more language features, but they perform worse, and applications written for a hardware transactional memory platform are difficult to port to other platforms. A hybrid implementation can exploit the respective strengths of software and hardware and is therefore a more efficient transactional memory implementation than either a pure-software or a pure-hardware one.
In the prior art, hybrid implementations of transactional memory already exist. For example, reference 2 "B. Saha, A. Adl-Tabatabai, Q. Jacobson: Architectural Support for Software Transactional Memory. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, 2006" proposes a hybrid transactional memory scheme that combines a pure-software mode with a software/hardware mode. Its advantage is that, with runtime system support, the scheme is relatively easy to program; by combining software and hardware, the hardware removes the redundant overhead of read interception and reduces read-set consistency checks, accelerating the execution of existing software transactions. Its drawbacks are: 1) the hardware functionality is relatively simple, and software still pays a large cost for read interception and conflict detection; 2) the method only reduces rather than eliminates read-set consistency checks; when a cache line is evicted, a read-set consistency check is still required. Experiments show that read interception and read-set consistency checking are the main components of software transactional memory overhead; therefore the performance of the hybrid mode in this scheme is still not good enough.
In addition, reference 3 "A. Shriraman, M. Spear, H. Hossain, V. J. Marathe, S. Dwarkadas, M. L. Scott: An integrated hardware-software approach to flexible transactional memory. Proceedings of the 34th Annual International Symposium on Computer Architecture, 2007" also proposes a hybrid transactional memory scheme. That scheme is merely a simple combination of a pure-software and a pure-hardware approach: when the hardware approach cannot complete a transaction correctly, it switches to the software approach. The advantage of this scheme is that in the pure-hardware approach, the recording of read/write sets and the conflict detection between transactions are supported by a modified MESI cache coherence protocol, giving good performance. Its drawback is that the programming interfaces of both modes are rather complex and impose many restrictions; some common operations, such as function calls, are not supported inside transactions. The scheme is therefore difficult to apply in actual software development.
Summary of the invention
The object of the present invention is to overcome the defect of existing transactional memory implementation methods in which read interception and conflict detection are realized by software, causing large overhead and poor performance, and thereby to provide an efficient and fast method for implementing transactional memory.
To achieve this object, the invention provides a method for implementing transactional memory, realized on a computer having a plurality of processor cores, each processor core comprising a private cache; the computer is provided with a shared memory accessible to all processor cores; a TxLoad instruction and a TxStore instruction are defined in the micro-instruction controller of each processor core; and a software transactional memory library is also installed on the computer. The method comprises:
Step 1), compiling a segment of program statements into bytecode;
Step 2), identifying and extracting the transaction-related bytecode from said bytecode, and marking the shared objects therein;
Step 3), a just-in-time (JIT) compiler compiling the transaction-related bytecode into native code of a transactional version; in the compilation process, according to the semantics of said bytecode, a TxLoad instruction, a TxStore instruction, or a call to the software transactional memory library interface is appended after the compiled result of the corresponding bytecode; wherein,
the TxLoad instruction directs the processor core executing it to send a PrTxnRd message to the cache controller of that processor core and to read data of one pointer length from the cache; the PrTxnRd message is used to adjust the state of the data block according to its current state and to send onto the bus a BusTxnRd message used to detect conflicts with blocks in the TM state;
the TxStore instruction directs the processor core executing it to send a PrTxnWr message to the cache controller of that processor core and to write data of one pointer length into the cache; the PrTxnWr message is used to write the data, to adjust the state of the data block according to its current state, and to send onto the bus a BusTxnWr message used to detect conflicts with blocks in the TM and TS states.
In the above technical scheme, the method further comprises:
Step 4), threads on a plurality of processor cores of the computer executing in parallel the native code produced by the JIT compiler, the TxLoad or TxStore instructions within the transactions performing the operations defined by their semantics; when such an operation causes a cache line conflict, methods in the software transactional memory library are called to arbitrate the conflict and roll back, and the cache state is maintained with the FlushAllTxnCacheLineAbort instruction;
wherein the FlushAllTxnCacheLineAbort instruction clears the T flag bits in all caches, changes the TS state to the S state, and changes the TM state to the I state.
In the above technical scheme, step 4) further comprises: if the TxLoad or TxStore instructions within the transaction perform their operations without causing a cache line conflict, then after the transaction finishes executing, a method in the software transactional memory library is called to commit the transaction's modifications to memory, making the modified results visible to all other transactions, and the FlushAllTxnCacheLineCommit instruction is called to maintain the cache state; wherein the FlushAllTxnCacheLineCommit instruction clears the T flag bits in all caches, changes the TS state to S, and changes the TM state to M.
In the above technical scheme, step 3) further comprises compiling, for the bytecode of functions called within transactions, both a transactional version and an ordinary version of the native code; during execution of the native code, the version to execute is determined by whether the execution is in a transactional context.
In the above technical scheme, in step 3), when the transaction-related bytecode is compiled into native code of the transactional version, if the bytecode is a read-operation instruction, a TxLoad instruction is inserted after the compiled result of the read-operation instruction, and if the bytecode is a write-operation instruction, a TxStore instruction is inserted after the compiled result of the write-operation instruction.
In the above technical scheme, in step 3), after the cache controller receives the PrTxnRd message, if the cache hits and the state of the corresponding data block is M or S, its state is changed to TS and a BusTxnRd message is sent on the bus; if the cache misses or the state of the corresponding data block is I, a BusTxnRd message is sent directly on the bus, and the cache waits to obtain the data to be read from the cache of another processor core or from the shared memory, loads the data into itself after obtaining it, and then changes the state to TS.
In the above technical scheme, in the cache, blocks in the I, S and TS states do not respond to the BusTxnRd message; after a block in the M state receives a BusTxnRd message, the block's data is sent to the source cache of the BusTxnRd and simultaneously written to the shared memory; a block in the TM state receiving a BusTxnRd message indicates that a conflict has been detected.
In the above technical scheme, in step 3), after the cache controller receives the PrTxnWr message, if the cache hits and the state of the corresponding data block is M or S, the new data is written into the block, the block's state is changed to TM, and a BusTxnWr message is sent to the bus; if the cache misses or the state of the corresponding block is I, a BusTxnWr message is sent to the bus.
In the above technical scheme, in the cache, blocks in the I state do not respond to the BusTxnWr message; a block in the S state changes its state to I after receiving a BusTxnWr message; a block in the M state changes its state to I after receiving a BusTxnWr message and simultaneously sends its data to the source cache of the BusTxnWr and to the shared memory; a block in the TM or TS state receiving a BusTxnWr message indicates that a conflict has been detected.
In the above technical scheme, the programming language is a managed language, including the Java language or a .NET language.
The advantages of the present invention are:
1. The transactional memory implementation method of the present invention adopts the TMSI protocol to accelerate read/write interception and conflict detection, effectively reducing the overhead of pure-software transactional memory; and since the full functionality of transactional memory need not be implemented in hardware, the hardware complexity is lower than that of a pure-hardware approach.
2. The transactional memory implementation method of the present invention supports function calls within transactions, making it better suited to the development of real software.
3. The transactional memory implementation method of the present invention does not depend on a particular hardware realization; even under existing hardware conditions, software developers can try writing transactional memory applications.
Description of drawings
Fig. 1 is a schematic diagram of the software application environment of the method for implementing transactional memory of the present invention in one embodiment;
Fig. 2 is the state transition diagram of the TMSI protocol involved in the method for implementing transactional memory of the present invention;
Fig. 3 is a flow chart of the method for implementing transactional memory of the present invention.
Embodiment
The present invention will be described below with reference to the drawings and specific embodiments.
Before the method of the present invention is explained in detail, the hardware environment and the software environment required to realize the method of the invention are first described.
Since the transactional memory implementation method of the present invention mainly solves conflict detection and resolution between multiple transactions in a multi-threaded environment, a computer with a plurality of processor cores is required, each processor core having its own private cache; in addition, the computer contains a shared memory accessible to all processor cores. The remaining parts of the computer are the same as in a prior-art computer and are not described again here.
In the above computer, as shown in Fig. 1, the software environment adopted comprises a Java environment and a software transactional memory library. The Java environment is the same environment required for the normal operation of Java code and may include a JDK, etc. The software transactional memory library is a runtime library; during execution, the application calls the interface functions of this library to realize functions such as transaction creation, read/write interception, transaction commit, and conflict detection and resolution. In this document, the software transactional memory library can be regarded as the runtime library through which all software-implemented parts of the hybrid software/hardware mode described later are realized. The software transactional memory library is prior art well known to those of ordinary skill in the art; details can be found in reference 4 "Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hudson, Chi Cao Minh, Ben Hertzberg: McRT-STM: a high performance software transactional memory system for a multi-core runtime. PPoPP 2006".
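For concreteness, the interface of such a library can be pictured roughly as the sketch below. The method names and signatures are assumptions of this description used only for illustration; they are not the actual interface of the library in reference 4.

// Illustrative sketch of a software transactional memory library interface.
// All names and signatures are assumptions used for explanation only.
interface SoftwareTxnLibrary {
    Object beginTxn(Object... sharedObjects);                          // create and initialize a transaction, return its descriptor
    void log(Object txn, Object owner, String field, Object oldValue); // record an old value for rollback
    boolean arbitrate(Object txn, Object otherTxn);                    // decide which side of a conflict rolls back
    void rollback(Object txn);                                         // undo the transaction's memory modifications
    void commit(Object txn);                                           // make the transaction's modifications visible to other transactions
}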
On the basis of the above hardware and software environment, in order to support the realization of the present invention, certain modifications are also made to the cache coherence protocol and to the instruction set architecture of the CPU. The cache coherence protocol is the protocol used to keep the data states of cache lines consistent. On the basis of the existing MSI cache coherence protocol, two new states, TM and TS, are added for cache lines; together with the original M, S and I states of the MSI protocol, a cache line adopting this protocol therefore has five possible states, and the protocol is referred to as the TMSI protocol. The meanings of the newly added TM and TS states are as follows:
TM: indicates that a transaction has modified the data in the cache line and "owns" the data exclusively; the contents of the corresponding block in the caches of other processors are not up to date.
TS: indicates that a transaction has read the data in the cache line; the contents of the cache line are currently up to date, but copies of the cache line may exist simultaneously in the caches of several processors.
On the basis of the above five states, the TMSI protocol of the present invention defines the messages required for transitions between the states and the actions issued upon each transition. Fig. 2 illustrates these definitions in detail: each arrowed line segment represents a state transition, and a label A/B on a line segment means that the transition indicated by the arrow occurs when the owning processor or the bus observes message A, while at the same time message B is sent to the bus or action B is performed. For example, a cache line in the M state transitions to the I state when it observes a BusTxnWr message on the bus, and at the same time produces a Flush action that writes the data in the cache line back to memory. As another example, a cache line in the M, S or I state enters the TS state after receiving a PrTxnRd message and broadcasts a BusTxnRd message on the bus. As yet another example, when a cache line in the TM or TS state receives a BusTxnWr message from the bus (indicating that a read/write conflict has occurred), it keeps its state unchanged and sends a BusConflict message to the bus; this message can also carry information about the other party to the conflict. The TMSI protocol is realized in the cache controller of each processor.
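To make the processor-side transitions of Fig. 2 easier to follow, they can be sketched as the small state machine below. This is a simplified illustration written in Java, not the actual cache controller logic, and it only covers the transitions described above.

// Simplified sketch of the processor-side TMSI transitions (not actual hardware logic).
enum LineState { M, S, I, TM, TS }

class TmsiLine {
    LineState state = LineState.I;

    // PrTxnRd: a transactional read issued by the local core (caused by TxLoad).
    // From M, S or I the line moves to TS and a BusTxnRd message is broadcast on the bus;
    // transitions out of TM and TS are not modelled here.
    String onPrTxnRd() {
        state = LineState.TS;
        return "BusTxnRd";
    }

    // PrTxnWr: a transactional write issued by the local core (caused by TxStore).
    // From M, S or I the line moves to TM and a BusTxnWr message is broadcast on the bus.
    String onPrTxnWr() {
        state = LineState.TM;
        return "BusTxnWr";
    }
}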
In the instruction set architecture of the CPU, in addition to the ordinary CPU instructions, four new instructions are added; these instructions are related to the aforementioned TMSI protocol. The newly added instructions are:
TxLoad: reads data of one pointer length from the cache (i.e. 4 bytes on a 32-bit computer and 8 bytes on a 64-bit computer); if the data to be read is not in the cache, it is first loaded from the shared memory into the cache and then read. The TxLoad instruction causes the processor core executing it to send a PrTxnRd message to its private cache controller.
As can be seen from the TMSI state diagram, after the cache controller receives the PrTxnRd message caused by a TxLoad instruction: if the requested data is present in the private cache (a hit) and the state of the corresponding data block is M or S, the state is changed to TS and a BusTxnRd message is sent on the bus; in this case the BusTxnRd message notifies the other processor cores that the data has been read by a transaction. If the private cache misses or the state of the corresponding data block is I, a BusTxnRd message is sent directly on the bus, and the controller waits to obtain the data from the cache of another processor core or from the shared memory; after obtaining the data, the cache loads it and changes its state to TS. In this case the BusTxnRd message not only indicates that a transaction is reading the block, but also requests the block's data from the other processor cores or from the shared memory. In the caches, blocks in the I, S and TS states do not respond to the BusTxnRd message; after a block in the M state receives a BusTxnRd message, the block's data is sent to the source cache of the BusTxnRd and simultaneously written back to the shared memory. A block in the TM state that receives a BusTxnRd message has detected a conflict.
TxStore: writes data of one pointer length into the cache, and causes the processor core executing the instruction to send a PrTxnWr message to its private cache controller.
As can be seen from the TMSI state diagram, after the cache controller receives the PrTxnWr message caused by a TxStore instruction: if the private cache hits and the state of the corresponding data block is M, the new data is written into the block, the block's state is changed to TM, and a BusTxnWr message is sent to the bus; in this case the BusTxnWr message notifies the other processors that a transaction has modified the block. If the private cache hits and the state of the corresponding data block is S, the new data is written into the block, the block's state is changed to TM, and a BusTxnWr message is sent to the bus; in this case the message not only indicates that a transaction has modified the block, but also causes any other cache holding the block in the S state to change the block's state to I. If the cache misses or the state of the corresponding block is I, a BusTxnWr message is sent to the bus; in this case the BusTxnWr message not only indicates that a transaction has modified the block, but also requests the block's data from the private caches of the other processors or from the shared memory; after the block is loaded into the local private cache, the new value is written. In the caches, blocks in the I state do not respond to the BusTxnWr message; a block in the S state changes its state to I after receiving a BusTxnWr message; a block in the M state changes its state to I after receiving a BusTxnWr message and simultaneously sends its data to the source cache of the BusTxnWr and to the shared memory; a block in the TM or TS state detects a conflict when it receives a BusTxnWr message.
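The bus-side (snooping) behaviour described above can likewise be sketched as follows, reusing the LineState enum from the previous sketch; again this is only an illustrative simplification of Fig. 2, not the controller's actual logic.

// Simplified sketch of how a cache line reacts to snooped TMSI bus messages.
class TmsiSnoopingLine {
    LineState state = LineState.I;   // reuses the LineState enum from the previous sketch

    // Reaction to a BusTxnRd message observed on the bus.
    String onBusTxnRd() {
        switch (state) {
            case I: case S: case TS: return null;    // no response
            case M: return "Flush";                   // supply the block and write it back to shared memory
            case TM: return "BusConflict";            // another transaction is reading a block this transaction wrote: conflict
        }
        return null;
    }

    // Reaction to a BusTxnWr message observed on the bus.
    String onBusTxnWr() {
        switch (state) {
            case I: return null;                              // no response
            case S: state = LineState.I; return null;         // invalidate the shared copy
            case M: state = LineState.I; return "Flush";      // supply the block, write it back, then invalidate
            case TM: case TS: return "BusConflict";           // read/write conflict detected; the state is kept unchanged
        }
        return null;
    }
}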
FlushAllTxnCacheLineCommit: clears the T flag bits in all caches, changes the TS state to S, and changes the TM state to M.
FlushAllTxnCacheLineAbort: clears the T flag bits in all caches, changes the TS state to S, and changes the TM state to I.
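The effect of these two flush instructions on a single cache line can be summarized by the following sketch, again purely as an illustration of the definitions above.

// Illustrative summary of the commit/abort flush instructions applied to one cache line.
class TmsiFlush {
    LineState state;     // reuses the LineState enum from the earlier sketch
    boolean tFlag;       // the T flag bit marking transactional cache lines

    // FlushAllTxnCacheLineCommit, applied to every cache line on commit.
    void commitLine() {
        tFlag = false;                                   // clear the T flag bit
        if (state == LineState.TS) state = LineState.S;  // committed transactional reads keep a shared copy
        if (state == LineState.TM) state = LineState.M;  // committed transactional writes become ordinary modified data
    }

    // FlushAllTxnCacheLineAbort, applied to every cache line on abort.
    void abortLine() {
        tFlag = false;                                   // clear the T flag bit
        if (state == LineState.TS) state = LineState.S;  // aborted reads keep a valid shared copy
        if (state == LineState.TM) state = LineState.I;  // aborted writes are discarded
    }
}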
After the functions of the new instructions have been defined as above, those skilled in the art can make the corresponding modifications in the micro-instruction controller of the CPU according to the above definitions, so as to realize the functions of these instructions on the CPU.
Having provided the above hardware and software environments, and having modified the cache coherence protocol and instruction set architecture involved in the hardware environment as described above, how transaction conflicts are resolved on a computer with multiple processor cores is explained below. For ease of understanding, in the embodiment below, referring to Fig. 3, a method named putback is taken as an example to illustrate the realization of the present invention.
The operation completed by the putback method in this embodiment is to insert a new node newNode at the tail of a linked list list. Since the programmer expects that this method may be called and executed in parallel by several threads, the part that must execute atomically is declared as a transaction within the method and marked with begin_txn and end_txn; that is, within this method, the statements between begin_txn and end_txn constitute the transaction, and the parameter list of begin_txn indicates that the transaction needs to access the shared variable list. The putback method, expressed in the Java language, is as follows:
void putback(List list, Node newNode)
{
    begin_txn(list);
    if(list.head==NULL){
        list.head=newNode;
        list.tail=newNode;
        end_txn();
        return;
    }
    list.tail.next=newNode;
    list.tail=newNode;
    end_txn();
}
When the above Java code is entered into the computer and run under the user's control, the putback method is first translated into standard Java bytecode by the Java compiler. In this translation, the identifiers used to denote transactions, such as begin_txn and end_txn, and the concrete statements inside the transaction receive no special treatment; they are simply treated as library functions, and the corresponding bytecode is generated.
After the Java compiler generates the Java bytecode of the putback method, the resulting bytecode is passed to the system's JIT compiler, which further processes the bytecode containing the transaction. As mentioned above, the transaction part of the putback method begins at begin_txn and ends at end_txn; therefore, leaving nested transactions aside, the JIT compiler can easily extract the transaction from the bytecode. In the Java code fragment of the putback method given above, the transaction-related code comprises (still illustrated here with Java source code, although in fact it has been translated into bytecode):
begin_txn(list);
list.tail.next=newNode;
list.tail=newNode;
end_txn();
While identifying and extracting the transaction-related code, the JIT compiler also marks the shared objects, such as the list object in the putback method and all the objects it can refer to (list.tail, list.head, etc.). The marked shared objects can further be divided into a read set and a write set according to whether they are associated with read or with write operations inside the transaction. After identifying the transaction, the JIT compiler compiles the bytecode of the identified transaction into native code of the transactional version, while the other, non-transactional bytecode is compiled into ordinary native code in the usual way. An important difference between the transactional version of the native code and the ordinary native code is that in the transactional version, according to the semantics of the statements in the transaction, one or more of the aforementioned TxLoad and TxStore instructions are added for those statements during compilation, whereas the ordinary native code does not contain these instructions. For example, when the JIT compiler compiles the read operation list.tail.next=newNode on the shared object list (this statement first reads the object referenced by the tail field of list and then modifies the next field of that object, so it actually only reads list), it must insert the new machine instruction TxLoad(list) after the compiled result of this statement. As another example, when the JIT compiler compiles the write operation list.tail=newNode on the shared object list, it must insert the machine instruction TxStore(list) after the compiled result of this statement, and must additionally insert a call to the log method of the software transactional memory library, whose effect is to record the value of list.tail before it is modified.
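Expressed as source-level pseudocode (the actual output is machine code), the instrumented transaction body produced by the JIT compiler therefore looks roughly like the sketch below. The helper name stmLog and the exact placement of the inserted calls are an illustration used by this description, not the literal compiler output.

// Illustrative pseudocode of the instrumented transaction body.
// TxLoad/TxStore stand for the new machine instructions; stmLog stands for the
// call into the software transactional memory library that records the old value.
void putbackTxnBody(List list, Node newNode)
{
    list.tail.next=newNode;            // reads list only
    TxLoad(list);                      // inserted after the compiled result of the read

    stmLog(list, "tail", list.tail);   // record the value of list.tail before it is modified
    list.tail=newNode;                 // writes list
    TxStore(list);                     // inserted after the compiled result of the write
}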
In the present invention, function calls are allowed inside transactions; from the discussion of reference 3 in the background section it can be seen that reference 3 does not support calling functions inside transactions. In the present invention, when a function call appears inside a transaction, the JIT compiler must prepare two versions of native code for the function, namely a transactional version and a non-transactional version (also called ordinary native code). Thereafter, the transactional version of the native code is used when the function is called inside a transaction, and the non-transactional version is used when it is called outside a transaction.
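Conceptually, the selection between the two versions can be pictured with the following sketch; appendNode is a hypothetical function called inside a transaction, and inTransaction() stands for an assumed runtime query. Both are introduced purely for illustration; in practice the dispatch is arranged by the JIT compiler and the runtime rather than by source-level code.

// Illustrative sketch of dispatching between the two compiled versions of a function.
// appendNode_txn is the transactional version instrumented with TxLoad/TxStore,
// appendNode_plain is the ordinary version; inTransaction() is an assumed runtime query.
void appendNode(List list, Node n)
{
    if (inTransaction())
        appendNode_txn(list, n);     // used when the caller is executing inside a transaction
    else
        appendNode_plain(list, n);   // used outside transactions
}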
After the JIT compiler has compiled the code of the putback method, the system begins to execute the compiled native code. The execution of ordinary native code is the same as in the prior art and is not repeated here. When the native code reaches the transaction-related code, i.e. when it encounters the call to begin_txn(list), the system calls the corresponding transaction initialization method in the software transactional memory library. Transaction initialization includes requesting and allocating the related data structures from the system, recording in those data structures the transaction's information and a pointer to the structure of the thread requesting to execute the transaction, and initializing data structures such as the log and the read/write sets. The specific realization of transaction initialization is well known to those of ordinary skill in the art.
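The per-transaction data structure allocated at this point can be pictured roughly as follows; the class and field names are assumptions used only to illustrate the information that is recorded.

import java.util.*;

// Illustrative sketch of the data structure initialized for each transaction.
// The names below are assumptions of this description, not the library's actual types.
class TxnDescriptor {
    Thread owner;                                  // the thread that requested execution of the transaction
    List<Object[]> undoLog = new ArrayList<>();    // (object, field, old value) entries recorded by the log method, used for rollback
    Set<Object> readSet = new HashSet<>();         // shared objects read inside the transaction
    Set<Object> writeSet = new HashSet<>();        // shared objects written inside the transaction
}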
During execution of the transactional version of the native code, the TxLoad and TxStore instructions added during compilation are encountered; by executing these instructions, conflict detection and resolution can be completed efficiently. For ease of understanding, a scenario is used as an example, in combination with the compiled result of the putback method described above, to explain how conflicts between transactions are detected and resolved.
In this scenario, suppose two threads a and b are each about to execute the putback method, and that before a and b enter putback, the state of the cache line holding the head of the list object is I (i.e. the object is not currently in any cache). On this assumption, thread a first executes the TxLoad(list) instruction. According to the earlier description of the TxLoad instruction and the TMSI state transition diagram of Fig. 2, thread a first reads the head of the list object from memory into the cache of its processor core and sends a PrTxnRd message to the cache controller, which changes the state of the corresponding cache block to TS. Thread a then continues with TxStore(list); from the earlier description of the TxStore instruction and the TMSI state transition diagram of Fig. 2, the PrTxnWr message produced by TxStore changes the state of the cache line holding the list object in the cache of a's processor core to TM. For any of various reasons, before thread a commits its transaction, another thread b also begins executing the transactional version of this native code. When thread b executes TxLoad(list), it likewise reads the list object from memory into the cache of its own processor core, changes the state of the corresponding cache block from I to TS, and at the same time sends a BusTxnRd message to the cache controllers of the other processor cores. When the cache controller of thread a receives this message, because the state of its corresponding block is TM, according to the TMSI state transition diagram of Fig. 2 the cache controller of a's processor core sends a BusConflict message to the cache controller of b's processor core, indicating that a conflict has been detected. When thread b receives this message, an interrupt is raised and the conflict arbitration method in the software transactional memory library is called. If the arbitration result is that b rolls back, the rollback method of the software transactional memory library called by thread b restores the memory modifications made in the transaction to the state at the time thread b started, and the aforementioned FlushAllTxnCacheLineAbort instruction is called to maintain the cache state. If the arbitration result is that a rolls back, thread b sends a TxnAbortOther message to the cache controller of thread a; after a receives this message, it rolls itself back by a similar method. If no conflict occurs during the execution of the transaction, then when the Java runtime system executes txn_end, the corresponding method in the software transactional memory library is called to commit the transaction's modifications to memory, making the modified results visible to all other transactions, and FlushAllTxnCacheLineCommit is called at the same time to maintain the cache state.
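Thread b's handling of the conflict in this scenario can be summarized by the following sketch, which uses the library interface sketched earlier; flushAllTxnCacheLineAbort() and sendTxnAbortOther() are placeholders for the corresponding instruction and message described above, and all names are illustrative only.

// Illustrative sketch of the conflict handling triggered on thread b by the BusConflict interrupt.
void onBusConflict(SoftwareTxnLibrary stm, Object self, Object other)
{
    boolean selfLoses = stm.arbitrate(self, other);   // conflict arbitration in the software transactional memory library
    if (selfLoses) {
        stm.rollback(self);              // restore the memory modifications made in the transaction
        flushAllTxnCacheLineAbort();     // maintain the cache state: TS -> S, TM -> I
    } else {
        sendTxnAbortOther(other);        // TxnAbortOther message: ask the other core to roll its transaction back
    }
}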
As can be seen from the above scenario, the read/write interception and conflict detection in the transactional memory implementation are completed entirely by the TMSI protocol, while the other functions, realized as calls into the software transactional memory library, such as initialization, conflict arbitration and transaction rollback, are completed by software; this implementation is therefore also called the hybrid mode. Because in the hybrid mode the heavyweight, time-consuming parts of transaction execution, such as recording the read/write sets and performing conflict detection, are realized by hardware, while the simple parts, such as starting a transaction, committing, and recording the log, are realized by software, the method has very high execution efficiency.
The successful execution of the above hybrid mode depends on the shared objects loaded into the cache not being evicted from the cache during transaction execution, and on the cache not overflowing. To guarantee the successful execution of transactions, when a shared object is evicted during transaction execution or the cache overflows, the pure-software mode can be used to execute transactions. Compared with the aforementioned hybrid mode, in the pure-software mode the system's JIT compiler, when compiling the transaction-related code, inserts calls to the methods of the software transactional memory library in the runtime system instead of the aforementioned newly added machine instructions supporting transactional memory; during execution, conflict detection and resolution are then completed by the methods called in the software transactional memory library. The specific realization of the pure-software mode can follow various software implementation methods in the prior art; in one embodiment of the present invention, the pure-software implementation of the aforementioned reference 4 is adopted.
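As an illustration of the difference, the same transaction body instrumented for the pure-software mode would look roughly like the sketch below; stmRead and stmWrite are assumed names standing for the read and write barrier methods of the software transactional memory library, introduced here only for this description.

// Illustrative sketch of the pure-software instrumentation of the putback transaction body.
// stmRead/stmWrite are assumed names for the library's read/write barriers; no TxLoad or
// TxStore machine instructions are emitted in this mode.
void putbackTxnBodySoftware(Object txn, List list, Node newNode)
{
    Node tail = (Node) stmRead(txn, list, "tail");   // read barrier replaces the TxLoad instruction
    stmWrite(txn, tail, "next", newNode);            // write barrier replaces TxStore and the log call
    stmWrite(txn, list, "tail", newNode);
}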
In the above description, the Java environment has been used as an example to explain the specific implementation steps of the method of the present invention. However, those of ordinary skill in the art should understand that the method of the present invention is not limited to the Java environment; any programming language with a runtime environment (also called a managed language in the prior art) can directly use the method proposed by the present invention, for example .NET and some scripting languages. For programming languages without a runtime environment (also called unmanaged languages), the method of the present invention can be applied to the corresponding programming language by other means, such as libraries.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical scheme of the present invention that do not depart from the spirit and scope of the technical scheme of the present invention should all be covered by the scope of the claims of the present invention.

Claims (15)

1. A method for implementing transactional memory, realized on a computer having a plurality of processor cores, each processor core comprising a private cache, the computer being provided with a shared memory accessible to all processor cores, a TxLoad instruction and a TxStore instruction also being defined in the micro-instruction controller of each processor core, and a software transactional memory library being installed on the computer; the method comprising:
Step 1), compiling a segment of program statements into bytecode;
Step 2), identifying and extracting the transaction-related bytecode from said bytecode, and marking the shared objects therein;
Step 3), a just-in-time (JIT) compiler compiling the transaction-related bytecode into native code of a transactional version, wherein in the compilation process, according to the semantics of said bytecode, a TxLoad instruction, a TxStore instruction, or a call to the software transactional memory library interface is appended after the compiled result of the corresponding bytecode; wherein,
the TxLoad instruction directs the processor core executing it to send a PrTxnRd message to the cache controller of that processor core and to read data of one pointer length from the cache, the PrTxnRd message being used to adjust the state of the data block according to its current state and to send onto the bus a BusTxnRd message used to detect conflicts with blocks in the TM state;
the TxStore instruction directs the processor core executing it to send a PrTxnWr message to the cache controller of that processor core and to write data of one pointer length into the cache, the PrTxnWr message being used to write the data, to adjust the state of the data block according to its current state, and to send onto the bus a BusTxnWr message used to detect conflicts with blocks in the TM and TS states.
2. The method for implementing transactional memory according to claim 1, characterized by further comprising:
Step 4), threads on a plurality of processor cores of the computer executing in parallel the native code produced by the JIT compiler, the TxLoad or TxStore instructions within the transactions performing the operations defined by their semantics; when such an operation causes a cache line conflict, calling methods in the software transactional memory library to arbitrate the conflict and roll back, and maintaining the cache state with the FlushAllTxnCacheLineAbort instruction;
wherein the FlushAllTxnCacheLineAbort instruction clears the T flag bits in all caches, changes the TS state to the S state, and changes the TM state to the I state.
3. The method for implementing transactional memory according to claim 2, characterized in that step 4) further comprises: if the TxLoad or TxStore instructions within the transaction perform their operations without causing a cache line conflict, then after the transaction finishes executing, calling a method in the software transactional memory library to commit the transaction's modifications to memory, making the modified results visible to all other transactions, and calling the FlushAllTxnCacheLineCommit instruction to maintain the cache state; wherein the FlushAllTxnCacheLineCommit instruction clears the T flag bits in all caches, changes the TS state to S, and changes the TM state to M.
4. The method for implementing transactional memory according to claim 1, 2 or 3, characterized in that step 3) further comprises compiling, for the bytecode of functions called within transactions, both a transactional version and an ordinary version of the native code, and during execution of the native code, determining which version to execute according to whether the execution is in a transactional context.
5. The method for implementing transactional memory according to claim 1, 2 or 3, characterized in that in step 3), when the transaction-related bytecode is compiled into native code of the transactional version, if the bytecode is a read-operation instruction, a TxLoad instruction is inserted after the compiled result of the read-operation instruction, and if the bytecode is a write-operation instruction, a TxStore instruction is inserted after the compiled result of the write-operation instruction.
6. The method for implementing transactional memory according to claim 4, characterized in that in step 3), when the transaction-related bytecode is compiled into native code of the transactional version, if the bytecode is a read-operation instruction, a TxLoad instruction is inserted after the compiled result of the read-operation instruction, and if the bytecode is a write-operation instruction, a TxStore instruction is inserted after the compiled result of the write-operation instruction.
7. The method for implementing transactional memory according to claim 1, 2 or 3, characterized in that in step 3), after the cache controller receives the PrTxnRd message, if the cache hits and the state of the corresponding data block is M or S, its state is changed to TS and a BusTxnRd message is sent on the bus; if the cache misses or the state of the corresponding data block is I, a BusTxnRd message is sent directly on the bus, and the cache waits to obtain the data to be read from the cache of another processor core or from the shared memory, loads the data into itself after obtaining it, and then changes the state to TS.
8. The method for implementing transactional memory according to claim 4, characterized in that in step 3), after the cache controller receives the PrTxnRd message, if the cache hits and the state of the corresponding data block is M or S, its state is changed to TS and a BusTxnRd message is sent on the bus; if the cache misses or the state of the corresponding data block is I, a BusTxnRd message is sent directly on the bus, and the cache waits to obtain the data to be read from the cache of another processor core or from the shared memory, loads the data into itself after obtaining it, and then changes the state to TS.
9. The method for implementing transactional memory according to claim 7, characterized in that in the cache, blocks in the I, S and TS states do not respond to said BusTxnRd message; after a block in the M state receives the BusTxnRd message, the block's data is sent to the source cache of the BusTxnRd and simultaneously written to the shared memory; and a block in the TM state receiving the BusTxnRd message indicates that a conflict has been detected.
10. The method for implementing transactional memory according to claim 8, characterized in that in the cache, blocks in the I, S and TS states do not respond to said BusTxnRd message; after a block in the M state receives the BusTxnRd message, the block's data is sent to the source cache of the BusTxnRd and simultaneously written to the shared memory; and a block in the TM state receiving the BusTxnRd message indicates that a conflict has been detected.
11. The method for implementing transactional memory according to claim 1, 2 or 3, characterized in that in step 3), after the cache controller receives the PrTxnWr message, if the cache hits and the state of the corresponding data block is M or S, the new data is written into the block, the block's state is changed to TM, and a BusTxnWr message is sent to the bus; if the cache misses or the state of the corresponding block is I, a BusTxnWr message is sent to the bus.
12. The method for implementing transactional memory according to claim 4, characterized in that in step 3), after the cache controller receives the PrTxnWr message, if the cache hits and the state of the corresponding data block is M or S, the new data is written into the block, the block's state is changed to TM, and a BusTxnWr message is sent to the bus; if the cache misses or the state of the corresponding block is I, a BusTxnWr message is sent to the bus.
13. The method for implementing transactional memory according to claim 11, characterized in that in the cache, blocks in the I state do not respond to the BusTxnWr message; a block in the S state changes its state to I after receiving the BusTxnWr message; a block in the M state changes its state to I after receiving the BusTxnWr message and simultaneously sends its data to the source cache of the BusTxnWr and to the shared memory; and a block in the TM or TS state receiving the BusTxnWr message indicates that a conflict has been detected.
14. The method for implementing transactional memory according to claim 12, characterized in that in the cache, blocks in the I state do not respond to the BusTxnWr message; a block in the S state changes its state to I after receiving the BusTxnWr message; a block in the M state changes its state to I after receiving the BusTxnWr message and simultaneously sends its data to the source cache of the BusTxnWr and to the shared memory; and a block in the TM or TS state receiving the BusTxnWr message indicates that a conflict has been detected.
15. The method for implementing transactional memory according to claim 1, characterized in that the programming language of said program statements is a managed language, including the Java language or a .NET language.
CN2008102390105A 2008-12-04 2008-12-04 Method for implementing transactional memory Expired - Fee Related CN101425052B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102390105A CN101425052B (en) 2008-12-04 2008-12-04 Method for implementing transactional memory

Publications (2)

Publication Number Publication Date
CN101425052A CN101425052A (en) 2009-05-06
CN101425052B true CN101425052B (en) 2010-06-09

Family

ID=40615678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102390105A Expired - Fee Related CN101425052B (en) 2008-12-04 2008-12-04 Method for implementing transactional memory

Country Status (1)

Country Link
CN (1) CN101425052B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163164B (en) * 2011-05-06 2014-06-25 北京华为数字技术有限公司 Processing method and processor for critical data in shared memory
CN103235745B (en) * 2013-03-27 2016-08-10 华为技术有限公司 A kind of address conflict detecting method and device
CN108228483B (en) * 2016-12-15 2021-09-14 北京忆恒创源科技股份有限公司 Method and apparatus for processing atomic write commands
CN110675256B (en) * 2019-08-30 2020-08-21 阿里巴巴集团控股有限公司 Method and device for deploying and executing intelligent contracts
US10783082B2 (en) 2019-08-30 2020-09-22 Alibaba Group Holding Limited Deploying a smart contract

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007067390A2 (en) * 2005-12-07 2007-06-14 Microsoft Corporation Optimization of software transactional memory operations
CN101273332A (en) * 2005-09-30 2008-09-24 英特尔公司 Thread-data affinity optimization using compiler
CN101300556A (en) * 2005-11-28 2008-11-05 国际商业机器公司 Method and system allowing for indeterminate read data latency in a memory system

Also Published As

Publication number Publication date
CN101425052A (en) 2009-05-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: BEIJING FENGHUICAIZHI INTELLECTUAL PROPERTY CONSULTANT CO., LTD.

Free format text: FORMER OWNER: INSTITUTE OF COMPUTING TECHNOLOGY, CHINESE ACADEMY OF SCIENCES

Effective date: 20121105

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100190 HAIDIAN, BEIJING TO: 100193 HAIDIAN, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20121105

Address after: Room 2A2330, Incubator Building, Zhongguancun Software Park, Haidian District, Beijing 100193

Patentee after: Beijing Fenghuicaizhi Intellectual Property Consultant Co., Ltd.

Address before: No. 6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing 100190

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences

ASS Succession or assignment of patent right

Owner name: LAN WEIPING

Free format text: FORMER OWNER: BEIJING FENGHUICAIZHI INTELLECTUAL PROPERTY CONSULTANT CO., LTD.

Effective date: 20150805

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20150805

Address after: Room 2223, Building 2, Incubator, Zhongguancun Software Park, Haidian District, Beijing 100193

Patentee after: Lan Weiping

Address before: Room 2A2330, Incubator Building, Zhongguancun Software Park, Haidian District, Beijing 100193

Patentee before: Beijing Fenghuicaizhi Intellectual Property Consultant Co., Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100609

Termination date: 20161204

CF01 Termination of patent right due to non-payment of annual fee