CN117687744A

CN117687744A - Method for dynamically scheduling transaction in hardware transaction memory

Info

Publication number: CN117687744A
Application number: CN202311539849.1A
Authority: CN
Inventors: 韩军; 万力
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2023-11-19
Filing date: 2023-11-19
Publication date: 2024-03-12

Abstract

The invention belongs to the technical field of on-chip multiprocessors, and in particular relates to a method for dynamically scheduling transactions in hardware transactional memory. The invention enhances the existing memory subsystem by introducing a local scheduling module into the private cache and a global scheduling module into the shared cache, forming a hierarchical dynamic transaction scheduling system. The local scheduling module schedules part of the transactions so that conflict-friendly transactions can start quickly; it comprises a local transaction conflict predictor module and extended private cache controller logic. The global scheduling module schedules the remaining transactions and, based on comprehensive global conflict information, allows only transactions with a low probability of conflicting with running transactions to start; it comprises a global transaction conflict predictor module and extended shared cache controller logic. As an extension to hardware transactional memory, the invention dynamically controls when transactions start while remaining transparent to software, thereby reducing the recurrence of conflicts.

Description

Method for dynamically scheduling transaction in hardware transaction memory

Technical Field

The invention belongs to the technical field of on-chip multiprocessors, and in particular relates to a method for dynamically scheduling transactions in hardware transactional memory.

Background

With the advent of multi-core processors, multithreaded programming has become critical to fully exploiting the performance of modern computer hardware. However, multithreaded programming brings complex concurrency-control problems, such as data races and deadlocks, which make multithreaded applications difficult to develop and debug. Traditional lock-based synchronization mechanisms, such as mutexes and semaphores, control multithreaded access to shared data, but they can become a bottleneck for multi-core performance and are prone to problems such as deadlock and starvation. Transactional memory is a concurrent programming paradigm that aims to provide simpler and safer concurrency control, making concurrent programs easier to write while offering higher performance potential because it is lock-free. Compared with software transactional memory, hardware transactional memory achieves higher performance at the cost of some additional hardware complexity, and it has attracted wide interest from researchers and hardware vendors. Although academia has long explored the performance of hardware transactional memory in depth, hardware vendors have ultimately adopted best-effort hardware transactional memory implementations as a compromise between implementation complexity and verification difficulty.

For example, Intel implements a best-effort hardware transactional memory (reference 1) in which the core (CPU) marks the beginning and end of a transaction by executing transaction start and commit instructions. A transaction ends either when the core reaches the commit instruction and commits successfully, or when it is terminated early because of a memory conflict or another factor encountered during execution. When the private cache of a core receives a transaction start request from the CPU, it enters the transaction state and records the subsequent memory accesses issued by that core in a read-write set. When the private cache receives access requests from other cores, it performs conflict detection against the read-write set. Conflict arbitration follows a requester-always-wins policy: if a conflict is detected, the transaction that receives the request fails immediately, and the requesting transaction continues to run normally. After a transaction fails, the read-write set of the private cache is cleared, and the core rolls back to the checkpoint established at the start of the transaction.

In practice, this simple implementation frequently causes transactions to terminate one another, so that no transaction can commit successfully for a period of time; this is one of the main reasons for the unstable performance of hardware transactional memory.

Disclosure of Invention

To alleviate the performance degradation and instability caused by transactions terminating one another in hardware transactional memory, the invention provides a method for dynamically scheduling transactions in hardware transactional memory. Without modifying the original conflict management policy, the method selectively pauses transactions that are likely to conflict again and wakes them up later at an appropriate time.

The method for dynamically scheduling transactions in hardware transactional memory provided by the invention can be implemented as an extension on a best-effort-like hardware transactional memory architecture (reference 1). On this basis, dynamic transaction scheduling that is transparent to software is realized by hardware schedulers residing in the caches. Specifically, the invention enhances the existing memory subsystem by introducing a local scheduling module in the private cache and a global scheduling module in the shared cache; together, the two modules form the hierarchical dynamic scheduling system of the invention. The local scheduling module is mainly responsible for scheduling part of the transactions so that conflict-friendly transactions can start quickly; it comprises a local transaction conflict predictor module and extended private cache controller logic. The global scheduling module is mainly responsible for scheduling the remaining transactions and, on the basis of comprehensive global conflict information, allows only transactions with a low probability of conflicting with running transactions to start; it comprises a global transaction conflict predictor module and extended shared cache controller logic.

The invention provides a local scheduling module residing in the private cache, which comprises a local transaction conflict predictor module and extended private cache controller logic; wherein:

the local transaction conflict predictor module gives a preliminary prediction of whether the current transaction will conflict with other running transactions, based on the commit history of the transactions run on the current core. To minimize overhead and maximize response speed, the invention uses only two bits to store the current conflict state, and the prediction is obtained by simply decoding these two bits. Specifically, 00 encodes the initial state, low-confidence no conflict; 01 encodes high-confidence no conflict; 10 encodes low-confidence conflict; and 11 encodes high-confidence conflict. The state transition logic is as follows: in the low-confidence no-conflict state, a transaction commit event moves the state to high-confidence no conflict in the next cycle, and a transaction termination event moves it to low-confidence conflict; in the high-confidence no-conflict state, a commit event leaves the state unchanged, and a termination event moves it to low-confidence no conflict; in the low-confidence conflict state, a commit event moves it to low-confidence no conflict, and a termination event moves it to high-confidence conflict; in the high-confidence conflict state, a commit event moves it to low-confidence no conflict, and a termination event leaves the state unchanged. The low-confidence no-conflict and high-confidence no-conflict states are called conflict-friendly states; a transaction started in one of these states is considered a conflict-friendly transaction and can start quickly.
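The two-bit state machine can be modeled in software as a small lookup table. The following is a minimal sketch for illustration only, not the hardware implementation; the names `ConflictState`, `NEXT_STATE`, and `LocalConflictPredictor` are hypothetical. It assumes that a termination event in the low-confidence no-conflict state leads to the low-confidence conflict state, which is the transition used in the worked example of the detailed description, and that the prediction is simply the upper bit of the state.

```python
from enum import IntEnum


class ConflictState(IntEnum):
    LOW_NO_CONFLICT = 0b00   # initial state
    HIGH_NO_CONFLICT = 0b01
    LOW_CONFLICT = 0b10
    HIGH_CONFLICT = 0b11


S = ConflictState

# (current state, event) -> state in the next cycle
NEXT_STATE = {
    (S.LOW_NO_CONFLICT, "commit"): S.HIGH_NO_CONFLICT,
    (S.LOW_NO_CONFLICT, "abort"): S.LOW_CONFLICT,  # assumption, see lead-in
    (S.HIGH_NO_CONFLICT, "commit"): S.HIGH_NO_CONFLICT,
    (S.HIGH_NO_CONFLICT, "abort"): S.LOW_NO_CONFLICT,
    (S.LOW_CONFLICT, "commit"): S.LOW_NO_CONFLICT,
    (S.LOW_CONFLICT, "abort"): S.HIGH_CONFLICT,
    (S.HIGH_CONFLICT, "commit"): S.LOW_NO_CONFLICT,
    (S.HIGH_CONFLICT, "abort"): S.HIGH_CONFLICT,
}


class LocalConflictPredictor:
    """Software model of the per-core two-bit predictor."""

    def __init__(self) -> None:
        self.state = S.LOW_NO_CONFLICT

    def predict(self) -> int:
        # Decode the state: the upper bit is 1 in both conflict states.
        return (int(self.state) >> 1) & 1

    def on_event(self, event: str) -> None:
        # event is "commit" or "abort" (transaction terminated).
        self.state = NEXT_STATE[(self.state, event)]
```

With this encoding, a predictor output of 0 corresponds to the conflict-friendly states, so no separate decode table is needed.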

The extended private cache controller logic mainly extends the handling of the transaction start and transaction commit requests sent by the CPU. Requests unrelated to transactions are processed according to the original logic. When the cache controller receives a transaction start request, however, it takes different actions depending on the result of the local transaction conflict predictor module described above. If the local predictor reports no conflict, the processing flow is the same as before and the cache is set to transaction mode; the only difference is that a transaction start message must be sent to the next-level shared cache so that the relevant entries of the global scheduler are updated. If the local predictor reports a conflict, a request for permission to start the transaction must be sent to the next-level cache, and the controller waits for the response, much as for an ordinary request that misses in the cache. If the next-level cache grants permission, the cache is set to transaction mode and the CPU is answered according to the original logic, which completes the handling of the transaction start request. If the next-level cache refuses permission, the cache state is not modified, that is, it remains in the non-transactional state, and a response indicating that the transaction start request was rejected is returned to the CPU. On receiving this rejection, the CPU enters a dormant state and waits to be woken up.
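The transaction start path of the private cache controller can be sketched as a three-way decision. This is a minimal software model under stated assumptions: `StubSharedCache`, `PrivateCacheController`, and the method names `notify_begin` and `request_begin` are hypothetical stand-ins for the messages exchanged with the next-level cache, and the string return values stand in for the responses sent to the CPU.

```python
class StubSharedCache:
    """Hypothetical stand-in for the next-level shared cache."""

    def __init__(self, grant: bool) -> None:
        self.grant = grant
        self.log = []

    def notify_begin(self, core_id: int) -> None:
        # Informational message: the core has already started its transaction.
        self.log.append(("notify_begin", core_id))

    def request_begin(self, core_id: int) -> bool:
        # Permission request: the core waits for grant or refusal.
        self.log.append(("request_begin", core_id))
        return self.grant


class PrivateCacheController:
    def __init__(self, core_id: int, shared_cache: StubSharedCache) -> None:
        self.core_id = core_id
        self.shared = shared_cache
        self.in_transaction = False

    def handle_tx_begin(self, local_prediction: int) -> str:
        if local_prediction == 0:
            # Conflict-friendly: start at once, only notify the shared cache.
            self.in_transaction = True
            self.shared.notify_begin(self.core_id)
            return "started"
        # Conflict predicted: ask the shared cache for permission first.
        if self.shared.request_begin(self.core_id):
            self.in_transaction = True
            return "started"
        # Refused: cache stays non-transactional, the CPU sleeps until woken.
        return "rejected"
```

In this model only the second path pays the round trip to the shared cache, which reflects why conflict-friendly transactions can start quickly.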

The invention provides a global scheduling module residing in the shared cache, which mainly comprises a global transaction conflict predictor module and extended shared cache controller logic; wherein:

the global transaction conflict predictor module predicts whether a transaction started on the requesting core would conflict with the transactions already running, based on the conflict confidence between the transactions on each pair of cores, the transaction state of the cores, and a configured conflict confidence threshold. The module takes a core ID as input; an output of 0 means that the transaction start is accepted, and an output of 1 means that it is rejected. The global transaction conflict predictor maintains a conflict confidence table with N(N-1)/2 entries, where N is the number of cores in the system. Each entry is 32 bits wide and stores the confidence of a transaction conflict on the core pair <i, j>. In addition, two tables of N entries each are maintained: a core transaction state table and a core waiting state table. Each entry of the former needs only one bit and stores whether the core is running a transaction (1 means a transaction is executing); each entry of the latter also needs only one bit and stores whether the core has been refused permission to execute a transaction and is waiting (1 means waiting). There is also a configurable parameter table with five entries: the conflict confidence threshold, the historical duty cycle coefficient, the conflict manufacturer coefficient, the conflict witness coefficient, and the commit witness coefficient. The prediction logic is as follows, assuming that the input core ID is x and the remaining core IDs in the system are 1, 2, 3, and 4.
First, the entries for <x, 1>, <x, 2>, <x, 3>, and <x, 4> are selected from the conflict confidence table, and the entries for cores 1, 2, 3, and 4 are selected from the transaction state table. Corresponding elements are ANDed to give four results. For example, if the entry <x, 1> in the conflict confidence table has the value a, the state bit of core 1 is sign-extended and ANDed with a, so the result is zero if core 1 is in the non-transaction state and a otherwise. Each of the four results is then compared with the conflict confidence threshold (the comparison outputs 1 if the value is greater than the threshold and 0 otherwise), and the four comparison outputs are ORed to give the final output: the output is 1 as long as there is at least one running transaction whose conflict confidence with the requesting transaction exceeds the threshold.
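The select, AND, compare, and OR steps can be sketched in a few lines. This is an illustrative software model only; the helper `pair_index`, which maps an unordered core pair onto a flat table of N(N-1)/2 entries, is one possible layout assumed here and is not specified in the text, and cores are numbered from 0. The model also assumes a non-negative threshold, so that a masked value of zero never triggers a rejection.

```python
def pair_index(i: int, j: int, n: int) -> int:
    """Index of the unordered core pair {i, j} in a table of n*(n-1)//2 entries."""
    if i == j or not (0 <= i < n and 0 <= j < n):
        raise ValueError("need two distinct core IDs in range")
    lo, hi = min(i, j), max(i, j)
    # Rows of an upper-triangular matrix laid out one after another.
    return lo * n - lo * (lo + 1) // 2 + (hi - lo - 1)


def predict_conflict(x, confidence, running, threshold) -> int:
    """Return 1 (reject) if any running core exceeds the threshold with x, else 0."""
    n = len(running)
    outputs = []
    for y in range(n):
        if y == x:
            continue
        # AND step: cores that are not in a transaction contribute zero.
        masked = confidence[pair_index(x, y, n)] if running[y] else 0
        # Compare step: strictly greater than the threshold.
        outputs.append(1 if masked > threshold else 0)
    # OR step over all comparison outputs.
    return int(any(outputs))
```

For N = 5 the table holds 10 entries, and the pair <3, 4> maps to the last index, 9.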

Further, the table entry update flow of the global transaction conflict predictor is as follows:

(1) When a request applying to start a transaction is received: if the global transaction conflict predictor outputs 1, that is, the transaction start is rejected, the transaction state table is not updated and only a response indicating that the transaction must be paused is sent to the upper-level cache; if the global transaction conflict predictor outputs 0, that is, the transaction start is accepted, the corresponding entry of the transaction state table is updated to 1 and a response indicating that the transaction may start is sent to the upper-level cache;

(2) When a request is received informing of the start of a transaction: updating the corresponding table entry of the transaction state table to be 1;

(3) The formula for conflict confidence update is:

$$I^{(n)}_{xy} = A \cdot I^{(n-1)}_{xy} + (1 - A) \cdot TT,$$

where $I^{(n)}_{xy}$ denotes the updated conflict confidence between core x and core y, and $I^{(n-1)}_{xy}$ denotes the conflict confidence between core x and core y before the update. A is a coefficient greater than or equal to 0 and less than or equal to 1 that expresses how strongly the conflict confidence is influenced by history: the closer A is to 1, the greater the weight of historical information and the smoother the change in conflict confidence; the closer A is to 0, the greater the weight of the current event and the more abrupt the change. TT represents the stimulus of a particular event. The invention identifies three events that are closely related to conflict prediction: when a transaction on one core commits while other cores are running transactions, TT is the commit witness coefficient in the configurable parameter table; when a transaction on a core is terminated because of a memory conflict, TT is the conflict manufacturer coefficient in the configurable parameter table for the core that caused the conflict; and when a transaction on a core is terminated because of a memory conflict, TT is the conflict witness coefficient in the configurable parameter table for each core that is running a transaction but did not cause the conflict;

(4) When a request is received notifying that a transaction has ended with a successful commit: the corresponding entry of the transaction state table is updated to 0; then, according to the current transaction state table, the conflict confidence with each core that is running a transaction is updated using TT = commit witness coefficient;

(5) When a request is received notifying that a transaction has ended with a failed termination: the corresponding entry of the transaction state table is updated to 0; the conflict confidence with the core that caused the conflict is updated using TT = conflict manufacturer coefficient; and the conflict confidence with each remaining core that the transaction state table shows to be running a transaction is updated using TT = conflict witness coefficient.
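The update flow in items (1) to (5) amounts to an exponentially weighted moving average driven by commit and termination events. The sketch below models it in software under several assumptions that are not in the text: confidences are held as floats rather than 32-bit hardware values, the table is keyed by unordered core pairs, and the coefficient values used in the usage note (including the negative commit witness coefficient) are purely illustrative. The class name `ConfidenceTable` and its methods are hypothetical.

```python
from itertools import combinations


class ConfidenceTable:
    """Software model of the conflict confidence and transaction state tables."""

    def __init__(self, n, a, tt_commit_witness, tt_maker, tt_conflict_witness):
        # One entry per unordered core pair, initialized to 0.
        self.conf = {frozenset(p): 0.0 for p in combinations(range(n), 2)}
        self.running = [0] * n
        self.a = a  # history weight A, with 0 <= A <= 1
        self.tt_commit_witness = tt_commit_witness
        self.tt_maker = tt_maker
        self.tt_conflict_witness = tt_conflict_witness

    def _update(self, x, y, tt):
        # I(n) = A * I(n-1) + (1 - A) * TT
        key = frozenset((x, y))
        self.conf[key] = self.a * self.conf[key] + (1 - self.a) * tt

    def on_begin(self, x):
        self.running[x] = 1

    def on_commit(self, x):
        self.running[x] = 0
        for y, is_running in enumerate(self.running):
            if is_running:
                self._update(x, y, self.tt_commit_witness)

    def on_abort(self, x, maker):
        # maker is the core whose request caused the conflict (maker != x).
        self.running[x] = 0
        self._update(x, maker, self.tt_maker)
        for y, is_running in enumerate(self.running):
            if is_running and y != maker:
                self._update(x, y, self.tt_conflict_witness)
```

As a worked instance with A = 0.5 and a hypothetical conflict manufacturer coefficient of 8, a pair starting at confidence 0 moves to 0.5 · 0 + 0.5 · 8 = 4 after one conflict, and a second identical conflict would move it to 0.5 · 4 + 0.5 · 8 = 6, showing how repeated conflicts approach TT while A controls how quickly.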

The extended shared cache controller logic mainly extends the handling of transaction-related requests; the processing of all other requests is unchanged. When a request applying to start a transaction is received, the global transaction conflict predictor is queried. If a conflict is predicted, the response rejects the transaction start, the transaction state table is left unchanged, and the corresponding entry in the core waiting state table is set to 1; if no conflict is predicted, the response accepts the transaction start and the corresponding entry of the transaction state table is updated to 1. When a request notifying the start of a transaction is received, the corresponding entry of the transaction state table is updated to 1. When a request notifying the end of a transaction is received, the conflict confidence table and the transaction state table are updated as described above, and the controller attempts to wake up the cores whose transaction start was previously refused.

Further, the flow of the mechanism to dynamically schedule the transaction is as follows:

(1) The first-level scheduling module, located in the private cache, receives a transaction start request from the CPU, and the local transaction conflict predictor starts working; if no conflict is predicted, the transaction starts normally and the next-level shared cache is notified of the transaction start so that the relevant entries of the second-level scheduling module are updated; if a conflict is predicted, permission to start the transaction must be requested from the next-level shared cache;

(2) When the second-level scheduling module, located in the shared cache, receives a request for permission to start a transaction, the global transaction conflict predictor starts working; if no conflict is predicted, a response accepting the transaction start is returned and the requesting core is recorded as being in the transaction state; if a conflict is predicted, a response rejecting the transaction start is returned and the requesting core is recorded as being in the waiting state; when the private cache receives the response from the lower-level cache, it returns the corresponding response to the CPU;

(3) If the second-level scheduling module, located in the shared cache, receives a request notifying the start of a transaction, it updates the corresponding entry of the transaction state table to the transaction state; if it receives a request notifying the end of a transaction, it updates the corresponding entry of the transaction state table to the non-transaction state, invokes the global conflict predictor for every core that is waiting to start a transaction, and wakes up all waiting cores that are predicted not to conflict with the running transactions;

(4) To prevent cores from starving while waiting in high-conflict applications, a counter is set for each waiting core in the second-level scheduling module; a waiting core that has still not been woken up when its counter reaches zero is forcibly woken up.
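Steps (2) to (4) of the flow above can be sketched as a small model of the second-level scheduling module. This is a hedged illustration rather than the hardware design: `GlobalScheduler`, `max_wait`, and `tick` are hypothetical names, the confidence table is held as a symmetric matrix for brevity, confidence updates are omitted, and the event that decrements the starvation counter is assumed to be an abstract scheduler tick because the text does not specify it. A woken core is assumed to re-issue its transaction start request.

```python
class GlobalScheduler:
    """Software model of grant, reject, wake-up, and forced wake-up."""

    def __init__(self, n: int, threshold: int, max_wait: int) -> None:
        self.n = n
        self.threshold = threshold
        self.max_wait = max_wait  # hypothetical initial counter value
        self.conf = [[0] * n for _ in range(n)]  # symmetric, for brevity
        self.running = [0] * n
        self.waiting = [0] * n
        self.counter = [0] * n

    def predict(self, x: int) -> int:
        return int(any(
            self.running[y] and self.conf[x][y] > self.threshold
            for y in range(self.n) if y != x
        ))

    def request_begin(self, x: int) -> bool:
        if self.predict(x):
            self.waiting[x] = 1
            self.counter[x] = self.max_wait
            return False  # rejected: core x sleeps
        self.running[x] = 1
        return True

    def notify_begin(self, x: int) -> None:
        self.running[x] = 1

    def notify_end(self, x: int) -> list:
        self.running[x] = 0
        # Re-run the predictor for every waiting core.
        woken = [c for c in range(self.n)
                 if self.waiting[c] and not self.predict(c)]
        for c in woken:
            self.waiting[c] = 0
        return woken

    def tick(self) -> list:
        # Starvation guard: force a wake-up when the counter reaches zero.
        forced = []
        for c in range(self.n):
            if self.waiting[c]:
                self.counter[c] -= 1
                if self.counter[c] <= 0:
                    self.waiting[c] = 0
                    forced.append(c)
        return forced
```

In this model a rejected core is released either when the transaction it was predicted to conflict with ends, or after `max_wait` ticks, whichever comes first.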

Compared with the prior art, the invention has the following main characteristics and advantages:

(1) Extending an existing best-effort hardware transactional memory with the dynamic scheduling method of the invention significantly reduces the likelihood that conflicts recur; in the original system, transactions terminating one another is a very common scenario, and the dynamic scheduling method plays a key role precisely in this scenario, greatly improving the stability of system performance;

(2) The dynamic transaction scheduling method of the invention is implemented in simple and efficient hardware, which greatly reduces the performance cost of transaction scheduling; the invention is also completely transparent to the software layer and requires no modification of the existing software framework, which reduces the deployment burden; furthermore, as an extension, the invention offers more flexible transaction scheduling options through configurable parameters.

Drawings

FIG. 1 is a diagram of the overall architecture of a method for dynamically scheduling transactions in a hardware transactional memory.

FIG. 2 is a schematic diagram of the architecture of a local transaction conflict predictor.

FIG. 3 is a process flow diagram of an extended private cache controller.

FIG. 4 is a schematic architecture diagram of a global transaction conflict predictor.

FIG. 5 is a process flow diagram of an extended shared cache controller.

Detailed Description

The overall architecture of the dynamic transaction scheduling method in hardware transactional memory is shown in FIG. 1. For convenience of description, only two cores, CPU#0 and CPU#1, are drawn in the figure. Each core has a private cache equipped with a local transaction conflict predictor, and each private cache is connected to the shared next-level cache, the L2 cache in the figure, through an interconnection network. The shared L2 cache uses a directory-based cache coherence protocol and is equipped with a global transaction conflict predictor.

As shown in FIG. 1, under the original logic, when a core executes the instruction that marks the start of a transaction, the instruction has a memory access attribute, so a transaction start request is sent to the cache side. The request is first received by the first-level private cache, the L1 cache in the figure, whose cache controller queries the local transaction conflict predictor. The local transaction conflict predictor is essentially a two-bit state machine whose state transition logic is shown in FIG. 2. If the current conflict state is low-confidence no conflict or high-confidence no conflict, it outputs 0 to indicate no conflict; if the current conflict state is low-confidence conflict or high-confidence conflict, it outputs 1 to indicate a conflict.

FIG. 3 illustrates the processing logic of the private cache controller after support for the dynamic transaction scheduling extension is added. When the local transaction conflict predictor predicts no conflict, the processing flow is the same as before: the cache state is set to the transaction state and a response indicating that the transaction has started normally is returned to the CPU. The only difference is that a special message is sent to the next-level cache to notify it that the core has entered the transaction state. When the local transaction conflict predictor predicts a conflict, a request for permission to start the transaction must be sent to the next-level shared cache, and whether a conflict is really expected can be confirmed only when the response from the next level arrives. This is similar to an ordinary memory access that misses in the cache: a request is sent to the next-level cache and processing resumes once the response arrives. When the response to the permission request is received from the next-level cache, if the transaction start is accepted, the processing flow is the same as when the local conflict predictor reports no conflict; if the transaction start is rejected, the cache state remains non-transactional and a response indicating that the transaction failed to start is sent to the CPU.

FIG. 4 is a schematic diagram of the architecture of the global conflict predictor in the shared cache. It contains four main tables: the conflict confidence table records the probability of conflict between the transactions on each pair of cores; the core transaction state table records whether each core is running a transaction; the core waiting state table records whether each core is waiting to be woken up; and the configurable parameter table holds the conflict confidence threshold, the historical duty cycle coefficient, the conflict manufacturer coefficient, the conflict witness coefficient, and the commit witness coefficient, which are used to update the conflict confidence table and to produce the conflict prediction. The initial conflict confidence of every core pair is set to 0, and the core transaction state table and the core waiting state table are initialized to 0, indicating respectively that no core is running a transaction and that no core is waiting to be woken up. When the shared L2 cache in the figure receives a transaction start request from L1 cache #0, the conflict confidences between core0 and the other cores in the system are read from the conflict confidence table, the transaction states of the other cores are read from the core transaction state table, and the two groups of data are ANDed. Each ANDed result is compared with the configurable conflict confidence threshold, and the comparison results are ORed to give the final prediction.
If the conflict confidence between the requesting core and every core that is running a transaction is lower than or equal to the conflict confidence threshold, the requesting core may run its transaction, since a conflict is unlikely; conversely, if the conflict confidence between the requesting core and at least one core that is running a transaction is above the threshold, the requesting core may not run its transaction, because it is highly likely to conflict with a running transaction.

FIG. 5 shows the processing logic of the shared cache controller after support for the dynamic transaction scheduling extension is added. When the shared cache receives a transaction start request, the global conflict predictor starts working and produces a conflict prediction, and the L2 cache sends the corresponding response to the upper-level cache. Suppose that at some point the L2 cache receives a transaction start request from L1 cache #0. If no conflict is predicted, the core0 entry of the transaction state table in FIG. 4 is set to 1 and a response accepting the transaction start is returned to L1 cache #0; if a conflict is predicted, the transaction state table is not modified, a response rejecting the transaction start is returned directly to L1 cache #0, and the core0 entry of the waiting state table in FIG. 4 is set to 1.

Assume that L1 cache #0 has received a response from the L2 cache accepting the transaction start. At some point CPU #1 issues a transaction start request; the local transaction conflict predictor of L1 cache #1 predicts no conflict, so L1 cache #1 sends a transaction start notification to the next-level cache while returning a normal-start response to CPU #1. When the shared L2 cache receives the notification, it sets the core1 entry of the transaction state table to 1. If, some time later, L1 cache #0 receives a conflicting request from L1 cache #1, the transaction on CPU #0 must terminate itself under the requester-wins arbitration policy, and the data as it was before the transaction modified it is returned to CPU #1 as a normal response. When CPU #0 sends a transaction end request to the memory subsystem, L1 cache #0 sends a transaction termination event to its local transaction conflict predictor, whose conflict state changes from low-confidence no conflict to low-confidence conflict; at the same time it sends a transaction end request to the next-level cache, carrying the ID of the conflict manufacturer, CPU #1, in the message. When the shared L2 cache receives the notification that the transaction on CPU #0 has failed, the core0 entry of the core transaction state table in FIG. 4 is set to 0. At the same time, the <core0, core1> entry of the conflict confidence table is updated with TT = conflict manufacturer coefficient. If core2 is also in a transaction at this time, the <core0, core2> entry of the conflict confidence table is updated with TT = conflict witness coefficient.

After a back-off period, CPU #0 restarts the transaction. The conflict state of the local conflict predictor of L1 cache #0 is now low-confidence conflict, so its output is 1, indicating a conflict, and a request for permission to start the transaction must be sent to the next-level shared cache. When the L2 shared cache receives the transaction start request from L1 cache #0, the global conflict predictor starts working. Assuming that its prediction is 1, indicating a conflict, the L2 cache replies to L1 cache #0 with a response rejecting the transaction start and records core0 as waiting in the core waiting state table. Some time later, CPU #1 finishes executing its transaction and begins to commit it, sending a request to L1 cache #1. L1 cache #1 receives the transaction commit request and sends a commit event to its local conflict predictor, whose conflict state changes in the next cycle from low-confidence no conflict to high-confidence no conflict. It also notifies the next-level cache that the transaction has ended; on receiving this notification, the L2 shared cache sets the core1 entry of the core transaction state table to 0. Since core0 is not in a transaction at this time, the conflict confidence of <core1, core0> does not need to be updated; if core2 is in a transaction, the <core1, core2> entry of the conflict confidence table is updated with TT = commit witness coefficient. The core waiting state table must also be checked for waiting cores; in this example core0 can be woken up, because core0 is in the waiting state and core1, having finished, is no longer executing a transaction.

To avoid core starvation, a counter is bound to each core in the global conflict predictor, which ensures that a core that has waited for a certain time without being woken up is forcibly woken up.

References

[1] Rajwar, Ravi, and Martin Dixon. "Intel transactional synchronization extensions." In Intel Developer Forum San Francisco, vol. 2012.

Claims

1. A method for dynamically scheduling transactions in a hardware transactional memory, characterized in that dynamic transaction scheduling transparent to software is realized through a hardware scheduler residing in the caches; specifically, the original memory subsystem is enhanced by introducing a local scheduling module into the private cache and a global scheduling module into the shared cache, the two modules together forming a layered dynamic scheduling system; the local scheduling module is mainly responsible for scheduling part of the transactions so that conflict-friendly transactions can be started quickly, and specifically comprises a local transaction conflict predictor module and extended private cache controller logic; the global scheduling module is mainly responsible for scheduling the remaining transactions and, on the basis of comprehensive global conflict information, only allows a transaction to start if it has a low probability of conflicting with the running transactions, and specifically comprises a global transaction conflict predictor module and extended shared cache controller logic; wherein:

the local transaction conflict predictor module gives a preliminary prediction of whether the current transaction will conflict with other running transactions, according to the commit history of the transactions run on the current core; two bits are used to store the current conflict state, and the two-bit conflict state is decoded to obtain the conflict prediction result; specifically, code 00, the initial state, represents low-confidence conflict-free; code 01 represents high-confidence conflict-free; code 10 represents low-confidence conflict; code 11 represents high-confidence conflict; the state machine transition logic is as follows: in the low-confidence conflict-free state, if a transaction-commit event is received, the state jumps to high-confidence conflict-free in the next cycle, and if a transaction-termination event is received, the state jumps to low-confidence conflict in the next cycle; in the high-confidence conflict-free state, if a transaction-commit event is received, the state remains unchanged, and if a transaction-termination event is received, the state jumps to low-confidence conflict-free in the next cycle; in the low-confidence conflict state, if a transaction-commit event is received, the state jumps to low-confidence conflict-free in the next cycle, and if a transaction-termination event is received, the state jumps to high-confidence conflict in the next cycle; in the high-confidence conflict state, if a transaction-commit event is received, the state jumps to low-confidence conflict-free in the next cycle, and if a transaction-termination event is received, the state remains unchanged; when the conflict state is low-confidence conflict-free or high-confidence conflict-free, the state is called conflict-friendly, and a transaction started in this state is regarded as a conflict-friendly transaction that can be started quickly;
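The two-bit predictor described above can be sketched in software as a small state machine. This is an illustrative sketch only, not the hardware design: the names `ConflictState`, `LocalConflictPredictor`, `COMMIT` and `ABORT` are hypothetical, and the transition taken on a termination event in the low-confidence conflict-free state is assumed to lead to the low-confidence conflict state, as in the worked example in the description. The prediction is taken to be the high bit of the state code, which matches the encoding (10 and 11 are the conflict states).

```python
from enum import IntEnum


class ConflictState(IntEnum):
    LOW_NO_CONFLICT = 0b00   # initial state
    HIGH_NO_CONFLICT = 0b01
    LOW_CONFLICT = 0b10
    HIGH_CONFLICT = 0b11


COMMIT, ABORT = "commit", "abort"  # hypothetical event names

S = ConflictState
# (current state, event) -> next state, following the transition logic in the text.
TRANSITIONS = {
    (S.LOW_NO_CONFLICT, COMMIT): S.HIGH_NO_CONFLICT,
    (S.LOW_NO_CONFLICT, ABORT): S.LOW_CONFLICT,      # assumption, see lead-in
    (S.HIGH_NO_CONFLICT, COMMIT): S.HIGH_NO_CONFLICT,
    (S.HIGH_NO_CONFLICT, ABORT): S.LOW_NO_CONFLICT,
    (S.LOW_CONFLICT, COMMIT): S.LOW_NO_CONFLICT,
    (S.LOW_CONFLICT, ABORT): S.HIGH_CONFLICT,
    (S.HIGH_CONFLICT, COMMIT): S.LOW_NO_CONFLICT,
    (S.HIGH_CONFLICT, ABORT): S.HIGH_CONFLICT,
}


class LocalConflictPredictor:
    def __init__(self):
        self.state = S.LOW_NO_CONFLICT

    def predict(self):
        """Return 1 if a conflict is predicted, 0 if the state is conflict-friendly."""
        return (int(self.state) >> 1) & 1

    def on_event(self, event):
        """Apply a commit or termination event and move to the next state."""
        self.state = TRANSITIONS[(self.state, event)]
```

With this encoding, a core that has just aborted once predicts a conflict on its next attempt and therefore has to ask the shared cache for permission, while a single commit from either conflict state returns it to a conflict-friendly state.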

the extended private cache controller logic is mainly used for processing the transaction-start and transaction-commit requests sent by the CPU; when the cache controller receives a request unrelated to transactions, it is processed according to the original logic; when the cache controller receives a request to start a transaction, different actions are taken according to the result of the local transaction conflict predictor module; specifically, if the result of the local transaction conflict predictor is no conflict, the processing flow is the same as the original one and the cache is set to transaction mode, the only difference being that a transaction-start message must be sent to the next-level shared cache so that the related entries of the global scheduler are updated; if the result of the local transaction conflict predictor is a conflict, then, similarly to an ordinary request that misses in the cache, a request for permission to start the transaction must be sent to the next-level cache and its response awaited; if the response of the next-level cache grants permission, the cache is set to transaction mode and the CPU is then answered according to the original logic, which completes the processing of the transaction-start request; if the response of the next-level cache refuses permission, the cache state is not modified, that is, the cache does not enter the transaction state, and a response indicating that the transaction-start request was rejected is returned to the CPU; on receiving the response that the transaction-start request was rejected, the CPU enters a dormant state and waits to be woken up;
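The decision flow for a transaction-start request in the private cache controller can be sketched as follows. This is a minimal sketch under stated assumptions: `BeginOutcome`, `handle_txn_begin`, the message strings, and the `request_permission` callback (standing in for the round trip to the shared cache) are all hypothetical names introduced for illustration and do not appear in the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BeginOutcome:
    txn_mode: bool      # whether the private cache enters transaction mode
    l2_message: str     # message sent to the next-level shared cache
    cpu_response: str   # response returned to the CPU


def handle_txn_begin(local_prediction: int,
                     request_permission: Callable[[], bool]) -> BeginOutcome:
    """Handle a transaction-start request from the CPU.

    local_prediction: 0 = no conflict predicted, 1 = conflict predicted.
    request_permission: hypothetical callback that asks the shared cache
    for permission and returns True if it is granted.
    """
    if local_prediction == 0:
        # Conflict-friendly: start at once and only notify the shared cache.
        return BeginOutcome(True, "notify_start", "started")
    # Conflict predicted: ask the shared cache and wait for its answer.
    if request_permission():
        return BeginOutcome(True, "request_start", "started")
    # Permission refused: cache state is unchanged and the CPU goes dormant.
    return BeginOutcome(False, "request_start", "rejected")
```

Note that in the conflict-free case the callback is never invoked, which reflects the point of the local module: conflict-friendly transactions avoid the latency of a round trip to the shared cache.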

the global transaction conflict predictor module predicts whether a transaction run on the requesting core will conflict with the running transactions, according to the conflict confidence between that core and the cores currently running transactions, the transaction state of the cores, and the configured conflict confidence threshold; the input of the global transaction conflict predictor module is a core ID; an output of 0 indicates that the transaction is accepted, and an output of 1 indicates that the transaction is rejected; the global transaction conflict predictor maintains a conflict confidence table having (N x (N-1))/2 entries, where N represents the number of cores in the system; each entry requires 32 bits and stores the confidence of a transaction conflict for the core pair <i, j>; in addition, two tables with N entries each are maintained, namely a core transaction state table and a core waiting state table; each entry of the former needs only one bit and stores whether the corresponding core is running a transaction (1 indicates that a transaction is being executed); each entry of the latter also needs only one bit and stores whether the corresponding core has been refused permission to execute a transaction and is in the waiting state (1 indicates the waiting state); a configurable parameter table comprising five entries is also maintained, namely a conflict confidence threshold, a history weight coefficient, a conflict manufacturer coefficient, a conflict witness coefficient, and a commit witness coefficient; the specific logic is as follows: assume that the input core ID is x and that the IDs of the remaining cores in the system are 1, 2, 3 and 4; first, the entries corresponding to <x,1>, <x,2>, <x,3>, <x,4> are selected from the conflict confidence table and the entries corresponding to cores 1, 2, 3, 4 are selected from the transaction state table, and the corresponding elements of the two selections are ANDed one by one to obtain four results; for example, if the value of the <x,1> entry in the conflict confidence table is a, the transaction state bit of core1 is sign-extended and ANDed with a, so that the result is zero if core1 is not in a transaction and is the conflict confidence a otherwise; the four ANDed results are then compared in turn with the conflict confidence threshold, each comparison outputting 1 if the result is larger than the conflict confidence threshold and 0 otherwise, and the four comparison results are ORed to obtain the final output;
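The mask, compare and OR logic of the global predictor can be sketched in software as below. This is only an illustration under assumptions: the class name `GlobalConflictPredictor` and its fields are hypothetical, cores are numbered from 0, and plain Python numbers stand in for the 32-bit table entries, whose numeric format is not specified in the text.

```python
from itertools import combinations


class GlobalConflictPredictor:
    def __init__(self, num_cores, threshold):
        self.num_cores = num_cores
        self.threshold = threshold
        # One entry per unordered core pair: N * (N - 1) / 2 entries in total.
        self.confidence = {pair: 0.0 for pair in combinations(range(num_cores), 2)}
        # Core transaction state table: 1 means the core is running a transaction.
        self.in_txn = [0] * num_cores

    @staticmethod
    def _key(i, j):
        """Normalise a core pair so that <i, j> and <j, i> share one entry."""
        return (i, j) if i < j else (j, i)

    def predict(self, x):
        """Return 1 (reject) if core x is predicted to conflict, else 0 (accept)."""
        for y in range(self.num_cores):
            if y == x:
                continue
            # Masking step: cores not in a transaction contribute zero.
            masked = self.confidence[self._key(x, y)] if self.in_txn[y] else 0.0
            # Compare with the threshold; OR of all comparisons is the output.
            if masked > self.threshold:
                return 1
        return 0
```

Because of the masking step, a high conflict confidence against a core only blocks a transaction while that core is actually running a transaction; the same pair is accepted as soon as the other core leaves its transaction.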

the extended shared cache controller logic is mainly extended in its processing of transaction-related requests, and the processing logic of all other requests is kept unchanged; when a request for permission to start a transaction is received, the result of the global transaction conflict predictor is queried; if the predicted result is a conflict, a response rejecting the start of the transaction is returned, the transaction state table is kept unchanged, and the corresponding entry in the core waiting state table is set to 1; if the predicted result is no conflict, a response accepting the start of the transaction is returned and the corresponding entry of the transaction state table is set to 1; when a notification of the start of a transaction is received, the corresponding entry of the transaction state table is set to 1; when a notification of the end of a transaction is received, the corresponding entries of the conflict confidence table and the transaction state table are updated as described above, and an attempt is made to wake up the cores whose transaction start was rejected.
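The three transaction-related request types handled by the shared cache controller can be sketched as follows. This is a hedged sketch, not the claimed hardware: `SharedCacheScheduler` and its method names are hypothetical, the predictor is passed in as a callable, the conflict confidence update is omitted, and the wake-up rule (wake a waiting core once the predictor no longer predicts a conflict for it) is an assumption, since the text only says that a wake-up is attempted.

```python
class SharedCacheScheduler:
    def __init__(self, num_cores, predict):
        self.in_txn = [0] * num_cores    # core transaction state table
        self.waiting = [0] * num_cores   # core waiting state table
        # predict(core_id, in_txn) -> 1 (conflict) or 0 (no conflict); hypothetical.
        self.predict = predict

    def on_start_request(self, core):
        """A private cache asks for permission to start a transaction."""
        if self.predict(core, self.in_txn) == 1:
            self.waiting[core] = 1       # transaction state table unchanged
            return "reject"
        self.in_txn[core] = 1
        return "accept"

    def on_start_notification(self, core):
        """A private cache reports that a conflict-friendly transaction started."""
        self.in_txn[core] = 1

    def on_end_notification(self, core):
        """A transaction ended (commit or termination); return the cores to wake."""
        self.in_txn[core] = 0
        # The conflict confidence table update is omitted in this sketch.
        woken = []
        for c, is_waiting in enumerate(self.waiting):
            # Assumed wake-up rule: re-run the prediction for each waiting core.
            if is_waiting and self.predict(c, self.in_txn) == 0:
                self.waiting[c] = 0
                woken.append(c)
        return woken
```

A woken core is expected to retry its transaction-start request through its private cache, so the sketch only clears the waiting bit and does not mark the core as running a transaction.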

2. The method for dynamically scheduling transactions in a hardware transactional memory according to claim 1, wherein in the global transaction conflict predictor, an entry update flow is as follows:

(3) The formula for conflict confidence update is:

$I_{xy}(n) = A \cdot I_{xy}(n-1) + (1 - A) \cdot TT$,

wherein $I_{xy}(n)$ represents the updated conflict confidence between core x and core y, and $I_{xy}(n-1)$ represents the conflict confidence between core x and core y before the update; A is a coefficient satisfying 0 ≤ A ≤ 1 that represents how strongly the conflict confidence is influenced by history: the closer A is to 1, the larger the share contributed by historical information and the smoother the change of the conflict confidence; the closer A is to 0, the larger the share contributed by the current event and the more abrupt the change of the conflict confidence; TT represents the stimulus of a particular event; in total, three events are closely related to conflict prediction: when a transaction on one core commits while another core is running a transaction, the value of TT for that core pair is the commit witness coefficient in the configurable parameter table; when a transaction on one core is terminated because of a memory conflict and the other core is the core that caused the conflict, the value of TT is the conflict manufacturer coefficient in the configurable parameter table; when a transaction on one core is terminated because of a memory conflict and the other core is running a transaction but is not the core that caused the conflict, the value of TT is the conflict witness coefficient in the configurable parameter table.
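The update rule is an exponentially weighted moving average of the event stimulus TT. As a small worked sketch: the function name `update_confidence` is hypothetical, and the numeric values used below (A = 0.75 and a conflict manufacturer coefficient of 1.0) are illustrative assumptions, not values given in the text.

```python
def update_confidence(previous, history_weight, stimulus):
    """Return A * I(n-1) + (1 - A) * TT for one core pair.

    previous:       conflict confidence before the update, I(n-1)
    history_weight: coefficient A, with 0 <= A <= 1
    stimulus:       event stimulus TT taken from the configurable parameter table
    """
    if not 0.0 <= history_weight <= 1.0:
        raise ValueError("history_weight must lie in [0, 1]")
    return history_weight * previous + (1.0 - history_weight) * stimulus


# Two consecutive conflicts on the same core pair, starting from confidence 0.
A = 0.75                    # assumed history weight
CONFLICT_MANUFACTURER = 1.0  # assumed TT value
after_first = update_confidence(0.0, A, CONFLICT_MANUFACTURER)
after_second = update_confidence(after_first, A, CONFLICT_MANUFACTURER)
```

With these assumed values the confidence rises from 0 to 0.25 after the first conflict and to 0.4375 after the second, which illustrates how a value of A close to 1 makes the confidence approach TT gradually rather than jumping to it.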

3. The method for dynamically scheduling transactions in a hardware transactional memory according to claim 1, wherein in the extended shared cache controller logic, the flow of dynamically scheduling transactions by the mechanism is as follows:

(1) When the first-level scheduling module located in the private cache receives a request to start a transaction sent by the CPU, the local transaction conflict predictor starts working; if the predicted result is no conflict, the transaction is started normally and the next-level shared cache is notified of the transaction start at the same time, so that the related entries of the second-level scheduling module are updated; if the predicted result is a conflict, permission to start the transaction is requested from the next-level shared cache;

(2) When the second-level scheduling module located in the shared cache receives a request for permission to start a transaction, the global transaction conflict predictor starts working; if the predicted result is no conflict, a response accepting the start of the transaction is returned and the requesting core is recorded as being in the transaction state; if the predicted result is a conflict, a response rejecting the start of the transaction is returned and the requesting core is recorded as being in the waiting state; when the private cache receives the response of the next-level cache, it returns the corresponding response to the CPU according to the result;