CN102681890A - Restrictive value delivery method and device applied to thread-level speculative parallelism - Google Patents

Restrictive value delivery method and device applied to thread-level speculative parallelism

Info

Publication number
CN102681890A
CN102681890A CN2012101330669A CN201210133066A
Authority
CN
China
Prior art keywords
thread
data
priority
value
transaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101330669A
Other languages
Chinese (zh)
Other versions
CN102681890B (en)
Inventor
安虹
邓博斌
李颀
李功明
毛梦捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201210133066.9A priority Critical patent/CN102681890B/en
Publication of CN102681890A publication Critical patent/CN102681890A/en
Application granted granted Critical
Publication of CN102681890B publication Critical patent/CN102681890B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention provides a restrictive value delivery method and device applied to thread-level speculative parallelism. When a conflict occurs, the total execution time of the system can be reduced by delivering values: a conflicting thread receives the data it needs only when specific conditions are met, and otherwise execution proceeds in the manner of the original system. The disclosed method is a lightweight value delivery method; compared with full value delivery and value prediction, it has the advantage of low hardware and protocol complexity, although in the general case its performance is below that of full value delivery and value prediction. Analysis of experimental data shows that, compared with a value prediction model, the restrictive value delivery model does not incur a large performance loss. The device has been implemented and verified on the LogSPoTM model and is also applicable to other thread-level systems.

Description

Restrictive value delivery method and device applied to thread-level speculative parallelism
Technical field
The invention belongs to the field of computer microprocessor architecture design, and in particular relates to a lightweight value delivery method and device that can effectively improve the performance of multi-threaded systems.
Background art
Speculative multithreading and transactional memory
With the arrival of the chip multiprocessor (Chip Multi-Processor, CMP) era, the question of how to turn traditionally hard-to-parallelize serial programs into threads, both to accelerate the execution of a single program and to supply the ever-increasing number of on-chip cores with more parallel work so as to raise the utilization of on-chip resources, has become a hot research topic of common concern to academia and industry.
To exploit more thread-level parallelism on chip multiprocessors, and to address both the complexity that maintaining the correctness of concurrent programs imposes on parallel programming and its constraints on performance, academia has proposed, from different angles, two techniques: thread-level speculation (Thread-Level Speculation, TLS) and transactional memory (Transactional Memory, TM). TLS aims to break the restriction that inter-thread dependences place on parallel execution and to increase the opportunities for running programs in parallel. When the compiler or programmer cannot fully determine the dependences among candidate threads, there is no need to adopt a conservative strategy that gives up parallelization or adds heavily redundant synchronization protection; the possibly existing dependences can be ignored and the code parallelized directly, with serial semantics maintained by runtime hardware mechanisms that support speculative execution, so that the parallelism in a program can be exploited to the greatest extent. TM aims to provide a replacement for explicit lock-based synchronization: through an implicit synchronization mechanism supplied by the runtime system, it achieves lock-free shared-memory programming. Because no lock acquire or release is needed, it is a non-blocking form of synchronization; it avoids correctness problems of lock mechanisms such as deadlock and priority inversion, and it also removes the impact that lock granularity may have on performance. In TM, the system automatically maintains the semantic consistency of the execution units when concurrent operations from multiple threads modify shared storage state. What TLS and TM have in common is that both reduce the difficulty of parallel programming and increase the opportunities for parallel thread execution, and both place similar requirements on hardware, such as speculative access, data buffering, and conflict cancellation.
To broaden the range of applications of these two techniques, researchers have proposed several hybrid TM and TLS models, of which the more typical are TCC from Stanford University, Bulk from the University of Illinois at Urbana-Champaign, and LogSPoTM proposed by the University of Science and Technology of China. LogSPoTM is implemented on top of LogTM and adds TLS semantic support to the original TM system. The execution pattern of LogSPoTM, shown in Figure 1, is divided into a parallel part (gray lines) and a serial part (black lines). In the serial part, to guarantee program correctness, threads must commit in order from high priority to low priority. LogSPoTM is based on a network-interconnect structure (TCC and Bulk are based on bus interconnects) and therefore scales better. In terms of hardware complexity, LogSPoTM is moderate and is well suited to further extension of system functionality.
Problems with speculative multithreading and transactional memory
As mentioned above, when the proportion of data-conflict dependences in a program is low, both TLS and TM can achieve fairly good performance on multi-core platforms. But when a program contains many data-conflict dependences, system performance becomes very poor. Because of this, the range of programs to which TLS and TM systems apply is rather limited, which is also the main reason why general-purpose processor manufacturers have not adopted these two techniques in actual products.
To alleviate this problem, some researchers have proposed value forwarding methods to accelerate parallel multithreaded programs with many conflict dependences, such as dependence-aware transactional memory (DATM). DATM is an aggressive value forwarding technique: as soon as a transaction triggers a data dependence, it can receive the needed data from the preceding transaction and continue executing. Although DATM obtains fairly good speedups on some test programs, its practicality is poor. First, DATM is implemented on a bus-based coherence protocol and cannot be applied directly to the more scalable directory-based coherence protocols; doing so would require a large amount of protocol modification and verification. Second, DATM makes large-scale changes to the MSI bus coherence protocol, expanding the original 3-state machine to 13 states with complex state-transition mechanisms. If this idea were transplanted intact to a general directory structure, the state-transition graph would become far more complicated than in the bus case, and such a system could not be realized in practice. Finally, DATM adds considerable hardware cost on top of an ordinary transactional memory system, which processor manufacturers are likewise unwilling to accept.
The goal of the present invention is to improve system performance: when data-conflict dependences in a program are severe, the performance gap is mitigated through restrictive value delivery. In making this improvement, hardware cost is a key consideration, and modifications to the hardware structure are kept as small as possible. In addition, value delivery can also solve the false-sharing problem of some TLS&TM hybrid models (such as LogSPoTM), further improving system performance.
Summary of the invention
The key problem addressed by the present invention is to alleviate the performance shortcomings of TM and TLS with as little hardware cost as possible, and we therefore propose a restrictive value delivery method and device applied to thread-level speculative parallelism. It is a compromise between the original system and an aggressive value forwarding system. Judging from average experimental results, restrictive value delivery still achieves attractive performance, while its hardware complexity is relatively low, its changes to the coherence protocol are small, and it has good portability. We have implemented this restrictive value delivery technique with the LogSPoTM model as the platform. We chose the LogSPoTM system mainly because its hardware complexity is lower than that of other existing TM&TLS hybrid systems and because it is directory-based and therefore scales well. The restrictive value delivery technique is, however, equally applicable to other TM and TLS systems.
LogSPoTM supports both TM and TLS semantics; in this specification we explain the invention using TLS as the example. To reduce the performance loss caused by data-conflict dependences, a value delivery mechanism can be added. Figure 2(a) shows how the original LogSPoTM model handles conflict dependences: when a conflict occurs, the lower-priority thread can continue executing only after waiting for the higher-priority thread to finish committing. Figure 2(b) adopts value delivery: when a data-conflict dependence occurs, the higher-priority thread sends the needed data to the lower-priority thread, allowing the lower-priority thread to continue executing, with commits still performed in order, thereby saving total system execution time. If, however, the sending thread modifies the forwarded data, the receiving thread must be notified in time; to guarantee the correctness of the system, the receiving thread must then roll back and re-execute.
Because of the aforementioned shortcomings of aggressive value forwarding (DATM), the present invention proposes a restrictive value delivery method. Figure 3(b) shows the execution pattern of restrictive value delivery. The thread priorities in the figure are T1 > T2 > T3, and the shared data X is owned by T1. When T2 requests data X from T1, which has not yet committed, T1 forwards the data to T2, so T2 can continue executing instead of being stalled by a direct NACK as in the original LogSPoTM. When T3 later also requests data X from T1, the restrictive value delivery system behaves like the original LogSPoTM and has T1 send a NACK to T3, rather than forwarding X to T3 as an aggressive value forwarding system would. In other words, the system forwards the requested data to the requesting thread only when certain agreed conditions are satisfied; otherwise it executes according to the original LogSPoTM pattern. Restrictive value delivery allows data to be forwarded only when all of the following rules hold at the same time (a minimal code sketch of this check is given after the list):
(1) The data owner thread must not have sent any data to another thread.
(2) The data owner thread must not have received data sent by any other thread.
(3) The data owner thread may send data only to a thread whose priority is lower than its own (assuming the threads are ordered).
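As a minimal illustration (not the patented hardware logic itself), the three rules can be read as an eligibility check that the data owner runs before forwarding. The C types and names below are hypothetical and assume that a larger numeric value means higher priority.

```c
#include <stdbool.h>

/* Hypothetical per-thread state used only for this sketch. */
typedef struct {
    int  priority;       /* larger value = higher priority                   */
    bool has_forwarded;  /* rule (1): already forwarded data to some thread? */
    bool has_received;   /* rule (2): already received forwarded data?       */
} thread_state_t;

/* Returns true only if the data owner may forward the requested value;
 * otherwise the system falls back to the original NACK-and-wait behaviour. */
static bool can_forward(const thread_state_t *owner,
                        const thread_state_t *requester)
{
    if (owner->has_forwarded)                   return false; /* rule (1) */
    if (owner->has_received)                    return false; /* rule (2) */
    if (requester->priority >= owner->priority) return false; /* rule (3) */
    return true;
}
```

In the T1 > T2 > T3 example above, T1's forward to T2 sets the owner's forwarded flag, so T3's later request fails rule (1) and is answered with a NACK, matching the behaviour of Figure 3(b).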
Regarding the hardware cost of restrictive value delivery, only minor modifications are made to the original LogSPoTM system. Two register groups are added to each processor: a data-send register group and a data-receive register group. According to the restrictive delivery rules above, the two register groups of a given processor can never be in use at the same time. Figure 4 gives a detailed description of the fields. RID and SID record the number of the processor receiving the data and the number of the processor sending the data, respectively. ADD holds the address of the forwarded data, and DATA holds its value. It is worth noting that after the receiving thread receives the data, modifications to that data are performed only in the receive register group and are written back to the cache only when the thread commits. BITS is used to address the false-sharing problem: the number of bits in BITS equals the number of bytes in a cache line, with a one-to-one correspondence. False sharing is caused by the combined effect of the cache organization and the conflict-detection mechanism. In current mainstream computer systems, caches are organized in units of lines, i.e., each line corresponds to several consecutive addresses, and LogSPoTM performs data conflict detection at cache-line granularity. That is, if two threads access different address locations within the same cache line, the system still treats this as a data conflict even though no real conflict has occurred; this is the so-called false-sharing problem. In some TM&TLS hybrid systems (such as TCC), false sharing has been handled fairly well, but those methods are not applicable to the LogSPoTM model; in LogSPoTM false sharing has never been well solved and often becomes a performance bottleneck. By introducing the value delivery mechanism, the present invention needs only the added BITS register field to mitigate, to a large extent, the performance loss caused by false sharing and to further improve system performance; in tests on 8 benchmark programs, an average performance improvement of 25.8% was obtained in a 4-thread configuration.
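For concreteness, the register-group fields of Figure 4 can be pictured as the following C struct. The field widths and the 64-byte line size are assumptions made only for this sketch (the worked example in the embodiment uses a 4-byte line), and the struct does not claim to be the actual register layout.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64   /* assumed line size; the example in Fig. 5 uses 4 bytes */

/* Sketch of one data-send / data-receive register group (Fig. 4). */
typedef struct {
    uint16_t rid;                     /* RID: number of the processor receiving the data */
    uint16_t sid;                     /* SID: number of the processor sending the data   */
    uint64_t add;                     /* ADD: address of the forwarded data              */
    uint8_t  data[CACHE_LINE_BYTES];  /* DATA: value of the forwarded cache line         */
    uint64_t bits;                    /* BITS: one bit per byte of the line, set when
                                         the receiver reads or writes that byte          */
} value_xfer_regs_t;
```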
In short, the present invention is a restrictive value delivery method and system built on a hybrid transactional memory and thread-level speculation model. When a conflict dependence arises between threads, the higher-priority thread sends data to the lower-priority thread if the conditions are satisfied, allowing it to stop waiting and continue executing the program. If the delivery conditions are not satisfied, the lower-priority thread waits, just as in an ordinary speculative system. To guarantee the correctness of program results, a verification mechanism for the forwarded data is also established, based on sending the modified data.
The present invention proposes a restrictive value delivery device applied to thread-level speculative parallelism, comprising an on-chip multiprocessor, transactional memory functional components, and a value delivery component; it further comprises processor cores that support speculative execution, a cache controller extended with timestamps, an L1 data cache with added read/write bits, an L2 cache with added read/write bits, and a data-send register group and a data-receive register group that guarantee that value delivery proceeds correctly.
The present invention also proposes a restrictive value delivery method performed by the device according to claim 1, comprising the following steps:
Step 1: the system detects a data conflict, and the higher-priority thread checks whether the conditions for forwarding data are satisfied; if they are, it sends the data to the lower-priority thread, otherwise it sends only a NACK message, and the lower-priority thread can only wait;
Step 2: after receiving the forwarded data, the lower-priority thread stores it in the receive data register group and then continues executing the program with that data; as long as this low-priority thread has not committed or rolled back, all subsequent accesses to this conflicting data block operate directly on the receive data register group;
Step 3: if the higher-priority sending thread rewrites the data block it has sent, this high-priority thread sends the modified portion to the receiving thread; the receiving thread verifies whether the modified portion has already been used; if it has been used, a rollback operation is performed, and if not, a data-merge operation is performed (a code sketch of these three steps is given below).
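A compact sketch of these three steps follows, reusing the hypothetical types from the two sketches above; the message, cache, and rollback helpers are stand-ins for hardware actions and are not part of the patent.

```c
#include <stdint.h>

/* Hardware/protocol hooks, declared only so the sketch forms a complete picture. */
extern void copy_line_from_cache(uint8_t *dst, uint64_t line_addr);
extern void send_value_msg(uint16_t dst_proc, const value_xfer_regs_t *regs);
extern void send_nack(uint16_t dst_proc);
extern void rollback(uint16_t proc);

/* Step 1: on a detected data conflict the owner either forwards the line or NACKs. */
void on_conflict(thread_state_t *owner, uint16_t owner_proc,
                 thread_state_t *req,   uint16_t req_proc,
                 value_xfer_regs_t *send_regs, uint64_t line_addr)
{
    if (can_forward(owner, req)) {
        send_regs->sid = owner_proc;
        send_regs->rid = req_proc;
        send_regs->add = line_addr;
        copy_line_from_cache(send_regs->data, line_addr); /* snapshot the whole line */
        owner->has_forwarded = true;
        send_value_msg(req_proc, send_regs);  /* step 2: requester buffers it and runs on */
    } else {
        send_nack(req_proc);                  /* requester waits, as in original LogSPoTM */
    }
}

/* Step 3: the owner rewrote bytes of the forwarded line and sent only those bytes.
 * Overlap with the receiver's BITS mask means a real conflict (rollback);
 * no overlap means false sharing, so the bytes are simply merged. */
void on_modified_bytes(value_xfer_regs_t *recv_regs,
                       const uint8_t mod_bytes[CACHE_LINE_BYTES], uint64_t mod_mask)
{
    if (recv_regs->bits & mod_mask) {
        rollback(recv_regs->rid);
    } else {
        for (int i = 0; i < CACHE_LINE_BYTES; i++)
            if (mod_mask & (1ULL << i))
                recv_regs->data[i] = mod_bytes[i];   /* data-merge operation */
    }
}
```

Only the eligibility check and the BITS overlap test distinguish this path from the original NACK-and-wait behaviour, which is why the changes to the coherence protocol remain small.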
The advantages and beneficial effects of the present invention are mainly the following:
The restrictive value delivery method alleviates the poor performance of TM or TLS systems when data-conflict dependences are severe, effectively reducing the cases in which system performance becomes extremely bad.
It effectively solves the false-sharing problem of LogSPoTM, further improves system performance, and broadens the range of application of multi-threaded systems such as LogSPoTM.
Hardware complexity is low, modifications to the coherence protocol are small, and portability is good.
Description of drawings
Fig. 1. Schematic diagram of the LogSPoTM execution pattern;
Fig. 2. Application of value delivery in LogSPoTM: figure (a) shows the thread handling after a conflict in the original LogSPoTM system; figure (b) shows the thread handling with the value delivery mechanism added;
Fig. 3. Execution pattern of the LogSPoTM model with restrictive value delivery in a multi-threaded environment: figure (a) shows how LogSPoTM handles multiple threads requesting the same data block; figure (b) shows the handling after the value delivery mechanism is added;
Fig. 4. Hardware added by the restrictive value delivery model on top of the original LogSPoTM model;
Fig. 5. An execution of a program with false sharing on the restrictive value delivery LogSPoTM model: figure (a) shows the code of two threads with false sharing; figure (b) shows how the register groups of the restrictive value delivery LogSPoTM system change while executing the code of figure (a).
Embodiment
The concrete working process of the restrictive value delivery technique in LogSPoTM is illustrated below using a program fragment with false data sharing, showing how the values of the two added register groups change. Figure 5(a) gives two transactions, each executing its own code. Assume that transaction 1 has higher priority than transaction 2 and that each cache line holds 4 bytes, i.e., addresses 0 through 3 of the address space belong to the same cache line. Thus transactions 1 and 2 may trigger a conflict during conflict detection even though there is in fact no data dependence between them.
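The text does not reproduce the exact statements of Figure 5(a); the fragment below is only a plausible reconstruction consistent with steps (2) through (4) described next, assuming byte-sized writes, arbitrary written values, and hypothetical BEGIN_TX/END_TX markers.

```c
#include <stdint.h>

/* Placeholders for the real transaction-begin and commit instructions. */
#define BEGIN_TX()
#define END_TX()

/* Addresses 0..3 lie in one 4-byte cache line, so the two transactions
 * conflict falsely even though they touch different bytes. */
volatile uint8_t mem[4];

void transaction_1(void)            /* higher priority */
{
    BEGIN_TX();
    mem[0] = 0xAA;                  /* step (2): write to address 0 */
    mem[2] = 0xBB;                  /* step (4): write to address 2 */
    END_TX();                       /* step (6): commit, clear the send register group */
}

void transaction_2(void)            /* lower priority */
{
    BEGIN_TX();
    mem[1] = 0xCC;                  /* step (3): write to address 1, false conflict */
    END_TX();                       /* step (6): commits only after transaction 1 */
}
```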
Figure 5(b) shows how the execution of the transactions proceeds. The data-send register group on the left belongs to the processor core running transaction 1, and the data-receive register group on the right belongs to the processor core running transaction 2. The execution steps are as follows:
(1) Both the send register group and the receive register group hold their initial values; transaction 1 and transaction 2 each execute the transaction-begin instruction.
(2) Transaction 1 performs a write to address 0. This write operates directly on the cache, and no other transaction has requested the data, so the send register group of transaction 1 does not change.
(3) Transaction 2 performs a write to address 1. Because of the conflict-detection mechanism and the cache organization, the system triggers a data conflict. This situation satisfies the conditions of restrictive value delivery, so transaction 1 sends the data (the whole cache line) to transaction 2. Concretely, transaction 1 copies the data from its cache into the send register group, records the relevant information, and then sends it to transaction 2. After receiving the data, transaction 2 records it in its receive register group and records in the BITS field which bytes of the cache line it has accessed (both reads and writes are recorded; in this example transaction 2 writes to address 1, so the second bit of BITS changes from 0 to 1). Transaction 2 then continues executing.
(4) Transaction 1 performs a write to address 2. Because the cache line containing address 2 is the same line saved in the send register group, when the system writes to transaction 1's cache it also writes to the send register group and sends the modified portion of the data (only the modified part, not necessarily the whole cache line) to transaction 2.
(5) After transaction 2 receives the modified-data message, it checks the BITS field and finds that it has not used the bytes that transaction 1 modified, so it only needs to merge the received modifications and does not need to roll back, effectively avoiding the false-sharing problem. If, however, it had already used the just-modified data, a real data conflict would have occurred, and a rollback would then be necessary to guarantee the correctness of the program results. At this point transaction 2 has finished all of its operations, but because the higher-priority transaction 1 has not yet committed, it can only wait.
(6) Transaction 1 finishes execution and commits, clearing its send register group. Transaction 2 is then allowed to commit: it writes the value in the receive register group back to the cache and finally clears the receive register group.
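The ordered commit of step (6), seen from the receiver's side, could look like the following sketch, again with hypothetical helpers standing in for the real commit hardware and reusing the register-group struct from above.

```c
#include <stdint.h>
#include <string.h>

extern void wait_until_committed(uint16_t sender_proc);   /* blocks until the sender commits */
extern void write_line_to_cache(uint64_t addr, const uint8_t *line);

/* Step (6), receiver side: commit in priority order, write the received line
 * back to the cache, then clear the receive register group. */
void commit_receiver(value_xfer_regs_t *recv_regs)
{
    wait_until_committed(recv_regs->sid);                  /* commits stay in order        */
    write_line_to_cache(recv_regs->add, recv_regs->data);  /* write-back happens at commit */
    memset(recv_regs, 0, sizeof *recv_regs);               /* clear RID/SID/ADD/DATA/BITS  */
}
```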
The example above shows that the present invention not only provides the benefit of value delivery between conflicting threads but can also resolve most of the system's false-sharing problems, offering considerable room for performance improvement.

Claims (2)

1. A restrictive value delivery device applied to thread-level speculative parallelism, comprising an on-chip multiprocessor, transactional memory functional components, and a value delivery component, characterized in that the device further comprises: processor cores supporting speculative execution; a cache controller extended with timestamps; an L1 data cache with added read/write bits; an L2 cache with added read/write bits; and a data-send register group and a data-receive register group that guarantee that value delivery proceeds correctly.
2. A restrictive value delivery method performed by the device according to claim 1, characterized in that it comprises the following steps:
Step 1: the system detects a data conflict, and the higher-priority thread checks whether the conditions for forwarding data are satisfied; if they are, it sends the data to the lower-priority thread, otherwise it sends only a NACK message, and the lower-priority thread can only wait;
Step 2: after receiving the forwarded data, the lower-priority thread stores it in the receive data register group and then continues executing the program with that data; as long as this low-priority thread has not committed or rolled back, all subsequent accesses to this conflicting data block operate directly on the receive data register group;
Step 3: if the higher-priority sending thread rewrites the data block it has sent, this high-priority thread sends the modified portion to the receiving thread; the receiving thread verifies whether the modified portion has already been used; if it has been used, a rollback operation is performed; if not, a data-merge operation is performed.
CN201210133066.9A 2012-04-28 2012-04-28 Restrictive value delivery method and device applied to thread-level speculative parallelism Active CN102681890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210133066.9A CN102681890B (en) 2012-04-28 2012-04-28 Restrictive value delivery method and device applied to thread-level speculative parallelism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210133066.9A CN102681890B (en) 2012-04-28 2012-04-28 Restrictive value delivery method and device applied to thread-level speculative parallelism

Publications (2)

Publication Number Publication Date
CN102681890A true CN102681890A (en) 2012-09-19
CN102681890B CN102681890B (en) 2015-09-09

Family

ID=46813859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210133066.9A Active CN102681890B (en) Restrictive value delivery method and device applied to thread-level speculative parallelism

Country Status (1)

Country Link
CN (1) CN102681890B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995689A (en) * 2013-02-16 2014-08-20 长沙中兴软创软件有限公司 Flow line parallel computing method achieving distribution of received information
CN104156226A (en) * 2013-05-15 2014-11-19 索尼公司 Pending or shutdown method for hybrid memory device
CN106255958A (en) * 2014-04-25 2016-12-21 索尼公司 Memory-efficient thread-level speculates
CN109471732A (en) * 2018-11-22 2019-03-15 山东大学 A kind of data distributing method towards CPU-FPGA heterogeneous multi-core system
CN110569067A (en) * 2019-08-12 2019-12-13 阿里巴巴集团控股有限公司 Method, device and system for multithread processing
CN110727465A (en) * 2019-09-11 2020-01-24 无锡江南计算技术研究所 Protocol reconfigurable consistency implementation method based on configuration lookup table
US11216278B2 (en) 2019-08-12 2022-01-04 Advanced New Technologies Co., Ltd. Multi-thread processing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RUI GUO, ET AL.: "LogSPoTM: a Scalable Thread Level Speculation Model Based on Transactional Memory", 《COMPUTER SYSTEMS ARCHITECTURE CONFERENCE》 *
SALIL PANT, GREGORY T.BYRD: "Limited Early Value Communication to Improve Performance of Transactional Memory", 《PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON SUPERCOMPUTING》 *
WENBO DAI, ET AL.: "A Priority-aware NoC to Reduce Squashes in Thread Level Speculation for Chip Multiprocessors", 《PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995689A (en) * 2013-02-16 2014-08-20 长沙中兴软创软件有限公司 Flow line parallel computing method achieving distribution of received information
CN104156226A (en) * 2013-05-15 2014-11-19 索尼公司 Pending or shutdown method for hybrid memory device
CN104156226B (en) * 2013-05-15 2019-01-15 索尼公司 Mix hang-up or the closedown method of memory device
CN106255958A (en) * 2014-04-25 2016-12-21 索尼公司 Memory-efficient thread-level speculates
CN106255958B (en) * 2014-04-25 2019-08-02 索尼公司 Method and apparatus for executing program code
CN109471732A (en) * 2018-11-22 2019-03-15 山东大学 A kind of data distributing method towards CPU-FPGA heterogeneous multi-core system
CN110569067A (en) * 2019-08-12 2019-12-13 阿里巴巴集团控股有限公司 Method, device and system for multithread processing
CN110569067B (en) * 2019-08-12 2021-07-13 创新先进技术有限公司 Method, device and system for multithread processing
US11216278B2 (en) 2019-08-12 2022-01-04 Advanced New Technologies Co., Ltd. Multi-thread processing
CN110727465A (en) * 2019-09-11 2020-01-24 无锡江南计算技术研究所 Protocol reconfigurable consistency implementation method based on configuration lookup table
CN110727465B (en) * 2019-09-11 2021-08-10 无锡江南计算技术研究所 Protocol reconfigurable consistency implementation method based on configuration lookup table

Also Published As

Publication number Publication date
CN102681890B (en) 2015-09-09

Similar Documents

Publication Publication Date Title
Boroumand et al. CoNDA: Efficient cache coherence support for near-data accelerators
Tomić et al. EazyHTM: Eager-lazy hardware transactional memory
CN104375958B (en) cache memory management transactional memory access request
CN102681890B (en) Restrictive value delivery method and device applied to thread-level speculative parallelism
US8417897B2 (en) System and method for providing locale-based optimizations in a transactional memory
Ceze et al. BulkSC: Bulk enforcement of sequential consistency
US8997103B2 (en) N-way memory barrier operation coalescing
CN105612502B (en) Virtually retry queue
US9792147B2 (en) Transactional storage accesses supporting differing priority levels
CN104487946A (en) Method, apparatus, and system for adaptive thread scheduling in transactional memory systems
US8051250B2 (en) Systems and methods for pushing data
JP2001236221A (en) Pipe line parallel processor using multi-thread
JP2003030050A (en) Method for executing multi-thread and parallel processor system
US10031697B2 (en) Random-access disjoint concurrent sparse writes to heterogeneous buffers
KR101804677B1 (en) Hardware apparatuses and methods to perform transactional power management
US11797474B2 (en) High performance processor
Kubiatowicz et al. Closing the window of vulnerability in multiphase memory transactions
CN102110019B (en) Transactional memory method based on multi-core processor and partition structure
CN101719116B (en) Method and system for realizing transaction memory access mechanism based on exception handling
CN103019655B (en) Towards memory copying accelerated method and the device of multi-core microprocessor
CN101872299A (en) Conflict prediction realizing method and conflict prediction processing device used by transaction memory
CN112527729A (en) Tightly-coupled heterogeneous multi-core processor architecture and processing method thereof
US9946665B2 (en) Fetch less instruction processing (FLIP) computer architecture for central processing units (CPU)
CN101533363B (en) Pre-retire and post-retire mixed hardware locking ellipsis (HLE) scheme
JP5967646B2 (en) Cashless multiprocessor with registerless architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant