CN109582474A - A novel cache-optimized deterministic multithreading method - Google Patents

A novel cache-optimized deterministic multithreading method

Info

Publication number
CN109582474A
Authority
CN
China
Prior art keywords
thread
certainty
multithreading
parallel
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811298378.9A
Other languages
Chinese (zh)
Inventor
王开宇
季振洲
吴倩倩
张源悍
王楷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201811298378.9A priority Critical patent/CN109582474A/en
Publication of CN109582474A publication Critical patent/CN109582474A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention proposes a deterministic multithreading method based on cache optimization, comprising a thread determinism construction module, a multi-thread communication isolation module, an atomic-transaction phase division module, a thread synchronization policy module, and LIRS cache optimization. The invention can be used to support deterministic thread execution in multithreaded systems and to reduce the runtime overhead introduced by determinism methods, preventing the synchronization contention and data races caused by nondeterministic thread operations. Thread execution is divided into phases in units of transactions: in the parallel phase, threads execute concurrently with inter-thread communication isolated, and a fence (barrier) provides global synchronization; in the serial phase, threads acquire a token in a deterministic order and commit their results to memory one by one, yielding a deterministic schedule. Because communication between threads is isolated, the cache becomes the last level of shared storage, so an LIRS cache replacement algorithm better suited to multithreading is used to optimize system performance. In this way deterministic multithreaded execution is guaranteed while runtime overhead is reduced.

Description

A novel cache-optimized deterministic multithreading method
Technical field
The present invention relates to guaranteeing deterministic thread execution in a multithreaded environment.
Background technique
With the development of microelectronics, chip multiprocessors have become the mainstream computing platform and a focus of current research. Compared with earlier single-core processors, multi-core processors have achieved explosive gains in hardware performance, but traditional serial programs cannot exploit them. Parallel programming is the key to fully exploiting multi-core performance and is the only programming model that allows mainstream applications to benefit from multi-core CPU performance.
Although standard libraries provide support, concurrent programs, compared with conventional serial programs, bring challenges to program development and maintenance along with their gains in computational performance. A concurrent program usually completes a task through the cooperation of multiple concurrently executing entities, so relationships of competition and interference exist among them. This leads to the nondeterminism of concurrent programs: the same program, run multiple times on identical input, may produce different results. This nondeterminism poses new challenges to concurrent programs in many respects, and determinism techniques are currently regarded as the key to addressing it. There are two main forms of parallelism: multithreaded parallelism, in which the parallel entities share memory; and multi-process parallelism, in which the entities do not share memory but communicate by other means. The purpose of determinism techniques is to eliminate the nondeterminism introduced by parallelism, reduce the development and maintenance cost of concurrent programs, and improve their reliability.
Summary of the invention
To address the technical problems described in the background, the present invention proposes a cache-optimized deterministic multithreading method.
In the cache-optimized deterministic multithreading method proposed by the present invention, the system comprises a thread determinism construction module, a multi-thread communication isolation module, an atomic-transaction phase division module, a thread synchronization policy module, and LIRS cache replacement algorithm optimization.
Preferably, the thread determinism construction module is used to set thread execution rules that guarantee deterministic thread execution.
Preferably, the multi-thread communication isolation module is used to isolate the communication and interaction of threads during the parallel phase, preventing data races.
Preferably, the atomic-transaction phase division module is used to divide thread execution into phases.
Preferably, the thread synchronization policy module ensures that, when threads switch between execution phases, they acquire the token in a deterministic order, avoiding synchronization contention.
Preferably, the LIRS cache replacement algorithm is used to optimize the performance overhead of the deterministic system.
In the present invention, the thread synchronization policy module connects the serial phase and the parallel phase of thread execution. The invention organizes thread execution around the concept of a transaction: within one round of a transaction, thread execution is divided into a serial part and a parallel part. After the parallel phase ends and all threads reach the synchronization point, they enter the serial phase in the order in which they acquire the token. After a thread finishes its serial phase it blocks at the synchronization point, and once all threads have finished the serial phase a new round of transactions begins. By separating a thread's execution from the commit of its data, and by guaranteeing that threads pass the synchronization point in a deterministic order, deterministic multithreaded execution is ensured. System performance is further improved by using a cache replacement algorithm better suited to multithreaded environments.
Description of the drawings
Fig. 1 is a schematic diagram of the thread execution phases of the present invention.
Fig. 2 is a schematic diagram of deterministic in-order commit in the present invention.
Fig. 3 is a schematic diagram of the overall execution flow of the present invention.
Specific embodiments
The present invention is further explained below with reference to specific embodiments.
Embodiment
Referring to Fig. 1, a fence (barrier) is set for thread execution in the parallel phase. In each parallel phase a thread is allowed to execute only a bounded number of instructions; once it has executed them it is blocked by the fence and waits for the other threads to enter the synchronization operation.
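The parallel-phase structure can be illustrated with a minimal POSIX-threads sketch. This is a sketch only, not the patented implementation; names such as run_quantum and phase_fence are illustrative, and the fixed loop count stands in for the bounded instruction quantum.

```c
#include <pthread.h>

#define NTHREADS 4

/* The "fence" that ends every parallel phase. */
pthread_barrier_t phase_fence;

/* Bounded thread-local work standing in for one instruction quantum; no
 * shared memory is touched, modeling the isolation of the parallel phase. */
void run_quantum(int tid)
{
    volatile long local = 0;
    for (long i = 0; i < 100000; i++)
        local += i ^ tid;
}

static void *worker(void *arg)
{
    int tid = (int)(long)arg;

    run_quantum(tid);                    /* parallel phase: private work only */
    pthread_barrier_wait(&phase_fence);  /* blocked by the fence until all
                                            threads reach the sync point      */
    /* ... the serial phase (token-ordered commit) would follow here ... */
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];

    pthread_barrier_init(&phase_fence, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&phase_fence);
    return 0;
}
```

Compile with -pthread; the barrier count equals the number of worker threads, so no thread can run ahead into the next phase.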
In the serial phase, threads acquire the token at the synchronization point according to the token-passing topology, request the lock on memory, and then commit their execution results. Acquiring the token and requesting the lock are both mutually exclusive operations, and each thread may perform them only once per round. After a thread has performed its commit operation it is blocked by the fence and waits for the remaining threads to enter the serial phase. Once all threads have finished the serial phase, each thread's private page is committed to the shared page and compared against it, producing the newest shared data after this round of execution and preparing for the next parallel phase. Token passing uses a circular queue: all threads enter the queue in thread-ID order and acquire the token in turn, obtaining the lock that grants permission to read and write the shared data, and then begin their read/write operations on memory.
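The serial phase can be sketched in the same spirit: a token that travels through the threads in thread-ID order. The circular queue is approximated here by a turn counter protected by a mutex and condition variable; serial_commit and shared_memory are illustrative names, and this fragment would stand in for the serial-phase placeholder in the previous sketch.

```c
#include <pthread.h>

#define NTHREADS 4

static pthread_mutex_t token_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  token_cv   = PTHREAD_COND_INITIALIZER;
static int token_owner;                  /* circular queue in thread-ID order */

static long shared_memory;               /* stands in for the shared page     */

/* Wait until the token reaches this thread (deterministic, ID-ordered). */
void token_acquire(int tid)
{
    pthread_mutex_lock(&token_lock);
    while (token_owner != tid)
        pthread_cond_wait(&token_cv, &token_lock);
}

/* Pass the token to the next thread ID and wake the waiting threads. */
void token_release(int tid)
{
    token_owner = (tid + 1) % NTHREADS;
    pthread_cond_broadcast(&token_cv);
    pthread_mutex_unlock(&token_lock);
}

/* Serial phase: each thread commits exactly once per round, in token order. */
void serial_commit(int tid, long private_result)
{
    token_acquire(tid);
    shared_memory += private_result;     /* ordered, deterministic commit */
    token_release(tid);
}
```

Because the token advances strictly by thread ID, the commit order is the same in every execution regardless of how the threads are scheduled.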
Referring to Fig. 2, because exchange between threads is blocked, shared copies are used to guarantee that every thread holds identical data in its private page at the start of each transaction phase, ensuring data consistency. In the parallel phase, each thread executes against the shared data copied in the previous round; the threads run in parallel with no data exchange between them. After all threads have been blocked at the synchronization point by the fence, they commit to the shared page one by one in the order in which they acquire the token. Since weak determinism guarantees the thread synchronization order, the contents of the shared page are deterministic. When the serial phase ends and all threads have performed their commit operations, each thread's private page is compared byte by byte against the shared page to obtain the modification information. The private page of each thread is in effect a local replica of the shared page: a thread's writes go first to the local replica and are committed to the shared page in the deterministic synchronization order when the synchronization point is reached, so the data in the shared page is deterministic.
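The private-page/shared-page commit can be sketched as follows. The description says the private page is compared byte by byte to obtain the modifications; one common way to realize this (an assumption, not stated explicitly in the patent) is to keep an untouched twin of the page from the start of the round and copy only the bytes the thread itself changed into the shared page while holding the token. Function names are illustrative.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* At the start of a round every thread copies the shared page into its
 * private page and keeps an untouched twin for later comparison. */
void begin_round(unsigned char *priv, unsigned char *twin,
                 const unsigned char *shared)
{
    memcpy(priv, shared, PAGE_SIZE);
    memcpy(twin, shared, PAGE_SIZE);
}

/* Serial-phase commit: compare the private page byte by byte against the
 * round-start twin and write only the bytes this thread modified into the
 * shared page.  Called while holding the token, so commits are ordered. */
void commit_page(const unsigned char *priv, const unsigned char *twin,
                 unsigned char *shared)
{
    for (size_t i = 0; i < PAGE_SIZE; i++)
        if (priv[i] != twin[i])
            shared[i] = priv[i];
}
```

Comparing against the round-start twin rather than the live shared page ensures that a later-committing thread does not overwrite bytes that an earlier thread legitimately modified in the same round.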
Referring to Fig. 3, in each round of transactions the multithreaded program first passes through the parallel phase and then the serial phase, which guarantees deterministic program execution. When read/write requests are issued to memory in the serial phase, LIRS cache optimization is used to eliminate the high cache-miss rate caused by the repeated reads and writes of multithreaded programs and to improve system performance, so the method incurs a smaller runtime overhead than general determinism methods.
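Tying the pieces together, the per-round flow of Fig. 3 could look like the worker loop below, assuming it is compiled together with the earlier fragments (worker_rounds and NROUNDS are illustrative names; the LIRS replacement policy operates in the cache layer and is not modeled here).

```c
#include <pthread.h>

/* Pieces defined in the earlier sketches (assumed compiled together). */
extern pthread_barrier_t phase_fence;
void run_quantum(int tid);
void serial_commit(int tid, long private_result);

#define NROUNDS 10

/* Per-round flow of Fig. 3: a bounded parallel quantum, a fence, a
 * token-ordered serial commit, and a second fence that ends the round. */
void *worker_rounds(void *arg)
{
    int tid = (int)(long)arg;

    for (int round = 0; round < NROUNDS; round++) {
        run_quantum(tid);                    /* parallel phase, isolated work    */
        pthread_barrier_wait(&phase_fence);  /* all threads reach the sync point */

        serial_commit(tid, tid + round);     /* serial phase, deterministic order */
        pthread_barrier_wait(&phase_fence);  /* round ends; next round begins     */
    }
    return NULL;
}
```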
The foregoing is merely a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and based on its technical solution and inventive concept, shall fall within the protection scope of the present invention.

Claims (6)

1. A cache-optimized deterministic multithreading system, characterized in that: it guarantees that, in a multithreaded environment, threads execute in a deterministic order, avoiding the data races caused by conflicting memory accesses and the synchronization contention caused by competing for the synchronization-point order, while using an LIRS cache replacement algorithm better suited to multithreaded environments to optimize system performance and reduce the overhead introduced by determinism; the system comprises a thread determinism construction module, a multi-thread communication isolation module, an atomic-transaction phase division module, a thread synchronization policy module, and LIRS cache optimization;
the thread determinism construction module is used to set thread execution rules for deterministic multithreading, placing a fence between the serial phase and the parallel phase to force the threads to synchronize;
the multi-thread communication isolation module is used to prevent, during the parallel phase, data communication between threads and communication between threads and memory, preventing threads from accessing memory nondeterministically in the parallel phase and thereby producing data races;
the atomic-transaction phase division module is used to divide thread execution into phases, splitting it into a parallel phase and a serial phase to increase the parallelism of the system; in the parallel phase threads execute in parallel and block at the fence to wait for synchronization, and then acquire the token in a deterministic order and enter the serial phase one by one to perform the interaction with memory;
the thread synchronization policy module is used to guarantee that, at the synchronization point, threads acquire the token one by one according to the token-passing order of the synchronization policy and then begin serial-phase execution, avoiding the synchronization contention that arises when threads compete to pass the synchronization point;
the LIRS cache replacement algorithm optimization is used to optimize system performance, reducing the cache misses caused by frequent multithreaded reads and writes and improving the runtime overhead the system incurs to guarantee determinism.
2. The cache-optimized deterministic multithreading system according to claim 1, characterized in that the thread determinism construction module operates as follows:
since, in a POSIX-threads environment, the thread-related structures in the system can be divided into thread control, lock structures, and the threads within the program, and there is no structure that controls the thread execution state, a fence must be placed at the transitions between thread execution phases to force the threads to synchronize, and thread creation and termination must both be completed only after the token has been obtained; the method comprises the following steps (illustrated in the sketch following this claim):
Step 1: set a fence at the transition points between thread execution phases;
Step 2: guarantee the consistency with which threads acquire the token and obtain the lock resource;
Step 3: a thread creates child threads and performs termination operations only after obtaining the token;
Step 4: after the parallel phase, threads enter the serial phase in the order in which they acquire the token;
Step 5: after the serial phase, all threads reach the synchronization point and the next round of transactions begins to execute.
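A minimal sketch of Step 3, reusing the illustrative token helpers from the earlier serial-phase fragment (det_spawn is likewise an illustrative name, not taken from the patent; termination would be guarded in the same way):

```c
#include <pthread.h>

/* Illustrative helpers defined in the earlier token-passing sketch. */
void token_acquire(int tid);
void token_release(int tid);

/* Deterministic thread creation: a thread may create (or terminate) other
 * threads only while it holds the token, so the set of live threads changes
 * in the same order in every execution. */
int det_spawn(int tid, pthread_t *child, void *(*fn)(void *), void *arg)
{
    int rc;

    token_acquire(tid);                 /* mutual exclusion, once per round */
    rc = pthread_create(child, NULL, fn, arg);
    token_release(tid);
    return rc;
}
```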
3. The cache-optimized deterministic multithreading system according to claim 1, characterized in that inter-thread communication is isolated during the parallel phase:
the main reason multithreading produces data races is the order in which threads compete to access the same memory address, so data races can be avoided if the order in which threads access memory is controlled; the method isolates the communication between threads in the parallel phase, avoiding data exchange both among threads and between threads and memory; the data a thread uses in this phase comes from its own memory copy taken after the serial phase of the previous round of transactions, and the data exchange between threads and memory is postponed to the serial phase, in which threads access memory in the deterministic order given by the token, avoiding data races and guaranteeing deterministic thread execution.
4. The cache-optimized deterministic multithreading system according to claim 1, characterized in that thread execution is divided into phases: thread execution is split into a serial part and a parallel part, and a fence is set for thread execution in the parallel phase; in each parallel phase a thread may execute only a bounded number of instructions, after which it is blocked by the fence and waits for the other threads to enter the synchronization operation;
in the serial phase, threads acquire the token at the synchronization point according to the token-passing topology, request the lock on memory, and then commit their execution results; acquiring the token and requesting the lock are both mutually exclusive operations, and each thread may perform them only once per round; after performing its commit operation a thread is blocked by the fence and waits for the remaining threads to enter the serial phase; once all threads have finished the serial phase, each thread's private page is committed to the shared page and compared against it, producing the newest shared data after this round of execution and preparing for the next parallel phase.
5. The cache-optimized deterministic multithreading system according to claim 1, characterized in that a token-passing mechanism guarantees that threads enter the serial phase in a deterministic order:
to prevent data races, threads are not allowed to read or write memory during the parallel phase, so the order in which threads enter the serial phase determines the order in which they read and write memory, and the determinism of this order ensures that no data races occur among the threads; threads acquire the token at the synchronization point in thread-ID order, which ensures that the order in which threads pass the fence is not decided by racing at the synchronization point, avoiding synchronization contention and deadlock and guaranteeing deterministic multithreaded execution.
6. The cache-optimized deterministic multithreading system according to claim 1, characterized in that an LIRS cache replacement algorithm is used in place of the original LRU algorithm: because of structural shortcomings, the LRU algorithm cannot adapt well to workloads with repeated reads and writes and therefore performs poorly in multithreaded environments; the method replaces LRU with the LIRS algorithm, which is better suited to multithreaded environments, alleviating the low cache hit rate caused by the repeated reads and writes of multithreaded programs and improving the runtime overhead incurred by the determinism rules, so that the system achieves better performance and a wider range of application environments.
CN201811298378.9A 2018-11-02 2018-11-02 A novel cache-optimized deterministic multithreading method Pending CN109582474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811298378.9A CN109582474A (en) 2018-11-02 2018-11-02 A kind of novel cache optimization multithreading Deterministic Methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811298378.9A CN109582474A (en) 2018-11-02 2018-11-02 A kind of novel cache optimization multithreading Deterministic Methods

Publications (1)

Publication Number Publication Date
CN109582474A true CN109582474A (en) 2019-04-05

Family

ID=65921167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811298378.9A Pending CN109582474A (en) 2018-11-02 2018-11-02 A kind of novel cache optimization multithreading Deterministic Methods

Country Status (1)

Country Link
CN (1) CN109582474A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6854108B1 (en) * 2000-05-11 2005-02-08 International Business Machines Corporation Method and apparatus for deterministic replay of java multithreaded programs on multiprocessors
US20090235262A1 (en) * 2008-03-11 2009-09-17 University Of Washington Efficient deterministic multiprocessing
US20100023674A1 (en) * 2008-07-28 2010-01-28 Aviles Joaquin J Flash DIMM in a Standalone Cache Appliance System and Methodology
CN106104481A (en) * 2014-02-06 2016-11-09 优创半导体科技有限公司 Certainty and opportunistic multithreading
CN107704324A (en) * 2017-07-20 2018-02-16 哈尔滨工业大学(威海) It is a kind of towards the deterministic hardware based internal memory partition method of multinuclear
CN109471734A (en) * 2018-10-27 2019-03-15 哈尔滨工业大学(威海) A kind of novel cache optimization multithreading Deterministic Methods

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083445A (en) * 2019-04-21 2019-08-02 哈尔滨工业大学 A kind of multithreading certainty execution method based on weak memory consistency
CN110083445B (en) * 2019-04-21 2023-04-25 哈尔滨工业大学 Multithreading deterministic execution method based on weak memory consistency
CN110096378A (en) * 2019-04-29 2019-08-06 杭州涂鸦信息技术有限公司 A kind of inter-thread communication method and relevant apparatus
CN112528583A (en) * 2020-12-18 2021-03-19 广东高云半导体科技股份有限公司 Multithreading comprehensive method and comprehensive system for FPGA development

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20190405)