CN102722446B - Dynamic recorder for local memory access model for stream processor - Google Patents

Info

Publication number
CN102722446B
Authority
CN
China
Prior art keywords
local memory
record
module
access
thread
Legal status
Active
Application number
CN201210185144.XA
Other languages
Chinese (zh)
Other versions
CN102722446A (en)
Inventor
史晓华
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
2012-06-06
Filing date
2012-06-06
Publication date
2015-03-25
Application filed by Beihang University
Priority to CN201210185144.XA
Publication of CN102722446A
Application granted
Publication of CN102722446B
Legal status: Active

Abstract

The invention relates to a dynamic recorder of local memory access patterns for stream processors and belongs to the technical field of microprocessor architecture and compilation. The dynamic recorder comprises a local memory access switching module and an access record module. The local memory access switching module relays all threads' accesses to local memory and records the associated information in the other module, the access record module. The access record module is a memory unit or register unit holding M fixed-length records. The dynamic recorder has little impact on the run-time efficiency of the original program, does not alter the program's correctness or semantics, and can dynamically record the threads and program addresses that cause the most local memory block conflicts.

Description

Dynamic recorder of local memory access patterns for stream processors
Technical field
The present invention relates to a dynamic recorder of local memory access patterns for stream processors, and belongs to the field of microprocessor architecture and compilation technology.
Background
The stream processor is one of the processor types widely used in computer systems today; its best-known representative is the GPU (graphics processing unit). Stream processors have a marked efficiency advantage in large-scale floating-point computation and are therefore also used for high-performance computing and parallel data processing.
Stream processors differ significantly from typical architectures such as the CPU in how they access memory. As a result, many programs that were designed for CPUs and later converted to run on stream processors hit efficiency bottlenecks in memory access, the most representative of which are access conflicts on the local memory blocks of the stream processor. By recording the local memory access pattern of the stream processor while a program runs, these local memory block conflicts can be located. With this data, the programmer, the compiler, or the stream processor itself can plan optimizations of the program's local memory accesses and thereby improve the program's run-time efficiency.
Summary of the invention
To address these problems of the prior art, the present invention proposes a dynamic recorder of local memory access patterns for stream processors, which solves the problem of how to locate and collect statistics on the run-time bottlenecks caused by conflicting accesses to local memory blocks.
The dynamic recorder proposed by the invention comprises two parts: a local memory access switching module and an access record module.
All threads' accesses to local memory are relayed through the local memory access switching module, which writes the related information into the other module, the access record module. The access record module is a memory unit or register unit holding M fixed-length records. The size of M depends on the capability of the stream processor hardware or of the operating system software. The information in each record includes the instruction address of the thread, the number of the accessed local memory block, the thread numbers, and the access count.
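As a rough illustration of what one fixed-length record might look like, the C sketch below lays out the four kinds of information named above. The names (access_record, access_recorder, recorder_init, recorder_bytes), the 32-bit field width, the extra n_threads fill counter, the convention that a count of 0 marks a free record, and the sizes REC_M = 16 and REC_L = 4 (L being the per-record thread-number limit introduced in step 2) are all assumptions for illustration; the patent does not specify a layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: the patent leaves M and L to the hardware or OS. */
#define REC_M 16
#define REC_L 4

/* One fixed-length record. All fields are 32-bit words; a record whose
 * count is 0 is treated as free, since a stored record always has count >= 1. */
typedef struct {
    uint32_t pc;             /* instruction address of the accessing thread */
    uint32_t block;          /* number of the accessed local memory block */
    uint32_t count;          /* access count */
    uint32_t n_threads;      /* filled entries of threads[], at most REC_L */
    uint32_t threads[REC_L]; /* thread numbers seen at this address */
} access_record;

/* The access record module: M fixed-length records. */
typedef struct {
    access_record rec[REC_M];
} access_recorder;

/* Mark every record as free. */
void recorder_init(access_recorder *r) {
    memset(r, 0, sizeof *r);
}

/* Storage cost in bytes of a module with m records of l thread slots each,
 * using the same 32-bit layout: 4 fixed words plus l thread numbers. */
size_t recorder_bytes(size_t m, size_t l) {
    return m * 4 * (4 + l);
}
```

Under this assumed layout, the example configurations given in the detailed description (M = 16 with L = 4, and M = 256 with L = 16) would occupy 512 bytes and 20,480 bytes (20 KiB) respectively.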
The control procedure implemented by the local memory access switching module comprises the following steps:
Step 1: when the dynamic recorder receives, from thread A of the currently running program, a request at instruction address P to access local memory block O, go to step 2;
Step 2: determine whether address P is already present in the access record module. If so, write the thread number of thread A into that record and increment the record's access count by 1; if all L thread-number slots of the record are occupied and the thread number to be written differs from every existing one, either overwrite an arbitrarily chosen existing thread number or discard the write. If not, go to step 3. The size of L depends on the capability of the hardware or software.
Step 3: determine whether the storage of the access record module is full. If so, find the record with the lowest access count and delete it; if not, go to step 4;
Under this working mechanism of the local memory access switching module, when all M records are occupied, i.e. the access record module is full, the new record to be written replaces the existing record with the lowest access count, which ensures that the records with the highest access counts always remain in the access record module. If several records share the lowest access count, any one of them is chosen and replaced.
Step 4: add one new record to the access record module, write information such as the instruction address of thread A, the number of the accessed local memory block, and the current thread number into it, and set its access count to 1;
Step 5: continue performing the access of thread A to local memory block O.
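The five steps can be sketched in C as a single function that handles one access request. This is a hypothetical illustration, not the patent's implementation: recorder_access, the record layout, and the small sizes REC_M = 4 and REC_L = 2 are assumptions; when a record's thread-number slots are full it takes the "discard the write" option; among tied lowest-count records it evicts the first; after a hit in step 2 it returns directly, on the assumption that the flow then continues with step 5; and step 5 itself, the actual memory access, is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define REC_M 4   /* hypothetical record count, small for the example */
#define REC_L 2   /* hypothetical thread-number slots per record */

typedef struct {
    uint32_t pc, block, count, n_threads;  /* count == 0 marks a free record */
    uint32_t threads[REC_L];
} access_record;

typedef struct { access_record rec[REC_M]; } access_recorder;

/* Steps 2-4 for one request (step 1): thread tid at instruction address pc
 * wants to access local memory block `block`. Returns the record index used.
 * Step 5, the access itself, is left to the caller. */
int recorder_access(access_recorder *r, uint32_t pc, uint32_t block, uint32_t tid) {
    /* Step 2: is address pc already recorded? */
    for (int i = 0; i < REC_M; i++) {
        access_record *e = &r->rec[i];
        if (e->count == 0 || e->pc != pc)
            continue;
        e->count++;
        bool seen = false;
        for (uint32_t k = 0; k < e->n_threads; k++)
            if (e->threads[k] == tid)
                seen = true;
        if (!seen && e->n_threads < REC_L)
            e->threads[e->n_threads++] = tid;  /* all slots full: discard */
        return i;
    }
    /* Step 3: use a free record; if the module is full, evict the first
     * record with the lowest access count. */
    int slot = -1, victim = 0;
    for (int i = 0; i < REC_M; i++) {
        if (r->rec[i].count == 0) {
            slot = i;
            break;
        }
        if (r->rec[i].count < r->rec[victim].count)
            victim = i;
    }
    if (slot < 0)
        slot = victim;
    /* Step 4: write the new record and set its count to 1. */
    access_record *e = &r->rec[slot];
    memset(e, 0, sizeof *e);
    e->pc = pc;
    e->block = block;
    e->threads[0] = tid;
    e->n_threads = 1;
    e->count = 1;
    return slot;
}
```

For instance, three accesses from threads 0, 1 and 2 at the same address leave one record with count 3 and only two stored thread numbers; once four distinct addresses fill the module, a fifth address replaces the first record that has count 1.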
The advantages of the invention are:
(1) the dynamic recorder has little impact on the run-time efficiency of the original program;
(2) it does not alter the correctness of the original program and does not affect its semantics;
(3) it can dynamically record the threads and program addresses that cause the most local memory block conflicts.
Description of the drawings
Fig. 1 is a schematic structural diagram of the access recorder of the present invention;
Fig. 2 is a flow chart of the local memory access switching module of the present invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings.
The dynamic recorder of local memory access patterns for stream processors proposed by the invention is structured as shown in Fig. 1 and comprises two parts, the local memory access switching module and the access record module:
As shown in Fig. 1, all threads' accesses to local memory are relayed through the local memory access switching module, which writes the related information into the other module, the access record module. This module can be implemented in a programming language such as C or C++, or implemented in the microprocessor chip. In Fig. 1, "local memory block" refers to an existing local memory block of the stream processor, and "thread" refers to a thread currently running on the stream processor.
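Since the switching module may be written in C, one hedged way to picture the relay is a pair of wrapper functions through which every local memory load and store passes: each reports (instruction address, block number, thread number) to the recorder and then performs the original access unchanged, which is why program semantics are preserved. Everything here is hypothetical: switched_load, switched_store, the record_fn hook, the array standing in for local memory, and especially the mapping block = word index / BLOCK_WORDS, since the patent does not say how block numbers are derived from addresses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: local memory of LOCAL_WORDS 32-bit words, split
 * into blocks of BLOCK_WORDS consecutive words. */
#define LOCAL_WORDS 1024
#define BLOCK_WORDS 32

/* Hook into the access record module (steps 1-4 of Fig. 2). */
typedef void (*record_fn)(void *ctx, uint32_t pc, uint32_t block, uint32_t tid);

typedef struct {
    uint32_t  mem[LOCAL_WORDS]; /* stand-in for the stream processor's local memory */
    record_fn record;           /* may be NULL to disable recording */
    void     *ctx;
} switched_local_memory;

/* Relay a load: report it, then perform the original access (step 5). */
uint32_t switched_load(switched_local_memory *s, uint32_t pc, uint32_t tid, size_t word) {
    if (s->record)
        s->record(s->ctx, pc, (uint32_t)(word / BLOCK_WORDS), tid);
    return s->mem[word];
}

/* Relay a store in the same way; the stored value is never altered. */
void switched_store(switched_local_memory *s, uint32_t pc, uint32_t tid,
                    size_t word, uint32_t value) {
    if (s->record)
        s->record(s->ctx, pc, (uint32_t)(word / BLOCK_WORDS), tid);
    s->mem[word] = value;
}

/* Minimal example hook: count accesses per local memory block. */
typedef struct { uint32_t per_block[LOCAL_WORDS / BLOCK_WORDS]; } block_counter;

void count_block(void *ctx, uint32_t pc, uint32_t block, uint32_t tid) {
    (void)pc;
    (void)tid;
    ((block_counter *)ctx)->per_block[block]++;
}
```

The count_block hook is only a placeholder; a recorder implementing steps 1 to 4 could be plugged in through the same function pointer without changing the wrappers.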
The control procedure implemented by the local memory access switching module comprises the following steps, as shown in Fig. 2:
Step 1: when the dynamic recorder receives, from thread A of the currently running program, a request at instruction address P to access local memory block O, go to step 2;
Step 2: determine whether address P is already present in the access record module. If so, write the thread number of thread A into that record and increment the record's access count by 1; if all L thread-number slots of the record are occupied and the thread number to be written differs from every existing one, either overwrite an arbitrarily chosen existing thread number or discard the write. If not, go to step 3;
Step 3: determine whether the storage of the access record module is full. If so, find the record with the lowest access count and delete it; if not, go to step 4;
Step 4: add one new record to the access record module, write information such as the instruction address of thread A, the number of the accessed local memory block, and the current thread number into it, and set its access count to 1;
Step 5: continue performing the access of thread A to local memory block O.
The access record module is a memory unit or register unit holding M fixed-length records; the size of M depends on the capability of the stream processor hardware or of the operating system software and may be, for example, 16, 256, or more. The information in each record includes, but is not limited to, the instruction address of the thread, the number of the accessed local memory block, the thread numbers, and the access count. A record may hold several thread numbers, at most L as shown in Fig. 1; the size of L depends on the capability of the hardware or software and may be, for example, 4, 16, or more. Under the working mechanism of the local memory access switching module, when all L thread-number slots of a record are occupied and the thread number to be written differs from every existing one, either an arbitrarily chosen existing thread number is overwritten or the write is discarded.
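The thread-number handling just described (at most L slots; when they are full and a new thread number arrives, overwrite one or drop it) can be sketched as the hit path of step 2. Assumptions: the names record_hit and slot_policy, REC_L = 4, a thread number that is already stored is not written a second time, and, for the overwrite option, the "arbitrarily chosen" slot is taken as count % REC_L. As in the description, the access count is incremented even when the thread number is dropped.

```c
#include <stdint.h>

#define REC_L 4  /* hypothetical number of thread-number slots */

typedef struct {
    uint32_t pc, block, count, n_threads;  /* count == 0 marks a free record */
    uint32_t threads[REC_L];
} access_record;

/* The two options the description allows when all L slots are taken. */
typedef enum { SLOT_OVERWRITE, SLOT_DISCARD } slot_policy;

/* Hit path of step 2: count the access and store thread number tid.
 * Returns 1 if tid is stored in the record afterwards, 0 if it was dropped. */
int record_hit(access_record *r, uint32_t tid, slot_policy policy) {
    r->count++;
    for (uint32_t i = 0; i < r->n_threads; i++)
        if (r->threads[i] == tid)
            return 1;                       /* already recorded */
    if (r->n_threads < REC_L) {
        r->threads[r->n_threads++] = tid;   /* a slot is still free */
        return 1;
    }
    if (policy == SLOT_OVERWRITE) {
        /* "any existing thread number": here the slot is picked from the
         * count, which spreads successive overwrites across the slots */
        r->threads[r->count % REC_L] = tid;
        return 1;
    }
    return 0;                               /* SLOT_DISCARD */
}
```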
Under the working mechanism of the local memory access switching module, when all M records are occupied, i.e. the access record module is full, the new record to be written replaces the existing record with the lowest access count, which ensures that the records with the highest access counts always remain in the access record module. If several records share the lowest access count, any one of them is chosen and replaced.
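This replacement rule amounts to least-frequently-used (LFU) eviction over the access counts. A minimal C sketch of victim selection follows; lfu_victim and the trimmed two-field record are hypothetical, and ties are resolved by taking the first lowest-count record, which is one valid choice of "any one of them".

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;     /* instruction address */
    uint32_t count;  /* access count */
} access_record;     /* trimmed to the fields eviction needs (hypothetical) */

/* Index of a record with the lowest access count among m >= 1 occupied
 * records. The description allows any tied record; this keeps the first. */
int lfu_victim(const access_record rec[], int m) {
    int victim = 0;
    for (int i = 1; i < m; i++)
        if (rec[i].count < rec[victim].count)
            victim = i;
    return victim;
}
```

Because only a lowest-count record is ever replaced, a record whose count is strictly the highest (with M ≥ 2) is never evicted, which is the guarantee stated above. The trade-off of plain LFU is that a newly written record starts at count 1, the lowest possible value, so it remains a candidate for the next eviction until it is accessed again.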

Claims (1)

1. A dynamic recorder of local memory access patterns for stream processors, characterized in that: the dynamic recorder comprises two parts, a local memory access switching module and an access record module;
all threads' accesses to local memory are relayed through the local memory access switching module, which writes the related information into the other module, the access record module; the access record module is a memory unit or register unit holding M fixed-length records; the size of M depends on the capability of the stream processor hardware or of the operating system software, and the information in each record includes the instruction address of the thread, the number of the accessed local memory block, the thread numbers, and the access count;
the control procedure implemented by the local memory access switching module comprises the following steps:
Step 1: after a request from thread A, at instruction address P in the current program, to access local memory block O is obtained, go to step 2;
Step 2: determine whether address P is already present in the access record module. If so, write the thread number of thread A into that record and increment the record's access count by 1; if all L thread-number slots of the record are occupied and the thread number to be written differs from every existing one, either overwrite an arbitrarily chosen existing thread number or discard the write. If not, go to step 3. The size of L depends on the capability of the hardware or software;
Step 3: determine whether the storage of the access record module is full. If so, find the record with the lowest access count and delete it; if not, go to step 4;
under the working mechanism of the local memory access switching module, when all M records are occupied, i.e. the access record module is full, the new record to be written replaces the existing record with the lowest access count, which ensures that the records with the highest access counts always remain in the access record module; if several records share the lowest access count, any one of them is chosen and replaced;
Step 4: add one new record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, and the current thread information into it, and set its access count to 1;
Step 5: continue performing the access of thread A to local memory block O.
CN201210185144.XA 2012-06-06 2012-06-06 Dynamic recorder for local memory access model for stream processor Active CN102722446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210185144.XA CN102722446B (en) 2012-06-06 2012-06-06 Dynamic recorder for local memory access model for stream processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210185144.XA CN102722446B (en) 2012-06-06 2012-06-06 Dynamic recorder for local memory access model for stream processor

Publications (2)

Publication Number Publication Date
CN102722446A (en) 2012-10-10
CN102722446B (en) 2015-03-25

Family

ID=46948220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210185144.XA Active CN102722446B (en) 2012-06-06 2012-06-06 Dynamic recorder for local memory access model for stream processor

Country Status (1)

Country Link
CN (1) CN102722446B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218303B (en) * 2013-03-27 2016-08-10 Beihang University Address-chain-based method for tracking object states in a memory garbage collector
CN110348211B (en) * 2018-07-17 2020-10-16 Tsinghua University Method, apparatus, system, and medium for recording input and output operations of a processor

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101021779A (en) * 2007-03-19 2007-08-22 PLA National University of Defense Technology Instruction control method for stream processors
CN101814039A (en) * 2010-02-02 2010-08-25 Beihang University GPU-based Cache simulator and spatial parallel acceleration simulation method thereof
US7809927B2 (en) * 2007-09-11 2010-10-05 Texas Instruments Incorporated Computation parallelization in software reconfigurable all digital phase lock loop
CN101989236A (en) * 2010-11-04 2011-03-23 Zhejiang University Method for realizing instruction buffer lock

Non-Patent Citations (1)

Title
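Yang Xuejun et al., "Research and Development of Stream Processing Technology" (《流处理技术研究与发展》), Computer Engineering & Science (《计算机工程与科学》), vol. 30, no. 4, 2008-12-31, entire document *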

Also Published As

Publication number Publication date
CN102722446A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
CN103856567B (en) Small file storage method based on Hadoop distributed file system
US8457943B2 (en) System and method for simulating a multiprocessor system
CN103279428B (en) Explicit active management method for multi-core cache coherence oriented to stream applications
CN103150265B (en) Fine-grained data allocation method for heterogeneous on-chip memory in embedded systems
CN102073596B (en) Method for managing reconfigurable on-chip unified memory aiming at instructions
US11494308B2 (en) Methods and devices for bypassing the internal cache of an advanced DRAM memory controller
US10423354B2 (en) Selective data copying between memory modules
CN103324466B (en) Data dependency serialization IO parallel processing method
CN104021109A (en) Technique for communicating interrupts in a computer system
WO2013155750A1 (en) Page colouring technology-based memory database access optimization method
Lee et al. ActiveSort: Efficient external sorting using active SSDs in the MapReduce framework
Jung Exploring parallel data access methods in emerging non-volatile memory systems
CN102681937A (en) Correctness verifying method of cache consistency protocol
CN104317770A (en) Data storage structure and data access method for multiple core processing system
CN103678571A (en) Multithreaded web crawler execution method applied to single host with multi-core processor
Jing et al. Energy-efficient eDRAM-based on-chip storage architecture for GPGPUs
TWI439925B (en) Embedded systems and methods for threads and buffer management thereof
US20190187964A1 (en) Method and Apparatus for Compiler Driven Bank Conflict Avoidance
CN103268297A (en) Accelerating core virtual scratch pad memory method based on heterogeneous multi-core platform
CN102722446B (en) Dynamic recorder for local memory access model for stream processor
Guz et al. Real-time analytics as the killer application for processing-in-memory
CN104035898A (en) Memory access system based on VLIW (Very Long Instruction Word) type processor
CN103455364A (en) System and method for online obtaining Cache performance of parallel program under multi-core environment
CN103914318A (en) Program starting method and device
Zhang et al. G10: Enabling An Efficient Unified GPU Memory and Storage Architecture with Smart Tensor Migrations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant