CN102722446B - Dynamic recorder for local memory access model for stream processor - Google Patents
- Publication number
- CN102722446B (application CN201210185144.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to a dynamic recorder for the local memory access pattern of a stream processor and belongs to the technical field of microprocessor architecture and compilation. The dynamic recorder comprises a local memory access switching module and an access record module. The local memory access switching module relays all threads' accesses to local memory and records the related information in the other module, the access record module. The access record module is a memory unit or a register unit holding M fixed-length records. The dynamic recorder has little impact on the running efficiency of the original program, does not alter the correctness or the semantics of the original program, and can dynamically record the threads and program addresses that cause the most conflicts on local memory blocks.
Description
Technical field
The present invention relates to a dynamic recorder for the local memory access pattern of a stream processor and belongs to the field of microprocessor architecture and compilation technology.
Background
The stream processor is one of the processor types widely used in computer systems today; its typical representative is the GPU (graphics processing unit). Stream processors have an outstanding efficiency advantage in large-scale floating-point computation and are therefore also used for high-performance computing and parallel data processing.
Stream processors differ markedly from typical architectures such as the CPU in their memory access mechanism. As a result, many programs originally designed for CPUs hit an efficiency bottleneck in memory access once they are ported to a stream processor, and the most representative of these bottlenecks is access conflicts on the local memory blocks of the stream processor. By recording the local memory access pattern of the stream processor while a program runs, the access conflicts on local memory blocks can be located. With this data, the programmer, the compiler, or the stream processor itself can plan optimizations of the program's local memory accesses and thus improve the program's running efficiency.
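The background does not say how addresses map to local memory blocks. As a hedged illustration only, the sketch below assumes a GPU-style layout in which local memory is split into 32 blocks (banks) interleaved by 4-byte word, and counts how many distinct words one group of simultaneous thread accesses requests from the busiest block. Under that assumption, accesses to different words of the same block are serialized, which is the kind of conflict the recorder is meant to locate. The function name `conflict_degree` and its parameters are hypothetical.

```python
from collections import defaultdict

def conflict_degree(byte_addresses, num_banks=32, word_bytes=4):
    """Worst-case number of serialized accesses for one group of simultaneous
    accesses, assuming word-interleaved blocks (an assumption, not from the patent):
    block = (address // word_bytes) % num_banks. Threads reading the same word
    are assumed to be served together, so only distinct words count."""
    words_per_block = defaultdict(set)
    for addr in byte_addresses:
        word = addr // word_bytes
        words_per_block[word % num_banks].add(word)
    return max((len(words) for words in words_per_block.values()), default=0)

# 32 threads reading consecutive 4-byte words: one word per block, no conflict.
print(conflict_degree([4 * t for t in range(32)]))       # 1
# 32 threads with a stride of 32 words: every access lands in block 0.
print(conflict_degree([4 * 32 * t for t in range(32)]))  # 32
```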
Summary of the invention
To address these problems in the prior art, the present invention proposes a dynamic recorder for the local memory access pattern of a stream processor, which solves the problem of how to find, and gather statistics on, program performance bottlenecks caused by access conflicts on local memory blocks.
The dynamic recorder comprises two parts: a local memory access switching module and an access record module.
All threads' accesses to local memory are relayed through the local memory access switching module, which records the related information in the other module, the access record module. The access record module is a memory unit or a register unit holding M fixed-length records. The value of M depends on the capabilities of the stream processor hardware or the operating system software. The information in each record includes the thread's instruction address, the number of the accessed local memory block, the thread IDs, and the access count.
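The text fixes what a record contains but not how it is encoded. The sketch below shows one possible fixed-length layout, assuming L = 4 thread-ID slots, 32-bit instruction addresses and access counts, 16-bit block numbers and thread IDs, and 0xFFFF as an empty-slot marker. All of these widths, and the names `AccessRecord`, `RECORD_FMT` and `EMPTY_SLOT`, are hypothetical.

```python
import struct
from dataclasses import dataclass, field

L_THREAD_SLOTS = 4   # hypothetical L (the text gives 4, 16 or more as examples)
EMPTY_SLOT = 0xFFFF  # hypothetical marker for an unused thread-ID slot
# Hypothetical layout, little-endian, no padding: instruction address (u32),
# block number (u16), access count (u32), then L thread IDs (u16 each).
RECORD_FMT = "<IHI" + "H" * L_THREAD_SLOTS
RECORD_SIZE = struct.calcsize(RECORD_FMT)

@dataclass
class AccessRecord:
    instr_addr: int
    block_id: int
    count: int = 1
    thread_ids: list = field(default_factory=list)

    def pack(self) -> bytes:
        # Pad unused thread-ID slots so every record has the same length.
        slots = self.thread_ids + [EMPTY_SLOT] * (L_THREAD_SLOTS - len(self.thread_ids))
        return struct.pack(RECORD_FMT, self.instr_addr, self.block_id, self.count, *slots)

    @classmethod
    def unpack(cls, raw: bytes) -> "AccessRecord":
        addr, block, count, *slots = struct.unpack(RECORD_FMT, raw)
        return cls(addr, block, count, [t for t in slots if t != EMPTY_SLOT])
```

Under this layout each record occupies 18 bytes, so a table of M = 256 records would need 4608 bytes; a hardware register file would choose its own field widths.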
The control procedure implemented by the local memory access switching module comprises the following steps:
Step 1: When the dynamic recorder receives an access request to local memory block O issued by thread A of the running program at instruction address P, go to Step 2;
Step 2: Determine whether address P already exists in the access record module. If it does, write the thread ID of thread A into that record in the access record module and increment the record's access count by 1; when all L thread-ID slots of the record are occupied and the ID to be written differs from every existing ID, either overwrite an arbitrarily chosen existing ID or discard the write. If it does not, go to Step 3. The value of L depends on the capabilities of the hardware or software.
Step 3: Determine whether the storage space of the access record module is full. If it is, find the record with the lowest access count, delete it, and then go to Step 4; if it is not, go directly to Step 4;
Under this working mechanism of the switching module, when all M records are occupied (that is, the access record module is full), a new record to be written replaces the record with the lowest access count, which ensures that the records with the highest access counts always remain in the access record module. If several records share the lowest access count, any one of them may be chosen for replacement.
Step 4: Add one new record to the access record module, write information such as thread A's instruction address, the number of the accessed local memory block, and the current thread ID into it, and set its access count to 1;
Step 5: Continue to perform thread A's access to local memory block O.
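Steps 1 to 5 can be modeled in software as follows. This is a behavioral sketch, not the patented circuit: records are kept in a Python dict keyed by instruction address P (Step 2 matches on P alone), and the class name `AccessRecorder`, the method `on_access`, and the `overwrite_full_slots` switch between the two permitted thread-slot policies are assumptions.

```python
import random
from typing import Dict, Optional

class AccessRecorder:
    """Behavioral sketch of Steps 2-4 (hypothetical names).

    records maps an instruction address P to its record. As in Step 2,
    a record is found by P alone; the stored block number is the one
    seen when the record was created.
    """

    def __init__(self, m_records: int = 16, l_threads: int = 4,
                 overwrite_full_slots: bool = False,
                 rng: Optional[random.Random] = None):
        self.m = m_records
        self.l = l_threads
        self.overwrite_full_slots = overwrite_full_slots
        self.rng = rng or random.Random()
        self.records: Dict[int, dict] = {}

    def on_access(self, thread_id: int, instr_addr: int, block_id: int) -> None:
        rec = self.records.get(instr_addr)
        if rec is not None:
            # Step 2: P is already recorded, so bump the count and note the thread.
            rec["count"] += 1
            threads = rec["threads"]
            if thread_id not in threads:
                if len(threads) < self.l:
                    threads.append(thread_id)
                elif self.overwrite_full_slots:
                    threads[self.rng.randrange(self.l)] = thread_id
                # else: all L slots hold other threads and the write is dropped
            return
        if len(self.records) >= self.m:
            # Step 3: table full, so delete a record with the lowest count
            # (min() returns the first such record, one valid arbitrary choice).
            victim = min(self.records, key=lambda a: self.records[a]["count"])
            del self.records[victim]
        # Step 4: create a new record with count 1.
        self.records[instr_addr] = {"block": block_id,
                                    "threads": [thread_id], "count": 1}
        # Step 5, performing the access itself, is left to the caller.
```

Calling `on_access` corresponds to Step 1; the caller then performs the real load or store (Step 5), so the recorder never changes the data the program sees.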
The advantages of the invention are:
(1) The dynamic recorder has little impact on the running efficiency of the original program;
(2) It does not alter the correctness of the original program and does not affect its semantics;
(3) It can dynamically record the threads and program addresses that cause the most conflicts on local memory blocks.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the access recorder of the present invention;
Fig. 2 is a flow chart of the local memory access switching module of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
The present invention proposes a dynamic recorder for the local memory access pattern of a stream processor. As shown in Fig. 1, it comprises two parts: a local memory access switching module and an access record module.
The local memory access switching module, shown in Fig. 1, relays all threads' accesses to local memory and records the related information in the other module, the access record module. This module can be implemented in a programming language such as C or C++, or in the microprocessor chip itself. In Fig. 1, "local memory block" refers to the existing local memory blocks in the stream processor, and "thread" refers to the threads currently running on the stream processor.
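Because every local memory access is relayed through the switching module, a software implementation can be pictured as a thin wrapper that calls a recording hook before forwarding each load or store unchanged. The sketch below assumes word-addressed blocks and a hook taking (thread ID, instruction address, block number); the class `SwitchedLocalMemory` and the hook signature are hypothetical.

```python
from typing import Callable, List

# Hypothetical hook signature: (thread_id, instr_addr, block_id) -> None
RecordHook = Callable[[int, int, int], None]

class SwitchedLocalMemory:
    """Every load/store passes through here before reaching the local
    memory blocks, mirroring the switching module's relaying role."""

    def __init__(self, num_blocks: int, block_words: int, hook: RecordHook):
        self.blocks: List[List[int]] = [[0] * block_words for _ in range(num_blocks)]
        self.hook = hook

    def load(self, thread_id: int, instr_addr: int, block_id: int, offset: int) -> int:
        self.hook(thread_id, instr_addr, block_id)  # record first (Steps 1-4)
        return self.blocks[block_id][offset]        # then perform the access (Step 5)

    def store(self, thread_id: int, instr_addr: int, block_id: int,
              offset: int, value: int) -> None:
        self.hook(thread_id, instr_addr, block_id)
        self.blocks[block_id][offset] = value
```

Because the wrapper only observes the request and then performs it as issued, the values the program reads and writes are unchanged, in line with advantage (2).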
The control procedure implemented by the local memory access switching module comprises the following steps, as shown in Fig. 2:
Step 1: When the dynamic recorder receives an access request to local memory block O issued by thread A of the running program at instruction address P, go to Step 2;
Step 2: Determine whether address P already exists in the access record module. If it does, write the thread ID of thread A into that record in the access record module and increment the record's access count by 1; when all L thread-ID slots of the record are occupied and the ID to be written differs from every existing ID, either overwrite an arbitrarily chosen existing ID or discard the write. If it does not, go to Step 3;
Step 3: Determine whether the storage space of the access record module is full. If it is, find the record with the lowest access count, delete it, and then go to Step 4; if it is not, go directly to Step 4;
Step 4: Add one new record to the access record module, write information such as thread A's instruction address, the number of the accessed local memory block, and the current thread ID into it, and set its access count to 1;
Step 5: Continue to perform thread A's access to local memory block O.
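Once the program has run, the records left in the access record module can be read out and ranked by access count, which is how a programmer or compiler would see the instruction addresses and threads that hit local memory blocks most often. The record keys, the `format_report` helper, and the sample records below are hypothetical.

```python
from typing import Dict, List

def format_report(records: List[Dict], k: int = 3) -> List[str]:
    """Return one line per record for the k records with the highest access counts."""
    ranked = sorted(records, key=lambda r: r["count"], reverse=True)[:k]
    return [
        f"addr=0x{r['instr_addr']:x} block={r['block']} "
        f"count={r['count']} threads={r['threads']}"
        for r in ranked
    ]

# Made-up sample contents of the access record module.
sample = [
    {"instr_addr": 0x88, "block": 0, "threads": [1], "count": 2},
    {"instr_addr": 0x40, "block": 2, "threads": [0, 32], "count": 7},
]
for line in format_report(sample):
    print(line)
```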
The access record module is a memory unit or a register unit with M fixed-length records. The value of M depends on the capabilities of the stream processor hardware or the operating system software and can be, for example, 16, 256, or more. The information in each record includes, but is not limited to, the thread's instruction address, the number of the accessed local memory block, the thread IDs, and the access count. A record can hold several thread IDs, at most L as shown in Fig. 1; the value of L depends on the capabilities of the hardware or software and can be, for example, 4, 16, or more. Under the working mechanism of the switching module, when all L thread-ID slots of a record are occupied and the ID to be written differs from every existing ID, an arbitrarily chosen existing ID can be overwritten, or the write can be discarded.
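The two permitted choices for a record whose L thread-ID slots are all taken can be expressed as a small helper. The `add_thread_id` function and its `policy` values 'overwrite' and 'discard' are assumptions introduced for illustration; the text only says either behavior is allowed. An ID already present is not written again, since the overflow rule applies only to IDs that differ from every stored one.

```python
import random
from typing import List, Optional

def add_thread_id(slots: List[int], thread_id: int, l_max: int,
                  policy: str = "discard",
                  rng: Optional[random.Random] = None) -> bool:
    """Try to store thread_id in a record's thread-ID slots.

    Returns True if the slots changed. When all l_max slots hold other IDs,
    'overwrite' replaces an arbitrary existing ID and 'discard' drops the write.
    """
    if thread_id in slots:
        return False                  # already recorded; nothing to write
    if len(slots) < l_max:
        slots.append(thread_id)       # a free slot is available
        return True
    if policy == "overwrite":
        rng = rng or random.Random()
        slots[rng.randrange(l_max)] = thread_id
        return True
    if policy == "discard":
        return False
    raise ValueError(f"unknown policy: {policy!r}")
```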
Under the working mechanism of the switching module, when all M records are occupied (that is, the access record module is full), a new record to be written replaces the record with the lowest access count, which ensures that the records with the highest access counts always remain in the access record module. If several records share the lowest access count, any one of them may be chosen for replacement.
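The replacement rule is a least-frequently-used policy with arbitrary tie-breaking. The sketch below, with a hypothetical `insert_with_lfu` helper operating on a dict of records that each carry a `count` key, picks uniformly at random among all records sharing the lowest count; a random choice is one valid reading of "choose any one of them".

```python
import random
from typing import Dict

def insert_with_lfu(records: Dict[int, dict], capacity: int,
                    instr_addr: int, new_record: dict,
                    rng: random.Random) -> None:
    """Insert new_record under instr_addr, evicting a least-frequent record if full."""
    if instr_addr not in records and len(records) >= capacity:
        lowest = min(r["count"] for r in records.values())
        # Every record sharing the lowest count is an equally valid victim.
        candidates = [a for a, r in records.items() if r["count"] == lowest]
        del records[rng.choice(candidates)]
    records[instr_addr] = new_record
```

One consequence of this rule: a new record starts with a count of 1, so while the table is full of established records it is among the first candidates for the next eviction. The policy therefore favors long-lived hot addresses, which matches the stated aim of keeping the most frequently accessed records.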
Claims (1)
1. A dynamic recorder for the local memory access pattern of a stream processor, characterized in that the dynamic recorder comprises two parts: a local memory access switching module and an access record module;
All threads' accesses to local memory are relayed through the local memory access switching module, and the related information is recorded in the other module, the access record module; the access record module is a memory unit or a register unit having M fixed-length records; the value of M depends on the capabilities of the stream processor hardware or the operating system software, and the information contained in each record includes the thread's instruction address, the number of the accessed local memory block, the thread IDs, and the access count;
The control procedure implemented by the local memory access switching module comprises the following steps:
Step 1: After an access request to local memory block O issued by thread A at instruction address P in the current program is received, go to Step 2;
Step 2: Determine whether address P already exists in the access record module. If it does, write the thread ID of thread A into that record in the access record module and increment the record's access count by 1; when all L thread-ID slots of a record are occupied and the thread ID to be written differs from every existing ID, either an arbitrarily chosen existing ID is overwritten or the write is discarded. If it does not, go to Step 3. The value of L depends on the capabilities of the hardware or software;
Step 3: Determine whether the storage space of the access record module is full. If it is, find the record with the lowest access count, delete it, and then go to Step 4; if it is not, go directly to Step 4;
Under the working mechanism of the switching module, when all M records are occupied (that is, the access record module is full), a new record to be written replaces the record with the lowest access count, ensuring that the records with the highest access counts always remain in the access record module; if several records share the lowest access count, any one of them is chosen for replacement;
Step 4: Add one new record to the access record module, write thread A's instruction address, the number of the accessed local memory block, and the current thread's information into this record, and set its access count to 1;
Step 5: Continue to perform thread A's access to local memory block O.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210185144.XA CN102722446B (en) | 2012-06-06 | 2012-06-06 | Dynamic recorder for local memory access model for stream processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102722446A CN102722446A (en) | 2012-10-10 |
CN102722446B true CN102722446B (en) | 2015-03-25 |
Family
ID=46948220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210185144.XA Active CN102722446B (en) | 2012-06-06 | 2012-06-06 | Dynamic recorder for local memory access model for stream processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102722446B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218303B (en) * | 2013-03-27 | 2016-08-10 | 北京航空航天大学 | The track record method of Obj State in internal memory garbage collector based on address chain |
CN110348211B (en) * | 2018-07-17 | 2020-10-16 | 清华大学 | Method, apparatus, system, and medium for recording input and output operations of a processor |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021779A (en) * | 2007-03-19 | 2007-08-22 | 中国人民解放军国防科学技术大学 | Instruction control method aimed at stream processor |
CN101814039A (en) * | 2010-02-02 | 2010-08-25 | 北京航空航天大学 | GPU-based Cache simulator and spatial parallel acceleration simulation method thereof |
US7809927B2 (en) * | 2007-09-11 | 2010-10-05 | Texas Instruments Incorporated | Computation parallelization in software reconfigurable all digital phase lock loop |
CN101989236A (en) * | 2010-11-04 | 2011-03-23 | 浙江大学 | Method for realizing instruction buffer lock |
Non-Patent Citations (1)
Title |
---|
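Yang Xuejun et al., "Research and development of stream processing technology" (《流处理技术研究与发展》), Computer Engineering & Science (《计算机工程与科学》), vol. 30, no. 4, 2008-12-31, full text * |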
Also Published As
Publication number | Publication date |
---|---|
CN102722446A (en) | 2012-10-10 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |