CN102722446A - Dynamic recorder for local memory access model for stream processor - Google Patents
- Publication number: CN102722446A
- Authority: CN (China)
- Prior art keywords: local memory, module, thread, record, access
- Legal status: Granted
Abstract
The invention relates to a dynamic recorder of local memory access patterns for stream processors and belongs to the technical field of microprocessor architecture and compilation. The dynamic recorder comprises a local memory access switching module and an access record module. The local memory access switching module relays every thread's accesses to local memory and records the associated information in the other module, the access record module. The access record module is a memory unit or register unit holding M fixed-length records. The dynamic recorder has little impact on the running efficiency of the original program, does not alter the original program's correctness or semantics, and can dynamically record the threads and program addresses that cause the most local memory block conflicts.
Description
Technical field
The present invention relates to a dynamic recorder of local memory access patterns for stream processors, and belongs to the field of microprocessor architecture and compilation technology.
Background
The stream processor is one of the processor types now widely used in computer systems; its representative example is the GPU, or graphics processing unit. Stream processors are markedly more efficient at floating-point computation over large-scale data, and are therefore also used for high-performance computing and data-parallel processing.
The memory access mechanism of a stream processor differs significantly from that of typical architectures such as the CPU. As a result, many programs originally designed for CPUs run into efficiency bottlenecks in memory access once they are converted to run on a stream processor, and the most representative of these bottlenecks comes from access conflicts on the local memory blocks inside the stream processor. Recording the local memory access pattern of the stream processor while a program runs makes it possible to locate the access conflicts on local memory blocks. With this data, the programmer, the compiler, or the stream processor itself can devise optimizations of the program's local memory accesses and thereby improve the program's running efficiency.
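To make the notion of a local memory block conflict concrete, the following is a minimal sketch that counts how many distinct words a group of threads requests from the same block in one cycle. The block mapping (word index modulo the number of blocks), the 32-block and 4-byte-word defaults, and the rule that threads reading the same word do not conflict are all assumptions for illustration; the patent does not specify them.

```python
from collections import Counter

def conflict_degree(addresses, num_blocks=32, word_bytes=4):
    """Return the largest number of distinct words that map to one block.

    `addresses` holds the byte address each thread of a group accesses in
    the same cycle. The mapping used here (word index modulo num_blocks) is
    an assumed layout, not one taken from the patent.
    """
    # Assumption: threads touching the same word are served together,
    # so only distinct words are counted.
    words = {addr // word_bytes for addr in addresses}
    per_block = Counter(word % num_blocks for word in words)
    return max(per_block.values(), default=0)

# 32 threads reading consecutive 4-byte words: each block is hit once.
unit_stride = [4 * t for t in range(32)]
# 32 threads reading with a stride of 32 words: every access hits block 0.
block_stride = [4 * 32 * t for t in range(32)]

print(conflict_degree(unit_stride))   # 1
print(conflict_degree(block_stride))  # 32
```

Under these assumptions, a degree of 1 means the accesses can proceed in parallel, while a degree of 32 means they are fully serialized on one block, which is the kind of bottleneck the recorder is meant to expose.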
Summary of the invention
To address the problems in the prior art, the present invention proposes a dynamic recorder of local memory access patterns for stream processors, which solves the problem of how to find and count the program bottlenecks caused by conflicting accesses to local memory blocks.
The dynamic recorder proposed by the present invention comprises two parts: a local memory access switching module and an access record module.
The local memory access switching module relays all threads' accesses to local memory and records the related information in the other module, the access record module. The access record module is a memory unit or register unit holding M fixed-length records.
The control procedure implemented by the local memory access switching module comprises the following steps:
Step 1: When the dynamic recorder receives an access request to local memory block O from instruction address P of thread A in the currently running program, go to step 2.
Step 2: Determine whether address P is already present in the access record module. If so, write the thread number of thread A into that record and increment the record's frequency by 1; if the record already holds L thread numbers and the thread number to be written differs from all of them, either overwrite an arbitrarily chosen existing thread number or discard the write. If not, go to step 3.
Step 3: Determine whether the storage space of the access record module is full. If so, find the record with the lowest access frequency and delete it. If not, go to step 4.
Step 4: Add one new access record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, the current thread number, and related information into it, and set its frequency to 1.
Step 5: Carry on with thread A's access to local memory block O.
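The five steps above can be sketched in software as follows. This is an illustrative sketch, not the patented implementation: the class and parameter names are hypothetical, records are keyed by instruction address P alone as step 2 describes, and where the text allows either overwriting a thread number or discarding the write, this sketch overwrites the oldest one.

```python
from dataclasses import dataclass, field

@dataclass
class AccessRecord:
    instruction_address: int
    block_number: int
    thread_ids: list = field(default_factory=list)
    frequency: int = 1

class AccessRecorder:
    """Sketch of steps 1 to 5; names and defaults are illustrative."""

    def __init__(self, max_records=16, max_threads=4):
        self.max_records = max_records  # M in the text
        self.max_threads = max_threads  # L in the text
        self.records = {}               # keyed by instruction address P

    def on_access(self, thread_id, instruction_address, block_number, do_access):
        record = self.records.get(instruction_address)          # step 2
        if record is not None:
            record.frequency += 1
            if thread_id not in record.thread_ids:
                if len(record.thread_ids) >= self.max_threads:
                    # All L slots are taken: overwrite one (the oldest here).
                    # Discarding the write would be equally valid.
                    record.thread_ids.pop(0)
                record.thread_ids.append(thread_id)
        else:
            if len(self.records) >= self.max_records:            # step 3
                victim = min(self.records.values(), key=lambda r: r.frequency)
                del self.records[victim.instruction_address]
            self.records[instruction_address] = AccessRecord(    # step 4
                instruction_address, block_number, [thread_id])
        return do_access()                                       # step 5

recorder = AccessRecorder(max_records=2, max_threads=2)
for tid in (0, 1, 2):
    recorder.on_access(tid, 0x40, 3, lambda: None)
recorder.on_access(0, 0x80, 5, lambda: None)
# The table is full, so the record for 0x80 (frequency 1) is evicted.
recorder.on_access(1, 0x90, 7, lambda: None)
```

The `do_access` callback stands in for the real local memory operation; returning its result unchanged reflects the requirement that recording must not alter the program's semantics.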
The advantages of the invention are:
(1) the proposed dynamic recorder has little impact on the running efficiency of the original program;
(2) the proposed dynamic recorder does not alter the correctness of the original program and does not affect its semantics;
(3) the proposed dynamic recorder can dynamically record the threads and program addresses that cause the most local memory block conflicts.
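As a rough illustration of advantage (3), the records kept by the recorder can be ranked by access frequency to point at the instruction addresses and threads most involved in conflicts. The record layout (a dictionary per record) and all values below are made up for the example and do not come from the patent.

```python
def hottest_records(records, top=3):
    """Return the `top` records with the highest access frequency."""
    return sorted(records, key=lambda r: r["frequency"], reverse=True)[:top]

# Illustrative records only; the numbers are invented for this example.
records = [
    {"instruction_address": 0x40, "block": 3, "threads": [1, 2], "frequency": 120},
    {"instruction_address": 0x58, "block": 3, "threads": [0], "frequency": 4},
    {"instruction_address": 0x90, "block": 7, "threads": [5, 6, 7], "frequency": 64},
]

for r in hottest_records(records, top=2):
    print(hex(r["instruction_address"]), r["block"], r["threads"], r["frequency"])
```

A programmer or compiler could use such a ranking to decide which accesses to restructure first.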
Description of drawings
Fig. 1 is a structural diagram of the access recorder of the present invention;
Fig. 2 is a flowchart of the local memory access switching module of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
The present invention proposes a dynamic recorder of local memory access patterns for stream processors. Its structure is shown in Fig. 1 and comprises two parts, a local memory access switching module and an access record module:
The local memory access switching module is shown in Fig. 1. This module relays all threads' accesses to local memory and records the related information in the other module, the access record module. It can be implemented in a microprocessor chip, or in a programming language such as C or C++. In Fig. 1, "local memory block" refers to the local memory blocks already present in the stream processor, and "thread" refers to the threads currently running on the stream processor.
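The relaying role of the switching module can be sketched as a thin wrapper that notifies an observer and then forwards each access unchanged. The `LocalMemory` and `RelayedLocalMemory` classes and the observer signature are hypothetical names chosen for this sketch; a real implementation would sit in hardware or in lower-level software as the text notes.

```python
class LocalMemory:
    """Stand-in for the stream processor's local memory (hypothetical)."""

    def __init__(self, num_blocks, words_per_block):
        self.blocks = [[0] * words_per_block for _ in range(num_blocks)]

    def read(self, block, offset):
        return self.blocks[block][offset]

    def write(self, block, offset, value):
        self.blocks[block][offset] = value

class RelayedLocalMemory:
    """Notifies an observer about each access, then forwards it unchanged."""

    def __init__(self, memory, observer):
        self._memory = memory
        self._observer = observer  # callable(thread_id, instruction_address, block)

    def read(self, thread_id, instruction_address, block, offset):
        self._observer(thread_id, instruction_address, block)
        return self._memory.read(block, offset)

    def write(self, thread_id, instruction_address, block, offset, value):
        self._observer(thread_id, instruction_address, block)
        self._memory.write(block, offset, value)

log = []
memory = LocalMemory(num_blocks=4, words_per_block=8)
relay = RelayedLocalMemory(memory, lambda t, p, b: log.append((t, p, b)))
relay.write(thread_id=0, instruction_address=0x40, block=2, offset=1, value=99)
value = relay.read(thread_id=1, instruction_address=0x44, block=2, offset=1)
```

Because the wrapper only observes and forwards, the value read through it is the same as a direct read, which mirrors the claim that the original program's semantics are unaffected.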
The control procedure implemented by the local memory access switching module comprises the following steps, as shown in Fig. 2:
Step 1: When the dynamic recorder receives an access request to local memory block O from instruction address P of thread A in the currently running program, go to step 2.
Step 2: Determine whether address P is already present in the access record module. If so, write the thread number of thread A into that record and increment the record's frequency by 1; if the record already holds L thread numbers and the thread number to be written differs from all of them, either overwrite an arbitrarily chosen existing thread number or discard the write. If not, go to step 3.
Step 3: Determine whether the storage space of the access record module is full. If so, find the record with the lowest access frequency and delete it. If not, go to step 4.
Step 4: Add one new access record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, the current thread number, and related information into it, and set its frequency to 1.
Step 5: Carry on with thread A's access to local memory block O.
The access record module is a memory unit or register unit holding M fixed-length records. The size of M depends on the capability of the stream processor hardware or of the operating system software; for example, it may be 16, 256, or more. Each record contains (but is not limited to) the thread's instruction address, the number of the accessed local memory block, thread numbers, and the access frequency. A record may hold several thread numbers, up to L as shown in Fig. 1; the size of L depends on the capability of the hardware or software and may be, for example, 4, 16, or more. Following the working mechanism of the local memory access switching module, when a record already holds L thread numbers and the thread number to be written differs from all of them, an arbitrarily chosen existing thread number may be overwritten, or the write may be discarded.
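One possible fixed-length layout for such a record is sketched below with Python's `struct` module. The field widths (32-bit instruction address, 16-bit block number, 16-bit thread numbers, 32-bit frequency), the empty-slot marker, and the choices L = 4 and M = 16 are assumptions for illustration; the patent fixes only the kinds of fields, not their sizes.

```python
import struct

MAX_THREADS = 4      # L in the text; 4 is one of the example values given
EMPTY_SLOT = 0xFFFF  # hypothetical marker for an unused thread-number slot

# Little-endian, no padding: 32-bit instruction address, 16-bit block number,
# L 16-bit thread numbers, 32-bit access frequency. Widths are assumptions.
RECORD_FORMAT = "<IH" + "H" * MAX_THREADS + "I"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def pack_record(instruction_address, block_number, thread_ids, frequency):
    """Encode one record into exactly RECORD_SIZE bytes."""
    slots = list(thread_ids[:MAX_THREADS])
    slots += [EMPTY_SLOT] * (MAX_THREADS - len(slots))
    return struct.pack(RECORD_FORMAT, instruction_address, block_number,
                       *slots, frequency)

def unpack_record(data):
    """Decode one record, dropping unused thread-number slots."""
    fields = struct.unpack(RECORD_FORMAT, data)
    thread_ids = [t for t in fields[2:2 + MAX_THREADS] if t != EMPTY_SLOT]
    return fields[0], fields[1], thread_ids, fields[-1]

M = 16                              # number of records in the module
table = bytearray(M * RECORD_SIZE)  # the whole module occupies M * RECORD_SIZE bytes
table[0:RECORD_SIZE] = pack_record(0x40, 3, [0, 1], 7)
```

Because every record has the same size, record i always starts at byte offset `i * RECORD_SIZE`, which is what makes a register-file or small on-chip memory implementation straightforward.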
Following the same working mechanism, when all M records are occupied, that is, when the storage space of the access record module is full, a new record to be written overwrites the existing record with the lowest access frequency, so that the most frequently accessed records remain in the access record module. If several records share the lowest access frequency, any one of them may be chosen for overwriting.
Claims (2)
1. A dynamic recorder of local memory access patterns for a stream processor, characterized in that: the dynamic recorder comprises two parts, a local memory access switching module and an access record module;
the local memory access switching module relays all threads' accesses to local memory and records the related information in the other module, the access record module; the access record module is a memory unit or register unit holding M fixed-length records;
the control procedure implemented by the local memory access switching module comprises the following steps:
Step 1: upon receiving an access request to local memory block O from instruction address P of thread A in the current program, go to step 2;
Step 2: determine whether address P is already present in the access record module; if so, write the thread number of thread A into that record and increment the record's frequency by 1; if the record already holds L thread numbers and the thread number to be written differs from all of them, either overwrite an arbitrarily chosen existing thread number or discard the write; if not, go to step 3;
Step 3: determine whether the storage space of the access record module is full; if so, find the record with the lowest access frequency and delete it; if not, go to step 4;
Step 4: add one new access record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, the current thread number, and related information into it, and set its frequency to 1;
Step 5: carry on with thread A's access to local memory block O.
2. The dynamic recorder of local memory access patterns for a stream processor according to claim 1, characterized in that: the information contained in each record of the access record module includes the thread's instruction address, the number of the accessed local memory block, the thread numbers, and the access frequency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210185144.XA CN102722446B (en) | 2012-06-06 | 2012-06-06 | Dynamic recorder for local memory access model for stream processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210185144.XA CN102722446B (en) | 2012-06-06 | 2012-06-06 | Dynamic recorder for local memory access model for stream processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102722446A (en) | 2012-10-10 |
CN102722446B (en) | 2015-03-25 |
Family
ID=46948220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210185144.XA (Active), CN102722446B (en) | Dynamic recorder for local memory access model for stream processor | 2012-06-06 | 2012-06-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102722446B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218303A (en) * | 2013-03-27 | 2013-07-24 | 北京航空航天大学 | Tracking and recording method of object states in memory garbage collector based on address chain |
CN110348211A (en) * | 2018-07-17 | 2019-10-18 | 清华大学 | Method, apparatus, system and the medium of recording processor input-output operation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021779A (en) * | 2007-03-19 | 2007-08-22 | 中国人民解放军国防科学技术大学 | Instruction control method aimed at stream processor |
CN101814039A (en) * | 2010-02-02 | 2010-08-25 | 北京航空航天大学 | GPU-based Cache simulator and spatial parallel acceleration simulation method thereof |
US7809927B2 (en) * | 2007-09-11 | 2010-10-05 | Texas Instruments Incorporated | Computation parallelization in software reconfigurable all digital phase lock loop |
CN101989236A (en) * | 2010-11-04 | 2011-03-23 | 浙江大学 | Method for realizing instruction buffer lock |
Non-Patent Citations (1)
Title |
---|
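Yang Xuejun (杨学军) et al.: "Research and Development of Stream Processing Technology" (《流处理技术研究与发展》), Computer Engineering & Science (《计算机工程与科学》) * |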
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218303A (en) * | 2013-03-27 | 2013-07-24 | 北京航空航天大学 | Tracking and recording method of object states in memory garbage collector based on address chain |
CN103218303B (en) * | 2013-03-27 | 2016-08-10 | 北京航空航天大学 | The track record method of Obj State in internal memory garbage collector based on address chain |
CN110348211A (en) * | 2018-07-17 | 2019-10-18 | 清华大学 | Method, apparatus, system and the medium of recording processor input-output operation |
CN110348211B (en) * | 2018-07-17 | 2020-10-16 | 清华大学 | Method, apparatus, system, and medium for recording input and output operations of a processor |
Also Published As
Publication number | Publication date |
---|---|
CN102722446B (en) | 2015-03-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |