CN102722446A - Dynamic recorder for local memory access model for stream processor - Google Patents

Dynamic recorder for local memory access model for stream processor

Info

Publication number: CN102722446A
Authority: CN (China)
Legal status: Granted
Application number: CN201210185144.XA
Other languages: Chinese (zh)
Other versions: CN102722446B (en)
Inventor: 史晓华
Current assignee: Beihang University
Original assignee: Beihang University
Application filed by Beihang University
Priority to CN201210185144.XA
Publication of CN102722446A
Application granted
Publication of CN102722446B
Status: Active

Abstract

The invention relates to a dynamic recorder for the local memory access pattern of a stream processor, and belongs to the technical field of microprocessor architecture and compilation. The dynamic recorder comprises a local memory access relay module and an access record module. The local memory access relay module relays every thread's access to local memory and records the associated information in the other module, the access record module. The access record module is a memory unit or register unit holding M fixed-length records. The dynamic recorder has little impact on the running efficiency of the original program, does not alter the correctness of the original program, does not affect its semantics, and can dynamically record the threads and program addresses that cause the most local memory block conflicts.

Description

A local memory access pattern dynamic recorder for stream processors
Technical field
The present invention relates to a local memory access pattern dynamic recorder for stream processors, and belongs to the field of microprocessor architecture and compilation technology.
Background
The stream processor is one of the processor types now widely used in computer systems; its representative is the GPU, that is, the graphics processing unit. Stream processors are markedly more efficient at floating-point computation over large-scale data, and are therefore also used for high-performance computing and data-parallel processing.
Stream processors differ significantly from typical architectures such as the CPU in their memory access mechanism. As a result, many programs originally designed for the CPU run into efficiency bottlenecks in memory access once they are converted to run on a stream processor, and the most representative of these bottlenecks comes from access conflicts on the local memory blocks of the stream processor. Recording the access pattern of local memory in the stream processor while the program runs makes it possible to locate the access conflicts on local memory blocks. With this data, the programmer, the compiler, or the stream processor itself can plan optimizations of the program's local memory accesses and thereby improve the program's running efficiency.
Summary of the invention
To address the problems in the prior art, the present invention proposes a local memory access pattern dynamic recorder for stream processors, which solves the problem of how to find and count the program running bottlenecks caused by conflicting accesses to local memory blocks.
The proposed dynamic recorder comprises two parts: a local memory access relay module and an access record module.
The local memory access relay module relays every thread's access to local memory and records the related information in the other module, the access record module. The access record module is a memory unit or register unit holding M fixed-length records.
The control procedure of the local memory access relay module comprises the following steps:
Step 1: When the local memory access pattern dynamic recorder receives an access request to local memory block O issued by thread A at instruction address P of the currently running program, go to Step 2;
Step 2: Determine whether address P already exists in the access record module. If it does, write the thread number of thread A into that record in the access record module and increase the record's frequency by 1; when all L thread-number slots of the record are occupied and the thread number to be written differs from every existing thread number, any one of the current thread numbers may be overwritten, or the write may be abandoned. If it does not, go to Step 3;
Step 3: Determine whether the storage space of the access record module is full. If it is, find the record with the lowest access frequency and delete it; if it is not, go to Step 4;
Step 4: Add one new access record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, the current thread number, and related information into this record, and set its frequency to 1;
Step 5: Carry out thread A's access to local memory block O.
The advantages of the invention are:
(1) the proposed local memory access pattern dynamic recorder for stream processors has little impact on the running efficiency of the original program;
(2) the proposed recorder does not change the correctness of the original program and does not affect its semantics;
(3) the proposed recorder can dynamically record the threads and program addresses that cause the most local memory block conflicts.
Description of drawings
Fig. 1 is a schematic structural diagram of the access recorder in the present invention;
Fig. 2 is a flow chart of the local memory access relay module in the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings.
The present invention proposes a local memory access pattern dynamic recorder for stream processors. Its structure is shown in Fig. 1 and comprises two parts, a local memory access relay module and an access record module:
The local memory access relay module is shown in Fig. 1. This module relays every thread's access to local memory and records the related information in the other module, the access record module. It can be implemented in the microprocessor chip, or in a programming language such as C or C++. In Fig. 1, "local memory block" refers to a local memory block already present in the stream processor, and "thread" refers to a thread currently running on the stream processor.
The control procedure of the local memory access relay module comprises the following steps, as shown in Fig. 2:
Step 1: When the local memory access pattern dynamic recorder receives an access request to local memory block O issued by thread A at instruction address P of the currently running program, go to Step 2;
Step 2: Determine whether address P already exists in the access record module. If it does, write the thread number of thread A into that record in the access record module and increase the record's frequency by 1; when all L thread-number slots of the record are occupied and the thread number to be written differs from every existing thread number, any one of the current thread numbers may be overwritten, or the write may be abandoned. If it does not, go to Step 3;
Step 3: Determine whether the storage space of the access record module is full. If it is, find the record with the lowest access frequency and delete it; if it is not, go to Step 4;
Step 4: Add one new access record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, the current thread number, and related information into this record, and set its frequency to 1;
Step 5: Carry out thread A's access to local memory block O.
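The steps above can be sketched in C, one of the languages the description names for a software implementation. This is a minimal illustration under stated assumptions, not the patented implementation: the names `access_record`, `access_log`, and `log_access`, the field widths, and the small values of M and L are all hypothetical. The sketch also assumes that a hit in Step 2 proceeds directly to Step 5, that Step 3 continues into Step 4 after deleting a record, and that the write is abandoned when all L thread slots are taken. Step 5, the access itself, is left to the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define M_RECORDS 4   /* M: number of fixed-length records (assumed, kept small) */
#define L_THREADS 2   /* L: thread-number slots per record (assumed) */

typedef struct {
    int      used;                 /* 1 if this record holds data */
    uint32_t instr_addr;           /* instruction address P */
    uint32_t block;                /* number of local memory block O */
    uint32_t threads[L_THREADS];   /* recorded thread numbers */
    int      nthreads;             /* thread slots currently filled */
    uint32_t freq;                 /* access frequency */
} access_record;

typedef struct {
    access_record rec[M_RECORDS];
} access_log;

/* Steps 1 to 4: record one access by thread `tid`, issued at instruction
   address `addr`, to local memory block `block`. */
void log_access(access_log *log, uint32_t tid, uint32_t addr, uint32_t block)
{
    assert(log != NULL);
    int free_slot = -1, victim = -1;

    /* Step 2: look for an existing record with instruction address P. */
    for (int i = 0; i < M_RECORDS; i++) {
        access_record *r = &log->rec[i];
        if (!r->used) {
            if (free_slot < 0) free_slot = i;
            continue;
        }
        if (r->instr_addr == addr) {
            int known = 0;
            for (int t = 0; t < r->nthreads; t++)
                if (r->threads[t] == tid) known = 1;
            if (!known && r->nthreads < L_THREADS)
                r->threads[r->nthreads++] = tid;
            /* All L slots taken: this sketch abandons the write. */
            r->freq++;
            return;
        }
        /* Track the lowest-frequency record in case Step 3 needs it. */
        if (victim < 0 || r->freq < log->rec[victim].freq) victim = i;
    }

    /* Step 3: if no slot is free, the lowest-frequency record is replaced. */
    int slot = (free_slot >= 0) ? free_slot : victim;

    /* Step 4: write the new record with frequency 1. */
    access_record *r = &log->rec[slot];
    memset(r, 0, sizeof *r);
    r->used = 1;
    r->instr_addr = addr;
    r->block = block;
    r->threads[0] = tid;
    r->nthreads = 1;
    r->freq = 1;
}
```

Because `log_access` only updates the log and never touches the accessed data, the caller can perform the original access unchanged afterwards, which matches the claim that program semantics are unaffected.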
The access record module is a memory unit or register unit holding M fixed-length records. The size of M depends on the capability of the stream processor hardware or the operating system software; for example, it can be 16, 256, or more. Each record includes (but is not limited to) the thread's instruction address, the number of the accessed local memory block, thread numbers, and the access frequency. A record can hold several thread numbers, up to L as shown in Fig. 1; the size of L depends on the capability of the hardware or software and can be, for example, 4, 16, or more. According to the working mechanism of the local memory access relay module, when all L thread-number slots of a record are occupied and the thread number to be written differs from every existing thread number, any one of the current thread numbers may be overwritten, or the write may be abandoned.
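One possible layout for such a fixed-length record, together with the two permitted behaviours when all L thread slots are occupied, is sketched below in C. The type and function names (`access_record`, `full_policy`, `record_add_thread`), the 32-bit field widths, and the choice of slot 0 as the overwritten slot are assumptions made for illustration; the text only requires that some current thread number may be overwritten or that the write may be abandoned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L_THREADS 4   /* L: the text gives 4, 16, or more as examples */

/* One fixed-length access record. Field widths are assumptions. */
typedef struct {
    uint32_t instr_addr;           /* instruction address of the thread */
    uint32_t block;                /* number of the accessed local memory block */
    uint32_t threads[L_THREADS];   /* up to L thread numbers */
    uint8_t  nthreads;             /* thread slots currently in use */
    uint32_t freq;                 /* access frequency */
} access_record;

typedef enum { ON_FULL_DISCARD, ON_FULL_OVERWRITE } full_policy;

/* Stores thread number `tid` in record `r`. Returns 1 if `tid` is present
   in the record afterwards, 0 if the write was abandoned. */
int record_add_thread(access_record *r, uint32_t tid, full_policy policy)
{
    assert(r != NULL && r->nthreads <= L_THREADS);
    for (int i = 0; i < r->nthreads; i++)
        if (r->threads[i] == tid)
            return 1;                      /* already recorded */
    if (r->nthreads < L_THREADS) {
        r->threads[r->nthreads++] = tid;   /* a free slot is available */
        return 1;
    }
    if (policy == ON_FULL_OVERWRITE) {
        r->threads[0] = tid;               /* any slot may be chosen; 0 is arbitrary */
        return 1;
    }
    return 0;                              /* abandon the write */
}
```

Abandoning the write keeps the earliest conflicting threads in the record, while overwriting keeps more recent ones; which is more useful depends on how the collected data is later analysed.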
According to the working mechanism of the local memory access relay module, when all M records are occupied, that is, when the storage space of the access record module is full, the new record to be written overwrites the existing record with the lowest access frequency, which ensures that the most frequently accessed records are always present in the access record module. If several records share the lowest access frequency, any one of them is chosen and overwritten.
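The replacement rule amounts to a minimum search over the frequency fields of the M records. A minimal C sketch follows; the function name `find_victim` and the use of a plain array of frequencies are hypothetical simplifications, and breaking ties by taking the first lowest-frequency record is just one valid way of choosing "any one of them".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of a record with the lowest access frequency among
   `m` occupied records. On a tie, the first such record is returned. */
size_t find_victim(const uint32_t *freq, size_t m)
{
    assert(freq != NULL && m > 0);
    size_t victim = 0;
    for (size_t i = 1; i < m; i++)
        if (freq[i] < freq[victim])
            victim = i;
    return victim;
}
```

A linear scan like this costs O(M) per replacement, which is acceptable for the small M values the text mentions (16 or 256); a hardware realization could compare the frequency fields in parallel instead.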

Claims (2)

1. A local memory access pattern dynamic recorder for stream processors, characterized in that: the dynamic recorder comprises two parts, a local memory access relay module and an access record module;
The local memory access relay module relays every thread's access to local memory and records the related information in the other module, the access record module; the access record module is a memory unit or register unit holding M fixed-length records;
The control procedure of the local memory access relay module comprises the following steps:
Step 1: When an access request to local memory block O issued by thread A at instruction address P of the current program is received, go to Step 2;
Step 2: Determine whether address P already exists in the access record module. If it does, write the thread number of thread A into that record in the access record module and increase the record's frequency by 1; when all L thread-number slots of the record are occupied and the thread number to be written differs from every existing thread number, any one of the current thread numbers may be overwritten, or the write may be abandoned. If it does not, go to Step 3;
Step 3: Determine whether the storage space of the access record module is full. If it is, find the record with the lowest access frequency and delete it; if it is not, go to Step 4;
Step 4: Add one new access record to the access record module, write the instruction address of thread A, the number of the accessed local memory block, the current thread number, and related information into this record, and set its frequency to 1;
Step 5: Carry out thread A's access to local memory block O.
2. The local memory access pattern dynamic recorder for stream processors according to claim 1, characterized in that: the information contained in each record in the access record module includes the thread's instruction address, the number of the accessed local memory block, thread numbers, and the access frequency.
CN201210185144.XA (priority date 2012-06-06, filing date 2012-06-06): Dynamic recorder for local memory access model for stream processor. Status: Active. Publication: CN102722446B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210185144.XA CN102722446B (en) 2012-06-06 2012-06-06 Dynamic recorder for local memory access model for stream processor

Publications (2)

Publication Number Publication Date
CN102722446A (en) 2012-10-10
CN102722446B (en) 2015-03-25

Family

ID=46948220

Country Status (1)

Country Link
CN (1) CN102722446B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant