CN102043723A - On-chip cache structure used for variable memory access mode of general-purpose stream processor - Google Patents

On-chip cache structure used for variable memory access mode of general-purpose stream processor Download PDF

Info

Publication number
CN102043723A
CN102043723A · CN201110001556 · CN201110001556A · CN102043723A
Authority
CN
China
Prior art keywords
memory
cache
cells
controller
border
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110001556
Other languages
Chinese (zh)
Other versions
CN102043723B (en)
Inventor
邢座程
付桂涛
陈小保
马安国
黄平
汤先拓
何锐
王庆林
晏小波
李方圆
邱建雄
蔡放
闵银皮
梅家祥
孟晓冬
赵齐
王宏燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN2011100015569A priority Critical patent/CN102043723B/en
Publication of CN102043723A publication Critical patent/CN102043723A/en
Application granted granted Critical
Publication of CN102043723B publication Critical patent/CN102043723B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an on-chip cache structure for the variable memory access mode of a general-purpose stream processor, comprising a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit. The memory array unit comprises a cache part and a scratchpad memory part divided by the memory array boundary segmentation logic unit; the cache controller accesses the cache part of the memory array unit, and the scratchpad memory controller accesses the scratchpad memory part of the memory array unit. The on-chip cache structure is simple and compact, low in cost, reliable and widely applicable, and can change the size ratio of cache to scratchpad memory so as to satisfy processor performance requirements to the greatest extent.

Description

On-chip cache structure for the variable memory access mode of a general-purpose stream processor
Technical field
The present invention relates generally to the field of cache structures for stream processors, and in particular to an on-chip cache structure suited to the variable memory access mode of a general-purpose stream processor.
Background technology
In recent high-performance microprocessor research, stream architectures have become a focus, and a number of processors with stream-processing features have appeared, such as the Imagine stream processor designed at Stanford and the Cell processor released by the STI (Sony, Toshiba, IBM) alliance. A stream architecture can effectively alleviate the memory-wall bottleneck and exploit multiple levels of program parallelism, and has been shown to greatly improve the performance of media applications and data-intensive scientific computing. Mainstream processor and graphics-card vendors are all researching or releasing processors with stream-processing features: NVIDIA's professional graphics cards have achieved excellent results in the medical-imaging area of the GPGPU (General Purpose GPU) field, and in supercomputing the Linpack performance of a GPU is currently tens of times that of a multi-core processor of equal price. AMD has released the Firestream processor, whose peak performance reaches 500 GFLOPS. Intel also planned to release the Larrabee graphics processor with stream-processing features, with a projected peak performance of 1 TFLOPS. Studies show that stream processing suits many application domains, and much current research attempts to apply stream processors to scientific computing, of which CUDA is a typical representative. Scientific computing first of all requires powerful floating-point capability, as in macromolecular motion analysis, oil-exploration analysis, atmospheric science, solid mechanics, molecular mechanics, fluid mechanics and finite-element analysis. Some scientific applications also emphasize the multimedia capability of the compute nodes, such as the many simulations needed in atmospheric science and fluid mechanics.
A stream processor uses a scratchpad memory (SPM) as its on-chip buffer. A scratchpad memory is a small on-chip memory with an independent address space that is accessed directly by the processor's memory instructions. Scratchpad management is performed by the compiler at compile time, so the latency of each memory instruction is fixed once compilation finishes, and the task execution time is therefore also determined. Scratchpad memory is particularly well suited to processing streaming data. A general-purpose processor, in contrast, uses a cache as its on-chip buffer: scientific computing must store large amounts of data, and the set-associative organization and replacement policy of a cache can provide more effective storage space. A general-purpose stream processor must handle streaming applications across many domains while also facing the storage demands of scientific computing, so it needs both kinds of on-chip buffer. Because of chip-area limits, a general-purpose stream processor cannot integrate an arbitrarily large cache and scratchpad memory on chip; to make better use of the available resources, an on-chip buffer mixing cache and scratchpad memory can be adopted. NVIDIA's latest stream processor, Fermi, uses a configurable scheme to share an on-chip buffer between cache and scratchpad memory. However, Fermi's configurability is inflexible: the ratio of cache to scratchpad memory cannot be changed arbitrarily, which degrades performance for some special applications.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems existing in the prior art, the invention provides an on-chip cache structure for the variable memory access mode of a general-purpose stream processor that is simple and compact in structure, low in cost, reliable and widely applicable, and that can change the cache and scratchpad memory sizes arbitrarily so as to satisfy processor performance requirements to the greatest extent.
To solve the above technical problems, the present invention adopts the following technical solution:
An on-chip cache structure for the variable memory access mode of a general-purpose stream processor, characterized by comprising a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit, wherein the memory array unit consists of a cache part and a scratchpad memory part divided by the memory array boundary segmentation logic unit, the cache controller accesses the cache part of the memory array unit, and the scratchpad memory controller accesses the scratchpad memory part of the memory array unit.
As further improvements of the present invention:
The cache controller comprises a decoder, logic circuitry and a border register; the border register registers the boundary of the memory array at compile time according to an instruction from the processor, and the logic circuitry produces a judgement bit for each cache line.
The scratchpad memory controller comprises an arbitration logic unit, a buffer queue unit, a border input unit and an address mapping logic unit; the arbitration logic unit arbitrates priority when multiple requests contend, placing requests that must wait into the buffer queue unit and handling the highest-priority request first; the border input unit indicates to the address mapping logic unit how the memory array is partitioned.
The memory array in the memory array unit is implemented in SRAM, and the memory array unit has 2 read ports and 2 write ports.
Compared with the prior art, the advantages of the present invention are:
1. By supporting two access modes with a flexibly configurable size split, the present invention can satisfy the performance requirements of a general-purpose stream processor across many applications, giving the processor stronger versatility;
2. The two proposed access modes share one memory array, saving chip area, while the two separate control structures avoid complex logic design, so the structure is simple to implement and low in cost;
3. In special cases the present invention can treat the entire on-chip memory as a single cache or a single scratchpad memory, which can greatly satisfy the performance requirements of the stream processor.
Description of drawings
Fig. 1 is a structural schematic diagram of the present invention;
Fig. 2 is a structural schematic diagram of the cache controller of the present invention;
Fig. 3 is a structural schematic diagram of the scratchpad memory controller of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the on-chip cache structure of the present invention for the variable memory access mode of a general-purpose stream processor comprises a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit. The memory array unit consists of a cache part and a scratchpad memory part divided by the boundary segmentation logic unit; the cache controller accesses the cache part, and the scratchpad memory controller accesses the scratchpad memory part. The on-chip buffer of the present invention thus supports two access modes, each with its own control structure, while both modes share one memory array. In this embodiment, the memory array of the memory array unit is implemented in SRAM and has 2 read ports and 2 write ports.
When the stream processor runs an application, the scratchpad memory size the application needs is already known at compile time, and the memory array boundary segmentation logic unit divides the memory array into a cache part and a scratchpad memory part accordingly.
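The compile-time split described above can be illustrated with a minimal sketch. The line size, array size and function names below are illustrative assumptions, not values given in the patent; the point is only that the scratchpad requirement, known at compile time, determines where the boundary falls.

```python
# Hedged sketch: how the boundary segmentation logic might split the shared
# SRAM array into a cache part and a scratchpad part at compile time.
# LINE_SIZE and TOTAL_LINES are assumed example values.

LINE_SIZE = 64          # assumed cache-line size in bytes
TOTAL_LINES = 512       # assumed total lines in the shared memory array

def split_array(spm_bytes_needed: int) -> tuple[int, int]:
    """Return (cache_lines, spm_lines) for the application's SPM need,
    rounding the scratchpad part up to whole cache lines."""
    spm_lines = -(-spm_bytes_needed // LINE_SIZE)   # ceiling division
    if spm_lines > TOTAL_LINES:
        raise ValueError("application needs more SPM than the array holds")
    return TOTAL_LINES - spm_lines, spm_lines
```

For example, an application needing 16 KiB of scratchpad would leave the remaining lines to the cache part; an application needing no scratchpad leaves the whole array as cache, matching the special case noted in the advantages above.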
A memory instruction tells the cache controller the size of the cache part of the memory array. When the stream processor accesses the cache part, the cache is transparent to the processor, which cannot distinguish the size of the cache part; the controller must therefore first check whether the tag lies within the cache part, and only access the data if it does. If it does not, the program access is in error. When the stream processor accesses the scratchpad memory, the scratchpad size was fixed by the program size at compile time, so the decode logic maps the address into the scratchpad part of the memory array and no overflow can occur.
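The two access paths above can be sketched as follows. Representing the boundary as a line index, and the offset arithmetic, are illustrative assumptions; the sketch only shows that a cache access is bounds-checked while a scratchpad access is mapped directly past the boundary.

```python
# Hedged sketch of the two access paths: a cache access first verifies the
# looked-up line lies inside the cache part, while a scratchpad access is
# offset into the scratchpad part and by construction cannot overflow.

LINE_SIZE = 64
TOTAL_LINES = 512
BOUNDARY = 384          # assumed: lines [0, 384) are cache, [384, 512) are SPM

def cache_access_ok(line_index: int) -> bool:
    """The cache part is transparent to the processor, so the controller
    checks the looked-up line against the registered boundary."""
    return 0 <= line_index < BOUNDARY

def spm_line(spm_address: int) -> int:
    """SPM addresses start at 0 in their own address space; the decode
    logic offsets them into the scratchpad part of the shared array."""
    line = BOUNDARY + spm_address // LINE_SIZE
    # The compiler fixed the SPM size, so this cannot trip at run time.
    assert line < TOTAL_LINES
    return line
```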
As shown in Fig. 2, in this embodiment the cache controller mainly comprises a decoder, logic circuitry and a border register. First, one bit is added to each entry of the tag array to indicate whether the corresponding cache line belongs to the cache part or to the scratchpad memory part. This judgement bit originates at compile time: the processor sends an instruction to the cache controller, and the border register in the cache controller registers the boundary of the memory array; the logic circuitry then produces a judgement bit for each cache line. On a cache access, the controller first checks whether the line belongs to the cache part: if it lies within the cache part, the tag-array lookup continues and the data access proceeds; otherwise the tag lookup is aborted.
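A minimal sketch of the judgement-bit mechanism, under the assumed convention that bit 1 marks a line as belonging to the cache part (the patent does not specify the encoding):

```python
# Hedged sketch: deriving the per-line judgement bit from the registered
# boundary, as the logic circuitry described above might do.

def judgement_bits(total_lines: int, border: int) -> list[int]:
    """border = first line of the scratchpad part, registered at compile
    time via the instruction the processor sends to the cache controller."""
    return [1 if line < border else 0 for line in range(total_lines)]

def cache_lookup_allowed(bits: list[int], line: int) -> bool:
    # On a cache access, the tag-array search continues only for lines
    # whose judgement bit marks them as cache; otherwise it is aborted.
    return bits[line] == 1
```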
The remaining parts of the cache controller (such as the decoder) are similar to those of a general-purpose processor: the input enters the decoder, which produces the address used to look up the tag array. Once the tag-array lookup completes, the bit line of the accessed data is determined, and the corresponding data can then be found in the memory array.
In a concrete application example, the size of the border register can change with the granularity of the boundary division. For a fine-grained division, the memory array can be split at cache-line granularity, with the border register tracking the state of each cache line to determine the value of its judgement bit. Adopting a coarse-grained division reduces the overhead of the border register.
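The granularity trade-off can be made concrete with a small sketch, again using assumed sizes: per-line tracking needs one state bit per line, while dividing the array into larger granules shrinks the register at the cost of a less precise split.

```python
# Hedged sketch of the border-register overhead at different division
# granularities. Sizes are illustrative assumptions.

def border_register_bits(total_lines: int, lines_per_granule: int) -> int:
    """Bits needed to track one state bit per granule of the division."""
    return -(-total_lines // lines_per_granule)   # ceiling division

fine = border_register_bits(512, 1)      # per-cache-line tracking
coarse = border_register_bits(512, 64)   # 64-line granules
```

With these assumed numbers, fine-grained tracking costs 512 bits of state while the 64-line-granule division costs only 8, illustrating why a coarse division lowers the register overhead.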
As shown in Fig. 3, in this embodiment the scratchpad memory controller mainly comprises an arbitration logic unit, a buffer queue unit, a border input unit and an address mapping logic unit. Because the size of the scratchpad part is fixed at compile time, no access to the scratchpad memory can cross its boundary. The arbitration logic unit arbitrates priority when multiple requests contend: requests that must wait are placed into the buffer queue unit, and the highest-priority request is handled first. The border input unit tells the address mapping logic unit which addresses belong to the cache and which to the scratchpad memory, so that no data access can overflow.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment: all technical solutions falling within the idea of the present invention belong to its protection scope. It should be pointed out that, for those skilled in the art, improvements and modifications that do not depart from the principles of the present invention should also be regarded as within the protection scope of the present invention.

Claims (4)

1. An on-chip cache structure for the variable memory access mode of a general-purpose stream processor, characterized by comprising a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit, wherein the memory array unit consists of a cache part and a scratchpad memory part divided by the memory array boundary segmentation logic unit, the cache controller accesses the cache part of the memory array unit, and the scratchpad memory controller accesses the scratchpad memory part of the memory array unit.
2. The on-chip cache structure for the variable memory access mode of a general-purpose stream processor according to claim 1, characterized in that the cache controller comprises a decoder, logic circuitry and a border register, the border register being used to register the boundary of the memory array at compile time according to an instruction from the processor, and the logic circuitry producing a judgement bit for each cache line.
3. The on-chip cache structure for the variable memory access mode of a general-purpose stream processor according to claim 1, characterized in that the scratchpad memory controller comprises an arbitration logic unit, a buffer queue unit, a border input unit and an address mapping logic unit, the arbitration logic unit being used to arbitrate priority when multiple requests contend, placing requests that must wait into the buffer queue unit and handling the highest-priority request first; and the border input unit being used to indicate to the address mapping logic unit how the memory array is partitioned.
4. The on-chip cache structure for the variable memory access mode of a general-purpose stream processor according to claim 1, 2 or 3, characterized in that the memory array in the memory array unit is implemented in SRAM, and the memory array unit comprises 2 read ports and 2 write ports.
CN2011100015569A 2011-01-06 2011-01-06 On-chip cache structure used for variable memory access mode of general-purpose stream processor Expired - Fee Related CN102043723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100015569A CN102043723B (en) 2011-01-06 2011-01-06 On-chip cache structure used for variable memory access mode of general-purpose stream processor


Publications (2)

Publication Number Publication Date
CN102043723A true CN102043723A (en) 2011-05-04
CN102043723B CN102043723B (en) 2012-08-22

Family

ID=43909874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100015569A Expired - Fee Related CN102043723B (en) 2011-01-06 2011-01-06 On-chip cache structure used for variable memory access mode of general-purpose stream processor

Country Status (1)

Country Link
CN (1) CN102043723B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105009093A (en) * 2013-02-21 2015-10-28 高通股份有限公司 Inter-set wear-leveling for caches with limited write endurance
CN105263022A (en) * 2015-09-21 2016-01-20 山东大学 Multi-core hybrid storage management method for high efficiency video coding (HEVC) process
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
US10762164B2 (en) 2016-01-20 2020-09-01 Cambricon Technologies Corporation Limited Vector and matrix computing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477493A (en) * 2008-12-17 2009-07-08 康佳集团股份有限公司 Method for implementing block memory device
CN201570016U (en) * 2009-12-25 2010-09-01 东南大学 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism
CN101930357A (en) * 2010-08-17 2010-12-29 中国科学院计算技术研究所 System and method for realizing accessing operation by adopting configurable on-chip storage device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477493A (en) * 2008-12-17 2009-07-08 康佳集团股份有限公司 Method for implementing block memory device
CN201570016U (en) * 2009-12-25 2010-09-01 东南大学 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism
CN101930357A (en) * 2010-08-17 2010-12-29 中国科学院计算技术研究所 System and method for realizing accessing operation by adopting configurable on-chip storage device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wen Shuhong et al., "On-chip memory allocation in embedded multimedia applications", Acta Electronica Sinica (《电子学报》), Vol. 33, No. 11, 30 Nov. 2005. 2 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105009093A (en) * 2013-02-21 2015-10-28 高通股份有限公司 Inter-set wear-leveling for caches with limited write endurance
CN105263022A (en) * 2015-09-21 2016-01-20 山东大学 Multi-core hybrid storage management method for high efficiency video coding (HEVC) process
CN105263022B (en) * 2015-09-21 2018-03-02 山东大学 A kind of multinuclear mixing memory management method for HEVC Video codings
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
CN106990940B (en) * 2016-01-20 2020-05-22 中科寒武纪科技股份有限公司 Vector calculation device and calculation method
US10762164B2 (en) 2016-01-20 2020-09-01 Cambricon Technologies Corporation Limited Vector and matrix computing device
US11734383B2 (en) 2016-01-20 2023-08-22 Cambricon Technologies Corporation Limited Vector and matrix computing device

Also Published As

Publication number Publication date
CN102043723B (en) 2012-08-22

Similar Documents

Publication Publication Date Title
US10102179B2 (en) Multiple core computer processor with globally-accessible local memories
Seshadri et al. Simple operations in memory to reduce data movement
KR101830685B1 (en) On-chip mesh interconnect
CN104412233B (en) The distribution of aliasing register in pipeline schedule
CN105492989B (en) For managing device, system, method and the machine readable media of the gate carried out to clock
US20190228308A1 (en) Deep learning accelerator system and methods thereof
CN102043723B (en) On-chip cache structure used for variable memory access mode of general-purpose stream processor
CN100489830C (en) 64 bit stream processor chip system structure oriented to scientific computing
Wang et al. ProPRAM: Exploiting the transparent logic resources in non-volatile memory for near data computing
Tseng et al. Gullfoss: Accelerating and simplifying data movement among heterogeneous computing and storage resources
Zhang et al. Pm3: Power modeling and power management for processing-in-memory
Oukid et al. On the diversity of memory and storage technologies
Kogge et al. Yearly update: exascale projections for 2013.
CN102760106A (en) PCI (peripheral component interconnect) academic data mining chip and operation method thereof
Ye On-chip multiprocessor communication network design and analysis
Liu et al. Hippogriff: Efficiently moving data in heterogeneous computing systems
JP2012008747A (en) Integration device, memory allocation method and program
Chang et al. Guest editorial: IEEE transactions on computers special section on emerging non-volatile memory technologies: From devices to architectures and systems
US20230027351A1 (en) Temporal graph analytics on persistent memory
Ciobanu et al. Scalability study of polymorphic register files
Chunmao et al. Research of embedded operating system based on multi-core processor
Marongiu et al. Controlling NUMA effects in embedded manycore applications with lightweight nested parallelism support
Sun et al. An energy-efficient 3D stacked STT-RAM cache architecture for CMPs
Fang Architecture support for emerging memory technologies
Upadhyay et al. Design and Implementation of Cache Memory with FIFO Cache-Control

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120822

Termination date: 20130106

CF01 Termination of patent right due to non-payment of annual fee