CN102043723A - On-chip cache structure used for variable memory access mode of general-purpose stream processor - Google Patents

On-chip cache structure used for variable memory access mode of general-purpose stream processor Download PDF

Info

Publication number
CN102043723A
CN102043723A · CN201110001556 · CN201110001556A · CN102043723A
Authority
CN
China
Prior art keywords
memory
cache
cells
controller
border
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110001556
Other languages
Chinese (zh)
Other versions
CN102043723B (en)
Inventor
邢座程
付桂涛
陈小保
马安国
黄平
汤先拓
何锐
王庆林
晏小波
李方圆
邱建雄
蔡放
闵银皮
梅家祥
孟晓冬
赵齐
王宏燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN2011100015569A priority Critical patent/CN102043723B/en
Publication of CN102043723A publication Critical patent/CN102043723A/en
Application granted granted Critical
Publication of CN102043723B publication Critical patent/CN102043723B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses an on-chip cache structure for the variable memory access mode of a general-purpose stream processor, comprising a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit. The memory array unit comprises a cache part and a scratchpad memory part divided by the memory array boundary segmentation logic unit; the cache controller accesses the cache part of the memory array unit, and the scratchpad memory controller accesses the scratchpad memory part of the memory array unit. The on-chip cache structure is simple and compact, low in cost, reliable and widely applicable, and can change the size ratio of cache to scratchpad memory so as to satisfy processor performance requirements to the greatest extent.

Description

On-chip cache structure for the variable memory access mode of a general-purpose stream processor
Technical field
The present invention relates generally to the field of cache structures for stream processors, and in particular to an on-chip cache structure suited to the variable memory access mode of a general-purpose stream processor.
Background technology
In recent high-performance microprocessor research, stream architectures have become a focus, and a number of processors with stream-processing features have appeared, such as the Imagine stream processor designed at Stanford and the Cell processor released by the STI (Sony, Toshiba, IBM) alliance. A stream architecture can effectively alleviate the memory-wall bottleneck and exploit multiple levels of program parallelism, and has been shown to greatly improve the performance of media applications and data-intensive scientific computing. Mainstream processor and graphics-card vendors are all researching or releasing processors with stream-processing features: NVIDIA's professional graphics cards have achieved excellent results in the medical-imaging area of the GPGPU (General Purpose GPU) field, and in supercomputing the Linpack performance of a GPU is currently tens of times that of a multi-core processor of equal price. AMD has released the Firestream processor, whose peak performance reaches 500 GFLOPS. Intel also planned to release the Larrabee graphics processor with stream-processing features, with a projected peak performance of 1 TFLOPS. Studies show that stream processing suits many application domains, and much current research attempts to apply stream processors to scientific computing, of which CUDA is a typical representative. Scientific computing first of all requires powerful floating-point capability, as in macromolecular motion analysis, oil-exploration analysis, atmospheric science, solid mechanics, molecular mechanics, fluid mechanics and finite-element analysis. Some scientific applications also emphasize the multimedia capability of the compute nodes, such as the many simulations needed in atmospheric science and fluid mechanics.
A stream processor uses a scratchpad memory (SPM) as its on-chip buffer. A scratchpad memory is a small on-chip memory with an independent address space that is accessed directly by the processor's memory instructions. Scratchpad management is performed by the compiler at compile time, so the latency of each memory instruction is fixed once compilation finishes, and the task execution time is therefore also determined. Scratchpad memory is particularly well suited to processing streaming data. A general-purpose processor, in contrast, uses a cache as its on-chip buffer: scientific computing must store large amounts of data, and the set-associative organization and replacement policy of a cache can provide more effective storage space. A general-purpose stream processor must handle streaming applications across many domains while also facing the storage demands of scientific computing, so it needs both kinds of on-chip buffer. Because of chip-area limits, a general-purpose stream processor cannot integrate an arbitrarily large cache and scratchpad memory on chip; to make better use of the available resources, an on-chip buffer mixing cache and scratchpad memory can be adopted. NVIDIA's latest stream processor, Fermi, uses a configurable scheme to share an on-chip buffer between cache and scratchpad memory. However, Fermi's configurability is inflexible: the ratio of cache to scratchpad memory cannot be changed arbitrarily, which degrades performance for some special applications.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems existing in the prior art, the invention provides an on-chip cache structure for the variable memory access mode of a general-purpose stream processor that is simple and compact in structure, low in cost, reliable and widely applicable, and that can change the cache and scratchpad memory sizes arbitrarily so as to satisfy processor performance requirements to the greatest extent.
To solve the above technical problems, the present invention adopts the following technical solution:
An on-chip cache structure for the variable memory access mode of a general-purpose stream processor, characterized by comprising a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit, wherein the memory array unit consists of a cache part and a scratchpad memory part divided by the memory array boundary segmentation logic unit, the cache controller accesses the cache part of the memory array unit, and the scratchpad memory controller accesses the scratchpad memory part of the memory array unit.
As further improvements of the present invention:
The cache controller comprises a decoder, logic circuitry and a border register; the border register registers the boundary of the memory array at compile time according to an instruction from the processor, and the logic circuitry produces a judgement bit for each cache line.
The scratchpad memory controller comprises an arbitration logic unit, a buffer queue unit, a border input unit and an address mapping logic unit; the arbitration logic unit arbitrates priority when multiple requests contend, placing requests that must wait into the buffer queue unit and handling the highest-priority request first; the border input unit indicates to the address mapping logic unit how the memory array is partitioned.
The memory array in the memory array unit is implemented in SRAM, and the memory array unit has 2 read ports and 2 write ports.
Compared with the prior art, the advantages of the present invention are:
1. By supporting two access modes with a flexibly configurable size split, the present invention can satisfy the performance requirements of a general-purpose stream processor across many applications, giving the processor stronger versatility;
2. The two proposed access modes share one memory array, saving chip area, while the two separate control structures avoid complex logic design, so the structure is simple to implement and low in cost;
3. In special cases the present invention can treat the entire on-chip memory as a single cache or a single scratchpad memory, which can greatly satisfy the performance requirements of the stream processor.
Description of drawings
Fig. 1 is a structural schematic diagram of the present invention;
Fig. 2 is a structural schematic diagram of the cache controller of the present invention;
Fig. 3 is a structural schematic diagram of the scratchpad memory controller of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the on-chip cache structure of the present invention for the variable memory access mode of a general-purpose stream processor comprises a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit. The memory array unit consists of a cache part and a scratchpad memory part divided by the boundary segmentation logic unit; the cache controller accesses the cache part, and the scratchpad memory controller accesses the scratchpad memory part. The on-chip buffer of the present invention thus supports two access modes, each with its own control structure, while both modes share one memory array. In this embodiment, the memory array of the memory array unit is implemented in SRAM and has 2 read ports and 2 write ports.
When the stream processor runs an application, the scratchpad memory size the application needs is already known at compile time, and the memory array boundary segmentation logic unit divides the memory array into a cache part and a scratchpad memory part accordingly.
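The compile-time split described above can be illustrated with a minimal sketch. The line size, array size and function names below are illustrative assumptions, not values given in the patent; the point is only that the scratchpad requirement, known at compile time, determines where the boundary falls.

```python
# Hedged sketch: how the boundary segmentation logic might split the shared
# SRAM array into a cache part and a scratchpad part at compile time.
# LINE_SIZE and TOTAL_LINES are assumed example values.

LINE_SIZE = 64          # assumed cache-line size in bytes
TOTAL_LINES = 512       # assumed total lines in the shared memory array

def split_array(spm_bytes_needed: int) -> tuple[int, int]:
    """Return (cache_lines, spm_lines) for the application's SPM need,
    rounding the scratchpad part up to whole cache lines."""
    spm_lines = -(-spm_bytes_needed // LINE_SIZE)   # ceiling division
    if spm_lines > TOTAL_LINES:
        raise ValueError("application needs more SPM than the array holds")
    return TOTAL_LINES - spm_lines, spm_lines
```

For example, an application needing 16 KiB of scratchpad would leave the remaining lines to the cache part; an application needing no scratchpad leaves the whole array as cache, matching the special case noted in the advantages above.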
A memory instruction tells the cache controller the size of the cache part of the memory array. When the stream processor accesses the cache part, the cache is transparent to the processor, which cannot distinguish the size of the cache part; the controller must therefore first check whether the tag lies within the cache part, and only access the data if it does. If it does not, the program access is in error. When the stream processor accesses the scratchpad memory, the scratchpad size was fixed by the program size at compile time, so the decode logic maps the address into the scratchpad part of the memory array and no overflow can occur.
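The two access paths above can be sketched as follows. Representing the boundary as a line index, and the offset arithmetic, are illustrative assumptions; the sketch only shows that a cache access is bounds-checked while a scratchpad access is mapped directly past the boundary.

```python
# Hedged sketch of the two access paths: a cache access first verifies the
# looked-up line lies inside the cache part, while a scratchpad access is
# offset into the scratchpad part and by construction cannot overflow.

LINE_SIZE = 64
TOTAL_LINES = 512
BOUNDARY = 384          # assumed: lines [0, 384) are cache, [384, 512) are SPM

def cache_access_ok(line_index: int) -> bool:
    """The cache part is transparent to the processor, so the controller
    checks the looked-up line against the registered boundary."""
    return 0 <= line_index < BOUNDARY

def spm_line(spm_address: int) -> int:
    """SPM addresses start at 0 in their own address space; the decode
    logic offsets them into the scratchpad part of the shared array."""
    line = BOUNDARY + spm_address // LINE_SIZE
    # The compiler fixed the SPM size, so this cannot trip at run time.
    assert line < TOTAL_LINES
    return line
```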
As shown in Fig. 2, in this embodiment the cache controller mainly comprises a decoder, logic circuitry and a border register. First, one bit is added to each entry of the tag array to indicate whether the corresponding cache line belongs to the cache part or to the scratchpad memory part. This judgement bit originates at compile time: the processor sends an instruction to the cache controller, and the border register in the cache controller registers the boundary of the memory array; the logic circuitry then produces a judgement bit for each cache line. On a cache access, the controller first checks whether the line belongs to the cache part: if it lies within the cache part, the tag-array lookup continues and the data access proceeds; otherwise the tag lookup is aborted.
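A minimal sketch of the judgement-bit mechanism, under the assumed convention that bit 1 marks a line as belonging to the cache part (the patent does not specify the encoding):

```python
# Hedged sketch: deriving the per-line judgement bit from the registered
# boundary, as the logic circuitry described above might do.

def judgement_bits(total_lines: int, border: int) -> list[int]:
    """border = first line of the scratchpad part, registered at compile
    time via the instruction the processor sends to the cache controller."""
    return [1 if line < border else 0 for line in range(total_lines)]

def cache_lookup_allowed(bits: list[int], line: int) -> bool:
    # On a cache access, the tag-array search continues only for lines
    # whose judgement bit marks them as cache; otherwise it is aborted.
    return bits[line] == 1
```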
The remaining parts of the cache controller (such as the decoder) are similar to those of a general-purpose processor: the input enters the decoder, which produces the address used to look up the tag array. Once the tag-array lookup completes, the bit line of the accessed data is determined, and the corresponding data can then be found in the memory array.
In a concrete application example, the size of the border register can change with the granularity of the boundary division. For a fine-grained division, the memory array can be split at cache-line granularity, with the border register tracking the state of each cache line to determine the value of its judgement bit. Adopting a coarse-grained division reduces the overhead of the border register.
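The granularity trade-off can be made concrete with a small sketch, again using assumed sizes: per-line tracking needs one state bit per line, while dividing the array into larger granules shrinks the register at the cost of a less precise split.

```python
# Hedged sketch of the border-register overhead at different division
# granularities. Sizes are illustrative assumptions.

def border_register_bits(total_lines: int, lines_per_granule: int) -> int:
    """Bits needed to track one state bit per granule of the division."""
    return -(-total_lines // lines_per_granule)   # ceiling division

fine = border_register_bits(512, 1)      # per-cache-line tracking
coarse = border_register_bits(512, 64)   # 64-line granules
```

With these assumed numbers, fine-grained tracking costs 512 bits of state while the 64-line-granule division costs only 8, illustrating why a coarse division lowers the register overhead.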
As shown in Fig. 3, in this embodiment the scratchpad memory controller mainly comprises an arbitration logic unit, a buffer queue unit, a border input unit and an address mapping logic unit. Because the size of the scratchpad part is fixed at compile time, no access to the scratchpad memory can cross its boundary. The arbitration logic unit arbitrates priority when multiple requests contend: requests that must wait are placed into the buffer queue unit, and the highest-priority request is handled first. The border input unit tells the address mapping logic unit which addresses belong to the cache and which to the scratchpad memory, so that no data access can overflow.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment: all technical solutions falling within the idea of the present invention belong to its protection scope. It should be pointed out that, for those skilled in the art, improvements and modifications that do not depart from the principles of the present invention should also be regarded as within the protection scope of the present invention.

Claims (4)

1. An on-chip cache structure for the variable memory access mode of a general-purpose stream processor, characterized by comprising a memory array unit, a cache controller, a scratchpad memory controller and a memory array boundary segmentation logic unit, wherein the memory array unit consists of a cache part and a scratchpad memory part divided by the memory array boundary segmentation logic unit, the cache controller accesses the cache part of the memory array unit, and the scratchpad memory controller accesses the scratchpad memory part of the memory array unit.
2. The on-chip cache structure for the variable memory access mode of a general-purpose stream processor according to claim 1, characterized in that the cache controller comprises a decoder, logic circuitry and a border register, the border register being used to register the boundary of the memory array at compile time according to an instruction from the processor, and the logic circuitry producing a judgement bit for each cache line.
3. The on-chip cache structure for the variable memory access mode of a general-purpose stream processor according to claim 1, characterized in that the scratchpad memory controller comprises an arbitration logic unit, a buffer queue unit, a border input unit and an address mapping logic unit, the arbitration logic unit being used to arbitrate priority when multiple requests contend, placing requests that must wait into the buffer queue unit and handling the highest-priority request first; and the border input unit being used to indicate to the address mapping logic unit how the memory array is partitioned.
4. The on-chip cache structure for the variable memory access mode of a general-purpose stream processor according to claim 1, 2 or 3, characterized in that the memory array in the memory array unit is implemented in SRAM, and the memory array unit comprises 2 read ports and 2 write ports.
CN2011100015569A 2011-01-06 2011-01-06 On-chip cache structure used for variable memory access mode of general-purpose stream processor Expired - Fee Related CN102043723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100015569A CN102043723B (en) 2011-01-06 2011-01-06 On-chip cache structure used for variable memory access mode of general-purpose stream processor


Publications (2)

Publication Number Publication Date
CN102043723A true CN102043723A (en) 2011-05-04
CN102043723B CN102043723B (en) 2012-08-22

Family

ID=43909874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100015569A Expired - Fee Related CN102043723B (en) 2011-01-06 2011-01-06 On-chip cache structure used for variable memory access mode of general-purpose stream processor

Country Status (1)

Country Link
CN (1) CN102043723B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105009093A (en) * 2013-02-21 2015-10-28 高通股份有限公司 Inter-set wear-leveling for caches with limited write endurance
CN105263022A (en) * 2015-09-21 2016-01-20 山东大学 Multi-core hybrid storage management method for high efficiency video coding (HEVC) process
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
US10762164B2 (en) 2016-01-20 2020-09-01 Cambricon Technologies Corporation Limited Vector and matrix computing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477493A (en) * 2008-12-17 2009-07-08 康佳集团股份有限公司 Method for implementing block memory device
CN201570016U (en) * 2009-12-25 2010-09-01 东南大学 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism
CN101930357A (en) * 2010-08-17 2010-12-29 中国科学院计算技术研究所 System and method for realizing accessing operation by adopting configurable on-chip storage device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477493A (en) * 2008-12-17 2009-07-08 康佳集团股份有限公司 Method for implementing block memory device
CN201570016U (en) * 2009-12-25 2010-09-01 东南大学 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism
CN101930357A (en) * 2010-08-17 2010-12-29 中国科学院计算技术研究所 System and method for realizing accessing operation by adopting configurable on-chip storage device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wen Shuhong et al., "On-chip memory allocation in embedded multimedia applications", Acta Electronica Sinica (《电子学报》), Vol. 33, No. 11, 30 Nov. 2005. 2 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105009093A (en) * 2013-02-21 2015-10-28 高通股份有限公司 Inter-set wear-leveling for caches with limited write endurance
CN105263022A (en) * 2015-09-21 2016-01-20 山东大学 Multi-core hybrid storage management method for high efficiency video coding (HEVC) process
CN105263022B (en) * 2015-09-21 2018-03-02 山东大学 A kind of multinuclear mixing memory management method for HEVC Video codings
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
CN106990940B (en) * 2016-01-20 2020-05-22 中科寒武纪科技股份有限公司 Vector calculation device and calculation method
US10762164B2 (en) 2016-01-20 2020-09-01 Cambricon Technologies Corporation Limited Vector and matrix computing device
US11734383B2 (en) 2016-01-20 2023-08-22 Cambricon Technologies Corporation Limited Vector and matrix computing device

Also Published As

Publication number Publication date
CN102043723B (en) 2012-08-22

Similar Documents

Publication Publication Date Title
US10102179B2 (en) Multiple core computer processor with globally-accessible local memories
Seshadri et al. Simple operations in memory to reduce data movement
KR101830685B1 (en) On-chip mesh interconnect
CN104412233B (en) The distribution of aliasing register in pipeline schedule
CN105492989B (en) For managing device, system, method and the machine readable media of the gate carried out to clock
US20190228308A1 (en) Deep learning accelerator system and methods thereof
CN102043723B (en) On-chip cache structure used for variable memory access mode of general-purpose stream processor
CN100489830C (en) 64 bit stream processor chip system structure oriented to scientific computing
Wang et al. ProPRAM: Exploiting the transparent logic resources in non-volatile memory for near data computing
Tseng et al. Gullfoss: Accelerating and simplifying data movement among heterogeneous computing and storage resources
Zhang et al. Pm3: Power modeling and power management for processing-in-memory
Oukid et al. On the diversity of memory and storage technologies
Kogge et al. Yearly update: exascale projections for 2013.
CN102760106A (en) PCI (peripheral component interconnect) academic data mining chip and operation method thereof
Ye On-chip multiprocessor communication network design and analysis
Liu et al. Hippogriff: Efficiently moving data in heterogeneous computing systems
JP2012008747A (en) Integration device, memory allocation method and program
Chang et al. Guest editorial: IEEE transactions on computers special section on emerging non-volatile memory technologies: From devices to architectures and systems
US20230027351A1 (en) Temporal graph analytics on persistent memory
Ciobanu et al. Scalability study of polymorphic register files
Chunmao et al. Research of embedded operating system based on multi-core processor
Marongiu et al. Controlling NUMA effects in embedded manycore applications with lightweight nested parallelism support
Sun et al. An energy-efficient 3D stacked STT-RAM cache architecture for CMPs
Fang Architecture support for emerging memory technologies
Upadhyay et al. Design and Implementation of Cache Memory with FIFO Cache-Control

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120822

Termination date: 20130106

CF01 Termination of patent right due to non-payment of annual fee