WO2019118037A1 - Home agent based cache transfer acceleration scheme - Google Patents

Home agent based cache transfer acceleration scheme

Info

Publication number
WO2019118037A1
Authority
WO
WIPO (PCT)
Prior art keywords
probe
cache
early
entry
processing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2018/051756
Other languages
English (en)
French (fr)
Inventor
Amit P. Apte
Ganesh Balakrishnan
Vydhyanathan Kalyanasundharam
Kevin M. Lepak
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Micro Devices Inc
Original Assignee
Advanced Micro Devices Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-12-15
Filing date
2018-09-19
Publication date
2019-06-20
Application filed by Advanced Micro Devices Inc
Priority to EP21202235.4A (EP3961409B1)
Priority to EP18783262.1A (EP3724772B1)
Priority to JP2020532672A (JP6975335B2)
Priority to CN201880088010.1A (CN111656332B)
Priority to KR1020207020385A (KR102383040B1)
Publication of WO2019118037A1
Anticipated expiration
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/082 Associative directories
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/12 Replacement control
    • G06F12/121 Replacement control using replacement algorithms
    • G06F12/128 Replacement control using replacement algorithms adapted to multidimensional cache systems, e.g. set-associative, multicache, multiset or multilevel
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/0824 Distributed directories, e.g. linked lists of caches
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/0826 Limited pointers directories; State-only directories without pointers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0833 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means in combination with broadcast means (e.g. for invalidation or updating)
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0891 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using clearing, invalidating or resetting means
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G06F2212/1024 Latency reduction
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/50 Control mechanisms for virtual memory, cache or TLB
    • G06F2212/507 Control mechanisms for virtual memory, cache or TLB using speculative control
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/62 Details of cache specific to multiprocessor cache arrangements

Definitions

  • each CPU 305A-N is coupled to a pair of coherent slaves via a corresponding coherent master 315A-N and bus/fabric 318.
  • CPU 305A is coupled through coherent master 315A and bus/fabric 318 to coherent slaves 320A-B.
  • Coherent slave (CS) 320A is coupled to memory controller (MC) 330A and coherent slave 320B is coupled to memory controller 330B.
  • Coherent slave 320A is coupled to probe filter (PF) 325A, with probe filter 325A including entries for memory regions that have cache lines cached in system 300 for the memory accessible through memory controller 330A.
  • PF: probe filter
  • the corresponding demand probe generated after the lookup of probe filter 415 can be handled in different manners depending on the embodiment.
  • a demand probe is not launched if the early probe is to the correct target.
  • the early probe will cause the data to be retrieved from the target and returned to the requesting node.
  • a demand probe is sent to the correct target.
  • the early probe pulls data out of the cache subsystem of the target and then the data is stored in a temporary buffer. This data can be dropped if a timer expires before the demand probe arrives.
  • a demand probe is launched after the early probe, and the demand probe forwards the data pulled out of the cache subsystem to the requesting node.
  • FIG. 5 shows one embodiment of a method 500 for implementing an early probe mechanism.
  • the steps in this embodiment and those of FIG. 6 are shown in sequential order. However, it is noted that in various embodiments of the described methods, one or more of the elements described are performed concurrently, in a different order than shown, or are omitted entirely. Other additional elements are also performed as desired. Any of the various systems or apparatuses described herein are configured to implement method 500.
  • the region targeted by the memory request is determined (block 615).
  • a new entry is allocated in the early probe cache for the region of the memory request (block 620). Any suitable eviction algorithm can be utilized to determine which entry to evict to make space for the new entry.
  • the confidence indicator field of the new entry is set to a default value and the LRU field of the new entry is initialized (block 625).
  • An ID of the node targeted by the demand probe is stored in the region owner field of the new entry in the early probe cache (block 630).
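The early probe cache fields named above (region owner, confidence indicator, LRU) and the allocation steps of blocks 615 to 630 can be pictured with a minimal Python sketch. It is illustrative only and is not taken from the patent: the region size (REGION_BITS), the default confidence value, the use of an OrderedDict to stand in for the LRU field, least-recently-used eviction as the "suitable eviction algorithm", and every class and function name are assumptions. The helper needs_demand_probe models just one of the embodiments listed, the one in which no demand probe is launched when the early probe went to the correct target.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

REGION_BITS = 12        # assumption: 4 KiB regions; the source gives no region size
DEFAULT_CONFIDENCE = 1  # assumption: the source only says "a default value"


@dataclass
class Entry:
    region_owner: int   # ID of the node targeted by the demand probe
    confidence: int     # confidence indicator field


class EarlyProbeCache:
    """Hypothetical model of an early probe cache keyed by memory region."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # Dictionary order stands in for the LRU field: oldest entry first.
        self.entries: "OrderedDict[int, Entry]" = OrderedDict()

    @staticmethod
    def region_of(address: int) -> int:
        # Determine the region targeted by the memory request (block 615).
        return address >> REGION_BITS

    def predict_owner(self, address: int) -> Optional[int]:
        """Return the predicted early probe target, or None on a miss."""
        region = self.region_of(address)
        entry = self.entries.get(region)
        if entry is None:
            return None
        self.entries.move_to_end(region)  # mark as most recently used
        return entry.region_owner

    def allocate(self, address: int, demand_target: int) -> Entry:
        """Allocate an entry for the request's region (blocks 620 to 630)."""
        region = self.region_of(address)
        if region not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        entry = Entry(region_owner=demand_target,     # block 630
                      confidence=DEFAULT_CONFIDENCE)  # block 625
        self.entries[region] = entry
        self.entries.move_to_end(region)              # initialize LRU position
        return entry


def needs_demand_probe(early_target: Optional[int], actual_owner: int) -> bool:
    # One embodiment only: skip the demand probe when the early probe
    # already went to the correct target.
    return early_target != actual_owner
```

In this sketch a miss in predict_owner means no early probe is sent, and the later demand probe's target is recorded with allocate so that the next request to the same region can be predicted.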

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
PCT/US2018/051756 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme Ceased WO2019118037A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP21202235.4A EP3961409B1 (en) 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme
EP18783262.1A EP3724772B1 (en) 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme
JP2020532672A JP6975335B2 (ja) 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme
CN201880088010.1A CN111656332B (zh) 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme
KR1020207020385A KR102383040B1 (ko) 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/844,215 US10776282B2 (en) 2017-12-15 2017-12-15 Home agent based cache transfer acceleration scheme
US15/844,215 2017-12-15

Publications (1)

Publication Number Publication Date
WO2019118037A1 (en) 2019-06-20

Family

ID=63794694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/051756 Ceased WO2019118037A1 (en) 2017-12-15 2018-09-19 Home agent based cache transfer acceleration scheme

Country Status (6)

Country Link
US (2) US10776282B2
EP (2) EP3724772B1
JP (1) JP6975335B2
KR (1) KR102383040B1
CN (1) CN111656332B
WO (1) WO2019118037A1

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776282B2 (en) 2017-12-15 2020-09-15 Advanced Micro Devices, Inc. Home agent based cache transfer acceleration scheme
US11163688B2 (en) * 2019-09-24 2021-11-02 Advanced Micro Devices, Inc. System probe aware last level cache insertion bypassing
US11210248B2 (en) * 2019-12-20 2021-12-28 Advanced Micro Devices, Inc. System direct memory access engine offload
US11874783B2 (en) * 2021-12-21 2024-01-16 Advanced Micro Devices, Inc. Coherent block read fulfillment
US20250240156A1 (en) * 2022-12-23 2025-07-24 Advanced Micro Devices, Inc. Systems and methods relating to confidential computing key mixing hazard management
CN117651021B (zh) * 2024-01-25 2024-04-30 苏州萨沙迈半导体有限公司 Filter, control method and apparatus thereof, and electrical device
US20260086950A1 (en) * 2024-09-25 2026-03-26 Advanced Micro Devices, Inc. Systems and methods for region-based probe filter shootdown

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327616A1 (en) * 2008-06-30 2009-12-31 Patrick Conway Snoop filtering mechanism
CN104331377A (zh) * 2014-11-12 2015-02-04 Inspur (Beijing) Electronic Information Industry Co., Ltd. Directory cache management method for a multi-core processor system
US20160117250A1 (en) * 2014-10-22 2016-04-28 Imagination Technologies Limited Apparatus and Method of Throttling Hardware Pre-fetch
US20170177484A1 (en) * 2015-12-22 2017-06-22 Advanced Micro Devices, Inc. Region probe filter for distributed memory system

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6055610A (en) * 1997-08-25 2000-04-25 Hewlett-Packard Company Distributed memory multiprocessor computer system with directory based cache coherency with ambiguous mapping of cached data to main-memory locations
US6631401B1 (en) 1998-12-21 2003-10-07 Advanced Micro Devices, Inc. Flexible probe/probe response routing for maintaining coherency
US7234029B2 (en) * 2000-12-28 2007-06-19 Intel Corporation Method and apparatus for reducing memory latency in a cache coherent multi-node architecture
US7395375B2 (en) * 2004-11-08 2008-07-01 International Business Machines Corporation Prefetch miss indicator for cache coherence directory misses on external caches
WO2007025112A1 (en) 2005-08-23 2007-03-01 Advanced Micro Devices, Inc. Method for proactive synchronization within a computer system
WO2010052799A1 (ja) * 2008-11-10 2010-05-14 Fujitsu Limited Information processing device and memory control device
US9081706B2 (en) * 2012-05-10 2015-07-14 Oracle International Corporation Using a shared last-level TLB to reduce address-translation latency
US9405687B2 (en) * 2013-11-04 2016-08-02 Intel Corporation Method, apparatus and system for handling cache misses in a processor
US9639470B2 (en) * 2014-08-26 2017-05-02 Arm Limited Coherency checking of invalidate transactions caused by snoop filter eviction in an integrated circuit
US11237965B2 (en) * 2014-12-31 2022-02-01 Arteris, Inc. Configurable snoop filters for cache coherent systems
US9817760B2 (en) * 2016-03-07 2017-11-14 Qualcomm Incorporated Self-healing coarse-grained snoop filter
US11061572B2 (en) * 2016-04-22 2021-07-13 Advanced Micro Devices, Inc. Memory object tagged memory monitoring method and system
US10776282B2 (en) 2017-12-15 2020-09-15 Advanced Micro Devices, Inc. Home agent based cache transfer acceleration scheme

Also Published As

Publication number Publication date
US10776282B2 (en) 2020-09-15
US20210064545A1 (en) 2021-03-04
US20190188155A1 (en) 2019-06-20
CN111656332B (zh) 2024-08-27
EP3961409B1 (en) 2024-04-10
CN111656332A (zh) 2020-09-11
JP6975335B2 (ja) 2021-12-01
KR20200096975A (ko) 2020-08-14
EP3961409A1 (en) 2022-03-02
KR102383040B1 (ko) 2022-04-08
JP2021507371A (ja) 2021-02-22
EP3724772A1 (en) 2020-10-21
EP3724772B1 (en) 2021-10-27
US11782848B2 (en) 2023-10-10

Similar Documents

Publication Publication Date Title
US11809322B2 (en) Region based directory scheme to adapt to large cache sizes
US11782848B2 (en) Home agent based cache transfer acceleration scheme
JP7036988B2 (ja) Accelerating access to private regions in a region-based cache directory scheme
EP4035015B1 (en) System probe aware last level cache insertion bypassing
US10503648B2 (en) Cache to cache data transfer acceleration techniques
US12332795B2 (en) Reducing probe filter accesses for processing in memory requests
US10545875B2 (en) Tag accelerator for low latency DRAM cache

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18783262

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020532672

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20207020385

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018783262

Country of ref document: EP

Effective date: 20200715