TWI310500B - A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache - Google Patents

A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache

Info

Publication number
TWI310500B
TWI310500B TW094142814A
Authority
TW
Taiwan
Prior art keywords
data
processor
cache
deadlock
memory
Prior art date
Application number
TW094142814A
Other languages
Chinese (zh)
Other versions
TW200643727A (en)
Inventor
Shubhendu Mukherjee
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Publication of TW200643727A publication Critical patent/TW200643727A/en
Application granted granted Critical
Publication of TWI310500B publication Critical patent/TWI310500B/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Multi Processors (AREA)

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to methods, systems, and apparatus for selectively pushing I/O data into a processor's cache.

[Prior Art]

Processors in computer systems, for example those serving Internet traffic, operate on data very quickly and therefore need a steady supply of data to run efficiently. If the data a processor needs is not available in its internal cache memory, the processor may sit idle for clock cycles while the data is fetched from memory.
Some prior-art designs that try to overcome this processor inefficiency push data into a cache as soon as the data is written to memory. One problem with these designs is that if the data is not needed immediately, it may be evicted before use and must then be fetched from memory again when it is needed, wasting transfer bandwidth. Another problem is that in a multiprocessor system, such designs cannot determine which processor needs the data.

SUMMARY OF THE INVENTION

The present invention provides a method of performing a cache push, including pushing data from an input/output (I/O) device directly into a processor's cache in response to detecting incoming data, and optionally discarding the data if a deadlock occurs. The invention also provides a system for performing cache pushes and a storage medium. The advantages and other features of the invention are described in detail below through several embodiments and the accompanying drawings.

DETAILED DESCRIPTION OF THE INVENTION

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular structures, constructions, interfaces, and techniques, to provide a thorough explanation of the various aspects of the invention. It will be apparent to those skilled in the art having the benefit of this disclosure, however, that the various aspects of the invention may be practiced in other examples that depart from these specific details. In some instances, descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the invention.

Embodiments of the invention generally relate to methods, apparatus, and systems by which a computer system performs a technique referred to here as a cache push. The cache-push technique extends a single-writer write-invalidate protocol with the ability to selectively push data into another processor's cache, without changing the memory consistency model. The memory consistency model may be sequential consistency.
Sequential consistency is a memory-access ordering model that may require the result of executing programs on different processors to appear as some interleaving of the operations of those programs each executing on a single processor. For each processor, sequential consistency is satisfied if no read or write is reordered with respect to earlier reads or writes from the same processor.

Because the cache-push technique integrates seamlessly without changing the consistency model, no changes to the computer system's or machine's programming model are required. In addition, the cache push can be performed under the control of the pushing (or producer) cache or device, and can therefore be optimized to avoid wasting interconnect bandwidth. Better still, pushing data into another processor's cache at the appropriate time has the benefit of moving the data closer to the processor that needs it.

Figure 1 is a block diagram of a computer system suitable for performing a cache push in accordance with an embodiment of the invention. Computer system 100 is intended to represent a wide variety of traditional and non-traditional computing systems: servers, network switches, network routers, wireless communication subscriber units, wireless communication infrastructure elements, personal digital assistants, set-top boxes, or any other electronic device that may benefit from the teachings of the invention. In accordance with the embodiments described herein, system 100 may include one or more processors 102, 104, 106; caches 108, 110, 112; a bus 116; a memory 118; and an input/output (I/O) controller 120.

Processors 102, 104, 106 may represent any of a wide variety of control logic, including but not limited to one or more microprocessors, programmable logic devices (PLDs), programmable logic arrays (PLAs), application-specific integrated circuits (ASICs), and microcontrollers, among others, although the invention is not limited in this respect. In one embodiment,

computer system 100 may be a network server, and the processors may be one or more Intel Itanium 2 processors.

In another embodiment, computer system 100 may be, or may include, a multiprocessor component such as a chip multiprocessor or a multi-chip card module.
取被推送之資料。但是,倘若發生死結220,則在沒有空 間情況下處理器有權丟掉該資料225,因此避免了該死結 。處理器可一直讀取來自記憶體230的資料。 圖3係當圖1的計算機系統決定可選擇地丟掉資料時 ,該系統所執行之範例方法的流程圖。可重新排列操作的 順序卻不會違背本發明實施例的精神係顯而易見。 根據一實施例,圖3的方法3 00可以從計算機系統1 〇 偵測到到達快取的一外部封包而開始3 0 5。但是,該快取 (7) 1310500 可能沒有一區塊可用來儲存資料3 1 0。倘若該快取並沒有 一區塊可分配給該資料’那麼該快取首先必須找出一區塊 予以收回3 1 5。假如被收回的區塊係已使用(dirty ) /修正 ’而且快取可透過通道將該資料寫回到記憶體(如:輸出 緩衝器到記憶體互連)的該等通道目前正忙碌時,那麼該 系統發生死結3 2 0。 假如快取可透過通道將資料寫到記憶體的該等通道並 不忙碌時,那麼該資料封包被推送至該區塊34〇中而且標 註該資料已被修正過345。藉由標示該區塊已被修正,將 使該快取知道此資料尙未存入記憶體。 但是’假如快取可透過通道將資料寫到記憶體的該通 道正忙碌時’那麼一外在之確認訊息被送至I/O控制器 325。該I/O控制器可轉譯將該訊息以決定該資料封包來 自於何方3 3 0。該資料可以從任一上述之1/0裝置來發送 。所以該資料封包傳送到處理器可存取該資料的記憶體 3 3 5 ° 被推送的資料遭可選擇的丟掉,可能不常發生,至少 有下列原因。首先,假如一無效區塊已配置在快取時,那 麼處理器可單純地標示該區塊爲有效並將進來的資料放入 那裡。這並不需要已修正資料的任何寫回。第二,唯讀資 料常存於快取且可被新來的封包蓋過而不用任何的寫回。 第三,輸出緩衝器到記憶體互連可適當地調整大小以緩衝 足夠的運送封包,這可降低緩衝器被塡滿的機率,因而造 成死結。 -10- (8) 1310500 在以傳播爲主的互連中,如匯流排或環狀網路,快取 可窺探通過的資料並將該資料吸收到它自己的快取中。伸 是在一以目錄爲依據的協定裡’製作者(如1/〇裝置或處 理器之快取)可將資料導至消耗特定處理器的快取。 因此’本發明之快取推送技術可改善具有丨/ 〇裝置之 多處理器系統的效率’但不致變更記憶體—致性模式或者 浪費記憶體過多的互連頻寬。 圖4係描繪包括內容405之儲存媒體400範例之方塊 圖,其中當內容被存取時’造成一電子裝置執行本發明之 —個或多個型態且/或與方法200、300有關。就這點而言 ,儲存媒體4 0 0包括內容4 05 (如指令、資料、或任—組 合)’當執行該內容時’造成該機器執行該快取推送技術 的一個或多個型態’如上所述。該機器可讀取(儲存)媒 體300可包括但不受限於軟碟、光碟、CD-ROM,與磁光 碟、ROM、RAM、EPROM、EEPROM、磁或光卡 '快閃言己 憶體、或其他類形之適合儲存電子式指令的媒體/機器可 讀取媒體。 在下面說明中,爲了解釋與不受限的目的,將提到特 定的細節,如特別的結構、構造、介面、技術等等,以提 供本發明在不同觀點的完整解說,但是,對受益於本揭露 之技術專業人士而言,本發明的不同觀點可實施在與這些 特定細節背離的其他範例是顯而易見的。在某些情況下’ 省略眾所周知的裝置、電路,與方法之說明,並不會對本 發明的說明造成誤解。 -11 - (9) 1310500 【圖式簡單說明】 從下面附有圖式所述的較佳實施例說明得知,本發明 的各種特色將顯而易見,其中同樣的參考數字一般泛指貫 • 穿本圖式之相同部位。該等圖式不必按比例繪製,相反地 - 著重在說明本發明的原理。 圖1係計算機系統爲執行符合本發明的—實施例之快 # 取推送範例之方塊圖。 圖2係圖1計算機系統可選擇地將1/0資料推送至處 理器之快取中的執行範例方法之流程圖》 圖3係圖】計算機系統可選擇地從處理器之快取將 I / 〇資料丟掉的執行方法範例之流程圖。 圖4係包括資料內容之商用產品範例之方塊圖,其中 當裝置存取資料內容時,造成該裝置執行本發明一個或多 個實施例之一個或多個以上之型態。 【主要元件符號說明】 . 100 :計算機系統 - 102、 104、 106:處理器 10 8、110、112:快取 ]1 6 :匯流排 1 1 8 :記憶體 120、122 :輸入/輸出控制器 1 0 :計算機系統 -12 - (10)Intel Itanium 2 processor. In another embodiment, the computer system 1 or other, or a single-chip multi-processor or multi-processor component. 
A chip multiprocessor comprises multiple processing cores on a single integrated circuit, where each processing core is a core capable of executing instructions.

Caches 108, 110, 112 are coupled to processors 102, 104, 106, respectively. A processor may have internal cache memory to provide low-latency access to data. When the data or instructions a processor needs to execute are not in its internal cache, the processor attempts to read them from memory 118. Caches 108, 110, 112 are coupled to bus 116, and bus 116 is in turn coupled to memory 118.

In other embodiments, bus 116 may be a point-to-point interconnect, or a multi-drop bus. Any of a variety of well-known or otherwise available buses, interconnects, or other communication protocols may be used to allow communication with external components such as memory, other processors, I/O components, bridges, and the like.

The computer system embodiment of Figure 1 may include a plurality of processors and a plurality of caches. These processors and caches form a multiprocessor system in which the caches are kept consistent through a cache-coherence mechanism. A cache-coherence protocol may be implemented between the caches and memory to ensure that the caches remain consistent, and the coherence mechanism may include additional cache-coherence control lines within bus 116.

Memory 118 may represent any form of memory device used to store data and instructions that the processors have used or will use.
In general, the invention is not limited in this respect, but memory 118 may be comprised of dynamic random access memory (DRAM). In other embodiments, memory 118 may include semiconductor memory. In still other embodiments, memory 118 may include a magnetic storage device such as a hard disk drive. The invention, however, is not limited to the memory examples given here.

I/O controller 120 is coupled to bus 116. I/O controller 120 may represent any form of chipset or control logic that interfaces one or more I/O devices 122 with the other elements of computer system 100. I/O devices 122 may represent any form of device, peripheral, or component that provides input to, or processes output from, computer system 100. In one embodiment, although the invention is not so limited, at least one I/O device 122 is a network interface controller capable of performing direct memory access (DMA) to copy data into memory 118. In this regard, because I/O device 122 performs DMA when a TCP/IP packet is received, a software Transmission Control Protocol / Internet Protocol (TCP/IP) stack is executed by processors 102, 104, 106. In general, I/O device 122 is not limited to being a network interface controller; in other embodiments, at least one I/O device 122 may be a graphics controller or a disk controller, or another controller that benefits from the teachings of the invention.

Figure 2 is a flowchart of an example method performed by the computer system of Figure 1.
Although the following operations are described as a sequential process, many of them may in fact be performed in parallel or concurrently, as will be apparent to those skilled in the relevant art. In addition, the order of the operations may be rearranged without departing from the spirit of embodiments of the invention.

According to one embodiment, method 200 of Figure 2 begins with computer system 100 detecting a packet (205) being written to memory 118 through an I/O device 122. In one embodiment, the I/O device sends a communication to computer system 100 indicating the details of the packet. In another embodiment, computer system 100 detects the packet by monitoring inbound writes to memory.

Next, an interrupt (210) is sent to a particular processor to notify it to access the data. At this point the processor does not know whether the data packet is in its cache or at the I/O device. Meanwhile, in the shadow of the interrupt, I/O device 122 may push the packet directly into that processor's cache (215). By pushing the packet directly into the cache, computer system 100 avoids the extra latency the processor would otherwise incur reading the packet from the I/O device. This is particularly important when the computer system acts as a high-performance router, reading packets from one network interface and redirecting them to another.

Once the external packet arrives at the processor's cache, the push may trigger a deadlock (220); the deadlock condition is described further below. If no deadlock occurs, the hardware selectively pushes the data into the cache, and the processor can then access the pushed data directly from its cache (235). If a deadlock does occur, however, the processor is free to drop the data (225) when there is no room for it, thereby avoiding the deadlock.
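The push-or-drop decision of method 200 can be made concrete with a small sketch. The class and function names below are my own, not the patent's, and the cache model is deliberately minimal: the key invariant it demonstrates is that the DMA write to memory always happens first, so dropping the push is always safe — the consumer simply pays a memory-latency read instead of a cache hit.

```python
class ToyCache:
    """Minimal stand-in for a processor cache receiving pushed I/O data."""
    def __init__(self, num_blocks):
        self.blocks = {}              # addr -> data
        self.capacity = num_blocks
        self.writeback_busy = False   # models a busy output buffer to memory

    def can_accept(self, addr):
        # Accept if the line is present, there is a free block, or eviction
        # would not need the (busy) write-back path. A real cache would also
        # track dirty bits; this sketch conflates eviction with write-back.
        return (addr in self.blocks
                or len(self.blocks) < self.capacity
                or not self.writeback_busy)

    def fill(self, addr, data):
        if len(self.blocks) >= self.capacity and addr not in self.blocks:
            self.blocks.pop(next(iter(self.blocks)))  # evict some victim
        self.blocks[addr] = data


def deliver_packet(cache, memory, addr, data):
    memory[addr] = data              # DMA: packet always lands in memory (205)
    if cache.can_accept(addr):       # no deadlock risk: push into cache (215)
        cache.fill(addr, data)
        return "cache"               # later access is a cache hit (235)
    return "memory"                  # drop the push (225); read memory (230)


mem = {}
c = ToyCache(num_blocks=1)
assert deliver_packet(c, mem, 0x10, b"pkt1") == "cache"
c.writeback_busy = True              # busy write-back channel: push would deadlock
assert deliver_packet(c, mem, 0x20, b"pkt2") == "memory"
assert mem[0x20] == b"pkt2"          # the dropped packet is still safe in memory
```

The point of the fallback return value is that correctness never depends on the push succeeding; the push is purely a latency optimization.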
The processor can always read the data from memory (230).

Figure 3 is a flowchart of an example method performed by the computer system of Figure 1 when it decides to selectively drop data. It will be apparent that the order of the operations may be rearranged without departing from the spirit of embodiments of the invention.

According to one embodiment, method 300 of Figure 3 begins with computer system 100 detecting an external packet arriving at a cache (305). The cache, however, may not have a block available in which to store the data (310). If the cache has no block it can allocate to the data, it must first find a block to evict (315). If the block to be evicted is dirty (modified), and the channels through which the cache can write that data back to memory (e.g., the output buffer to the memory interconnect) are currently busy, the system would deadlock (320).

If the channels through which the cache can write data back to memory are not busy, the data packet is pushed into the block (340) and the block is marked modified (345). Marking the block modified lets the cache know that this data has not yet been stored to memory.

If, however, the channel through which the cache can write data back to memory is busy, an external acknowledgement message is sent to the I/O controller (325). The I/O controller can translate the message to determine where the data packet came from (330); the data may have been sent from any of the I/O devices described above. The data packet is then delivered to memory, where the processor can access it (335).

Selectively dropping pushed data should happen infrequently, for at least the following reasons. First, if an invalid block is already available in the cache, the processor can simply mark that block valid and place the incoming data there, which requires no write-back of modified data.
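The eviction decision at the heart of method 300 can be sketched as follows. Again, the function name, the fixed capacity, and the first-block victim policy are illustrative assumptions of mine, not details from the patent; a real cache would prefer an invalid or clean victim when one exists.

```python
def try_push(cache, addr, data, writeback_busy):
    """Decide the fate of a pushed packet, roughly following method 300.

    cache maps addr -> (data, dirty_flag). Returns 'placed' (340/345),
    'placed_after_writeback', or 'nack' (325/335, packet goes to memory).
    """
    CAPACITY = 2
    if addr in cache or len(cache) < CAPACITY:    # a block is available (310)
        cache[addr] = (data, True)                # place and mark modified (345)
        return "placed"
    victim = next(iter(cache))                    # pick a block to evict (315)
    _, dirty = cache[victim]
    if dirty and writeback_busy:                  # write-back path busy: would
        return "nack"                             # deadlock (320), so refuse
    # If the victim is dirty and the channel is free, it would be written
    # back to memory here before being replaced.
    del cache[victim]
    cache[addr] = (data, True)
    return "placed_after_writeback" if dirty else "placed"


c = {0x0: (b"a", True), 0x4: (b"b", True)}        # full, all blocks dirty
assert try_push(c, 0x8, b"pkt", writeback_busy=True) == "nack"
c = {0x0: (b"a", False), 0x4: (b"b", False)}      # full, clean victim available
assert try_push(c, 0x8, b"pkt", writeback_busy=True) == "placed"
```

Note how the "nack" branch is the only one taken when both conditions — dirty victim and busy write-back channel — hold at once, matching the text's claim that drops should be rare.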
Second, read-only data often resides in the cache and can be overwritten by a newly arriving packet without any write-back. Third, the output buffers to the memory interconnect can be sized appropriately to buffer enough in-flight packets, which lowers the probability that the buffers fill up and cause a deadlock.

In a broadcast-based interconnect, such as a bus or a ring network, a cache can snoop the data passing by and absorb it into itself. In a directory-based protocol, by contrast, the producer (such as an I/O device or a processor's cache) can direct the data to the cache of the specific consuming processor.

Thus, the cache-push technique of the invention can improve the efficiency of a multiprocessor system with I/O devices without changing the memory consistency model or wasting excessive interconnect bandwidth.

Figure 4 is a block diagram depicting an example of a storage medium 400 that includes content 405 which, when accessed, causes an electronic device to perform one or more aspects of the invention and/or the methods 200, 300. In this regard, storage medium 400 includes content 405 (e.g., instructions, data, or any combination thereof) that, when executed, causes the machine to perform one or more aspects of the cache-push technique described above. The machine-readable (storage) medium 300 may include, but is not limited to, floppy disks, optical disks, CD-ROMs, magneto-optical disks, ROM, RAM, EPROM, EEPROM, magnetic or optical cards, flash memory, or other types of media / machine-readable media suitable for storing electronic instructions.
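The sequential-consistency condition quoted earlier — that for each processor, no read or write is reordered with respect to earlier reads or writes from the same processor — can be expressed as a small checker. This is an illustrative sketch with names of my own choosing; it only tests the per-processor program-order condition the description states, not the full read-your-latest-write requirement of sequential consistency.

```python
def respects_program_order(global_order, program_orders):
    """Check that a proposed global order of memory operations keeps every
    processor's own reads/writes in the order that processor issued them.

    global_order: list of (cpu, op) pairs; program_orders: cpu -> [op, ...].
    """
    seen = {cpu: 0 for cpu in program_orders}      # next expected index per CPU
    for cpu, op in global_order:
        expected = program_orders[cpu][seen[cpu]]  # op this CPU must issue next
        if op != expected:
            return False                           # reordered w.r.t. program order
        seen[cpu] += 1
    # Every operation of every processor must appear exactly once.
    return all(seen[cpu] == len(ops) for cpu, ops in program_orders.items())


# Two CPUs with two operations each: one legal interleaving, one illegal.
p = {"P0": ["w_x", "r_y"], "P1": ["w_y", "r_x"]}
assert respects_program_order(
    [("P0", "w_x"), ("P1", "w_y"), ("P1", "r_x"), ("P0", "r_y")], p)
assert not respects_program_order(
    [("P0", "r_y"), ("P0", "w_x"), ("P1", "w_y"), ("P1", "r_x")], p)
```

The relevance to the invention is that a pushed cache fill does not appear in any processor's program order at all, which is why the push can be added — or dropped — without disturbing this condition.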
[Brief Description of the Drawings]

The various features of the invention will be apparent from the following description of preferred embodiments taken together with the drawings, in which like reference numerals generally refer to the same parts throughout. The drawings are not necessarily to scale; emphasis is instead placed on illustrating the principles of the invention.

Figure 1 is a block diagram of a computer system for performing a cache push in accordance with an embodiment of the invention.
Figure 2 is a flowchart of an example method by which the computer system of Figure 1 selectively pushes I/O data into a processor's cache.
Figure 3 is a flowchart of an example method by which the computer system of Figure 1 selectively drops I/O data from a processor's cache.
Figure 4 is a block diagram of an example article including data content which, when accessed by a device, causes the device to perform one or more aspects of one or more embodiments of the invention.

[Description of Reference Numerals]

100: computer system; 102, 104, 106: processors; 108, 110, 112: caches; 116: bus; 118: memory; 120: input/output controller; 122: input/output device

300: machine-readable (storage) medium; 400: storage medium; 405: content


Claims (1)

X. Claims

(Annex 4A: replacement claims for Patent Application No. 094142814, amended January 15, 2009)

1. A method for selectively pushing I/O data into a processor's cache, comprising:
pushing data from an input/output (I/O) device directly into a cache of a processor in response to detection of incoming data; and
optionally discarding the data if a deadlock occurs, the deadlock being a state in which the cache's write-back channel is busy.

2. The method of claim 1, further comprising: detecting data to be written to memory; and sending an interrupt.

3. The method of claim 2, wherein the data is detected by monitoring inbound writes to the memory.

4. The method of claim 2, wherein the data is detected by sending details of the data to a computer system.

5. The method of claim 2, further comprising: detecting the deadlock.

6. The method of claim 5, further comprising: if no deadlock is detected, proceeding to push the data into the cache of the processor.

7. The method of claim 6, further comprising: accessing the pushed data directly from the cache of the processor.

8. The method of claim 5, further comprising: if a deadlock is detected, sending a message to a device regarding discarding the data; and translating the message.

9.
The method of claim 8, further comprising: if a deadlock is detected, reading the data from memory.

10. The method of claim 8, further comprising: if a deadlock is detected, accessing the data from the I/O device.

11. A system for selectively pushing I/O data into a processor's cache, comprising:
an input/output (I/O) device;
at least one processor coupled to at least one cache; and
a memory device coupled to the input/output (I/O) device and the at least one processor to store data;
wherein the input/output (I/O) device pushes data directly into the cache of the processor in response to detection of incoming data, and, if a deadlock occurs, the at least one processor optionally discards the data, the deadlock being a state in which the cache's write-back channel is busy.

12. The system of claim 11, wherein the I/O device is a network controller.

13. The system of claim 12, wherein the network controller performs direct memory access (DMA) operations.

14. The system of claim 11, wherein the I/O device sends an interrupt to the processor to notify the processor to access the data.

15. The system of claim 14, wherein the system detects a deadlock.

16. The system of claim 15, wherein, if no deadlock is detected, the I/O device pushes the data into the cache of the processor.
17. The system of claim 16, wherein the processor accesses the data directly from its cache.

18. The system of claim 15, wherein, if a deadlock is detected, the processor reads the data from the memory device.

19. A computer-readable storage medium having stored thereon instructions for selectively pushing I/O data into a processor's cache, the instructions, when executed by a machine, causing the machine to push content directly into the cache of the processor in response to a memory read by the processor and, if a deadlock occurs, to optionally discard the data, the deadlock being a state in which the cache's write-back channel is busy.

20. The computer-readable storage medium of claim 19, further comprising instructions that, when executed by the machine, cause the machine to send an interrupt to the processor.

21. The storage medium of claim 20, further comprising instructions that, when executed by the machine, cause the machine to detect a deadlock.

22. The computer-readable storage medium of claim 21, further comprising instructions that, when executed by the machine, cause the machine, if no deadlock is detected, to push the content directly into a cache of the processor.
TW094142814A 2004-12-06 2005-12-05 A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache TWI310500B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/005,970 US7574568B2 (en) 2004-12-06 2004-12-06 Optionally pushing I/O data into a processor's cache

Publications (2)

Publication Number Publication Date
TW200643727A TW200643727A (en) 2006-12-16
TWI310500B true TWI310500B (en) 2009-06-01

Family

ID=36575728

Family Applications (1)

Application Number Title Priority Date Filing Date
TW094142814A TWI310500B (en) 2004-12-06 2005-12-05 A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache

Country Status (4)

Country Link
US (1) US7574568B2 (en)
CN (1) CN101099137B (en)
TW (1) TWI310500B (en)
WO (1) WO2006062837A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060192653A1 (en) * 2005-02-18 2006-08-31 Paul Atkinson Device and method for selectively controlling the utility of an integrated circuit device
US7930459B2 (en) * 2007-09-28 2011-04-19 Intel Corporation Coherent input output device
GB0722707D0 (en) * 2007-11-19 2007-12-27 St Microelectronics Res & Dev Cache memory
GB2454809B (en) * 2007-11-19 2012-12-19 St Microelectronics Res & Dev Cache memory system
US8683484B2 (en) * 2009-07-23 2014-03-25 Novell, Inc. Intelligently pre-placing data for local consumption by workloads in a virtual computing environment
US20150363312A1 (en) * 2014-06-12 2015-12-17 Samsung Electronics Co., Ltd. Electronic system with memory control mechanism and method of operation thereof
US10366027B2 (en) 2017-11-29 2019-07-30 Advanced Micro Devices, Inc. I/O writes with cache steering
CN116303135B (en) * 2023-02-24 2024-03-22 格兰菲智能科技有限公司 Task data loading method and device and computer equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5551005A (en) * 1994-02-25 1996-08-27 Intel Corporation Apparatus and method of handling race conditions in mesi-based multiprocessor system with private caches
US5915129A (en) * 1994-06-27 1999-06-22 Microsoft Corporation Method and system for storing uncompressed data in a memory cache that is destined for a compressed file system
US5652846A (en) * 1995-07-03 1997-07-29 Compaq Computer Corporation Bus deadlock prevention circuit for use with second level cache controller
US6438660B1 (en) * 1997-12-09 2002-08-20 Intel Corporation Method and apparatus for collapsing writebacks to a memory for resource efficiency
US6434673B1 (en) * 2000-06-30 2002-08-13 Intel Corporation Optimized configurable scheme for demand based resource sharing of request queues in a cache controller
US6760809B2 (en) * 2001-06-21 2004-07-06 International Business Machines Corporation Non-uniform memory access (NUMA) data processing system having remote memory cache incorporated within system memory
US6842822B2 (en) * 2002-04-05 2005-01-11 Freescale Semiconductor, Inc. System and method for cache external writing
US6711650B1 (en) * 2002-11-07 2004-03-23 International Business Machines Corporation Method and apparatus for accelerating input/output processing using cache injections
US20040199727A1 (en) * 2003-04-02 2004-10-07 Narad Charles E. Cache allocation

Also Published As

Publication number Publication date
US7574568B2 (en) 2009-08-11
CN101099137A (en) 2008-01-02
CN101099137B (en) 2011-12-07
TW200643727A (en) 2006-12-16
US20060123195A1 (en) 2006-06-08
WO2006062837A1 (en) 2006-06-15

Similar Documents

Publication Publication Date Title
TWI310500B (en) A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache
TWI519958B (en) Method and apparatus for memory allocation in a multi-node system
JP5575870B2 (en) Satisfaction of memory ordering requirements between partial read and non-snoop access
CN101430664B (en) Multiprocessor system and Cache consistency message transmission method
US6345352B1 (en) Method and system for supporting multiprocessor TLB-purge instructions using directed write transactions
EP1615138A2 (en) Multiprocessor chip having bidirectional ring interconnect
TW201011536A (en) Optimizing concurrent accesses in a directory-based coherency protocol
US20080028103A1 (en) Memory-mapped buffers for network interface controllers
TW201543218A (en) Chip device and method for multi-core network processor interconnect with multi-node connection
TW542958B (en) A method and apparatus for pipelining ordered input/output transactions to coherent memory in a distributed memory, cache coherent, multi-processor system
TWI541649B (en) System and method of inter-chip interconnect protocol for a multi-chip system
JP5050009B2 (en) Dynamic update of route table
JP2005189928A (en) Multi-processor system, consistency controller for same system, and method for controlling consistency
TW201543360A (en) Method and system for ordering I/O access in a multi-node environment
JP2006260159A (en) Information processing apparatus, and data control method for information processing apparatus
EP1508100B1 (en) Inter-chip processor control plane
US20050198438A1 (en) Shared-memory multiprocessor
JP7419261B2 (en) Data processing network using flow compression for streaming data transfer
TWI324755B (en) Processing modules with multilevel cache architecture
Zhao et al. Hardware support for accelerating data movement in server platform
CN114356839B (en) Method, device, processor and device readable storage medium for processing write operation
Chung et al. PANDA: ring-based multiprocessor system using new snooping protocol
CN116414563A (en) Memory control device, cache consistency system and cache consistency method
US8732351B1 (en) System and method for packet splitting
JP3764015B2 (en) Memory access method and multiprocessor system

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees