TWI310500B - A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache - Google Patents

A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache

Info

Publication number
TWI310500B
TWI310500B TW094142814A
Authority
TW
Taiwan
Prior art keywords
data
processor
cache
deadlock
memory
Prior art date
Application number
TW094142814A
Other languages
Chinese (zh)
Other versions
TW200643727A (en)
Inventor
Shubhendu Mukherjee
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Publication of TW200643727A publication Critical patent/TW200643727A/en
Application granted granted Critical
Publication of TWI310500B publication Critical patent/TWI310500B/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F12/0835Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means for main memory peripheral accesses (e.g. I/O or DMA)
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Multi Processors (AREA)

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to methods, systems, and apparatus for selectively pushing I/O data into a processor's cache.

[Prior Art]

Processors in computer systems, for example those serving Internet traffic, operate on data very quickly and therefore need a steady supply of data to run efficiently. If the data a processor needs is not available in its internal cache memory, the processor may sit idle for clock cycles while the data is fetched from memory.
Some prior-art designs that try to overcome this processor inefficiency push data into a cache as soon as the data is written to memory. One problem with these designs is that if the data is not needed immediately, it may be evicted before use and must then be fetched from memory again when it is needed, wasting transfer bandwidth. Another problem is that in a multiprocessor system, such designs cannot determine which processor needs the data.

SUMMARY OF THE INVENTION

The present invention provides a method of performing a cache push, including pushing data from an input/output (I/O) device directly into a processor's cache in response to detecting incoming data, and optionally discarding the data if a deadlock occurs. The invention also provides a system for performing cache pushes and a storage medium. The advantages and other features of the invention are described in detail below through several embodiments and the accompanying drawings.

DETAILED DESCRIPTION OF THE INVENTION

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular structures, constructions, interfaces, and techniques, to provide a thorough explanation of the various aspects of the invention. It will be apparent to those skilled in the art having the benefit of this disclosure, however, that the various aspects of the invention may be practiced in other examples that depart from these specific details. In some instances, descriptions of well-known devices, circuits, and methods are omitted so as not to obscure the description of the invention.

Embodiments of the invention generally relate to methods, apparatus, and systems by which a computer system performs a technique referred to here as a cache push. The cache-push technique extends a single-writer write-invalidate protocol with the ability to selectively push data into another processor's cache, without changing the memory consistency model. The memory consistency model may be sequential consistency.
Sequential consistency is a memory-access ordering model that may require the result of executing programs on different processors to appear as some interleaving of the operations of those programs each executing on a single processor. For each processor, sequential consistency is satisfied if no read or write is reordered with respect to earlier reads or writes from the same processor.

Because the cache-push technique integrates seamlessly without changing the consistency model, no changes to the computer system's or machine's programming model are required. In addition, the cache push can be performed under the control of the pushing (or producer) cache or device, and can therefore be optimized to avoid wasting interconnect bandwidth. Better still, pushing data into another processor's cache at the appropriate time has the benefit of moving the data closer to the processor that needs it.

Figure 1 is a block diagram of a computer system suitable for performing a cache push in accordance with an embodiment of the invention. Computer system 100 is intended to represent a wide variety of traditional and non-traditional computing systems: servers, network switches, network routers, wireless communication subscriber units, wireless communication infrastructure elements, personal digital assistants, set-top boxes, or any other electronic device that may benefit from the teachings of the invention. In accordance with the embodiments described herein, system 100 may include one or more processors 102, 104, 106; caches 108, 110, 112; a bus 116; a memory 118; and an input/output (I/O) controller 120.

Processors 102, 104, 106 may represent any of a wide variety of control logic, including but not limited to one or more microprocessors, programmable logic devices (PLDs), programmable logic arrays (PLAs), application-specific integrated circuits (ASICs), and microcontrollers, among others, although the invention is not limited in this respect. In one embodiment,

computer system 100 may be a network server, and the processors may be one or more Intel Itanium 2 processors.

In another embodiment, computer system 100 may be, or may include, a multiprocessor component such as a chip multiprocessor or a multi-chip card module.
取被推送之資料。但是,倘若發生死結220,則在沒有空 間情況下處理器有權丟掉該資料225,因此避免了該死結 。處理器可一直讀取來自記憶體230的資料。 圖3係當圖1的計算機系統決定可選擇地丟掉資料時 ,該系統所執行之範例方法的流程圖。可重新排列操作的 順序卻不會違背本發明實施例的精神係顯而易見。 根據一實施例,圖3的方法3 00可以從計算機系統1 〇 偵測到到達快取的一外部封包而開始3 0 5。但是,該快取 (7) 1310500 可能沒有一區塊可用來儲存資料3 1 0。倘若該快取並沒有 一區塊可分配給該資料’那麼該快取首先必須找出一區塊 予以收回3 1 5。假如被收回的區塊係已使用(dirty ) /修正 ’而且快取可透過通道將該資料寫回到記憶體(如:輸出 緩衝器到記憶體互連)的該等通道目前正忙碌時,那麼該 系統發生死結3 2 0。 假如快取可透過通道將資料寫到記憶體的該等通道並 不忙碌時,那麼該資料封包被推送至該區塊34〇中而且標 註該資料已被修正過345。藉由標示該區塊已被修正,將 使該快取知道此資料尙未存入記憶體。 但是’假如快取可透過通道將資料寫到記憶體的該通 道正忙碌時’那麼一外在之確認訊息被送至I/O控制器 325。該I/O控制器可轉譯將該訊息以決定該資料封包來 自於何方3 3 0。該資料可以從任一上述之1/0裝置來發送 。所以該資料封包傳送到處理器可存取該資料的記憶體 3 3 5 ° 被推送的資料遭可選擇的丟掉,可能不常發生,至少 有下列原因。首先,假如一無效區塊已配置在快取時,那 麼處理器可單純地標示該區塊爲有效並將進來的資料放入 那裡。這並不需要已修正資料的任何寫回。第二,唯讀資 料常存於快取且可被新來的封包蓋過而不用任何的寫回。 第三,輸出緩衝器到記憶體互連可適當地調整大小以緩衝 足夠的運送封包,這可降低緩衝器被塡滿的機率,因而造 成死結。 -10- (8) 1310500 在以傳播爲主的互連中,如匯流排或環狀網路,快取 可窺探通過的資料並將該資料吸收到它自己的快取中。伸 是在一以目錄爲依據的協定裡’製作者(如1/〇裝置或處 理器之快取)可將資料導至消耗特定處理器的快取。 因此’本發明之快取推送技術可改善具有丨/ 〇裝置之 多處理器系統的效率’但不致變更記憶體—致性模式或者 浪費記憶體過多的互連頻寬。 圖4係描繪包括內容405之儲存媒體400範例之方塊 圖,其中當內容被存取時’造成一電子裝置執行本發明之 —個或多個型態且/或與方法200、300有關。就這點而言 ,儲存媒體4 0 0包括內容4 05 (如指令、資料、或任—組 合)’當執行該內容時’造成該機器執行該快取推送技術 的一個或多個型態’如上所述。該機器可讀取(儲存)媒 體300可包括但不受限於軟碟、光碟、CD-ROM,與磁光 碟、ROM、RAM、EPROM、EEPROM、磁或光卡 '快閃言己 憶體、或其他類形之適合儲存電子式指令的媒體/機器可 讀取媒體。 在下面說明中,爲了解釋與不受限的目的,將提到特 定的細節,如特別的結構、構造、介面、技術等等,以提 供本發明在不同觀點的完整解說,但是,對受益於本揭露 之技術專業人士而言,本發明的不同觀點可實施在與這些 特定細節背離的其他範例是顯而易見的。在某些情況下’ 省略眾所周知的裝置、電路,與方法之說明,並不會對本 發明的說明造成誤解。 -11 - (9) 1310500 【圖式簡單說明】 從下面附有圖式所述的較佳實施例說明得知,本發明 的各種特色將顯而易見,其中同樣的參考數字一般泛指貫 • 穿本圖式之相同部位。該等圖式不必按比例繪製,相反地 - 著重在說明本發明的原理。 圖1係計算機系統爲執行符合本發明的—實施例之快 # 取推送範例之方塊圖。 圖2係圖1計算機系統可選擇地將1/0資料推送至處 理器之快取中的執行範例方法之流程圖》 圖3係圖】計算機系統可選擇地從處理器之快取將 I / 〇資料丟掉的執行方法範例之流程圖。 圖4係包括資料內容之商用產品範例之方塊圖,其中 當裝置存取資料內容時,造成該裝置執行本發明一個或多 個實施例之一個或多個以上之型態。 【主要元件符號說明】 . 100 :計算機系統 - 102、 104、 106:處理器 10 8、110、112:快取 ]1 6 :匯流排 1 1 8 :記憶體 120、122 :輸入/輸出控制器 1 0 :計算機系統 -12 - (10)Intel Itanium 2 processor. In another embodiment, the computer system 1 or other, or a single-chip multi-processor or multi-processor component. 
A chip multiprocessor comprises multiple processing cores on a single integrated circuit, where each processing core is a core capable of executing instructions.

Caches 108, 110, 112 are coupled to processors 102, 104, 106, respectively. A processor may have internal cache memory to provide low-latency access to data. When the data or instructions a processor needs to execute are not in its internal cache, the processor attempts to read them from memory 118. Caches 108, 110, 112 are coupled to bus 116, and bus 116 is in turn coupled to memory 118.

In other embodiments, bus 116 may be a point-to-point interconnect, or a multi-drop bus. Any of a variety of well-known or otherwise available buses, interconnects, or other communication protocols may be used to allow communication with external components such as memory, other processors, I/O components, bridges, and the like.

The computer system embodiment of Figure 1 may include a plurality of processors and a plurality of caches. These processors and caches form a multiprocessor system in which the caches are kept consistent through a cache-coherence mechanism. A cache-coherence protocol may be implemented between the caches and memory to ensure that the caches remain consistent, and the coherence mechanism may include additional cache-coherence control lines within bus 116.

Memory 118 may represent any form of memory device used to store data and instructions that the processors have used or will use.
In general, the invention is not limited in this respect, but memory 118 may be comprised of dynamic random access memory (DRAM). In other embodiments, memory 118 may include semiconductor memory. In still other embodiments, memory 118 may include a magnetic storage device such as a hard disk drive. The invention, however, is not limited to the memory examples given here.

I/O controller 120 is coupled to bus 116. I/O controller 120 may represent any form of chipset or control logic that interfaces one or more I/O devices 122 with the other elements of computer system 100. I/O devices 122 may represent any form of device, peripheral, or component that provides input to, or processes output from, computer system 100. In one embodiment, although the invention is not so limited, at least one I/O device 122 is a network interface controller capable of performing direct memory access (DMA) to copy data into memory 118. In this regard, because I/O device 122 performs DMA when a TCP/IP packet is received, a software Transmission Control Protocol / Internet Protocol (TCP/IP) stack is executed by processors 102, 104, 106. In general, I/O device 122 is not limited to being a network interface controller; in other embodiments, at least one I/O device 122 may be a graphics controller or a disk controller, or another controller that benefits from the teachings of the invention.

Figure 2 is a flowchart of an example method performed by the computer system of Figure 1.
Although the following operations are described as a sequential process, many of them may in fact be performed in parallel or concurrently, as will be apparent to those skilled in the relevant art. In addition, the order of the operations may be rearranged without departing from the spirit of embodiments of the invention.

According to one embodiment, method 200 of Figure 2 begins with computer system 100 detecting a packet (205) being written to memory 118 through an I/O device 122. In one embodiment, the I/O device sends a communication to computer system 100 indicating the details of the packet. In another embodiment, computer system 100 detects the packet by monitoring inbound writes to memory.

Next, an interrupt (210) is sent to a particular processor to notify it to access the data. At this point the processor does not know whether the data packet is in its cache or at the I/O device. Meanwhile, in the shadow of the interrupt, I/O device 122 may push the packet directly into that processor's cache (215). By pushing the packet directly into the cache, computer system 100 avoids the extra latency the processor would otherwise incur reading the packet from the I/O device. This is particularly important when the computer system acts as a high-performance router, reading packets from one network interface and redirecting them to another.

Once the external packet arrives at the processor's cache, the push may trigger a deadlock (220); the deadlock condition is described further below. If no deadlock occurs, the hardware selectively pushes the data into the cache, and the processor can then access the pushed data directly from its cache (235). If a deadlock does occur, however, the processor is free to drop the data (225) when there is no room for it, thereby avoiding the deadlock.
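The push-or-drop decision of method 200 can be made concrete with a small sketch. The class and function names below are my own, not the patent's, and the cache model is deliberately minimal: the key invariant it demonstrates is that the DMA write to memory always happens first, so dropping the push is always safe — the consumer simply pays a memory-latency read instead of a cache hit.

```python
class ToyCache:
    """Minimal stand-in for a processor cache receiving pushed I/O data."""
    def __init__(self, num_blocks):
        self.blocks = {}              # addr -> data
        self.capacity = num_blocks
        self.writeback_busy = False   # models a busy output buffer to memory

    def can_accept(self, addr):
        # Accept if the line is present, there is a free block, or eviction
        # would not need the (busy) write-back path. A real cache would also
        # track dirty bits; this sketch conflates eviction with write-back.
        return (addr in self.blocks
                or len(self.blocks) < self.capacity
                or not self.writeback_busy)

    def fill(self, addr, data):
        if len(self.blocks) >= self.capacity and addr not in self.blocks:
            self.blocks.pop(next(iter(self.blocks)))  # evict some victim
        self.blocks[addr] = data


def deliver_packet(cache, memory, addr, data):
    memory[addr] = data              # DMA: packet always lands in memory (205)
    if cache.can_accept(addr):       # no deadlock risk: push into cache (215)
        cache.fill(addr, data)
        return "cache"               # later access is a cache hit (235)
    return "memory"                  # drop the push (225); read memory (230)


mem = {}
c = ToyCache(num_blocks=1)
assert deliver_packet(c, mem, 0x10, b"pkt1") == "cache"
c.writeback_busy = True              # busy write-back channel: push would deadlock
assert deliver_packet(c, mem, 0x20, b"pkt2") == "memory"
assert mem[0x20] == b"pkt2"          # the dropped packet is still safe in memory
```

The point of the fallback return value is that correctness never depends on the push succeeding; the push is purely a latency optimization.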
The processor can always read the data from memory (230).

Figure 3 is a flowchart of an example method performed by the computer system of Figure 1 when it decides to selectively drop data. It will be apparent that the order of the operations may be rearranged without departing from the spirit of embodiments of the invention.

According to one embodiment, method 300 of Figure 3 begins with computer system 100 detecting an external packet arriving at a cache (305). The cache, however, may not have a block available in which to store the data (310). If the cache has no block it can allocate to the data, it must first find a block to evict (315). If the block to be evicted is dirty (modified), and the channels through which the cache can write that data back to memory (e.g., the output buffer to the memory interconnect) are currently busy, the system would deadlock (320).

If the channels through which the cache can write data back to memory are not busy, the data packet is pushed into the block (340) and the block is marked modified (345). Marking the block modified lets the cache know that this data has not yet been stored to memory.

If, however, the channel through which the cache can write data back to memory is busy, an external acknowledgement message is sent to the I/O controller (325). The I/O controller can translate the message to determine where the data packet came from (330); the data may have been sent from any of the I/O devices described above. The data packet is then delivered to memory, where the processor can access it (335).

Selectively dropping pushed data should happen infrequently, for at least the following reasons. First, if an invalid block is already available in the cache, the processor can simply mark that block valid and place the incoming data there, which requires no write-back of modified data.
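The eviction decision at the heart of method 300 can be sketched as follows. Again, the function name, the fixed capacity, and the first-block victim policy are illustrative assumptions of mine, not details from the patent; a real cache would prefer an invalid or clean victim when one exists.

```python
def try_push(cache, addr, data, writeback_busy):
    """Decide the fate of a pushed packet, roughly following method 300.

    cache maps addr -> (data, dirty_flag). Returns 'placed' (340/345),
    'placed_after_writeback', or 'nack' (325/335, packet goes to memory).
    """
    CAPACITY = 2
    if addr in cache or len(cache) < CAPACITY:    # a block is available (310)
        cache[addr] = (data, True)                # place and mark modified (345)
        return "placed"
    victim = next(iter(cache))                    # pick a block to evict (315)
    _, dirty = cache[victim]
    if dirty and writeback_busy:                  # write-back path busy: would
        return "nack"                             # deadlock (320), so refuse
    # If the victim is dirty and the channel is free, it would be written
    # back to memory here before being replaced.
    del cache[victim]
    cache[addr] = (data, True)
    return "placed_after_writeback" if dirty else "placed"


c = {0x0: (b"a", True), 0x4: (b"b", True)}        # full, all blocks dirty
assert try_push(c, 0x8, b"pkt", writeback_busy=True) == "nack"
c = {0x0: (b"a", False), 0x4: (b"b", False)}      # full, clean victim available
assert try_push(c, 0x8, b"pkt", writeback_busy=True) == "placed"
```

Note how the "nack" branch is the only one taken when both conditions — dirty victim and busy write-back channel — hold at once, matching the text's claim that drops should be rare.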
Second, read-only data often resides in the cache and can be overwritten by a newly arriving packet without any write-back. Third, the output buffers to the memory interconnect can be sized appropriately to buffer enough in-flight packets, which lowers the probability that the buffers fill up and cause a deadlock.

In a broadcast-based interconnect, such as a bus or a ring network, a cache can snoop the data passing by and absorb it into itself. In a directory-based protocol, by contrast, the producer (such as an I/O device or a processor's cache) can direct the data to the cache of the specific consuming processor.

Thus, the cache-push technique of the invention can improve the efficiency of a multiprocessor system with I/O devices without changing the memory consistency model or wasting excessive interconnect bandwidth.

Figure 4 is a block diagram depicting an example of a storage medium 400 that includes content 405 which, when accessed, causes an electronic device to perform one or more aspects of the invention and/or the methods 200, 300. In this regard, storage medium 400 includes content 405 (e.g., instructions, data, or any combination thereof) that, when executed, causes the machine to perform one or more aspects of the cache-push technique described above. The machine-readable (storage) medium 300 may include, but is not limited to, floppy disks, optical disks, CD-ROMs, magneto-optical disks, ROM, RAM, EPROM, EEPROM, magnetic or optical cards, flash memory, or other types of media / machine-readable media suitable for storing electronic instructions.
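The sequential-consistency condition quoted earlier — that for each processor, no read or write is reordered with respect to earlier reads or writes from the same processor — can be expressed as a small checker. This is an illustrative sketch with names of my own choosing; it only tests the per-processor program-order condition the description states, not the full read-your-latest-write requirement of sequential consistency.

```python
def respects_program_order(global_order, program_orders):
    """Check that a proposed global order of memory operations keeps every
    processor's own reads/writes in the order that processor issued them.

    global_order: list of (cpu, op) pairs; program_orders: cpu -> [op, ...].
    """
    seen = {cpu: 0 for cpu in program_orders}      # next expected index per CPU
    for cpu, op in global_order:
        expected = program_orders[cpu][seen[cpu]]  # op this CPU must issue next
        if op != expected:
            return False                           # reordered w.r.t. program order
        seen[cpu] += 1
    # Every operation of every processor must appear exactly once.
    return all(seen[cpu] == len(ops) for cpu, ops in program_orders.items())


# Two CPUs with two operations each: one legal interleaving, one illegal.
p = {"P0": ["w_x", "r_y"], "P1": ["w_y", "r_x"]}
assert respects_program_order(
    [("P0", "w_x"), ("P1", "w_y"), ("P1", "r_x"), ("P0", "r_y")], p)
assert not respects_program_order(
    [("P0", "r_y"), ("P0", "w_x"), ("P1", "w_y"), ("P1", "r_x")], p)
```

The relevance to the invention is that a pushed cache fill does not appear in any processor's program order at all, which is why the push can be added — or dropped — without disturbing this condition.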
[Brief Description of the Drawings]

The various features of the invention will be apparent from the following description of preferred embodiments taken together with the drawings, in which like reference numerals generally refer to the same parts throughout. The drawings are not necessarily to scale; emphasis is instead placed on illustrating the principles of the invention.

Figure 1 is a block diagram of a computer system for performing a cache push in accordance with an embodiment of the invention.
Figure 2 is a flowchart of an example method by which the computer system of Figure 1 selectively pushes I/O data into a processor's cache.
Figure 3 is a flowchart of an example method by which the computer system of Figure 1 selectively drops I/O data from a processor's cache.
Figure 4 is a block diagram of an example article including data content which, when accessed by a device, causes the device to perform one or more aspects of one or more embodiments of the invention.

[Description of Reference Numerals]

100: computer system; 102, 104, 106: processors; 108, 110, 112: caches; 116: bus; 118: memory; 120: input/output controller; 122: input/output device

300: machine-readable (storage) medium; 400: storage medium; 405: content


Claims (1)

X. Claims

(Annex 4A: replacement claims for Patent Application No. 094142814, amended January 15, 2009)

1. A method for selectively pushing I/O data into a processor's cache, comprising:
pushing data from an input/output (I/O) device directly into a cache of a processor in response to detection of incoming data; and
optionally discarding the data if a deadlock occurs, the deadlock being a state in which the cache's write-back channel is busy.

2. The method of claim 1, further comprising: detecting data to be written to memory; and sending an interrupt.

3. The method of claim 2, wherein the data is detected by monitoring inbound writes to the memory.

4. The method of claim 2, wherein the data is detected by sending details of the data to a computer system.

5. The method of claim 2, further comprising: detecting the deadlock.

6. The method of claim 5, further comprising: if no deadlock is detected, proceeding to push the data into the cache of the processor.

7. The method of claim 6, further comprising: accessing the pushed data directly from the cache of the processor.

8. The method of claim 5, further comprising: if a deadlock is detected, sending a message to a device regarding discarding the data; and translating the message.

9.
The method of claim 8, further comprising: if a deadlock is detected, reading the data from memory.

10. The method of claim 8, further comprising: if a deadlock is detected, accessing the data from the I/O device.

11. A system for selectively pushing I/O data into a processor's cache, comprising:
an input/output (I/O) device;
at least one processor coupled to at least one cache; and
a memory device coupled to the input/output (I/O) device and the at least one processor to store data;
wherein the input/output (I/O) device pushes data directly into the cache of the processor in response to detection of incoming data, and, if a deadlock occurs, the at least one processor optionally discards the data, the deadlock being a state in which the cache's write-back channel is busy.

12. The system of claim 11, wherein the I/O device is a network controller.

13. The system of claim 12, wherein the network controller performs direct memory access (DMA) operations.

14. The system of claim 11, wherein the I/O device sends an interrupt to the processor to notify the processor to access the data.

15. The system of claim 14, wherein the system detects a deadlock.

16. The system of claim 15, wherein, if no deadlock is detected, the I/O device pushes the data into the cache of the processor.
17. The system of claim 16, wherein the processor accesses the data directly from its cache.

18. The system of claim 15, wherein, if a deadlock is detected, the processor reads the data from the memory device.

19. A computer-readable storage medium having stored thereon instructions for selectively pushing I/O data into a processor's cache, the instructions, when executed by a machine, causing the machine to push content directly into the cache of the processor in response to a memory read by the processor and, if a deadlock occurs, to optionally discard the data, the deadlock being a state in which the cache's write-back channel is busy.

20. The computer-readable storage medium of claim 19, further comprising instructions that, when executed by the machine, cause the machine to send an interrupt to the processor.

21. The storage medium of claim 20, further comprising instructions that, when executed by the machine, cause the machine to detect a deadlock.

22. The computer-readable storage medium of claim 21, further comprising instructions that, when executed by the machine, cause the machine, if no deadlock is detected, to push the content directly into a cache of the processor.
TW094142814A 2004-12-06 2005-12-05 A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache TWI310500B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/005,970 US7574568B2 (en) 2004-12-06 2004-12-06 Optionally pushing I/O data into a processor's cache

Publications (2)

Publication Number Publication Date
TW200643727A TW200643727A (en) 2006-12-16
TWI310500B true TWI310500B (en) 2009-06-01

Family

ID=36575728

Family Applications (1)

Application Number Title Priority Date Filing Date
TW094142814A TWI310500B (en) 2004-12-06 2005-12-05 A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache

Country Status (4)

Country Link
US (1) US7574568B2 (en)
CN (1) CN101099137B (en)
TW (1) TWI310500B (en)
WO (1) WO2006062837A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060192653A1 (en) * 2005-02-18 2006-08-31 Paul Atkinson Device and method for selectively controlling the utility of an integrated circuit device
US7930459B2 (en) * 2007-09-28 2011-04-19 Intel Corporation Coherent input output device
GB0722707D0 (en) * 2007-11-19 2007-12-27 St Microelectronics Res & Dev Cache memory
GB2454809B (en) * 2007-11-19 2012-12-19 St Microelectronics Res & Dev Cache memory system
US8683484B2 (en) * 2009-07-23 2014-03-25 Novell, Inc. Intelligently pre-placing data for local consumption by workloads in a virtual computing environment
US20150363312A1 (en) * 2014-06-12 2015-12-17 Samsung Electronics Co., Ltd. Electronic system with memory control mechanism and method of operation thereof
US10366027B2 (en) 2017-11-29 2019-07-30 Advanced Micro Devices, Inc. I/O writes with cache steering
CN116303135B (en) * 2023-02-24 2024-03-22 格兰菲智能科技有限公司 Task data loading method and device and computer equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5551005A (en) * 1994-02-25 1996-08-27 Intel Corporation Apparatus and method of handling race conditions in mesi-based multiprocessor system with private caches
US5915129A (en) * 1994-06-27 1999-06-22 Microsoft Corporation Method and system for storing uncompressed data in a memory cache that is destined for a compressed file system
US5652846A (en) * 1995-07-03 1997-07-29 Compaq Computer Corporation Bus deadlock prevention circuit for use with second level cache controller
US6438660B1 (en) * 1997-12-09 2002-08-20 Intel Corporation Method and apparatus for collapsing writebacks to a memory for resource efficiency
US6434673B1 (en) * 2000-06-30 2002-08-13 Intel Corporation Optimized configurable scheme for demand based resource sharing of request queues in a cache controller
US6760809B2 (en) * 2001-06-21 2004-07-06 International Business Machines Corporation Non-uniform memory access (NUMA) data processing system having remote memory cache incorporated within system memory
US6842822B2 (en) * 2002-04-05 2005-01-11 Freescale Semiconductor, Inc. System and method for cache external writing
US6711650B1 (en) * 2002-11-07 2004-03-23 International Business Machines Corporation Method and apparatus for accelerating input/output processing using cache injections
US20040199727A1 (en) * 2003-04-02 2004-10-07 Narad Charles E. Cache allocation

Also Published As

Publication number Publication date
US7574568B2 (en) 2009-08-11
CN101099137A (en) 2008-01-02
CN101099137B (en) 2011-12-07
TW200643727A (en) 2006-12-16
US20060123195A1 (en) 2006-06-08
WO2006062837A1 (en) 2006-06-15

Similar Documents

Publication Publication Date Title
TWI310500B (en) A method, system, and computer readable storage medium, provided with relative instructions, for optionally pushing i/o data into a processor's cache
TWI519958B (en) Method and apparatus for memory allocation in a multi-node system
JP5575870B2 (en) Satisfaction of memory ordering requirements between partial read and non-snoop access
CN101430664B (en) Multiprocessor system and Cache consistency message transmission method
US6345352B1 (en) Method and system for supporting multiprocessor TLB-purge instructions using directed write transactions
EP1615138A2 (en) Multiprocessor chip having bidirectional ring interconnect
TW201011536A (en) Optimizing concurrent accesses in a directory-based coherency protocol
US20080028103A1 (en) Memory-mapped buffers for network interface controllers
TW201543218A (en) Chip device and method for multi-core network processor interconnect with multi-node connection
TW542958B (en) A method and apparatus for pipelining ordered input/output transactions to coherent memory in a distributed memory, cache coherent, multi-processor system
TWI541649B (en) System and method of inter-chip interconnect protocol for a multi-chip system
JP5050009B2 (en) Dynamic update of route table
JP2005189928A (en) Multi-processor system, consistency controller for same system, and method for controlling consistency
TW201543360A (en) Method and system for ordering I/O access in a multi-node environment
JP2006260159A (en) Information processing apparatus, and data control method for information processing apparatus
EP1508100B1 (en) Inter-chip processor control plane
US20050198438A1 (en) Shared-memory multiprocessor
JP7419261B2 (en) Data processing network using flow compression for streaming data transfer
TWI324755B (en) Processing modules with multilevel cache architecture
Zhao et al. Hardware support for accelerating data movement in server platform
CN114356839B (en) Method, device, processor and device readable storage medium for processing write operation
Chung et al. PANDA: ring-based multiprocessor system using new snooping protocol
CN116414563A (en) Memory control device, cache consistency system and cache consistency method
US8732351B1 (en) System and method for packet splitting
JP3764015B2 (en) Memory access method and multiprocessor system

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees