TWI328746B - A system for performing peer-to-peer data transfer - Google Patents

A system for performing peer-to-peer data transfer

Info

Publication number
TWI328746B
TWI328746B
Authority
TW
Taiwan
Prior art keywords
address
data
write
request
host
Prior art date
Application number
TW94142832A
Other languages
Chinese (zh)
Other versions
TW200636491A (en)
Inventor
Samuel H Duncan
Wei Je Huang
John H Edmondson
Original Assignee
Nvidia Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/005,451 external-priority patent/US7451259B2/en
Priority claimed from US11/005,150 external-priority patent/US7275123B2/en
Application filed by Nvidia Corp filed Critical Nvidia Corp
Publication of TW200636491A publication Critical patent/TW200636491A/en
Application granted granted Critical
Publication of TWI328746B publication Critical patent/TWI328746B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 - Information transfer, e.g. on bus
    • G06F13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 - Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221 - Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus

Description

DESCRIPTION OF THE INVENTION

TECHNICAL FIELD OF THE INVENTION

The present invention relates to data transfer within a computing environment and, more particularly, to peer-to-peer data transfer within such a computing environment.

PRIOR ART

In modern computing environments, a large number of devices are interconnected to provide processing speed and flexibility within the environment. To create such a computing environment, various devices are connected to one another through an interconnecting fabric such as a network or bus architecture. The devices connected to the fabric typically contain local memory that is used by a device during a computation.

One example of such a computing environment is used for graphics processing, in which a plurality of graphics processing units (GPUs) are connected to one another through an interconnecting fabric, and each GPU is coupled to a frame buffer (i.e., local memory). The frame buffer stores graphics data processed by the individual GPU. Typically, large amounts of data must be processed by the GPUs to render textures and create other graphics information for display. To achieve rapid processing, the processing task is divided among the GPUs so that components of the task can be executed in parallel.

At times, in such a computing environment, a graphics processing unit may need to use information stored in the frame buffer of a peer GPU, or may need to write information to the frame buffer of a peer GPU so that the peer GPU can use that information locally. Currently, many implementations of interconnecting fabric standards, such as AGP, PCI, PCI-Express™, advance switching, and the like, enable an endpoint to write information to the address space of another endpoint, but do not enable an endpoint to read information stored in the address space of another endpoint.

As a result, the graphics processing units duplicate effort to create data that has already been created, because they lack the ability to read another endpoint's frame buffer within that endpoint's address space. Alternatively, a GPU that has completed some processing of information may, if the data is still needed, write the information to commonly accessible system memory, which can be accessed through the fabric by any of the connected GPUs. However, using ordinary system memory for such peer-to-peer data transfers is time consuming and adds overhead; invariably, the use of system memory slows the graphics processing.

Therefore, there is a need in the art for a method and apparatus that improves peer-to-peer transfer of information within a computing environment.

SUMMARY OF THE INVENTION

Embodiments of the present invention are generally directed to a method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The invention passes read and write requests through the interconnecting fabric to enable a first device to read data from, and write data to, the local memory of a second device. Such data transfers can be performed even when the communication protocol of the interconnecting fabric does not allow such transfers.

EMBODIMENTS

FIG. 1 depicts a computing environment 100 comprising a system-level computer 102, an interconnecting fabric 104, and a plurality of endpoint devices 103 and 105. Although the endpoint devices may be any form of computing device, including computer systems, network appliances, storage devices, integrated circuits, microcontrollers, and the like, in the embodiment of FIG. 1 the devices 103 and 105 comprise graphics processing units (GPUs) 106 and 108. Although only two devices 103 and 105 are depicted, those skilled in the art will understand that the invention is applicable to any number of devices. As described below, the invention provides a method and apparatus that facilitates data transfer between the endpoint devices 103 and 105.

The system computer 102 is a general-purpose processing computer system comprising a central processing unit (CPU) 126, system memory 128, a resource manager 129, and support circuits 130. In one embodiment of the invention, the system computer is the "motherboard" of a computer or server. The system computer 102 uses the devices 103 and 105 to provide specific types of computing (e.g., graphics processing). The CPU 126 may be any one of various forms of general-purpose microprocessors or microcontrollers. The system memory 128 comprises random access memory, read-only memory, removable storage, disk drive storage, or any combination of memory devices. The resource manager 129 allocates address information to devices, such as the devices 103 and 105 within the computing environment 100, and generates a memory map for the system memory 128. The support circuits 130 are well-known circuits that facilitate the functionality of the computer system 102, including clock circuits, cache, power supplies, interface circuitry, and the like.

The interconnecting fabric 104 (hereinafter simply referred to as the "fabric") comprises one of many forms of structures that enable data to be transferred from one endpoint device to another endpoint device, or to the system memory. Such a fabric includes an advance switching network, or a bridge device supporting AGP, PCI-Express™, or PCI bus protocols, or any other form of structural interconnect that may be used to interconnect endpoint devices. One example of an interconnecting fabric 104 known in the art is the Intel® Northbridge chip.

Although in FIG. 1 the endpoint device 103 differs from the endpoint device 105, in some embodiments of the invention the logic and software of each of the endpoint devices 103 and 105 may be identical. The endpoint device 103 includes a primary GPU 106 coupled to a frame buffer 110, and support circuits 116. Within the primary GPU 106 are a memory management unit (MMU) 167, a fabric interface 166, a page table 136, and host logic 134. The fabric interface 166 couples the host logic 134 to the fabric 104. The host logic 134 comprises a read completion mailbox 140 and a tag 142. The frame buffer 110 is typically some form of random access memory having a very large capacity, e.g., on the order of two or more gigabytes. The memory management unit 167 couples the frame buffer 110 and an agent 168 to other units within the primary GPU 106. The agent 168 interfaces between the primary GPU 106 and one or more clients 112 (e.g., processes or hardware within the computing environment). The support circuits 116 comprise well-known circuits that facilitate the functionality of the primary GPU 106, including clock circuits, interface hardware, power supplies, and the like.

The endpoint device 105 comprises a target GPU 108 coupled to a frame buffer 118, and support circuits 124. The frame buffer 118 is typically some form of random access memory having a very large capacity, e.g., on the order of two or more gigabytes. Within the target GPU 108 are a fabric interface 174 and host logic 172. The fabric interface 174 couples the host logic 172 to the fabric 104. The host logic 172 comprises a read mailbox 160, a write data mailbox 154, and a write address mailbox 156. The support circuits 124 comprise well-known circuits that facilitate the functionality of the target GPU 108, including clock circuits, interface hardware, power supplies, and the like.

In operation, a data transfer process begins with one of the clients 112 requesting access to data within a frame buffer 110 or 118. The client 112 communicates with the agent 168 executing on the primary GPU 106. The agent 168 communicates with the memory management unit 167, which determines whether the request requires access to a local frame buffer, such as the frame buffer 110, or requires access to an endpoint frame buffer, such as the frame buffer 118.

In accordance with the present invention, which allows a client in a computing environment to access data within an endpoint frame buffer 118, the page table 136 is adapted to indicate which physical memory address in the endpoint frame buffer 118 is accessed for a particular virtual address associated with the read or write request issued by the client 112. The page table 136 is generated by the resource manager 129 during initialization to reflect the memory map, e.g., local, system, endpoint, and so on. An attribute field within the page table 136 identifies whether the data is associated with a local frame buffer 110, associated with the system memory 128, associated with the endpoint frame buffer 118, or not associated with a memory at all. As described above, the memory management unit 167 uses the page table 136 to determine the read or write address for the data associated with the read or write request issued by the client 112, and thereby determines whether that address lies within the endpoint frame buffer 118.

More specifically, the information used to identify and decode data within a local or remote frame buffer is stored in entries of the page table 136 used by the requesting endpoint device 103. The page table entries are maintained by resource management software and interpreted by the memory management unit 167. Importantly, the physical address and the data type of each page referenced in a request from the client 112 are stored in the page table 136 by the resource manager 129. The target endpoint device 105 needs this information to provide the requested data to the requesting endpoint device 103. For purposes of this description, "data type" may include, but is not limited to, endianness, compression format, data structure organization, information describing how the data is interpreted, any information relating to how the data is referenced or translated when fetched from or stored to local, system, or endpoint address space, or any combination thereof.

The details of reading data from an endpoint frame buffer 118 are described below with respect to FIG. 3, and the details of writing data to an endpoint frame buffer 118 are described below with respect to FIG. 4.
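To make the page-table mechanism concrete, the following C sketch models one possible shape of an entry in the page table 136: a physical page address, an attribute field selecting local, system, endpoint, or unmapped memory, and a data type. The struct layout, the names (pte_entry_t, lookup_pte), and the 4 KiB page size are assumptions made for illustration, not the hardware format defined by the patent.

```c
#include <stdint.h>
#include <stddef.h>

/* Where a virtual page resolves to, per the attribute field of page table 136. */
typedef enum {
    APERTURE_UNMAPPED = 0,   /* not associated with any memory        */
    APERTURE_LOCAL_FB,       /* local frame buffer 110                */
    APERTURE_SYSTEM_MEM,     /* system memory 128                     */
    APERTURE_PEER_FB         /* endpoint frame buffer 118             */
} pte_aperture_t;

/* One entry of a hypothetical page table 136: the physical page address at
 * the target plus the "data type" describing how the data is interpreted. */
typedef struct {
    uint64_t       phys_page;   /* physical page address                */
    pte_aperture_t aperture;    /* attribute field: local/system/peer   */
    uint32_t       data_type;   /* e.g. format identifier for the data  */
} pte_entry_t;

#define PAGE_SHIFT 12u          /* assume 4 KiB pages for the sketch    */

/* MMU-style lookup: map a client virtual address onto a page table entry so
 * the host logic can decide whether the request stays local or must be
 * forwarded to the peer endpoint.                                          */
static const pte_entry_t *lookup_pte(const pte_entry_t *table, size_t n_entries,
                                     uint64_t virt_addr)
{
    uint64_t vpn = virt_addr >> PAGE_SHIFT;   /* virtual page number */
    return (vpn < n_entries) ? &table[vpn] : NULL;
}
```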

FIG. 2 depicts a functional block diagram of the computing environment of FIG. 1, particularly illustrating the interconnectivity of the endpoint devices 103 and 105. This interconnectivity enables the endpoint device 103 to access (read or write) the frame buffer 118 of the endpoint device 105 without modification to the fabric 104 or the fabric interfaces 166, 174. As such, the fabric interfaces 166 and 174, like the fabric 104 itself, can be any standard interconnectivity fabric and related interfaces that provide communication functionality to the endpoint devices 103, 105. To achieve access to the frame buffer 118 of the endpoint device 105, the host logic 134 provides the functionality that facilitates access to the endpoint frame buffer 118 without requiring changes to the fabric interfaces 166, 174. Consequently, when a request issued by the client 112 for data within the endpoint frame buffer 118 is received by the host logic 134, the host logic 134 translates the data request into a protocol supported by the fabric interface 166. The fabric interface 166 transfers the request through the fabric 104 to the fabric interface 174 of the target endpoint device. The host logic 172 provides the functionality to access the frame buffer 118 within the remote endpoint device 105.

For example, if the client 112 coupled to the endpoint device 103 requests data from the frame buffer 118 within the endpoint device 105, the host logic 134 translates the request into an understandable format and prepares it for transfer through the fabric interface 166, the fabric 104, and the fabric interface 174. The request thus passes from the endpoint device 103 to the endpoint device 105 and is processed by the host logic 172 within the endpoint device 105. The host logic 172 provides access to the frame buffer 118 such that the data is either read or written, depending on the request, within the endpoint device 105. If the request is a read request, the data from the frame buffer 118 is sent from the host logic 172 in a form that the fabric interface 174, the fabric 104, and the fabric interface 166 accept, and the data is passed to the endpoint device 103, where the host logic 134 processes it. In this manner, the endpoint devices 103 and 105 can perform peer-to-peer data transfers without any modification to the fabric interfaces 166, 174 or to the fabric 104. As a result, endpoint devices that include the invention communicate within the computing environment 100 in a standard manner using standard communication protocols.

In one specific embodiment, the fabric 104 and the fabric interfaces 166, 174 support PCI or PCI-Express™. As is well known in the art, certain fabric 104 implementations using the PCI and PCI-Express™ protocols do not permit read requests to be communicated in a manner that facilitates peer-to-peer reads. However, these same fabric 104 implementations do permit write requests to be communicated to facilitate peer-to-peer writes. Thus, the standard protocols permit one endpoint device to write data to another endpoint's frame buffer, but do not permit one endpoint device to read data from another endpoint's frame buffer. The present invention overcomes this shortcoming in the standard protocols by enabling the host logic 134, 172 to provide enhanced functionality that results in the ability to both read and write data between endpoint devices.

READ TRANSACTION

FIG. 3 depicts a flow diagram of a process 300 for reading data in a peer-to-peer manner within a computing environment in accordance with the present invention. For a clear understanding of the invention, the endpoint device 103 that requests data to be read is hereinafter identified as the "primary device," and the endpoint device 105 that contains the information to be read is identified as the "target device."

The process 300 begins at step 302 and proceeds to step 304, where the agent 168, in response to an instruction issued by the client 112, requests data from the target frame buffer 118. At step 306, the primary device 103 determines where the data is located by having the memory management unit 167 access the page table 136. If the data is stored locally, the process 300 proceeds to step 308, where the data is retrieved from the local frame buffer 110. At step 310, the data is provided to the agent, and the process 300 ends at step 332.

If, on the other hand, the data is determined at step 306 to be located at the target device 105, the method 300 proceeds to step 312. At step 312, the host logic 134 within the primary device 103 packages the read command from the client 112, together with the address and the data type, into a read request packet that is written to the endpoint (i.e., the target device 105). Because the fabric 104 may not enable read request operations, this packaging process must be performed; the fabric 104 will, however, understand the write request. The host logic 134 therefore packages and forwards the read command information as a write. The information includes a command field, a physical address identifying where within the endpoint frame buffer 118 the data resides, the size of the requested data, and the data type of the data (e.g., an identifier of the format used to encode the stored data). In one embodiment, the host logic 134 can determine the physical address of the data, the size of the data, and the data type from the page table 136 via the memory management unit 167. Those skilled in the art will recognize that transferring the physical address to the target device 105 allows the read command to be processed by the target device 105 without any address translation.
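The packaging performed at step 312 can be pictured as building a small payload that travels inside an ordinary fabric write. The encoding below is an illustrative assumption only; the field names, sizes, command code, and the pack_read_request helper are invented for the sketch and are not the wire format defined by the patent or by PCI-Express.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical payload carried inside a fabric write request so the target's
 * host logic can reconstruct a read command (step 312).                     */
typedef struct {
    uint32_t command;        /* command field: identifies a peer read        */
    uint64_t phys_addr;      /* physical address inside frame buffer 118     */
    uint32_t size;           /* number of bytes requested                    */
    uint32_t data_type;      /* format identifier taken from page table 136  */
    uint32_t tag;            /* lets the requester match the completion      */
} read_request_packet_t;

#define CMD_PEER_READ 0x1u   /* assumed command code for the sketch          */

/* Build the read request packet that will be written to the target's read
 * mailbox 160. `buf` must be at least sizeof(read_request_packet_t) bytes.  */
static size_t pack_read_request(uint8_t *buf, uint64_t phys_addr,
                                uint32_t size, uint32_t data_type, uint32_t tag)
{
    read_request_packet_t pkt = {
        .command   = CMD_PEER_READ,
        .phys_addr = phys_addr,
        .size      = size,
        .data_type = data_type,
        .tag       = tag,
    };
    memcpy(buf, &pkt, sizeof pkt);   /* payload of an ordinary posted write */
    return sizeof pkt;
}
```

Because the packet rides in the payload of a posted write, the fabric itself never has to support a peer read transaction.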

At step 314, the host logic 134 addresses the write request (containing the read request packet) to a read mailbox 160 in the target device 105. In one embodiment, the read mailbox address is provided by the resource manager 164. At step 316, the fabric interface 166 transfers the request through the fabric 104 to the target endpoint device 105. At step 318, the fabric interface 174 receives the write request. At step 320, the payload is extracted from the read request packet, thereby converting the write request containing the read request packet into a read command. The host logic 172 places the read command into the read mailbox 160, a first-in first-out (FIFO) buffer having a depth commensurate with the maximum number of transactions that the primary device 103 is allowed to have outstanding. As such, the FIFO (or queue) holds each read command until it is processed by the target device 105. At step 322, the host logic 172 removes the read command from the read mailbox 160 and, in accordance with the read command and the data type, reads the data from the frame buffer 118. At step 324, the host logic 172 packages the retrieved data into a completion packet that is transmitted in the same manner as a write request packet and complies with the protocol supported by the fabric 104. At step 326, the host logic 172 addresses the write request to the read completion mailbox 140 within the primary device 103. At step 328, the fabric interface 174 transfers the write request through the fabric 104 to the requester, i.e., the client 112, via the primary device 103.

At step 330, the fabric interface 166 receives the write request. At step 332, the host logic 134 within the primary device 103 places the write request into the read completion mailbox 140. At step 334, the host logic 134 extracts the data from the read completion mailbox 140 and, at step 336, provides the data to the client 112 via the agent 168. The process ends at step 338.
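On the target side, steps 320 through 328 amount to draining a queue of read commands and answering each one with a completion posted back to the requester's read completion mailbox. The C sketch below only illustrates that ordering; the ring-buffer mailbox model, the FIFO depth, and the fb_read and fabric_post_write helpers are assumptions standing in for the target's host logic and hardware.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Same assumed packet layout as in the previous sketch. */
typedef struct {
    uint32_t command;
    uint64_t phys_addr;
    uint32_t size, data_type, tag;
} read_request_packet_t;

/* Stubs standing in for target hardware/host logic (assumptions). */
static size_t fb_read(uint64_t phys_addr, uint32_t data_type, void *dst, size_t len)
{
    (void)phys_addr; (void)data_type;
    memset(dst, 0, len);          /* pretend frame buffer 118 returned data */
    return len;
}
static void fabric_post_write(uint64_t dst_addr, const void *payload, size_t len)
{
    (void)dst_addr; (void)payload; (void)len;   /* would post a fabric write */
}

#define MAX_OUTSTANDING 16   /* FIFO depth ~ max outstanding reads (step 320) */

typedef struct {
    read_request_packet_t slots[MAX_OUTSTANDING];
    unsigned head, tail;     /* read mailbox 160 modelled as a ring buffer   */
} read_mailbox_t;

/* Steps 322-328: pop one read command at a time, fetch the data from the
 * local frame buffer, and return it as a write addressed to the requester's
 * read completion mailbox 140.                                              */
static void service_read_mailbox(read_mailbox_t *mb, uint64_t completion_mbox_addr)
{
    while (mb->head != mb->tail) {
        read_request_packet_t req = mb->slots[mb->head % MAX_OUTSTANDING];
        mb->head++;

        uint8_t data[4096];
        size_t  n = fb_read(req.phys_addr, req.data_type, data,
                            req.size < sizeof data ? req.size : sizeof data);

        /* Completion: tag first so the requester can match it, then data. */
        uint8_t completion[sizeof req.tag + sizeof data];
        memcpy(completion, &req.tag, sizeof req.tag);
        memcpy(completion + sizeof req.tag, data, n);
        fabric_post_write(completion_mbox_addr, completion, sizeof req.tag + n);
    }
}
```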

When the write request is placed into the read completion mailbox 140 at step 332, tracking logic underlying the read completion mailbox 140 is used to track outstanding read requests and to match those outstanding requests with the read completion packets returned from the target device 105. The tracking logic may explicitly store client information in order to return the read data to the appropriate client 112. In some embodiments of the invention, the read completion mailbox 140 includes storage resources, e.g., registers, memory, and the like, reserved as needed by the tracking logic for storing the retrieved data before the data is transferred to the client 112. In one embodiment, within the host logic 134, a tag is generated when the read command is issued, and that tag is compared against a read completion tag received together with the data written into the read completion mailbox 140. Since many clients 112 may, through the agent 168, make read requests to the various frame buffers within the computing environment 100, tracking the issuance and completion of read requests is important to avoid possible errors.

As described above, the invention enables a peer-to-peer read transaction to be performed even though the interconnecting fabric 104 may not support a read transaction. As such, the read transaction can be performed without modifying the standard communication protocol used by the fabric or supported in a particular implementation of the fabric.
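A minimal sketch of the completion-matching idea follows: the requester records each outstanding read under its tag, and a returning completion is routed to the owning client by looking that tag up. The fixed-size table and the function names are invented for illustration and are not the patent's tracking-logic implementation.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OUTSTANDING 16

/* One outstanding peer read tracked beneath the read completion mailbox 140. */
typedef struct {
    bool     in_use;
    uint32_t tag;        /* tag generated when the read command was issued */
    int      client_id;  /* which client 112 gets the data back            */
    void    *dst;        /* client buffer reserved for the returned data   */
    size_t   len;
} outstanding_read_t;

static outstanding_read_t pending[MAX_OUTSTANDING];

/* Record a read at issue time; returns false if too many are outstanding. */
static bool track_read(uint32_t tag, int client_id, void *dst, size_t len)
{
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (!pending[i].in_use) {
            pending[i] = (outstanding_read_t){ true, tag, client_id, dst, len };
            return true;
        }
    }
    return false;
}

/* Match a completion (tag + data) arriving in the read completion mailbox
 * against the outstanding reads; returns the owning client id, or -1.     */
static int complete_read(uint32_t completion_tag, const void *data, size_t len)
{
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (pending[i].in_use && pending[i].tag == completion_tag) {
            size_t n = len < pending[i].len ? len : pending[i].len;
            memcpy(pending[i].dst, data, n);
            pending[i].in_use = false;
            return pending[i].client_id;
        }
    }
    return -1;   /* unmatched completion: would be flagged as an error */
}
```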

WRITE TRANSACTION

FIG. 4 depicts a flow diagram of a write request transaction within a peer-to-peer computing environment in accordance with the present invention. For the best understanding of the invention, the reader should refer to both FIG. 4 and FIG. 1 while reading the following disclosure.

The process 400 begins at step 402 and proceeds to step 404, where the agent 168, in response to a request from a client 112, requests that data be written to the frame buffer 110 or 118. At step 406, the memory management unit 167 determines from the page table 136 whether the location for writing the data lies within the local frame buffer 110 or within the endpoint frame buffer 118. If the location is within the local frame buffer 110, the process proceeds to step 408, where the data is written to the local frame buffer 110, and the process ends at step 410. If, however, the data is to be written to the endpoint frame buffer 118, the method 400 proceeds to step 412. At step 412, the host logic 134 compares the physical page address and the data type assigned to the client 112 write request (as found in the page table) with the physical page address and the data type stored in the tag 142, which resides within the primary device 103.

As described further herein, the physical page address and data type listed in the tag 142 correspond to the physical page address and data type of the last write address request output from the primary device 103 to the target device 105. Therefore, the physical page address and data type in the tag 142 always match the physical page address and data type held within the host logic 172 of the target device 105 and stored in the write address mailbox 156.

The physical page address stored in the write address mailbox 156 defines an aperture into the endpoint frame buffer 118. The aperture is smaller than the full extent of the endpoint frame buffer 118; for example, the aperture may correspond to a particular memory page within the endpoint frame buffer 118, while the lower bits of the write request address provide an index into the aperture. The aperture can be moved by changing the physical page address within the write address mailbox 156, as described further herein. When the physical page address and data type in the tag 142 match the physical address and data type of the write request, the aperture into the endpoint frame buffer 118 is correctly aligned for the client 112 write request.

If the physical page address and data type assigned to the client 112 write request at step 412 do not match the physical page address and data type listed in the tag 142, the process 400 proceeds to step 414. At step 414, in response to the mismatch, the host logic 134 in the primary device 103 creates a write address request to update the physical page address and data type listed in the write address mailbox 156. The purpose of this step is to establish an aperture location that matches the physical page address assigned to the client 112 write request, so that the data contained in the client 112 write request can be properly written to the target frame buffer 118. In general, the aperture may be sized to match a memory page of the target frame buffer 118. Likewise, the write data mailbox 154 may be sized to store a memory page of the target frame buffer 118.
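The aperture arithmetic can be shown in a few lines of C. The 4 KiB page size and the helper names are assumptions; the point is only that the write address mailbox 156 (mirrored by the tag 142) holds a page-aligned base and data type, while the low-order bits of the client address index into the aperture.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12u                       /* assume a 4 KiB aperture/page */
#define PAGE_MASK  ((UINT64_C(1) << PAGE_SHIFT) - 1)

/* State mirrored on both sides: tag 142 on the primary device, write address
 * mailbox 156 on the target device.                                          */
typedef struct {
    uint64_t phys_page;   /* page-aligned base of the aperture */
    uint32_t data_type;
} aperture_t;

/* Does the current aperture already cover this client write? (step 412) */
static bool aperture_matches(const aperture_t *tag, uint64_t phys_addr,
                             uint32_t data_type)
{
    return tag->phys_page == (phys_addr & ~PAGE_MASK) &&
           tag->data_type == data_type;
}

/* Target-side address selection (step 432): the payload lands at the aperture
 * base plus the low-order offset bits of the request address.                */
static uint64_t aperture_target_address(const aperture_t *mbox, uint64_t phys_addr)
{
    return mbox->phys_page | (phys_addr & PAGE_MASK);
}
```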

At step 416, the fabric interface 166 of the primary device 103 transfers the write address request, containing the physical page address and data type assigned to the client 112 write request, through the fabric 104 to the target device 105.

At step 418, the fabric interface 174 of the target device 105 receives the write address request, and at step 420 the host logic 172 updates the write address mailbox 156 with the physical page address and data type contained in the payload of the write address request created at step 414. When the physical page address stored in the write address mailbox 156 changes, the aperture moves, i.e., it is positioned over a different memory page within the endpoint frame buffer 118. At step 422, the host logic 134 updates the tag 142 to match the physical page address and data type now held in the write address mailbox 156. At this point, the aperture into the endpoint frame buffer 118 is correctly aligned for the client 112 write request.

At step 424, the host logic 134 addresses a write data request to the write data mailbox 154 located in the target device 105. The payload of the write data request may be all or a portion of the data contained in the client 112 write request. At step 426, the fabric interface 166 transfers the write data request through the fabric 104 to the target device 105. At step 428, the fabric interface 174 of the target device 105 receives the write data request. At step 430, the host logic 172 places the payload contained in the write data request into the write data mailbox 154. At step 432, the payload is written from the write data mailbox 154 into the aperture within the frame buffer 118, at an address specified by the write address mailbox 156. Further, the payload is written in the format specified by the data type contained in the write address mailbox 156. The method 400 ends at step 434.
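Putting steps 412 through 426 together on the primary side, the host logic refreshes the remote write address mailbox only when the tag mismatches, and otherwise streams the payload as ordinary posted writes. The sketch below is an assumed model: fabric_post_write, the mailbox addresses, the page size, and the aperture_t shape are all invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint64_t phys_page; uint32_t data_type; } aperture_t;

/* Assumed fabric primitive: a posted write, the only transaction needed. */
static void fabric_post_write(uint64_t dst, const void *payload, size_t len)
{
    (void)dst; (void)payload; (void)len;
}

#define PAGE_MASK ((UINT64_C(1) << 12) - 1)

/* Peer-to-peer write from the primary device 103 (steps 412-426). */
static void peer_write(aperture_t *tag142,            /* local copy of tag 142  */
                       uint64_t write_addr_mbox,      /* address of mailbox 156 */
                       uint64_t write_data_mbox,      /* address of mailbox 154 */
                       uint64_t phys_addr, uint32_t data_type,
                       const void *data, size_t len)
{
    uint64_t page = phys_addr & ~PAGE_MASK;

    /* Steps 412/414: move the aperture only if the tag does not match. */
    if (tag142->phys_page != page || tag142->data_type != data_type) {
        aperture_t update = { page, data_type };
        fabric_post_write(write_addr_mbox, &update, sizeof update);  /* step 416 */
        *tag142 = update;                                            /* step 422 */
    }

    /* Steps 424-426: the payload itself travels as a normal write to the
     * write data mailbox; the target lands it inside the aperture.       */
    fabric_post_write(write_data_mbox, data, len);
}
```

The benefit of this split is that only posted writes ever cross the fabric, which is exactly what the restricted PCI/PCI-Express™ implementations described above allow.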

In this manner, incremental amounts of data that are typically produced in blocks, e.g., 128 bytes, can be written sequentially to the same memory page, i.e., to the aperture within the target frame buffer 118. Once the aperture is positioned, the host logic 134 of the primary device 103 can rapidly send consecutive write data requests containing data to be stored in the portion of the target frame buffer 118 (e.g., the memory page) that corresponds to the aperture. When a client 112 write request crosses a page boundary, the write address mailbox 156 is updated to move the aperture to a different memory page, as described above. Because the write address request delivers to the write address mailbox a physical address within the target frame buffer 118, the target device 105 does not need to perform address translation operations to service write operations received from a peer endpoint device, and the write process can be performed very efficiently.
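A small worked example of the amortization: with an assumed 4 KiB aperture and 128-byte blocks, 32 consecutive blocks land in one memory page, so a 16 KiB transfer needs 128 write data requests but only 4 write address updates, one per page crossed. The snippet below just computes those counts; the sizes are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical numbers: 4 KiB aperture (one memory page), 128-byte blocks. */
int main(void)
{
    const uint64_t aperture = 4096, block = 128, total = 16 * 1024;
    uint64_t data_writes    = total / block;                       /* 128 posted writes  */
    uint64_t aperture_moves = (total + aperture - 1) / aperture;   /* 4 mailbox updates  */
    printf("%llu data writes, %llu write-address updates\n",
           (unsigned long long)data_writes, (unsigned long long)aperture_moves);
    return 0;
}
```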

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, which is determined by the claims that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above-recited features of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIG. 1 depicts a computing environment that may be used in embodiments of the present invention;
FIG. 2 depicts a functional block diagram of the computing environment of FIG. 1;
FIG. 3 depicts a flow diagram of a read process performed in accordance with an embodiment of the present invention;
FIG. 4 depicts a flow diagram of a write process performed in accordance with an embodiment of the present invention.

DESCRIPTION OF REFERENCE NUMERALS

100 computing environment
102 computer
104 interconnecting fabric
105 device
106 graphics processing unit (GPU)
108 graphics processing unit (GPU)
110 frame buffer
112 client
116 support circuits
118 frame buffer
124 support circuits
126 central processing unit (CPU)
128 system memory
129 resource manager
130 support circuits
134 host logic
136 page table
140 read completion mailbox
142 tag
154 write data mailbox
156 write address mailbox
160 read mailbox
164 resource manager
166 fabric interface
167 memory management unit (MMU)
168 agent
172 host logic
174 fabric interface
300 process
302-338 steps
400 process
402-434 steps

Claims (1)

1. A system for performing peer-to-peer data transfer, comprising at least:
a first device;
a second device; and
an interconnecting fabric;
wherein the first device comprises first host logic that creates a read command using a protocol for communicating from the first device to the second device through the interconnecting fabric, and a first fabric interface that transmits the read command from the first device to the second device through the interconnecting fabric; and
wherein the second device comprises second host logic that accesses a local memory within the second device to retrieve data identified by the read command, and a second fabric interface that returns the data from the second device to the first device through the interconnecting fabric using the protocol.

2. The system of claim 1, wherein the interconnecting fabric is a bridge device that supports the AGP, PCI, or PCI-Express™ bus protocol.

3. The system of claim 1, wherein the first and second devices comprise graphics processing units, and the local memory comprises a frame buffer.

4. A system for performing peer-to-peer data transfer, comprising at least:
a first device;
a second device; and
an interconnecting fabric;
wherein the first device comprises first host logic that creates a write address request comprising a page address and a data type, and that creates a write data request containing data to be written into a local memory of the second device, and a fabric interface that transmits the write address request and the write data request from the first device to the second device through the interconnecting fabric; and
wherein the second device comprises second host logic that, in response to the write address request, updates an address and a data type held in a write address mailbox with the page address and the data type included in the write address request, stores the write data request in a write data mailbox, and writes the data to the local memory.

5. The system of claim 4, wherein the first host logic further compares the address and the data type listed in a tag residing within the first device with an address and a data type assigned to a write request received by the first device, wherein the address and the data type assigned to the write request are the same as the page address and the data type included in the write address request.

6. The system of claim 5, wherein the first host logic further updates the address and the data type listed in the tag residing within the first device to the page address and the data type written from the write address request into the write address mailbox.

7. The system of claim 4, wherein the interconnecting fabric is a bridge device that supports the AGP, PCI, or PCI-Express™ bus protocol.

8. The system of claim 4, wherein the first and second devices comprise graphics processing units, and the local memory comprises a frame buffer.

9. The system of claim 4, wherein the second host logic writes the data from the write data mailbox to the local memory through an aperture.
TW94142832A 2004-12-06 2005-12-05 A system for performing peer-to-peer data transfer TWI328746B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/005,451 US7451259B2 (en) 2004-12-06 2004-12-06 Method and apparatus for providing peer-to-peer data transfer within a computing environment
US11/005,150 US7275123B2 (en) 2004-12-06 2004-12-06 Method and apparatus for providing peer-to-peer data transfer within a computing environment

Publications (2)

Publication Number Publication Date
TW200636491A TW200636491A (en) 2006-10-16
TWI328746B true TWI328746B (en) 2010-08-11

Family

ID=36168410

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94142832A TWI328746B (en) 2004-12-06 2005-12-05 A system for performing peer-to-peer data transfer

Country Status (2)

Country Link
TW (1) TWI328746B (en)
WO (1) WO2006062950A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI463852B (en) * 2011-11-07 2014-12-01 Panasonic Corp Communication system and transmission unit used therein

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2120423B (en) * 1982-04-26 1985-10-09 Sony Corp Sequential data block address processing circuits
US5898888A (en) * 1996-12-13 1999-04-27 International Business Machines Corporation Method and system for translating peripheral component interconnect (PCI) peer-to-peer access across multiple PCI host bridges within a computer system
US6275888B1 (en) * 1997-11-19 2001-08-14 Micron Technology, Inc. Method for configuring peer-to-peer bus bridges in a computer system using shadow configuration registers

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI463852B (en) * 2011-11-07 2014-12-01 Panasonic Corp Communication system and transmission unit used therein

Also Published As

Publication number Publication date
WO2006062950A1 (en) 2006-06-15
TW200636491A (en) 2006-10-16

Similar Documents

Publication Publication Date Title
US7275123B2 (en) Method and apparatus for providing peer-to-peer data transfer within a computing environment
US6611883B1 (en) Method and apparatus for implementing PCI DMA speculative prefetching in a message passing queue oriented bus system
US6804673B2 (en) Access assurance for remote memory access over network
US9734085B2 (en) DMA transmission method and system thereof
US6704831B1 (en) Method and apparatus for converting address information between PCI bus protocol and a message-passing queue-oriented bus protocol
TW552515B (en) Input/output (I/O) address translation in a bridge proximate to a local I/O bus
TWI506444B (en) Processor and method to improve mmio request handling
RU2491616C2 (en) Apparatus, method and system for managing matrices
JP4559861B2 (en) Method and apparatus for reducing overhead of a data processing system with a cache
US20110004732A1 (en) DMA in Distributed Shared Memory System
WO2015061971A1 (en) Data processing system and data processing method
EP3608790B1 (en) Modifying nvme physical region page list pointers and data pointers to facilitate routing of pcie memory requests
US7469309B1 (en) Peer-to-peer data transfer method and apparatus with request limits
TW201229957A (en) Streaming translation in display pipe
JP6757808B2 (en) Systems and methods for managing and supporting Virtual Host Bus Adapters (vHBAs) on Infiniband (IB), and systems and methods for supporting efficient use of buffers using a single external memory interface.
CN105335309B (en) A kind of data transmission method and computer
US7451259B2 (en) Method and apparatus for providing peer-to-peer data transfer within a computing environment
US9727521B2 (en) Efficient CPU mailbox read access to GPU memory
US20130103870A1 (en) Input output bridging
US6898646B1 (en) Highly concurrent DMA controller with programmable DMA channels
US9753883B2 (en) Network interface device that maps host bus writes of configuration information for virtual NIDs into a small transactional memory
KR20040091733A (en) Usb host controller
TWI328746B (en) A system for performing peer-to-peer data transfer
TWI266992B (en) USB host controller
US9535851B2 (en) Transactional memory that performs a programmable address translation if a DAT bit in a transactional memory write command is set

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees