TWI615803B

TWI615803B - System and method for improving the graphics performance of hosted applications

Info

Publication number: TWI615803B
Application number: TW102110747A
Authority: TW
Inventors: Douglas Sim Dietrich, Jr.; Nico Benitez; Timothy Cotter
Original assignee: Sony Computer Entertainment America LLC
Priority date: 2012-03-26
Filing date: 2013-03-26
Publication date: 2018-02-21
Also published as: TW201351342A; WO2013148595A3; WO2013148595A2

Abstract

A system and method are disclosed for efficiently processing a video stream using limited hardware and/or software resources. For example, one embodiment of a computer-implemented method for efficiently processing a video stream with a processor pipeline having a plurality of pipeline stages comprises: identifying a bottleneck stage within the processor pipeline, the bottleneck stage processing frames of the video stream; receiving, at one or more upstream stages, a feedback signal from the bottleneck stage, the feedback signal providing an indication of the speed at which the bottleneck stage is processing the frames of the video stream; and responsively adjusting the speed at which the one or more upstream stages process frames of the video stream to approximate the speed at which the bottleneck stage is processing the frames of the video stream.

Description

System and method for improving the graphics performance of hosted applications

The present invention relates generally to the field of data processing systems and, more particularly, to a system and method for improving the graphics performance of hosted applications.

This application is a continuation-in-part of U.S. Patent Application No. 12/538,077, filed August 7, 2009, entitled "System and Method for Accelerated Machine Switching," which claims priority to U.S. Provisional Application No. 61/210,888, filed March 23, 2009, and is a continuation-in-part (CIP) of Application No. 10/315,460, filed December 10, 2002, entitled "Apparatus and Method for Wireless Video Gaming," which is assigned to the assignee of the present CIP application.

For low-latency applications such as video games, it is important that graphics operations proceed as efficiently as possible. However, attempts to speed up the graphics rendering process may result in undesirable visual artifacts such as "tearing," in which information from two or more different frames is shown on a display device in a single screen draw. The embodiments of the invention described below provide a variety of techniques for improving the efficiency of graphics rendering while reducing these undesirable visual artifacts.

101‧‧‧Control system
102‧‧‧Server
103‧‧‧Storage area network
104‧‧‧Low-latency video compression
105‧‧‧Disk array
106‧‧‧Control signal
106a‧‧‧Control signal
106b‧‧‧Control signal
110‧‧‧Internet
112‧‧‧Low-latency video decompression
113‧‧‧Control signal logic
115‧‧‧Home/office client
121‧‧‧Input device
122‧‧‧Monitor/standard-definition television/high-definition television
202‧‧‧Server
204‧‧‧Video compressor
206‧‧‧Internet
209‧‧‧Routing
210‧‧‧Hosting service
211‧‧‧User premises
215‧‧‧Home and office client/platform/user client
221‧‧‧External input device
222‧‧‧Monitor/standard-definition television/high-definition television
241‧‧‧Central office, headend, cell tower
241‧‧‧WAN interface
242‧‧‧WAN interface
243‧‧‧Firewall/router/antenna
251‧‧‧Control signal
252‧‧‧User premises routing
253‧‧‧User Internet service provider
254‧‧‧Internet
255‧‧‧Server center routing
256‧‧‧Frame computation
257‧‧‧Video compression
258‧‧‧Video decompression
301‧‧‧Inbound Internet traffic
302‧‧‧Inbound routing/inbound routing network
311‧‧‧Disk array
312‧‧‧Disk array
315‧‧‧Delay buffer/disk array
321‧‧‧Application/game server
322‧‧‧Application/game server
325‧‧‧Application/game server
329‧‧‧Uncompressed video/audio
330‧‧‧Shared video compression/shared pool/shared hardware compression
339‧‧‧Compressed video/audio/outbound Internet traffic
340‧‧‧Outbound routing/outbound routing network
350‧‧‧Video/audio and/or group view of delay buffer
399‧‧‧Outbound Internet traffic
401‧‧‧Central processing unit (CPU)
402‧‧‧Graphics processing unit (GPU)
403‧‧‧Memory
405‧‧‧Back buffer
406‧‧‧Front buffer
408‧‧‧Video game program code/video game output/uncompressed video output
410‧‧‧Graphics data
430‧‧‧Graphics engine
1301‧‧‧Actual camera position
1302‧‧‧Predicted camera position
1303‧‧‧Actual background
1304‧‧‧Rendered background
P1‧‧‧Central processing unit
P2‧‧‧Graphics processing unit
P3‧‧‧Monitor
P4‧‧‧Bottleneck stage
Q12‧‧‧Queue
Q23‧‧‧Queue
Q34‧‧‧Queue

FIG. 1 illustrates a system architecture for executing online video games according to one embodiment of the invention.

FIG. 2 illustrates different communication channels over which an online video game may be played according to one embodiment of the invention.

FIG. 3 illustrates one embodiment of a system architecture for compressing audio/video generated by a video game.

FIG. 4 illustrates a system architecture according to one embodiment of the invention.

FIGS. 5 to 12 illustrate data flow and feedback between the various system components employed in one embodiment of the invention.

FIG. 13 illustrates the difference between a predicted camera position and an actual camera position.

The invention will be understood more fully from the detailed description that follows and from the accompanying drawings, which, however, should not be taken to limit the disclosed subject matter to the specific embodiments shown, but are for explanation and understanding only.

In the following description, specific details are set forth, such as device types, system configurations, and communication methods, in order to provide a thorough understanding of the invention. However, persons of ordinary skill in the relevant arts will appreciate that these specific details may not be needed to practice the embodiments described.

The assignee of the present application has developed an online video gaming and application hosting system. Certain embodiments of this system are described, for example, in U.S. Patent Application No. 12/538,077, filed August 7, 2009, entitled "System and Method for Accelerated Machine Switching" (hereinafter the "'077 application"), which claims priority to U.S. Provisional Application No. 61/210,888, filed March 23, 2009, and is a continuation-in-part (CIP) of Application No. 10/315,460, filed December 10, 2002, entitled "Apparatus and Method for Wireless Video Gaming," which is assigned to the assignee of the present CIP application. These applications are sometimes referred to as the "co-pending applications" and are incorporated herein by reference. A brief description of certain pertinent aspects of the online video game and application hosting system described in the co-pending applications will now be provided, followed by a detailed description of a visualization and encryption system and method for hosting applications.

An Exemplary Online Video Game and Application Hosting System

FIG. 1 illustrates one embodiment of a video game/application hosting service 210 described in the co-pending applications. The hosting service 210 hosts applications running on servers 102, which accept input from an input device 121 that is received by a home or office client 115 and sent through the Internet 110 to the hosting service 210. The servers 102 respond to the input and update their video and audio output accordingly, which is compressed through low-latency video compression 104. The compressed video is then streamed through the Internet 110 to be decompressed by the home or office client 115 and then displayed on a monitor or SD/HDTV 122. This system is a low-latency streaming interactive video system, as described more fully in the aforementioned co-pending applications.

As shown in FIG. 2, the network connection between the hosting service 210 and home and office clients 215 may be implemented through a wide range of network technologies of varying reliability, such as wired or optical fiber technologies, which are typically more reliable, and wireless technologies, which may be subject to unpredictable interference or range limitations (e.g., Wi-Fi) and are typically less reliable. Any of these client devices may have its own input devices (e.g., keyboards, buttons, touch screens, track pads or inertial-sensing wands, video capture cameras and/or motion-tracking cameras, etc.), or it may use external input devices 221 (e.g., keyboards, mice, game controllers, inertial-sensing wands, video capture cameras and/or motion-tracking cameras, etc.) connected by wire or wirelessly. As described in greater detail below, the hosting service 210 includes servers of various levels of performance, including servers with high-powered CPU/GPU processing capabilities.

During playing of a game or use of an application on the hosting service 210, a home or office client device 215 receives keyboard and/or controller input from the user and transmits that input through the Internet 206 to the hosting service 210, which executes the gaming program code in response and generates successive frames of video output (a sequence of video images) for the game or application software. For example, if the user presses a button that directs a character on the screen to move to the right, the game program then creates a sequence of video images showing the character moving to the right. This sequence of video images is then compressed using a low-latency video compressor, and the hosting service 210 transmits the low-latency video stream through the Internet 206. The home or office client device then decodes the compressed video stream and renders the decompressed video images on a monitor or TV. Consequently, the computing and graphics hardware requirements of the client device 215 are significantly reduced. The client 215 only needs the processing power to forward keyboard/controller input to the Internet 206 and to decode and decompress a compressed video stream received from the Internet 206, which virtually any personal computer today can do in software on its CPU (e.g., an Intel Corporation Core Duo CPU running at approximately 2 GHz is capable of decompressing 720p HDTV encoded using compressors such as H.264 and Windows Media VC9). For any client device, dedicated chips can also perform video decompression for such standards in real time at far lower cost and with far less power consumption than a general-purpose CPU such as would be required for a modern PC. Notably, to perform the functions of forwarding controller input and decompressing video, the home client device 215 does not require any specialized graphics processing unit (GPU), optical drive, or hard drive.

As games and application software become more complex and more photorealistic, they will require higher-performance CPUs, GPUs, more RAM, and larger and faster disk drives, and the computing power at the hosting service 210 may be continually upgraded. The end user, however, will not be required to update the home or office client platform 215, since its processing requirements will remain constant for a given display resolution and frame rate with a given video decompression algorithm. Thus, the hardware limitations and compatibility issues seen today do not exist in the illustrated system.

Further, because the game and application software executes only on servers in the hosting service 210, there is never a copy of the game or application software (either in the form of optical media or as downloaded software) in the user's home or office ("office," as used herein unless otherwise qualified, shall include any non-residential setting, including, for example, school dormitories). This significantly mitigates the likelihood of a game or application software being illegally copied (pirated), as well as the likelihood of a valuable database that might be used by a game or application software being pirated, misappropriated, or otherwise compromised. Indeed, if playing the game or application software requires specialized servers that are not practical for home or office use (e.g., requiring very expensive, large, or noisy equipment), then even if a pirated copy of the game or application software were obtained, it would not be operable in the home or office.

FIG. 3 illustrates one embodiment of the components of a server center for the hosting service 210 used in the following feature descriptions. As with the hosting service 210 illustrated in FIGS. 1 and 2, the components of this server center are controlled and coordinated by a hosting service 210 control system 101 unless otherwise qualified.

Inbound Internet traffic 301 from user clients 215 is directed to inbound routing 302. Typically, inbound Internet traffic 301 will enter the server center via a high-speed fiber optic connection to the Internet, but any network connection means of adequate bandwidth, reliability, and low latency will suffice. Inbound routing 302 is a system of network switches (the network can be implemented as an Ethernet network, a Fibre Channel network, or through any other transport means) and routing servers supporting the switches, which takes the arriving packets and routes each packet to the appropriate application/game ("app/game") server 321 to 325. In one embodiment, a packet delivered to a particular app/game server represents a subset of the data received from the client and/or may be forwarded/changed by other components (e.g., networking components such as gateways and routers) within the data center. In some cases, packets will be routed to more than one server 321 to 325 at a time, for example, if a game or application is running on multiple servers at once in parallel. RAID arrays 311 to 312 are connected to the inbound routing network 302 such that the app/game servers 321 to 325 can read from and write to the RAID arrays 311 to 312. Further, a RAID array 315 (which may be implemented as multiple RAID arrays) is also connected to the inbound routing 302, and data from the RAID array 315 can be read by the app/game servers 321 to 325. The inbound routing 302 may be implemented in a wide range of prior art network architectures, including a tree structure of switches with the inbound Internet traffic 301 at its root; a mesh structure interconnecting all of the various devices; or an interconnected series of subnets, with concentrated traffic among intercommunicating devices segregated from concentrated traffic among other devices. One type of network configuration is a SAN which, although typically used for storage devices, can also be used for general high-speed data transfer among devices. Also, the app/game servers 321 to 325 may each have multiple network connections to the inbound routing 302. For example, a server 321 to 325 may have one network connection to a subnet attached to RAID arrays 311 to 312 and another network connection to a subnet attached to other devices.

As previously described, the app/game servers 321 to 325 may all be configured the same, some differently, or all differently. In one embodiment, each user, when using the hosting service, typically uses at least one app/game server 321 to 325. For simplicity of explanation, it is assumed that a given user is using app/game server 321, but multiple servers may be used by one user, and multiple users may share a single app/game server 321 to 325. The user's control input, sent from the client 215 as previously described, is received as inbound Internet traffic 301 and is routed through inbound routing 302 to app/game server 321. App/game server 321 uses the user's control input as control input to the game or application running on the server, and computes the next frame of video and the audio associated with it. App/game server 321 then outputs the uncompressed video/audio 329 to shared video compression 330. The app/game server may output the uncompressed video by any means, including one or more Gigabit Ethernet connections, but in one embodiment the video is output via a DVI connection, and the audio and other compression and communication channel state information is output via a Universal Serial Bus (USB) connection.

The shared video compression 330 compresses the uncompressed video and audio from the app/game servers 321 to 325. The compression may be implemented entirely in hardware or in software running on hardware. There may be a dedicated compressor for each app/game server 321 to 325 or, if the compressors are fast enough, a given compressor can be used to compress the video/audio from more than one app/game server 321 to 325. For example, at 60 fps a video frame time is 16.67 ms. If a compressor is able to compress one frame in 1 ms, then that compressor could be used to compress the video/audio from as many as 16 app/game servers 321 to 325 by taking input from one server after another, with the compressor saving the state of each video/audio compression process and switching context as it cycles among the video/audio streams from the servers. This results in substantial cost savings in compression hardware. Since different servers will complete frames at different times, in one embodiment the compressor resources are in a shared pool 330 with shared storage means (e.g., RAM, flash memory) for storing the state of each compression process. When a server 321 to 325 frame is complete and ready to be compressed, a control means determines which compression resource is available at that time and provides that compression resource with the state of the server's compression process and the frame of uncompressed video/audio to compress.
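By way of illustration only, the frame-time arithmetic and the per-stream context switching described above may be sketched as follows. The names `max_streams` and `SharedCompressor` are hypothetical and do not appear in the co-pending applications, and `zlib` merely stands in for a low-latency video codec so that the sketch is runnable.

```python
import math
import zlib
from dataclasses import dataclass, field

FRAME_RATE_FPS = 60
FRAME_TIME_MS = 1000 / FRAME_RATE_FPS  # about 16.67 ms per frame at 60 fps


def max_streams(compress_time_ms: float,
                frame_time_ms: float = FRAME_TIME_MS) -> int:
    """Whole number of frames one compressor can finish within one frame time."""
    return math.floor(frame_time_ms / compress_time_ms)


@dataclass
class SharedCompressor:
    """One compressor cycling among several servers (hypothetical sketch)."""
    # Saved compression-process state, keyed by server id.
    states: dict = field(default_factory=dict)

    def compress(self, server_id: int, frame: bytes) -> bytes:
        # Restore (or create) the state for this server's stream.
        state = self.states.setdefault(server_id, {"frames_done": 0})
        state["frames_done"] += 1     # stand-in for real codec state
        return zlib.compress(frame)   # stand-in for a video codec


compressor = SharedCompressor()
for server_id in (321, 322, 321):     # frames arrive from servers in turn
    compressor.compress(server_id, b"\x00" * 1024)

print(max_streams(1.0))               # a 1 ms compressor at 60 fps
```

Keeping the saved state outside the compression routine is what would allow any free compressor in a shared pool to pick up any server's next frame.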

Note that part of the state for each server's compression process includes information about the compression itself, such as the previous frame's decompressed frame buffer data, which may be used as a reference for P tiles; the resolution of the video output; the quality of the compression; the tiling structure; the allocation of bits per tile; and the audio format (e.g., stereo, surround sound, Dolby® AC-3). But the compression process state also includes communication channel state information regarding the peak data rate, whether a previous frame is currently being output (and, as a result, the current frame should be ignored), and potentially whether there are channel characteristics that should be considered in the compression, such as excessive packet loss, which affect decisions for the compression (e.g., in terms of the frequency of I tiles, etc.). As the peak data rate or other channel characteristics change over time, as determined by an app/game server 321 to 325 supporting each user by monitoring data sent from the client 215, the app/game server 321 to 325 sends the relevant information to the shared hardware compression 330. These and other features of the hosting service 210 are described in detail in the co-pending applications.
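One possible way to group the state items listed above is sketched below. The class name, field names, types, and units are assumptions made for illustration; the description enumerates the kinds of information held but does not prescribe a layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CompressionState:
    """Per-server compression-process state (hypothetical layout)."""
    # Information about the compression itself.
    reference_frame: Optional[bytes]  # previous decompressed frame, for P tiles
    resolution: Tuple[int, int]       # video output width and height
    quality: int                      # compression quality setting
    tiling: Tuple[int, int]           # tiling structure (columns, rows)
    bits_per_tile: int                # bit allocation per tile
    audio_format: str                 # e.g. "stereo" or "surround"
    # Communication channel state.
    peak_data_rate_kbps: int
    previous_frame_in_flight: bool = False  # a prior frame is still being output
    excessive_packet_loss: bool = False     # may influence I-tile frequency


def should_skip_frame(state: CompressionState) -> bool:
    """The current frame is ignored while a previous frame is still being output."""
    return state.previous_frame_in_flight


state = CompressionState(None, (1280, 720), 50, (4, 4), 2000, "stereo", 5000)
state.previous_frame_in_flight = True  # channel is still busy with the prior frame
print(should_skip_frame(state))
```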

The shared hardware compression 330 also packetizes the compressed video/audio using means such as those previously described and, if appropriate, applies forward error correction (FEC) codes, duplicates certain data, or takes other steps to adequately ensure that the video/audio data stream can be received by the client 215 and decompressed with as high a quality and reliability as feasible.

Some applications, such as those described below, require the video/audio output of a given app/game server 321 to 325 to be available at multiple resolutions (or in other multiple formats) simultaneously. If the app/game server 321 to 325 so notifies the shared hardware compression 330 resource, then the uncompressed video/audio 329 of that app/game server 321 to 325 will be simultaneously compressed in different formats, at different resolutions, and/or in different packet/error correction structures. In some cases, some compression resources can be shared among multiple compression processes compressing the same video/audio (e.g., in many compression algorithms there is a step in which the image is scaled to multiple sizes before compression is applied; if images of different sizes are required to be output, this step can serve several compression processes at once). In other cases, separate compression resources will be required for each format. In any case, the compressed video/audio 339 of all of the various resolutions and formats required for a given app/game server 321 to 325 (be it one or many) will be output at once to outbound routing 340. In one embodiment the output of the compressed video/audio 339 is in UDP format, so it is a unidirectional stream of packets.

The outbound routing network 340 comprises a series of routing servers and switches which direct each compressed video/audio stream to the intended user(s) or other destinations through the outbound Internet traffic 339 interface (which typically would connect to a fiber interface to the Internet), and/or back to the delay buffer 315 (implemented as a RAID array in one embodiment), and/or back to the inbound routing 302, and/or out through a private network (not shown) for video distribution. Note that (as described below) the outbound routing 340 may output a given video/audio stream to multiple destinations at once. In one embodiment this is implemented using Internet Protocol (IP) multicast, in which a given UDP stream intended to be streamed to multiple destinations at once is broadcast, and the broadcast is repeated by the routing servers and switches in the outbound routing 340. The multiple destinations of the broadcast may be multiple users' clients via the Internet, multiple app/game servers 321 to 325 via inbound routing 302, and/or one or more delay buffers 315. Thus, the output of a given server 321 to 325 is compressed into one or multiple formats, and each compressed stream is directed to one or multiple destinations.

Further, in another embodiment, if multiple app/game servers 321 to 325 are used simultaneously by one user (e.g., in a parallel processing configuration to create the 3D output of a complex scene) and each server produces part of the resulting image, the video output of the multiple servers 321 to 325 can be combined by the shared hardware compression 330 into a combined frame, and from that point forward it is handled as described above as if it came from a single app/game server 321 to 325.

Note that in one embodiment, a copy (at least at the resolution of the video viewed by the user, or higher) of all video generated by app/game servers 321 to 325 is recorded in delay buffer 315 for at least some number of minutes (15 minutes in one embodiment). This allows each user to "rewind" the video from each session in order to review previous work or exploits (in the case of a game). Thus, in one embodiment, each compressed video/audio output 339 stream being routed to a user client 215 is also multicast to a delay buffer 315. When the video/audio is stored on a delay buffer 315, a directory on the delay buffer 315 provides a cross-reference between the network address of the app/game server 321 to 325 that is the source of the delayed video/audio and the location on the delay buffer 315 where the delayed video/audio can be found.
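A minimal in-memory sketch of such a delay buffer follows. It assumes a time-based retention window and a directory keyed by the source server's network address; the `DelayBuffer` class, its method names, and the example address are hypothetical, and an actual delay buffer 315 would be backed by a RAID array rather than process memory.

```python
from collections import deque


class DelayBuffer:
    """Keeps the most recent frames of each stream (hypothetical sketch)."""

    def __init__(self, retention_s: float = 15 * 60):
        self.retention_s = retention_s
        # Directory: source server address -> stored (timestamp, frame) pairs.
        self.directory = {}

    def store(self, server_addr: str, timestamp_s: float, frame: bytes) -> None:
        stream = self.directory.setdefault(server_addr, deque())
        stream.append((timestamp_s, frame))
        # Drop frames that have aged out of the retention window.
        while stream and stream[0][0] < timestamp_s - self.retention_s:
            stream.popleft()

    def rewind(self, server_addr: str, from_s: float) -> list:
        """Return the stored frames of one stream from time from_s onward."""
        stream = self.directory.get(server_addr, ())
        return [frame for t, frame in stream if t >= from_s]


buf = DelayBuffer(retention_s=10)      # short window to keep the example small
for t in (0, 5, 12):
    buf.store("10.0.0.21", t, f"frame@{t}".encode())
print(buf.rewind("10.0.0.21", 0))      # the frame stored at t=0 has aged out
```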

Graphics Processing in One Embodiment of an Online Game System

For low-latency applications such as video games, it is important that graphics operations proceed as efficiently as possible. However, attempts to speed up the graphics rendering process may result in undesirable visual artifacts such as "tearing," in which information from two or more different frames is shown on a display device in a single screen draw. The embodiments of the invention described below provide a variety of techniques for improving the efficiency of graphics rendering while at the same time reducing these undesirable visual artifacts.

As illustrated in FIG. 4, in one embodiment, each application/game server 321 is equipped with a central processing unit (CPU) 401 for executing video game program code 408 stored in memory 403 and a graphics processing unit (GPU) 402 for executing graphics commands to render the video game output 408. The architectures of the CPU and GPU are well known and, as such, a detailed description of these units and the instructions/commands executed by these units will not be provided herein. Briefly, the GPU is capable of processing a library of graphics commands as specified by one or more graphics application programming interfaces (APIs) such as OpenGL or Direct3D. The program code for executing these graphics APIs is represented in FIG. 4 as graphics engine 430. As the CPU processes the video game program code 408, it hands off graphics commands specified by the API to the GPU, which executes the commands and generates the video output 408. It should be noted, however, that the underlying principles of the invention are not limited to any particular graphics standard.

In one embodiment, the CPU and GPU are pipelined processors, meaning that a set of data processing stages are connected in series within the CPU and GPU, so that the output of one stage is the input of the next. For example, a CPU pipeline typically includes an instruction fetch stage, an instruction decode stage, an execution stage, and a retirement stage, each of which may have multiple sub-stages. A GPU pipeline may have many more stages including, by way of example and not limitation, transformation, vertex lighting, viewing transformation, primitive generation, projection transformation, clipping, viewport transformation, rasterization, texturing, fragment shading, and display. These pipeline stages are well understood by those of ordinary skill in the art and are not described in detail here. The elements of a pipeline are often executed in parallel or in a time-sliced fashion, and some amount of queuing storage is often required between the stages of the pipeline.

Each of the stages described above, and the queuing required between them, adds a certain amount of latency to the execution of graphics commands. The embodiments of the invention below provide techniques for minimizing this latency. Reducing latency is important because it expands the markets in which a device can be used. Moreover, the manufacturer of a device may have no control over significant sources of latency. For example, a user may attach a high-latency television to a video game console, or a multimedia device may be used remotely (e.g., online video games, a medical device controlled over the Internet, or military devices engaging targets on the front line while the operator remains safely behind the lines).

As illustrated in FIG. 4, one embodiment of the invention includes a back buffer 405 and a front buffer 406 for storing video game image frames generated by the graphics engine 430 as the user plays a video game. Each "frame" consists of a set of pixel data representing one screen image of the video game. In operation, each frame is created in the back buffer as graphics commands are executed using graphics data. When a frame has been completed in the back buffer, it is transferred to the front buffer 406, from which it is scanned out line by line to create the uncompressed video output 408. The scan-out process may occur at a predetermined standard frequency (e.g., 60 Hz or 120 Hz, as implemented on standard CRT or LCD monitors). The uncompressed video output 408 may then be compressed using the various advanced low-latency video compression techniques described in the co-pending applications. Of course, the frame buffer need not be scanned out of the video card (e.g., via a digital video interface (DVI)) as implied above. It may be transferred directly to the compression hardware over, for example, the application server's internal bus (e.g., a PCI Express bus). The frame buffer may be copied in memory by either the CPU or the GPU. The compression hardware may be, by way of example and not limitation, the CPU, the GPU, hardware installed in the server, and/or hardware on the GPU card.
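The back buffer and front buffer arrangement can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name `DoubleBuffer`, its methods, and the use of nested lists as pixel data are hypothetical and do not come from the specification.

```python
class DoubleBuffer:
    """Toy model of a back buffer (405) and a front buffer (406)."""

    def __init__(self, width, height):
        self.back = [[0] * width for _ in range(height)]
        self.front = [[0] * width for _ in range(height)]

    def draw(self, x, y, value):
        # Graphics commands only ever touch the back buffer.
        self.back[y][x] = value

    def present(self):
        # Transfer the completed frame to the front buffer in one step.
        self.front = [row[:] for row in self.back]

    def scan_out(self):
        # Emit the front buffer line by line, as a scan-out would.
        return [tuple(row) for row in self.front]


buffers = DoubleBuffer(width=2, height=2)
buffers.draw(0, 0, 255)
before = buffers.scan_out()   # the frame in progress is not yet visible
buffers.present()
after = buffers.scan_out()    # the completed frame is now visible
```

In this toy model the scan-out reads only the front buffer, so a partially drawn frame never reaches the output.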

FIG. 5 shows an asynchronous pipeline in which queues (Q12, Q23, Q34) between the processing stages (P1, P2, P3, P4) hold the data produced by the previous stage until it is consumed by the next stage. In one embodiment of the invention, the various stages described herein are stages within the GPU 402. The latency of such a pipeline is the sum of the time the data spends being transformed in each stage (Tp1, Tp2, Tp3) plus the time the data spends sitting in each queue (Tq1, Tq2, Tq3).

The obvious first step in minimizing latency is to minimize the queues or even get rid of them entirely. One common way to do this is to synchronize the pipeline stages as in FIG. 6. Every stage operates simultaneously on a different set of data. When all stages are ready, they all pass their data to the next stage in the pipeline. Queuing becomes trivial and is no longer shown in the figures. The latency of a synchronized pipeline is the number of stages multiplied by the time the slowest stage takes to complete.
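The two latency rules stated for the asynchronous pipeline of FIG. 5 and the synchronized pipeline of FIG. 6 can be written out as a short Python sketch. The function names and the millisecond figures are hypothetical and chosen only for illustration.

```python
def async_latency(stage_times, queue_times):
    """Latency of an asynchronous pipeline: time in stages plus time in queues."""
    return sum(stage_times) + sum(queue_times)


def sync_latency(stage_times):
    """Latency of a synchronized pipeline: every stage waits for the slowest."""
    return len(stage_times) * max(stage_times)


# Hypothetical per-stage times in milliseconds for P1..P4 (P4 is slowest).
stage_ms = [4, 6, 5, 16]
# Hypothetical waits in Q12, Q23, Q34, also in milliseconds.
queue_ms = [15, 20, 14]

print(async_latency(stage_ms, queue_ms))  # 80
print(sync_latency(stage_ms))             # 64
```

With these assumed numbers the synchronized pipeline has the lower latency, but every stage is still paced by the slowest one.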

This slowest stage in the pipeline is the bottleneck, shown as P4 in all of the figures. This stage is often a fixed feature of the device over which a designer has no control. FIG. 7 shows the data flow downstream of the bottleneck stage. Note that no queuing or synchronization is needed. The latency is the sum of the time each stage takes to complete. Latency cannot be lower than this sum.

This suggests a method for minimizing the latency of the pipeline stages upstream of the bottleneck, as shown in FIG. 8. If the first pipeline stage knows exactly how long every pipeline stage will take and when the bottleneck stage will request new data, it can predict when to begin producing new data so that the data is ready just in time for the bottleneck stage. Accordingly, in one embodiment, the first pipeline stage may throttle down its clock to slow data processing based on when the bottleneck stage will need the new data. This technique may be referred to as a phase-locked pipeline. The total latency is the sum of the times for each pipeline stage.
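The prediction described above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the bottleneck requests data at a fixed period and that the upstream stage times are known exactly. The function name and parameters are hypothetical, and the sketch computes a start time rather than modelling clock throttling.

```python
import math


def phase_locked_start(now, next_request, period, upstream_times):
    """Latest time >= now at which the first stage can start so that its data
    reaches the bottleneck exactly when the bottleneck requests it."""
    lead = sum(upstream_times)          # time to traverse the upstream stages
    start = next_request - lead
    if start < now:
        # Too late for that request: aim for a later one instead.
        periods_late = math.ceil((now - start) / period)
        start += periods_late * period
    return start


# Hypothetical numbers in milliseconds: the bottleneck asks for data at t=16
# and every 16 ms after that; the upstream stages take 2, 3 and 4 ms.
print(phase_locked_start(now=0, next_request=16, period=16,
                         upstream_times=[2, 3, 4]))
```

Starting any earlier than the returned time would leave finished data waiting in a queue; starting later would miss the request.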

FIG. 9 illustrates another embodiment, in which the bottleneck stage is artificially moved to the first pipeline stage by slowing the first pipeline stage down to be slightly slower than the actual bottleneck stage. The box labeled 5 in P1 starts after box 3 in P4. Box 4 in P1 should likewise start slightly after the top of box 2 in P4. This is common practice in video games where the bottleneck stage is the physical connection between the computer and the monitor. One drawback of FIG. 9 is that there must be some latency-inducing queuing (not shown) between stages P3 and P4. Another drawback is that the latency experienced by the user may drift over time, decreasing steadily and then suddenly increasing, only to begin decreasing again. It may also result in dropped frames. Developers often minimize dropped frames by driving the first stage at a rate as close to the bottleneck rate as possible. However, this rate is often not known exactly. If the first stage is driven even slightly faster than the bottleneck rate, the queues in the system will fill and stall the upstream stages. Ironically, attempting to minimize latency using this method risks maximizing it.
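The drifting latency described for FIG. 9 can be reproduced with a deliberately simplified model, sketched here in Python. It assumes that the bottleneck picks up one frame at every multiple of a fixed period and that the first stage delivers each frame a constant amount later, relative to that period, than the previous one. The function name and the numbers are hypothetical.

```python
def queue_wait_drift(initial_wait, slowdown, period, frames):
    """Queue wait before the bottleneck for each successive frame.

    The first stage takes `slowdown` time units longer per frame than the
    bottleneck period, so each frame arrives that much closer to its pickup
    time, until it misses the pickup and waits almost a full period.
    """
    return [(initial_wait - k * slowdown) % period for k in range(frames)]


# Hypothetical numbers: 1000-unit bottleneck period, first stage 100 units
# slower per frame, first frame waits 250 units in the queue.
print(queue_wait_drift(initial_wait=250, slowdown=100, period=1000, frames=5))
# [250, 150, 50, 950, 850]
```

The wait falls steadily, jumps when a pickup is missed, and then starts falling again, which is the sawtooth behaviour the paragraph above describes.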

In one embodiment of the invention, shown in FIG. 10, the first stage is limited to the same rate as the bottleneck stage. The tops of the numbered boxes in P1 should be the same distance apart as the tops of the boxes in P4. The rate at which P1 produces frames exactly matches the rate at which P4 consumes them. Feedback must be provided from the bottleneck stage to the first stage to ensure that the rates match exactly. Every stage provides feedback including, but not limited to, the time required to operate on the data and the time spent queued. The phase-locking component maintains statistical information on each stage and can accurately predict, with a predetermined confidence level, that the data will be ready when the bottleneck stage requires it, with a minimum amount of queuing. Note that a universal clock is not necessary in this embodiment. The phase-locking component requires only relative times. As such, the pipeline stages may use different clocks. In fact, the clocks may be in separate physical devices that could potentially be thousands of miles apart. In summary, in this embodiment of the invention, a bottleneck phase is identified based on timing constraints. Feedback is then provided from the bottleneck phase to the upstream stages to allow the upstream stages to match the bottleneck phase precisely. The phase of the upstream stages is adjusted to minimize the time wasted in queues.
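One way a phase-locking component might turn per-stage feedback into a start-time prediction is sketched below in Python. The specification states only that statistics are kept and that a predetermined confidence level is used. The class name `PhaseLock`, the mean-plus-margin rule, and the assumption that stage timings vary independently are all illustrative choices, not the patented method.

```python
import statistics


class PhaseLock:
    """Collects relative timings reported by each stage (no shared clock)."""

    def __init__(self, margin_sigmas=2.0):
        self.samples = {}                   # stage name -> list of durations
        self.margin_sigmas = margin_sigmas  # assumed stand-in for confidence

    def report(self, stage, work_time, queue_time):
        # Feedback from a stage: time spent operating on data and time queued.
        self.samples.setdefault(stage, []).append(work_time + queue_time)

    def lead_time(self):
        # How far ahead of the bottleneck's request the first stage should
        # start: the expected total plus a safety margin for variation.
        mean_total = sum(statistics.fmean(s) for s in self.samples.values())
        var_total = sum(statistics.pvariance(s) for s in self.samples.values())
        return mean_total + self.margin_sigmas * var_total ** 0.5


lock = PhaseLock(margin_sigmas=2.0)
lock.report("P1", work_time=2.0, queue_time=0.0)
lock.report("P1", work_time=3.0, queue_time=1.0)
lock.report("P2", work_time=5.0, queue_time=0.0)
print(lock.lead_time())
```

Because only durations are reported, each stage can measure them with its own clock, which matches the observation that no universal clock is needed.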

The preceding figures illustrate lightweight applications. These are inefficient because the hardware sits idle most of the time. One embodiment of the invention, which forms a less expensive design, dedicates the minimum hardware resources to each stage while still guaranteeing that each stage is faster than the bottleneck stage, as illustrated in FIG. 11. In this case, the phase-locking method gains very little over a fully synchronized pipeline as in FIG. 6. Another example is computer games that render more polygons with higher-resolution textures, more anti-aliasing, and more special effects until the frame rate starts to drop.

This embodiment leads directly to another embodiment of the invention in which advanced graphics are implemented using minimum hardware, but with low latency. In this embodiment, the video stream is subdivided into two logical parts which may be processed independently: (a) a resource-light, latency-critical part, and (b) a resource-heavy, latency-tolerant part. These two parts can be combined in a hybrid system as illustrated in FIG. 12. One specific example (of many possible) is a computer game known as a "first-person shooter", in which a user navigates around from the perspective of a game character in a 3-dimensional world. With this type of game, rendering the background and non-player characters is resource-heavy and latency-tolerant (denoted in FIG. 12 with a "b" for "background"), while rendering the image of the player's character is made resource-light and latency-intolerant (denoted in FIG. 12 with an "a" for "avatar"), because anything less than very low latency performance will result in an undesirable user experience. When the user pulls the trigger, he expects to see his weapon fire immediately. In the specific embodiment illustrated, the game is implemented on a personal computer with a central processing unit (CPU) as stage P1 and a graphics processing unit (GPU) as stage P2. The monitor, represented as P3, is the bottleneck stage. "Monitor", in this case, means any device that consumes the uncompressed video stream. It could be the compression hardware.

In this embodiment, the CPU completes its work on the background image, represented by 3b, before completing its work on the avatar image, represented by 2a. Nonetheless, to reduce the latency associated with the avatar, the GPU processes 2a ahead of 3b, rendering the avatar 2a on a previously rendered background 2b (to render the motion of the avatar as efficiently as possible), outputting that frame, and then immediately beginning to render the background of the next frame, represented by 3b. The GPU may sit idle for a short time waiting for data from the CPU to complete the next frame. In this embodiment, the CPU sits idle waiting for the phase lock to signal that it is time to make a list of drawing commands for the user's avatar and pass it on to the GPU. The CPU then immediately begins to draw the background of a new frame, but it cannot be the next frame because the GPU will start drawing the next frame. There is no way the CPU will have the next frame ready in time. Therefore, the CPU must start drawing the background for the frame after the next. This situation is similar to the operation of a synchronized pipeline as illustrated in FIG. 6.
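The ordering of work described above can be spelled out as a short Python sketch that lists what the CPU and GPU do for consecutive frames. The labels follow the "a" (avatar) and "b" (background) convention of FIG. 12. The function name and the `out` label for outputting a frame are hypothetical.

```python
def hybrid_schedule(first_frame, count):
    """Return the CPU and GPU work order for `count` frames.

    For frame n the CPU builds the avatar command list (na) and then starts
    the background two frames ahead ((n+2)b). The GPU renders avatar na over
    the already rendered background nb, outputs frame n, and then starts the
    next background ((n+1)b).
    """
    cpu, gpu = [], []
    for n in range(first_frame, first_frame + count):
        cpu += [f"{n}a", f"{n + 2}b"]
        gpu += [f"{n}a", f"out{n}", f"{n + 1}b"]
    return cpu, gpu


cpu_order, gpu_order = hybrid_schedule(first_frame=2, count=2)
print(cpu_order)  # ['2a', '4b', '3a', '5b']
print(gpu_order)  # ['2a', 'out2', '3b', '3a', 'out3', '4b']
```

The CPU list shows why the background is always prepared for the frame after the next: by the time the CPU finishes avatar 2a, the GPU is about to start background 3b, so the CPU moves on to 4b.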

This one-frame phase difference between the avatar and the background is, in most cases, acceptable to the user. However, in cases where the highest possible quality is desired, the following additional techniques may be employed. The high-latency path predicts the inputs in order to generate its data. In the first-person shooter example, the location of the camera is predicted ahead of time. When the outputs of the high-latency and low-latency paths are combined, the output of the high-latency path (e.g., the background) is modified to more closely match what would have been generated using the actual inputs instead of the predicted inputs. In the first-person shooter example, the background would be translated, scaled, and/or rotated to match the actual camera position. Note that this implies that the high-latency path must render an area somewhat larger than what is actually viewed by the player, as illustrated in FIG. 13, which shows an actual camera position 1301, a predicted camera position 1302, an actual background 1303, and a rendered background 1304. Thus, if a user is playing a game in which a character is running at a tree, the tree gets a little closer, and therefore a little bigger, every frame. The user shoots a gun at the tree and hits it. In the hybrid scenario, the tree lags the shot by one frame, so things might look "wrong" for a frame (i.e., the shot will look like it missed). To compensate, the described embodiments of the invention enlarge the tree to approximate what it would look like in the frame in which the shot was fired.

As another example, when a user is playing a first-person shooter video game and pushes the fire button, the user wants to see sparks fly from the gun immediately. Thus, in one embodiment, the program draws the firing gun on top of a previously rendered background, and the game times the drawing so that it finishes just in time to be picked up by the next stage in the pipeline (which is the DVI output (vsync), the encoder input, or some other bottleneck). The game then draws its best guess at what the background should be for the next frame. If the guess is poor, one embodiment modifies the background to more closely match what it would have been had it been rendered from the correct camera position. Thus, the technique shown in FIG. 13 is a simple affine warp. More sophisticated techniques employed in other embodiments use the z-buffer to do a better job.
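A simple affine warp of the kind mentioned above (translate, scale, and/or rotate) can be sketched in plain Python. How the correction parameters are derived from the predicted and actual camera positions is not given in the text, so the values below are hypothetical, as are the function names.

```python
import math


def affine_matrix(scale, angle_rad, tx, ty):
    """2x3 matrix that scales, rotates, then translates a 2-D point."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[scale * c, -scale * s, tx],
            [scale * s,  scale * c, ty]]


def warp_point(matrix, x, y):
    """Apply the affine matrix to one background pixel coordinate."""
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


# Hypothetical correction: the actual camera is closer than predicted and
# slightly to the right, so the rendered background is enlarged and shifted.
correction = affine_matrix(scale=1.05, angle_rad=0.0, tx=-3.0, ty=0.0)
print(warp_point(correction, 100.0, 50.0))
```

Because the warp can shift or enlarge the image, the rendered background has to cover more than the visible area, which is why FIG. 13 shows a rendered background larger than the actual one.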

In one embodiment, the various functional modules illustrated herein and the associated steps may be performed by specific hardware components that contain hardwired logic for performing the steps, such as an application-specific integrated circuit ("ASIC"), or by any combination of programmed computer components and custom hardware components.

In one embodiment, the modules may be implemented on a programmable digital signal processor ("DSP") such as a Texas Instruments TMS320x architecture (e.g., a TMS320C6000, TMS320C5000, ..., etc.). Various different DSPs may be used while still complying with these underlying principles.

Embodiments may include various steps as set forth above. The steps may be embodied in machine-executable instructions which cause a general-purpose or special-purpose processor to perform certain steps. Various elements which are not relevant to these underlying principles, such as computer memory, hard drives, and input devices, have been left out of some or all of the figures to avoid obscuring the pertinent aspects.

Elements of the disclosed subject matter may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, flash memory, optical disks, CD-ROMs, DVD-ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media, or other types of machine-readable media suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).

It should also be understood that elements of the disclosed subject matter may be provided as a computer program product which may include a machine-readable medium having stored thereon instructions which may be used to program a computer (e.g., a processor or other electronic device) to perform a sequence of operations. Alternatively, the operations may be performed by a combination of hardware and software. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media, or other types of media/machine-readable media suitable for storing electronic instructions. For example, elements of the disclosed subject matter may be downloaded as a computer program product, wherein the program may be transferred from a remote computer or electronic device to a requesting process by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).

Additionally, although the disclosed subject matter has been described in conjunction with specific embodiments, numerous modifications and alterations are well within the scope of the present invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A computer-implemented method for efficiently processing a video stream with a processor pipeline having a plurality of pipeline stages, comprising: identifying a bottleneck stage within the processor pipeline, the bottleneck stage having a first clock and processing frames of the video stream; receiving, at one or more upstream stages, a feedback signal from the bottleneck stage, at least one of the upstream stages having a second clock, the feedback signal including information about the time required by the bottleneck stage to operate on data and information about the time the data spends queued; and responsively adjusting the speed at which the one or more upstream stages process frames of the video stream to approximate the speed at which the bottleneck stage processes the frames of the video stream, wherein the speed is adjusted at least in part by modifying a frequency of the second clock, wherein the video stream is generated by program code of a video game being played by a user, wherein the video game is executed on a host server, wherein the user plays the video game from a client computer, and wherein the pipeline stages are stages within the host server.

2. The method of claim 1, wherein the processor pipeline includes one or more stages within a central processing unit (CPU) and one or more stages within a graphics processing unit (GPU).

3. A method for processing a video stream, comprising the following operations: identifying a bottleneck stage within a processor pipeline, the processor pipeline having a plurality of pipeline stages, wherein the bottleneck stage has a first clock and processes frames of the video stream; receiving, at one or more upstream stages, a feedback signal from the bottleneck stage, wherein at least one of the upstream stages has a second clock, and wherein the feedback signal includes information about the time required by the bottleneck stage to operate on data and information about the time the data spends queued; and responsively adjusting the speed at which the one or more upstream stages process frames of the video stream to approximate the speed at which the bottleneck stage processes the frames of the video stream, wherein the speed is adjusted at least in part by modifying a frequency of the second clock, wherein each of the operations is performed by hardware components containing hardwired logic, wherein the video stream is generated by program code of a video game being played by a user, wherein the video game is executed on a host server, wherein the user plays the video game from a client computer, and wherein the pipeline stages are stages within the host server.

4. The method of claim 3, wherein the processor pipeline includes one or more stages within a central processing unit (CPU) and one or more stages within a graphics processing unit (GPU).