WO2012173650A1 - Processing of graphics data of a server system for transmission - Google Patents

Processing of graphics data of a server system for transmission

Info

Publication number
WO2012173650A1
WO2012173650A1 · PCT/US2011/064992
Authority
WO
WIPO (PCT)
Prior art keywords
data
graphics
server system
buffer
client
Prior art date
Application number
PCT/US2011/064992
Other languages
French (fr)
Inventor
Satyaki Koneru
Ke YIN
Dinakar Munagala
Original Assignee
Thinci, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thinci, Inc. filed Critical Thinci, Inc.
Priority to KR1020147001108A priority Critical patent/KR101898565B1/en
Priority to GB1322404.3A priority patent/GB2510056B/en
Publication of WO2012173650A1 publication Critical patent/WO2012173650A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14Digital output to display device ; Cooperation and interconnection of the display device with other functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/156Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23614Multiplexing of additional data and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/426Internal components of the client ; Characteristics thereof
    • H04N21/42653Internal components of the client ; Characteristics thereof for processing graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4348Demultiplexing of additional data and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8146Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer

Definitions

  • the described embodiments relate generally to transmission of graphics data. More particularly, the described embodiments relate to methods, apparatuses and systems for processing of graphics data on a server system for transmission to a client system and processing on a client system.
  • Centralized computing places most of the resources of a system in a central location. These resources generally include a centralized server that includes a central processing unit (CPU), memory, storage and support for networking. Applications run on the centralized server and the results are transferred to one or more clients.
  • CPU central processing unit
  • A video compression scheme is well suited to remote processing of graphics for thin-client applications because the content of the frame buffer changes incrementally. Although a video compression scheme can adapt to instantaneous network bandwidth availability, it is computationally intensive and places an additional burden on server resources. With a video compression scheme, image quality is compromised and additional latency is introduced due to the compression phase.
  • One embodiment includes a method of selecting graphics data of a server system for transmission.
  • the method includes reading data from graphics memory of the server system.
  • the data read from the graphics memory is placed in a transmit buffer if the data is being read for the first time, and was not written by a processor of the server system during graphics rendering.
  • Another embodiment includes a system for selecting graphics data for transmission.
  • the system includes a server system comprising graphics memory, a frame buffer and a processor.
  • the server system is operable to read data from the graphics memory.
  • the server system is operable to place the data in a transmit buffer if the data is being read for the first time, and was not written by the processor during graphics rendering.
  • Figure 1 shows a block diagram of an embodiment of a server and client systems.
  • Figure 2 is a flow chart that includes the steps of an example of a method selecting graphics data for transmission from the server to the client.
  • Figure 3 is a flow chart that includes the steps of an example of a method placing data in a transmit buffer.
  • Figure 4 is a flow chart that includes steps of an example of a method of selecting graphics data of a server system for transmission.
  • Figure 5 shows an example of setting and resetting of status-bits that are used for determining whether to place data in the transmit buffer.
  • Figure 6 is a flow chart that includes steps of a method of operating a client system.
  • Figure 7 shows a block diagram of an embodiment of a server system and a client system.
  • Figure 8 shows a block diagram of a hardware assisted memory virtualization in a graphics system.
  • Figure 9 shows a block diagram of hardware virtualization in a graphics system.
  • Figure 10 shows a block diagram of fast context switching in a graphics system.
  • Figure 11 shows a block diagram of scalar/vector adaptive execution in a graphics system.
  • Figure 12 shows a flowchart of a smart pre-fetch/pre-decode technique in a graphics system.
  • Figure 13 shows a diagram of motion estimation for video encoding in a video processing system.
  • Figure 14 shows a diagram of tap filtering for video post-processing in a video processing system.
  • FIG. 15 shows a flowchart of a Single Instruction Multiple Data (SIMD) branch technique.
  • SIMD Single Instruction Multiple Data
  • Figure 16 shows a flowchart of programmable output merger implementation in a graphics system.
  • processor refers to a device that processes graphics, which includes, but is not limited to, any one of or all of a graphics processing unit (GPU), central processing unit (CPU), Accelerated Processing Unit (APU) and Digital Signal Processor (DSP).
  • graphics processing unit GPU
  • CPU central processing unit
  • APU Accelerated Processing Unit
  • DSP Digital Signal Processor
  • graphics stream refers to uncompressed data which is a subset of graphics and command data.
  • video stream refers to compressed frame buffer data.
  • FIG. 1 shows a block diagram of an embodiment of a graphics server- client co-processing system.
  • the system consists of server system 110 and client system 140.
  • This embodiment of server system 110 includes graphics memory 112, central processing unit (CPU) 116, graphics processing unit (GPU) 120, graphics stream 124, video stream 128, mux 130, control 132 and link 134.
  • This embodiment of the client system 140 includes client graphics memory 142, CPU 144, and GPU 148.
  • graphics memory 112 includes command and graphics data 114, frame buffer 118, transmit buffer 122, and compressed frame buffer 126.
  • graphics memory 112 resides in server system 110. In another embodiment, graphics memory 112 may not reside in server system 110. The server system processes graphics data and manages data for transmission to the client system. Graphics memory 112 may be any one of or all of Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, content addressable memory or any other type of memory.
  • graphics memory 112 is a DRAM storing graphics data.
  • a block of data that is read or written to memory is referred to as a cache-line.
  • the status of the cache-line of command and graphics data 114 is stored in graphics memory 112. In another embodiment, the status can be stored in a separate memory.
  • status-bits refer to a set of one or more status bits of memory used to store the status of a cache-line or a subset of the cache-line.
  • a cache-line can have one or more sets of status-bits.
  • graphics memory 112 is located in the system memory (not shown in Figure 1). In another embodiment, graphics memory 112 may be in a separate dedicated video memory. A graphics application running on the CPU loads graphics data into system memory. For the described embodiments, graphics data includes at least index buffers, vertex buffers and textures.
  • the graphics driver of GPU 120 translates graphics Application Programming Interface (API) calls made by, for example, a graphics application into command data.
  • graphics API refers to an industry standard API such as OpenGL or DirectX.
  • the graphics and command data is placed in graphics memory either by copying or remapping. Typically, the graphics data is large and generally not practical to transmit to client systems as is.
  • GPU 120 processes command and data in command and graphics data 114 and selectively places data either in frame buffer 118 at the end of graphics rendering or in transmit buffer 122 during graphics rendering.
  • GPU 120 is a specialized processor for manipulating and displaying graphics.
  • GPU 120 supports 2D, 3D graphics and/or video.
  • GPU 120 manages generation of compressed data for placement in the compressed frame buffer 126 and a subset of uncompressed graphics and command data is placed in transmit buffer 122.
  • the data from transmit buffer contains graphics data and is referred to as graphics stream 124.
  • Transmit buffer 122 is populated with a selected subset of command and graphics data 114 during graphics rendering.
  • the selected subset of data from command and graphics data 114 is such that the results obtained by the client system by processing the subset of data can be identical or almost identical to processing the entire contents of command and graphics data 114.
  • the process of selecting a subset of data from command and graphics data 114 to fill transmit buffer 122 is discussed further in conjunction with Figure 2.
  • GPU 120 fills transmit buffer 122.
  • the contents of the transmit buffer include at least command data or graphics API command calls along with graphics data.
  • the allocated size of transmit buffer 122 is adaptively determined by the maximum available bandwidth on the link. For example, the size of the transmit buffer can dynamically change over time as the bandwidth of the link between the server system and the client system varies.
  • GPU 120 is responsible for rendering graphics into frame buffer 118 and for generating compressed frame buffer 126.
  • compressed frame buffer 126 is generated if the client does not have the capabilities or the bandwidth is not sufficient to transmit the graphics stream.
  • the compressed frame buffer is generated by encoding the contents of frame buffer 118 using industry standard compression techniques, for example MPEG2 and MPEG4.
  • Graphics stream 124 includes at least uncompressed graphics data and a header with at least data type information. Graphics stream 124 is generated during graphics rendering and may be available while the transmit buffer has data.
  • Video stream 128 includes at least compressed video data and a header conveying the information required for interpreting the data type for decompression. Video stream 128 can be available as and when compressed frame buffer 126 is generated.
  • Mux 130 illustrates a selection between graphics stream 124 generated by data from the transmit buffer 122 and video stream 128 generated by data from compressed frame buffer 126.
  • the selection by mux 130 is done on a frame-by-frame basis and is controlled by control 132, which at least in some embodiments is generated by GPU 120.
  • a frame is the interval of processing time for generating a frame-buffer for display.
  • control 132 is generated by CPU and/or GPU.
  • control 132 depends at least in part upon the bandwidth of link 134 between the server system 110 and the client system 140, and/or the processing capabilities of client system 140.
  • Mux 130 selects between the graphics stream and the video stream; the selection can occur once per clock cycle, which is typically less than a frame.
  • the data transmitted on link 134 consists of data from compressed frame buffer and/or transmit buffer.
  • link 134 is a dedicated Wide Area Graphics Network (WAGN) / Local Area Graphics Network (LAGN) to transmit graphics/video stream from server system 110 to client system 140.
  • a hybrid Transmission Control Protocol (TCP)-User Datagram Protocol (UDP) may be implemented to provide an optimal combination of speed and reliability.
  • TCP protocol is used to transmit the command /control packets and the UDP protocol is used to transfer the data packets.
  • for example, the command/control packets can be the previously described command data, and the data packets can be the graphics data.
  • the client system receives data from the server system and manages the received data for user display.
  • client system 140 includes at least client graphics memory 142, CPU 144, and GPU 148.
  • Client graphics memory 142 which includes at least a frame buffer may be a Dynamic Random Access memory (DRAM), Static Random Access Memory (SRAM), flash memory, content addressable memory or any other type of memory.
  • client graphics memory 142 is a DRAM storing command and graphics data.
  • the graphics/video stream received from server system 110 via link 134 is a frame of data and is processed using standard graphics rendering or video processing techniques to generate the frame buffer for display.
  • the received frame includes at least a header and data.
  • the GPU reads the header to detect the data type, which can include at least uncompressed graphics stream or compressed video stream, to process the data. The method of handling the received data is discussed in conjunction with Figure 6.
  • FIG. 2 is a flow chart of method 200 that includes the steps of an example of a method of selecting graphics data for transmission from the server to the client.
  • step 210 command data buffer generation takes place.
  • step 210 the graphics software application commands are compiled by the GPU software driver and translated into command data in system memory. This step also involves the process of loading the system memory with graphics data.
  • step 220 command and graphics data buffer is allocated.
  • a portion of free or unused graphics memory 112 is defined as command and graphics data 114 based on the requirement. The command and graphics data in system memory is copied to graphics memory 112 if the graphics memory is a dedicated video memory, or remapped/copied to graphics memory 112 if the graphics memory is part of system memory.
  • graphics data is rendered on server system 110.
  • Graphics data in server system 110 read from command and graphics data 114 is rendered by GPU 120.
  • graphics rendering or 3D rendering is the process of producing a two-dimensional image based on three-dimensional scene data.
  • Graphics rendering involves processing of polygons and generating the contents of frame buffer 118 for display.
  • Polygons such as triangles, lines & points have attributes associated with the vertices which are stored in vertex buffer/s and determine how the polygons are processed.
  • the position coordinates undergo linear (scaling, rotation, translation etc.) and viewing (world and view space) transformation.
  • the polygons are rasterized to determine the pixels enclosed within. Texturing is a technique to apply/paste texture images onto these pixels.
  • the pixel color values are written to frame buffer 118.
  • Step 240 involves checking the client system capabilities to decide the compression technique.
  • the size and bandwidth of client graphics memory 142, graphics API support in the client system, the performance of GPU 148 and the decompression capabilities of client system 140 constitute the client system capabilities.
  • transmit buffer is generated.
  • step 260 the contents of transmit buffer 122 are generated during graphics rendering. Data is written into transmit buffer 122 as and when data is rendered. A subset of graphics and command data is identified and unique instances of data are selected for placement in transmit buffer 122, as discussed in conjunction with Figure 3. The data from the transmit buffer is referred to as graphics stream 124.
  • step 270 method 200 checks for at least the bandwidth of link 134 connecting server system 110 and client system 140. If sufficient bandwidth is available, graphics stream 124 is transmitted in step 290.
  • compressed frame buffer 126 is generated.
  • compressed frame buffer is generated by encoding the contents of frame buffer 118 using MPEG2, MPEG4 or any other compression techniques. The selection of compression technique is determined by the client capabilities. After graphics rendering is complete, the compressed frame buffer is filled during compression of frame buffer 118.
  • compressed frame buffer is transmitted.
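To make the Figure 2 decision flow concrete, the following Python sketch mirrors steps 240 through 290. It is an illustration only, not the patent's implementation: the capability flag, the 60 Hz frame-rate assumption and all names are hypothetical.

```python
# Hypothetical sketch of the per-frame stream selection of Figure 2.
def select_stream_for_frame(client_caps, link_bandwidth_bps, graphics_stream_bits):
    # Step 240: the client must be able to render uncompressed graphics data.
    client_can_render = "graphics_api" in client_caps
    # Step 270: the graphics stream must fit the link within one frame
    # interval (a 60 Hz frame rate is assumed for this sketch).
    bandwidth_sufficient = graphics_stream_bits * 60 <= link_bandwidth_bps
    if client_can_render and bandwidth_sufficient:
        return "graphics_stream"  # step 290: transmit the transmit buffer
    return "video_stream"         # steps 250/280: compress frame buffer, transmit

print(select_stream_for_frame({"graphics_api"}, 1_000_000_000, 2_000_000))  # graphics_stream
print(select_stream_for_frame(set(), 1_000_000_000, 2_000_000))             # video_stream
```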
  • Figure 3 is a flow chart of method 300 that includes the steps of an example of a method placing data in a transmit buffer 122.
  • step 310 a cache-line or a block of data is read from command and graphics data 114 or frame buffer 118 during graphics rendering by the server system.
  • step 320 the cache-line is checked for being read for the first time to determine if the data in the cache-line is new. If the data has been read earlier, the data is already available on client system 140 or present in transmit buffer 122; the cache-line is not processed further and method 300 returns to step 310. If the cache-line is being read for the first time, the client system does not have the data and it is not present in transmit buffer 122, so method 300 proceeds to step 330. In step 330, the cache-line of command and graphics data 114 or frame buffer 118 is checked to determine whether the data in the cache-line was written during graphics rendering by a processor.
  • step 340 the cache-line is placed in transmit buffer 122.
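A minimal Python sketch of the Figure 3 filter follows, assuming hypothetical field names for the two per-cache-line conditions; it illustrates the selection rule rather than reproducing the patent's hardware mechanism.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    data: bytes
    read_before: bool = False           # step 320: has the client seen this line?
    written_by_processor: bool = False  # step 330: intermediate rendering result?

def maybe_transmit(line: CacheLine, transmit_buffer: list) -> None:
    if line.read_before:             # step 320: client already has this data
        return
    if line.written_by_processor:    # step 330: written during rendering, skip
        return
    transmit_buffer.append(line.data)  # step 340: place in transmit buffer
    line.read_before = True

buf: list = []
src = CacheLine(b"vertex data")
maybe_transmit(src, buf)   # placed: first read, not processor-written
maybe_transmit(src, buf)   # skipped: already read
print(buf)                 # [b'vertex data']
```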
  • Figure 4 is a flow chart that includes steps of an example of a method of selecting graphics data of a server system for transmission.
  • a first step 410 includes reading data from graphics memory of the server system.
  • a second step 420 includes placing the data in a transmit buffer if the data is being read for the first time, and was not written during graphics rendering by a processor of the server system.
  • a third step 430 includes transmitting the data of the transmit buffer to a client system.
  • the processor is a CPU and/or a GPU.
  • the server system includes a central processing unit
  • CPU central processing unit
  • GPU graphics processing unit
  • the GPU controls compression and placement of data of a frame buffer into a compressed frame buffer.
  • the GPU controls selection of either compressed data of the compressed frame buffer or uncompressed data of the transmit buffer for transmission to the client system.
  • Checking a first status-bit determines whether the data is being read for the first time.
  • the first status-bit is set when the data is placed in the transmit buffer and not yet transmitted.
  • the data being read can be a cache-line which is a block of data.
  • One or more status-bits define the status of the cache-line.
  • each sub- block of the cache-line can have one or more status-bits.
  • the data comprises a plurality of blocks, and wherein determining if the data is being read for the first time comprises checking at least one status-bit corresponding to at least one block
  • the second status-bit determines whether the data was not written by the processor.
  • the second status-bit is set when the processor writes to the graphics memory.
  • the first status-bit is reset upon detecting a direct memory access (DMA) of the graphics memory or reallocation of the graphics memory.
  • the second status-bit is reset upon detecting a direct memory access (DMA) of the graphics memory or reallocation of the graphics memory.
  • DMA refers to the process of copying data from the system memory to graphics memory.
  • the method of selecting graphics data of a server system for transmission further comprises compressing data of a frame buffer of the graphics memory.
  • transmission further comprises checking at least one of a bandwidth of a link between the server system and a client system, and capabilities of the client system, and the server system transmitting at least one of the compressed frame buffer data or the transmit buffer based at least in part on the at least one of the bandwidth of the link and the capabilities of the client system.
  • the bandwidth and the client capabilities are checked on a frame-by- frame basis to determine whether to compress data of the frame buffer on a frame-by- frame basis, and place a percentage of the data in the transmit buffer for every frame.
  • checking on a frame-by-frame basis includes checking the client capabilities and the bandwidth at the start of each frame, and placing the compressed or uncompressed data in the frame buffer or transmit buffer accordingly for the frame.
  • If the bandwidth and the client capabilities allow, the transmit buffer is transmitted to the client system. If the bandwidth and the client capabilities determine that graphics stream 124 cannot be transmitted, then compressed frame buffer data and optionally partial uncompressed transmit buffer data is transmitted to the client system. If the client system does not have the capabilities to handle uncompressed data, then compressed frame buffer data is transmitted to the client system. If the transmit buffer is capable of being transmitted to the client system, the compression phase is dropped and no compressed video stream is generated. The server system maintains reference frame/s for subsequent frame compression.
  • Figure 5 shows an example of setting and resetting of status-bits that are used for determining whether to place data in the transmit buffer.
  • at least two status-bits are required to determine if a cache-line can be placed in transmit buffer for transmission to the client system.
  • '00', '01', '11' and '10' indicate the state of the status-bits or the value of the status-bits.
  • one of the status-bits is set when the cache-line is placed in transmit buffer 122 for transmission to client system 140.
  • the status-bits are reset when the cache-line is cleared due to memory reallocation or Direct Memory Access (DMA) operation.
  • DMA Direct Memory Access
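The set and reset rules above can be summarized in a small sketch. The encoding of the two bits and the method names are assumptions for illustration; the patent text specifies only the events and the eligibility rule.

```python
# Hedged sketch of the two status-bits tracked per cache-line (Figure 5).
class StatusBits:
    def __init__(self):
        self.read_bit = 0     # set when the line is placed in the transmit buffer
        self.written_bit = 0  # set when a processor writes the line during rendering

    def on_placed_in_transmit_buffer(self):
        self.read_bit = 1

    def on_processor_write(self):
        self.written_bit = 1

    def on_dma_or_reallocation(self):
        # DMA from system memory or memory reallocation invalidates the history.
        self.read_bit = 0
        self.written_bit = 0

    def eligible_for_transmit(self):
        # Transmit only lines never sent before and never written during rendering.
        return self.read_bit == 0 and self.written_bit == 0

bits = StatusBits()
bits.on_processor_write()
print(bits.eligible_for_transmit())  # False: rendered data stays local
bits.on_dma_or_reallocation()
print(bits.eligible_for_transmit())  # True: fresh data after DMA
```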
  • FIG. 6 is a flow chart of method 600 that includes steps of a method of operating a client system.
  • client system 140, in one or more handshaking operations, establishes the connection with server system 110 and communicates the capabilities of client system 140.
  • client system 140 receives a frame of data from server system 110.
  • the data received includes a header with information about the type of data and the type of compression technique followed by data.
  • the received data includes one or more header and data combinations so that the header and data may be interleaved.
  • step 630 method 600 reads the data header to detect the data type. If method 600 detects uncompressed data, method 600 proceeds to step 640. If method 600 detects compressed data, method 600 proceeds to step 650. Graphics rendering of received data takes place in step 640. In step 650, method 600 decompresses the received data. In step 660, data is placed in the frame buffer of client graphics memory 142 for display.
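A sketch of the client-side dispatch of method 600 in Python follows. The one-byte header tag and the placeholder decode helpers are hypothetical, since the text only requires that a header identify the data type.

```python
# Hedged sketch of Figure 6: dispatch on the (hypothetical) header tag.
def handle_frame(frame: bytes, frame_buffer: list) -> None:
    header, payload = frame[0], frame[1:]
    if header == 0:                          # uncompressed graphics stream
        pixels = render_graphics(payload)    # step 640: graphics rendering
    else:                                    # compressed video stream
        pixels = decompress_video(payload)   # step 650: decompression
    frame_buffer.append(pixels)              # step 660: place in frame buffer

def render_graphics(data: bytes) -> bytes:
    return data  # placeholder for a real rendering pipeline

def decompress_video(data: bytes) -> bytes:
    return data  # placeholder for, e.g., an MPEG decoder

fb: list = []
handle_frame(bytes([0]) + b"triangles", fb)  # graphics path
handle_frame(bytes([1]) + b"mpeg-bits", fb)  # video path
print(len(fb))                               # 2 frames placed for display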
  • Figure 7 shows a block diagram of an embodiment of a server system and a client system.
  • the paradigm is shifting from distributed computing to centralized computing. All the resources in the system are being centralized. These include the CPU, storage, networking etc. Applications are run on the centralized server and the results are ported over to the client.
  • This model works well in a number of scenarios but fails to address execution of graphics-rich applications, which are becoming increasingly important in the consumer space. Centralizing graphics computation has not yet been addressed adequately. This is because of issues with virtualization of the GPU and bandwidth constraints for transfer of the GPU output buffers to the client.
  • Video compression is a technique which lends itself to adaptive compression based on instantaneous network bandwidth availability. The video compression technique does have a few limitations: it is computationally intensive and places a heavy additional burden on the server resources.
  • Data in memory is stored in the form of cache-lines.
  • a bit-map is maintained on the server side which tracks the status of each cache-line. The bit-map indicates whether each cache-line has been read for the first time and whether it was written by a processor during rendering.
  • a dedicated Wide/Local Area Graphics Network (WAGN/LAGN) is implemented to carry the graphics network data from the server to the client.
  • a hybrid TCP-UDP protocol is implemented to provide an optimal combination of speed and reliability.
  • the TCP protocol is used to transmit the command/control packets (command buffers/shader programs) and the UDP protocol is used to transfer the data packets (index buffers/vertex buffers/textures/constant buffers).
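A hedged sketch of such a hybrid transport in Python: reliable TCP for command/control packets (command buffers, shader programs) and best-effort UDP for bulk data packets (index/vertex buffers, textures, constants). The host, ports, length-prefixed framing and chunk size are all assumptions.

```python
import socket

CLIENT_HOST = "client.example.net"   # hypothetical client endpoint
TCP_PORT, UDP_PORT = 9000, 9001      # hypothetical ports

def send_command(cmd: bytes) -> None:
    """Command/control packets go over TCP for reliability."""
    with socket.create_connection((CLIENT_HOST, TCP_PORT)) as tcp:
        tcp.sendall(len(cmd).to_bytes(4, "big") + cmd)  # length-prefixed framing

def send_data(payload: bytes, chunk: int = 1200) -> None:
    """Bulk graphics data goes over UDP for speed."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for off in range(0, len(payload), chunk):  # stay under a typical MTU
        udp.sendto(payload[off:off + chunk], (CLIENT_HOST, UDP_PORT))
    udp.close()
```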
  • Virtualization is a technique for hiding the physical characteristics of computing resources to simplify the way in which other systems, applications, or end users interact with those resources.
  • the proposal lists different features which are implemented in the hardware to assist virtualization of the graphics resource. These include the following:
  • Figure 8 shows a block diagram of hardware assisted memory virtualization in a graphics system.
  • Video memory is split between the virtual machines (VMs).
  • the amount of memory allocated to each VM is updated regularly based on utilization and availability. However, it is ensured that there is no overlap of memory between the VMs so that video memory management can be carried out by the VMs.
  • Hardware keeps track of the allocation for each VM in terms of memory blocks of 32 MB. Thus the remapping of the addresses used by the VMs to the actual video memory addresses is carried out by hardware.
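The block-granular remapping can be illustrated as follows. Only the 32 MB granularity comes from the text; the per-VM block tables and their contents are hypothetical.

```python
# Hedged sketch of hardware-assisted remapping of VM-local addresses onto
# non-overlapping 32 MB blocks of physical video memory.
BLOCK = 32 * 1024 * 1024  # 32 MB granularity tracked by the hardware

# Block tables: VM-local block index -> physical block index (no overlaps).
block_tables = {
    "vm0": [0, 1, 4],
    "vm1": [2, 3],
}

def remap(vm: str, vm_addr: int) -> int:
    block, offset = divmod(vm_addr, BLOCK)
    return block_tables[vm][block] * BLOCK + offset

assert remap("vm0", 5) == 5                           # block 0 -> physical block 0
assert remap("vm1", 5) == 2 * BLOCK + 5               # block 0 -> physical block 2
assert remap("vm0", 2 * BLOCK + 7) == 4 * BLOCK + 7   # block 2 -> physical block 4
```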
  • Figure 9 shows a block diagram of hardware virtualization in a graphics system.
  • each VM is provided an entry point into the hardware.
  • the VMs deliver workloads to the hardware in a time- sliced fashion.
  • the hardware builds in mechanisms to fairly arbitrate and manage the execution of these workloads from each of the VMs.
  • Figure 10 shows a block diagram of fast context switching in a graphics system.
  • the number of context switches (changing workloads) would be more frequent.
  • fast context- switching is required to get minimal overhead when switching between the VMs.
  • the hardware implements thread-level context switching for fast response and also concurrent context save and restore to hide the switch latency.
  • Figure 11 shows a block diagram of scalar/vector adaptive execution in a graphics system.
  • Processors are programmed to a defined instruction-set. Different instruction-sets have been developed over the years.
  • the baseline scalar instruction-set for OpenCL/DirectCompute defines instructions which operate on one data entity.
  • a vector instruction-set defines instructions which operate on multiple data, i.e. they are SIMD.
  • 3D graphics APIs (OpenGL/DirectX) define a vector instruction set which operates on 4-channel operands.
  • the scheme we have here defines a technique whereby the processor core carries out adaptive execution of scalar/4-D vector instruction sets with equal efficiency.
  • the data operands read from the on-chip registers or buffers in memory are 4x the width of the ALU compute block.
  • the data is serialized into the compute block over 4 clocks.
  • the 4 sets of data correspond to one register for the execution thread.
  • the 4 sets of data correspond to one register for four execution threads.
  • the 4 sets of result data are gathered and written back to the on-chip registers.
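A simple Python model of this serialization: the same four-clock loop through a one-wide ALU services either the four channels of a single thread's vector register or one scalar from each of four threads. All names are illustrative.

```python
# Hedged sketch of scalar/4-D vector adaptive execution.
def run_on_alu(values4, op):
    """Serialize a 4-wide operand into a 1-wide ALU over 4 'clocks'."""
    return [op(v) for v in values4]  # one value per clock through the same ALU

double = lambda v: 2 * v

vector_reg = [1.0, 2.0, 3.0, 4.0]      # x, y, z, w of a single thread's register
print(run_on_alu(vector_reg, double))  # 4 clocks: one vector instruction

scalar_regs = [10, 20, 30, 40]         # the same scalar register of 4 threads
print(run_on_alu(scalar_regs, double)) # 4 clocks: four scalar instructions
```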
  • Figure 12 shows a flowchart of a smart pre-fetch/pre-decode technique in a graphics system.
  • the processors of today have multiple pipeline stages in the compute core.
  • Figure 13 shows a diagram of motion estimation for video encoding in a video processing system.
  • a completely programmable multi-threaded video processing engine is implemented to carry out decode/encode/transcode and other video post-processing operations.
  • Video processing involves parsing of bit-streams and computations on blocks of pixels. The presence of multiple blocks in a frame enables efficient multi-threaded processing. All the block computations are carried out in SIMD fashion.
  • the key to realizing maximum benefit from SIMD processing is designing the right width for the SIMD engine and also providing the infrastructure to feed the engine the data that it needs. This data includes the instruction along with the operands which could be on-chip registers or data from buffers in memory.
  • Video Decoding - Involves high-level parsing for stream properties & stream marker identification, followed by variable-length parsing of the bit-stream data between markers. This is implemented in the programmable processor with specialized instructions for fast parsing. For the subsequent mathematical operations (Inverse Quantization, IDCT, Motion Compensation, De-blocking, De-ringing), a byte engine to accelerate operations on byte & word operands has been defined.
  • Video Encoding - Motion Estimation is carried out to determine the best match using a high-density SAD4x4 instruction (each of the four 4x4 blocks in the source are compared against the sixteen different 4x4 blocks in the reference). This is followed by DCT, quantization and video decoding, which are carried out in the byte engine.
  • the subsequent variable-length-coding is carried out with special bit-stream encoding and packing instructions.
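A sketch of a SAD4x4-style search in Python: compare a 4x4 source block against candidate 4x4 blocks in a reference frame and keep the lowest-SAD position. The exhaustive search window and frame contents are assumptions, and bounds checking is omitted for brevity.

```python
# Hedged sketch of motion estimation via sum-of-absolute-differences.
def sad4x4(src, ref, rx, ry):
    return sum(abs(src[y][x] - ref[ry + y][rx + x])
               for y in range(4) for x in range(4))

def best_match(src, ref, cx, cy, radius=2):
    # Exhaustive search in a (2*radius+1)^2 window around (cx, cy).
    candidates = ((cx + dx, cy + dy)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1))
    return min(candidates, key=lambda p: sad4x4(src, ref, p[0], p[1]))

ref = [[(x + y) % 16 for x in range(16)] for y in range(16)]
src = [[ref[6 + y][5 + x] for x in range(4)] for y in range(4)]
print(best_match(src, ref, cx=4, cy=4))  # -> (5, 6), the true displacement
```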
  • Video Transcoding - Uses a combination of the techniques defined for decoding and encoding.
  • Figure 14 shows a diagram of tap filtering for video post-processing in a video processing system.
  • a number of post-processing algorithms involve filtering of pixels in horizontal and vertical direction.
  • the fetching of pixel data from memory and its organization in the on-chip registers enables efficient access to data in both directions.
  • the filtering is carried out with dot-product instructions (dp5, dp9 & dp16) in multiple shapes (horizontal, bidirectional, square, vertical).
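As an illustration, a 5-tap horizontal filter expressed as a dot product, analogous to the dp5 instruction named above; the binomial coefficients and pixel values are illustrative only.

```python
# Hedged sketch of dp5-style tap filtering.
def dp5(pixels5, coeffs5):
    return sum(p * c for p, c in zip(pixels5, coeffs5))

row = [10, 12, 30, 12, 10, 11, 9]
smooth = [1/16, 4/16, 6/16, 4/16, 1/16]  # binomial low-pass taps

# Slide the 5-tap window across the row; vertical filtering would take the
# same taps from a column of the on-chip register block instead.
filtered = [dp5(row[i:i + 5], smooth) for i in range(len(row) - 4)]
print(filtered)
```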
  • Figure 15 shows a flowchart of a Single Instruction Multiple Data (SIMD) branch technique.
  • SIMD multiple threads in one group
  • IP execution instruction pointer
  • the flag indicates that the thread is in the same flow as the current execution and hence, execution only occurs for threads that have their flag set.
  • the flag is set for all threads at the beginning of execution. Because of a conditional branch, if a thread does not take the current execution code path, its flag is turned off and its execution IP is set to the pointer it needs to move to. At merge points, the execution IP of threads whose flags are turned off are compared with the current execution IP. If the IPs match, the flag is set. At branch points, if all currently active threads take the branch, the current execution IP is set to the closest (minimum positive delta from the current execution IP) execution IP among all threads.
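The flag/IP scheme above can be sketched as a small interpreter. The toy instruction set and the odd/even divergence condition are hypothetical; only the flag, per-thread IP, merge and all-threads-branch rules come from the text.

```python
# Hedged sketch of the per-thread flag/IP branch technique.
def run_simd(program, n_threads):
    flags = [True] * n_threads   # all threads start in the current flow
    thread_ip = [0] * n_threads  # where a parked thread resumes
    ip = 0
    while ip < len(program):
        # Merge point: wake threads parked exactly at the current IP.
        for t in range(n_threads):
            if not flags[t] and thread_ip[t] == ip:
                flags[t] = True
        op, arg = program[ip]
        if op == "branch_if_odd":        # divergent conditional branch
            for t in range(n_threads):
                if flags[t] and t % 2 == 1:
                    flags[t] = False     # thread leaves the current flow
                    thread_ip[t] = arg
            if not any(flags):           # all active threads took the branch:
                # jump to the closest (minimum positive delta) thread IP
                ip = min(thread_ip[t] for t in range(n_threads) if thread_ip[t] > ip)
                continue
        elif op == "exec":
            active = [t for t in range(n_threads) if flags[t]]
            print(f"ip={ip} {arg} on threads {active}")
        ip += 1

run_simd([("branch_if_odd", 2), ("exec", "even-path"), ("exec", "common")], 4)
```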
  • Figure 16 shows a flowchart of a programmable output merger implementation in a graphics system.
  • the 3D graphics APIs (OpenGL, DirectX) define a processing pipeline as shown in the diagram.
  • Most of the pipeline stages are defined as shaders which are programs run on the appropriate entities (vertices/polygons/pixels).
  • Each shader stage receives inputs from the previous stage (or from memory), uses various other input resources (programs, constants, textures, ...) to process the inputs and delivers outputs to the next stage.
  • a set of general purpose registers are used for temporary storage of variables.
  • the other stages are fixed-function blocks controlled by state.
  • the APIs categorize all of the state defining the entire pipeline into multiple groups. Maintaining orthogonality of these state groups in hardware, i.e. keeping the state groups independent of each other, eliminates dependencies in the driver compiler and enables a state-less driver.
  • the output merger state defines how the pixel values are blended/combined with the co-located frame buffer values.
  • this state is implemented as a pair of subroutines run before and after the pixel shader execution.
  • a prefix subroutine issues a fetch of the frame buffer values.
  • a suffix subroutine has the blend instructions.
  • the pixel-shader outputs (which are written into the general purpose registers) need to be combined with the frame buffer values (fetched by the prefix subroutine) using the blend instructions of the suffix subroutine.
  • the pixel-shader output registers are tagged as such, and a CAM (Content Addressable Memory) is used to identify these tagged registers.
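A sketch of the prefix/pixel-shader/suffix sequence in Python; the source-alpha blend is one hypothetical choice of blend state, not the patent's definition, and the fixed shader output is illustrative.

```python
# Hedged sketch of the programmable output merger as prefix/suffix subroutines.
def prefix(frame_buffer, x, y):
    return frame_buffer[y][x]       # issue the frame-buffer fetch early

def pixel_shader(x, y):
    return (1.0, 0.0, 0.0, 0.5)     # shader output: RGBA left in a GPR

def suffix(src, dst):
    a = src[3]                      # blend instructions: classic source-alpha
    return tuple(s * a + d * (1 - a) for s, d in zip(src[:3], dst))

frame_buffer = [[(0.0, 0.0, 1.0)]]  # one blue pixel
dst = prefix(frame_buffer, 0, 0)
src = pixel_shader(0, 0)
frame_buffer[0][0] = suffix(src, dst)
print(frame_buffer[0][0])           # (0.5, 0.0, 0.5): red blended over blue
```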
  • Register Remapping - This is a compiler technique to optimize/minimize the registers used in a program. To carry out remapping of the registers used in the shader programs, a bottom-up approach is used.
  • the program is pre-compiled top-to-bottom with instructions of fixed size.
  • This pre-compiled program is then parsed bottom-to-top.
  • a register map is maintained for the general purpose registers (GPR) which tracks the mapping between the original register number and the remapped register number. Since the registers in shader programs are 4-channel, the channel enable bits are also tracked in the register map.
  • GPR general purpose registers
  • a GPR is removed from the register map if it is a destination register
  • the program can be recompiled top-to-bottom one more time to use variable length instructions. Also, some registers with only a sub-set of channels enabled can be merged into one single register.
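The bottom-to-top remapping pass can be sketched on a toy three-operand instruction form; channel-enable tracking is omitted for brevity, and all names are illustrative.

```python
# Hedged sketch of bottom-up register remapping: parse the program in reverse,
# assign registers on first (i.e., last-in-program-order) use, and retire a
# register at its definition, where its live range begins.
def remap_registers(program):
    reg_map, next_reg, out = {}, 0, []

    def lookup(r):
        nonlocal next_reg
        if r not in reg_map:            # first use seen while scanning upward
            reg_map[r] = next_reg
            next_reg += 1
        return reg_map[r]

    for dest, s1, s2 in reversed(program):   # parse bottom-to-top
        out.append((lookup(dest), lookup(s1), lookup(s2)))
        del reg_map[dest]               # value is dead above its definition
    return out[::-1]

prog = [("r5", "r0", "r1"), ("r9", "r5", "r0"), ("r2", "r9", "r9")]
print(remap_registers(prog))            # densely renumbered registers
```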

Abstract

Methods, systems and apparatuses for selecting graphics data of a server system for transmission are disclosed. One method includes reading data from graphics memory of the server system. The data read from the graphics memory is placed in a transmit buffer if the data is being read for the first time, and was not written by a processor of the server system. One system includes a server system including graphics memory, a frame buffer and a processor. The server system is operable to read data from the graphics memory. The server system is operable to place the data in a transmit buffer if the data is being read for the first time, and was not written by the processor during rendering.

Description

PROCESSING OF GRAPHICS DATA OF A SERVER SYSTEM FOR
TRANSMISSION
Related Applications
This patent application claims priority to US provisional patent application serial number 61/355,768 filed June 17, 2010.
Field of the Embodiments
[001] The described embodiments relate generally to transmission of graphics data. More particularly, the described embodiments relate to methods, apparatuses and systems for processing of graphics data on a server system for transmission to a client system and processing on a client system.
Background
[002] The onset of cloud computing is causing a paradigm shift from distributed computing to centralized computing. Centralized computing places most of the resources of a system in a central location. These resources generally include a centralized server that includes a central processing unit (CPU), memory, storage and support for networking. Applications run on the centralized server and the results are transferred to one or more clients.
[003] Centralized computing works well in many applications, but falls short in the execution of graphics-rich applications, which are increasingly popular with consumers. Proprietary techniques are currently used for remote processing of graphics for thin-client applications. Proprietary techniques include Microsoft RDP (Remote Desktop Protocol), Personal Computer over Internet Protocol (PCoIP), VMware View and Citrix Independent Computing Architecture (ICA) and may apply a compression technique to a frame/display buffer. [004] A video compression scheme is well suited to remote processing of graphics for thin-client applications because the content of the frame buffer changes incrementally. Although a video compression scheme can adapt to instantaneous network bandwidth availability, it is computationally intensive and places an additional burden on server resources. With a video compression scheme, image quality is compromised and additional latency is introduced due to the compression phase.
[005] It is desirable to have a method, apparatus and system for transmission of graphics data that reduces computation demands, enables lossless compression and improves latency.
Summary
[006] One embodiment includes a method of selecting graphics data of a server system for transmission. The method includes reading data from graphics memory of the server system. The data read from the graphics memory is placed in a transmit buffer if the data is being read for the first time, and was not written by a processor of the server system during graphics rendering.
[007] Another embodiment includes a system for selecting graphics data for transmission. The system includes a server system comprising graphics memory, a frame buffer and a processor. The server system is operable to read data from the graphics memory. The server system is operable to place the data in a transmit buffer if the data is being read for the first time, and was not written by the processor during graphics rendering.
[008] Other aspects and advantages of the described embodiments will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the described embodiments.
Brief Description of the Drawings
[009] Figure 1 shows a block diagram of an embodiment of a server and client systems.
[0010] Figure 2 is a flow chart that includes the steps of an example of a method selecting graphics data for transmission from the server to the client.
[0011] Figure 3 is a flow chart that includes the steps of an example of a method placing data in a transmit buffer.
[0012] Figure 4 is a flow chart that includes steps of an example of a method of selecting graphics data of a server system for transmission.
[0013] Figure 5 shows an example of setting and resetting of status-bits that are used for determining whether to place data in the transmit buffer.
[0014] Figure 6 is a flow chart that includes steps of a method of operating a client system.
[0015] Figure 7 shows a block diagram of an embodiment of a server system and a client system.
[0016] Figure 8 shows a block diagram of a hardware assisted memory virtualization in a graphics system.
[0017] Figure 9 shows a block diagram of hardware virtualization in a graphics system.
[0018] Figure 10 shows a block diagram of fast context switching in a graphics system.
[0019] Figure 11 shows a block diagram of scalar/vector adaptive execution in a graphics system. [0020] Figure 12 shows a flowchart of a smart pre-fetch/pre-decode technique in a graphics system.
[0021] Figure 13 shows a diagram of motion estimation for video encoding in a video processing system.
[0022] Figure 14 shows a diagram of tap filtering for video post-processing in a video processing system.
[0023] Figure 15 shows a flowchart of a Single Instruction Multiple Data (SIMD) branch technique.
[0024] Figure 16 shows a flowchart of programmable output merger
implementation in a graphics system.
Detailed Description
[0025] The described embodiments are embodied in methods, apparatuses and systems for selecting graphics data for transmission. These embodiments provide for lossless or near-lossless transmission of graphics data between a server system and a client system while maintaining low latency. For the described embodiments, lossless and near-lossless may be used interchangeably and may mean lossless or near-lossless compression and transmission methods. For the described embodiments, processor refers to a device that processes graphics, which includes, but is not limited to, any one of or all of a graphics processing unit (GPU), central processing unit (CPU), Accelerated Processing Unit (APU) and Digital Signal Processor (DSP). Depending upon a link bandwidth and/or capabilities of the client system, the described embodiments also include the transmission of a video stream. For the described embodiments, graphics stream refers to uncompressed data which is a subset of graphics and command data. For the described embodiments, video stream refers to compressed frame buffer data.
[0026] Figure 1 shows a block diagram of an embodiment of a graphics server- client co-processing system. The system consists of server system 110 and client system 140. This embodiment of server system 110 includes graphics memory 112, central processing unit (CPU) 116, graphics processing unit (GPU) 120, graphics stream 124, video stream 128, mux 130, control 132 and link 134. This embodiment of the client system 140 includes client graphics memory 142, CPU 144, and GPU 148.
Server system
[0027] As shown in Figure 1, for the described embodiments, graphics memory
112 includes command and graphics data 114, frame buffer 118, transmit buffer 122, and compressed frame buffer 126. For the described embodiments, graphics memory 112 resides in server system 110. In another embodiment, graphics memory 112 may not reside in server system 110. The server system processes graphics data and manages data for transmission to the client system. Graphics memory 112 may be any one of or all of
Dynamic Random Access memory (DRAM), Static Random Access Memory (SRAM), flash memory, content addressable memory or any other type of memory. For the described embodiments, graphics memory 112 is a DRAM storing graphics data. For the described embodiments, a block of data that is read or written to memory is referred to as a cache-line. For the described embodiments, the status of the cache-line of command and graphics data 114 is stored in graphics memory 112. In another embodiment, the status can be stored in a separate memory. In this embodiment, status-bits refer to a set of one or more status bits of memory used to store the status of a cache-line or a subset of the cache-line. A cache-line can have one or more sets of status-bits.
[0028] For the described embodiments, graphics memory 112 is located in the system memory (not shown in Figure 1). In another embodiment, graphics memory 112 may be in a separate dedicated video memory. Graphics application running on the CPU loads graphics data into system memory. For the described embodiments, graphics data includes at least index buffers, vertex buffers and textures. The graphics driver of GPU 120 translates graphics Application Programming Interface (API) calls made by, for example, a graphics application into command data. For the described embodiments, graphics API refers to an industry standard API such as OpenGL or DirectX. For the described embodiments, the graphics and command data is placed in graphics memory either by copying or remapping. Typically, the graphics data is large and generally not practical to transmit to client systems as is.
[0029] GPU 120 processes command and data in command and graphics data 114 and selectively places data either in frame buffer 118 at the end of graphics rendering or in transmit buffer 122 during graphics rendering. GPU 120 is a specialized processor for manipulating and displaying graphics. For the described embodiments, GPU 120 supports 2D, 3D graphics and/or video. As will be described, GPU 120 manages generation of compressed data for placement in the compressed frame buffer 126 and a subset of uncompressed graphics and command data is placed in transmit buffer 122. The data from transmit buffer contains graphics data and is referred to as graphics stream 124.
[0030] Transmit buffer 122 is populated with a selected subset of command and graphics data 114 during graphics rendering. The selected subset of data from command and graphics data 114 is such that the results obtained by the client system by processing the subset of data can be identical or almost identical to processing the entire contents of command and graphics data 114. The process of selecting a subset of data from command and graphics data 114 to fill transmit buffer 122 is discussed further in conjunction with Figure 2. During the process of graphics rendering, GPU 120 fills transmit buffer 122. For the described embodiments, the contents of the transmit buffer include at least command data or graphics API command calls along with graphics data. For an embodiment, the allocated size of transmit buffer 122 is adaptively determined by the maximum available bandwidth on the link. For example, the size of the transmit buffer can dynamically change over time as the bandwidth of the link between the server system and the client system varies.
[0031] In this embodiment, GPU 120 is responsible for rendering graphics into frame buffer 118 and for generating compressed frame buffer 126. In this embodiment, compressed frame buffer 126 is generated if the client does not have the capabilities or the bandwidth is not sufficient to transmit the graphics stream. The compressed frame buffer is generated by encoding the contents of frame buffer 118 using industry standard compression techniques, for example MPEG2 and MPEG4.
[0032] Graphics stream 124 includes at least uncompressed graphics data and header with at least data type information. Graphics stream 124 is generated during graphics rendering and may be available while the transmit buffer has data.
[0033] Video stream 128 includes at least compressed video data and a header conveying the information required for interpreting the data type for decompression. Video stream 128 can be available as and when compressed frame buffer 126 is generated.
[0034] Mux 130 illustrates a selection between graphics stream 124 generated by data from the transmit buffer 122 and video stream 128 generated by data from compressed frame buffer 126. The selection by mux 130 is done on a frame-by-frame basis and is controlled by control 132, which at least in some embodiments is generated by the GPU
120. A frame is the interval of processing time for generating a frame-buffer for display.
For other embodiments, control 132 is generated by the CPU and/or the GPU. For the described embodiments, control 132 depends at least in part upon the bandwidth of link 134 between the server system 110 and the client system 140, and/or the processing capabilities of client system 140.
[0035] Mux 130 selects between the graphics stream and the video stream; the selection can occur once per clock cycle, which is typically less than a frame. In this embodiment, the data transmitted on link 134 consists of data from the compressed frame buffer and/or the transmit buffer. For some embodiments, link 134 is a dedicated Wide Area Graphics Network (WAGN) / Local Area Graphics Network (LAGN) to transmit the graphics/video stream from server system 110 to client system 140. In an embodiment, a hybrid Transmission Control Protocol (TCP)-User Datagram Protocol (UDP) may be implemented to provide an optimal combination of speed and reliability. For example, the TCP protocol is used to transmit the command/control packets and the UDP protocol is used to transfer the data packets. For example, the command/control packets can be the previously described command data, and the data packets can be the graphics data.
Client System
[0036] The client system receives data from the server system and manages the received data for user display. For the described embodiments, client system 140 includes at least client graphics memory 142, CPU 144, and GPU 148. Client graphics memory 142, which includes at least a frame buffer, may be a Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, content addressable memory or any other type of memory. In this embodiment, client graphics memory 142 is a DRAM storing command and graphics data.
[0037] In an embodiment, graphics/video stream received from server system
110 via link 134 is a frame of data and is processed using standard graphics rendering or video processing techniques to generate the frame buffer for display. The received frame includes at least a header and data. For the described embodiments, the GPU reads the header to detect the data type, which can include at least uncompressed graphics stream or compressed video stream, to process the data. The method of handling the received data is discussed in conjunction with Figure 6.
[0038] Figure 2 is a flow chart of method 200 that includes the steps of an example of a method of selecting graphics data for transmission from the server to the client. In step 210, command data buffer generation takes place. In this step, the graphics software application commands are compiled by the GPU software driver and translated into command data in system memory. This step also involves the process of loading the system memory with graphics data.
[0039] In step 220, a command and graphics data buffer is allocated. In this step, a portion of free or unused graphics memory 112 is defined as command and graphics data 114 based on the requirement. The command and graphics data in system memory is copied to graphics memory 112 if the graphics memory is a dedicated video memory, or remapped/copied to graphics memory 112 if the graphics memory is part of system memory.
[0040] In step 230, graphics data is rendered on server system 110. Graphics data read from command and graphics data 114 is rendered by GPU 120. For the described embodiments, graphics rendering or 3D rendering is the process of producing a two-dimensional image based on three-dimensional scene data. Graphics rendering involves processing of polygons and generating the contents of frame buffer 118 for display. Polygons such as triangles, lines, and points have attributes associated with their vertices; the attributes are stored in vertex buffer(s) and determine how the polygons are processed. The position coordinates undergo linear (scaling, rotation, translation, etc.) and viewing (world and view space) transformations. The polygons are rasterized to determine the pixels enclosed within them. Texturing is a technique for applying texture images onto these pixels. The pixel color values are written to frame buffer 118.
[0041] Step 240 involves checking the client system capabilities to decide the compression technique. In the described embodiments, the size and bandwidth of client graphics memory 142, the graphics API support in the client system, the performance of GPU 148, and the decompression capabilities of client system 140 constitute the client system capabilities.
[0042] When the client system has the requisite capabilities, the transmit buffer is generated. In step 260, the contents of transmit buffer 122 are generated during graphics rendering. Data is written into transmit buffer 122 as the data is rendered. A subset of the graphics and command data is identified, and unique instances of the data are selected for placement in transmit buffer 122, as discussed in conjunction with Figure 3. The data from the transmit buffer is referred to as graphics stream 124.
[0043] In step 270, method 200 checks at least the bandwidth of link 134 connecting server system 110 and client system 140. If sufficient bandwidth is available, graphics stream 124 is transmitted in step 290.
[0044] If the available bandwidth is not sufficient, or if the client system does not have the requisite capabilities, compressed frame buffer 126 is generated. In step 250, the compressed frame buffer is generated by encoding the contents of frame buffer 118 using MPEG2, MPEG4, or any other compression technique. The selection of the compression technique is determined by the client capabilities. After graphics rendering is complete, the compressed frame buffer is filled during compression of frame buffer 118. In step 280, the compressed frame buffer is transmitted.
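For illustration, a minimal sketch of the per-frame selection of steps 240-290 follows; the ClientCaps structure, the frame-rate constant, and the exact bandwidth test are hypothetical stand-ins, since the disclosure leaves the decision criteria implementation-specific.

```python
from dataclasses import dataclass

@dataclass
class ClientCaps:
    can_render_graphics: bool  # client GPU/API support (step 240)
    codecs: tuple              # e.g. ("MPEG2", "MPEG4")

def select_stream_for_frame(caps: ClientCaps, link_bps: float,
                            frame_stream_bits: int, fps: int = 60) -> str:
    """Decide, once per frame, which stream mux 130 forwards on link 134."""
    if caps.can_render_graphics and frame_stream_bits * fps <= link_bps:
        return "graphics"  # uncompressed transmit-buffer data (step 290)
    return "video"         # compressed frame-buffer data (steps 250/280)
```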
[0045] Figure 3 is a flow chart of method 300, which includes the steps of an example of a method of placing data in transmit buffer 122. In step 310, a cache-line or a block of data is read from command and graphics data 114 or frame buffer 118 during graphics rendering by the server system.
[0046] In step 320, the cache-line is checked for being read for the first time, to determine whether the data in the cache-line is new. If the data has been read earlier, the data is already available on client system 140 or present in transmit buffer 122; the cache-line is not processed further and method 300 returns to step 310. If the cache-line is being read for the first time, the client system does not have the data and the data is not present in transmit buffer 122, and method 300 proceeds to step 330.

[0047] In step 330, the cache-line of command and graphics data 114 or frame buffer 118 is checked to determine whether the data in the cache-line was written during graphics rendering by a processor. If the data in the cache-line was written by a processor, the cache-line is not processed and method 300 returns to step 310. If the cache-line was not written by the processor, method 300 proceeds to step 340. In step 340, the cache-line is placed in transmit buffer 122.
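A minimal sketch of the method-300 decision (steps 310-340) follows, modeling the two checks with boolean fields; the CacheLine structure is hypothetical, and the actual implementation uses the status-bits described in paragraphs [0050]-[0052].

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CacheLine:
    data: bytes
    read_before: bool = False           # stands in for the first status-bit
    written_by_processor: bool = False  # stands in for the second status-bit

def maybe_place_in_transmit_buffer(line: CacheLine,
                                   transmit_buffer: List[CacheLine]) -> None:
    if line.read_before:           # step 320: client already has it, or queued
        return
    if line.written_by_processor:  # step 330: written during rendering, skip
        return
    transmit_buffer.append(line)   # step 340: place in transmit buffer 122
    line.read_before = True
```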
[0048] Figure 4 is a flow chart that includes steps of an example of a method of selecting graphics data of a server system for transmission. A first step 410 includes reading data from graphics memory of the server system. A second step 420 includes placing the data in a transmit buffer if the data is being read for the first time, and was not written during graphics rendering by a processor of the server system. A third step 430 includes transmitting the data of the transmit buffer to a client system. In an embodiment, the processor is a CPU and/or a GPU.
[0049] In this embodiment, the server system includes a central processing unit
(CPU) and a graphics processing unit (GPU). The GPU controls compression and placement of data of a frame buffer into a compressed frame buffer. The GPU controls selection of either compressed data of the compressed frame buffer or uncompressed data of the transmit buffer for transmission to the client system.
[0050] Checking a first status-bit determines whether the data is being read for the first time. The first status-bit is set when the data is placed in the transmit buffer and not yet transmitted.
[0051] The data being read can be a cache-line, which is a block of data. One or more status-bits define the status of the cache-line. In another embodiment, each sub-block of the cache-line can have one or more status-bits. For an embodiment, the data comprises a plurality of blocks, and determining whether the data is being read for the first time comprises checking at least one status-bit corresponding to at least one block.
[0052] The second status-bit determines whether the data was not written by the processor. The second status-bit is set when the processor writes to the graphics memory. The first status-bit is reset upon detecting a direct memory access (DMA) of the graphics memory or reallocation of the graphics memory. The second status-bit is reset upon detecting a direct memory access (DMA) of the graphics memory or reallocation of the graphics memory. For the described embodiments, DMA refers to the process of copying data from the system memory to graphics memory.
[0053] The method of selecting graphics data of a server system for transmission further comprises compressing data of a frame buffer of the graphics memory.
[0054] The method of selecting graphics data of a server system for transmission further comprises checking at least one of a bandwidth of a link between the server system and a client system, and capabilities of the client system, and the server system transmitting at least one of the compressed frame buffer data or the transmit buffer data based at least in part on the at least one of the bandwidth of the link and the capabilities of the client system.
[0055] The bandwidth and the client capabilities are checked on a frame-by-frame basis to determine whether to compress data of the frame buffer on a frame-by-frame basis, and to place a percentage of the data in the transmit buffer for every frame. For an embodiment, checking on a frame-by-frame basis includes checking the client capabilities and the bandwidth at the start of each frame, and placing the compressed or uncompressed data in the frame buffer or transmit buffer accordingly for the frame.
[0056] If adequate bandwidth is available and the client is capable of processing graphics stream 124, the transmit buffer is transmitted to the client system. If the bandwidth and the client capabilities determine that graphics stream 124 cannot be transmitted, then compressed frame buffer data, and optionally partial uncompressed transmit buffer data, is transmitted to the client system. If the client system does not have the capabilities to handle uncompressed data, then compressed frame buffer data is transmitted to the client system. If the transmit buffer is capable of being transmitted to the client system, the compression phase is dropped and no compressed video stream is generated.

[0057] The server system maintains reference frame(s) for subsequent compression of data of the frame buffer. For each frame, a decision is made to send either lossless graphics data or lossy video compression data. When implementing video compression for a particular frame on the server, previous frames are used as reference frames. The reference frames correspond to the lossless or lossy frames transmitted to the client.
[0058] Figure 5 shows an example of setting and resetting of status-bits that are used for determining whether to place data in the transmit buffer. For the described embodiment, at least two status-bits are required to determine if a cache-line can be placed in the transmit buffer for transmission to the client system. '00', '01', '11' and '10' indicate the state of the status-bits or the value of the status-bits.
[0059] From '00' State: When a cache-line of server graphics data is read or written by the processors for the first time from command and graphics data 114 and/or frame buffer 118 (step 310), the status-bits of the cache-line have the value '00', also referred to as state '00'. The cache-line can be either read by the processors or written by the processor to change state. When the processor reads the cache-line, the status-bits are updated to the '01' state. If the cache-line is written by the processor, the status-bits of the cache-line are updated to the '10' state.
[0060] From '01' State: The status-bits of a cache-line read by the processor are updated to state '11' when the cache-line is transmitted to client system 140. The status-bits are reset to the '00' state if the cache-line was not transmitted due to bandwidth limitations.
[0061] From '11' State: The status-bits can have the value '11' when the cache-line is transmitted to client system 140 via transmit buffer 122. The status-bits are reset when the cache-line is cleared due to memory reallocation or Direct Memory Access (DMA) operation.
[0062] From '10' State: Once a cache-line is written by processor 120, the cache-line cannot be transmitted via the transmit buffer and assumes the '10' state. The status-bits of the cache-line are reset due to memory reallocation or Direct Memory Access (DMA) operation.
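The transitions of paragraphs [0059]-[0062] can be summarized as a lookup table; a sketch follows, in which the state values come from Figure 5 but the event names are invented for illustration.

```python
# (state, event) -> next state, per Figure 5
TRANSITIONS = {
    ("00", "processor_read"):  "01",  # first read (paragraph [0059])
    ("00", "processor_write"): "10",  # written during rendering ([0059])
    ("01", "transmitted"):     "11",  # sent via transmit buffer 122 ([0060])
    ("01", "not_transmitted"): "00",  # dropped for lack of bandwidth ([0060])
    ("11", "dma_or_realloc"):  "00",  # cache-line cleared ([0061])
    ("10", "dma_or_realloc"):  "00",  # cache-line cleared ([0062])
}

def next_state(state: str, event: str) -> str:
    # events not listed for a state leave the status-bits unchanged
    return TRANSITIONS.get((state, event), state)
```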
[0063] Figure 6 is a flow chart of method 600, which includes steps of a method of operating a client system. In step 610, client system 140, in one or more handshaking operations, establishes the connection with server system 110 and communicates the capabilities of client system 140. In step 620, client system 140 receives a frame of data from server system 110. In this embodiment, the received data includes a header with information about the type of data and the type of compression technique, followed by the data. The received data can include one or more header-and-data combinations, so that headers and data may be interleaved.
[0064] In step 630, method 600 reads the data header to detect the data type. If method 600 detects uncompressed data, method 600 proceeds to step 640. If method 600 detects compressed data, method 600 proceeds to step 650. Graphics rendering of received data takes place in step 640. In step 650, method 600 decompresses the received data. In step 660, data is placed in the frame buffer of client graphics memory 142 for display.
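A minimal sketch of this receive path (steps 620-660) follows; the two-byte header layout and the callable parameters are assumptions of the sketch, since the disclosure only requires a header identifying the data type and the compression technique.

```python
def handle_frame(packet: bytes, render, decode, display) -> None:
    # step 630: read the header; a (type, codec) byte pair is assumed here
    data_type, codec_id = packet[0], packet[1]
    payload = packet[2:]
    if data_type == 0:
        pixels = render(payload)            # step 640: uncompressed graphics
    else:
        pixels = decode(codec_id, payload)  # step 650: compressed video
    display(pixels)                         # step 660: client frame buffer
```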
Extensions and Alternatives
Network Graphics
[0065] Figure 6 shows a block diagram of an embodiment of a server system and a client system. With the onset of cloud computing, the paradigm is shifting from distributed computing to centralized computing. All the resources in the system are being centralized, including the CPU, storage, and networking. Applications are run on the centralized server and the results are ported over to the client. This model works well in a number of scenarios but fails to address the execution of graphics-rich applications, which are becoming increasingly important in the consumer space. Centralizing graphics computation has not yet been addressed adequately, because of issues with virtualization of the GPU and bandwidth constraints for transfer of the GPU output buffers to the client.
[0066] Different proprietary techniques are currently used for remoting of graphics for thin-client applications. These include Microsoft RDP (Remote Desktop Protocol), PCoIP, VMware View, and Citrix ICA. All of them rely on some kind of compression technique applied to the frame/display buffer. Given the property that the frame buffer content changes incrementally, a video compression scheme is most suited. Video compression lends itself to adaptive compression based on instantaneous network bandwidth availability. Video compression does, however, have a few limitations:

• It is computationally intensive and places a heavy additional burden on the server resources.

• To achieve adequate compression, the image quality is compromised.

• Network latency is an issue in remote graphics; additional latency is introduced by the compression phase.
[0067] The evolution of the graphics API has also created a relatively low, albeit variable, bandwidth interface at the API level. There are different resources/surfaces (indices, vertices, constant buffers, shader programs, textures) needed by the GPU for processing. In 3D graphics processing, these resources get reused for multiple frames and enable cross-frame caching. Vertex and texture data are the biggest consumers of the available video memory footprint, but only a small percentage of the data is actually used, and the utilization is spread across multiple frames.
[0068] The above-described property of the 3D API is exploited to develop the scheme of API remoting. A server-client co-processing model has been developed to significantly trim the bandwidth requirements and enable API remoting. The server operates as a stand-alone system with all the desktop graphics applications being run on the server. During the execution, key information is gathered which identifies the minimal set of data needed for execution of the same on the client side. The data is then transferred over the network. The API interface bandwidth being variable, adequate bandwidth availability cannot be guaranteed. Hence an adaptive technique is adopted whereby, when the API remoting bandwidth needs exceed the available bandwidth, the display frame (which was in any case created on the server side to generate the statistics for minimal data transfer) is video-encoded and sent over the network. The decision is made at frame granularity.
[0069] Data in memory is stored in the form of cache-lines. A bit-map is maintained on the server side which tracks the status of each cache-line. The bit-map indicates:

• 0 - the cache-line is clean (never written to, or not accessed so far since the last DMA write).

• 1 - the cache-line has been transferred to the client.
[0070] When a particular cache-line is accessed and its status is '0', the accessed data is placed in a network ring and the status is updated to '1'. If the network ring overflows, i.e., the required bandwidth for API remoting exceeds the available network bandwidth, execution continues but the bitmap/network ring is not updated. The data in the network ring is trickled down to the client. After the creation of the final display buffer, it is adaptively video-encoded for transmission. Over time, the bandwidth requirements for API remoting will gradually reduce and will eventually enable it.

[0071] A dedicated Wide/Local Area Graphics Network (WAGN/LAGN) is implemented to carry the graphics network data from the server to the client. A hybrid TCP-UDP protocol is implemented to provide an optimal combination of speed and reliability. The TCP protocol is used to transmit the command/control packets (command buffers/shader programs) and the UDP protocol is used to transfer the data packets (index buffers/vertex buffers/textures/constant buffers).
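A sketch of the bit-map and network ring of paragraph [0070] follows; the ring capacity and the per-line granularity are placeholders. On overflow, execution continues but neither the bit-map nor the ring is updated, so the line remains eligible on a later access.

```python
class NetworkRing:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []

    def try_push(self, line_data) -> bool:
        if len(self.entries) >= self.capacity:
            return False        # overflow: remoting needs exceed bandwidth
        self.entries.append(line_data)
        return True

bitmap = {}  # cache-line address -> 0 (clean) / 1 (transferred to the client)

def on_cache_line_access(addr: int, line_data, ring: NetworkRing) -> None:
    if bitmap.get(addr, 0) == 0 and ring.try_push(line_data):
        bitmap[addr] = 1        # status updated only when the push succeeds
```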
[0072] To avoid the need for a graphics pre-processor on the server, software running on the server side can generate the traffic to be sent to the client for processing. The driver stack running on the server would identify the surfaces/resources/state required for processing the workload and push the associated data to the client over the system network. Conceptually, the above-mentioned bandwidth reduction scheme (running the workload on the server using a software rasterizer and identifying the minimal data for processing on the client side) can also be implemented, and the shortlisted data can be transferred to the client.
Graphics Virtualization - Hardware Assist
[0073] Virtualization is a technique for hiding the physical characteristics of computing resources to simplify the way in which other systems, applications, or end users interact with those resources. The proposal lists different features which are implemented in hardware to assist virtualization of the graphics resource. These include:
Memory virtualization
[0074] Figure 7 shows a block diagram of hardware-assisted memory virtualization in a graphics system. Video memory is split between the virtual machines (VMs). The amount of memory allocated to each VM is updated regularly based on utilization and availability, but it is ensured that there is no overlap of memory between the VMs, so that video memory management can be carried out by the VMs. Hardware keeps track of the allocation for each VM in terms of memory blocks of 32 MB. Thus the remapping of the addresses used by the VMs to the actual video memory addresses is carried out by hardware.
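For illustration, a sketch of the block remapping follows; the per-VM block tables are invented, but the 32 MB granularity and the no-overlap requirement come from paragraph [0074].

```python
BLOCK_SIZE = 32 * 1024 * 1024  # 32 MB blocks tracked by hardware

# Per-VM list of physical block indices; no block appears in two tables,
# so the VMs' memory regions cannot overlap. Contents are illustrative.
block_tables = {
    "vm0": [0, 1, 4],
    "vm1": [2, 3],
}

def translate(vm: str, vm_addr: int) -> int:
    """Remap a VM-local video memory address to a physical address."""
    block, offset = divmod(vm_addr, BLOCK_SIZE)
    return block_tables[vm][block] * BLOCK_SIZE + offset
```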
Hardware virtualization
[0075] Figure 8 shows a block diagram of hardware virtualization in a graphics system. To provide a view of dedicated hardware to the VMs, each VM is provided an entry point into the hardware. The VMs deliver workloads to the hardware in a time-sliced fashion. The hardware builds in mechanisms to fairly arbitrate and manage the execution of these workloads from each of the VMs.
Fast context- switching
[0076] Figure 9 shows a block diagram of fast context switching in a graphics system. With hardware virtualization, context switches (changing workloads) would be more frequent. To get effective hardware virtualization, fast context-switching is required so that there is minimal overhead when switching between the VMs. The hardware implements thread-level context switching for fast response, and also concurrent context save and restore to hide the switch latency.
Scalar/Vector adaptive execution
[0077] Figure 10 shows a block diagram of scalar/vector adaptive execution in a graphics system.
[0078] Processors have a defined instruction-set to which the device is programmed. Different instruction-sets have been developed over the years. The baseline scalar instruction-set for OpenCL/DirectCompute defines instructions which operate on one data entity. A vector instruction-set defines instructions which operate on multiple data, i.e., they are SIMD. 3D graphics APIs (OpenGL/DirectX) define a vector instruction set which operates on 4-channel operands.

[0079] The scheme described here defines a technique whereby the processor core carries out adaptive execution of scalar/4-D vector instruction sets with equal efficiency. The data operands read from the on-chip registers or buffers in memory are 4x the width of the ALU compute block. The data is serialized into the compute block over 4 clocks. For vector instructions, the 4 sets of data correspond to one register for the execution thread. For scalar instructions, the 4 sets of data correspond to one register for four execution threads. At the output of the ALU, the 4 sets of result data are gathered and written back to the on-chip registers.
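The two interpretations of a four-set operand fetch can be modeled as follows; this is a sketch of the mapping described in paragraph [0079], not of any particular hardware, and the thread labels are illustrative.

```python
def split_operand(fetched_4x, is_vector: bool):
    """fetched_4x: four ALU-widths of data, serialized over four clocks."""
    sets = [fetched_4x[i] for i in range(4)]
    if is_vector:
        # the four sets are the x/y/z/w channels of one register, one thread
        return {"thread0": dict(zip("xyzw", sets))}
    # the four sets are the same scalar register for four execution threads
    return {f"thread{i}": sets[i] for i in range(4)}
```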
Smart Pre-fetch/Pre-decode technique
[0080] Figure 11 shows a flowchart of a smart pre-fetch/pre-decode technique in a graphics system.
[0081] The processors of today have multiple pipeline stages in the compute core.
Keeping the pipeline fed is a challenge for designers. Fetch latencies (from memory) and branching are hugely detrimental to performance. To address these problems, a lot of complexity is added to maintain a high efficiency in the compute pipeline. Techniques include speculative prefetching and branch prediction. These solutions are required in single-threaded scenarios. Multi-threaded processors lend themselves to a unique execution model to mitigate this same set of problems.
[0082] While executing a program for a thread on the multi-threaded processor, only one instruction cache-line (made up of multiple instructions) is fetched at a time. The clocks required to process the instructions in the instruction cache-line match the instruction fetch latency. This ensures that in non-branch scenarios, the instruction fetch latency is hidden. On reception of the instruction cache-line from memory, it is pre-decoded. If an unconditional branch instruction is present, the fetch for the next instruction cache-line is issued from the branch instruction pointer. If a conditional branch instruction is present, the fetch of the next instruction cache-line is deferred until the branch is resolved. Because of the presence of multiple threads, this mechanism does not result in reduction of efficiency.
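A sketch of the pre-decode policy of paragraph [0082] follows; the instruction encoding is invented for illustration, and only the fetch decision is modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Insn:
    kind: str                     # "uncond_branch", "cond_branch", or "alu"
    target: Optional[int] = None  # branch target address, if any

def next_fetch(line: List[Insn], fall_through: int) -> Optional[int]:
    """Return the next cache-line address to fetch, or None to defer."""
    for insn in line:
        if insn.kind == "uncond_branch":
            return insn.target    # fetch from the branch instruction pointer
        if insn.kind == "cond_branch":
            return None           # defer until the branch is resolved
    return fall_through           # non-branch case: fetch latency stays hidden
```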
[0083] While pre-decoding the instruction cache-line, another piece of information extracted is the set of data operands required from memory. A memory fetch for all these data operands is issued at this point.
Video Processing
[0084] Figure 12 shows a diagram of video encoding in a video processing system. A completely programmable multi-threaded video processing engine is implemented to carry out decode/encode/transcode and other video post-processing operations. Video processing involves parsing of bit-streams and computations on blocks of pixels. The presence of multiple blocks in a frame enables efficient multi-threaded processing. All the block computations are carried out in SIMD fashion. The key to realizing maximum benefit from SIMD processing is designing the right width for the SIMD engine and also providing the infrastructure to feed the engine the data that it needs. This data includes the instruction along with the operands which could be on-chip registers or data from buffers in memory.
[0085] Video Decoding - Involves high-level parsing for stream properties & stream marker identification followed by variable-length parsing of the bit-stream data between markers. This is implemented in the programmable processor with specialized instructions for fast parsing. For the subsequent mathematical operations (Inverse Quantization, IDCT, Motion Compensation, De-blocking, De-ringing), a byte engine to accelerate operations on byte & word operands has been defined.
[0086] Video Encoding - Motion Estimation is carried out to determine the best match using a high-density SAD4x4 instruction (each of the four 4x4 blocks in the source is compared against the sixteen different 4x4 blocks in the reference). This is followed by DCT, quantization, and video decoding, which are carried out in the byte engine. The subsequent variable-length coding is carried out with special bit-stream encoding and packing instructions.
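As a scalar model of the SAD4x4 comparison, the following sketch computes the sum of absolute differences for one 4x4 block pair and searches a small window; the window size is a placeholder, and the hardware instruction evaluates many block pairs in parallel rather than looping.

```python
def sad4x4(src, ref, sx, sy, rx, ry):
    """Sum of absolute differences of two 4x4 blocks (in-range coords)."""
    return sum(abs(src[sy + j][sx + i] - ref[ry + j][rx + i])
               for j in range(4) for i in range(4))

def best_match(src, ref, sx, sy, search=4):
    """Exhaustive search around (sx, sy); returns (best SAD, (dx, dy))."""
    return min(((sad4x4(src, ref, sx, sy, sx + dx, sy + dy), (dx, dy))
                for dy in range(-search, search + 1)
                for dx in range(-search, search + 1)),
               key=lambda t: t[0])
```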
[0087] Video Transcoding - Uses a combination of the techniques defined for decoding and encoding.
Video post-processing
[0088] Figure 13 shows a diagram of video post-processing in a video processing system. A number of post-processing algorithms involve filtering of pixels in the horizontal and vertical directions. The fetching of pixel data from memory and its organization in the on-chip registers enables efficient access to data in both directions. The filtering is carried out with dot-product instructions (dp5, dp9 & dp16) in multiple shapes (horizontal, bidirectional, square, vertical).
Branch technique
[0089] Figure 14 shows a flowchart of a branch technique. When processing programs in SIMD (multiple threads in one group) fashion, scenarios emerge where the different threads within the group take different paths in the program. A simple and cheap scheme to handle branches, both conditional and unconditional, in a SIMD engine is described here.
[0090] An execution instruction pointer (IP) is maintained along with a flag bit for each thread in the group. The flag indicates that the thread is in the same flow as the current execution; hence, execution only occurs for threads that have their flag set. The flag is set for all threads at the beginning of execution. Because of a conditional branch, if a thread does not take the current execution code path, its flag is turned off and its execution IP is set to the pointer it needs to move to. At merge points, the execution IPs of threads whose flags are turned off are compared with the current execution IP. If the IPs match, the flag is set. At branch points, if all currently active threads take the branch, the current execution IP is set to the closest (minimum positive delta from the current execution IP) execution IP among all threads.
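A sketch of this per-thread flag/IP bookkeeping follows; the class layout is hypothetical, and only the divergence, merge, and branch-retarget rules of paragraph [0090] are modeled.

```python
class SimdGroup:
    def __init__(self, n_threads: int):
        self.exec_ip = 0
        self.flags = [True] * n_threads  # all flags set at start of execution
        self.ips = [0] * n_threads       # per-thread execution IP

    def diverge(self, tid: int, resume_ip: int) -> None:
        # a thread leaves the current code path at a conditional branch
        self.flags[tid] = False
        self.ips[tid] = resume_ip

    def merge_point(self) -> None:
        # re-activate threads whose saved IP matches the current execution IP
        for t, ip in enumerate(self.ips):
            if not self.flags[t] and ip == self.exec_ip:
                self.flags[t] = True

    def all_active_branched(self) -> None:
        # all active threads took the branch: jump to the closest (minimum
        # positive delta) execution IP among all threads
        ahead = [ip for ip in self.ips if ip > self.exec_ip]
        if ahead:
            self.exec_ip = min(ahead)
```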
Programmable Output Merger

[0091] Figure 15 shows a flowchart of the programmable output merger. The 3D graphics APIs (OpenGL, DirectX) define a processing pipeline as shown in the diagram. Most of the pipeline stages are defined as shaders, which are programs run on the appropriate entities (vertices/polygons/pixels). Each shader stage receives inputs from the previous stage (or from memory), uses various other input resources (programs, constants, textures, ...) to process the inputs, and delivers outputs to the next stage. During processing, a set of general-purpose registers is used for temporary storage of variables. The other stages are fixed-function blocks controlled by state.
[0092] The APIs categorize all of the state defining the entire pipeline into multiple groups. Maintaining orthogonality of these state groups in hardware, i.e., keeping the state groups independent of each other, eliminates dependencies in the driver compiler and enables a state-less driver.
[0093] The final stages of the 3D pipeline operate on pixels. After the pixels are shaded, the output merger state defines how the pixel values are blended/combined with the co-located frame buffer values.
[0094] In our programmable output merger, this state is implemented as a pair of subroutines run before and after the pixel shader execution. A prefix subroutine issues a fetch of the frame buffer values. A suffix subroutine has the blend instructions. The pixel-shader outputs (which are created in the general purpose registers) need to be combined with the frame buffer values (fetched by the prefix subroutine) using the blend instructions in the suffix subroutine. To maintain orthogonality with the pixel-shader state, the pixel-shader output registers are tagged as such, and a CAM (Content Addressable Memory) is used to access these registers in the suffix subroutine.
Register Remapping

[0095] This is a compiler technique to optimize/minimize the registers used in a program. To carry out remapping of the registers used in the shader programs, a bottom-up approach is used.
[0096] The program is pre-compiled top-to-bottom with instructions of fixed size.
[0097] This pre-compiled program is then parsed bottom-to-top. A register map is maintained for the general purpose registers (GPR) which tracks the mapping between the original register number and the remapped register number. Since the registers in shader programs are 4-channel, the channel enable bits are also tracked in the register map.
[0098] All instructions not contributing to an output register are removed.
[0099] When a register is used as a source in an instruction and is not found in the register map, the register is remapped to an unused register and it is placed in the register map.
[00100] If a register used as a source/destination in an instruction is found in the register map, it is renamed accordingly.
[00101] A GPR is removed from the register map if it is a destination register (after it has been renamed) and all the enabled channels in the register map are written to (as per the destination register mask).
[00102] Once the bottom-to-top compile is complete, the program can be recompiled top-to-bottom one more time to use variable length instructions. Also, some registers with only a sub-set of channels enabled can be merged into one single register.
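A compact sketch of the bottom-to-top pass follows, covering paragraphs [0097]-[00101] but ignoring per-channel enable masks and the final variable-length recompile for brevity; the register naming convention ('rN' for GPRs, 'oN' for outputs) is an assumption of the sketch.

```python
def remap_bottom_up(program):
    """program: top-to-bottom list of (dest, [sources]) register names."""
    reg_map = {}   # original GPR name -> remapped GPR name
    next_reg = 0
    out = []
    for dest, srcs in reversed(program):
        if not (dest.startswith("o") or dest in reg_map):
            continue                            # [0098]: dead instruction removed
        new_dest = reg_map.pop(dest, dest)      # [00101]: destination retires
        new_srcs = []
        for s in srcs:
            if s.startswith("r") and s not in reg_map:
                reg_map[s] = f"r{next_reg}"     # [0099]: map to an unused register
                next_reg += 1
            new_srcs.append(reg_map.get(s, s))  # [00100]: rename mapped registers
        out.append((new_dest, new_srcs))
    out.reverse()
    return out

# e.g. remap_bottom_up([("r5", ["v0"]), ("r7", ["r5"]), ("o0", ["r7"]),
#                       ("r9", ["v1"])]) drops the dead write to r9 and
# renumbers r5/r7 to r1/r0.
```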
[00103] Although specific embodiments have been described and illustrated, the described embodiments are not to be limited to the specific forms or arrangements of parts so described and illustrated. The embodiments are limited only by the appended claims.

Claims

What is claimed:
1. A method of selecting graphics data of a server system for transmission,
comprising:
reading data from graphics memory of the server system;
placing the data in a transmit buffer if the data is being read for the first time, and was not written by a processor of the server system; and
transmitting the data of the transmit buffer to a client system.
2. The method of claim 1, wherein the processor includes at least one of a central processing unit (CPU) and a graphics processing unit (GPU), the method further comprising the GPU controlling compression and placement of data of a frame buffer into a compressed frame buffer, and the GPU controlling a selection of either compressed graphics data of the compressed frame buffer or data of the transmit buffer for transmission to the client system.
3. The method of claim 1, wherein determining whether the data is being read for the first time comprises checking at least a first status-bit.
4. The method of claim 1, wherein determining whether the data was not written by the processor comprises checking at least a second status-bit.
5. The method of claim 3, wherein the at least first status-bit is set when data is placed in the transmit buffer.
6. The method of claim 4, wherein the at least second status-bit is set when the processor writes to the graphics memory.
7. The method of claim 5, wherein the at least first status-bit is reset upon detecting at least one of a direct memory access (DMA) of the graphics memory or reallocation of the graphics memory.
8. The method of claim 6, wherein the second status-bit is reset upon detecting at least one of a direct memory access (DMA) of the graphics memory or reallocation of the graphics memory.
9. The method of claim 1, wherein the data comprises a plurality of blocks, and wherein determining if the data is being read for the first time comprises checking at least one status-bit corresponding to at least one block.
10. The method of claim 1, further comprising compressing data of a frame buffer of the graphics memory.
11. The method of claim 10, further comprising checking at least one of a bandwidth of a link between the server system and the client system, and capabilities of the client system, and the server system transmitting at least one of the compressed frame buffer data or the data of the transmit buffer based at least in part on the at least one of the bandwidth of the link and the capabilities of the client system.
12. The method of claim 11, wherein checking the bandwidth and the capabilities is performed on a frame-by-frame basis.
13. The method of claim 11, further comprising determining whether to compress data of the frame buffer on a frame-by-frame basis, and placing at least a percentage of the data in the transmit buffer for every frame.
14. The method of claim 1, further comprising the server system providing a
reference frame to the client system for allowing the client system to decompress compressed video received from the server system and maintaining the reference frame for subsequent compression of data of the frame buffer even when the reference frame is lossless.
15. A system for selecting graphics data for transmission, comprising:
a server system comprising a processor and graphics memory, wherein the graphics memory comprises a frame buffer and a transmit buffer, the server system operable to read data from the graphics memory; and
the server system operable to place the data in the transmit buffer if the data is being read for the first time, and was not written by the processor.
16. The system of claim 15, wherein the processor comprises at least a central
processing unit (CPU) and a graphics processing unit (GPU), and wherein the GPU controls compression and placement of data of the frame buffer into a compressed frame buffer, and the GPU controls a selection of either compressed graphics data of the compressed frame buffer or data of the transmit buffer for transmission to a client system.
17. The system of claim 15, wherein the server system is operable to compress data of the frame buffer.
18. The system of claim 17, further comprising the server system operable to select between transmitting the data of the transmit buffer or the compressed data of the frame buffer depending on a data bandwidth of a link between the server system and a client system.
19. The system of claim 17, further comprising the server system operable to select between transmitting the data of the transmit buffer or the compressed data of the frame buffer depending on capabilities of a client system, wherein the server system is linked to a client system.
20. The system of claim 15, further comprising a client system that is linked to the server system, the client system comprising at least a processor and a client graphics memory, and wherein the processor of the client system determines whether to decompress the data or render the data to place the data in a frame buffer of the client graphics memory based on information in a data header of the data.
21. The method of claim 1, wherein an allocated size of the transmit buffer is
adaptively determined based on at least a bandwidth of a link between the server system and the client system.