WO2018049321A1

WO2018049321A1 - Method and systems for displaying a portion of a video stream with partial zoom ratios

Info

Publication number: WO2018049321A1
Application number: PCT/US2017/050945
Authority: WO
Inventors: Jeffrey Allen Cooper; Kumar Ramaswamy; John Richardson
Original assignee: Vid Scale, Inc.
Priority date: 2016-09-12
Filing date: 2017-09-11
Publication date: 2018-03-15

Abstract

Systems and methods are described to enable video clients to zoom in to a region or object of interest without substantial loss of resolution. In an exemplary method, a server transmits a manifest, such as a DASH MPD, to a client device. The manifest identifies a plurality of sub-streams, where each sub-stream represents a respective spatial portion of a source video. The manifest also includes information associating an object of interest with a plurality of the spatial portions. To view high-quality zoomed video, the client requests the sub-streams that are associated with the object of interest and renders the requested sub-streams. In some embodiments, different sub-streams are available with different zoom ratios.

Description

METHOD AND SYSTEMS FOR DISPLAYING A PORTION OF A VIDEO STREAM WITH PARTIAL

ZOOM RATIOS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001 ] The present application is a non-provisional filing of, and claims benefit under 35 U.S.C. §119(c) from, U.S. Provisional Patent Application Serial No. 62/393,555, filed September 12, 2016, entitled "METHOD AND SYSTEMS FOR DISPLAYING A PORTION OF A VIDEO STREAM WITH PARTIAL ZOOM RATIOS", which is incorporated herein by reference in its entirety.

BACKGROUND

[0002] Digital video signals are commonly characterized by parameters such as i) resolution (luma and chroma resolution or horizontal and vertical pixel dimensions), ii) frame rate, and iii) dynamic range or bit depth (bits per pixel). The resolution of the digital video signals has increased from Standard Definition (SD) through 8K-Ultra High Definition (UHD). The other digital video signal parameters have also increased from 30 frames per second (fps) up to 240 fps, and the bit depth has increased from 8 bit to 10 bit. To transmit digital video signals over a network, MPEG/ITU standardized video compression has undergone several generations of successive improvements in compression efficiency, including MPEG2, MPEG4/H.264, and HEVC/H.265. The technology to display the digital video signals on a consumer device, such as a television or mobile phone, has also increased correspondingly.

[0003] Consumers requesting higher quality digital video on network-connected devices face more bandwidth constraints from the video content delivery network. In an effort to mitigate the effects of bandwidth constraints, several solutions have emerged. Video content is initially captured at a higher resolution, frame rate, and dynamic range. For example, a 4:2:2, 10 bit HD video content is often down-resolved to 4:2:0, 8 bit video for distribution. The digital video is encoded and stored at multiple resolutions at a server, and these versions at varying resolutions are made available for retrieval, decoding and rendering by clients with possibly varying capabilities. The digital video gets encoded and stored at multiple resolutions at a server. Adaptive bit rate (ABR) further addresses network congestion. In ABR, a digital video is encoded at multiple bit rates (e.g. choosing the same or multiple lower resolutions, lower frame rates, etc.) and is made available at a server. The client device requests a different bit rate for consumption at periodic intervals based on its calculated available network bandwidth or local computing resources. SUMMARY

[0004] Systems and methods are described herein for enabling a viewer of streaming video to retrieve and view a zoomed-in version of a spatial portion of the video. In an exemplary embodiment, a client device receives first video stream data comprising (i) a first video and (ii) information identifying at least one object of interest in the first video. The information identifying the object of interest may be provided in, for example, a manifest file. A user of the client device may select a particular object of interest through a user interface and may further select, in some embodiments, a desired zoom ratio. In response to a user selection of the object of interest, the client device sends to a server a request for a zoomed video. The request for the zoomed video identifies the selected object of interest and provides additional parameters relating to the zoomed video, which may include one or more of (i) a requested zoom ratio, (ii) pixel dimensions of a display region, and/or (iii) a requested source video resolution. Based on the request, a server generates and/or selects a zoomed video responsive to the request and sends the zoomed video to the client device. The zoomed video may include a plurality of separately-decodable slices. The client device receives the zoomed video and causes display of the zoomed video in the display region, e.g. as a full-screen zoomed video or as a picture-in-picture inset video.

[0005] The digital video may be encoded and stored at multiple resolutions at a server, and these versions at varying resolutions are made available for retrieval, decoding and rendering by clients with possibly varying capabilities. The server may make available additional metadata so that clients may request and receive data sufficient to decode and render one or more objects or areas of interest at a high resolution and/or a zoomed scale, where the spatial support for the objects or areas of interest may vary in time.

[0006] In an exemplary method, a server transmits a manifest, such as a DASH MPD, to a client device. The manifest identifies at least one unzoomed stream representing an unzoomed version of a source video. The manifest further identifies a plurality of sub-streams, where each sub-stream represents a respective spatial portion of the source video. The server also transmits, to the client device, information associating at least one object of interest with a plurality of the spatial portions. This information may be provided in the manifest. The server receives, from the client device, a request for at least one of the sub-streams. In response, the server transmits the requested sub-streams to the client device. The sub-streams may be encoded at a higher resolution than the unzoomed stream, allowing for higher-quality video when a client device zooms in on an object of interest represented in the sub-streams.

[0007] The information that associates the at least one object of interest with a plurality of the spatial portions may be provided by including, in the manifest, a syntax element for each sub-stream that identifies at least one object of interest associated with the respective sub-stream.

[0008] In some embodiments, the server also transmits to the client a render point for the object of interest. In instances where the object of interest encompasses less than the entirety of the sub-streams, the render point may be used to indicate which portions of the sub-streams are to be displayed. For example, the render point may represent coordinates of one or more corners of a rectangular region of interest, where the rectangular region of interest is smaller than a complete region represented by all of the sub-streams. The rectangular region of interest is displayed, while portions of the sub-streams that are outside of the rectangular region of interest may not be displayed.

[0009] The rendering reference points may be communicated to the client device. For example, rendering reference points may be transmitted in-band as part of the video streams or video segments, or as side information sent along with the video streams or video segments. One or more rendering reference points may be transmitted in-band in a video stream, such as in an unzoomed stream or in one or more sub- streams. Alternately the rendering reference points may be specified in an out-of-band communication (e.g. as metadata in a manifest such as a DASH MPD).

[0010] In some embodiments, the sub-streams are encoded for adaptive bit rate (ABR) streaming, for example with at least two sub-streams with different bitrates being available for at least some of the spatial portions. The client may select which sub-stream to request based on network conditions.

[0011 ] In an exemplary client-side method, a video client receives a manifest, where the manifest identifies an unzoomed stream representing an unzoomed version of a source video. The manifest also identifies a plurality of sub-streams, where each sub-stream represents a respective spatial portion of the source video. The client further receives information associating at least one object of interest with a plurality of the spatial portions. The client device receives a selection (e.g. a user selection entered through a user interface device such as a remote control) of one of the objects of interest. The client device identifies the spatial portions associated with the selected object of interest and retrieves a representative sub-stream for each of the spatial portions. Where there is more than one representative sub-stream for a spatial portion (e.g. with different bitrates), the client device may select which of the representative sub-streams to retrieve based on network conditions. The client device then causes display of a zoomed version of the object of interest by rendering the retrieved sub-streams. The display of the zoomed version may be provided by the client device itself (e.g. on a built-in screen), or the client device may transmit uncompressed video to an external display device (such as a television or monitor).

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] A more detailed understanding may be had from the following description, presented by way of example in conjunction with the accompanying drawings.

[0013] FIG. 1A depicts an example communications system in which one or more disclosed embodiments may be implemented. [0014] FIG. 1 B depicts an example client device that may be used within the communications system of FIG. 1A.

[0015] FIG. 1 C depicts an example network entity 190, that may be used as a video server in some embodiments.

[0016] FIG. 2 depicts an example video encoding and distribution system.

[0017] FIG. 3 depicts example screen resolutions.

[0018] FIG. 4 schematically depicts ABR encoding.

[0019] FIG. 5 depicts an example video coding system and distribution system, according to an embodiment.

[0020] FIG. 6 depicts example coding resolutions, in accordance with an embodiment.

[0021 ] FIG. 7 depicts an example of a video zoom operation, in accordance with an embodiment.

[0022] FIG. 8 depicts a second example of video zoom operation, in accordance with an embodiment.

[0023] FIG. 9 depicts an example of a digital video with an object of interest, in accordance with an embodiment.

[0024] FIG. 10 is a message sequence diagram depicting encoding and delivery of content to a client in accordance with an embodiment.

[0025] FIG. 11 is a message sequence diagram depicting a second example of encoding and delivery of content to a client in accordance with an embodiment.

[0026] FIG. 12 is a message sequence diagram depicting an example communications process, in accordance with an embodiment.

[0027] FIG. 13 illustrates a video having a plurality of spatial portions, at least some of the spatial portions having associated sub-streams to enable zoomed display of an object of interest.

[0028] FIG. 14 is a schematic illustration of a zoom coding method employing client-side scaling.

[0029] FIG. 15 is a schematic illustration of an embodiment employing server-side generation of pre- scaled video streams.

[0030] FIG. 16 is a schematic illustration of another embodiment employing server-side generation of pre-scaled video streams.

[0031 ] FIG. 17 is a message-flow diagram illustrating exchange of information in some embodiments. DETAILED DESCRIPTION

[0032] A detailed description of illustrative embodiments will now be provided with reference to the various Figures. Although this description provides detailed examples of possible implementations, it should be noted that the provided details are intended to be by way of example and in no way limit the scope of the application.

Exemplary Network.

[0033] The systems and methods relating to video compression may be used with the wired and wireless communication systems described with respect to FIG. 1A. As an initial matter, these wired and wireless systems will be described.

[0034] FIG. 1 A is a diagram of an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, and the like, to multiple wireless users. The communications system 100 may enable multiple wired and wireless users to access such content through the sharing of system resources, including wired and wireless bandwidth. For example, the communications systems 100 may employ one or more channel-access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and the like. The communications systems 100 may also employ one or more wired communications standards (e.g.: Ethernet, DSL, radio frequency (RF) over coaxial cable, fiber optics, and the like.

[0035] As shown in FIG. 1A, the communications system 100 may include client devices 102a, 102b, 102c, and/or 102d, Radio Access Networks (RAN) 103/104/105, a core network 106/107/109, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, and communication links 115/116/117, and 119, though it will be appreciated that the disclosed embodiments contemplate any number of client devices, base stations, networks, and/or network elements. Each of the client devices 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wired or wireless environment. By way of example, the client device 102a is depicted as a tablet computer, the client device 102b is depicted as a smart phone, the client device 102c is depicted as a computer, and the client device 102d is depicted as a television.

[0036] The communications systems 100 may also include a base station 114a and a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements. [0037] The base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, and the like. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 114a may employ multiple-input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.

[0038] The base stations 114a, 114b may communicate with one or more of the client devices 102a, 102b, 102c, and 102d over an air interface 115/116/117, or communication link 119, which may be any suitable wired or wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, and the like). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).

[0039] More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel-access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 103/104/105 and the client devices 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).

[0040] In another embodiment, the base station 114a and the client devices 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).

[0041] In other embodiments, the base station 114a and the client devices 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.

[0042] The base station 114b in FIG. 1 A may be a wired router, a wireless router, Home Node B, Home eNode B, or access point, as examples, and may utilize any suitable wired transmission standard or RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like. In one embodiment, the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In another embodiment, the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114b and the client devices 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, and the like) to establish a picocell or femtocell. In yet another embodiment, the base station 114b communicates with client devices 102a, 102b, 102c, and 102d through communication links 119. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the core network 106/107/109.

[0043] The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the client devices 102a, 102b, 102c, 102d. As examples, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, and the like, and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 103/104/105 and/or the core network 106/107/109 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 103/104/105 or a different RAT. For example, in addition to being connected to the RAN 103/104/105, which may be utilizing an E-UTRA radio technology, the core network 106/107/109 may also be in communication with another RAN (not shown) employing a GSM radio technology.

[0044] The core network 106/107/109 may also serve as a gateway for the client devices 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and IP in the TCP/IP Internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.

[0045] Some or all of the client devices 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, i.e., the client devices 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wired or wireless networks over different communication links. For example, the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology. Exemplary Client and Server Entities.

[0046] FIG. 1 B depicts an example client device that may be used in embodiments disclosed herein. In particular, FIG. 1 B is a system diagram of an example client device 102. As shown in FIG. 1 B, the client device 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, a non-removable memory 130, a removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and other peripherals 138. It will be appreciated that the client device 102 may represent any of the client devices 102a, 102b, 102c, and 102d, and include any sub-combination of the foregoing elements while remaining consistent with an embodiment. Also, embodiments contemplate that the base stations 114a and 114b, and/or the nodes that base stations 114a and 114b may represent, such as but not limited to transceiver station (BTS), a Node-B, a site controller, an access point (AP), a home node-B, an evolved home node-B (eNodeB), a home evolved node-B (HeNB), a home evolved node-B gateway, and proxy nodes, among others, may include some or all of the elements depicted in FIG. 1 B and described herein.

[0047] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the client device 102 to operate in a wired or wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1 B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.

[0048] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117 or communication link 119. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. In yet another embodiment, the transmit/receive element may be a wired communication port, such as an Ethernet port. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wired or wireless signals.

[0049] In addition, although the transmit/receive element 122 is depicted in FIG. 1 B as a single element, the client device 102 may include any number of transmit/receive elements 122. More specifically, the client device 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.

[0050] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the client device 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the client device 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11 , as examples.

[0051] The processor 118 of the client device 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the client device 102, such as on a server or a home computer (not shown).

[0052] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the client device 102. The power source 134 may be any suitable device for powering the WTRU 102. As examples, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, a wall outlet and the like.

[0053] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the client device 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the client device 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment. In accordance with an embodiment, the client device 102 does not comprise a GPS chipset and does not acquire location information. [0054] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.

[0055] FIG. 1 C depicts an example network entity 190 that may be used in some embodiments, for example as an adaptive bitrate encoder, as a streaming server, or as another server. As depicted in FIG. 1 C, network entity 190 includes a communication interface 192, a processor 194, and non-transitory data storage 196, all of which are communicatively linked by a bus, network, or another communication path 198.

[0056] Communication interface 192 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 192 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 192 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. Further with respect to wireless communication, communication interface 192 may be equipped at a scale and with a configuration appropriate for acting on the network side— as opposed to the client side— of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 192 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.

[0057] Processor 194 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.

[0058] Data storage 196 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random- access memory (RAM) to name but a few, as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art could be used. As depicted in FIG. 1 C, data storage 196 contains program instructions 197 executable by processor 194 for carrying out various combinations of the various network-entity functions described herein.

[0059] In some embodiments, the network-entity functions described herein are carried out by a network entity having a structure similar to that of network entity 190 of FIG. 1 C. In some embodiments, one or more of such functions are carried out by a set of multiple network entities in combination, where each network entity has a structure similar to that of network entity 190 of FIG. 1 C. Exemplary Adaptive Bitrate (ABR) Distribution System.

[0060] FIG. 2 depicts an example video encoding and distribution system that may be used in conjunction with embodiments disclosed herein. In particular, FIG. 2 depicts the example system 160. The example system 160 includes a full resolution input video source 162, an adaptive bitrate encoder 164, a streaming server 166, a network 168, and client devices 169. The example system 160 may be implemented in the context of the example communication system 100 depicted in FIG. 1A. For example, both the adaptive bitrate encoder 164 and the streaming server 168 may be entities in any of the networks depicted in the communication system 100. The client devices 169 may be the client devices 102a-d depicted in the communication system 100.

[0061] In accordance with an embodiment, the adaptive bitrate encoder or transcoder 164 receives an uncompressed or compressed input video stream from source 162 and encodes or transcodes the video stream into a plurality of representations 165. Each of the representations may differ from the others in a property such as resolution, frame rate, bit rate, and the like. The adaptive bitrate encoder 164 communicates the encoded video streams 165 to the streaming server 166. The streaming server 166 transmits an encoded video stream via the network to the client devices. The transmission may take place over any of the communication interfaces, such as the communication link 115/116/117 or 119.

Encoding and Distribution of Images with Different Resolutions.

[0062] FIG. 3 provides an illustration 170 of different image resolutions. The example image resolutions, listed from lowest resolution to highest resolution, include standard definition (SD), full high definition (FHD), 4K Ultra High Definition (UHD), and 8K UHD, although other resolutions may also be available. As an aid to understanding the present disclosure, higher-resolution video is generally illustrated herein using a larger rectangle (as if each pixel were of equal size), though it should be understood that display size is not necessarily correlated with image resolution.

[0063] FIG. 4 provides a schematic illustration of ABR encoding. As illustrated in FIG. 4, a 4K UHD source video is converted to three other encodings with three different resolutions. For example, the source video may be down converted to a stream ABR-1 (182), which may be, for example, a 1080p HD video; a stream ABR-2 (184), which may be, for example, a standard definition (SD) stream; and a stream ABR-3 (186), which may be a still lower-resolution stream (e.g. for use under conditions of network congestion). Each of the ABR encoded versions of the source video are transmitted to the streaming server for further transmission to client devices based in part on client device capability and network congestion. Thus, the highest spatial resolution that is available is not always delivered to the client devices.

[0064] FIG. 5 depicts an example video coding and distribution system, according to an embodiment. In particular, FIG. 5 depicts the example system 200. The example system 200 includes components analogous to those depicted in the example ABR system 160 of FIG. 2, such as a full resolution input video source 262, an adaptive bitrate encoder 264 generating traditional ABR streams 265, a streaming server 266, a network 268, and client devices 269. In addition, system 200 further includes a zoom coding encoder 204. The zoom coding encoder 204 receives a source video stream from the full resolution video source 262, either in uncompressed or a previously compressed format. Zoom coding encoder 204 encodes or transcodes the source video stream into a plurality of zoom coded sub-streams, wherein each of the zoom coded sub-streams encodes a spatial portion (e.g. a segment, a slice, a quadrant, or other division) representing an area smaller than the complete area of the overall source video. In embodiments using transcoding that convert a video stream from one compressed format to another, a decoding process is performed that brings the video back to the uncompressed domain at its full resolution followed by the re- encoding process of creating new compressed video streams representing different resolutions, bit rate or frame rates.

[0065] The zoom coded sub-streams 206 may be i) encoded at a resolution and quality of the source video stream, or similar to the ABR encoding, and/or ii) encoded into a plurality of resolutions. The zoom coded sub-streams 206 are transmitted to the streaming server 266 for further transmission to the client devices 269. In some embodiments, the ABR encoder and the zoom coding encoder are the same encoder, configured to encode the source video into the ABR streams and the zoom coded sub-streams.

[0066] FIG. 6 depicts example coding resolutions, in accordance with an embodiment. In particular FIG. 6 depicts an overview of coding 300. The overview includes a digital source video 302, an ABR encoder 304, a zoom coding encoder 306, ABR streams 308-312, and zoom coded sub-streams 314-320. The digital source video 302 is depicted as having four quadrants, the top left is indicated with diagonal cross hatching, the top right is indicated with vertical and horizontal lines, the bottom left is indicated with diagonal lines, and the bottom right is indicated with dotted shading. The full resolution of the source digital video 302 is 3840 horizontally by 2160 vertically (4K χ 2K). The four quadrants are shown by way of example as the digital video source may be divided into any number of areas in any arrangement, including segments of different sizes and shapes. The digital source video 302 is received by the ABR encoder 304 and the zoom coding encoder 306. The ABR encoder 304 processes the digital source video into three different ABR streams 308, 310 and 312. Each ABR stream is of a different resolution. For example, the ABR stream 308 is encoded in 2K x 1 K (specifically 1920 χ 1080), has the highest resolution, and is depicted as the largest area. The ABR stream 312 is encoded in 500 χ 250 (specifically 480 χ 270), has the lowest resolution, and is depicted as the smallest area. While the ABR streams in example 300 vary the resolution, other attributes, including but not limited to bit rate and frames per second, can also varied by themselves on in conjunction with other digital video attributes. The zoom coded sub-streams 314, 316, 318, and 320 are each encoded in a 2K χ 1 K resolution (specifically 1920 χ 1080), matching the resolution of the corresponding regions in the digital source video 302.

[0067] In an embodiment, a client device is streaming digital video via the system 200, and a source video is being encoded (or has previously been encoded and stored at a streaming server) as depicted in FIG. 6. The client device can receive, can decode, and can display any of the ABR streams 308, 310, 312 that depict the source video encoded at varying digital video parameters. The client device can zoom in on a portion of the (e.g. decoded) traditional ABR streams 308, 310, or 312. However, the whole ABR stream is transmitted over the network, including portions which the client device may not display (e.g. portions that are outside the boundary of the display when the video is zoomed in). Also, the resulting zoomed image portion displayed on the client device is likely to appear pixelated or otherwise to display a lower resolution. Using embodiments disclosed herein, however, the client device can zoom in on a portion of the video stream by requesting one or more of the zoom coded sub-streams 314, 316, 318, and 320 corresponding to the portion of the video stream requested by the client device. The client device can, for example, request to see the top left portion of the digital video corresponding to the diagonally cross-hatched area. In response, the streaming server transmits the zoom coded sub-stream 314 over the network to the client device. Thus, the portion of the video display requested by the client device is transmitted over the network and the resulting display is of a higher quality than a zoomed-in version of an ABR stream. A separate video camera or source video is not required to provide a high-quality video stream to the client device.

[0068] To facilitate a client device receiving a zoom coded video stream via the network, the streaming server may be configured to notify the client device via a profile communication file of available streams. For example, the profile communication file may be a manifest file, a session description file, a media presentation description (MPD) file, a DASH MPD, or another suitable representation for describing the available streams.

[0069] In various embodiments, the source video may be a sports event, a replay of a sports event, an action sequence, a surveillance security video, a movie, a television broadcast, or other material.

[0070] FIG. 7 depicts examples of zooming video streams, in accordance with an embodiment. In particular, FIG. 7 depicts the process to display an area of interest 402 of the source video 302. The area of interest 402 may include an object of interest in the video scene, which may be a stationary object or a moving object. (Henceforth, the terms object of interest and area of interest are used interchangeably in the present disclosure.) As described in conjunction with FIG. 6, the source video 302 is encoded with the ABR encoder 304 producing video streams 308, 310, 312 and is further encoded with the zoom coding encoder 306 producing video streams 314, 316, 318, 320. The regions 404 and 406 represent the portion of the encoded streams associated with the area of interest of the source video 302. The displays 406 and 410 represent the video displayed on the client device by zooming in on a traditional ABR stream (406) as compared to the use of zoom coding (410). The area of interest 402 overlaps areas encoded in four different zoom coded sub-streams and has resolution dimensions of 2K x 1 K of the original 4K x 2K source video. Thus, the highest resolution available to be displayed on a client device that represents the area of interest is 2K x 1 K.

[0071] The ABR encoder 304 is able to provide a zoomed-in view of the area of interest. In this embodiment, the ABR encoder 304 produces three ABR encoded streams 308, 310, and 312. The ABR stream 308 has a resolution of 2K x 1 K and includes the region 404 that corresponds to the area of interest 402. However, the portion of the video stream corresponding to the area of interest has resolution dimensions of approximately 1 K x 500 (specifically 960 χ 540). Whether the ABR stream 308 or a separate stream representative of the region 404 is transmitted to the client device, the final displayed video 406, with resolution dimensions of approximately 1 K x 500 (specifically 960 χ 540), is at a lower resolution than the region of interest 402 in the source video 302, and displayed video 406 may be scaled for display on the client device.

[0072] The zoom coding encoder 306 is also able to provide for a zoomed-in view of the area of interest. In this embodiment, the zoom coding encoder 306 produces four zoom coded sub-streams 314, 316, 318, and 320. Each zoom coded sub-stream has the resolution of 2K x 1 K. The region 408 overlaps all four zoom coded sub-streams and has a maximum resolution available of 2K x 1 K, the same resolution dimensions available for region of interest 402 in the source video 302.

[0073] The source video 302 may be further divided into smaller portions, or slices, than the quadrants depicted. For example, the source video 302 may be divided using a grid of 8 portions horizontally and 8 portions vertically, or using a different grid of 32 portions horizontally and 16 portions vertically, or some other partitioning into portions. Slice encoding is supported by video encoding standards, such as H.264, H.265, HEVC, and the like. In embodiments where the area of interest does not overlap a region covered by an available zoom coded video sub-stream, not all available zoom coded video sub-streams are transmitted via the network to the client device. Either the client device or a network entity, such as the streaming server, can determine the appropriate subset of the available zoom coded video sub-streams to transmit to the client device to cover the area of interest. For example, if the area of interest 402 of FIG. 7 were of a smaller size and/or were shifted to the left and so did not include video from the top-right or bottom-right portions of the source video, then only zoom coded video sub-streams 314 and 316 are provided to represent the area of interest. In this case, only the streams 314 and 316 are transmitted to the client device, and streams 318 and 320 are not transmitted to the client device, in order to allow the client to decode and display the region of interest 402.

[0074] FIG. 8 depicts a second example of video zoom, in accordance with an embodiment. In particular, FIG. 8 depicts zooming in circumstances where segments of video are divided into a plurality of slices, tiles, or other divisions of area. Each slice, tile, or other division of area (collectively referred to herein as a slice) is independently decodable. Each slice may be individually requested, retrieved, and decoded by a client, in the same manner as described for the alternative zoom coded sub-streams in the preceding examples. In the example of FIG. 8, a source video 412 with resolution dimensions of 4K x 2K has an area of interest 414 that has resolution dimensions of 2K x 1 K. In this embodiment, the source video is encoded into twelve slices, six on the left side and six on the right side. The division into slices may be performed using any video codec (e.g. any video codec supporting independently decodable portions or slices) as known to those of skill in the art. Here, the area of interest overlaps eight slices, and does not include the top two and bottom two video slices. Thus, the client device can request to only receive eight of the total twelve video segments from (e.g. encoded by) the zoom coding encoder. Thus, the client device can display the area of interest in the full resolution of 2K x 1 K without scaling the video and without the need to receive all of the available zoom coded segments.

[0075] In some embodiments, the zoom coded video sub-streams 314, 316, 318, and 320 or the zoom coded slices depicted in FIG. 8 are further encoded into other bit rates that have different resolutions, frames per second, and bit depths, thus making multiple representations of each zoom coded sub-stream or segment available to be transmitted to the client device.

[0076] FIG. 9 depicts an example of a digital video with an object of interest, in accordance with an embodiment. In particular, FIG. 9 depicts an example digital video 500. The digital video 500 includes a plurality of video slices 502a, 502b, etc. As illustrated in FIG. 9, the use of zoom coded sub-streams allows a user to view a zoomed version of an object or area of interest that moves such that it overlaps different slices at different times. In an exemplary embodiment, the source video 502 has resolution dimensions of 3840 horizontally by 2160 vertically. Each of video segments 502a, 502b, etc., have approximate resolutions dimensions of 800 horizontally and 333 vertically. The source video 500 may be encoded by various ABR encoders and zoom coding encoders and provide the encoded video streams to a streaming server for further transmitting over a network to a client device.

[0077] An object of interest, depicted by a soccer ball, is located at position 504a (inside slice 502c) at a first time T1. The position of the ball may be represented by a data structure (P1 ,T1), where P1 represents the position 504a. At time T2, the object of interest is located further up and to the right (in slice 502d) at position 504b, which may be represented by (P2,T2). At time T3, the object of interest is located further up and to the right (in slice 502e) at position 504c, which may be represented by (P3,T3). In response to a user input indicating a desire to zoom in on the object of interest, a client device may initially (for viewing of time period T1) request slice 502c (and, in some embodiments, immediately neighboring slices). The client device, on receiving the requested slices, causes display of a zoomed-in stream that includes the object of interest. To continue providing a zoomed-in view of the object of interest, the client device may subsequently (for viewing of time period T2) request and display slice 502d (and, in some embodiments, immediately neighboring slices). Subsequently, to continue tracking the object of interest, the client device may subsequently (for viewing of time period T3) request and display slice 502e (and, in some embodiments, immediately neighboring slices).

[0078] Selection of the appropriate slices to show the object of interest in context (e.g. with some surrounding background) may be performed on the client device or at the streaming server. Further, the concepts in this disclosure may apply to larger objects, objects which span multiple neighboring slices, objects traversing slices at different speeds, multiple objects, source video streams segmented into smaller segments, and the like.

[0079] A rendering reference point or "render point" may be used to indicate a rendering position associated with one or more positions of the object/area of interest. The rendering reference point may, for example, indicate a position (e.g. a corner or an origin point) of a renderable region which contains the object of interest at some point in time. The rendering reference point may indicate a size or extent of the renderable region. The rendering reference point may define a bounding box which defines the location and extent of the object/area of interest or of the renderable region containing the object/area of interest. The client may use the rendering reference point information to extract the renderable region from one or multiple zoom coded sub-streams or segments, and may render the region as a zoomed region of interest on the client display. In the first set of video segments, the rendering reference point, (0, 0) is depicted in the bottom left corner of the source video 502. However, the second set of video segments has a rendering reference point of (a, b), and is depicted in the bottom left corner of slice 502f. The rendering reference points may be communicated to the client device. For example, rendering reference points may be transmitted in-band as part of the video streams or video segments, or rendering reference points may be transmitted as side information sent along with the video streams or video segments. Alternately the rendering reference points may be specified in an out-of-band communication (e.g. as metadata in a manifest such as a DASH MPD). A discrete jump in the rendering reference point from (0, 0) to (a, b) as the object transitions from (P1 , T1) to (P3, T3) will cause an abrupt change in the location of the object of interest as displayed on the client device. The rendering reference point as communicated to the client may be updated on a frame-by-frame basis, which may allow the client to continuously vary the location of the extracted renderable region, and so the object of interest may be smoothly tracked on the client display. Alternately the rendering reference point may be updated more coarsely in time, in which case the client may interpolate the rendering position between updates in order to smoothly track the object of interest when displaying the renderable region on the client display. The rendering reference point may include two parameters, a vertical distance and a horizontal distance represented by (x, y). The rendering reference points may be communicated, for example, as supplemental enhancement information (SEI) messages to the client device. [0080] At each subsequent frame, the render reference point may be updated to reflect the global object motion between each frame. When the render reference adjustment is equal to the global motion of the object of interest, the object will appear motionless (e.g. having a relatively constant position relative to the displayed region), as if the camera were panning to keep the object at the same point on the screen. When the motion of the object of interest is underestimated, the object skips backwards on the screen. Conversely, when the motion of the object of interest is overestimated, the object skips forwards between frames. Minimizing the error of the object motion results in smooth rendering.

[0081] In the above scenario, it is assumed that the video display transitions from the first set of video segments (and, in some embodiments, video segments which contain slices in the spatial neighborhood of the first set of video segments) to the second set of video segments (and, in some embodiments, video segments which contain slices in the spatial neighborhood of the second set of video segments) when the object of interest is at (P2, T2). Therefore, in this embodiment, the render reference point for each frame transmitted is adjusted (e.g. interpolated) to smoothly transition from (0, 0) to (a, b) over the time from T1 to T2. The smooth transition may be linear (e.g. moving the rendering reference point a set distance equally each frame), non-linear (e.g. moving the rendering reference point small amounts closer to time T1 , large amounts between times T1-T2 and small amounts closer to time T2), or in any other similar method. In some embodiments, the rendering reference point is transmitted as two coordinates, such as (x, y), and in other embodiments, the rendering reference point is transmitted as a differential from the previous frame.

[0082] FIG. 10 depicts an example process for encoding and delivery of content to a client using adaptive bitrate coding. In step 602, source content is communicated from a content source 604 to an encoder 606. The source content is a compressed or uncompressed stream of digital video. The encoder 606 encodes the video into several representations 608 with different bitrates, different resolutions, and or other different characteristics and transmits those representations 608 to a transport packager 610. The transport packager 610 uses the representations 608 to generate segments of, e.g., a few seconds in duration. The transport packager 610 further generates a manifest (e.g. a DASH MPD) describing the available segments. The generated manifest and the segmented files (collectively 616) are distributed to one or more edge streaming servers 614 through an origin server 612. Subsequent segments (collectively 617) are also distributed to the origin server 612 and/or the edge streaming server 614.

[0083] To view the video, a client 620 visits a web server 618, e.g. by sending an HTTP GET request 622. The web server 618 may send a response 624 that directs or redirects the client 620 to a streaming server such as the edge streaming server 614. The client thus sends a request 626 to the edge streaming server. In response, the edge streaming server sends a manifest (e.g. a DASH MPD) 628 to the client. Based on the client capabilities and network conditions, the client selects an appropriate representation of the content and issues a request 630 for an appropriate segment (e.g. the first segment of recorded content, or the most recent segment of live content). The edge streaming server responds by providing the requested segment 632 to the client. As illustrated at 634, the client may request a subsequent segment of the content (which may be at the same bitrate or a different bitrate from the segment 632), and the subsequent segment is sent to the client at 636.

[0084] FIG. 11 depicts an example process of encoding and delivery of content to a client using zoom coding. In step 702, source content is communicated from a content source 704 to an encoder 706. The source content is a compressed or uncompressed stream of digital video. The encoder 706 encodes the video into several representations 708 of the complete screen area with different bitrates, different resolutions, and or other different characteristics and transmits those representations 708 to a transport packager 710. In addition, the zoom coding encoder encodes the video into several different slice streams (e.g. streams 712, 714) representing different areas of the complete video image. Each of the streams 712 may represent a first encoded slice area of the content, with each of the streams being encoded at a different bitrate, and each of the streams 714 may represent a second encoded slice area of the content, again with each of the streams being encoded at a different bitrate. Depending on the chosen partitioning of the content into slices, additional slice streams representing various encoded bit rates for other content slices may be included, though not shown in the figure.

[0085] The transport packager 710 uses the representations 708, 712, 714 to generate segments of, e.g., a few seconds in duration. The transport packager 710 further generates a manifest (e.g. a DASH MPD) describing the available segments, including the segments that represent the entire screen and segments that represent only a slice area of the screen. The generated manifest and the segmented files (collectively 716) are distributed to one or more streaming servers such as edge streaming server 720 through an origin server 718.

[0086] To view the video, a client 724 visits a web server 722, e.g. by sending an HTTP GET request 726. The web server 722 may send a response 728 that directs or redirects the client 724 to the edge streaming server 720. The client thus sends a request 730 to the edge streaming server. In response, the edge streaming server sends a manifest (e.g. a DASH MPD) 732 to the client. Based on the client capabilities and network conditions, the client selects an appropriate representation of the normal (unzoomed) content and issues a request 734 for an appropriate segment (e.g. the first segment of recorded content, or the most recent segment of live content). The edge streaming server responds by providing the requested unzoomed segment 736 to the client. The client may request, receive, parse, decode and display additional unzoomed segments in addition to the segment 736 shown in the diagram.

[0087] In response to user input indicating selection of an object or region of interest, the client device 724 may issue a request 738 for one or more sub-streams that are associated with an object or region of interest. In some embodiments, the client device identifies the streams to be requested based on, e.g. information such as render point information which may be provided in the manifest or in-band in the video streams. In other embodiments, the client device identifies the object or region of interest and forms a request based on the identified object or region of interest, and the identification of appropriate streams for that object or region of interest is made at the server side. Such server-identified streams or segments may then be returned by the server to the client in response to the request.

[0088] The appropriate slice stream or streams 740 are sent to the client device 724, and the client device decodes and combines the streams 740 to provide a zoomed version of the objector region of interest. The client may request and receive the stream or streams 740 at a bitrate appropriate to the capabilities of the client device and the current network conditions using ABR techniques.

[0089] In accordance with an embodiment, more than one object of interest can be tracked and displayed. For example, at a certain point in time, a first object may be associated with a first set of slices such that a client must retrieve the slices of the first set in order to recover and render a view (e.g. a zoomed view) of the first object. At the same time, a second object may be associated with a second set of slices such that the client must retrieve the slices of the second set in order to recover and render a view (e.g. a zoomed view) of the second object. The first set of slices may be completely different than, partially overlapping with, or fully overlapping with the second set of slices. Moreover, the amount of overlap between the first and second set of slices may change with time as the underlying objects move. The render point information for each set may be independently encoded for each such set and may be contained in different slices or the same slice. The receiver must retrieve the appropriate rendering point (corresponding to a current zoom coded object) and apply the render point offset accordingly.

[0090] As the object, or objects, of interest move through the screen, there may be changes to the sets of slices that represent the new zoomed view. The manifest may be updated to signal such changes, or a completely new manifest can be created. The client device may use the updated manifest information to appropriately request the set of slices that represent the updated view. Alternately the changes may be signaled in-band in the video stream, or in side information such as a render point metadata file retrievable by the client.

[0091] The request for streams may correspond to a particular area of the video or correspond to an object ID. For example, if the video source is video of a soccer (a.k.a. football) game, examples of different objects may include a goal box, a ball, or a player. The objects may be detected via any means, including image detection (e.g. detecting the rectangular dimensions of a goal, the round shape of a ball, or numbers on a uniform, etc.), spatial information encoded in the source video (e.g. correlation between camera position and a stationary object's position, sensor information being transmitted from a soccer ball, etc.) or any other similar method. In this embodiment, a client device can request to receive zoom coded sub-streams associated with an object of interest such as the ball. The request may also include a magnitude of context to include with the ball, such that the ball comprises a certain percentage of the display, or the like. The magnitude of context may be specified as a rendering area size specified as a horizontal dimension and a vertical dimension in pixels, for example. A network entity, or the client device, can determine the appropriate zoom coded sub-streams to frame the object of interest, and may communicate to the streaming server which zoom coded sub-streams to send to the client device. The client device receives the zoom coded sub-streams and the appropriate rendering information to display the zoomed-in video stream.

[0092] The spatial regions of the object, or objects, of interest may be determined at the streaming server, at the client device, a separate network entity, or a combination of the above examples. In one embodiment, the server-side creates the arbitrary spatial region, such as mapping the streams to slices for encoding. In another embodiment, the client-device side creates or assembles the arbitrary spatial regions by, for example, decoding more than one spatial content portion (e.g. more than one slice or video segment) from the server and combining parts of the decoded spatial content portions in order to create or assemble the desired spatial region. In yet another embodiment, a hybrid server-side/player-side creates the arbitrary spatial regions.

[0093] Other variations can be applied to the various zoom coding examples, including the following:

o Zoom-coded regions can include variations of frame rate, chroma resolution, and bit depth characteristics. For example, the ABR Streams for Each Segment as shown in FIG. 8 may be encoded using such variations.

o Various techniques for storage and transport layer packaging of zoom coded sequences may be applied. For example, the zoom coded sub-streams or segments may be packaged using MPEG-2 transport stream segments, or using an ISO Base Media file format.

o Zoom-coded sequences or segments may be created with additional bit depth for specific spatial regions. The regions of enhanced bit depth may correspond to the areas or objects of interest, for example.

o Two-way interaction may be used to optimize for client side display capabilities. o Creation of special effects may be provided, such as slow motion and zoom.

Exemplary Use of DASH.

[0094] FIG. 12 depicts an example communications process, in accordance with an embodiment. In particular, FIG. 12 depicts a DASH-Type exchange between a streaming server 802 and a client device 804 to receive a zoom coded sub-stream. The client device 804 sends a request 808 to a web server 806 for streaming services, and the web server at 810 directs or redirects the client device 804 to the edge streaming server 802. In response to a request 812 from the client device 804, the edge streaming server sends an extended MPD 814, with zoom coded information, to the client device 804. The client device parses the extended MPD in order to determine what objects/areas of interest are available and also to determine in step 816 the slices to request for each object. The client sends requests for the appropriate slices (e.g. requesting a first slice at 818 and requesting a second slice at 820). The requested slices may be a subset of the available slices, and/or may be requested by requesting the video segments which contain the slices. The edge streaming server sends each requested slice of the video stream to the client device (e.g. sending the first slice at 822 and the second slice at 824), and the client device renders the zoom coded frame for the specific object in step 826. In step 828, the client device causes display of the zoom coded frame, e.g. by displaying the frame on a built-in screen or by transmitting information representing the frame to an external display. Composition of the zoom coded frame by the client may comprise receiving, decoding, and/or rendering the requested slices for an object/area of interest. The client may render a subset of the pixels of the requested slices, as determined by a current render point and/or a rendering area size or context magnitude indication for the object. The DASH-type message may include additional extensions to support tracking of multiple objects with overlapping slices.

[0095] Zoom coding may be enabled using MPEG-DASH. MPEG-DASH (ISO/I EC 23009-1 :2014) is an ISO standard that defines an adaptive streaming protocol for media delivery over Internet Protocol (IP) Networks. An exemplary process for performing zoom coding using MPEG-DASH may be performed as follows. A determination is made that a zoom coded representation is available and how to access that content. This information is signaled to the DASH client using syntax in the MPD descriptor. Per Amendment 2 of the ISO DASH standard, the MPD may provide a "supplementary stream." This supplementary stream may be utilized for zoom coding. A spatial relationship descriptor (SRD) syntax element can describe a spatial portion of an image (see Annex H of ISO 23009-1 AM2).

[0096] An object render point provided in the video bitstream is used to render the zoomed section for the object being tracked. The zoomed section may, for example, be rendered with uniform motion or with interpolated motion as described herein. The object (or objects) render point may be sent in user data for one or more slices as an SEI message. For example, the SEI message may be as defined in a video coding standard such as AVC/H.264 or HEVC/H.265. Zero or more objects may be signaled per slice.

[0097] Exemplary Slice User Data for object render points includes the following parameters:

ObjectJD: Range 0-255. This syntax element provides a unique identifier for each object.

Object_x_position[n]: For each object ID n, the x position of the object bounding box.

Object_y_position[n]: For each object ID n, the y position of the object bounding box.

Object _x_size_in_slice[n] : For each object ID n, the x_dimension of the object bounding box.

Object_y_size_in_slice[n] : For each object ID n, the y_dimension of the object bounding box. [0098] The object bounding box represents a rectangular region that encloses the object. The object bounding box may also enclose some amount of surrounding context to be rendered with the object. The x,y position may indicate, for example, the upper left corner position of the object bounding box. The object position and size may pertain to the portion of the object contained in the slice that contains the user data.

[0099] The video depicted in FIG. 9 (described in further detail above) may be used in the implementation of zoom coding of a video with resolution of 4K, or 3840x2160. The video of FIG. 9 is illustrated in FIG. 13, with each slice (spatial portion) of the video being assigned a number ranging from 1 to 30. The 4K video is encoded with, e.g., H.264 compression into thirty independent H.264 slices. Each slice is 768x360 pixels. The native full image is scaled down to HD 1920x1080 and provided as the normal unzoomed stream for a client device to display. Additionally, each of the thirty segments is encoded in the native 768x360 resolution. The encoder tracks an object as shown in the figure moving across the scene. The subset of slices is signaled to the client via the MPD SRD descriptor. For each slice, an Adaptation Set with SRD descriptor is provided.

[0100] In order to support tracking and zooming of more than one object, the SRD descriptor syntax may be extended in some embodiments to allow the client device to determine which slices are needed for rendering the given objects. The ObjectJD (consistent with the slice SEI information) is included in the SRD, added to the end of the "value" syntax for SRD. If multiple objects are associated with the slice, then multiple ObjectJD's may be added to the end of the SRD value syntax. When OBJECTJDs are used, the SpatiaLSetJD may also be in the stream. After the Spatial_Set_ID parameter in the SRD, up to 256 ObjectJDs can be included. An example of two SupplementalProperty SRD elements are shown below.

Example 1 : SRD with 1 object (ObjectJD 5)

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x16,y16,768,360,3840,2160,1 ,57>

Example 2: SRD with 5 objects (ObjectJD 2, 4, 7, 9, and 14)

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x16,y16,768,360,3840,2160,1 ,2,4,7,9,147>

[0101] As illustrated in these examples, the SupplementalProperty syntax element may be used to provide an association between particular spatial portions of a video (e.g. particular slices) and particular objects of interest. Thus, Example 1 above provides an association between the slice numbered 16 and an object numbered 5, and Example 2 above provides an association between the slice numbered 16 and objects numbered 2, 4, 7, 9, and 14. To provide a zoomed view of a selected object (e.g. a user-selected object), a client device may request all slices that are associated with the selected object.

[0102] In accordance with an embodiment, a full SRD example is described. In the full SRD, xM,yM (e.g. x16,y16), refers to the x,y position of the slice origin. In an actual SRD, these may be pixel values. For example, x16,y16 would be equal to 0, 1080.

[0103] As each frame is encoded (e.g. with H.264, HEVC, or the like) a user data entry may be inserted for each slice and may be used to provide the object position information. The client uses this information provide a smoothly rendered picture with the object being tracked on the client display. As the object moved across and/or into different slices, the MPD may be updated with a new list of slices for the client to access. This MPD change may include a sequence access point (SAP) in the DASH segments.

[0104] Following, is an example of an MPD with zoomed content slices 16, 17, 18, 21 , 22, 23, 26, 27, 28 associated with a single tracked object:

<?xml version="1.0" encoding="UTF-8"?>

<MPD

xmlns="urn:mpeg:dash:schema:mpd:2011 "

type="static"

mediaPresentationDuration="PT10S"

minBufferTime="PT1 S"

profiles="urn:mpeg:dash:profile:isoff-on-demand:2011 "> <Programlnformation>

<Title>Example of a DASH Media Presentation Description using Spatial Relationship Description to indicate that a video is a zoomed part of another< Title>

</Programlnformation>

<!-- Panorama Video ->

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014" value="0,0,0,1920,1080,1920,1080"/> Representation mimeType="video/mp4" codecs="avd .42c033" width="1920" height="1080" bandwidth="1055223" startWithSAP="1 ">

<BaseURL> panorama_video.mp4</BaseURL>

</Representation>

</AdaptationSet>

<!-- Zoomed Video ->

<BaseURL> zoomed_video_slice16.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x17,y17,768,360,3840,2160,1 ,1 "/>

<BaseURL> zoomed_video_slice17.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x18,y18,768,360,3840,2160,1 ,1 "/>

<BaseURL> zoomed_video_slice18.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x21 ,y21 ,768,360,3840,2160,1 ,1 "/>

<BaseURL> zoomed_video_slice21.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x22,y22,768,360,3840,2160,1 ,1 "/>

<BaseURL> zoomed_video_slice22.mp4</BaseURL> <SegmentBase indexRangeExact- 'true" indexRange="838-989"/>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x23,y23,768,360,3840,2160,1 ,17>

<BaseURL> zoomed_video_slice23.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x26,y26,768,360,3840,2160,1 ,17>

<BaseURL> zoomed_video_slice26.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x27,y27,768,360,3840,2160,1 ,17>

<BaseURL> zoomed_video_slice27.mp4</BaseURL>

</Representation>

</AdaptationSet>

<SupplementalProperty schemeldUri="urn:mpeg:dash:srd:2014"

value="0,x28,y28,768,360,3840,2160,1 ,1 "/>

<BaseURL> zoomed_video_slice28.mp4</BaseURL>

</Representation>

</AdaptationSet> Exemplary Uses of Partial Zoom Ratios.

[0105] As described in the present disclosure, zoom coding systems and methods enable the exploitation of high resolution, high frame or high bit depth capture of content to be repurposed to end consumers as a novel set of features. Exemplary embodiments described above provide a mapping from the highest available spatial resolution to the native display resolution of the client device. Such a mapping provides a fixed zoom ratio that depends on the relationship between the highest available resolution and the native display resolution at the client device.

[0106] In some cases, it may be desirable for a user to view a zoom coded stream for a particular object with a zoom ratio that is different from a ratio that corresponds to a direct mapping from the highest resolution (or frame rate or bit depth) to native display of the receiver. To enable zooming for other ratios, some embodiments implement a process (which may be performed at the client side or at the server side) in which a zoom region is selected and then the image is cropped or scaled to fit the display type. The zoom region may define a zoom ratio. Such embodiments are described in greater detail below. Embodiments as described below allow supplemental video to be provided with arbitrary zoom ratios that are not strictly limited to the fixed ratio between the original captured video and the display resolution.

[0107] Embodiments described herein may be used in situations where the zoom ratio desired for a specific video object is not a direct mapping from the highest resolution (or frame rate or bit depth) to native display resolution of the receiver. For example, if the camera resolution is 6K (6144x3072) and the receiver display is HD (1920*1080), then the highest native pixel zoom ratio for the receiver is 3.2 horizontal ratio and 2.84 vertical ratio, achieved by mapping each pixel of the 6K video within the region of interest to a corresponding pixel of HD video. Therefore, when the receiver requests a zoom to an object, the native 6k pixels would provide roughly 3x zoom. In many cases, however, it may be desirable to provide a zoomed video of an object of interest at a zoom ratio that is less than or greater than 3x.

[0108] Exemplary embodiments provide mechanisms to provide optimized access to any zoom ratio.

[0109] Some embodiments operate using client-side scaling. In one such embodiment, the client device sends to a server (e.g. a headend video server capable of providing zoom coded streams) a request for zoomed content that identifies (among other information) a particular desired zoom ratio. Based on the client request, the server identifies a video stream having the lowest available video resolution that is greater than the resolution that would provide the requested zoom ratio. The server conveys the identified video stream to the client. The client device receives the identified video stream and crops and/or scales the region of interest for display.

[0110] As described above in the present disclosure, various different client-side and/or server-side techniques may be used to identify and or track regions of interest, and various different techniques may be used where appropriate to convey information regarding these regions of interest to client devices. Furthermore, as described in greater detail above, video may be encoded into independently-decodable slices such that transmission of a particular region of interest may include transmission only of slices that fall within (or which intersect) the region of interest.

[0111] In a more specific example, consider a case where the display device is an HD display device with a resolution of 1920x1080 pixels and the client requests a zoomed view of a particular region of interest with a requested zoom ratio of 2.5, The client may indicate the intended display size in the request, which may be a native (e.g. full) display resolution or may be the size of a smaller area of the display in which the client intends to display the zoomed view of the content. Suppose for the current example that the intended display size is the same as the client's native display resolution, 1920x1080. The server may have access, for example, to an HD version of the video (1920x1080), a 4K version of the video (4096x2180), and a 6K version of the video (6144x3072 pixels). Using a comparison to the intended display resolution, the HD version of the video gives a native zoom ratio of one (i.e., unzoomed). The 4K version of the video gives a native zoom ratio of approximately two (4096÷1920-2, 13), which is less than the requested zoom ratio of 2.5. The 6K version of the video gives a zoom ratio of approximately 3 (8144÷1920=3.2), which is greater than the requested zoom ratio of 2.5. In this example, the 6K video is the video with the lowest resolution that still provides a native zoom ratio greater than or equal to the requested zoom ratio and is thus selected as a source from which a zoomed view of the region of interest may be derived.

[0112] Continuing with the foregoing example, a region size is selected to provide the requested zoom ratio of 2.5. Given the selection of the 6K version of the video (8144x3072) as a source, the size of the selected region to provide to the client in order to achieve the requested zoom ratio may be calculated by dividing the dimensions of the source image by the requested zoom ratio. For example, a zoom ratio of 2.5 may be attained by providing a selected region with (horizontal vertical) pixel dimensions of ((6144÷2.5) * (3072÷2.5)), which gives results of (2457.6x1228.8) and may be rounded e.g. to (2458x1229). Rounding may be done to the nearest integer pixel value, to the nearest even integer pixel value, to the nearest pixel value which is a multiple of 16, and/or the like. In an exemplary embodiment, a 2458» 1229 section of video containing (e.g. centered on) the region of interest is sent to the client device. If the server is able to encode and package content on the fly, then the 2458x1229 section may be cropped from the available 6k content version and re-encoded for delivery to the client. Otherwise the server may send a pre~encoded stream or perhaps multiple files which contain content including the 2458* 1229 section, and the server may indicate in the response what spatial subset of the content is occupied by the 2458x1229 section. For example, the server may send horizontal and vertical offsets which indicate an upper left comer of the 2458 1229 section within the larger content returned by the server, and additionally the server may send an indication of the size of the section, in this case 2458x1229. In general, the server will select the minimum available content stream or a minimum set of avaiiabie tiles which contains the section which is to be sent to the client in order to attain the requested zoom ratio. The client device receives this section of video (decoding it and/or cropping it from the larger content, if necessary) and scales the selected section to 1920x1080 for display on the HD display.

[0113] An exemplary process for client-side scaling is illustrated schematically in FIG. 14. in the example of FIG. 14, a region of interest with a 2,5x zoom factor is provided to a client. The region of interest 1402 is defined within a full-scene video 1404. The resolution of the full-scene video 1404 in this example is 6144x3072, and the region of interest 1402 has pixel dimensions 2458x1228. The full 2458x1228 region of interest is delivered to the client and the client scales (e.g. down-samples) the region of interest to a resolution of 1920x1080 for full-screen HD display 1408 of the region of interest.

[0114] Table 1 provides example values for the size (horizontal χ vertical) of the video section which may be sent to a client in order to achieve a given zoom ratio on an HD display with pixel dimensions of 1920x1080. The size of the video section depends on the available source resolutions (e.g. 4K, 6K, and 8K) and the desired zoom factors which may be requested by the client (e.g. 1.5x, 2x, 2.5x, 3x, 3.5x, 4x).

Table t

[0115] For example, to obtain a zoom ratio of 2.5x, the client may receive a 3072x1728 section of an 8K video or a 2458x1229 section of a 6K video and down-sample to 1920x1080. it may be preferable in this instance to send the section of 6K video to the client, because sending the larger section of 8K video is likely to consume a greater amount of bandwidth while providing little if any improvement in image quality due to the down-sampling.

[0116] in some embodiments, scaling is performed at the server side using techniques such as those described as follows, in an exemplary embodiment a server operates to pre-scale a high-resolution source video to a plurality of pre-sea!ed videos at different pre-determined resolution scales based on display resolutions that are common among client devices and/or a set of zoom factors which clients are expected to request. A client device sends to the server a request for zoomed content identifying a requested zoom ratio and possibly identifying the client's display resolution or the size at which the client intends to display the zoomed view of the content, in response to the reques the sea'er identifies the pre-scaled video that has the lowest resolution available that is greater than the resolution corresponding to the requested zoom ratio. The server delivers the video of the region of interest area from the identified pre-scaled video to the client device. As before, the server may send an available video stream, or possibly a set of tiles, in order to deliver to the client a section of the video which when displayed by the client at the intended resolution may achieve the requested scale factor. If needed, the server may indicate to the client what section of larger content the client should display (e.g. an offset and/or size which identifies the section within a larger spatial extent of the delivered content).

[0117] if the video delivered to the client device does not provide the requested scale factor, the client device may operate to crop and/or scale the received video of the region of interest area to obtain the desired zoom ratio. As described elsewhere in the present disclosure, the pre-scaled video may be coded into separately decodabie slices, with only the slices that overlap the region of interest being sent to the client device.

[0118] in a more specific example, consider a case where the maximum native resolution of a video available at the server is 8K (7680 χ 4320), and where the display resolution or intended zoom display window of a client is typically 1920 1080. The server may pre-scale the 8K content according to Table 2 to attain various representations of the content suitable for display by the client at various zoom factors.

Table 2,

[0119] For example, the server may obtain a 3.5x representation by scaling the 8K original version to

6720 x 3780, in which case a 1920 χ 1080 section cropped from this scaled representation may be rendered at the client to achieve an effective zoom ratio of 3.5x. As another example, the server may attain a 2.5x representation by scaling the 8K original version to 4800 χ 2700, in which case a 1920 χ 1080 section cropped from this scaled representation may be rendered at the client to achieve an effective zoom ratio of 2.5x. Note that the set of pre-scaled representations chosen by the server may make it possible for the server to deliver content at multiple common zoom ratios which may be suitable for the client to display at the client's desired display resolution of 1920 χ 1080 without requiring further scaling by the client. Additional display resolutions and/or zoom factors may be accommodated from the pre-scaled representations at the server, but in this case the client may operate to crop and/or scale before display.

[0120] FIG. 15 is a schematic illustration of a process using server-side scaling. The server 1502 may have (e.g. may have created) multiple pre-scaled content representations which correspond to different zoom ratios for an assumed display resolution of the client. The client 1504 may send a request 1506 for a zoomed view of the content (e.g. for a selected ROI) at a particular zoom ratio. The client may specify the zoom ratio in the request and possibly may specify a display resolution (e.g. 1920x1080) for the zoomed content view, if such resolution is not already known to the server. The server may select the appropriate scaled representation of the content. For example, the server may select the representation created for the smallest zoom ratio which is larger than or equal to the requested zoom ratio. The server may determine the size of the video section to return to the client. The server may return a video stream 1508 or possibly multiple tiles which contain the video section. The client may decode the received video, and may display the section of zoomed content selected by the server. If the section of zoomed content matches the display resolution (or intended display resolution) of the client, then the client may display the section of zoomed content without further scaling. Otherwise the client may scale the section of zoomed content to match the clients display resolution (or intended display resolution) before displaying the section of zoomed content.

[0121] An example embodiment with server-side video scaling is illustrated in FIG. 16. As illustrated in FIG. 16, different pre-scaled versions of video content are provided with different resolutions, e.g. 2K (HD), 4K, and 6K. These different pre-scaled versions are available to a streaming server 1602. A client device 1604 is initially receiving and displaying the HD version 1606 of the full-screen video. The client device 1604 is further receiving metadata 1608 identifying one or more objects of interest. The viewer identifies an object of interest, e.g. "Player #9," through a user interface of the client device. The client device sends a request 1610 to the streaming server requesting a stream corresponding to Player #9. The request may include additional parameters, such as information identifying the resolution at which the stream will be displayed (e.g. 720^480 for a picture-in-picture insert) and information identifying which pre-scaled video should be used as the source. In some embodiments, the request 1610 may identify a zoom ratio, and the choice of pre-scaled video to use as a source may be made based on the zoom ratio and the resolution at which the stream will be displayed. (Information identifying which pre-scaled videos are available may be provided in, for example, a manifest file sent to the client device by the server.) [0122] In the example of FIG. 18, the client device requests a stream that corresponds to Player #9, that is sourced from a 6K pre-scaled video, and that will be displayed with a resolution of 720x480. The corresponding zoomed video 1612 is provided to the user as described with respect to FIG. 15, e.g. by selecting slices of the 8K video that overlap the region of interest. The zoomed video is displayed on client 1604 along with the HD version 1606 of the full-scene video.

[0123] Traditional sports programing provides a viewer with a single view of a game as chosen by a content creator. Exemplary embodiments provide greater customization and user control in video delivery. In an exemplary use case, a viewer of a sports broadcast may issue to a client device a command (e.g. through a user interface such as a remote control or a voice interface) to provide an enhanced view of "Player #9" with a zoom factor of 5x. in response, using techniques described herein, the client device receives a stream suitable for displaying Player #9 zoomed at 5x and displays the stream with the appropriate zoom factor, e.g. in a picture-in-picture arrangement with respect to the default full-screen view.

[0124] FIG. 17 is a message flow diagram illustrating exchange of data in an exemplary embodiment. In the embodiment of FIG. 17, a high-resolution camera or other video source 1702 captures a high-resolution video and sends the high-resolution video 1704 to a server 1706. (Server 1706 is illustrated in FIG. 17 as a single server, but in some embodiments, the functions described herein may be performed by more than one server.) The server 1706 scales the high-resolution video in step 1707 to generate one or more full-scene representations that are viewable to client devices (e.g. to HD-capable clients) and delivers at least one of the scaled full-scene representations 1714 to a client 1716.

[0125] The video and/or additional sensor information 1708 (such as RFID tracking information) is used by an object tracker module 1710 to determine a location of one or more objects of interest within the video. Object location information 1718 is sent by the object tracker 1710 to the server 1706, in step 1720, the server constructs an object description based at least in part on the object location information, and the object description 1722 is sent to the client, e.g. in an MPD. in step 1724, the client receives a user selection of an object (or region) of interest and a zoom factor. The client sends selection information 1726 to the server, including information identifying the object or region of interest and the selected zoom factor. In some embodiments, the selection information further includes information indicating the display resolution of the client device or of a portion of the display (e.g. dimensions of a picture-in-picture inset region) to be used for displaying the region of interest, in other embodiments, the server may already have such information on the display resolution based on a resolution at which the client has been receiving the full-scene view.

[0126] Based on the selection information 1726 and the display resolution, the server in step 1728 crops and/or scales a portion of the high-resolution video using one or more of the techniques described herein and sends the portion 1730 to the client, in some embodiments, the client may further crop and/or scale the portion to an appropriate size for display. [0127] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Claims

CLAIMS What is claimed:

1. A method of providing zoomed video to a client, the method comprising:

storing a plurality of representations of a full-scene video, wherein a corresponding representation is stored for each of a plurality of available resolutions;

streaming a first representation of the full-scene video to a client;

receiving from the client a request for zoomed video of a selected region of interest within the first video, wherein the request includes information identifying a zoom ratio;

based at least in part on the identified zoom ratio, selecting one of the available resolutions; and streaming to the client at least a selected portion of a representation having the selected resolution.

2. The method of claim 1 , wherein the selected portion of the representation is a portion including the selected region of interest.

3. The method of claim 1 , further comprising sending to the client metadata including at least one identifier of at least one region of interest, wherein the request for zoomed video includes the identifier of the selected region of interest.

4. The method of claim 3, wherein the identifier of at least one region of interest is sent to the client in a manifest file.

5. The method of claim 1 , wherein selecting one of the available resolutions comprises selecting the lowest available resolution that is greater than or equal to a resolution capable of providing the requested zoom ratio as a native zoom ratio.

6. The method of claim 1 , wherein each available resolution is associated with a native zoom ratio, and wherein selecting one of the available resolutions comprises selecting the lowest available resolution with a native zoom ratio at least as great as the requested zoom ratio.

7. The method of claim 6, wherein the native zoom ratio associated with an available resolution is the available resolution divided by a display resolution of the client.

8. The method of claim 6, wherein the selected resolution has a native zoom ratio greater than the requested zoom ratio, the method further comprising down-scaling the selected portion to achieve the requested zoom ratio.

9. The method of claim 1 , wherein the selected resolution has a native zoom ratio greater than the requested zoom ratio, the method further comprising:

receiving the selected portion at the client; and

down-scaling the selected portion at the client to achieve the requested zoom ratio.

10. The method of claim 1 , wherein the request for zoomed video includes information identifying a display resolution of the client, wherein the selection of one of the available resolutions is based at least in part on the identified display resolution.

11. The method of claim 1 , wherein the request for zoomed video includes information identifying a pixel size of a display region of the client, wherein a size of the selected portion of a representation having the selected resolution is based at least in part on the pixel size of the display region.

12. The method of claim 1 , wherein the first representation has a resolution equal to a display resolution of the client.

13. The method of claim 1 , further comprising, at the client:

receiving and displaying the first representation;

receiving the selected portion; and

displaying the selected portion as a picture-in-picture inset with respect to the first representation.

14. A system comprising a processor and a non-transitory computer-readable storage medium storing instructions operative, when executed on the processor, to perform functions comprising:

streaming a first representation of the full-scene video to a client;

15. The system of claim 14, wherein each available resolution is associated with a native zoom ratio, and wherein selecting one of the available resolutions comprises selecting the lowest available resolution with a native zoom ratio at least as great as the requested zoom ratio.