WO2017180439A1

WO2017180439A1 - System and method for fast stream switching with crop and upscale in client player

Info

Publication number: WO2017180439A1
Application number: PCT/US2017/026388
Authority: WO
Inventors: Kumar Ramaswamy; Jeffrey Allen Cooper; John Richardson; Ralph Neff
Original assignee: Vid Scale, Inc.
Priority date: 2016-04-15
Filing date: 2017-04-06
Publication date: 2017-10-19

Abstract

Described herein are methods and systems for enabling a client to effect a rapid change from displaying an unzoomed video to displaying a zoomed version of the video. In an exemplary embodiment, a video client receives (i) a primary video stream, (ii) metadata identifying a secondary video stream corresponding to a zoomed version of a region of the primary video stream, and (iii) data defining that region. In response to a user selection of the zoomed video stream, the video client requests the identified secondary video stream. Until the secondary video stream has been retrieved, the video client displays a locally-generated zoomed version of the video that is created by cropping areas outside the defined region and upscaling the defined region. Once enough of the secondary video stream has been retrieved, the video client displays the secondary video stream, which may be a higher-quality zoomed video.

Description

SYSTEM AND METHOD FOR FAST STREAM SWITCHING WITH

CROP AND UPSCALE IN CLIENT PLAYER

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application is a non-provisional filing of, and claims benefit under 35 U.S.C. §119(e) from, U.S. Provisional Patent Application Serial No. 62/323,105, entitled "System and Method for Fast Stream Switching with Crop and Upscale in Client Player," filed April 15, 2016, the entirety of which is incorporated herein by reference.

BACKGROUND

[0002] Digital video signals are commonly characterized by parameters including i) resolution (e.g. luma and chroma resolution or horizontal and vertical pixel dimensions), ii) frame rate, and iii) dynamic range or bit depth (e.g. bits per pixel). The resolution of digital video signals has increased from Standard Definition (SD) through 8K-Ultra High Definition (UHD). The other digital video signal parameters have also improved, with frame rate increasing from 30 frames per second (fps) up to 240 fps and bit depth increasing from 8 bit to 12 bit. To transmit a digital video signal over a network, MPEG/ITU standardized video compression has undergone several generations of successive improvements in compression efficiency, including MPEG2, MPEG4 part 2, MPEG-4 part 10/H.264, and HEVC/H.265. The technology to display the digital video signals on a consumer device, such as a television or mobile phone, has also increased correspondingly.

[0003] Consumers requesting higher quality digital video on network-connected devices face more bandwidth constraints from the video content delivery network. In an effort to mitigate the effects of bandwidth constraints, several solutions have emerged. Video content is initially captured at a higher resolution, frame rate, and dynamic range than will be used for distribution. For example, a 4:2:2, 10 bit HD video content is often down-resolved to a 4:2:0, 8 bit format for distribution. The digital video is encoded and stored at multiple resolutions at a server, and these versions at varying resolutions are made available for retrieval, decoding and rendering by clients with possibly varying capabilities. Adaptive bit rate (ABR) further addresses network congestion. In ABR, a digital video is encoded at multiple bit rates (e.g. : choosing the same or multiple lower resolutions, lower frame rates, etc.) and these alternate versions at different bit rates are made available at a server. The client device may request a different bit rate version of the video content for consumption at periodic intervals based on the client's calculated available network bandwidth or local computing resources.

SUMMARY

[0004] Described herein are methods and systems for implementing a function that enables a client to effect a rapid change from displaying a primary (e.g. unzoomed) video to displaying a secondary (e.g. zoomed) version of the video. Using already available information regarding zoomable regions in a primary (full content) view of a video, the client may crop and upscale the unzoomed version of the video to correspond to those zoomable regions based on cues from the server regarding the location of the server-supported zoomable regions. Since the streaming client may perform this crop-and-upscale process at any time (e.g. continuously or at the user-prompted switching time), a switch to any zoomed region may appear substantially instantaneous to a viewer. Then a higher resolution zoomed stream may be requested in the background (e.g. while the cropped and upscaled version of the primary view is being displayed) and will be sent down by the server for succeeding segments. The client may transition from the lower quality cropped and upscaled version of the zoomed region to the higher quality zoomed stream when enough of the zoomed stream is available in the client's buffer.

[0005] In some embodiments, a method includes receiving (i) a primary video stream, (ii) metadata identifying a secondary video stream corresponding to a region of the primary video stream, and (iii) region data defining the region within the primary video stream that corresponds to the secondary video stream, generating a cropped-upscaled portion of the primary video stream by cropping areas outside the defined region and upscaling the defined region, and in response to a user selection of the secondary video stream, causing display of the cropped-upscaled portion of the primary video stream, requesting the secondary video stream using the received metadata, and in response to receiving the secondary video stream, causing display of the secondary video stream in place of the cropped-upscaled portion of the primary video stream.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] A more detailed understanding may be had from the following description, presented by way of example in conjunction with the accompanying drawings, wherein:

[0007] FIG. 1 A depicts an example communications system in which one or more disclosed embodiments may be implemented. [0008] FIG. IB depicts an example client device that may be used within the communications system of FIG. 1A.

[0009] FIG 1C illustrates an exemplary network entity that may be employed as a server in accordance with some embodiments.

[0010] FIG. 2 is a workflow diagram for a zoom coding system, in accordance with some embodiments.

[0011] FIG. 3 illustrates an example of extracting multiple zoom streams from a baseline- transmitted video.

[0012] FIG 4 is a message flow diagram illustrating how a client may uses crop-and-upscale technique to transition from a primary view to a zoomed view, in accordance with some embodiments.

[0013] FIG. 5 depicts an example of a user interface, in accordance with some embodiments.

[0014] FIG. 6 is a diagram illustrating a display sequence, in accordance with some embodiments.

[0015] FIGs. 7 A and 7B illustrate example display configurations, in accordance with some embodiments.

[0016] FIG. 8 is a flowchart of a method, in accordance with some embodiments.

DETAILED DESCRIPTION

[0017] A detailed description of illustrative embodiments will now be provided with reference to the various Figures. Although this description provides detailed examples of possible implementations, it should be noted that the provided details are intended to be by way of example and in no way limit the scope of the application. The systems and methods relating to video compression may be used with the wired and wireless communication systems described with respect to FIGS. 1A-1C. Descriptions for FIGs. 1A-1C will be provided at the end of this document.

[0018] In traditional client devices for receiving digital video streams, the client device includes either a dedicated hardware decoder or a software decoder running on a CPU (e.g. a MPEG2, H.264 or other video decoder, and an MP3, AAC, or other audio decoder). Example client devices are set top boxes, PCs, laptops, tablets, or smartphones. The client decodes an incoming compressed audio/video stream and then presents it to the display, where the display is either attached via a data connection such as a video cable or integrated as part of the client device such as a smartphone or tablet. [0019] There is a latency in compressed video/audio systems that causes a delay from the time a user chooses a signal (which may be a particular channel) until the time the picture and sound are present on the display device and speakers. This "stream switch time", or "channel change time" latency, depends on the architecture of the broadcast system.

[0020] Traditional broadcast systems such as cable, satellite, and terrestrial broadcast have relatively low latencies since the content is pushed from the headend to the receivers all the time, and the client receiver only needs to switch to a different incoming signal. There is a small delay due to client decoder/di splay processing and buffering before the decoder, but typical systems can switch streams (e.g. when the user changes channels) in 1-2 seconds.

[0021] In over the top (OTT) systems that use protocols such as HTTP live streaming (HLS), Dynamic Adaptive Streaming over HTTP (DASH), or Silverlight on the internet, the stream switching time is much longer due to the protocols. The client requests a different channel or stream from the headend, and the headend responds. In addition, since the transmission medium is typically the internet, the transmission is bursty. The OTT protocols provide resilience to the bursty /lossy characteristics of the internet. This results in a stream-change latency of 5-10 seconds or longer.

[0022] Systems and methods disclosed herein offer a differentiated way to present video to end consumers. Exemplary embodiments enable consumers to experience personalized, differentiated, and high quality views of content on-the-fly by leveraging high capture resolutions, frame rates, and bit depths.

[0023] FIG. 2 depicts an example workflow for an exemplary system. Traditionally, an input full- resolution stream (4K or 6K resolution, for example) may be processed and delivered at a lower resolution (for example, 1080p HD) to an end consumer. The down-sampled resolution may be limited by the display device of the user - for example, content may be captured at 4K or 6K, however the user may have only an HD or SD display device. In FIG. 2, this processing of the full content view is represented by Adaptive Bitrate Encoder 205. In this case, an adaptive bit rate encoder 205 may produce ABR streams and may publish such streams to a streaming server 215 that may in turn deliver customized streams to end customers 225a/b/c via network 220.

[0024] An exemplary multi-view encoder 210, shown in the bottom part of the workflow in FIG. 2, may ingest the full resolution input video stream and with a variety of techniques produce, for example, cropped portions of the original sequence in the original native resolution (e.g. native pixels from the full resolution capture stream, which may be for example a 4K or a 6K stream), or at least at a higher resolution than the down-sampled resolution at which the full (un-cropped) view of the content would be delivered to and displayed for the end users 225a/b/c. [0025] The cropped portions may correspond to objects of interest in the content. For example, cropped portions may correspond to players in a sporting event such as soccer, football or basketball, or to cars in a movie chase scene. The cropped portions may then be encoded using traditional ABR techniques. Cropped portions may be separated from the primary content and individually encoded to produce a set of ABR streams for each cropped portion. Alternately, the primary content may be partitioned into slices or tiles (e.g. on a regular grid pattern, such as quadrants or using a finer grid) and the primary content may then be encoded as a whole to produce a set of ABR streams wherein the slices or tiles are independently decodable within each of the ABR streams. In the latter case, each cropped portion - that is, the content corresponding to an object of interest - may be delivered by processing an ABR bit stream to extract the independently decodable slices or tiles which include the cropped portion, and then delivering these slices or tiles to a client, along with a bounding box or other specification of which portion of the decoded slices or tiles corresponds to the cropped portion.

[0026] In the following description, ABR streams conveying the cropped portions (e.g. the high or full resolution encodings of objects of interest) are referred to as secondary video streams, or alternatively as zoomed streams.

[0027] A user may be presented with the choice of watching either the normal program (delivered using ABR profiles, which may be at a down-sampled resolution such as HD) or any of the secondary video streams that may represent zoomed portions (e.g. zoomed objects of interest) of the original program. For example, the user may begin by watching the normal program at lower- than-capture resolution, and may switch to a selected secondary video stream to display a zoomed object of interest at a higher effective resolution (e.g. at the full capture resolution). Once the user makes a choice to view a secondary video stream, the streaming client on the user's device may send a request to the streaming server for the selected secondary video stream at the appropriate profile. The profile may encompass one or more encoding properties of the stream, such as bit rate, frame rate, bit depth, and/or resolution. The streaming server will then deliver the appropriate secondary video stream to the end client.

[0028] In the stream-zoomed streaming context, there is an alternative to streaming multiple separate streams corresponding to the zoomed streams. Instead, on a per-frame basis, region data may be communicated to a client player including coordinate information of the various zoom regions. This information may be quite compact. For example, the region data may convey the corners (xl,y 1) and (x2,y2) of a simple bounding box which indicates the size and position of each zoom region within a full view of the content. Alternatively, if an aspect ratio is known, then the region data may include (xl,y 1) and a size parameter for each zoom region. In some embodiments, the ROI may be represented as a bounding circle defined by a center point (xl,yl) and a diameter of the bounding circle. In the above embodiments, the coordinates may be pixel coordinates on the display, and the size parameter/diameters may be defined by a number of pixels. This region data could be carried in-band in the video bit stream itself, or as a separate data stream which identifies the objects and provides the bounding box information per frame for each object, or the data could be relayed to the client in a manifest file such as a DASH media presentation description (MPD). For the in-band case, the data may be communicated in the private data section in the frame header of a video bit stream. In some embodiments, the coordinate information may be provided in a Pan- Scan supplemental enhancement information (SEI) message of H.264/H.265. With this information, a client player has access to the zoomable regions of interest in the primary (e.g. full content) view with no appreciable network bandwidth increase (only a very small incremental increase in bandwidth to communicate the zoom coordinates for each zoomable region in each frame). When the switch to a stream-zoomed stream for a zoomed region is requested, the player may start to crop the decoded primary (full content) view to correspond to the region of the specific requested secondary video stream, and the player may upscale and display the cropped portion of the primary view. The cropped and upscaled representation may be displayed for a period of time when transitioning from the primary (full content) view to the zoomed region view. Furthermore, in response to the request to switch to a stream-zoomed stream, the client may send a request to the server for the higher quality secondary video stream for the selected zoomed region. The client may continue to display the cropped and upscaled representation of the zoomed region until enough of the stream-zoomed stream is received and decoded. At this point, the client may stop displaying the cropped and upscaled representation of the zoomed region, and may start displaying the higher quality representation decoded from the stream-zoomed stream.

[0029] In some embodiments, the client may continuously crop and upscale the primary (full content) view to correspond to one or more of the zoomed regions in anticipation of the user selecting one of the zoomed regions for display. In such embodiments, the client may switch the display to the already available cropped and upscaled representation of the zoomed region at the time the user selects the zoomed region for display, which may cause the transition to appear smoother. In alternative embodiments, the client may begin cropping and upscaling the primary view to correspond to the selected zoomed region for display in response to user input selecting the corresponding zoomed region.

[0030] FIG. 3 illustrates an example of extracting multiple zoom streams from the baseline- transmitted video. The incoming video received via network 307 in this example includes zoom crop information for all zoom regions of interest on a per frame basis. FIG. 3 illustrates a set of m zoom regions 305 for a given frame at time t, each zoom region Xtm defined by two (x,y) coordinates and (k,l) representing the rectangular zoom region. The main decoder 312 of system 310 feeds frames of uncompressed video to each of the crop and scale blocks 314a ... Aw and 316a/.../m, respectively, that recreate the zoom region (in lower resolution since it is cropped and scaled) and stores them in a video buffer 318. Selector 320 receives a zoom select signal, and the corresponding zoom frame or the primary stream frame is displayed. As referred to earlier, if a zoomed stream is selected, then the player sends a request for a higher quality secondary video stream back to the server in the background.

[0031] FIG. 4 is a message flow diagram illustrating how the client may use a crop-and-upscale technique in some embodiments to transition from the primary (full content) view to a secondary view representing a stream -zoomed view representing a high quality zoomed view of an object or region of interest. For example, the primary view may represent a wide-angle camera shot of a football game, and secondary views may be available for football players of interest, for the ball, or a region such as a goal.

[0032] In FIG. 4 a media client 401 (e.g. DASH streaming client) requests 404 and receives

406 a manifest file (e.g. a DASH MPD file). The manifest may come from a streaming server 402

(e.g. DASH server) or from another manifest source. The manifest may enumerate available ABR streams, including ABR streams available for a primary (full content) view and for one or multiple available secondary views. (The ABR streams for the secondary views may be zoomed streams.)

The client may parse 408 the manifest in order to learn the availability of the various ABR streams, their encoding properties, and their zoom relationships. The client may request 410 media content for the primary (full content) view. This may involve the client choosing one of the ABR streams for the primary view (e.g. based on currently available network bandwidth, or based on other network assumptions). The client may request individual media segments on an ongoing basis, in order to retrieve the primary view content from the server. In response, the server may send 412 primary view media segments to the client. The client may buffer the received segments.

[0033] The client may receive 414 metadata identifying secondary streams that may be available. The metadata may identify one or multiple zoomable objects. Further, the client may receive region data that may specify what portion (e.g. a bounding box or a set of pixels identifying zoomable object regions-of-interest (ROIs) in each video frame) of the primary view corresponds to each of the zoomable objects. The metadata and/or region data may be delivered in-band within the media segments of the primary view content. In some embodiments, the metadata and/or region data may be delivered out of band via a separate stream. In such embodiments, the metadata and/or region data may be delivered to the client in response to a request for the metadata sent from the client to a server, although such a request is not illustrated in FIG.4. The metadata and/or region data may have been delivered as part of the manifest file.

[0034] The client may buffer the primary view segments, and begin decoding these segments in order to display the primary view to the user. The client may parse the metadata to determine what zoomed views are available, and may present 416 a user interface that allows the user to select at least one of the zoomed views. For example, the client may highlight the portions of the primary view, which correspond to each zoomable object (as specified in the metadata), and may allow the user to select one of the highlighted portions in order to switch to the display of the corresponding secondary (e.g. zoomed) view. The user may select 418 a zoomed view in the presented user interface.

[0035] In response to user selection of a zoomed view, the client may begin display 420 of a cropped and upscaled portion of the primary view content corresponding to the zoomed view. The client may use the region data of the zoomed object to determine what portion of the primary view to crop and upscale. While displaying the cropped and upscaled portion, the client may request 422 the secondary video stream (which may take the form of a stream-zoomed stream) for the selected zoomed view using the metadata. As before, this may take the form of periodic, ongoing requests for media segments of the secondary video stream.

[0036] The server may return 424 media segments of the stream-zoomed stream to the client, and the client may buffer the segments. When a sufficient number of media segments of the secondary video stream are buffered, the client may stop displaying the cropped and upscaled portion of the primary view, and may transition to decoding and displaying 426 the higher quality secondary video stream. The transitions between displaying the primary video stream, displaying the cropped and upscaled portion, and displaying the secondary video stream may all be performed as seamless transitions. In some embodiments, a seamless transition includes the start point of one displayed stream being synchronized with the end point of the previously-displayed stream. For example, the last-displayed frame of the primary video stream and the first-displayed frame of the cropped and upscaled portion may correspond to consecutive frames of the same source content. Similarly, the last-displayed frame of the cropped and upscaled portion and the first-displayed frame of the secondary stream may correspond to consecutive frames of the source content. In this way, a user is likely to perceive little or no jump in time during transition of among the different streams. In some embodiments, substantially seamless transitions may include a disparity of up to a few frames (e.g. up to around twelve frames, or around one half second of video). Seamless transitions may be effected using one or more of various available techniques. For example, a transition from display of the cropped and upscaled portion of the primary stream to display of the secondary stream may be performed at the start of a segment of the secondary stream.

[0037] In some embodiments, the client may stop requesting and retrieving the primary view content when the client transitions to decode and display the stream-zoomed stream content. In some embodiments, the client may request and retrieve segments for the primary view only to the extent that such segments are needed to provide primary view content from which the transitional cropped and upscaled representation of the zoomed view is taken.

[0038] In some embodiments, the client may continue requesting and retrieving the primary view content in parallel to the secondary video stream content for the zoomed view. This may be done so that the client may continue to display the primary view content in addition to the high quality zoomed view provided by the secondary video stream. (For example, the client may display the primary content in a background full screen configuration, and may display the zoomed view in a 'Picture-in-Picture' window in the foreground). The client may switch to a lower quality ABR representation for the primary view in this case, in order to retrieve both the primary view content and the selected zoomed content in parallel (e.g. if network bandwidth is too low to retrieve both at full quality). This may be acceptable for display, since the user may focus on the detailed high quality zoomed view of the selected object of interest rather than the 'background' display of the primary (full view) content.

[0039] In some embodiments, the client may use the continually received primary view content to achieve a fast switch back to the primary content view or to a different available zoomed view. For example, while displaying the high quality zoomed view (using the secondary video stream), the client may present a user interface which allows the user to return to the primary content view or to switch to one of the other available zoomed views. The client may continue to request and retrieve media segments from the primary view while requesting, retrieving, and displaying media segments for the high quality zoomed view. The client may maintain a buffer of primary view content, and (optionally) may be decoding the received primary view content while continuing to receive, decode, and display the content for the high quality zoomed view. If the user selects 'return to primary view' using the presented user interface, the client may rapidly switch to displaying the primary view content using the buffered (and possibly already decoded) primary view content. If the user selects to switch to a different zoomed view (e.g. to switch from displaying a first zoomed view to displaying a second, different zoomed view), the client may rapidly switch to displaying a cropped and upscaled representation of the newly selected different zoomed view, by cropping and upscaling the primary content view (which is already available in the client buffer and possibly is already decoded) to correspond to selected zoomed view. In the latter case, the user may display the cropped and upscaled representation during a transition period while the client requests and retrieves a tertiary video stream, which may be a zoomed stream corresponding to the newly selected different zoomed view.

[0040] Continuous retrieval of the primary view content may thus allow the perception of a fast (e.g. seamless) switch between the primary view content and any of the available zoomed views, as well as a fast (e.g. seamless) switch between any available zoomed view and any another available zoomed view.

[0041] FIG. 5 depicts an example user interface for selecting available secondary video streams. As shown, user interface is displaying a primary video stream 505, with available secondary video streams 510 and 515 corresponding to a soccer player and a soccer ball, respectively. As shown, the secondary video streams 510 and 515 are identified by ROIs 520 and 525, respectively. In response to a user selecting secondary video stream 515, the client device may crop and upscale a portion of the primary video stream identified by the ROI associated with the selected secondary video stream 515, and display the cropped-upscaled version of the primary video stream. In some embodiments, the cropped-upscaled version of the primary video stream is displayed in place of the primary video stream, as shown in FIG. 7B. In alternative embodiments, the cropped-upscaled version of the primary video stream may be overlaid on the primary video stream, in a picture-in-picture configuration, as shown in FIG. 7A. Thus, the cropped-upscaled version of the primary video stream may be upscaled according to the size of the picture-in-picture overlay, or may be upscaled according to client display device resolution. If the aspect ratio of the ROI of the zoomed object does not match the aspect ratio of the client display device, then the cropped-upscaled stream may be upscaled in order to maximize a height or width of the cropped- upscaled stream, for example, as shown in FIG. 7B. While the cropped-upscaled video stream is being displayed, the client may request the selected secondary video stream from a server, and when a sufficient buffering has been received, the client may display the received secondary video stream 515 in place of the cropped-upscaled version of the primary video stream.

[0042] FIG. 6 is a diagram illustrating a display sequence, in accordance with some embodiments. As shown, a decoded primary video stream view 605 is displayed. In response to selecting available secondary video stream 610, the client crops the primary video stream and upscales the cropped portion, as shown by 615. Note that there is a loss of resolution when cropping and upscaling the primary video stream. The client may display the cropped-upscaled portion of the primary video stream 615 while buffering the selected secondary video stream from the server. When a sufficient number of frames has been received, the client may display the received secondary video stream, as shown in 620. The resolution is improved using stream- zoomed techniques. The received secondary video stream may continue to display until the user selects a different secondary video stream, or selects the primary video stream for display.

[0043] In some embodiments, a geometric location of a secondary video stream is given to a client device. In some embodiments, H264/H.265 Pan Scan SEI messages may include region data used to specify a region of the primary video stream to be cropped and upscaled. In response to selection of a secondary video stream, the client crops, upscales and displays the defined area while buffering the secondary video stream.

[0044] FIG. 8 is a flowchart of a method 800, in accordance with some embodiments. It should be noted that while the flowchart of FIG. 8 illustrates one exemplary embodiment, alternative embodiments may execute the steps of method 800 in an alternative order. As shown, method 800 begins at step 802 by receiving a primary video stream, metadata identifying a secondary video stream corresponding to a region of the primary video stream, and region data defining the region within the primary video stream that corresponds to the secondary video stream. At step 804, a cropped-upscaled portion of the primary video stream is generated by cropping areas outside the defined region and uspcaling the defined region. At step 806, the cropped-upscaled portion of the primary video stream is displayed. At step 808, the secondary video stream is requested using the received metadata, and in response to receiving the secondary video stream, the secondary video stream is displayed at step 810 in place of the cropped-upscaled portion of the primary video stream.

[0045] In some embodiments, the method includes displaying the primary video stream before user selection of the secondary video stream, and displaying the cropped-upscaled portion of the primary video stream in place of the primary video stream in response to the user selection. Alternatively, the cropped-upscaled portion of the primary video stream is displayed as a picture- in-picture on the primary video stream. The cropped-upscaled portion of the primary video stream may be generated in response to the user selection of the secondary video stream, or alternatively the cropped-upscaled portion of the primary video stream may be generated prior to the user selection of the secondary video stream.

[0046] The region data may be included in a manifest file, for example in a DASH MPD. Alternatively, the region data may be included in a pan scan supplemental enhancement information (SEI) H.264/H.265 message. Further, the region data may be received in alternative known formats.

[0047] The primary and secondary streams may be synchronized such that a starting frame of the received secondary video stream corresponds to a future point in time with respect to a time of the request of the secondary video stream. [0048] In some embodiments, the primary video stream is received while displaying the secondary video stream, and may be displayed in response to a second user request to display the primary video stream. Alternatively, the primary video stream may be received in response to a second user request to display the primary video stream while displaying the secondary video stream, in which case the primary video stream is displayed in response to receiving the primary video stream.

[0049] FIG. 1 A is a diagram of an example communications system 100 in which one or more disclosed embodiments may be implemented. The communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, and the like, to multiple wireless users. The communications system 100 may enable multiple wired and wireless users to access such content through the sharing of system resources, including wired and wireless bandwidth. For example, the communications systems 100 may employ one or more channel-access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and the like. The communications systems 100 may also employ one or more wired communications standards (e.g.: Ethernet, DSL, radio frequency (RF) over coaxial cable, fiber optics, and the like.

[0050] As shown in FIG. 1 A, the communications system 100 may include client devices 102a, 102b, 102c, and/or 102d, Radio Access Networks (RAN) 103/104/105, a core network 106/107/109, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, and communication links 115/116/117, and 119, though it will be appreciated that the disclosed embodiments contemplate any number of client devices, base stations, networks, and/or network elements. Each of the client devices 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wired or wireless environment. By way of example, the client device 102a is depicted as a tablet computer, the client device 102b is depicted as a smart phone, the client device 102c is depicted as a computer, and the client device 102d is depicted as a television.

[0051] The communications systems 100 may also include a base station 114a and a base station 114b. Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the core network 106/107/109, the Internet 110, and/or the networks 112. By way of example, the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.

[0052] The base station 114a may be part of the RAN 103/104/105, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, and the like. The base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into sectors. For example, the cell associated with the base station 114a may be divided into three sectors. Thus, in one embodiment, the base station 114a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 114a may employ multiple- input multiple output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.

[0053] The base stations 114a, 114b may communicate with one or more of the client devices 102a, 102b, 102c, and 102d over an air interface 115/116/117, or communication link 119, which may be any suitable wired or wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, and the like). The air interface 115/116/117 may be established using any suitable radio access technology (RAT).

[0054] More specifically, as noted above, the communications system 100 may be a multiple access system and may employ one or more channel-access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 114a in the RAN 103/104/105 and the client devices 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 115/116/117 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).

[0055] In another embodiment, the base station 114a and the client devices 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 115/116/117 using Long Term Evolution (LTE) and/or LTE- Advanced (LTE- A).

[0056] In other embodiments, the base station 114a and the client devices 102a, 102b, 102c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for

Microwave Access (WiMAX)), CDMA2000, CDMA2000 IX, CDMA2000 EV-DO, Interim

Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.

[0057] The base station 114b in FIG. 1 A may be a wired router, a wireless router, Home Node B, Home eNode B, or access point, as examples, and may utilize any suitable wired transmission standard or RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, and the like. In one embodiment, the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In another embodiment, the base station 114b and the client devices 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 114b and the client devices 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, and the like) to establish a picocell or femtocell. In yet another embodiment, the base station 114b communicates with client devices 102a, 102b, 102c, and 102d through communication links 119. As shown in FIG. 1A, the base station 114b may have a direct connection to the Internet 110. Thus, the base station 114b may not be required to access the Internet 110 via the core network 106/107/109.

[0058] The RAN 103/104/105 may be in communication with the core network 106/107/109, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the client devices 102a, 102b, 102c, 102d. As examples, the core network 106/107/109 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, and the like, and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 1A, it will be appreciated that the RAN 103/104/105 and/or the core network 106/107/109 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 103/104/105 or a different RAT. For example, in addition to being connected to the RAN 103/104/105, which may be utilizing an E-UTRA radio technology, the core network 106/107/109 may also be in communication with another RAN (not shown) employing a GSM radio technology.

[0059] The core network 106/107/109 may also serve as a gateway for the client devices 102a,

102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or other networks 112. The PSTN

108 may include circuit-switched telephone networks that provide plain old telephone service

(POTS). The Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol

(TCP), user datagram protocol (UDP) and IP in the TCP/IP Internet protocol suite. The networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 112 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 103/104/105 or a different RAT.

[0060] Some or all of the client devices 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities, i.e., the client devices 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wired or wireless networks over different communication links. For example, the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.

[0061] FIG. IB depicts an example client device that may be used within the communications system of FIG. 1 A. In particular, FIG. IB is a system diagram of an example client device 102. As shown in FIG. IB, the client device 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, a non-removable memory 130, a removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and other peripherals 138. It will be appreciated that the client device 102 may represent any of the client devices 102a, 102b, 102c, and 102d, and include any subcombination of the foregoing elements while remaining consistent with an embodiment. Also, embodiments contemplate that the base stations 114a and 114b, and/or the nodes that base stations 114a and 114b may represent, such as but not limited to transceiver station (BTS), a Node-B, a site controller, an access point (AP), a home Node-B, an evolved home Node-B (eNodeB), a home evolved Node-B (HeNB), a home evolved Node-B gateway, and proxy nodes, among others, may include some or all of the elements depicted in FIG. IB and described herein.

[0062] The processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the client device 102 to operate in a wired or wireless environment. The processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. IB depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip. [0063] The transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 115/116/117 or communication link 119. For example, in one embodiment, the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples. In yet another embodiment, the transmit/receive element 122 may be configured to transmit and receive both RF and light signals. In yet another embodiment, the transmit/receive element may be a wired communication port, such as an Ethernet port. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wired or wireless signals.

[0064] In addition, although the transmit/receive element 122 is depicted in FIG. IB as a single element, the client device 102 may include any number of transmit/receive elements 122. More specifically, the client device 102 may employ MTMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 115/116/117.

[0065] The transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122. As noted above, the client device 102 may have multi-mode capabilities. Thus, the transceiver 120 may include multiple transceivers for enabling the client device 102 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.

[0066] The processor 118 of the client device 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128. In addition, the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132. The non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 118 may access information from, and store data in, memory that is not physically located on the client device 102, such as on a server or a home computer (not shown).

[0067] The processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the client device 102. The power source 134 may be any suitable device for powering the WTRU 102. As examples, the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel- zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, a wall outlet and the like.

[0068] The processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the client device 102. In addition to, or in lieu of, the information from the GPS chipset 136, the WTRU 102 may receive location information over the air interface 115/116/117 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the client device 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment. In accordance with an embodiment, the client device 102 does not comprise a GPS chipset and does not acquire location information.

[0069] The processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.

[0070] FIG. 1C depicts an exemplary network entity 190 that may be used in embodiments of the present disclosure, for example as a server. As depicted in FIG. 1C, network entity 190 includes a communication interface 192, a processor 194, and non-transitory data storage 196, all of which are communicatively linked by a bus, network, or other communication path 198.

[0071] Communication interface 192 may include one or more wired communication interfaces and/or one or more wireless-communication interfaces. With respect to wired communication, communication interface 192 may include one or more interfaces such as Ethernet interfaces, as an example. With respect to wireless communication, communication interface 192 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art. And further with respect to wireless communication, communication interface 192 may be equipped at a scale and with a configuration appropriate for acting on the network side— as opposed to the client side— of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like). Thus, communication interface 192 may include the appropriate equipment and circuitry (perhaps including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.

[0072] Processor 194 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.

[0073] Data storage 196 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM) to name but a few, as any one or more types of non- transitory data storage deemed suitable by those of skill in the relevant art could be used. As depicted in FIG. 1C, data storage 196 contains program instructions 197 executable by processor 194 for carrying out various combinations of the various network-entity functions described herein.

[0074] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer- readable medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Claims

CLAIMS We claim:

1. A method comprising:

receiving (i) a primary video stream, (ii) metadata identifying a secondary video stream corresponding to a region of the primary video stream, and (iii) region data defining the region within the primary video stream that corresponds to the secondary video stream;

generating a cropped-upscaled portion of the primary video stream by cropping areas outside the defined region and upscaling the defined region;

in response to a user selection of the secondary video stream:

causing display of the cropped-upscaled portion of the primary video stream; requesting the secondary video stream using the received metadata; and in response to receiving the secondary video stream, causing display of the secondary video stream in place of the cropped-upscaled portion of the primary video stream.

2. The method of claim 1, further comprising:

before user selection of the secondary video stream, causing display of the primary video stream; and

in response to the user selection, displaying the cropped-upscaled portion of the primary video stream in place of the primary video stream.

3. The method of claim 1, further comprising:

in response to the user selection, displaying the cropped-upscaled portion of the primary video stream as a picture-in-picture on the primary video stream.

4. The method as in any of claims 1-3, wherein the cropped-upscaled portion of the primary video stream is generated in response to the user selection of the secondary video stream.

5. The method as in any of claims 1-3, wherein the cropped-upscaled portion of the primary video stream is generated prior to the user selection of the secondary video stream.

6. The method as in any of claims 1-5, wherein the region data is included in a manifest file.

7. The method of claim 6, wherein the manifest file is a Dynamic Adaptive Streaming over HTTP (DASH) media presentation description (MPD).

8. The method as an any of claims 1-5, wherein the region data is included in a pan scan supplemental enhancement information (SEI) H.264/H.265 message.

9. The method as in any of claims 1-8, wherein displaying the secondary video stream comprises seamlessly transitioning from the cropped-upscaled portion of the primary video stream.

10. The method as in any of claims 1-9, wherein the region data comprises first and second pixel coordinates (xl,yl) and (x2,y2) for a bounding box for at least one identified region within the primary video stream.

11. The method as in any of claims 1-9, wherein the region data comprises a pixel coordinate (xl,y 1) and a size parameter of a bounding box for at least one identified region within the primary video stream.

12. The method as in any of claims 1-9, wherein the region data comprises a pixel coordinate (xl,y 1) of a center of a bounding circle and a diameter parameter of the bounding circle for at least one identified region within the primary video stream.

13. A system comprising a non-transitory computer readable medium for carrying one or more instructions, wherein the one or more instructions, when executed by one or more processors, causes the one or more processors to perform the steps of:

in response to a user selection of the secondary video stream:

14. The system of claim 13, wherein the non-transitory computer readable medium comprises instructions for:

in response to the user selection, causing display of the cropped-upscaled portion of the primary video stream in place of the primary video stream.

15. The system of claim 13, wherein the non-transitory computer readable medium comprises instructions for:

16. The system as in any of claims 13-15, wherein the cropped-upscaled portion of the primary video stream is generated in response to the user selection of the secondary video stream.

17. The system as in any of claims 13-15, wherein the cropped-upscaled portion of the primary video stream is generated prior to the user selection of the secondary video stream.

18. The system as in any of claims 13-17, wherein the region data is included in a manifest file.

19. The system of claim 18, wherein the manifest file is a Dynamic Adaptive Streaming over HTTP (DASH) media presentation description (MPD).

20. The system as in any of claims 13-17, wherein the region data is included in a pan scan supplemental enhancement information (SEI) H.264/H.265 message.