WO2018044731A1 - Systems and methods for hybrid network delivery of objects of interest in video - Google Patents

Systems and methods for hybrid network delivery of objects of interest in video Download PDF

Info

Publication number
WO2018044731A1
Authority
WO
WIPO (PCT)
Prior art keywords
video stream
video
interest
network
over
Prior art date
Application number
PCT/US2017/048712
Other languages
French (fr)
Inventor
Kumar Ramaswamy
Jeffrey Allen Cooper
John Richardson
Original Assignee
Vid Scale, Inc.
Priority date
Filing date
Publication date
Application filed by Vid Scale, Inc. filed Critical Vid Scale, Inc.
Publication of WO2018044731A1 publication Critical patent/WO2018044731A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N 21/23614 Multiplexing of additional data and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N 21/26258 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for generating a list of items to be played back in a given order, e.g. playlist, or scheduling item distribution according to such list
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N 21/4348 Demultiplexing of additional data and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N 21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/61 Network physical structure; Signal processing
    • H04N 21/6106 Network physical structure; Signal processing specially adapted to the downstream path of the transmission network
    • H04N 21/6112 Network physical structure; Signal processing specially adapted to the downstream path of the transmission network involving terrestrial transmission, e.g. DVB-T
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/61 Network physical structure; Signal processing
    • H04N 21/6106 Network physical structure; Signal processing specially adapted to the downstream path of the transmission network
    • H04N 21/6125 Network physical structure; Signal processing specially adapted to the downstream path of the transmission network involving transmission via Internet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/61 Network physical structure; Signal processing
    • H04N 21/6156 Network physical structure; Signal processing specially adapted to the upstream path of the transmission network
    • H04N 21/6175 Network physical structure; Signal processing specially adapted to the upstream path of the transmission network involving transmission via Internet
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N 21/8586 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by using a URL

Definitions

  • This disclosure relates to systems and methods for streaming video content. More specifically, this disclosure relates to systems and methods for streaming enhancement video content over an alternative network, supplementing a primary video delivered over a primary network.
  • Displaying additional information on a video screen often improves an audience's viewing experience. For example, during a video broadcast of an American football game, the location of a first-down line may be displayed as a yellow line superimposed on the video broadcast at the location of the first-down line. Additionally, a football player's name and statistics may be displayed on the video broadcast when the video broadcast is displaying video of the football player.
  • Systems and methods disclosed herein operate to enhance a normal broadcast video viewing experience by providing access to enhanced views, such as zoomed or highlighted views of particular regions of interest, or partial or complete views of content with high resolution, high frame rate, high bit depth, or customized tone mapping.
  • zoom coded streams are made available over a source other than broadcast, such as a packet-switched network.
  • Information identifying the available zoom coded streams may be provided in-band in the broadcast video. Identifying available streams in the broadcast video rather than providing the streams themselves in the broadcast video may provide advantages with respect to bandwidth consumption, as the zoom coded streams are generally sent to a client at the request of the client rather than on a continuous basis.
  • interactive video content is provided over a combination of a broadcast network and a broadband network.
  • a camera captures a high-resolution video of a scene, and the location of objects of interest in a video frame is determined, e.g., by collecting object tracking sensor data and camera positioning data and by fusing the data.
  • a bounding box is calculated for the object tracked using the object position and a target resolution.
  • Per-frame metadata is broadcast indicating the bounding box of an object in the broadcast video.
  • a client receives an indication of a selected object (e.g., based on a user selection) and responsively signals to a broadband video server the selection of zoom stream and, in some embodiments, parameters of zoom.
  • the zoomed stream is constructed by the broadband server or other network entity. The zoomed stream is delivered to the client over the broadband network.
  • the video streams from broadcast and broadband networks may be time synchronized for display.
  • visual information from a video stream is fused with object-in-space information obtained from other location information sources.
  • the other location information sources may be in the form of a radio frequency tracking system, radio frequency identification (RFID) tags, GPS, WiFi Locating Systems, and the like. Coordinates of object-of-interest areas that surround objects of interest are determined based on the fused visual and object-in-space information and may be determined on a per-frame basis.
  • a method includes capturing, with a camera, a video frame of a scene; determining a camera orientation and camera location of the camera capturing the video; determining a location of an object of interest; mapping the location of the object of interest to a location on the video frame; and determining an object-of-interest area based on the location of the object of interest on the video frame.
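  • By way of illustration only, the sketch below maps a tracked object's world-space location into pixel coordinates using a simple pinhole camera model and derives an object-of-interest area around it; the camera model, coordinate conventions, and helper names are assumptions, as the disclosure does not prescribe a particular projection method.

```python
import numpy as np

def project_to_frame(obj_world, cam_pos, cam_rot, focal_px, frame_w, frame_h):
    """Map a 3D world-space object location to (x, y) pixel coordinates.

    obj_world, cam_pos: 3-vectors in the venue coordinate system.
    cam_rot: 3x3 rotation matrix (world -> camera) derived from pan/tilt/roll.
    focal_px: focal length expressed in pixels (depends on optical zoom).
    """
    p_cam = cam_rot @ (np.asarray(obj_world, float) - np.asarray(cam_pos, float))
    if p_cam[2] <= 0:          # behind the camera: not visible in this frame
        return None
    x = focal_px * p_cam[0] / p_cam[2] + frame_w / 2
    y = focal_px * p_cam[1] / p_cam[2] + frame_h / 2
    if not (0 <= x < frame_w and 0 <= y < frame_h):
        return None            # outside the camera's field of view
    return x, y

def object_of_interest_area(center, obj_size_m, distance_m, focal_px):
    """Approximate pixel extent of the object from its physical size and range."""
    half = 0.5 * focal_px * obj_size_m / distance_m
    cx, cy = center
    return (cx - half, cy - half, cx + half, cy + half)  # (x_min, y_min, x_max, y_max)
```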
  • a method of providing video content includes receiving, over a broadcast channel of a first network, a first video stream and metadata regarding the availability of a second video stream via a second network.
  • the method includes causing display of the first video stream on a first display device.
  • the method includes presenting the availability of at least the second video stream to a user, based on the metadata.
  • the method includes receiving a selection by the user of the second video stream.
  • the method includes requesting, responsive to the selection by the user, the second video stream over the second network using the received metadata.
  • the method includes receiving the second video stream over the second network.
  • the method includes causing display of the second video stream on the first display device.
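  • A minimal sketch of this client-side flow is given below; the metadata fields, URLs, and helper behavior are placeholders rather than anything mandated by the disclosure.

```python
import urllib.request

def handle_broadcast_metadata(metadata, user_selection):
    """metadata: dict decoded from the broadcast channel, e.g.
       {"streams": [{"id": "player7_zoom", "label": "Zoom: Player 7",
                     "manifest_url": "https://example.net/zoom7.mpd"}]}"""
    # 1. Present the available enhancement streams to the user.
    choices = {s["id"]: s for s in metadata.get("streams", [])}
    # 2. The user selects one (user_selection supplies the chosen id).
    chosen = choices.get(user_selection)
    if chosen is None:
        return None
    # 3. Request the second stream over the second (broadband) network.
    with urllib.request.urlopen(chosen["manifest_url"]) as resp:
        manifest = resp.read()
    # 4. Hand the manifest to the player for display on the same display device.
    return manifest
```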
  • FIG. 1 is a schematic block diagram of an adaptive bit rate (ABR) video distribution system with zoom coding capabilities.
  • FIG. 2 illustrates an information flow diagram, in accordance with an embodiment.
  • FIG. 3 depicts a view of a playing field and a real-time location system, in accordance with an embodiment.
  • FIG. 4 illustrates a plurality of potential bounding boxes that may be selected from a frame of high-resolution video for zoomed display of an object of interest, in accordance with some embodiments.
  • FIG. 5 illustrates a plurality of potential bounding boxes that may be selected from a frame of high-resolution video for zoomed display of an object of interest, in accordance with some embodiments, in cases where the object of interest is near the edge of the frame of high-resolution video.
  • FIG. 6 illustrates a plurality of potential bounding boxes that may be selected from a frame of high-resolution video for zoomed display of an object of interest in accordance with some embodiments.
  • FIG. 7A depicts a view of a video frame, in accordance with an embodiment.
  • FIG. 7B depicts a view of the video frame with object-of-interest areas, in accordance with an embodiment.
  • FIG. 8 depicts a display device on which a video is being displayed, in accordance with an embodiment.
  • FIG. 9 is a first message flow diagram, in accordance with an embodiment.
  • FIG. 10 illustrates an exemplary system architecture of a hybrid network zoom coding system.
  • FIG. 11 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent using ATSC 3.0 ROUTE/DASH and the video content of a selected zoom coded stream is delivered to the client using DASH over HTTP.
  • FIG. 12 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent using MPUs and the video content of a selected zoom coded stream is delivered to the client over a broadband network using DASH over HTTP.
  • FIG. 13 is a schematic illustration providing an overview of an exemplary multi-network embodiment.
  • FIG. 14 is a message flow diagram illustrating operation of an exemplary hybrid network zoom coding system.
  • FIG. 15 is a flow diagram of providing video content, in accordance with an embodiment.
  • FIG. 16A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
  • FIG. 16B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 16A according to an embodiment.
  • An exemplary functional architecture of an adaptive bitrate video distribution system with zoom coding features is illustrated in FIG. 1.
  • an input full-resolution stream (4K resolution, for example) which may be at a high bit depth may be processed and delivered at a lower resolution, such as high definition (HD), and/or lower bit depth, to an end consumer.
  • In FIG. 1, traditional processing is represented by the components labeled "Traditional ABR Streams".
  • an adaptive bit rate encoder may produce ABR streams that are published to a streaming server, and the streaming server in turn delivers customized streams to end customers.
  • An exemplary zoom coding encoder, shown in the bottom part of the workflow in FIG. 1, receives the high-bit-depth input video stream and uses a variety of techniques to produce auxiliary video streams.
  • auxiliary video streams may include, for example, streams representing cropped and/or zoomed portions of the original video streams to which different tone maps may have been applied to different regions, or, as discussed in greater detail below, streams for different regions of interest.
  • These auxiliary video streams may in turn be encoded using traditional ABR techniques.
  • a user is presented with the choice of watching the normal program (delivered using ABR profiles) and in addition, zoom coded streams that may represent zoomed portions of the original program or other auxiliary streams relating to the original program.
  • the client may request an appropriate stream from the streaming server.
  • the streaming server may then deliver the appropriate stream to the end client.
  • the streaming server is configured to transmit a video stream over a network to different display devices.
  • the network may be a local network, the Internet, or other similar network.
  • the display devices include devices capable of displaying the video, such as a television, computer monitor, laptop, tablet, smartphone, projector, and the like.
  • the video stream may pass through an intermediary device, such as a cable box, a smart video disc player, a dongle, or the like.
  • the client devices may each remap the received video stream to best match the display and viewing conditions.
  • Systems and methods described herein enable viewers of a video stream to selectively view supplemental data regarding objects of interest in the video stream.
  • Metadata is provided to a video client device that identifies the supplemental data.
  • FIG. 2 is an information flow diagram, in accordance with an embodiment.
  • the object of interest is a professional athlete being filmed for a television broadcast during a sporting event.
  • Real-time object location information is determined for each object of interest, for example, each athlete and the ball have real-time location information.
  • the real-time object location information may be determined in any number of ways. Examples include placing a marker, such as an RFID tag, on each object of interest, outfitting each object of interest with a position determination system such as a GPS receiver, using local wireless rangers, and the like.
  • Each player may be equipped with a unique marker that may be placed on any number of locations on the player, such as in a helmet, on a shoe, in a protective pad, or the like.
  • the location information may also be determined or supplemented by video data. For example, optical recognition techniques may be used to identify labels or numbers on a player's uniform, the shape of the playing ball, object colors, object shapes, object movement characteristics, and the like. The object's locations are determined and mapped to location information for the venue, such as the stadium.
  • the camera information includes information and/or metadata on the location and orientation of the camera, which may include data such as the pan, tilt, and direction the camera is pointed, and video and audio streams.
  • the camera information may also be supplemented with the camera's visual settings, such as its optical zoom level, focus settings, and the like.
  • the camera's field-of-view volume may be determined. This volume may then be mapped into a camera frame, and the player's location within the volume may be fused into the camera frame.
  • the stream zoom video encoder then receives the fused camera information and object information, with video frames marked with object-of-interest metadata for every frame or every set number of frames.
  • the object of interest metadata may indicate the position and/or size of an object of interest within the frame, as well as possibly providing identification for each object of interest. With information available on a frame-by-frame basis, a trajectory may be built for any object of interest.
  • the field-of-view volume may also be based on the camera's focal settings. In such an embodiment, an object of interest may be in the line of sight of the camera, but based on the focal settings may be out of focus, and thus not included in a determination (described below) of different object-of-interest areas.
  • location information from each object of interest is obtained and mapped into each video image captured.
  • the identified objects can then be tracked through the video as they appear and disappear from the view of the camera.
  • FIG. 3 depicts a view of a playing field and a real-time location system, in accordance with an embodiment.
  • FIG. 3 depicts the view 300 that includes the field 302, eight players 304, a ball 306, a video camera 308, and a real-time location system (RTLS) 310.
  • the field 302 is shown as an American football field.
  • the view 300 includes eight players 304, four on each side of the ball 306.
  • Each of the eight players 304 and the ball 306 is equipped with an RFID tag that is configured to transmit a radio signal.
  • the RTLS 310 receives the radio signals from the RFID tags and is able to determine a real-time location for each object based on the time-of-arrival (TOA) of each radio signal at the RTLS 310 receivers.
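  • One possible way to turn per-receiver arrival times into a tag position is least-squares multilateration, sketched below; the joint estimation of the unknown emission time and the receiver geometry are illustrative assumptions rather than a description of any particular RTLS product.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def locate_tag(receivers, arrival_times, iters=20):
    """Estimate the (x, y) position of an RFID tag from time-of-arrival measurements.

    receivers: (N, 2) array of receiver positions in metres, N >= 3.
    arrival_times: length-N array of arrival times in seconds.
    The unknown emission time is estimated jointly with the position.
    """
    rx = np.asarray(receivers, float)
    t = np.asarray(arrival_times, float)
    # Unknowns: tag position (x, y) and emission time t0.
    x = np.append(rx.mean(axis=0), t.min() - 1e-7)
    for _ in range(iters):
        d = np.linalg.norm(rx - x[:2], axis=1)            # predicted ranges
        residual = SPEED_OF_LIGHT * (t - x[2]) - d        # measured minus predicted
        # Jacobian of the residual with respect to (x, y, t0)
        J = np.column_stack([-(x[0] - rx[:, 0]) / d,
                             -(x[1] - rx[:, 1]) / d,
                             -SPEED_OF_LIGHT * np.ones(len(t))])
        dx, *_ = np.linalg.lstsq(J, -residual, rcond=None)  # Gauss-Newton step
        x = x + dx
    return x[:2]
```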
  • the camera 308 records video and sound of the players 304 and the ball 306 on the field 302.
  • the camera is also able to determine its position relative to a reference position and its orientation relative to a reference orientation.
  • the camera may be equipped with a camera mount that is able to detect the pitch, roll, and translation of the camera with respect to reference coordinates.
  • the camera is equipped with an RFID tag, and the RTLS 310 determines the position and orientation of the camera in relation to reference positions and orientations.
  • the camera 308 then transmits the video, audio, and camera location and orientation information to a fuse mapping service configured to fuse the camera based information with the real-time location information of the different objects of interest.
  • an orthographic projection may be developed using the methods disclosed in, for example, Sheikh, Y., et. al., Geodetic Alignment of Aerial Video Frames (2003).
  • an orthographic projection may be made based on the identification of the tags (sensors), and their determined locations.
  • Both the video-based and location-based orthographic projections may be fused onto any set of coordinates, such as GPS, Cartesian, polar, cylindrical, or any other set of coordinates suited to the environment.
  • Various embodiments may extend such an approach to scenarios with multiple video cameras. In such embodiments, different views of player positions appearing in the field of view of specific cameras can be collected and made available in a consolidated manner.
  • a video stream may be compiled from all available views from different cameras that are available.
  • Although the view 300 depicts a football field with players equipped with RFID tags, the scenario may be modified.
  • For example, the field 302 may be replaced with an automobile race track, and the players 304 may be replaced with automobiles equipped with GPS location technology able to transmit a determined GPS location to a server for mapping the locations of the automobiles on the race track for fusion with a video of the race.
  • Other scenarios may be likewise accommodated (e.g., a soccer game, a baseball game, a golf tournament, the filming of a movie set or a news program, etc.) in which one or more cameras and one or more people and/or objects of interest may be similarly outfitted in order to provide the camera information and the object information as described herein.
  • a bounding box that surrounds an object of interest may be defined for each video frame in which the object of interest appears.
  • the bounding box may be based on pixel coordinates of the video frame. With the coordinate position of each object of interest in the video frame identified, metadata is created to notify the client about the availability of zoom coded streams or other supplemental content regarding different objects of interest, as well as bounding box or object-of-interest area information for such objects of interest.
  • FIG. 4 illustrates a plurality of potential bounding boxes that may be selected from a frame 500 of native-resolution video for zoomed display of an object of interest.
  • the video frame is at a native source resolution of 6K.
  • Various bounding boxes may be defined to select portions of the video frame that can be presented at a lower resolution. For example, one bounding box may be used to select a region with dimensions of 1280 x 720 pixels, which allows for presentation of video in high definition (HD).
  • Another bounding box may be used to select a region with dimensions of 640 x 480 pixels, which allows for presentation of video in standard definition (SD).
  • For the HD bounding box, the bottom left corner is located at (x1, y1), the top left corner at (x1, y1 + 720), the top right corner at (x1 + 1280, y1 + 720), and the bottom right corner at (x1 + 1280, y1).
  • For the SD bounding box, the bottom left corner is located at (x2, y2), the top left corner at (x2, y2 + 480), the top right corner at (x2 + 640, y2 + 480), and the bottom right corner at (x2 + 640, y2).
  • the position of each bounding box may be determined automatically based on real-time information regarding the position and orientation of the camera and the location of the object of interest.
  • the real-time location of the object of interest is determined to be at the black dot located at (a1, b1).
  • the location of the object of interest may be determined by any real-time location service, and may be fused into the location of the video view.
  • a player may be known to be wearing an RFID tag near the waist.
  • a bounding box may then be positioned, as in FIG. 4, such that the object of interest is at the center of the bounding box.
  • For the HD bounding box, x1 may be placed at a1 - 640 and y1 at b1 - 360.
  • For the SD bounding box, x2 may be placed at a1 - 320 and y2 at b1 - 240.
  • the bounding box may be positioned such that the location of the object of interest, point (a1, b1), is toward the top of the bounding box, leaving room for the player's body and legs to be in the middle portion and lower portion, respectively, of the region-of-interest frame when the player is vertical.
  • the orientation (standing, jumping, diving, etc.) may be determined for the player in selecting the position of the bounding box around the object of interest.
  • Example methods for determining a player's orientation may include placing RFID sensors on the player's head and feet to determine two end-point locations of the player or correlating optical features of the video with the determined location, or the like.
  • Another situation in which it may be desirable to select a bounding box location that is not centered on the corresponding object of interest is illustrated in FIG. 5.
  • an object of interest is near the edge of the native-resolution video frame.
  • the view depicted in FIG. 5 is similar to the view depicted in FIG. 4, except that the object of interest is located near the left edge of the frame.
  • bounding boxes may be positioned such that, for example, the distance between the object of interest and the center of the bounding box is minimized, subject to the constraint that the bounding box is entirely within the native-resolution frame.
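  • The placement rule described above (center the bounding box on the object, then clamp it so it remains entirely within the native-resolution frame) can be sketched as follows; the bottom-left pixel origin and the example frame size are assumptions for illustration.

```python
def place_bounding_box(obj_x, obj_y, box_w, box_h, frame_w, frame_h,
                       anchor=(0.5, 0.5)):
    """Return (x, y) of the bounding box's bottom-left corner.

    anchor controls where the object sits inside the box: (0.5, 0.5) centers it,
    while e.g. (0.5, 0.75) keeps the object toward the top of the box, leaving
    room below for a standing player's body and legs.
    """
    x = obj_x - anchor[0] * box_w
    y = obj_y - anchor[1] * box_h
    # Clamp so the box lies entirely within the native-resolution frame,
    # which minimizes the object-to-center distance subject to that constraint.
    x = min(max(x, 0), frame_w - box_w)
    y = min(max(y, 0), frame_h - box_h)
    return x, y

# Example: a 1280x720 (HD) box around an object at (a1, b1) = (3000, 1500)
# in a 6K frame assumed here to be 6144x3456 pixels.
hd_box = place_bounding_box(3000, 1500, 1280, 720, 6144, 3456)
```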
  • FIG. 6 illustrates a frame of native-resolution video including the positions of an HD bounding box, an SD bounding box, and an object-of-interest area.
  • the view depicted in FIG. 6 is similar to the view depicted in FIG. 4, with a location of the object-of-interest area added within the bounding boxes. Based on the location of the object of interest, an object-of-interest area is determined, as represented by the ellipse.
  • the object-of-interest area is depicted as an ellipse, it may have any shape, such as a square, a rectangle, a circle, an arbitrary object shape outline, and the like.
  • a user is provided with an option of selecting content within a particular bounding box.
  • the video content within the bounding box is delivered as a separate stream to the user.
  • the user may select one or more objects of interest, and in response a streaming server may deliver the video content (e.g., an enhanced version of the video content) within a bounding box which contains the one or more objects of interest.
  • One or more objects of interest may be positioned within the bounding box.
  • Supplemental data, such as a zoom coded video stream, may be provided to the client.
  • FIG. 7A depicts a view of a video frame, in accordance with an embodiment.
  • FIG. 7A depicts the view 800.
  • the view 800 is a view of a video frame from the perspective of the camera 308 of FIG. 3.
  • the view 800 includes portions of the field 302, four players on the left (304A-D), the ball 306 in the center, and four players on the right (304E-H).
  • a real-time location system determines the location of each of the players 304A-H and the ball 306. The physical locations are fused with the locations of the objects in the video frame.
  • FIG. 7B depicts a view of the video frame with object-of-interest areas, in accordance with an embodiment.
  • FIG. 7B depicts the view 810.
  • the objects present in view 810 include the objects of the view 800 of FIG. 7A.
  • Also depicted in the view 810 are object-of-interest areas.
  • Each of the objects of interest has a determined object-of-interest area, as depicted by the rectangles 804A-H around the players 304A-H and the rectangle 806 around the ball 306.
  • the object-of-interest areas may indicate areas that are preferably not obstructed when supplemental data is displayed (e.g., overlaid) on the video frame.
  • FIG. 8 depicts a display device (e.g., a television or computer monitor) on which a video is being displayed, in accordance with an embodiment.
  • FIG. 8 depicts a display device 830 that depicts the view 800 of the football game shown in FIG. 7A, with supplemental data 832 displayed.
  • a user has requested supplemental data related to the player 304H (the right-most player as illustrated in FIG. 7A) to be displayed.
  • the indicator 824H is displayed linking the text of the supplemental data with the location of the object of interest.
  • supplemental data may be displayed beyond text, such as image or video content related to the object of interest, a highlighting overlay of the object of interest, and/or the like.
  • determining the bounding boxes may be performed over several frames.
  • Video Delivery Method.
  • FIG. 9 is a first message flow diagram, illustrating a network-based system for delivering zoom coded content.
  • FIG. 9 illustrates the operation of an exemplary video delivery system, depicting communications between a content source, a fuse mapper, an encoder, a transport packager, a server (e.g., an origin server, an edge streaming server, or other streaming server), a web server, and a client device.
  • Exemplary embodiments disclosed herein employ alternative techniques in which at least a portion of the video data provided to a client device is provided over a broadcast transmission.
  • the content source transmits a compressed or uncompressed media stream of the source media (such as a video stream) to a fuse mapper. Additionally, location information (e.g., object location and/or bounding box information, such as RTLS location data) associated with objects of interest is also transmitted to the fuse mapper. Location information is added to each frame for the objects of interest.
  • the fuse mapper then transmits the fused video and location information to the encoder at a high bit depth. The locations of the object-of-interest areas are included in the transmission to the encoder.
  • the encoder may separately create ABR streams with default tone mappings and in some embodiments ABR streams with alternative tone remappings in various regions of interest (or combinations thereof).
  • the various ABR streams with both the fused location and video information are transmitted to a transport packager.
  • the transport packager may segment the files and make the files available via an ftp or http download and may prepare a manifest.
  • the segmented media content may be delivered to an origin server, to one or multiple edge streaming servers, to a combination of these server types, or any suitable media server from which a media client may request the media content.
  • A manifest file (e.g., a DASH MPD) may be made available from the streaming server (e.g., the origin server and/or the edge streaming server), or the manifest file may be generated dynamically in response to a client request for the manifest file.
  • a client device may transmit a signal to a web server requesting to download the media content and may receive a streaming server redirect signal.
  • the client device may request a manifest which describes the available content files (e.g., media segment files).
  • the request may be sent from the client to a server (e.g., an origin server or an edge streaming server).
  • the manifest may indicate availability of the various ABR streams, region-of-interest areas, bounding box areas, supplemental data or metadata for the various objects of interest, and the like.
  • the client may request a default stream from a streaming server, and the streaming server may responsively transmit the default stream (e.g., media segments of that default stream) to the client device.
  • the client device may display the default stream.
  • the default stream may be provided to the client device using a broadcast transmission, and the broadcast transmission may further include information regarding the location and/or content of supplemental streams that can be retrieved over the non-broadcast network.
  • the client device may detect a cue to request streams associated with particular bounding boxes or objects of interest.
  • the cue may be user input wherein the user selects a player or other object of interest associated with a bounding box.
  • the fuse mapper may include location information of all objects of interest in or at a venue. However, only the relevant location information corresponding to the camera image is laid on top of the pixel coordinates on a frame by frame basis.
  • the fuse mapper thus outputs both video data and coordinates of the objects identified by the real-time location system in the form of per-frame metadata that identifies the object in the camera pixel domain.
  • the coordinates of the objects of interest and the related region-of-interest areas are updated as the object locations are updated.
  • the system may also determine open spaces within the video frames, which may be used as display locations for supplemental streams or overlays, and may transmit these open areas to a client device.
  • the open spaces may be determined using methods such as those described in international patent application PCT/US17/43248, entitled "Systems and Methods for Integrating and Delivering Objects of Interest in Video", filed July 21, 2017, or U.S. Provisional Patent Application No. 62/365,868, filed July 22, 2016, entitled "Systems and Methods for Integrating and Delivering Objects of Interest in Video", each of which is hereby incorporated by reference in its entirety.
  • a clear area may be identified that is outside of an object-of-interest area but within a bounding box.
  • the clear area may be defined using pixel coordinates of the respective video frame. Locating the clear areas outside of all of the object-of-interest areas may prevent the displayed supplemental data or views from obstructing the object-of-interest areas (such as players, a ball, etc.). A viewer can select supplemental streams associated with different objects of interest, after which a client device can receive and/or retrieve supplemental data associated with the selected overlay of interest and display video with the supplemental data being displayed, in some instances within the determined open area(s) (but not necessarily).
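  • One simple way to pick such a clear area is to test candidate overlay rectangles against the object-of-interest areas and keep the first candidate that overlaps none of them, as sketched below; the candidate grid step and overlay size are illustrative assumptions.

```python
def rects_overlap(a, b):
    """Rectangles given as (x_min, y_min, x_max, y_max) in pixel coordinates."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def find_clear_area(bounding_box, object_areas, overlay_w, overlay_h, step=40):
    """Return an overlay rectangle inside bounding_box that avoids all
    object-of-interest areas, or None if no such placement is found."""
    x0, y0, x1, y1 = bounding_box
    x = x0
    while x + overlay_w <= x1:
        y = y0
        while y + overlay_h <= y1:
            candidate = (x, y, x + overlay_w, y + overlay_h)
            if not any(rects_overlap(candidate, area) for area in object_areas):
                return candidate
            y += step
        x += step
    return None
```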
  • the result of mapping the real-time locations of an object of interest to the camera pixel 2D image space is a (x, y) pixel position in the camera 2D image.
  • the (x, y) pixel position moves according to the location-tracking results of the objects of interest, for example, as the players move across the field.
  • the size of the object-of-interest area that contains the object may also be determined by the fusion mapper based on the camera parameters: zoom setting, camera angle, camera location, and the like.
  • the camera focus and zoom may change dynamically as the camera operator follows the action.
  • the camera can also be moving, for example with a handheld or aerial camera.
  • the camera may further include an RFID tag, and the real-time location service may determine the camera's location.
  • the client may request metadata and video streams for different scenarios.
  • the client requests overlay of highlights on tracked objects during video playback of the full field view.
  • the client may use the metadata to request specific zoom streams from the server.
  • the supplemental data requested by the user may be other than an alternative or zoomed view, but may include additional overlays such as graphical information displays (e.g., player stats, etc.), related content (such as images or video), and/or the like.
  • Embodiments disclosed herein may be employed in an MPEG-DASH ABR video distribution system.
  • Metadata such as the identity and location of objects of interest and supplemental information regarding the objects of interest may be contained in the DASH MPD (or other streaming protocol manifest) and/or in the user data of the video elementary stream to provide for frame by frame updates.
  • a client device may first read the manifest on start up for initial object information, and then continually update the object track information by parsing the video frame elementary data.
  • Object-of-interest metadata may be available in-band or from a separate metadata file and may be used to update the user interface or other components related to selection and display of objects of interest and associated supplemental data.
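  • For illustration, the sketch below extracts object-of-interest signaling from the SupplementalProperty elements of a DASH MPD; the scheme identifier and attribute layout are hypothetical, since the disclosure does not fix a particular scheme.

```python
import xml.etree.ElementTree as ET

DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}
# Hypothetical scheme identifier for object-of-interest signaling.
OOI_SCHEME = "urn:example:objects-of-interest:2017"

def objects_from_mpd(mpd_xml):
    """Return {AdaptationSet id: number of objects of interest advertised}."""
    root = ET.fromstring(mpd_xml)
    result = {}
    for aset in root.iter("{urn:mpeg:dash:schema:mpd:2011}AdaptationSet"):
        for prop in aset.findall("mpd:SupplementalProperty", DASH_NS):
            if prop.get("schemeIdUri") == OOI_SCHEME:
                result[aset.get("id")] = int(prop.get("value", "0"))
    return result
```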
  • Num_Objects: Range 0-255. Defines the number of objects in the current list. If Num_Objects is greater than zero, the following parameters are provided for each of the Num_Objects objects, each of which may pertain to an object of interest.
  • Object_ID: Range 0-255. Provides a unique identifier for each object of interest.
  • Object_x_position[n]: For each object ID n, the x position of the object-of-interest area.
  • Object_y_position[n]: For each object ID n, the y position of the object-of-interest area.
  • Object_x_size[n]: For each object ID n, the x dimension of the object-of-interest area.
  • Object_y_size[n]: For each object ID n, the y dimension of the object-of-interest area.
  • Object_UserData[n]: For each object ID n, user data that can be included for the client to present user interface selection criteria for the object.
  • Object x,y position and size may be provided in pixel units that correspond to the first-listed representation in the appropriate adaptation set.
  • the Object x,y size and position values are scaled to the secondary representation's picture dimensions with a linear scale factor.
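  • The per-frame syntax elements above map naturally onto a small record type; the sketch below also applies the linear scale factor used when a secondary representation has different picture dimensions. The field names follow the syntax elements listed above, while the container and decoding details are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ObjectOfInterest:
    object_id: int          # Object_ID, range 0-255
    x: int                  # Object_x_position, in pixels of the reference representation
    y: int                  # Object_y_position
    width: int              # Object_x_size
    height: int             # Object_y_size
    user_data: bytes = b""  # Object_UserData, e.g. player name/stats for the UI

    def scaled_to(self, ref_w, ref_h, target_w, target_h):
        """Rescale the object-of-interest area to a secondary representation."""
        sx, sy = target_w / ref_w, target_h / ref_h
        return ObjectOfInterest(self.object_id,
                                round(self.x * sx), round(self.y * sy),
                                round(self.width * sx), round(self.height * sy),
                                self.user_data)

def parse_frame_objects(records):
    """records: iterable of dicts decoded from the manifest or in-band user data."""
    return [ObjectOfInterest(**r) for r in records]
```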
  • the parameters may also include information identifying open areas related to objects of interest, which may be used for locating a second stream to be displayed in relation to the first stream.
  • A client device, by receiving an MPD or in-band data with the above-identified information, can represent the object of interest on the user interface in a variety of ways.
  • the Supplemental Property of an adaptation set indicates to the client how many objects of interest are available.
  • Object_UserData may be used by the client to display information describing aspects of the object of interest. For example, in a sports game this can be specific player information.
  • a camera captures video, which may be at high resolution, of a scene.
  • the camera may also be configured to detect its location, pan, tilt, optical zoom, focal settings, and the like.
  • the location of an object of interest may be determined in the video frame.
  • An object of interest's location may be determined by a real-time location tracking system, such as RFID, and based on the determined locations of the object of interest and the camera's position, the location of each object of interest may be determined within the video frame.
  • An object-of-interest area may be determined within the video frame for each object of interest.
  • the object-of-interest area may be defined by a set of pixels of the video frame.
  • Metadata related to video content may be received at the client device.
  • the metadata may be sent, for example, in a broadcast video transmission.
  • the metadata may include, for example, information identifying object-of-interest areas (e.g., providing coordinates of object-of-interest areas, or providing information identifying a resource through which coordinates of object-of-interest areas may be obtained).
  • zoom coded streams offer enhanced viewing experiences to end users, with zoom coded streams being generated from, for example, high-resolution, high-frame-rate, and/or high-bit-depth videos of content.
  • zoom coding may be used as a way to generate additional or alternate views for the end user.
  • a traditional broadcast network encodes live content and delivers the encoded streams over a variety of wired and wireless networks. Broadcasting live content is an efficient way to deliver content in a scalable manner to a wide customer base.
  • One common mechanism to deliver digital video broadcast services is using standard MPEG2 transport stream packages.
  • An alternative is to use an MP4 packaging format.
  • the broadcast network is a one-way network.
  • Embodiments disclosed herein make use of an alternative network, such as a packet- switched network, to deliver more interactive content.
  • a combination of a broadcast network with content and a two-way IP network can allow for content to be available from both networks simultaneously.
  • Mechanisms are described herein for delivering zoom coded content (e.g., enhanced views of one or more selected objects of interest) over an alternative network that is different from a network over which primary content is being delivered.
  • any secondary content may be delivered over the alternative network.
  • alternative overlays may be delivered over the alternative network, not just a zoomed or enhanced view of primary content.
  • a primary video stream is broadcast over the air while streams providing customized zoom experiences are delivered over broadband connections.
  • Transmission of a primary video stream along with data identifying one or more available zoom coded streams using the ATSC 3.0 standard may be used to enable user customization and zoom coding functionality.
  • An exemplary embodiment is illustrated in FIG. 10.
  • content may be encoded and sent over a broadcast network (such as a Terrestrial Broadcast, microwave, satellite or fiber system) to be received by a broadcast receiver.
  • Information identifying one or more zoom coded streams may be embedded in the Transport Stream (or MP4) of the broadcast system in the private data section.
  • Such information may include, for example, information on the location (e.g., URL for the manifest) and/or availability (e.g., what streams are available and with what characteristics) of zoom coded streams.
  • the information in the broadcast stream that identifies the availability of zoom coded streams may include an embedded timestamp that is indicative of the time availability of the zoom coded streams.
  • the information indicating the availability of zoom coded streams may be sent periodically, since the point where an end user tunes into the channel and the location of the indicators may not line up.
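  • As an illustration of this periodic in-band announcement, the sketch below serializes a small availability record (manifest location plus an availability timestamp) for carriage in a private data section; the field names and the use of JSON are assumptions, since the carriage format is implementation-defined.

```python
import json
import time

def build_availability_record(streams, now=None):
    """streams: list of dicts such as
       {"id": "zoom_player7", "manifest_url": "https://example.net/zoom7.mpd",
        "available_from": 1504224000.0}
    Returns bytes suitable for a private data section; sent periodically so
    that viewers who tune in mid-program still learn about the zoom streams."""
    record = {"sent_at": now if now is not None else time.time(),
              "zoom_streams": streams}
    return json.dumps(record).encode("utf-8")

def parse_availability_record(payload, tune_in_time):
    record = json.loads(payload.decode("utf-8"))
    # Only surface streams whose availability window has already started.
    return [s for s in record["zoom_streams"]
            if s.get("available_from", 0) <= tune_in_time]
```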
  • Various mechanisms known to those of skill in the art may be used to carry such private data within a digital broadcast system to a receiving client.
  • the client device may display for an end user information on the availability of zoom coded streams at the appropriate time. If a user requests a specific zoom coded stream to be played, a network interface of the client device may initiate streaming with a server on which the zoom coded stream is available and may cause display of the streamed content (e.g., on a built-in or external monitor). In response to reaching the end of streamed content (or earlier, if requested by a user), the client device may automatically switch back to display of the live broadcast video.
  • a hybrid system makes use of ATSC 3.0 broadcast video and a separate broadband data network.
  • In ATSC 3.0, there are several tools and mechanisms available for hybrid broadcast and broadband delivery of content. These mechanisms are tailored to deliver both real-time and non-real-time, file-based programs.
  • the availability of zoom coded content may be indicated in a broadcast transmission by sending a DASH MPD or other manifest using the Realtime Object Delivery over Unidirectional Transport (ROUTE) protocol as described by ATSC for delivering DASH format and non-real-time content over broadcast channels.
  • ROUTE is derived from FLUTE (File Delivery over Unidirectional Transport, IETF RFC 6726).
  • various techniques may be used to handle the delivery of information identifying the zoom coded streams.
  • the normal broadcast stream is delivered over broadcast using ROUTE, and the zoom coded streams are delivered over broadband as DASH segments via HTTP on the broadband channel.
  • the broadcast and broadband paths are time-aligned and delivered to accomplish time alignment at a receiver according to the receiver buffer model. In some embodiments, this is accomplished by delaying the broadcast path sufficiently to ensure that the broadband path is synchronized with the broadcast path. Support for handover from the broadcast to the broadband pipe to receive the zoom coded stream may be indicated by the MPD fragment in the Service Layer Signaling.
  • FIG. 11 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent using ATSC 3.0 ROUTE/DASH and the video content of a selected zoom coded stream is delivered to the client using DASH over HTTP.
  • media data are encapsulated into Media Processing Units (MPUs) that are packetized into MPEG Media Transport Protocol (MMTP) packets.
  • both the transmitter and receiver systems may be locked to UTC to maintain synchronization.
  • the zoom coded streams may be presented over broadband over DASH. All the streams being sent over broadcast or broadband may be locked to UTC to help with alignment/synchronization.
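  • With both paths locked to UTC, a receiver can align them by presenting each frame at its UTC timestamp plus a fixed delay chosen to exceed the slower path's latency; the sketch below shows that calculation with an assumed delay budget.

```python
def presentation_deadline(frame_utc_capture, path_latency, fixed_delay=2.0):
    """Return the UTC wall-clock time at which to present the frame.

    frame_utc_capture: UTC timestamp carried with the frame (seconds).
    path_latency:      measured end-to-end latency of this path (seconds).
    fixed_delay:       receiver buffer delay chosen to exceed the worst-case
                       latency of either the broadcast or broadband path, so
                       both streams reach the display in sync (2 s assumed here).
    """
    if path_latency > fixed_delay:
        raise ValueError("increase fixed_delay: this path is slower than the delay budget")
    return frame_utc_capture + fixed_delay
```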
  • techniques for coordinating broadcast and broadband communications include those described in ATSC Document S33-174, "ATSC Candidate Standard: Signaling, Delivery, Synchronization and Error Protection," January 2016.
  • FIG. 12 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent over the broadcast using MPUs and the video content of a selected zoom coded stream is delivered to the client over a broadband network using DASH over HTTP.
  • An exemplary user interface is illustrated in FIG. 13.
  • an unzoomed view 1305 of a sporting event is provided in a live broadcast stream (for example, transmitted from an antenna 1310 and received at a user antenna 1315) that is displayed 1340 on a television, monitor, or other display device 1335.
  • the broadcast includes data identifying the availability of one or multiple zoom coded streams.
  • the zoom coded streams may represent zoomed or otherwise enhanced views of sports players or other objects of interest visible in the live broadcast stream.
  • a user interface element may be provided on the display 1335. The user interface element may allow selection of at least one of the zoom coded streams.
  • the user interface element may be a still image or a video (e.g., a low-resolution video) provided in a picture-in-picture arrangement.
  • Other depictions of the user interface element are possible; for example, the user interface element may highlight a sports player or other object of interest in the main video view as rendered from the live broadcast stream 1305, where an associated zoom coded stream 1320 is available (such as at a server 1325) for that sports player or object of interest.
  • the client device 1335 retrieves the requested stream 1320 (e.g., a zoom coded stream providing a zoomed or otherwise enhanced view of the sports player or other object of interest) over a broadband connection (such as through a modem 1330).
  • the zoom coded video 1320 is a high-resolution zoomed video of a particular region of interest in the sporting event.
  • the requested stream 1320 may be decoded and displayed to the user, for example as a full screen zoomed view or as a picture-in-picture arrangement 1345 overlaid on the main view 1340.
  • FIG. 14 illustrates an exemplary flow of information among components in some embodiments.
  • a high-resolution camera may obtain a video to be broadcast, and a broadcast encoder encodes the video in an appropriate broadcast format.
  • Sensor information (e.g., RFID information) may be used to determine the locations of objects of interest in the captured video.
  • Information on the location and/or extent of objects of interest may be encoded in the broadcast data and may be used, for example by the client device to highlight the object of interest while displaying the ordinary broadcast video.
  • a broadband server (or other network entity) may generate a video representing a zoomed version of the respective object of interest.
  • These videos may be generated from a high resolution (e.g., the native-resolution) version of the video to generate a high-quality zoomed video.
  • a user client may retrieve a zoom coded video corresponding to that object of interest from a broadband server and subsequently cause the zoom coded video to be displayed.
  • the zoom coded video may be displayed as a full screen zoom of the selected object or region of interest, or the zoom coded video may be displayed together with the main video view from the broadcast signal.
  • the broadcast client and the broadband client may be within a user display device (e.g., a connected television, a tablet device, a mobile phone, a gaming device, a media player device, and/or the like).
  • any secondary content may be delivered over the broadband network.
  • alternative overlays may be delivered over the broadband network, not just a zoomed or enhanced view of broadcast video.
  • FIG. 15 is a flow diagram of providing video content, in accordance with an embodiment.
  • the method may include a step 1505 of receiving, over a broadcast channel of a first network, a first video stream and metadata regarding the availability of a second video stream which may be requested via a second network.
  • the method may include a step 1510 of causing display of the first video stream on a first display device.
  • the method may include a step 1515 of presenting the availability of at least the second video stream to a user, based on the metadata.
  • the method may include a step 1520 of receiving a selection by the user of the second video stream.
  • the method may include a step 1525 of requesting, responsive to the selection by the user, the second video stream over the second network using the received metadata.
  • the method may include a step 1530 of receiving the second video stream over the second network.
  • the method may include a step 1535 of causing display of the second video stream on the first display device.
  • a method of providing video content comprising: receiving, over a broadcast channel, first video stream data and metadata regarding the availability of a second video stream via a packet-switched network; sending, over the packet-switched network, in accordance with the metadata regarding the availability of a second video stream via a network received over the broadcast channel with the first video stream data, a request for the second video stream data; receiving, over the network, second video stream data; and causing display of the second video stream data.
  • the method may also include wherein the display of the first video stream data and second video stream data are time synchronized.
  • the method may also include wherein the broadcast channel is an ATSC channel or a cable QAM channel.
  • the method may also include wherein the packet-switched network is an IP network.
  • the method may also include wherein the packet-switched network includes a cellular data connection.
  • the method may also include wherein the metadata regarding the availability of a second video stream identifies an object of interest, and wherein the second video stream is an enhanced view of the object of interest.
  • the method may also include wherein the metadata regarding the availability of a second video stream identifies an object of interest, and wherein the request is sent in response to user selection of the object of interest.
  • the method may also include wherein the metadata regarding the availability of a second video stream comprises coordinates of an object-of-interest area.
  • the method may also include wherein causing display of the second video stream comprises causing display of the second video stream simultaneously with display of the first video stream.
  • the method may also include wherein causing display of the second video stream comprises causing display of the second video stream in a picture-in-picture relationship with the first video stream.
  • the method may also include wherein the metadata regarding the availability of a second video stream identifies an object of interest, and wherein the second video stream is a zoomed view of the object of interest.
  • a system for providing video content, the system being operative to perform functions comprising: receiving, over a broadcast channel, first video stream data and metadata regarding the availability of a second video stream via a packet-switched network; sending, over the packet-switched network, in accordance with the metadata regarding the availability of a second video stream via a network received over the broadcast channel with the first video stream data, a request for the second video stream data; receiving, over the network, second video stream data; and causing display of the second video stream data.
  • a system for providing video content comprising a processor and a non-transitory storage medium storing instructions operative, when executed on the processor, to perform functions comprising: receiving, over a broadcast channel, first video stream data and metadata regarding the availability of a second video stream via a packet-switched network; sending, over the packet-switched network, in accordance with the metadata regarding the availability of a second video stream via a network received over the broadcast channel with the first video stream data, a request for the second video stream data; receiving, over the network, second video stream data; and causing display of the second video stream data.
  • the system may further comprise a display, wherein causing display of the second video stream data comprises displaying the second video stream data on the display.
  • there is a method of providing video content comprising: receiving, over a broadcast channel, a first video stream and metadata regarding the availability of a second video stream which may be requested via a network; displaying the first video stream; requesting the second video stream over the network using the received metadata; receiving the second video stream over the network; and causing display of the second video stream.
  • various elements of the described embodiments may be implemented as modules that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules.
  • a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation.
  • Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions may take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as media commonly referred to as RAM, ROM, etc.
  • Exemplary embodiments disclosed herein are implemented using one or more wired and/or wireless network nodes, such as a wireless transmit/receive unit (WTRU) or other network entity.
  • FIG. 16A is a diagram illustrating an example communications system 1600 in which one or more disclosed embodiments may be implemented.
  • the communications system 1600 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
  • the communications system 1600 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
  • the communications systems 1600 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
  • the communications system 1600 may include wireless transmit/receive units (WTRUs) 1602a, 1602b, 1602c, 1602d, a RAN 1604, a CN 1606, a public switched telephone network (PSTN) 1608, the Internet 1610, and other networks 1612, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
  • WTRUs 1602a, 1602b, 1602c, 1602d may be any type of device configured to operate and/or communicate in a wireless environment.
  • the WTRUs 1602a, 1602b, 1602c, 1602d may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in industrial and/or automated processing chain contexts), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like.
  • the communications systems 1600 may also include a base station 1614a and/or a base station 1614b.
  • Each of the base stations 1614a, 1614b may be any type of device configured to wirelessly interface with at least one of the WTRUs 1602a, 1602b, 1602c, 1602d to facilitate access to one or more communication networks, such as the CN 1606, the Internet 1610, and/or the other networks 1612.
  • the base stations 1614a, 1614b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 1614a, 1614b are each depicted as a single element, it will be appreciated that the base stations 1614a, 1614b may include any number of interconnected base stations and/or network elements.
  • the base station 1614a may be part of the RAN 1604, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc.
  • the base station 1614a and/or the base station 1614b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum.
  • a cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time.
  • the cell may further be divided into cell sectors. For example, the cell associated with the base station 1614a may be divided into three sectors.
  • the base station 1614a may include three transceivers, i.e., one for each sector of the cell.
  • the base station 1614a may employ multiple-input multiple-output (MIMO) technology and may utilize multiple transceivers for each sector of the cell.
  • beamforming may be used to transmit and/or receive signals in desired spatial directions.
  • the base stations 1614a, 1614b may communicate with one or more of the WTRUs 1602a, 1602b, 1602c, 1602d over an air interface 1616, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.).
  • the air interface 1616 may be established using any suitable radio access technology (RAT).
  • the communications system 1600 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
  • the base station 1614a in the RAN 1604 and the WTRUs 1602a, 1602b, 1602c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1616 using wideband CDMA (WCDMA).
  • WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
  • HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
  • the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1616 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
  • the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement a radio technology such as NR Radio Access, which may establish the air interface 1616 using New Radio (NR).
  • the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement multiple radio access technologies.
  • the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles.
  • the air interface utilized by WTRUs 1602a, 1602b, 1602c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
  • the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
  • the base station 1614b in FIG. 16A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like.
  • the base station 1614b and the WTRUs 1602c, 1602d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
  • the base station 1614b and the WTRUs 1602c, 1602d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
  • the base station 1614b and the WTRUs 1602c, 1602d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell.
  • the base station 1614b may have a direct connection to the Internet 1610.
  • the base station 1614b may not be required to access the Internet 1610 via the CN 1606.
  • the RAN 1604 may be in communication with the CN 1606, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 1602a, 1602b, 1602c, 1602d.
  • the data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, mobility requirements, and the like.
  • the CN 1606 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication.
  • the RAN 1604 and/or the CN 1606 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 1604 or a different RAT.
  • the CN 1606 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
  • the CN 1606 may also serve as a gateway for the WTRUs 1602a, 1602b, 1602c, 1602d to access the PSTN 1608, the Internet 1610, and/or the other networks 1612.
  • the PSTN 1608 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
  • the Internet 1610 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite.
  • the networks 1612 may include wired and/or wireless communications networks owned and/or operated by other service providers.
  • the networks 1612 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 1604 or a different RAT.
  • Some or all of the WTRUs 1602a, 1602b, 1602c, 1602d in the communications system 1600 may include multi-mode capabilities (e.g., the WTRUs 1602a, 1602b, 1602c, 1602d may include multiple transceivers for communicating with different wireless networks over different wireless links).
  • the WTRU 1602c shown in FIG. 16A may be configured to communicate with the base station 1614a, which may employ a cellular-based radio technology, and with the base station 1614b, which may employ an IEEE 802 radio technology.
  • FIG. 16B is a system diagram illustrating an example WTRU 1602.
  • the WTRU 1602 may include a processor 1618, a transceiver 1620, a transmit/receive element 1622, a speaker/microphone 1624, a keypad 1626, a display/touchpad 1628, non-removable memory 1630, removable memory 1632, a power source 1634, a global positioning system (GPS) chipset 1636, and/or other peripherals 1638, among others.
  • the processor 1618 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 1618 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1602 to operate in a wireless environment.
  • the processor 1618 may be coupled to the transceiver 1620, which may be coupled to the transmit/receive element 1622. While FIG. 16B depicts the processor 1618 and the transceiver 1620 as separate components, it will be appreciated that the processor 1618 and the transceiver 1620 may be integrated together in an electronic package or chip.
  • the transmit/receive element 1622 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1614a) over the air interface 1616.
  • the transmit/receive element 1622 may be an antenna configured to transmit and/or receive RF signals.
  • the transmit/receive element 1622 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example.
  • the transmit/receive element 1622 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 1622 may be configured to transmit and/or receive any combination of wireless signals.
  • the WTRU 1602 may include any number of transmit/receive elements 1622. More specifically, the WTRU 1602 may employ MIMO technology. Thus, in one embodiment, the WTRU 1602 may include two or more transmit/receive elements 1622 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1616.
  • the transceiver 1620 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1622 and to demodulate the signals that are received by the transmit/receive element 1622.
  • the WTRU 1602 may have multi-mode capabilities.
  • the transceiver 1620 may include multiple transceivers for enabling the WTRU 1602 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
  • the processor 1618 of the WTRU 1602 may be coupled to, and may receive user input data from, the speaker/microphone 1624, the keypad 1626, and/or the display/touchpad 1628 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
  • the processor 1618 may also output user data to the speaker/microphone 1624, the keypad 1626, and/or the display/touchpad 1628.
  • the processor 1618 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1630 and/or the removable memory 1632.
  • the non-removable memory 1630 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • the removable memory 1632 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • the processor 1618 may access information from, and store data in, memory that is not physically located on the WTRU 1602, such as on a server or a home computer (not shown).
  • the processor 1618 may receive power from the power source 1634, and may be configured to distribute and/or control the power to the other components in the WTRU 1602.
  • the power source 1634 may be any suitable device for powering the WTRU 1602.
  • the power source 1634 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
  • the processor 1618 may also be coupled to the GPS chipset 1636, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1602.
  • the WTRU 1602 may receive location information over the air interface 1616 from a base station (e.g., base stations 1614a, 1614b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1602 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • the processor 1618 may further be coupled to other peripherals 1638, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 1638 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like.
  • the peripherals 1638 may include one or more sensors; the sensors may be one or more of a gyroscope, an accelerometer, a Hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
  • the WTRU 1602 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and the downlink (e.g., for reception)) may be concurrent and/or simultaneous.
  • the full duplex radio may include an interference management unit to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 1618).
  • the WTRU 1602 may include a half-duplex radio for which transmission and reception of some or all of the signals are associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception).
  • Examples of computer-readable storage media include, but are not limited to, read only memory (ROM), random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
  • a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A normal broadcast video viewing experience may be augmented by providing access to enhanced views, such as zoomed or highlighted views of particular regions of interest, or partial or complete views of content with high resolution, high frame rate, high bit depth, or customized tone mapping. Such enhanced views, or zoom coded streams, may be made available over a source other than broadcast, such as a packet-switched network. Information, such as metadata, identifying the available zoom coded streams may be provided in-band in the broadcast video. A second video stream may be requested over the network using the received metadata. The second video stream may be received over the network and then displayed.

Description

SYSTEMS AND METHODS FOR HYBRID NETWORK DELIVERY OF OBJECTS OF INTEREST IN VIDEO
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application is a non-provisional filing of, and claims benefit under 35 U.S.C. §119(e) from, U.S. Provisional Patent Application Serial No. 62/383,366, filed September 2, 2016, entitled "SYSTEMS AND METHODS FOR HYBRID NETWORK DELIVERY OF OBJECTS OF INTEREST IN VIDEO", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] This disclosure relates to systems and methods for streaming video content. More specifically, this disclosure relates to systems and methods for streaming enhancement video content over an alternative network to a primary video and primary network.
BACKGROUND
[0003] In video broadcasting, displaying additional information on a video screen often improves an audience's viewing experience. For example, during a video broadcast of an American football game, the location of a first-down line may be displayed as a yellow line superimposed on the video broadcast at the location of the first-down line. Additionally, a football player's name and statistics may be displayed on the video broadcast when the video broadcast is displaying video of the football player.
SUMMARY
[0004] Systems and methods disclosed herein operate to enhance a normal broadcast video viewing experience by providing access to enhanced views, such as zoomed or highlighted views of particular regions of interest, or partial or complete views of content with high resolution, high frame rate, high bit depth, or customized tone mapping. In exemplary embodiments, such enhanced views, referred to herein as zoom coded streams, are made available over a source other than broadcast, such as a packet-switched network. Information identifying the available zoom coded streams may be provided in-band in the broadcast video. Identifying available streams in the broadcast video rather than providing the streams themselves in the broadcast video may provide advantages with respect to bandwidth consumption, as the zoom coded streams are generally sent to a client at the request of the client rather than on a continuous basis.
[0005] In an exemplary embodiment, interactive video content is provided over a combination of a broadcast network and a broadband network. A camera captures a high-resolution video of a scene, and the location of objects of interest in a video frame is determined, e.g., by collecting object tracking sensor data and camera positioning data and by fusing the data. A bounding box is calculated for the object tracked using the object position and a target resolution. Per-frame metadata is broadcast indicating the bounding box of an object in the broadcast video. A client receives an indication of a selected object (e.g., based on a user selection) and responsively signals to a broadband video server the selection of zoom stream and, in some embodiments, parameters of zoom. In some embodiments, the zoomed stream is constructed by the broadband server or other network entity. The zoomed stream is delivered to the client over the broadband network. The video streams from broadcast and broadband networks may be time synchronized for display.
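By way of non-limiting illustration only, the sketch below shows one way a client might time synchronize the broadcast stream and the broadband zoom stream before display, as mentioned at the end of the preceding paragraph. The 90 kHz media clock, the queue helpers, and the `display.show` interface are assumptions made for this example and are not required by the embodiments described herein.

```python
# Minimal sketch (assumed interfaces): align the broadband zoom stream with the
# broadcast stream by presentation timestamp (PTS) before compositing.

from collections import deque

CLOCK_HZ = 90_000            # assumed shared 90 kHz media clock for both streams
TOLERANCE = CLOCK_HZ // 30   # accept frames within roughly one frame period


class FrameQueue:
    """Holds decoded (pts, frame) pairs in presentation order."""

    def __init__(self):
        self.frames = deque()

    def push(self, pts, frame):
        self.frames.append((pts, frame))

    def pop_matching(self, target_pts):
        """Discard frames that are already too old, then return a frame whose
        PTS is within TOLERANCE of target_pts, or None if none has arrived."""
        while self.frames and self.frames[0][0] < target_pts - TOLERANCE:
            self.frames.popleft()
        if self.frames and abs(self.frames[0][0] - target_pts) <= TOLERANCE:
            return self.frames.popleft()[1]
        return None


def render_tick(broadcast_q, zoom_q, display):
    """Show the next broadcast frame; overlay a zoom frame only when a
    time-matched one has been received over the broadband network."""
    if not broadcast_q.frames:
        return
    pts, main_frame = broadcast_q.frames.popleft()
    display.show(main_frame, overlay=zoom_q.pop_matching(pts))  # hypothetical display API
```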
[0006] In some embodiments, visual information from a video stream is fused with object-in-space information obtained from other location information sources. The other location information sources may be in the form of a radio frequency tracking system, radio frequency identification (RFID) tags, GPS, WiFi Locating Systems, and the like. Coordinates of object-of-interest areas that surround objects of interest are determined based on the fused visual and object-in-space information and may be determined on a per-frame basis. In accordance with an embodiment, a method includes capturing, with a camera, a video frame of a scene; determining a camera orientation and camera location of the camera capturing the video; determining a location of an object of interest; mapping the location of the object of interest to a location on the video frame; and determining an object-of-interest area based on the location of the object of interest on the video frame.
[0007] In one embodiment, there is a method of providing video content. The method includes receiving, over a broadcast channel of a first network, a first video stream and metadata regarding the availability of a second video stream via a second network. The method includes causing display of the first video stream on a first display device. The method includes presenting the availability of at least the second video stream to a user, based on the metadata. The method includes receiving a selection by the user of the second video stream. The method includes requesting, responsive to the selection by the user, the second video stream over the second network using the received metadata. The method includes receiving the second video stream over the second network. The method includes causing display of the second video stream on the first display device.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIG. 1 is a schematic block diagram of an adaptive bit rate (ABR) video distribution system with zoom coding capabilities.
[0009] FIG. 2 illustrates an information flow diagram, in accordance with an embodiment.
[0010] FIG. 3 depicts a view of a playing field and a real-time location system, in accordance with an embodiment.
[0011] FIG. 4 illustrates a plurality of potential bounding boxes that may be selected from a frame of high-resolution video for zoomed display of an object of interest, in accordance with some embodiments.
[0012] FIG. 5 illustrates a plurality of potential bounding boxes that may be selected from a frame of high-resolution video for zoomed display of an object of interest, in accordance with some embodiments, in cases where the object of interest is near the edge of the frame of high-resolution video.
[0013] FIG. 6 illustrates a plurality of potential bounding boxes that may be selected from a frame of high-resolution video for zoomed display of an object of interest in accordance with some embodiments.
[0014] FIG. 7A depicts a view of a video frame, in accordance with an embodiment.
[0015] FIG. 7B depicts a view of the video frame with object-of-interest areas, in accordance with an embodiment.
[0016] FIG. 8 depicts a display device on which a video is being displayed, in accordance with an embodiment.
[0017] FIG. 9 is a first message flow diagram, in accordance with an embodiment.
[0018] FIG. 10 illustrates an exemplary system architecture of a hybrid network zoom coding system.
[0019] FIG. 11 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent using ATSC 3.0 ROUTE/DASH and the video content of a selected zoom coded stream is delivered to the client using DASH over HTTP.
[0020] FIG. 12 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent using MPUs and the video content of a selected zoom coded stream is delivered to the client over a broadband network using DASH over HTTP.
[0021] FIG. 13 is a schematic illustration providing an overview of an exemplary multi-network embodiment.
[0022] FIG. 14 is a message flow diagram illustrating operation of an exemplary hybrid network zoom coding system.
[0023] FIG. 15 is a flow diagram of providing video content, in accordance with an embodiment.
[0024] FIG. 16A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
[0025] FIG. 16B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 16A according to an embodiment.
DETAILED DESCRIPTION Distribution of Streaming Video Content.
[0026] An exemplary functional architecture of an adaptive bitrate video distribution system with zoom coding features is illustrated in FIG. 1. Traditionally, an input full-resolution stream (4K resolution, for example) which may be at a high bit depth may be processed and delivered at a lower resolution, such as high definition (HD), and/or lower bit depth, to an end consumer. In FIG. 1, traditional processing is represented in the components labeled "Traditional ABR Streams". Using traditional adaptive bit rate (ABR) coding, an adaptive bit rate encoder may produce ABR streams that are published to a streaming server, and the streaming server in turn delivers customized streams to end customers.
[0027] An exemplary zoom coding encoder, shown in the bottom part of the workflow in FIG. 1, receives the high-bit-depth input video stream and with a variety of techniques produces auxiliary video streams. These auxiliary video streams may include, for example, streams representing cropped and/or zoomed portions of the original video streams to which different tone maps may have been applied to different regions, or, as discussed in greater detail below, streams for different regions of interest. These auxiliary video streams may in turn be encoded using traditional ABR techniques. A user is presented with the choice of watching the normal program (delivered using ABR profiles) and in addition, zoom coded streams that may represent zoomed portions of the original program or other auxiliary streams relating to the original program. Once the user makes a choice to view a zoom coded stream, the client may request an appropriate stream from the streaming server. The streaming server may then deliver the appropriate stream to the end client.
[0028] The streaming server is configured to transmit a video stream over a network to different display devices. The network may be a local network, the Internet, or other similar network. The display devices include devices capable of displaying the video, such as a television, computer monitor, laptop, tablet, smartphone, projector, and the like. The video stream may pass through an intermediary device, such as a cable box, a smart video disc player, a dongle, or the like. The client devices may each remap the received video stream to best match the display and viewing conditions.
Overview of Exemplary Zoom Coding System.
[0029] Systems and methods described herein enable viewers of a video stream to selectively view supplemental data regarding objects of interest in the video stream. Metadata is provided to a video client device that identifies the supplemental data.
[0030] FIG. 2 is an information flow diagram, in accordance with an embodiment. In this embodiment, the object of interest is a professional athlete being filmed for a television broadcast during a sporting event. Real-time object location information is determined for each object of interest, for example, each athlete and the ball have real-time location information. The real-time object location information may be determined in any number of ways. Examples include placing a marker, such as an RFID tag, on each object of interest, outfitting each object of interest with a position determination system such as a GPS receiver, using local wireless rangers, and the like. Each player may be equipped with a unique marker that may be placed on any number of locations on the player, such as in a helmet, on a shoe, in a protective pad, or the like. The location information may also be determined or supplemented by video data. For example, optical recognition techniques may be used to identify labels or numbers on a player's uniform, the shape of the playing ball, object colors, object shapes, object movement characteristics, and the like. The object's locations are determined and mapped to location information for the venue, such as the stadium.
[0031] The camera information includes information and/or metadata on the location and orientation of the camera, which may include data such as the pan, tilt, and direction the camera is pointed, and video and audio streams. The camera information may also be supplemented with the camera's visual settings, such as its optical zoom level, focus settings, and the like. Based on the camera's orientation, the camera's field-of-view volume may be determined. This volume may then be mapped into a camera frame, and the player's location within the volume may be fused into the camera frame. The stream zoom video encoder then receives the fused camera information and object information, with video frames marked with object-of-interest metadata for every frame or every set number of frames. The object of interest metadata may indicate the position and/or size of an object of interest within the frame, as well as possibly providing identification for each object of interest. With information available on a frame-by-frame basis, a trajectory may be built for any object of interest. The field-of-view volume may also be based on the camera's focal settings. In such an embodiment, an object of interest may be in the line of sight of the camera, but based on the focal settings may be out of focus, and thus not included in a determination (described below) of different object-of-interest areas.
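As a non-limiting illustration of the fusion step just described, the following sketch projects a tracked object's world position into the camera's pixel grid. It assumes a pinhole camera model with a pose (R, t) derived from the camera's reported location and orientation and intrinsics derived from its optical zoom setting; the function name and parameters are placeholders for this example only.

```python
# Minimal sketch (assumed pinhole-camera model): map a tracked object's world
# position into pixel coordinates of the captured frame.

import numpy as np


def project_to_frame(p_world, R, t, fx, fy, cx, cy, width, height):
    """p_world: 3-vector world position from the location-tracking system.
    R, t: camera pose such that p_cam = R @ p_world + t (world -> camera).
    fx, fy, cx, cy: intrinsics derived from the camera's zoom/focal settings.
    Returns (u, v) pixel coordinates, or None if the point is behind the
    camera or outside the camera's field-of-view volume for this frame."""
    p_cam = R @ np.asarray(p_world, dtype=float) + t
    if p_cam[2] <= 0:              # behind the camera: not visible in this frame
        return None
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    if 0 <= u < width and 0 <= v < height:
        return (u, v)
    return None                    # outside the visible frame
```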
[0032] Based on the information flow of FIG. 2, location information from each object of interest is obtained and mapped into each video image captured. The identified objects can then be tracked through the video as they appear and disappear from the view of the camera.
[0033] FIG. 3 depicts a view of a playing field and a real-time location system, in accordance with an embodiment. In particular, FIG. 3 depicts the view 300 that includes the field 302, eight players 304, a ball 306, a video camera 308, and a real-time location system (RTLS) 310. The field 302 is shown as an American football field. The view 300 includes eight players 304, four on each side of the ball 306. Each of the eight players 304 and the ball 306 is equipped with an RFID tag that is configured to transmit a radio signal. The RTLS 310 receives the radio signals from the RFID tags and is able to determine a real-time location for each object based on the time-of-arrival (TOA) of each radio signal at the RTLS 310 receivers. The camera 308 records video and sound of the players 304 and the ball 306 on the field 302. The camera is also able to determine its position relative to a reference position and its orientation relative to a reference orientation. The camera may be equipped with a camera mount that is able to detect the pitch, roll, and translation of the camera with respect to reference coordinates. In other embodiments, the camera is equipped with an RFID tag, and the RTLS 310 determines the position and orientation of the camera in relation to reference positions and orientations. The camera 308 then transmits the video, audio, and camera location and orientation information to a fuse mapping service configured to fuse the camera-based information with the real-time location information of the different objects of interest.
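For illustration only, the sketch below estimates a tag's position from time-of-arrival measurements at several fixed RTLS receivers using a linear least-squares solution. It assumes the tag's transmit time is known; a deployed system might instead work from time differences of arrival, and the receiver geometry is arbitrary in this example.

```python
# Minimal sketch (assumed known transmit time): estimate a tag's 2D position
# from time-of-arrival measurements at three or more fixed RTLS receivers.

import numpy as np

C = 299_792_458.0  # assumed propagation speed of the RF signal, m/s


def locate_tag(receivers, toa, t_transmit):
    """receivers: (N, 2) array of receiver coordinates in metres (N >= 3).
    toa: length-N array of arrival times at each receiver, in seconds.
    t_transmit: transmit time of the tag's pulse, in seconds.
    Returns the least-squares (x, y) estimate of the tag position."""
    receivers = np.asarray(receivers, dtype=float)
    ranges = C * (np.asarray(toa, dtype=float) - t_transmit)  # distance to each receiver

    # Subtract the first receiver's range equation from the others so the
    # unknown position appears linearly:  2*(r0 - r_i) . p = d_i^2 - d_0^2 - |r_i|^2 + |r0|^2
    r0, d0 = receivers[0], ranges[0]
    A = 2.0 * (r0 - receivers[1:])
    b = (ranges[1:] ** 2 - d0 ** 2
         - np.sum(receivers[1:] ** 2, axis=1) + np.sum(r0 ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position
```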
[0034] In mapping the video locations, an orthographic projection may be developed using the methods disclosed in, for example, Sheikh, Y., et. al., Geodetic Alignment of Aerial Video Frames (2003). Similarly, in mapping the real-time location information, an orthographic projection may be made based on the identification of the tags (sensors), and their determined locations. Both the video-based and location -based orthographic projections may be fused onto any set of coordinates, such as GPS, Cartesian, polar, cylindrical, or any other set of coordinates suited to the environment. Various embodiments may extend such an approach to scenarios with multiple video cameras. In such embodiments, different views of player positions appearing in the field of view of specific cameras can be collected and made available in a consolidated manner. A video stream may be compiled from all available views from different cameras that are available.
[0035] While the view 300 depicts a football field with players equipped with RFID tags, the scenario may be modified. For example, the field 302 may be replaced with an automobile race track, the players 304 may be replaced with automobiles equipped with GPS location technology and be able to transmit a determined GPS location to a server for mapping the locations of the automobiles on the race track for fusion with a video of the race. Other scenarios may be likewise accommodated (e.g., a soccer game, a baseball game, a golf tournament, the filming of a movie set or a news program, etc.) in which one or more cameras and one or more people and/or objects of interest may be similarly outfitted in order to provide the camera information and the object information as described herein. A bounding box that surrounds an object of interest may be defined for each video frame in which the object of interest appears. The bounding box may be based on pixel coordinates of the video frame. With the coordinate position of each object of interest in the video frame identified, metadata is created to notify the client about the availability of zoom coded streams or other supplemental content regarding different objects of interest, as well as bounding box or object-of-interest area information for such objects of interest.
Determination of a Bounding Box.
[0036] FIG. 4 illustrates a plurality of potential bounding boxes that may be selected from a frame 500 of native-resolution video for zoomed display of an object of interest. The video frame is at a native source resolution of 6K. Various bounding boxes may be defined to select portions of the video frame that can be presented at a lower resolution. For example, one bounding box may be used to select a region with dimensions of 1280 x 720 pixels, which allows for presentation of video in high definition (HD). Another bounding box may be used to select a region with dimensions of 640 x 480 pixels, which allows for presentation of video in standard definition (SD).
The location of each corner of the different resolutions is depicted in Cartesian x and y coordinates.
For the HD bounding box, the bottom left corner is located at (x1, y1), the top left corner at (x1, y1 + 720), the top right corner at (x1 + 1280, y1 + 720), and the bottom right corner at (x1 + 1280, y1). Similarly, for the SD bounding box, the bottom left corner is located at (x2, y2), the top left corner at (x2, y2 + 480), the top right corner at (x2 + 640, y2 + 480), and the bottom right corner at (x2 + 640, y2). The position of each bounding box may be determined automatically based on real-time information regarding the position and orientation of the camera and the location of the object of interest. As shown in FIG. 4, the real-time location of the object of interest is determined to be at the black dot located at (a1, b1). The location of the object of interest may be determined by any real-time location service, and may be fused into the location of the video view. In one embodiment, a player may be known to be wearing an RFID tag near the waist. A bounding box may then be positioned, as in FIG. 4, such that the object of interest is at the center of the bounding box.
[0037] To select the position of an HD bounding box for the object of interest, x1 may be placed at a1 - 640 and y1 at b1 - 360. Similarly, to select the position of an SD bounding box for the object of interest, x2 may be placed at a1 - 320 and y2 at b1 - 240. Although only a single object of interest is shown in each of FIGs. 4, 5, and 6, in the more general case multiple objects of interest may be present, and some or all of the bounding boxes may contain or may overlap with other objects of interest in addition to the object of interest for which the bounding box was defined.
[0038] In some situations, it may be desirable to select a bounding box location that is not centered on the corresponding object of interest. For example, if a player wears an RFID tag on a helmet on his head, the bounding box may be positioned such that the location of the object of interest, point (a1, b1), is toward the top of the bounding box, leaving room for the player's body and legs to be in the middle portion and lower portion, respectively, of the region of interest frame when the player is vertical. In some embodiments, the orientation (standing, jumping, diving, etc.) may be determined for the player in selecting the position of the bounding box around the object of interest. Example methods for determining a player's orientation may include placing RFID sensors on the player's head and feet to determine two end-point locations of the player or correlating optical features of the video with the determined location, or the like.
[0039] Another situation in which it may be desirable to select a bounding box location that is not centered on the corresponding object of interest is illustrated in FIG. 5. In the situation depicted in FIG. 5, an object of interest is near the edge of the native-resolution video frame. The view depicted in FIG. 5 is similar to the view depicted in FIG. 4, except that the object of interest is located near the left edge of the frame. To accommodate such a situation, bounding boxes may be positioned such that, for example, the distance between the object of interest and the center of the bounding box is minimized, subject to the constraint that the bounding box is entirely within the native-resolution frame.
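A compact, non-limiting way to express the bounding-box placement and edge handling discussed in the preceding paragraphs is sketched below; the helper name, the centring-then-clamping policy, and the 6K frame dimensions used in the example are assumptions for illustration only.

```python
# Minimal sketch: position a fixed-size bounding box (e.g., 1280x720 for HD or
# 640x480 for SD) around the fused object location (a1, b1), keeping the box
# centred on the object where possible and entirely inside the native frame.

def bounding_box(a1, b1, box_w, box_h, frame_w, frame_h):
    """Returns (x1, y1) of the bottom-left corner of a box_w x box_h box.
    The box is centred on (a1, b1) unless the object is near a frame edge,
    in which case the box is shifted just enough to stay inside the frame."""
    x1 = a1 - box_w // 2          # e.g., a1 - 640 for an HD box
    y1 = b1 - box_h // 2          # e.g., b1 - 360 for an HD box
    x1 = max(0, min(x1, frame_w - box_w))
    y1 = max(0, min(y1, frame_h - box_h))
    return x1, y1


# Example: HD box around an object near the left edge of an assumed 6144x3160 frame.
print(bounding_box(a1=100, b1=1500, box_w=1280, box_h=720,
                   frame_w=6144, frame_h=3160))   # -> (0, 1140)
```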
[0040] FIG. 6 illustrates a frame of native-resolution video including the positions of an HD bounding box, an SD bounding box, and an object-of-interest area. The view depicted in FIG. 6 is similar to the view depicted in FIG. 4, with a location of the object-of-interest area added within the bounding boxes. Based on the location of the object of interest, an object-of-interest area is determined, as represented by the ellipse. Although the object-of-interest area is depicted as an ellipse, it may have any shape, such as a square, a rectangle, a circle, an arbitrary object shape outline, and the like.
[0041] In some embodiments, a user is provided with an option of selecting content within a particular bounding box. In response to a user selection, the video content within the bounding box is delivered as a separate stream to the user. For example, the user may select one or more objects of interest, and in response a streaming server may deliver the video content (e.g., an enhanced version of the video content) within a bounding box which contains the one or more objects of interest. One or more objects of interest may be positioned within the bounding box. In response to a user selection of a bounding box area, supplemental data such as a zoom coded video stream may be provided to the client.
[0042] FIG. 7A depicts a view of a video frame, in accordance with an embodiment. In particular, FIG. 7A depicts the view 800. The view 800 is a view of a video frame from the perspective of the camera 308 of FIG. 3. The view 800 includes portions of the field 302, four players on the left (304A-D), the ball 306 in the center, and four players on the right (304E-H). A real-time location system determines the location of each of the players 304A-H and the ball 306. The physical locations are fused with the locations of the objects in the video frame.
[0043] FIG. 7B depicts a view of the video frame with object-of-interest areas, in accordance with an embodiment. In particular, FIG. 7B depicts the view 810. The objects present in view 810 include the objects of the view 800 of FIG. 7A. Also depicted in the view 810 are object-of-interest areas. Each of the objects of interest has a determined object-of-interest area, as depicted by the rectangles 804A-H around the players 304A-H and the rectangle 806 around the ball 306. The object-of-interest areas may indicate areas that are preferably not obstructed when supplemental data is displayed (e.g., overlaid) on the video frame.
[0044] FIG. 8 depicts a display device (e.g., a television or computer monitor) on which a video is being displayed, in accordance with an embodiment. In particular, FIG. 8 depicts a display device 830 that depicts the view 800 of the football game shown in FIG. 7A, with supplemental data 832 displayed. In this embodiment, a user has requested supplemental data related to the player 304H (the right-most player as illustrated in FIG. 7A) to be displayed. In some embodiments, the indicator 824H is displayed linking the text of the supplemental data with the location of the object of interest. In some embodiments, other types of supplemental data may be displayed beyond text, such as image or video content related to the object of interest, a highlighting overlay of the object of interest, and/or the like. In some embodiments, determining the bounding boxes may be performed over several frames.
Video Delivery Method.
[0045] FIG. 9 is a first message flow diagram, in accordance with a network-based system for delivering zoom coded content. In particular, FIG. 9 illustrates the operation of an exemplary video delivery system, depicting communications between a content source, a fuse mapper, an encoder, a transport packager, a server (e.g., an origin server, an edge streaming server, or other streaming server), a web server, and a client device. Exemplary embodiments disclosed herein employ alternative techniques in which at least a portion of the video data provided to a client device is provided over a broadcast transmission.
[0046] The content source transmits a compressed or uncompressed media stream of the source media (such as a video stream) to a fuse mapper. Additionally, location information (e.g., object location and/or bounding box information, such as RTLS location data) associated with objects of interest is also transmitted to the fuse mapper. Location information is added to each frame for the objects of interest. The fuse mapper then transmits the fused video and location information to the encoder at a high bit depth. The locations of the object-of-interest areas are included in the transmission to the encoder. The encoder may separately create ABR streams with default tone mappings and in some embodiments ABR streams with alternative tone remappings in various regions of interest (or combinations thereof). The various ABR streams with both the fused location and video information are transmitted to a transport packager. The transport packager may segment the files and make the files available via an ftp or http download and may prepare a manifest.
[0047] Note that the content preparation entities and steps shown in FIG. 9 are by way of example and should not be taken as limiting. Variations are possible, for example entities may be combined at the same location or into the same physical device. Also, the segmented media content may be delivered to an origin server, to one or multiple edge streaming servers, to a combination of these server types, or any suitable media server from which a media client may request the media content. A manifest file (e.g., a DASH MPD) that describes the media content may be prepared in advance and delivered to the streaming server (e.g., the origin server and/or the edge streaming server), or the manifest file may be generated dynamically in response to a client request for the manifest file.
[0048] A client device may transmit a signal to a web server requesting to download the media content and may receive a streaming server redirect signal. The client device may request a manifest which describes the available content files (e.g., media segment files). The request may be sent from the client to a server. The server (e.g., origin server or an edge streaming server) may deliver the manifest file in response to the client request. The manifest may indicate availability of the various ABR streams, region-of-interest areas, bounding box areas, supplemental data or metadata for the various objects of interest, and the like.
[0049] Initially, the client may request a default stream from a streaming server, and the streaming server may responsively transmit the default stream (e.g., media segments of that default stream) to the client device. The client device may display the default stream. In alternative embodiments as described in further detail below, the default stream may be provided to the client device using a broadcast transmission, and the broadcast transmission may further include information regarding the location and/or content of supplemental streams that can be retrieved over the non-broadcast network.
[0050] The client device may detect a cue to request streams associated with particular bounding boxes or objects of interest. For example, the cue may be user input wherein the user selects a player or other object of interest associated with a bounding box.
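Purely as an illustration of the cue handling described above, the sketch below requests the zoom coded stream that corresponds to a selected object of interest over the broadband network. The manifest layout and the `http_get`/`player` helpers are hypothetical placeholders rather than parts of any particular player or streaming API.

```python
# Minimal sketch (hypothetical client helpers): when the user selects an object
# of interest, look up the matching zoom coded stream in the received metadata
# and request it over the broadband (packet-switched) network.

def on_object_selected(object_id, manifest, http_get, player):
    """manifest: parsed metadata listing available zoom coded streams, assumed
    to map object IDs to stream descriptors carrying a 'url' field.
    http_get: callable that issues an HTTP GET over the broadband network.
    player: component that decodes and displays a stream alongside the
    broadcast video (e.g., picture-in-picture), time synchronized."""
    stream = manifest.get("zoom_streams", {}).get(object_id)
    if stream is None:
        return  # no zoom coded stream advertised for this object; do nothing
    segments = http_get(stream["url"])            # request over broadband
    player.play_secondary(segments, sync_to="broadcast")
```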
[0051] In some embodiments, the fuse mapper may include location information of all objects of interest in or at a venue. However, only the relevant location information corresponding to the camera image is laid on top of the pixel coordinates on a frame by frame basis. The fuse mapper thus outputs both video data and coordinates of the objects identified by the real-time location system in the form of per-frame metadata that identifies the object in the camera pixel domain. The coordinates of the objects of interest and the related region-of-interest areas are updated as the object locations are updated.
[0052] In some embodiments, the system may also determine open spaces within the video frames, which may be used as display locations for supplemental streams or overlays, and may transmit these open areas to a client device. In some embodiments, the open spaces may be determined using methods such as those described in international patent application PCT/US 17/43248, entitled "Systems and Methods for Integrating and Delivering Objects of Interest in Video", filed July 21, 2017, or U.S. Provisional Patent Application No. 62/365,868, filed July 22, 2016, entitled "Systems and Methods for Integrating and Delivering Objects of Interest in Video", each of which is hereby incorporated by reference in its entirety. For example, a clear area may be identified that is outside of an object-of-interest area but within a bounding box. The clear area may be defined using pixel coordinates of the respective video frame. Locating the clear areas outside of all of the object-of-interest areas may prevent the displayed supplemental data or views from obstructing the object-of-interest areas (such as players, a ball, etc.). A viewer can select supplemental streams associated with different objects of interest, after which a client device can receive and/or retrieve supplemental data associated with the selected overlay of interest and display video with the supplemental data being displayed, in some instances within the determined open area(s) (but not necessarily).
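One simple, non-limiting way to locate such a clear area is a grid scan of candidate overlay positions inside the bounding box, rejecting any candidate that overlaps an object-of-interest area; the rectangle representation and the step size below are illustrative choices only.

```python
# Minimal sketch: find a clear rectangle for overlaying supplemental data that
# lies inside a bounding box but outside every object-of-interest area.
# Rectangles are (x, y, w, h) in frame pixel coordinates.

def overlaps(a, b):
    """True if axis-aligned rectangles a and b intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def find_clear_area(bounding_box, object_areas, overlay_w, overlay_h, step=16):
    """Scan candidate positions inside bounding_box and return the first
    overlay_w x overlay_h rectangle that avoids all object-of-interest areas."""
    bx, by, bw, bh = bounding_box
    for y in range(by, by + bh - overlay_h + 1, step):
        for x in range(bx, bx + bw - overlay_w + 1, step):
            candidate = (x, y, overlay_w, overlay_h)
            if not any(overlaps(candidate, area) for area in object_areas):
                return candidate      # first unobstructed spot found
    return None                       # no clear area of this size exists
```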
Object of Interest Tracking Metadata.
[0053] In some embodiments, the result of mapping the real-time locations of an object of interest to the camera pixel 2D image space is a (x, y) pixel position in the camera 2D image. The (x, y) pixel position moves according to the location-tracking results of the objects of interest, for example, as the players move across the field. The size of the object-of-interest area that contains the object may also be determined by the fusion mapper based on the camera parameters: zoom setting, camera angle, camera location, and the like. The camera focus and zoom may change dynamically as the camera operator follows the action. In some cases, the camera can also be moving, for example with a handheld or aerial camera. The camera may further include an RFID tag, and the real-time location service may determine the camera's location.
[0054] The client, or viewer, may request metadata and video streams for different scenarios. In one scenario, the client requests overlay of highlights on tracked objects during video playback of the full field view. The client may use the metadata to request specific zoom streams from the server. In some embodiments, the supplemental data requested by the user may be other than an alternative or zoomed view, but may include additional overlays such as graphical information displays (e.g., player stats, etc.), related content (such as images or video), and/or the like.
[0055] Embodiments disclosed herein may be employed in an MPEG-DASH ABR video distribution system. Metadata such as the identity and location of objects of interest, and supplemental information regarding the objects of interest, may be contained in the DASH MPD (or other streaming protocol manifest) and/or in the user data of the video elementary stream to provide frame-by-frame updates. A client device may first read the manifest at start-up for initial object information, and then continually update the object track information by parsing the video frame elementary data. Object-of-interest metadata may be available in-band or from a separate metadata file and may be used to update the user interface or other components related to selection and display of objects of interest and associated supplemental data.
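A minimal client-side sketch of this update flow is shown below, assuming the manifest yields an initial object list and the video elementary stream carries per-frame records; the record field names are placeholders rather than the syntax of any particular manifest or elementary-stream format.

```python
# Sketch: read the object list once at start-up, then keep it current from the
# per-frame metadata carried in-band with the video.

class ObjectTracker:
    def __init__(self, manifest_objects):
        # manifest_objects: {object_id: initial metadata dict} from the manifest
        self.objects = dict(manifest_objects)

    def apply_frame_update(self, frame_metadata):
        """frame_metadata: list of per-frame records such as
        {'id': 7, 'x': 512, 'y': 300, 'w': 96, 'h': 160, 'user_data': '...'}"""
        for rec in frame_metadata:
            self.objects.setdefault(rec['id'], {}).update(rec)

    def selectable_objects(self):
        return list(self.objects.values())

tracker = ObjectTracker({7: {'id': 7, 'label': 'Player 7'}})
tracker.apply_frame_update([{'id': 7, 'x': 512, 'y': 300, 'w': 96, 'h': 160}])
print(tracker.selectable_objects())
```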
[0056] The following parameters, among others, may be conveyed in exemplary embodiments:
Num_Objects: Range 0-255; defines the number of objects to be defined in the current list. If Num_Objects is greater than zero, then the following parameters are provided for each of the Num_Objects objects, each of which may pertain to an object of interest.
Object_ID: Range 0-255. This syntax element provides a unique identifier for each object of interest.
Object_x_position[n]: For each object ID n, the x position of the object-of-interest area.
Object_y_position[n]: For each object ID n, the y position of the object-of-interest area.
Object_x_size[n]: For each object ID n, the x dimension of the object-of-interest area.
Object_y_size[n]: For each object ID n, the y dimension of the object-of-interest area.
Object_UserData[n]: For each object ID n, user data that can be included to be used by the client to present user-interface selection criteria for the object.
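Purely as an illustration of how the parameters listed above might be carried per frame, the following sketch defines a record type and parses a hypothetical byte layout (one byte for counts and identifiers, two bytes per coordinate, length-prefixed user data). The byte layout is an assumption for the example; the disclosure above defines only the fields and their ranges.

```python
# Illustrative per-frame object-of-interest record and a parser for a
# hypothetical byte layout; not the syntax of any particular standard.

import struct
from dataclasses import dataclass

@dataclass
class ObjectOfInterest:
    object_id: int          # Object_ID, 0-255
    x_position: int         # Object_x_position[n], pixels
    y_position: int         # Object_y_position[n], pixels
    x_size: int             # Object_x_size[n], pixels
    y_size: int             # Object_y_size[n], pixels
    user_data: bytes        # Object_UserData[n], shown to the user on selection

def parse_object_list(payload: bytes):
    num_objects = payload[0]            # Num_Objects, 0-255
    objs, offset = [], 1
    for _ in range(num_objects):
        object_id = payload[offset]
        x, y, w, h = struct.unpack_from('>HHHH', payload, offset + 1)
        data_len = payload[offset + 9]
        user_data = payload[offset + 10:offset + 10 + data_len]
        objs.append(ObjectOfInterest(object_id, x, y, w, h, user_data))
        offset += 10 + data_len
    return objs

sample = bytes([1, 5]) + struct.pack('>HHHH', 640, 200, 128, 256) + bytes([4]) + b'LW10'
print(parse_object_list(sample))
```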
[0057] Object x,y position and size may be provided in pixel units that correspond to the first-listed representation in the appropriate adaptation set. For secondary representations (if they exist), the object x,y size and position values are scaled to the secondary representation's picture dimensions with a linear scale factor.
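The linear scaling rule of paragraph [0057] can be sketched as follows, assuming object coordinates given in pixels of the first-listed representation:

```python
# Scale an object-of-interest area from the first-listed representation's
# picture dimensions to a secondary representation's picture dimensions.

def scale_object(obj, primary_dims, secondary_dims):
    sx = secondary_dims[0] / primary_dims[0]
    sy = secondary_dims[1] / primary_dims[1]
    return {
        'x': round(obj['x'] * sx), 'y': round(obj['y'] * sy),
        'w': round(obj['w'] * sx), 'h': round(obj['h'] * sy),
    }

# 1920x1080 primary representation scaled down to a 1280x720 secondary one.
print(scale_object({'x': 640, 'y': 200, 'w': 128, 'h': 256}, (1920, 1080), (1280, 720)))
```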
[0058] In some embodiments, the parameters may also include information identifying open areas related to objects of interest, which may be used for locating a second stream to be displayed in relation to the first stream.
[0059] A client device that receives an MPD or in-band data with the above-identified information can represent the object of interest on the user interface in a variety of ways. The SupplementalProperty of an adaptation set indicates to the client how many objects of interest are available. Object_UserData may be used by the client to display information describing aspects of the object of interest. For example, in a sports game this can be specific player information.
Exemplary Methods for Determination of Object-of-interest Area.
[0060] In exemplary embodiments, a camera captures video, which may be at high resolution, of a scene. The camera may also be configured to detect its location, pan, tilt, optical zoom, focal settings, and the like. The location of an object of interest may be determined in the video frame. An object of interest's location may be determined by a real-time location tracking system, such as RFID, and based on the determined locations of the object of interest and the camera's position, the location of each object of interest may be determined within the video frame. An object-of-interest area may be determined within the video frame for each object of interest. The object-of-interest area may be defined by a set of pixels of the video frame.
[0061] Metadata related to video content may be received at the client device. The metadata may be sent, for example, in a broadcast video transmission. The metadata may include, for example, information identifying object-of-interest areas (e.g., providing coordinates of object-of-interest areas, or providing information identifying a resource through which coordinates of object-of-interest areas may be obtained).
Exemplary Multi-Network Systems and Methods.
[0062] As described above, zoom coded streams offer enhanced viewing experiences to end users, with zoom coded streams being generated from, for example, high resolution, high frame-rate, and/or high bit depth videos of content. For example, zoom coding may be used as a way to generate additional or alternate views for the end user.
[0063] A traditional broadcast network encodes live content and delivers the encoded streams over a variety of wired and wireless networks. Broadcasting live content is an efficient way to deliver content in a scalable manner to a wide customer base. One common mechanism for delivering digital video broadcast services is standard MPEG-2 transport stream packaging. An alternative is to use an MP4 packaging format. However, the broadcast network is a one-way network. Embodiments disclosed herein make use of an alternative network, such as a packet-switched network, to deliver more interactive content. As an example, combining a broadcast network carrying content with a two-way IP network can allow content to be available from both networks simultaneously.
[0064] Mechanisms are described herein for delivering zoom coded content (e.g., enhanced views of one or more selected objects of interest) over an alternative network that is different from a network over which primary content is being delivered. In some embodiments, rather than zoom coded content, any secondary content may be delivered over the alternative network. For example, alternative overlays may be delivered over the alternative network, not just a zoomed or enhanced view of primary content.
[0065] In an exemplary embodiment, a primary video stream is broadcast over the air while streams providing customized zoom experiences are delivered over broadband connections. In some embodiments, the primary video stream, along with data identifying one or more available zoom coded streams, is transmitted using the ATSC 3.0 standard to enable user customization and zoom coding functionality.
[0066] An exemplary embodiment is illustrated in FIG. 10. As shown in FIG. 10, content may be encoded and sent over a broadcast network (such as a Terrestrial Broadcast, microwave, satellite or fiber system) to be received by a broadcast receiver. Information identifying one or more zoom coded streams may be embedded in the Transport Stream (or MP4) of the broadcast system in the private data section. Such information may include, for example, information on the location (e.g., URL for the manifest) and/or availability (e.g., what streams are available and with what characteristics) of zoom coded streams. In order to pinpoint the exact time location of the alternative streams, the information in the broadcast stream that identifies the availability of zoom coded streams may include an embedded timestamp that is indicative of the time availability of the zoom coded streams. The information indicating the availability of zoom coded streams may be sent periodically, since the point where an end user tunes into the channel and the location of the indicators may not line up. Various mechanisms known to those of skill in the art may be used to carry such private data within a digital broadcast system to a receiving client.
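One hypothetical encoding of such an announcement, and a schedule for repeating it periodically in the broadcast multiplex, is sketched below; the JSON encoding, field names, and two-second repetition period are illustrative assumptions rather than part of any transport-stream syntax.

```python
# Hypothetical private-data announcement: a manifest URL plus a timestamp for
# when the zoom coded streams become available, repeated so that a receiver
# tuning in mid-programme can still discover them.

import json
import time

def build_announcement(manifest_url, availability_utc, stream_descriptions):
    return json.dumps({
        'manifest_url': manifest_url,                # where the manifest can be fetched
        'availability_time_utc': availability_utc,   # embedded timestamp
        'streams': stream_descriptions,              # e.g., object id / resolution per stream
    }).encode('utf-8')

def announcement_schedule(start_utc, duration_s, period_s=2.0):
    """Times at which the announcement is (re)inserted into the broadcast mux."""
    t = start_utc
    while t < start_utc + duration_s:
        yield t
        t += period_s

payload = build_announcement(
    'https://example.net/zoom/manifest.mpd',
    time.time() + 5.0,
    [{'object_id': 7, 'width': 1920, 'height': 1080}],
)
print(len(payload), list(announcement_schedule(0.0, 10.0)))
```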
[0067] Once this private data indicating the availability of one or more zoom coded streams (e.g., an MPD) is received and interpreted by a client device, the client device may display for an end user information on the availability of zoom coded streams at the appropriate time. If a user requests a specific zoom coded stream to be played, a network interface of the client device may initiate streaming with a server on which the zoom coded stream is available and may cause display of the streamed content (e.g., on a built-in or external monitor). In response to reaching the end of streamed content (or earlier, if requested by a user), the client device may automatically switch back to display of the live broadcast video.
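A minimal sketch of this switching behavior is given below; the player object is a placeholder for the client device's actual decode and display components.

```python
# Sketch of the playback switching in paragraph [0067]: show the live broadcast,
# switch to a requested zoom coded stream on user selection, and fall back to
# the broadcast when the zoom stream ends or the user cancels.

class HybridPlayer:
    def __init__(self):
        self.mode = 'broadcast'
        self.zoom_url = None

    def on_user_selects_zoom(self, manifest_url):
        self.zoom_url = manifest_url
        self.mode = 'zoom'          # start streaming over the broadband interface

    def on_zoom_stream_ended(self):
        self.switch_back()

    def on_user_cancels(self):
        self.switch_back()

    def switch_back(self):
        self.zoom_url = None
        self.mode = 'broadcast'     # automatically resume the live broadcast view

player = HybridPlayer()
player.on_user_selects_zoom('https://example.net/zoom/manifest.mpd')
player.on_zoom_stream_ended()
print(player.mode)
```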
[0068] In an exemplary embodiment, a hybrid system makes use of ATSC 3.0 broadcast video and a separate broadband data network. In ATSC 3.0, there are several tools and mechanisms available for hybrid broadcast and broadband delivery of content. These mechanisms are tailored to deliver both real-time programs and non-real-time, file-based content.
ROUTE/DASH over Broadcast with DASH over Broadband.
[0069] In one exemplary embodiment, the availability of zoom coded content may be indicated in a broadcast transmission by sending a DASH MPD or other manifest using the Real-time Object delivery over Unidirectional Transport (ROUTE) protocol, as described by ATSC for delivering DASH-format and non-real-time content over broadcast channels. ROUTE is derived from FLUTE (File Delivery over Unidirectional Transport, IETF RFC 6726). Using the ROUTE/DASH mechanism, various techniques may be used to handle the delivery of information identifying the zoom coded streams. In some embodiments, the normal broadcast stream is delivered over broadcast using ROUTE, and the zoom coded streams are delivered over broadband as DASH segments via HTTP on the broadband channel.
[0070] In exemplary embodiments, the broadcast and broadband paths are delivered so as to achieve time alignment at a receiver according to the receiver buffer model. In some embodiments, this is accomplished by delaying the broadcast path sufficiently to ensure that the broadband path can be synchronized with the broadcast path. Support for handover from the broadcast pipe to the broadband pipe to receive the zoom coded stream may be indicated by the MPD fragment in the Service Layer Signaling. FIG. 11 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent using ATSC 3.0 ROUTE/DASH and the video content of a selected zoom coded stream is delivered to the client using DASH over HTTP.
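By way of example only, the sketch below computes a broadcast-path delay large enough to cover a worst-case broadband segment fetch and maps media time to a common wall-clock presentation time; the latency figures and the simple additive model are assumptions, not the receiver buffer model itself.

```python
# Sketch of the alignment idea: delay the broadcast path enough that a broadband
# segment can be requested and downloaded before its presentation time.

def required_broadcast_delay(broadband_latency_s, segment_duration_s, safety_margin_s=0.5):
    # Worst case: one full segment must be fetched over broadband in time.
    return broadband_latency_s + segment_duration_s + safety_margin_s

def presentation_time(media_time_s, anchor_utc_s, path_delay_s):
    """Map media time to wall-clock presentation time for one delivery path."""
    return anchor_utc_s + media_time_s + path_delay_s

delay = required_broadcast_delay(broadband_latency_s=1.2, segment_duration_s=2.0)
print(delay, presentation_time(10.0, 1_700_000_000.0, delay))
```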
MMT/MPU over Broadcast with DASH over Broadband.
[0071] In some embodiments, media data are encapsulated into Media Processing Units (MPUs) that are packetized into MPEG Media Transport Protocol (MMTP) packets. In such embodiments, both the transmitter and receiver systems may be locked to UTC to maintain synchronization. The zoom coded streams may be presented over broadband using DASH. All of the streams being sent over broadcast or broadband may be locked to UTC to help with alignment/synchronization. In some embodiments, techniques for coordinating broadcast and broadband communications include those described in ATSC Document S33-174, "ATSC Candidate Standard for Signaling, Delivery, Synchronization and Error Protection," January 2016. FIG. 12 illustrates an exemplary system architecture of a hybrid network zoom coding system in which information identifying the availability of zoom coded streams is sent over the broadcast using MPUs and the video content of a selected zoom coded stream is delivered to the client over a broadband network using DASH over HTTP.
Exemplary User Interface.
[0072] An exemplary user interface is illustrated in FIG. 13. As shown in FIG. 13, an unzoomed view 1305 of a sporting event is provided in a live broadcast stream (for example, transmitted from an antenna 1310 and received at a user antenna 1315) that is displayed 1340 on a television, monitor, or other display device 1335. The broadcast includes data identifying the availability of one or multiple zoom coded streams. The zoom coded streams may represent zoomed or otherwise enhanced views of sports players or other objects of interest visible in the live broadcast stream. In response to this data, a user interface element may be provided on the display 1335. The user interface element may allow selection of at least one of the zoom coded streams. In the embodiment of FIG. 13, the user interface element may be a still image or a video (e.g., a low-resolution video) provided in a picture-in-picture arrangement. Other depictions of the user interface element are possible; for example, the user interface element may highlight a sports player or other object of interest in the main video view as rendered from the live broadcast stream 1305, where an associated zoom coded stream 1320 is available (such as at a server 1325) for that sports player or object of interest. Upon user selection of the user interface element, the client device 1335 retrieves the requested stream 1320 (e.g., a zoom coded stream providing a zoomed or otherwise enhanced view of the sports player or other object of interest) over a broadband connection (such as through a modem 1330). In the example of FIG. 13, the zoom coded video 1320 is a high-resolution zoomed video of a particular region of interest in the sporting event. The requested stream 1320 may be decoded and displayed to the user, for example as a full-screen zoomed view or as a picture-in-picture arrangement 1345 overlaid on the main view 1340.
Exemplary Message Flow.
[0073] FIG. 14 illustrates an exemplary flow of information among components in some embodiments. A high-resolution camera may obtain a video to be broadcast, and a broadcast encoder encodes the video in an appropriate broadcast format. Sensor information (e.g., RFID information) regarding the location of objects of interest may be collected and encoded as a position and/or a bounding box within the video image. Information on the location and/or extent of objects of interest may be encoded in the broadcast data and may be used, for example by the client device, to highlight the object of interest while displaying the ordinary broadcast video. Based on the locations of one or more objects of interest within the broadcast video, a broadband server (or other network entity) may generate a video representing a zoomed version of the respective object of interest. These videos may be generated from a high-resolution (e.g., native-resolution) version of the video to produce a high-quality zoomed video. In response to a user selection of a region of interest identified in broadcast data, a user client may retrieve a zoom coded video corresponding to that object of interest from a broadband server and subsequently cause the zoom coded video to be displayed. The zoom coded video may be displayed as a full-screen zoom of the selected object or region of interest, or the zoom coded video may be displayed together with the main video view from the broadcast signal. The broadcast client and the broadband client may be within a user display device (e.g., a connected television, a tablet device, a mobile phone, a gaming device, a media player device, and/or the like).
[0074] In some embodiments, rather than a zoomed view being delivered over the broadband network, any secondary content may be delivered over the broadband network. For example, alternative overlays may be delivered over the broadband network, not just a zoomed or enhanced view of broadcast video.
[0075] FIG. 15 is a flow diagram of providing video content, in accordance with an embodiment. As illustrated in FIG. 15, there may be a method 1500. The method may include a step 1505 of receiving, over a broadcast channel of a first network, a first video stream and metadata regarding the availability of a second video stream which may be requested via a second network. The method may include a step 1510 of causing display of the first video stream on a first display device. The method may include a step 1515 of presenting the availability of at least the second video stream to a user, based on the metadata. The method may include a step 1520 of receiving a selection by the user of the second video stream. The method may include a step 1525 of requesting, responsive to the selection by the user, the second video stream over the second network using the received metadata. The method may include a step 1530 of receiving the second video stream over the second network. The method may include a step 1535 of causing display of the second video stream on the first display device.
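A structural sketch of method 1500 is given below, with each call corresponding to one of the numbered steps; the Display and UI classes and the broadband fetch function are trivial stand-ins rather than a real receiver API.

```python
# Structural sketch of method 1500; placeholder classes stand in for the
# receiver's actual display, user-interface, and broadband components.

class Display:
    def show(self, stream): print('showing', stream)

class UI:
    def __init__(self, choice): self.choice = choice
    def present_available_streams(self, metadata): print('available:', list(metadata))
    def wait_for_selection(self): return self.choice

def method_1500(broadcast_stream, metadata, fetch_over_broadband, display, ui):
    display.show(broadcast_stream)             # 1505/1510: receive and display the first video stream
    ui.present_available_streams(metadata)     # 1515: present the availability of the second stream
    selection = ui.wait_for_selection()        # 1520: receive the user's selection
    url = metadata[selection]                  # 1525: request using the received metadata
    second_stream = fetch_over_broadband(url)  # 1530: receive over the second network
    display.show(second_stream)                # 1535: display the second video stream

method_1500('live broadcast', {'player 7 zoom': 'https://example.net/zoom.mpd'},
            lambda url: f'zoom stream from {url}', Display(), UI('player 7 zoom'))
```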
Further Embodiments.
[0076] In an embodiment, there is a method of providing video content, comprising: receiving, over a broadcast channel, first video stream data and metadata regarding the availability of a second video stream via a packet-switched network; sending, over the packet-switched network, in accordance with the metadata regarding the availability of a second video stream via a network received over the broadcast channel with the first video stream data, a request for the second video stream data; receiving, over the network, second video stream data; and causing display of the second video stream data. The method may also include wherein the display of the first video stream data and second video stream data are time synchronized. The method may also include wherein the broadcast channel is an ATSC channel or a cable QAM channel. The method may also include wherein the packet-switched network is an IP network. The method may also include wherein the packet-switched network includes a cellular data connection. The method may also include wherein the metadata regarding the availability of a second video stream identifies an object of interest, and wherein the second video stream is an enhanced view of the object of interest. The method may also include wherein the metadata regarding the availability of a second video stream identifies an object of interest, and wherein the request is sent in response to user selection of the object of interest. The method may also include wherein the metadata regarding the availability of a second video stream comprises coordinates of an object-of-interest area. The method may also include wherein causing display of the second video stream comprises causing display of the second video stream simultaneously with display of the first video stream. The method may also include wherein causing display of the second video stream comprises causing display of the second video stream in a picture-in-picture relationship with the first video stream. The method may also include wherein the metadata regarding the availability of a second video stream identifies an object of interest, and wherein the second video stream is a zoomed view of the object of interest.
[0077] In an embodiment, there is a system for providing video content, the system being operative to perform functions comprising: receiving, over a broadcast channel, first video stream data and metadata regarding the availability of a second video stream via a packet-switched network; sending, over the packet-switched network, in accordance with the metadata regarding the availability of a second video stream via a network received over the broadcast channel with the first video stream data, a request for the second video stream data; receiving, over the network, second video stream data; and causing display of the second video stream data.
[0078] In an embodiment, there is a system for providing video content, the system comprising a processor and a non-transitory storage medium storing instructions operative, when executed on the processor, to perform functions comprising: receiving, over a broadcast channel, first video stream data and metadata regarding the availability of a second video stream via a packet-switched network; sending, over the packet-switched network, in accordance with the metadata regarding the availability of a second video stream via a network received over the broadcast channel with the first video stream data, a request for the second video stream data; receiving, over the network, second video stream data; and causing display of the second video stream data. The system may further comprise a display, wherein causing display of the second video stream data comprises displaying the second video stream data on the display.
[0079] In an embodiment, there is a method of providing video content, comprising: receiving, over a broadcast channel, a first video stream and metadata regarding the availability of a second video stream which may be requested via a network; displaying the first video stream; requesting the second video stream over the network using the received metadata; receiving the second video stream over the network; and causing display of the second video stream.
Exemplary Client and Server Hardware.
[0080] Note that various hardware elements of one or more of the described embodiments are referred to as "modules" that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation. Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions may take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM, ROM, etc.
[0081] Exemplary embodiments disclosed herein are implemented using one or more wired and/or wireless network nodes, such as a wireless transmit/receive unit (WTRU) or other network entity.
[0082] FIG. 16A is a diagram illustrating an example communications system 1600 in which one or more disclosed embodiments may be implemented. The communications system 1600 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communications system 1600 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communications system 1600 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
[0083] As shown in FIG. 16A, the communications system 1600 may include wireless transmit/receive units (WTRUs) 1602a, 1602b, 1602c, 1602d, a RAN 1604, a CN 1606, a public switched telephone network (PSTN) 1608, the Internet 1610, and other networks 1612, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 1602a, 1602b, 1602c, 1602d may be any type of device configured to operate and/or communicate in a wireless environment. By way of example, the WTRUs 1602a, 1602b, 1602c, 1602d, any of which may be referred to as a "station" and/or a "STA", may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in an industrial and/or automated processing chain context), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like. Any of the WTRUs 1602a, 1602b, 1602c and 1602d may be interchangeably referred to as a UE.
[0084] The communications systems 1600 may also include a base station 1614a and/or a base station 1614b. Each of the base stations 1614a, 1614b may be any type of device configured to wirelessly interface with at least one of the WTRUs 1602a, 1602b, 1602c, 1602d to facilitate access to one or more communication networks, such as the CN 1606, the Internet 1610, and/or the other networks 1612. By way of example, the base stations 1614a, 1614b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 1614a, 1614b are each depicted as a single element, it will be appreciated that the base stations 1614a, 1614b may include any number of interconnected base stations and/or network elements.
[0085] The base station 1614a may be part of the RAN 1604, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc. The base station 1614a and/or the base station 1614b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum. A cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time. The cell may further be divided into cell sectors. For example, the cell associated with the base station 1614a may be divided into three sectors. Thus, in one embodiment, the base station 1614a may include three transceivers, i.e., one for each sector of the cell. In an embodiment, the base station 1614a may employ multiple-input multiple-output (MIMO) technology and may utilize multiple transceivers for each sector of the cell. For example, beamforming may be used to transmit and/or receive signals in desired spatial directions.
[0086] The base stations 1614a, 1614b may communicate with one or more of the WTRUs 1602a, 1602b, 1602c, 1602d over an air interface 1616, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 1616 may be established using any suitable radio access technology (RAT).
[0087] More specifically, as noted above, the communications system 1600 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 1614a in the RAN 1604 and the WTRUs 1602a, 1602b, 1602c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1616 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
[0088] In an embodiment, the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1616 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
[0089] In an embodiment, the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement a radio technology such as NR Radio Access, which may establish the air interface 1616 using New Radio (NR).
[0090] In an embodiment, the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement multiple radio access technologies. For example, the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles. Thus, the air interface utilized by WTRUs 1602a, 1602b, 1602c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
[0091] In other embodiments, the base station 1614a and the WTRUs 1602a, 1602b, 1602c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
[0092] The base station 1614b in FIG. 16A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like. In one embodiment, the base station 1614b and the WTRUs 1602c, 1602d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In an embodiment, the base station 1614b and the WTRUs 1602c, 1602d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 1614b and the WTRUs 1602c, 1602d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell. As shown in FIG. 16A, the base station 1614b may have a direct connection to the Internet 1610. Thus, the base station 1614b may not be required to access the Internet 1610 via the CN 1606.
[0093] The RAN 1604 may be in communication with the CN 1606, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 1602a, 1602b, 1602c, 1602d. The data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like. The CN 1606 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in FIG. 16A, it will be appreciated that the RAN 1604 and/or the CN 1606 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 1604 or a different RAT. For example, in addition to being connected to the RAN 1604, which may be utilizing a NR radio technology, the CN 1606 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
[0094] The CN 1606 may also serve as a gateway for the WTRUs 1602a, 1602b, 1602c, 1602d to access the PSTN 1608, the Internet 1610, and/or the other networks 1612. The PSTN 1608 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 1610 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 1612 may include wired and/or wireless communications networks owned and/or operated by other service providers. For example, the networks 1612 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 1604 or a different RAT.
[0095] Some or all of the WTRUs 1602a, 1602b, 1602c, 1602d in the communications system 1600 may include multi-mode capabilities (e.g., the WTRUs 1602a, 1602b, 1602c, 1602d may include multiple transceivers for communicating with different wireless networks over different wireless links). For example, the WTRU 1602c shown in FIG. 16A may be configured to communicate with the base station 1614a, which may employ a cellular-based radio technology, and with the base station 1614b, which may employ an IEEE 802 radio technology.
[0096] FIG. 16B is a system diagram illustrating an example WTRU 1602. As shown in FIG. 16B, the WTRU 1602 may include a processor 1618, a transceiver 1620, a transmit/receive element 1622, a speaker/microphone 1624, a keypad 1626, a display/touchpad 1628, non-removable memory 1630, removable memory 1632, a power source 1634, a global positioning system (GPS) chipset 1636, and/or other peripherals 1638, among others. It will be appreciated that the WTRU 1602 may include any sub-combination of the foregoing elements while remaining consistent with an embodiment.
[0097] The processor 1618 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuit (ASIC) circuits, Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like. The processor 1618 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1602 to operate in a wireless environment. The processor 1618 may be coupled to the transceiver 1620, which may be coupled to the transmit/receive element 1622. While FIG. 16B depicts the processor 1618 and the transceiver 1620 as separate components, it will be appreciated that the processor 1618 and the transceiver 1620 may be integrated together in an electronic package or chip.
[0098] The transmit/receive element 1622 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1614a) over the air interface 1616. For example, in one embodiment, the transmit/receive element 1622 may be an antenna configured to transmit and/or receive RF signals. In an embodiment, the transmit/receive element 1622 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 1622 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 1622 may be configured to transmit and/or receive any combination of wireless signals.
[0099] Although the transmit/receive element 1622 is depicted in FIG. 16B as a single element, the WTRU 1602 may include any number of transmit/receive elements 1622. More specifically, the WTRU 1602 may employ MIMO technology. Thus, in one embodiment, the WTRU 1602 may include two or more transmit/receive elements 1622 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1616.
[0100] The transceiver 1620 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1622 and to demodulate the signals that are received by the transmit/receive element 1622. As noted above, the WTRU 1602 may have multi-mode capabilities. Thus, the transceiver 1620 may include multiple transceivers for enabling the WTRU 1602 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
[0101] The processor 1618 of the WTRU 1602 may be coupled to, and may receive user input data from, the speaker/microphone 1624, the keypad 1626, and/or the display/touchpad 1628 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). The processor 1618 may also output user data to the speaker/microphone 1624, the keypad 1626, and/or the display/touchpad 1628. In addition, the processor 1618 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1630 and/or the removable memory 1632. The non-removable memory 1630 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 1632 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 1618 may access information from, and store data in, memory that is not physically located on the WTRU 1602, such as on a server or a home computer (not shown).
[0102] The processor 1618 may receive power from the power source 1634, and may be configured to distribute and/or control the power to the other components in the WTRU 1602. The power source 1634 may be any suitable device for powering the WTRU 1602. For example, the power source 1634 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
[0103] The processor 1618 may also be coupled to the GPS chipset 1636, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1602. In addition to, or in lieu of, the information from the GPS chipset 1636, the WTRU 1602 may receive location information over the air interface 1616 from a base station (e.g., base stations 1614a, 1614b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1602 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
[0104] The processor 1618 may further be coupled to other peripherals 1638, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity. For example, the peripherals 1638 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like. The peripherals 1638 may include one or more sensors; the sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
[0105] The WTRU 1602 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent and/or simultaneous. The full duplex radio may include an interference management unit to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 1618). In an embodiment, the WTRU 1602 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)) are not concurrent.
[0106] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Claims

CLAIMS
What is Claimed:
1. A method of providing video content, comprising: receiving, over a broadcast channel of a first network, a first video stream and metadata regarding the availability of a second video stream via a second network; causing display of the first video stream on a first display device; presenting the availability of at least the second video stream to a user, based on the metadata; receiving a selection by the user of the second video stream; requesting, responsive to the selection by the user, the second video stream over the second network using the received metadata; receiving the second video stream over the second network; and causing display of the second video stream on the first display device.
2. The method of claim 1, wherein the display of the first video stream and second video stream are time synchronized.
3. The method of claim 2, wherein the first and second video streams are time-synchronized at a receiver according to a receiver buffer model.
4. The method of any of claims 2-3, wherein the first video stream is delayed to enable time- synchronization with the second video stream.
5. The method of any of claims 1-4, wherein the metadata regarding the availability of the second video stream identifies an object of interest, and wherein the second video stream is an enhanced view of the object of interest.
6. The method of any of claims 1-5, wherein the metadata regarding the availability of the second video stream identifies a plurality of objects of interest, and wherein receiving the selection by the user of the second video stream comprises selection by the user of an object of interest associated with the second video stream.
7. The method of any of claims 1-6, wherein causing display of the second video stream comprises causing display of the second video stream simultaneously with display of the first video stream.
8. The method of any of claims 1-7, wherein causing display of the second video stream comprises causing display of the second video stream in a picture-in-picture relationship with the first video stream.
9. The method of any of claims 1-8, wherein the metadata regarding the availability of the second video stream identifies an object of interest, and wherein the second video stream is a zoomed view of the object of interest.
10. The method of any of claims 1-9, wherein causing display of the second video stream comprises causing display of the second video stream at a first location within the first video stream, the first location indicated as an open area by the metadata.
11. The method of any of claims 1-10, wherein the second video stream is received over the second network as DASH segments via HTTP.
12. The method of any of claims 1-11, wherein the broadcast channel of the first network uses ROUTE protocol.
13. The method of any of claims 1-12, further comprising, responsive to reaching an end of the second video stream, causing display of only the first video stream on the first display device.
14. The method of any of claims 1-12, further comprising, responsive to a user indication, stopping display of the second video stream and causing display of only the first video stream on the first display device.
15. A system for providing video content, the system comprising a processor and a non- transitory storage medium storing instructions operative, when executed on the processor, to perform functions comprising: receiving, over a broadcast channel of a first network, a first video stream and metadata regarding the availability of a second video stream via a second network; causing display of the first video stream on a first display device; presenting the availability of at least the second video stream to a user, based on the metadata; receiving a selection by the user of the second video stream; requesting, responsive to the selection by the user, the second video stream over the second network using the received metadata; receiving the second video stream over the second network; and causing display of the second video stream on the first display device.
PCT/US2017/048712 2016-09-02 2017-08-25 Systems and methods for hybrid network delivery of objects of interest in video WO2018044731A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662383366P 2016-09-02 2016-09-02
US62/383,366 2016-09-02

Publications (1)

Publication Number Publication Date
WO2018044731A1 true WO2018044731A1 (en) 2018-03-08

Family

ID=59846643

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/048712 WO2018044731A1 (en) 2016-09-02 2017-08-25 Systems and methods for hybrid network delivery of objects of interest in video

Country Status (1)

Country Link
WO (1) WO2018044731A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2793479A1 (en) * 2011-12-12 2014-10-22 LG Electronics Inc. Device and method for receiving media content
US20150128178A1 (en) * 2012-03-30 2015-05-07 Sony Corporation Method, device and computer program product for outputting a transport stream

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JEAN LE FEUVRE ET AL: "Hybrid Broadcast Services using MPEG DASH", 18 February 2013 (2013-02-18), XP055414256, Retrieved from the Internet <URL:http://h2b2vs.epfl.ch/wp-content/uploads/2013/04/21-Hybrid-broadcast-services-using-MPEG-DASH.pdf> [retrieved on 20171010] *
WALKER GORDON KENT ET AL: "ROUTE/DASH IP Streaming-Based System for Delivery of Broadcast, Broadband, and Hybrid Services", IEEE TRANSACTIONS ON BROADCASTING, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 62, no. 1, 1 March 2016 (2016-03-01), pages 328 - 337, XP011608972, ISSN: 0018-9316, [retrieved on 20160304], DOI: 10.1109/TBC.2016.2515539 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110177256A (en) * 2019-06-17 2019-08-27 北京影谱科技股份有限公司 A kind of tracking video data acquisition methods and device
CN110177256B (en) * 2019-06-17 2021-12-14 北京影谱科技股份有限公司 Tracking video data acquisition method and device
CN112437233A (en) * 2021-01-26 2021-03-02 北京深蓝长盛科技有限公司 Video generation method, video processing device and camera equipment
CN112437233B (en) * 2021-01-26 2021-04-16 北京深蓝长盛科技有限公司 Video generation method, video processing device and camera equipment


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17764942

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17764942

Country of ref document: EP

Kind code of ref document: A1