WO2016008131A1 - Techniques for separately playing audio and video data in local networks - Google Patents

Techniques for separately playing audio and video data in local networks

Info

Publication number
WO2016008131A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
rendering
content
media
Prior art date
Application number
PCT/CN2014/082390
Other languages
English (en)
Inventor
Yunchao CHEN
Lu Jiang
Original Assignee
21 Vianet Group, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 21 Vianet Group, Inc. filed Critical 21 Vianet Group, Inc.
Priority to PCT/CN2014/082390 priority Critical patent/WO2016008131A1/fr
Publication of WO2016008131A1 publication Critical patent/WO2016008131A1/fr

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/434Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N21/4341Demultiplexing of audio and video streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43076Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of the same content streams on multiple devices, e.g. when family members are watching the same movie on different devices
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4305Synchronising client clock from received content stream, e.g. locking decoder clock with encoder clock, extraction of the PCR packets
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/43615Interfacing a Home Network, e.g. for connecting the client to a plurality of peripherals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/462Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • H04N21/4621Controlling the complexity of the content stream or additional data, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8547Content authoring involving timestamps for synchronizing content

Definitions

  • the present invention relates generally to playing audio and video data, and in particular, to separately playing audio and video data in local networks.
  • the multimedia device may render a portion of the audio content with a corresponding portion of the video content at the same time point as determined based on a timestamp for playing the portion of the audio content.
  • the audio content may be delivered to an audio device over a network that is entirely different from another network over which the video content is delivered to a display device.
  • Many differences and variations in device types, underlying network types, data sizes, network conditions, processing powers, memory spaces, load conditions on devices, etc., tend to produce large timing differences in the playing of the audio content and the video content of the same media program by these different devices.
  • FIG. 1A and FIG. 1B illustrate example system configurations
  • FIG. 2 illustrates an example media device
  • FIG. 3 illustrates an example audio rendering device
  • FIG. 4 illustrates an example video rendering device
  • FIG. 5A and FIG. 5B illustrate example process flows
  • FIG. 6 illustrates an example hardware platform on which a computer or a computing device as described herein may be implemented.
  • Example embodiments, which relate to separately playing audio and video data in local networks, are described herein.
  • numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention.
  • a multitude of content rendering devices may be available for rendering multimedia content at a given location such as a home, an office, a classroom, a theater, etc. Each of these content rendering devices may have its respective set of capabilities different from other content rendering devices. For example, individual content rendering devices may be configured to receive their respective media content over different networks, different data links, different end-to-end network paths, etc. Examples of networks may include, but are not limited to only, any of: Ethernet networks, Wi-Fi networks, Bluetooth networks, point-to-point networks, HDMI video links/networks, etc.
  • An earphone device may be able to receive audio content over a Bluetooth network.
  • An audio receiver system may be able to receive audio content over an audio cable, a radio frequency link, a Wi-Fi link, etc.
  • a display device may be able to receive video content over a video link such as an HDMI link, etc.
  • a tablet computer or a wearable computer may be able to receive audio content and video content wirelessly via a public or private wireless network.
  • Transmission rates, transmission delays, content processing delays, network-specific latencies, device-specific latencies, component-specific latencies, etc. may vary greatly among individual content rendering devices.
  • a display device may receive video content over an HDMI link at a high data transmission rate and render the video content with a relatively low or moderate latency.
  • a Bluetooth earphone may receive audio content with a relatively low data transmission rate and render the audio content with a different latency as compared with the display device.
  • a media device is used to separate audiovisual data of a received media program into video content and audio content and to stream the video content and the audio content separately, in different data streams, to different content rendering devices (e.g., display devices, audio devices, etc.) for rendering.
  • video chunks, decoded from the audiovisual data received by the media device, which comprise video content to be rendered by a specific video rendering device, can be compressed, transcoded or transrated into a video data stream such that the video data stream can be transmitted to the video rendering device at a specific video transmission rate supported by the video rendering device.
  • the media device can adaptively adjust the specific video transmission rate based on transmission and rendering statistics (e.g., related to an end-to-end network path between the media device and the video rendering device, the processing speed of the video rendering device, etc.) collected while the media program is being rendered collectively by the different audio and video rendering devices.
  • audio chunks, decoded from the audiovisual data as received by the media device, which comprise audio content to be rendered by a specific audio rendering device, can be compressed, transcoded or transrated into an audio data stream such that the audio data stream can be transmitted to the audio rendering device at a specific audio transmission rate supported by the audio rendering device.
  • the media device can adaptively adjust the specific audio transmission rate based on transmission and rendering statistics (e.g., related to an end-to-end network path between the media device and the audio rendering device, the processing speed of the audio rendering device, etc.) collected while the media program is being rendered collectively by the different audio and video rendering devices.
  • the media device may be configured to perform clock synchronization operations with individual content rendering devices, in order to synchronize clocks used by the content rendering devices for content rendering to a common clock for content rendering.
  • Each data stream of one or more audio data streams and one or more video data streams transmitted by the media device to one or more audio rendering devices and one or more video rendering devices carries the same set of timestamps.
  • a timestamp indicates a specific time point in reference to the common clock for rendering a specific audio data unit or a video frame.
  • a video frame and an audio data unit that are to be rendered at the same time point share the same timestamp.
  • the media device (102) can adaptively determine a combined audio and video rendering latency at any given time while processing the audiovisual data.
  • the combined audio and video rendering latency may be a combination of a maximum rendering latency among the individual rendering devices and a safe margin.
  • the combined audio and video rendering latency may be used to determine the above-mentioned set of timestamps to indicate specific time points at which video frames or audio data units are to be rendered by the individual rendering devices such that all the rendering devices are enabled to render their respective video frames or audio data units at these indicated time points.
  • mechanisms as described herein form a part of an information processing system, including but not limited to any of: a handheld device, game machine, television, laptop computer, netbook computer, tablet computer, cellular radiotelephone, electronic book reader, point of sale terminal, desktop computer, computer workstation, computer kiosk, plug-in devices, media accessing devices, or various other kinds of terminals and media processing units.
  • FIG. 1A depicts an example configuration 100 comprising a media device 102, an audio rendering device 104, a video rendering device 106, one or more (e.g., cloud-based, on premise, internet-based, etc.) media content servers 110, etc.
  • the media device (102) can be communicatively and/or operatively linked with one or more computer networks (108).
  • the one or more computer networks (108) may include, but are not limited to only, any of: one or more of the Internet, intranets, wireless or wire-based local area networks, telecom networks, wireless or wire-based wide area networks (WANs), etc.
  • the media device (102) can be communicatively and/or operatively linked with a first (e.g., local, non-local, etc.) network (112-1) for communicating with the audio rendering device (104), and operatively linked with a second (e.g., local, non-local, etc.) network (112-2) for communicating with the video rendering device (106).
  • Examples of (e.g., local, non-local, etc.) networks between a media device (e.g., 102, etc.) and a rendering device (e.g., 104, 106, etc.) include, but are not limited to only, any of: Wi-Fi networks, Bluetooth networks, infrared links, CDMA networks, GSM networks, HDMI video links, non-HDMI video links, radio frequency (RF) networks, optical links, etc.
  • the first network (112-1) may be the same as the second network (112-2). In some other embodiments, the first network (112-1) may be different from the second network (112-2); for example, the first network (112-1) may be a Bluetooth network, whereas the second network (112-2) may be a non-Bluetooth network.
  • a media device refers to a computing device configured to separate audiovisual data (e.g., a combination or mixture of audio data and video data, etc.) into an audio data component and a video data component.
  • the media device (102) can be further configured to synchronize the playing of an audio data stream (generated based on the audio data component) by the audio rendering device (104) with the playing of a video data stream (generated from the video data component) by the video rendering device (106).
  • An audio rendering device refers to a content rendering device for rendering audio content, which is a computing device specifically configured to receive an audio data stream from a media device (e.g., 102, etc.) and operate in conjunction with the media device (102) to synchronize the playing of the audio data stream by the audio rendering device (104) with the playing of a corresponding video data stream by a video rendering device (e.g., 106, etc.).
  • the video rendering device (106) refers to a content rendering device for rendering video content, which is a computing device specifically configured to receive a video data stream from the media device (102) and operate in conjunction with the media device (102) to synchronize the playing of the video data stream by the video rendering device (106) with the playing of a corresponding audio data stream by the audio rendering device (104).
  • the media device (102) can retrieve audiovisual data as described herein from one or more of a variety of audiovisual data sources.
  • the media device (102) can retrieve the audiovisual data from one or more storage media accessible to the media device (102).
  • the media device (102) can retrieve audiovisual data from one or more of local computing devices or remote media content servers (e.g., 110, etc.) accessible to the media device (102).
  • the audiovisual data received by the media device (102) can be formatted in one of a variety of standard-based or proprietary content formats, and received in one of a variety of packages including but not limited to media files, streams, broadcast signals, cable signals, analog signals, digital signals, etc.
  • FIG. 1B depicts an example configuration 100-1 comprising a media device 102, two audio rendering devices 104-1 and 104-2, a video rendering device 106-1, one or more (e.g., cloud-based, on premise, internet-based, etc.) media content servers 110-1 and 110-2, etc.
  • the media device (102) can be communicatively and/or operatively linked with one or more computer networks (108).
  • the media device (102) can be communicatively and/or operatively linked with a third (e.g., local, non-local, etc.) network (112-3) for communicating with the audio rendering devices (104-1 and 104-2), and operatively linked with a fourth (e.g., local, non-local, etc.) network (112-4) for communicating with the video rendering device (106-1).
  • the media device (102) establishes an individual end-to-end network path (e.g., 114-1, 114-2, 114-3, etc.) with each of the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1).
  • the third network (112-3) may be the same as the fourth network (112-4). In some other embodiments, the third network (112-3) may be different from the fourth network (112-4).
  • the media device (102) is configured to synchronize (e.g., a portion of, all, etc.) clocks (used for rendering content) of the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1) to a common clock (used for rendering content).
  • the media device (102) is further configured to separate audiovisual data (e.g., a combination or mixture of audio data and video data, etc.) into an audio data component and a video data component; generate, based at least in part on the audio data component, two audio data streams for the audio rendering devices (104-1 and 104-2), respectively; generate, based at least in part on the video data component, a video data stream for the video rendering device (106-1); etc.
  • audiovisual data include, but are not limited to, one or more of: movies, media programs, live TV programs, etc.
  • While the media device (102) is separating the audiovisual data into the audio data component and the video data component and generating the audio data streams and the video data stream, the media device (102) communicates with the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1) to determine individual rendering characteristics of each of the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1) over a respective end-to-end network path (e.g., 114-1, 114-2, 114-3, etc.).
  • rendering characteristics can be determined as functions of time based on bandwidth tests, delay or latency tests, and analyses of statistics collected for operations involved in rendering audio or video content by each of the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1), etc.
  • the media device (102) is configured to adaptively determine an individual transmission rate (e.g., a combination of a maximum transmission rate and a safe margin, etc.) that can be supported by each of the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1) at any given time while processing the audiovisual data.
  • a safe margin for setting transmission rate as described herein can be set either with absolute values (e.g., 0.5 Mbps, 1 Mbps, etc., over the maximum transmission rate) or relative values (e.g., 5%, 10%, 15%, etc., of the maximum transmission rate).
  • a safe margin for setting transmission rate can be dynamically and/or adaptively set.
  • the safe margin for setting transmission rates may be set relatively low (e.g., 0.5 Mbps, 5%, etc.).
  • the safe margin for setting transmission rates may be set relatively high (e.g., 2 Mbps, 30%, etc.).
  • network characteristics such as bit error rates, delays, jitters, etc., can be taken into consideration in determining a safe margin for setting transmission rates.
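  • To make the rate-margin arithmetic above concrete, the following is a minimal Python sketch rather than an implementation from the publication; the function name and example figures are illustrative assumptions.

        def target_transmission_rate(max_rate_mbps, margin, relative):
            # Back the target rate off from the measured maximum transmission
            # rate by a safe margin, expressed either as a fraction of the
            # maximum (relative) or in absolute Mbps.
            if relative:
                return max_rate_mbps * (1.0 - margin)
            return max(0.0, max_rate_mbps - margin)

        print(target_transmission_rate(8.0, 0.05, relative=True))   # stable network, small margin
        print(target_transmission_rate(8.0, 0.30, relative=True))   # lossy network, large margin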
  • the media device (102) is also configured to adaptively determine a combined audio and video rendering latency at any given time while processing the audiovisual data.
  • the combined audio and video rendering latency may be a combination of a maximum rendering latency and a safe margin.
  • the maximum rendering latency can be set to be as great as any individual rendering latency among individual rendering latencies of all content rendering devices. In the present example, the maximum rendering latency can be set to be the greatest rendering latency among the individual audio rendering latencies of the audio rendering device 104-1 and the audio rendering device 104-2 and the individual video rendering latency of the video rendering device 106-1.
  • an individual audio or video rendering latency may account for a number of delays and latencies such as transmission delays, content processing delays, processor loads, processor ratings, network-specific latencies, device-specific latencies, component-specific latencies, etc., related to content rendering by an individual audio or video rendering device.
  • the individual audio or video rendering latency excludes idle time spent by the audio or video rendering device for synchronizing content rendering.
  • a safe margin for setting delays or latencies as described herein can be set either with absolute values (e.g., 50 milliseconds, 100 milliseconds, etc., over the maximum delay or latency) or relative values (e.g., 5%, 10%, 15%, etc., of the maximum delay or latency).
  • a safe margin for setting delays or latencies can be dynamically and/or adaptively set. For example, if a network, a data link, an end-to-end network path, etc., provides a relatively low delay or latency, the safe margin for setting delays or latencies may be set relatively low (e.g., 10 milliseconds, 5%, etc.). On the other hand, if a network, a data link, an end-to-end network path, etc., provides a relatively high delay or latency, the safe margin for setting delays or latencies may be set relatively high (e.g., 500 milliseconds, 30%, etc.). In various embodiments, one or more of network characteristics such as bit error rates, delays, jitters, etc., can be taken into consideration in determining a safe margin for setting delays or latencies.
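  • A minimal sketch of this latency combination, assuming the per-device rendering latencies have already been measured (all names and values below are illustrative):

        def combined_rendering_latency_ms(device_latencies_ms, margin_ms=0.0, margin_fraction=0.0):
            # The combined audio and video rendering latency is the greatest
            # individual device latency plus a safe margin, which may be
            # absolute (milliseconds), relative (a fraction), or both.
            worst = max(device_latencies_ms)
            return worst + margin_ms + worst * margin_fraction

        # e.g., a Bluetooth earphone (180 ms), a Wi-Fi speaker (95 ms), and an
        # HDMI display (40 ms), with a 20 ms absolute safe margin:
        print(combined_rendering_latency_ms([180, 95, 40], margin_ms=20))   # 200.0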
  • the media device (102) comprises rate conversion units, modules, etc., to adapt the audio data streams to respective audio transmission rates supported by the audio rendering devices (104-1 and 104-2), and to adapt the video data stream to the video transmission rate supported by the video rendering device (106-1).
  • To synchronize the rendering of audio content in the audio data streams by each of the audio rendering devices (104-1 and 104-2) with the rendering of video content in the video data stream by the video rendering device (106-1), the media device (102) includes a set of timestamps, in reference to the common clock, in each of the audio data streams and the video data stream transmitted to the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1).
  • timestamps in the set of timestamps are generated based on timestamps decoded from the audiovisual data and the combined audio and video rendering latency determined at a given time, and are to be interpreted by a recipient device in reference to the common clock used for synchronized rendering by the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1).
  • a video frame decoded from the video data stream may share a specific timestamp with respective decoded audio data units in the audio data streams.
  • the video rendering device (106-1) is configured to render the video frame at a specific time point as indicated by the specific timestamp
  • the audio rendering devices (104-1 and 104-2) are configured to render the respective decoded audio data units at the same specific time point as indicated by the specific timestamp.
  • the individual audio and video transmission rates can be dynamically (and individually) adjusted continuously, from time to time.
  • the combined audio and video rendering latency can also be dynamically adjusted continuously, from time to time, etc., to adapt to real-time or near-real-time changes occurring in the rendering characteristics involving the audio rendering devices (104-1 and 104-2) and the video rendering device (106-1).
  • smoothing algorithms may be implemented to transition gracefully from a first individual transmission rate at a first time to a second individual transmission rate at a second time subsequent to the first time for a content rendering device (e.g., audio rendering device 104-1, audio rendering device 104-2, video rendering device 106-1, etc.).
  • jitter buffers, a threshold to limit rate changes per unit time, etc. can be implemented by one or more of the media device (102), the audio rendering devices (104-1 and 104-2), or the video rendering device (106-1).
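  • One plausible form of such a threshold limiting rate changes per unit time is sketched below; the step cap and target values are assumptions for illustration, not values from the publication.

        def smooth_rate_step(current_mbps, desired_mbps, max_step_mbps):
            # Move toward the newly determined transmission rate, but cap the
            # change applied per adjustment interval so quality shifts gradually.
            delta = desired_mbps - current_mbps
            if abs(delta) > max_step_mbps:
                delta = max_step_mbps if delta > 0 else -max_step_mbps
            return current_mbps + delta

        rate = 6.0
        for _ in range(4):   # converge from 6.0 Mbps toward 3.0 Mbps in capped steps
            rate = smooth_rate_step(rate, 3.0, max_step_mbps=1.0)
            print(rate)      # 5.0, 4.0, 3.0, 3.0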
  • the media device (102) can retrieve the audiovisual data from one or more of a variety of audiovisual data sources (e.g., 110-1, 110-2, etc.) accessible to the media device (102).
  • FIG. 2 illustrates an example media device (e.g., 102 of FIG. 1A, etc.) comprising a content separation unit 202, a content synchronization unit 204, an audio transmission unit 206, a video transmission unit 208, etc.
  • the content separation unit (202) comprises software, hardware, a combination of software and hardware, etc., configured to retrieve audiovisual data 210 from one or more of internal or external content sources such as media content server 110 of FIG. 1A, locally or remotely accessible storage media, media files, media containers, bitstreams from one or more media streaming servers, etc.
  • the audiovisual data (210) as received by the media device (102) comprises multiplexed audio and video data (or a mixture of audio or video data).
  • the audiovisual data (210) may comprise a combination of a set of video frames (or video chunks) and a set of corresponding audio data units (e.g., audio samples, audio data blocks, audio chunks, etc.).
  • the audiovisual data (210) may comprise a combination of links (e.g., local links, hyperlinks, etc.) to audio and/or video data.
  • a set of (input) timestamps can be extracted from the audiovisual data (210).
  • timestamps as described herein include integers indicating time points from a reference time point such as the beginning of a media program, etc.; frame indexes such as video frame indexes, etc.; integers indicating multiples of a set time interval such as 20 milliseconds, etc.; logical time values relative to a reference time point (e.g., a particular scheduled time, the beginning, etc.) in a media program; etc.
  • a video frame (in the set of video frames) that is to be rendered at the same time as an audio data unit (in the set of audio data units) is to be rendered may share or comprise the same timestamp (in the set of input timestamps) with the audio data unit.
  • the content separation unit (202) is further configured to separate or de-multiplex the audiovisual data (210) into audio data component 216-1 and video data component 216-2; forward the audio data component (216-1) to the audio transmission unit (206) for further processing; forward the video data component (216-2) to the video transmission unit (208) for further processing; etc.
  • the content separation unit (202) includes the same set of input timestamps extracted from the audiovisual data (210) in each of the audio data component (216-1) and the video data component (216-2).
  • the content synchronization unit (204) comprises software, hardware, a combination of software and hardware, etc., configured to synchronize (e.g., a portion of, all, etc.) clocks (used for content rendering) of the one or more audio rendering devices and the one or more video rendering devices to a common clock (used for content rendering).
  • the common clock can be based on an external clock external to the media device (102), an internal clock of the media device (102), a clock derived from one or more clocks of the media device (102), the audio rendering devices, or video rendering devices, etc.
  • some or all of the media device (102), the audio rendering devices, or the video rendering devices may be synchronized to an external clock source provided or supported by a satellite based navigational system (e.g., the BeiDou navigational system, the GPS navigational system, etc.).
  • some or all of the media device, the audio rendering devices, or the video rendering devices may be synchronized to a terrestrial radio clock.
  • some or all of the media device (102), the audio rendering devices, or the video rendering devices may be synchronized to a clock (used for content rendering) derived based on one or more internal clocks of one of the media device (102), the audio rendering devices, or video rendering devices.
  • An algorithm based on a network time protocol or the like can be implemented by some or all of the media device (102), the audio rendering devices, or the video rendering devices, for the purpose of synchronizing the clocks (used for content rendering) of the media device (102) and/or the content rendering systems.
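  • As one illustration of an NTP-style exchange, the sketch below estimates a peer's clock offset from a single request/response round trip; estimate_clock_offset and fake_peer are invented names, and the 0.25 s skew is an arbitrary example.

        import time

        def estimate_clock_offset(exchange, local_clock=time.monotonic):
            # One NTP-style round trip: t0/t3 are local send/receive times,
            # t1/t2 are the peer's receive/reply times; returns (offset, rtt).
            t0 = local_clock()
            t1, t2 = exchange()
            t3 = local_clock()
            offset = ((t1 - t0) + (t2 - t3)) / 2.0
            rtt = (t3 - t0) - (t2 - t1)
            return offset, rtt

        def fake_peer():
            # Stand-in for a rendering device whose clock runs 0.25 s ahead.
            now = time.monotonic() + 0.25
            return now, now

        offset, rtt = estimate_clock_offset(fake_peer)
        print(round(offset, 2))   # approximately 0.25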
  • the media device (102), or the content synchronization unit (204) therein can be configured to communicate over one (e.g., 212-1 of FIG. 3, etc.) of one or more data paths 212 with each audio rendering device (e.g., 104, etc.) of one or more audio rendering devices to determine audio rendering characteristics pertaining to that audio rendering device (104); communicate with each video rendering device (e.g., 106, etc.) of one or more video rendering devices to determine video rendering characteristics pertaining to that video rendering device (106); etc.
  • Examples of rendering characteristics pertaining to a content rendering device may include, but are not limited to only, any of: (time-varying or time-constant) transmission rates (e.g., supported bit rates, etc.) between the media device (102) and the content rendering device, (time-varying or time-constant) round-trip times between the media device (102) and the content rendering device, (time-varying or time-constant) transmission delays between the media device (102) and the content rendering device, content processing delays for rendering operations performed internally by the content rendering device, other delays or latencies associated with the underlying network, devices, components, etc.
  • one or both of the media device (102) and the content rendering device may perform one or more specific operations.
  • one or both of the media device (102) and a content rendering device can run one or more bandwidth tests (e.g., by transmitting test data, by transmitting test files, test packets, etc.) on an end-to-end network path between the media device (102) and the content rendering device (e.g., 104, 106, etc.) to determine or estimate a transmission rate from the media device (102) to the content rendering device (e.g., 104, 106, etc.), continuously, from time to time, periodically, on-demand, over a plurality of time points, through polling, by emitting reporting events, etc.
  • one or both of the media device (102) and a content rendering device can run one or more delay or latency tests (e.g., by transmitting test data, by transmitting test files, test packets, etc.) on the end-to-end network path between the media device (102) and the content rendering device (e.g., 104 or 106, etc.) to determine or estimate delays or latencies from the media device (102) to the content rendering device (e.g., 104 or 106, etc.), continuously, from time to time, periodically, on-demand, over a plurality of time points, through polling, by emitting reporting events, etc.
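  • A crude single-connection probe combining both kinds of test might look as follows; this is a sketch under the assumption of a cooperating peer that answers each probe with a 4-byte reply (the wire protocol here is invented for illustration).

        import socket
        import time

        def probe_path(host, port, payload_size=64 * 1024):
            # Time a tiny message for round-trip delay, then a bulk payload
            # to estimate the usable transmission rate on the same path.
            with socket.create_connection((host, port), timeout=5.0) as sock:
                t0 = time.monotonic()
                sock.sendall(b"ping")
                sock.recv(4)                  # assumed 4-byte echo from the peer
                rtt_seconds = time.monotonic() - t0

                t1 = time.monotonic()
                sock.sendall(b"\x00" * payload_size)
                sock.recv(4)                  # assumed 4-byte acknowledgement
                elapsed = time.monotonic() - t1
                rate_mbps = payload_size * 8 / elapsed / 1e6
            return rtt_seconds, rate_mbps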
  • one or both of the media device (102) and a content rendering device can determine some or all of the rendering characteristics based on statistics collected by the media device (102) and the content rendering device (e.g., 104, 106, etc.). For example, real-time or non-real-time statistics collected for processing received audio or video content can be analyzed to determine transmission rates, transmission delays, content processing delays, other delays or latencies, etc., as functions of time.
  • the transmission rates, transmission delays, content processing delays, other delays or latencies, etc., as determined based on analyzing statistics can be updated on a real-time or near-real-time basis, at the same time when the media device (102) and the content rendering device (e.g., 104, 106, etc.) are processing and rendering audio or video content.
  • Rendering characteristics such as transmission rates, transmission delays, processing delays, other delays or latencies, etc., are highly dependent on the types of underlying (e.g., local, non-local, etc.) networks and possibly time-varying conditions of the underlying networks.
  • the delivery/transmission of audio content (e.g., a portion of 214, etc.) from the media device (102) to an audio rendering device (e.g., 104 of FIG. 1A, etc.) is made over a different local network from one used for the delivery/transmission of video content (e.g., 222, etc.) from the media device (102) to a video rendering device (e.g., 106 of FIG. 1A, etc.).
  • the audio data (214) can be delivered over one or more first networks (e.g., 112-1 of FIG. 1A, etc.), which may be Bluetooth based, Wi-Fi based, etc.
  • the video data (222) can be delivered over a second network (e.g., 112-2 of FIG. 1A, etc.), which may be non-Bluetooth based, HDMI video link based, etc.
  • the content synchronization unit (204) can be configured to determine individual rendering characteristics related to individual underlying networks used to transport audio data and video data to individual content rendering devices (e.g., the audio rendering device 104, the video rendering device 106, etc.).
  • the content synchronization unit (204) determines a combined audio and video rendering latency to be used for some or all of the content rendering devices that are to receive media content for rendering.
  • the content synchronization unit (204) can compute or estimate an individual content rendering latency (e.g., summing all of the transmission delay, the content processing delay, other delays or latencies, a safety margin, etc.) for each of the audio rendering devices and the video rendering devices, and select the largest of all audio rendering latencies of the audio rendering devices and video rendering latencies of the video rendering devices as the combined audio and video rendering latency.
  • the content synchronization unit (204) further determines an individual transmission rate (e.g., a percentile such as 60% of the individual maximum transmission rate supported by the underlying network used to deliver content from the media device 102, etc.) for each of the content rendering devices.
  • Individual audio transmission rates for individual audio rendering devices (e.g., 104, etc.), the combined audio and video rendering latency, etc., may be provided to the audio transmission unit (206) by the content synchronization unit (204) as one or more audio content transmission parameters (218-1) to be used in processing the audio data component (216-1).
  • Individual video transmission rates for individual video rendering devices (e.g., 106, etc.), the combined audio and video rendering latency, etc., can be provided to the video transmission unit (208) by the content synchronization unit (204) as one or more video content transmission parameters (218-2) to be used in processing the video data component (216-2).
  • the audio transmission unit (206) comprises software, hardware, a combination of software and hardware, etc., configured to receive the audio data component (216-1) from the content separation unit (202); receive the audio content transmission parameters (218-1) from the content synchronization unit (204); based at least in part on the audio content transmission parameters (218-1), generate output audio data 214; transmit the output audio data (214) to the one or more audio rendering devices; etc.
  • the audio transmission unit (206) may be configured to generate one or more individual input audio data portions from the audio data component (216-1); each of the individual input audio data portions comprises audio content to be rendered by a specific audio rendering device of the one or more audio rendering devices.
  • An individual input audio data portion as described herein may refer to specific audio content designated for one or more of: a left front audio channel, a right front audio channel, a center audio channel, a left surround audio channel, a right surround audio channel, etc.
  • the audio data component (216-1) comprises multi-channel audio data (e.g., for a 5.1 configuration, for a 7.1 configuration, etc.).
  • An individual input audio data portion may comprise a sub-mix of the multi-channel audio data (e.g., a stereo mix for left front and right front audio speakers, etc.).
  • Based on the one or more individual input audio data portions, the audio transmission unit (206) generates the output audio data (214) as a collection of one or more individual audio data streams. Each individual audio data stream in the one or more audio data streams is generated from a corresponding individual input audio data portion in the one or more input audio data portions.
  • An individual audio data stream as described herein refers to an output audio data portion that is delivered, transmitted, outputted, etc., at a transmission rate (e.g., a bit rate, a bandwidth, etc.) from the media device (102) to an audio rendering device for rendering.
  • the audio transmission unit (206) is configured to determine/select an individual audio rendering device to which an individual audio data stream in the output audio data (214) is to be transmitted; etc.
  • An audio rendering device (e.g., 104 of FIG. 1A, etc.) may comprise one or more of: a left front audio speaker, a right front audio speaker, a center audio speaker, a left surround audio speaker, a right surround audio speaker, a Bluetooth earphone that can be used to render a stereo audio mix, a mobile phone with one or more speakers or earphones to render audio content, a tablet computer with one or more speakers or earphones to render audio content, a home entertainment system, a multi-channel car-based audio content rendering system, a multi-speaker audio rendering device, etc.
  • the audio transmission unit (206) comprises one or more audio rate conversion sub-units 220.
  • the audio transmission unit (206) can use the one or more audio rate conversion sub-units (220) to transcode or transrate the individual input audio data portions in the audio data component (216-1) into the individual audio data streams in the output audio content (214) to be transmitted at respective individual audio transmission rates to respective individual audio rendering devices.
  • an audio rate conversion sub-unit as described herein can be used by the audio transmission unit (206) to transcode or transrate an individual input audio data portion (e.g., for a left front audio channel, etc.) in the audio data component (216-1) into an individual audio data stream in the output audio content (214) to be transmitted at an individual audio transmission rate to an individual audio rendering device (e.g., a content rendering device that renders audio content through a left front speaker, etc.); the individual audio transmission rate for the individual audio rendering device may be determined by the audio transmission unit (206) based on the audio content transmission parameters (218-1).
  • the audio rate conversion sub-unit may perform one or more operations such as filtering, quantization, noise reduction, mixing (e.g., down-mixing audio data for N channels to audio data in M channels, up-mixing audio data for M channels to audio data in N channels, where N is an integer greater than a positive integer M, etc.), encryption, compression/encoding, buffering, etc., to generate the individual audio data stream.
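  • As an illustration of the down-mixing operation mentioned above, the sketch below folds one 5.1 frame down to a stereo pair; the 0.707 gains follow a common ITU-style convention and are not taken from the publication.

        def downmix_5_1_to_stereo(frame):
            # frame holds float samples (L, R, C, LFE, Ls, Rs); the LFE channel
            # is conventionally dropped in a simple stereo fold-down.
            L, R, C, LFE, Ls, Rs = frame
            left = L + 0.707 * C + 0.707 * Ls
            right = R + 0.707 * C + 0.707 * Rs
            return left, right

        print(downmix_5_1_to_stereo((0.2, -0.1, 0.5, 0.0, 0.05, -0.05)))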
  • the audio transmission unit (206) is configured to extract the set of input timestamps from the audio data component (216-1) as received from the content separation unit (202). This set of input timestamps is derived from the audiovisual data (210) and also embedded in the video data component (216-2) for synchronization purposes.
  • the audio transmission unit (206) is further configured to obtain/extract/determine the combined audio and video rendering latency based on the audio content transmission parameters (218-1) as received from the content synchronization unit (204). Based at least in part on the set of input timestamps and the combined audio and video rendering latency, the audio transmission unit (206) generates a set of (output) timestamps.
  • each timestamp in the set of output timestamps is derived from a combination (e.g., sum, etc.) of the combined audio and video rendering latency and a corresponding timestamp in the set of input timestamps.
  • the audio transmission unit (206) includes or embeds (e.g., a copy of, etc.) the set of output timestamps in each of the individual audio data streams, for transmission to a respective audio rendering device.
  • a timestamp in the set of output timestamps can be used by a recipient device to determine a specific time point with reference to the common clock at which a video frame or an audio data unit is to be rendered by the recipient device.
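  • Taken together, the output-timestamp derivation described in the preceding paragraphs can be sketched as follows (timestamp units and values are illustrative):

        def output_timestamps(input_ts_ms, combined_latency_ms):
            # Shift every decoded (input) timestamp by the combined audio and
            # video rendering latency; each outgoing stream carries the same
            # shifted set, so all devices render matching units together.
            return [ts + combined_latency_ms for ts in input_ts_ms]

        # 25 fps video (40 ms frame spacing) shifted by a 200 ms combined latency:
        print(output_timestamps([0, 40, 80, 120], 200))   # [200, 240, 280, 320]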
  • the video transmission unit (208) comprises software, hardware, a combination of software and hardware, etc., configured to receive the video data component (216-2) from the content separation unit (202); receive the video content transmission parameters (218-2) from the content synchronization unit (204); based at least in part on the video content transmission parameters (218-2), generate output video data 222; transmit the output video data (222) to the one or more video rendering devices; etc.
  • the video transmission unit (208) may be configured to generate one or more individual input video data portions from the video data component (216-2); each of the individual input video data portions comprises video content to be rendered by a specific video rendering device of the one or more video rendering devices.
  • Based on the one or more individual input video data portions, the video transmission unit (208) generates the output video data (222) as a collection of one or more individual video data streams. Each individual video data stream in the one or more video data streams is generated from a corresponding individual input video data portion in the one or more input video data portions.
  • An individual video data stream as described herein refers to an output video data portion that is delivered, transmitted, outputted, etc., at a transmission rate (e.g., a bit rate, a bandwidth, etc.) from the media device (102) to a video rendering device for rendering.
  • the video transmission unit (208) is configured to determine/select an individual video rendering device to which an individual video data stream in the output video data (222) is to be transmitted; etc.
  • the video transmission unit (208) comprises one or more video rate conversion sub-units 224.
  • the video transmission unit (208) can use the one or more video rate conversion sub-units (224) to transcode or transrate the individual input video data portions in the input video content (216-2) into the individual video data streams in the output video content (222) to be transmitted at respective individual video transmission rates to respective individual video rendering devices.
  • a video rate conversion sub-unit as described herein can be used by the video transmission unit (208) to transcode or transrate an individual input video data portion in the video data component (216-2) into an individual video data stream in the output video content (222) to be transmitted at an individual video transmission rate to an individual video rendering device (e.g., a content rendering device that renders video content through a display, etc.); the individual video transmission rate for the individual video rendering device may be determined by the video transmission unit (208) based on the video content transmission parameters (218-2). Additionally, optionally, or alternatively, the video rate conversion sub-unit may perform one or more operations such as filtering, quantization, noise reduction, encryption, compression/encoding, buffering, etc., to generate the individual video data stream.
  • the video transmission unit (208) is configured to extract the set of input timestamps from the video data component (216-2) as received from the content separation unit (202). This set of input timestamps is derived from the audiovisual data (210) and also embedded in the audio data component (216-1) for synchronization purposes.
  • the video transmission unit (208) is further configured to obtain/extract/determine the combined audio and video rendering latency based on the video content transmission parameters (218-2) as received from the content synchronization unit (204). Based at least in part on the set of input timestamps and the combined audio and video rendering latency, the video transmission unit (208) generates a set of (output) timestamps.
  • each timestamp in the set of output timestamps is derived from a combination (e.g., sum, etc.) of the combined audio and video rendering latency and a corresponding timestamp in the set of input timestamps.
  • the video transmission unit (208) includes or embeds (e.g., a copy of, etc.) the set of output timestamps in each of the individual video data streams, for transmission to a respective video rendering device.
  • a timestamp in the set of output timestamps can be used by a recipient device to determine a specific time point with reference to the common clock at which a video frame or an audio data unit is to be rendered by the recipient device.
  • FIG. 3 illustrates an example audio rendering device (e.g., 104 of FIG. 1A, etc.) comprising an audio stream decoder 302, an audio synchronization module 304, an audio rendering module 306, etc.
  • the audio synchronization module (304) comprises software, hardware, a combination of software and hardware, etc., configured to cooperate with a media device (e.g., 102 of FIG. 1A or FIG. 2, etc.) to synchronize (e.g., all, etc.) a clock used by the audio rendering device (104) for rendering its received audio content with other clocks used for content rendering by other content rendering devices to which the media device (102) transmits respective audio content or video content for synchronized rendering in reference to a common clock (used for content rendering).
  • the common clock can be based on an external clock external to the media device (102), an internal clock of the media device (102), a clock derived from one or more clocks of the media device (102), the audio rendering devices, or the video rendering devices, etc.
  • the audio rendering device (104), or the audio synchronization module (304) therein may implement its part of a time synchronization algorithm based on one or more of a network time protocol, etc., used to synchronize the clocks (used for content rendering) of the media device (102) and/or the content rendering systems.
  • the audio rendering device (104), or the audio synchronization module (304) therein is configured to communicate over a data path 212-1 with the media device (102) or the content synchronization unit (204) therein, so that the media device (102) can determine audio rendering characteristics pertaining to the audio rendering device (104).
  • the audio rendering device (104), or the audio synchronization module (304) therein can perform operations in conjunction with the media device (102) to run one or more bandwidth tests (e.g., by transmitting test data, by transmitting test files, test packets, etc.) on an end-to-end network path between the media device (102) and the audio rendering device (104) for the purpose of determining or estimating a transmission rate from the media device (102) to the audio rendering device (104), continuously, from time to time, periodically, on-demand, over a plurality of time points, through polling, by emitting reporting events, etc.
  • the audio rendering device (104), or the audio synchronization module (304) therein can perform operations in conjunction with the media device (102) to run one or more delay or latency tests (e.g., by transmitting test data, by transmitting test files, test packets, etc.) on the end-to-end network path between the media device (102) and the audio rendering device (104) for the purpose of determining or estimating delays or latencies from the media device (102) to the audio rendering device (104), continuously, from time to time, periodically, on-demand, over a plurality of time points, through polling, by emitting reporting events, etc.
  • the audio rendering device (104) can be configured to collect statistics related to audio rendering operations performed by the audio rendering device (104). For example, real-time or non-real-time statistics can be collected for processing received audio content and analyzed, by one or both of the media device (102) and the audio rendering device (104), to determine transmission rates, transmission delays, content processing delays, other delays or latencies, etc., as functions of time.
  • the transmission rates, transmission delays, content processing delays, other delays or latencies, etc., determined based on analyzing statistics can be updated, by one or both of the media device (102) and the audio rendering device (104), on a real-time or near-real-time basis, at the same time when the media device (102) and the audio rendering device (104) are processing and rendering audio content with the audio rendering device (104).
  • the audio stream decoder (302) comprises software, hardware, a combination of software and hardware, etc., configured to receive, from the media device (102), an audio data stream 214-1 comprising specific audio content to be rendered by the audio rendering device (104); decode a set of audio data units (e.g., audio samples, audio data blocks, audio chunks, etc.) from the audio data stream (214-1); extract a set of corresponding timestamps from the audio data stream (214-1), which are included in the audio data stream (214-1) by the media device (102) for the purpose of synchronizing the playing/rendering of the audio data stream (214-1) by the audio rendering device (104) with the playing/rendering of other content data streams (e.g., other audio data streams, video data streams, etc.) by other content rendering devices (e.g., other audio rendering devices, video rendering devices, etc.); etc.
  • the audio stream decoder (302) is further configured to send/forward the set of decoded audio data units and the set of extracted timestamps to the audio rendering module (306) for rendering.
  • the audio rendering module (306) comprises software, hardware, a combination of software and hardware, etc., configured to queue or store up to a certain number of decoded audio data units in memory (e.g., an audio jitter buffer, etc.) at any given time before rendering these decoded audio data units.
  • memory e.g., an audio jitter buffer, etc.
  • the audio rendering device (104) may be configured to generate digital drive values based on audio data in an audio data unit that is to be rendered at a specific time point (e.g., with reference to the common clock, etc.) specified by an individual timestamp in the set of timestamps, and to drive one or more audio transducers 308 (e.g., earphones, audio speakers, etc.) to generate, at the specific time point, sound waves that represent an audio content portion carried by the audio data unit.
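Read together, the two paragraphs above amount to a timestamp-ordered jitter buffer. A minimal sketch, assuming a clock offset that maps the local clock onto the common clock, is:

    import heapq
    import itertools
    import time

    class JitterBuffer:
        """Holds up to max_units decoded units and releases each one only
        when its timestamp (on the common clock) has arrived."""

        def __init__(self, max_units=64, clock_offset=0.0):
            self._heap = []
            self._tie = itertools.count()     # tie-breaker for equal timestamps
            self.max_units = max_units
            self.clock_offset = clock_offset  # local clock -> common clock

        def push(self, ts_seconds, unit):
            if len(self._heap) < self.max_units:
                heapq.heappush(self._heap, (ts_seconds, next(self._tie), unit))

        def pop_due(self):
            """Return the next unit whose render time has arrived, else None."""
            now = time.monotonic() + self.clock_offset
            if self._heap and self._heap[0][0] <= now:
                return heapq.heappop(self._heap)[2]
            return None

A rendering loop would call pop_due() at its output rate and hand each returned unit to the audio transducers (308).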
  • FIG. 4 illustrates an example video rendering device (e.g., 106 of FIG. 1A, etc.) comprising a video stream decoder 402, a video synchronization module 404, a video rendering module 406, etc.
  • the video synchronization module (404) comprises software, hardware, a combination of software and hardware, etc., configured to cooperate with a media device (e.g., 102 of FIG. 1A or FIG. 2, etc.) to synchronize a clock used by the video rendering device (106) for rendering its received video content with other clocks for content rendering by other content rendering devices to which the media device (102) transmits respective audio content or video content for synchronized rendering in reference to a common clock (used for content rendering).
  • the common clock can be based on an external clock external to the media device (102), an internal clock of the media device (102), a clock derived from one or more clocks of the media device (102), the audio rendering devices, or the video rendering devices, etc.
  • the video rendering device (106) may implement its part of a time synchronization algorithm, based on one or more protocols such as a network time protocol, used to synchronize the clocks (used for content rendering) of the media device (102) and/or the content rendering systems.
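One simple realization of such clock synchronization, assuming an NTP-style exchange in which the device holding the common clock answers time requests, is sketched below; request_common_time is a hypothetical callable standing in for that request/reply exchange.

    import time

    def estimate_clock_offset(request_common_time):
        """Estimate the offset between the local clock and the common clock,
        assuming the reply arrives roughly half a round trip after the
        common-clock time was read."""
        t0 = time.monotonic()
        common_time = request_common_time()   # seconds on the common clock
        t1 = time.monotonic()
        midpoint = t0 + (t1 - t0) / 2.0
        return common_time - midpoint         # add to local time -> common time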
  • the video rendering device (106), or the video synchronization module (404) therein is configured to communicate over a data path 212-2 with the media device (102) or the content synchronization unit (204) therein, so that the media device (102) can determine video rendering characteristics pertaining to the video rendering device (106).
  • the video rendering device (106), or the video synchronization module (404) therein can perform operations in conjunction with the media device (102) to run one or more bandwidth tests (e.g., by transmitting test data, by transmitting test files, test packets, etc.) on an end-to-end network path between the media device (102) and the video rendering device (106) for the purpose of determining or estimating a transmission rate from the media device (102) to the video rendering device (106), continuously, from time to time, periodically, on-demand, over a plurality of time points, through polling, by emitting reporting events, etc.
  • the video rendering device (106), or the video synchronization module (404) therein can perform operations in conjunction with the media device (102) to run one or more delay or latency tests (e.g., by transmitting test data, by transmitting test files, test packets, etc.) on the end-to-end network path between the media device (102) and the video rendering device (106) for the purpose of determining or estimating delays or latencies from the media device (102) to the video rendering device (106), continuously, from time to time, periodically, on-demand, over a plurality of time points, through polling, by emitting reporting events, etc.
  • the video rendering device (106) can be configured to collect statistics related to video rendering operations performed by the video rendering device (106). For example, real-time or non-real-time statistics can be collected for processing received video content and analyzed, by one or both of the media device (102) and the video rendering device (106), to determine transmission rates, transmission delays, content processing delays, other delays or latencies, etc., as functions of time.
  • the transmission rates, transmission delays, content processing delays, other delays or latencies, etc., determined based on analyzing statistics can be updated, by one or both of the media device (102) and the video rendering device (106), on a real-time or near-real-time basis, at the same time when the media device (102) and the video rendering device (106) are processing and rendering video content with the video rendering device (106).
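A crude version of the bandwidth test mentioned above can be sketched as streaming a known amount of test data and dividing by the elapsed time; the TCP test endpoint on the rendering device is an assumption of this sketch, not an interface defined by the disclosure.

    import socket
    import time

    def measure_transmission_rate(host, port, test_bytes=512 * 1024):
        """Return an estimated transmission rate in bytes per second by
        timing how long a block of test data takes to send."""
        data = b"\x00" * test_bytes
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5.0) as sock:
            sock.sendall(data)
        elapsed = time.monotonic() - start
        return test_bytes / elapsed if elapsed > 0 else 0.0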
  • the video stream decoder (402) comprises software, hardware, a combination of software and hardware, etc., configured to receive, from the media device (102), a video data stream 214-2 comprising specific video content to be rendered by the video rendering device (106); decode a set of video frames (or video chunks) from the video data stream (214-2); extract a set of corresponding timestamps from the video data stream (214-2), which are included in the video data stream (214-2) by the media device (102) for the purpose of synchronizing the playing/rendering of the video data stream (214-2) by the video rendering device (106) with the playing/rendering of other content data streams (e.g., audio data streams, other video data streams, etc.) by other content rendering devices (e.g., audio rendering devices, other video rendering devices, etc.); etc.
  • the video stream decoder (402) is further configured to send/forward the set of decoded video frames and the set of corresponding timestamps to the video rendering module (406) for rendering.
  • the video rendering module (406) comprises software, hardware, a combination of software and hardware, etc., configured to queue or store up to a certain number of decoded video frames in memory (e.g., a video jitter buffer, etc.) at any given time before rendering these decoded video frames.
  • the video rendering device (106) may be configured to generate digital drive values based on video data in a video frame that is to be rendered at a specific time point (e.g., with reference to the common clock, etc.) specified by an individual timestamp in the set of timestamps, and drives a display 408 (e.g., a liquid crystal display, a plasma display, etc.) to generate, at the specific time point, an image that represents a video content portion in the video frame.
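The frame presentation behavior just described can be condensed into a small pacing loop; display is a stand-in for driving the actual display (408), and clock_offset maps local time onto the common clock, both assumptions of this sketch.

    import time

    def present_frames(timestamped_frames, clock_offset=0.0, display=print):
        """Present each (ts_seconds, frame) when the common clock reaches
        ts_seconds; late frames are shown immediately."""
        for ts_seconds, frame in sorted(timestamped_frames, key=lambda p: p[0]):
            now = time.monotonic() + clock_offset
            if ts_seconds > now:
                time.sleep(ts_seconds - now)  # hold the frame until it is due
            display(frame)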
  • FIG. 5A illustrates an example process flow according to an example embodiment of the present invention.
  • one or more computing devices or components may perform this process flow.
  • a media device (e.g., 102 of FIG. 1A, FIG. 1B, or FIG. 2, etc.) determines, for an audio rendering device, an audio transmission rate and an audio rendering latency.
  • the media device determines, for a video rendering device, a video transmission rate and a video rendering latency.
  • the media device determines, based at least in part on the audio rendering latency and the video rendering latency, a combined audio and video rendering latency for both the audio rendering device and the video rendering device.
  • the media device transmits a portion of audio content of audiovisual data to the audio rendering device at the audio transmission rate.
  • the portion of the audio content comprises a set of timestamps for rendering audio data in the portion of the audio content; the set of timestamps is determined based at least in part on the combined audio and video rendering latency.
  • the media device transmits a portion of video content of the audiovisual data to the video rendering device at the video transmission rate.
  • the portion of the video content comprises the set of timestamps for rendering video data in the portion of the video content.
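The determinations and transmissions of this process flow can be summarized in a few lines of Python. Note one loud assumption: the disclosure only says the combined latency is based at least in part on the individual latencies, so the worst-case (maximum) rule below is one natural reading, not the mandated formula.

    def combined_rendering_latency(audio_latencies, video_latencies):
        """Combine per-device latencies; the max is assumed here so that
        even the slowest device can render on time."""
        return max(list(audio_latencies) + list(video_latencies))

    def output_timestamps(input_timestamps, combined_latency, clock_now):
        """Shift source presentation times so the first unit is scheduled
        combined_latency seconds in the future on the common clock."""
        base = clock_now + combined_latency
        first = input_timestamps[0]
        return [base + (ts - first) for ts in input_timestamps]

Because the same output timestamps are placed in both the audio and the video streams, every device that honors them renders in lockstep regardless of its individual path delay.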
  • the media device is further configured to perform: separating the audiovisual data into an audio data component as the audio content and a video data component as the video content; encoding the audio data component into one or more audio data streams; encoding the video data component into one or more video data streams; etc.
  • An audio data stream in the one or more audio data streams comprises the portion of the audio content, and is streamed to the audio rendering device at the audio transmission rate.
  • a video data stream in the one or more video data streams comprises the portion of the video content, and is streamed to the video rendering device at the video transmission rate.
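The separation step just described can be pictured with a toy demultiplexer; a real implementation would parse a container format such as MPEG-TS or MP4, which this sketch abstracts into (kind, timestamp, payload) tuples.

    def separate_av(units):
        """Split demuxed units into an audio component and a video
        component, preserving per-unit timestamps."""
        audio, video = [], []
        for kind, ts, payload in units:
            (audio if kind == "audio" else video).append((ts, payload))
        return audio, video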
  • the portion of the audio content and the portion of the video content may be transmitted over the same network such as a local network, a non-local network, etc. In some embodiments, the portion of the audio content and the portion of the video content may be transmitted over two different networks such as two different local networks, one local network and one non-local network, two non-local networks, etc.
  • the portion of the audio content is transmitted from a media device to the audio rendering device over a Bluetooth connection.
  • the portion of the video content is transmitted from the media device to the video rendering device over a non-Bluetooth connection.
  • the audio rendering device and the video rendering device are synchronized to a common clock for content rendering; the set of timestamps comprises timestamps to be interpreted in reference to the common clock.
  • the audio rendering device is among a plurality of audio rendering devices for each of which an individual audio transmission rate and an individual audio rendering latency are determined; the combined audio and video rendering latency is determined for the plurality of audio rendering devices and the video rendering device based at least in part on individual audio rendering latencies of the plurality of audio rendering devices and the video rendering latency; the media device is further configured to transmit a second portion of the audio content of the audiovisual data to a second audio rendering device in the plurality of audio rendering devices at a second audio transmission rate, wherein the second portion of the audio content comprises the set of timestamps.
  • the video rendering device is among a plurality of video rendering devices for each of which an individual video transmission rate and an individual video rendering latency are determined; the combined audio and video rendering latency is determined for the audio rendering device and the plurality of video rendering devices based at least in part on the audio rendering latency of the audio rendering device and individual video rendering latencies of the plurality of video rendering devices; the media device is further configured to transmit a second portion of the video content of the audiovisual data to a second video rendering device in the plurality of video rendering devices at a second video transmission rate, wherein the second portion of the video content comprises the set of timestamps.
  • the audiovisual data comprises one of movies, media programs, television programs, etc.
  • the audiovisual data is received from one or more of media files stored in locally or remotely accessible storage media, or media data streams from one or more media streaming servers.
  • the media device is further configured to perform: determining a set of input timestamps from the audiovisual data; generating, based on the set of input timestamps and the combined audio and video rendering latency, the set of timestamps.
  • the audio rendering latency for the audio rendering device comprises one or more of transmission delays, content processing delays, network-specific latencies, device-specific latencies, or component-specific latencies, as related to the audio rendering device.
  • the audio transmission rate, the video transmission rate, and the combined audio and video rendering latency are determined based on contemporaneous rendering characteristics for a first time point; the media device is further configured to perform: determining, for the audio rendering device, a subsequent audio transmission rate and a subsequent audio rendering latency for a second time point subsequent to the first time; determining, for the video rendering device, a subsequent video transmission rate and a subsequent video rendering latency for the second time point; determining, based at least in part on the subsequent audio rendering latency and the subsequent video rendering latency, a subsequent combined audio and video rendering latency for both the audio rendering device and the video rendering device; transmitting a subsequent portion of the audio content of the audiovisual data to the audio rendering device at the subsequent audio transmission rate; transmitting a subsequent portion of the video content of the audiovisual data to the video rendering device at the subsequent video transmission rate.
  • the subsequent portion of the audio content comprises a subsequent set of timestamps for rendering audio data in the subsequent portion of the audio content.
  • the subsequent set of timestamps is determined based at least in part on the subsequent combined audio and video rendering latency.
  • the subsequent portion of the video content comprises the subsequent set of timestamps for rendering video data in the subsequent portion of the video content.
  • the audio transmission rate, the video transmission rate, and the combined audio and video rendering latency are determined, while the audio content of the audiovisual data is being streamed to one or more audio rendering devices that include the audio rendering device and the video content of the audiovisual data is being streamed to one or more video rendering devices that include the video rendering device.
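The re-determination described in the last few paragraphs might be organized as a simple loop running alongside streaming; measure, apply_settings, and still_streaming are hypothetical callables supplied by the media device implementation, assumed for this sketch.

    import time

    def adapt_while_streaming(measure, apply_settings,
                              interval_seconds=2.0,
                              still_streaming=lambda: True):
        """Periodically re-measure rates/latencies during playback and feed
        the updated values (and hence updated timestamps) back into the
        ongoing transmission."""
        while still_streaming():
            apply_settings(measure())
            time.sleep(interval_seconds)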
  • the method is performed by a media device; the portion of the audio content is transmitted to the audio rendering device in a single media data stream between the media device and the audio rendering device.
  • the method is performed by a media device; the portion of the video content is transmitted to the video rendering device in a single media data stream between the media device and the video rendering device.
  • the method is performed by a media device; the portion of the video content is transmitted to the video rendering device in a first media data sub-stream of a media data stream between the media device and the video rendering device; and a second media data sub-stream of the media data stream comprises a second portion of the audio content streamed from the media device to the video rendering device.
  • FIG. 5B illustrates an example process flow according to an example embodiment of the present invention.
  • one or more computing devices or components may perform this process flow.
  • a content rendering device (e.g., 104 of FIG. 1A or FIG. 3, 104-1 or 104-2 of FIG. 1B, 106 of FIG. 1A or FIG. 4, 106-1 of FIG. 1B, etc.) communicates with a media device to determine a content transmission rate for the content rendering device.
  • the content rendering device receives a portion of media content of audiovisual data at the content transmission rate.
  • the portion of the media content comprises a set of timestamps for rendering media data in the portion of the media content.
  • the set of timestamps is determined based at least in part on a combined audio and video rendering latency determined by the media device.
  • the content rendering device decodes one or more content data units from the portion of the media content.
  • the content rendering device renders the one or more content data units at one or more time points as determined based on one or more time stamps in the set of timestamps.
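Taken together, the receive/decode/render steps above reduce to a short device-side loop; decode_units and render stand in for the device's codec and output stages, and clock_offset for the synchronization step, all assumptions of this sketch.

    import time

    def render_portion(portion, decode_units, render, clock_offset=0.0):
        """Decode timestamped units from a received media portion and hand
        each to the output stage when its common-clock time arrives."""
        for ts_seconds, unit in decode_units(portion):
            now = time.monotonic() + clock_offset
            if ts_seconds > now:
                time.sleep(ts_seconds - now)
            render(unit)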
  • the content rendering device represents an audio rendering device, and wherein the media content comprises audio-only content.
  • the content rendering device represents a video rendering device, and wherein the media content comprises video-only content.
  • the content rendering device represents a combined audio and video rendering device, and wherein the media content comprises audio and video content.
  • the content rendering device communicates with the media device over an end-to-end network path over one or more of Wi-Fi networks, Bluetooth networks, infrared links, CDMA networks, GSM networks, HDMI video links, non-HDMI video links, radio frequency (RF) networks, optical links, etc.
  • the content rendering device is an audio rendering device in a plurality of audio rendering devices, wherein the plurality of audio rendering devices together form a multi-channel audio rendering system, and wherein the media content as received by the audio rendering device is a sub-mix of a mix of multi-channel audio data.
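Deriving such a sub-mix can be as simple as selecting channels out of interleaved samples; the interleaved layout and channel indices below are illustrative assumptions, not a format defined by this disclosure.

    def submix(interleaved_samples, channel_count, wanted_channels):
        """Return only the wanted channels from an interleaved multi-channel
        sample sequence, frame by frame."""
        frames = (interleaved_samples[i:i + channel_count]
                  for i in range(0, len(interleaved_samples), channel_count))
        return [tuple(frame[c] for c in wanted_channels) for frame in frames]

For example, submix(samples, 6, (4, 5)) would keep just the surround pair of a 5.1 mix for a rear speaker device.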
  • the one or more time points are determined in reference to a common clock (used for content rendering) based on the one or more time stamps in the set of timestamps; the content rendering device is further configured to communicate with the media device to determine the common clock to be used by the content rendering device for synchronized content rendering with one or more other content rendering devices.
  • the one or more other content rendering devices use the common clock for synchronized content rendering.
  • a system, an apparatus, or one or more other computing devices performs any or a part of the foregoing methods as described.
  • a non-transitory computer readable storage medium storing software instructions, which when executed by one or more processors cause performance of any of the foregoing methods.
  • a computing device comprising one or more processors and one or more storage media storing a set of instructions which, when executed by the one or more processors, cause performance of any of the foregoing methods.
  • the techniques described herein are implemented by one or more special-purpose computing devices.
  • the special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.
  • Such special-purpose computing devices may also combine custom hard-wired logic, ASICs, or FPGAs with custom programming to accomplish the techniques.
  • the special-purpose computing devices may be desktop computer systems, portable computer systems, handheld devices, networking devices or any other device that incorporates hard-wired and/or program logic to implement the techniques.
  • FIG. 6 is a block diagram that illustrates a computer system 600 upon which an example embodiment of the invention may be implemented.
  • Computer system 600 includes a bus 602 or other communication mechanism for communicating information, and a hardware processor 604 coupled with bus 602 for processing information.
  • Hardware processor 604 may be, for example, a general purpose microprocessor.
  • Computer system 600 also includes a main memory 606, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 602 for storing information and instructions to be executed by processor 604.
  • Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604.
  • Such instructions when stored in non-transitory storage media accessible to processor 604, render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • Computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604.
  • A storage device 610, such as a magnetic disk or optical disk, is provided and coupled to bus 602 for storing information and instructions.
  • Computer system 600 may be coupled via bus 602 to a display 612, such as a liquid crystal display, for displaying information to a computer user.
  • An input device 614 is coupled to bus 602 for communicating information and command selections to processor 604.
  • Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612.
  • This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • Computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 600 in response to processor 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor 604 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610.
  • Volatile media includes dynamic memory, such as main memory 606.
  • Common forms of storage media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge.
  • Storage media is distinct from but may be used in conjunction with transmission media.
  • Transmission media participates in transferring information between storage media.
  • transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 602.
  • transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
  • Various forms of media may be involved in carrying one or more sequences of one or more instructions to processor 604 for execution.
  • the instructions may initially be carried on a magnetic disk or solid state drive of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to computer system 600 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
  • An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 602.
  • Bus 602 carries the data to main memory 606, from which processor 604 retrieves and executes the instructions.
  • the instructions received by main memory 606 may optionally be stored on storage device 610 either before or after execution by processor 604.
  • Computer system 600 also includes a communication interface 618 coupled to bus 602.
  • Communication interface 618 provides a two-way data communication coupling to a network link 620 that is connected to a local network 622.
  • communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links may also be implemented.
  • communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • Network link 620 typically provides data communication through one or more networks to other data devices.
  • network link 620 may provide a connection through local network 622 to a host computer 624 or to data equipment operated by an Internet Service Provider (ISP) 626.
  • ISP 626 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 628.
  • Internet 628 uses electrical, electromagnetic or optical signals that carry digital data streams.
  • the signals through the various networks and the signals on network link 620 and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.
  • Computer system 600 can send messages and receive data, including program code, through the network(s), network link 620 and communication interface 618.
  • a server 630 might transmit a requested code for an application program through Internet 628, ISP 626, local network 622 and communication interface 618.
  • the received code may be executed by processor 604 as it is received, and/or stored in storage device 610, or other non-volatile storage for later execution.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An audio transmission rate and an audio rendering latency are determined for an audio rendering device. Similarly, a video transmission rate and a video rendering latency are determined for a video rendering device. Based on these transmission rates and latencies, a combined audio and video rendering latency is determined. The audio content of audiovisual data is transmitted to the audio rendering device at the audio transmission rate and includes timestamps for rendering the audio content. The timestamps are determined based in part on the combined audio and video rendering latency. The video content of the audiovisual data is transmitted to the video rendering device at the video transmission rate and includes the same timestamps for rendering the video content. The audio or video rendering device decodes content data units from the audio or video content and renders the content data units at time points based on the timestamps.
PCT/CN2014/082390 2014-07-17 2014-07-17 Techniques for separately playing audio and video data in local networks WO2016008131A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/082390 WO2016008131A1 (fr) 2014-07-17 2014-07-17 Techniques for separately playing audio and video data in local networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/082390 WO2016008131A1 (fr) 2014-07-17 2014-07-17 Techniques for separately playing audio and video data in local networks

Publications (1)

Publication Number Publication Date
WO2016008131A1 (fr) 2016-01-21

Family

ID=55077827

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/082390 WO2016008131A1 (fr) 2014-07-17 2014-07-17 Techniques for separately playing audio and video data in local networks

Country Status (1)

Country Link
WO (1) WO2016008131A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101036389A (zh) * 2004-09-02 2007-09-12 Sony Corporation Content receiver, video-audio output timing control method, and content supply system
CN101267572A (zh) * 2008-04-30 2008-09-17 ZTE Corporation Method and apparatus for program stream conversion
EP2334049A2 * 2009-12-14 2011-06-15 QNX Software Systems GmbH & Co. KG Synchronization of video presentation by video cadence modification
CN103905878A (zh) * 2014-03-13 2014-07-02 Beijing QIYI Century Science & Technology Co., Ltd. Method, apparatus and device for synchronously playing video data and audio data

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018106447A1 * 2016-12-09 2018-06-14 Arris Enterprises Llc Calibration device, method and program for achieving synchronization between audio and video data when using Bluetooth audio devices
US10892833B2 (en) 2016-12-09 2021-01-12 Arris Enterprises Llc Calibration device, method and program for achieving synchronization between audio and video data when using Bluetooth audio devices
US11329735B2 (en) 2016-12-09 2022-05-10 Arris Enterprises Llc Calibration device, method and program for achieving synchronization between audio and video data when using short range wireless audio devices
JP2020522146A (ja) * 2017-02-06 2020-07-27 Savant Systems LLC A/V interconnection architecture including audio down-mixing transmitter A/V endpoint and distributed channel amplification
JP7144429B2 (ja) 2017-02-06 2022-09-29 Savant Systems Inc. A/V interconnection architecture including audio down-mixing transmitter A/V endpoint and distributed channel amplification
CN113610699A (zh) * 2021-07-19 2021-11-05 广州致远电子有限公司 Hardware layer rendering scheduling method, apparatus, device and storage medium
CN114286149A (zh) * 2021-12-31 2022-04-05 广东博华超高清创新中心有限公司 Method and system for synchronized audio and video rendering across devices and systems
CN114286149B (zh) 2021-12-31 2023-07-07 广东博华超高清创新中心有限公司 Method and system for synchronized audio and video rendering across devices and systems
CN114979766A (zh) * 2022-05-11 2022-08-30 深圳市大头兄弟科技有限公司 Audio and video synthesis method, apparatus, device and storage medium
CN114979766B (zh) 2022-05-11 2023-11-21 深圳市闪剪智能科技有限公司 Audio and video synthesis method, apparatus, device and storage medium
CN115022731A (zh) * 2022-05-17 2022-09-06 NIO Technology (Anhui) Co., Ltd. In-vehicle movie viewing system, in-vehicle movie viewing method and computer storage medium

Similar Documents

Publication Publication Date Title
US11627351B2 (en) Synchronizing playback of segmented video content across multiple video playback devices
JP7120997B2 (ja) 2022-08-17 Multi-mode synchronized rendering of audio and video
JP7284906B2 (ja) 2023-05-31 Delivery and playback of media content
JP2023083353A (ja) 2023-06-15 Playback method and playback device
WO2016008131A1 (fr) 2016-01-21 Techniques for separately playing audio and video data in local networks
US20210352345A1 (en) Video distribution synchronization
US11611788B2 (en) Adaptive switching in a whole home entertainment system
CN108810656B (zh) 2022-01-11 De-jitter processing method and processing system for real-time live TS streams
US9131271B2 (en) Systems and methods for real-time adaptation of multimedia data
JP2014131142A (ja) 2014-07-10 Receiving device, receiving method, and program; imaging device, imaging method, and program; transmitting device, transmitting method, and program
CN114554277B (zh) 2024-06-21 Multimedia processing method, apparatus, server and computer-readable storage medium
CN114257771A (zh) 2022-03-29 Recording and playback method, apparatus, storage medium and electronic device for multi-channel audio and video
CN110392285B (zh) 2021-10-01 Media stream processing method and apparatus
KR101810883B1 (ko) 2017-12-20 Live streaming system and streaming client thereof
JP4373802B2 (ja) 2009-11-25 Program transmission method, program transmission device, program transmission system, and program transmission program
GB2596107A (en) Managing network jitter for multiple audio streams
CN113727183A (zh) 2021-11-30 Live stream pushing method, apparatus, device, storage medium and computer program product
KR20170039916A (ko) 2017-04-12 Cloud server for processing application execution, media playback device, and computer program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14897597

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14897597

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 25/08/2017)
