WO2021243044A1 - Methods and systems for synchronizing multimedia - Google Patents

Methods and systems for synchronizing multimedia

Info

Publication number
WO2021243044A1
Authority
WO
WIPO (PCT)
Prior art keywords
user device
content
user devices
user
latency
Prior art date
Application number
PCT/US2021/034558
Other languages
French (fr)
Inventor
Zachery Nelson
Original Assignee
Let's Jam, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Let's Jam, Llc filed Critical Let's Jam, Llc
Publication of WO2021243044A1 publication Critical patent/WO2021243044A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80Responding to QoS
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242Synchronization processes, e.g. processing of PCR [Program Clock References]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27Server based end-user applications
    • H04N21/274Storing end-user multimedia data in response to end-user request, e.g. network recorder
    • H04N21/2743Video hosting of uploaded data from client
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43076Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of the same content streams on multiple devices, e.g. when family members are watching the same movie on different devices
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/454Content or additional data filtering, e.g. blocking advertisements
    • H04N21/4545Input to filtering algorithms, e.g. filtering a region of the image
    • H04N21/45455Input to filtering algorithms, e.g. filtering a region of the image applied to a region of the image
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N21/8113Monomedia components thereof involving special audio data, e.g. different tracks for different languages comprising music, e.g. song in MP3 format
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04JMULTIPLEX COMMUNICATION
    • H04J3/00Time-division multiplex systems
    • H04J3/02Details
    • H04J3/06Synchronising arrangements
    • H04J3/062Synchronisation of signals having the same nominal but fluctuating bit rates, e.g. using buffers
    • H04J3/0632Synchronisation of packets and cells, e.g. transmission of voice via a packet network, circuit emulation service [CES]

Definitions

  • a user device e.g., computing device, smart device, mobile device, multimedia capturing device, etc.
  • determine/capture multimedia e.g., video, audio, etc.
  • Multimedia determined/captured by multiple user devices that indicates the actions/performance of multiple performers may be combined to form a synchronized composite performance with minimal latency and delay between performers.
  • a computing device e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.
  • data/information e.g., visual markers, pixels, metadata, etc.
  • a computing device may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams.
  • Figure 1 shows an example system
  • Figure 2 shows an example display
  • Figure 3 shows an example display
  • Figure 4 shows a flowchart of an example method
  • Figure 5 shows a flowchart of an example method
  • Figure 6 shows a block diagram of a computing device for implementing example methods.
  • a computer program product on a computer-readable storage medium (e.g., non-transitory) having processor-executable instructions (e.g., computer software) embodied in the storage medium.
  • processor-executable instructions e.g., computer software
  • Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, magnetic storage devices, memristors, Non-Volatile Random Access Memory (NVRAM), flash memory, or a combination thereof.
  • NVRAM Non-Volatile Random Access Memory
  • processor-executable instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the processor-executable instructions stored in the computer-readable memory produce an article of manufacture including processor-executable instructions for implementing the function specified in the flowchart block or blocks.
  • the processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the processor-executable instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
  • blocks of the block diagrams and flowcharts support combinations of devices for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, may be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
  • Content items may also be referred to as “content,” “content data,” “content information,” “content asset,” “multimedia asset data file,” or simply “data” or “information”.
  • Content items may be any information or data that may be licensed to one or more individuals (or other entities, such as businesses or groups).
  • Content may be electronic representations of video, audio, text, and/or graphics, which may include, but is not limited to, electronic representations of videos, movies, or other multimedia, which may include, but is not limited to, data files adhering to MPEG2, MPEG, MPEG4 UHD, HDR, 4K, Adobe® Flash® Video (.FLV) format, or some other video file format, whether such format is presently known or developed in the future.
  • FLV Adobe® Flash® Video
  • the content items described herein may be electronic representations of music, spoken words, or other audio, which may include, but is not limited to, data files adhering to the MPEG-1 Audio Layer 3 (.MP3) format, Adobe®, CableLabs 1.0, 1.1, 3.0, AVC, HEVC, H.264, Nielsen watermarks, V-chip data and Secondary Audio Programs (SAP), Sound Document (.ASND) format, or some other format configured to store electronic audio, whether such format is presently known or developed in the future.
  • .MP3 MPEG-1 Audio Layer 3
  • SAP Secondary Audio Programs
  • content may be data files adhering to the following formats: Portable Document Format (.PDF), Electronic Publication (.EPUB) format created by the International Digital Publishing Forum (IDPF), JPEG (JPG) format, Portable Network Graphics (.PNG) format, dynamic ad insertion data (.csv), Adobe® Photoshop® (.PSD) format or some other format for electronically storing text, graphics and/or other information whether such format is presently known or developed in the future.
  • PDF Portable Document Format
  • .EPUB Electronic Publication
  • IDPF International Digital Publishing Forum
  • JPG JPEG
  • .PNG Portable Network Graphics
  • .csv dynamic ad insertion data
  • .PSD Adobe® Photoshop®
  • Consuming content or the “consumption of content,” as those phrases are used herein, may also be referred to as “accessing” content, “providing” content, “viewing” content, “listening” to content, “rendering” content, or “playing” content, among other things. In some cases, the particular term utilized may be dependent on the context in which it is used. Consuming video may also be referred to as viewing or playing the video. Consuming audio may also be referred to as listening to or playing the audio.
  • a user device e.g., computing device, smart device, mobile device, multimedia capturing device, etc.
  • determine/capture multimedia e.g., video, audio, etc.
  • Multimedia determined/captured and/or output/streamed by multiple user devices that indicate the actions/performance of multiple performers may be combined to form a synchronized composite performance with minimal latency and delay between performers.
  • a computing device may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams. For example, may determine communication timing between each user device to identify a user device associated with the most and/or greatest degree of latency. The most and/or greatest degree of latency may be used to determine a respective offset time value for each user device that may be used to vary (e.g., variably delay, etc.) the timing of multimedia received from the user device.
  • data/information e.g., visual markers, pixels, metadata, etc.
  • the most and/or greatest degree of latency may be used to determine a respective offset time value for each user device that may be used to vary (e.g., variably delay, etc.) the timing of multimedia received from the user device.
  • the computing device may combine a locally-playing and variably-delayed media, such as a background audio track, with variably-delayed multimedia streams from each user device to produce a real-time composite multimedia performance.
  • a locally-playing and variably-delayed media such as a background audio track
  • Each user device may modify aspects of the composite multimedia performance. For example, a user device may be used to modify/adjust display settings, such as scene transitions, color patterns, audio levels, video quality, offset time values, and/or any other attribute of and/or associated with the composite multimedia performance.
  • the composite multimedia performance may be sent to and/or published via multimedia streaming platform and/or social media platform for consumption by multiple audiences.
  • FIG. 1 shows an example system 100 for synchronizing multimedia.
  • the system 100 facilitates the synchronization of multimedia streams of one or more musicians, singers, entertainment performers, and/or the like.
  • the synchronized multimedia streams may be used to generate a composite multimedia stream that minimizes latency and/or delay between devices associated with one or more musicians, singers, entertainment performers, and/or the like.
  • the composite multimedia stream may be output and/or presented, for example, in real-time, to a wide variety of audiences, platforms, multimedia sharing systems, and/or the like.
  • components of the system 100 may include hardware, software (e.g., one or more applications, etc.), combinations thereof, and/or the like.
  • the system may include a computing device 101 (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) in communication with a network 106.
  • the network 106 may include and/or support a wide area network (e.g., a packet-switched network, the Internet, etc.), a multimedia hosting and/or communication network, a local area network (LAN), a content delivery/access network, a cellular network, a satellite network, a fiber-optic network, a wired network, a wireless network, and/or the like.
  • the network 106 may support any network path, protocol, device, and/or the like.
  • the computing device 101 may support and/or host multimedia conferences, virtual meetings, multimedia exchange/sharing/compositing, and/or the like between one or more user devices (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) coupled to and/or in communication with the network 106, such as user devices 102, 103, 104, and 105.
  • user devices e.g., computing device, smart device, mobile device, multimedia capturing device, etc.
  • the network 106, such as user devices 102, 103, 104, and 105.
  • the user device 102 is shown in greater detail than the user devices 103-105. However, each of the user devices 102-105 may be configured the same/similarly and may operate and perform the same and/or similar functions.
  • the user device 102 may include an imaging module 107.
  • the imaging module 107 may include one or more image capturing devices, such as one or more cameras that determine/capture image data (e.g., video, streaming media, etc.).
  • the imaging module 107 may capture/determine actions, such as a real-time performance, of a user 120.
  • the user 120 may include a musician, a singer, an entertainment performer, and/or the like.
  • the user devices 103-105 may each include imaging modules that capture/determine actions, such as real-time performances, of users 121-123, respectively.
  • the user device 102 may include an interface module 108.
  • the interface module 108 may include one or more interfaces for presenting and/or receiving information to/from a user, such as a multimedia representation of a performance, a composite performance, interactive and/or enhanced data/information, and/or the like.
  • the interface module 108 may include any software, hardware, and/or interfaces used to provide communication between the user device 102 and/or any other device/component of the system 100.
  • the interface module 108 may include one or more audio devices (e.g., stereos, speakers, microphones, etc.).
  • the interface module 108 may include a graphical user interface (GUI), a web browser (e.g., Internet Explorer®, Mozilla Firefox®, Google Chrome®, Safari®, or the like), an application/API.
  • GUI graphical user interface
  • the interface module 108 may request and/or query various files from a local source and/or a remote source, such as the computing device 101, the user devices 102-105, and/or any other device/component (not shown) associated with the system 100.
  • the user device 102 may include a communication module 109.
  • the communication module 109 may enable the user device 102 to communicate with components of the system 100, such as the computing device 101 and/or the user devices 103-105, via wired and/or wireless communication techniques.
  • the communication module 109 may utilize any suitable wired communication technique, such as Ethernet, coaxial cable, fiber optics, and/or the like.
  • the communication module 109 may utilize any suitable long-range communication technique, such as Wi-Fi (IEEE 802.11), BLUETOOTH®, cellular (e.g., fourth-generation (4G) cellular technology, fifth-generation (5G) cellular technology, any-generation (XG) cellular technology, etc.), satellite, infrared, and/or the like.
  • the communication module 109 may utilize any suitable short-range communication technique, such as BLUETOOTH®, near-field communication, infrared, and the like.
  • the user devices 102-105 may exchange data/information, such as multimedia data, performance data, and/or the like with the computing device 101.
  • the computing device 101 may receive multimedia data from two or more of the user devices 102-105 and generate a composite multimedia stream.
  • the computing device 101 may include a communication module 111 to facilitate communication between the user devices 102-105, such as the exchange of multimedia data/information.
  • the communication module 111 may enable the computing device 101 to communicate with components of the system 100, such as the user devices 102-105, via wired and/or wireless communication techniques.
  • the communication module 111 may utilize any suitable wired communication technique, such as Ethernet, coaxial cable, fiber optics, and/or the like.
  • the communication module 111 may utilize any suitable long-range communication technique, such as Wi-Fi (IEEE 802.11), BLUETOOTH®, cellular (e.g., fourth-generation (4G) cellular technology, fifth-generation (5G) cellular technology, any-generation (XG) cellular technology, etc.), satellite, infrared, and/or the like.
  • the communication module 111 may utilize any suitable short-range communication technique, such as BLUETOOTH®, near-field communication, infrared, and the like.
  • the computing device 101 may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each of the user devices 102-105 to determine the timing and/or latency information associated with the user devices 102-105, and use the timing and/or latency information to synchronize the multimedia streams received from the user devices 102-105.
  • data/information e.g., visual markers, pixels, metadata, etc.
  • the computing device 101 may include a synchronization module 110.
  • the computing device 101 may send, for example, via the synchronization module 110, a signal, a ping, a test communication, and/or the like to each of the user devices 102-105 to cause and/or elicit a response from each of the user devices 102-105, such as a response signal, a pong, a testing communication, and/or the like.
  • the computing device 101 may initiate one or more timers to record how long it takes for each of the user devices 102-105 to send a response.
  • Each of the user devices 102-105 may be configured with a telemetry component and/or an application that facilitates the generation of responses to signals, pings, test communications, and/or the like.
  • the user devices 102-105 may generate a response to signals, pings, test communications, and/or the like from the computing device 101, for example, via metadata embedded in a media (e.g., audio, video, multimedia, etc.) stream sent to the computing device 101, via manipulation of pixels and/or visual markers that may be detected by the computing device 101 within video information (e.g., a video stream, etc.) sent to the computing device 101, and/or the like.
  • a media e.g., audio, video, multimedia, etc.
  • video information e.g., a video stream, etc.
  • responses to signals, pings, test communications, and/or the like from the computing device 101 that are sent via a media stream may enable a real-time and accurate determination of latency between the user devices and the computing device 101, since the response propagates along the same communication path as the media stream.
  • the user device and/or computing device 101 may, for example, determine/calculate a color or colors to which to change the pixels by first determining the current pixel color in a displayed region of interest at the user device, the computing device 101, or both.
  • the user device may send a signal, message, notification, and/or the like to the computing device 101 to indicate the current color of the pixels in the region of interest such that a visually opposite or sufficiently different color range of pixels can be used, detected, and/or determined by the computing device 101 during a ping/pong cycle and/or latency determination exercise.
  • the computing device 101 may send a signal, message, notification, and/or the like to the user device to indicate the region of interest and appropriate color(s) to use during the ping/pong cycle to facilitate detection by the computing device 101.
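  • For illustration only (not from the patent), a minimal sketch of how a device might pick a marker color for the region of interest; the averaging and complement helpers below are assumptions:

```python
# Illustrative only: pick a marker color that contrasts with the current
# region-of-interest (ROI) color so the change is easy to detect remotely.
# Neither function name comes from the patent.

def average_roi_color(pixels):
    """Average RGB color of the ROI, given an iterable of (r, g, b) tuples."""
    n = 0
    totals = [0, 0, 0]
    for r, g, b in pixels:
        totals[0] += r
        totals[1] += g
        totals[2] += b
        n += 1
    return tuple(t // max(n, 1) for t in totals)

def contrasting_color(rgb):
    """Per-channel complement: one simple 'visually opposite' choice."""
    return tuple(255 - c for c in rgb)

current = average_roi_color([(10, 20, 200), (12, 25, 190)])
marker = contrasting_color(current)  # a color far from the current ROI color
print(current, marker)
```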
  • the computing device 101 may stop the timer associated with the user device (e.g., the user devices 102-105, etc.) sending the response and determine/calculate the latency and/or response time between the user device and the computing device 101.
  • the computing device may use the following algorithm, PongReceivedTime - PingSentTime, to determine the latency. Since latency within the network 106 may vary from time to time, the computing device 101 may, for example, repeat the described ping/pong process several times to achieve a consensus of the latency determination (e.g., an average latency value per performer, etc.).
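  • As a hedged illustration of this ping/pong timing loop, the sketch below takes the latency as PongReceivedTime - PingSentTime and repeats the exchange several times so a consensus value smooths out transient jitter; send_ping() and wait_for_pong() are hypothetical stand-ins for the transport:

```python
# Hedged sketch of the ping/pong latency measurement; send_ping() and
# wait_for_pong() are hypothetical stand-ins for sending the test signal and
# detecting the device's response in its media stream.

import time
import statistics

def measure_latency(device, samples=5):
    measurements = []
    for _ in range(samples):
        ping_sent_time = time.monotonic()
        device.send_ping()                       # emit the ping/test signal
        device.wait_for_pong()                   # block until the response is seen
        pong_received_time = time.monotonic()
        measurements.append(pong_received_time - ping_sent_time)
    # Repeating the exchange and taking the median gives a consensus value
    # that smooths out transient network jitter.
    return statistics.median(measurements)
```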
  • the computing device 101 may determine a user device of the user devices 102-105 associated with the greatest and/or highest amount of latency between the user device and the computing device 101. For example, the computing device 101 may determine, tag, and/or indicate the greatest and/or highest amount of latency between the user device and the computing device 101 as a WorstLatencyPerformer. The computing device 101 may use the latency associated with the WorstLatencyPerformer to determine/calculate a latency offset for each of the user devices 102-105 (e.g., a performance/action of the users 120-123 captured respectively by the user devices 102-105, etc.). For example, the computing device 101 may use the following algorithm to determine/calculate a latency offset (e.g., Performer.LatencyOffset, etc.) for each of the user devices 102-105:
  • a latency offset e.g., Performer.LatencyOffset, etc.
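  • The offset formula itself is not reproduced in this excerpt. The sketch below is a hedged illustration of one plausible reading, in which each device is delayed by the difference between the WorstLatencyPerformer latency and its own latency so that all streams align with the slowest performer; the claims later in this document express the offset in terms of the highest and total latency values. Function names and values are illustrative:

```python
# Illustrative only: align every stream with the slowest performer by giving
# each device an offset equal to the worst-case latency minus its own latency.
# Device IDs and values are made up.

def latency_offsets(latencies_ms):
    """latencies_ms: device id -> measured latency in milliseconds."""
    worst = max(latencies_ms.values())           # the WorstLatencyPerformer's latency
    return {dev: worst - lat for dev, lat in latencies_ms.items()}

offsets = latency_offsets({"device_102": 40, "device_103": 95,
                           "device_104": 60, "device_105": 120})
print(offsets)   # device_105 (the slowest) gets offset 0; device_102 waits 80 ms
```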
  • the computing device 101 may access related audio (e.g., an audio track, audio content, an audio file, etc.) within a local repository and/or database (not shown) and begin playing/outputting the audio after expiration of a time period associated with the user device with the greatest and/or highest amount of latency between that user device and the computing device 101 (e.g., the WorstLatencyPerformer, etc.).
  • related audio e.g., an audio track, audio content, an audio file, etc.
  • the computing device 101 may send a signal to each of the user devices 102-105 that causes each of the user devices 102-105 to begin local playback of the related audio (e.g., an audio track, audio content, an audio file, etc.) after the expiration of the associated latency offset time period (e.g., Performer.LatencyOffset, etc.) in milliseconds.
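  • For illustration, a hypothetical server-side sketch of this staggered start, in which each device's local playback is triggered only after its Performer.LatencyOffset has elapsed; start_audio() is an assumed placeholder for the triggering signal:

```python
# Hypothetical server-side sketch: after the offsets are known, schedule each
# device's local playback so it begins once that device's latency offset
# (Performer.LatencyOffset, in milliseconds) has elapsed. start_audio() is an
# assumed placeholder for the signal that triggers local playback.

import threading

def schedule_local_playback(devices, offsets_ms):
    """devices: device id -> object exposing a start_audio() callable."""
    for dev_id, device in devices.items():
        delay_s = offsets_ms[dev_id] / 1000.0
        threading.Timer(delay_s, device.start_audio).start()
```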
  • Each of the user devices 102-105 may receive/download, for example, from the computing device 101 and/or the like, a local copy of the related audio (e.g., an audio track, audio content, an audio file, etc.).
  • Each of the user devices 102-105 may output and/or enable the respective users 120-123 to use the received/downloaded related audio (e.g., an audio track, audio content, an audio file, etc.) as a reference to maintain correct timing during a performance (e.g., while singing, playing an instrument, dancing, etc.).
  • the received/downloaded related audio e.g., an audio track, audio content, an audio file, etc.
  • Each of the user devices 102-105 may display media indicative of a performance by the respective user of the user device and/or media indicative of a performance by the other users of the system 100. Each of the user devices 102-105 may send media to the computing device 101 indicative of the performance by the respective users 120-123.
  • FIG. 2 shows a display 200 output by and/or associated with the user device 102.
  • the display 200 may include multiple windows and/or in-picture areas, such as windows 201-204.
  • Each window 201-204 may represent media, such as performance data/information, received from a user device of the system 100.
  • each window 201-204 may display timing statistics of all performers (users/user devices) associated with a composite multimedia performance.
  • each window 201-204 may display an indication of the integrity of the connection between the user devices, for example, as perceived by the respective user devices and/or the like.
  • colored icons may be used to indicate the integrity of the connection between the user devices. For example, a green icon may indicate that a connection is stable, error-free, and/or timely. An orange icon may indicate that a connection is experiencing intermittent issues, and a red icon may indicate that a connection is problematic, failing, and/or the like. Any indicator may be used to indicate the status of a connection between user devices.
  • the described visual representations enable each of the users 120-123, for example, via the user devices 102-105, to know if other users/user devices are in a stable state to begin a group performance and/or start performing a new song.
  • Relative stability may be defined, for example, as an average of the last n-number of latency measurements and ensuring that the most recent measurement is within a defined threshold (e.g., within a number of milliseconds plus or minus from the average, etc.).
  • a connection integrity indicator may be green to indicate that it is safe to begin a song.
  • If the most recent latency (e.g., current latency measure, etc.) for a user/user device is greater than 30 milliseconds (plus or minus) from the associated average latency, the indicator may turn orange to warn against starting a new song because an associated user/user device is communicating via a connection that may not be reliable.
  • the issue may be resolved by waiting for a duration (e.g., a few seconds/minutes) to allow the average latency summation to rebalance, or by waiting for the current latency to fall within 30 milliseconds of the average latency. If the most recent latency (e.g., current latency measure, etc.) for a user/user device is greater than 60 milliseconds (plus or minus) from the associated average latency, then the indicator may turn red to indicate that an associated user/user device is communicating via a connection that is likely too unstable to begin a new song. Thresholds for indicating/determining the integrity of a connection may be based on any value.
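  • A minimal sketch of such an indicator, using the 30 ms and 60 ms example thresholds from the text against a rolling average of recent latency measurements; the class and method names are illustrative assumptions:

```python
# Sketch of the connection-integrity indicator: compare the newest latency
# sample to a rolling average of the last n samples. The 30 ms / 60 ms
# thresholds mirror the example in the text; any values could be used.

from collections import deque

class ConnectionIndicator:
    def __init__(self, window=10, orange_ms=30, red_ms=60):
        self.history = deque(maxlen=window)
        self.orange_ms = orange_ms
        self.red_ms = red_ms

    def update(self, latency_ms):
        self.history.append(latency_ms)
        average = sum(self.history) / len(self.history)
        deviation = abs(latency_ms - average)
        if deviation > self.red_ms:
            return "red"      # likely too unstable to begin a new song
        if deviation > self.orange_ms:
            return "orange"   # intermittent issues; warn against starting
        return "green"        # stable; safe to begin a song
```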
  • media streams from each of the user devices 102-105 may, for example, exclude and/or filter out any locally-playing copy of the related audio (e.g., an audio track, audio content, an audio file, etc.) that the users 120-123 use when performing.
  • the computing device may combine the locally-playing and variably-delayed media, such as the related audio (e.g., an audio track, audio content, an audio file, etc.), with variably-delayed multimedia streams from each of the user devices 102-105 to generate and/or cause output of a real-time composite multimedia performance.
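  • A simplified, hypothetical sketch of this compositing step: each incoming stream is held in a per-device delay buffer for its offset before being handed to the mixer, so all parts line up with the locally playing audio. Frame handling and mixing details are placeholders, not the patent's implementation:

```python
# Simplified, hypothetical compositing sketch: hold each incoming stream in a
# per-device buffer for its latency offset before handing frames to the mixer,
# so every part lines up with the locally playing audio track.

import time
from collections import deque

class DelayedStream:
    def __init__(self, offset_ms):
        self.offset_s = offset_ms / 1000.0
        self.buffer = deque()                    # (arrival_time, frame) pairs

    def push(self, frame):
        self.buffer.append((time.monotonic(), frame))

    def pop_ready(self):
        """Return frames whose per-device delay has elapsed."""
        ready = []
        while self.buffer and time.monotonic() - self.buffer[0][0] >= self.offset_s:
            ready.append(self.buffer.popleft()[1])
        return ready
```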
  • Each of the user devices 102-105 may modify aspects of the composite multimedia performance. For example, a user device may be used to modify/adjust display settings, such as scene transitions, color patterns, audio levels, video quality, offset time values, and/or any other attribute of and/or associated with the composite multimedia performance.
  • the system 100 enables each of the users 120-123, for example, via the user devices 102-105, to augment and stylize the composite multimedia performance by adding, for example, visual graphics and full-motion video behind the on-screen (e.g., displayed, etc.) boxes in which video associated with each of the users 120-123 is contained, to generate/create a virtual stage on which the users 120-123 perform.
  • each of the users 120-123, via the user devices 102-105, can apply a chroma-key or green-screen effect causing the user to appear to be physically located in front of a green screen and/or the like.
  • the computing device 101 may replace the background and/or green-screen associated with each user/user device with visual graphics, a video stream, and/or the like when generating the composite multimedia performance.
  • the system 100 enables multiple users/performers in different physical locations to appear to share a common stage embodied in the composite multimedia performance.
  • the system 100 enables each of the users 120-123, for example, via the user devices 102-105, to control the show state of the composite multimedia performance such that alternative content, such as graphics, title screens, or videos, can be presented to an audience in addition to or in place of the composite multimedia performance.
  • the system 100 enables each of the users 120-123, for example, via the user devices 102-105, to cause a transition from a display of the composite multimedia performance to the display/occurrence of a "blackout" state in which the audience may view/consume alternative content.
  • the users 120-123 may communicate.
  • the users 120-123, for example, via the user devices 102-105, may be able to see, hear, and/or converse with each other in a manner to which an audience is not privy (e.g., a virtual lowering of a curtain on a stage during an intermission or blackout period in a stage show, etc.).
  • the computing device 101 may cause output of alternative content.
  • the users 120-123 for example, via the user devices 102-105, may cause the computing device 101 to return to a "live" state whereby the users 120-123 are displayed to an audience again.
  • the system 100, during the output of a composite multimedia performance, may cause a virtual “curtain up” at the beginning of a performance, occasional blackouts during the performance, and a virtual “curtain down” at the end of the performance, which transitions to a final “show has ended” video and/or title screen followed by the end of the composite multimedia performance.
  • the system 100 enables blackouts to optionally be set to automatically occur before and/or after each song in a composite multimedia performance.
  • the users 120-123, for example, via the user devices 102-105, may implement automatic blackout occurrences so that, when a song is selected to be performed, the composite multimedia performance will automatically transition to a blackout state so an audience views alternative content, such as an introductory video, graphics, a title card, and/or the like, before a segue into the song automatically occurs after a selectable period of time.
  • the system 100 enables each of the users 120-123, for example, via the user devices 102-105, to review, modify, and/or adjust a composite multimedia performance before the composite multimedia performance is published and/or output to an audience.
  • the users 120-123 may, for example, rehearse and make manual adjustments, such as to the latency timing associated with their respective user device and/or any other user device, and to the audio level of performances (e.g., media) output by other user devices (e.g., as heard by the current user of a user device, etc.) during a performance.
  • Adjustment of latency timing, audio levels, and/or the like of other users/performers enables any given/current user and/or user device to maintain proper timing while performing to the related audio (e.g., an audio track, audio content, an audio file, etc.) since latency and/or timing between user devices (e.g., the user devices 102-105, etc.) and/or users/performers (e.g., users 120-123, etc.) may be inconsistent.
  • the audio level of a mistimed and/or unsynchronized performer may be muted and/or attenuated by any user device and/or user to promote proper performance timing and/or cadence.
  • the system 100 enables each of the users 120-123, for example, via the user devices 102-105, to send one or more signals to control the audio levels of each of the users 120-123 of a composite multimedia performance perceived by a user device.
  • each of the users 120-123, via the user devices 102-105 may adjust the perceived audio level of each other user/user device and/or the related audio.
  • each user device may be associated with a different perceivable audio level than the computing device 101 and/or another user device, and each of the users 120-123, for example, via the user devices 102-105, may optionally hear each other along with their own version of the related audio (e.g., an audio track, audio content, an audio file, etc.) to which the respective user performs, which may or may not be the same audio heard by an audience.
  • the audience may hear a finished song/product, while each of the users 120-123, for example, via the user devices 102-105, may hear a simple click or drum track used for timing with or without all of the instrumentation used in the final composite multimedia performance output to the audience.
  • the system 100 enables each of the users 120-123, for example, via the user devices 102-105 (e.g., via the interface module 108, etc.), to use a virtual metronome for performance timing purposes.
  • the virtual metronome may be output along with and/or in addition to the related audio (e.g., an audio track, audio content, an audio file, etc.) or in lieu of the related audio (e.g., an audio track, audio content, an audio file, etc.).
  • the metronome for each user/user device may leverage the timing algorithm as previously described where the start/stop timing signals are used to trigger the metronome.
  • Each of the users 120-123, for example, via the user devices 102-105, can adjust the beats per minute (BPM) and musical key used for the metronome(s) for each song. Data may be shared by each of the user devices to enable all metronomes used in the performance to operate at the same rate and/or musical key. For example, each of the users 120-123, via the user devices 102-105, may make adjustments to a virtual metronome and the adjustments may be shared between each of the user devices in real-time.
  • BPM beats per minute
  • musical key used for the metronome(s) for each song.
  • Data may be shared by each of the user devices to enable all metronomes used in the performance to operate at the same rate and/or musical key.
  • each of the users 120-123, via the user devices 102-105 may make adjustments to a virtual metronome and the adjustments may be shared between each of the user devices in real-time.
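  • For illustration, a small sketch of a shared virtual metronome whose tick interval follows from the BPM and whose adjustments are pushed to the other devices through a hypothetical broadcast callback; the class and parameter names are assumptions:

```python
# Illustrative shared-metronome sketch: the tick interval follows from the BPM,
# and any adjustment is pushed out through a broadcast callback so every
# device's metronome runs at the same rate and musical key.

class SharedMetronome:
    def __init__(self, bpm=120, key="C"):
        self.bpm = bpm
        self.key = key

    @property
    def tick_interval_s(self):
        return 60.0 / self.bpm                   # seconds between beats

    def adjust(self, bpm=None, key=None, broadcast=lambda update: None):
        if bpm is not None:
            self.bpm = bpm
        if key is not None:
            self.key = key
        broadcast({"bpm": self.bpm, "key": self.key})   # share with other devices
```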
  • the composite multimedia performance may be sent, for example, by the computing device 101, to and/or published via a multimedia streaming platform (e.g., Youtube®, Vimeo®, etc.) and/or social media platform (e.g., Instagram, TikTok, etc.) for consumption by multiple audiences.
  • a multimedia streaming platform e.g., Youtube®, Vimeo®, etc.
  • social media platform e.g., Instagram, TikTok, etc.
  • each of the user devices 102-105 may send/provide an indication and/or notification that/when the respective users 120-123 are ready to publish/output the composite multimedia performance, such as once all timing and/or media (e.g., audio, video, etc.) adjustments (e.g., adjustments to scene transitions, color patterns, audio levels, video quality, offset time values, etc.) are deemed suitable.
  • FIG. 3 shows an example display 300 of a composite multimedia performance of each of the users 120-123.
  • the system 100 enables audience members and/or viewers, such as users that view the composite multimedia performance via a multimedia streaming platform (e.g., Youtube®, Vimeo®, etc.), a social media platform (e.g., Instagram, TikTok, etc.), and/or the like, to interact with the users 120-123, for example in real time, by sending free and/or paid content to the respective user devices 102-105 during a live performance.
  • a multimedia streaming platform e.g., Youtube®, Vimeo®, etc.
  • social media platform e.g., Instagram, TikTok, etc.
  • messages from the audience may enter a moderation queue and/or be passed through a filter mechanism to remove offensive content before being presented to the users 120-123.
  • Audience-Supplied Content may include text messages, pictures, videos, and/or the like that may be visible to the users 120-123 via the respective user devices 102-105, and/or made visible to the entire audience by being displayed along with the composite multimedia performance.
  • the system 100 may enable each of the users 120-123, for example, via the user devices 102-105, to isolate sections of the composite multimedia performance once it has ended to create video clips that may be shared with the audience/fans. For example, isolated sections of a composite multimedia performance enable the users 120-123 to perform most of a song and leave a section open for the audience/fans to sing to and/or sing along with.
  • the system 100 may offer the audience/fans an option for a free or paid photo/video opportunity (meet & greet) with the users 120-123.
  • the computing device 101 may add communications from audience/fans to a virtual queue and/or cache.
  • the computing device 101 may cause a signal to be sent to each audience/fan in the virtual queue and/or cache (e.g., sent to a computing device, mobile device, smart device, and/or the like associated with the audience/fan) that causes a camera associated with the audience/fan to activate and/or facilitate video conferencing, image taking, and/or the like with each of the users 120-123, for example, via the user devices 102-105.
  • a display associated with the users 120-123 and/or the audience/fans may include a split-screen, picture-in-picture, and/or the like that may be used for video conferencing, image taking, and/or the like. Images, audio, and/or video associated with an audience/fan and user engagement may be uploaded and shared via the multimedia streaming platform (e.g., Youtube®, Vimeo®, etc.), social media platform (e.g., Instagram, TikTok, etc.), and/or the like.
  • FIG. 4 shows a flowchart of a method 400 for synchronizing multimedia.
  • a computing device may combine a locally-playing and variably-delayed media, such as a background audio track, with variably-delayed multimedia streams from each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices to produce a real-time composite multimedia performance.
  • a user device e.g., computing device, smart device, mobile device, multimedia capturing device, etc.
  • the computing device may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device of the plurality of user devices to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams and generate the composite multimedia performance.
  • data/information e.g., visual markers, pixels, metadata, etc.
  • the method 400 may include, at 410, determining a respective latency value for each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices.
  • a computing device e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.
  • Determining, for each user device of the plurality of user devices, the respective latency may include: sending a signal to the user device; and receiving, based on the signal, the one or more pixels of the video content, wherein the latency value indicates an amount of time elapsed before the one or more pixels of the video content are received.
  • Receiving the one or more pixels of the video content may include: determining a region of interest (ROI) within the video content received from the user device; and determining a change in a color associated with the one or more pixels within the ROI.
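  • A hedged sketch of this receive side: watch the region of interest (ROI) in the incoming video and report when its average color moves far enough from a baseline recorded before the ping to count as the device's response. The threshold and pixel format are illustrative assumptions:

```python
# Hedged sketch of ROI change detection on the receiving side: the response is
# counted as received when the ROI's average color drifts far enough from the
# baseline color recorded before the ping. Threshold and pixel format are
# illustrative assumptions.

def roi_changed(baseline_rgb, roi_pixels, threshold=60):
    """roi_pixels: list of (r, g, b) tuples sampled from the region of interest."""
    n = len(roi_pixels)
    average = [sum(p[i] for p in roi_pixels) / n for i in range(3)]
    distance = sum(abs(average[i] - baseline_rgb[i]) for i in range(3))
    return distance > threshold
```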
  • ROI region of interest
  • the total latency value may be determined based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, and an average of the respective latency values for each user device of the plurality of user devices.
  • the respective latency offset value may be determined by multiplying the highest latency value by the total latency value and then subtracting the respective latency value from the result.
  • the computing device may output audio content after a period of time equal to the highest latency value multiplied by the total latency value.
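  • Taken literally, the two statements above combine as in the following illustrative snippet; the helper name is hypothetical, the total is shown as an aggregation (an average could be used instead), and units or scaling of the highest-times-total product are left open by the text:

```python
# Literal transcription of the stated relationships, for illustration only.
# The "total" is shown as an aggregation of the per-device latencies (an
# average could be used instead), and units/scaling are left open by the text.

def offsets_and_output_time(latencies):
    total = sum(latencies.values())              # total latency value (aggregation)
    highest = max(latencies.values())            # highest latency value
    offsets = {dev: highest * total - lat for dev, lat in latencies.items()}
    output_time = highest * total                # when the computing device outputs audio
    return offsets, output_time
```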
  • the output time may be and/or may be determined based on any time.
  • causing each user device of the plurality of user devices to output audio content at a respective output time wherein the respective output time is based on the output time and the respective latency offset value.
  • Causing each user device of the plurality of user devices to output the audio content at the respective output time may include sending a notification to the user device, wherein the notification comprises an instruction to begin a local output of the audio content at the respective output time.
  • the method 400 may include synchronizing, based on the respective latency offset values for each user device of the plurality of user devices, the additional video content received from each user device of the plurality of user devices.
  • causing output of composite content wherein the composite content comprises the additional video content received from each user device of the plurality of user devices and the audio content.
  • Causing the output of the composite content may include publishing the composite content to at least one of a video hosting service, video sharing service, and a social media service.
  • the method 400 may further include causing, based on one or more signals received from at least one user device of the plurality of user devices, at least a portion of the composite content to be obscured.
  • the method 400 may further include modifying, based on one or more signals received from at least one user device of the plurality of user devices, an audio level associated with at least a portion of the composite content.
  • the method 400 may further include: receiving from a network device (e.g., audience device, etc.), media content (e.g., audience generated content, video clips, sound clips, etc.); and updating, based on the media content, at least a portion of the composite content.
  • a network device e.g., audience device, etc.
  • media content e.g., audience generated content, video clips, sound clips, etc.
  • FIG. 5 shows a flowchart of a method 500 for synchronizing multimedia.
  • a computing device e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.
  • user device e.g., computing device, smart device, mobile device, multimedia capturing device, etc.
  • the computing device may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device of the plurality of user devices to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams and generate the composite multimedia performance.
  • data/information e.g., visual markers, pixels, metadata, etc.
  • the method 500 may include, at 510, determining a respective latency value for each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices.
  • a computing device e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.
  • Determining, for each user device of the plurality of user devices, the respective latency may include: sending a signal to the user device; and receiving, based on the signal, video embedded with the metadata, wherein the latency value indicates an amount of time elapsed before the video embedded with the metadata is received.
  • At 520, determining, based on the respective latency values for each user device of the plurality of user devices, a total latency value and a highest latency value. The total latency value may be determined based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, and an average of the respective latency values for each user device of the plurality of user devices.
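  • A hypothetical sketch of the metadata variant described above: the computing device notes when it sent the ping, then watches the device's incoming stream for a metadata tag echoing that ping, and takes the elapsed time as the latency. The metadata layout is an assumption, not the patent's format:

```python
# Hypothetical sketch of the metadata variant: record when the ping was sent,
# then scan the device's incoming stream for a metadata tag that echoes it and
# treat the elapsed time as the latency. The metadata layout is assumed.

import time

def latency_from_metadata(ping_id, ping_sent_time, frames):
    """frames: iterable of frame objects exposing a `metadata` dict."""
    for frame in frames:
        if frame.metadata.get("pong_for") == ping_id:
            return time.monotonic() - ping_sent_time
    return None   # no echoed response observed yet
```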
  • determining, for each user device of the plurality of user devices, based on the total latency value, the highest latency value, and the respective latency value, a respective latency offset value. The respective latency offset value may be determined by multiplying the highest latency value by the total latency value and then subtracting the respective latency value from the result.
  • the computing device may output audio content after a period of time equal to the highest latency value multiplied by the total latency value.
  • the output time may be and/or may be determined based on any time.
  • causing each user device of the plurality of user devices to output audio content at a respective output time wherein the respective output time is based on the output time and the respective latency offset value.
  • Causing each user device of the plurality of user devices to output the audio content at the respective output time may include sending a notification to the user device, wherein the notification comprises an instruction to begin a local output of the audio content at the respective output time.
  • the method 500 may include synchronizing, based on the respective latency offset values for each user device of the plurality of user devices, the additional video content received from each user device of the plurality of user devices.
  • causing output of composite content wherein the composite content comprises the additional video content received from each user device of the plurality of user devices and the audio content.
  • Causing the output of the composite content may include publishing the composite content to at least one of a video hosting service, video sharing service, and a social media service.
  • the method 500 may further include causing, based on one or more signals received from at least one user device of the plurality of user devices, at least a portion of the composite content to be obscured.
  • the method 500 may further include modifying, based on one or more signals received from at least one user device of the plurality of user devices, an audio level associated with at least a portion of the composite content.
  • the method 500 may further include: receiving from a network device (e.g., audience device, etc.), media content (e.g., audience generated content, video clips, sound clips, etc.); and updating, based on the media content, at least a portion of the composite content.
  • a network device e.g., audience device, etc.
  • media content e.g., audience generated content, video clips, sound clips, etc.
  • FIG. 6 shows a system 600 for synchronizing multimedia. Any device and/or component described herein may be a computer 601 as shown in FIG. 6.
  • the computer 601 may comprise one or more processors 603, a system memory 612, and a bus 613 that couples various components of the computer 601 including the one or more processors 603 to the system memory 612. In the case of multiple processors 603, the computer 601 may utilize parallel computing.
  • the bus 613 may comprise one or more of several possible types of bus structures, such as a memory bus, memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • the computer 601 may operate on and/or comprise a variety of computer-readable media (e.g., non-transitory). Computer-readable media may be any available media that is accessible by the computer 601 and comprises non-transitory, volatile and/or non-volatile media, removable and non-removable media.
  • the system memory 612 has computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM).
  • the system memory 612 may store data such as multimedia synchronization data 607 and/or program modules such as operating system 605 and multimedia synchronization software 606 that are accessible to and/or are operated on by the one or more processors 603.
  • the computer 601 may also comprise other removable/non-removable, volatile/non-volatile computer storage media.
  • the mass storage device 604 may provide non-volatile storage of computer code, computer-readable instructions, data structures, program modules, and other data for the computer 601.
  • the mass storage device 604 may be a hard disk, a removable magnetic disk, a removable optical disk, magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read-only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like.
  • Any number of program modules may be stored on the mass storage device 604.
  • An operating system 605 and multimedia synchronization software 606 may be stored on the mass storage device 604.
  • One or more of the operating system 605 and multimedia synchronization software 606 (or some combination thereof) may comprise program modules and the multimedia synchronization software 606.
  • multimedia synchronization data 607 may also be stored on the mass storage device 604.
  • Multimedia synchronization data 607 may be stored in any of one or more databases known in the art. The databases may be centralized or distributed across multiple locations within the network 615.
  • a user may enter commands and information into the computer 601 via an input device (not shown).
  • input devices comprise, but are not limited to, a keyboard, pointing device (e.g., a computer mouse, remote control), a microphone, a joystick, a scanner, tactile input devices such as gloves and other body coverings, a motion sensor, and the like.
  • These and other input devices may be connected to the one or more processors 603 via a human-machine interface 602 that is coupled to the bus 613, but may be connected by other interface and bus structures, such as a parallel port, game port, an IEEE 1394 port (also known as a FireWire port), a serial port, network adapter 608, and/or a universal serial bus (USB).
  • a display device 611 may also be connected to the bus 613 via an interface, such as a display adapter 609. It is contemplated that the computer 601 may have more than one display adapter 609 and the computer 601 may have more than one display device 611.
  • a display device 611 may be a monitor, an LCD (Liquid Crystal Display), a light-emitting diode (LED) display, a television, a smart lens, smart glass, and/or a projector.
  • other output peripheral devices may comprise components such as speakers (not shown) and a printer (not shown) which may be connected to the computer 601 via Input/Output Interface 610.
  • Any step and/or result of the methods may be output (or caused to be output) in any form to an output device.
  • Such output may be any form of visual representation, including, but not limited to, textual, graphical, animation, audio, tactile, and the like.
  • the display 611 and computer 601 may be part of one device, or separate devices.
  • the computer 601 may operate in a networked environment using logical connections to one or more remote computing devices 614a, b,c.
  • a remote computing device 614a, b,c may be a personal computer, computing station (e.g., workstation), portable computer (e.g., laptop, mobile phone, tablet device), smart device (e.g., smartphone, smartwatch, activity tracker, smart apparel, smart accessory), security and/or monitoring device, a server, a router, a network computer, a peer device, edge device or other common network nodes, and so on.
  • Logical connections between the computer 601 and a remote computing device 614a,b,c may be made via a network 615, such as a local area network (LAN) and/or a general wide area network (WAN). Such network connections may be through a network adapter 608.
  • a network adapter 608 may be implemented in both wired and wireless environments. Such networking environments are conventional and commonplace in dwellings, offices, enterprise wide computer networks, intranets, and the Internet.
  • Application programs and other executable program components such as the operating system 605 are shown herein as discrete blocks, although it is recognized that such programs and components may reside at various times in different storage components of the computing device 601, and are executed by the one or more processors 603 of the computer 601.
  • An implementation of multimedia synchronization software 606 may be stored on or sent across some form of computer-readable media. Any of the disclosed methods may be performed by processor-executable instructions embodied on computer-readable media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams.

Description

METHODS AND SYSTEMS FOR SYNCHRONIZING MULTIMEDIA
CROSS-REFERENCE TO RELATED PATENT APPLICATION
[0001] This application claims priority to U.S. Provisional Application No. 63/030,424 filed May 27, 2020, herein incorporated by reference in its entirety.
BACKGROUND
[0002] Musicians, singers, and/or the like often gather and/or meet at a location to collectively play music, sing, and/or the like. When gathered in the same location, the music output by different musicians, singers, and/or the like is synchronized during a performance. Videotelephony enables media (e.g., audio, video, etc.) from multiple users to be presented collectively in a virtual meeting environment. In a virtual meeting where the meeting occurs through videotelephony, musicians, singers, and/or the like are unable to perform in sync due to timing differences caused by the musicians, singers, and/or the like being in different physical locations. Systems that enable virtual performances by musicians, singers, and/or the like to be synchronized are needed.
SUMMARY
[0003] It is to be understood that both the following general description and the following detailed description are exemplary and explanatory only and are not restrictive. Methods and systems for synchronizing multimedia are described.
[0004] A user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) may be used to determine/capture multimedia (e.g., video, audio, etc.) indicative of a user action and/or performance, such as a performer playing an instrument, singing, and/or producing music. Multimedia determined/captured by multiple user devices that indicates the actions/performance of multiple performers may be combined to form a synchronized composite performance with minimal latency and delay between performers. For example, a computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams.
[0005] This summary is not intended to identify critical or essential features of the disclosure, but merely to summarize certain features and variations thereof. Other details and features will be described in the sections that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments and together with the description, serve to explain the principles of the methods and systems:
Figure 1 shows an example system;
Figure 2 shows an example display;
Figure 3 shows an example display;
Figure 4 shows a flowchart of an example method;
Figure 5 shows a flowchart of an example method; and
Figure 6 shows a block diagram of a computing device for implementing example methods.
DETAILED DESCRIPTION
[0007] As used in the specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from “about” one particular value, and/or to “about” another particular value. When such a range is expressed, another configuration includes from the one particular value and/or to the other particular value. When values are expressed as approximations, by use of the antecedent “about,” it will be understood that the particular value forms another configuration. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
[0008] “Optional” or “optionally” means that the subsequently described event or circumstance may or may not occur, and that the description includes cases where said event or circumstance occurs and cases where it does not.
[0009] Throughout the description and claims of this specification, the word “comprise” and variations of the word, such as “comprising” and “comprises,” means “including but not limited to,” and is not intended to exclude other components, integers, or steps. “Exemplary” means “an example of” and is not intended to convey an indication of a preferred or ideal configuration. “Such as” is not used in a restrictive sense, but for explanatory purposes.
[0010] It is understood that when combinations, subsets, interactions, groups, etc. of components are described that, while specific reference of each various individual and collective combinations and permutations of these may not be explicitly described, each is specifically contemplated and described herein. This applies to all parts of this application including, but not limited to, steps in described methods. Thus, if there are a variety of additional steps that may be performed it is understood that each of these additional steps may be performed with any specific configuration or combination of configurations of the described methods.
[0011] As will be appreciated by one skilled in the art, hardware, software, or a combination of software and hardware may be implemented. Furthermore, the methods and systems may take the form of a computer program product on a computer-readable storage medium (e.g., non-transitory) having processor-executable instructions (e.g., computer software) embodied in the storage medium. Any suitable computer-readable storage medium may be utilized including hard disks, CD-ROMs, optical storage devices, magnetic storage devices, memristors, Non-Volatile Random Access Memory (NVRAM), flash memory, or a combination thereof.
[0012] Throughout this application, reference is made to block diagrams and flowcharts. It will be understood that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, respectively, may be implemented by processor-executable instructions. These processor-executable instructions may be loaded onto a general-purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the processor-executable instructions which execute on the computer or other programmable data processing apparatus create a device for implementing the functions specified in the flowchart block or blocks.
[0013] These processor-executable instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a particular manner, such that the processor-executable instructions stored in the computer-readable memory produce an article of manufacture including processor-executable instructions for implementing the function specified in the flowchart block or blocks. The processor-executable instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the processor-executable instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
[0014] Accordingly, blocks of the block diagrams and flowcharts support combinations of devices for performing the specified functions, combinations of steps for performing the specified functions and program instruction means for performing the specified functions. It will also be understood that each block of the block diagrams and flowcharts, and combinations of blocks in the block diagrams and flowcharts, may be implemented by special purpose hardware-based computer systems that perform the specified functions or steps, or combinations of special purpose hardware and computer instructions.
[0015] “Content items,” as the phrase is used herein, may also be referred to as “content,” “content data,” “content information,” “content asset,” “multimedia asset data file,” or simply “data” or “information”. Content items may be any information or data that may be licensed to one or more individuals (or other entities, such as businesses or groups). Content may be electronic representations of video, audio, text, and/or graphics, which may be but is not limited to electronic representations of videos, movies, or other multimedia, which may be but is not limited to data files adhering to MPEG2, MPEG, MPEG4 UHD, HDR, 4k, Adobe® Flash® Video (.FLV) format or some other video file format whether such format is presently known or developed in the future. The content items described herein may be electronic representations of music, spoken words, or other audio, which may be but is not limited to data files adhering to the MPEG-1 Audio Layer 3 (.MP3) format, Adobe® Sound Document (.ASND) format, CableLabs 1.0, 1.1, 3.0, AVC, HEVC, H.264, Nielsen watermarks, V-chip data, Secondary Audio Programs (SAP), or some other format configured to store electronic audio whether such format is presently known or developed in the future. In some cases, content may be data files adhering to the following formats: Portable Document Format (.PDF), Electronic Publication (.EPUB) format created by the International Digital Publishing Forum (IDPF), JPEG (.JPG) format, Portable Network Graphics (.PNG) format, dynamic ad insertion data (.csv), Adobe® Photoshop® (.PSD) format or some other format for electronically storing text, graphics and/or other information whether such format is presently known or developed in the future. Content items may be any combination of the above-described formats.
[0016] “Consuming content” or the “consumption of content,” as those phrases are used herein, may also be referred to as “accessing” content, “providing” content, “viewing” content, “listening” to content, “rendering” content, or “playing” content, among other things. In some cases, the particular term utilized may be dependent on the context in which it is used. Consuming video may also be referred to as viewing or playing the video. Consuming audio may also be referred to as listening to or playing the audio.
[0017] This detailed description may refer to a given entity performing some action.
It should be understood that this language may in some cases mean that a system (e.g., a computer) owned and/or controlled by the given entity is actually performing the action.
[0018] A user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) may be used to determine/capture multimedia (e.g., video, audio, etc.) indicative of a user action and/or performance, such as a performer playing an instrument, singing, producing music, and/or the like in real-time. Multimedia determined/captured and/or output/streamed by multiple user devices that indicate the actions/performance of multiple performers may be combined to form a synchronized composite performance with minimal latency and delay between performers.
[0019] For example, a computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams. For example, the computing device may determine communication timing between the computing device and each user device to identify a user device associated with the most and/or greatest degree of latency. The most and/or greatest degree of latency may be used to determine a respective offset time value for each user device that may be used to vary (e.g., variably delay, etc.) the timing of multimedia received from the user device.
[0020] The computing device may combine a locally-playing and variably-delayed media, such as a background audio track, with variably-delayed multimedia streams from each user device to produce a real-time composite multimedia performance.
Each user device may modify aspects of the composite multimedia performance. For example, a user device may be used to modify/adjust display settings, such as scene transitions, color patterns, audio levels, video quality, offset time values, and/or any other attribute of and/or associated with the composite multimedia performance. The composite multimedia performance may be sent to and/or published via a multimedia streaming platform and/or social media platform for consumption by multiple audiences.
[0021] FIG. 1 shows an example system 100 for synchronizing multimedia. The system 100 facilitates the synchronization of multimedia streams of one or more musicians, singers, entertainment performers, and/or the like. The synchronized multimedia streams may be used to generate a composite multimedia stream that minimizes latency and/or delay between devices associated with one or more musicians, singers, entertainment performers, and/or the like. The composite multimedia stream may be output and/or presented, for example, in real-time, to a wide variety of audiences, platforms, multimedia sharing systems, and/or the like. [0022] A person of ordinary skill in the art would recognize and appreciate that components of the system 100 may include hardware, software (e.g., one or more applications, etc.), combinations thereof, and/or the like. The system may include a computing device 101 (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) in communication with a network 106. The network 106 may include and/or support a wide area network (e.g., a packet-switched network, Internet, etc.), multimedia hosting and/or communication network, a local area network (LAN), a content delivery/access network, a cellular network, a satellite network, a fiber-optic network, a wired network, a wireless network, and/or the like. The network 106 may support any network path, protocol, device, and/or the like.
[0023] The computing device 101 may support and/or host multimedia conferences, virtual meetings, multimedia exchange/sharing/compositing, and/or the like between one or more user devices (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) coupled to and/or in communication with the network 106, such as user devices 102, 103, 104, and 105.
For simplicity, the user device 102 is shown in greater detail than the user devices 103-105. However, each of the user devices 102-105 may be configured the same/similarly and may operate and perform the same and/or similar functions. The user device 102 may include an imaging module 107. The imaging module 107 may include one or more image capturing devices, such as one or more cameras that determine/capture image data (e.g., video, streaming media, etc.). For example, the imaging module 107 may capture/determine actions, such as a real-time performance, of a user 120. For example, the user 120 may include a musician, a singer, an entertainment performer, and/or the like. The user devices 103-105 may each include imaging modules that capture/determine actions, such as real-time performances, of users 121-123, respectively.
[0024] The user device 102 may include an interface module 108. The interface module 108 may include one or more interfaces for presenting and/or receiving information to/from a user, such as a multimedia representation of a performance, a composite performance, interactive and/or enhanced data/information, and/or the like. The interface module 108 may include any software, hardware, and/or interfaces used to provide communication between the user device 102 and/or any other device/component of the system 100. The interface module 108 may include one or more audio devices (e.g., stereos, speakers, microphones, etc.). The interface module 108 may include a graphical user interface (GUI), a web browser (e.g., Internet Explorer®, Mozilla Firefox®, Google Chrome®, Safari®, or the like), an application/API. The interface module 108 may request and/or query various files from a local source and/or a remote source, such as the computing device 101, the user devices 102-105, and/or any other device/component (not shown) associated with the system 100.
[0025] The user device 102 may include a communication module 109. The communication module 109 may enable the user device 102 to communicate with components of the system 100, such as the computing device 101 and/or the user devices 103-105, via wired and/or wireless communication techniques. For example, the communication module 109 may utilize any suitable wired communication technique, such as Ethernet, coaxial cable, fiber optics, and/or the like. The communication module 109 may utilize any suitable long-range communication technique, such as Wi-Fi (IEEE 802.11), BLUETOOTH®, cellular (e.g., fourth-generation (4G) cellular technology, fifth-generation (5G) cellular technology, any-generation (XG) cellular technology, etc.), satellite, infrared, and/or the like. The communication module 109 may utilize any suitable short-range communication technique, such as BLUETOOTH®, near-field communication, infrared, and the like. [0026] The user devices 102-105 may exchange data/information, such as multimedia data, performance data, and/or the like with the computing device 101. For example, the computing device 101 may receive multimedia data from two or more of the user devices 102-105 and generate a composite multimedia stream.
[0027] The computing device 101 may include a communication module 111 to facilitate communication between the user devices 102-105, such as the exchange of multimedia data/information. The communication module 111 may enable the computing device 101 to communicate with components of the system 100, such as the user devices 102-105, via wired and/or wireless communication techniques. For example, the communication module 111 may utilize any suitable wired communication technique, such as Ethernet, coaxial cable, fiber optics, and/or the like. The communication module 111 may utilize any suitable long-range communication technique, such as Wi-Fi (IEEE 802.11), BLUETOOTH®, cellular (e.g., fourth-generation (4G) cellular technology, fifth-generation (5G) cellular technology, any-generation (XG) cellular technology, etc.), satellite, infrared, and/or the like. The communication module 111 may utilize any suitable short-range communication technique, such as BLUETOOTH®, near-field communication, infrared, and the like.
[0028] The computing device 101 may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each of the user devices 102-105 to determine the timing and/or latency information associated with the user devices 102-105, and use the timing and/or latency information to synchronize the multimedia streams received from the user devices 102-105.
[0029] To determine timing information and/or latency information associated with each of the user devices 102-105 and the computing device 101, the computing device 101 may include a synchronization module 110. The computing device 101 may send, for example, via the synchronization module 110, a signal, a ping, a test communication, and/or the like to each of the user devices 102-105 to cause and/or elicit a response from each of the user devices 102-105, such as a response signal, a pong, a testing communication, and/or the like. While awaiting the responses from each of the user devices 102-105, the computing device 101 may initiate one or more timers to record how long it takes for each of the user devices 102-105 to send a response. Each of the user devices 102-105 may be configured with a telemetry component and/or an application that facilitates the generation of responses to signals, pings, test communications, and/or the like.
[0030] Instead of sending a traditional pong response signal to the computing device 101, the user devices 102-105 may generate a response to signals, pings, test communications, and/or the like from the computing device 101, for example, via metadata embedded in a media (e.g., audio, video, multimedia, etc.) stream sent to the computing device 101, via manipulation of pixels and/or visual markers that may be detected by the computing device 101 within video information (e.g., a video stream, etc.) sent to the computing device 101, and/or the like. As described, responses to signals, pings, test communications, and/or the like from the computing device 101 that are sent via a media stream may enable a real-time and accurate determination of latency between user devices and the computing device 101 since the response propagates along the same communication path as the media stream.
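By way of illustration only, the metadata-based response described above may be sketched as follows. The Frame container, the pong_for field, and the clock calls are assumptions made for this sketch and are not taken from the disclosure.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Frame:
    """Hypothetical media frame with a metadata side channel."""
    payload: bytes
    metadata: dict = field(default_factory=dict)

# User-device side: rather than a separate pong packet, tag the next outgoing
# media frame with the identifier of the ping being answered.
def embed_pong(frame: Frame, ping_id: str) -> Frame:
    frame.metadata["pong_for"] = ping_id
    return frame

# Computing-device side: watch incoming frames for the embedded pong and stop
# the timer that was started when the corresponding ping was sent.
def check_for_pong(frame: Frame, pending_pings: dict) -> float | None:
    ping_id = frame.metadata.get("pong_for")
    if ping_id in pending_pings:
        ping_sent_time = pending_pings.pop(ping_id)
        return time.monotonic() - ping_sent_time  # latency estimate in seconds
    return None
```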
[0031] When pixels (e.g., manipulation of pixels and/or visual markers, etc.) of a received media stream are used to determine latency between a user device and the computing device 101, the user device and/or computing device 101 may, for example, determine/calculate a color or colors to change the pixels to by first determining the current pixel color in a displayed region of interest at the user device, the computing device 101, or both. If a currently displayed pixel color is determined by a user device (e.g., one of the user devices 102-105, etc.), the user device may send a signal, message, notification, and/or the like to the computing device 101 to indicate the current color of the pixels in the region of interest such that a visually opposite or sufficiently different color range of pixels can be used, detected, and/or determined by the computing device 101 during a ping/pong cycle and/or latency determination exercise. If a currently displayed pixel color is determined by the computing device 101, the computing device 101 may send a signal, message, notification, and/or the like to the user device to indicate the region of interest and appropriate color(s) to use during the ping/pong cycle to facilitate detection by the computing device 101.
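As a rough illustration of the pixel-based variant, the following sketch picks a marker color that contrasts with the current region of interest and detects when that color appears; the ROI layout, the color-distance threshold, and the NumPy frame representation are assumptions for illustration only.

```python
import numpy as np

def contrasting_color(frame: np.ndarray, roi: tuple) -> tuple:
    """Pick a marker color visually distant from the ROI's current average color."""
    y0, y1, x0, x1 = roi
    avg = frame[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    return tuple(int(255 - c) for c in avg)  # simple complement as the marker color

def marker_detected(frame: np.ndarray, roi: tuple, marker: tuple, tol: float = 40.0) -> bool:
    """Return True when the ROI's average color is within `tol` of the marker color."""
    y0, y1, x0, x1 = roi
    avg = frame[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    return bool(np.linalg.norm(avg - np.array(marker)) < tol)
```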
[0032] When the computing device 101 detects embedded metadata in an audio/video stream, and/or manipulated pixels and/or visual markers in a video stream, the computing device 101 may stop the timer associated with the user device (e.g., the user devices 102-105, etc.) sending the response and determine/calculate the latency and/or response time between the user device and the computing device 101. For example, the computing device may use the following algorithm, PongReceivedTime - PingSentTime, to determine the latency. Since latency within the network 106 may vary from time to time, the computing device 101 may, for example, repeat the described ping/pong process several times to achieve a consensus of the latency determination (e.g., Performer.Latency, etc.) for each of the user devices 102-105. [0033] The computing device 101 may determine a user device of the user devices 102-105 associated with the greatest and/or highest amount of latency between the user device and the computing device 101. For example, the computing device 101 may determine, tag, and/or indicate the user device associated with the greatest and/or highest amount of latency between the user device and the computing device 101 as a WorstLatencyPerformer. The computing device 101 may use the latency associated with the WorstLatencyPerformer to determine/calculate a latency offset for each of the user devices 102-105 (e.g., a performance/action of the users 120-123 captured respectively by the user devices 102-105, etc.). For example, the computing device 101 may use the following algorithm to determine/calculate a latency offset (e.g., Performer.LatencyOffset, etc.) for each of the user devices 102-105:
Performer.LatencyOffset = (WorstLatencyPerformer.Latency - Performer.Latency). [0034] To synchronize performances by each of the users 120-123, such as in a scenario where each of the users 120-123 is playing, performing, singing, contributing to, and/or the like to the same song and/or related media, the computing device 101 may access related audio (e.g., an audio track, audio content, an audio file, etc.) within a local repository and/or database (not shown) and begin playing/outputting the audio after expiration of a time period associated with the user device with the greatest and/or highest amount of latency between the user device and the computing device 101 (e.g., the WorstLatencyPerformer, etc.). The computing device 101 may send a signal to each of the user devices 102-105 that causes each of the user devices 102-105 to begin local playback of the related audio (e.g., an audio track, audio content, an audio file, etc.) after the expiration of the associated latency offset time period (e.g., Performer.LatencyOffset, etc.) in milliseconds. Each of the user devices 102-105 may receive/download, for example, from the computing device 101 and/or the like, a local copy of the related audio (e.g., an audio track, audio content, an audio file, etc.). Each of the user devices 102-105 may output and/or enable the respective users 120-123 to use the received/downloaded related audio (e.g., an audio track, audio content, an audio file, etc.) as a reference to maintain correct timing during a performance (e.g., while singing, playing an instrument, dancing, etc.).
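Putting these pieces together, the sketch below repeats the ping/pong measurement to reach a consensus latency per performer, identifies the worst-latency performer, computes each performer's offset per the relation above, and schedules local playback accordingly. The function names and the use of a simple mean as the "consensus" are illustrative assumptions.

```python
import statistics

def consensus_latency(measure, rounds: int = 5) -> float:
    """Repeat the ping/pong measurement and average the results (one possible consensus)."""
    return statistics.mean(measure() for _ in range(rounds))

def compute_offsets(latencies: dict) -> dict:
    """latencies: performer id -> consensus latency in ms.
    Offset = WorstLatencyPerformer.Latency - Performer.Latency (paragraph [0033])."""
    worst = max(latencies.values())
    return {performer: worst - latency for performer, latency in latencies.items()}

def schedule_playback(offsets: dict, start_local_playback) -> None:
    """Tell each user device to begin local playback of the related audio after its offset (ms)."""
    for performer, offset_ms in offsets.items():
        start_local_playback(performer, delay_ms=offset_ms)

# Example: the device with the worst latency gets a zero offset and starts immediately;
# faster devices wait so that all performances line up at the computing device.
offsets = compute_offsets({"A": 180.0, "B": 240.0, "C": 95.0})
# -> {"A": 60.0, "B": 0.0, "C": 145.0}
```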
[0035] Each of the user devices 102-105 may display media indicative of a performance by the respective user of the user device and/or media indicative of a performance by the other users of the system 100. Each of the user devices 102-105 may send media to the computing device 101 indicative of the performance by the respective users 120-123. For example, FIG. 2 shows a display 200 output by and/or associated with the user device 102. The display 200 may include multiple windows and/or in-picture areas, such as windows 201-204. Each window 201-204 may represent media, such as performance data/information, received from a user device of the system 100. As shown at 205, each window 201-204 may display timing statistics of all performers (users/user devices) associated with a composite multimedia performance. As shown at 206, each window 201-204 may display an indication of the integrity of the connection between the user devices, for example, as perceived by the respective user devices and/or the like. In some instances, colored icons may be used to indicate the integrity of the connection between the user devices. For example, a green icon may indicate that a connection is stable, error-free, and/or timely. An orange icon may indicate that a connection is experiencing intermittent issues, and a red icon may indicate that a connection is problematic, failing, and/or the like. Any indicator may be used to indicate the status of a connection between user devices. The described visual representations enable each of the users 120-123, for example, via the user devices 102-105, to know if other users/user devices are in a stable state to begin a group performance and/or start performing a new song. Relative stability may be defined, for example, as an average of the last n-number of latency measurements and ensuring that the most recent measurement is within a defined threshold (e.g., within a number of milliseconds plus or minus from the average, etc.). For example, if a user/user device has an average latency of 200 milliseconds, and a most recent latency (e.g., current latency measure, etc.) between 185 milliseconds and 215 milliseconds (e.g., a range of 30 milliseconds), a connection integrity indicator may be green to indicate that it is safe to begin a song. However, if the most recent latency (e.g., current latency measure, etc.) is greater than 30 milliseconds but less than 60 milliseconds from the average latency, the indicator may turn orange to warn against starting a new song because an associated user/user device is communicating via a connection that may not be reliable. The issue may be resolved by waiting for a duration (e.g., a few seconds/minutes) to allow the average latency summation to rebalance, or by waiting for current latency to fall within 30 milliseconds of the average latency. If the most recent latency (e.g., current latency measure, etc.) for a user/user device is greater than 60 milliseconds (plus or minus) from the associated average latency, then the indicator may turn red to indicate that an associated user/user device is communicating via a connection that is likely too unstable to begin a new song. Thresholds for indicating/determining the integrity of a connection may be based on any value.
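The stability indicator described above may be sketched as a threshold check against a running average; the 30-millisecond and 60-millisecond values mirror the example thresholds in the preceding paragraph, and, as noted there, any values could be used.

```python
from collections import deque

class StabilityIndicator:
    """Track the last n latency measurements and classify the newest one."""

    def __init__(self, n: int = 10, warn_ms: float = 30.0, fail_ms: float = 60.0):
        self.history = deque(maxlen=n)
        self.warn_ms = warn_ms
        self.fail_ms = fail_ms

    def update(self, latency_ms: float) -> str:
        if not self.history:
            self.history.append(latency_ms)
            return "green"
        deviation = abs(latency_ms - sum(self.history) / len(self.history))
        self.history.append(latency_ms)
        if deviation <= self.warn_ms:
            return "green"    # stable: safe to start a song
        if deviation <= self.fail_ms:
            return "orange"   # intermittent: wait for the average to rebalance
        return "red"          # unstable: too risky to start a new song
```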
[0036] Returning to FIG. 1, media streams from each of the user devices 102-105 may, for example, exclude and/or filter out any locally-playing copy of the related audio (e.g., an audio track, audio content, an audio file, etc.) that the users 120-123 use when performing. The computing device 101 may combine the locally-playing and variably-delayed media, such as the related audio (e.g., an audio track, audio content, an audio file, etc.), with variably-delayed multimedia streams from each of the user devices 102-105 to generate and/or cause output of a real-time composite multimedia performance. Each of the user devices 102-105 may modify aspects of the composite multimedia performance. For example, a user device may be used to modify/adjust display settings, such as scene transitions, color patterns, audio levels, video quality, offset time values, and/or any other attribute of and/or associated with the composite multimedia performance.
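One way to picture the compositing step is to delay each incoming stream by its offset before combining it with the locally playing related audio; the frame-count delay buffer below is an illustrative simplification rather than the disclosed implementation.

```python
from collections import deque

class DelayedStream:
    """Apply a fixed per-performer delay (in frames) before a stream enters the mix."""

    def __init__(self, delay_frames: int):
        self.buffer = deque([None] * delay_frames)

    def push(self, frame):
        """Feed the newest captured frame; get back the frame delayed by delay_frames."""
        self.buffer.append(frame)
        return self.buffer.popleft()

def composite(delayed_frames: dict, background_frame):
    """delayed_frames: performer id -> frame already delayed by that performer's offset.
    A real compositor would overlay these on the background (virtual stage) and mix
    the corresponding audio; this placeholder just bundles them together."""
    return {"background": background_frame, "performers": delayed_frames}
```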
[0037] The system 100 enables each of the users 120-123, for example, via the user devices 102-105, to augment and stylize the composite multimedia performance by adding, for example, visual graphics and full-motion video behind the on-screen (e.g., displayed, etc.) boxes in which video associated with each of the users 120-123 is contained to generate/create a virtual stage on which the users 120-123 perform. For example, each user 120-123, via the user devices 102-105, can include a chroma-key or green-screen effect causing the user to appear to be physically located in front of a green-screen and/or the like. The computing device 101 may replace the background and/or green-screen associated with each user/user device with visual graphics, a video stream, and/or the like when generating the composite multimedia performance. For example, the system 100 enables multiple users/performers in different physical locations to appear to share a common stage embodied in the composite multimedia performance.
[0038] The system 100 enables each of the users 120-123, for example, via the user devices 102-105, to control the show state of the composite multimedia performance such that alternative content such as graphics, title screens, or videos can be presented to an audience in addition to or in place of the composite multimedia performance.
For example, the system 100 enables each of the users 120-123, for example, via the user devices 102-105, to cause a transition from a display of the composite multimedia performance to the display/occurrence of a "blackout" state in which the audience may view/consume alternative content. During a "blackout" state, the users 120-123, for example, via the user devices 102-105, may communicate. For example, the users 120-123, for example, via the user devices 102-105, may be able to see, hear, and/or converse with each other in a manner to which an audience is not privy (e.g., a virtual lowering of a curtain on a stage during an intermission or blackout period in a stage show, etc.). The computing device 101 may cause output of alternative content. At an appropriate time, the users 120-123, for example, via the user devices 102-105, may cause the computing device 101 to return to a "live" state whereby the users 120-123 are displayed to an audience again. For example, the system 100, during the output of a composite multimedia performance, may cause a virtual “curtain up” at the beginning of a performance, occasional blackouts during the performance, and a virtual “curtain down” at the end of the performance which transitions to a final “show has ended” video and/or title screen followed by the end of the composite multimedia performance.
[0039] The system 100 enables blackouts to optionally be set to automatically occur before and/or after each song in a composite multimedia performance. The users 120-123, for example, via the user devices 102-105, may implement automatic blackout occurrences so that, when a song is selected to be performed, the composite multimedia performance will automatically transition to a blackout state so an audience views alternative content such as a song, introductory video, graphics, a title card, and/or the like before a segue into the song automatically occurs after a selectable period of time.
[0040] The system 100 enables each of the users 120-123, for example, via the user devices 102-105, to review, modify, and/or adjust a composite multimedia performance before the composite multimedia performance is published and/or output to an audience. During review, the users 120-123 may, for example, rehearse and make manual adjustments, such as adjustments to the latency timing associated with their respective user device and/or any other user device, or to the audio level of performances (e.g., media) output by other user devices (e.g., heard by the current user of a user device, etc.) during a performance. Adjustment of latency timing, audio levels, and/or the like of other users/performers (e.g., users 120-123, etc.) enables any given/current user and/or user device to maintain proper timing while performing to the related audio (e.g., an audio track, audio content, an audio file, etc.) since latency and/or timing between user devices (e.g., the user devices 102-105, etc.) and/or users/performers (e.g., users 120-123, etc.) may be inconsistent. For example, the audio level of a mistimed and/or unsynchronized performer may be muted and/or attenuated by any user device and/or user to promote proper performance timing and/or cadence.
[0041] The system 100 enables each of the users 120-123, for example, via the user devices 102-105, to send one or more signals to control the audio levels of each of the users 120-123 of a composite multimedia performance perceived by a user device. For example, each of the users 120-123, via the user devices 102-105, may adjust the perceived audio level of each other user/user device and/or the related audio. For example, each user device may be associated with a different perceivable audio level than the computing device 101 and/or another user device, and each of the users 120-123, for example, via the user devices 102-105, may optionally hear each other along with their own version of the related audio (e.g., an audio track, audio content, an audio file, etc.) to which the respective user performs - which may or may not be the same audio heard by an audience. For example, the audience may hear a finished song/product, while each of the users 120-123, for example, via the user devices 102-105, may hear a simple click or drum track used for timing with or without all of the instrumentation used in the final composite multimedia performance output to the audience.
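A per-listener monitor mix such as the one described may be sketched as a gain map applied independently of the audience mix; the numeric gain values and the array-based audio representation are assumptions for illustration.

```python
import numpy as np

def monitor_mix(tracks: dict, gains: dict) -> np.ndarray:
    """tracks: source name -> mono audio buffer (NumPy arrays of equal length).
    gains: source name -> level chosen by this listener (0.0 mutes a mistimed performer)."""
    out = np.zeros_like(next(iter(tracks.values())), dtype=float)
    for name, samples in tracks.items():
        out += gains.get(name, 1.0) * samples
    return out

# Example: this performer monitors only a click track and one other performer,
# while the audience hears the full composite produced by the computing device.
# mix = monitor_mix({"click": click, "guitar": guitar, "vocals": vocals},
#                   {"click": 1.0, "guitar": 0.6, "vocals": 0.0})
```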
[0042] The system 100 enables each of the users 120-123, for example, via the user devices 102-105 (e.g., via the interface module 108, etc.), to use a virtual metronome for performance timing purposes. The virtual metronome may be output along with and/or in addition to the related audio (e.g., an audio track, audio content, an audio file, etc.) or in lieu of the related audio (e.g., an audio track, audio content, an audio file, etc.). The metronome for each user/user device may leverage the timing algorithm as previously described where the start/stop timing signals are used to trigger the metronome. Each of the users 120-123, for example, via the user devices 102-105 can adjust the beats per minute (BPM) and musical key used for the metronome(s) for each song. Data may be shared by each of the user devices to enable all metronomes used in the performance to operate at the same rate and/or musical key. For example, each of the users 120-123, via the user devices 102-105, may make adjustments to a virtual metronome and the adjustments may be shared between each of the user devices in real-time.
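The shared metronome may be thought of as a small piece of state (BPM and musical key) replicated to every device, with changes re-broadcast so that all metronomes tick at the same rate; the broadcast callback below is a stand-in for whatever transport the system actually uses.

```python
from dataclasses import dataclass

@dataclass
class MetronomeSettings:
    bpm: int = 120
    key: str = "C"

class SharedMetronome:
    """Keep metronome settings identical across devices by re-broadcasting every change."""

    def __init__(self, broadcast):
        self.settings = MetronomeSettings()
        self.broadcast = broadcast  # callable sending settings to the other user devices

    def update(self, bpm: int | None = None, key: str | None = None):
        if bpm is not None:
            self.settings.bpm = bpm
        if key is not None:
            self.settings.key = key
        self.broadcast(self.settings)  # all devices tick at the same rate and key

    def beat_interval_ms(self) -> float:
        return 60_000.0 / self.settings.bpm
```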
[0043] The composite multimedia performance may be sent, for example, by the computing device 101, to and/or published via a multimedia streaming platform (e.g., Youtube®, Vimeo®, etc.) and/or social media platform (e.g., Instagram, TikTok, etc.) for consumption by multiple audiences. For example, each of the user devices 102-105 may send/provide an indication and/or notification that/when the respective users 120-123 are ready to publish/output the composite multimedia performance, such as once all timing and/or media (e.g., audio, video, etc.) adjustments (e.g., adjustments to scene transitions, color patterns, audio levels, video quality, offset time values, etc.) are deemed suitable. FIG. 3 shows an example display 300 of a composite multimedia performance of each of the users 120-123.
[0044] Returning to FIG. 1, the system 100 enables audience members and/or viewers, such as users that view the composite multimedia performance via a multimedia streaming platform (e.g., Youtube®, Vimeo®, etc.), social media platform (e.g., Instagram, TikTok, etc.), and/or the like, to interact with the users 120-123, for example in real time, by sending free and/or paid content to the respective user devices 102-105 during a live performance. In some instances, messages from the audience may enter a moderation queue and/or be passed through a filter mechanism to remove offensive content before being presented to the users 120-123. Audience-Supplied Content may include text messages, pictures, and videos, and/or the like that may be visible to the users 120-123 via the respective user devices 102-105, and/or made visible to the entire audience by being displayed along with the composite multimedia performance. [0045] The system 100 may enable each of the users 120-123, for example, via the user devices 102-105, to isolate sections of the composite multimedia performance once it has ended to create video clips that may be shared with the audience/fans. For example, isolated sections of a composite multimedia performance enable the users 120-123 to perform most of a song and leave a section open for the audience/fans to sing to and/or sing along with.
[0046] The system 100 may provide audience/fans an option for a free or paid photo/video opportunity (meet & greet) with the users 120-123. For example, the computing device 101 may add communications from audience/fans to a virtual queue and/or cache. The computing device 101 may cause a signal to be sent to each audience/fan in the virtual queue and/or cache (e.g., sent to a computing device, mobile device, smart device, and/or the like associated with the audience/fan) that causes a camera associated with the audience/fan to activate and/or facilitate video conferencing, image taking, and/or the like with each of the users 120-123, for example, via the user devices 102-105. For example, a display associated with the users 120-123 and/or the audience/fans may include a split-screen, picture-in-picture, and/or the like that may be used for video conferencing, image taking, and/or the like. Images, audio, and/or video associated with an audience/fan and user engagement may be uploaded and shared via the multimedia streaming platform (e.g., Youtube®, Vimeo®, etc.), social media platform (e.g., Instagram, TikTok, etc.), and/or the like. [0047] FIG. 4 shows a flowchart of a method 400 for synchronizing multimedia. A computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may combine a locally-playing and variably-delayed media, such as a background audio track, with variably-delayed multimedia streams from each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices to produce a real-time composite multimedia performance. The computing device may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device of the plurality of user devices to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams and generate the composite multimedia performance.
[0048] For example, the method 400 may include, at 410, determining a respective latency value for each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices. For example, a computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may determine, for each user device of the plurality of user devices, based on one or more pixels of video content received from the user device, a respective latency value. Determining, for each user device of the plurality of user devices, the respective latency may include: sending a signal to the user device; and receiving, based on the signal, the one or more pixels of the video content, wherein the latency value indicates an amount of time elapsed before the one or more pixels of the video content are received. Receiving the one or more pixels of the video content may include: determining a region of interest (ROI) within the video content received from the user device; and determining a change in a color associated with the one or more pixels within the ROI.
[0049] At 420, determining, based on the respective latency values for each user device of the plurality of user devices, a total latency value and a highest latency value. The total latency value may be determined based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, and an average of the respective latency values for each user device of the plurality of user devices.
[0050] At 430, determining, for each user device of the plurality of user devices, based on the total latency value, the highest latency value, and the respective latency value, a respective latency offset value. For example, the respective latency offset value may be determined by multiplying the highest latency value by the total latency value and then subtracting the respective latency value from the result.
[0051] At 440, determining, based on the highest latency value, an output time. For example, the computing device may output audio content after a period of time equal to the highest latency value multiplied by the total latency value. The output time may be and/or be determined and/or based on any time.
[0052] At 450, causing each user device of the plurality of user devices to output audio content at a respective output time, wherein the respective output time is based on the output time and the respective latency offset value. Causing each user device of the plurality of user devices to output the audio content at the respective output time may include sending notification to the user device, wherein the notification comprises an instruction to begin a local output of the audio content at the respective output time.
[0053] At 460, receiving, from each user device of the plurality of user devices, additional video content, wherein the additional video content indicates content that corresponds to the audio content output at the respective output time. The method 400 may include synchronizing, based on the respective latency offset values for each user device of the plurality of user devices, the additional video content received from each user device of the plurality of user devices.
[0054] At 470, causing output of composite content, wherein the composite content comprises the additional video content received from each user device of the plurality of user devices and the audio content. Causing the output of the composite content may include publishing the composite content to at least one of a video hosting service, video sharing service, and a social media service.
[0055] The method 400 may further include causing, based on one or more signals received from at least one user device of the plurality of user devices, at least a portion of the composite content to be obscured.
[0056] The method 400 may further include modifying, based on one or more signals received from at least one user device of the plurality of user devices, an audio level associated with at least a portion of the composite content.
[0057] The method 400 may further include: receiving, from a network device (e.g., audience device, etc.), media content (e.g., audience generated content, video clips, sound clips, etc.); and updating, based on the media content, at least a portion of the composite content.
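The steps of method 400 may be strung together roughly as follows. This is a hedged sketch: the helper callables are placeholders for the pixel-based measurement and output machinery described above, and the offset computation uses the simple difference relation from paragraph [0033] as one way of combining the values named at step 430.

```python
def method_400(user_devices, measure_latency, start_audio, receive_video, output):
    """Illustrative outline of steps 410-470; every callable is an assumed hook."""
    # 410: per-device latency, e.g., from pixel changes detected in received video
    latencies = {d: measure_latency(d) for d in user_devices}

    # 420: total and highest latency values
    total_latency = sum(latencies.values())
    highest_latency = max(latencies.values())

    # 430: per-device latency offset; the simple difference from paragraph [0033]
    # is used here, although total_latency may also factor into the computation
    offsets = {d: highest_latency - lat for d, lat in latencies.items()}

    # 440-450: derive an output time from the highest latency, then instruct each
    # device to begin local playback of the audio content at its offset time
    output_time = highest_latency
    for d in user_devices:
        start_audio(d, at=output_time + offsets[d])

    # 460-470: collect the performance video from each device and output the composite
    videos = {d: receive_video(d) for d in user_devices}
    output({"audio": "related audio content", "video": videos})
```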
[0058] FIG. 5 shows a flowchart of a method 500 for synchronizing multimedia. A computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may combine a locally-playing and variably-delayed media, such as a background audio track, with variably-delayed multimedia streams from each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices to produce a real-time composite multimedia performance. The computing device may use data/information (e.g., visual markers, pixels, metadata, etc.) embedded in a multimedia stream from each user device of the plurality of user devices to determine timing information and/or latency information associated with the user device, and use the timing and/or latency information to synchronize the various multimedia streams and generate the composite multimedia performance.
[0059] For example, the method 500 may include, at 510, determining a respective latency value for each user device (e.g., computing device, smart device, mobile device, multimedia capturing device, etc.) of a plurality of user devices. For example, a computing device (e.g., a server, a cloud-based device, a network device, a timing controller, a multimedia stream compositor, etc.) may determine, for each user device of the plurality of user devices, based on metadata received from the user device, a respective latency value. Determining, for each user device of the plurality of user devices, the respective latency may include: sending a signal to the user device; and receiving, based on the signal, video embedded with the metadata, wherein the latency value indicates an amount of time elapsed before the video content embedded with the metadata is received. [0060] At 520, determining, based on the respective latency values for each user device of the plurality of user devices, a total latency value and a highest latency value. The total latency value may be determined based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, and an average of the respective latency values for each user device of the plurality of user devices.
[0061] At 530, determining, for each user device of the plurality of user devices, based on the total latency value, the highest latency value, and the respective latency value, a respective latency offset value. For example, the respective latency offset value may be determined by multiplying the highest latency value by the total latency value and then subtracting the respective latency value from the result.
[0062] At 540, determining, based on the highest latency value, an output time. For example, the computing device may output audio content after a period of time equal to the highest latency value multiplied by the total latency value. The output time may be and/or be determined and/or based on any time.
[0063] At 550, causing each user device of the plurality of user devices to output audio content at a respective output time, wherein the respective output time is based on the output time and the respective latency offset value. Causing each user device of the plurality of user devices to output the audio content at the respective output time may include sending notification to the user device, wherein the notification comprises an instruction to begin a local output of the audio content at the respective output time.
[0064] At 560, receiving, from each user device of the plurality of user devices, additional video content, wherein the additional video content indicates content that corresponds to the audio content output at the respective output time. The method 500 may include synchronizing, based on the respective latency offset values for each user device of the plurality of user devices, the additional video content received from each user device of the plurality of user devices.
[0065] At 570, causing output of composite content, wherein the composite content comprises the additional video content received from each user device of the plurality of user devices and the audio content. Causing the output of the composite content may include publishing the composite content to at least one of a video hosting service, video sharing service, and a social media service.
[0066] The method 500 may further include causing, based on one or more signals received from at least one user device of the plurality of user devices, at least a portion of the composite content to be obscured.
[0067] The method 500 may further include modifying, based on one or more signals received from at least one user device of the plurality of user devices, an audio level associated with at least a portion of the composite content.
[0068] The method 500 may further include: receiving, from a network device (e.g., audience device, etc.), media content (e.g., audience generated content, video clips, sound clips, etc.); and updating, based on the media content, at least a portion of the composite content.
[0069] FIG. 6 shows a system 600 for synchronizing multimedia. Any device and/or component described herein may be a computer 601 as shown in FIG. 6.
[0070] The computer 601 may comprise one or more processors 603, a system memory 612, and a bus 613 that couples various components of the computer 601 including the one or more processors 603 to the system memory 612. In the case of multiple processors 603, the computer 601 may utilize parallel computing.
[0071] The bus 613 may comprise one or more of several possible types of bus structures, such as a memory bus, memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. [0072] The computer 601 may operate on and/or comprise a variety of computer-readable media (e.g., non-transitory). Computer-readable media may be any available media that is accessible by the computer 601 and comprises, non-transitory, volatile and/or non-volatile media, removable and non-removable media. The system memory 612 has computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM). The system memory 612 may store data such as multimedia synchronization data 607 and/or program modules such as operating system 605 and multimedia synchronization software 606 that are accessible to and/or are operated on by the one or more processors 603.
[0073] The computer 601 may also comprise other removable/non-removable, volatile/non-volatile computer storage media. The mass storage device 604 may provide non-volatile storage of computer code, computer-readable instructions, data structures, program modules, and other data for the computer 601. The mass storage device 604 may be a hard disk, a removable magnetic disk, a removable optical disk, magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read-only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like.
[0074] Any number of program modules may be stored on the mass storage device 604. An operating system 605 and multimedia synchronization software 606 may be stored on the mass storage device 604. One or more of the operating system 605 and multimedia synchronization software 606 (or some combination thereof) may comprise program modules and the multimedia synchronization software 606. Multimedia synchronization data 607 may also be stored on the mass storage device 604. Multimedia synchronization data 607 may be stored in any of one or more databases known in the art. The databases may be centralized or distributed across multiple locations within the network 615.
[0075] A user may enter commands and information into the computer 601 via an input device (not shown). Such input devices comprise, but are not limited to, a keyboard, pointing device (e.g., a computer mouse, remote control), a microphone, a joystick, a scanner, tactile input devices such as gloves and other body coverings, a motion sensor, and the like. These and other input devices may be connected to the one or more processors 603 via a human-machine interface 602 that is coupled to the bus 613, but may be connected by other interface and bus structures, such as a parallel port, game port, an IEEE 1394 Port (also known as a Firewire port), a serial port, network adapter 608, and/or a universal serial bus (USB).
[0076] A display device 611 may also be connected to the bus 613 via an interface, such as a display adapter 609. It is contemplated that the computer 601 may have more than one display adapter 609 and the computer 601 may have more than one display device 611. A display device 611 may be a monitor, an LCD (Liquid Crystal Display), light-emitting diode (LED) display, television, smart lens, smart glass, and/or a projector. In addition to the display device 611, other output peripheral devices may comprise components such as speakers (not shown) and a printer (not shown) which may be connected to the computer 601 via Input/Output Interface 610. Any step and/or result of the methods may be output (or caused to be output) in any form to an output device. Such output may be any form of visual representation, including, but not limited to, textual, graphical, animation, audio, tactile, and the like. The display 611 and computer 601 may be part of one device, or separate devices.
[0077] The computer 601 may operate in a networked environment using logical connections to one or more remote computing devices 614a,b,c. A remote computing device 614a,b,c may be a personal computer, a computing station (e.g., a workstation), a portable computer (e.g., a laptop, mobile phone, or tablet device), a smart device (e.g., a smartphone, smartwatch, activity tracker, smart apparel, or smart accessory), a security and/or monitoring device, a server, a router, a network computer, a peer device, an edge device, or another common network node, and so on. Logical connections between the computer 601 and a remote computing device 614a,b,c may be made via a network 615, such as a local area network (LAN) and/or a general wide area network (WAN). Such network connections may be through a network adapter 608. A network adapter 608 may be implemented in both wired and wireless environments. Such networking environments are conventional and commonplace in dwellings, offices, enterprise-wide computer networks, intranets, and the Internet.
[0078] Application programs and other executable program components such as the operating system 605 are shown herein as discrete blocks, although it is recognized that such programs and components may reside at various times in different storage components of the computing device 601, and are executed by the one or more processors 603 of the computer 601. An implementation of multimedia synchronization software 606 may be stored on or sent across some form of computer-readable media. Any of the disclosed methods may be performed by processor-executable instructions embodied on computer-readable media.
[0079] While specific configurations have been described, it is not intended that the scope be limited to the particular configurations set forth, as the configurations herein are intended in all respects to be possible configurations rather than restrictive.
[0080] Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; and the number or type of configurations described in the specification. [0081] It will be apparent to those skilled in the art that various modifications and variations may be made without departing from the scope or spirit. Other configurations will be apparent to those skilled in the art from consideration of the specification and practice described herein. It is intended that the specification and described configurations be considered as exemplary only, with a true scope and spirit being indicated by the following claims.

Claims

CLAIMS What is claimed is:
1. A method comprising: determining, for each user device of a plurality of user devices, based on one or more pixels of video content received from the user device, a respective latency value; determining, based on the respective latency values for each user device of the plurality of user devices, a total latency value and a highest latency value; determining, for each user device of the plurality of user devices, based on the total latency value, the highest latency value, and the respective latency value, a respective latency offset value; determining, based on the highest latency value, an output time; causing each user device of the plurality of user devices to output audio content at a respective output time, wherein the respective output time is based on the output time and the respective latency offset value; receiving, from each user device of the plurality of user devices, additional video content, wherein the additional video content indicates content that corresponds to the audio content output at the respective output time; and causing output of composite content, wherein the composite content comprises the additional video content received from each user device of the plurality of user devices and the audio content.
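By way of a non-limiting illustration of the scheduling recited in claim 1, the following sketch shows one way the respective latency offset values and respective output times could be derived from measured per-device latencies. It is written in Python; the function and variable names (e.g., compute_schedule, start_delay_ms) are illustrative assumptions that do not appear in the claims, and the particular offset used (highest latency minus respective latency) is only one of the possibilities the claims allow.

    # Illustrative sketch only (hypothetical names); assumes the respective
    # latency values (in milliseconds) have already been measured as described.
    import time

    def compute_schedule(latencies_ms, start_delay_ms=1000):
        """Derive per-device latency offsets and output times.

        latencies_ms: dict mapping a device id to its measured latency (ms).
        Returns a dict mapping each device id to (offset_ms, output_time_ms).
        """
        highest = max(latencies_ms.values())
        total = sum(latencies_ms.values())  # one possible "total latency value" (unused in this simple choice of offset)
        now_ms = int(time.time() * 1000)
        # Schedule the shared output time far enough ahead to absorb the slowest path.
        output_time = now_ms + highest + start_delay_ms
        schedule = {}
        for device, latency in latencies_ms.items():
            offset = highest - latency  # lower-latency devices wait longer so outputs align
            schedule[device] = (offset, output_time + offset)
        return schedule

    # Example: three devices with different measured latencies.
    print(compute_schedule({"device_a": 120, "device_b": 45, "device_c": 300}))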
2. The method of claim 1, wherein determining, for each user device of the plurality of user devices, the respective latency value comprises: sending a signal to the user device; and receiving, based on the signal, the one or more pixels of the video content, wherein the latency value indicates an amount of time elapsed before the one or more pixels of the video content are received.
3. The method of claim 2, wherein receiving the one or more pixels of the video content comprises: determining a region of interest (ROI) within the video content received from the user device; and determining a change in a color associated with the one or more pixels within the ROI.
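As a hedged illustration of the latency measurement described in claims 2 and 3, the sketch below watches a region of interest in frames received from a user device and reports the time elapsed until the pixel intensity in that region changes (e.g., because the probe signal caused the device to flash part of its screen). The frame source, the ROI coordinates, and the threshold are assumptions made for the example, and NumPy is assumed to be available.

    # Illustrative sketch only; assumes frames arrive as NumPy arrays of shape
    # (height, width, 3) and that the probe signal makes the ROI turn bright.
    import time
    import numpy as np

    def measure_latency(frames, roi=(0, 0, 32, 32), threshold=100.0):
        """Return seconds elapsed until the mean intensity inside the ROI changes.

        frames: iterable yielding video frames received from the user device.
        roi:    (x, y, width, height) region of interest to watch.
        """
        x, y, w, h = roi
        sent_at = time.monotonic()  # the probe signal is assumed to be sent here
        for frame in frames:
            patch = frame[y:y + h, x:x + w]
            if patch.mean() > threshold:  # color/intensity change detected in the ROI
                return time.monotonic() - sent_at
        return None  # no change observed in the received frames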
4. The method of claim 1, wherein the total latency value is based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, or an average of the respective latency values for each user device of the plurality of user devices.
5. The method of claim 1, wherein causing each user device of the plurality of user devices to output the audio content at the respective output time comprises sending a notification to the user device, wherein the notification comprises an instruction to begin a local output of the audio content at the respective output time.
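As a further non-limiting illustration of the notification described in claim 5, one possible payload is sketched below; the field names and values are assumptions made for the example and are not specified by the claims.

    # Hypothetical notification payload; all field names are illustrative assumptions.
    import json

    notification = {
        "action": "begin_local_output",
        "content_id": "audio-track-001",   # the audio content to start locally
        "output_time_ms": 1716800012345,   # the device's respective output time
        "latency_offset_ms": 180,          # the device's respective latency offset
    }
    print(json.dumps(notification))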
6. The method of claim 1, wherein causing the output of the composite content comprises publishing the composite content to at least one of a video hosting service, video sharing service, and a social media service.
7. The method of claim 1, further comprising synchronizing, based on the respective latency offset values for each user device of the plurality of user devices, the additional video content received from each user device of the plurality of user devices.
8. The method of claim 1, further comprising causing, based on one or more signals received from at least one user device of the plurality of user devices, at least a portion of the composite content to be obscured.
9. The method of claim 1, further comprising modifying, based on one or more signals received from at least one user device of the plurality of user devices, an audio level associated with at least a portion of the composite content.
10. The method of claim 1, further comprising: receiving, from a network device, media content; and updating, based on the media content, at least a portion of the composite content.
11. One or more computer-readable media storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to: determine, for each user device of a plurality of user devices, based on one or more pixels of video content received from the user device, a respective latency value; determine, based on the respective latency values for each user device of the plurality of user devices, a total latency value and a highest latency value; determine, for each user device of the plurality of user devices, based on the total latency value, the highest latency value, and the respective latency value, a respective latency offset value; determine, based on the highest latency value, an output time; cause each user device of the plurality of user devices, based on the respective latency offset values, to output audio content at a respective output time; receive, from each user device of the plurality of user devices, additional video content, wherein the additional video content indicates content that corresponds to the audio content output at the respective output time; and cause output of composite content, wherein the composite content comprises the additional video content received from each user device of the plurality of user devices and the audio content.
12. The one or more computer-readable media of claim 11, wherein the processor- executable instructions that cause the at least one processor to determine, for each user device of the plurality of user devices, the respective latency value further cause the at least one processor to: send a signal to the user device; and receive, based on the signal, the one or more pixels of the video content, wherein the latency value indicates an amount of time elapsed before the one or more pixels of the video content are received.
13. The one or more computer-readable media of claim 12, wherein the processor- executable instructions that cause the at least one processor to receive the one or more pixels of the video content further cause the at least one processor to: determine a region of interest (ROI) within the video content received from the user device; and determine a change in a color associated with the one or more pixels within the ROI.
14. The one or more computer-readable media of claim 11, wherein the total latency value is based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, or an average of the respective latency values for each user device of the plurality of user devices.
15. A method comprising: determining, for each user device of a plurality of user devices, based on metadata received from the user device, a respective latency value; determining, based on the respective latency values for each user device of the plurality of user devices, a total latency value and a highest latency value; determining, for each user device of the plurality of user devices, based on the total latency value, the highest latency value, and the respective latency value, a respective latency offset value; determining, based on the highest latency value, an output time; causing each user device of the plurality of user devices, based on the respective latency offset values, to output audio content at a respective output time; receiving, from each user device of the plurality of user devices, video content, wherein the video content indicates content that corresponds to the audio content output at the respective output time; and causing output of composite content, wherein the composite content comprises the video content received from each user device of the plurality of user devices and the audio content.
16. The method of claim 15, wherein determining, for each user device of the plurality of user devices, the respective latency value comprises: sending a signal to the user device; and receiving, based on the signal, video embedded with the metadata, wherein the latency value indicates an amount of time elapsed before the video embedded with the metadata is received.
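As a hedged illustration of claim 16, the sketch below computes a latency value from a timestamp assumed to be embedded in the metadata of video returned by a user device; the claims do not specify a metadata format, so the per-frame dictionary and its sent_at_ms field are purely illustrative.

    # Illustrative sketch only; the embedded metadata format is an assumption.
    import time

    def latency_from_metadata(frame_metadata):
        """Return latency (ms) from a send timestamp embedded in received video metadata.

        frame_metadata: e.g., {"sent_at_ms": 1716800000000, ...}, assumed to have
        been embedded by the user device when the probe signal was echoed back.
        """
        received_at_ms = int(time.time() * 1000)
        return received_at_ms - frame_metadata["sent_at_ms"]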
17. The method of claim 15, wherein the total latency value is based on at least one of an aggregation of the respective latency values for each user device of the plurality of user devices, or an average of the respective latency values for each user device of the plurality of user devices.
18. The method of claim 15, wherein causing each user device of the plurality of user devices to output the audio content at the respective output time comprises sending a notification to the user device, wherein the notification comprises an instruction to begin a local output of the audio content at the respective output time.
19. The method of claim 15, wherein causing the output of the composite content comprises publishing the composite content to at least one of a video hosting service, video sharing service, and a social media service.
20. The method of claim 15, further comprising synchronizing, based on the respective latency offset values for each user device of the plurality of user devices, the video content received from each user device of the plurality of user devices.
PCT/US2021/034558 2020-05-27 2021-05-27 Methods and systems for synchronizing multimedia WO2021243044A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063030424P 2020-05-27 2020-05-27
US63/030,424 2020-05-27

Publications (1)

Publication Number Publication Date
WO2021243044A1 true WO2021243044A1 (en) 2021-12-02

Family

ID=78722787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/034558 WO2021243044A1 (en) 2020-05-27 2021-05-27 Methods and systems for synchronizing multimedia

Country Status (1)

Country Link
WO (1) WO2021243044A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040162041A1 (en) * 2003-02-18 2004-08-19 Ferguson Kevin M. Automatically setting gain and offset based on region of interest
US20130003864A1 (en) * 2011-06-30 2013-01-03 Microsoft Corporation Reducing latency in video encoding and decoding
US20150208095A1 (en) * 2012-06-29 2015-07-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Video data stream concept

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114500572A (en) * 2022-02-21 2022-05-13 广州慧联网络科技有限公司 Multi-device synchronization method and device, electronic device and storage medium
CN114500572B (en) * 2022-02-21 2024-04-09 广州慧联网络科技有限公司 Multi-device synchronization method and device, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21814493

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21814493

Country of ref document: EP

Kind code of ref document: A1