US20220360843A1 - Audio and Video Synchronization
- Publication number: US20220360843A1 (application US 17/866,622)
- Authority: United States (US)
- Prior art keywords: encode, video, audio, post, synchronization
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/43072—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
- H04N21/4341—Demultiplexing of audio and video streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/8153—Monomedia components thereof involving graphical data comprising still images, e.g. texture, background image
- H04N21/835—Generation of protective data, e.g. certificates
- H04N21/8547—Content authoring involving timestamps for synchronizing content
- H04N21/2368—Multiplexing of audio and video streams
- Worldwide consumption of audio-video content (also known as “AV content”) has been on a steady increase year over year. The advent of streaming services, improved Internet access capabilities, and device mobility have been strong drivers of this increase. Paramount to the user experience is the quality of the AV content. Users typically focus on the resolution of the video image, the frame rate, the presence of visual artifacts, and general audio quality. Users can become accustomed to temporary changes in any of these parameters. When the audio and video become out of synchronization, however, users may observe inconsistencies between when a person or character speaks and when their lips move. This occurrence is known as AV desynchronization, or what is colloquially known as a “lip sync” issue.
- an audio-video (“AV”) synchronization system can simultaneously capture samples of a pre-encode media stream and a post-encode media stream.
- the pre-encode media stream can include AV content prior to being encoded.
- the post-encode media stream can include the AV content after being encoded.
- the AV synchronization system can align a pre-encode video component of the pre-encode media stream with a post-encode video component.
- the AV synchronization system can determine a video offset between the pre-encode video component and the post-encode video component.
- the AV synchronization system can align a pre-encode audio component of the pre-encode media stream with a post-encode audio component of the post-encode media stream.
- the AV synchronization system can determine an audio offset between the pre-encode audio component and the post-encode audio component.
- the AV synchronization system can then compare the video offset and the audio offset to determine if the post-encode media stream is synchronized with the pre-encode media stream.
- the AV synchronization system can align the pre-encode video component of the pre-encode media stream with the post-encode video component, and can align the pre-encode audio component of the pre-encode media stream with the post-encode audio component of the post-encode media stream during parallel processes. In some embodiments, the AV synchronization system also can determine the video offset between the pre-encode video component and the post-encode video component, and can determine the audio offset between the pre-encode audio component and the post-encode audio component during parallel processes.
- the AV synchronization system can align the pre-encode video component of the pre-encode media stream with the post-encode video component, and can determine the video offset between the pre-encode video component and the post-encode video component via execution of a non-annotated video processing algorithm module.
- the AV synchronization system can execute the non-annotated video processing algorithm module to: generate, from the pre-encode video component and the post-encode video component, a plurality of thumbnail images; determine a plurality of search ranges for an iterative search process used to find a first alignment point between the pre-encode video component and the post-encode video component; compare the thumbnail images to determine a plurality of distance values; determine a second alignment point between the pre-encode video component and the post-encode video component, wherein the second alignment point is where a distance value of the plurality of distance values is minimized; determine the video offset based upon the first alignment point and the second alignment point; and output the video offset.
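The video-alignment steps above can be sketched as follows. This is a minimal illustration, not the patented algorithm: the "thumbnails" are toy 0/1 vectors, the distance measure is a simple Hamming distance, and the search is exhaustive rather than the iterative search the disclosure describes.

```python
import random

def hamming(a, b):
    """Distance between two bi-tonal thumbnails, held as 0/1 lists."""
    return sum(x != y for x, y in zip(a, b))

def find_video_offset(pre_thumbs, post_thumbs):
    """Return the offset into the pre-encode thumbnails at which the
    summed distance to the post-encode thumbnails is minimized."""
    window = len(post_thumbs)
    best_offset, best_dist = 0, float("inf")
    for offset in range(len(pre_thumbs) - window + 1):
        dist = sum(hamming(pre_thumbs[offset + i], post_thumbs[i])
                   for i in range(window))
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset

# Toy data: the post-encode capture starts 3 frames into the reference.
random.seed(7)
pre = [[random.randint(0, 1) for _ in range(64)] for _ in range(30)]
post = [frame[:] for frame in pre[3:13]]
print(find_video_offset(pre, post))  # -> 3
```

In a real implementation the distance would be computed between binarized thumbnail images and the offset search would be bounded by the search ranges described below.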
- the AV synchronization system can align the pre-encode audio component of the pre-encode media stream with the post-encode audio component of the post-encode media stream, and can determine the audio offset between the pre-encode audio component and the post-encode audio component via execution of an audio-video synchronization script module.
- the AV synchronization system can execute the audio-video synchronization script module to: divide the pre-encode audio component and the post-encode audio component into a plurality of time slices, wherein each time slice of the plurality of time slices is associated with a corresponding video frame; generate acoustic fingerprints based upon the plurality of time slices; perform fingerprint matching using the acoustic fingerprints and determining the audio offset therefrom; compare the audio offset and the video offset; determine, based upon comparing the audio offset and the video offset, whether or not the pre-encode media stream and the post-encode media stream are synchronized; and output an audio-visual synchronization evaluation result including an indication of whether or not the pre-encode media stream and the post-encode media stream are synchronized.
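The audio-side steps can be sketched in the same spirit. Here the "fingerprint" is an ordinary hash of each time slice and the matching is exact rather than fuzzy; sample rate and frame rate are illustrative values, not ones from the disclosure.

```python
SAMPLE_RATE = 48000  # Hz (illustrative)
FPS = 25             # video frame rate (illustrative)
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # one audio time slice per video frame

def time_slices(audio):
    """Divide an audio sample sequence into per-video-frame slices."""
    return [tuple(audio[i:i + SAMPLES_PER_FRAME])
            for i in range(0, len(audio) - SAMPLES_PER_FRAME + 1, SAMPLES_PER_FRAME)]

def fingerprint(time_slice):
    """Stand-in for a Chromaprint-style acoustic fingerprint."""
    return hash(time_slice)

def audio_offset_frames(pre_audio, post_audio):
    """Audio offset, in video frame numbers, via exact fingerprint
    matching (a stand-in for the fuzzy matching described here)."""
    pre_fps = [fingerprint(s) for s in time_slices(pre_audio)]
    post_fps = [fingerprint(s) for s in time_slices(post_audio)]
    return pre_fps.index(post_fps[0])

def evaluate(video_offset, audio_offset):
    """The streams are synchronized when both offsets agree."""
    return "synchronized" if video_offset == audio_offset else "desynchronized"

# Toy data: the post-encode audio starts two video frames into the reference.
pre_audio = list(range(SAMPLE_RATE))  # one second of fake samples
post_audio = pre_audio[2 * SAMPLES_PER_FRAME:]
offset = audio_offset_frames(pre_audio, post_audio)
print(offset, evaluate(2, offset))  # -> 2 synchronized
```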
- FIG. 1 is a block diagram illustrating an AV synchronization system in which aspects of the concepts and technologies disclosed herein can be implemented.
- FIG. 2A is a diagram illustrating synchronized pre-encode and post-encode AV content, according to an illustrative embodiment.
- FIG. 2B is a diagram illustrating de-synchronized pre-encode and post-encode AV content with audio ahead of video, according to an illustrative embodiment.
- FIG. 2C is a diagram illustrating de-synchronized pre-encode and post-encode AV content with audio behind video, according to an illustrative embodiment.
- FIG. 3 is a diagram illustrating an example translation of audio frames to video frames, according to an illustrative embodiment.
- FIGS. 4A and 4B are diagrams illustrating an example of a micro-iteration iterative search, according to an illustrative embodiment.
- FIG. 5 is a diagram illustrating an example of a macro-iteration iterative search, according to an illustrative embodiment.
- FIG. 6 is a flow diagram illustrating aspects of a method for determining whether pre-encode and post-encode media streams are synchronized, according to an illustrative embodiment.
- FIG. 7 is a flow diagram illustrating aspects of another method for determining whether pre-encode and post-encode media streams are synchronized, according to an illustrative embodiment.
- FIG. 8 is a block diagram illustrating an example computer system capable of implementing aspects of the embodiments presented herein.
- FIG. 9 is a block diagram illustrating an example containerized cloud architecture and components thereof capable of implementing aspects of the embodiments presented herein.
- FIG. 10 is a block diagram illustrating an example virtualized cloud architecture and components thereof capable of implementing aspects of the embodiments presented herein.
- FIG. 11 is a block diagram illustrating an example mobile device capable of implementing aspects of the embodiments disclosed herein.
- FIG. 12 is a diagram illustrating a network, according to an illustrative embodiment.
- FIG. 13 is a diagram illustrating a machine learning system, according to an illustrative embodiment.
- the concepts and technologies disclosed herein focus on video encoders as the first possible source of synchronization problems.
- the concepts and technologies disclosed herein provide a full reference-based analysis to compare alignment results from content pre-encode and post-encode to determine if the video encoder introduced any synchronization issues.
- a distinct advantage to this approach versus other solutions is the ability to detect if AV desynchronization has occurred without relying on talking heads and “lip-reading” algorithms.
- streaming service providers that have access to the pre-encode (also referred to herein as “reference”) video also can determine if the desynchronization has caused the audio to be “behind” or “ahead” of the video.
- the concepts and technologies disclosed herein focus, in part, on the ability to accurately and reliably detect when audio and video from post-encode media streams fail to retain synchronization with their pre-encode media stream counterparts. This can be achieved by simultaneously capturing samples of both the pre-encode and post-encode media streams, and then separately aligning the audio and video to determine if the offsets between video pre-encode and post-encode reflect the offsets between the audio pre-encode and post-encode.
- the disclosed solution is a software tool that can reside in the same pod as the encoder(s) (i.e., for KUBERNETES deployments in which a pod is one or more KUBERNETES containers; see FIG. 9 for an example containerized cloud architecture).
- the software tool can cycle through all channels of a given pod to determine if the audio is synchronized with the video.
- the software tool can ingest a few seconds (e.g., 6-8 seconds) of AV content from a multi-cast source being sent by video providers (i.e., reference AV content), and can compare this reference segment to one segment of encoded AV content to determine if the audio and video are synchronized.
- the targeted runtime for each channel validation is 30 seconds, although this target may be changed based upon the needs of a given implementation.
- the software tool is designed to run continuously. If a problem occurs, the software tool can report the problem to other platforms that can trigger alarms and perform the appropriate corrective actions to address the problem.
- program modules include routines, programs, components, data structures, computer-executable instructions, and/or other types of structures that perform particular tasks or implement particular abstract data types.
- the subject matter described herein may be practiced with other computer systems, including hand-held devices, mobile devices, wireless devices, multiprocessor systems, distributed computing systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, routers, switches, other computing devices described herein, and the like.
- the AV synchronization system 100 can be implemented, at least in part, in a computer system, such as an example computer system 800 that is illustrated and described with reference to FIG. 8 .
- the AV synchronization system 100 alternatively can be implemented, at least in part, in a containerized architecture, such as an example containerized cloud architecture 900 that is illustrated and described herein with reference to FIG. 9 .
- the AV synchronization system 100 can be implemented, at least in part, in a virtualized cloud architecture, such as an example virtualized cloud architecture 1000 that is illustrated and described herein with reference to FIG. 10 .
- aspects of the AV synchronization system 100 may be implemented, at least in part, through the use of machine learning technologies, such as via an example machine learning system 1300 that is illustrated and described herein with reference to FIG. 13 .
- the AV synchronization system 100 can be deployed in various ways on different architectures based upon the needs of a given implementation. Accordingly, the examples set forth herein should not be construed as being limiting to the manner in which the AV synchronization system 100 is implemented.
- the AV synchronization system 100 can receive a pre-encode AV content 102 (also referred to herein, at times, as “reference AV content”) from a video provider or other source.
- the pre-encode AV content 102 can be received via a unicast or multi-cast source, although the latter is more likely in many real-world implementations.
- the pre-encode AV content 102 can include a pre-encode media stream that includes both a reference audio sequence and a reference video sequence (i.e., a reference sequence of audio frames and a reference sequence of video frames) prior to any encoding by one or more AV content encoders 104 .
- the pre-encode AV content 102 has no change from the original quality, or is the source AV content itself.
- the AV content encoder(s) 104 can include any audio encoder(s) that utilize any audio codec based on the needs of a video server provider to encode the audio data of the pre-encode AV content 102 prior to distribution.
- the AV content encoder(s) 104 can include any video encoder(s) that utilize any video codec based on the needs of the video service provider to encode the video data of the pre-encode AV content 102 prior to distribution.
- the audio and video data portions of the pre-encode AV content 102 are synchronized such that the correct audio frame(s) is played during the correct video frame(s) as intended.
- the pre-encode AV content 102 can include real-time live streamed content as well as other media such as, but not limited to, video files.
- the AV synchronization system 100 can also receive a post-encode AV content 106 (also referred to herein, at times, as “distorted AV content”).
- the post-encode AV content 106 can include a media stream that includes both a distorted audio sequence and a distorted video sequence (i.e., a distorted sequence of audio frames and a distorted sequence of video frames) after encoding by one or more of the AV content encoders 104 .
- the post-encode AV content 106 exhibits variance in quality, frame rate, playback delay, or some other alteration from the source AV content.
- the post-encode AV content 106 can be compared against the reference AV content 102 to establish if a change to the AV synchronization has occurred post-encode. It should be understood that the post-encode AV content 106 can include real-time live streamed content as well as other media such as, but not limited to, video files.
- a “video frame” is a single unit of video data with a duration of 1/(video frame rate) seconds.
- one video frame for a 29.97 frame-per-second (“FPS”) video would last 1/29.97 of a second, or approximately 33.37 milliseconds (“ms”).
- a video frame is the contents of a frame buffer.
- Video frames are used herein as a proxy for units of time because the output from the disclosed video alignment algorithm (described below) is purely in video frames, which each refer to actual images or frame buffer contents.
- at approximately 60 FPS, the duration of one video frame is approximately 16.67 ms.
- at 29.97 FPS, the duration of one video frame would be roughly twice as long, approximately 33.37 ms.
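The frame-duration arithmetic is straightforward; a one-line helper reproduces the figures quoted above.

```python
def frame_duration_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps

print(round(frame_duration_ms(60), 2))     # -> 16.67
print(round(frame_duration_ms(29.97), 2))  # -> 33.37
```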
- an “audio frame” is the data of an audio file (e.g., a .wav file or other audio file) that is rendered once per sample at an audio sampling rate, such as, for example, 44100 Hertz (“Hz”).
- An audio frame encompasses any sub-frames for each audio channel (e.g., left, right, center, etc.).
- an “alignment” refers to the process of matching two distinct sequences of audio or video on a frame-by-frame basis (examples shown in FIGS. 2A-2C ).
- the matching process is per video frame.
- the matching process is based upon the audio data per video frame. For example, if video frame 1 in the distorted video sequence is determined to be equivalent to video frame 100 in the reference video sequence, an alignment is found at that point.
- a “temporal distance” between aligned points in the reference sequences and the distorted sequences is how far apart the sequences are from playing the same content.
- the “temporal distance” is also referred to herein as the “offset.”
- when the offset between the reference and distorted audio sequences (i.e., the “audio offset”) differs from the offset between the reference and distorted video sequences (i.e., the “video offset”), the sequences are referred to as being desynchronized, and the offsets can be used to determine by how much time the sequences are desynchronized.
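The offset comparison just described can be sketched as a small helper. The sign convention is an assumption of this sketch, not something the disclosure specifies: a positive difference is taken to mean the audio plays ahead of the video.

```python
def evaluate_sync(video_offset_ms, audio_offset_ms, tolerance_ms=0):
    """Compare the video and audio offsets between the reference and
    distorted streams, returning a verdict and the absolute drift."""
    drift = audio_offset_ms - video_offset_ms
    if abs(drift) <= tolerance_ms:
        return ("synchronized", 0)
    direction = "audio ahead of video" if drift > 0 else "audio behind video"
    return (direction, abs(drift))

print(evaluate_sync(100, 100))  # -> ('synchronized', 0)
print(evaluate_sync(100, 266))  # -> ('audio ahead of video', 166)
```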
- the reference AV content 102 and the distorted AV content 106 are provided to a multimedia framework 107 .
- the multimedia framework 107 may be proprietary or open-source.
- the multimedia framework 107 can be implemented via FFmpeg, which is a multimedia framework available under the GNU Lesser General Public License (“LGPL”) version 2.1 or later and the GNU General Public License (“GPL”) version 2 or later.
- the multimedia framework 107 can provide functionality such as decoding, encoding, transcoding, multiplexing, demultiplexing, streaming, filtering, and playback.
- the multimedia framework 107 can include the AV content encoder(s) 104 as shown, although the AV content encoder(s) 104 may be separate from the multimedia framework 107 .
- the multimedia framework 107 streams the pre-encode AV content 102 and the post-encode AV content 106 and performs demultiplexing via demultiplexers 108 A, 108 B (shown as “demux 108 A” and “demux 108 B”) to separate the audio and video sequences.
- the demux 108 A can demultiplex the pre-encode AV content 102 to create a pre-encode video sequence (shown as “pre-encode video”) 110 and a pre-encode audio sequence (shown as “pre-encode audio”) 112 .
- the demux 108 B can demultiplex the post-encode AV content 106 to create a post-encode video sequence (shown as “post-encode video”) 114 and a post-encode audio sequence (shown as “post-encode audio”) 116 .
- the demux 108 A can output the pre-encode video 110 to a first video pipe (“video pipe 1 118 A”) and the pre-encode audio 112 to a first audio pipe (“audio pipe 1 ”) 120 A.
- the demux 108 B can output the post-encode video 114 to a second video pipe (“video pipe 2 118 B”) and the post-encode audio 116 to a second audio pipe (“audio pipe 2 ”) 120 B.
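Since the disclosure names FFmpeg as one possible multimedia framework, the demultiplexing into separate video and audio pipes might be invoked roughly as below. The pipe paths, stream selectors, and output formats are illustrative choices, not ones taken from the patent.

```python
def demux_args(source, video_pipe, audio_pipe):
    """Build an FFmpeg argument list that demultiplexes `source` into a
    raw video stream and a WAV audio stream, each written to a named
    pipe (e.g., created beforehand with mkfifo)."""
    return [
        "ffmpeg", "-i", source,
        "-map", "0:v:0", "-f", "rawvideo", video_pipe,  # video elementary stream
        "-map", "0:a:0", "-f", "wav", audio_pipe,       # audio elementary stream
    ]

args = demux_args("udp://239.1.2.3:5000", "/tmp/video_pipe_1", "/tmp/audio_pipe_1")
print(args[:3])  # -> ['ffmpeg', '-i', 'udp://239.1.2.3:5000']
```

The argument list could then be handed to `subprocess.Popen` once per stream, one process for the pre-encode source and one for the post-encode source.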
- the video pipes 118 A, 118 B can feed into a non-annotated video processing algorithm module (“NAVPAM”) 122 .
- the NAVPAM 122 can analyze two simultaneous video image sequence captures of the same content, such as the pre-encode video 110 and the post-encode video 114 .
- the NAVPAM 122 addresses the complex problem of divergent video content alignment.
- the output of the NAVPAM 122 is a video offset 124 .
- the video offset 124 is equivalent to the temporal distance between aligned points in the pre-encode video 110 and the post-encode video 114 .
- the audio pipes 120 A, 120 B can feed into an audio/video synchronization script module (“AVSSM”) 126 .
- the AVSSM 126 can generate an audio offset 128 .
- the AVSSM 126 can compare the video offset 124 to the audio offset 128 to determine an AV synchronization evaluation result (shown as “result”) 130 .
- the result 130 indicates whether or not the post-encode AV content 106 (i.e., distorted) is correctly synchronized with the pre-encode AV content 102 (i.e., reference). Additional details in this regard will be described herein.
- the NAVPAM 122 and the AVSSM 126 can be separate software modules as illustrated or can be combined into one software module.
- the NAVPAM 122 and/or the AVSSM 126 can be implemented in hardware such as via a field-programmable gate array (“FPGA”).
- the illustrated NAVPAM 122 includes a plurality of sub-modules that perform various functions.
- the NAVPAM 122 is illustrated in this manner for ease of explanation. In practice, the NAVPAM 122 can combine the functionality of the sub-modules.
- each of the plurality of sub-modules can be a standalone software module, the output of which can be used as input for the next sub-module such as in the flow shown in the illustrated example.
- the NAVPAM 122 includes a thumbnail image generator 132 that receives the pre-encode video 110 and the post-encode video 114 via respective video pipes 118 A, 118 B, and processes the pre-encode video 110 and the post-encode video 114 into thumbnail images 134 that are suitable for processing.
- the thumbnail images 134 (also referred to as “binary” or “bi-tonal” images) are lower-resolution images with a color space that has been compressed from a 24-bit channel representation per pixel to a single bit. This results in an image that can be delineated with two colors, with the value of each pixel being either 1 or 0.
- the thumbnail image generator 132 also can discard any duplicate frames 136 .
- the thumbnail image generator 132 can provide the thumbnail images 134 to a search range determiner 138 .
- the search range determiner 138 can determine a plurality of search ranges 140 that can be used to perform an iterative search for a first alignment point between the pre-encode video 110 and the post-encode video 114 . This determination can be based upon one or more search parameters that define the granularity of the search. Additional details about two search strategies will be described below.
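Consistent with the macro- and micro-iteration searches of FIGS. 4A-5, the search ranges 140 might be generated in a coarse-to-fine fashion. The step and width parameters below are hypothetical search parameters, shown only to make the idea concrete.

```python
def coarse_ranges(total_frames, step):
    """Macro-iteration: coarse candidate ranges covering the sequence."""
    return [(start, min(start + step, total_frames))
            for start in range(0, total_frames, step)]

def fine_range(hit, total_frames, width):
    """Micro-iteration: a narrow range around a promising coarse hit."""
    return (max(0, hit - width), min(total_frames, hit + width))

print(coarse_ranges(300, 100))   # -> [(0, 100), (100, 200), (200, 300)]
print(fine_range(150, 300, 10))  # -> (140, 160)
```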
- the NAVPAM 122 also includes a thumbnail image comparator 142 .
- the thumbnail image comparator 142 compares pairs of the thumbnail images 134 and determines a distance value 144 for each comparison.
- the thumbnail image comparator 142 provides the distance values 144 to an alignment determiner 146 .
- the alignment determiner 146 analyzes the distance values 144 to determine an alignment where the distance between the thumbnail images 134 is minimized and outputs the video offset 124 .
- the illustrated AVSSM 126 includes a plurality of sub-modules that perform various functions.
- the AVSSM 126 is illustrated in this manner for ease of explanation. In practice, the AVSSM 126 can combine the functionality of the sub-modules.
- each of the plurality of sub-modules can be a standalone software module, the output of which can be used as input for the next sub-module such as in the flow shown in the illustrated example.
- the AVSSM 126 receives the pre-encode audio 112 and the post-encode audio 116 via the audio pipes 120 A, 120 B, respectively.
- the AVSSM 126 includes an audio-video frame correlator 148 that divides the pre-encode audio 112 and the post-encode audio 116 into time slices 150 , wherein each time slice 150 is associated with a corresponding video frame.
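The correlation of audio samples to video frames (compare FIG. 3) is a matter of arithmetic on the sample rate and frame rate; the 44100 Hz / 29.97 FPS pairing below simply reuses the example values mentioned earlier.

```python
def samples_per_video_frame(sample_rate_hz, fps):
    """Number of audio samples that play during one video frame."""
    return sample_rate_hz / fps

def video_frame_for_sample(sample_index, sample_rate_hz, fps):
    """Video frame number during which a given audio sample is rendered."""
    return int(sample_index // samples_per_video_frame(sample_rate_hz, fps))

print(round(samples_per_video_frame(44100, 29.97), 1))  # -> 1471.5
print(video_frame_for_sample(44100, 44100, 29.97))      # -> 29
```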
- the audio-video frame correlator 148 outputs the time slices 150 to an acoustic fingerprint generator 154 .
- the acoustic fingerprint generator 154 generates acoustic fingerprints 156 from the time slices 150 .
- the acoustic fingerprint generator 154 utilizes open source fingerprint extraction software, such as, for example, Chromaprint (available from the AcoustID project). Other software, including proprietary software, can also be used to generate the acoustic fingerprints 156 .
- the acoustic fingerprints 156 can be used to quickly identify portions of the pre-encode audio 112 and the post-encode audio 116 during the time slices 150 .
- the AVSSM 126 also includes a fingerprint matcher 158 .
- the fingerprint matcher 158 can compare the acoustic fingerprints 156 between frames for similarity.
- the fingerprint matcher 158 utilizes fuzzy string matching.
- Many programming languages have well-developed libraries that provide fuzzy string matching functionality.
- One such library is FuzzyWuzzy for Python. FuzzyWuzzy uses Levenshtein distance to correlate two fingerprints, such as the acoustic fingerprints 156, and returns a score from 0-100 that represents how close the fingerprints are. In testing, acoustic fingerprint 156 matches that score 90-100 are viable, while scores of 80 and below can yield inaccurate alignment results.
- Other fuzzy string matching software may require further tweaking to obtain accurate alignment results.
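A Levenshtein-based 0-100 score can be reproduced in a few lines without any dependency. The formula below is in the spirit of FuzzyWuzzy's `ratio()` but is not its exact implementation (FuzzyWuzzy uses a SequenceMatcher-based formula), and the fingerprint strings are made up for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity_score(fp_a, fp_b):
    """Score two fingerprint strings from 0 (dissimilar) to 100 (identical)."""
    if not fp_a and not fp_b:
        return 100
    dist = levenshtein(fp_a, fp_b)
    return round(100 * (1 - dist / max(len(fp_a), len(fp_b))))

print(similarity_score("AQADtEmiJFmS", "AQADtEmiJFmS"))  # -> 100
print(similarity_score("AQADtEmiJFmS", "AQADtEmiJFmX"))  # -> 92
```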
- the output of the fingerprint matcher 158 is the audio offset 128 in video frame numbers. Since the video offset 124 and the audio offset 128 are both known and utilize video frame numbers as a common metric, the video offset 124 and the audio offset 128 can be compared to determine whether or not the post-encode (distorted) AV content 106 is correctly synchronized with the pre-encode (reference) AV content 102 , which an alignment comparator 160 can output as the result 130 .
- Although the NAVPAM 122 , the AVSSM 126 , and the various sub-modules thereof are described separately and sequentially, it should be understood that the operations performed by the NAVPAM 122 and the AVSSM 126 can be conducted in parallel and likely will be conducted in parallel in real-world implementations. Accordingly, any particular order used to describe the operations performed by the NAVPAM 122 and the AVSSM 126 should not be construed as being limiting in any way. Moreover, additional details of the operations performed by the NAVPAM 122 and the AVSSM 126 will become apparent to those skilled in the art from the description of the remaining FIGURES.
- the pre-encode video 110 includes multiple reference video frames 202 A, 202 B (collectively “reference video frames 202 ”). Although only two reference video frames 202 A, 202 B are illustrated, the pre-encode video 110 can contain any number of reference video frames 202 .
- the pre-encode audio 112 includes multiple reference audio frames 204 A, 204 B (collectively “reference audio frames 204 ”).
- the post-encode video 114 includes multiple distorted video frames 206 A, 206 B (collectively “distorted video frames 206 ”). Although only two distorted video frames 206 A, 206 B are illustrated, the post-encode video 114 can contain any number of distorted video frames 206 .
- the post-encode audio 116 includes multiple distorted audio frames 208 A, 208 B (collectively “distorted audio frames 208 ”). Although only two distorted audio frames 208 A, 208 B are illustrated, the post-encode audio 116 can contain any number of distorted audio frames 208 .
- the synchronization diagram 200 A also illustrates an alignment between the reference video frame 202 B and the distorted video frame 206 B with a video offset 124 of 100 milliseconds (“ms”).
- the synchronization diagram 200 A also illustrates an alignment between the reference audio frame 204 B and the distorted audio frame 208 B with an audio offset 128 of 100 ms.
- the audio playing at each of the pre-encode video frames 202 is the same audio playing at the corresponding distorted video frames 206 .
- the pre-encode AV content 102 and the post-encode AV content 106 can be considered synchronized as shown in the synchronization diagram 200 A.
- Desynchronization is defined herein as when the audio offset 128 is not equal to the video offset 124 . If the difference between the audio offset 128 and the video offset 124 is sufficiently large, the desynchronization can be perceptible to users and can be verified by observing lip sync issues during playback. If the audio offset 128 is less than the video offset 124 , then the audio is played before it is supposed to be (i.e., the audio is ahead of the video). If the audio offset 128 is greater than the video offset 124 , then the audio is played after it is supposed to be (i.e., the audio is behind the video). Both of these scenarios can be identified as desynchronization.
- For example, if a pre-encode capture is 100 ms ahead of a post-encode capture, the audio for the pre-encode capture should be 100 ms ahead of the post-encode capture, and the video for the pre-encode capture should likewise be 100 ms ahead of the post-encode capture. If these two offsets are different, such as, for example, the pre-encode video is ahead 100 ms, but the pre-encode audio is 125 ms ahead, then the audio and video have de-synchronized by 25 ms.
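The offset comparison described above can be sketched as a simple classification, consistent with the synchronized and desynchronized cases of FIGS. 2A-2C. This is a hypothetical helper, not the patent's implementation; the offsets are assumed to be in the same units (e.g., milliseconds or video frames).

```python
# Illustrative classification of the audio offset versus the video offset.
def classify_sync(audio_offset: float, video_offset: float) -> str:
    if audio_offset == video_offset:
        return "synchronized"
    if audio_offset < video_offset:
        # Audio lags the reference less than video does: it plays too early.
        return "audio ahead of video (played before it should be)"
    # Audio lags the reference more than video does: it plays too late.
    return "audio behind video (played after it should be)"

print(classify_sync(100, 100))  # synchronized
print(classify_sync(50, 100))   # audio ahead of video (played before it should be)
print(classify_sync(125, 100))  # audio behind video (played after it should be)
```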
- What constitutes a perceptible time difference between the audio offset 128 and the video offset 124 can be determined based upon International Telecommunication Union (“ITU”) Recommendation ITU-R BT.1359-1.
- FIG. 2B a desynchronization diagram 200 B illustrating example sequences of the pre-encode AV (reference) content 102 and the post-encode (distorted) AV content 106 in desynchronization will be described, according to an illustrative embodiment.
- the example shown in FIG. 2B illustrates when the audio is played before it is supposed to, which can occur when the audio offset 128 between the post-encode audio 116 and the pre-encode audio 112 is smaller than the video offset 124 between the post-encode video 114 and the pre-encode video 110 .
- FIG. 2C another desynchronization diagram 200 C illustrating example sequences of the pre-encode AV (reference) content 102 and the post-encode (distorted) AV content 106 in desynchronization will be described, according to an illustrative embodiment.
- the example shown in FIG. 2C illustrates when the audio is played after it is supposed to, which can occur when the audio offset 128 between the post-encode audio 116 and the pre-encode audio 112 is larger than the video offset 124 between the post-encode video 114 and the pre-encode video 110 .
- Video frames function as a proxy for time or temporal displacement; a video frame count can be converted to elapsed time by dividing it by the video frame rate (assumed constant) of the capture.
- audio data can be reconciled in a congruent manner, except that for audio the frame rate is the sampling rate rather than the video frame rate.
- the number of audio frames per second in an audio file sampled at 48 kilohertz (“kHz”) is 48,000.
- FIG. 3 a diagram 300 illustrates this concept.
- FIG. 3 illustrates video frames 302 from video frame 1 to video frame 60 and corresponding audio frames 304 from 800 to 48,000 in increments of 800.
- an index of the audio frame (“ax”) can be extrapolated for video frame (“N”) as a function of the video frame rate (“vFPS”) and the audio sampling rate (“aSR”):
- ax = (aSR / vFPS) × N
- the fractional portion of the values can be discarded because these values are not valid indices.
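The extrapolation above can be written as a one-line function; the example values below assume a 60 FPS capture and are a hypothetical sketch, not the patent's code.

```python
# For video frame N, the audio frame index is N * (aSR / vFPS), with any
# fractional portion discarded because it is not a valid index.
def audio_frame_index(n_video_frame: int, audio_sample_rate: int, video_fps: int) -> int:
    return int(audio_sample_rate / video_fps * n_video_frame)

print(audio_frame_index(1, 48000, 60))   # 800
print(audio_frame_index(60, 48000, 60))  # 48000
print(audio_frame_index(1, 44100, 60))   # 735
```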
- 131265 frames are retrieved for the fingerprint.
- Fuzzy string matching can be used to compare fingerprint data.
- Although FuzzyWuzzy is described herein, other solutions that provide fuzzy string matching can be implemented without departing from the scope of this disclosure.
- the granularity of a search through the audio frame data is equal to the audio sampling rate divided by the video frame rate.
- the granularity of the search through the audio data was 800 frames wide.
- FIGS. 4A and 4B diagrams 400 A, 400 B illustrating an example of a micro-iteration iterative search will be described, according to an illustrative embodiment.
- “micro-iteration” and “macro-iteration” are used to differentiate iterative searches based on different step sizes in terms of frames. Performing an exhaustive search is slow, but has the advantage of being simple to implement and also suitable, if not ideal, for parallelization.
- the diagram 400 A illustrates a distorted audio sequence 402 , such as the distorted audio 116 .
- the diagram 400 B illustrates a reference audio sequence 404 .
- the distorted audio sequence 402 and the reference audio sequence 404 have a fingerprint size of 164.
- The search range in this example is limited to a max_search_range of 800 video frames (i.e., 13.3 seconds at 60 FPS).
- Fingerprints can then be generated for the two ranges being compared, and the correlation score can be compared. If a match is found, the search can end short of reaching the max_search_range. Micro-iteration can be terminated and the process can proceed to macro-iteration. After macro-iteration, the process can be repeated to find the next alignment point past the ranges through which macro-iteration occurred. Since iterating over the audio sequence tends to retrieve the same set of audio data from the reference sequence (i.e., iteration 1 and 801 access the same reference data), caching this data can improve performance.
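The caching suggestion above can be sketched with the standard library's functools.lru_cache; the reference data and names here are illustrative stand-ins, not the patent's data structures.

```python
# Memoize (start, length) -> reference audio data so that successive
# micro-iterations that re-read the same reference range hit the cache
# instead of retrieving the data again.
from functools import lru_cache

REFERENCE_AUDIO = list(range(48000))  # stand-in for the reference audio sequence

@lru_cache(maxsize=None)
def reference_slice(start: int, length: int) -> tuple:
    # Tuples are immutable and hashable, so the cached value is safe to reuse.
    return tuple(REFERENCE_AUDIO[start:start + length])

a = reference_slice(800, 800)
b = reference_slice(800, 800)             # served from the cache
print(reference_slice.cache_info().hits)  # 1
```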
- In a worst-case scenario, both audio sequences will be 100% different, and the entire reference sequence should be iteratively searched. This could potentially be rectified by generating and comparing two fingerprints for the entirety of each sequence, and using the correlation score to determine if it would be worth attempting to align the audio sequences.
- the iterative search can continue by stepping through the distorted and reference audio sequences, where each step size is the number of video frames that is set for the fingerprint size.
- the assumption is that because the playback rates for the distorted and reference audio data are the same and that no stalls are introduced into the audio data, once an offset is established through finding the first alignment, the remainder of the sequences should be aligned. This greatly reduces the search space by a factor equal to the fingerprint size.
- macro-iteration can be stopped and micro-iteration can begin at the last known sample position above the correlation threshold (e.g., 95).
- In this example, the step size is equal to the fingerprint size (i.e., 164).
- In this example, macro-iteration proceeds as follows:

  Macro-Iteration    Distorted Range    Reference Range    Correlation
  1                  1-164              100-264            100
  2                  164-328            264-428            100
  3*                 328-492            428-592            95
  4                  492-656            592-756            80

- Because the correlation in iteration 4 is beneath the correlation threshold of 95, the process returns to the previous iteration, iteration 3, and proceeds to check frame-by-frame for the last range where the correlation between fingerprints meets the correlation threshold. For example:
- Micro-iteration then proceeds frame-by-frame:

  Micro-Iteration    Distorted Range    Reference Range    Correlation
  1                  329-493            429-593            98
  2                  330-494            430-594            96
  3                  331-495            431-595            95
  ...                ...                ...                ≥ 95
  163                491-655            591-755            81
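The macro-then-micro procedure illustrated above can be sketched as follows. This is an illustrative reconstruction, not the patent's code: a toy percentage-of-matching-samples correlation stands in for acoustic fingerprint comparison, the offset between the sequences is assumed known from the first alignment, and all names and values are hypothetical.

```python
# Toy correlation: percentage of positionally matching samples (0-100).
def correlation(a, b):
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * matches // max(len(a), 1)

def find_alignment_end(distorted, reference, offset, step, threshold=95):
    """Macro-iterate in step-sized (fingerprint-sized) chunks, then
    micro-iterate sample by sample to locate where alignment breaks."""
    pos = 0
    # Macro-iteration: jump a full fingerprint size per iteration.
    while pos + step <= len(distorted):
        if correlation(distorted[pos:pos + step],
                       reference[pos + offset:pos + offset + step]) < threshold:
            break
        pos += step
    # Micro-iteration: back up one macro chunk and advance frame by frame.
    pos = max(pos - step, 0)
    while (pos + step <= len(distorted)
           and correlation(distorted[pos:pos + step],
                           reference[pos + offset:pos + offset + step]) >= threshold):
        pos += 1
    return pos  # first position where correlation fell below the threshold

# Reference = distorted shifted by 100 samples, diverging after sample 500.
distorted = list(range(1000))
reference = [-1] * 100 + list(range(500)) + [0] * 600
print(find_alignment_end(distorted, reference, offset=100, step=164))
```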
- FIG. 6 a method 600 for determining whether pre-encode and post-encode media streams, such as the pre-encode AV content 102 and the post-encode AV content 106 , are synchronized will be described, according to an illustrative embodiment. It should be understood that the operations of the methods disclosed herein are not necessarily presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the concepts and technologies disclosed herein.
- the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system.
- the implementation is a matter of choice dependent on the performance and other requirements of the computing system.
- the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These states, operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, or any combination thereof.
- the phrase “cause a processor to perform operations” and variants thereof is used to refer to causing a processor or multiple processors of one or more systems and/or one or more devices disclosed herein to perform one or more operations and/or causing the processor to direct other components of the computing system or device to perform one or more of the operations.
- the method 600 begins and proceeds to operation 602 .
- the AV synchronization system 100 simultaneously captures samples of a pre-encode media stream (e.g., the pre-encode AV content 102 ) and a post-encode media stream (e.g., the post-encode AV content 106 ). From operation 602 , the method 600 proceeds to operation 604 .
- the AV synchronization system 100 via execution of the NAVPAM 122 , aligns a pre-encode video component of the pre-encode media stream (e.g., the pre-encode video 110 ) with a post-encode video component of the post-encode media stream (e.g., the post-encode video 114 ). From operation 604 , the method 600 proceeds to operation 606 .
- the AV synchronization system 100 via execution of the NAVPAM 122 , determines a video offset (e.g., the video offset 124 ) between the pre-encode video component (e.g., the pre-encode video 110 ) and the post-encode video component (e.g., the post-encode video 114 ).
- the method 600 proceeds to operation 608 .
- the AV synchronization system 100 via execution of the AVSSM 126 , aligns a pre-encode audio component of the pre-encode media stream (e.g., the pre-encode audio 112 ) with a post-encode audio component of the post-encode media stream (e.g., the post-encode audio 116 ).
- the method 600 proceeds to operation 610 .
- the AV synchronization system 100 via execution of the AVSSM 126 , determines an audio offset (e.g., the audio offset 128 ) between the pre-encode audio component (e.g., the pre-encode audio 112 ) and the post-encode audio component (e.g., the post-encode audio 116 ).
- the method 600 proceeds to operation 612 .
- the AV synchronization system 100 via execution of the AVSSM 126 , compares the video offset 124 and the audio offset 128 .
- the method 600 proceeds to operation 614 .
- the AV synchronization system 100 via execution of the AVSSM 126 , determines, based upon the comparison at operation 612 , whether the video offset 124 and the audio offset 128 are equal. If the AV synchronization system 100 determines that the video offset 124 and the audio offset 128 are equal, the method 600 proceeds to operation 616 .
- the AV synchronization system 100 determines that the pre-encode media stream (e.g., the pre-encode AV content 102 ) and the post-encode media stream (e.g., the post-encode AV content 106 ) are synchronized. From operation 616 , the method 600 proceeds to operation 618 . The method 600 can end at operation 618 . Returning to operation 614 , if the AV synchronization system 100 determines that the video offset 124 and the audio offset 128 are not equal, the method 600 proceeds to operation 620 .
- the AV synchronization system 100 determines whether the audio offset 128 is less than (<) or greater than (>) the video offset 124 . If the audio offset 128 is less than the video offset 124 , the method 600 proceeds to operation 622 .
- the AV synchronization system 100 determines that the post-encode audio component (e.g., the post-encode audio 116 ) is ahead of the post-encode video component (e.g., the post-encode video 114 ), which is representative of the pre-encode AV content 102 and the post-encode AV content 106 being desynchronized with the audio being played before the video. From operation 622 , the method 600 proceeds to operation 618 .
- the method 600 can end at operation 618 .
- the method 600 proceeds to operation 624 .
- the AV synchronization system 100 determines that the post-encode audio component (e.g., the post-encode audio 116 ) is behind the post-encode video component (e.g., the post-encode video 114 ), which is representative of the pre-encode AV content 102 and the post-encode AV content 106 being desynchronized with the audio being played after the video.
- the method 600 then proceeds to operation 618 .
- the method 600 can end at operation 618 .
- the method 700 begins and proceeds to operation 702 .
- the AV synchronization system 100 receives the pre-encode AV content 102 .
- the method 700 proceeds to operation 704 .
- the AV synchronization system 100 obtains the post-encode AV content 106 .
- the method 700 proceeds to operation 706 .
- the AV synchronization system 100 uses the multimedia framework 107 to perform an integrity check for the pre-encode AV content 102 . From operation 706 , the method 700 proceeds to operation 708 . At operation 708 , the AV synchronization system 100 uses the multimedia framework 107 to perform an integrity check for the post-encode AV content 106 . The remaining operations of the method 700 assume that both the pre-encode AV content 102 and the post-encode AV content 106 pass the integrity checks at operations 706 , 708 , respectively. If the integrity check for either the pre-encode AV content 102 or the post-encode AV content 106 fails, the multimedia framework 107 can output one or more errors.
- the method 700 proceeds to operation 710 .
- the AV synchronization system 100 configures the multimedia framework 107 .
- the multimedia framework 107 can provide a broad range of functionality to process multimedia content such as the pre-encode AV content 102 and the post-encode AV content 106 .
- the AV synchronization system 100 can be configured to create the demultiplexers 108 A, 108 B.
- the demultiplexers 108 A, 108 B are software instruction sets programmed to perform demultiplexing.
- the method 700 proceeds to operation 712 .
- the AV synchronization system 100 uses the multimedia framework 107 , and in particular, the demultiplexer 108 A, to demultiplex the pre-encode AV content 102 into the pre-encode video 110 and pre-encode audio 112 streams.
- the method 700 proceeds to operation 714 .
- the AV synchronization system 100 uses the multimedia framework 107 , and in particular, the demultiplexer 108 B to demultiplex the post-encode AV content 106 into post-encode video 114 and post-encode audio 116 components (streams).
- the method 700 proceeds to operation 716 .
- the AV synchronization system 100 creates the first video pipe 118 A and outputs the pre-encode video 110 on the first video pipe 118 A towards the NAVPAM 122 .
- the method 700 proceeds to operation 718 .
- the AV synchronization system 100 creates the first audio pipe 120 A and outputs the pre-encode audio 112 on the first audio pipe 120 A towards the AVSSM 126 .
- the method 700 proceeds to operation 720 .
- the AV synchronization system 100 creates the second video pipe 118 B and outputs the distorted video 114 on the second video pipe 118 B towards the NAVPAM 122 . From operation 720 , the method 700 proceeds to operation 722 . At operation 722 , the AV synchronization system 100 creates the second audio pipe 120 B and outputs the post-encode audio 116 on the second audio pipe 120 B towards the AVSSM 126 .
- the method 700 proceeds to operation 724 .
- the NAVPAM 122 receives the pre-encode video 110 and the post-encode video 114 . From operation 724 , the method 700 proceeds to operation 726 .
- the NAVPAM 122 generates the thumbnail images 134 . From operation 726 , the method 700 proceeds to operation 728 .
- the NAVPAM 122 determines the search ranges 140 . From operation 728 , the method 700 proceeds to operation 730 .
- the NAVPAM 122 calculates the distance values 144 between the thumbnail images 134 . From operation 730 , the method 700 proceeds to operation 732 .
- the NAVPAM 122 determines a best-fit alignment where the distance between the thumbnail images 134 is minimized. From operation 732 , the method 700 proceeds to operation 734 . At operation 734 , the NAVPAM 122 outputs the video offset 124 to the AVSSM 126 .
- the method 700 proceeds to operation 736 .
- the AVSSM 126 receives the pre-encode audio 112 and the post-encode audio 116 .
- the method 700 proceeds to operation 738 .
- the AVSSM 126 correlates the audio and video frames. In particular, the AVSSM 126 divides the pre-encode audio 112 and the post-encode audio 116 into time slices 150 , wherein each time slice 150 is associated with a corresponding video frame.
- the method 700 proceeds to operation 740 .
- the AVSSM 126 generates the acoustic fingerprints 156 .
- the method 700 proceeds to operation 742 .
- the AVSSM 126 compares the acoustic fingerprints 156 between frames for similarity and determines the audio offset 128 .
- the method 700 proceeds to operation 744 .
- the AVSSM 126 compares the video offset 124 and the audio offset 128 to determine whether or not the post-encode (distorted) AV content 106 is correctly synchronized with the pre-encode (reference) AV content 102 .
- the method 700 proceeds to operation 746 .
- the AVSSM 126 provides the result 130 of the comparison at operation 744 .
- the result 130 indicates whether or not the post-encode (distorted) AV content 106 is correctly synchronized with the pre-encode (reference) AV content 102 .
- the method 700 proceeds to operation 748 .
- the method 700 can end at operation 748 .
- FIG. 8 is a block diagram illustrating a computer system 800 configured to provide the functionality described herein for AV synchronization in accordance with various embodiments of the concepts and technologies disclosed herein.
- the AV synchronization system 100 is configured the same as or similar to the computer system 800 .
- the computer system 800 includes a processing unit 802 , a memory 804 , one or more user interface devices 806 , one or more input/output (“I/O”) devices 808 , and one or more network devices 810 , each of which is operatively connected to a system bus 812 .
- the bus 812 enables bi-directional communication between the processing unit 802 , the memory 804 , the user interface devices 806 , the I/O devices 808 , and the network devices 810 .
- the processing unit 802 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or other type of processor known to those skilled in the art and suitable for controlling the operation of the server computer.
- the processing unit 802 can be a single processing unit or a multiple processing unit that includes more than one processing component. Processing units are generally known, and therefore are not described in further detail herein.
- the memory 804 communicates with the processing unit 802 via the system bus 812 .
- the memory 804 can include a single memory component or multiple memory components.
- the memory 804 is operatively connected to a memory controller (not shown) that enables communication with the processing unit 802 via the system bus 812 .
- the memory 804 includes an operating system 814 and one or more program modules 816 .
- the operating system 814 can include, but is not limited to, members of the WINDOWS, WINDOWS CE, and/or WINDOWS MOBILE families of operating systems from MICROSOFT CORPORATION, the LINUX family of operating systems, the SYMBIAN family of operating systems from SYMBIAN LIMITED, the BREW family of operating systems from QUALCOMM CORPORATION, the MAC OS, iOS, and/or LEOPARD families of operating systems from APPLE CORPORATION, the FREEBSD family of operating systems, the SOLARIS family of operating systems from ORACLE CORPORATION, other operating systems, and the like.
- the program modules 816 may include various software and/or program modules described herein.
- the program modules 816 can include the multimedia framework 107 , the AV content encoder(s) 104 , the NAVPAM 122 , the AVSSM 126 , or a combination thereof.
- multiple implementations of the computer system 800 can be used, wherein each implementation is configured to execute one or more of the program modules 816 .
- the program modules 816 and/or other programs can be embodied in computer-readable media containing instructions that, when executed by the processing unit 802 , perform the methods 600 , 700 described herein.
- the program modules 816 may be embodied in hardware, software, firmware, or any combination thereof.
- the memory 804 also can be configured to store the pre-encode AV content 102 , the post-encode AV content 106 , the pre-encode video 110 , the pre-encode audio 112 , the post-encode video 114 , the post-encode audio 116 , the thumbnail images 134 , the distance values 144 , the search ranges 140 , the video offset 124 , the audio offset 128 , the time slices 150 , the acoustic fingerprints 156 , the result 130 , combinations thereof, and/or other data disclosed herein.
- Computer-readable media may include any available computer storage media or communication media that can be accessed by the computer system 800 .
- Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media.
- modulated data signal means a signal that has one or more of its characteristics changed or set in a manner as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
- Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer system 800 .
- the phrase “computer storage medium,” “computer-readable storage medium,” and variations thereof does not include waves or signals per se and/or communication media, and therefore should be construed as being directed to “non-transitory” media only.
- the user interface devices 806 may include one or more devices with which a user accesses the computer system 800 .
- the user interface devices 806 may include, but are not limited to, computers, servers, personal digital assistants, cellular phones, or any suitable computing devices.
- the I/O devices 808 enable a user to interface with the program modules 816 .
- the I/O devices 808 are operatively connected to an I/O controller (not shown) that enables communication with the processing unit 802 via the system bus 812 .
- the I/O devices 808 may include one or more input devices, such as, but not limited to, a keyboard, a mouse, or an electronic stylus. Further, the I/O devices 808 may include one or more output devices, such as, but not limited to, a display screen or a printer.
- the network devices 810 enable the computer system 800 to communicate with other networks or remote systems via a network 818 .
- Examples of the network devices 810 include, but are not limited to, a modem, a radio frequency (“RF”) or infrared (“IR”) transceiver, a telephonic interface, a bridge, a router, or a network card.
- the network 818 may include a wireless network such as, but not limited to, a Wireless Local Area Network (“WLAN”) such as a WI-FI network, a Wireless Wide Area Network (“WWAN”), a Wireless Personal Area Network (“WPAN”) such as BLUETOOTH, a Wireless Metropolitan Area Network (“WMAN”) such as a WiMAX network, or a cellular network.
- the network 818 may be a wired network such as, but not limited to, a Wide Area Network (“WAN”) such as the Internet, a Local Area Network (“LAN”) such as the Ethernet, a wired Personal Area Network (“PAN”), or a wired Metropolitan Area Network (“MAN”).
- FIG. 9 a block diagram illustrating an exemplary containerized cloud architecture 900 capable of implementing, at least in part, aspects of the concepts and technologies disclosed herein will be described, according to an illustrative embodiment.
- the AV synchronization system 100 is implemented in the containerized cloud architecture 900 .
- multiple instances of the AV synchronization system 100 can be deployed and executed simultaneously.
- Each instance of the AV synchronization system 100 can be used to determine a result 130 from different pre-encode AV content 102 and post-encode AV content 106 .
- the illustrated containerized cloud architecture 900 includes a first host (“host”) 902 A and a second host (“host”) 902 B (at times referred to herein collectively as hosts 902 or individually as host 902 ) that can communicate via an overlay network 904 .
- hosts 902 are shown, the containerized cloud architecture 900 can support any number of hosts 902 .
- the overlay network 904 can enable communication among hosts 902 in the same cloud network or hosts 902 across different cloud networks.
- the overlay network 904 can enable communication among hosts 902 owned and/or operated by the same or different entities.
- the illustrated host 902 A includes a host hardware 1 906 A, a host operating system 1 908 A, a DOCKER engine 1 910 A, a bridge network 1 912 A, container A-1 through container N-1 914 A 1 - 914 N 1 , and microservice A-1 through microservice N-1 916 A 1 - 916 N 1 .
- the illustrated host 2 902 B includes a host hardware 2 906 B, a host operating system 2 908 B, a DOCKER engine 2 910 B, a bridge network 2 912 B, container A-2 through container N-2 914 A 2 - 914 N 2 , and microservice A-2 through microservice N-2 916 A 2 - 916 N 2 .
- the host hardware 1 906 A and the host hardware 2 906 B can be implemented as bare metal hardware such as one or more physical servers.
- the host hardware 906 alternatively can be implemented using hardware virtualization.
- the host hardware 906 can include compute resources, memory resources, and other hardware resources. These resources can be virtualized according to known virtualization techniques.
- a virtualization cloud architecture 1000 is described herein with reference to FIG. 10 . Although the containerized cloud architecture 900 and the virtualization cloud architecture 1000 are described separately, these architectures can be combined to provide a hybrid containerized/virtualized cloud architecture.
- Compute resources can include one or more hardware components that perform computations to process data and/or to execute computer-executable instructions.
- the compute resources can execute instructions of the host operating system 1 908 A and the host operating system 2 908 B (at times referred to herein collectively as host operating systems 908 or individually as host operating system 908 ), the containers 914 A 1 - 914 N 1 and the containers 914 A 2 - 914 N 2 (at times referred to herein collectively as containers 914 or individually as container 914 ), and the microservices 916 A 1 - 916 N 1 and the microservices 916 A 2 - 916 N 2 (at times referred to herein collectively as microservices 916 or individually as microservice 916 ).
- the compute resources of the host hardware 906 can include one or more central processing units (“CPUs”) configured with one or more processing cores.
- the compute resources can include one or more graphics processing units ("GPUs") configured to accelerate operations performed by one or more CPUs, and/or to perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, operating systems, and/or other software that may or may not include instructions particular to graphics computations.
- the compute resources can include one or more discrete GPUs.
- the compute resources can include CPU and GPU components that are configured in accordance with a co-processing CPU/GPU computing model, wherein the sequential part of an application executes on the CPU and the computationally-intensive part is accelerated by the GPU.
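The co-processing model described above can be sketched in a few lines: the sequential control flow stays on the CPU, while the data-parallel, computationally intensive kernel is dispatched to the accelerator. The sketch below simulates the accelerator with a plain Python function (`gpu_kernel` and `run_pipeline` are hypothetical names); in practice that step would be a GPU kernel launched through CUDA, OpenCL, or a similar API.

```python
def gpu_kernel(chunk):
    """Simulated data-parallel kernel: square every element of a chunk.
    In a real co-processing deployment this would run on the GPU."""
    return [x * x for x in chunk]

def run_pipeline(data, chunk_size=4):
    """Sequential part of the application (runs on the CPU): slice the
    input, dispatch each chunk to the accelerator, and stitch the
    results back together."""
    results = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        results.extend(gpu_kernel(chunk))  # offloaded, compute-heavy step
    return results

print(run_pipeline([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```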
- the compute resources can include one or more system-on-chip (“SoC”) components along with one or more other components, including, for example, one or more memory resources, and/or one or more other resources.
- the compute resources can be or can include one or more SNAPDRAGON SoCs, available from QUALCOMM; one or more TEGRA SoCs, available from NVIDIA; one or more HUMMINGBIRD SoCs, available from SAMSUNG; one or more Open Multimedia Application Platform (“OMAP”) SoCs, available from TEXAS INSTRUMENTS; one or more customized versions of any of the above SoCs; and/or one or more proprietary SoCs.
- the compute resources can be or can include one or more hardware components architected in accordance with an advanced reduced instruction set computing ("RISC") machine ("ARM") architecture, available for license from ARM HOLDINGS.
- the compute resources can be or can include one or more hardware components architected in accordance with an x86 architecture, such as an architecture available from INTEL CORPORATION, among others.
- the compute resources should not be construed as being limited to any particular computation architecture or combination of computation architectures, including those explicitly disclosed herein.
- the memory resources of the host hardware 906 can include one or more hardware components that perform storage operations, including temporary or permanent storage operations.
- the memory resource(s) include volatile and/or non-volatile memory implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data disclosed herein.
- Computer storage media includes, but is not limited to, random access memory (“RAM”), read-only memory (“ROM”), Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store data and which can be accessed by the compute resources.
- the other resource(s) of the host hardware 906 can include any other hardware resources that can be utilized by the compute resource(s) and/or the memory resource(s) to perform operations described herein.
- the other resource(s) can include one or more input and/or output processors (e.g., a network interface controller or a wireless radio), one or more modems, one or more codec chipsets, one or more pipeline processors, one or more fast Fourier transform ("FFT") processors, one or more digital signal processors ("DSPs"), one or more speech synthesizers, and/or the like.
- the host operating systems 908 can be proprietary, open source, or closed source.
- the host operating systems 908 can be or can include one or more container operating systems designed specifically to host containers such as the containers 914 .
- the host operating systems 908 can be or can include FEDORA COREOS (available from RED HAT, INC), RANCHEROS (available from RANCHER), and/or BOTTLEROCKET (available from Amazon Web Services).
- the host operating systems 908 can be or can include one or more members of the WINDOWS family of operating systems from MICROSOFT CORPORATION (e.g., WINDOWS SERVER), the LINUX family of operating systems (e.g., CENTOS, DEBIAN, FEDORA, ORACLE LINUX, RHEL, SUSE, and UBUNTU), the SOLARIS family of operating systems from ORACLE CORPORATION, other operating systems, and the like.
- the containerized cloud architecture 900 can be implemented utilizing any containerization technologies, and is described herein using DOCKER container technologies available from DOCKER, INC., such as the DOCKER engines 910 .
- other container technologies such as KUBERNETES may also be applicable to implementing the concepts and technologies disclosed herein, and as such, the containerized cloud architecture 900 is not limited to DOCKER container technologies.
- although open-source container technologies are the most widely used, the concepts and technologies disclosed herein may be implemented using proprietary or closed source technologies.
- the DOCKER engines 910 are based on open source containerization technologies available from DOCKER, INC.
- the DOCKER engines 910 enable users (not shown) to build and containerize applications.
- the full breadth of functionality provided by the DOCKER engines 910 and associated components in the DOCKER architecture are beyond the scope of the present disclosure.
- the primary functions of the DOCKER engines 910 will be described herein in brief, but this description should not be construed as limiting the functionality of the DOCKER engines 910 or any part of the associated DOCKER architecture. Instead, those skilled in the art will understand the implementation of the DOCKER engines 910 and other components of the DOCKER architecture to facilitate building and containerizing applications within the containerized cloud architecture 900 .
- the DOCKER engine 910 functions as a client-server application executed by the host operating system 908 .
- the DOCKER engine 910 provides a server with a daemon process along with application programming interfaces (“APIs”) that specify interfaces that applications can use to communicate with and instruct the daemon to perform operations.
- the DOCKER engine 910 also provides a command line interface (“CLI”) that uses the APIs to control and interact with the daemon through scripting and/or CLI commands.
- the daemon can create and manage objects such as images, containers, networks, and volumes.
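As a purely illustrative sketch of the daemon's role described above, the toy model below tracks the four object kinds named in the text (images, containers, networks, and volumes) in memory. The `EngineDaemon` class and its methods are hypothetical and are not the DOCKER API; they only show the create/manage pattern that the daemon exposes to API and CLI clients.

```python
class EngineDaemon:
    """Toy stand-in for a container engine daemon that creates and
    manages images, containers, networks, and volumes."""

    def __init__(self):
        self.objects = {"image": {}, "container": {}, "network": {}, "volume": {}}

    def create(self, kind, name, **attrs):
        # A real daemon would validate, allocate resources, etc.
        self.objects[kind][name] = attrs
        return name

    def list(self, kind):
        return sorted(self.objects[kind])

    def remove(self, kind, name):
        del self.objects[kind][name]

daemon = EngineDaemon()
daemon.create("image", "app:1.0")
daemon.create("container", "app-1", image="app:1.0")
daemon.create("network", "bridge0")
print(daemon.list("container"))  # ['app-1']
```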
- the bridge networks 912 enable the containers 914 connected to the same bridge network to communicate.
- the bridge network 1 912 A enables communication among the containers 914 A 1 - 914 N 1
- the bridge network 2 912 B enables communication among the containers 914 A 2 - 914 N 2 .
- the bridge networks 912 are software network bridges implemented via the DOCKER bridge driver.
- the DOCKER bridge driver enables default and user-defined network bridges.
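The scoping rule described above (containers attached to the same bridge can communicate, while containers on different bridges cannot) can be modeled with a toy class. This is not an implementation of the DOCKER bridge driver; `BridgeNetwork` and its methods are hypothetical names used only to illustrate the behavior.

```python
class BridgeNetwork:
    """Toy software bridge: delivery succeeds only between containers
    attached to the same bridge."""

    def __init__(self, name):
        self.name = name
        self.containers = {}  # container name -> inbox of (src, message)

    def attach(self, container):
        self.containers[container] = []

    def send(self, src, dst, message):
        # Both endpoints must share this bridge for delivery to succeed.
        if src in self.containers and dst in self.containers:
            self.containers[dst].append((src, message))
            return True
        return False

bridge1 = BridgeNetwork("bridge1")
bridge1.attach("container-A1")
bridge1.attach("container-B1")

bridge2 = BridgeNetwork("bridge2")
bridge2.attach("container-A2")

print(bridge1.send("container-A1", "container-B1", "hello"))  # True
print(bridge1.send("container-A1", "container-A2", "hello"))  # False
```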
- the containers 914 are runtime instances of images.
- the containers 914 are described herein specifically as DOCKER containers, although other containerization technologies are contemplated as noted above.
- Each container 914 can include an image, an execution environment, and a standard set of instructions.
- the container A-1 914 A 1 is shown with the AV content encoder(s) 104 , the multimedia framework 107 , the NAVPAM 122 , and the AVSSM 126 .
- the AV content encoder(s) 104 , the multimedia framework 107 , the NAVPAM 122 , the AVSSM 126 , or any combination thereof can be distributed among multiple containers 914 across the same or different hosts 902 .
- the microservices 916 are applications that provide a single function.
- each of the microservices 916 is provided by one of the containers 914 , although each of the containers 914 may contain multiple microservices 916 .
- the microservices 916 can include, but are not limited to, server, database, and other executable applications to be run in an execution environment provided by a container 914 .
- the microservices 916 can provide any type of functionality, and therefore all the possible functions cannot be listed herein.
- Those skilled in the art will appreciate the use of the microservices 916 along with the containers 914 to improve many aspects of the containerized cloud architecture 900 , such as reliability, security, agility, and efficiency, for example.
- the AV content encoder(s) 104 , the multimedia framework 107 , the NAVPAM 122 , the AVSSM 126 , or some combination thereof are embodied as part of the microservices 916 .
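As a minimal sketch of the single-function microservice model described above, the toy dispatcher below routes a request to whichever container hosts the named function. All container names, service names, and functions here are hypothetical illustration values.

```python
# Each container hosts a single-function microservice; a request is
# dispatched to the container that provides the named function.
containers = {
    "container-A": {"resize-image": lambda w, h: (w // 2, h // 2)},
    "container-B": {"checksum": lambda data: sum(data) % 256},
}

def invoke(service, *args):
    """Find the container that provides `service` and call it."""
    for funcs in containers.values():
        if service in funcs:
            return funcs[service](*args)
    raise KeyError(f"no container provides {service!r}")

print(invoke("resize-image", 1920, 1080))  # (960, 540)
print(invoke("checksum", [10, 20, 30]))    # 60
```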
- turning now to FIG. 10 , a block diagram illustrating an example virtualized cloud architecture 1000 and components thereof will be described, according to an exemplary embodiment.
- the virtualized cloud architecture 1000 can be utilized to implement various elements disclosed herein.
- the AV synchronization system 100 at least in part, is implemented in the virtualized cloud architecture 1000 .
- the virtualized cloud architecture 1000 is a shared infrastructure that can support multiple services and network applications.
- the illustrated virtualized cloud architecture 1000 includes a hardware resource layer 1002 , a control layer 1004 , a virtual resource layer 1006 , and an application layer 1008 that work together to perform operations as will be described in detail herein.
- the hardware resource layer 1002 provides hardware resources, which, in the illustrated embodiment, include one or more compute resources 1010 , one or more memory resources 1012 , and one or more other resources 1014 .
- the compute resource(s) 1010 can include one or more hardware components that perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, operating systems, and/or other software.
- the compute resources 1010 can include one or more central processing units (“CPUs”) configured with one or more processing cores.
- the compute resources 1010 can include one or more graphics processing units ("GPUs") configured to accelerate operations performed by one or more CPUs, and/or to perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, operating systems, and/or other software that may or may not include instructions particular to graphics computations.
- the compute resources 1010 can include one or more discrete GPUs.
- the compute resources 1010 can include CPU and GPU components that are configured in accordance with a co-processing CPU/GPU computing model, wherein the sequential part of an application executes on the CPU and the computationally-intensive part is accelerated by the GPU.
- the compute resources 1010 can include one or more system-on-chip (“SoC”) components along with one or more other components, including, for example, one or more of the memory resources 1012 , and/or one or more of the other resources 1014 .
- the compute resources 1010 can be or can include one or more SNAPDRAGON SoCs, available from QUALCOMM; one or more TEGRA SoCs, available from NVIDIA; one or more HUMMINGBIRD SoCs, available from SAMSUNG; one or more Open Multimedia Application Platform (“OMAP”) SoCs, available from TEXAS INSTRUMENTS; one or more customized versions of any of the above SoCs; and/or one or more proprietary SoCs.
- the compute resources 1010 can be or can include one or more hardware components architected in accordance with an advanced reduced instruction set computing (“RISC”) machine (“ARM”) architecture, available for license from ARM HOLDINGS.
- the compute resources 1010 can be or can include one or more hardware components architected in accordance with an x86 architecture, such as an architecture available from INTEL CORPORATION of Mountain View, Calif., among others.
- the compute resources 1010 can utilize various computation architectures, and as such, the compute resources 1010 should not be construed as being limited to any particular computation architecture or combination of computation architectures, including those explicitly disclosed herein.
- the memory resource(s) 1012 can include one or more hardware components that perform storage operations, including temporary or permanent storage operations.
- the memory resource(s) 1012 include volatile and/or non-volatile memory implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data disclosed herein.
- Computer storage media includes, but is not limited to, random access memory (“RAM”), read-only memory (“ROM”), Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store data and which can be accessed by the compute resources 1010 .
- the other resource(s) 1014 can include any other hardware resources that can be utilized by the compute resource(s) 1010 and/or the memory resource(s) 1012 to perform operations described herein.
- the other resource(s) 1014 can include one or more input and/or output processors (e.g., a network interface controller or a wireless radio), one or more modems, one or more codec chipsets, one or more pipeline processors, one or more fast Fourier transform ("FFT") processors, one or more digital signal processors ("DSPs"), one or more speech synthesizers, and/or the like.
- the hardware resources operating within the hardware resource layer 1002 can be virtualized by one or more virtual machine monitors (“VMMs”) 1016 A- 1016 N (also known as “hypervisors;” hereinafter “VMMs 1016 ”) operating within the control layer 1004 to manage one or more virtual resources that reside in the virtual resource layer 1006 .
- VMMs 1016 can be or can include software, firmware, and/or hardware that alone or in combination with other software, firmware, and/or hardware, manages one or more virtual resources operating within the virtual resource layer 1006 .
- the virtual resources operating within the virtual resource layer 1006 can include abstractions of at least a portion of the compute resources 1010 , the memory resources 1012 , the other resources 1014 , or any combination thereof. These abstractions are referred to herein as virtual machines (“VMs”).
- the virtual resource layer 1006 includes VMs 1018 A- 1018 N (hereinafter “VMs 1018 ”). Each of the VMs 1018 can execute one or more applications 1020 A- 1020 N in the application layer 1008 .
- the AV content encoder(s) 104 and the multimedia framework 107 are shown as the application 1020 A or a portion thereof; the NAVPAM 122 is shown as the application 1020 B or a portion thereof; and the AVSSM 126 is shown as the application 1020 C or a portion thereof.
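The layering described above, in which a VMM partitions the physical compute and memory resources of the hardware resource layer into VM abstractions that run applications, can be sketched with a toy allocator. The `VMM` class below is hypothetical and ignores scheduling, isolation, and device emulation; it only shows resources being carved into VMs.

```python
class VMM:
    """Toy hypervisor: carves physical compute/memory resources into VMs."""

    def __init__(self, cpus, memory_gb):
        self.free_cpus = cpus
        self.free_memory_gb = memory_gb
        self.vms = []

    def create_vm(self, name, cpus, memory_gb):
        # A VM is an abstraction of a portion of the physical resources.
        if cpus > self.free_cpus or memory_gb > self.free_memory_gb:
            raise RuntimeError("insufficient physical resources")
        self.free_cpus -= cpus
        self.free_memory_gb -= memory_gb
        vm = {"name": name, "cpus": cpus, "memory_gb": memory_gb, "apps": []}
        self.vms.append(vm)
        return vm

vmm = VMM(cpus=16, memory_gb=64)
vm_a = vmm.create_vm("VM-A", cpus=4, memory_gb=16)
vm_a["apps"].append("application-A")  # application layer runs inside the VM
print(vmm.free_cpus, vmm.free_memory_gb)  # 12 48
```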
- turning now to FIG. 11 , the mobile device 1100 is representative of a device that can play back the post-encode AV content 106 , such as part of a video streaming service provided to the mobile device 1100 . While connections are not shown between the various components illustrated in FIG. 11 , it should be understood that some, none, or all of the components illustrated in FIG. 11 can be configured to interact with one another to carry out various device functions. In some embodiments, the components are arranged so as to communicate via one or more busses (not shown). Thus, it should be understood that FIG. 11 and the following description are intended to provide a general understanding of a suitable environment in which various aspects of embodiments can be implemented, and should not be construed as being limiting in any way.
- the mobile device 1100 can include a display 1102 for displaying data.
- the display 1102 can be configured to display the post-encode AV content 106 , various GUI elements, text, images, video, virtual keypads and/or keyboards, messaging data, notification messages, metadata, Internet content, device status, time, date, calendar data, device preferences, map and location data, combinations thereof, and/or the like.
- the mobile device 1100 also can include a processor 1104 and a memory or other data storage device (“memory”) 1106 .
- the processor 1104 can be configured to process data and/or can execute computer-executable instructions stored in the memory 1106 .
- the computer-executable instructions executed by the processor 1104 can include, for example, an operating system 1108 , one or more applications 1110 , other computer-executable instructions stored in the memory 1106 , or the like.
- the applications 1110 also can include a UI application (not illustrated in FIG. 11 ).
- the UI application can interface with the operating system 1108 to facilitate user interaction with functionality and/or data stored at the mobile device 1100 and/or stored elsewhere.
- the operating system 1108 can include a member of the SYMBIAN OS family of operating systems from SYMBIAN LIMITED, a member of the WINDOWS MOBILE OS and/or WINDOWS PHONE OS families of operating systems from MICROSOFT CORPORATION, a member of the PALM WEBOS family of operating systems from HEWLETT PACKARD CORPORATION, a member of the BLACKBERRY OS family of operating systems from RESEARCH IN MOTION LIMITED, a member of the IOS family of operating systems from APPLE INC., a member of the ANDROID OS family of operating systems from GOOGLE INC., and/or other operating systems.
- These operating systems are merely illustrative of some contemplated operating systems that may be used in accordance with various embodiments of the concepts and technologies described herein and therefore should not be construed as being limiting in any way.
- the UI application can be executed by the processor 1104 to aid a user in entering/deleting data, entering and setting user IDs and passwords for device access, configuring settings, manipulating content and/or settings, multimode interaction, interacting with other applications 1110 , and otherwise facilitating user interaction with the operating system 1108 , the applications 1110 , and/or other types or instances of data 1112 that can be stored at the mobile device 1100 .
- the applications 1110 , the data 1112 , and/or portions thereof can be stored in the memory 1106 and/or in a firmware 1114 , and can be executed by the processor 1104 .
- the firmware 1114 also can store code for execution during device power up and power down operations. It can be appreciated that the firmware 1114 can be stored in a volatile or non-volatile data storage device including, but not limited to, the memory 1106 and/or a portion thereof.
- the mobile device 1100 also can include an input/output (“I/O”) interface 1116 .
- the I/O interface 1116 can be configured to support the input/output of data such as location information, presence status information, user IDs, passwords, and application initiation (start-up) requests.
- the I/O interface 1116 can include a hardwire connection such as a universal serial bus (“USB”) port, a mini-USB port, a micro-USB port, an audio jack, a PS2 port, an IEEE 1394 (“FIREWIRE”) port, a serial port, a parallel port, an Ethernet (RJ45) port, an RJ11 port, a proprietary port, combinations thereof, or the like.
- the mobile device 1100 can be configured to synchronize with another device to transfer content to and/or from the mobile device 1100 . In some embodiments, the mobile device 1100 can be configured to receive updates to one or more of the applications 1110 via the I/O interface 1116 , though this is not necessarily the case.
- the I/O interface 1116 accepts I/O devices such as keyboards, keypads, mice, interface tethers, printers, plotters, external storage, touch/multi-touch screens, touch pads, trackballs, joysticks, microphones, remote control devices, displays, projectors, medical equipment (e.g., stethoscopes, heart monitors, and other health metric monitors), modems, routers, external power sources, docking stations, combinations thereof, and the like. It should be appreciated that the I/O interface 1116 may be used for communications between the mobile device 1100 and a network device or local device.
- the mobile device 1100 also can include a communications component 1118 .
- the communications component 1118 can be configured to interface with the processor 1104 to facilitate wired and/or wireless communications with one or more networks, such as a packet data network 1204 (shown in FIG. 12 ), the Internet, or some combination thereof.
- the communications component 1118 includes a multimode communications subsystem for facilitating communications via the cellular network and one or more other networks.
- the communications component 1118 includes one or more transceivers.
- the one or more transceivers can be configured to communicate over the same and/or different wireless technology standards with respect to one another.
- one or more of the transceivers of the communications component 1118 may be configured to communicate using Global System for Mobile communications ("GSM"), Code-Division Multiple Access ("CDMA") CDMAONE, CDMA2000, Long-Term Evolution ("LTE"), and various other 2G, 2.5G, 3G, 4G, 4.5G, 5G, and greater generation technology standards.
- the communications component 1118 may facilitate communications over various channel access methods (which may or may not be used by the aforementioned standards) including, but not limited to, Time-Division Multiple Access (“TDMA”), Frequency-Division Multiple Access (“FDMA”), Wideband CDMA (“W-CDMA”), Orthogonal Frequency-Division Multiple Access (“OFDMA”), Space-Division Multiple Access (“SDMA”), and the like.
- the communications component 1118 may facilitate data communications using General Packet Radio Service ("GPRS"), Enhanced Data services for Global Evolution ("EDGE"), the High-Speed Packet Access ("HSPA") protocol family including High-Speed Downlink Packet Access ("HSDPA"), Enhanced Uplink ("EUL") (also referred to as High-Speed Uplink Packet Access ("HSUPA")), HSPA+, and various other current and future wireless data access standards.
- the communications component 1118 can include a first transceiver (“TxRx”) 1120 A that can operate in a first communications mode (e.g., GSM).
- the communications component 1118 also can include an N th transceiver ("TxRx") 1120 N that can operate in a second communications mode relative to the first transceiver 1120 A (e.g., UMTS). While two transceivers 1120 A- 1120 N (hereinafter collectively and/or generically referred to as "transceivers 1120 ") are shown in FIG. 11 , it should be appreciated that fewer than two, two, or more than two transceivers 1120 can be included in the communications component 1118 .
- the communications component 1118 also can include an alternative transceiver (“Alt TxRx”) 1122 for supporting other types and/or standards of communications.
- the alternative transceiver 1122 can communicate using various communications technologies such as, for example, WI-FI, WIMAX, BLUETOOTH, infrared, infrared data association (“IRDA”), near field communications (“NFC”), other RF technologies, combinations thereof, and the like.
- the communications component 1118 also can facilitate reception from terrestrial radio networks, digital satellite radio networks, internet-based radio service networks, combinations thereof, and the like.
- the communications component 1118 can process data from a network such as the Internet, an intranet, a broadband network, a WI-FI hotspot, an Internet service provider (“ISP”), a digital subscriber line (“DSL”) provider, a broadband provider, combinations thereof, or the like.
- the mobile device 1100 also can include one or more sensors 1124 .
- the sensors 1124 can include temperature sensors, light sensors, air quality sensors, movement sensors, accelerometers, magnetometers, gyroscopes, infrared sensors, orientation sensors, noise sensors, microphones, proximity sensors, combinations thereof, and/or the like.
- audio capabilities for the mobile device 1100 may be provided by an audio I/O component 1126 .
- the audio I/O component 1126 of the mobile device 1100 can include one or more speakers for the output of audio signals, one or more microphones for the collection and/or input of audio signals, and/or other audio input and/or output devices.
- the illustrated mobile device 1100 also can include a subscriber identity module (“SIM”) system 1128 .
- the SIM system 1128 can include a universal SIM ("USIM"), a universal integrated circuit card ("UICC"), and/or other identity devices.
- the SIM system 1128 can include and/or can be connected to or inserted into an interface such as a slot interface 1130 .
- the slot interface 1130 can be configured to accept insertion of other identity cards or modules for accessing various types of networks. Additionally, or alternatively, the slot interface 1130 can be configured to accept multiple subscriber identity cards. Because other devices and/or modules for identifying users and/or the mobile device 1100 are contemplated, it should be understood that these embodiments are illustrative, and should not be construed as being limiting in any way.
- the mobile device 1100 also can include an image capture and processing system 1132 (“image system”).
- the image system 1132 can be configured to capture or otherwise obtain photos, videos, and/or other visual information.
- the image system 1132 can include cameras, lenses, charge-coupled devices (“CCDs”), combinations thereof, or the like.
- the mobile device 1100 may also include a video system 1134 .
- the video system 1134 can be configured to capture, process, record, modify, and/or store video content. Photos and videos obtained using the image system 1132 and the video system 1134 , respectively, may be added as message content to a multimedia message service ("MMS") message or an email message and sent to another device.
- the video and/or photo content also can be shared with other devices via various types of data transfers via wired and/or wireless communication devices as described herein.
- the mobile device 1100 also can include one or more location components 1136 .
- the location components 1136 can be configured to send and/or receive signals to determine a geographic location of the mobile device 1100 .
- the location components 1136 can send and/or receive signals from global positioning system (“GPS”) devices, assisted-GPS (“A-GPS”) devices, WI-FI/WIMAX and/or cellular network triangulation data, combinations thereof, and the like.
- the location component 1136 also can be configured to communicate with the communications component 1118 to retrieve triangulation data for determining a location of the mobile device 1100 .
- the location component 1136 can interface with cellular network nodes, telephone lines, satellites, location transmitters and/or beacons, wireless network transmitters and receivers, combinations thereof, and the like.
- the location component 1136 can include and/or can communicate with one or more of the sensors 1124 such as a compass, an accelerometer, and/or a gyroscope to determine the orientation of the mobile device 1100 .
- the mobile device 1100 can generate and/or receive data to identify its geographic location, or to transmit data used by other devices to determine the location of the mobile device 1100 .
- the location component 1136 may include multiple components for determining the location and/or orientation of the mobile device 1100 .
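As a hedged sketch of how location components such as the location component 1136 might combine ranges from several known transmitters, the function below performs basic 2D trilateration: given three beacons with known coordinates and a measured distance to each, it solves the linearized circle equations for the device position. The beacon coordinates and ranges are made up for illustration.

```python
import math

def trilaterate(beacons, distances):
    """Estimate a 2D position from distances to three known beacons by
    subtracting the first circle equation from the other two, which
    yields a 2x2 linear system solved here by Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det

# Hypothetical beacons (e.g., cell towers) and a device at (3, 4).
beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
distances = [math.dist(true_pos, b) for b in beacons]
print(trilaterate(beacons, distances))  # (3.0, 4.0)
```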
- the illustrated mobile device 1100 also can include a power source 1138 .
- the power source 1138 can include one or more batteries, power supplies, power cells, and/or other power subsystems including alternating current (“AC”) and/or direct current (“DC”) power devices.
- the power source 1138 also can interface with an external power system or charging equipment via a power I/O component 1140 . Because the mobile device 1100 can include additional and/or alternative components, the above embodiment should be understood as being illustrative of one possible operating environment for various embodiments of the concepts and technologies described herein. The described embodiment of the mobile device 1100 is illustrative, and should not be construed as being limiting in any way.
- communication media includes computer-executable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media.
- the term "modulated data signal" means a signal that has one or more of its characteristics changed or set in a manner so as to encode information in the signal.
- communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
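The "modulated data signal" definition above can be made concrete with a toy frequency-shift keying (FSK) example: each bit changes a characteristic of the carrier (here, its frequency), and the receiver recovers the bits by correlating against the candidate tones. The symbol length and tone frequencies below are arbitrary illustration values.

```python
import math

SAMPLES = 64   # samples per symbol
F0, F1 = 2, 4  # carrier cycles per symbol for bit 0 and bit 1

def modulate(bits):
    """Encode each bit by setting the carrier frequency (a characteristic
    of the signal) to one of two tones."""
    signal = []
    for bit in bits:
        freq = F1 if bit else F0
        signal.extend(math.sin(2 * math.pi * freq * n / SAMPLES)
                      for n in range(SAMPLES))
    return signal

def demodulate(signal):
    """Recover the bits by correlating each symbol against both tones;
    the stronger match wins."""
    bits = []
    for i in range(0, len(signal), SAMPLES):
        symbol = signal[i:i + SAMPLES]
        corr = [sum(s * math.sin(2 * math.pi * f * n / SAMPLES)
                    for n, s in enumerate(symbol)) for f in (F0, F1)]
        bits.append(1 if corr[1] > corr[0] else 0)
    return bits

data = [1, 0, 1, 1, 0]
print(demodulate(modulate(data)) == data)  # True
```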
- computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-executable instructions, data structures, program modules, or other data.
- computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the mobile device 1100 or other devices or computers described herein, such as the computer system 800 described above with reference to FIG. 8 .
- Encoding the software modules presented herein also may transform the physical structure of the computer-readable media presented herein.
- the specific transformation of physical structure may depend on various factors, in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like.
- the computer-readable media is implemented as semiconductor-based memory
- the software disclosed herein may be encoded on the computer-readable media by transforming the physical state of the semiconductor memory.
- the software may transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory.
- the software also may transform the physical state of such components in order to store data thereupon.
- the computer-readable media disclosed herein may be implemented using magnetic or optical technology.
- the software presented herein may transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations may include altering the magnetic characteristics of particular locations within given magnetic media. These transformations also may include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
- the mobile device 1100 may not include all of the components shown in FIG. 11 , may include other components that are not explicitly shown in FIG. 11 , or may utilize an architecture completely different than that shown in FIG. 11 .
- the network 1200 includes a cellular network 1202 , a packet data network 1204 , and a circuit switched network 1206 .
- the network 818 is or includes the network 1200 .
- the AV synchronization system 100 can be configured to communicate over the network 1200 .
- the cellular network 1202 can include various components such as, but not limited to, base transceiver stations (“BTSs”), Node-Bs or e-Node-Bs, base station controllers (“BSCs”), radio network controllers (“RNCs”), mobile switching centers (“MSCs”), mobility management entities (“MMEs”), short message service centers (“SMSCs”), multimedia messaging service centers (“MMSCs”), home location registers (“HLRs”), home subscriber servers (“HSSs”), visitor location registers (“VLRs”), charging platforms, billing platforms, voicemail platforms, GPRS core network components, location service nodes, and the like.
- the cellular network 1202 also includes radios and nodes for receiving and transmitting voice, data, and combinations thereof to and from radio transceivers, networks, the packet data network 1204 , and the circuit switched network 1206 .
- a mobile communications device 1208 such as, for example, a cellular telephone, a user equipment, a mobile terminal, a PDA, a laptop computer, a handheld computer, and combinations thereof, can be operatively connected to the cellular network 1202 .
- the mobile communications device 1208 can be configured similar to or the same as the mobile device 1100 described above with reference to FIG. 11 .
- the cellular network 1202 can be configured as a Global System for Mobile communications (“GSM”) network and can provide data communications via GPRS and/or EDGE. Additionally, or alternatively, the cellular network 1202 can be configured as a 3G Universal Mobile Telecommunications System (“UMTS”) network and can provide data communications via the HSPA protocol family, for example, HSDPA, EUL, and HSPA+.
- the cellular network 1202 also is compatible with 4G mobile communications standards such as LTE, 5G mobile communications standards, or the like, as well as evolved and future mobile standards.
- the packet data network 1204 includes various systems, devices, servers, computers, databases, and other devices in communication with one another, as is generally known.
- the packet data network 1204 is or includes one or more WI-FI networks, each of which can include one or more WI-FI access points, routers, switches, and other WI-FI network components.
- the packet data network 1204 devices are accessible via one or more network links.
- the servers often store various files that are provided to a requesting device such as, for example, a computer, a terminal, a smartphone, or the like.
- the requesting device includes software for executing a web page in a format readable by the browser or other software.
- Other files and/or data may be accessible via “links” in the retrieved files, as is generally known.
- the packet data network 1204 includes or is in communication with the Internet.
- the packet data network 1204 can be or can include one or more of the PDNs 122 A- 122 N.
- the circuit switched network 1206 includes various hardware and software for providing circuit switched communications.
- the circuit switched network 1206 may include, or may be, what is often referred to as a plain old telephone system (“POTS”).
- the functionality of the circuit switched network 1206 or other circuit-switched networks is generally known and will not be described herein in detail.
- the illustrated cellular network 1202 is shown in communication with the packet data network 1204 and a circuit switched network 1206 , though it should be appreciated that this is not necessarily the case.
- One or more Internet-capable devices 1210 such as a laptop, a portable device, or another suitable device, can communicate with one or more cellular networks 1202 , and devices connected thereto, through the packet data network 1204 . It also should be appreciated that the Internet-capable device 1210 can communicate with the packet data network 1204 through the circuit switched network 1206 , the cellular network 1202 , and/or via other networks (not illustrated).
- a communications device 1212 for example, a telephone, facsimile machine, modem, computer, or the like, can be in communication with the circuit switched network 1206 , and therethrough to the packet data network 1204 and/or the cellular network 1202 .
- the communications device 1212 can be an Internet-capable device, and can be substantially similar to the Internet-capable device 1210 .
- the AV synchronization system 100 can include the machine learning system 1300 or can be in communication with the machine learning system 1300 .
- the illustrated machine learning system 1300 includes one or more machine learning models 1302 .
- the machine learning models 1302 can include supervised and/or semi-supervised learning models.
- the machine learning model(s) 1302 can be created by the machine learning system 1300 based upon one or more machine learning algorithms 1304 .
- the machine learning algorithm(s) 1304 can be any existing, well-known algorithm, any proprietary algorithms, or any future machine learning algorithm.
- Some example machine learning algorithms 1304 include, but are not limited to, neural networks, gradient descent, linear regression, logistic regression, linear discriminant analysis, classification tree, regression tree, Naive Bayes, K-nearest neighbor, learning vector quantization, support vector machines, and the like. Classification and regression algorithms might find particular applicability to the concepts and technologies disclosed herein. Those skilled in the art will appreciate the applicability of various machine learning algorithms 1304 based upon the problem(s) to be solved by machine learning via the machine learning system 1300 .
- the machine learning system 1300 can control the creation of the machine learning models 1302 via one or more training parameters.
- the training parameters are selected by modelers at the direction of an enterprise, for example.
- the training parameters are automatically selected based upon data provided in one or more training data sets 1306 .
- the training parameters can include, for example, a learning rate, a model size, a number of training passes, data shuffling, regularization, and/or other training parameters known to those skilled in the art.
- the training data in the training data sets 1306 can include, for example, samples of data that exhibit the features 1308 upon which the machine learning model 1302 is to be trained.
- the learning rate is a training parameter defined by a constant value.
- the learning rate affects the speed at which the machine learning algorithm 1304 converges to the optimal weights.
- the machine learning algorithm 1304 can update the weights for every data example included in the training data set 1306 .
- the size of an update is controlled by the learning rate. A learning rate that is too high might prevent the machine learning algorithm 1304 from converging to the optimal weights. A learning rate that is too low might result in the machine learning algorithm 1304 requiring multiple training passes to converge to the optimal weights.
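The role of the learning rate in these weight updates can be sketched as follows. This is an illustrative stochastic gradient descent step for a simple linear model, not code from the disclosure; the function names and toy data are assumptions.

```python
# Illustrative sketch (not from the disclosure): how the learning rate scales
# each weight update in stochastic gradient descent for a linear model.
def sgd_step(weights, example, target, learning_rate):
    """Update weights for a single training example (one SGD step)."""
    prediction = sum(w * x for w, x in zip(weights, example))
    error = prediction - target
    # The size of the update is controlled by the learning rate.
    return [w - learning_rate * error * x for w, x in zip(weights, example)]

weights = [0.0, 0.0]
# Fit y = 2*x1 + 1*x2 from repeated passes over two training examples.
for _ in range(200):
    weights = sgd_step(weights, [1.0, 0.0], 2.0, learning_rate=0.1)
    weights = sgd_step(weights, [0.0, 1.0], 1.0, learning_rate=0.1)
print(weights)  # converges near [2.0, 1.0]
```

With a much larger learning rate the updates overshoot and never settle on the optimal weights; with a much smaller one, many more passes are needed, which matches the trade-off described above.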
- the model size is regulated by the number of input features (“features”) 1308 in the training data set 1306 . A greater number of features 1308 yields a greater number of possible patterns that can be determined from the training data set 1306 .
- the model size should be selected to balance the resources (e.g., compute, memory, storage, etc.) needed for training and the predictive power of the resultant machine learning model 1302 .
- the number of training passes indicates how many passes the machine learning algorithm 1304 makes over the training data set 1306 during the training process.
- the number of training passes can be adjusted based, for example, on the size of the training data set 1306 , with larger training data sets being exposed to fewer training passes in consideration of time and/or resource utilization.
- the effectiveness of the resultant machine learning model 1302 can be increased by multiple training passes.
- Data shuffling is a training parameter designed to prevent the machine learning algorithm 1304 from reaching false optimal weights due to the order in which data contained in the training data set 1306 is processed. For example, data provided in rows and columns might be analyzed first row, second row, third row, etc., and thus an optimal weight might be obtained well before a full range of data has been considered. By shuffling the data, the data contained in the training data set 1306 can be analyzed more thoroughly, mitigating bias in the resultant machine learning model 1302 .
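The data-shuffling parameter described above can be illustrated with a short sketch that produces a fresh ordering of the training examples on every pass; the function name and seed are illustrative assumptions.

```python
# Illustrative sketch: shuffle the training-example order each pass so that
# row order cannot bias the learned weights.
import random

def training_order(num_examples, num_passes, seed=0):
    rng = random.Random(seed)  # fixed seed only for reproducibility
    order = []
    for _ in range(num_passes):
        indices = list(range(num_examples))
        rng.shuffle(indices)  # a new processing order for every pass
        order.append(indices)
    return order

passes = training_order(num_examples=5, num_passes=2)
```

Each pass visits every example exactly once, but in a different order, so no early subset of rows dominates the weight updates.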
- Regularization is a training parameter that helps to prevent the machine learning model 1302 from memorizing training data from the training data set 1306 .
- in an overfitting scenario, the machine learning model 1302 fits the training data set 1306 closely, but the predictive performance of the machine learning model 1302 on new data is not acceptable.
- Regularization helps the machine learning system 1300 avoid this overfitting/memorization problem by adjusting extreme weight values of the features 1308 . For example, a feature that has a small weight value relative to the weight values of the other features in the training data set 1306 can be adjusted to zero.
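One common way to realize the regularization described above is an L2 penalty added to the training loss. The following is an illustrative sketch of that idea, not the disclosure's implementation; the names and numbers are assumptions.

```python
# Illustrative sketch: L2 regularization adds a penalty on large weight
# values to the training loss, discouraging memorization of the training set.
def regularized_loss(errors, weights, lam):
    mse = sum(e * e for e in errors) / len(errors)   # data-fit term
    penalty = lam * sum(w * w for w in weights)      # weight-size term
    return mse + penalty

# Same prediction errors, one extreme weight value (5.0) and one tiny one.
loss_plain = regularized_loss([0.1, -0.2], [5.0, 0.01], lam=0.0)
loss_l2 = regularized_loss([0.1, -0.2], [5.0, 0.01], lam=0.1)
```

Because the penalty grows with the square of each weight, minimizing the regularized loss pushes extreme weight values down while leaving already-small weights (like 0.01 here) nearly untouched.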
- the machine learning system 1300 can determine model accuracy after training by using one or more evaluation data sets 1310 containing the same features 1308 ′ as the features 1308 in the training data set 1306 . This also prevents the machine learning model 1302 from simply memorizing the data contained in the training data set 1306 .
- the number of evaluation passes made by the machine learning system 1300 can be regulated by a target model accuracy that, when reached, ends the evaluation process, at which point the machine learning model 1302 is considered ready for deployment.
- the machine learning model 1302 can perform a prediction operation (“prediction”) 1314 with an input data set 1312 having the same features 1308 ′′ as the features 1308 in the training data set 1306 and the features 1308 ′ of the evaluation data set 1310 .
- the results of the prediction 1314 are included in an output data set 1316 consisting of predicted data.
- the machine learning model 1302 can perform other operations, such as regression, classification, and others. As such, the example illustrated in FIG. 13 should not be construed as being limiting in any way.
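The overall train / evaluate / predict flow described in the preceding paragraphs can be sketched with a deliberately trivial model; every name and data value here is an illustrative assumption, not part of the disclosure.

```python
# Illustrative sketch of the train -> evaluate -> predict flow, using a
# trivial one-feature threshold model in place of machine learning model 1302.
def train(training_set):
    # "Learn" a threshold separating the two classes on a single feature.
    zeros = [x for x, y in training_set if y == 0]
    ones = [x for x, y in training_set if y == 1]
    return (max(zeros) + min(ones)) / 2.0

def accuracy(model, data_set):
    correct = sum(1 for x, y in data_set if (x > model) == (y == 1))
    return correct / len(data_set)

training_set = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]   # features + labels
evaluation_set = [(0.15, 0), (0.85, 1)]                   # same feature, held out
model = train(training_set)                               # threshold = 0.5
assert accuracy(model, evaluation_set) >= 0.9             # target accuracy reached
predictions = [1 if x > model else 0 for x in [0.3, 0.7]]  # prediction operation
```

The evaluation set shares the same feature as the training set but contains different examples, which is what guards against the memorization problem noted above.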
Abstract
Description
- This application is a continuation of and claims priority to U.S. patent application Ser. No. 17/215,045, entitled “Audio and Video Synchronization,” filed Mar. 29, 2021, now allowed, which is incorporated herein by reference in its entirety.
- Worldwide consumption of audio-video content (also known as “AV content”) has been on a steady increase year over year. The advent of streaming services, improved Internet access capabilities, and device mobility have been strong drivers of this increase. Paramount to the user experience is the quality of the AV content. Users typically focus on the resolution of the video image, the frame rate, the presence of visual artifacts, and general audio quality. Users can become accustomed to temporary changes in any of these parameters. When the audio and video become out of synchronization, however, users may observe inconsistencies between when a person or character speaks and when their lips move. This occurrence is known as AV desynchronization, or what is colloquially known as a “lip sync” issue. While changes in the video image and/or audio quality are not ideal, these changes can be overlooked by many users and not disrupt their enjoyment of the AV content. A lip sync issue, on the other hand, is almost immediately noticed by nearly all users. As an example, singers are often cited for lip syncing at concerts or during television appearances because, as humans, we expect the movement of one's lips to directly coincide with the audio produced. Our inherent sensitivity to lip sync issues makes AV synchronization a primary goal of streaming service providers.
- Concepts and technologies disclosed herein are directed to audio and video synchronization. According to one aspect disclosed herein, an audio-video (“AV”) synchronization system can simultaneously capture samples of a pre-encode media stream and a post-encode media stream. The pre-encode media stream can include AV content prior to being encoded. The post-encode media stream can include the AV content after being encoded. The AV synchronization system can align a pre-encode video component of the pre-encode media stream with a post-encode video component. The AV synchronization system can determine a video offset between the pre-encode video component and the post-encode video component. The AV synchronization system can align a pre-encode audio component of the pre-encode media stream with a post-encode audio component of the post-encode media stream. The AV synchronization system can determine an audio offset between the pre-encode audio component and the post-encode audio component. The AV synchronization system can then compare the video offset and the audio offset to determine if the post-encode media stream is synchronized with the pre-encode media stream.
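The evaluation flow summarized above can be sketched as a skeleton. The function names, data shapes, and the simple equality test below are assumptions for illustration only, not the disclosure's actual interfaces.

```python
# Hypothetical skeleton of the AV synchronization evaluation: align each
# component separately, take the offsets, then compare them.
def evaluate_sync(pre_encode, post_encode, align_video, align_audio):
    video_offset = align_video(pre_encode["video"], post_encode["video"])
    audio_offset = align_audio(pre_encode["audio"], post_encode["audio"])
    return {"video_offset": video_offset,
            "audio_offset": audio_offset,
            # Synchronized when the two offsets agree.
            "synchronized": video_offset == audio_offset}

result = evaluate_sync(
    {"video": "ref_v", "audio": "ref_a"},     # pre-encode (reference) sample
    {"video": "dist_v", "audio": "dist_a"},   # post-encode (distorted) sample
    align_video=lambda ref, dist: 42,         # stand-in alignment results
    align_audio=lambda ref, dist: 42,
)
```

The two `align_*` calls are independent of one another, which is why the disclosure can run them as parallel processes.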
- In some embodiments, the AV synchronization system can align the pre-encode video component of the pre-encode media stream with the post-encode video component, and can align the pre-encode audio component of the pre-encode media stream with the post-encode audio component of the post-encode media stream during parallel processes. In some embodiments, the AV synchronization system also can determine the video offset between the pre-encode video component and the post-encode video component, and can determine the audio offset between the pre-encode audio component and the post-encode audio component during parallel processes.
- In some embodiments, the AV synchronization system can align the pre-encode video component of the pre-encode media stream with the post-encode video component, and can determine the video offset between the pre-encode video component and the post-encode video component via execution of a non-annotated video processing algorithm module. In particular, the AV synchronization system can execute the non-annotated video processing algorithm module to: generate, from the pre-encode video component and the post-encode video component, a plurality of thumbnail images; determine a plurality of search ranges for an iterative search process used to find a first alignment point between the pre-encode video component and the post-encode video component; compare the thumbnail images to determine a plurality of distance values; determine a second alignment point between the pre-encode video component and the post-encode video component, wherein the second alignment point is where a distance value of the plurality of distance values is minimized; determine the video offset based upon the first alignment point and the second alignment point; and output the video offset.
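The distance-minimization step of the video alignment described above can be illustrated as follows. The thumbnail representation (flat luma vectors) and the mean-absolute-difference metric are assumptions for the sketch, not the disclosure's actual internals.

```python
# Illustrative sketch: find the offset at which the distance between
# reference and distorted thumbnail images is minimized.
def frame_distance(thumb_a, thumb_b):
    # Mean absolute pixel difference between two same-size thumbnails.
    return sum(abs(a - b) for a, b in zip(thumb_a, thumb_b)) / len(thumb_a)

def best_video_offset(ref_thumbs, dist_thumbs, search_range):
    # The alignment point is where the distance value is minimized.
    return min(search_range,
               key=lambda off: frame_distance(ref_thumbs[off], dist_thumbs[0]))

# Toy "thumbnails": four-pixel luma vectors; the distorted stream starts
# two frames into the reference sequence (with slight encoding noise).
ref = [[10, 10, 10, 10], [50, 50, 50, 50], [90, 90, 90, 90], [30, 30, 30, 30]]
dist = [[91, 89, 90, 90]]
offset = best_video_offset(ref, dist, search_range=range(4))
print(offset)  # 2
```

In practice the search range would be chosen iteratively, as the disclosure describes, rather than scanning every candidate offset.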
- In some embodiments, the AV synchronization system can align the pre-encode audio component of the pre-encode media stream with the post-encode audio component of the post-encode media stream, and can determine the audio offset between the pre-encode audio component and the post-encode audio component via execution of an audio-video synchronization script module. In particular, the AV synchronization system can execute the audio-video synchronization script module to: divide the pre-encode audio component and the post-encode audio component into a plurality of time slices, wherein each time slice of the plurality of time slices is associated with a corresponding video frame; generate acoustic fingerprints based upon the plurality of time slices; perform fingerprint matching using the acoustic fingerprints and determining the audio offset therefrom; compare the audio offset and the video offset; determine, based upon comparing the audio offset and the video offset, whether or not the pre-encode media stream and the post-encode media stream are synchronized; and output an audio-visual synchronization evaluation result including an indication of whether or not the pre-encode media stream and the post-encode media stream are synchronized.
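The fingerprint-matching step described above can be illustrated with a toy sketch. The hashing choice and the time-slice representation are assumptions; they stand in for the acoustic fingerprinting actually used.

```python
# Illustrative sketch: fingerprint each audio time slice, then find where the
# distorted fingerprints line up within the reference fingerprints.
import hashlib

def fingerprint(slice_samples):
    # Stand-in for an acoustic fingerprint of one time slice.
    return hashlib.sha256(bytes(slice_samples)).hexdigest()[:8]

def audio_offset(ref_slices, dist_slices):
    ref_fps = [fingerprint(s) for s in ref_slices]
    dist_fps = [fingerprint(s) for s in dist_slices]
    n = len(dist_fps)
    for off in range(len(ref_fps) - n + 1):
        if ref_fps[off:off + n] == dist_fps:
            return off  # offset in time slices (one slice per video frame)
    return None

# Toy slices of audio sample bytes; the distorted audio starts 2 slices late.
ref = [[1, 2], [3, 4], [5, 6], [7, 8]]
dist = [[5, 6], [7, 8]]
print(audio_offset(ref, dist))  # 2
```

Because each time slice corresponds to one video frame, the returned audio offset is directly comparable to the video offset from the video alignment.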
- It should be appreciated that the above-described subject matter may be implemented as a computer-controlled apparatus, a computer process, a computing system, or as an article of manufacture such as a computer-readable storage medium. These and various other features will be apparent from a reading of the following Detailed Description and a review of the associated drawings.
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended that this Summary be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
-
FIG. 1 is a block diagram illustrating an AV synchronization system in which aspects of the concepts and technologies disclosed herein can be implemented. -
FIG. 2A is a diagram illustrating synchronized pre-encode and post-encode AV content, according to an illustrative embodiment. -
FIG. 2B is a diagram illustrating de-synchronized pre-encode and post-encode AV content with audio ahead of video, according to an illustrative embodiment. -
FIG. 2C is a diagram illustrating de-synchronized pre-encode and post-encode AV content with audio behind video, according to an illustrative embodiment. -
FIG. 3 is a diagram illustrating an example translation of audio frames to video frames, according to an illustrative embodiment. -
FIGS. 4A and 4B are diagrams illustrating an example of a micro-iteration iterative search, according to an illustrative embodiment. -
FIG. 5 is a diagram illustrating an example of a macro-iteration iterative search, according to an illustrative embodiment. -
FIG. 6 is a flow diagram illustrating aspects of a method for determining whether pre-encode and post-encode media streams are synchronized, according to an illustrative embodiment. -
FIG. 7 is a flow diagram illustrating aspects of another method for determining whether pre-encode and post-encode media streams are synchronized, according to an illustrative embodiment. -
FIG. 8 is a block diagram illustrating an example computer system capable of implementing aspects of the embodiments presented herein. -
FIG. 9 is a block diagram illustrating an example containerized cloud architecture and components thereof capable of implementing aspects of the embodiments presented herein. -
FIG. 10 is a block diagram illustrating an example virtualized cloud architecture and components thereof capable of implementing aspects of the embodiments presented herein. -
FIG. 11 is a block diagram illustrating an example mobile device capable of implementing aspects of the embodiments disclosed herein. -
FIG. 12 is a diagram illustrating a network, according to an illustrative embodiment. -
FIG. 13 is a diagram illustrating a machine learning system, according to an illustrative embodiment. - Many streaming platforms experience problems with AV synchronization. The degradation in user experience has become a customer pain point, and to compound the problem, it is often difficult to diagnose the source of the problem. The concepts and technologies disclosed herein focus on video encoders as the first possible source of synchronization problems. In particular, the concepts and technologies disclosed herein provide a full reference-based analysis to compare alignment results from content pre-encode and post-encode to determine if the video encoder introduced any synchronization issues. A distinct advantage to this approach versus other solutions is the ability to detect if AV desynchronization has occurred without relying on talking heads and “lip-reading” algorithms. Moreover, streaming service providers that have access to the pre-encode (also referred to herein as “reference”) video also can determine if the desynchronization has caused the audio to be “behind” or “ahead” of the video. The concepts and technologies disclosed herein focus, in part, on the ability to accurately and reliably detect when audio and video from post-encode media streams fail to retain synchronization with their pre-encode media stream counterparts. This can be achieved by simultaneously capturing samples of both the pre-encode and post-encode media streams, and then separately aligning the audio and video to determine if the offsets between video pre-encode and post-encode reflect the offsets between the audio pre-encode and post-encode.
- The disclosed solution is a software tool that can reside in the same pod as the encoder(s) (i.e., for KUBERNETES deployments in which a pod is one or more KUBERNETES containers; see
FIG. 9 for an example containerized cloud architecture). The software tool can cycle through all channels of a given pod to determine if the audio is synchronized with the video. The software tool can ingest a few seconds (e.g., 6-8 seconds) of AV content from a multi-cast source being sent by video providers (i.e., reference AV content), and can compare this reference segment to one segment of encoded AV content to determine if the audio and video are synchronized. The targeted runtime for each channel validation is 30 seconds, although this target may be changed based upon the needs of a given implementation. Moreover, the software tool is designed to run continuously. If a problem occurs, the software tool can report the problem to other platforms that can trigger alarms and perform the appropriate corrective actions to address the problem. - While the subject matter described herein may be presented, at times, in the general context of program modules that execute in conjunction with the execution of an operating system and application programs on a computer system, those skilled in the art will recognize that other implementations may be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, computer-executable instructions, and/or other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer systems, including hand-held devices, mobile devices, wireless devices, multiprocessor systems, distributed computing systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, routers, switches, other computing devices described herein, and the like.
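The continuous channel-validation cycle of the software tool described above can be sketched as follows; every name, and the toy capture and check functions, are illustrative assumptions.

```python
# Hypothetical sketch of the validation loop: cycle through a pod's channels,
# compare a short reference segment against an encoded segment, and report
# any channel whose audio and video are no longer synchronized.
def validate_channels(channels, capture_segment, check_sync, report_problem):
    results = {}
    for channel in channels:
        reference, encoded = capture_segment(channel)  # e.g., a few seconds each
        in_sync = check_sync(reference, encoded)
        results[channel] = in_sync
        if not in_sync:
            report_problem(channel)  # trigger alarms / corrective actions
    return results

problems = []
results = validate_channels(
    channels=["ch1", "ch2"],
    capture_segment=lambda ch: (("ref", ch), ("enc", ch)),
    check_sync=lambda ref, enc: ref[1] != "ch2",  # pretend ch2 has drifted
    report_problem=problems.append,
)
```

A real deployment would wrap this loop so it runs continuously, restarting the cycle after the last channel, consistent with the tool's per-channel runtime target.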
- Referring now
to FIG. 1 , a block diagram illustrating an AV synchronization system 100 in which aspects of the concepts and technologies disclosed herein can be implemented will be described. The AV synchronization system 100 can be implemented, at least in part, in a computer system, such as an example computer system 800 that is illustrated and described with reference to FIG. 8 . The AV synchronization system 100 alternatively can be implemented, at least in part, in a containerized architecture, such as an example containerized cloud architecture 900 that is illustrated and described herein with reference to FIG. 9 . The AV synchronization system 100 can be implemented, at least in part, in a virtualized cloud architecture, such as an example virtualized cloud architecture 1000 that is illustrated and described herein with reference to FIG. 10 . Moreover, aspects of the AV synchronization system 100 may be implemented, at least in part, through the use of machine learning technologies, such as via an example machine learning system 1300 that is illustrated and described herein with reference to FIG. 13 . Those skilled in the art will appreciate that the AV synchronization system 100 can be deployed in various ways on different architectures based upon the needs of a given implementation. Accordingly, the examples set forth herein should not be construed as being limiting to the manner in which the AV synchronization system 100 is implemented. - In the example illustrated in
FIG. 1 , the AV synchronization system 100 can receive a pre-encode AV content 102 (also referred to herein, at times, as “reference AV content”) from a video provider or other source. The pre-encode AV content 102 can be received via a unicast or multi-cast source, although the latter is more likely in many real-world implementations. The pre-encode AV content 102 can include a pre-encode media stream that includes both a reference audio sequence and a reference video sequence (i.e., a reference sequence of audio frames and a reference sequence of video frames) prior to any encoding by one or more AV content encoders 104. In other words, the pre-encode AV content 102 has no change from the original quality, or is the source AV content itself. The AV content encoder(s) 104 can include any audio encoder(s) that utilize any audio codec based on the needs of a video service provider to encode the audio data of the pre-encode AV content 102 prior to distribution. The AV content encoder(s) 104 can include any video encoder(s) that utilize any video codec based on the needs of the video service provider to encode the video data of the pre-encode AV content 102 prior to distribution. The audio and video data portions of the pre-encode AV content 102 are synchronized such that the correct audio frame(s) is played during the correct video frame(s) as intended. It should be understood that the pre-encode AV content 102 can include real-time live streamed content as well as other media such as, but not limited to, video files. - The
AV synchronization system 100 can also receive a post-encode AV content 106 (also referred to herein, at times, as “distorted AV content”). The post-encode AV content 106 can include a media stream that includes both a distorted audio sequence and a distorted video sequence (i.e., a distorted sequence of audio frames and a distorted sequence of video frames) after encoding by one or more of the AV content encoders 104. In other words, the post-encode AV content 106 exhibits variance in quality, frame rate, playback delay, or some other alteration from the source AV content. The post-encode AV content 106 can be compared against the reference AV content 102 to establish whether a change to the AV synchronization has occurred post-encode. It should be understood that the post-encode AV content 106 can include real-time live streamed content as well as other media such as, but not limited to, video files. - The subject matter of the
reference AV content 102 and the distorted AV content 106 is inconsequential to the fundamental operation of the concepts and technologies disclosed herein. However, those skilled in the art will appreciate that the application of the concepts and technologies disclosed herein to certain AV content, such as movies and television shows in which characters may be shown in speaking roles, may benefit user experience more than content in which audio is used as an overlay to video, as might be the case for a narrated documentary or the like.
- As used herein, an “audio frame” is the data of an audio file (e.g., a .wav file or other audio file) that is rendered once per sample at an audio sampling rate, such as, for example, 44100 Hertz (“Hz”). An audio frame encompasses any sub-frames for each audio channel (e.g., left, right, center, etc.).
- As used herein, an “alignment” refers to the process of matching two distinct sequences of audio or video on a frame-by-frame basis (examples shown in
FIGS. 2A-2C ). For video sequences, the matching process is per video frame. For audio sequences, the matching process is based upon the audio data per video frame. For example, if video frame 1 in the distorted video sequence is determined to be equivalent to video frame 100 in the reference video sequence, an alignment is found at that point.
- The
reference AV content 102 and the distorted AV content 106 are provided to a multimedia framework 107. The multimedia framework 107 may be proprietary or open-source. By way of example, and not limitation, the multimedia framework 107 can be implemented via FFmpeg, which is a multimedia framework available under the GNU Lesser General Public License ("LGPL") version 2.1 or later and the GNU General Public License ("GPL") version 2 or later. Those skilled in the art will appreciate the applicability of other multimedia frameworks and other software that are capable of performing certain operations described herein. Accordingly, the multimedia framework 107 embodied as FFmpeg should not be construed as being limiting in any way. - The
multimedia framework 107 can provide functionality such as decoding, encoding, transcoding, multiplexing, demultiplexing, streaming, filtering, and playback. The multimedia framework 107 can include the AV content encoder(s) 104 as shown, although the AV content encoder(s) 104 may be separate from the multimedia framework 107. In the illustrated embodiment, the multimedia framework 107 streams the pre-encode AV content 102 and the post-encode AV content 106 and performs demultiplexing via demultiplexers 108A, 108B (shown as "demux 108A" and "demux 108B") to separate the audio and video sequences. In particular, the demux 108A can demultiplex the pre-encode AV content 102 to create a pre-encode video sequence (shown as "pre-encode video") 110 and a pre-encode audio sequence (shown as "pre-encode audio") 112. Likewise, the demux 108B can demultiplex the post-encode AV content 106 to create a post-encode video sequence (shown as "post-encode video") 114 and a post-encode audio sequence (shown as "post-encode audio") 116. The demux 108A can output the pre-encode video 110 to a first video pipe ("video pipe 1") 118A and the pre-encode audio 112 to a first audio pipe ("audio pipe 1") 120A. The demux 108B can output the post-encode video 114 to a second video pipe ("video pipe 2") 118B and the post-encode audio 116 to a second audio pipe ("audio pipe 2") 120B. - The
video pipes 118A, 118B provide the pre-encode video 110 and the post-encode video 114 to the NAVPAM 122. The NAVPAM 122 can analyze two simultaneous video image sequence captures of the same content, such as the pre-encode video 110 and the post-encode video 114. The NAVPAM 122 addresses the complex problem of divergent video content alignment. The output of the NAVPAM 122 is a video offset 124. The video offset 124 is equivalent to the temporal distance between aligned points in the pre-encode video 110 and the post-encode video 114. - The
audio pipes 120A, 120B provide the pre-encode audio 112 and the post-encode audio 116 to the AVSSM 126. The AVSSM 126 can generate an audio offset 128. The AVSSM 126 can compare the video offset 124 to the audio offset 128 to determine an AV synchronization evaluation result (shown as "result") 130. The result 130 indicates whether or not the post-encode AV content 106 (i.e., distorted) is correctly synchronized with the pre-encode AV content 102 (i.e., reference). Additional details in this regard will be described herein. - The
NAVPAM 122 and the AVSSM 126 can be separate software modules as illustrated or can be combined into one software module. The NAVPAM 122 and/or the AVSSM 126 can be implemented in hardware such as via a field-programmable gate array ("FPGA"). The operations performed by the NAVPAM 122 and the AVSSM 126 will be described separately, but it should be understood that these operations may be performed by the NAVPAM 122 and the AVSSM 126 simultaneously. - The illustrated
NAVPAM 122 includes a plurality of sub-modules that perform various functions. The NAVPAM 122 is illustrated in this manner for ease of explanation. In practice, the NAVPAM 122 can combine the functionality of the sub-modules. Alternatively, each of the plurality of sub-modules can be a standalone software module, the output of which can be used as input for the next sub-module, such as in the flow shown in the illustrated example. - The
NAVPAM 122 includes a thumbnail image generator 132 that receives the pre-encode video 110 and the post-encode video 114 via the respective video pipes 118A, 118B. The thumbnail image generator 132 converts the pre-encode video 110 and the post-encode video 114 into thumbnail images 134 that are suitable for processing. The thumbnail images 134 (also referred to as "binary" or "bi-tonal" images) are lower-resolution images with a color space that has been compressed from a 24-bit representation per pixel to a single bit. This results in an image that can be delineated with two colors, with the value of each pixel being either 1 or 0. The thumbnail image generator 132 also can discard any duplicate frames 136. The thumbnail image generator 132 can provide the thumbnail images 134 to a search range determiner 138. - The
search range determiner 138 can determine a plurality of search ranges 140 that can be used to perform an iterative search for a first alignment point between the pre-encode video 110 and the post-encode video 114. This determination can be based upon one or more search parameters that define the granularity of the search. Additional details about two search strategies will be described below. - The
NAVPAM 122 also includes a thumbnail image comparator 142. The thumbnail image comparator 142 compares pairs of the thumbnail images 134 and determines a distance value 144 for each comparison. The thumbnail image comparator 142 provides the distance values 144 to an alignment determiner 146. The alignment determiner 146 analyzes the distance values 144 to determine an alignment where the distance between the thumbnail images 134 is minimized and outputs the video offset 124. - The illustrated
AVSSM 126 includes a plurality of sub-modules that perform various functions. The AVSSM 126 is illustrated in this manner for ease of explanation. In practice, the AVSSM 126 can combine the functionality of the sub-modules. Alternatively, each of the plurality of sub-modules can be a standalone software module, the output of which can be used as input for the next sub-module, such as in the flow shown in the illustrated example. - The
AVSSM 126 receives the pre-encode audio 112 and the post-encode audio 116 via the audio pipes 120A, 120B. The AVSSM 126 includes an audio-video frame correlator 148 that divides the pre-encode audio 112 and the post-encode audio 116 into time slices 150, wherein each time slice 150 is associated with a corresponding video frame. The audio-video frame correlator 148 outputs the time slices 150 to an acoustic fingerprint generator 154. The acoustic fingerprint generator 154 generates acoustic fingerprints 156 from the time slices 150. In some embodiments, the acoustic fingerprint generator 154 utilizes open-source fingerprint extraction software, such as, for example, Chromaprint (available from the AcoustID project). Other software, including proprietary software, can also be used to generate the acoustic fingerprints 156. The acoustic fingerprints 156 can be used to quickly identify portions of the pre-encode audio 112 and the post-encode audio 116 during the time slices 150. - The
AVSSM 126 also includes a fingerprint matcher 158. The fingerprint matcher 158 can compare the acoustic fingerprints 156 between frames for similarity. In some embodiments, the fingerprint matcher 158 utilizes fuzzy string matching. Many programming languages have well-developed libraries that provide fuzzy string matching functionality. One such library is FuzzyWuzzy for Python. FuzzyWuzzy uses Levenshtein distance to correlate two fingerprints, such as the acoustic fingerprints 156, and return a score from 0-100 that represents how close the fingerprints are. In testing, the acoustic fingerprints 156 that score 90-100 are viable, and scores of 80 and below can yield inaccurate alignment results. Other fuzzy string matching software may require further tweaking to obtain accurate alignment results. Although FuzzyWuzzy is described herein, other solutions that provide fuzzy string matching can be implemented without departing from the scope of this disclosure. The output of the fingerprint matcher 158 is the audio offset 128 in video frame numbers. Since the video offset 124 and the audio offset 128 are both known and utilize video frame numbers as a common metric, the video offset 124 and the audio offset 128 can be compared to determine whether or not the post-encode (distorted) AV content 106 is correctly synchronized with the pre-encode (reference) AV content 102, which an alignment comparator 160 can output as the result 130. - Although the
NAVPAM 122, the AVSSM 126, and the various sub-modules thereof are described separately and sequentially, it should be understood that the operations performed by the NAVPAM 122 and the AVSSM 126 can be conducted in parallel and likely will be conducted in parallel in real-world implementations. Accordingly, any particular order used to describe the operations performed by the NAVPAM 122 and the AVSSM 126 should not be construed as being limiting in any way. Moreover, additional details of the operations performed by the NAVPAM 122 and the AVSSM 126 will become apparent to those skilled in the art from the description of the remaining FIGURES. - Turning now to
FIG. 2A, a synchronization diagram 200A illustrating example sequences of the pre-encode AV (reference) content 102 and the post-encode (distorted) AV content 106 in synchronization will be described, according to an illustrative embodiment. The pre-encode video 110 includes multiple reference video frames 202A, 202B (collectively "reference video frames 202"). Although only two reference video frames 202A, 202B are illustrated, the pre-encode video 110 can contain any number of reference video frames 202. The pre-encode audio 112 includes multiple reference audio frames 204A, 204B (collectively "reference audio frames 204"). Although only two reference audio frames 204A, 204B are illustrated, the pre-encode audio 112 can contain any number of reference audio frames 204. The post-encode video 114 includes multiple distorted video frames 206A, 206B (collectively "distorted video frames 206"). Although only two distorted video frames 206A, 206B are illustrated, the post-encode video 114 can contain any number of distorted video frames 206. The post-encode audio 116 includes multiple distorted audio frames 208A, 208B (collectively "distorted audio frames 208"). Although only two distorted audio frames 208A, 208B are illustrated, the post-encode audio 116 can contain any number of distorted audio frames 208. - The synchronization diagram 200A also illustrates an alignment between the
reference video frame 202B and the distorted video frame 206B with a video offset 124 of 100 milliseconds ("ms"). The synchronization diagram 200A also illustrates an alignment between the reference audio frame 204B and the distorted audio frame 208B with an audio offset 128 of 100 ms. When the video offset 124 between the pre-encode video 110 and the post-encode video 114 is the same as the audio offset 128 between the pre-encode audio 112 and the post-encode audio 116, the audio playing at each of the pre-encode video frames 202 is the same audio playing at the corresponding distorted video frames 206. Even if the alignment is not perfect, so long as the difference between the video offset 124 and the audio offset 128 is small enough to be virtually imperceptible to users, the pre-encode AV content 102 and the post-encode AV content 106 can be considered synchronized as shown in the synchronization diagram 200A. - Desynchronization is defined herein as when the audio offset 128 is not equal to the video offset 124. If the difference between the audio offset 128 and the video offset 124 is sufficiently large, the desynchronization can be perceptible to users and can be verified by observing lip sync issues during playback. If the audio offset 128 is less than the video offset 124, then the audio is played before it is supposed to be (i.e., the audio is ahead of the video). If the audio offset 128 is more than the video offset 124, then the audio is played after it is supposed to be (i.e., the audio is behind the video). Both of these scenarios can be identified as desynchronization. For example, if a pre-encode capture is 100 ms ahead of a post-encode capture, then the audio for the pre-encode capture should be 100 ms ahead of the post-encode capture, and the video for the pre-encode capture should be 100 ms ahead of the post-encode capture.
If these two offsets are different, such as, for example, if the pre-encode video is 100 ms ahead but the pre-encode audio is 125 ms ahead, then the audio and video have desynchronized by 25 ms.
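The offset comparison described above can be sketched as follows. The function name and return values are illustrative assumptions, not part of the disclosure; with a video offset of 100 ms and an audio offset of 125 ms, as in the example, the result is a 25 ms desynchronization with the audio behind the video:

```python
def evaluate_sync(video_offset_ms, audio_offset_ms):
    """Compare the video and audio offsets between the reference and the
    distorted captures. Equal offsets mean the streams are synchronized;
    a smaller audio offset means the audio plays ahead of (before) the
    video, and a larger audio offset means it plays behind (after)."""
    delta_ms = audio_offset_ms - video_offset_ms
    if delta_ms == 0:
        return "synchronized", 0
    state = "audio ahead of video" if delta_ms < 0 else "audio behind video"
    return state, abs(delta_ms)
```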
- It should be understood that the International Telecommunication Union ("ITU") has provided a recommendation for the relative timing of sound and vision for broadcasting (ITU-R BT.1359-1). What constitutes a perceptible time difference between the audio offset 128 and the video offset 124 can be determined based upon the foregoing ITU recommendation.
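As a sketch of how such a recommendation could be applied, a perceptibility check might treat audio that leads the video more strictly than audio that lags it, since leading audio is generally more noticeable. The threshold values below are illustrative assumptions only; consult ITU-R BT.1359-1 for the actual recommended limits:

```python
def is_desync_perceptible(audio_minus_video_ms,
                          lead_threshold_ms=45, lag_threshold_ms=125):
    """Return True if an audio/video timing error is likely perceptible.
    Negative input means the audio leads (plays before) the video.
    The default thresholds are assumptions for illustration, not the
    normative figures from ITU-R BT.1359-1."""
    if audio_minus_video_ms < 0:
        return -audio_minus_video_ms > lead_threshold_ms
    return audio_minus_video_ms > lag_threshold_ms
```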
- Turning now to
FIG. 2B, a desynchronization diagram 200B illustrating example sequences of the pre-encode AV (reference) content 102 and the post-encode (distorted) AV content 106 in desynchronization will be described, according to an illustrative embodiment. The example shown in FIG. 2B illustrates when the audio is played before it is supposed to be, which can occur when the audio offset 128 between the post-encode audio 116 and the pre-encode audio 112 is smaller than the video offset 124 between the post-encode video 114 and the pre-encode video 110. - Turning now to
FIG. 2C, another desynchronization diagram 200C illustrating example sequences of the pre-encode AV (reference) content 102 and the post-encode (distorted) AV content 106 in desynchronization will be described, according to an illustrative embodiment. The example shown in FIG. 2C illustrates when the audio is played after it is supposed to be, which can occur when the audio offset 128 between the post-encode audio 116 and the pre-encode audio 112 is larger than the video offset 124 between the post-encode video 114 and the pre-encode video 110. - Conventionally, video alignment is reconciled in terms of video frames instead of elapsed time (e.g., minutes, seconds, milliseconds, etc.). Video frames function as a proxy for time or temporal displacement and can be converted back and forth to elapsed time by dividing video frames by the video frame rate (assumed constant) of the capture. Similarly, audio data can be reconciled in a congruent manner, but instead of the video frame rate, the audio frame rate is the sampling rate. The number of audio frames per second in an audio file sampled at 48 kilohertz ("kHz") is 48,000. If the corresponding video is captured at 60 frames per second, then it can be determined that approximately every 800 frames of audio would correspond to one frame of video (i.e., 48000/60=800). Turning to
FIG. 3, a diagram 300 illustrates this concept. In particular, FIG. 3 illustrates video frames 302 from video frame 1 to video frame 60 and corresponding audio frames 304 from 800 to 48,000 in increments of 800. - For each video frame in the
post-encode video 114, an index of the audio frame ("ax") can be extrapolated for video frame ("N") as a function of the video frame rate ("vFPS") and the audio sampling rate ("aSR"):

    ax = N × (aSR / vFPS)

- Where the amount of audio frames ("aL") is equal to the fingerprint size ("vFP") in video frames:

    aL = vFP × (aSR / vFPS)

- Thus, for example, if it is desired to obtain audio data at video frame 100, where the video frame rate is 59.97 FPS, the audio sampling rate is 48000 Hz, and the size of the fingerprint is 164 video frames, then:

    ax = 100 × (48000 / 59.97) ≈ 80040.02
    aL = 164 × (48000 / 59.97) ≈ 131265.63

- The fractional portion of these values can be discarded because fractional values are not valid indices. Thus, starting at audio frame 80040, 131265 audio frames are retrieved for the fingerprint. - Fuzzy string matching can be used to compare fingerprint data. Many programming languages have well-developed libraries that provide fuzzy string matching functionality. One such library is FuzzyWuzzy for Python. FuzzyWuzzy uses Levenshtein distance to correlate two fingerprints and return a score from 0-100 that represents how close the fingerprints are. In testing, fingerprints that score 90-100 are viable, and scores of 80 and below can yield inaccurate alignment results. Other fuzzy string matching software may require further tweaking to obtain accurate alignment results. Although FuzzyWuzzy is described herein, other solutions that provide fuzzy string matching can be implemented without departing from the scope of this disclosure. - Because the present disclosure aligns the post-encode AV content 106 and the pre-encode AV content 102 based on video frame number, the granularity of a search through the audio frame data is equal to the audio sampling rate divided by the video frame rate. In the previous example, at a sampling rate of 48 kHz and a video frame rate of 60 FPS, the granularity of the search through the audio data was 800 audio frames wide. There are two search strategies to find the first alignment point: iterative search and hashing fingerprints. Iterative search will now be described. - Turning now to
FIGS. 4A and 4B, diagrams 400A, 400B illustrating an example of a micro-iteration iterative search will be described, according to an illustrative embodiment. As used herein, "micro-iteration" and "macro-iteration" are used to differentiate iterative searches based on different step sizes in terms of frames. Performing an exhaustive search is slow, but it has the advantage of being simple to implement and also suitable, if not ideal, for parallelization. In the example shown in FIG. 4A, the diagram 400A illustrates a distorted audio sequence 402, such as the post-encode audio 116. In the example shown in FIG. 4B, the diagram 400B illustrates a reference audio sequence 404. The distorted audio sequence 402 and the reference audio sequence 404 have a fingerprint size of 164. - Due to the computational complexity upper-bound in the worst case, the search range in this example is limited to a max_search_range of 800 video frames (i.e., 13.3 seconds at 60 FPS) for the following example:
    Iteration    Distorted Range    Reference Range
    1            1-164              1-164
    2            1-164              2-165
    3            1-164              3-166
    4            1-164              4-167
    5            1-164              5-168
    ...          ...                ...
    800          1-164              800-964
    801          2-165              1-164
    802          2-165              2-165
    803          2-165              3-166

- Fingerprints can then be generated for the two ranges being compared, and the correlation score can be compared. If a match is found, the search can end short of reaching the max_search_range. Micro-iteration can be terminated and the process can proceed to macro-iteration. After macro-iteration, the process can be repeated to find the next alignment point past the ranges through which macro-iteration occurred. Since iterating over the audio sequence tends to retrieve the same set of audio data from the reference sequence (i.e.,
iterations 1 and 801 access the same reference data), caching this data can improve performance. - In the worst case, both audio sequences will be 100% different, and the entire reference sequence would have to be iteratively searched. This could potentially be rectified by generating and comparing two fingerprints for the entirety of each sequence, and using the correlation score to determine whether it would be worth attempting to align the audio sequences.
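The micro-iteration search and the caching described above can be sketched as follows. This is an illustrative implementation, not the disclosed code: `fingerprint` and `score` stand in for the acoustic fingerprint generator and a FuzzyWuzzy-style 0-100 correlation, and all parameter names are assumptions. Reference-side fingerprints are cached because different distorted positions revisit the same reference ranges:

```python
def find_first_alignment(distorted, reference, fingerprint, score,
                         fp_size=164, max_search_range=800, threshold=95):
    """Exhaustively compare fingerprint-sized windows of the distorted and
    reference sequences until a pair correlates at or above the threshold.
    Returns (distorted_start, reference_start), 0-based, or None."""
    ref_cache = {}  # reference fingerprints are reused across iterations

    def reference_fp(start):
        if start not in ref_cache:
            ref_cache[start] = fingerprint(reference[start:start + fp_size])
        return ref_cache[start]

    for d_start in range(max_search_range):
        d_fp = fingerprint(distorted[d_start:d_start + fp_size])
        for r_start in range(max_search_range):
            if score(d_fp, reference_fp(r_start)) >= threshold:
                return d_start, r_start
    return None
```

In the worst case this performs max_search_range squared comparisons, which is why the text suggests a whole-sequence correlation pre-check before attempting alignment at all.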
- After finding the first alignment point, the iterative search can continue by stepping through the distorted and reference audio sequences, where each step size is the number of video frames that is set for the fingerprint size. Here, the assumption is that because the playback rates for the distorted and reference audio data are the same and no stalls are introduced into the audio data, once an offset is established through finding the first alignment, the remainder of the sequences should be aligned. This greatly reduces the search space by a factor equal to the fingerprint size. However, while macro-iterating, when two supposedly aligned samples do not satisfy the correlation threshold, macro-iteration can be stopped and micro-iteration can begin at the last known sample position above the correlation threshold (e.g., 95). In the example shown in
FIG. 5, if a first alignment point is found for distorted frame 1 at reference frame 100 with a correlation score of 100, then macro-iteration in steps of the fingerprint size (i.e., 164) would commence:
    Macro-Iteration    Distorted Range    Reference Range    Correlation
    1                  1-164              100-264            100
    2                  164-328            264-428            100
    3*                 328-492            428-592            95
    4                  492-656            592-756            80
Because the correlation in iteration 4 is beneath the correlation threshold of 95, the process returns to the previous iteration, iteration 3, and proceeds to check frame-by-frame for the last range where the correlation between fingerprints meets the correlation threshold. For example:
    Micro-Iteration    Distorted Range    Reference Range    Correlation
    1                  329-493            429-593            98
    2                  330-494            430-594            96
    3                  331-495            431-595            95
    ...                ...                ...                <95
    163                491-655            591-755            81
In the example above, there are no correlation scores of 95 or greater from micro-iterations 4-163. Since the last sufficient correlation found is at micro-iteration 3, the macro-iteration process can be concluded. The process can then return to finding the next alignment point for the next sample (i.e., distorted range 332-496) in the reference sequence where the correlation score is greater than or equal to the correlation threshold. - Turning now to
FIG. 6, a method 600 for determining whether pre-encode and post-encode media streams, such as the pre-encode AV content 102 and the post-encode AV content 106, are synchronized will be described, according to an illustrative embodiment. It should be understood that the operations of the methods disclosed herein are not necessarily presented in any particular order and that performance of some or all of the operations in an alternative order(s) is possible and is contemplated. The operations have been presented in the demonstrated order for ease of description and illustration. Operations may be added, omitted, and/or performed simultaneously, without departing from the scope of the concepts and technologies disclosed herein.
- Thus, it should be appreciated that the logical operations described herein are implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance and other requirements of the computing system. Accordingly, the logical operations described herein are referred to variously as states, operations, structural devices, acts, or modules. These states, operations, structural devices, acts, and modules may be implemented in software, in firmware, in special purpose digital logic, and any combination thereof. As used herein, the phrase “cause a processor to perform operations” and variants thereof is used to refer to causing a processor or multiple processors of one or more systems and/or one or more devices disclosed herein to perform one or more operations and/or causing the processor to direct other components of the computing system or device to perform one or more of the operations.
- The
method 600 begins and proceeds to operation 602. At operation 602, the AV synchronization system 100 simultaneously captures samples of a pre-encode media stream (e.g., the pre-encode AV content 102) and a post-encode media stream (e.g., the post-encode AV content 106). From operation 602, the method 600 proceeds to operation 604. At operation 604, the AV synchronization system 100, via execution of the NAVPAM 122, aligns a pre-encode video component of the pre-encode media stream (e.g., the pre-encode video 110) with a post-encode video component of the post-encode media stream (e.g., the post-encode video 114). From operation 604, the method 600 proceeds to operation 606. At operation 606, the AV synchronization system 100, via execution of the NAVPAM 122, determines a video offset (e.g., the video offset 124) between the pre-encode video component (e.g., the pre-encode video 110) and the post-encode video component (e.g., the post-encode video 114). - From
operation 606, the method 600 proceeds to operation 608. At operation 608, the AV synchronization system 100, via execution of the AVSSM 126, aligns a pre-encode audio component of the pre-encode media stream (e.g., the pre-encode audio 112) with a post-encode audio component of the post-encode media stream (e.g., the post-encode audio 116). From operation 608, the method 600 proceeds to operation 610. At operation 610, the AV synchronization system 100, via execution of the AVSSM 126, determines an audio offset (e.g., the audio offset 128) between the pre-encode audio component (e.g., the pre-encode audio 112) and the post-encode audio component (e.g., the post-encode audio 116). - From
operation 610, the method 600 proceeds to operation 612. At operation 612, the AV synchronization system 100, via execution of the AVSSM 126, compares the video offset 124 and the audio offset 128. From operation 612, the method 600 proceeds to operation 614. At operation 614, the AV synchronization system 100, via execution of the AVSSM 126, determines, based upon the comparison at operation 612, whether the video offset 124 and the audio offset 128 are equal. If the AV synchronization system 100 determines that the video offset 124 and the audio offset 128 are equal, the method 600 proceeds to operation 616. At operation 616, the AV synchronization system 100 determines that the pre-encode media stream (e.g., the pre-encode AV content 102) and the post-encode media stream (e.g., the post-encode AV content 106) are synchronized. From operation 616, the method 600 proceeds to operation 618. The method 600 can end at operation 618. Returning to operation 614, if the AV synchronization system 100 determines that the video offset 124 and the audio offset 128 are not equal, the method 600 proceeds to operation 620. - At
operation 620, the AV synchronization system 100 determines whether the audio offset 128 is less than (<) or greater than (>) the video offset 124. If the audio offset 128 is less than the video offset 124, the method 600 proceeds to operation 622. At operation 622, the AV synchronization system 100 determines that the post-encode audio component (e.g., the post-encode audio 116) is ahead of the post-encode video component (e.g., the post-encode video 114), which is representative of the pre-encode AV content 102 and the post-encode AV content 106 being desynchronized with the audio being played before the video. From operation 622, the method 600 proceeds to operation 618. The method 600 can end at operation 618. Returning to operation 620, if the audio offset 128 is greater than the video offset 124, the method 600 proceeds to operation 624. At operation 624, the AV synchronization system 100 determines that the post-encode audio component (e.g., the post-encode audio 116) is behind the post-encode video component (e.g., the post-encode video 114), which is representative of the pre-encode AV content 102 and the post-encode AV content 106 being desynchronized with the audio being played after the video. The method 600 then proceeds to operation 618. The method 600 can end at operation 618. - Turning now to
FIG. 7, a method 700 for determining whether pre-encode and post-encode media streams, such as the pre-encode AV content 102 and the post-encode AV content 106, are synchronized will be described, according to an illustrative embodiment. The method 700 begins and proceeds to operation 702. At operation 702, the AV synchronization system 100 receives the pre-encode AV content 102. From operation 702, the method 700 proceeds to operation 704. At operation 704, the AV synchronization system 100 obtains the post-encode AV content 106. From operation 704, the method 700 proceeds to operation 706. At operation 706, the AV synchronization system 100 uses the multimedia framework 107 to perform an integrity check for the pre-encode AV content 102. From operation 706, the method 700 proceeds to operation 708. At operation 708, the AV synchronization system 100 uses the multimedia framework 107 to perform an integrity check for the post-encode AV content 106. The remaining operations of the method 700 assume that both the pre-encode AV content 102 and the post-encode AV content 106 pass the integrity checks at operations 706 and 708. If the integrity check of either the pre-encode AV content 102 or the post-encode AV content 106 fails, the multimedia framework 107 can output one or more errors. - From
operation 708, the method 700 proceeds to operation 710. At operation 710, the AV synchronization system 100 configures the multimedia framework 107. As noted above, the multimedia framework 107 can provide a broad range of functionality to process multimedia content such as the pre-encode AV content 102 and the post-encode AV content 106. For the purposes of the concepts and technologies disclosed herein, and operation 710 specifically, the AV synchronization system 100 can be configured to create the demultiplexers 108A, 108B. - From
operation 710, the method 700 proceeds to operation 712. At operation 712, the AV synchronization system 100 uses the multimedia framework 107, and in particular the demultiplexer 108A, to demultiplex the pre-encode AV content 102 into the pre-encode video 110 and pre-encode audio 112 streams. From operation 712, the method 700 proceeds to operation 714. At operation 714, the AV synchronization system 100 uses the multimedia framework 107, and in particular the demultiplexer 108B, to demultiplex the post-encode AV content 106 into post-encode video 114 and post-encode audio 116 components (streams). - From
operation 714, the method 700 proceeds to operation 716. At operation 716, the AV synchronization system 100 creates the first video pipe 118A and outputs the pre-encode video 110 on the first video pipe 118A towards the NAVPAM 122. From operation 716, the method 700 proceeds to operation 718. At operation 718, the AV synchronization system 100 creates the first audio pipe 120A and outputs the pre-encode audio 112 on the first audio pipe 120A towards the AVSSM 126. From operation 718, the method 700 proceeds to operation 720. At operation 720, the AV synchronization system 100 creates the second video pipe 118B and outputs the post-encode video 114 on the second video pipe 118B towards the NAVPAM 122. From operation 720, the method 700 proceeds to operation 722. At operation 722, the AV synchronization system 100 creates the second audio pipe 120B and outputs the post-encode audio 116 on the second audio pipe 120B towards the AVSSM 126. - From
operation 722, the method 700 proceeds to operation 724. At operation 724, the NAVPAM 122 receives the pre-encode video 110 and the post-encode video 114. From operation 724, the method 700 proceeds to operation 726. At operation 726, the NAVPAM 122 generates the thumbnail images 134. From operation 726, the method 700 proceeds to operation 728. At operation 728, the NAVPAM 122 determines the search ranges 140. From operation 728, the method 700 proceeds to operation 730. At operation 730, the NAVPAM 122 calculates the distance values 144 between the thumbnail images 134. From operation 730, the method 700 proceeds to operation 732. At operation 732, the NAVPAM 122 determines a best-fit alignment where the distance between the thumbnail images 134 is minimized. From operation 732, the method 700 proceeds to operation 734. At operation 734, the NAVPAM 122 outputs the video offset 124 to the AVSSM 126. - From
operation 734, the method 700 proceeds to operation 736. At operation 736, the AVSSM 126 receives the pre-encode audio 112 and the post-encode audio 116. From operation 736, the method 700 proceeds to operation 738. At operation 738, the AVSSM 126 correlates the audio and video frames. In particular, the AVSSM 126 divides the pre-encode audio 112 and the post-encode audio 116 into time slices 150, wherein each time slice 150 is associated with a corresponding video frame. - From
operation 738, the method 700 proceeds to operation 740. At operation 740, the AVSSM 126 generates the acoustic fingerprints 156. From operation 740, the method 700 proceeds to operation 742. At operation 742, the AVSSM 126 compares the acoustic fingerprints 156 between frames for similarity and determines the audio offset 128. From operation 742, the method 700 proceeds to operation 744. At operation 744, the AVSSM 126 compares the video offset 124 and the audio offset 128 to determine whether or not the post-encode (distorted) AV content 106 is correctly synchronized with the pre-encode (reference) AV content 102. From operation 744, the method 700 proceeds to operation 746. At operation 746, the AVSSM 126 provides the result 130 of the comparison at operation 744. The result 130 indicates whether or not the post-encode (distorted) AV content 106 is correctly synchronized with the pre-encode (reference) AV content 102. - From
operation 746, the method 700 proceeds to operation 748. The method 700 can end at operation 748. - Turning now to
FIG. 8, a block diagram illustrating a computer system 800 configured to provide the functionality described herein for AV synchronization in accordance with various embodiments of the concepts and technologies disclosed herein will be described. In some embodiments, the AV synchronization system 100 is configured the same as or similar to the computer system 800. The computer system 800 includes a processing unit 802, a memory 804, one or more user interface devices 806, one or more input/output (“I/O”) devices 808, and one or more network devices 810, each of which is operatively connected to a system bus 812. The bus 812 enables bi-directional communication between the processing unit 802, the memory 804, the user interface devices 806, the I/O devices 808, and the network devices 810. - The
processing unit 802 may be a standard central processor that performs arithmetic and logical operations, a more specific purpose programmable logic controller (“PLC”), a programmable gate array, or another type of processor known to those skilled in the art and suitable for controlling the operation of the computer system 800. The processing unit 802 can be a single processing unit or a multiple processing unit that includes more than one processing component. Processing units are generally known, and therefore are not described in further detail herein. - The
memory 804 communicates with the processing unit 802 via the system bus 812. The memory 804 can include a single memory component or multiple memory components. In some embodiments, the memory 804 is operatively connected to a memory controller (not shown) that enables communication with the processing unit 802 via the system bus 812. The memory 804 includes an operating system 814 and one or more program modules 816. The operating system 814 can include, but is not limited to, members of the WINDOWS, WINDOWS CE, and/or WINDOWS MOBILE families of operating systems from MICROSOFT CORPORATION, the LINUX family of operating systems, the SYMBIAN family of operating systems from SYMBIAN LIMITED, the BREW family of operating systems from QUALCOMM CORPORATION, the MAC OS, iOS, and/or LEOPARD families of operating systems from APPLE CORPORATION, the FREEBSD family of operating systems, the SOLARIS family of operating systems from ORACLE CORPORATION, other operating systems, and the like. - The
program modules 816 may include various software and/or program modules described herein. In some embodiments, for example, the program modules 816 can include the multimedia framework 107, the AV content encoder(s) 104, the NAVPAM 122, the AVSSM 126, or a combination thereof. In some embodiments, multiple implementations of the computer system 800 can be used, wherein each implementation is configured to execute one or more of the program modules 816. The program modules 816 and/or other programs can be embodied in computer-readable media containing instructions that, when executed by the processing unit 802, perform the methods described herein. The program modules 816 may be embodied in hardware, software, firmware, or any combination thereof. Although not shown in FIG. 8, it should be understood that the memory 804 also can be configured to store the pre-encode AV content 102, the post-encode AV content 106, the pre-encode video 110, the pre-encode audio 112, the post-encode video 114, the post-encode audio 116, the thumbnail images 134, the distance values 144, the search ranges 140, the video offset 124, the audio offset 128, the time slices 150, the acoustic fingerprints 156, the result 130, combinations thereof, and/or other data disclosed herein. - By way of example, and not limitation, computer-readable media may include any available computer storage media or communication media that can be accessed by the
computer system 800. Communication media includes computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media. - Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the
computer system 800. In the claims, the phrases “computer storage medium,” “computer-readable storage medium,” and variations thereof do not include waves or signals per se and/or communication media, and therefore should be construed as being directed to “non-transitory” media only. - The user interface devices 806 may include one or more devices with which a user accesses the
computer system 800. The user interface devices 806 may include, but are not limited to, computers, servers, personal digital assistants, cellular phones, or any suitable computing devices. The I/O devices 808 enable a user to interface with the program modules 816. In one embodiment, the I/O devices 808 are operatively connected to an I/O controller (not shown) that enables communication with the processing unit 802 via the system bus 812. The I/O devices 808 may include one or more input devices, such as, but not limited to, a keyboard, a mouse, or an electronic stylus. Further, the I/O devices 808 may include one or more output devices, such as, but not limited to, a display screen or a printer. - The
network devices 810 enable the computer system 800 to communicate with other networks or remote systems via a network 818. Examples of the network devices 810 include, but are not limited to, a modem, a radio frequency (“RF”) or infrared (“IR”) transceiver, a telephonic interface, a bridge, a router, or a network card. The network 818 may include a wireless network such as, but not limited to, a Wireless Local Area Network (“WLAN”) such as a WI-FI network, a Wireless Wide Area Network (“WWAN”), a Wireless Personal Area Network (“WPAN”) such as BLUETOOTH, a Wireless Metropolitan Area Network (“WMAN”) such as a WiMAX network, or a cellular network. Alternatively, the network 818 may be a wired network such as, but not limited to, a Wide Area Network (“WAN”) such as the Internet, a Local Area Network (“LAN”) such as the Ethernet, a wired Personal Area Network (“PAN”), or a wired Metropolitan Area Network (“MAN”). - Turning now to
FIG. 9, a block diagram illustrating an exemplary containerized cloud architecture 900 capable of implementing, at least in part, aspects of the concepts and technologies disclosed herein will be described, according to an illustrative embodiment. In some embodiments, the AV synchronization system 100, at least in part, is implemented in the containerized cloud architecture 900. In these embodiments, multiple instances of the AV synchronization system 100 can be deployed and executed simultaneously. Each instance of the AV synchronization system 100 can be used to determine a result 130 from different pre-encode AV content 102 and post-encode AV content 106. - The illustrated containerized
cloud architecture 900 includes a first host (“host”) 902A and a second host (“host”) 902B (at times referred to herein collectively as hosts 902 or individually as host 902) that can communicate via an overlay network 904. Although two hosts 902 are shown, the containerized cloud architecture 900 can support any number of hosts 902. The overlay network 904 can enable communication among hosts 902 in the same cloud network or hosts 902 across different cloud networks. Moreover, the overlay network 904 can enable communication among hosts 902 owned and/or operated by the same or different entities. - The illustrated host 902A includes a
host hardware 1 906A, a host operating system 1 908A, a DOCKER engine 1 910A, a bridge network 1 912A, container A-1 through container N-1 914A1-914N1, and microservice A-1 through microservice N-1 916A1-916N1. Similarly, the illustrated host 2 902B includes a host hardware 2 906B, a host operating system 2 908B, a DOCKER engine 2 910B, a bridge network 2 912B, container A-2 through container N-2 914A2-914N2, and microservice A-2 through microservice N-2 916A2-916N2. - The
host hardware 1 906A and the host hardware 2 906B (at times referred to herein collectively or individually as host hardware 906) can be implemented as bare metal hardware such as one or more physical servers. The host hardware 906 alternatively can be implemented using hardware virtualization. In some embodiments, the host hardware 906 can include compute resources, memory resources, and other hardware resources. These resources can be virtualized according to known virtualization techniques. A virtualized cloud architecture 1000 is described herein with reference to FIG. 10. Although the containerized cloud architecture 900 and the virtualized cloud architecture 1000 are described separately, these architectures can be combined to provide a hybrid containerized/virtualized cloud architecture. Those skilled in the art will appreciate that the disclosed cloud architectures are simplified for ease of explanation and can be altered as needed for any given implementation without departing from the scope of the concepts and technologies disclosed herein. As such, the containerized cloud architecture 900 and the virtualized cloud architecture 1000 should not be construed as being limiting in any way. - Compute resources can include one or more hardware components that perform computations to process data and/or to execute computer-executable instructions. For example, the compute resources can execute instructions of the
host operating system 1 908A and the host operating system 2 908B (at times referred to herein collectively as host operating systems 908 or individually as host operating system 908), the containers 914A1-914N1 and the containers 914A2-914N2 (at times referred to herein collectively as containers 914 or individually as container 914), and the microservices 916A1-916N1 and the microservices 916A2-916N2 (at times referred to herein collectively as microservices 916 or individually as microservice 916). - The compute resources of the host hardware 906 can include one or more central processing units (“CPUs”) configured with one or more processing cores. The compute resources can include one or more graphics processing units (“GPUs”) configured to accelerate operations performed by one or more CPUs, and/or to perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, operating systems, and/or other software that may or may not include instructions particular to graphics computations. In some embodiments, the compute resources can include one or more discrete GPUs. In some other embodiments, the compute resources can include CPU and GPU components that are configured in accordance with a co-processing CPU/GPU computing model, wherein the sequential part of an application executes on the CPU and the computationally-intensive part is accelerated by the GPU. The compute resources can include one or more system-on-chip (“SoC”) components along with one or more other components, including, for example, one or more memory resources, and/or one or more other resources.
In some embodiments, the compute resources can be or can include one or more SNAPDRAGON SoCs, available from QUALCOMM; one or more TEGRA SoCs, available from NVIDIA; one or more HUMMINGBIRD SoCs, available from SAMSUNG; one or more Open Multimedia Application Platform (“OMAP”) SoCs, available from TEXAS INSTRUMENTS; one or more customized versions of any of the above SoCs; and/or one or more proprietary SoCs. The compute resources can be or can include one or more hardware components architected in accordance with an advanced reduced instruction set computing (“RISC”) machine (“ARM”) architecture, available for license from ARM HOLDINGS. Alternatively, the compute resources can be or can include one or more hardware components architected in accordance with an x86 architecture, such an architecture available from INTEL CORPORATION, and others. Those skilled in the art will appreciate the implementation of the compute resources can utilize various computation architectures, and as such, the compute resources should not be construed as being limited to any particular computation architecture or combination of computation architectures, including those explicitly disclosed herein.
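The per-offset distance calculations that the NAVPAM 122 performs over the thumbnail images 134 (operations 726-732 of the method 700, described above) are an example of the computationally-intensive work such compute resources handle. The following is only a simplified sketch of that best-fit search, assuming frame-granular offsets and thumbnails flattened to lists of grayscale values; the function and variable names are illustrative, not taken from the disclosure:

```python
def thumbnail_distance(a, b):
    """Mean absolute difference between two equal-size thumbnails,
    each flattened to a list of grayscale pixel values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def best_fit_offset(ref_thumbs, dist_thumbs, search_range):
    """Slide the distorted thumbnails across the reference thumbnails
    over every candidate frame offset in `search_range` and return the
    offset whose average thumbnail distance is smallest, i.e. the
    best-fit alignment where the distance is minimized."""
    best_offset, best_score = None, float("inf")
    for offset in search_range:
        pairs = [(ref_thumbs[i], dist_thumbs[i + offset])
                 for i in range(len(ref_thumbs))
                 if 0 <= i + offset < len(dist_thumbs)]
        if not pairs:
            continue
        score = sum(thumbnail_distance(r, d) for r, d in pairs) / len(pairs)
        if score < best_score:
            best_offset, best_score = offset, score
    return best_offset

# Toy streams: the distorted video lags the reference by two frames.
ref = [[i] * 4 for i in range(10)]      # ten tiny 2x2 "thumbnails"
dist = [[0] * 4, [0] * 4] + ref         # same frames, shifted by 2
print(best_fit_offset(ref, dist, range(-5, 6)))  # → 2
```

The returned offset plays the role of the video offset 124; restricting candidates to a search range mirrors the search ranges 140, which keep the comparison from scanning the full length of both streams.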
- The memory resources of the host hardware 906 can include one or more hardware components that perform storage operations, including temporary or permanent storage operations. In some embodiments, the memory resource(s) include volatile and/or non-volatile memory implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data disclosed herein. Computer storage media includes, but is not limited to, random access memory (“RAM”), read-only memory (“ROM”), Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store data and which can be accessed by the compute resources.
- The other resource(s) of the host hardware 906 can include any other hardware resources that can be utilized by the compute resource(s) and/or the memory resource(s) to perform operations described herein. The other resource(s) can include one or more input and/or output processors (e.g., network interface controller or wireless radio), one or more modems, one or more codec chipsets, one or more pipeline processors, one or more fast Fourier transform (“FFT”) processors, one or more digital signal processors (“DSPs”), one or more speech synthesizers, and/or the like.
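FFT processors and DSPs of this kind are what the AVSSM 126's fingerprint generation and matching (operations 738-744 of the method 700, described above) can exploit: each time slice 150 is reduced to a compact fingerprint, the fingerprints are matched across the two streams to find the audio offset 128, and that offset is compared against the video offset 124. The sketch below substitutes an exact hash for the acoustic fingerprints 156 purely to keep the example short; a real system would use a spectral, perceptual fingerprint that survives re-encoding. All names are illustrative assumptions:

```python
import hashlib

def fingerprint(slice_samples):
    """Stand-in acoustic fingerprint: a stable hash of one time slice.
    A deployed system would compute an FFT-based perceptual fingerprint
    robust to encoding artifacts; a hash only demonstrates the matching
    logic on bit-identical audio."""
    return hashlib.sha256(bytes(slice_samples)).hexdigest()

def audio_offset(ref_slices, dist_slices):
    """Return the slice offset at which the distorted audio's
    fingerprints best line up with the reference audio's fingerprints
    (the offset producing the most matching slice pairs)."""
    ref_fp = [fingerprint(s) for s in ref_slices]
    dist_fp = [fingerprint(s) for s in dist_slices]

    def matches(offset):
        return sum(1 for i, fp in enumerate(ref_fp)
                   if 0 <= i + offset < len(dist_fp)
                   and dist_fp[i + offset] == fp)

    return max(range(-len(dist_fp) + 1, len(ref_fp)), key=matches)

def is_synchronized(video_offset, audio_offset, tolerance_frames=0):
    """AV sync check: the streams are in sync when audio and video were
    shifted by the same number of frames (within a tolerance)."""
    return abs(video_offset - audio_offset) <= tolerance_frames

# Toy data: the distorted audio lags the reference by one time slice.
ref = [[1, 2], [3, 4], [5, 6], [7, 8]]
dist = [[0, 0]] + ref
off = audio_offset(ref, dist)
print(off, is_synchronized(1, off))  # → 1 True
```

Because each time slice 150 corresponds to one video frame, the audio offset comes out in frame units and can be compared directly against the video offset to produce a pass/fail result like the result 130.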
- The host operating systems 908 can be proprietary, open source, or closed source. In some embodiments, the host operating systems 908 can be or can include one or more container operating systems designed specifically to host containers such as the containers 914. For example, the host operating systems 908 can be or can include FEDORA COREOS (available from RED HAT, INC), RANCHEROS (available from RANCHER), and/or BOTTLEROCKET (available from Amazon Web Services). In some embodiments, the host operating systems 908 can be or can include one or more members of the WINDOWS family of operating systems from MICROSOFT CORPORATION (e.g., WINDOWS SERVER), the LINUX family of operating systems (e.g., CENTOS, DEBIAN, FEDORA, ORACLE LINUX, RHEL, SUSE, and UBUNTU), the SOLARIS family of operating systems from ORACLE CORPORATION, other operating systems, and the like.
- The
containerized cloud architecture 900 can be implemented utilizing any containerization technologies. Presently, open-source container technologies, such as those available from DOCKER, INC., are the most widely used, and it appears they will continue to be for the foreseeable future. For this reason, the containerized cloud architecture 900 is described herein using DOCKER container technologies available from DOCKER, INC., such as the DOCKER engines 910. Those skilled in the art will appreciate that other container technologies, such as KUBERNETES, may also be applicable to implementing the concepts and technologies disclosed herein, and as such, the containerized cloud architecture 900 is not limited to DOCKER container technologies. Moreover, although open-source container technologies are most widely used, the concepts and technologies disclosed herein may be implemented using proprietary technologies or closed source technologies. - The DOCKER engines 910 are based on open source containerization technologies available from DOCKER, INC. The DOCKER engines 910 enable users (not shown) to build and containerize applications. The full breadth of functionality provided by the DOCKER engines 910 and associated components in the DOCKER architecture is beyond the scope of the present disclosure. As such, the primary functions of the DOCKER engines 910 will be described herein in brief, but this description should not be construed as limiting the functionality of the DOCKER engines 910 or any part of the associated DOCKER architecture. Instead, those skilled in the art will understand the implementation of the DOCKER engines 910 and other components of the DOCKER architecture to facilitate building and containerizing applications within the containerized
cloud architecture 900. - The DOCKER engine 910 functions as a client-server application executed by the host operating system 908. The DOCKER engine 910 provides a server with a daemon process along with application programming interfaces (“APIs”) that specify interfaces that applications can use to communicate with and instruct the daemon to perform operations. The DOCKER engine 910 also provides a command line interface (“CLI”) that uses the APIs to control and interact with the daemon through scripting and/or CLI commands. The daemon can create and manage objects such as images, containers, networks, and volumes. Although a single DOCKER engine 910 is illustrated in each of the hosts 902, multiple DOCKER engines 910 are contemplated. The DOCKER engine(s) 910 can be run in swarm mode.
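As a concrete illustration of the CLI-over-API pattern just described, the sketch below composes a `docker run` invocation of the kind an operator might use to launch one of the synchronization modules as a container; the image and network names are invented for the example, and only standard `docker run` flags are used:

```python
def docker_run_cmd(image, name=None, network=None, detach=True):
    """Compose a `docker run` command line as an argument list. The CLI
    translates such a command into calls against the DOCKER daemon's
    API; building it as a list keeps it safe to hand to
    subprocess.run() without shell quoting issues."""
    cmd = ["docker", "run"]
    if detach:
        cmd.append("--detach")      # run the container in the background
    if name:
        cmd += ["--name", name]     # container name
    if network:
        cmd += ["--network", network]  # attach to a (bridge) network
    cmd.append(image)
    return cmd

# Hypothetical deployment of a synchronization module on a user-defined
# bridge network (names are illustrative only).
print(docker_run_cmd("av-sync/navpam:latest", name="navpam",
                     network="av-sync-bridge"))
# → ['docker', 'run', '--detach', '--name', 'navpam',
#    '--network', 'av-sync-bridge', 'av-sync/navpam:latest']
```

Attaching containers to the same user-defined bridge network is what allows them to communicate, which is the role the bridge networks 912 play below.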
- The bridge networks 912 enable the containers 914 connected to the same bridge network to communicate. For example, the
bridge network 1 912A enables communication among the containers 914A1-914N1, and the bridge network 2 912B enables communication among the containers 914A2-914N2. In some embodiments, the bridge networks 912 are software network bridges implemented via the DOCKER bridge driver. The DOCKER bridge driver enables default and user-defined network bridges. - The containers 914 are runtime instances of images. The containers 914 are described herein specifically as DOCKER containers, although other containerization technologies are contemplated as noted above. Each container 914 can include an image, an execution environment, and a standard set of instructions. In the illustrated example, the container A-1 914A1 is shown with the AV content encoder(s) 104, the
multimedia framework 107, the NAVPAM 122, and the AVSSM 126. Alternatively, the AV content encoder(s) 104, the multimedia framework 107, the NAVPAM 122, the AVSSM 126, or any combination thereof can be distributed among multiple containers 914 across the same or different hosts 902. - The microservices 916 are applications that provide a single function. In some embodiments, each of the microservices 916 is provided by one of the containers 914, although each of the containers 914 may contain multiple microservices 916. For example, the microservices 916 can include, but are not limited to, server, database, and other executable applications to be run in an execution environment provided by a container 914. The microservices 916 can provide any type of functionality, and therefore all the possible functions cannot be listed herein. Those skilled in the art will appreciate the use of the microservices 916 along with the containers 914 to improve many aspects of the
containerized cloud architecture 900, such as reliability, security, agility, and efficiency, for example. In some embodiments, the AV content encoder(s) 104, the multimedia framework 107, the NAVPAM 122, the AVSSM 126, or some combination thereof are embodied as part of the microservices 916. - Turning now to
FIG. 10, a block diagram illustrating an example virtualized cloud architecture 1000 and components thereof will be described, according to an exemplary embodiment. The virtualized cloud architecture 1000 can be utilized to implement various elements disclosed herein. In some embodiments, the AV synchronization system 100, at least in part, is implemented in the virtualized cloud architecture 1000. - The
virtualized cloud architecture 1000 is a shared infrastructure that can support multiple services and network applications. The illustrated virtualized cloud architecture 1000 includes a hardware resource layer 1002, a control layer 1004, a virtual resource layer 1006, and an application layer 1008 that work together to perform operations as will be described in detail herein. - The
hardware resource layer 1002 provides hardware resources, which, in the illustrated embodiment, include one or more compute resources 1010, one or more memory resources 1012, and one or more other resources 1014. The compute resource(s) 1010 can include one or more hardware components that perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, operating systems, and/or other software. The compute resources 1010 can include one or more central processing units (“CPUs”) configured with one or more processing cores. The compute resources 1010 can include one or more graphics processing units (“GPUs”) configured to accelerate operations performed by one or more CPUs, and/or to perform computations to process data, and/or to execute computer-executable instructions of one or more application programs, operating systems, and/or other software that may or may not include instructions particular to graphics computations. In some embodiments, the compute resources 1010 can include one or more discrete GPUs. In some other embodiments, the compute resources 1010 can include CPU and GPU components that are configured in accordance with a co-processing CPU/GPU computing model, wherein the sequential part of an application executes on the CPU and the computationally-intensive part is accelerated by the GPU. The compute resources 1010 can include one or more system-on-chip (“SoC”) components along with one or more other components, including, for example, one or more of the memory resources 1012, and/or one or more of the other resources 1014.
In some embodiments, the compute resources 1010 can be or can include one or more SNAPDRAGON SoCs, available from QUALCOMM; one or more TEGRA SoCs, available from NVIDIA; one or more HUMMINGBIRD SoCs, available from SAMSUNG; one or more Open Multimedia Application Platform (“OMAP”) SoCs, available from TEXAS INSTRUMENTS; one or more customized versions of any of the above SoCs; and/or one or more proprietary SoCs. The compute resources 1010 can be or can include one or more hardware components architected in accordance with an advanced reduced instruction set computing (“RISC”) machine (“ARM”) architecture, available for license from ARM HOLDINGS. Alternatively, the compute resources 1010 can be or can include one or more hardware components architected in accordance with an x86 architecture, such an architecture available from INTEL CORPORATION of Mountain View, Calif., and others. Those skilled in the art will appreciate the implementation of the compute resources 1010 can utilize various computation architectures, and as such, the compute resources 1010 should not be construed as being limited to any particular computation architecture or combination of computation architectures, including those explicitly disclosed herein. - The memory resource(s) 1012 can include one or more hardware components that perform storage operations, including temporary or permanent storage operations. In some embodiments, the memory resource(s) 1012 include volatile and/or non-volatile memory implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data disclosed herein.
Computer storage media includes, but is not limited to, random access memory (“RAM”), read-only memory (“ROM”), Erasable Programmable ROM (“EPROM”), Electrically Erasable Programmable ROM (“EEPROM”), flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store data and which can be accessed by the
compute resources 1010. - The other resource(s) 1014 can include any other hardware resources that can be utilized by the compute resource(s) 1010 and/or the memory resource(s) 1012 to perform operations described herein. The other resource(s) 1014 can include one or more input and/or output processors (e.g., network interface controller or wireless radio), one or more modems, one or more codec chipsets, one or more pipeline processors, one or more fast Fourier transform (“FFT”) processors, one or more digital signal processors (“DSPs”), one or more speech synthesizers, and/or the like.
- The hardware resources operating within the
hardware resource layer 1002 can be virtualized by one or more virtual machine monitors (“VMMs”) 1016A-1016N (also known as “hypervisors”; hereinafter “VMMs 1016”) operating within the control layer 1004 to manage one or more virtual resources that reside in the virtual resource layer 1006. The VMMs 1016 can be or can include software, firmware, and/or hardware that alone or in combination with other software, firmware, and/or hardware, manages one or more virtual resources operating within the virtual resource layer 1006. - The virtual resources operating within the
virtual resource layer 1006 can include abstractions of at least a portion of the compute resources 1010, the memory resources 1012, the other resources 1014, or any combination thereof. These abstractions are referred to herein as virtual machines (“VMs”). In the illustrated embodiment, the virtual resource layer 1006 includes VMs 1018A-1018N (hereinafter “VMs 1018”). Each of the VMs 1018 can execute one or more applications 1020A-1020N in the application layer 1008. Also in the illustrated embodiment, the AV content encoder(s) 104 and the multimedia framework 107 are shown as the application 1020A or a portion thereof; the NAVPAM 122 is shown as the application 1020B or a portion thereof; and the AVSSM 126 is shown as the application 1020C or a portion thereof. - Turning now to
FIG. 11, an illustrative mobile device 1100 and components thereof will be described. The mobile device 1100 is representative of a device that can play back the post-encode AV content 106, such as part of a video streaming service provided to the mobile device 1100. While connections are not shown between the various components illustrated in FIG. 11, it should be understood that some, none, or all of the components illustrated in FIG. 11 can be configured to interact with one another to carry out various device functions. In some embodiments, the components are arranged so as to communicate via one or more busses (not shown). Thus, it should be understood that FIG. 11 and the following description are intended to provide a general understanding of a suitable environment in which various aspects of embodiments can be implemented, and should not be construed as being limiting in any way. - As illustrated in
FIG. 11, the mobile device 1100 can include a display 1102 for displaying data. According to various embodiments, the display 1102 can be configured to display the post-encode AV content 106, various GUI elements, text, images, video, virtual keypads and/or keyboards, messaging data, notification messages, metadata, Internet content, device status, time, date, calendar data, device preferences, map and location data, combinations thereof, and/or the like. The mobile device 1100 also can include a processor 1104 and a memory or other data storage device (“memory”) 1106. The processor 1104 can be configured to process data and/or can execute computer-executable instructions stored in the memory 1106. The computer-executable instructions executed by the processor 1104 can include, for example, an operating system 1108, one or more applications 1110, other computer-executable instructions stored in the memory 1106, or the like. In some embodiments, the applications 1110 also can include a UI application (not illustrated in FIG. 11). - The UI application can interface with the
operating system 1108 to facilitate user interaction with functionality and/or data stored at the mobile device 1100 and/or stored elsewhere. In some embodiments, the operating system 1108 can include a member of the SYMBIAN OS family of operating systems from SYMBIAN LIMITED, a member of the WINDOWS MOBILE OS and/or WINDOWS PHONE OS families of operating systems from MICROSOFT CORPORATION, a member of the PALM WEBOS family of operating systems from HEWLETT PACKARD CORPORATION, a member of the BLACKBERRY OS family of operating systems from RESEARCH IN MOTION LIMITED, a member of the IOS family of operating systems from APPLE INC., a member of the ANDROID OS family of operating systems from GOOGLE INC., and/or other operating systems. These operating systems are merely illustrative of some contemplated operating systems that may be used in accordance with various embodiments of the concepts and technologies described herein and therefore should not be construed as being limiting in any way. - The UI application can be executed by the
processor 1104 to aid a user in entering/deleting data, entering and setting user IDs and passwords for device access, configuring settings, manipulating content and/or settings, multimode interaction, interacting with other applications 1110, and otherwise facilitating user interaction with the operating system 1108, the applications 1110, and/or other types or instances of data 1112 that can be stored at the mobile device 1100. - The
applications 1110, the data 1112, and/or portions thereof can be stored in the memory 1106 and/or in a firmware 1114, and can be executed by the processor 1104. The firmware 1114 also can store code for execution during device power up and power down operations. It can be appreciated that the firmware 1114 can be stored in a volatile or non-volatile data storage device including, but not limited to, the memory 1106 and/or a portion thereof. - The
mobile device 1100 also can include an input/output (“I/O”) interface 1116. The I/O interface 1116 can be configured to support the input/output of data such as location information, presence status information, user IDs, passwords, and application initiation (start-up) requests. In some embodiments, the I/O interface 1116 can include a hardwire connection such as a universal serial bus (“USB”) port, a mini-USB port, a micro-USB port, an audio jack, a PS2 port, an IEEE 1394 (“FIREWIRE”) port, a serial port, a parallel port, an Ethernet (RJ45) port, an RJ11 port, a proprietary port, combinations thereof, or the like. In some embodiments, the mobile device 1100 can be configured to synchronize with another device to transfer content to and/or from the mobile device 1100. In some embodiments, the mobile device 1100 can be configured to receive updates to one or more of the applications 1110 via the I/O interface 1116, though this is not necessarily the case. In some embodiments, the I/O interface 1116 accepts I/O devices such as keyboards, keypads, mice, interface tethers, printers, plotters, external storage, touch/multi-touch screens, touch pads, trackballs, joysticks, microphones, remote control devices, displays, projectors, medical equipment (e.g., stethoscopes, heart monitors, and other health metric monitors), modems, routers, external power sources, docking stations, combinations thereof, and the like. It should be appreciated that the I/O interface 1116 may be used for communications between the mobile device 1100 and a network device or local device. - The
mobile device 1100 also can include a communications component 1118. The communications component 1118 can be configured to interface with the processor 1104 to facilitate wired and/or wireless communications with one or more networks, such as a packet data network 1204 (shown in FIG. 12), the Internet, or some combination thereof. In some embodiments, the communications component 1118 includes a multimode communications subsystem for facilitating communications via the cellular network and one or more other networks. - The
communications component 1118, in some embodiments, includes one or more transceivers. The one or more transceivers, if included, can be configured to communicate over the same and/or different wireless technology standards with respect to one another. For example, in some embodiments, one or more of the transceivers of the communications component 1118 may be configured to communicate using Global System for Mobile communications (“GSM”), Code-Division Multiple Access (“CDMA”) CDMAONE, CDMA2000, Long-Term Evolution (“LTE”), and various other 2G, 2.5G, 3G, 4G, 4.5G, 5G, and greater generation technology standards. Moreover, the communications component 1118 may facilitate communications over various channel access methods (which may or may not be used by the aforementioned standards) including, but not limited to, Time-Division Multiple Access (“TDMA”), Frequency-Division Multiple Access (“FDMA”), Wideband CDMA (“W-CDMA”), Orthogonal Frequency-Division Multiple Access (“OFDMA”), Space-Division Multiple Access (“SDMA”), and the like. - In addition, the
communications component 1118 may facilitate data communications using General Packet Radio Service (“GPRS”), Enhanced Data services for Global Evolution (“EDGE”), the High-Speed Packet Access (“HSPA”) protocol family including High-Speed Downlink Packet Access (“HSDPA”), Enhanced Uplink (“EUL”) (also referred to as High-Speed Uplink Packet Access (“HSUPA”)), HSPA+, and various other current and future wireless data access standards. In the illustrated embodiment, the communications component 1118 can include a first transceiver (“TxRx”) 1120A that can operate in a first communications mode (e.g., GSM). The communications component 1118 also can include an Nth transceiver (“TxRx”) 1120N that can operate in a second communications mode relative to the first transceiver 1120A (e.g., UMTS). While two transceivers 1120A-1120N (hereinafter collectively and/or generically referred to as “transceivers 1120”) are shown in FIG. 11, it should be appreciated that fewer than two, two, and/or more than two transceivers 1120 can be included in the communications component 1118. - The
communications component 1118 also can include an alternative transceiver (“Alt TxRx”) 1122 for supporting other types and/or standards of communications. According to various contemplated embodiments, the alternative transceiver 1122 can communicate using various communications technologies such as, for example, WI-FI, WIMAX, BLUETOOTH, infrared, infrared data association (“IRDA”), near field communications (“NFC”), other RF technologies, combinations thereof, and the like. In some embodiments, the communications component 1118 also can facilitate reception from terrestrial radio networks, digital satellite radio networks, internet-based radio service networks, combinations thereof, and the like. The communications component 1118 can process data from a network such as the Internet, an intranet, a broadband network, a WI-FI hotspot, an Internet service provider (“ISP”), a digital subscriber line (“DSL”) provider, a broadband provider, combinations thereof, or the like. - The
mobile device 1100 also can include one or more sensors 1124. The sensors 1124 can include temperature sensors, light sensors, air quality sensors, movement sensors, accelerometers, magnetometers, gyroscopes, infrared sensors, orientation sensors, noise sensors, microphones, proximity sensors, combinations thereof, and/or the like. Additionally, audio capabilities for the mobile device 1100 may be provided by an audio I/O component 1126. The audio I/O component 1126 of the mobile device 1100 can include one or more speakers for the output of audio signals, one or more microphones for the collection and/or input of audio signals, and/or other audio input and/or output devices. - The illustrated
mobile device 1100 also can include a subscriber identity module (“SIM”) system 1128. The SIM system 1128 can include a universal SIM (“USIM”), a universal integrated circuit card (“UICC”) and/or other identity devices. The SIM system 1128 can include and/or can be connected to or inserted into an interface such as a slot interface 1130. In some embodiments, the slot interface 1130 can be configured to accept insertion of other identity cards or modules for accessing various types of networks. Additionally, or alternatively, the slot interface 1130 can be configured to accept multiple subscriber identity cards. Because other devices and/or modules for identifying users and/or the mobile device 1100 are contemplated, it should be understood that these embodiments are illustrative, and should not be construed as being limiting in any way. - The
mobile device 1100 also can include an image capture and processing system 1132 (“image system”). The image system 1132 can be configured to capture or otherwise obtain photos, videos, and/or other visual information. As such, the image system 1132 can include cameras, lenses, charge-coupled devices (“CCDs”), combinations thereof, or the like. The mobile device 1100 may also include a video system 1134. The video system 1134 can be configured to capture, process, record, modify, and/or store video content. Photos and videos obtained using the image system 1132 and the video system 1134, respectively, may be added as content to an MMS message or email message and sent to another device. The video and/or photo content also can be shared with other devices via various types of data transfers via wired and/or wireless communication devices as described herein. - The
mobile device 1100 also can include one or more location components 1136. The location components 1136 can be configured to send and/or receive signals to determine a geographic location of the mobile device 1100. According to various embodiments, the location components 1136 can send and/or receive signals from global positioning system (“GPS”) devices, assisted-GPS (“A-GPS”) devices, WI-FI/WIMAX and/or cellular network triangulation data, combinations thereof, and the like. The location component 1136 also can be configured to communicate with the communications component 1118 to retrieve triangulation data for determining a location of the mobile device 1100. In some embodiments, the location component 1136 can interface with cellular network nodes, telephone lines, satellites, location transmitters and/or beacons, wireless network transmitters and receivers, combinations thereof, and the like. In some embodiments, the location component 1136 can include and/or can communicate with one or more of the sensors 1124 such as a compass, an accelerometer, and/or a gyroscope to determine the orientation of the mobile device 1100. Using the location component 1136, the mobile device 1100 can generate and/or receive data to identify its geographic location, or to transmit data used by other devices to determine the location of the mobile device 1100. The location component 1136 may include multiple components for determining the location and/or orientation of the mobile device 1100. - The illustrated
mobile device 1100 also can include a power source 1138. The power source 1138 can include one or more batteries, power supplies, power cells, and/or other power subsystems including alternating current (“AC”) and/or direct current (“DC”) power devices. The power source 1138 also can interface with an external power system or charging equipment via a power I/O component 1140. Because the mobile device 1100 can include additional and/or alternative components, the above embodiment should be understood as being illustrative of one possible operating environment for various embodiments of the concepts and technologies described herein. The described embodiment of the mobile device 1100 is illustrative, and should not be construed as being limiting in any way. - As used herein, communication media includes computer-executable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics changed or set in a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
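The idea of encoding information by changing a characteristic of a signal can be sketched with a minimal example. The on-off keying scheme below is purely illustrative (the disclosure defines the term "modulated data signal" generically and does not prescribe any particular modulation); the function names and parameters are assumptions made for clarity.

```python
import math

# Illustrative sketch: encode bits in a carrier wave by changing one of its
# characteristics (here, amplitude), matching the generic definition of a
# "modulated data signal". Not part of the disclosed system.

def modulate(bits, samples_per_bit=8, carrier_cycles_per_bit=2):
    """Amplitude-modulate a sine carrier: amplitude 1.0 for a 1 bit, 0.0 for a 0."""
    signal = []
    for bit in bits:
        amplitude = 1.0 if bit else 0.0
        for n in range(samples_per_bit):
            phase = 2 * math.pi * carrier_cycles_per_bit * n / samples_per_bit
            signal.append(amplitude * math.sin(phase))
    return signal

def demodulate(signal, samples_per_bit=8):
    """Recover bits by thresholding the energy in each bit period."""
    bits = []
    for i in range(0, len(signal), samples_per_bit):
        energy = sum(s * s for s in signal[i:i + samples_per_bit])
        bits.append(1 if energy > 0.5 else 0)
    return bits

message = [1, 0, 1, 1, 0]
recovered = demodulate(modulate(message))  # round-trips the original bits
```

The receiver recovers the information solely from the changed characteristic (the carrier's amplitude), which is what distinguishes a modulated data signal from the carrier alone.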
- By way of example, and not limitation, computer storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-executable instructions, data structures, program modules, or other data. For example, computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), HD-DVD, BLU-RAY, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the
mobile device 1100 or other devices or computers described herein, such as the computer system 800 described above with reference to FIG. 8. - Encoding the software modules presented herein also may transform the physical structure of the computer-readable media presented herein. The specific transformation of physical structure may depend on various factors, in different implementations of this description. Examples of such factors may include, but are not limited to, the technology used to implement the computer-readable media, whether the computer-readable media is characterized as primary or secondary storage, and the like. For example, if the computer-readable media is implemented as semiconductor-based memory, the software disclosed herein may be encoded on the computer-readable media by transforming the physical state of the semiconductor memory. For example, the software may transform the state of transistors, capacitors, or other discrete circuit elements constituting the semiconductor memory. The software also may transform the physical state of such components in order to store data thereupon.
- As another example, the computer-readable media disclosed herein may be implemented using magnetic or optical technology. In such implementations, the software presented herein may transform the physical state of magnetic or optical media, when the software is encoded therein. These transformations may include altering the magnetic characteristics of particular locations within given magnetic media. These transformations also may include altering the physical features or characteristics of particular locations within given optical media, to change the optical characteristics of those locations. Other transformations of physical media are possible without departing from the scope and spirit of the present description, with the foregoing examples provided only to facilitate this discussion.
- In light of the above, it should be appreciated that many types of physical transformations may take place in the
mobile device 1100 in order to store and execute the software components presented herein. It is also contemplated that the mobile device 1100 may not include all of the components shown in FIG. 11, may include other components that are not explicitly shown in FIG. 11, or may utilize an architecture completely different than that shown in FIG. 11. - Turning now to
FIG. 12, details of a network 1200 are illustrated, according to an illustrative embodiment. The network 1200 includes a cellular network 1202, a packet data network 1204, and a circuit switched network 1206. In some embodiments, the network 818 is or includes the network 1200. Moreover, the AV synchronization system 100 can be configured to communicate over the network 1200. - The
cellular network 1202 can include various components such as, but not limited to, base transceiver stations (“BTSs”), Node-Bs or e-Node-Bs, base station controllers (“BSCs”), radio network controllers (“RNCs”), mobile switching centers (“MSCs”), mobility management entities (“MMEs”), short message service centers (“SMSCs”), multimedia messaging service centers (“MMSCs”), home location registers (“HLRs”), home subscriber servers (“HSSs”), visitor location registers (“VLRs”), charging platforms, billing platforms, voicemail platforms, GPRS core network components, location service nodes, and the like. The cellular network 1202 also includes radios and nodes for receiving and transmitting voice, data, and combinations thereof to and from radio transceivers, networks, the packet data network 1204, and the circuit switched network 1206. - A
mobile communications device 1208, such as, for example, a cellular telephone, a user equipment, a mobile terminal, a PDA, a laptop computer, a handheld computer, and combinations thereof, can be operatively connected to the cellular network 1202. The mobile communications device 1208 can be configured similar to or the same as the mobile device 1100 described above with reference to FIG. 11. - The
cellular network 1202 can be configured as a GSM network and can provide data communications via GPRS and/or EDGE. Additionally, or alternatively, the cellular network 1202 can be configured as a 3G Universal Mobile Telecommunications System (“UMTS”) network and can provide data communications via the HSPA protocol family, for example, HSDPA, EUL, and HSPA+. The cellular network 1202 also is compatible with 4G mobile communications standards such as LTE, 5G mobile communications standards, or the like, as well as evolved and future mobile standards. - The
packet data network 1204 includes various systems, devices, servers, computers, databases, and other devices in communication with one another, as is generally known. In some embodiments, the packet data network 1204 is or includes one or more WI-FI networks, each of which can include one or more WI-FI access points, routers, switches, and other WI-FI network components. The packet data network 1204 devices are accessible via one or more network links. The servers often store various files that are provided to a requesting device such as, for example, a computer, a terminal, a smartphone, or the like. Typically, the requesting device includes software for executing a web page in a format readable by the browser or other software. Other files and/or data may be accessible via “links” in the retrieved files, as is generally known. In some embodiments, the packet data network 1204 includes or is in communication with the Internet. The packet data network 1204 can be or can include one or more of the PDNs 122A-122N. The circuit switched network 1206 includes various hardware and software for providing circuit switched communications. The circuit switched network 1206 may include, or may be, what is often referred to as a plain old telephone system (“POTS”). The functionality of the circuit switched network 1206 or other circuit-switched networks is generally known and will not be described herein in detail. - The illustrated
cellular network 1202 is shown in communication with the packet data network 1204 and a circuit switched network 1206, though it should be appreciated that this is not necessarily the case. One or more Internet-capable devices 1210 such as a laptop, a portable device, or another suitable device, can communicate with one or more cellular networks 1202, and devices connected thereto, through the packet data network 1204. It also should be appreciated that the Internet-capable device 1210 can communicate with the packet data network 1204 through the circuit switched network 1206, the cellular network 1202, and/or via other networks (not illustrated). - As illustrated, a
communications device 1212, for example, a telephone, facsimile machine, modem, computer, or the like, can be in communication with the circuit switched network 1206, and therethrough to the packet data network 1204 and/or the cellular network 1202. It should be appreciated that the communications device 1212 can be an Internet-capable device, and can be substantially similar to the Internet-capable device 1210. - Turning now to
FIG. 13, a machine learning system 1300 capable of implementing aspects of the embodiments disclosed herein will be described. In some embodiments, aspects of the NAVPAM 122 and/or the AVSSM 126 can be enhanced through the use of machine learning and/or artificial intelligence applications. Accordingly, the AV synchronization system 100 can include the machine learning system 1300 or can be in communication with the machine learning system 1300. - The illustrated
machine learning system 1300 includes one or more machine learning models 1302. The machine learning models 1302 can include supervised and/or semi-supervised learning models. The machine learning model(s) 1302 can be created by the machine learning system 1300 based upon one or more machine learning algorithms 1304. The machine learning algorithm(s) 1304 can be any existing, well-known algorithm, any proprietary algorithms, or any future machine learning algorithm. Some example machine learning algorithms 1304 include, but are not limited to, neural networks, gradient descent, linear regression, logistic regression, linear discriminant analysis, classification tree, regression tree, Naive Bayes, K-nearest neighbor, learning vector quantization, support vector machines, and the like. Classification and regression algorithms might find particular applicability to the concepts and technologies disclosed herein. Those skilled in the art will appreciate the applicability of various machine learning algorithms 1304 based upon the problem(s) to be solved by machine learning via the machine learning system 1300. - The
machine learning system 1300 can control the creation of the machine learning models 1302 via one or more training parameters. In some embodiments, the training parameters are selected by modelers at the direction of an enterprise, for example. Alternatively, in some embodiments, the training parameters are automatically selected based upon data provided in one or more training data sets 1306. The training parameters can include, for example, a learning rate, a model size, a number of training passes, data shuffling, regularization, and/or other training parameters known to those skilled in the art. The training data can be provided in the training data sets 1306. - The learning rate is a training parameter defined by a constant value. The learning rate affects the speed at which the
machine learning algorithm 1304 converges to the optimal weights. The machine learning algorithm 1304 can update the weights for every data example included in the training data set 1306. The size of an update is controlled by the learning rate. A learning rate that is too high might prevent the machine learning algorithm 1304 from converging to the optimal weights. A learning rate that is too low might result in the machine learning algorithm 1304 requiring multiple training passes to converge to the optimal weights. - The model size is regulated by the number of input features (“features”) 1308 in the
training data set 1306. A greater number of features 1308 yields a greater number of possible patterns that can be determined from the training data set 1306. The model size should be selected to balance the resources (e.g., compute, memory, storage, etc.) needed for training and the predictive power of the resultant machine learning model 1302. - The number of training passes indicates the number of training passes that the
machine learning algorithm 1304 makes over the training data set 1306 during the training process. The number of training passes can be adjusted based, for example, on the size of the training data set 1306, with larger training data sets being exposed to fewer training passes in consideration of time and/or resource utilization. The effectiveness of the resultant machine learning model 1302 can be increased by multiple training passes. - Data shuffling is a training parameter designed to prevent the
machine learning algorithm 1304 from reaching false optimal weights due to the order in which data contained in the training data set 1306 is processed. For example, data provided in rows and columns might be analyzed first row, second row, third row, etc., and thus an optimal weight might be obtained well before a full range of data has been considered. By data shuffling, the data contained in the training data set 1306 can be analyzed more thoroughly, mitigating bias in the resultant machine learning model 1302. - Regularization is a training parameter that helps to prevent the
machine learning model 1302 from memorizing training data from the training data set 1306. In other words, a machine learning model 1302 that memorizes fits the training data set 1306 well, but its predictive performance on new data is not acceptable. Regularization helps the machine learning system 1300 avoid this overfitting/memorization problem by adjusting extreme weight values of the features 1308. For example, a feature that has a small weight value relative to the weight values of the other features in the training data set 1306 can be adjusted to zero. - The
machine learning system 1300 can determine model accuracy after training by using one or more evaluation data sets 1310 containing the same features 1308′ as the features 1308 in the training data set 1306. This also prevents the machine learning model 1302 from simply memorizing the data contained in the training data set 1306. The number of evaluation passes made by the machine learning system 1300 can be regulated by a target model accuracy that, when reached, ends the evaluation process, and the machine learning model 1302 is considered ready for deployment. - After deployment, the
machine learning model 1302 can perform a prediction operation (“prediction”) 1314 with an input data set 1312 having the same features 1308″ as the features 1308 in the training data set 1306 and the features 1308′ of the evaluation data set 1310. The results of the prediction 1314 are included in an output data set 1316 consisting of predicted data. The machine learning model 1302 can perform other operations, such as regression, classification, and others. As such, the example illustrated in FIG. 13 should not be construed as being limiting in any way. - Based on the foregoing, it should be appreciated that concepts and technologies directed to audio and video synchronization have been disclosed herein. Although the subject matter presented herein has been described in language specific to computer structural features, methodological and transformative acts, specific computing machinery, and computer-readable media, it is to be understood that the concepts and technologies disclosed herein are not necessarily limited to the specific features, acts, or media described herein. Rather, the specific features, acts and mediums are disclosed as example forms of implementing the concepts and technologies disclosed herein.
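The training parameters described above (learning rate, number of training passes, data shuffling, regularization) and the evaluation and prediction flow can be illustrated with a small, self-contained sketch. This is a hypothetical example for clarity only: the one-weight linear model, the gradient-descent update, the data values, and the penalty threshold are illustrative assumptions, not the implementation of the machine learning system 1300.

```python
import random

# Illustrative sketch of the training parameters discussed above, applied
# to a toy linear model y ~ w * x trained by per-example gradient descent.

def train(rows, learning_rate, passes, shuffle, l1_penalty, seed=0):
    """Fit a single weight w on (x, y) rows and return it."""
    rng = random.Random(seed)        # fixed seed keeps the run reproducible
    w = 0.0
    for _ in range(passes):          # number of training passes
        order = list(rows)
        if shuffle:
            rng.shuffle(order)       # data shuffling prevents order bias
        for x, y in order:
            error = w * x - y
            w -= learning_rate * error * x   # update size set by learning rate
    # Regularization sketch: a weight that stays small relative to the
    # penalty is adjusted to zero rather than kept as memorized noise.
    if abs(w) <= l1_penalty:
        w = 0.0
    return w

# Training data generated from y = 2 * x, so the "optimal weight" is 2.
training_set = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

# A moderate learning rate converges to the optimal weight in 50 passes ...
w_good = train(training_set, learning_rate=0.1, passes=50,
               shuffle=True, l1_penalty=0.01)

# ... while a rate that is too low is still far from optimal after the
# same number of passes, matching the trade-off described above.
w_slow = train(training_set, learning_rate=0.0001, passes=50,
               shuffle=True, l1_penalty=0.01)

# Evaluation on a held-out set carrying the same feature layout.
evaluation_set = [(3.0, 6.0), (4.0, 8.0)]
evaluation_error = sum(abs(w_good * x - y) for x, y in evaluation_set)

# After deployment, a prediction pass over an input data set produces an
# output data set of predicted values.
input_data = [5.0, 6.0]
output_data = [w_good * x for x in input_data]
```

Because each per-example update here scales the remaining error multiplicatively, the shuffled order does not change the final weight in this toy case; shuffling matters for the order-sensitive, multi-feature models the passage describes.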
- The subject matter described above is provided by way of illustration only and should not be construed as limiting. Various modifications and changes may be made to the subject matter described herein without following the example embodiments and applications illustrated and described, and without departing from the true spirit and scope of the embodiments of the concepts and technologies disclosed herein.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/866,622 US20220360843A1 (en) | 2021-03-29 | 2022-07-18 | Audio and Video Synchronization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/215,045 US11395031B1 (en) | 2021-03-29 | 2021-03-29 | Audio and video synchronization |
US17/866,622 US20220360843A1 (en) | 2021-03-29 | 2022-07-18 | Audio and Video Synchronization |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/215,045 Continuation US11395031B1 (en) | 2021-03-29 | 2021-03-29 | Audio and video synchronization |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220360843A1 (en) | 2022-11-10 |
Family
ID=82385213
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/215,045 Active US11395031B1 (en) | 2021-03-29 | 2021-03-29 | Audio and video synchronization |
US17/866,622 Abandoned US20220360843A1 (en) | 2021-03-29 | 2022-07-18 | Audio and Video Synchronization |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/215,045 Active US11395031B1 (en) | 2021-03-29 | 2021-03-29 | Audio and video synchronization |
Country Status (1)
Country | Link |
---|---|
US (2) | US11395031B1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090027242A1 (en) * | 2007-05-16 | 2009-01-29 | Ibm Corporation | High-rate rll encoding |
US20110179451A1 (en) * | 2007-10-19 | 2011-07-21 | British Sky Broadcasting Ltd. | Television Display |
US20180077445A1 (en) * | 2016-09-13 | 2018-03-15 | Facebook, Inc. | Systems and methods for evaluating content synchronization |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10313717B2 (en) | 2017-07-12 | 2019-06-04 | At&T Mobility Ii Llc | Adaptive bit rate mobile video objective testing |
Also Published As
Publication number | Publication date |
---|---|
US11395031B1 (en) | 2022-07-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: AT&T MOBILITY II LLC, GEORGIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:PETAJAN, ERIC;REEL/FRAME:060530/0958 Effective date: 20210327 Owner name: AT&T INTELLECTUAL PROPERTY I, L.P., GEORGIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOMEZ, ESTEBAN;BRIAND, MANUEL;CULWELL, KEVIN;AND OTHERS;SIGNING DATES FROM 20210318 TO 20210326;REEL/FRAME:060530/0923 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |