WO2021161226A1 - Distributed measurement of latency and synchronization delay between audio/video streams - Google Patents
- Publication number
- WO2021161226A1 (PCT/IB2021/051149)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- video stream
- signatures
- video
- synchronization offset
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04H—BROADCAST COMMUNICATION
- H04H60/00—Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
- H04H60/35—Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
- H04H60/38—Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying broadcast time or space
- H04H60/40—Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying broadcast time or space for identifying broadcast time
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
- H04N21/2407—Monitoring of transmitted content, e.g. distribution time, number of downloads
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/438—Interfacing the downstream path of the transmission network originating from a server, e.g. retrieving encoded video stream packets from an IP network
- H04N21/4383—Accessing a communication channel
- H04N21/4384—Accessing a communication channel involving operations to reduce the access time, e.g. fast-tuning for reducing channel switching latency
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/04—Synchronising
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/04—Synchronising
- H04N5/06—Generation of synchronising signals
- H04N5/067—Arrangements or circuits at the transmitter end
- H04N5/073—Arrangements or circuits at the transmitter end for mutually locking plural sources of synchronising signals, e.g. studios or relay stations
- H04N5/0736—Arrangements or circuits at the transmitter end for mutually locking plural sources of synchronising signals, e.g. studios or relay stations using digital storage buffer techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04H—BROADCAST COMMUNICATION
- H04H2201/00—Aspects of broadcast communication
- H04H2201/90—Aspects of broadcast communication characterised by the use of signatures
Definitions
- FIG. 4 illustrates an example of a needle 206 along with a correlation coefficient (CC) for a test stream 202.
- the needle 206 provides a reference pattern for the comparison, and the coefficients 402 vary in probability over a time index to illustrate the relative probability of a match.
- a relative maximum probability may be seen shy of index 1600, indicating a most likely offset.
- the starting index i of the vector may be used to look up the wallclock and PTS times at the synchronization point for that stream 202, as the needle 206 may be fixed at m - n while the haystack test portion 208 may vary based on the value above. This is illustrated in equation (4), where:
- WC is the wallclock time;
- x and y are vectors of the same size of video signatures of the needle and haystack, respectively;
- i is the index found within x that has a high CC with y;
- (m - n) is the size of the needle searched within x.
- the above calculation can be duplicated, but, instead of using wallclock time, the PTS may be used instead. Additionally, the video needle and haystack variables x and y may be replaced with a and b for the needle and haystack of the audio, respectively.
- the synchronization offset may, accordingly, be determined using the clock values, indicating by how much the audio and video pair are out of sync.
- This synchronization-determining process may be calculated at various intervals. In one example, the process may be performed on every sample that comes in. In another example, the process may be performed periodically; for instance, to ensure that the synchronization offset or delay has not changed, the above process can be recalculated at a regular period.
- further processing may be performed to ensure that the calculation remains correct.
- As an example, if the max video CC is > 98% for a five-sample period, it can be assumed that the synchronization point is valid. This may be because the synchronization point is the point where the correlation coefficient is maximized.
- Otherwise, confidence in the algorithm may be reduced, e.g., by 50%. If multiple low-confidence values are reached, the buffer may be flushed and the synchronization-determining process restarted from the beginning.
- optimizations can occur by not always selecting the first m - n samples to act as a needle. Instead, searching the reference vector for a window of (m - n) samples with high variance may increase the probability of finding a high CC. Additionally, increasing the size of the (m - n) window may improve finding a high CC, with a tradeoff of requiring additional processing time.
- the process can do a preliminary check that the synchronization point remains correct, and only do a full search when the CC is below an acceptable threshold.
- Another situation that may be considered is if the max CC drops below an acceptable threshold.
- If the max CC drops below an acceptable threshold for a number of sample periods, then it can be determined that the audio or video content has become too similar. If the content becomes too similar for a period, then the buffered data can be flushed and the process may restart from an empty buffer to try again to identify the synchronization offsets and delays.
- the described synchronization approach may also be useful for other applications.
- If the CC is low between two streams, then it can be inferred that the content of the two streams is wildly different. This may be useful where confirming that the two streams carry the same content is critical to operations. For instance, one stream may have inadvertently been changed to provide different content, or there may be a significant introduction of noise into a stream along the video delivery chain.
- FIG. 5 illustrates an example process 500 for the measurement of latency between audio/video streams.
- the process 500 may be performed by the network monitor 116 in the context of the system 100.
- the network monitor 116 buffers signatures 118 of a reference stream 204 and signatures 118 of one or more test streams 202.
- the video signatures 118 include information calculated according to an amount of intra frame activity, while the audio signature 118 may be computed by tracking a difference in audio over time via two or more filters (e.g., a low pass filter, an infinite impulse response (IIR) filter, etc.) and determining whether the audio sample is contributing to or impacting the overall energy of the channel.
- the network monitor 116 may receive the signatures 118 from test points 120 along the multimedia delivery chain 100.
- audio/video signature 118 pairs may be taken at the input of the encoder 104, the output of the encoder 104, the output of the transcoder 106, the output of packager 108, or at any other point along the multimedia delivery chain 100.
- the network monitor 116 may buffer enough data to ensure capture of the largest delay between the test points 120. In an example, the network monitor 116 may buffer one second of the signatures 118.
- the network monitor 116 constructs a needle 206 from the reference stream 204.
- the needle 206 is selected as the most recent set of n points in the reference data stream, where n is the size of the needle.
- the size n may be set to balance finding a high CC against the amount of processing time that is required to perform the CC computations.
- the network monitor 116 computes correlations of the needle 206 at each position within the test stream 202.
- the network monitor 116 compares the needle 206 portion to successive sets of test portions 208 (e.g., advancing a window of size n in one-sample increments through the one or more test streams 202), to determine a correlation between the needle 206 and each of the test portions 208.
- the input to the correlation function may be represented as vectors to be compared, and the correlation function may be designed to operate on the two sets of vectors to calculate how closely related the two vectors are.
- the correlation function may include one or more of PLCC, MSE, RMSE, MAE, PSNR, SRCC, or KRCC.
- the network monitor 116 identifies a synchronization offset for each of the one or more test streams 202 compared to the reference stream 204.
- the synchronization offset for a test stream 202 may be identified as the maximum correlation point of correlations of the needle 206 at each position within the test stream 202. In some implementations, multiple consecutive consistent maximum correlation points may be required to confirm the synchronization offset.
- the network monitor 116 updates the synchronization offset according to the outlier metric for the identified synchronization offset of operation 508. This may be desirable because sometimes the alignment offset found at operation 508 may change significantly on a per-iteration basis (e.g., the offset for a first iteration is at 100 frames but for a next iteration is at 1000 frames). To prevent excessive bounce from occurring, additional verification may be performed before acceptance of the latest offset. This verification allows the network monitor 116 to compare the synchronization offset for the current iteration with one or more synchronization offsets computed in previous iterations of the process 500. If the synchronization offset is comparable, then the latest synchronization offset may be used as the new offset.
- a function may be applied that determines whether the current value is an outlier from the previous N values.
- One such function may be to determine whether the absolute difference between the synchronization offset and the median value of the previous N synchronization offsets (the median absolute difference) is less than a threshold value. If so, then the synchronization offset for the frame is accepted; otherwise, the synchronization offset for the frame is rejected.
- Another such function may take the mean absolute error of the current synchronization offset from the mean of the previous N synchronization offset values. If the mean absolute error is less than 2 standard deviations of the N values (for example), then the synchronization offset may be accepted; otherwise, the synchronization offset is rejected. (A sketch of both checks appears at the end of this section.)
- N may be from 1-5, but any amount of synchronization offset slots may be used. These slots may also be reset upon various conditions, such as transition to a new video segment, detection of a change to a new resolution, etc.
- the network monitor 116 determines whether to compute a next synchronization offset. In one example, the network monitor 116 sets a timer to periodically recalculate the synchronization offset(s). If so and the timer expired, control returns to operation 502. If not, then the process 500 may remain at operation 512. In another example, the network monitor 116 tracks whether the synchronization offset no longer provides a high correlation coefficient, such that control returns to operation 502 responsive to a drop in the correlation coefficient below a threshold confidence. If not, then the process 500 may remain at operation 512. In yet a further example, the synchronization offset is continually updated and the process 500 simply loops from operation 512 to operation 502. As yet a further possibility, the synchronization offset is completed once, and after operation 512 the process 500 ends (not shown).
- FIG. 6 illustrates an example computing device 600 for use in the measurement of latency between audio/video streams.
- the algorithms and/or methodologies of one or more embodiments discussed herein may be implemented using such a computing device.
- the operations performed herein by the network monitor 116, such as those of the process 500, may be implemented with such a computing device 600.
- the computing device 600 may include memory 602, processor 604, and non-volatile storage 606.
- the processor 604 may include one or more devices selected from high-performance computing (HPC) systems including high-performance cores, microprocessors, micro-controllers, digital signal processors, microcomputers, central processing units, field programmable gate arrays, programmable logic devices, state machines, logic circuits, analog circuits, digital circuits, or any other devices that manipulate signals (analog or digital) based on computer-executable instructions residing in memory 602.
- the memory 602 may include a single memory device or a number of memory devices including, but not limited to, random access memory (RAM), volatile memory, non-volatile memory, static random-access memory (SRAM), dynamic random access memory (DRAM), flash memory, cache memory, or any other device capable of storing information.
- the non-volatile storage 606 may include one or more persistent data storage devices such as a hard drive, optical drive, tape drive, non-volatile solid-state device, cloud storage or any other device capable of persistently storing information.
- the processor 604 may be configured to read into memory 602 and execute computer- executable instructions residing in program instructions 608 of the non-volatile storage 606 and embodying algorithms and/or methodologies of one or more embodiments.
- the program instructions 608 may include operating systems and applications.
- the program instructions 608 may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, JAVA, C, C++, C#, OBJECTIVE C, FORTRAN, PASCAL, JAVASCRIPT, PYTHON, PERL, and PL/SQL.
- the computer-executable instructions of the program instructions 608 may cause the computing device 600 to implement one or more of the algorithms and/or methodologies disclosed herein.
- the non-volatile storage 606 may also include data 610 supporting the functions, features, and processes of the one or more embodiments described herein. This data 610 may include, as some examples, data of the test streams 202 and reference streams 204, needle 206, and computed offset results.
- the processes, methods, or algorithms disclosed herein can be deliverable to/implemented by a processing device, controller, or computer, which can include any existing programmable electronic control unit or dedicated electronic control unit.
- the processes, methods, or algorithms can be stored as data and instructions executable by a controller or computer in many forms including, but not limited to, information permanently stored on non-writable storage media such as ROM devices and information alterably stored on writeable storage media such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media.
- the processes, methods, or algorithms can also be implemented in a software executable object.
- the processes, methods, or algorithms can be embodied in whole or in part using suitable hardware components, such as Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software and firmware components.
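Circling back to the two outlier checks described above for deciding whether to accept a newly computed synchronization offset: the sketch below is a minimal illustration, assuming the previous N accepted offsets are kept in a short history list. The text presents the median-difference check and the two-standard-deviation check as alternative functions; this sketch simply tries one and then the other, and the threshold and history values are illustrative assumptions.

```python
import numpy as np

def accept_offset(new_offset: float, previous: list, median_threshold: float = 5.0) -> bool:
    """Accept a newly computed synchronization offset only if it is consistent
    with the previously accepted offsets, using the two checks described above.

    median_threshold is an illustrative value (in samples or frames).
    """
    if not previous:
        return True
    prev = np.asarray(previous, dtype=float)

    # Check 1: absolute difference from the median of the previous offsets
    if abs(new_offset - np.median(prev)) < median_threshold:
        return True

    # Check 2: absolute error from the mean, compared against 2 standard deviations
    if prev.std() > 0 and abs(new_offset - prev.mean()) < 2 * prev.std():
        return True

    return False

history = [100.0, 101.0, 99.0, 100.0]
print(accept_offset(100.5, history))   # True: consistent with recent offsets
print(accept_offset(1000.0, history))  # False: treated as an outlier and rejected
```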
Abstract
Real-time latency of audio/video streams is identified. Signatures of a reference audio/video stream and signatures of a test audio/video stream are buffered. A needle is constructed as a vector including a set of signatures of the reference audio/video stream. Correlations of the needle to successive vectors of sets of signatures of the test audio/video stream are computed using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream. A synchronization offset is identified between the test stream and the reference stream according to a maximum correlation point of the correlations of the needle to the successive sets of signatures of the test audio/video stream. The reference audio/video stream and the test audio/video stream are aligned according to the synchronization offset.
Description
DISTRIBUTED MEASUREMENT OF LATENCY AND SYNCHRONIZATION DELAY BETWEEN AUDIO/VIDEO STREAMS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. provisional application Serial
No. 62/976,169 filed February 13, 2020, and U.S. provisional application Serial No. 63/055,946 filed July 24, 2020, the disclosures of which are hereby incorporated in their entireties by reference herein.
TECHNICAL FIELD
[0002] Aspects of the disclosure generally relate to the distributed measurement of latency and synchronization delay between audio/video streams.
BACKGROUND
[0003] Measuring real-time latency between two streams can be very time consuming and cumbersome. One example technique for performing temporal alignment, and consequently latency measurement, is a manual process such as monitoring the frames of two videos and aligning them visually. Another technique that may be used is the performance of expensive frame-based measurement such as computing peak signal-to-noise ratio or any other frame-based differencing tool between every frame of the two videos to find the matched frames. Such methods, however, may run into timing constraints or may be overly complex to be practical.
SUMMARY
[0004] In a first illustrative example, a method for identifying real-time latency of audio/video streams includes buffering signatures of a reference audio/video stream and signatures of a test audio/video stream; constructing a needle as a vector including a set of signatures of the reference audio/video stream; computing correlations of the needle to successive vectors of sets of signatures of the test audio/video stream using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream; identifying a synchronization offset between the test stream and the reference stream according to a maximum correlation point of the
correlations of the needle to the successive sets of signatures of the test audio/video stream; and aligning the reference audio/video stream and the test audio/video stream according to the synchronization offset.
[0005] In a second illustrative example, a system for identifying real-time latency of audio/video streams, includes a computing device programmed to buffer signatures of a reference audio/video stream and signatures of a test audio/video stream; construct a needle as a vector including a set of signatures of the reference audio/video stream; compute correlations of the needle to successive vectors of sets of signatures of the test audio/video stream using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream; identify a synchronization offset between the test stream and the reference stream according to a maximum correlation point of the correlations of the needle to the successive sets of signatures of the test audio/video stream; and align the reference audio/video stream and the test audio/video stream according to the synchronization offset.
[0006] In a third illustrative example, a non-transitory computer-readable medium includes instructions for identifying real-time latency of audio/video streams, that when executed by a processor of a computing device, cause the computing device to buffer signatures of a reference audio/video stream and signatures of a test audio/video stream; construct a needle as a vector including a set of signatures of the reference audio/video stream; compute correlations of the needle to successive vectors of sets of signatures of the test audio/video stream using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream; identify a synchronization offset between the test stream and the reference stream according to a maximum correlation point of the correlations of the needle to the successive sets of signatures of the test audio/video stream; and align the reference audio/video stream and the test audio/video stream according to the synchronization offset.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 illustrates an example of an end-to-end system for the measurement of latency between audio/video streams;
[0008] FIG. 2 illustrates an example of streams to be compared;
[0009] FIG. 3 illustrates a further detail of the comparison of a needle portion of the reference stream to test portions of multiple test streams;
[0010] FIG. 4 illustrates an example of a needle along with a correlation coefficient for a test stream;
[0011] FIG. 5 illustrates an example process for the measurement of latency between audio/video streams; and
[0012] FIG. 6 illustrates an example computing device for use in the measurement of latency between audio/video streams.
DETAILED DESCRIPTION
[0013] As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention that may be embodied in various and alternative forms. The figures are not necessarily to scale; some features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention.
[0014] Aspects of the disclosure generally relate to the distributed measurement of latency and synchronization delay between audio/video streams. As described herein, latency may refer to the amount of time it takes a unique media unit to travel between two measurement points. By distributed, it is meant that a system may be co-located but may perform a measurement from two or more data sources that may be geographically diverse. Synchronization delay may refer to the time offset between video latency and associated audio latency, with additional delays due to decoding, filtering, and rendering taken into account, and then measured at two different points.
[0015] FIG. 1 illustrates an example of an end-to-end system 100 for the measurement of latency between audio/video streams. In the illustrated example, a video delivery chain includes a sequence of one or more encoder 104, transcoder 106, packager 108, origin 110, content delivery network 112, and home viewing device 114. Each of the devices along the video delivery chain may perform operations that involve video quality degradations and latencies. The source video feed may be in the format of many video formats, for example, SDI, transport stream, multicast IP, or mezzanine files from content producers/providers. For home TV, there are often set-top boxes that replay the received video streams to TV, e.g. through HDMI cables. As explained in detail below, a network monitor 116 may monitor the end-to-end system 100 for latency using signatures 118 computed from content streams at various points 120 along the video delivery chain. It should be noted that the delivery chain may be geographically diverse and that the calculations may occur co-located or in a distributed manner.
[0016] An instance of video content may include, as some examples, live video feeds from current events, prerecorded shows or movies, and advertisements or other clips to be inserted into other video feeds. The video content may include just video in some examples, but in many cases the video further includes additional content such as audio, subtitles, and metadata information descriptive of the content and/or format of the video. As shown, the system 100 includes one or more sources 102 of instances of video content. In general, when a video distributor receives source video, the distributor passes the video content through a sophisticated video delivery chain such as shown, including the series of content sources 102, encoders 104, transcoders 106, packagers 108, origins 110, content delivery networks 112, and consumer devices 114 to ultimately present the video content.
[0017] More specifically, one or more encoders 104 may receive the video content from the sources 102. The encoders 104 may be located at a head-end of the system 100. The encoders 104 may include electronic circuits and/or software configured to compress the video content into a format that conforms with one or more standard video compression specifications. Examples of video encoding formats include MPEG-2 Part 2, MPEG-4 Part 2, H.264 (MPEG-4 Part 10), HEVC, Theora, RealVideo RV40, VP9, and AV1. In many cases, the compressed video lacks some information
present in the original video, which is referred to as lossy compression. A consequence of this is that decompressed video may have a lower quality than the original, uncompressed video.
[0018] One or more transcoders 106 may receive the encoded video content from the encoders 104. The transcoders 106 may include electronic circuits and/or software configured to re-encode the video content from a source format, resolution, and/or bit depth into an instance of video content with a different format, resolution, and/or bit depth. In many examples, the transcoders 106 may be used to create, for each received instance of video content, a set of time-aligned video streams, each with a different bitrate and frame size. This set of video streams may be referred to as a ladder or compression ladder. It may be useful to have different versions of the same video streams in the ladder, as downstream users may have different bandwidth, screen size, or other constraints. In some cases, the transcoders 106 may be integrated into the encoders 104, but in other examples the encoders 104 and transcoders 106 are separate components.
[0019] One or more packagers 108 may have access to the ladders for each of the instances of video content. The packagers 108 may include hardware and/or software configured to create segmented video files to be delivered to clients that then stitch the segments together to form a contiguous video stream. The segmented video may include video fragments, as well as a manifest that indicates how to combine the fragments. The packager 108 may sometimes be integrated into the encoder 104 and/or transcoder 106 that first creates the digital encoding of the instance of video content, but often it is a separate component. In one example, the transcoders 106 and packagers 108 may be located in a media data center between the head-end and the content delivery network 112.
[0020] The packagers 108 may provide the packaged video content to one or more origins 110 to the content delivery network 112. The origins 110 refer to a location of the content delivery network 112 to which video content enters the content delivery network 112. In some cases, the packagers 108 serve as origins 110 to the content delivery network 112, which in other cases, the packagers 108 push the video fragments and manifests into the origins 110. The content delivery network 112 may include a geographically-distributed network of servers and data centers configured to provide the video content from the origins 110 to destination consumer devices 114. The consumer devices 114 may include, as some examples, set-top boxes connected to televisions or other video screens, tablet computing devices, and/or mobile phones. Notably, these varied devices 114 may have
different viewing conditions (including illumination and viewing distance, etc.), spatial resolution (e.g., SD, HD, full-HD, UHD, 4K, etc.), frame rate (15, 24, 30, 60, 120 frames per second, etc.), and dynamic range (8 bits, 10 bits, and 12 bits per pixel per color, etc.). The consumer device 114 may execute a video player to play back the video content received by the devices 114 from the content delivery network 112.
[0021] The network monitor 116 may be configured to monitor the audio/video streams that are provided along the video delivery chain. In one example, the network monitor 116 may receive signatures 118 for the audio/video of the streams in a periodic manner from test points 120 along the multimedia delivery chain 100. In another example, the network monitor 116 may generate the signatures itself. The network monitor 116 may also align these streams with a different system that may or may not be co-located.
[0022] The network monitor 116 may execute an analyzer application to monitor the video and audio streams. The analyzer application may normalize the video and audio content before analysis. This normalization allows for a uniform approach to generating a signature that is not dependent on input frame rate or resolution.
[0023] In an example, the normalization process processes the video to a common resolution and framerate to support multiple input resolutions and frame rates (one example may be to take a 1920x1080p60 video and downconvert it to 640x360p30 video). The audio may also be normalized to a common channel layout and sample rate (one example may be to take a 5.1 channel 44.1 kHz audio signal and convert it to a mono 48000 Hz signal).
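To make the normalization step concrete, the following is a minimal Python sketch, not the disclosed implementation: it downmixes multi-channel audio to mono and resamples it to a common rate with simple linear interpolation. The function name, the use of NumPy, and the interpolation choice are assumptions; a real system would likewise scale and decimate the video with a proper scaler/resampler.

```python
import numpy as np

def normalize_audio(samples: np.ndarray, in_rate: int, out_rate: int = 48000) -> np.ndarray:
    """Downmix (num_samples, num_channels) audio to mono and resample to out_rate.

    Linear interpolation stands in for a proper polyphase resampler; it is
    only meant to illustrate the normalization step.
    """
    if samples.ndim == 2:            # e.g., 5.1-channel input
        mono = samples.mean(axis=1)  # simple downmix to a single channel
    else:
        mono = samples
    duration = len(mono) / in_rate
    out_len = int(round(duration * out_rate))
    in_times = np.arange(len(mono)) / in_rate
    out_times = np.arange(out_len) / out_rate
    return np.interp(out_times, in_times, mono)

# Example: 1 second of 5.1-channel audio at 44.1 kHz -> mono 48 kHz
audio = np.random.randn(44100, 6)
mono_48k = normalize_audio(audio, in_rate=44100)
print(mono_48k.shape)  # (48000,)
```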
[0024] Once normalized, signature generation may produce one or more signatures 118 per video unit of time and one or more signatures 118 per audio unit of time. A video unit of time may be a common frame rate such as 30 frames per second, where 1 frame time is 1/30 second. An audio unit of time may be the inverse of the sampling rate (i.e., 48000 Hz audio would have an audio unit of time of 1/48000 second).
[0025] In one example, the video signatures 118 may be calculated according to an amount of intra frame activity. Intra frame activity may be defined, for example, as shown in equation (1):
where Y is the luma pixel values in a YUV colorspace.
It should be noted that the aforementioned approach to computing video signatures 118 is one example, and other techniques may additionally or alternately be used. For instance, an approach to computing video signatures 118 may include calculating the inter-frame activity between frames. As another possibility, an approach to computing video signatures 118 may include computation of temporal luma activity.
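Equation (1) itself is not reproduced in this text, so the sketch below shows one plausible intra-frame activity measure over the luma plane (the mean absolute difference between neighboring pixels). It is an assumed stand-in to illustrate the idea, not the patented formula.

```python
import numpy as np

def intra_frame_activity(y_plane: np.ndarray) -> float:
    """One plausible intra-frame activity signature for a luma (Y) plane.

    Averages the absolute differences between horizontally and vertically
    adjacent pixels, so busy frames score high and flat frames score low.
    """
    y = y_plane.astype(np.float64)
    horiz = np.abs(np.diff(y, axis=1)).mean()
    vert = np.abs(np.diff(y, axis=0)).mean()
    return (horiz + vert) / 2.0

# Example on a normalized 640x360 luma plane
frame = np.random.randint(0, 256, size=(360, 640), dtype=np.uint8)
print(intra_frame_activity(frame))
```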
[0026] With respect to the computation of audio signatures 118, the audio signature 118 may be computed by tracking a difference in audio over time via two or more filters (e.g., a low pass filter, an infinite impulse response (IIR) filter, etc.) and determining whether the audio sample is contributing to or impacting the overall energy of the channel.
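A minimal sketch of this idea follows, assuming two one-pole IIR low-pass energy trackers with different time constants; the coefficients and the exact decision rule are illustrative assumptions rather than the disclosed filters.

```python
import numpy as np

def audio_signature(samples: np.ndarray, fast: float = 0.01, slow: float = 0.001) -> np.ndarray:
    """Per-sample audio signature from two one-pole IIR energy trackers.

    fast/slow are smoothing coefficients; the signature is the amount by
    which short-term energy departs from long-term energy, i.e., how much
    the sample is contributing to the channel's overall energy.
    """
    fast_env = 0.0
    slow_env = 0.0
    sig = np.empty(len(samples))
    for i, s in enumerate(samples):
        e = s * s
        fast_env += fast * (e - fast_env)   # short time-constant low-pass
        slow_env += slow * (e - slow_env)   # long time-constant low-pass
        sig[i] = fast_env - slow_env
    return sig

print(audio_signature(np.sin(np.linspace(0, 100, 48000)))[:5])
```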
[0027] With these audio and video signatures 118 computed, the signature 118 data can be streamed remotely to a server of the network monitor 116. This data may be streamed together with additional pieces of information, such as the time at which the sample (audio or video) should be displayed (sometimes referred to as the presentation time stamp (PTS)), as well as the time the sample (audio or video) was captured (sometimes referred to as wallclock time).
[0029] This triplet of data (e.g., the signatures 118, PTS time, and wallclock time) for each sample may be streamed remotely (e.g., to the network monitor 116) and buffered. The amount buffered should cover the greatest delay between test points. In an example broadcast implementation, the network monitor 116 may buffer up to one minute of data.
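A minimal sketch of such buffering, assuming one video signature per frame at 30 frames per second so that 1,800 entries cover one minute, might look as follows (names are illustrative):

```python
from collections import deque, namedtuple

# Each buffered entry carries the signature plus its PTS and wallclock times.
Sample = namedtuple("Sample", ["signature", "pts", "wallclock"])

class SignatureBuffer:
    """Bounded buffer sized to cover the greatest expected delay between test points."""
    def __init__(self, max_samples: int = 1800):  # one minute of 30 fps signatures
        self.samples = deque(maxlen=max_samples)

    def push(self, signature: float, pts: int, wallclock: float) -> None:
        self.samples.append(Sample(signature, pts, wallclock))
```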
[0029] FIG. 2 illustrates an example of streams to be compared. In many configurations, there is a test stream 202 and a reference stream 204. As described herein, these streams 202, 204 would typically include an audio/video pair. In a distributed system that includes multiple test points 120
and multiple audio streams, this can quickly become a large number of pairs. For instance, an audio/video signature 118 pair may be taken at the input of the encoder 104, the output of the encoder 104, the output of the transcoder 106, the output of the packager 108, or at any other point.
[0030] If, for example, the output of the encoder 104 (and input of the transcoder 106) has one video stream and three audio streams, there would be three pairs as shown in Table 1.

Table 1 - Example Audio/Video Pairs at Encoder Output

Pair | Video Stream | Audio Stream
---|---|---
1 | Video 1 | Audio 1
2 | Video 1 | Audio 2
3 | Video 1 | Audio 3
[0031] This can be repeated at the output of the transcoder 106 where there may be, for example, three video streams and three audio streams, as shown in Table 2.

Table 2 - Example Audio/Video Pairs at Transcoder Output

Pair | Video Stream | Audio Stream
---|---|---
1 | Video 1 | Audio 1
2 | Video 1 | Audio 2
3 | Video 1 | Audio 3
4 | Video 2 | Audio 1
5 | Video 2 | Audio 2
6 | Video 2 | Audio 3
7 | Video 3 | Audio 1
8 | Video 3 | Audio 2
9 | Video 3 | Audio 3
[0032] The data of the audio/video pairs may be streamed in an independent manner. For instance, data for the transcoder 106 output stream pair video 3 may be transmitted once (and used for each of audio 1, 2, and 3), not transmitted once for each audio stream. The data may be received and buffered by the network monitor 116. The amount buffered should be large enough to capture the largest delay between the test points 120.
[0033] As the data is streamed into the network monitor 116, a reference stream 204 is selected and is tested against all other vectors of data (e.g., the test streams 202). Once a sufficient number of samples has been received, a correlation between the streams of signatures 118 may be calculated. For instance, this correlation may be determined between a needle 206 of the reference stream 204 and a test portion 208 of the test streams 202.
[0034] FIG. 3 illustrates a further detail of the comparison of the needle 206 portion of the reference stream 204 to test portions 208 of multiple test streams 202. As shown, the needle 206 portion is being compared to an example set of test portions 208 of three test streams 202, to determine a correlation between the needle 206 and each of the test portions 208.
[0035] A correlation function may be used to determine the correlation between the streams of signatures 118. In general, a correlation function operates on two vectors and calculates how closely the two vectors are related. One correlation function that may be used is the Pearson Linear Correlation Coefficient (PLCC), but other correlation functions may be used as well. As some other examples, mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), Spearman’s rank correlation coefficient (SRCC), and/or Kendall’s rank correlation coefficient (KRCC) may additionally or alternatively be used.
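For example, the PLCC and one alternative error measure can be computed directly with numpy (a sketch, not the required implementation):

```python
import numpy as np

def plcc(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson Linear Correlation Coefficient between two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean squared error, an alternative (dis)similarity measure."""
    return float(np.sqrt(np.mean((a - b) ** 2)))
```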
[0036] As noted, the input to the correlation function may be represented as two vectors to be compared. In one example, these vectors may be defined as shown in equation (2):

a = {a[l], a[l + 1], ..., a[k]}   (2)
[0037] This is an example where the needle 206 is selected at the most recent point in the data stream. However, it should be noted that the needle 206 can be selected at any point within the reference stream 204. Selecting the needle 206 in a test stream 202 makes that stream the reference stream 204, and the reference stream 204 would then become a test stream 202.
[0038] The correlation is calculated at each position within the haystack test portion 208. The resulting vector of correlation values with the associated index within the data stream may then be sorted and the maximum may be taken, as shown in equation (3):

CC[i] = C({x[i], x[i + 1], ..., x[i + n − 1]}, y), for i = 0, 1, ..., m − n   (3)

where:

CC is the vector of correlation coefficients; C is the correlation function; i is a time index into the stream; m is the total size of the haystack; y is referred to as the needle; n is the size of the needle; and x is the haystack.
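A hedged sketch of this search, correlating the needle against every window of the haystack and returning the index of the maximum coefficient, is shown below (PLCC is assumed as the correlation function):

```python
import numpy as np

def best_offset(haystack: np.ndarray, needle: np.ndarray) -> tuple[int, float]:
    """Slide the needle across the haystack, compute a correlation coefficient
    at each index, and return the index and value of the maximum."""
    n = len(needle)
    cc = np.array([
        np.corrcoef(haystack[i:i + n], needle)[0, 1]
        for i in range(len(haystack) - n + 1)
    ])
    cc = np.nan_to_num(cc, nan=-1.0)  # constant windows yield NaN; treat as no match
    i_max = int(np.argmax(cc))
    return i_max, float(cc[i_max])
```

Applied to one minute of 30 frames-per-second signatures (m = 1,800) with a needle of n = 100 samples, this scans the 1,701 candidate positions described in the example below.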
[0039] FIG. 4 illustrates an example of a needle 206 along with CC for a test stream 202. As shown, the needle 206 provides a reference pattern for the comparison, and the coefficients 402 vary over the time index to illustrate the relative probability of a match. Notably, a relative maximum may be seen just shy of index 1600, indicating a most likely offset.
[0040] To continue with the example used above, thirty frames per second for one minute would have 1,800 samples, with an n of 100 samples. If the max CC is greater than a threshold confidence value (for example > 98%), then this may be assumed to be a valid synchronization point between the streams 202, 204.
[0041] The starting index i of the vector may be used to look up the wallclock and PTS times at the synchronization point for that stream 202, as the needle 206 may be fixed at m − n while the haystack test portion 208 may vary based on the value above. This is illustrated in equation (4) as follows:

Delay(x, y) = WC(x(i)) − WC(y(m − n))   (4)

where:

WC is the wallclock time; x and y are the haystack and reference signature vectors, respectively; i is the index found within x that has a high CC with the needle; and

(m − n) is the starting index of the needle within y.
[0042] When audio is introduced, the above calculation can be duplicated, but using the PTS rather than the wallclock time. Additionally, the video needle and haystack variables x and y may be replaced with a and b for the needle and haystack of the audio, respectively. Thus, the synchronization offset may be provided as follows in equation (5):
SynchronizationOffset(a, b, x, y) = DelayPTS(x, y) − DelayPTS(a, b)   (5)

where DelayPTS is the delay of equation (4) computed using PTS values in place of wallclock times.
Assuming, in one illustrative example, a 90 kHz clock for PTS, the synchronization offset accordingly indicates, in units of that clock, by how much the audio and video pair are out of sync.
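Under the assumptions made above (the offset as the difference of the PTS-based video and audio delays, with a 90 kHz PTS clock), the offset can be converted to milliseconds as sketched here; parameter names are illustrative:

```python
def av_sync_offset_ms(video_pts_at_match: int, video_pts_at_needle: int,
                      audio_pts_at_match: int, audio_pts_at_needle: int,
                      pts_clock_hz: int = 90_000) -> float:
    """Illustrative A/V synchronization offset: PTS-based video delay minus
    PTS-based audio delay, converted from PTS ticks to milliseconds."""
    video_delay = video_pts_at_match - video_pts_at_needle
    audio_delay = audio_pts_at_match - audio_pts_at_needle
    return (video_delay - audio_delay) * 1000.0 / pts_clock_hz
```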
[0043] This synchronization-determining process may be calculated at various intervals. In one example, the process may be performed on every sample that comes in. In another example, the process may be performed periodically. For instance, to ensure that the synchronization offset or delay has not changed, the above process can be recalculated in a periodic manner.
[0044] As another example, further processing may be performed to ensure that the calculation remains correct. An example would be if the max video CC is > 98% for a five-sample period, it can be assumed that the synchronization point is valid. This may be because the synchronization point is the point where the correlation coefficient is maximized. In this example, every time a CC that is <= 98% is encountered, confidence in the algorithm may be reduced, e.g., by 50%. If multiple low-confidence values are reached, the buffer may be flushed and the synchronization-determining process restarted from the beginning.
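One way such confidence tracking could be sketched (the threshold, halving factor, and flush floor are illustrative choices, not values mandated by the disclosure):

```python
class SyncConfidence:
    """Halve confidence whenever the max CC falls at or below the threshold;
    signal a buffer flush once confidence collapses below a floor."""
    def __init__(self, threshold: float = 0.98, floor: float = 0.25):
        self.threshold = threshold
        self.floor = floor
        self.confidence = 1.0

    def update(self, max_cc: float) -> bool:
        if max_cc > self.threshold:
            self.confidence = 1.0
        else:
            self.confidence *= 0.5
        return self.confidence < self.floor  # True => flush buffer and restart
```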
[0045] Optimizations can occur by not always selecting the first m − n samples to act as a needle. Instead, searching the reference vector for a window of (m − n) samples with high variance may increase the probability of finding a high CC. Additionally, increasing the size of the (m − n) window may improve finding a high CC, with a tradeoff of requiring additional processing time. As another possible optimization, when a synchronization point is found, instead of starting the search at the beginning of the next evaluation period, the process can do a preliminary check that the synchronization point remains correct, and only do a full search when the CC is below an acceptable threshold.
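A minimal sketch of the high-variance window selection mentioned above might be as follows (the window length and the use of plain variance are assumptions):

```python
import numpy as np

def pick_needle_start(reference: np.ndarray, needle_len: int) -> int:
    """Return the start index of the needle_len-sample window of the reference
    vector with the highest variance; distinctive windows correlate more sharply."""
    variances = [
        float(np.var(reference[i:i + needle_len]))
        for i in range(len(reference) - needle_len + 1)
    ]
    return int(np.argmax(variances))
```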
[0046] Another situation that may be considered is if the max CC drops below an acceptable threshold. If the max CC drops below an acceptable threshold for a number of sample periods, then it can be determined that the audio or video content has become too similar. If the content becomes too similar for a period, then the buffered data can be flushed and the process may restart from an empty buffer to try again to identify the synchronization offsets and delays.
[0047] The described synchronization approach may also be useful for other applications. As one possibility, if the CC is low between two streams, then it can be inferred that the content of the two streams is wildly different. This may be useful where confirming that the two streams carry the same content is critical to operations. For instance, one stream may have inadvertently been changed to provide different content, or there may be a significant introduction of noise into a stream along the video delivery chain.
[0048] FIG. 5 illustrates an example process 500 for the measurement of latency between audio/video streams. In an example, the process 500 may be performed by the network monitor 116 in the context of the system 100.
[0049] At operation 502, the network monitor 116 buffers signatures 118 of a reference stream 204 and signatures 118 of one or more test streams 202. For instance, the video signatures 118 may include information calculated according to an amount of intra frame activity, while the audio signatures 118 may be computed by tracking a difference in audio over time via two or more filters (e.g., a low pass filter, an infinite impulse response (IIR) filter, etc.) and determining whether the audio sample is contributing to or impacting the overall energy of the channel. The network monitor 116 may receive the signatures 118 from test points 120 along the multimedia delivery chain 100. For instance, audio/video signature 118 pairs may be taken at the input of the encoder 104, the output of the encoder 104, the output of the transcoder 106, the output of the packager 108, or at any other point along the multimedia delivery chain 100. The network monitor 116 may buffer enough data to ensure capture of the largest delay between the test points 120. In an example, the network monitor 116 may buffer one second of the signatures 118.
[0050] At operation 504, the network monitor 116 constructs a needle 206 from the reference stream 204. In an example, the needle 206 is selected as the most recent set of n points in the reference data stream, where n is the size of the needle. The size n may be set to balance finding a high CC against the amount of processing time that is required to perform the CC computations.
[0051] At operation 506, the network monitor 116 computes correlations of the needle 206 at each position within the test stream 202. In an example, the network monitor 116 compares the needle 206 portion to successive sets of test portions 208 (e.g., advancing a window of size n in one-sample increments through the one or more test streams 202), to determine a correlation between the needle 206 and each of the test portions 208. The input to the correlation function may be represented as vectors to be compared, and the correlation function may be designed to operate on the two sets of vectors to calculate how closely related the two vectors are. As some possibilities, the correlation function may include one or more of PLCC, MSE, RMSE, MAE, PSNR, SRCC, or KRCC.
[0052] At operation 508, the network monitor 116 identifies a synchronization offset for each of the one or more test streams 202 compared to the reference stream 204. In an example, the synchronization offset for a test stream 202 may be identified as the maximum correlation point of correlations of the needle 206 at each position within the test stream 202. In some implementations, multiple consecutive consistent maximum correlation points may be required to confirm the synchronization offset.
[0053] At operation 510, the network monitor 116 updates the synchronization offset according to the outlier metric for the identified synchronization offset of operation 508. This may be desirable because sometimes the alignment offset found at operation 508 may change significantly on a per-iteration basis (e.g., the offset for a first iteration is at 100 frames but for a next iteration is at 1000 frames). To prevent excessive bounce from occurring, additional verification may be performed before acceptance of the latest offset. This verification allows the network monitor 116 to compare the synchronization offset for the current iteration with one or more synchronization offsets computed in previous iterations of the process 500. If the synchronization offset is comparable, then the latest synchronization offset may be used as the new offset. If the synchronization offset does not appear valid, then the synchronization offset may be ignored.
[0054] A function may be applied that determines whether the current value is an outlier from the previous N values. One such function may be to determine whether the absolute difference between the synchronization offset and the median of the previous N synchronization offsets (the median absolute difference) is less than a threshold value. If so, then the synchronization offset for the frame is accepted; otherwise, the synchronization offset for the frame is rejected. Another such function may take the mean absolute error of the current synchronization offset from the mean of the previous N synchronization offset values. If the mean absolute error is less than 2 standard deviations of the N values (for example), then the synchronization offset may be accepted; otherwise, the synchronization offset is rejected.
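Both outlier tests described above could be sketched roughly as follows (threshold values are illustrative only):

```python
import numpy as np

def accept_by_median(current: float, previous: list, threshold: float = 5.0) -> bool:
    """Accept the offset if its absolute difference from the median of the
    previous N offsets is below a threshold."""
    if not previous:
        return True
    return abs(current - float(np.median(previous))) < threshold

def accept_by_mean(current: float, previous: list) -> bool:
    """Accept the offset if its absolute error from the mean of the previous N
    offsets is within two standard deviations of those values."""
    if len(previous) < 2:
        return True
    prev = np.asarray(previous, dtype=np.float64)
    return abs(current - prev.mean()) < 2.0 * prev.std()
```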
[0055] For the sake of example, N may be from 1 to 5, but any number of synchronization offset slots may be used. These slots may also be reset upon various conditions, such as transition to a new video segment, detection of a change to a new resolution, etc.
[0056] At operation 512, the network monitor 116 determines whether to compute a next synchronization offset. In one example, the network monitor 116 sets a timer to periodically recalculate the synchronization offset(s). If so and the timer expired, control returns to operation 502. If not, then the process 500 may remain at operation 512. In another example, the network monitor 116 tracks whether the synchronization offset no longer provides a high correlation coefficient, such that control returns to operation 502 responsive to a drop in the correlation coefficient below a threshold confidence. If not, then the process 500 may remain at operation 512. In yet a further example, the synchronization offset is continually updated and the process 500 simply loops from operation 512 to operation 502. As yet a further possibility, the synchronization offset is completed once, and after operation 512 the process 500 ends (not shown).
[0057] FIG. 6 illustrates an example computing device 600 for use in the measurement of latency between audio/video streams. The algorithms and/or methodologies of one or more embodiments discussed herein may be implemented using such a computing device. For instance, the operations performed herein by the network monitor 116, such as those of the process 500, may be implemented with such a computing device 600. The computing device 600 may include memory 602, processor 604, and non-volatile storage 606. The processor 604 may include one or more devices selected from high-performance computing (HPC) systems including high-performance cores, microprocessors, micro-controllers, digital signal processors, microcomputers, central processing
units, field programmable gate arrays, programmable logic devices, state machines, logic circuits, analog circuits, digital circuits, or any other devices that manipulate signals (analog or digital) based on computer-executable instructions residing in memory 602. The memory 602 may include a single memory device or a number of memory devices including, but not limited to, random access memory (RAM), volatile memory, non-volatile memory, static random-access memory (SRAM), dynamic random access memory (DRAM), flash memory, cache memory, or any other device capable of storing information. The non-volatile storage 606 may include one or more persistent data storage devices such as a hard drive, optical drive, tape drive, non-volatile solid-state device, cloud storage or any other device capable of persistently storing information.
[0058] The processor 604 may be configured to read into memory 602 and execute computer-executable instructions residing in program instructions 608 of the non-volatile storage 606 and embodying algorithms and/or methodologies of one or more embodiments. The program instructions 608 may include operating systems and applications. The program instructions 608 may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, JAVA, C, C++, C#, OBJECTIVE C, FORTRAN, PASCAL, JAVA SCRIPT, PYTHON, PERL, and PL/SQL.
[0059] Upon execution by the processor 604, the computer-executable instructions of the program instructions 608 may cause the computing device 600 to implement one or more of the algorithms and/or methodologies disclosed herein. The non-volatile storage 606 may also include data 610 supporting the functions, features, and processes of the one or more embodiments described herein. This data 610 may include, as some examples, data of the test streams 202 and reference streams 204, needle 206, and computed offset results.
[0060] The processes, methods, or algorithms disclosed herein can be deliverable to/implemented by a processing device, controller, or computer, which can include any existing programmable electronic control unit or dedicated electronic control unit. Similarly, the processes, methods, or algorithms can be stored as data and instructions executable by a controller or computer in many forms including, but not limited to, information permanently stored on non-writable storage media such as ROM devices and information alterably stored on writeable storage media such as floppy disks, magnetic tapes, CDs, RAM devices, and other magnetic and optical media. The
processes, methods, or algorithms can also be implemented in a software executable object. Alternatively, the processes, methods, or algorithms can be embodied in whole or in part using suitable hardware components, such as Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), state machines, controllers or other hardware components or devices, or a combination of hardware, software and firmware components.
[0061] While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms encompassed by the claims. The words used in the specification are words of description rather than limitation, and it is understood that various changes can be made without departing from the spirit and scope of the disclosure. As previously described, the features of various embodiments can be combined to form further embodiments of the invention that may not be explicitly described or illustrated. While various embodiments could have been described as providing advantages or being preferred over other embodiments or prior art implementations with respect to one or more desired characteristics, those of ordinary skill in the art recognize that one or more features or characteristics can be compromised to achieve desired overall system attributes, which depend on the specific application and implementation. These attributes can include, but are not limited to cost, strength, durability, life cycle cost, marketability, appearance, packaging, size, serviceability, weight, manufacturability, ease of assembly, etc. As such, to the extent any embodiments are described as less desirable than other embodiments or prior art implementations with respect to one or more characteristics, these embodiments are not outside the scope of the disclosure and can be desirable for particular applications.
[0062] With regard to the processes, systems, methods, heuristics, etc. described herein, it should be understood that, although the steps of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes could be practiced with the described steps performed in an order other than the order described herein. It further should be understood that certain steps could be performed simultaneously, that other steps could be added, or that certain steps described herein could be omitted. In other words, the descriptions of processes herein are provided for the purpose of illustrating certain embodiments, and should in no way be construed so as to limit the claims.
[0063] Accordingly, it is to be understood that the above description is intended to be illustrative and not restrictive. Many embodiments and applications other than the examples provided would be apparent upon reading the above description. The scope should be determined, not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. It is anticipated and intended that future developments will occur in the technologies discussed herein, and that the disclosed systems and methods will be incorporated into such future embodiments. In sum, it should be understood that the application is capable of modification and variation.
[0064] All terms used in the claims are intended to be given their broadest reasonable constructions and their ordinary meanings as understood by those knowledgeable in the technologies described herein unless an explicit indication to the contrary in made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
[0065] The abstract of the disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various embodiments for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
[0066] While exemplary embodiments are described above, it is not intended that these embodiments describe all possible forms of the invention. Rather, the words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the invention. Additionally, the features of various implementing embodiments may be combined to form further embodiments of the invention.
Claims
1. A method for identifying real-time latency of audio/video streams, comprising:
buffering signatures of a reference audio/video stream and signatures of a test audio/video stream;
constructing a needle as a vector including a set of signatures of the reference audio/video stream;
computing correlations of the needle to successive vectors of sets of signatures of the test audio/video stream using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream;
identifying a synchronization offset between the test stream and the reference stream according to a maximum correlation point of the correlations of the needle to the successive sets of signatures of the test audio/video stream; and
aligning the reference audio/video stream and the test audio/video stream according to the synchronization offset.
2. The method of claim 1, further comprising simultaneously displaying the reference audio/video stream and the test audio/video stream as aligned.
3. The method of claim 1, wherein the signatures include video signatures each calculated according to an amount of activity within a frame and adjacent frames and audio signatures calculated by tracking a difference in audio over time.
4. The method of claim 1, wherein the reference audio/video stream is calculated from an audio/video stream at a first point along a multimedia delivery chain, and the test audio/video stream is calculated from the audio/video stream at a second point along the multimedia delivery chain.
5. The method of claim 1, wherein the needle is selected to include a most recent set of n signatures in the reference audio/video stream, and the successive sets of consecutive signatures of the test audio/video stream each include n signatures.
6. The method of claim 1, wherein the needle is selected either to accommodate searching for different synchronization offset ranges or based on other statistical features.
7. The method of claim 1, wherein the correlation function utilizes one or more of Pearson Linear Correlation Coefficient (PLCC), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), Spearman’s rank correlation coefficient (SRCC), or Kendall’s rank correlation coefficient (KRCC).
8. The method of claim 1, further comprising recomputing the synchronization offset responsive to the synchronization offset failing to provide a correlation coefficient above a predefined threshold correlation.
9. The method of claim 1, further comprising preprocessing the audio/video streams into a common audio format and a common video format to assist in generating the signatures of the reference audio/video stream and the signatures of the test audio/video stream.
10. The method of claim 1, further comprising: determining an outlier metric for the synchronization offset in comparison to one or more previous synchronization offsets; and updating the synchronization offset based on the outlier metric indicating that the synchronization offset is not an outlier.
11. The method of claim 10, wherein the outlier metric includes determining a median absolute difference for the one or more previous synchronization offsets; and further comprising accepting the synchronization offset responsive to a difference between the synchronization offset and the median absolute difference being within a threshold value.
12. The method of claim 10, wherein the outlier metric includes determining a mean absolute error of the synchronization offset compared to a mean of the one or more previous synchronization offsets; and further comprising accepting the synchronization offset responsive to the mean absolute error being less than two standard deviations from the mean.
13. A system for identifying real-time latency of audio/video streams, comprising:
a computing device programmed to:
buffer signatures of a reference audio/video stream and signatures of a test audio/video stream;
construct a needle as a vector including a set of signatures of the reference audio/video stream;
compute correlations of the needle to successive vectors of sets of signatures of the test audio/video stream using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream;
identify a synchronization offset between the test stream and the reference stream according to a maximum correlation point of the correlations of the needle to the successive sets of signatures of the test audio/video stream; and
align the reference audio/video stream and the test audio/video stream according to the synchronization offset.
14. The system of claim 13, wherein the computing device is further programmed to simultaneously display the reference audio/video stream and the test audio/video stream as aligned.
15. The system of claim 13, wherein the signatures include video signatures each calculated according to an amount of activity within a frame and adjacent frames and audio signatures calculated by tracking a difference in audio over time.
16. The system of claim 13, wherein the computing device is further programmed to: calculate the reference audio/video stream from an audio/video stream at a first point along a multimedia delivery chain; and calculate the test audio/video stream from the audio/video stream at a second point along the multimedia delivery chain.
17. The system of claim 13, wherein the computing device is further programmed to: construct the needle using a most recent set of n signatures in the reference audio/video stream; and construct each of the successive sets of signatures in the test audio/video stream as a consecutive set of n signatures of the test audio/video stream.
18. The system of claim 13, wherein the computing device is further programmed to select the needle either to accommodate searching for different synchronization offset ranges or based on other statistical features.
19. The system of claim 13, wherein the correlation function utilizes one or more of Pearson Linear Correlation Coefficient (PLCC), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), Spearman’s rank correlation coefficient (SRCC), or Kendall’s rank correlation coefficient (KRCC).
20. The system of claim 13, wherein the computing device is further programmed to compute the synchronization offset responsive to the synchronization offset failing to provide a correlation coefficient above a predefined threshold correlation.
21. The system of claim 13, wherein the computing device is further programmed to preprocess the audio/video streams into a common audio format and a common video format to assist in generating the signatures of the reference audio/video stream and the signatures of the test audio/video stream.
22. The system of claim 13, wherein the computing device is further programmed to: determine an outlier metric for the synchronization offset in comparison to one or more previous synchronization offsets; and update the synchronization offset based on the outlier metric indicating that the synchronization offset is not an outlier.
23. The system of claim 13, wherein the outlier metric includes determining a median absolute difference for the one or more previous synchronization offsets; and the computing device is further programmed to accept the synchronization offset responsive to a difference between the synchronization offset and the median absolute difference being within a threshold value.
24. The system of claim 13, wherein the outlier metric includes determining a mean absolute error of the synchronization offset compared to a mean of the one or more previous synchronization offsets; and the computing device is further programmed to accept the synchronization offset responsive to the mean absolute error being less than two standard deviations from the mean.
25. A non-transitory computer-readable medium comprising instructions for identifying real-time latency of audio/video streams, that when executed by a processor of a computing device, cause the computing device to:
buffer signatures of a reference audio/video stream and signatures of a test audio/video stream;
construct a needle as a vector including a set of signatures of the reference audio/video stream;
compute correlations of the needle to successive vectors of sets of signatures of the test audio/video stream using a correlation function that calculates relatedness of the needle vector to each of the successive vectors of the test audio/video stream;
identify a synchronization offset between the test stream and the reference stream according to a maximum correlation point of the correlations of the needle to the successive sets of signatures of the test audio/video stream; and
align the reference audio/video stream and the test audio/video stream according to the synchronization offset.
26. The medium of claim 19, wherein the signatures include video signatures each calculated according to an amount of activity within a frame and adjacent frames and audio signatures calculated by tracking a difference in audio over time.
27. The medium of claim 19, further comprising instructions that when executed by the processor of the computing device, cause the computing device to: calculate the reference audio/video stream from an audio/video stream at a first point along a multimedia delivery chain; and calculate the test audio/video stream from the audio/video stream at a second point along the multimedia delivery chain.
28. The medium of claim 19, further comprising instructions that when executed by the processor of the computing device, cause the computing device to: construct the needle using a most recent set of n signatures in the reference audio/video stream; and construct each of the successive sets of signatures in the test audio/video stream as a consecutive set of n signatures of the test audio/video stream.
29. The medium of claim 19, further comprising instructions that when executed by the processor of the computing device, cause the computing device to select the needle either to accommodate searching for different synchronization offset ranges or based on other statistical features.
30. The medium of claim 19, wherein the correlation function utilizes one or more of Pearson Linear Correlation Coefficient (PLCC), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), peak signal-to-noise ratio (PSNR), Spearman’s rank correlation coefficient (SRCC), or Kendall’s rank correlation coefficient (KRCC).
31. The medium of claim 19, further comprising instructions that when executed by the processor of the computing device, cause the computing device to compute the synchronization offset responsive to the synchronization offset failing to provide a correlation coefficient above a predefined threshold correlation.
32. The medium of claim 19, further comprising instructions that when executed by the processor of the computing device, cause the computing device to preprocess the audio/video streams into a common audio format and a common video format to assist in generating the signatures of the reference audio/video stream and the signatures of the test audio/video stream.
33. The medium of claim 19, further comprising instructions that when executed by the processor of the computing device, cause the computing device to: determine an outlier metric for the synchronization offset in comparison to one or more previous synchronization offsets; and update the synchronization offset based on the outlier metric indicating that the synchronization offset is not an outlier.
34. The medium of claim 19, wherein the outlier metric includes determining a median absolute difference for the one or more previous synchronization offsets; and further comprising instructions that when executed by the processor of the computing device, cause the computing device to accept the synchronization offset responsive to a difference between the synchronization offset and the median absolute difference being within a threshold value.
35. The medium of claim 19, wherein the outlier metric includes determining a mean absolute error of the synchronization offset compared to a mean of the one or more previous synchronization offsets; and further comprising instructions that when executed by the processor of the computing device, cause the computing device to accept the synchronization offset responsive to the mean absolute error being less than two standard deviations from the mean.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21754627.4A EP4104450A4 (en) | 2020-02-13 | 2021-02-11 | Distributed measurement of latency and synchronization delay between audio/video streams |
CA3167971A CA3167971A1 (en) | 2020-02-13 | 2021-02-11 | Distributed measurement of latency and synchronization delay between audio/video streams |
IL295544A IL295544A (en) | 2020-02-13 | 2021-02-11 | Distributed measurement of latency and synchronization delay between audio/video streams |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062976169P | 2020-02-13 | 2020-02-13 | |
US62/976,169 | 2020-02-13 | ||
US202063055946P | 2020-07-24 | 2020-07-24 | |
US63/055,946 | 2020-07-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021161226A1 true WO2021161226A1 (en) | 2021-08-19 |
Family
ID=77273302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2021/051149 WO2021161226A1 (en) | 2020-02-13 | 2021-02-11 | Distributed measurement of latency and synchronization delay between audio/video streams |
Country Status (5)
Country | Link |
---|---|
US (1) | US11632582B2 (en) |
EP (1) | EP4104450A4 (en) |
CA (1) | CA3167971A1 (en) |
IL (1) | IL295544A (en) |
WO (1) | WO2021161226A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102023112593A1 (en) | 2022-05-27 | 2023-11-30 | Pke Holding Ag | Method for determining latency when displaying individual images of a video |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110096173A1 (en) | 2009-10-25 | 2011-04-28 | Tektronix, Inc | Av delay measurement and correction via signature curves |
US20110261257A1 (en) | 2008-08-21 | 2011-10-27 | Dolby Laboratories Licensing Corporation | Feature Optimization and Reliability for Audio and Video Signature Generation and Detection |
US8311983B2 (en) * | 2009-04-28 | 2012-11-13 | Whp Workflow Solutions, Llc | Correlated media for distributed sources |
US9769527B2 (en) * | 2013-07-11 | 2017-09-19 | Dejero Labs Inc. | Systems and methods for transmission of data streams |
US20180077445A1 (en) | 2016-09-13 | 2018-03-15 | Facebook, Inc. | Systems and methods for evaluating content synchronization |
US20190342594A1 (en) | 2018-04-08 | 2019-11-07 | Q'ligent Corporation | Method and system for analyzing audio, video, and audio-video media data streams |
WO2020170237A1 (en) * | 2019-02-19 | 2020-08-27 | Edgy Bees Ltd. | Estimating real-time delay of a video data stream |
US20200314503A1 (en) * | 2019-03-26 | 2020-10-01 | Ssimwave Inc. | Unified end-to-end quality and latency measurement, optimization and management in multimedia communications |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5481294A (en) * | 1993-10-27 | 1996-01-02 | A. C. Nielsen Company | Audience measurement system utilizing ancillary codes and passive signatures |
US10726822B2 (en) * | 2004-09-27 | 2020-07-28 | Soundstreak, Llc | Method and apparatus for remote digital content monitoring and management |
EP1924101B1 (en) * | 2005-09-06 | 2013-04-03 | Nippon Telegraph And Telephone Corporation | Video communication quality estimation device, method, and program |
US8340510B2 (en) * | 2009-07-17 | 2012-12-25 | Microsoft Corporation | Implementing channel start and file seek for decoder |
CN103038783B (en) * | 2010-03-09 | 2016-03-09 | 泰景系统公司 | Adaptive video decoding circuit and method thereof |
US9357275B2 (en) * | 2011-09-06 | 2016-05-31 | Qualcomm Incorporated | Network streaming of coded video data |
GB2534136A (en) * | 2015-01-12 | 2016-07-20 | Nokia Technologies Oy | An apparatus, a method and a computer program for video coding and decoding |
US10063907B1 (en) * | 2017-06-06 | 2018-08-28 | Polycom, Inc. | Differential audio-video synchronization |
US10236005B2 (en) * | 2017-06-08 | 2019-03-19 | The Nielsen Company (Us), Llc | Methods and apparatus for audio signature generation and matching |
US10735825B1 (en) * | 2019-02-07 | 2020-08-04 | Disney Enterprises, Inc. | Coordination of media content delivery to multiple media players |
US20200280761A1 (en) * | 2019-03-01 | 2020-09-03 | Pelco, Inc. | Automated measurement of end-to-end latency of video streams |
US10856024B2 (en) * | 2019-03-27 | 2020-12-01 | Microsoft Technology Licensing, Llc | Audio synchronization of correlated video feeds |
Also Published As
Publication number | Publication date |
---|---|
EP4104450A4 (en) | 2024-01-24 |
CA3167971A1 (en) | 2021-08-19 |
US11632582B2 (en) | 2023-04-18 |
IL295544A (en) | 2022-10-01 |
EP4104450A1 (en) | 2022-12-21 |
US20210258630A1 (en) | 2021-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101789086B1 (en) | Concept for determining the quality of a media data stream with varying quality-to-bitrate | |
US11638051B2 (en) | Real-time latency measurement of video streams | |
US8914835B2 (en) | Streaming encoded video data | |
US8631143B2 (en) | Apparatus and method for providing multimedia content | |
US20100110199A1 (en) | Measuring Video Quality Using Partial Decoding | |
US8437619B2 (en) | Method of processing a sequence of coded video frames | |
SG181131A1 (en) | Technique for video quality estimation | |
CN112425178B (en) | Two pass block parallel transcoding process | |
US11632582B2 (en) | Distributed measurement of latency and synchronization delay between audio/video streams | |
EP4068779A1 (en) | Cross-validation of video encoding | |
Baik et al. | Video acuity assessment in mobile devices | |
Staelens et al. | Viqid: A no-reference bit stream-based visual quality impairment detector | |
EP3264709A1 (en) | A method for computing, at a client for receiving multimedia content from a server using adaptive streaming, the perceived quality of a complete media session, and client | |
JP2006050130A (en) | Video encoder and encoding method | |
KR102350570B1 (en) | Set-Top Box for Measuring Frame Loss in a Video Stream and Method for Operating Same | |
KR20090071873A (en) | System and method for controlling coding rate using quality of image | |
JP7431514B2 (en) | Method and system for measuring quality of video call service in real time | |
US20240212118A1 (en) | Quality measurement between mismatched videos | |
JP5394991B2 (en) | Video frame type estimation adjustment coefficient calculation method, apparatus, and program | |
Jonnalagadda et al. | Evaluation of video quality of experience using evalvid | |
KR20120105969A (en) | Defection of fast multi-track video ingest detection method and system |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21754627; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 3167971; Country of ref document: CA
| NENP | Non-entry into the national phase | Ref country code: DE
| ENP | Entry into the national phase | Ref document number: 2021754627; Country of ref document: EP; Effective date: 20220913