EP2005762A1 - Method and apparatus for measuring audio/video sync delay - Google Patents

Method and apparatus for measuring audio/video sync delay

Info

Publication number
EP2005762A1
EP2005762A1 EP07732245A
Authority
EP
European Patent Office
Prior art keywords
audio
encoded
video
time
timestamps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07732245A
Other languages
German (de)
French (fr)
Inventor
Matthew Alan Bowers
Scott Griffiths
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tektronix Inc
Original Assignee
Tektronix International Sales GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tektronix International Sales GmbH
Publication of EP2005762A1
Legal status: Withdrawn

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/434 Disassembling of a multiplex stream, e.g. demultiplexing audio and video streams, extraction of additional data from a video stream; Remultiplexing of multiplex streams; Extraction or processing of SI; Disassembling of packetised elementary stream
    • H04N 21/4341 Demultiplexing of audio and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N 21/2368 Multiplexing of audio and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N 17/004 Diagnosis, testing or measuring for television systems or their details for digital television systems

Definitions

  • a system in accordance with the applicant's co-pending analysis scheme for determining any loss of video/audio synchronisation in an encoded data stream is schematically represented in Figure 2.
  • a predetermined video stream 10 as described above in an unencoded state is stored on a data storage medium such as a hard disk 12 and is provided as an input to the encoding system 14 to be tested.
  • the encoding system will generally output an encoded data stream that can be decomposed to separate video 16 and audio 18 streams.
  • the video and audio streams are provided as inputs to an analysis engine 20 and routed to separate video and audio event detection units 22, 24.
  • each event detection unit is arranged to detect the relevant video or audio 'events' of the encoded test data stream, these being the beginning or ends, or both, of the visible 'flashes' and audio tones as discussed above in relation to Figure 1, and to provide an output signal indicative of when each event occurs.
  • the output signals from each of the audio and video event detection units 22, 24 are provided to a time comparison unit 26 that is arranged to measure any time interval present between the output signals from the event detection units and thus any time interval, be it lag or lead, between the occurrences of the audio and video 'events'.
  • This time interval data is provided from the time comparison unit 26 to an output interface unit 28 that is arranged to provide the time interval data to an appropriate user interface.
  • the output interface unit 28 is also arranged to compare any time delay between the audio and video signals with defined maximum permitted delays that may be stored in a further data storage area 30 or may be stored internally to the output interface unit. If a detected time interval exceeds a predefined value then the output interface unit may be arranged to provide an alarm signal.
  • the sequence of visible flashes and audible tones comprises 'events' with increasing time intervals between each 'event'. This ensures that, should the time delay between the video and audio signals be great enough for one of the video events to coincide with an audio event, this 'false' synchronisation, which would not cause the analysis engine to generate a report or alarm, will not be maintained at the next occurrence of a video and audio event.
  • This is schematically illustrated in Figure 3, in which the upper trace 32 represents the video event signal and the lower trace 34 represents the audio event signal.
  • the audio event signal has been delayed relative to the video signal by the encoding process by a period of 3 time units, for example 3 seconds, as represented by arrow A.
  • the beginning of the second video event 36 occurs at the same time as the beginning of the first audio event 38. If these are the first video and audio events detected by the analysis engine then a false report of synchronisation between the audio and video streams may be provided. However, at the beginning of the next video event 40 it can be seen that the audio stream is out of synchronisation, since the events are not evenly spaced apart and do not have a constant duration. Consequently the analysis engine is able to determine that in fact the video and audio streams are not in synchronisation. If the analysis engine detects both the beginning and end of the video and audio events then the loss of synchronisation will be detected sooner since the end of the first audio event 38 will occur before the end of the second video event 36, even though the beginning of both events coincided. In this instance the loss of synchronisation between the audio and video streams is detected by the analysis engine within one time unit, for example one second.
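The protection provided by the increasing intervals can be checked numerically. In the Figure 1 pattern the five flashes begin at units 1, 4, 9, 16 and 25 (each flash is preceded by a gap one unit longer than the last). A short sketch, assuming the 3-unit gross delay of Figure 3:

```python
# Flash start times, in time units, derived from the Figure 1 pattern:
# gaps of 1, 2, 3, 4, 5 units alternate with flashes of 1, 2, 3, 4, 5 units.
video_starts = [1, 4, 9, 16, 25]

delay = 3  # the gross audio delay illustrated in Figure 3, in time units
audio_starts = [t + delay for t in video_starts]

# The first audio event coincides with the second video event,
# producing a one-off 'false' synchronisation...
assert audio_starts[0] == video_starts[1]

# ...but because the gaps keep growing, the coincidence cannot be
# maintained at the next pair of events.
assert audio_starts[1] != video_starts[2]
```

The design choice here is simply that no two event spacings repeat, so a constant offset can align at most one pair of events.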
  • any loss in synchronisation can, at best, only be determined, and the delay measured, when an audio or video event occurs. As described above in relation to Figure 3, the time required to determine a loss of synchronisation can be multiple time units if a mismatch between video and audio events occurs due to a gross loss of synchronisation.
  • This is wasteful of system resources since each second of audio-visual data will comprise, typically, 25 frames of data. In other words, the same data is being processed 25 times a second.
  • an analysis scheme is provided that allows improved determination of the time delay between audio and video data streams. This is accomplished by providing a predetermined audio-visual test sequence to be encoded in the encoding system under test that includes audio and visual data that allows each frame to be identified.
  • the visual encoding is accomplished using a pattern of black and white squares to represent a binary code.
  • a preferred binary code is Gray code, since a well-known property of Gray code is that, when presented in sequence, only one bit of the binary word changes at a time.
  • An example of a possible sequence of black and white squares is illustrated in Figure 4, in which the sequences of squares for three consecutive frames of audio-visual data are shown.
  • the first 5-square sequence represents the Gray code 00101, which is decimal 6, and hence is used to identify the frame as frame number 6.
  • the second and third sequences represent 00100 and 01100, decimal 7 and 8, respectively.
  • the individual squares are encoded as discrete blocks or integer parts of a macro block, such as a 2x2 block of 32x32 pixels, so as to facilitate the reliable, error-free encoding of the sequence of squares, thus reliably preserving the encoded frame identification code.
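The one-bit-at-a-time property that motivates the choice of Gray code can be sketched briefly (the function names below are ours, not the patent's):

```python
def binary_to_gray(n):
    """Encode a frame count as a Gray-code word."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Decode a Gray-code word back into the frame count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Consecutive codes differ in exactly one bit, so a frame boundary can
# never produce a transient multi-bit misread of the square pattern.
for frame in range(31):
    changed = binary_to_gray(frame) ^ binary_to_gray(frame + 1)
    assert bin(changed).count("1") == 1

# Decoding a detected pattern of squares recovers the frame count.
assert gray_to_binary(binary_to_gray(13)) == 13
```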
  • the audio signal is also encoded with a timing sequence that serves to identify which frame of the video signal the particular section of audio data should be synchronised to.
  • this is accomplished by the inclusion of an audio tone that is made up of a number of separate discrete frequencies, each frequency representing one bit in the data word, in an analogous manner to each square of the video code representing a single bit of the Gray code.
  • This allows the encoded tone to be analysed using Fourier analysis techniques to determine the presence or otherwise of the individual frequency components and thus the binary code represented by the tone.
  • An example of the frequency analysis of such an encoded tone is schematically illustrated in Figure 5.
  • the horizontal axis represents the frequency of detected frequency components, in kHz, whilst the vertical axis represents the power of the component.
  • two frequency components 40 are shown at 9 kHz and 12 kHz. If the selected code is a 5-bit code with the most significant bit represented by the frequency component centred at 3 kHz and the least significant bit at 15 kHz, then the frequency spectrum shown in Figure 5 is taken to represent the binary word 00110, which is decimal 4 in Gray code. Care must be taken in the selection of the frequency components used to represent individual bits of the encoded timing word, since it is common practice for audio encoders to discard certain frequency components of a signal based on an analysis of which frequencies the human ear will and will not be able to hear. The frequency components selected for the timing word must therefore be such that they will not be discarded by such encoding techniques.
  • the audio code may be encoded as a series of short audio tones in a predetermined time interval, each tone in the series representing a bit within the timing word and the presence or not of a tone representing the binary state of the bit. So for example, a series of eight audio tones may be used to represent an 8 bit binary word. The frequency of the audio tones may be pre-selected to facilitate their detection.
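As a concrete illustration of the multi-frequency scheme, the sketch below synthesises a tone with one sine component per '1' bit and recovers the word using the Goertzel algorithm (a standard single-bin alternative to a full Fourier transform). The sample rate, bit frequencies and tone length are our assumptions, not values taken from the patent:

```python
import math

SAMPLE_RATE = 48_000
BIT_FREQS = [3_000, 6_000, 9_000, 12_000, 15_000]  # MSB ... LSB, one per bit

def encode_tone(word, n_samples=4_800):
    """Sum one sine component per '1' bit of the 5-bit timing word."""
    tone = [0.0] * n_samples
    for bit, freq in enumerate(BIT_FREQS):
        if word & (1 << (len(BIT_FREQS) - 1 - bit)):
            for i in range(n_samples):
                tone[i] += math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    return tone

def goertzel_power(samples, freq):
    """Signal power at one frequency (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def decode_tone(samples):
    """Recover the timing word by testing each bit frequency for power."""
    powers = [goertzel_power(samples, f) for f in BIT_FREQS]
    threshold = max(powers) / 2.0
    word = 0
    for p in powers:
        word = (word << 1) | (p > threshold)
    return word

assert decode_tone(encode_tone(0b00110)) == 0b00110
```

The tone length of 0.1 s makes every bit frequency an exact multiple of the analysis window, so absent components contribute essentially zero power.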
  • Figure 6 schematically illustrates an analysis engine for analysing a test audio-visual data stream of the format described above, according to embodiments of the present invention, after it has been encoded by an encoding system under test.
  • the basic components are the same as for the system illustrated in Figure 2.
  • the individual video and audio data streams 616, 618 are provided as inputs to respective time code detection units 622, 624.
  • Each time code detection unit is arranged to identify and decode the embedded video and audio time codes. Consequently in preferred embodiments the video time code detection unit 622 will be arranged to locate the sequence of coded black and white squares and determine the binary code represented by the particular sequence, thus identifying the individual frame number. The point in time at which each frame is received is also determined.
  • the audio time code detection unit 624 is preferably arranged to perform the necessary frequency analysis on the embedded audio time code to determine the present frequency components and thus the represented binary code.
  • the relevant output signals from the time code detection units are provided to a time comparison unit 626 that is arranged to determine any time delay between the audio and video data streams on the basis of the time of receipt of corresponding portions of the data streams, as identified by the relevant embedded time codes. Any time delay is provided as an input to a report and/or alert unit 628 that is arranged to determine if the time delay exceeds certain predetermined parameters that may, for example, be stored as a look up table in local data storage 630.
  • the analysis engine is capable of determining the relative positions of the separate audio and video data streams within the space of a single frame and can provide audio/video time delay information for each frame, as opposed to delay information only for each video/audio 'event' as is the case with the scheme discussed in the applicant's co-pending application. Consequently, the apparatus and method of the present invention allows improved speed of providing the delay information and improved resolution of the delay information.
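A minimal sketch of the time comparison unit's job, assuming each time code detection unit emits (frame number, time of receipt) pairs; the names and sample values are illustrative only:

```python
def frame_delays(video_stamps, audio_stamps):
    """Pair video and audio receipt times by decoded frame number and
    return the audio-minus-video delay for every common frame.
    Positive = audio lags the video; negative = audio leads."""
    audio_by_frame = dict(audio_stamps)
    return {frame: audio_by_frame[frame] - t_video
            for frame, t_video in video_stamps
            if frame in audio_by_frame}

# Hypothetical detector outputs: (frame number, receipt time in seconds).
video = [(7, 0.280), (8, 0.320), (9, 0.360)]
audio = [(7, 0.295), (8, 0.335), (9, 0.375)]
delays = frame_delays(video, audio)   # a constant 15 ms audio lag

# Alert if the audio ever leads the video by more than the 40 ms limit
# mentioned earlier in the document.
MAX_AUDIO_LEAD = 0.040
alerts = [f for f, d in delays.items() if d < -MAX_AUDIO_LEAD]
assert alerts == []
```

Because every frame carries its own time code, a delay figure is available once per frame rather than once per 'event'.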

Abstract

A method of determining the delay between an audio and visual signal, the method comprising providing a video signal having a plurality of sequential timestamps visually encoded thereon and providing an audio signal having a corresponding plurality of timestamps audibly encoded thereon, the audio and video signals being synchronised to one another, encoding the audio and video signals to generate digitally encoded audio and video data streams, analysing the encoded video and audio data streams to extract each of the audibly and visually encoded timestamps and measuring any delay between the time of receipt of corresponding video and audio timestamps.

Description

METHOD AND APPARATUS FOR MEASURING AUDIO/VIDEO SYNC DELAY
In the field of providing audio-visual content it is common to provide that content as digital data. A common requirement is to pass the digital data through one or more encoding processes, for example prior to the broadcast transmission of the digital audio-visual data, for example the broadcast of a television programme. The coding processes habitually involve data compression and the use of digital audio filters to process the audio signal. The encoding process may also typically involve multiplexing a plurality of separate data streams together. Since it will commonly be the case that the audio data will be processed differently from the video data, and each of the different stages in the encoding process can potentially introduce a time delay to the digital data signal, the overall encoding process can potentially introduce a loss of synchronisation between the audio and video data, which will be most noticeable as a loss of lip-sync in video footage of speaking characters. The human brain can perceive even quite small time delays between the video and audio data, with the circumstances in which the audio signal leads the video signal being most noticeable. For this reason, the applicable encoding and transmission standards stipulate maximum time delays between the audio and video data. For example, according to some standards the audio signal must not lead the corresponding video signal by a time delay greater than 40ms.
It is therefore advantageous to be able to determine in advance the precise amount of time delay any given encoding system is likely to introduce between the audio and video signals of an audio-visual programme.
According to a first aspect of the present invention there is provided a method of determining the delay between an audio and visual signal, the method comprising:
Providing a video signal having a plurality of sequential timestamps visually encoded thereon and providing an audio signal having a corresponding plurality of timestamps audibly encoded thereon, the audio and video signals being synchronised to one another;
Encoding the audio and video signals to generate digitally encoded audio and video data streams; Analysing the encoded video and audio data streams to extract each of the audibly and visually encoded timestamps; and
Measuring any delay between the time of receipt of corresponding video and audio timestamps.
In preferred embodiments the audio and video timestamps are encoded as a binary code, for example Gray code.
Each visually encoded timestamp preferably comprises a plurality of display segments, the colour or shade of each segment being representative of a binary state. Preferably the display segments comprise a portion of a macro block.
Each audibly encoded timestamp preferably comprises an audio tone having a plurality of predetermined frequency components, the presence of a frequency component being representative of a binary state.
Preferably each encoded time stamp comprises a frame count.
According to a second aspect of the present invention there is provided apparatus for determining the delay between a digitally encoded audio and video signal, the video signal having a plurality of sequential timestamps visually encoded thereon and the audio signal having a corresponding plurality of timestamps audibly encoded thereon, the audio and video signals being synchronised to one another, the apparatus comprising:
A video timestamp detector arranged to detect each of the timestamps encoded in the encoded video signal, decode the timestamp and provide a first time signal representative of the actual time of receipt of the video timestamp;
An audio timestamp detector arranged to detect each of the timestamps encoded in the encoded audio signal, decode the timestamp and provide a second time signal representative of the actual time of receipt of the audio timestamp; and
A timestamp comparator arranged to receive the first and second time signals and measure any delay between their time of receipt. Embodiments of the present invention will now be described below, by way of illustrative example only, with reference to the accompanying figures, of which:
Figure 1 schematically illustrates the timing and duration of audio and video events included in a possible test signal;
Figure 2 schematically illustrates a time delay analysis system for determining the time delays between the audio and video signals shown in Figure 1;
Figure 3 schematically illustrates the relative timings of an audio and video signal as shown in Figure 1 in which there is a delay between the audio and video signals;
Figure 4 schematically illustrates a method of visually encoding a time stamp according to an embodiment of the present invention;
Figure 5 schematically illustrates a method of audibly encoding a time stamp according to an embodiment of the present invention; and
Figure 6 schematically illustrates a time delay analysis system according to an embodiment of the present invention for determining the time delays between audio and video signals having time stamps encoded therein of the kind illustrated in Figures 4 & 5.
According to a time analysis scheme detailed in the applicant's co-pending patent application of the same title, any time delay between audio and video data subsequent to an encoding process having been performed on the originally available audio and video data is determined utilising a predetermined video sequence having known timing properties. The video/audio data sequence is provided in either an uncompressed data format or in a standard encoded data format, such as for example MPEG-2 video or audio. The predetermined audio/video sequence comprises a series of visible "flashes" having a predetermined duration and time interval between each flash. The sequence also comprises a corresponding number of audible tones whose duration and time interval between tones exactly corresponds to the occurrences of the visible flashes. An example of an appropriate timing diagram for the visible and audible signals is schematically illustrated in
Figure 1.
In Figure 1 the upper signal trace 2 represents the binary levels for the visible signal, with the signal either being totally black or totally white in visible appearance. The lower signal trace 4 represents the audible signal, with the upper signal level representing the production of an audible tone and the lower signal level representing the absence of a tone. As can be seen from Figure 1, after an initial time period of 1 unit, for example 1 second, during which neither a visible flash nor an audible tone is produced, a visible flash and audible tone of 1 unit duration are produced. This is followed by a further time period during which no visible flash or audible tone is produced, this second time period having a duration of 2 units. This is then followed by the production of a visible flash and audible tone having a duration of 2 units, followed by a period of no visible flash or audible tone of duration 3 units, and so on in the sequence illustrated in Figure 1. The total sequence comprises five periods during which a visible flash and audible tone are produced, each period lasting one time unit longer than the preceding period, with correspondingly increasing time periods in between during which no visible flash or audible tone is produced. In the example shown, therefore, the entire sequence lasts for a total of 30 time units, which will typically be 30 seconds. The entire sequence preferably continually repeats.
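The Figure 1 sequence can also be generated programmatically; a brief sketch producing one boolean per time unit, True while a flash and tone are on:

```python
def test_pattern():
    """One boolean per time unit: True while the flash and tone are on.
    Figure 1 pattern: 1 off, 1 on, 2 off, 2 on, ... 5 off, 5 on."""
    pattern = []
    for n in range(1, 6):
        pattern += [False] * n + [True] * n
    return pattern

signal = test_pattern()
assert len(signal) == 30   # 30 time units, typically 30 seconds
assert sum(signal) == 15   # five on-periods of 1..5 units
```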
In a preferred arrangement of this analysis scheme, the visible flash is produced in at least the macro block, or an integer multiple thereof, shown at the top left hand corner of the display screen. Preferably a 4x4 array of blocks, i.e. 32x32 pixels, is used to encode the visible flash. This location is carefully chosen since, due to the scanning method of generating a displayed image, as will be appreciated by those skilled in the art, the digital data representing this part of the display screen occurs very early in the relevant data stream and will consequently practically always be correctly encoded. The selection of the visible flash as a 32x32 pixel area also tends to ensure the correct encoding of this video data. Similarly, the use of only black and white shades for the visible flash maximises the likelihood of the video data being correctly encoded, since these are "basic" digital values unlikely to be corrupted by the encoding process. In a similar fashion, the audio tone is provided as a tone with only a single frequency component, for example at 10 kHz, or some other single frequency. Since only a single frequency component is utilised for the audio tone, it should be faithfully encoded by any audio encoder included within the encoding system under test.
Further visual data may be provided to the user, for example a larger visual representation of the visible flash, for example as a series of rotating circular segments, each segment being representative of a single time unit such that a complete sequence requires a full "revolution" through the multiple segments. It will of course be appreciated that such visual enhancements are merely for the convenience of the human operator and are not a necessary part of the present invention.
According to this analysis scheme, the predetermined audio visual sequence is passed through the encoding system under test and the encoded digital data stream is subsequently analysed. The analysis process comprises detecting one or both of the beginning and end of one of the visible flashes by detecting the point in time within the encoded data stream at which the 32x32 pixel macro block area changes from "black" to "white" or vice versa. The time at which this occurs is accurate to within the duration of 1 frame of visual data, since the display is only refreshed every frame; a typical frame rate is 25 frames per second. Concurrently, the encoded audio signal is analysed to determine one or both of the beginning and end of the audio tones. A preferred method of detecting the beginning or end of an audio tone is to detect the sharply rising or falling amplitude of the tone as each transition from "tone" to "no tone", or vice versa, occurs. The analysis process can thus determine any time delay between the video and audio "events" (an event being a rising or falling audio or video signal edge). In preferred embodiments, any determined delay that falls outside a predetermined set of parameters, such as those set by one or more transmission standards, causes an alert to be automatically generated.
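The edge-based comparison can be sketched as follows. This is a hypothetical illustration only, sampling both binary signals once per frame at 25 frames per second; the per-frame levels are invented for the example:

```python
FRAME_RATE = 25  # frames per second, so one sample every 40 ms

def edges(samples):
    """Return the frame indices at which a binary signal changes state,
    i.e. the rising and falling 'events'."""
    return [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]

# Invented per-frame levels in which the audio lags the video by 2 frames.
video_levels = [0, 0, 1, 1, 1, 0, 0]
audio_levels = [0, 0, 0, 0, 1, 1, 1]

# Delay between the first corresponding events, converted to seconds.
delay_s = (edges(audio_levels)[0] - edges(video_levels)[0]) / FRAME_RATE
```

As the text notes, the result is only accurate to within one frame duration, here 40 ms.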
A system in accordance with the applicant's co-pending analysis scheme for determining any loss of video/audio synchronisation in an encoded data stream is schematically represented in Figure 2. A predetermined video stream 10 as described above, in an un-encoded state, is stored on a data storage medium such as a hard disk 12 and is provided as an input to the encoding system 14 to be tested. The encoding system will generally output an encoded data stream that can be decomposed into separate video 16 and audio 18 streams. The video and audio streams are provided as inputs to an analysis engine 20 and input to separate video and audio event detection units 22, 24. It will be appreciated that although the video and audio streams are shown in Figure 2 as discrete inputs to the analysis engine, the decomposition of the encoded data stream provided by the encoding system 14 may equally be accomplished within the analysis engine, for example by means of a wrapper demux. Each event detection unit is arranged to detect the relevant video or audio 'events' of the encoded test data stream, these being the beginnings or ends, or both, of the visible 'flashes' and audio tones as discussed above in relation to Figure 1, and to provide an output signal indicative of when each event occurs. The output signals from the audio and video event detection units 22, 24 are provided to a time comparison unit 26 that is arranged to measure any time interval present between the output signals from the event detection units, and thus any time interval, be it lag or lead, between the occurrences of the audio and video 'events'. This time interval data is provided from the time comparison unit 26 to an output interface unit 28 that is arranged to provide the time interval data to an appropriate user interface.
Preferably the output interface unit 28 is also arranged to compare any time delay between the audio and video signals with defined maximum permitted delays that may be stored in a further data storage area 30 or may be stored internally to the output interface unit. If a detected time interval exceeds a predefined value then the output interface unit may be arranged to provide an alarm signal.
As previously mentioned with reference to Figure 1, the sequence of visible flashes and audible tones comprises 'events' with increasing time intervals between each 'event'. This ensures that, should the time delay between the video and audio signals be great enough for one of the video events to coincide with an audio event, this 'false' synchronisation, which would not cause the analysis engine to generate a report or alarm, will not be maintained at the next occurrence of a video and audio event. This is schematically illustrated in Figure 3, in which the upper trace 32 represents the video event signal and the lower trace 34 represents the audio event signal. As can be seen from Figure 3, the audio event signal has been delayed relative to the video signal by the encoding process by a time period of 3 time units, say seconds, as represented by arrow A. Consequently, the beginning of the second video event 36 occurs at the same time as the beginning of the first audio event 38. If these are the first video and audio events detected by the analysis engine then a false report of synchronisation between the audio and video streams may be provided. However, at the beginning of the next video event 40 it can be seen that the audio stream is out of synchronisation, since the events are not evenly spaced apart and do not have a constant duration. Consequently the analysis engine is able to determine that the video and audio streams are in fact not in synchronisation. If the analysis engine detects both the beginning and end of the video and audio events then the loss of synchronisation will be detected sooner, since the end of the first audio event 38 will occur before the end of the second video event 36, even though the beginnings of both events coincided. In this instance the loss of synchronisation between the audio and video streams is detected by the analysis engine within one time unit, for example one second.
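The benefit of the uneven spacing can be checked numerically. In the Figure 1 pattern the nth flash begins at t = n² time units, i.e. at t = 1, 4, 9, 16 and 25; the sketch below (illustrative only, not part of the specification) shows that a constant 3-unit encoder delay produces exactly the false match described above, which the very next pair of events exposes:

```python
# Rising-edge times, in time units, of the five flashes in the Figure 1
# pattern: flash n begins at t = n**2.
video_edges = [n * n for n in range(1, 6)]  # [1, 4, 9, 16, 25]

delay = 3  # a constant encoder delay shifts every audio edge by 3 units
audio_edges = [t + delay for t in video_edges]

# The first audio edge (t=4) coincides with the *second* video edge...
false_match = audio_edges[0] == video_edges[1]
# ...but because the intervals keep growing, the next pair of edges no
# longer lines up, so the loss of synchronisation is still detected.
next_match = audio_edges[1] == video_edges[2]
```

With evenly spaced events, by contrast, every subsequent pair would also coincide and the gross delay would go unreported.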
However, it will be appreciated that using the scheme described above any loss in synchronisation can only be determined, and the delay measured, when an audio or video event occurs. As described above in relation to Figure 3, the time required to determine a loss in synchronisation can be multiple time units if a mismatch between video and audio events occurs due to a gross loss of synchronisation. This is wasteful of system resources since each second of audio-visual data will typically comprise 25 frames of data; in other words, the same data is being processed 25 times a second. According to embodiments of the present invention, an analysis scheme is provided that allows improved determination of the time delay between audio and video data streams. This is accomplished by providing a predetermined audio-visual test sequence, to be encoded by the encoding system under test, that includes audio and visual data allowing each individual frame to be identified.
In a preferred embodiment the visual encoding is accomplished using a pattern of black and white squares to represent a binary code. A preferred binary code is Gray code, since a well known property of Gray code is that when codes are presented in sequence only one bit of the binary word changes at a time. An example of a possible sequence of black and white squares is illustrated in Figure 4, in which the sequences of squares for three consecutive frames of audio-visual data are shown. The first five-square sequence represents the Gray code 00101, which is decimal 7, and hence is used to identify the frame as frame number 7. The second and third sequences represent 00100 and 01100, decimal 8 and 9, respectively. It will be appreciated that a 5 bit word has been illustrated in Figure 4 for the purposes of clarity only, and any length of word may be selected depending on the number of frames in the test sequence. As with the scheme described above, in embodiments of the present invention the individual squares are encoded as discrete blocks or integer parts of a macroblock, such as a 2x2 block of 32x32 pixels, so as to facilitate reliable, error free encoding of the sequence of squares, thus reliably maintaining the encoded frame identification code.
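Standard binary-reflected Gray code conversion can be sketched as below. This is an illustrative helper, not the patent's implementation; the exact frame-number-to-code mapping in Figure 4 may use a different Gray variant or offset:

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by XOR-folding every right-shift of the code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# The property the text relies on: codes for consecutive frame numbers
# differ in exactly one bit.
one_bit_steps = all(
    bin(to_gray(i) ^ to_gray(i + 1)).count("1") == 1 for i in range(31)
)
```

The one-bit-per-step property is what makes the code robust for this purpose: a frame advance changes only a single square in the pattern.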
The audio signal is also encoded with a timing sequence that serves to identify which frame of the video signal the particular section of audio data should be synchronised to. In a preferred embodiment of the present invention this is accomplished by the inclusion of an audio tone that is made up of a number of separate discrete frequencies, each frequency representing one bit in the data word, in an analogous manner to each square of the video code representing a single bit of the Gray code. This allows the encoded tone to be analysed using Fourier analysis techniques to determine the presence or otherwise of the individual frequency components, and thus the binary code represented by the tone. An example of the frequency analysis of such an encoded tone is schematically illustrated in Figure 5. The horizontal axis represents the frequency of the detected frequency components, in kHz, whilst the vertical axis represents the power of each component. In the example illustrated in Figure 5 two frequency components 40 are shown, at 9 kHz and 12 kHz. If the selected code is a 5 bit code with the most significant bit represented by the frequency component centred at 3 kHz and the least significant bit at 15 kHz, then the frequency spectrum shown in Figure 5 is taken to represent the binary word 00101, or decimal 7 in Gray code. Care must be taken in selecting the frequency components used to represent individual bits of the encoded timing word, since it is common practice for audio encoders to discard certain frequency components of a signal based on an analysis of what frequencies the human ear will and will not be able to hear. The frequency components selected for the timing word must therefore be such that they will not be discarded by such encoding techniques.
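The frequency-presence test can be sketched with a single-bin DFT probe per candidate frequency. This is a minimal, stdlib-only illustration and not the patent's implementation; the sample rate, bit frequencies (spaced 3 kHz apart as in Figure 5), tone duration and detection threshold are all assumptions made for the example:

```python
import math

SAMPLE_RATE = 48000
BIT_FREQS = [3000, 6000, 9000, 12000, 15000]  # most significant bit first

def tone(freqs, duration=0.01):
    """Synthesize a tone containing the given frequency components."""
    n = int(SAMPLE_RATE * duration)
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs)
            for t in range(n)]

def magnitude(samples, freq):
    """Normalised single-bin DFT magnitude at freq."""
    re = sum(s * math.cos(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
             for t, s in enumerate(samples))
    return math.hypot(re, im) / len(samples)

def decode(samples, threshold=0.1):
    """Recover the binary word from which bit frequencies are present."""
    return [1 if magnitude(samples, f) > threshold else 0 for f in BIT_FREQS]

word = decode(tone([9000, 12000]))  # the two components shown in Figure 5
```

The chosen frequencies complete a whole number of cycles in the tone duration, so each probe bin responds only to its own component; a real analyser would additionally have to tolerate the spectral leakage and level changes introduced by a lossy audio encoder.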
In other embodiments of the present invention the audio code may be encoded as a series of short audio tones in a predetermined time interval, each tone in the series representing a bit within the timing word and the presence or not of a tone representing the binary state of the bit. So for example, a series of eight audio tones may be used to represent an 8 bit binary word. The frequency of the audio tones may be pre-selected to facilitate their detection.
Figure 6 schematically illustrates an analysis engine, according to embodiments of the present invention, for analysing a test audio-visual data stream of the format described above after it has been encoded by an encoding system under test. The basic components are the same as for the system illustrated in Figure 2. The individual video and audio data streams 616, 618 are provided as inputs to respective time code detection units 622, 624. Each time code detection unit is arranged to identify and decode the embedded video and audio time codes. Consequently, in preferred embodiments the video time code detection unit 622 will be arranged to locate the sequence of coded black and white squares and determine the binary code represented by the particular sequence, thus identifying the individual frame number. The point in time at which each frame is received is also determined. Equally, the audio time code detection unit 624 is preferably arranged to perform the necessary frequency analysis on the embedded audio time code to determine the frequency components present, and thus the represented binary code. The relevant output signals from the time code detection units are provided to a time comparison unit 626 that is arranged to determine any time delay between the audio and video data streams on the basis of the time of receipt of corresponding portions of the data streams, as identified by the relevant embedded time codes. Any time delay is provided as an input to a report and/or alert unit 628 that is arranged to determine if the time delay exceeds certain predetermined parameters that may, for example, be stored as a look-up table in local data storage 630.
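The per-frame comparison performed by units 622 to 628 can be sketched as follows. The function name, data shapes and example values are invented for illustration; real detectors would supply the decoded frame numbers and arrival times:

```python
def av_delays(video_stamps, audio_stamps, max_delay=None):
    """Pair video and audio time codes by decoded frame number and return
    {frame: audio_arrival - video_arrival} plus the frames whose delay
    exceeds max_delay (the report/alert step).  Each stamps argument is a
    list of (frame_number, arrival_time_seconds) pairs."""
    audio_by_frame = dict(audio_stamps)
    delays = {frame: audio_by_frame[frame] - t
              for frame, t in video_stamps if frame in audio_by_frame}
    alerts = sorted(f for f, d in delays.items()
                    if max_delay is not None and abs(d) > max_delay)
    return delays, alerts

# Frames 7 and 8 arrive 20 ms later on the audio stream than the video.
delays, alerts = av_delays(
    video_stamps=[(7, 0.280), (8, 0.320)],
    audio_stamps=[(7, 0.300), (8, 0.340)],
    max_delay=0.015,
)
```

Because every frame carries its own time code, a delay figure is available per frame rather than only at each sparse 'event', which is the resolution improvement claimed over the earlier scheme.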
Since each frame of the encoded audio-visual data stream is individually identified by its respective embedded time code, the analysis engine is capable of determining the relative positions of the separate audio and video data streams within the space of a single frame and can provide audio/video time delay information for each frame, as opposed to delay information only for each video/audio 'event' as is the case with the scheme discussed in the applicant's co-pending application. Consequently, the apparatus and method of the present invention allows improved speed of providing the delay information and improved resolution of the delay information.

Claims

1. A method of determining the delay between an audio and visual signal, the method comprising: providing a video signal having a plurality of sequential timestamps visually encoded thereon and providing an audio signal having a corresponding plurality of timestamps audibly encoded thereon, the audio and video signals being synchronised to one another; encoding the audio and video signals to generate digitally encoded audio and video data streams; analysing the encoded video and audio data streams to extract each of the audibly and visually encoded timestamps; and measuring any delay between the time of receipt of corresponding video and audio timestamps.
2. The method of claim 1, wherein the audio and video timestamps are encoded as a binary code.
3. The method of claim 1 or 2, wherein each visually encoded timestamp comprises a plurality of display segments, the colour or shade of each segment being representative of a binary state.
4. The method of claim 3, wherein the display segments comprise a plurality of macro blocks.
5. The method of any preceding claim, wherein each audibly encoded timestamp comprises an audio tone having a plurality of predetermined frequency components, the presence of a frequency component being representative of a binary state.
6. The method of any preceding claim, wherein each encoded timestamp comprises a frame count.
7. Apparatus for determining the delay between a digitally encoded audio and video signal, the video signal having a plurality of sequential timestamps visually encoded thereon and the audio signal having a corresponding plurality of timestamps audibly encoded thereon, the audio and video signals being synchronised to one another, the apparatus comprising: a video timestamp detector arranged to detect each of the timestamps encoded in the encoded video signal, decode the timestamp and provide a first time signal representative of the actual time of receipt of the video timestamp; an audio timestamp detector arranged to detect each of the timestamps encoded in the encoded audio signal, decode the timestamp and provide a second time signal representative of the actual time of receipt of the audio timestamp; and a timestamp comparator arranged to receive the first and second time signals and measure any delay between their time of receipt.
EP07732245A 2006-04-10 2007-03-30 Method and apparatus for measuring audio/video sync delay Withdrawn EP2005762A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0607215A GB2437123B (en) 2006-04-10 2006-04-10 Method and apparatus for measuring audio/video sync delay
PCT/GB2007/001191 WO2007116205A1 (en) 2006-04-10 2007-03-30 Method and apparatus for measuring audio/video sync delay

Publications (1)

Publication Number Publication Date
EP2005762A1 true EP2005762A1 (en) 2008-12-24

Family

ID=36539692

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07732245A Withdrawn EP2005762A1 (en) 2006-04-10 2007-03-30 Method and apparatus for measuring audio/video sync delay

Country Status (4)

Country Link
EP (1) EP2005762A1 (en)
JP (1) JP5025722B2 (en)
GB (1) GB2437123B (en)
WO (1) WO2007116205A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112601077A (en) * 2020-12-11 2021-04-02 杭州当虹科技股份有限公司 Automatic encoder delay measuring method based on audio

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
EP2263232A2 (en) * 2008-03-19 2010-12-22 Telefonaktiebolaget L M Ericsson (publ) Method and apparatus for measuring audio-video time skew and end-to-end delay
JP5160950B2 (en) * 2008-04-25 2013-03-13 株式会社タイトー Timing correction program, portable terminal, and processing timing synchronization method
US20110170537A1 (en) * 2010-01-08 2011-07-14 Marius Ungureanu One Way and Round Trip Delays Using Telephony In-Band Tones
CN103796006A (en) * 2012-10-30 2014-05-14 中兴通讯股份有限公司 System and method for labial sound synchronization test
CN106358039B (en) * 2016-09-07 2019-02-01 深圳Tcl数字技术有限公司 Sound draws synchronous detecting method and device
GB2586986B (en) * 2019-09-10 2023-05-24 Hitomi Ltd Signal variation measurement
CN112351273B (en) * 2020-11-04 2022-03-01 新华三大数据技术有限公司 Video playing quality detection method and device
CN112601078B (en) * 2020-12-11 2022-07-26 杭州当虹科技股份有限公司 Automatic encoder delay measuring method based on video

Citations (3)

Publication number Priority date Publication date Assignee Title
US4326102A (en) * 1980-02-04 1982-04-20 Msi Data Corporation Audio data transmission device with coupling device
EP0789497A2 (en) * 1996-02-12 1997-08-13 Tektronix, Inc. Progammable instrument for automatic measurement of compressed video quality
EP0888019A1 (en) * 1997-06-23 1998-12-30 Hewlett-Packard Company Method and apparatus for measuring the quality of a video transmission

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
JPH05236513A (en) * 1992-02-21 1993-09-10 Shibasoku Co Ltd Method for counting transmission delay time difference between television video signal and audio signal
US6836295B1 (en) * 1995-12-07 2004-12-28 J. Carl Cooper Audio to video timing measurement for MPEG type television systems
US6208745B1 (en) * 1997-12-30 2001-03-27 Sarnoff Corporation Method and apparatus for imbedding a watermark into a bitstream representation of a digital image sequence
GB9804071D0 (en) * 1998-02-27 1998-04-22 Ridgeway Systems And Software Audio-video telephony
US6414960B1 (en) * 1998-12-29 2002-07-02 International Business Machines Corp. Apparatus and method of in-service audio/video synchronization testing
JP2001298757A (en) * 2000-04-11 2001-10-26 Nippon Hoso Kyokai <Nhk> Video and audio delay time difference measuring device
JP2003259314A (en) * 2002-02-26 2003-09-12 Nippon Hoso Kyokai <Nhk> Video audio synchronization method and system thereof
JP2004158913A (en) * 2002-11-01 2004-06-03 Canon Inc Audiovisual processor
JP2004242130A (en) * 2003-02-07 2004-08-26 Nippon Hoso Kyokai <Nhk> Signal generating device and method for measuring video/audio transmission time difference, and signal analysis device and method therefor
KR100499037B1 (en) * 2003-07-01 2005-07-01 엘지전자 주식회사 Method and apparatus of dtv lip-sync test
CN1868213B (en) * 2003-09-02 2010-05-26 索尼株式会社 Content receiving apparatus, video/audio output timing control method, and content providing system
US20050219366A1 (en) * 2004-03-31 2005-10-06 Hollowbush Richard R Digital audio-video differential delay and channel analyzer
CN100442858C (en) * 2005-10-11 2008-12-10 华为技术有限公司 Lip synchronous method for multimedia real-time transmission in packet network and apparatus thereof

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US4326102A (en) * 1980-02-04 1982-04-20 Msi Data Corporation Audio data transmission device with coupling device
EP0789497A2 (en) * 1996-02-12 1997-08-13 Tektronix, Inc. Progammable instrument for automatic measurement of compressed video quality
EP0888019A1 (en) * 1997-06-23 1998-12-30 Hewlett-Packard Company Method and apparatus for measuring the quality of a video transmission

Non-Patent Citations (1)

Title
See also references of WO2007116205A1 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112601077A (en) * 2020-12-11 2021-04-02 杭州当虹科技股份有限公司 Automatic encoder delay measuring method based on audio
CN112601077B (en) * 2020-12-11 2022-07-26 杭州当虹科技股份有限公司 Automatic encoder delay measuring method based on audio

Also Published As

Publication number Publication date
GB2437123B (en) 2011-01-26
GB2437123A (en) 2007-10-17
JP5025722B2 (en) 2012-09-12
WO2007116205A9 (en) 2009-04-09
WO2007116205A1 (en) 2007-10-18
GB0607215D0 (en) 2006-05-17
JP2009533920A (en) 2009-09-17

Similar Documents

Publication Publication Date Title
EP2005762A1 (en) Method and apparatus for measuring audio/video sync delay
KR100499037B1 (en) Method and apparatus of dtv lip-sync test
CN101796812B (en) Lip synchronization system and method
CN105049917B (en) The method and apparatus of recording audio/video synchronized timestamp
EP3171593B1 (en) Testing system and method
US20050219366A1 (en) Digital audio-video differential delay and channel analyzer
CA2428064A1 (en) Apparatus and method for determining the programme to which a digital broadcast receiver is tuned
CN102523063A (en) Methods and apparatus to monitor audio/visual content from various sources
US10931975B2 (en) Techniques for detecting media playback errors
CN101616331A (en) A kind of method that video frame rate and audio-visual synchronization performance are tested
EP2239952A1 (en) A method and apparatus for testing a digital video broadcast display product and a method of data communication
CN112601078B (en) Automatic encoder delay measuring method based on video
EP2725578A1 (en) Loudness log for recovery of gated loudness measurements and associated analyzer
GB2437122A (en) Method and apparatus for measuring audio/video sync delay
CN113055711B (en) Audio and video synchronous detection method and detection system thereof
CN112601077B (en) Automatic encoder delay measuring method based on audio
US10097819B2 (en) Testing system, testing method, computer program product, and non-transitory computer readable data carrier
KR100966830B1 (en) Apparatus for inserting audio watermark, apparatus for detecting audio watermark and test automation system for detecting audio distortion using the same
JP5342229B2 (en) Broadcast performance acquisition system, information embedding device, information detection device, and broadcast performance acquisition method
JP5205254B2 (en) Broadcast performance acquisition system, information embedding device, information detection device, and broadcast performance acquisition method
AU2001281320A1 (en) Apparatus and method for determining the programme to which a digital broadcast receiver is tuned

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20081027

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

17Q First examination report despatched

Effective date: 20090218

R17C First examination report despatched (corrected)

Effective date: 20090323

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: TEKTRONIX, INC.

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20200103