JP5025722B2

JP5025722B2 - Audio / video synchronization delay measuring method and apparatus

Info

Publication number: JP5025722B2
Application number: JP2009504802A
Authority: JP
Inventors: ボワーズ・マシュー・アラン; グリフィス・スコット
Original assignee: テクトロニクス・インターナショナル・セールス・ゲーエムベーハー
Priority date: 2006-04-10
Filing date: 2007-03-30
Publication date: 2012-09-12
Anticipated expiration: 2027-03-30
Also published as: GB2437123A; JP2009533920A; GB2437123B; WO2007116205A9; GB0607215D0; WO2007116205A1; EP2005762A1

Description

Detailed Description of the Invention

In the field of audio-visual content it is common to provide the content as digital data. Typically, the digital data is passed through one or more encoding processes before the digital audio-visual data is broadcast, for example as a television program. The encoding process routinely involves data compression and the use of digital audio filters to process the audio signal. It also typically involves multiplexing several separate data streams together. Because the audio data is generally processed differently from the video data, and because each stage of the encoding process can introduce a time delay into the digital data signal, the overall encoding process can introduce a loss of synchronization between the audio and video data, which is most noticeable as a loss of lip sync in video images of people speaking. The human brain can perceive even very small time delays between video and audio data, and the situation in which the audio signal leads the video signal is the most noticeable. For this reason, the applicable encoding and transmission standards specify a maximum time delay between audio and video data. For example, according to some standards the audio signal must not lead the corresponding video signal by more than 40 ms.

It is therefore advantageous to be able to measure in advance the exact amount of time delay that any given encoding system may introduce between the audio and video signals of an audio-visual program.

According to a first aspect of the present invention there is provided a method of measuring a delay between audio and video signals, the method comprising:
providing a video signal having a series of visually encoded timestamps and an audio signal having a corresponding plurality of aurally encoded timestamps, the audio and video signals being synchronized with each other;
encoding the audio and video signals to generate digitally encoded audio and video data streams;
analyzing the encoded video and audio data streams to extract each of the aurally and visually encoded timestamps; and
measuring the delay between the times of reception of corresponding video and audio timestamps.

In the preferred embodiment, the audio and video timestamps are encoded as a binary code, for example a Gray code.

Each visually encoded timestamp preferably comprises a plurality of display segments, the color or shade of each segment representing a binary state. Preferably, each display segment comprises part of a macroblock.

Each aurally encoded timestamp preferably comprises an audio tone having a plurality of predetermined frequency components, the presence of each frequency component representing a binary state.

Preferably, each encoded timestamp comprises a frame count.

According to a second aspect of the present invention there is provided an apparatus for measuring the delay between digitally encoded audio and video signals, the video signal having a plurality of consecutive visually encoded timestamps and the audio signal having a corresponding plurality of aurally encoded timestamps, the audio and video signals being synchronized with each other, the apparatus comprising:
a video timestamp detector arranged to detect each of the timestamps encoded in the encoded video signal, decode the timestamp and provide a first time signal representing the actual time of reception of the video timestamp;
an audio timestamp detector arranged to detect each of the timestamps encoded in the encoded audio signal, decode the timestamp and provide a second time signal representing the actual time of the audio timestamp; and
a timestamp comparator arranged to receive the first and second time signals and measure the delay between their times of reception.

Embodiments of the present invention are described below with reference to the accompanying drawings, which are given by way of example only.

According to the timing analysis scheme detailed in the applicant's co-pending patent application of the same title, any time delay between the audio and video data following an encoding process performed on the originally available audio and video data is measured using a predetermined video sequence with known temporal characteristics. The video/audio data sequence is provided either in an uncompressed data format or in a standard encoded data format such as MPEG-2 video or audio. The predetermined audio/video sequence consists of a series of visible “flashes”, each having a predetermined duration, with a predetermined time interval between successive flashes. The sequence also comprises a corresponding number of audible tones whose durations and intervals correspond exactly to the occurrence of the visible flashes. An example of a suitable timing diagram for the visible and audible signals is shown schematically in FIG. 1.

In FIG. 1 the upper signal trace 2 shows the binary level of the visible signal, which in its visible state is either entirely black or entirely white. The lower signal trace 4 represents the audible signal, the upper signal level indicating that an audible tone is present and the lower signal level indicating its absence. As can be seen from FIG. 1, after an initial time period of one unit, for example 1 second, during which neither a visible flash nor an audible tone occurs, a visible flash and an audible tone of one unit duration are generated. This is followed by a further period with neither a visible flash nor an audible tone, this second period having a duration of two units. In the sequence shown in FIG. 1 this is followed by a visible flash and audible tone of two units duration, then by three units with no visible flash or audible tone, and so on. The overall sequence has five periods in which a visible flash and audible tone occur, each lasting one time unit longer than the previous one, with the intervening periods of no flash or tone increasing correspondingly. In the illustrated example the entire sequence therefore lasts for a total of 30 time units, typically 30 seconds. The entire sequence is preferably repeated continuously.
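
The timing pattern described for FIG. 1 can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the paragraph above; the function names and the choice of integer time units are assumptions made for the example, not part of the described scheme.

```python
from typing import List, Tuple


def build_schedule(cycles: int = 5) -> List[Tuple[int, int]]:
    """Return the (start, end) of each flash/tone event, in time units.

    Cycle n consists of n units with no flash and no tone, followed by
    n units in which the flash and the tone are both present.
    """
    events = []
    t = 0
    for n in range(1, cycles + 1):
        t += n                      # quiet period of n units
        events.append((t, t + n))   # flash and tone last n units
        t += n
    return events


def sequence_length(cycles: int = 5) -> int:
    """Total length of one pass through the sequence, in time units."""
    return 2 * sum(range(1, cycles + 1))


EVENTS = build_schedule()
TOTAL_UNITS = sequence_length()
print(EVENTS)       # [(1, 2), (4, 6), (9, 12), (16, 20), (25, 30)]
print(TOTAL_UNITS)  # 30
```

With five cycles the quiet periods and the events each add up to 15 units, which gives the 30 units quoted in the text.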

In a preferred arrangement of this analysis scheme, the visible flash occupies at least one macroblock, or an integer multiple of macroblocks, displayed in the upper left corner of the display screen. Preferably the visible flash is encoded using a 4 × 4 array of blocks, i.e. 32 × 32 pixels. As those skilled in the art will appreciate, this position is chosen carefully: because of the scanning method used to generate the displayed image, the digital data representing this part of the display screen occurs very early in the associated data stream and is therefore, in practice, always encoded correctly. Choosing a 32 × 32 pixel area for the visible flash also tends to ensure accurate encoding of the video data. Similarly, using only black and white shades for the visible flash maximizes the likelihood that the video data will be encoded correctly, since these are “basic” digital values that are unlikely to be degraded by the encoding process. In a similar manner, the audio tone is provided as a tone with a single frequency component only, for example 10 kHz or some other single frequency. Because only a single frequency component is used for the audio tone, it can be encoded faithfully by any audio encoder included in the encoding system under test.

Further visual data may be provided to the user, for example a larger visual representation of the visible flash in the form of a series of segments arranged in a rotating circle, each segment representing a single time unit, so that a complete sequence requires a complete “rotation” through all of the segments. Such visual aids are, of course, merely for the convenience of a human operator and are not an essential part of the present invention.

According to this analysis scheme, the predetermined audio-visual sequence is passed through the encoding system under test and the encoded digital data stream is subsequently analyzed. The analysis detects the start, the end, or both, of each visible flash by detecting the points in time in the encoded data stream at which the entire 32 × 32 pixel macroblock area changes from “black” to “white” or vice versa. Because the display is refreshed only once per frame, the time at which this occurs is accurate to within the duration of one frame of visual data. A typical frame rate is 25 frames per second. Likewise, the encoded audio signal is analyzed to determine the start, the end, or both, of each audio tone. A preferred method of detecting the start or end of an audio tone is to detect the sharp rise or fall in amplitude that occurs at each transition from “tone” to “no tone” or vice versa. The analysis can therefore determine any time delay between video and audio “events” (an event being a rising or falling edge of the audio or video signal). In the preferred embodiment, a warning is generated automatically for any measured delay that falls outside a predetermined set of parameters, such as those set by one or more transmission standards.
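
The edge-based event detection described above can be illustrated with a small Python sketch. It assumes, purely for the example, that each stream has already been reduced to one boolean per frame period (flash area white or black, tone present or absent); the function names and the in-order pairing of edges are hypothetical simplifications rather than the detection method of the co-pending application.

```python
from typing import List, Tuple

FRAME_RATE = 25.0  # frames per second, the typical rate quoted in the text

Edge = Tuple[float, str]


def find_edges(states: List[bool], rate: float = FRAME_RATE) -> List[Edge]:
    """Return (time in seconds, 'rise' or 'fall') for every change of state."""
    edges = []
    for i in range(1, len(states)):
        if states[i] != states[i - 1]:
            edges.append((i / rate, "rise" if states[i] else "fall"))
    return edges


def event_delays(video: List[Edge], audio: List[Edge]) -> List[float]:
    """Pair edges in order of arrival; return audio time minus video time."""
    return [a_t - v_t
            for (v_t, v_kind), (a_t, a_kind) in zip(video, audio)
            if v_kind == a_kind]


# One-second flash starting at frame 25; the tone arrives two frames later.
video_states = [False] * 25 + [True] * 25 + [False] * 25
audio_states = [False] * 27 + [True] * 25 + [False] * 23

VIDEO_EDGES = find_edges(video_states)
AUDIO_EDGES = find_edges(audio_states)
DELAYS = event_delays(VIDEO_EDGES, AUDIO_EDGES)
print(DELAYS)  # two values, each close to 0.08 s (audio late by two frames)
```

Because the states are sampled once per frame, the measured delay is only accurate to within one frame period (40 ms at 25 frames per second), which matches the resolution limit stated in the text.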

A system according to the applicant's co-pending analysis scheme for determining whether any loss of video/audio synchronization has occurred in an encoded data stream is shown schematically in FIG. 2. The predetermined video stream 10, in the unencoded state described above, is stored on a data storage medium such as a hard disk 12 and supplied as an input to the encoding system 14 under test. The encoding system generally outputs an encoded data stream that can be separated into distinct video 16 and audio 18 streams. Each of the video and audio streams is supplied as an input to the analysis engine 20 and as an input to the separate video and audio event detection units 22, 24. Although FIG. 2 shows the video and audio streams as separate inputs to the analysis engine, it will be appreciated that the separation of the encoded data stream supplied by the encoding system 14 could equally be performed within the analysis engine, for example by a wrapper demultiplexer. Each event detection unit is arranged to detect the relevant video or audio “events” in the encoded test data stream, namely the start, the end, or both, of the visible “flashes” and audio tones described above with reference to FIG. 1, and to supply an output signal indicating when each event occurred. The output signals from the audio and video event detection units 22, 24 are supplied to a time comparison unit 26. This unit is arranged to measure any time interval between the output signals of the event detection units, that is, any interval by which the occurrence of the audio “events” lags or leads that of the video “events”. The time interval data is supplied from the time comparison unit 26 to an output interface unit 28, which is arranged to supply the time interval data to a suitable user interface. Preferably, the output interface unit 28 is also arranged to compare any time delay between the audio and video signals with a predefined maximum allowable delay. This predefined maximum allowable delay may be stored in a further data storage area 30 or internally in the output interface unit. The output interface unit may be arranged to supply a warning signal when the detected time interval exceeds the predetermined value.

As described above with reference to FIG. 1, the sequence of visible flashes and audible tones comprises “events” separated by progressively longer time intervals. This ensures that a “false” synchronization, in which the time delay between the video and audio signals is long enough for one of the video events to coincide with a different audio event so that the analysis engine generates no report or warning, is not maintained for the next pair of video and audio events. This is illustrated schematically in FIG. 3, in which the upper trace 32 represents the video event signal and the lower trace 34 represents the audio event signal. As can be seen from FIG. 3, the encoding process has delayed the audio event signal relative to the video signal by three time units, for example 3 seconds, as indicated by arrow A. The start of the second video event 36 therefore occurs at the same time as the start of the first audio event 38. If these were the first video and audio events detected by the analysis engine, synchronization between the audio and video streams might be falsely reported. However, because the events are not evenly spaced, it becomes apparent at the start of the next video event 40 that the audio stream is out of synchronization. The analysis engine can therefore determine that the video and audio streams are in fact not synchronized. If the analysis engine detects both the start and the end of video and audio events, the loss of synchronization is detected sooner, because although the starts of the two events coincide, the end of the first audio event 38 occurs before the end of the second video event 36. In this case the analysis engine detects the loss of synchronization between the audio and video streams within one time unit, for example 1 second.
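
The FIG. 3 scenario can be checked numerically. The sketch below rebuilds the FIG. 1 schedule (one quiet unit, one event unit, two quiet, two event, and so on), delays the audio by three units and compares event boundaries. The helper name and variable names are assumptions made for illustration.

```python
from typing import List, Tuple


def build_schedule(cycles: int = 5) -> List[Tuple[int, int]]:
    """(start, end) of each event in time units, following the FIG. 1 pattern."""
    events, t = [], 0
    for n in range(1, cycles + 1):
        t += n
        events.append((t, t + n))
        t += n
    return events


AUDIO_DELAY = 3  # time units, as indicated by arrow A in FIG. 3

video = build_schedule()
audio = [(start + AUDIO_DELAY, end + AUDIO_DELAY) for start, end in video]

second_video = video[1]  # event 36: (4, 6)
first_audio = audio[0]   # event 38: (4, 5)

# The starts line up, which could be misread as synchronization...
STARTS_COINCIDE = first_audio[0] == second_video[0]
# ...but the ends do not, so checking both edges exposes the error
# one time unit after the coincident start.
ENDS_COINCIDE = first_audio[1] == second_video[1]
UNITS_UNTIL_END_MISMATCH = first_audio[1] - first_audio[0]
# With start edges only, the error shows at the next video event 40.
NEXT_STARTS_COINCIDE = audio[1][0] == video[2][0]

print(STARTS_COINCIDE, ENDS_COINCIDE, NEXT_STARTS_COINCIDE)  # True False False
print(UNITS_UNTIL_END_MISMATCH)                              # 1
```

The second audio event starts at unit 7 while the third video event starts at unit 9, so the apparent match cannot persist beyond one pair of events.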

However, it will be appreciated that with the scheme described above a loss of synchronization can be determined, and the delay measured, at best only when an audio or video event occurs. Moreover, as described above with reference to FIG. 3, where the overall loss of synchronization causes a “mismatch” between video and audio events, determining that synchronization has been lost can take several time units. This is wasteful of system resources, since each second of audio-visual data typically comprises 25 frames of data; that is, the same data is processed as many as 25 times per second. Embodiments of the present invention provide an improved analysis scheme for determining the time delay between audio and video data streams. This is achieved by providing a predetermined audio-visual test sequence, to be encoded by the encoding system under test, that contains audio and visual data allowing each frame to be identified.

In the preferred embodiment, the visual encoding is achieved using a pattern formed by a sequence of black and white rectangles representing a binary code. The preferred binary code is a Gray code, which has the known property that only one bit of the binary word changes at a time. An example of a possible sequence of black and white rectangles is shown in FIG. 4, which shows the rectangle sequences for three consecutive frames of audio-visual data. The first sequence of five rectangles represents the Gray code 00101, which corresponds to decimal 7, and is therefore used to identify the frame as frame number 7. The second and third sequences represent 00100 and 01100, corresponding to decimal 8 and 9 respectively. A 5-bit word is shown in FIG. 4 purely for clarity; it will be apparent that a word of any length can be chosen according to the number of frames in the test sequence. As in the scheme described above, in embodiments of the present invention the individual rectangles are encoded as whole portions of macroblocks or as separate blocks, for example 32 × 32 pixels, to facilitate reliable, error-free encoding of the rectangle sequence. The encoded frame identification code is thus reliably preserved.
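
The Gray-code frame numbering can be sketched as follows. The conversion used is the standard reflected binary Gray code. One point is an assumption: the code words quoted above (00101, 00100, 01100 for frames 7, 8 and 9) match the standard code only if frames are counted from 1, so the sketch uses a hypothetical `first_frame=1` offset; counted from 0, the same words would denote 6, 7 and 8. Which shade stands for a 1 bit is not stated in the text and is left out of the example.

```python
def to_gray(n: int) -> int:
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Inverse of to_gray: fold the shifted code words together with XOR."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def frame_to_segments(frame: int, bits: int = 5, first_frame: int = 1) -> str:
    """Bit string for one frame, one character per display segment.

    first_frame=1 is an assumption chosen to reproduce the FIG. 4 values.
    """
    return format(to_gray(frame - first_frame), f"0{bits}b")


def segments_to_frame(segments: str, first_frame: int = 1) -> int:
    """Recover the frame number from a detected segment pattern."""
    return from_gray(int(segments, 2)) + first_frame


def bits_changed(a: str, b: str) -> int:
    """Number of segments that differ between two patterns."""
    return sum(x != y for x, y in zip(a, b))


for frame in (7, 8, 9):
    print(frame, frame_to_segments(frame))
# 7 00101
# 8 00100
# 9 01100
```

Only one segment changes between consecutive frames, so a detector that samples the pattern during a frame transition is wrong by at most one frame, which is the usual motivation for choosing a Gray code here.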

The audio signal is likewise encoded with a timing sequence that identifies which frame of the video signal a particular portion of the audio data is synchronized with. In a particular embodiment of the invention this is achieved by including an audio tone made up of a number of separate, discrete frequencies. Each frequency represents one bit of the data word, in a manner analogous to the way each sequential rectangle of the video code represents a single bit of the Gray code. The encoded tone can thus be analyzed using Fourier analysis techniques to determine which individual frequency components are present or absent, and hence the binary code that the tone represents. An example of the frequency analysis of such an encoded tone is shown schematically in FIG. 5. The horizontal axis represents the frequency of the detected components in kHz, and the vertical axis represents the power of each component. In the example of FIG. 5, two frequency components 40 are shown, at 9 kHz and 15 kHz. If the chosen code is a 5-bit code in which the frequency component centered on 3 kHz represents the most significant bit and 15 kHz the least significant bit, the frequency spectrum shown in FIG. 5 represents the binary word 00101 in Gray code, i.e. decimal 7. Care is needed in choosing the frequency components that represent the individual bits of the encoded timing word, because audio encoders commonly discard certain frequency components of a signal on the basis of an analysis of which frequencies are and are not audible to the human ear. The frequency components chosen for the timing word must therefore be ones that such encoding techniques will not discard.
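
A minimal sketch of the multi-frequency tone and its Fourier-based decoding is given below, using only the Python standard library. Several values are assumptions made for the example and do not come from the text: the 48 kHz sample rate, the analysis window of one 25 fps frame (1920 samples), the evenly spaced bit frequencies between the stated 3 kHz and 15 kHz end points, and the detection threshold. The decoder evaluates a single discrete Fourier transform bin per bit frequency rather than a full FFT.

```python
import cmath
import math
from typing import List

SAMPLE_RATE = 48_000    # Hz (assumed)
FRAME_SAMPLES = 1_920   # one frame period at 25 fps and 48 kHz (assumed)
# Most significant bit first; only the 3 kHz and 15 kHz end points are stated
# in the text, the even 3 kHz spacing in between is an assumption.
BIT_FREQS = [3_000, 6_000, 9_000, 12_000, 15_000]


def encode_tone(word: str) -> List[float]:
    """Sum one unit-amplitude sine per '1' bit over a single frame period."""
    active = [f for f, bit in zip(BIT_FREQS, word) if bit == "1"]
    return [sum(math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f in active)
            for n in range(FRAME_SAMPLES)]


def component_amplitude(samples: List[float], freq: float) -> float:
    """Amplitude of one frequency component from a single DFT bin."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def decode_tone(samples: List[float], threshold: float = 0.5) -> str:
    """Rebuild the bit string from which components exceed the threshold."""
    return "".join("1" if component_amplitude(samples, f) > threshold else "0"
                   for f in BIT_FREQS)


WORD = decode_tone(encode_tone("00101"))  # components at 9 kHz and 15 kHz
print(WORD)  # 00101
```

Each assumed frequency is an exact multiple of the 25 Hz bin spacing of the window, so the components do not leak into one another. A lossy audio codec would alter the amplitudes, which is why the text stresses choosing frequencies that the encoder will not discard; the fixed threshold here is only a placeholder for a more robust decision.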

In another embodiment of the present invention, the audio code may be encoded as a series of short audio tones at predetermined time intervals, each tone in the series representing one bit of the timing word and the presence or absence of the tone representing the binary state of that bit. By way of example, a series of eight audio tones is used to represent an 8-bit binary word. The frequency of the audio tones is selected in advance to facilitate their detection.

FIG. 6 schematically illustrates an analysis engine according to an embodiment of the present invention for analyzing a test audio-visual data stream of the format described above after it has been encoded by the encoding system under test. The basic components are the same as for the system shown in FIG. 2. The separate video and audio data streams 616, 618 are supplied as inputs to respective time code detection units 622, 624. Each time code detection unit is arranged to identify and decode the embedded video or audio time codes. Thus, in the preferred embodiment, the video time code detection unit 622 is arranged to look for the encoded sequence of black and white rectangles, determine the binary code that the particular sequence represents and so identify the individual frame number. The point in time at which each frame is received is also determined. Similarly, the audio time code detection unit 624 is preferably arranged to perform the necessary frequency analysis of the embedded audio time code to determine the frequency components present and the binary code they represent. The respective output signals from the time code detection units are supplied to a time comparison unit 626. This time comparison unit is arranged to determine any time delay between the audio and video data streams on the basis of the times of reception of corresponding portions of the data streams, as identified by the respective embedded time codes. Any time delay is supplied as an input to a reporting and/or warning unit 628, which is arranged to determine whether the time delay exceeds certain predetermined parameters, which may be stored, for example, as a look-up table in local data storage 630.
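
The roles of the time comparison unit 626 and the reporting unit 628 can be sketched as below. The data layout (a mapping from decoded frame number to reception time in seconds) and the function names are assumptions for illustration. The 40 ms audio-lead limit is the example figure quoted earlier in the description; the audio-lag limit is a hypothetical placeholder, since the text gives no value for it.

```python
from typing import Dict, List

MAX_AUDIO_LEAD = 0.040  # seconds, the example limit quoted in the description
MAX_AUDIO_LAG = 0.060   # seconds, hypothetical value used only for this sketch


def per_frame_delay(video_rx: Dict[int, float],
                    audio_rx: Dict[int, float]) -> Dict[int, float]:
    """Audio reception time minus video reception time, per frame number.

    A negative value means the audio leads the video for that frame.
    """
    common = sorted(video_rx.keys() & audio_rx.keys())
    return {frame: audio_rx[frame] - video_rx[frame] for frame in common}


def out_of_tolerance(delays: Dict[int, float],
                     max_lead: float = MAX_AUDIO_LEAD,
                     max_lag: float = MAX_AUDIO_LAG) -> List[int]:
    """Frame numbers whose delay falls outside the permitted window."""
    return [frame for frame, delay in delays.items()
            if delay < -max_lead or delay > max_lag]


# Reception times keyed by the frame number decoded from each time code.
video_rx = {7: 0.280, 8: 0.320, 9: 0.360}
audio_rx = {7: 0.300, 8: 0.250, 9: 0.500}

DELAYS = per_frame_delay(video_rx, audio_rx)
FLAGGED = out_of_tolerance(DELAYS)
print(FLAGGED)  # [8, 9]
```

Because the two streams are matched by decoded frame number rather than by order of arrival, a delay of several seconds cannot be mistaken for synchronization, and a delay figure is available for every frame that carries a time code in both streams.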

Because each frame of the encoded audio-visual data stream is individually identified by its embedded time code, the analysis engine can determine the relative timing of the separate audio and video data streams within the space of a single frame, and can provide audio/video time delay information for every frame, in contrast to the scheme described in the applicant's co-pending application, which provides delay information only for each video/audio “event”. As a result, the apparatus and method of the present invention can provide delay information more quickly and with improved resolution.

FIG. 1 schematically illustrates the timing and duration of the audio and video events contained in a possible test signal.

FIG. 2 schematically illustrates a time delay analysis system for measuring the time delay between the audio and video signals shown in FIG. 1.

FIG. 3 schematically illustrates the relative timing of audio and video signals such as those shown in FIG. 1 when a delay exists between the audio and video signals.

FIG. 4 schematically illustrates a method of visually encoding a timestamp according to an embodiment of the present invention.

FIG. 5 schematically illustrates a method of audio encoding a timestamp according to an embodiment of the present invention.

FIG. 6 schematically illustrates a time delay analysis system according to an embodiment of the present invention for measuring the time delay between audio and video signals having encoded timestamps of the kind shown in FIGS. 4 and 5.

Claims

1. A method of measuring a delay between an audio signal and a video signal, comprising the steps of:
providing a video signal having a plurality of visually encoded video timestamps;
providing an audio signal that is synchronized with the video signal and has a corresponding plurality of aurally encoded audio timestamps;
encoding the audio signal and the video signal to generate a digitally encoded audio data stream and video data stream;
analyzing the audio data stream and the video data stream to extract each of the audio timestamps and the video timestamps; and
measuring the delay between the times of reception of corresponding extracted video and audio timestamps,
wherein each video timestamp comprises a plurality of display segments, the color or shade of each segment representing a binary state, so that the video timestamp provides a visual representation of a video binary code, the video binary code representing a frame number of the video signal, and
wherein each audio timestamp comprises an audio tone having a plurality of different predetermined frequency components, the presence or absence of each of the frequency components representing the binary state of a respective bit of an audio binary code, so that the audio timestamp provides an audio representation of the audio binary code, the audio binary code representing the frame number of the video signal.

2. The audio/video synchronization delay measuring method of claim 1, wherein each of the video timestamp and the audio timestamp has a different duration that sequentially increases through the sequence of the video timestamps and the audio timestamps.

3. An apparatus for measuring a delay between a digitally encoded audio signal and video signal, the video signal having a plurality of visually encoded video timestamps, the audio signal having a corresponding plurality of aurally encoded audio timestamps, and the audio signal and the video signal being synchronized with each other, the apparatus comprising:
a video timestamp detector that detects each of the video timestamps in the video signal, decodes the video timestamp and provides a first time signal representing the actual time of reception of the video timestamp;
an audio timestamp detector that detects each of the audio timestamps in the audio signal, decodes the audio timestamp and provides a second time signal representing the actual time of the audio timestamp; and
a timestamp comparator that receives the first time signal and the second time signal and measures the delay between the times of reception of the video timestamp and the audio timestamp,
wherein each video timestamp comprises a plurality of display segments, the color or shade of each segment representing a binary state, so that the video timestamp provides a visual representation of a video binary code, the video binary code representing a frame number of the video signal, and
wherein each audio timestamp comprises an audio tone having a plurality of different predetermined frequency components, the presence or absence of each of the frequency components representing the binary state of a respective bit of an audio binary code, so that the audio timestamp provides an audio representation of the audio binary code, the audio binary code representing the frame number of the video signal.

4. The audio/video synchronization delay measuring apparatus of claim 3, wherein each of the video timestamp and the audio timestamp has a different duration that sequentially increases through the sequence of the video timestamps and the audio timestamps.