JP2012244413A

JP2012244413A - Signal monitoring device and program, and signal correction device and program

Info

Publication number: JP2012244413A
Application number: JP2011112582A
Authority: JP
Inventors: Yukihiro Nishida; 幸博西田; Koichi Nishimura; 高一西村; Toshio Hata; 俊生秦
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 2011-05-19
Filing date: 2011-05-19
Publication date: 2012-12-10

Abstract

PROBLEM TO BE SOLVED: To detect the relative delay between a video signal and an audio signal when a delay arises between them.

SOLUTION: A transmitting-side device 10 comprises a signal extraction unit 11 which extracts an audio signal from a video signal in which the audio signal is multiplexed, a feature quantity calculation unit 12 which calculates an audio feature quantity a(i) from the audio signal extracted by the signal extraction unit 11, and an auxiliary data multiplexing unit 13 which multiplexes the audio feature quantity a(i) calculated by the feature quantity calculation unit 12 into the video signal in which the audio signal is multiplexed, and outputs a video signal in which both the audio signal and the audio feature quantity a(i) are multiplexed. A receiving-side device 20 comprises a signal and feature quantity extraction unit 21 which extracts the audio signal and the audio feature quantity a(i) from the video signal in which they are multiplexed, a feature quantity calculation unit 22 which calculates an audio feature quantity b(i) from the audio signal extracted by the signal and feature quantity extraction unit 21, and a relative delay calculation unit 24 which calculates a video and audio relative delay quantity 25 from the audio feature quantity a(i) extracted by the signal and feature quantity extraction unit 21 and the audio feature quantity b(i) calculated by the feature quantity calculation unit 22.

Description

The present invention relates to a signal monitoring device and program that detect a relative delay between a video signal and an audio signal, or errors, arising in the transmission process when the video and audio signals are transmitted, and to a signal correction device and program that correct the relative delay.

Audio signals are commonly multiplexed into the auxiliary data area of the SDI (Serial Digital Interface) used for television video signals and transmitted together with the video. The method of multiplexing audio into the SDI auxiliary data area is specified in, for example, ARIB standard BTA S-006. For the HDTV format used in Japan (1080/59.94/I), about 1600 audio samples (48 kHz sampling) are multiplexed per video frame period. During transmission, equipment faults or noise may introduce errors into the video or audio signal, and operational mistakes may cause unintended signals to be added or inserted. In addition, video and audio are sometimes separated, processed, and the audio then multiplexed again for transmission. Depending on the signal processing involved, this produces a delay difference between the video signal and the audio signal. When such a delay arises, the audio samples multiplexed in a given video frame period differ from the samples originally multiplexed there.
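The "about 1600 samples" figure can be checked with a short calculation. The sketch below assumes the 1080/59.94/I frame rate is exactly 30000/1001 frames per second (59.94 fields per second); the variable names are illustrative only.

```python
from fractions import Fraction

AUDIO_RATE_HZ = 48_000                    # 48 kHz audio sampling
FRAME_RATE_HZ = Fraction(30_000, 1_001)   # assumed exact rate for 1080/59.94/I

# Audio samples that fall within one video frame period.
samples_per_frame = Fraction(AUDIO_RATE_HZ) / FRAME_RATE_HZ

print(float(samples_per_frame))   # average number of samples per frame
print(samples_per_frame * 5)      # whole number of samples in five frames
```

Because the per-frame count is not an integer, the number of samples carried in each frame necessarily varies slightly from frame to frame, which is why the description speaks of "about" 1600 samples.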

ARIB technical report ARIB TR-B29 (Non-Patent Document 1) describes a method that uses video and audio feature quantities for signal fault monitoring. The transmitting side calculates video and audio feature quantities for each frame and transmits them as metadata together with the video and audio signals. At a monitoring point on the receiving side, video and audio feature quantities are calculated for each frame from the received video and audio signals and compared with the feature quantities that were calculated on the transmitting side and transmitted. If an error has occurred in the signal during transmission, the feature quantity calculated on the receiving side differs from the one calculated on the transmitting side, so comparing the two detects signal errors arising between transmission and reception.

FIG. 1 shows, for the transmitting side and the receiving side, the relationship between the video signal, the audio signal, and the per-frame video and audio feature quantities. The example shows the video signal delayed by 0.5 frames relative to the audio, with the number of audio samples per frame taken as 1600.

Patent Document 1 is an example of prior art for detecting the relative delay between a video signal and an audio signal arising in the transmission process. In Patent Document 1, the transmitting side samples time information at the reference signals of the digital video signal and the digital audio signal, attaches it to the compressed video data and compressed audio data, and transmits it. The receiving side extracts the time information attached to the received compressed video and audio data, and detects the difference between the extracted time information and time information sampled at the reference signals of the digital video and audio signals obtained by decompressing the received data. From this difference it detects, for the video signal and for the audio signal, the delay time, that is, the difference between the input time on the transmitting side and the output time on the receiving side. From the video delay time and the audio delay time it then obtains a value indicating the difference between the two (see the abstract and paragraphs 0035 to 0036).

Patent Document 1: JP 2004-104730 A

Non-Patent Document 1: ARIB Technical Report ARIB TR-B29, "Metadata for Video/Audio Signal Fault Monitoring in Broadcast Chains" [online], [retrieved April 6, 2011], Internet <http://www.arib.or.jp/english/html/overview/doc/4-TR-B29v1_0.pdf>

To detect the relative delay between a video signal and an audio signal, the temporal correspondence between the two must be known. In addition, a means of detecting that temporal correspondence from the received video and audio signals is required.

Furthermore, if a delay has arisen between the video signal and the audio signal in the transmission process, the audio multiplexed in a received video frame period differs from the audio multiplexed in that frame period on the transmitting side. Even when there is no transmission error, the audio feature quantity calculated on the receiving side therefore differs from the one calculated on the transmitting side, and the two do not match. In FIG. 1, for example, the transmitting side generates the audio feature quantity AI_T(1) for video frame 1 from audio samples 1 to 1600, whereas the receiving side generates the audio feature quantity AI_R(1) for video frame 1 from audio samples 801 to 2400. AI_T(1) and AI_R(1) do not match even if no error at all occurred during transmission.
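The mismatch in the FIG. 1 example can be reproduced with a toy calculation. This is only a sketch: the feature used here (mean absolute amplitude per frame) and the sawtooth test signal are hypothetical stand-ins, not the feature quantities defined in ARIB TR-B29.

```python
SAMPLES_PER_FRAME = 1600  # nominal per-frame sample count used in FIG. 1

def frame_feature(samples):
    """Hypothetical stand-in feature: mean absolute amplitude of one frame."""
    return sum(abs(s) for s in samples) / len(samples)

def features(audio, first_sample, n_frames):
    """Per-frame features, with frame 1 starting at `first_sample` (0-based)."""
    out = []
    for k in range(n_frames):
        start = first_sample + k * SAMPLES_PER_FRAME
        out.append(frame_feature(audio[start:start + SAMPLES_PER_FRAME]))
    return out

# Synthetic, error-free audio: a sawtooth with a period of three frames.
audio = [(t % 4800) / 4800 for t in range(8 * SAMPLES_PER_FRAME)]

ai_t = features(audio, 0, 4)    # transmitting side: frame 1 covers samples 1-1600
ai_r = features(audio, 800, 4)  # receiving side: frame 1 covers samples 801-2400

print(ai_t[0], ai_r[0])  # the two features differ although no sample was corrupted
```

The audio data is identical on both sides; only the frame boundaries have moved by half a frame, and that alone is enough to make a frame-by-frame comparison report a mismatch.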

An object of the present invention is to provide a device that detects the relative delay between a video signal and an audio signal when a delay arises between them because of processing delays caused by signal processing in the transmission process.

A further object is to provide a device that, when monitoring signal errors caused by equipment faults, transmission errors, and the like in the transmission process, detects the relative delay between the video signal and the audio signal if a delay has arisen between them, corrects for that relative delay, and then detects errors in the audio signal.

A further object is to provide a device that, based on the detected relative delay between the video signal and the audio signal, delays whichever signal is leading and thereby corrects the delay between the video signal and the audio signal.

Representative aspects of the invention disclosed in this specification are briefly summarized as follows.

(1) A signal monitoring device that receives a video signal in which an audio signal and a first audio feature quantity, which is an audio feature quantity of the audio signal, are multiplexed, the device comprising: a signal/feature quantity extraction unit that extracts the audio signal and the first audio feature quantity from the video signal; a feature quantity calculation unit that calculates a second audio feature quantity from the audio signal extracted by the signal/feature quantity extraction unit; and a relative delay calculation unit that calculates a video/audio relative delay quantity from the first audio feature quantity extracted by the signal/feature quantity extraction unit and the second audio feature quantity calculated by the feature quantity calculation unit.

(2) The signal monitoring device of (1) above, further comprising: a phase adjustment unit that, based on the video/audio relative delay quantity calculated by the relative delay calculation unit, obtains a delay-corrected first audio feature quantity and second audio feature quantity from the first audio feature quantity extracted by the signal/feature quantity extraction unit and the second audio feature quantity calculated by the feature quantity calculation unit; and an audio feature quantity comparison unit that performs error detection by comparing the delay-corrected first and second audio feature quantities obtained by the phase adjustment unit.

(3) A signal correction device comprising: the signal monitoring device of (1) above; and a video/audio signal correction unit that, based on the video/audio relative delay quantity output from the signal monitoring device, corrects the delay of either the audio signal or the video signal in the video signal in which the audio signal is multiplexed.

(4) A program that causes a computer to function as a signal monitoring device that receives a video signal in which an audio signal and a first audio feature quantity, which is an audio feature quantity of the audio signal, are multiplexed, the program causing the computer to function as: signal/feature quantity extraction means for extracting the audio signal and the first audio feature quantity from the video signal; feature quantity calculation means for calculating a second audio feature quantity from the audio signal extracted by the signal/feature quantity extraction means; and relative delay calculation means for calculating a video/audio relative delay quantity from the first audio feature quantity extracted by the signal/feature quantity extraction means and the second audio feature quantity calculated by the feature quantity calculation means.

(5) A program that causes a computer to function as a signal monitoring device that receives a video signal in which an audio signal and a first audio feature quantity, which is an audio feature quantity of the audio signal, are multiplexed, the program causing the computer to function as: signal/feature quantity extraction means for extracting the audio signal and the first audio feature quantity from the video signal; feature quantity calculation means for calculating a second audio feature quantity from the audio signal extracted by the signal/feature quantity extraction means; relative delay calculation means for calculating a video/audio relative delay quantity from the first audio feature quantity extracted by the signal/feature quantity extraction means and the second audio feature quantity calculated by the feature quantity calculation means; phase adjustment means for obtaining, based on the video/audio relative delay quantity calculated by the relative delay calculation means, a delay-corrected first audio feature quantity and second audio feature quantity from the extracted first audio feature quantity and the calculated second audio feature quantity; and audio feature quantity comparison means for performing error detection by comparing the delay-corrected first and second audio feature quantities obtained by the phase adjustment means.

(6) A program that causes a computer to function as a signal correction device that receives a video signal in which an audio signal and a first audio feature quantity, which is an audio feature quantity of the audio signal, are multiplexed, the program causing the computer to function as: signal/feature quantity extraction means for extracting the audio signal and the first audio feature quantity from the video signal; feature quantity calculation means for calculating a second audio feature quantity from the audio signal extracted by the signal/feature quantity extraction means; relative delay calculation means for calculating a video/audio relative delay quantity from the first audio feature quantity extracted by the signal/feature quantity extraction means and the second audio feature quantity calculated by the feature quantity calculation means; and video/audio signal correction means for correcting, based on the video/audio relative delay quantity output from the relative delay calculation means, the delay of either the audio signal or the video signal in the video signal in which the audio signal is multiplexed.

According to the present invention, it is possible to obtain a device that, when a video signal and an audio signal are transmitted, detects the relative delay between them when a delay arises between them because of processing delays caused by signal processing in the transmission process.

Also according to the present invention, it is possible to obtain a device that, when monitoring signal errors caused by equipment faults, transmission errors, and the like in the transmission process, detects the relative delay between the video signal and the audio signal if a delay has arisen between them, corrects the relative delay between the audio feature quantity transmitted from the transmitting side and the audio feature quantity calculated on the receiving side, compares the two, and detects errors in the audio signal.

Also according to the present invention, it is possible to provide a device that, based on the detected relative delay between the video signal and the audio signal, delays whichever signal is leading and eliminates the delay between the video signal and the audio signal.

FIG. 1 is a diagram showing the relationship between the video signal, the audio signal, and the per-frame video and audio feature quantities on the transmitting side and the receiving side.
FIG. 2 is a diagram showing the signal monitoring device of Embodiment 1 of the present invention.
FIG. 3 is a diagram showing an example of the audio feature quantities calculated by the transmitting-side device.
FIG. 4 is a diagram showing an example of the audio feature quantities calculated by the receiving-side device.
FIG. 5 is a diagram showing an example of the window function w(i).
FIG. 6 is a diagram showing an example of the detection result r(i) for a video/audio relative delay quantity of 0.
FIG. 7 is a diagram showing an example of the detection result r(i) for a video/audio relative delay quantity of 0.25 frames (video lagging, audio leading).
FIG. 8 is a diagram showing an example of the detection result r(i) for a video/audio relative delay quantity of 1.5 frames (video lagging, audio leading).
FIG. 9 is a diagram showing an example of the detection result r(i) for a video/audio relative delay quantity of 30 frames (video lagging, audio leading).
FIG. 10 is a diagram showing the signal monitoring device of Embodiment 2 of the present invention.
FIG. 11 is a diagram showing the signal correction device of Embodiment 3 of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 2 shows the configuration of the signal monitoring device of Embodiment 1 of the present invention. This embodiment concerns a signal monitoring device (a video/audio relative delay measuring device) that detects the relative delay between a video signal and an audio signal arising in the transmission process when the two signals are transmitted.

As shown in FIG. 2, the signal monitoring device consists of a receiving-side device 20, provided on the receiving side, which receives the signal from a transmitting-side device 10.

The transmitting-side device 10 comprises: a signal extraction unit 11 that extracts the audio signal from a video signal in which the audio signal is multiplexed; a feature quantity calculation unit 12 that calculates an audio feature quantity a(i) for each video frame period from the audio signal extracted by the signal extraction unit 11; and an auxiliary data multiplexing unit 13 that multiplexes the audio feature quantity a(i) calculated by the feature quantity calculation unit 12 into the video signal in which the audio signal is multiplexed, and outputs a video signal in which the audio signal and the audio feature quantity a(i) are multiplexed.

The receiving-side device 20 comprises: a signal/feature quantity extraction unit 21 that extracts the audio signal and the audio feature quantity a(i) from the video signal in which they are multiplexed; a feature quantity calculation unit 22 that calculates an audio feature quantity b(i) for each video frame period from the audio signal extracted by the signal/feature quantity extraction unit 21; a relative delay calculation unit 24 that calculates a video/audio relative delay quantity 25 from the audio feature quantity a(i) extracted by the signal/feature quantity extraction unit 21 and the audio feature quantity b(i) calculated by the feature quantity calculation unit 22; and an auxiliary data multiplexing unit 23 that multiplexes the audio feature quantity b(i) calculated by the feature quantity calculation unit 22 into the video signal in which the audio signal and the audio feature quantity a(i) are multiplexed, and outputs a video signal in which the audio signal and the audio feature quantities a(i) and b(i) are multiplexed. If no signal monitoring is performed downstream of the receiving-side device 20, the auxiliary data multiplexing unit 23 is not needed.

The video signal in which the audio signal is multiplexed is a signal in which the audio signal is multiplexed into the video signal at predetermined time intervals (for example, every video frame period). The feature quantity calculation unit 12 calculates the audio feature quantity a(i) from the audio signal for each predetermined time interval (for example, each video frame period), and the feature quantity calculation unit 22 likewise calculates the audio feature quantity b(i). When a 48 kHz audio signal is transmitted together with a 1080/59.94/I HDTV signal, the number of audio samples per frame is about 1600.

The relative delay calculation unit 24 obtains the video/audio relative delay quantity 25 by calculating the phase difference between the audio feature quantity a(i) and the audio feature quantity b(i), for example by the phase-only correlation method.

In this example, the video signal is input and output as SDI (Serial Digital Interface) on both the transmitting side and the receiving side, and the audio signal is multiplexed into the SDI as auxiliary data and transmitted. The audio feature quantities a(i) and b(i) calculated for each video frame period are multiplexed into the auxiliary data area of the next video frame. The relative delay between video and audio on the transmitting side is taken as the reference; that is, the video signal and the audio signal are assumed to have no relative delay difference on the transmitting side. The video signal is not, however, limited to SDI.

An HD-SDI signal (video + audio) 14, in which the audio signal is multiplexed in auxiliary data packets, is input to the transmitting-side device 10. The auxiliary data multiplexing unit 13 multiplexes the audio feature quantity a(i) into auxiliary data packets, and an HD-SDI signal (video + audio + audio feature quantity a(i)) 15, in which the audio signal and the audio feature quantity a(i) are multiplexed in auxiliary data packets, is sent toward the receiving side. In FIG. 2, the transmission and processing of the signal between the transmitting-side device 10 and the receiving-side device 20 are shown collectively as "transmission/processing".

The HD-SDI signal (video + audio + audio feature quantity a(i)) 26, received from the transmitting side after transmission and signal processing and carrying the audio signal and the audio feature quantity a(i) in auxiliary data packets, is input to the receiving-side device 20. The receiving-side device 20 outputs an HD-SDI signal (video + audio + audio feature quantity a(i) + audio feature quantity b(i)) 27 and the video/audio relative delay quantity 25.

The audio feature quantities and the method of multiplexing them into auxiliary data packets are specified in, for example, ARIB technical report ARIB TR-B29, "Metadata for Video/Audio Signal Fault Monitoring in Broadcast Chains" (Non-Patent Document 1). This embodiment uses the amplitude information or the phase information specified in ARIB TR-B29 as the audio feature quantity. The audio feature quantities and the method of multiplexing them into auxiliary data packets are not, however, limited to these.

The signal extraction unit 11 extracts the audio signal multiplexed in the auxiliary data area of the HD-SDI signal (video + audio) 14.

The feature quantity calculation unit 12 calculates the audio feature quantity a(i) for each video frame period, in accordance with ARIB TR-B29, from the audio signal extracted by the signal extraction unit 11.

The auxiliary data multiplexing unit 13 multiplexes the audio feature quantity a(i) calculated by the feature quantity calculation unit 12 into the HD-SDI signal (video + audio) 14 as auxiliary data, in accordance with ARIB TR-B29.

The signal/feature quantity extraction unit 21 extracts the audio signal and the audio feature quantity a(i) multiplexed in the auxiliary data area of the HD-SDI signal (video + audio + audio feature quantity a(i)) 26.

The feature quantity calculation unit 22 calculates the audio feature quantity b(i) for each video frame period, in accordance with ARIB TR-B29, from the audio signal extracted by the signal/feature quantity extraction unit 21.

The auxiliary data multiplexing unit 23 multiplexes the audio feature quantity b(i) calculated by the feature quantity calculation unit 22 into the HD-SDI signal (video + audio + audio feature quantity a(i)) 26 as auxiliary data, in accordance with ARIB TR-B29.

The relative delay calculation unit 24 calculates the video/audio relative delay quantity 25 from the audio feature quantity a(i) extracted by the signal/feature quantity extraction unit 21 and the audio feature quantity b(i) calculated by the feature quantity calculation unit 22.

If no relative delay has arisen between the video signal and the audio signal in the transmission process, the audio feature quantity b(i) calculated on the receiving side and the audio feature quantity a(i) calculated on the transmitting side are computed from the same audio samples. If a relative delay has arisen, however, the audio signal period corresponding to each video frame period is shifted in time, the audio samples used to calculate the feature quantity differ between the transmitting side and the receiving side, and the audio feature quantities no longer match. From the transmitted and received audio feature quantities a(i) and b(i), the relative delay calculation unit 24 calculates the video/audio relative delay quantity 25 using the phase-only correlation method, as follows.

Let $a(i)$ and $b(i)$ ($i = -2^{n-1}, \ldots, 2^{n-1} - 1$) be the audio feature quantity samples for the past $N$ ($= 2^n$) frames, including the current video frame, on the transmitting side and the receiving side respectively. Here $i$ denotes the video frame. The number of frames $N$ used to calculate the video/audio relative delay quantity may be set according to the relative delay to be detected, the required accuracy, and the method used to compute the discrete Fourier transform described below. $N$ is chosen as a power of 2 so that the fast Fourier transform can be applied easily.

After $a(i)$ and $b(i)$ are multiplied by the window function $w(i) = (1 + \cos(\pi i / 2^{n-1})) / 2$, they are subjected to the discrete Fourier transform to obtain $A(j)$ and $B(j)$. The discrete Fourier transform and the inverse discrete Fourier transform are denoted by $F\{\cdot\}$ and $F^{-1}\{\cdot\}$, respectively.
$$A(j) = F\{a(i)\,w(i)\}$$
$$B(j) = F\{b(i)\,w(i)\}$$
The window function is applied because the discrete Fourier transform is performed on a finite number of samples.
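
As a minimal sketch of the windowing step, the following Python evaluates w(i) over i = -2^(n-1), ..., 2^(n-1) - 1 and multiplies a feature sequence by it. The function names `raised_cosine_window` and `apply_window` are illustrative and do not appear in the specification.

```python
import math

def raised_cosine_window(n):
    """Return w(i) = (1 + cos(pi * i / 2**(n-1))) / 2 for i = -2**(n-1) .. 2**(n-1) - 1."""
    half = 2 ** (n - 1)
    return [(1.0 + math.cos(math.pi * i / half)) / 2.0 for i in range(-half, half)]

def apply_window(samples, window):
    """Multiply a feature sequence by the window, element by element."""
    if len(samples) != len(window):
        raise ValueError("sequence and window must have the same length")
    return [s * w for s, w in zip(samples, window)]

# N = 2**8 = 256 frames; list index 128 corresponds to i = 0.
w = raised_cosine_window(8)
```

The window equals 1 at i = 0 and falls to 0 at i = -2^(n-1), so the oldest frames in the block contribute least to the transform.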

From $A(j)$ and $B(j)$, the normalized cross power spectrum $R(j)$ is calculated, and the phase-only correlation function $r(i)$ is obtained as the inverse discrete Fourier transform of $R(j)$. Here $^*$ denotes the complex conjugate and $|\cdot|$ denotes the absolute value.
$$R(j) = \frac{A(j)\,B^*(j)}{|A(j)\,B^*(j)|}$$
$$r(i) = F^{-1}\{R(j)\}$$
If $a(i)$ and $b(i)$ are identical, $r(0) = 1.0$ is obtained.
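
The two equations above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: a naive O(N^2) DFT stands in for the fast Fourier transform, the small constant `eps` guarding against division by zero is an addition, and the mapping of DFT indices at or above N/2 to negative lags is an assumed convention. The inputs are taken to be already windowed.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def phase_only_correlation(a, b, eps=1e-12):
    """Return r as a dict mapping lag i (-N/2 .. N/2 - 1) to the real value r(i)."""
    n = len(a)
    A, B = dft(a), dft(b)
    cross = [x * y.conjugate() for x, y in zip(A, B)]
    R = [c / max(abs(c), eps) for c in cross]      # keep the phase, drop the magnitude
    r = dft(R, inverse=True)
    # DFT index k >= N/2 is interpreted as the negative lag k - N (assumed convention).
    return {(k if k < n // 2 else k - n): r[k].real for k in range(n)}

# Demonstration data (arbitrary): b is a circular 3-frame shift of a, b(i) = a(i + 3).
rng = random.Random(0)
a = [rng.uniform(-1.0, 1.0) for _ in range(16)]
b = [a[(i + 3) % 16] for i in range(16)]
r_same = phase_only_correlation(a, a)
r_shift = phase_only_correlation(a, b)
peak_lag = max(r_shift, key=r_shift.get)
```

With identical inputs the sketch gives r(0) = 1.0 up to rounding, in line with the statement above, and with the circularly shifted input the peak moves to lag 3.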

From $r(i)$, the maximum value $r_p$ of $r(i)$ and its position $i_p$ are detected.
$$r_p = r(i_p) \quad (0 < r_p \le 1)$$
If $r_p = 1.0$ and $i_p = 0$, it is determined that there is no delay, and the processing below is unnecessary.

Next, the second largest value $r_s$ of $r(i)$ and its position $i_s$ are detected.
$$r_s = r(i_s) \quad (0 < r_s < r_p)$$
Set $\mathrm{sign} = -1$ if the second largest value is the lower neighbor of the maximum value $r_p$ (position $i_p - 1$), $\mathrm{sign} = 1$ if it is the upper neighbor (position $i_p + 1$), and $\mathrm{sign} = 0$ if it is not adjacent to the maximum value. The phase difference is then detected by the following calculation.
$$\text{phase difference} = i_p + \mathrm{sign} \cdot \frac{(1 - r_p) + r_s}{2}$$
The sign of the phase difference indicates which of $a(i)$ and $b(i)$ leads and which lags.
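
The peak search and the phase difference formula above can be sketched in Python as follows. The function name `estimate_phase_difference` and the dictionary representation of r(i) are assumptions made for illustration. The peak and second-peak values in the demonstration are the ones quoted for FIGS. 7 to 9 later in this description; all other entries are placeholder values added only so that the dictionaries have more than two lags. Wrap-around adjacency between the two ends of the lag range is not handled.

```python
def estimate_phase_difference(r):
    """r maps lag i (in frames) to r(i); returns the phase difference in frames."""
    ranked = sorted(r, key=r.get, reverse=True)
    i_p, i_s = ranked[0], ranked[1]          # positions of the largest and second largest values
    r_p, r_s = r[i_p], r[i_s]
    if i_p == 0 and r_p >= 1.0:
        return 0.0                           # no delay, no further processing
    if i_s == i_p - 1:
        sign = -1                            # second peak is the lower neighbor
    elif i_s == i_p + 1:
        sign = 1                             # second peak is the upper neighbor
    else:
        sign = 0                             # second peak is not adjacent
    return i_p + sign * ((1.0 - r_p) + r_s) / 2.0

# Peak values from the description; the small remaining entries are placeholders.
fig7 = {-1: 0.003, 0: 0.752, 1: 0.242, 2: 0.002}
fig8 = {0: 0.010, 1: 0.533, 2: 0.482, 3: 0.010}
fig9 = {-26: 0.095, 0: 0.010, 30: 0.761}
d7 = estimate_phase_difference(fig7)
d8 = estimate_phase_difference(fig8)
d9 = estimate_phase_difference(fig9)
```

The three results are 0.245, about 1.47, and 30 frames, which agree with the values worked out for FIGS. 7 to 9.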

This embodiment uses a simple method that approximates the phase difference from the maximum and second largest values of r(i), but a more sophisticated method may be used.

By calculating the phase difference between a(i) and b(i) with the phase-only correlation method in this way, the video/audio relative delay amount that arose during transmission is obtained. The video/audio relative delay amount may, however, be obtained by a method other than the phase-only correlation method.

Next, examples of the audio feature amount, the window function, and the detection result r(i) for various video/audio relative delay amounts are described with reference to FIGS. 3 to 9. In FIGS. 3 to 9, the horizontal axis represents frames.

FIG. 3 shows an example of the audio feature amount a(i) calculated by the transmission-side device 10. Since it is assumed that there is no relative delay between the video signal and the audio signal on the transmission side, the video/audio relative delay amount is 0. The audio in-phase information (AII) defined in ARIB TR-B29 is used as the audio feature amount.

FIG. 4 shows an example of the audio feature amount b(i) calculated by the reception-side device 20 when a video/audio relative delay of 1.5 frames (about 2400 audio samples at 48 kHz) arises during transmission and processing.

FIG. 5 shows the window function w(i) for N = 256.

FIG. 6 shows r(i) when the video/audio relative delay amount is 0. Here r_p = r(0) and i_p = 0, which shows that the delay is 0.

FIG. 7 shows $r(i)$ when the video/audio relative delay amount is 0.25 frames (400 audio samples at 48 kHz; video lagging, audio leading). Here $r_p = r(0) = 0.752$, $i_p = 0$, $r_s = r(1) = 0.2429$, and $i_s = 1$. Since the second largest value is the upper neighbor of the maximum value, $\mathrm{sign} = 1$, and the phase difference is calculated as
$$\text{phase difference} = i_p + \mathrm{sign} \cdot \frac{(1 - r_p) + r_s}{2} = 0 + 1 \cdot \frac{(1 - 0.752) + 0.242}{2} = 0.245 \text{ frames}.$$
Since the phase difference is 0.245 frames, the video/audio relative delay amount is about 0.25 frames.

FIG. 8 shows $r(i)$ when the video/audio relative delay amount is 1.5 frames (video lagging, audio leading). Here $r_p = r(1) = 0.533$, $i_p = 1$, $r_s = r(2) = 0.482$, and $i_s = 2$. Since the second largest value is the upper neighbor of the maximum value, $\mathrm{sign} = 1$, and the phase difference is calculated as
$$\text{phase difference} = i_p + \mathrm{sign} \cdot \frac{(1 - r_p) + r_s}{2} = 1 + 1 \cdot \frac{(1 - 0.533) + 0.482}{2} = 1.47 \text{ frames}.$$
Since the phase difference is 1.47 frames, the video/audio relative delay amount is about 1.5 frames.

FIG. 9 shows $r(i)$ when the video/audio relative delay amount is 30 frames (video lagging, audio leading). Here $r_p = r(30) = 0.761$, $i_p = 30$, $r_s = r(-26) = 0.095$, and $i_s = -26$. Since the maximum value and the second largest value are not adjacent, $\mathrm{sign} = 0$, and the phase difference is calculated as
$$\text{phase difference} = i_p = 30 \text{ frames}.$$
Since the phase difference is 30 frames, the video/audio relative delay amount is 30 frames.

FIGS. 7 to 9 show examples in which the video lags and the audio leads. Conversely, when the video leads and the audio lags, the position $i_p$ of the maximum of $r(i)$ naturally falls on the negative side.

In this embodiment, because the audio signal is multiplexed into the video signal for transmission, the two can be treated as having no time difference on the transmission side. In addition, the audio feature amount calculated on the transmission side is transmitted as small-size data together with the video signal and the audio signal. This audio feature amount (and likewise a video feature amount) is transmitted in synchronization with the video frame, so even when the video and audio are separated and processed during transmission and the audio signal is multiplexed again, the positional relationship between the video signal and the audio feature amount in the auxiliary data area does not change. On the reception side, the phase difference can be obtained by calculating an audio feature amount from the received audio signal and applying the phase-only correlation method between it and the audio feature amount calculated and transmitted by the transmission side.

When signal monitoring is also performed downstream of the reception-side device 20, providing second, third, and further reception devices makes it possible to obtain the relative delay over any section among the transmission-side device, the reception device, and the second, third, and further reception devices.

FIG. 10 shows the configuration of a signal monitoring device according to Embodiment 2 of the present invention. This embodiment relates to a signal monitoring device (video/audio signal monitoring device) that detects errors in the video signal and the audio signal that arose during transmission. It can detect errors in the audio signal even when a relative delay has arisen between the video signal and the audio signal during transmission.

As shown in FIG. 10, the signal monitoring device consists of a reception-side device 40 provided on the reception side, which receives signals from a transmission-side device 30.

The transmission-side device 30 includes: a signal extraction unit 31 that extracts the audio signal and the video signal from a video signal in which the audio signal is multiplexed; a feature amount calculation unit 32 that calculates an audio feature amount AI_T(i) and a video feature amount VI_T(i) from the audio signal and the video signal extracted by the signal extraction unit 31; and an auxiliary data multiplexing unit 33 that multiplexes the audio feature amount AI_T(i) and the video feature amount VI_T(i) calculated by the feature amount calculation unit 32 into the video signal in which the audio signal is multiplexed, and outputs a video signal in which the audio signal, the audio feature amount AI_T(i), and the video feature amount VI_T(i) are multiplexed.

The reception-side device 40 includes: a signal/feature amount extraction unit 41 that extracts the audio signal, the video signal, the audio feature amount AI_T(i), and the video feature amount VI_T(i) from the video signal in which the audio signal, AI_T(i), and VI_T(i) are multiplexed; a feature amount calculation unit 42 that calculates an audio feature amount AI_R(i) and a video feature amount VI_R(i) from the extracted audio signal and video signal; a relative delay calculation unit 44 that calculates a video/audio relative delay amount 45 from the audio feature amount AI_T(i) extracted by the signal/feature amount extraction unit 41 and the audio feature amount AI_R(i) calculated by the feature amount calculation unit 42; a phase adjustment unit 46 that corrects the phase of AI_T(i) and a phase adjustment unit 47 that corrects the phase of AI_R(i), both based on the video/audio relative delay amount 45 calculated by the relative delay calculation unit 44; an audio feature amount comparison unit 48 that detects errors by comparing the audio feature amounts AI_T(i) and AI_R(i) corrected by the phase adjustment units 46 and 47; a video feature amount comparison unit 49 that detects errors by comparing the video feature amount VI_T(i) extracted by the signal/feature amount extraction unit 41 with the video feature amount VI_R(i) calculated by the feature amount calculation unit 42; and an auxiliary data multiplexing unit 43 that multiplexes the audio feature amount AI_R(i) and the video feature amount VI_R(i) into the video signal in which the audio signal, AI_T(i), and VI_T(i) are multiplexed, and outputs a video signal in which the audio signal, the audio feature amounts AI_T(i) and AI_R(i), and the video feature amounts VI_T(i) and VI_R(i) are multiplexed. The auxiliary data multiplexing unit 43 is not needed when no signal monitoring is performed downstream of the reception-side device 40.

The video signal in which the audio signal is multiplexed is a signal in which the audio signal is multiplexed into the video signal at every predetermined time interval (for example, every video frame period). The feature amount calculation unit 32 calculates the audio feature amount AI_T(i) and the video feature amount VI_T(i) from the audio signal and the video signal for each predetermined time interval (for example, each video frame period), and the feature amount calculation unit 42 likewise calculates the audio feature amount AI_R(i) and the video feature amount VI_R(i) for each predetermined time interval. The relative delay calculation unit 44 calculates the phase difference between the audio feature amounts AI_T(i) and AI_R(i), for example by the phase-only correlation method, to obtain the video/audio relative delay amount (phase difference) 45.

An HD-SDI signal (video + audio) 34, in which the audio signal is multiplexed in auxiliary data packets, is input to the transmission-side device 30. The transmission-side device 30 multiplexes the audio feature amount AI_T(i) and the video feature amount VI_T(i) into auxiliary data packets, and the resulting HD-SDI signal (video + audio + video feature amount VI_T(i) + audio feature amount AI_T(i)) 35 is transmitted toward the reception side. In FIG. 10, the transmission and processing of the signal between the transmission-side device 30 and the reception-side device 40 are collectively labeled "transmission/processing".

The HD-SDI signal (video + audio + video feature amount VI_T(i) + audio feature amount AI_T(i)) 50 received from the transmission side, in which the audio signal, AI_T(i), and VI_T(i) are multiplexed in auxiliary data packets, is input to the reception-side device 40, and the reception-side device 40 outputs an HD-SDI signal (video + audio + video feature amount VI_T(i) + video feature amount VI_R(i) + audio feature amount AI_T(i) + audio feature amount AI_R(i)) 51. The reception-side device 40 also displays the video/audio delay state on a display device (not shown) using the video/audio relative delay amount (phase difference) 45 output by the relative delay calculation unit 44, and displays the video state and the audio state on a display device (not shown) using the outputs of the video feature amount comparison unit 49 and the audio feature amount comparison unit 48.

In this embodiment, the video spatial information, video temporal information, audio amplitude information, and audio phase information defined in ARIB TR-B29 (Non-Patent Document 1) are used as the video and audio feature amounts. However, the video feature amounts, the audio feature amounts, and the method of multiplexing them into auxiliary data packets are not limited to these.

The signal extraction unit 31 extracts, from the HD-SDI signal (video + audio) 34, the video signal and the audio signal multiplexed in the auxiliary data area.

The feature amount calculation unit 32 calculates the video feature amount VI_T(i) and the audio feature amount AI_T(i) from the video signal and the audio signal in accordance with ARIB TR-B29.

The auxiliary data multiplexing unit 33 multiplexes the video feature amount VI_T(i) and the audio feature amount AI_T(i) into the HD-SDI signal (video + audio) 34 as auxiliary data in accordance with ARIB TR-B29.

The signal/feature amount extraction unit 41 extracts, from the HD-SDI signal (video + audio + video feature amount VI_T(i) + audio feature amount AI_T(i)) 50, the video signal together with the audio signal, the video feature amount VI_T(i), and the audio feature amount AI_T(i) multiplexed in the auxiliary data area.

The relative delay calculation unit 44 calculates the video/audio relative delay amount 45 from the audio feature amount AI_T(i) extracted by the signal/feature amount extraction unit 41 and the audio feature amount AI_R(i) calculated by the feature amount calculation unit 42.

Based on the calculated video/audio relative delay amount, the phase adjustment units 46 and 47 correct the phase of the audio feature amount AI_T(i) extracted by the signal/feature amount extraction unit 41 and of the audio feature amount AI_R(i) calculated by the feature amount calculation unit 42, respectively. In practice, the correction delays the leading sequence to match the lagging one, so only one of the phase adjustment units 46 and 47 actually applies a correction (a delay).

The audio feature amount comparison unit 48 compares, frame by frame, the audio feature amount AI_T(i) corrected by the phase adjustment unit 46 with the audio feature amount AI_R(i) corrected by the phase adjustment unit 47.

The video feature amount comparison unit 49 compares, frame by frame, the video feature amount VI_T(i) extracted by the signal/feature amount extraction unit 41 with the video feature amount VI_R(i) calculated by the feature amount calculation unit 42.

The auxiliary data multiplexing unit 43 multiplexes the video feature amount VI_R(i) and the audio feature amount AI_R(i) into the HD-SDI signal (video + audio + video feature amount VI_T(i) + audio feature amount AI_T(i)) 50 as auxiliary data in accordance with ARIB TR-B29. These VI_T(i) and AI_T(i) are transmitted in synchronization with the video frame.

On the transmission side, the transmission-side device 30 calculates the predetermined video feature amount VI_T(i) and audio feature amount AI_T(i) for each video frame period and multiplexes them in the auxiliary data area of the next frame. When a 48 kHz sampled audio signal is transmitted together with a 1080/59.94/I HDTV signal, the number of audio samples per frame is about 1600.
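
The figure of about 1600 samples per frame can be checked with a short calculation. The sketch below assumes the usual exact frame rate of 30000/1001 frames per second for 59.94-field interlaced video; the constant and function names are illustrative.

```python
from fractions import Fraction

AUDIO_RATE_HZ = 48000
# 59.94 fields/s interlaced corresponds to 29.97 frames/s, taken here as exactly 30000/1001.
FRAME_RATE_HZ = Fraction(30000, 1001)

samples_per_frame = Fraction(AUDIO_RATE_HZ) / FRAME_RATE_HZ

def frames_to_samples(frames):
    """Convert a delay in video frames to a delay in 48 kHz audio samples."""
    return float(Fraction(frames) * samples_per_frame)

one_frame = float(samples_per_frame)
delay_1_5 = frames_to_samples(1.5)
delay_0_25 = frames_to_samples(0.25)
```

Under this assumption one frame spans 1601.6 audio samples, so delays of 1.5 and 0.25 frames correspond to roughly 2400 and 400 samples, matching the figures quoted for FIGS. 4 and 7.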

On the reception side, the video feature amount VI_T(i) and the audio feature amount AI_T(i) calculated and multiplexed on the transmission side are received together with the video signal and the audio signal. The reception-side device 40 calculates, from the received video signal and audio signal, the video feature amount VI_R(i) and the audio feature amount AI_R(i) using the same definitions as those used for the feature amounts calculated and transmitted by the transmission side.

If no relative delay arises between the video signal and the audio signal during transmission, the audio feature amount AI_R(i) calculated on the reception side and the audio feature amount AI_T(i) calculated on the transmission side are computed from the same audio samples. If a relative delay does arise, however, the audio samples corresponding to each video frame period shift in time, so the audio samples used for the feature calculation differ between the transmission side and the reception side, and the audio feature amounts do not match.
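
The effect described above can be illustrated with a toy example. The per-frame feature used here, the sum of absolute sample values, is only a stand-in chosen for illustration and is not the ARIB TR-B29 feature; the frame length of 8 samples and the test signal are likewise arbitrary.

```python
def frame_features(audio, samples_per_frame, offset=0):
    """Stand-in feature: sum of absolute sample values over each complete frame.

    `offset` moves the frame boundaries relative to the audio, which models a
    relative delay between the video frames and the audio samples.
    """
    feats = []
    start = offset
    while start + samples_per_frame <= len(audio):
        frame = audio[start:start + samples_per_frame]
        feats.append(sum(abs(s) for s in frame))
        start += samples_per_frame
    return feats

audio = [(i * 7) % 13 - 6 for i in range(40)]       # arbitrary test signal
sent = frame_features(audio, 8)                      # transmission side
same = frame_features(audio, 8)                      # reception side, no relative delay
shifted = frame_features(audio, 8, offset=4)         # reception side, half-frame offset
```

Without an offset the two feature sequences are identical; with a half-frame offset each frame covers different samples and the sequences differ.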

Based on these transmitted and received audio feature amounts, the relative delay calculation unit 44 calculates the video/audio relative delay amount 45 using the phase-only correlation method.

That is, the phase difference of the audio feature amounts between transmission and reception is obtained by applying the phase-only correlation method between the audio feature amount sequence calculated from the received audio signal and the audio feature amount sequence calculated and transmitted by the transmission side. This phase difference corresponds to the relative delay between the video and the audio.

The calculation method of the relative delay calculation unit 44 in this embodiment is the same as that of the relative delay calculation unit 24 in Embodiment 1. Replacing a(i) and b(i) of Embodiment 1 with AI_T(i) and AI_R(i) gives exactly the same operations, so they are not repeated here. As in Embodiment 1, the phase-only correlation function r(i) is obtained from AI_T(i) and AI_R(i), and the phase difference (video/audio relative delay amount) is obtained from its maximum value, its second largest value, and their positions. The sign of the phase difference indicates which of AI_T(i) and AI_R(i) leads and which lags. If AI_T(i) and AI_R(i) are identical, r(0) = 1.0 is obtained.

In this way, the video/audio relative delay amount that arose during transmission is obtained.

The examples of the audio feature amount, the window function, and the detection result r(i) for various video/audio relative delay amounts described for Embodiment 1 with reference to FIGS. 3 to 9 apply equally to this embodiment.

Based on this video/audio relative delay amount 45, the phase adjustment units 46 and 47 correct the phase of the audio feature amount AI_T(i) extracted by the signal/feature amount extraction unit 41 and of the audio feature amount AI_R(i) calculated by the feature amount calculation unit 42. The processing that applies a phase difference can be expressed, for example, as follows.

If the Fourier transform of $f(t)$ is $F(\omega)$, the Fourier transform of $f(t - t_0)$, which is $f(t)$ shifted by $t_0$, is $F(\omega)\exp(-j\omega t_0)$, where $j$ is the imaginary unit. Accordingly, for the audio feature sequence $a(i)$, its Fourier transform $A(k)$ is multiplied by $\exp(-j\omega t_0)$ (where $t_0$ is the phase difference and $\omega = 2\pi k$), and the inverse Fourier transform of the product is taken.

Because the audio feature amounts are calculated frame by frame, for a delay of an integer number of frames the phase adjustment unit 46 only needs to shift the feature amount samples by the phase difference. For a delay of a fractional number of frames, interpolation as described above must be performed according to the delay amount.
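
The shift by multiplication in the frequency domain can be sketched in Python as follows. Several points are assumptions rather than statements of the specification: the sketch uses ω = 2πk/N, the usual normalization for an N-point DFT with t0 measured in frames, whereas the text only states ω = 2πk; bins at or above N/2 are treated as negative frequencies so that a real sequence stays real for fractional shifts; the shift is circular; and a naive DFT replaces the fast Fourier transform.

```python
import cmath
import math

def shift_sequence(a, t0):
    """Delay a(i) circularly by t0 frames (t0 may be fractional) via the DFT."""
    n = len(a)
    A = [sum(a[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
         for k in range(n)]
    out = []
    for i in range(n):
        acc = 0j
        for k in range(n):
            kk = k if k < n // 2 else k - n              # signed frequency index
            phase = cmath.exp(-2j * cmath.pi * kk * t0 / n)
            acc += A[k] * phase * cmath.exp(2j * cmath.pi * k * i / n)
        out.append((acc / n).real)
    return out

a = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
by_two = shift_sequence(a, 2)                            # integer shift: plain reordering
c = [math.cos(2 * math.pi * i / 8) for i in range(8)]
by_half = shift_sequence(c, 0.5)                         # fractional shift: interpolation
```

For an integer shift the result equals the input moved by whole frames, which is why a simple sample shift suffices in that case. For the half-frame shift of the cosine, the result matches the cosine evaluated at i - 0.5.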

Audio signal abnormalities are detected by computing the difference between the transmitted and received audio feature amounts using the phase-corrected values. If the difference is within a predetermined threshold, it is determined that there is no abnormality.

For the video feature amounts, video signal abnormalities are detected by computing the difference between the transmitted and received feature amounts. If the difference is within a predetermined threshold, it is determined that there is no abnormality.
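
The threshold comparison used for both the audio and the video feature amounts can be sketched as follows. The function name, the use of the absolute difference, and the sample values and threshold are assumptions for illustration; the specification states only that a difference within a predetermined threshold means no abnormality.

```python
def detect_errors(sent, received, threshold):
    """Return the frame indices whose feature difference exceeds the threshold."""
    return [i for i, (s, r) in enumerate(zip(sent, received))
            if abs(s - r) > threshold]

# Hypothetical per-frame feature values after phase correction.
sent_features = [10.0, 12.0, 9.5, 11.0]
received_features = [10.1, 12.0, 3.0, 11.2]
bad_frames = detect_errors(sent_features, received_features, threshold=0.5)
```

In this example only frame 2 differs by more than the threshold and would be reported as abnormal.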

In this embodiment, a delay-corrected audio feature amount is obtained by shifting the phase of the transmission-side or reception-side audio feature amount according to the phase difference between the transmitted and received audio feature amounts. The difference between the transmitted and received audio feature amounts is then computed from the delay-corrected values. This makes it possible to detect audio signal errors caused by faults or noise during transmission even when a video/audio relative delay has arisen.

When signal monitoring is also performed downstream of the reception-side device 40, providing second, third, and further reception devices makes relative delay detection and error detection possible over any section among the transmission-side device, the reception device, and the second, third, and further reception devices.

In Embodiment 2, both the audio and video feature amounts are multiplexed. However, Embodiment 1, which multiplexes only the audio feature amount, may also be provided with the phase adjustment units 46 and 47 and the audio feature amount comparison unit 48 as in Embodiment 2 so as to detect audio signal abnormalities.

Likewise, although Embodiment 1 multiplexes only the audio feature amount, both the audio and video feature amounts may be multiplexed as in Embodiment 2 and the video feature amount comparison unit 49 provided so as to detect video signal abnormalities.

FIG. 11 shows the configuration of a signal correction device according to Embodiment 3 of the present invention. Even when a relative delay arises between the video signal and the audio signal during transmission, this device detects the relative delay between the two signals and eliminates it by delaying the leading signal by the detected amount.

As shown in FIG. 11, the signal correction device consists of a reception-side device 60 that receives a signal in which a video signal, an audio feature amount synchronized with the video frame, and an audio signal are multiplexed. Its configuration is that of the reception-side device 20 of the signal monitoring device of Embodiment 1 with the auxiliary data multiplexing unit 23 removed and a video/audio signal correction unit 66 added.

The reception-side device 60 includes: a signal/feature amount extraction unit 61 that extracts the audio signal and the audio feature amount a(i) from the video signal in which the audio signal and the audio feature amount a(i) are multiplexed; a feature amount calculation unit 62 that calculates the audio feature amount b(i) for each video frame period from the audio signal extracted by the signal/feature amount extraction unit 61; a relative delay calculation unit 64 that calculates a video/audio relative delay amount 65 from the audio feature amount a(i) extracted by the signal/feature amount extraction unit 61 and the audio feature amount b(i) calculated by the feature amount calculation unit 62; and a video/audio signal correction unit 66 that delays the video signal or the audio signal by the calculated relative delay.

The operation up to the calculation of the video/audio relative delay amount 65 is the same as in Embodiment 1.

The video/audio signal correction unit 66 has a buffer sized for the expected delay amount so that the [video + audio (auxiliary data area)] signal can be held, and by accessing the auxiliary data area it replaces the audio signal, that is, it adjusts the delay in the lagging or leading direction according to the video/audio relative delay amount. If the video lags and the audio leads, the audio signal (sampled data) multiplexed in the auxiliary data area is delayed by the calculated delay amount. If the video leads and the audio lags, the audio signal in the auxiliary data area is moved onto the video signal that has been delayed by the calculated delay amount.
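
The choice of which signal to delay, and the buffering itself, can be sketched as follows. This is a simplified illustration under stated assumptions: a positive phase difference is taken to mean video lagging and audio leading, as in FIGS. 7 to 9; the delay is expressed in audio sample periods using 1601.6 samples per frame (48 kHz audio with 29.97 frames per second); and the names `plan_correction` and `DelayLine` are hypothetical. Handling of the HD-SDI auxiliary data area is not modeled.

```python
from collections import deque

def plan_correction(phase_diff_frames, samples_per_frame=1601.6):
    """Return (stream_to_delay, delay_in_audio_samples) for a measured phase difference."""
    n = round(abs(phase_diff_frames) * samples_per_frame)
    if phase_diff_frames > 0:
        return ("audio", n)      # video lagging, audio leading: hold the audio back
    if phase_diff_frames < 0:
        return ("video", n)      # video leading, audio lagging: hold the video back
    return (None, 0)

class DelayLine:
    """FIFO buffer that returns each pushed item `delay` pushes later."""
    def __init__(self, delay, fill=0):
        self.buf = deque([fill] * delay)

    def push(self, item):
        self.buf.append(item)
        return self.buf.popleft()

line = DelayLine(2)
delayed = [line.push(x) for x in [1, 2, 3, 4]]
```

With a delay of 2, the first two outputs are the fill value and the input then reappears two positions later, which is the behavior the buffer in the correction unit needs for the leading signal.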

In this way, the video/audio signal correction unit 66 can output a video signal and an audio signal in which the relative delay between video and audio introduced during transmission has been corrected. If necessary, a device corresponding to the transmission-side device 10 of the first embodiment may be provided at a subsequent stage to recalculate the audio feature amount for the audio signal and add it to the next video frame.

The signal monitoring devices of the first and second embodiments and the signal correction device of the third embodiment can be implemented with a computer and a program, except for the HD-SDI interface unit. Part or all of the program may also be implemented in hardware.

The invention made by the present inventors has been described above in concrete terms based on the embodiments. The present invention is, of course, not limited to these embodiments and can be modified in various ways without departing from its gist.

DESCRIPTION OF SYMBOLS
10: transmission-side device
11: signal extraction unit
12: feature amount calculation unit
13: auxiliary data multiplexing unit
14: HD-SDI (video + audio) input to the transmission-side device 10
15: HD-SDI (video + audio + audio feature amount a(i)) output from the transmission-side device 10
20: reception-side device
21: signal/feature amount extraction unit
22: feature amount calculation unit
23: auxiliary data multiplexing unit
24: relative delay calculation unit
25: video/audio relative delay amount
26: HD-SDI (video + audio + audio feature amount a(i)) input to the reception-side device 20
27: HD-SDI (video + audio + audio feature amount a(i) + audio feature amount b(i)) output from the reception-side device 20
30: transmission-side device
31: signal extraction unit
32: feature amount calculation unit
33: auxiliary data multiplexing unit
34: HD-SDI (video + audio) input to the transmission-side device 30
35: HD-SDI (video + audio + video feature amount VI_T(i) + audio feature amount AI_T(i)) output from the transmission-side device 30
40: reception-side device
41: signal/feature amount extraction unit
42: feature amount calculation unit
43: auxiliary data multiplexing unit
44: relative delay calculation unit
45: video/audio relative delay amount
46, 47: phase adjustment units
48: audio feature amount comparison unit
49: video feature amount comparison unit
50: HD-SDI (video + audio + video feature amount VI_T(i) + audio feature amount AI_T(i)) input to the reception-side device 40
51: HD-SDI (video + audio + video feature amount VI_T(i) + video feature amount VI_R(i) + audio feature amount AI_T(i) + audio feature amount AI_R(i)) output from the reception-side device 40
60: reception-side device
61: signal/feature amount extraction unit
62: feature amount calculation unit
64: relative delay calculation unit
65: video/audio relative delay amount
66: video/audio signal correction unit

Claims

A signal monitoring device that receives a video signal in which an audio signal and a first audio feature amount, which is an audio feature amount of the audio signal, are multiplexed, the signal monitoring device comprising:
a signal/feature amount extraction unit that extracts the audio signal and the first audio feature amount from the video signal;
a feature amount calculation unit that calculates a second audio feature amount from the audio signal extracted by the signal/feature amount extraction unit; and
a relative delay calculation unit that calculates a video/audio relative delay amount from the first audio feature amount extracted by the signal/feature amount extraction unit and the second audio feature amount calculated by the feature amount calculation unit.

The signal monitoring device according to claim 1, further comprising:
a phase adjustment unit that, based on the video/audio relative delay amount calculated by the relative delay calculation unit, obtains a delay-corrected first audio feature amount and second audio feature amount from the first audio feature amount extracted by the signal/feature amount extraction unit and the second audio feature amount calculated by the feature amount calculation unit; and
an audio feature amount comparison unit that performs error detection by comparing the delay-corrected first audio feature amount and second audio feature amount obtained by the phase adjustment unit.

A signal correction device comprising:
the signal monitoring device according to claim 1; and
a video/audio signal correction unit that corrects the delay of either the audio signal or the video signal, in the video signal in which the audio signal is multiplexed, based on the video/audio relative delay amount output from the signal monitoring device.

A program that causes a computer, serving as a signal monitoring device that receives a video signal in which an audio signal and a first audio feature amount, which is an audio feature amount of the audio signal, are multiplexed, to function as:
signal/feature amount extraction means for extracting the audio signal and the first audio feature amount from the video signal;
feature amount calculation means for calculating a second audio feature amount from the audio signal extracted by the signal/feature amount extraction means; and
relative delay calculation means for calculating a video/audio relative delay amount from the first audio feature amount extracted by the signal/feature amount extraction means and the second audio feature amount calculated by the feature amount calculation means.

A program that causes a computer, serving as a signal monitoring device that receives a video signal in which an audio signal and a first audio feature amount, which is an audio feature amount of the audio signal, are multiplexed, to function as:
signal/feature amount extraction means for extracting the audio signal and the first audio feature amount from the video signal;
feature amount calculation means for calculating a second audio feature amount from the audio signal extracted by the signal/feature amount extraction means;
relative delay calculation means for calculating a video/audio relative delay amount from the first audio feature amount extracted by the signal/feature amount extraction means and the second audio feature amount calculated by the feature amount calculation means;
phase adjustment means for obtaining, based on the video/audio relative delay amount calculated by the relative delay calculation means, a delay-corrected first audio feature amount and second audio feature amount from the first audio feature amount extracted by the signal/feature amount extraction means and the second audio feature amount calculated by the feature amount calculation means; and
audio feature amount comparison means for performing error detection by comparing the delay-corrected first audio feature amount and second audio feature amount obtained by the phase adjustment means.

A program that causes a computer, serving as a signal correction device that receives a video signal in which an audio signal and a first audio feature amount, which is an audio feature amount of the audio signal, are multiplexed, to function as:
signal/feature amount extraction means for extracting the audio signal and the first audio feature amount from the video signal;
feature amount calculation means for calculating a second audio feature amount from the audio signal extracted by the signal/feature amount extraction means;
relative delay calculation means for calculating a video/audio relative delay amount from the first audio feature amount extracted by the signal/feature amount extraction means and the second audio feature amount calculated by the feature amount calculation means; and
video/audio signal correction means for correcting the delay of either the audio signal or the video signal, in the video signal in which the audio signal is multiplexed, based on the video/audio relative delay amount output from the relative delay calculation means.