JP2005136675A - Video/voice transmitting device and video/voice receiving device

Info

Publication number: JP2005136675A
Application number: JP2003370179A
Authority: JP
Inventors: Noritaka Kishida; 教敬岸田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2003-10-30
Filing date: 2003-10-30
Publication date: 2005-05-26

Abstract

PROBLEM TO BE SOLVED: To provide a video/voice transmitting device and a video/voice receiving device that achieve AV synchronization of video and voice transmitted and received over a network by a simple method, without using TS (transport stream) time stamps.

SOLUTION: The time stamp value added by the RTP header adding units 13 and 23 is supplied by a 32-bit counter 16. AV synchronization can thus be achieved with the RFC 2250 time stamp, and because the TS time stamp is no longer needed, the device configuration can be made simple and inexpensive.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a video/audio transmission device and a video/audio reception device that transmit and receive video and audio over a network.

In conventional video/audio transmission and reception devices, system multiplexing of video and audio is the common way to transmit and receive both simultaneously in real time. For example, according to Patent Document 1, when video and audio that are input separately and asynchronously are system-multiplexed, the encoding device multiplexes them in the TS (transport stream) format defined by the MPEG-2 standard, and a time stamp called PCR (program clock reference) is superimposed on the TS stream and transmitted as time information for real-time playback by the decoding device. This PCR is normally used to align the lip sync (AV synchronization) of video and audio.

Furthermore, when this TS data is transmitted and received over a network such as IP, according to Non-Patent Document 1 for example, the TS stream carrying the above time stamp is sent with an RTP header that contains a separate time stamp. This second time stamp is normally used to measure network jitter and to maintain the real-time property of data transmission.
Patent Document 1: Japanese Patent Laid-Open No. 2001-78195 (pages 5-13, FIG. 1)
Non-Patent Document 1: RFC 2250, RTP Payload Format for MPEG1/MPEG2 Video

To transmit and receive video and audio over a network in real time while maintaining AV synchronization, conventional video/audio transmission and reception devices need two types of time stamps that serve separate purposes: the TS time stamp and the RFC 2250 time stamp. Adding both types of time stamps makes the device configuration complicated and expensive.

The present invention has been made to solve the above problem. Its object is to provide a video/audio transmission device and a video/audio reception device that can achieve AV synchronization between video and audio transmitted and received over a network by a simple method, without using the TS time stamp.

The video/audio transmission device according to the present invention comprises: video encoding means that encodes a video signal and outputs video encoded data; audio encoding means that encodes an audio signal and outputs audio encoded data; video header adding means that adds a network video header to the video encoded data; audio header adding means that adds a network audio header to the audio encoded data; network transmission means that receives, from the video header adding means and the audio header adding means, the video encoded data and/or audio encoded data with the video header and audio header added, and outputs them onto a network; and a common counter that provides the time stamp value to be inserted into the video header and the audio header.

According to the present invention, the common counter provides the time stamp value inserted into both the video header and the audio header, so AV synchronization between the video signal and the audio signal can be achieved using the RTP time stamp value.

Embodiment 1
An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the video/audio transmission device according to Embodiment 1 of the present invention, and FIG. 2 is a block diagram of a video transmission system using the video/audio transmission device shown in FIG. 1. In the figures, 1 and 3 are video signals, 2 and 4 are audio signals, 5 is network transmission data, 10 is the video-side A/D converter (hereinafter, video A/D), 20 is the audio-side A/D converter (hereinafter, audio A/D), 11 is a video encoding unit, 21 is an audio encoding unit, 12 and 22 are Ethernet (R)/IP/UDP header adding units, 13 and 23 are RTP header adding units, 14 is a 27 MHz oscillator, 15 is a divide-by-300 frequency divider, and 16 and 26 are 32-bit counters. Further, 27 is a switching unit, 30 is a network transmission unit, 40 is a CPU, 100 and 100-1 to 100-3 are video/audio transmission devices, 200 is a network transmission path, 300-1 to 300-3 are video/audio reception devices, 310 is a monitor, and 320 is a speaker.

FIG. 3 is a block diagram of the network transmission data 5 shown in FIG. 1. It consists of an Ethernet (R) header first, an IP header second, a UDP header third, an RTP header fourth, and finally the payload, which is video encoded data or audio encoded data. The video encoded data or audio encoded data output from the video encoding unit 11 or the audio encoding unit 21 is divided into units of a suitable size, for example 2048 bytes. The Ethernet (R) header, IP header, and UDP header are added by the Ethernet (R)/IP/UDP header adding unit 12 or 22, and the RTP header is added by the RTP header adding unit 13 or 23. The network transmission unit 30 assigns a destination to the video encoded data or audio encoded data with headers received from the RTP header adding units 13 and 23, and sends it packet by packet to the network transmission path 200 as network transmission data 5.
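As a rough illustration of the packet layout in FIG. 3, the following Python sketch splits encoded data into 2048-byte units and prepends a 12-byte fixed RTP header that carries the 32-bit time stamp. The function names and the example payload type and SSRC values are assumptions for illustration and do not come from the patent. The Ethernet, IP, and UDP headers are omitted here because a typical software implementation would leave them to the operating system's network stack.

```python
import struct

RTP_VERSION = 2
CHUNK_SIZE = 2048  # example unit size mentioned in the description


def split_payload(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Divide encoded data into fixed-size units; the last unit may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_rtp_packet(payload: bytes, payload_type: int, seq: int,
                     timestamp: int, ssrc: int) -> bytes:
    """Prepend a 12-byte fixed RTP header (no CSRC list, no extension)."""
    byte0 = RTP_VERSION << 6        # V=2, P=0, X=0, CC=0
    byte1 = payload_type & 0x7F     # M=0, 7-bit payload type (PT)
    header = struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,               # 16-bit sequence number
        timestamp & 0xFFFFFFFF,     # 32-bit time stamp from the counter
        ssrc & 0xFFFFFFFF,          # 32-bit source identifier (hypothetical value)
    )
    return header + payload
```

For example, `build_rtp_packet(chunk, 32, seq, counter_value, 0x1234)` would wrap one unit of video data; the value 32 is used here only as an example payload type.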

Next, the operation is described, starting with the case where only a video signal is transmitted. In FIG. 2, the video signal 1 output from the camera 110 is input to the video/audio transmission device 100-1. The video/audio transmission device 100-1 receives no audio input and transmits video only. In this case the switching unit 27 is not used, so its position does not matter.

The video signal 1 is digitized by the video A/D 10. The digitized video signal is compressed by the video encoding unit 11 into video encoded data such as MPEG. Because there is no system multiplexing with audio in this case, no TS time stamp is added. For IP transmission, the Ethernet (R)/IP/UDP header adding unit 12 adds an Ethernet (R) header, an IP header, and a UDP header to the compressed video encoded data. The RTP header adding unit 13 then adds an RTP header, and the network transmission unit 30 sets the destination to the video/audio reception device 300-1 and sends the result to the network transmission path 200 as network transmission data 5.

The video A/D 10 and the video encoding unit 11 operate on the 27 MHz clock supplied by the 27 MHz oscillator 14. This clock is used as the sampling clock of the video A/D 10 and also serves as the operating clock of the video encoding unit 11.

This 27 MHz clock is also divided down to a 90 kHz clock by the divide-by-300 frequency divider 15. The divided clock is supplied to the 32-bit counter 16, which counts the value that is supplied as the time stamp value described below.
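The divider and counter arithmetic can be checked with a small Python model. This is only a sketch: the class and function names are hypothetical, and the hardware counter is modeled as an integer masked to 32 bits. It also shows a consequence the text does not spell out: at 90 kHz a 32-bit counter wraps around after about 47,722 seconds, roughly 13 hours, so a receiver comparing time stamps has to tolerate wraparound.

```python
OSC_HZ = 27_000_000            # 27 MHz oscillator 14
DIVIDER = 300                  # frequency divider 15
TICK_HZ = OSC_HZ // DIVIDER    # clock fed to counter 16


class Counter32:
    """Software model of a free-running 32-bit counter."""

    MASK = 0xFFFFFFFF

    def __init__(self, start: int = 0) -> None:
        self.value = start & self.MASK

    def tick(self, n: int = 1) -> int:
        """Advance by n clock ticks, wrapping at 2**32, and return the new value."""
        self.value = (self.value + n) & self.MASK
        return self.value


def wrap_period_seconds(tick_hz: int) -> float:
    """Time until a 32-bit counter clocked at tick_hz wraps around."""
    return 2**32 / tick_hz
```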

As shown in FIG. 3, the RTP header contains a 32-bit time stamp (TimeStamp) field. The counter value from the 32-bit counter 16 is written into this field. That is, the count value at 90 kHz, divided down from the 27 MHz reference frequency of video encoding, is used as the time stamp value inserted by the RTP header adding unit 13.

The video encoded data sent as network transmission data 5 is received by the video/audio reception device 300-1 through the network transmission path 200. The video/audio reception devices 300-1 to 300-3 detect the frequency of the network transmission data 5, extract packets according to the detected frequency, and separate and decode the video encoded data or audio encoded data according to the payload type in the RTP header (PT in FIG. 3). That is, the video/audio reception device 300-1 detects the frequency of the incoming network transmission data 5 as 90 kHz, takes in the network transmission data 5 at 90 kHz, separates and decodes the video encoded data carried as payload, and outputs the resulting video signal 1 to the monitor 310.
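The separation by payload type could look like the following Python sketch on the receiving side. It is an illustration under assumptions, not the receiver described in the patent: the function names are hypothetical, the payload type values are example choices, only the 12-byte fixed RTP header is handled, and CSRC entries and header extensions are ignored.

```python
import struct

VIDEO_PT = 32   # example payload type used for video in this sketch
AUDIO_PT = 0    # example payload type used for audio in this sketch


def parse_rtp(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header and return its fields and the payload."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }


def route(packet: bytes, video_queue: list, audio_queue: list) -> bool:
    """Append the parsed packet to the queue matching its PT; drop unknown PTs."""
    info = parse_rtp(packet)
    if info["payload_type"] == VIDEO_PT:
        video_queue.append(info)
        return True
    if info["payload_type"] == AUDIO_PT:
        audio_queue.append(info)
        return True
    return False
```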

Next, the case where only an audio signal is transmitted is described. The audio signal 2 output from the microphone 120 is input to the video/audio transmission device 100-2. The video/audio transmission device 100-2 receives no video input and transmits audio only. In this case, the CPU 40 switches the switching unit 27 to the 32-bit counter 26 side.

The audio signal 2 is digitized by the audio A/D 20. The digitized audio signal is compressed by the audio encoding unit 21 into audio encoded data such as MPEG or μ-law. Because there is no system multiplexing with video in this case, no TS time stamp is added. For IP transmission, the Ethernet (R)/IP/UDP header adding unit 22 adds an Ethernet (R) header, an IP header, and a UDP header to the compressed audio encoded data. The RTP header adding unit 23 then adds an RTP header, and the network transmission unit 30 sets the destination to the video/audio reception device 300-2 and sends the result to the network transmission path 200 as network transmission data 5.

The audio A/D 20 and the audio encoding unit 21 operate on the 8 kHz clock supplied by the 8 kHz oscillator 24. This clock is used as the sampling clock of the audio A/D 20 and also serves as the operating clock of the audio encoding unit 21.

This 8 kHz clock is also supplied to the 32-bit counter 26, which counts the value that is supplied as the time stamp value.

For audio as well, the RTP header contains a 32-bit time stamp field. The counter value of the 32-bit counter 26 is written into this field. That is, the count value at 8 kHz, the reference frequency of audio encoding, is used as the time stamp value inserted by the RTP header adding unit 23 and is transmitted as part of the network transmission data 5.

The audio encoded data sent as network transmission data 5 is received by the video/audio reception device 300-2 through the network transmission path 200. The video/audio reception device 300-2 detects the frequency of the incoming network transmission data 5 as 8 kHz, takes in the network transmission data 5 at 8 kHz, separates and decodes the audio encoded data carried as payload, and outputs the resulting audio signal 2 to the speaker 320.

Next, the case where video and audio are transmitted simultaneously is described. As shown in FIG. 2, the camera 110 and the microphone 120 are connected to the video/audio transmission device 100-3. The video signal 3 output from the camera 110 and the audio signal 4 output from the microphone 120 are input to the video/audio transmission device 100-3. In this case, the CPU 40 switches the switching unit 27 to the 32-bit counter 16 side.

The video signal 3 is digitized by the video A/D 10. The digitized video signal is compressed by the video encoding unit 11 into video encoded data such as MPEG. For IP transmission, the Ethernet (R)/IP/UDP header adding unit 12 adds an Ethernet (R) header, an IP header, and a UDP header to the compressed video encoded data. The RTP header adding unit 13 then adds an RTP header.

The audio signal 4 is digitized by the audio A/D 20. The digitized audio signal is compressed by the audio encoding unit 21 into audio encoded data such as MPEG or μ-law. For IP transmission, the Ethernet (R)/IP/UDP header adding unit 22 adds an Ethernet (R) header, an IP header, and a UDP header to the compressed audio encoded data. The RTP header adding unit 23 then adds an RTP header.

The switching unit 27 has been switched to the 32-bit counter 16 side by the CPU 40. That is, when video and audio are transmitted simultaneously, the time stamp values added by both RTP header adding units 13 and 23 are taken from the 32-bit counter 16, which counts at 90 kHz divided down from the video encoding reference clock. The video encoded data with the RTP header added by the RTP header adding unit 13 and the audio encoded data with the RTP header added by the RTP header adding unit 23 are then given the destination of the video/audio reception device 300-3 by the network transmission unit 30 and sent to the network transmission path 200 as network transmission data 5.
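The three operating cases (video only, audio only, video and audio) differ only in which counter feeds each RTP time stamp. The Python sketch below summarizes that selection. It is a simplified model with hypothetical names, in which `None` means that no header of that kind is generated in the given mode.

```python
from enum import Enum
from typing import Optional, Tuple

VIDEO_TICK_HZ = 90_000   # counter 16: 27 MHz divided by 300
AUDIO_TICK_HZ = 8_000    # counter 26: 8 kHz oscillator 24


class Mode(Enum):
    VIDEO_ONLY = 1
    AUDIO_ONLY = 2
    VIDEO_AND_AUDIO = 3


def timestamp_clocks(mode: Mode) -> Tuple[Optional[int], Optional[int]]:
    """Return (video_hz, audio_hz), the tick rate behind each RTP time stamp."""
    if mode is Mode.VIDEO_ONLY:
        # Switching unit 27 is unused; only the video header is stamped.
        return (VIDEO_TICK_HZ, None)
    if mode is Mode.AUDIO_ONLY:
        # Switching unit 27 selects counter 26.
        return (None, AUDIO_TICK_HZ)
    # Switching unit 27 selects counter 16, so both headers share one counter.
    return (VIDEO_TICK_HZ, VIDEO_TICK_HZ)
```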

The video encoded data and audio encoded data sent as network transmission data 5 are received by the video/audio reception device 300-3 through the network transmission path 200. The video/audio reception device 300-3 detects the frequency of the incoming network transmission data 5 as 90 kHz, takes in the network transmission data 5 at 90 kHz, separates and decodes the video encoded data and the audio encoded data carried as payload, and outputs the resulting video signal 3 to the monitor 310 and the audio signal 4 to the speaker 320.

As a result, the time stamp values inserted into the RTP headers of the video encoded data and the audio encoded data are count values at the same 90 kHz frequency, so the video/audio reception device 300-3 can achieve AV synchronization between the video signal 3 and the audio signal 4 using these RTP time stamp values. Conventionally, two types of time stamps were required: the RTP time stamp and, as in Patent Document 1, the TS time stamp used for video/audio multiplexing at encoding time. In the present invention, however, the time stamps in the RTP header adding units 13 and 23 are inserted using the count value of the 32-bit counter 16, so the RTP time stamp can be used not only to measure network jitter and maintain real-time delivery but also to adjust AV synchronization.
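Because both streams are stamped from the same 90 kHz counter, a receiver can compare a video time stamp and an audio time stamp directly. The Python sketch below shows one way to compute the offset between them. It is an assumption-laden illustration rather than the receiver logic of the patent: the function names are hypothetical, both time stamps are assumed to come straight from the shared counter with no per-stream offset, and the difference is taken modulo 2^32 so that counter wraparound does not produce a spurious jump.

```python
TICK_HZ = 90_000  # shared counter rate for video and audio time stamps


def ts_diff(a: int, b: int) -> int:
    """Signed difference a - b between two 32-bit time stamps, wraparound-safe."""
    d = (a - b) & 0xFFFFFFFF
    return d - (1 << 32) if d >= (1 << 31) else d


def av_offset_ms(video_ts: int, audio_ts: int) -> float:
    """Offset in milliseconds; positive means the video stamp is later than the audio stamp."""
    return ts_diff(video_ts, audio_ts) * 1000 / TICK_HZ
```

A player could, for instance, hold back whichever stream is ahead until the offset falls below a chosen threshold; that policy is outside the scope of the source text.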

In other words, to transmit and receive video and audio over a network in real time, conventional video/audio transmission and reception devices had to generate two types of time stamps, TS and RTP, which made the device configuration complicated. With the configuration of the present invention described above, AV synchronization between the video signal 3 and the audio signal 4 can be achieved using the RTP time stamp value alone, and the TS time stamp becomes unnecessary, so the device configuration can be made simple and inexpensive.

In addition, using the 32-bit counter 16 operating at 90 kHz makes it possible to transmit both video encoded data and audio encoded data simultaneously.

As an application example, the present invention can be used in a remote monitoring device that operates over a network.

Brief description of the drawings

FIG. 1 is a block diagram of the video/audio transmission device according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram of a video transmission system using the video/audio transmission device shown in FIG. 1.
FIG. 3 is a block diagram of the network transmission data shown in FIG. 1.

Explanation of symbols

11: video encoding unit
12: Ethernet (R)/IP/UDP header adding unit
13: RTP header adding unit
14: 27 MHz oscillator
15: divide-by-300 frequency divider
16: 32-bit counter
21: audio encoding unit
22: Ethernet (R)/IP/UDP header adding unit
23: RTP header adding unit
24: 8 kHz oscillator
26: 32-bit counter
27: switching unit
30: network transmission unit
100, 100-1 to 100-3: video/audio transmission device
300-1 to 300-3: video/audio reception device

Claims

1. A video/audio transmission device comprising:
video encoding means for encoding a video signal and outputting video encoded data;
audio encoding means for encoding an audio signal and outputting audio encoded data;
video header adding means for adding a network video header to the video encoded data;
audio header adding means for adding a network audio header to the audio encoded data;
network transmission means for receiving, from the video header adding means and the audio header adding means, the video encoded data and/or audio encoded data to which the video header and the audio header have been added, and outputting them onto a network; and
a common counter that provides a time stamp value to be inserted into the video header and the audio header.

2. The video/audio transmission device according to claim 1, wherein the common counter is synchronized at 90 kHz.

3. A video/audio reception device that receives the network transmission data from the video/audio transmission device according to claim 1, separates and decodes the video encoded data and/or the audio encoded data, and, when outputting the decoded video signal and/or audio signal, outputs them while performing AV synchronization based on the time stamp values of the video header and the audio header.