JP2008099209A

JP2008099209A - Content reproducing apparatus and reproduction timing synchronizing method thereof

Info

Publication number: JP2008099209A
Application number: JP2006281861A
Authority: JP
Inventors: Tatsuya Koretsu (是津 達也); Takeshi Nagai (永井 剛)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2006-10-16
Filing date: 2006-10-16
Publication date: 2008-04-24

Abstract

PROBLEM TO BE SOLVED: To correct a clock synchronization error between the transmitting and receiving sides while maintaining good playback quality of content data, even when the receiving-side clock is slower than the transmitting-side clock.
SOLUTION: In mobile terminals MS1 to MSm, when the buffering time of video and audio RTP packets in a buffer memory exceeds a threshold, the terminal searches the video and audio RTP packets received thereafter for packets containing a video frame that is positioned immediately before an I frame and whose frame size is below a threshold, together with silent audio frames whose playback time corresponds to that video frame. When video and audio RTP packets meeting these conditions are found, they are discarded from the buffer memory, and at the same time the time stamps of the subsequent video and audio RTP packets are rewritten so that they are successively moved forward.
COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a content reproduction apparatus that receives and plays back content data including at least one of video frames and audio frames packetized and transmitted from, for example, a content server, and to a reproduction timing synchronization method for the apparatus.

In recent years, remarkable advances in communication technology have made available services that distribute content data, including video data and audio data, from a content server to content reproduction apparatuses over a communication network. Using such a service, a user can receive and play back desired content, such as movies or surveillance data, on a personal computer, a mobile terminal, or the like.

However, the personal computer or mobile terminal that a user employs as a content reproduction apparatus generally runs on its own clock and is therefore not necessarily synchronized with the content server on the transmitting side. Consequently, when content is played back for a long time, the clock phase difference between the transmitting and receiving sides grows; as a result, the amount of buffered video and audio frames drifts up or down, which can lead to an increased playback delay or to buffer underflow.

Meanwhile, a conventional technique has been proposed in which, when multiple pieces of content data, for example video data and audio data, fall out of synchronization with each other because of fluctuations in packet reception timing, one of them is delayed to resynchronize the two (see, for example, Patent Document 1).
Patent Document 1: JP 2003-204492 A

However, although this conventional proposal can resynchronize video data and audio data with each other, it cannot correct the deviation between the transmission timing of the content data at the transmitting apparatus and the playback timing of that content data at the receiving content reproduction apparatus. Moreover, the resynchronization method it discloses delays one of the data streams. This works when the clock of the receiving apparatus is faster than that of the transmitting apparatus, but resynchronization is difficult when the receiving clock is slower.

The present invention has been made in view of the above circumstances, and its object is to provide a content reproduction apparatus, and a reproduction timing synchronization method for it, that can correct a clock synchronization error between transmitter and receiver while maintaining good playback quality of the content data, even when the clock of the receiving apparatus is slower than that of the transmitting apparatus.

To achieve the above object, one aspect of the present invention is a content reproduction apparatus that receives and plays back content data including at least one of video frames and audio frames packetized and transmitted from a transmitting apparatus. The apparatus detects the delay of the playback timing of the received video or audio frames relative to a reference value. When the detected delay exceeds a preset value, the apparatus searches the video or audio frames received thereafter for a video or audio frame that is located immediately before an intra-coded frame and whose information content is at or below a threshold, and detects it as an adjustment frame. When such an adjustment frame is detected, the apparatus discards it and advances the playback time of the subsequent frames.

Accordingly, when the clock of the receiving content reproduction apparatus is slower than that of the transmitting apparatus and the delay of the playback timing of video or audio frames relative to the reference value exceeds the preset value on the receiving side, an adjustment frame is selected from the video or audio frames received thereafter, the adjustment frame is discarded, and the playback time of the subsequent frames is advanced. This reduces the playback delay caused by the lagging clock of the content reproduction apparatus.

Furthermore, the frame selected as the adjustment frame is a video or audio frame located immediately before an intra-coded frame and having an information content at or below a threshold. Discarding it therefore causes little discontinuity, such as a gap or an unnatural change in the reproduced video or audio, which prevents degradation of playback quality.

In other words, the present invention provides a content reproduction apparatus, and a reproduction timing synchronization method for it, that can correct a clock synchronization error between transmitter and receiver while maintaining good playback quality of the content data, even when the clock of the receiving apparatus is slower than that of the transmitting apparatus.

Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
In the first embodiment of the present invention, a mobile terminal serving as the content reproduction apparatus detects, among received video RTP packets and audio RTP packets, packets containing a video frame that is located immediately before an intra-coded key frame and whose frame size is at or below a threshold, together with silent audio frames whose playback time corresponds to that video frame. It discards the detected packets and successively moves forward the time stamps of the subsequent video and audio RTP packets.

FIG. 1 is a schematic diagram of a mobile communication system using the content reproduction apparatus according to the first embodiment of the present invention. The system includes a communication network NW having a mobile switching function and a plurality of base stations BS1 to BSn distributed over the service area. Mobile terminals MS1 to MSm, serving as content reproduction apparatuses, connect to a content server SV via the base stations BS1 to BSn and the communication network NW, so that content data distributed by the content server SV can be received and played back on the mobile terminals MS1 to MSm. RTP (Real-time Transport Protocol), for example, is used as the communication protocol for distributing content data from the content server SV to the mobile terminals MS1 to MSm. RTP is described in detail in Colin Perkins, Mastering TCP/IP: RTP Edition (Japanese translation supervised by Akimichi Ogawa), Ohmsha.

The mobile terminals MS1 to MSm are configured as follows. FIG. 2 is a block diagram showing their functional configuration. Each of the mobile terminals MS1 to MSm includes a radio unit 1, a baseband unit 2, a user interface unit 3, and a power supply unit 4.

In the figure, a radio signal arriving from one of the base stations BS1 to BSn over a radio channel is received by an antenna 11 and input to a receiving circuit (RX) 13 via an antenna duplexer 12. The receiving circuit 13 includes a high-frequency amplifier, a frequency converter, and a demodulator. The received radio-frequency signal is first amplified by a low-noise amplifier; the frequency converter then mixes it with a receive local oscillator signal generated by a frequency synthesizer (SYN) 14 to convert it down to a received baseband signal, which the demodulator digitally demodulates. The demodulation scheme is, for example, quadrature demodulation combined with spread-spectrum despreading using a spreading code. The frequency of the receive local oscillator signal generated by the frequency synthesizer 14 is specified by a control module 21 provided in the baseband unit 2.

The demodulated signal output from the demodulator of the receiving circuit 13 is input to the baseband unit 2, where a demultiplexing unit 22 separates it into control packets and data packets. The data packets are input to a packet inverse conversion unit 28. The control packets contain, for example, the caller number and location information of the terminal itself and of the communication partner, which the control module 21 uses for connection control and the like.

The packet inverse conversion unit 28 separates and extracts video data and audio data from the input data packets, converts each into a format that can be decoded, and inputs them to an audio decoder 26 and a video decoder 25, respectively. The audio decoder 26 decodes the input audio data with the decoding scheme corresponding to the specified speech coding scheme and outputs the decoded audio signal through a speaker 34 of the user interface unit 3. AMR (Adaptive Multi-Rate), for example, is used as the speech coding scheme. Because AMR defines a small data format indicating silence separately from the formats for speech, speech and silence can be distinguished from the size of the encoded audio. AAC (Advanced Audio Coding) or MP3 (MPEG-1 Audio Layer III) may be used instead of AMR; in that case, however, whether the audio is silent, or quiet enough to be treated as silent, must be detected from the content of the data rather than from the size of the encoded audio data. The video decoder 25 decodes the input video data according to, for example, the MPEG-4 (Moving Picture Experts Group 4) standard and supplies the decoded video signal to a liquid crystal display (LCD) 33 via a display control unit (not shown) for display.
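The two ways of recognizing silence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the terminal's actual implementation: it assumes AMR frames in the octet-aligned storage format of RFC 4867 (a 1-byte header plus coded bits), where a SID (comfort-noise) frame occupies 6 bytes and the smallest speech frame 13 bytes, and it assumes decoded 16-bit little-endian PCM for the AAC/MP3 case. The function names and both thresholds are hypothetical.

```python
import math
import struct

# Assumption (not from the patent): AMR frames in the octet-aligned storage
# format of RFC 4867, i.e. a 1-byte frame header followed by the coded bits.
# A SID (comfort-noise) frame is 1 + 5 = 6 bytes and a NO_DATA frame is just
# the 1-byte header, while the smallest speech frame (4.75 kbit/s) is
# 1 + 12 = 13 bytes, so a cut-off of 6 bytes separates the two cases.
AMR_SILENCE_MAX_BYTES = 6  # hypothetical threshold


def is_silent_amr(frame: bytes) -> bool:
    """Treat an AMR frame as silent purely on the basis of its encoded size."""
    return len(frame) <= AMR_SILENCE_MAX_BYTES


def is_silent_pcm(pcm: bytes, rms_threshold: float = 100.0) -> bool:
    """Treat decoded 16-bit little-endian PCM as silent if its RMS level is low.

    For codecs such as AAC or MP3 the encoded size says little about loudness,
    so the decoded content is inspected instead. The default threshold is an
    arbitrary illustrative value.
    """
    count = len(pcm) // 2
    if count == 0:
        return True  # an empty frame carries no sound
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    return rms <= rms_threshold
```

With these assumptions a 13-byte AMR frame is classified as speech and a 6-byte SID frame as silence; the PCM test needs a decoding step first, which is why it is the costlier path.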

The camera (CAM) 31 and microphone 32 of the user interface unit 3, and the video encoder 23 and audio encoder 24 of the baseband unit 2, are used for videophone calls between the mobile terminals MS1 to MSm. A transmitting circuit (TX) 15 of the radio unit 1 converts transmission packets generated by the baseband unit 2 into a radio signal and transmits it from the antenna 11 to the base stations BS1 to BSn. The modulation scheme is QPSK (Quadriphase Phase Shift Keying) or QAM (Quadrature Amplitude Modulation) combined with spread-spectrum modulation using a spreading code.

The power supply unit 4 includes a battery 41 such as a lithium-ion battery, a charging circuit 42 that charges the battery 41 from commercial AC power (100 V AC), and a voltage generating circuit (PS) 43. The voltage generating circuit 43 consists of, for example, a DC/DC converter and generates a predetermined supply voltage Vcc from the output voltage of the battery 41.

The baseband unit 2 also includes the control module 21, which controls the overall operation of the terminal, and a memory (MEM) 29 using, for example, NAND flash memory. The memory 29 stores sent and received data such as e-mail and answering-machine recordings, as well as management data such as phone book data, call history, and various control data; it also stores content data distributed from the content server SV.

The control module 21 includes a CPU (Central Processing Unit) based on a microprocessor, a ROM storing application programs, and a buffer memory that temporarily holds transmitted and received packets. As control functions according to the present invention, it provides a delay time detection function 211, an adjustment frame selection function 212, and a reproduction timing adjustment function 213. Each of these functions is implemented by having the CPU execute an application program.

The delay time detection function 211 detects the delay of the playback timing of the received video and audio frames relative to a reference value. Specifically, it detects the buffering time of the video and audio packets held in the buffer memory of the control module 21, relative to the reference value.
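One way to obtain the buffering time is from the RTP time stamps of the buffered packets. The sketch below is an assumption, not the patent's stated method: it takes the span between the oldest and newest buffered time stamps and divides by the payload's RTP clock rate (commonly 90 kHz for video and 8 kHz for narrowband AMR audio), handling the 32-bit wraparound of RTP time stamps. All names are hypothetical.

```python
from typing import Sequence

RTP_TS_MOD = 1 << 32  # RTP time stamps are unsigned 32-bit values that wrap


def ts_diff(later: int, earlier: int) -> int:
    """later - earlier in RTP clock ticks, tolerating a single wraparound."""
    return (later - earlier) % RTP_TS_MOD


def buffering_time(buffered_ts: Sequence[int], clock_rate: int) -> float:
    """Seconds of media spanned by the buffered packets (oldest to newest)."""
    if len(buffered_ts) < 2:
        return 0.0
    return ts_diff(buffered_ts[-1], buffered_ts[0]) / clock_rate


def delay_over_reference(buffered_ts: Sequence[int], clock_rate: int,
                         reference_s: float) -> float:
    """Delay relative to the reference buffering time (positive = lagging)."""
    return buffering_time(buffered_ts, clock_rate) - reference_s


def exceeds_threshold(buffered_ts: Sequence[int], clock_rate: int,
                      threshold_s: float) -> bool:
    """Has the buffering time risen to or above the threshold?"""
    return buffering_time(buffered_ts, clock_rate) >= threshold_s
```

Measuring in media time rather than packet count keeps the check independent of packet sizes and frame rates, which is convenient when video and audio are buffered side by side.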

When the delay detected by the delay time detection function 211 relative to the reference value (that is, the buffering time) exceeds a preset threshold, the adjustment frame selection function 212 searches the video and audio RTP packets received thereafter for packets containing a video frame located immediately before an intra-coded video frame (a key frame, or I frame) and having a frame size at or below a threshold, together with silent audio frames whose playback time corresponds to that video frame. When video and audio RTP packets satisfying these conditions are detected, it selects the video and audio frames they contain as frames for adjusting the playback timing.

When the adjustment frame selection function 212 selects adjustment frames, the reproduction timing adjustment function 213 discards the packets containing them from the buffer memory. At the same time, it rewrites the time stamps (PTS) of the subsequent video and audio RTP packets so that they are successively moved forward.
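The discard-and-advance step can be sketched on a list of buffered packets of one stream. In this minimal sketch (the record type and function name are hypothetical, not the patent's data layout), each surviving packet takes over the time stamp of the packet k positions earlier, where k is the number of discarded packets before it; this reproduces the mapping used in the FIG. 5 example, where V3 and V4 receive tV2 and tV3.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RtpPacket:
    """Hypothetical minimal packet record; not the patent's data layout."""
    seq: int          # sequence number identifying the packet
    ts: int           # time stamp (presentation time) in RTP clock ticks
    payload: bytes = b""


def drop_and_advance(packets: list[RtpPacket], drop: set[int]) -> list[RtpPacket]:
    """Discard packets whose seq is in `drop` and move later time stamps forward.

    A surviving packet preceded by k discarded packets takes over the time stamp
    of the packet k positions earlier, so the playout schedule closes the gap
    left by the discarded frames.
    """
    slots = [p.ts for p in packets]  # original schedule, in buffer order
    kept: list[RtpPacket] = []
    dropped = 0
    for i, p in enumerate(packets):
        if p.seq in drop:
            dropped += 1
            continue
        kept.append(RtpPacket(p.seq, slots[i - dropped], p.payload))
    return kept
```

For example, discarding sequence number 2 from five video packets with time stamps 0, 6000, ..., 24000 leaves packets 1, 3, 4, 5 with time stamps 0, 6000, 12000, 18000.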

Next, the content playback operation of the mobile terminals MS1 to MSm configured as above is described. FIG. 3 is a flowchart showing the control procedure and its contents. The description takes as an example the case in which the mobile terminal MS1 receives and plays back content data, including video data and audio data, distributed from the content server SV.

In the mobile terminal MS1, video RTP packets and audio RTP packets received and demodulated by the radio unit 1 are first stored in the buffer memory of the baseband unit 2. They are then read out at appropriately adjusted timings, depacketized by the packet inverse conversion unit 28, decoded by the video decoder 25 and the audio decoder 26, respectively, and output on the LCD 33 and the speaker 34.

If the clock of the mobile terminal MS1 is slower than that of the content server SV, the buffering time in the buffer memory gradually increases from the reference value, as shown for example in FIG. 4, and the playback delay eventually exceeds the permissible value. Therefore, as shown in FIG. 3, the control module 21 monitors the buffering time in the buffer memory while controlling the reception of the video and audio RTP packets in step S31, and determines in step S32 whether the buffering time has risen to or above the threshold. If the buffering time has not reached the threshold, the control module 21 causes the video decoder 25 and the audio decoder 26 to decode and play back the video and audio RTP packets in steps S37 and S38.

Each time decoding and playback of one video RTP packet and one audio RTP packet is completed, the control module 21 determines in step S39 whether all received RTP packets have been played back. If unplayed RTP packets remain, the process returns to step S31, and the RTP packet reception and playback control of steps S31 to S38 described above is repeated.
In this state, therefore, the received video and audio RTP packets are decoded and played back in order of reception at the timings indicated by their time stamps.

Now suppose that the buffering time has risen to or above the threshold. The control module 21 then proceeds to step S33 and detects adjustment frames as follows: each time a video or audio RTP packet is received thereafter, it analyzes the video or audio frame contained in the packet and determines whether the frame meets the conditions for an adjustment frame.

There are two conditions:
(1) The received video frame is located immediately before a so-called key frame (I frame), that is, an intra-coded frame, and its frame size is at or below a threshold. This threshold may be set, for example, to a value at which the frame can be regarded as containing almost no color components.
(2) The audio frame whose playback time corresponds to that video frame is a silent frame.
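The two conditions can be checked together as sketched below. This is an illustrative sketch, not the patent's code: the frame records are hypothetical, times are integer milliseconds, and "the audio frame whose playback time corresponds to the video frame" is interpreted, as an assumption, as every audio frame whose playback interval overlaps the video frame's display interval.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoFrame:
    start_ms: int     # playback start time
    dur_ms: int       # display duration
    size: int         # encoded frame size in bytes
    is_key: bool      # True for an intra-coded (I) frame


@dataclass
class AudioFrame:
    start_ms: int
    dur_ms: int
    silent: bool      # result of a silence test on the frame


def find_adjustment(video: list[VideoFrame], audio: list[AudioFrame],
                    size_threshold: int) -> Optional[tuple[int, list[int]]]:
    """Return (video index, audio indices) of the first adjustment candidate."""
    for i in range(len(video) - 1):
        v = video[i]
        # Condition (1): the next frame is an I frame and this one is small.
        if not video[i + 1].is_key or v.size > size_threshold:
            continue
        v_end = v.start_ms + v.dur_ms
        # Audio frames whose playback interval overlaps this video frame.
        overlap = [j for j, a in enumerate(audio)
                   if a.start_ms < v_end and a.start_ms + a.dur_ms > v.start_ms]
        # Condition (2): all corresponding audio frames are silent.
        if overlap and all(audio[j].silent for j in overlap):
            return i, overlap
    return None
```

Requiring both conditions at once means a small frame before an I frame is still skipped if speech overlaps it, trading a longer search for inaudible and nearly invisible cuts.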

When a video frame and audio frames satisfying both conditions are detected, the control module 21 proceeds from step S34 to step S35 and discards the detected video and audio frames as adjustment frames. At the same time, in step S36, it rewrites the time stamps (PTS) of the subsequent video and audio RTP packets so that they are successively moved forward.

Suppose, for example, that the video frame in the video RTP packet V3 shown in FIG. 5(b) is an I frame and that the frame size of the video frame in the immediately preceding video RTP packet V2 is at or below the threshold. Suppose also, as shown in FIG. 5(a), that the audio frames in the audio RTP packets A3, A4, and A5, which correspond in time to the video RTP packet V2, are all silent frames. The control module 21 then selects the video and audio frames in the video RTP packet V2 and the audio RTP packets A3, A4, and A5 as adjustment frames and discards them from the buffer memory. Then, in the buffer memory, it moves the time stamps tV3, tV4, ... of the subsequent video RTP packets V3, V4, ... forward to tV2, tV3, ..., respectively, as shown in FIG. 5(b). At the same time, as shown in FIG. 5(a), it moves the time stamps tA6, tA7, tA8 of the subsequent audio RTP packets A6, A7, A8 forward to tA3, tA4, tA5, respectively.
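Written as formulas, the rewrite in this example maps each subsequent time stamp onto the slot of the packet one position earlier (video) or three positions earlier (audio). Assuming, for illustration only, constant frame intervals $T_V$ for video and $T_A$ for audio (symbols not used in the original), this is the same as subtracting the duration of the discarded frames:

```latex
t'_{V_k} = t_{V_{k-1}} = t_{V_k} - T_V \quad (k \ge 3), \qquad
t'_{A_k} = t_{A_{k-3}} = t_{A_k} - 3T_A \quad (k \ge 6)
```

With 20 ms AMR frames and, say, 15 frame/s video, $T_V \approx 66.7$ ms while $3T_A = 60$ ms, so the video and audio streams are advanced by slightly different amounts, which can alter their relative timing.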

When discarding of the adjustment frames and rewriting of the time stamps are complete, the control module 21 proceeds to steps S37 and S38 and controls decoding and playback of the video and audio frames with the adjusted playback timing. The playback timing of the video and audio frames is thus advanced, which prevents playback delay of the video and audio.

Note that a time difference ΔT may exist, as shown in FIG. 5, between the video RTP packet V3 located immediately after the discarded video RTP packet V2 and the audio RTP packet A6 located immediately after the discarded audio RTP packets A3, A4, and A5. In this case, if the packets were simply discarded and the playback times moved forward, the video RTP packet V3, which before the playback timing adjustment was scheduled to be played before the audio RTP packet A6, would end up being played after it.

Therefore, when decoding and playing back the video frame in the video RTP packet V3 and the audio frame in the audio RTP packet A6, the control module 21 makes the difference between the playback start timings of the two frames equal to the original difference. For example, as shown in FIG. 5(c), it sets the playback timing of the audio frame in the audio RTP packet A6 to t′A3, which is ΔT later than the playback timing tV3 of the video RTP packet V3 after the playback timing adjustment. Similarly, it delays the playback timing of each audio frame from the audio RTP packet A7 onward by ΔT, giving t′A4, t′A5, and so on, as shown in FIG. 5(c). This corrects the synchronization offset between the video and audio frames and enables even higher-quality playback of video and audio.
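The ΔT correction can be sketched as follows, under the assumption (an interpretation, not stated numerically in the patent) that ΔT is the original gap tA6 − tV3 and that the audio frames after A6 keep their original spacing. Times are hypothetical integer milliseconds and the function name is illustrative.

```python
from __future__ import annotations


def realign_audio(v3_orig: int, v3_new: int, audio_orig: list[int]) -> list[int]:
    """Re-time the audio that follows the discarded frames (A6, A7, ...).

    v3_orig, v3_new: playback time of the first video frame after the discarded
    one (V3) before and after the advance. audio_orig: original playback times
    of A6, A7, ... The original gap (delta T = t_A6 - t_V3) is preserved, and
    the spacing between consecutive audio frames is left unchanged.
    """
    if not audio_orig:
        return []
    delta_t = audio_orig[0] - v3_orig          # delta T in the patent's Fig. 5
    first = v3_new + delta_t                   # A6 starts delta T after the advanced V3
    return [first + (t - audio_orig[0]) for t in audio_orig]
```

For instance, if V3 originally starts at 120 ms and A6, A7, A8 at 130, 150, 170 ms, advancing V3 to 60 ms places the audio at 70, 90, 110 ms, so A6 again follows V3 by 10 ms.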

Until a video frame and audio frames satisfying both conditions are detected in step S34, the control module 21 proceeds to steps S37 and S38 and has the video and audio RTP packets stored in the buffer memory decoded and played back in order of reception.
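The overall flow of FIG. 3 (steps S31 to S39) can be simulated in a simplified form. This sketch is an assumption-laden illustration, not the terminal's control program: it models a single frame stream instead of separate video and audio streams, approximates the buffering time as the number of buffered frames times a fixed frame duration, omits the time stamp rewriting of step S36, and uses hypothetical names throughout.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def simulate(arrivals: Iterable[list[dict]], frame_ms: int, threshold_ms: int,
             is_adjustment: Callable[[dict], bool]) -> list[dict]:
    """Return frames in decoding order under a simplified Fig. 3 loop.

    arrivals: for each decoding tick, the frames received during that tick.
    """
    buffer: deque = deque()
    decoded: list[dict] = []
    searching = False                      # set while buffering >= threshold
    for batch in arrivals:
        for frame in batch:                # S31: receive
            if searching and is_adjustment(frame):   # S33/S34: candidate found
                searching = False          # S35: discard it instead of buffering
                continue
            buffer.append(frame)
        searching = len(buffer) * frame_ms >= threshold_ms   # S32: buffering check
        if buffer:                         # S37/S38: decode and play one frame
            decoded.append(buffer.popleft())
    while buffer:                          # S39: play out what remains
        decoded.append(buffer.popleft())
    return decoded
```

Because the search only runs while the buffering time is at or above the threshold, a frame that would qualify as an adjustment frame is played normally when no correction is needed.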

As described above, in the first embodiment, when the buffering time of the video and audio RTP packets in the buffer memory exceeds a threshold, the mobile terminals MS1 to MSm detect, among the video and audio RTP packets received thereafter, packets containing a video frame located immediately before an I frame and having a frame size at or below a threshold, together with silent audio frames whose playback time corresponds to that video frame. When video and audio RTP packets satisfying these conditions are selected, they are discarded from the buffer memory, and at the same time the time stamps of the subsequent video and audio RTP packets are rewritten so that they are successively moved forward.

Therefore, even when the clock of the receiving mobile terminals MS1 to MSm is slower than that of the transmitting content server SV and the received video and audio frames are at risk of playback delay, discarding the packets and moving the time stamps forward prevents the playback delay from occurring.

Moreover, the frames selected for playback timing adjustment are a video frame that is located immediately before an I frame and whose frame size is at or below a threshold, together with silent audio frames. Consequently, discarding the adjustment frames causes few gaps or unnatural changes in the reproduced video and audio, so continuous, high-quality video and audio playback can be maintained.

(Second Embodiment)
In the second embodiment of the present invention, the content server on the transmitting side periodically inserts into the video RTP packet stream an I frame immediately preceded by a reproduction timing adjustment video frame whose frame size is equal to or smaller than a fixed value, and inserts a silent audio frame into the audio RTP packet stream at the position corresponding in time to that adjustment video frame. The mobile terminal on the receiving side detects, in the received video and audio RTP packet streams, the packets containing the adjustment video frame and the silent audio frame, discards them, and advances the time stamps of the subsequent video and audio RTP packets in turn.

FIG. 6 is a block diagram showing the configuration of a content server SV having functions for carrying out the reproduction timing synchronization method according to the second embodiment of the present invention.
The content server SV includes a central processing unit (CPU) 51. A program memory 53 and a content memory 54 are connected to the CPU 51 via a bus 52, as are an encoder 55, a communication interface (communication I/F) 56, and an input interface (input I/F) 57.

The encoder 55 encodes the video data and audio data to be distributed under the control of the CPU 51. Under the control of the CPU 51, the communication I/F 56 transmits the video and audio RTP packets over the communication network NW to the destination mobile terminals MS1 to MSm. A camera 58 and a microphone 59 are connected to the input I/F 57, which, under the control of the CPU 51, captures a video signal from the camera 58 and an audio signal from the microphone 59.

The content memory 54 stores content data transferred from a content-creating terminal (not shown) and content data read from a storage medium (not shown). If necessary, it also stores, as content data, the encoded video and audio signals captured through the input I/F 57.

The program memory 53 stores a content distribution control program 531 and an adjustment frame insertion control program 532 as control programs for implementing the functions of the present invention.
The content distribution control program 531 causes the CPU 51 to have the encoder 55 encode the video and audio signals captured through the input I/F 57, convert the resulting video and audio frames into RTP packets, and transmit them from the communication I/F 56 to the destination mobile terminals MS1 to MSm. It also causes the CPU 51 to read content data stored in the content memory 54, convert it into RTP packets, and transmit them.

The adjustment frame insertion control program 532 causes the CPU 51 to execute the following processing. When the video and audio RTP packets are generated, intra-coded I frames are inserted into the video packet stream at a fixed time interval Tc, and immediately before each I frame a reproduction timing adjustment video frame whose frame size is equal to or smaller than a fixed size is inserted. In addition, a silent audio frame is inserted into the audio packet stream at the position corresponding in time to the adjustment video frame.
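The insertion schedule described above can be modeled as a frame-type plan. The sketch below, using the hypothetical helpers `plan_video` and `silent_audio_slots`, marks where the encoder should emit an I frame, where the small adjustment frame ("ADJ") goes, and which audio frame slots must carry silence. It is a simplification under assumptions: ordinary GOP structure is ignored, and keeping the ADJ frame under the size limit is assumed to be handled by the encoder's rate control.

```python
from typing import List


def plan_video(n_frames: int, frames_per_tc: int) -> List[str]:
    """Frame types for the video stream: an I frame every frames_per_tc frames,
    immediately preceded by a small adjustment frame ("ADJ"); the rest are "P"."""
    plan = []
    for k in range(n_frames):
        if k % frames_per_tc == 0:
            plan.append("I")
        elif (k + 1) % frames_per_tc == 0:
            plan.append("ADJ")  # encoder must keep this frame under the size limit
        else:
            plan.append("P")
    return plan


def silent_audio_slots(plan: List[str], video_ms: int, audio_ms: int) -> List[int]:
    """Indices of audio frames that must carry silence: those whose start time
    falls inside the playback interval of an "ADJ" video frame."""
    slots = []
    for k, kind in enumerate(plan):
        if kind != "ADJ":
            continue
        start, end = k * video_ms, (k + 1) * video_ms
        j = -(-start // audio_ms)  # ceiling division: first audio frame starting at or after `start`
        while j * audio_ms < end:
            slots.append(j)
            j += 1
    return slots
```

With Tc of about 22 minutes and 66 ms video frames, frames_per_tc would be about 1320 s / 0.066 s = 20,000.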

In general, however, where silence can be inserted depends on the content of the stream, so it may not be possible to insert video and audio frames satisfying the above conditions at that time interval. In that case, the reproduction timing of a silent audio frame may be detected and an intra-coded I frame inserted on the basis of that timing as appropriate, regardless of the time interval.
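The fallback can be sketched as a search: starting from the nominal insertion point, take the first video slot whose overlapping audio frames are already silent, encode that slot as the adjustment frame, and make the next frame an I frame. Reading "on the basis of that timing" as "the first fully silent slot at or after the nominal time" is our interpretation, and `place_adjustment` and its parameters are hypothetical.

```python
from typing import List, Optional


def place_adjustment(silent: List[bool], video_ms: int, audio_ms: int,
                     earliest: int, n_video: int) -> Optional[int]:
    """Return the index of the video frame to encode as the adjustment frame
    (the frame after it becomes the I frame), or None if no slot qualifies.

    A slot qualifies if at least one audio frame starts inside it and every
    such audio frame is already silent in the source material.
    """
    for k in range(earliest, n_video - 1):  # k + 1 must exist to hold the I frame
        start, end = k * video_ms, (k + 1) * video_ms
        idxs = [j for j in range(len(silent)) if start <= j * audio_ms < end]
        if idxs and all(silent[j] for j in idxs):
            return k
    return None
```

If no silent slot exists, the sender would simply keep searching later in the stream; the correction is delayed rather than forced onto audible audio.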

The clock accuracy of the mobile terminals MS1 to MSm is about 50 ppm. Consequently, with 20 ms audio frames it takes 400 s (about 7 minutes) for a synchronization error of one frame to accumulate, and with 66 ms video frames it takes about 22 minutes. The insertion interval Tc may therefore be set to about 7 or 22 minutes.
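The arithmetic behind these figures is the frame duration divided by the relative clock error. A short check in Python (the function name is ours):

```python
def minutes_to_one_frame_slip(frame_ms: float, drift_ppm: float) -> float:
    """Time until a clock that is off by drift_ppm has gained or lost one full frame."""
    drift = drift_ppm * 1e-6                # relative clock error, e.g. 50 ppm -> 5e-5
    seconds = (frame_ms / 1000.0) / drift   # one frame period divided by the error rate
    return seconds / 60.0


print(round(minutes_to_one_frame_slip(20, 50), 1))  # audio frame: prints 6.7 (400 s)
print(round(minutes_to_one_frame_slip(66, 50), 1))  # video frame: prints 22.0
```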

With this configuration, when the content server SV distributes content data, I frames V2, Vi, Vj, ... are inserted into the video RTP packet stream at the fixed time interval Tc, as shown for example in FIG. 7, and adjustment video frames V1, Vi-1, Vj-1, ... whose frame size is equal to or smaller than the fixed size are inserted immediately before the respective I frames. In addition, silent audio frames A1, A2, A3; Am-3, Am-2, Am-1; and An-3, An-2, An-1 are inserted into the audio RTP packet stream at the positions corresponding in time to the adjustment video frames V1, Vi-1, Vj-1, ..., respectively.

Meanwhile, each mobile terminal MS1 to MSm monitors, while receiving and reproducing the video and audio RTP packets, whether the buffering time has risen to or above the threshold. When it has, the terminal detects, in the video and audio RTP packet streams received thereafter, the packets containing the adjustment video frame located immediately before an I frame and the corresponding silent audio frames. It then discards the detected packets and advances the time stamps of the subsequent video and audio RTP packets in turn. The procedure and content of this synchronized reproduction control on the mobile terminals MS1 to MSm are the same as those shown in FIG. 3 for the first embodiment.
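On the wire, the time stamp advance is applied to RTP timestamps, which count ticks of a per-stream clock and wrap around modulo 2^32 (RFC 3550). The sketch below shows the single-packet case with a hypothetical helper `advance_timestamps`; the example clock rates (90 kHz for video, 8 kHz for narrowband audio) are common choices assumed for illustration, since the specification names neither codecs nor clock rates.

```python
RTP_TS_MOD = 1 << 32  # RTP timestamps are 32-bit unsigned and wrap around (RFC 3550)


def advance_timestamps(timestamps: list, dropped_index: int,
                       dropped_duration_s: float, clock_rate: int) -> list:
    """Drop the packet at dropped_index and move every later timestamp earlier
    by the dropped frame's duration, converted to ticks of the stream's RTP clock."""
    delta = round(dropped_duration_s * clock_rate)
    head = timestamps[:dropped_index]
    tail = [(ts - delta) % RTP_TS_MOD for ts in timestamps[dropped_index + 1:]]
    return head + tail
```

Dropping one 66 ms video frame at 90 kHz shifts later timestamps by 5,940 ticks; dropping one 20 ms audio frame at 8 kHz shifts them by 160. The modulo keeps the result valid when the shift crosses the 2^32 wrap point.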

As described above, in the second embodiment, the transmitting content server SV inserts into the video RTP packet stream, at the fixed time interval Tc, an I frame immediately preceded by a reproduction timing adjustment video frame whose frame size is equal to or smaller than a fixed value, inserts a silent audio frame into the audio RTP packet stream at the position corresponding in time to the adjustment video frame, and transmits the streams. The receiving mobile terminals MS1 to MSm detect the packets containing the adjustment video frame and the silent audio frame in the received video and audio RTP packet streams, discard them, advance the time stamps of the subsequent video and audio RTP packets in turn, and then decode and reproduce the streams.

Consequently, when the buffering time on a receiving mobile terminal MS1 to MSm rises to or above the threshold, synchronization by discarding the adjustment packets and advancing the time stamps is performed within at most one interval Tc. More stable reproduction timing synchronization control is therefore possible without the reproduction delay ever exceeding the allowable amount.

(Other Embodiments)
In the first embodiment, the frame-size threshold used to select adjustment frames is a fixed value, but it may instead be increased as the difference between the buffering time and its reference value grows. Depending on the stream, no video frame smaller than the threshold may appear together with a silent audio frame, even though the buffering time has deviated far from the reference value. In such a case, the threshold is raised as the difference between the buffering time and the reference value grows. This increases the probability that a video frame falls at or below the threshold, and therefore increases the frequency with which the reproduction timing can be adjusted. A packet discard method used for congestion avoidance in packet networks can be applied as the algorithm for raising the threshold.
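One way to realize the growing threshold is sketched below. It borrows the shape of Random Early Detection (RED) queue management, in which the drop probability rises linearly between two queue-length thresholds; the choice of RED, the function name `size_threshold`, and all parameter values are assumptions, since the specification only says that a congestion-avoidance packet discard method can be applied.

```python
def size_threshold(buffered_ms: float, reference_ms: float, base_bytes: float,
                   max_bytes: float, ramp_ms: float) -> float:
    """Adjustment-frame size limit that rises linearly with the buffering excess.

    At or below the reference buffering level the limit is base_bytes; it grows
    linearly and saturates at max_bytes once the excess reaches ramp_ms,
    mirroring the linear segment of RED's drop-probability curve.
    """
    excess = max(0.0, buffered_ms - reference_ms)
    frac = min(1.0, excess / ramp_ms)
    return base_bytes + frac * (max_bytes - base_bytes)
```

Capping the limit at max_bytes keeps large, visually significant frames from ever being chosen, however far the buffer has drifted.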

In addition, the configuration of the transmitting content server, the procedure and content of the adjustment frame insertion control, the type and configuration of the content reproduction apparatus, the procedure and content of the reproduction timing synchronization control, the specific value of the threshold used to determine an increase in the buffering time, the specific value of the interval Tc at which the transmitting apparatus inserts adjustment frames, and so on may also be modified in various ways without departing from the gist of the present invention.

In short, the present invention is not limited to the above embodiments as they are; at the implementation stage, the components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the components disclosed in the above embodiments. For example, some components may be omitted from all the components shown in an embodiment, and components from different embodiments may be combined as appropriate.

FIG. 1 is a diagram showing the configuration of a mobile communication system using the content reproduction apparatus according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the content reproduction apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the procedure and content of content reproduction control by the content reproduction apparatus shown in FIG. 2.
FIG. 4 is a diagram showing an example of the change over time in the buffering amount in the content reproduction apparatus shown in FIG. 2.
FIG. 5 is a timing chart used to explain the operation of the content reproduction control shown in FIG. 3.
FIG. 6 is a block diagram showing the configuration of a content server having functions for carrying out the reproduction timing synchronization method according to the second embodiment of the present invention.
FIG. 7 is a timing chart for explaining the reproduction timing synchronization method according to the second embodiment of the present invention.

Explanation of symbols

MS1 to MSm ... mobile terminal, NW ... communication network, BS1 to BSn ... base station, SV ... content server, 1 ... wireless unit, 2 ... baseband unit, 3 ... user interface unit, 4 ... power supply unit, 11 ... antenna, 12 ... antenna duplexer (DUP), 13 ... receiving circuit (RX), 14 ... frequency synthesizer (SYN), 15 ... transmitting circuit (TX), 21 ... control module, 211 ... delay time detection function, 212 ... adjustment frame selection function, 213 ... reproduction timing adjustment function, 22 ... demultiplexing unit, 23 ... video encoder, 24 ... audio encoder, 25 ... video decoder, 26 ... audio decoder, 27 ... packet conversion unit, 28 ... packet reverse conversion unit, 29 ... memory (MEM), 31 ... camera (CAM), 32 ... microphone, 33 ... liquid crystal display (LCD), 34 ... speaker, 35 ... input device, 41 ... battery, 42 ... charging circuit (CHG), 43 ... voltage generation circuit (PS), 51 ... CPU, 52 ... bus, 53 ... program memory, 531 ... content distribution control program, 532 ... adjustment frame insertion control program, 54 ... content memory, 55 ... encoder, 56 ... communication interface, 57 ... input interface, 58 ... camera, 59 ... microphone.

Claims

1. A content reproduction apparatus that receives and reproduces content data including at least one of video frames and audio frames packetized and transmitted from a transmission-side apparatus, the content reproduction apparatus comprising:
means for detecting a delay time of the reproduction timing of a received video frame or audio frame relative to a reference value;
adjustment frame detection means for detecting as an adjustment frame, when the detected delay time exceeds a preset value, a video frame or audio frame received thereafter that is located immediately before an intra-coded frame and whose amount of information is equal to or less than a threshold; and
reproduction timing adjustment means for, when the adjustment frame is detected, discarding the adjustment frame and advancing the reproduction time of subsequent frames.

2. The content reproduction apparatus according to claim 1, wherein, when the content data includes video frames and audio frames whose reproduction times correspond to each other,
the adjustment frame detection means, when the detected delay time exceeds the preset value, detects as adjustment frames, from the video and audio frames received thereafter, both a video frame that is located immediately before an intra-coded frame and whose amount of information is equal to or less than the threshold, and a silent audio frame whose reproduction time corresponds to that video frame.

3. The content reproduction apparatus according to claim 1, wherein the adjustment frame detection means includes means for increasing the threshold value in accordance with an increase in the delay time.

4. A reproduction timing synchronization method for a content reproduction apparatus that receives and reproduces content data including at least one of video frames and audio frames packetized and transmitted from a transmission-side apparatus, the method comprising:
a step in which the transmission-side apparatus, when generating the content data, periodically inserts, immediately before an intra-coded frame, a video frame or audio frame whose amount of information is equal to or less than a threshold;
a step in which the content reproduction apparatus detects as an adjustment frame, from among the video frames or audio frames included in the content data transmitted from the transmission-side apparatus, a video frame or audio frame that is located immediately before an intra-coded frame and whose amount of information is equal to or less than the threshold; and
a step in which the content reproduction apparatus, when the adjustment frame is detected, discards the adjustment frame and advances the reproduction time of subsequent frames.

5. The reproduction timing synchronization method according to claim 4, wherein, when generating the content data, the transmission-side apparatus periodically inserts into the video stream, immediately before an intra-coded frame, a video frame whose amount of information is equal to or less than the threshold, and inserts into the audio stream a silent audio frame whose reproduction time corresponds to that video frame.