JP6119765B2

JP6119765B2 - Information processing apparatus, information processing system, information processing program, and moving image data transmission / reception method

Info

Publication number: JP6119765B2
Application number: JP2014546811A
Authority: JP
Inventors: 敏郎大櫃
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-11-16
Filing date: 2012-11-16
Publication date: 2017-04-26
Anticipated expiration: 2032-11-16
Also published as: WO2014076823A1; JPWO2014076823A1

Description

The present invention relates to a technique for transmitting and receiving moving image data.

Currently, some battery-driven mobile terminal devices that can be carried outside, such as smartphones (hereinafter referred to as mobile terminals), have graphics performance sufficient to, for example, shoot a moving image with a built-in camera and view it on the same terminal. Such a mobile terminal has a moving image playback function and an Internet connection function, and can be used to view the content of video sites on the Internet. In addition, mobile terminals are available that allow the user to view, via the Internet, personal photos and moving images stored in an auxiliary storage device or the like of a personal computer in the home.

When a moving image is transmitted, its video frames and audio frames are encoded separately and sent as different streams. As a result, the audio and the video may be played back out of step when the moving image is viewed. Adjusting the video frames and audio frames so that a person displayed on the screen appears to the viewer to be speaking naturally is called lip sync adjustment.

The following first to fifth techniques are examples of techniques in which video frames and audio frames are encoded separately, transmitted as different streams, and reproduced at the transmission destination. The first technique is a receiving apparatus that absorbs the difference in clock frequency between the encoder side and the decoder side and achieves lip sync by matching the video frame output timing to the audio frame output timing. This receiving apparatus has decoding means for receiving and decoding a plurality of encoded video frames to which video time stamps based on a reference clock on the encoder side are sequentially attached, and a plurality of encoded audio frames to which audio time stamps based on the reference clock are sequentially attached. The receiving apparatus further has storage means for storing the plurality of video frames and the plurality of audio frames obtained by decoding the encoded video frames and the encoded audio frames with the decoding means. The receiving apparatus also has calculation means for calculating the time difference caused by the deviation between the clock frequency of the reference clock on the encoder side and the clock frequency of the system time clock on the decoder side. Finally, the receiving apparatus has adjustment means that, in accordance with the time difference, adjusts the video frame output timing used when sequentially outputting the video frames frame by frame, taking as its reference the audio frame output timing used when sequentially outputting the audio frames frame by frame.

The second technique is a lip sync control device. It includes first input means for inputting an encoded audio signal that contains an audio reference signal input at a predetermined timing, and second input means for inputting an encoded video signal that contains a video reference signal input at the same timing as the audio reference signal. It further includes first decoding means for decoding the audio signal input by the first input means, and second decoding means for decoding the video signal input by the second input means. It also includes time shift detection means for detecting the amount of time shift between the audio reference signal contained in the audio signal decoded by the first decoding means and the video reference signal contained in the video signal decoded by the second decoding means. Finally, it includes control means that, based on the detection result of the time shift detection means, delays whichever of the audio signal and the video signal is earlier by the detected amount of time shift before the two signals are output.

The third technique is an image communication apparatus that communicates video and audio via a network. It has control means for controlling the transmission rate from the application layer in accordance with the reception status on the video and audio receiving side, and transmission means for transmitting the video and audio at the transmission rate set by the control means.

The fourth technique is a scheme in which, when packets must be discarded, packets whose loss has little effect on quality are discarded preferentially, and packets whose loss has a large effect on quality are discarded as little as possible.

In the fifth technique, the same time information is inserted as a digital watermark into both the digital video and the digital audio while they are synchronized, and the time information is extracted from each of the watermarked digital video and digital audio after they have fallen out of synchronization. The time information of the digital video is then compared with that of the digital audio, and at least one of the two is delayed so that they match.

JP 2005-102192 A
JP 2008-131591 A
JP-A-10-164533
JP-A-4-362832
JP 2003-259314 A

Each of the first to fifth techniques described above is a technique in which video frames and audio frames are encoded separately, transmitted as different streams, and reproduced at the transmission destination.

However, even with the first to fifth techniques, in a network environment where the transmission rate is lower than the playback rate of the moving image, the image quality at playback is lower than the original image quality of the data before transmission.

Accordingly, in one aspect, an object of the present invention is to improve the playback quality of moving image data transmitted from a storage location via a network with a narrow bandwidth.

A first information processing apparatus according to one aspect includes a storage unit, a monitoring unit, a deletion unit, a frame information generation unit, and a transmission unit. The storage unit stores first video data and audio data that includes synchronization information associated with the first video data. The monitoring unit monitors the state of the communication network. In accordance with the monitoring result, the deletion unit deletes, from the first video data having a first frame rate (a first number of frames per unit time), consecutive video frames whose corresponding audio data has an audio level equal to or lower than a predetermined threshold, and thereby generates second video data having a second frame rate lower than the first frame rate. The frame information generation unit generates frame information regarding the deleted frames. The transmission unit transmits the second video data and the frame information.

A second information processing apparatus includes a reception unit, a complementary image generation unit, and a video data generation unit. The reception unit receives second video data together with frame information regarding the deleted frames. The second video data is obtained from first video data having a first frame rate (a first number of frames per unit time) by deleting consecutive video frames whose corresponding audio data has an audio level equal to or lower than a predetermined threshold, so that it has a second frame rate lower than the first frame rate. The complementary image generation unit uses the frame information to generate complementary images that fill in for the images of the deleted frames. The video data generation unit generates video data at the first frame rate from the complementary images and the second video data.

According to the information processing system of the present embodiment, the playback quality of moving image data transmitted from a storage location via a network can be improved.

An example of a block diagram of the information processing system according to the present embodiment.
An example of the configuration of the information processing system according to the present embodiment.
An example of the structure of the meta information of video data.
A diagram for explaining how frames are deleted and restored.
A diagram showing an example of complementary frame generation processing.
An example of a method for creating a complementary frame by identifying an object with a large amount of movement.
A diagram for explaining frame rate adjustment in the transmitting terminal of moving image data.
A flowchart of the frame reduction process.
Details of the operation flow for determining, from the moving image data stored in the work buffer, the frames to be deleted based on the audio level of each frame.
A diagram for explaining the decoding process of the receiving terminal.
A flowchart of frame reconstruction by the receiving terminal.
An operation flowchart of the process of generating complementary frames for deleted frames in the receiving terminal.
An example of a sequence diagram (part 1) of the information processing system according to the present embodiment.
An example of a sequence diagram (part 2) of the information processing system according to the present embodiment.
An example of the configuration of the server in the present embodiment.
An example of the hardware configuration of the server or personal computer according to the present embodiment.
An example of the hardware configuration of the mobile terminal according to the present embodiment.
An example of the configuration of the information processing system in the present embodiment (modification).

FIG. 1 is an example of a functional block diagram of the information processing system according to the present embodiment. The first information processing apparatus 1 includes a storage unit 2, a monitoring unit 3, a deletion unit 4, a frame information generation unit 5, and a transmission unit 6.

The storage unit 2 stores first video data and audio data that includes synchronization information associated with the first video data. The monitoring unit 3 monitors the state of the communication network.

In accordance with the monitoring result, the deletion unit 4 deletes, from the first video data having a first frame rate (a first number of frames per unit time), consecutive video frames whose corresponding audio data has an audio level equal to or lower than a predetermined threshold, and thereby generates second video data having a second frame rate lower than the first frame rate. The deletion unit 4 also deletes one of any consecutive frames whose similarity to each other is equal to or greater than a predetermined threshold.

The frame information generation unit 5 generates frame information regarding the deleted frames. The transmission unit 6 transmits the second video data, the audio data, and the frame information.

The second information processing apparatus 7 includes a reception unit 8, a complementary image generation unit 9, and a video data generation unit 10.

The reception unit 8 receives second video data together with frame information regarding the deleted frames. The second video data is obtained from first video data having a first frame rate (a first number of frames per unit time) by deleting consecutive video frames whose corresponding audio data has an audio level equal to or lower than a predetermined threshold, so that it has a second frame rate lower than the first frame rate. The reception unit 8 also receives audio data that includes synchronization information associated with the first video data.

The complementary image generation unit 9 uses the frame information to generate a complementary image that fills in for the image of a deleted frame. The complementary image generation unit 9 generates the complementary image by duplicating the frame immediately before the deleted frame. In addition, the complementary image generation unit 9 uses the frames before and after the deleted frame to identify any object in the deleted frame whose amount of movement is equal to or greater than a predetermined threshold; for the area in which that object is displayed, it generates the complementary image by duplicating the area showing the object in the non-deleted frame immediately after the deleted frame.
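The region-based variant can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the text does not say at this point how the moving object is identified, so the per-pixel difference test, the function name `complement_image`, the grayscale nested-list image format, and all pixel values are assumptions made for the example.

```python
from typing import List

Image = List[List[int]]  # hypothetical grayscale image: rows of pixel values


def complement_image(prev: Image, nxt: Image, threshold: int) -> Image:
    """Build a complementary image for a deleted frame.

    Assumption: a pixel whose value differs between the preceding and the
    following frame by at least `threshold` is treated as part of an object
    with a large amount of movement. Those pixels are copied from the
    following frame; every other pixel is copied from the preceding frame.
    """
    result: Image = []
    for prev_row, next_row in zip(prev, nxt):
        row = []
        for p, q in zip(prev_row, next_row):
            row.append(q if abs(p - q) >= threshold else p)
        result.append(row)
    return result


# Hypothetical 2x3 frames: a bright object moves one pixel to the right.
prev_frame = [[10, 10, 10], [10, 200, 10]]
next_frame = [[10, 12, 10], [10, 10, 200]]
filled = complement_image(prev_frame, next_frame, threshold=50)
```

With these values the small change in the top row stays below the threshold and is taken from the preceding frame, while the two pixels touched by the moving object are taken from the following frame.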

The video data generation unit 10 generates video data at the first frame rate from the complementary images and the second video data. Using the frame information, the video data generation unit 10 inserts each complementary image at the position of the corresponding deleted frame in the second video data. Using the synchronization information, the video data generation unit 10 also synchronizes the complementary image corresponding to a deleted frame with the audio data corresponding to that frame when generating the video data at the first frame rate.
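The insertion step can be illustrated with a short sketch. It assumes that frames are identified by their original frame numbers and that the frame information is a list of (start number, frame count) pairs; the function name `restore_frames` and the single-letter frame payloads are hypothetical, and each complementary frame is simply a copy of the frame immediately before the deleted run.

```python
from typing import Dict, List, Tuple


def restore_frames(received: Dict[int, str],
                   deleted_runs: List[Tuple[int, int]]) -> List[str]:
    """Rebuild the full frame sequence at the original frame rate.

    `received` maps original frame numbers to frame payloads.
    `deleted_runs` holds (start_frame_number, frame_count) pairs.
    The sketch assumes no run starts at frame 0, so a preceding
    frame always exists.
    """
    frames = dict(received)
    for start, count in sorted(deleted_runs):
        previous = frames[start - 1]      # frame just before the deleted run
        for number in range(start, start + count):
            frames[number] = previous     # complementary frame = duplicate
    return [frames[number] for number in sorted(frames)]


# Hypothetical stream: frames 3, 4 and 5 were deleted before transmission.
received = {0: "A", 1: "B", 2: "C", 6: "G", 7: "H"}
restored = restore_frames(received, [(3, 3)])
```

Here `restored` has eight entries again, with frame C repeated in the three deleted positions, so the sequence can be played back at the original frame rate.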

With this configuration, the transmission rate or frame rate of the moving image data can be changed according to the bandwidth of the network, so the amount of streaming data flowing through the network per unit time can be reduced. Delay of the video frames relative to the audio frames can also be prevented. Furthermore, because a video frame carries more data than an audio frame, video frames may fail to arrive within the fixed time available during video decoding on the mobile terminal; this configuration prevents the resulting loss of video frames.

Moreover, even when moving image data is received through a network environment whose transmission rate is lower than the playback rate of the moving image, and the receiving side cannot secure a sufficient buffer for playback, the moving image data can be played back at its original frame rate before transmission.

In addition, because the video frames to be deleted are chosen based on the audio level, the viewer's sense of unnaturalness during playback can be kept small even when the deleted video frames are restored by copying non-deleted video frames before or after them. Compared with generating a complementary frame by synthesizing the preceding and following video frames, this reduces the amount of computation needed to generate complementary frames and prevents the video and audio from drifting apart because of the time taken to create them.

When a complementary frame is generated for a deleted video frame, its time stamp is synchronized with the time stamp of the audio frame corresponding to that video frame, which allows the video data and the audio data to be synchronized. This minimizes the deviation between video playback and audio playback for moving image data transmitted from a storage location via a network with a narrow bandwidth. It also reduces both the memory area needed for lip sync adjustment and the computational load. For example, lip sync adjustment by delaying video data or audio data in a video memory or an audio memory is no longer necessary. Consequently, the memory area for lip sync, and the need for a GPU (Graphics Processing Unit) capable of handling the associated video processing, can be reduced.

An example of the system configuration of the present embodiment will now be described. FIG. 2 shows an example of the configuration of the information processing system according to the present embodiment.

As shown in FIG. 2, the information processing system includes a personal computer 33, a server 31, and a mobile terminal 35. The personal computer 33 and the server 31, and the server 31 and the mobile terminal 35, are connected by a network via a communication carrier 34. The server 31 is an example of the first information processing apparatus 1. The mobile terminal 35 is an example of the second information processing apparatus 7.

The information processing system includes the personal computer 33, the server 31, and the mobile terminal 35. The personal computer 33 stores moving image data. The server 31 receives moving image data from the personal computer 33, deletes a plurality of video frames from the received moving image data, and forwards it. The mobile terminal 35 can play back the moving image data received from the server 31.

The personal computer 33 stores the moving image data that the mobile terminal 35 accesses via the server 31. The personal computer 33 is located, for example, in a home and is connected to the server 31 via the Internet. Access from the server 31 to the personal computer 33 is restricted by an authentication function. The personal computer 33 holds, for each server 31, a file containing list information on the moving image files that can be provided to that server 31. Alternatively, the personal computer 33 may hold, for each mobile terminal 35, a file containing list information on the moving image files that can be provided to that mobile terminal 35. The personal computer 33 may be a host computer having the functions of the server 31, or a network-connected storage device that stores moving image files.

The server 31 receives from the mobile terminal 35 a viewing request for a moving image file stored in the personal computer 33. The server 31 also monitors the bandwidth status between the server 31 and the mobile terminal 35. The server 31 acquires the requested moving image data from the personal computer 33, deletes a plurality of video frames from it according to the bandwidth status of the network, and transfers the resulting moving image data to the mobile terminal 35. Details of the network bandwidth monitoring and the video frame deletion method are described later. The server 31 has an authentication function for establishing a connection with the mobile terminal 35.

The mobile terminal 35 accepts from the user a request for a moving image to be played back, and instructs the personal computer 33 to transfer the moving image. The mobile terminal 35 receives the moving image data from the server 31 in a streaming format. If video frames were deleted from the received moving image data at the time of transmission, the mobile terminal 35 generates complementary frames corresponding to those video frames and inserts them into the moving image data. The mobile terminal 35 thereby restores the moving image data as it was before the video frames were deleted, and then plays back the restored moving image data. Through this restoration process, the mobile terminal 35 plays back the moving image data at the same frame rate as the moving image data stored in the personal computer 33.

In the following description, transmitting data from the personal computer 33 to the server 31 may be referred to as upstream (upload), and transmitting data from the server 31 to the mobile terminal 35 may be referred to as downstream (download). The personal computer 33 or the server 31 acting as the source of moving image data may be referred to as the transmitting terminal, and the server 31 or the mobile terminal 35 acting as the receiver of moving image data may be referred to as the receiving terminal. A video frame deleted in the personal computer 33 may be referred to as a deleted frame.

The structure of the meta information of video data is described here. The meta information is sent from the transmitting terminal to the receiving terminal together with the video data. The transmitting terminal adds information on the deleted video frames to the meta information, and the receiving terminal uses this meta information to generate complementary frames and thereby restore the moving image data. FIG. 3 shows an example of the structure of the meta information of video data.

The meta information is information about the moving image data and is associated with the moving image data. It includes a format size (content resolution) 43, a video title 44, a video time 45, a creation date 46, contents 47, a deleted frame start number 41, and a deleted frame period (number of frames) 42.

The format size (content resolution) 43, the video title 44, the video time 45, the creation date 46, and the contents 47 are included in the moving image data stored in the personal computer 33. They are, respectively, the format size (content resolution), video title, video duration, creation date, and contents of the corresponding moving image data.

The deleted frame start number 41 and the deleted frame period (number of frames) 42 are data items added to the meta information when the transmitting terminal deletes video frames. The deleted frame start number 41 is the frame number (frame identification number) of a video frame deleted by the transmitting terminal. The deleted frame period (number of frames) 42 indicates the length of a run of consecutively deleted video frames and is expressed, for example, as the number of consecutive deleted frames.

The deleted frame start number 41 and the deleted frame period (number of frames) 42 describe the deleted frames for each predetermined period (for example, every second). Therefore, when there are multiple deleted frames within a predetermined period, a pair of data items, the deleted frame start number 41 and the deleted frame period 42, is added to the meta information for each of them.
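As an illustration only, the meta information described above could be modeled as the following data structure. The class and field names are hypothetical, as are the sample values; only the set of items (41 to 47) comes from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeletedRun:
    start_frame: int   # deleted frame start number (item 41)
    frame_count: int   # deleted frame period, in frames (item 42)


@dataclass
class VideoMeta:
    format_size: str   # format size / content resolution (item 43)
    title: str         # video title (item 44)
    duration_s: float  # video time (item 45)
    created: str       # creation date (item 46)
    contents: str      # contents (item 47)
    deleted_runs: List[DeletedRun] = field(default_factory=list)

    def deleted_frame_numbers(self) -> List[int]:
        """Expand every (start, count) pair into individual frame numbers."""
        return [number
                for run in self.deleted_runs
                for number in range(run.start_frame,
                                    run.start_frame + run.frame_count)]


# Hypothetical clip with two runs of deleted frames in the same period.
meta = VideoMeta("1280x720", "sample", 60.0, "2012-11-16", "hypothetical clip")
meta.deleted_runs.append(DeletedRun(start_frame=4, frame_count=3))
meta.deleted_runs.append(DeletedRun(start_frame=20, frame_count=2))
```

Keeping one (start, count) pair per run mirrors the description, in which a separate pair of items 41 and 42 is added for each deletion within a period.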

The deleted frame start number 41 and the deleted frame period (number of frames) 42 may instead be the frame number of a restart frame and the period (number of frames) of the restart frame. In the present embodiment the deleted frames are consecutive, so if the number of the frame at which the video restarts, located immediately after the deleted video frames, and its frame period are known, the mobile terminal 35 can recognize the start number and period of the deleted frames in advance when playing back the video. The mobile terminal 35 may thus derive the start number and period of the deleted frames itself.

Next, the operation in which the transmitting terminal deletes video frames from the moving image data before transmission, and the receiving terminal generates complementary frames to restore the moving image data, is described.

The transmitting terminal decides which video frames to delete based on the audio level of the audio frame corresponding to each video frame.

In moving image playback, lip sync deviation is most noticeable as a mismatch between the movement of a person's mouth and the sound in a scene where the person is speaking. When the audio level is low, the mouth can be assumed not to be moving even if a person's face is displayed in the video. Therefore, by deleting consecutive video frames with a low audio level and restoring them by copying the video frame immediately before the deleted frames, the viewer's sense of unnaturalness during playback can be kept small.
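The selection of deletable frames can be sketched as a search for runs of consecutive frames whose audio level is at or below a threshold. The function name `find_quiet_runs`, the `min_run` parameter, the threshold of 20, and the level values are all assumptions chosen for illustration; the real thresholds and any additional conditions are not given at this point in the description.

```python
from typing import List, Tuple


def find_quiet_runs(levels: List[int], threshold: int,
                    min_run: int = 2) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of at least `min_run`
    consecutive frames whose audio level is <= `threshold`.

    Note: a run starting at index 0 has no preceding frame to copy from;
    this sketch does not treat that case specially.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for index, level in enumerate(levels):
        if level <= threshold:
            if start is None:
                start = index             # a quiet run begins here
        else:
            if start is not None and index - start >= min_run:
                runs.append((start, index - start))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(levels) - start >= min_run:
        runs.append((start, len(levels) - start))
    return runs


# Hypothetical audio levels for eleven consecutive frames.
levels = [60, 70, 90, 10, 5, 8, 80, 12, 75, 9, 7]
quiet_runs = find_quiet_runs(levels, threshold=20)
```

For these values the isolated quiet frame at index 7 is kept because it does not form a consecutive run, while the runs starting at indices 3 and 9 are reported as candidates for deletion.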

尚、映像フレームの削除処理において、削除されるのは映像データだけであり、音声データは削除されない。それは移動端末側では元の映像と同じフレーム数で再生されるため、音声データを削除する必要はないからである。 Note that the video frame deletion process deletes only video data; audio data is not deleted. Because the mobile terminal plays back the moving image with the same number of frames as the original video, there is no need to delete the audio data.

次に、送信端末において動画データから映像フレームが削除されて送信され、受信端末において補完フレームが生成されて動画データが復元される様子を説明する。図４は、映像フレームの削除及び復元の様子を説明するための図である。尚、図４では映像フレームの削除処理はサーバ３１で行われるとするが、パソコン３３が行ってもよい。 Next, a description will be given of how a video frame is deleted from video data at the transmitting terminal and transmitted, and a complementary frame is generated at the receiving terminal to restore the video data. FIG. 4 is a diagram for explaining how video frames are deleted and restored. In FIG. 4, the video frame deletion process is performed by the server 31, but may be performed by the personal computer 33.

パソコン３３からＡ〜Ｋのフレームが移動端末３５に送信される場合を考える。まず、パソコン３３は、Ａ〜Ｋのフレームをメタ情報とともにサーバ３１に送信する。サーバ３１は、Ａ〜Ｋのフレームをメタ情報とともに受信すると、各映像フレームに対応する音声レベルを認識する。音声レベルは、例えば、音の大きさや、人間の可聴範囲に属するような特定の周波数帯域に属する音の大きさ等を数値化したものであってよいし、これに限定されない。 Consider a case in which frames A to K are transmitted from the personal computer 33 to the mobile terminal 35. First, the personal computer 33 transmits frames A to K to the server 31 together with meta information. When the server 31 receives the frames A to K together with the meta information, the server 31 recognizes the audio level corresponding to each video frame. The sound level may be, for example, a numerical value of a sound volume, a sound volume belonging to a specific frequency band belonging to a human audible range, or the like, but is not limited thereto.

図４の例では、音声レベル５０に示すように、フレームＡの音声レベルは「60」、フレームＢの音声レベルは「70」、フレームＣの音声レベルは「90」、以下Ｄ〜Ｋは同様に音声レベル５０に示すとおりとする。 In the example of FIG. 4, as indicated by audio level 50, the audio level of frame A is “60”, that of frame B is “70”, that of frame C is “90”, and the audio levels of frames D to K are likewise as indicated by audio level 50.

サーバ３１は、各フレームのうち、音声レベルが所定の閾値以下であるフレームを判別する。そして、音声レベルが所定の閾値以下のフレームのうち、直前のフレームの音声レベルが所定の閾値以下である映像フレームを削除対象フレームとして認識する。 The server 31 determines a frame whose sound level is equal to or lower than a predetermined threshold among the frames. Of the frames whose audio level is equal to or lower than a predetermined threshold, a video frame whose audio level of the immediately preceding frame is equal to or lower than the predetermined threshold is recognized as a deletion target frame.

図４の場合、所定の閾値の値が20であるとする。すると、音声レベルが20以下であるフレームはＤ〜Ｇである。フレームＤは、直前のフレームの音声レベルが閾値以上（フレームＣ、音声レベル90）であるので、映像フレームＤは削除対象フレームとは認識されない。一方、フレームＥ〜Ｇはいずれも直前のフレームの音声レベルが閾値以下であるので、削除対象フレームとして認識される。尚、映像データと音声データはフレーム毎に対応付けられ、映像フレームの音声レベルは、その映像フレームに対応する音声フレームの情報が用いられて認識される。 In the case of FIG. 4, it is assumed that the predetermined threshold value is 20. Then, the frames whose audio level is 20 or less are D to G. In the frame D, the audio level of the immediately preceding frame is equal to or higher than the threshold (frame C, audio level 90), so the video frame D is not recognized as a deletion target frame. On the other hand, the frames E to G are recognized as frames to be deleted because the audio level of the immediately preceding frame is equal to or less than the threshold value. Note that video data and audio data are associated with each frame, and the audio level of the video frame is recognized using information of the audio frame corresponding to the video frame.
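
The determination rule described for FIG. 4 can be illustrated with a minimal Python sketch. It assumes the audio levels are available as a list of numbers, one per frame. The levels for frames A to C are the ones given above; the levels for frames D to K are hypothetical values chosen only so that frames D to G are at or below the threshold of 20. The function name and the dictionary keys are likewise illustrative and are not taken from the embodiment.

```python
def find_deletion_targets(levels, threshold):
    """Return indices of frames whose own audio level and whose
    immediately preceding frame's audio level are both <= threshold."""
    targets = []
    for n in range(1, len(levels)):
        if levels[n] <= threshold and levels[n - 1] <= threshold:
            targets.append(n)
    return targets


names = "ABCDEFGHIJK"
# A, B, C are from the text; D to K are hypothetical example values.
levels = [60, 70, 90, 20, 10, 10, 15, 50, 80, 70, 60]

targets = find_deletion_targets(levels, threshold=20)
deleted = [names[i] for i in targets]

# Meta information in the spirit of items 41 and 42 (key names are assumptions).
meta = {
    "deletion_frame_start": targets[0],
    "deletion_frame_count": len(targets),
}
```

With these example levels, frame D is kept because frame C is above the threshold, and frames E to G are selected, which matches the outcome described for FIG. 4.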

次に、サーバ３１は認識した削除対象フレームを動画データから削除する。そして、サーバ３１は、削除フレームに関する情報をメタ情報に付加する。図４の例の場合映像フレームＥ〜Ｇの情報がメタ情報に付加される。具体的には、サーバ３１は、削除フレーム開始番号４１に映像フレームＥのフレーム番号を格納し、削除フレーム期間（フレーム数）４２に整数値「3」を格納する。 Next, the server 31 deletes the recognized deletion target frame from the moving image data. Then, the server 31 adds information regarding the deleted frame to the meta information. In the case of the example in FIG. 4, information on the video frames E to G is added to the meta information. Specifically, the server 31 stores the frame number of the video frame E in the deletion frame start number 41 and stores the integer value “3” in the deletion frame period (number of frames) 42.

ここで、削除する映像フレームの数は、サーバ３１と移動端末３５の間のネットワークの帯域幅よりも送信する動画データの伝送レートが小さくなるように決定される。 Here, the number of video frames to be deleted is determined such that the transmission rate of the moving image data to be transmitted is smaller than the bandwidth of the network between the server 31 and the mobile terminal 35.

動画データの所定の期間内、すなわち所定数の連続するフレームのうち、音声レベルが閾値以下であるフレームが存在しない場合も考えられる。例えば、図４の場合、閾値を3とした場合、フレームＡ〜Ｋのうち、音声レベルが閾値以下のフレームが存在しない。この場合、削除対象のフレームが存在しないこととなり、ネットワークの帯域幅を超える伝送レートで動画が配信されてしまう。この場合は、同時に削除対象の判別が行われる動画データの期間を増やすことにより対応するが、この動作の詳細は後ほど説明する。 There may be a case where there is no frame having a sound level equal to or lower than a threshold value within a predetermined period of the moving image data, that is, a predetermined number of consecutive frames. For example, in the case of FIG. 4, when the threshold is set to 3, there is no frame having an audio level equal to or lower than the threshold among the frames A to K. In this case, there is no frame to be deleted, and a moving image is distributed at a transmission rate that exceeds the bandwidth of the network. This case can be dealt with by increasing the period of the moving image data in which the deletion target is simultaneously determined. Details of this operation will be described later.

サーバ３１は、削除した映像フレームの情報を追記したメタ情報をフレームに付随させて、移動端末３５に送信する。すると、移動端末３５は、映像フレームＥ〜Ｇが削除された状態の映像データを受信することとなる。移動端末３５は、映像データと同時にメタ情報を受信し、削除された映像フレームは映像フレームＥ〜Ｇであることを認識する。具体的には、移動端末３５はメタ情報の削除フレーム開始番号４１、削除フレーム期間（フレーム数）４２の情報を認識し、削除フレームを特定する。 The server 31 attaches the meta information, to which the information on the deleted video frames has been added, to the frames and transmits them to the mobile terminal 35. The mobile terminal 35 thus receives video data from which the video frames E to G have been deleted. The mobile terminal 35 receives the meta information together with the video data and recognizes that the deleted video frames are the video frames E to G. Specifically, the mobile terminal 35 reads the deletion frame start number 41 and the deletion frame period (number of frames) 42 in the meta information and identifies the deleted frames.

そして、移動端末３５は削除された映像フレームＥ〜Ｇの補完フレームを生成し、その補完フレームを動画データの削除された映像フレームの位置に挿入することで、動画データの復元処理を行う。具体的には、移動端末３５は、対応するメタ情報の削除フレーム開始番号４１、削除フレーム期間（フレーム数）４２から特定した削除フレームの位置に補完フレームを挿入する。 Then, the mobile terminal 35 generates complementary frames for the deleted video frames E to G and inserts them at the positions of the deleted video frames in the moving image data, thereby restoring the moving image data. Specifically, the mobile terminal 35 inserts the complementary frames at the positions of the deleted frames identified from the deletion frame start number 41 and the deletion frame period (number of frames) 42 of the corresponding meta information.

補完フレーム生成処理は、削除フレームの前後の映像フレームが複製されることにより実行される。 The complementary frame generation process is executed by duplicating the video frames before and after the deletion frame.

図５は、補完フレーム生成処理の一例を示す図である。図５の例は、フレームの音声レベルに基いて映像フレームＤ〜Ｆが送信端末で削除された場合の補完の例を示している。映像フレームＤに対応するフレームは、直前の削除されていない映像フレームである映像フレームＣが複製されることによって生成される。一方、映像フレームＥ、Ｆは、直後の削除されていない映像フレームである映像フレームＧが複製されることによって生成される。 FIG. 5 is a diagram illustrating an example of a complementary frame generation process. The example of FIG. 5 shows an example of complementation when the video frames D to F are deleted at the transmission terminal based on the audio level of the frame. The frame corresponding to the video frame D is generated by duplicating the video frame C, which is the previous video frame not deleted. On the other hand, the video frames E and F are generated by duplicating the video frame G, which is a video frame that has not been deleted immediately after.
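
The restoration in the FIG. 5 example can be sketched as follows. This is a minimal illustration, assuming that frames are held in a Python list and that the start index and count come from the meta information. The parameter `from_previous`, which sets how many of the complementary frames copy the preceding frame, is hypothetical; the description gives the split for this example (D from C, E and F from G) but not a general rule for choosing it.

```python
def restore_frames(received, start, count, from_previous=1):
    """Re-insert `count` complementary frames at index `start`.

    received: frames that survived deletion, in playback order.
    The first `from_previous` complementary frames copy the last
    frame before the gap; the rest copy the first frame after it.
    `start` is assumed to be >= 1, since the first frame of a
    low-level run is never deleted.
    """
    before = received[start - 1]
    # If the gap is at the very end, fall back to the preceding frame.
    after = received[start] if start < len(received) else before
    gap = [before if i < from_previous else after for i in range(count)]
    return received[:start] + gap + received[start:]


received = list("ABCGHIJK")  # video frames D, E, F were deleted by the sender
restored = restore_frames(received, start=3, count=3, from_previous=1)
```

The restored sequence has the same number of frames as the original, so the unchanged audio frames still line up one to one with video frames.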

補完フレームの生成では、映像フレームにおける物体の移動量を考慮して複製元の映像フレームを変化させる構成としてもよい。具体的には、移動端末３５は、削除フレームの前後の削除されていない映像フレームを比較して、移動量が大きい対象物とそうでない対象物を判別する。そして、移動端末３５は、移動量が大きくない対象物に対応する位置の画素値は、直前の削除されていない映像フレームの画素値を複製した値とする。移動量が大きい対象物に対応する位置の画素値は、連続する複数の削除フレームのうち、所定時点以前か以後かで複製元の映像フレームを変更させる。すなわち、所定の時点までの削除フレームの画素値は、直前の削除されていない映像フレームの画素値を複製した値とし、所定の時点より後の削除フレームの画素値は、直後の削除されていない映像フレームの画素値を複製した値とする。尚、移動量の測定は例えばＭＰＥＧ（Moving Picture Experts Group）の動き補償等で用いられるような、画像の動き量を推定する動きベクトル探索を用いる方法など種々の方法で測定可能である。 In generating a complementary frame, the copy source video frame may be varied in consideration of the amount of movement of objects in the video frames. Specifically, the mobile terminal 35 compares the undeleted video frames before and after the deleted frames and distinguishes objects with a large amount of movement from those without. For positions corresponding to an object whose movement is not large, the mobile terminal 35 sets the pixel values to copies of the pixel values of the immediately preceding undeleted video frame. For positions corresponding to an object whose movement is large, the copy source video frame is switched depending on whether the deleted frame falls before or after a predetermined point within the run of consecutive deleted frames. That is, the pixel values of deleted frames up to the predetermined point are copies of the pixel values of the immediately preceding undeleted video frame, and the pixel values of deleted frames after the predetermined point are copies of the pixel values of the immediately following undeleted video frame. The amount of movement can be measured by various methods, for example a motion vector search that estimates the amount of image motion, as used in MPEG (Moving Picture Experts Group) motion compensation.

図６は、移動量が大きい対象物と、そうでない対象物が判別され、補完フレームが生成される方法の一例を示す。図６の例は、フレームの音声レベルに基いて映像フレームＤ〜Ｆが送信端末で削除された場合の補完の例を示している。図６に示すように、映像フレームＤ、Ｅの移動量の大きい対象物を示す位置の画素値は、直前の削除されていない映像フレームである映像フレームＣの対応する位置の画素値とする。また、映像フレームＦの移動量の大きい対象物を示す位置の画素値は、直後の削除されていない映像フレームである映像フレームＧの対応する位置の画素値とする。一方、映像フレームＤ、Ｅ、Ｆの移動量の小さい対象物を示す位置の画素値は映像フレームＣの対応する位置の画素値とする。 FIG. 6 shows an example of a method in which an object having a large movement amount and an object that is not so are discriminated and a complementary frame is generated. The example of FIG. 6 shows an example of complementation when the video frames D to F are deleted at the transmission terminal based on the audio level of the frame. As shown in FIG. 6, the pixel value at the position indicating the object with the large moving amount of the video frames D and E is the pixel value at the corresponding position of the video frame C that is the previous video frame that has not been deleted. Also, the pixel value at the position indicating the object with the large moving amount of the video frame F is set as the pixel value at the corresponding position of the video frame G which is the video frame not deleted immediately after. On the other hand, a pixel value at a position indicating an object with a small moving amount of the video frames D, E, and F is a pixel value at a corresponding position in the video frame C.

ここで、動画再生時においては、音声出力は元の動画に対して変化しないことから、音飛びは発生しない。また、映像フレーム数は補完され、見た目の映像フレーム数が確保されるという点から、再生映像の見た目の画質劣化を抑えることができる。さらに、音声レベルが大きく変わるところは、映像も大きく変わると考えられ、本実施形態では音声レベルが所定の閾値より低い映像フレームを削除することから、再生映像の見た目の画質劣化を抑えることができる。 Here, during moving image playback, the audio output is unchanged from the original moving image, so no sound skipping occurs. In addition, because the deleted video frames are complemented and the apparent number of video frames is preserved, visible degradation of the reproduced video can be suppressed. Furthermore, the video is expected to change greatly where the audio level changes greatly, and since this embodiment deletes video frames whose audio level is lower than a predetermined threshold, visible degradation of the reproduced video can be suppressed.

以上、フレームの音声レベルに基いて削除対象フレームを判定する方法を説明したが、削除対象フレームの判定には、音声レベルに加えて、動画データの時系列における前後の映像フレームとの近似性、すなわち類似度をさらに考慮した方法としてもよい。 The method of determining deletion target frames based on the audio level of each frame has been described above. The determination may additionally take into account, besides the audio level, the closeness of a video frame to the preceding and following video frames in the time series of the moving image data, that is, the similarity.

類似度の計算には、種々の手法が用いられてよいが、例えば、直前の映像フレームとの映像フレーム比較により導かれる相似率を類似度としてもよいし、画素値の差の二乗和（SSD: Sum of Squared Differences）を用いて計算することで、類似度を算出してもよい。 Various methods may be used to calculate the similarity. For example, a similarity ratio derived by comparing a video frame with the immediately preceding video frame may be used as the similarity, or the similarity may be calculated using the sum of squared differences (SSD) of pixel values.
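
As a concrete illustration of the SSD option, the following Python sketch computes the sum of squared differences between two frames represented as nested lists of grayscale pixel values. The frame representation is an assumption. Because SSD grows as frames become less alike, the sketch also includes one hypothetical way, 1 / (1 + SSD), to turn it into a score that increases with similarity; the description does not specify such a mapping.

```python
def ssd(frame_a, frame_b):
    """Sum of squared differences between two equally sized frames,
    each given as a list of rows of numeric pixel values."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same dimensions")
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )


def similarity(frame_a, frame_b):
    """Hypothetical similarity score in (0, 1]; 1.0 means identical frames."""
    return 1.0 / (1.0 + ssd(frame_a, frame_b))


previous = [[10, 10], [10, 10]]
current = [[10, 12], [9, 10]]
difference = ssd(previous, current)  # (12 - 10)^2 + (9 - 10)^2
```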

この場合、送信端末は、映像フレームのうち、類似度が所定の閾値以上である映像フレームを判別し、類似度が所定の閾値以上であると判別した映像フレームのうち音声レベルが所定の閾値以下である連続する映像フレームを削除対象フレームとする。 In this case, the transmitting terminal identifies, among the video frames, those whose similarity is equal to or greater than a predetermined threshold, and among the video frames so identified, treats consecutive video frames whose audio level is equal to or lower than a predetermined threshold as deletion target frames.

次に、サーバ３１の動画データの配信品質監視について説明する。配信品質監視とは、動画の転送で使用する帯域幅が、サーバ３１と移動端末３５間のネットワークの帯域幅以下となるように、送信する動画データの映像フレームレートを調整するための調整量を決定することである。 Next, the distribution quality monitoring of the moving image data by the server 31 will be described. Distribution quality monitoring means determining the adjustment amount for adjusting the video frame rate of the moving image data to be transmitted so that the bandwidth used for transferring the moving image is equal to or less than the bandwidth of the network between the server 31 and the mobile terminal 35.

サーバ３１は、サーバ３１と移動端末３５間のネットワーク帯域の状況を動的に監視し、監視の結果に応じて動画データから映像フレームを削除することにより伝送レートの調整を行う。ここでサーバ３１は動画データの伝送レートがネットワーク帯域を超えないように映像フレームの削除を行う。 The server 31 dynamically monitors the status of the network bandwidth between the server 31 and the mobile terminal 35, and adjusts the transmission rate by deleting video frames from the moving image data according to the monitoring result. Here, the server 31 deletes the video frame so that the transmission rate of the moving image data does not exceed the network bandwidth.

配信品質監視では、サーバ３１は、送信する動画のデータ量と安定的に送受信可能なデータ量とを比較し、安定的に送受信可能なデータ量の範囲内に送信する動画のデータ量が収まるように、動画データの映像フレームレートを決定する。ここで決定された動画データの映像フレームレートを、以下の説明では、送信用フレームレートと記す。 In the distribution quality monitoring, the server 31 compares the data amount of the moving image to be transmitted with the data amount that can be stably transmitted and received, and determines the video frame rate of the moving image data so that the data amount to be transmitted falls within the stably transmittable range. The video frame rate determined here is referred to as the transmission frame rate in the following description.

先ず、サーバ３１は、サーバ３１と移動端末３５の間の帯域監視の結果からサーバ３１と移動端末３５間の使用可能な帯域幅を判定する。もしくは、帯域監視の結果と回線の伝送路容量とからサーバ３１は使用可能な帯域幅を判定する。そして、サーバ３１は、動画データの、解像度、ビット色、圧縮率、アクティビティレベル、及びフレームレートから動画データの伝送ビットレートを算出する。そしてサーバ３１は、使用可能な帯域幅に、算出した伝送ビットレートが収まるように、動画データの映像フレームレートの削除量を導出する。 First, the server 31 determines the usable bandwidth between the server 31 and the mobile terminal 35 from the result of bandwidth monitoring between the server 31 and the mobile terminal 35. Alternatively, the server 31 determines an available bandwidth from the result of bandwidth monitoring and the transmission path capacity of the line. Then, the server 31 calculates the transmission bit rate of the moving image data from the resolution, bit color, compression rate, activity level, and frame rate of the moving image data. Then, the server 31 derives the deletion amount of the video frame rate of the moving image data so that the calculated transmission bit rate is within the usable bandwidth.

尚、アクティビティレベルは、ストリームに対するパケットの送信頻度であり、例えば動画データの内容によって変化する。例えば無音の映像データの場合、オーディオストリームを送信する必要はなく、オーディオストリームのアクティビティレベルは０になる。 The activity level is the frequency of packet transmission with respect to the stream, and varies depending on, for example, the content of the moving image data. For example, in the case of silent video data, there is no need to transmit an audio stream, and the activity level of the audio stream is zero.

次に、配信品質監視の動作について、具体例を示して説明する。サーバ３１と移動端末３５間の通信規格をＬＴＥ（Long Term Evolution）と仮定する。この場合の帯域幅を最大75Mbpsと仮定する。尚、以下の例では、動画データはコーデックにおいて固定ビットレートでエンコードされると想定する。 Next, the delivery quality monitoring operation will be described with a specific example. It is assumed that the communication standard between the server 31 and the mobile terminal 35 is LTE (Long Term Evolution). The bandwidth in this case is assumed to be 75 Mbps at the maximum. In the following example, it is assumed that moving image data is encoded at a fixed bit rate in the codec.

動画送信のために使用可能な帯域幅は、ネットワーク内の他のアプリケーション等によるトラフィックにも依存するため、常に100%使用できるというわけではない。そこで、サーバ３１は帯域監視を行い、すなわち、一定量のパケットがパソコン３３からサーバ３１まで到達するまでにかかる時間を算出し、ネットワークのトラフィック状況を把握する。このトラフィック状況に基いて、動画配信のために保証される帯域幅の、回線の最大伝送速度に対する割合をZ%と定義する。ここでは、Z＝1と仮定する。 The bandwidth that can be used for video transmission depends on traffic from other applications in the network, and is not always 100% usable. Therefore, the server 31 performs bandwidth monitoring, that is, calculates the time required for a certain amount of packets to reach the server 31 from the personal computer 33, and grasps the traffic status of the network. Based on this traffic situation, the ratio of the bandwidth guaranteed for moving picture distribution to the maximum transmission speed of the line is defined as Z%. Here, it is assumed that Z = 1.

動画データは、例えば解像度が1280[dot per inch(dpi)]×720[dpi]のフルＨＤ（full high definition）、ビット色が8、映像フレームレートが24ｆｐｓ（frames per second）とする。この動画データのビットレートは、（1280×720（解像度））×（3（RGB）×256（8ビット色））×24（フレーム）＝17Gbpsになる。この値は１フレームが未圧縮状態である場合の値である。 The moving image data is assumed to be, for example, full HD (full high definition) with a resolution of 1280 [dots per inch (dpi)] × 720 [dpi], a bit color of 8, and a video frame rate of 24 fps (frames per second). The bit rate of this moving image data is (1280×720 (resolution)) × (3 (RGB) × 256 (8-bit color)) × 24 (frames) = 17 Gbps. This is the value when each frame is uncompressed.

ここで例えばMPEG等の圧縮方式は、例えば、フルのIピクチャ（Intra picture）、1/2サイズのPピクチャ（Predictive picture）、1/4サイズのBピクチャ（Bi-directional predictive picture）を含む。そして、Iピクチャ、Pピクチャ、Bピクチャは、1:4:10の割合で構成されている。そのため、動画データの圧縮率は約11/30になる。さらに、例えば、MPEG-AVCは、移動部分だけの差異データであるため、さらにおおよそ1/24に圧縮される。よって動画データは11/30×1/24=11/720に圧縮されると想定する。 Here, for example, a compression method such as MPEG includes a full I picture (Intra picture), a 1/2 size P picture (Predictive picture), and a 1/4 size B picture (Bi-directional predictive picture). The I picture, P picture, and B picture are configured in a ratio of 1: 4: 10. Therefore, the compression rate of moving image data is about 11/30. Furthermore, for example, since MPEG-AVC is difference data of only a moving part, it is further compressed to approximately 1/24. Therefore, it is assumed that the moving image data is compressed to 11/30 × 1/24 = 11/720.
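
The 11/30 and 11/720 figures follow from a weighted average of the picture sizes, which can be checked with exact fractions. The sketch below only uses the picture counts and relative sizes stated above; the variable names are illustrative.

```python
from fractions import Fraction

# (number of pictures, size relative to a full picture) for I, P and B pictures
pictures = [
    (1, Fraction(1)),       # I picture: full size
    (4, Fraction(1, 2)),    # P pictures: 1/2 size
    (10, Fraction(1, 4)),   # B pictures: 1/4 size
]

total_pictures = sum(count for count, _ in pictures)
# Average size per picture relative to an uncompressed full picture.
picture_ratio = sum(count * size for count, size in pictures) / total_pictures
# Further reduction of about 1/24 assumed in the text for difference coding.
overall_ratio = picture_ratio * Fraction(1, 24)
```

Here (1 + 4 × 1/2 + 10 × 1/4) / 15 = 5.5 / 15, which reduces to 11/30, and multiplying by 1/24 gives 11/720.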

この圧縮を考慮すると動画データの伝送ビットレートは、17Gbps×11/720=259Mbpsとなる。ここで、動画データは、再生が行われる移動端末３５の解像度に合わせて圧縮が可能である。例えば、サーバ３１が認識した移動端末３５の解像度が800×480である場合を考える。この場合、動画データは送信前に、800×480/1280×720=0.42倍に圧縮することができる。 Considering this compression, the transmission bit rate of moving image data is 17 Gbps × 11/720 = 259 Mbps. Here, the moving image data can be compressed in accordance with the resolution of the mobile terminal 35 where the reproduction is performed. For example, consider a case where the resolution of the mobile terminal 35 recognized by the server 31 is 800 × 480. In this case, the moving image data can be compressed to 800 × 480/1280 × 720 = 0.42 times before transmission.

ここで、元の動画データは解像度が大きい装置で表示するコンテンツであり、解像度が小さい移動端末３５で動画データを再生する場合、動画データの解像度を移動端末３５の解像度まで落としても見た目が変化することはない。また、移動端末３５の映像再生チップの性能は動画データが含む情報量を全て活用できないことが多く、映像の細かい変化を再現できない。 Here, the original moving image data is content intended for display on a high-resolution device; when it is played back on the mobile terminal 35, whose resolution is lower, reducing the resolution of the moving image data to that of the mobile terminal 35 does not change its appearance. In addition, the video playback chip of the mobile terminal 35 often cannot make use of all the information contained in the moving image data and cannot reproduce fine changes in the video.

よって、サーバ３１が認識した移動端末３５の解像度に基いた解像度をA、映像フレーム数をBとする。A=800×480、B=24の場合、動画データの伝送ビットレートは、A×(RGB各色フレーム:3×256)×B×11/720=(800×480)×(3×256)×24×11/720≒108Mbpsとなる。 Therefore, let A be the resolution based on the resolution of the mobile terminal 35 recognized by the server 31, and let B be the number of video frames. When A = 800×480 and B = 24, the transmission bit rate of the moving image data is A × (RGB color frames: 3×256) × B × 11/720 = (800×480) × (3×256) × 24 × 11/720 ≒ 108 Mbps.

しかしながら、この値はネットワークの帯域幅である75Mbps以下ではない。そこで、送信端末は動画データの伝送ビットレートがネットワークの帯域幅以下となるように映像フレームを削除して送信する。削除前の映像フレームに対する削除後映像フレームの割合をEとすると、108M×E≦77Mが成立すればよいから、E≦0.71となる。24×E、すなわち、24×0.71=17.04であるため、送信する動画データの映像フレームレートは17ｆｐｓに変更されればよい。この場合、元の動画の映像フレームレートは24ｆｐｓであるから、1秒間の動画データにおいて、削除する映像フレームの数は7となる。 However, this value exceeds the network bandwidth of 75 Mbps. Therefore, the transmitting terminal deletes video frames before transmission so that the transmission bit rate of the moving image data is equal to or less than the network bandwidth. Letting E be the ratio of video frames remaining after deletion to video frames before deletion, it suffices that 108M × E ≦ 77M holds, so E ≦ 0.71. Since 24 × E = 24 × 0.71 = 17.04, the video frame rate of the moving image data to be transmitted should be changed to 17 fps. Because the video frame rate of the original moving image is 24 fps, the number of video frames to delete from 1 second of moving image data is 7.
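
The arithmetic of this worked example can be reproduced with a short Python sketch. It follows the formulas exactly as written above, including the factor 3 × 256 per pixel and the 11/720 compression ratio. The function names are illustrative, and the usable budget is passed in as a parameter; the call below uses the 77 Mbps figure that appears in the inequality above.

```python
import math


def raw_bitrate(width, height, fps, per_pixel=3 * 256):
    """Uncompressed bit rate according to the formula used in the text."""
    return width * height * per_pixel * fps


def compressed_bitrate(raw):
    """Apply the assumed overall compression ratio of 11/720."""
    return raw * 11 // 720


def transmission_frame_rate(rate_bps, budget_bps, fps):
    """Largest whole frame rate whose bit rate fits within the budget."""
    keep_ratio = min(1.0, budget_bps / rate_bps)   # E in the text
    return math.floor(fps * keep_ratio)


source_raw = raw_bitrate(1280, 720, 24)                       # about 17 Gbps
source_rate = compressed_bitrate(source_raw)                  # about 259 Mbps
terminal_rate = compressed_bitrate(raw_bitrate(800, 480, 24)) # about 108 Mbps

tx_fps = transmission_frame_rate(terminal_rate, 77_000_000, 24)
frames_to_delete_per_second = 24 - tx_fps
```

Running this yields a transmission frame rate of 17 fps and 7 video frames to delete per second, the same values as in the description.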

次に、配信品質監視において決定された動画データの映像フレームレート以下となるように、配信される動画データの映像フレームが削除される動作について説明する。 Next, an operation for deleting the video frame of the moving image data to be distributed so as to be equal to or less than the video frame rate of the moving image data determined in the distribution quality monitoring will be described.

図７は、送信端末における動画データの映像フレームレートの調整を説明するための図である。 FIG. 7 is a diagram for explaining adjustment of the video frame rate of moving image data in the transmission terminal.

ファイルデコーダ５１は、ストリーミングデータを映像データと音声データに分離する。そして、ファイルデコーダ５１は分離した映像データをビデオエンコーダ５３に出力し、分離した音声データをオーディオエンコーダ５５に出力する。さらに、ファイルデコーダ５１はストリーミングデータのメタ情報をフレーム制御部５２に送信する。 The file decoder 51 separates the streaming data into video data and audio data. Then, the file decoder 51 outputs the separated video data to the video encoder 53 and outputs the separated audio data to the audio encoder 55. Further, the file decoder 51 transmits the meta information of the streaming data to the frame control unit 52.

ここで、分離された映像データと音声データはフレーム毎に対応付けられる。例えば、映像フレームと音声フレームで同じフレーム番号を有するものがそれぞれ対応付けられる。対応する映像フレームと音声フレームのタイムスタンプは同じ値が設定される。尚、タイムスタンプは、フレーム単位に再生開始からの秒数をつけ定義するものであり、タイムスタンプの時刻に合わせてフレームが再生される。各映像フレームおよび各音声フレームはそれぞれのタイムスタンプ情報を保持する。 Here, the separated video data and audio data are associated with each frame. For example, video frames and audio frames having the same frame number are associated with each other. The same value is set for the time stamp of the corresponding video frame and audio frame. The time stamp is defined by adding the number of seconds from the start of playback to each frame, and the frame is played back at the time of the time stamp. Each video frame and each audio frame holds time stamp information.

フレーム制御部５２は、映像データのうち、フレームレートが送信用フレームレート以下となるように、削除する映像フレームを決定する。そして、フレーム制御部５２は削除した映像フレームに関する情報をメタ情報に付加する。すなわち、フレーム制御部５２は、メタ情報の削除フレーム開始番号４１、削除フレーム期間（フレーム数）４２に削除した映像フレームの情報を格納する。 The frame control unit 52 determines a video frame to be deleted so that the frame rate of the video data is equal to or less than the transmission frame rate. Then, the frame control unit 52 adds information on the deleted video frame to the meta information. That is, the frame control unit 52 stores the information of the deleted video frame in the deleted frame start number 41 and the deleted frame period (number of frames) 42 of the meta information.

ファイルデコーダ５１より出力された映像データは、ビデオエンコーダ５３に入力される。ビデオエンコーダ５３は、映像データからフレーム制御部５２により決定された削除フレームを削除し、再度映像フレームを構築する。そして、ビデオエンコーダ５３は再構築した映像フレームを送信用の形式にエンコードする。エンコードした結果映像データは、例えばＲＴＭＰ（Real Time Messaging Protocol）形式のパケット等に分割もしくは集約される。ビデオエンコーダ５３はエンコードした映像データをビデオメモリ５４に出力し、ビデオメモリ５４からエンコードされた映像データが受信端末に送信される。 The video data output from the file decoder 51 is input to the video encoder 53. The video encoder 53 deletes the deleted frame determined by the frame control unit 52 from the video data, and constructs the video frame again. Then, the video encoder 53 encodes the reconstructed video frame into a transmission format. The encoded video data is divided or aggregated into, for example, RTMP (Real Time Messaging Protocol) packets. The video encoder 53 outputs the encoded video data to the video memory 54, and the encoded video data is transmitted from the video memory 54 to the receiving terminal.

一方、ファイルデコーダ５１により分離された音声データはオーディオエンコーダ５５に入力される。オーディオエンコーダ５５は受信した音声データを送付用の形式に変換してオーディオメモリ５６に出力する。そして、エンコードされた音声データはオーディオメモリ５６から受信端末に送信される。ここで、削除された映像フレームに対応する音声フレームが削除されることはなく、音声データのメタ情報も変更されずに送信される。 On the other hand, the audio data separated by the file decoder 51 is input to the audio encoder 55. The audio encoder 55 converts the received audio data into a sending format and outputs it to the audio memory 56. The encoded audio data is transmitted from the audio memory 56 to the receiving terminal. Here, the audio frame corresponding to the deleted video frame is not deleted, and the meta information of the audio data is transmitted without being changed.

次に、送信端末における映像フレームの削除処理の動作フローを説明する。図８は、映像フレームの削減処理のフローチャートを示す。ここで、送信端末は、作業バッファに格納される所定期間の映像フレーム毎に削除対象フレームを判定し、そのフレームを削除することによって、作業バッファに格納された動画データの映像フレームレートが送信用フレームレート以下となるように調整される。尚、図８のフローは、作業バッファに格納される所定期間の映像フレーム毎に周期的に実行される。 Next, the operation flow of the video frame deletion process at the transmitting terminal will be described. FIG. 8 shows a flowchart of the video frame reduction process. Here, the transmitting terminal determines the deletion target frames for each set of video frames of a predetermined period stored in the work buffer and deletes those frames, thereby adjusting the video frame rate of the moving image data stored in the work buffer to be equal to or less than the transmission frame rate. The flow of FIG. 8 is executed periodically for each set of video frames of a predetermined period stored in the work buffer.

先ず、送信端末は、所定期間の動画データを作業バッファに読み込む（バッファリングする）（Ｓ６１）。次に、送信端末は、配信品質監視の結果を確認し、送信用フレームレートを認識する（Ｓ６２）。そして、送信端末は、現在作業バッファ内に格納されている動画データの映像フレームレートが、Ｓ６２で認識した送信用フレームレートとなるように、現在作業バッファに格納されている映像フレームのうち削除するフレーム数を確認する（Ｓ６３）。 First, the transmitting terminal reads (buffers) moving image data of a predetermined period into the work buffer (S61). Next, the transmitting terminal checks the result of the distribution quality monitoring and recognizes the transmission frame rate (S62). Then, the transmitting terminal confirms the number of frames to delete from the video frames currently stored in the work buffer so that the video frame rate of the moving image data currently stored in the work buffer becomes the transmission frame rate recognized in S62 (S63).

次に、送信端末は、作業バッファに格納された映像フレームのうち、各フレームに対応する音声レベルに基いて削除する削除対象フレームの判別を行う（Ｓ６４）。そして、送信端末は、Ｓ６４で判別した削除対象フレームの数が、Ｓ６３で確認した削除フレーム数以上か否かを判定する（Ｓ６５）。 Next, the transmission terminal determines a deletion target frame to be deleted based on the audio level corresponding to each frame among the video frames stored in the work buffer (S64). Then, the transmitting terminal determines whether or not the number of deletion target frames determined in S64 is equal to or greater than the number of deleted frames confirmed in S63 (S65).

Ｓ６４で判別した削除対象フレーム数がＳ６３で確認した削除フレームの数より小さい場合（Ｓ６５でＮｏ）、送信端末は、作業バッファに格納するフレームの期間を増加して（Ｓ６６）、再度削除フレームの判別を行う（Ｓ６４）。Ｓ６４で判別した削除対象フレームの数がＳ６３で確認した削除フレーム数以上である場合（Ｓ６５でＹｅｓ）、送信端末は、Ｓ６４で判別した削除対象フレームを削除し、その削除フレームの情報をメタ情報に付加する（Ｓ６６）。そして、送信端末は、映像データを送付用にエンコードして動画データを配信する（Ｓ６７）。 When the number of deletion target frames determined in S64 is smaller than the number of frames to delete confirmed in S63 (No in S65), the transmitting terminal increases the period of frames stored in the work buffer (S66) and determines the deletion target frames again (S64). When the number of deletion target frames determined in S64 is equal to or greater than the number of frames to delete confirmed in S63 (Yes in S65), the transmitting terminal deletes the deletion target frames determined in S64 and adds the information on the deleted frames to the meta information (S66). Then, the transmitting terminal encodes the video data for delivery and distributes the moving image data (S67).
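
The loop of FIG. 8 can be sketched roughly as follows. This is an illustration under several assumptions rather than the embodiment's implementation: the audio levels of the whole stream are given as a list, the examined period grows in whole seconds, the candidate rule is the one described for FIG. 4 (a frame and its preceding frame both at or below the threshold), and `max_period_s` stands in for the capacity of the work buffer. All names are hypothetical.

```python
def select_frames_to_delete(levels, threshold, source_fps, tx_fps,
                            period_s=1, max_period_s=8):
    """Grow the examined period until it holds enough deletion candidates.

    Returns (period in seconds, candidate frame indices), or None when
    the assumed buffer limit is reached without enough candidates.
    """
    while period_s <= max_period_s:
        window = levels[: period_s * source_fps]                 # S61
        # S62/S63: frames that must go so the window meets tx_fps.
        required = len(window) - (len(window) * tx_fps) // source_fps
        candidates = [                                           # S64
            n for n in range(1, len(window))
            if window[n] <= threshold and window[n - 1] <= threshold
        ]
        if len(candidates) >= required:                          # S65: Yes
            return period_s, candidates
        period_s += 1                                            # S65: No
    return None


# Hypothetical stream at 4 fps: a loud first second, a quiet second one.
levels = [50, 60, 70, 80, 10, 10, 10, 40]
result = select_frames_to_delete(levels, threshold=20, source_fps=4, tx_fps=3)
```

In this example the first second contains no candidate, so the period is extended to two seconds, where two frames must be removed and frames 5 and 6 qualify.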

図９は、作業バッファに格納された動画データのうち、各フレームの音声レベルに基いて送信端末が削除対象フレームを判別する動作（Ｓ６４）のフローの詳細を示す。図９においては、説明のために判別対象映像フレームのフレーム番号をｎ（以下、フレーム番号がｎの映像フレームをｎフレームと記す）として説明する。 FIG. 9 shows the details of the flow of the operation (S64) in which the transmitting terminal determines the deletion target frames, based on the audio level of each frame, from the moving image data stored in the work buffer. In FIG. 9, for the sake of explanation, the frame number of the video frame under examination is n (hereinafter, the video frame with frame number n is referred to as the n frame).

送信端末はフレーム番号がｎ−１の映像フレーム（ｎ−１フレーム）の音声レベルが所定の閾値以上か否かを判定する（Ｓ７１）。ｎ−１フレームの音声レベルが所定の閾値未満である場合（Ｓ７１でＮｏ）、送信端末は、ｎフレームは削除対象フレームではないと判定する（Ｓ７４）。そして、送信端末はｎの値をインクリメントする（Ｓ７５）。 The transmitting terminal determines whether or not the audio level of the video frame (n-1 frame) with the frame number n-1 is equal to or greater than a predetermined threshold (S71). When the audio level of the n-1 frame is less than the predetermined threshold (No in S71), the transmitting terminal determines that the n frame is not a deletion target frame (S74). Then, the transmitting terminal increments the value of n (S75).

Ｓ７１でｎ−１フレームの音声レベルが所定の閾値以上である場合（Ｓ７１でＹｅｓ）、送信端末は、ｎフレームの音声レベルが所定の閾値以上か否かを判定する（Ｓ７２）。 When the sound level of the n-1 frame is equal to or higher than the predetermined threshold value in S71 (Yes in S71), the transmitting terminal determines whether or not the sound level of the n frame is equal to or higher than the predetermined threshold value (S72).

ｎフレームの音声レベルが所定の閾値未満である場合（Ｓ７２でＮｏ）、送信端末は、ｎフレームは削除対象のフレームではないと判定する（Ｓ７４）。そして、送信端末はｎの値をインクリメントする（Ｓ７５）。 When the audio level of n frames is less than the predetermined threshold (No in S72), the transmitting terminal determines that the n frames are not frames to be deleted (S74). Then, the transmitting terminal increments the value of n (S75).

Ｓ７２でｎフレームの音声レベルが所定の閾値以上である場合（Ｓ７２でＹｅｓ）、送信端末は、ｎフレームは削除対象のフレームであると判定する（Ｓ７３）。そして、送信端末はｎの値をインクリメントする（Ｓ７５）。 If the sound level of the n frame is equal to or greater than the predetermined threshold value in S72 (Yes in S72), the transmitting terminal determines that the n frame is a frame to be deleted (S73). Then, the transmitting terminal increments the value of n (S75).

Ｓ７５で、ｎの値をインクリメントした後、送信端末は、すべての作業バッファ内の映像フレームに対して判別処理が行われたか否かを判定する（Ｓ７６）。作業バッファ内の映像フレームで判別処理が行われていないものがある場合（Ｓ７６でＮｏ）、処理はＳ７１に戻る。すべての作業バッファ内の映像フレームに対して判別処理が行われた場合（Ｓ７６でＹｅｓ）、処理は終了する。 After incrementing the value of n in S75, the transmitting terminal determines whether the discrimination process has been performed on all the video frames in the work buffer (S76). If any video frame in the work buffer has not yet been subjected to the discrimination process (No in S76), the process returns to S71. If the discrimination process has been performed on all the video frames in the work buffer (Yes in S76), the process ends.

尚、作業バッファは動画データの全ての映像フレームを格納する必要はなく、映像フレームの削除処理を行い、映像フレームレートを配信品質監視で決定された送信用フレームレート以下にすることができる領域があればよい。 Note that the work buffer does not have to store all the video frames of the moving image data. It only needs an area large enough for the video frame deletion process to bring the video frame rate to or below the transmission frame rate determined by the distribution quality monitoring.

次に、所定期間の動画データにおいて、削除対象と判定される映像フレームが削除された結果、その所定期間の動画データの映像フレームレートが送信用フレームレートより小さくならない場合の動作について説明する。ここで、送信端末において、送信用フレームレートと一度に比較される動画データの期間を動画データの判別対象期間と記す。 Next, an operation when the video frame determined to be deleted is deleted from the video data for a predetermined period and the video frame rate of the video data for the predetermined period does not become lower than the transmission frame rate will be described. Here, the period of moving image data that is compared with the transmission frame rate at a time in the transmitting terminal is referred to as a moving image data discrimination target period.

上記に述べたように、送信端末は、動画データを所定期間毎に区切って削除対象の判別を行い、この所定期間における動画の映像フレームレートが送信用フレームレートより小さいか否かの判定を行っている。 As described above, the transmitting terminal divides the moving image data into predetermined periods, determines the deletion targets in each period, and determines whether or not the video frame rate of the moving image in that period is smaller than the transmission frame rate.

削除対象と判定された映像フレームを削除した結果、その所定期間の動画データの映像フレームレートが送信用フレームレートより小さくならない場合、送信端末は、動画データの判別対象期間を増加させる。本実施形態では、送信端末は、削除対象か否かの判別が同時に行われる映像フレームが格納される作業バッファに格納される動画データの期間を増加させる。送信端末は、作業バッファが格納可能な映像フレームの期間における動画データの映像フレームレートが、配信品質監視によって決定された送信用フレームレート以下となるように映像フレームを削除する。 If deleting the video frames determined to be deletion targets does not bring the video frame rate of the moving image data in the predetermined period below the transmission frame rate, the transmitting terminal lengthens the discrimination target period of the moving image data. In the present embodiment, the transmitting terminal lengthens the period of moving image data held in the work buffer, which stores the video frames that are examined together for deletion. The transmitting terminal deletes video frames so that the video frame rate of the moving image data over the period that the work buffer can hold is equal to or less than the transmission frame rate determined by the distribution quality monitoring.

具体的には、例えば図４の例で、Ａ〜Ｋの映像フレームのうち、音声レベルが閾値以下の映像フレームが所定の数以上存在しない場合、さらに映像フレームＬ〜Ｖを作業バッファにバッファリングし、その中で類似度が閾値以下である映像フレームを削除する。そして、映像フレームＡ〜Ｖの期間で映像フレームレートが、配信品質監視で決定された送信用フレームレート以下か否かを確認する。 Specifically, in the example of FIG. 4, when fewer than a predetermined number of the video frames A to K have an audio level equal to or less than the threshold, the video frames L to V are additionally buffered in the work buffer, and the video frames among them whose similarity is equal to or less than the threshold are deleted. It is then checked whether or not the video frame rate over the period of the video frames A to V is equal to or lower than the transmission frame rate determined by the distribution quality monitoring.
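The extension of the discrimination target period can be illustrated with a short sketch. It assumes that the per-frame deletion decision has already been made and is passed in as a list of flags, that the buffer grows in fixed chunks (11 frames in the usage below, mirroring frames A to K and L to V), and that the frame rate over a window is the source frame rate scaled by the fraction of frames kept. The name `frames_needed` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def frames_needed(deletable: Sequence[bool], source_fps: float,
                  target_fps: float, chunk: int) -> Optional[int]:
    """Return the smallest window (a multiple of `chunk` frames) whose frame
    rate after deletion is at or below `target_fps`, or None if none exists."""
    window = chunk
    while window <= len(deletable):
        kept = window - sum(deletable[:window])
        # Frame rate over the window once the deletable frames are removed.
        rate = source_fps * kept / window
        if rate <= target_fps:
            return window
        window += chunk  # buffer the next chunk of frames and try again
    return None
```

For example, with 11 frames that cannot be deleted followed by 11 that can, a 30 fps source, and a 20 fps target, the first window fails (30 fps) and the extended window of 22 frames succeeds (15 fps).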

図１０は、受信端末のデコード処理を説明するための図である。デコード処理部は、ファイルデコーダ９１、フレーム制御部９２、ビデオデコーダ９３、ビデオメモリ９４、オーディオデコーダ９５、オーディオメモリ９６を含む。 FIG. 10 is a diagram for explaining the decoding process of the receiving terminal. The decoding processing unit includes a file decoder 91, a frame control unit 92, a video decoder 93, a video memory 94, an audio decoder 95, and an audio memory 96.

ファイルデコーダ９１は、送信端末より受信したストリーミングデータを映像データと音声データに分離する。そしてファイルデコーダ９１は、分離した映像データをビデオデコーダ９３に出力し、音声データをオーディオデコーダ９５に出力する。また、ファイルデコーダ９１は、受信したストリーミングデータよりメタ情報を抽出し、フレーム制御部９２に送信する。 The file decoder 91 separates the streaming data received from the transmission terminal into video data and audio data. Then, the file decoder 91 outputs the separated video data to the video decoder 93 and outputs the audio data to the audio decoder 95. In addition, the file decoder 91 extracts meta information from the received streaming data and transmits it to the frame control unit 92.

フレーム制御部９２は、ファイルデコーダ９１から受信したメタ情報から削除フレームを認識し、削除フレームの情報をビデオデコーダ９３に出力する。削除フレームの情報とは、例えば、削除フレーム開始番号４１、削除フレーム期間（フレーム数）４２を含む。また、フレーム制御部９２は、メタ情報を用いて、削除フレームに対応する音声フレームのタイムスタンプを認識し、その音声フレームのタイムスタンプ情報を制御信号としてビデオデコーダ９３に出力する。 The frame control unit 92 recognizes the deleted frame from the meta information received from the file decoder 91, and outputs the deleted frame information to the video decoder 93. The deleted frame information includes, for example, a deleted frame start number 41 and a deleted frame period (number of frames) 42. Further, the frame control unit 92 recognizes the time stamp of the audio frame corresponding to the deleted frame using the meta information, and outputs the time stamp information of the audio frame to the video decoder 93 as a control signal.

ビデオデコーダ９３は、ファイルデコーダ９１から映像データを受信し、フレーム制御部９２から削除フレーム情報を受信する。ビデオデコーダ９３は映像データをデコードする。そして、ビデオデコーダ９３は映像データと削除フレーム情報から削除フレームの補完フレームを生成し、映像フレームの再構築を行う。ここで生成した補完フレームのタイムスタンプは、フレーム制御部９２から受信した制御信号により、削除フレームと対応する音声フレームのタイムスタンプと同じ値に設定される。これによりリップシンク調整がなされる。そして、ビデオデコーダ９３は再構築した映像データをビデオメモリ９４に出力する。 The video decoder 93 receives video data from the file decoder 91 and receives deletion frame information from the frame control unit 92. The video decoder 93 decodes the video data. Then, the video decoder 93 generates a complementary frame of the deleted frame from the video data and the deleted frame information, and reconstructs the video frame. The time stamp of the complementary frame generated here is set to the same value as the time stamp of the audio frame corresponding to the deleted frame by the control signal received from the frame control unit 92. Thereby, lip sync adjustment is performed. Then, the video decoder 93 outputs the reconstructed video data to the video memory 94.

オーディオデコーダ９５は、ファイルデコーダ９１から音声データを受信し、デコード処理を行う。そして、音声データをオーディオメモリ９６に出力する。 The audio decoder 95 receives audio data from the file decoder 91 and performs decoding processing. Then, the audio data is output to the audio memory 96.

ここで、リップシンク調整は、ビデオメモリ９４、オーディオメモリ９６にデータが格納された時点でなされるのではなく、ビデオデコーダ９３において完了する。これにより、受信端末でリップシンク調整のためのメモリ容量を削減することができる。 Here, the lip sync adjustment is not performed when the data is stored in the video memory 94 and the audio memory 96 but is completed in the video decoder 93. Thereby, the memory capacity for lip sync adjustment at the receiving terminal can be reduced.

次に、受信端末によるフレーム再構築の動作を説明する。図１１は、受信端末によるフレーム再構築のフローチャートである。 Next, the operation of frame reconstruction by the receiving terminal will be described. FIG. 11 is a flowchart of frame reconstruction by the receiving terminal.

先ず受信端末のビデオデコーダ９３は、削除フレームの補完フレーム生成処理を実行する（Ｓ１０１）。 First, the video decoder 93 of the receiving terminal executes the process of generating complementary frames for the deleted frames (S101).

ビデオデコーダ９３は、メタ情報に含まれる削除フレームに関する情報をフレーム制御部９２から受信する。削除フレームに関する情報は、例えば、削除フレーム開始番号４１、削除フレームの期間（フレーム数）４２を含む。そしてビデオデコーダ９３は、削除フレームに関する情報から削除フレーム番号と削除フレームの期間を認識する（Ｓ１０２）。尚、ここで認識するのは削除フレーム番号の代わりに再開始フレーム番号としてもよい。 The video decoder 93 receives information regarding the deleted frame included in the meta information from the frame control unit 92. The information regarding the deleted frame includes, for example, a deleted frame start number 41 and a deleted frame period (number of frames) 42. Then, the video decoder 93 recognizes the deletion frame number and the period of the deletion frame from the information regarding the deletion frame (S102). Note that the restart frame number may be recognized instead of the deletion frame number.

次に、ビデオデコーダ９３は、認識した削除フレーム番号または再開始フレーム番号に基いて、生成した補完フレームにフレーム番号を割り当てる（Ｓ１０３）。このフレーム番号の割り当ては、補完フレームを削除されたフレームの位置に挿入する動作といえる。 Next, the video decoder 93 assigns a frame number to the generated complementary frame based on the recognized deletion frame number or restart frame number (S103). This frame number assignment can be said to be an operation of inserting a complementary frame at the position of the deleted frame.

次に、ビデオデコーダ９３は、所定の期間のフレームを検出したら（Ｓ１０４）、生成した補完フレームと、Ｓ１０３で割り当てられたフレーム番号に対応する音声フレームのタイムスタンプの値を一致させる（Ｓ１０５）。これにより、リップシンクの調整がなされ、映像と音声の同期がとれる。 Next, when the video decoder 93 detects the frames of a predetermined period (S104), it sets the time stamp of each generated complementary frame to the time stamp of the audio frame corresponding to the frame number assigned in S103 (S105). The lip sync is thereby adjusted, and the video and audio are synchronized.

そして、ビデオデコーダ９３は、リップシンク調整が完了した映像データをビデオメモリ９４に出力し、また、音声データをオーディオメモリ９６に出力する。そして映像データと音声データがそれぞれビデオメモリ９４とオーディオメモリ９６に出力されたことが動画の再生アプリケーションに通知される（Ｓ１０６）。このデータを用いて移動端末３５は動画を再生する。尚、図１１のフローは所定期間のフレーム毎に周期的に実行される。 Then, the video decoder 93 outputs the video data for which the lip sync adjustment has been completed to the video memory 94, and outputs the audio data to the audio memory 96. Then, the moving image reproduction application is notified that the video data and the audio data have been output to the video memory 94 and the audio memory 96, respectively (S106). Using this data, the mobile terminal 35 reproduces the moving image. Note that the flow of FIG. 11 is periodically executed for each frame of a predetermined period.
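Steps S102 to S105 can be sketched as below. This is only an illustration under stated assumptions: frames are keyed by frame number, each received frame already carries its own time stamp, the audio time stamps are available per frame number, and a single placeholder image stands in for the generated complementary frames. The function name `insert_complementary_frames` and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def insert_complementary_frames(
    received: Dict[int, Tuple[int, bytes]],   # frame number -> (time stamp, image)
    audio_timestamps: Dict[int, int],         # frame number -> audio time stamp
    start: int,                               # deleted frame start number
    count: int,                               # deleted frame period (number of frames)
    filler: bytes,                            # stand-in for a generated complementary frame
) -> List[Tuple[int, int, bytes]]:
    """Return (frame number, time stamp, image) tuples in playback order."""
    frames = dict(received)
    for number in range(start, start + count):
        # S103: the complementary frame takes the number of a deleted frame.
        # S105: its time stamp is set to that of the matching audio frame.
        frames[number] = (audio_timestamps[number], filler)
    return [(n, ts, img) for n, (ts, img) in sorted(frames.items())]
```

Because the time stamps are aligned at this point, no separate synchronization buffer is needed downstream, which matches the memory saving described above.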

図１２は、受信端末における削除フレームに対する補完フレームの生成処理の動作フローチャートである。図１２においては、説明のためにメタ情報の削除フレーム開始番号４１の値がｎ＋１、削除フレーム期間（フレーム数）４２の値がＴである場合の例を説明する。また、映像フレームにおいて移動量が大きい対象物が存在する場合、その対象物を示す画素の値を、直前の映像フレームから複製するか直後の映像フレームから複製するかを変更させるために用いられる閾値をｘとする。この例では、閾値ｘはフレームの数を表す整数とするが、これに限定されない。 FIG. 12 is an operation flowchart of the process by which the receiving terminal generates complementary frames for deleted frames. For the purpose of explanation, FIG. 12 is described using an example in which the value of the deletion frame start number 41 in the meta information is n + 1 and the value of the deletion frame period (number of frames) 42 is T. In addition, let x be the threshold that determines whether, when an object with a large amount of movement exists in the video frames, the pixel values representing that object are duplicated from the immediately preceding video frame or from the immediately following video frame. In this example, the threshold x is an integer representing a number of frames, but it is not limited to this.

受信端末は、メタデータの情報から、送信端末において削除された映像フレームが存在することを確認する（Ｓ１２１）。すなわち、受信端末はメタデータに含まれる削除フレーム開始番号４１、削除フレーム期間（フレーム数）４２を認識する。 The receiving terminal confirms from the metadata information that there is a video frame deleted at the transmitting terminal (S121). That is, the receiving terminal recognizes the deletion frame start number 41 and the deletion frame period (number of frames) 42 included in the metadata.

そして、受信端末は、削除されていないフレームのうち、削除フレーム開始番号４１の直前のフレーム番号を有する映像フレーム（ｎフレーム）をバッファリングする（Ｓ１２２）。 Then, the receiving terminal buffers the video frame (n frame) having the frame number immediately before the deleted frame start number 41 among the frames not deleted (S122).

次に、受信端末は、削除されていないフレームのうち、削除フレーム開始番号４１の直後のフレーム番号を有するフレーム（（ｎ＋１＋Ｔ）フレーム）をバッファリングする（Ｓ１２３）。 Next, the receiving terminal buffers a frame ((n + 1 + T) frame) having a frame number immediately after the deleted frame start number 41 among frames that have not been deleted (S123).

そして、受信端末はｎフレームと（ｎ＋１＋Ｔ）フレームを用いて、フレームにおいて移動量の大きい対象物を判別する（Ｓ１２４）。 Then, the receiving terminal uses the n frame and the (n + 1 + T) frame to determine an object having a large movement amount in the frame (S124).

次に、補完処理対象のフレームをｎ＋１フレームに設定する（Ｓ１２５）。そして、受信端末は、補完処理対象フレームのフレーム番号がｎ＋ｘ以上か否かを判定する（Ｓ１２６）。補完処理対象フレームのフレーム番号がｎ＋ｘ未満である場合（Ｓ１２６でＮｏ）、補完処理対象フレームは、ｎフレームを複製することにより生成される（Ｓ１２７）。 Next, the frame to be complemented is set to the n + 1 frame (S125). Then, the receiving terminal determines whether or not the frame number of the frame to be complemented is n + x or more (S126). When the frame number of the frame to be complemented is less than n + x (No in S126), the frame to be complemented is generated by duplicating the n frame (S127).

そして、受信端末は補完処理対象のフレーム番号をインクリメントし（Ｓ１２９）、そのフレーム番号がｎ＋Ｔより大きいか否かを判定する（Ｓ１３０）。補完処理対象のフレーム番号がｎ＋Ｔ以下である場合（Ｓ１３０でＮｏ）、処理はＳ１２６に戻る。補完処理対象のフレーム番号がｎ＋Ｔより大きい場合（Ｓ１３０でＹｅｓ）、処理は終了する。 The receiving terminal increments the frame number to be complemented (S129), and determines whether the frame number is greater than n + T (S130). When the complement processing target frame number is n + T or less (No in S130), the process returns to S126. If the frame number to be complemented is greater than n + T (Yes in S130), the process ends.

Ｓ１２６において、補完処理対象フレームのフレーム番号がｎ＋ｘ以上である場合（Ｓ１２６でＹｅｓ）、Ｓ１２４で判定した移動量の大きい対象物を（ｎ＋１＋Ｔ）フレームから複製し、そうでない対象物をｎフレームから複製する（Ｓ１２８）。 In S126, when the frame number of the complement processing target frame is n + x or more (Yes in S126), the object with the large amount of movement determined in S124 is duplicated from the (n + 1 + T) frame, and the other object is duplicated from the n frame. (S128).

そして、補完処理対象のフレーム番号をインクリメントし（Ｓ１２９）、そのフレーム番号がｎ＋Ｔより大きいか否かを判定する（Ｓ１３０）。補完処理対象のフレーム番号がｎ＋Ｔ以下である場合（Ｓ１３０でＮｏ）、処理はＳ１２６に戻る。補完処理対象のフレーム番号がｎ＋Ｔより大きい場合（Ｓ１３０でＹｅｓ）、処理は終了する。 Then, the frame number to be complemented is incremented (S129), and it is determined whether or not the frame number is greater than n + T (S130). When the complement processing target frame number is n + T or less (No in S130), the process returns to S126. If the frame number to be complemented is greater than n + T (Yes in S130), the process ends.
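The generation loop of S122 to S130 can be sketched as follows. The text does not state how an object with a large amount of movement is detected in S124, so the sketch substitutes a simple assumption: a pixel is treated as part of such an object when its value differs between the n frame and the (n + 1 + T) frame by more than a tolerance `tol`. Frames are modeled as lists of rows of grayscale values, and the name `complement_frames` is hypothetical.

```python
from typing import Dict, List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def complement_frames(prev: Frame, nxt: Frame, n: int, T: int, x: int,
                      tol: int = 16) -> Dict[int, Frame]:
    """Generate frames n+1 .. n+T from the n frame (`prev`) and the
    (n+1+T) frame (`nxt`), which were buffered in S122 and S123."""
    # S124 (assumed criterion): mark pixels that changed by more than `tol`.
    moving = [[abs(a - b) > tol for a, b in zip(r0, r1)]
              for r0, r1 in zip(prev, nxt)]
    out: Dict[int, Frame] = {}
    for k in range(n + 1, n + T + 1):          # S125, S129, S130
        if k < n + x:                          # S126: No
            out[k] = [row[:] for row in prev]  # S127: duplicate the n frame
        else:                                  # S126: Yes
            # S128: moving pixels come from the later frame, the rest from the n frame.
            out[k] = [[b if m else a for a, b, m in zip(r0, r1, rm)]
                      for r0, r1, rm in zip(prev, nxt, moving)]
    return out
```

With n = 4, T = 3, and x = 3, frames 5 and 6 are copies of frame 4, while frame 7 takes the moving pixels from frame 8.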

図１３Ａ、図１３Ｂは本実施形態にかかる情報処理システムのシーケンス図である。図１３は、パソコン３３とサーバ３１と移動端末３５との動作関係を表している。 FIGS. 13A and 13B are sequence diagrams of the information processing system according to the present embodiment. FIG. 13 shows the operational relationship among the personal computer 33, the server 31, and the mobile terminal 35.

移動端末３５はパソコン３３に保存された動画データを取得するために、移動端末３５とパソコン３３とを中継するサーバ３１に接続を行う（Ｓ４０１）。接続の確立前に、サーバ３１は接続要求のあった移動端末３５の正当性を確認するために認証を行う。また、サーバ３１は移動端末３５の認証と同時に、移動端末３５における動画再生時の解像度を認識する（Ｓ４０２）。認証が成功した場合、サーバ３１は移動端末３５に機器承認応答を行う（Ｓ４０３）。 The mobile terminal 35 connects to the server 31 that relays between the mobile terminal 35 and the personal computer 33 in order to acquire the moving image data stored in the personal computer 33 (S401). Before the connection is established, the server 31 performs authentication in order to confirm the validity of the mobile terminal 35 that requested the connection. Further, the server 31 recognizes the resolution at the time of moving image reproduction on the mobile terminal 35 simultaneously with the authentication of the mobile terminal 35 (S402). If the authentication is successful, the server 31 sends a device approval response to the mobile terminal 35 (S403).

次に、サーバ３１は移動端末３５から要求のあった動画データが保存されているパソコン３３に接続を行う（Ｓ４０４）。パソコン３３とサーバ３１の接続確立時にも接続の正当性を確認するための認証が行われる。認証が成功した場合、パソコン３３は接続応答をサーバ３１に送信する（Ｓ４０５）。 Next, the server 31 connects to the personal computer 33 storing the moving image data requested by the mobile terminal 35 (S404). Even when the connection between the personal computer 33 and the server 31 is established, authentication for confirming the validity of the connection is performed. If the authentication is successful, the personal computer 33 transmits a connection response to the server 31 (S405).

次に、パソコン３３は提供可能な動画の一覧情報を、サーバ３１を介して移動端末３５に通知する（Ｓ４０６）。提供可能な動画の一覧は、移動端末３５毎に、または、サーバ３１毎に予め設定して、パソコン３３にファイルとして保存されてもよい。また、移動端末３５の解像度の情報をサーバ３１から受信し、その情報に応じて、パソコン３３は、動画データのうち移動端末３５が再生することができる動画を判定してもよい。 Next, the personal computer 33 notifies the mobile terminal 35 of list information of videos that can be provided via the server 31 (S406). A list of videos that can be provided may be preset for each mobile terminal 35 or for each server 31 and stored as a file in the personal computer 33. Further, the resolution information of the mobile terminal 35 may be received from the server 31, and the personal computer 33 may determine a video that the mobile terminal 35 can reproduce from the video data according to the information.

通知された一覧が移動端末３５の画面に表示されると、ユーザは、移動端末３５を操作して、再生する動画を選択する。移動端末３５は、サーバ３１に対して、選択された再生動画の配信要求（以下、再生動画要求と記す）を行う（Ｓ４０７）。 When the notified list is displayed on the screen of the mobile terminal 35, the user operates the mobile terminal 35 to select a moving image to be reproduced. The mobile terminal 35 makes a delivery request for the selected playback video (hereinafter referred to as a playback video request) to the server 31 (S407).

次に、サーバ３１は移動端末３５とサーバ３１間のネットワークの帯域幅を監視するための情報収集作業を行う（Ｓ４０８）。 Next, the server 31 performs information collection work for monitoring the network bandwidth between the mobile terminal 35 and the server 31 (S408).

次に、サーバ３１は移動端末３５から受信した再生動画要求をパソコン３３に対して送信する（Ｓ４０９）。パソコン３３は再生動画要求を受信すると、再生動画要求で指定のあった動画に関するメタ情報をサーバ３１に通知する（Ｓ４１０）。 Next, the server 31 transmits the reproduction moving image request received from the mobile terminal 35 to the personal computer 33 (S409). When receiving the playback video request, the personal computer 33 notifies the server 31 of meta information related to the video specified by the playback video request (S410).

サーバ３１はパソコン３３からメタ情報を受信すると、サーバ３１と移動端末３５間の動画送信の配信品質監視を行う（Ｓ４１１）。具体的には、動画の転送で使用する帯域幅が、サーバ３１と移動端末３５間のネットワークの帯域幅以下となるように、サーバ３１は送信する動画データの映像フレームレートを調整するための調整量を決定する。このときサーバ３１は、動画データの変更情報を移動端末３５が認識できるように、動画データの変更情報を再生情報変更要求として移動端末３５に通知する（Ｓ４１２）。 When the server 31 receives the meta information from the personal computer 33, the server 31 monitors the distribution quality of the moving image transmission between the server 31 and the mobile terminal 35 (S411). Specifically, the server 31 adjusts the video frame rate of the moving image data to be transmitted so that the bandwidth used for moving image transfer is equal to or less than the network bandwidth between the server 31 and the mobile terminal 35. Determine the amount. At this time, the server 31 notifies the mobile terminal 35 of the video data change information as a reproduction information change request so that the mobile terminal 35 can recognize the video data change information (S412).

移動端末３５はサーバ３１から再生情報変更要求を受信すると、配信動画の変更を認識し、再生設定応答をサーバ３１に行う（Ｓ４１３）。 When receiving the reproduction information change request from the server 31, the mobile terminal 35 recognizes the change of the distribution video and sends a reproduction setting response to the server 31 (S413).

次に、サーバ３１はストリーミング開始要求をパソコン３３に通知する（Ｓ４１４）。パソコン３３はサーバ３１からストリーミング開始要求を受信すると、サーバ３１を介して移動端末３５にストリーミング開始応答を行う（Ｓ４１５）。そしてパソコン３３はサーバ３１に対してストリーミングデータを配信する（Ｓ４１６）。 Next, the server 31 notifies the personal computer 33 of a streaming start request (S414). When receiving the streaming start request from the server 31, the personal computer 33 sends a streaming start response to the mobile terminal 35 via the server 31 (S415). The personal computer 33 distributes the streaming data to the server 31 (S416).

サーバ３１はストリーミングデータを受信すると、動画データの音声レベルを認識する（Ｓ４１７）。そして、サーバ３１は、認識した音声レベルに基いて、映像フレームレートが配信品質監視により導かれる送信用フレームレートとなるように、削除対象の映像フレームを判別し、選択する。ここで、サーバ３１は、動画データの解像度をＳ４０２で認識した解像度に変更してもよい。そして、サーバ３１は、選択した削除対象フレームを動画データから削除する（Ｓ４１８）。 When receiving the streaming data, the server 31 recognizes the audio level of the moving image data (S417). Then, the server 31 determines and selects the video frame to be deleted so that the video frame rate becomes the transmission frame rate derived by the distribution quality monitoring based on the recognized audio level. Here, the server 31 may change the resolution of the moving image data to the resolution recognized in S402. Then, the server 31 deletes the selected deletion target frame from the moving image data (S418).

次に、サーバ３１は削除した映像フレームの情報をメタ情報に追記する（Ｓ４１９）。そして、サーバ３１は、削除処理を行った動画データを送付用の形式にエンコードして（Ｓ４２０）、エンコードしたデータを移動端末３５にストリーミング配信する（Ｓ４２１）。 Next, the server 31 adds the deleted video frame information to the meta information (S419). Then, the server 31 encodes the deleted moving image data into a format for sending (S420), and streams the encoded data to the mobile terminal 35 (S421).

移動端末３５はストリーミングデータを受信すると、受信したデータをデコードする（Ｓ４２２）。そして、移動端末３５は、Ｓ４１８で削除された映像フレームに対応する補完フレームを生成し動画データに挿入することにより、動画データを復元する（Ｓ４２３）。そして、移動端末３５は、復元した動画データを再生する（Ｓ４２４）。 When receiving the streaming data, the mobile terminal 35 decodes the received data (S422). Then, the mobile terminal 35 restores the moving image data by generating a complementary frame corresponding to the video frame deleted in S418 and inserting it into the moving image data (S423). Then, the mobile terminal 35 reproduces the restored moving image data (S424).

そして、所定のデータ量ごとにＳ４０８〜Ｓ４２４の動作が繰り返され、パソコン３３による動画データのストリーミング配信と移動端末３５による再生が行われる。尚、この所定データ量ごとのＳ４０８〜Ｓ４２４の動作は、データ量ごとに平行して行われてもよい。 Then, the operations of S408 to S424 are repeated for each predetermined amount of data, and streaming distribution of moving image data by the personal computer 33 and reproduction by the mobile terminal 35 are performed. Note that the operations in S408 to S424 for each predetermined data amount may be performed in parallel for each data amount.

次に、移動端末３５とサーバ３１、サーバ３１とパソコン３３の間の接続動作（Ｓ４０１、Ｓ４０４）について詳細に説明する。 Next, the connection operation (S401, S404) between the mobile terminal 35 and the server 31, and between the server 31 and the personal computer 33 will be described in detail.

移動端末３５には、動画を再生するためのアプリケーションプログラム（以下、アプリケーションと称する）がインストールされており、アプリケーションにより目的とするサーバ３１に接続する。目的とするサーバ３１の指定はユーザが選択できる構成としてもよいし、予めアプリケーションに設定されている構成としてもよい。移動端末３５は例えば３Ｇ（3rd Generation）回線のように接続するときにその回線を指定してインターネットに接続することができる。また、移動端末３５とパソコン３３はインターネットを経由してのＰ２Ｐ（Peer to Peer）接続を可能とする。 An application program for reproducing moving images (hereinafter referred to as an application) is installed in the mobile terminal 35, and the application connects to the target server 31. The target server 31 may be selectable by the user or may be set in the application in advance. When connecting, the mobile terminal 35 can specify a line, for example a 3G (3rd Generation) line, and connect to the Internet over that line. In addition, the mobile terminal 35 and the personal computer 33 can establish a P2P (Peer to Peer) connection via the Internet.

サーバ３１と移動端末３５間、サーバ３１とパソコン３３間の接続確立時には認証が行われるが、そこで用いられる認証情報は、例えば、各機器の、ＩＰアドレス、ＭＡＣアドレスなどの固有機器情報などである。認証情報はサーバ３１上で管理され、ＩＰアドレスや固有機器情報を、移動端末３５と移動端末３５が接続可能なパソコン３３とを対応付けて、移動端末３５毎にグループ化して格納される。尚、１つの移動端末３５に対して１つのパソコン３３を対応付けてもよいし、複数と対応付けてもよい。 Authentication is performed when a connection is established between the server 31 and the mobile terminal 35 and between the server 31 and the personal computer 33. The authentication information used is, for example, the IP address of each device or unique device information such as its MAC address. The authentication information is managed on the server 31: the IP addresses and unique device information are stored grouped for each mobile terminal 35, with the mobile terminal 35 associated with the personal computers 33 to which it can connect. One mobile terminal 35 may be associated with one personal computer 33 or with a plurality of them.

移動端末３５とサーバ３１の接続確立時の動作について説明する。サーバ３１は、移動端末３５から接続要求を受信すると、接続要求のあった移動端末３５のＩＰアドレスまたは固有機器情報と、サーバ３１に保存されているＩＰアドレスまたは固有機器情報と、を照合する。照合の結果が一致すれば、サーバ３１は認証が成功したとして接続を確立する。尚、認証には種々の認証技術が用いられてもよく、パスワード認証方式や電子証明書認証方式等を用いてもよい。 An operation when establishing a connection between the mobile terminal 35 and the server 31 will be described. When receiving the connection request from the mobile terminal 35, the server 31 collates the IP address or unique device information of the mobile terminal 35 that has made the connection request with the IP address or unique device information stored in the server 31. If the collation results match, the server 31 establishes a connection assuming that the authentication is successful. Various authentication techniques may be used for authentication, and a password authentication method, an electronic certificate authentication method, or the like may be used.

次に、サーバ３１とパソコン３３の接続確立時の動作について説明する。サーバ３１は移動端末３５との接続を確立した後、その移動端末３５に対応するパソコン３３に接続を行う。具体的には、サーバ３１に保存されている認証情報において１つの移動端末３５に対して１つのパソコン３３が対応付けられている場合、サーバ３１は認証情報を確認して移動端末３５に対応するパソコン３３に接続を行う。もしくは、移動端末３５がサーバ３１との接続を確立する際に、移動端末３５が接続先のパソコン３３を指定し、サーバ３１は指定されたパソコン３３に接続する構成にしてもよい。 Next, the operation when establishing a connection between the server 31 and the personal computer 33 will be described. After establishing the connection with the mobile terminal 35, the server 31 connects to the personal computer 33 corresponding to that mobile terminal 35. Specifically, when one personal computer 33 is associated with one mobile terminal 35 in the authentication information stored in the server 31, the server 31 checks the authentication information and connects to the personal computer 33 corresponding to the mobile terminal 35. Alternatively, when the mobile terminal 35 establishes a connection with the server 31, the mobile terminal 35 may designate the personal computer 33 to connect to, and the server 31 may connect to the designated personal computer 33.
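The lookup performed by the server 31 during connection establishment can be illustrated with a small sketch. The record layout `TerminalEntry`, the functions `authenticate` and `resolve_pc`, and the example addresses are all hypothetical; the sketch only shows the matching of a stored IP address or unique device information and the selection of the associated personal computer described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TerminalEntry:
    ip_address: str
    device_id: str                      # unique device information, e.g. a MAC address
    paired_pcs: List[str] = field(default_factory=list)


def authenticate(table: List[TerminalEntry], ip_address: str,
                 device_id: str) -> Optional[TerminalEntry]:
    """Return the stored entry that matches the requesting terminal, if any."""
    for entry in table:
        # A match on either the IP address or the unique device information succeeds.
        if ip_address == entry.ip_address or device_id == entry.device_id:
            return entry
    return None


def resolve_pc(entry: TerminalEntry, requested: Optional[str] = None) -> Optional[str]:
    """Pick the PC to connect to: the designated one, or the single paired one."""
    if requested is not None:
        return requested if requested in entry.paired_pcs else None
    return entry.paired_pcs[0] if len(entry.paired_pcs) == 1 else None
```

When a terminal is paired with several personal computers and designates none, this sketch returns `None`; how that case is handled is not specified in the text.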

サーバ３１からの接続要求を受けたパソコン３３は、アクセス要求のあったサーバ３１が正当なものか否かの認証を行う。認証にはサーバ３１が移動端末３５に対して行う認証と同様、種々の認証技術が用いられる。または、パソコン３３が保有する動画ファイルやパソコン３３のディレクトリ（フォルダ）にアクセス権限を付与してアクセス制御を行ってもよい。 The personal computer 33 that has received the connection request from the server 31 authenticates whether or not the server 31 that has requested access is legitimate. Various authentication techniques are used for the authentication, as in the authentication performed by the server 31 for the mobile terminal 35. Alternatively, access control may be performed by granting access authority to a moving image file held by the personal computer 33 or a directory (folder) of the personal computer 33.

尚、パソコン３３とサーバ３１、サーバ３１と移動端末３５間は、ＶＰＮ（Virtual Private Network）等の高セキュリティのネットワーク技術を用いて接続されてもよい。さらに、パソコン３３とサーバ３１、サーバ３１と移動端末３５間のデータ伝送において、伝送されるデータは種々の暗号化技術により暗号化されてもよい。また、サーバ３１とパソコン３３は同一イントラネット内に配置されてもよい。 The personal computer 33 and the server 31, and the server 31 and the mobile terminal 35 may be connected using a high security network technology such as VPN (Virtual Private Network). Furthermore, in data transmission between the personal computer 33 and the server 31, and between the server 31 and the mobile terminal 35, the transmitted data may be encrypted by various encryption techniques. The server 31 and the personal computer 33 may be arranged in the same intranet.

次に、サーバ３１によるネットワークの帯域幅を監視するための情報収集作業（Ｓ４０８）について詳細に説明する。 Next, the information collection work (S408) for monitoring the network bandwidth by the server 31 will be described in detail.

サーバ３１は、移動端末３５とサーバ３１間の帯域幅を監視する。サーバ３１は移動端末３５に接続されるネットワークの帯域幅を検出するために、移動端末３５に対して帯域検出用のパケットを送信する。送信パケットにはサーバ３１が送信した時刻が記録されている。移動端末３５はパケットを受信した際に受信時刻を計測し、これをパケットに記録された送信時刻と比較する。それにより移動端末３５は一定量のパケットがサーバ３１から移動端末３５まで到達するまでにかかる時間を算出することができる。移動端末３５はここで得られた帯域監視情報をサーバ３１に転送する。尚、サーバ３１は、一定量のデータを複数のパケットに分割して、最初のパケットをサーバ３１が送信してから最後のパケットを移動端末３５が受信するのに要した時間を計測することにより、帯域監視情報を取得してもよい。 The server 31 monitors the bandwidth between the mobile terminal 35 and the server 31. To detect the bandwidth of the network connected to the mobile terminal 35, the server 31 transmits a bandwidth detection packet to the mobile terminal 35. The time at which the server 31 sent the packet is recorded in the packet. When the mobile terminal 35 receives the packet, it measures the reception time and compares it with the transmission time recorded in the packet. The mobile terminal 35 can thereby calculate the time required for a certain amount of packets to travel from the server 31 to the mobile terminal 35. The mobile terminal 35 transfers the bandwidth monitoring information obtained in this way to the server 31. Alternatively, the server 31 may acquire the bandwidth monitoring information by dividing a fixed amount of data into a plurality of packets and measuring the time from when the server 31 transmits the first packet until the mobile terminal 35 receives the last packet.
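The timing comparison can be turned into a throughput figure with a one-line calculation. The sketch below assumes that the clocks of the server 31 and the mobile terminal 35 are synchronized closely enough for the two time stamps to be comparable, and that times are given in seconds; the function name `estimate_bandwidth_bps` is hypothetical.

```python
def estimate_bandwidth_bps(payload_bytes: int, sent_at: float, received_at: float) -> float:
    """Estimate throughput in bits per second from one timed transfer.

    `sent_at` is the transmission time recorded in the (first) packet and
    `received_at` is the time the (last) packet arrived, both in seconds.
    """
    elapsed = received_at - sent_at
    if elapsed <= 0:
        # Unsynchronized clocks can make the difference meaningless.
        raise ValueError("reception time must be later than transmission time")
    return payload_bytes * 8 / elapsed
```

For example, 125,000 bytes delivered in 0.5 seconds gives an estimate of 2,000,000 bits per second. The same function covers the multi-packet variant when `payload_bytes` is the total amount of data split across the packets.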

尚、帯域の監視のためのパケットの送信は、実動画データの再生に問題がない範囲において行う。ここで、帯域の監視は帯域を監視するためのコマンドを送付することにより行ってもよいし、ｐｉｎｇ等のコマンドを送付し、その応答時間を計測することによって行ってもよい。また、パソコン３３がサーバ３１を介して移動端末３５にパケットを送付することで、一度にパソコン３３とサーバ３１間とサーバ３１と移動端末３５間の帯域幅の監視を行ってもよい。尚、帯域幅の監視のみではなく、伝送遅延から回線のトラフィックの監視を行ってもよい。 Note that transmission of packets for bandwidth monitoring is performed within a range where there is no problem in reproduction of actual moving image data. Here, the bandwidth may be monitored by sending a command for monitoring the bandwidth, or by sending a command such as ping and measuring the response time. Alternatively, the bandwidth between the personal computer 33 and the server 31 and between the server 31 and the mobile terminal 35 may be monitored at a time by sending a packet from the personal computer 33 to the mobile terminal 35 via the server 31. Note that not only bandwidth monitoring but also line traffic monitoring may be performed based on transmission delay.

一方、インターネット接続する回線は種類毎に１秒間にデータの送信（または受信）を安定的に可能とする情報量が規定されている。サーバ３１はこの規定の情報量と帯域監視情報とによって、単位時間当たりに安定的に送受信可能なデータ量（使用量）を判定する。 On the other hand, the amount of information that enables stable transmission (or reception) of data per second for each type of line connected to the Internet is defined. The server 31 determines the data amount (usage amount) that can be stably transmitted / received per unit time based on the prescribed information amount and the bandwidth monitoring information.
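The text does not give the rule by which the server 31 combines the prescribed amount for the line type with the bandwidth monitoring information, nor how the result maps to a transmission frame rate. One plausible reading, shown here purely as an assumption, is to take the smaller of the two figures as the stable throughput and then fit as many video frames per second into it as the remaining budget allows. The function names, the average bits per frame, and the fixed audio bit rate are all hypothetical parameters.

```python
import math


def stable_throughput_bps(rated_bps: float, measured_bps: float) -> float:
    """Assumed rule: the usable amount is limited by both the line type and the measurement."""
    return min(rated_bps, measured_bps)


def transmission_frame_rate(throughput_bps: float, bits_per_frame: float,
                            audio_bps: float, source_fps: int) -> int:
    """Largest whole frame rate whose video bit rate fits beside the audio stream."""
    video_budget = max(throughput_bps - audio_bps, 0)
    return min(source_fps, math.floor(video_budget / bits_per_frame))
```

With a 2 Mbit/s stable throughput, 128 kbit/s audio, and an average of 100 kbit per video frame, a 30 fps source would be reduced to 18 fps under these assumptions.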

ここで、サーバ３１は、帯域監視動作後（Ｓ４０８後）、監視の結果を移動端末３５に送信し、ユーザが移動端末３５により帯域監視の結果を確認して再生する動画の解像度を指定してもよい。その場合、指定した解像度情報は配信品質監視において用いられる。 Here, after the bandwidth monitoring operation (after S408), the server 31 may transmit the monitoring result to the mobile terminal 35, and the user may check the result on the mobile terminal 35 and designate the resolution of the moving image to be reproduced. In that case, the designated resolution information is used in the distribution quality monitoring.

図１４は、本実施形態におけるサーバ３１の構成の一例を示す。サーバ３１は、デコード処理部１３１、演算処理部１３２、ストレージ１３３、コンテンツサーバ１３４、ストリーミングサーバ１３５を含む。 FIG. 14 shows an example of the configuration of the server 31 in the present embodiment. The server 31 includes a decoding processing unit 131, an arithmetic processing unit 132, a storage 133, a content server 134, and a streaming server 135.

デコード処理部１３１は、パソコン３３等の端末機器からアップロードされる動画データをデコードする。ここで、アップロードされる動画データは送付用の形式に分割または統合され、もしくは圧縮されているため、デコード処理を行い、動画データを修復する。 The decode processing unit 131 decodes moving image data uploaded from a terminal device such as the personal computer 33. Here, since the moving image data to be uploaded is divided, integrated, or compressed into a sending format, decoding processing is performed to restore the moving image data.

演算処理部１３２は、帯域幅の監視及び配信品質管理を行い、その結果に応じて動画データの映像フレームの削除処理及びメタ情報の変更処理を行う。 The arithmetic processing unit 132 performs bandwidth monitoring and distribution quality management, and performs a video frame deletion process and a meta information change process according to the result.

ストレージ１３３にはオペレーティングシステム、ミドルウェア、アプリケーションが格納されており、演算処理部１３２によりメモリに読み出され、実行される。 The storage 133 stores an operating system, middleware, and applications, which are read out to the memory by the arithmetic processing unit 132 and executed.

コンテンツサーバ１３４は、ストリーミング再生を行うために準備されているコンテンツが管理されており、移動端末３５はここで管理されているコンテンツから再生する動画を選択できる。 The content server 134 manages content prepared for streaming playback, and the mobile terminal 35 can select a moving image to be played back from the managed content.

ストリーミングサーバ１３５は、動画データを移動端末３５に配信する。動画データは演算処理部１３２から映像フレームが削除された動画データを受信する。また、ストリーミングサーバ１３５は、配信で使用する、例えばＨＴＴＰ（HyperText Transfer Protocol）、ＨＴＴＰ／ＲＴＭＰなどのプロトコルに応じて分けられる。 The streaming server 135 distributes moving image data to the mobile terminal 35. It receives from the arithmetic processing unit 132 the moving image data from which video frames have been deleted. The streaming server 135 may also be divided according to the protocol used for distribution, for example HTTP (HyperText Transfer Protocol) or HTTP/RTMP.

図１５は、本実施形態に係るサーバ３１またはパソコン３３のハードウェア構成の一例を示す。サーバ３１またはパソコン３３は、ＣＰＵ（Central Processing Unit）１６１、ＳＤＲＡＭ（Synchronous Dynamic Random Access Memory）１６２、シリアルポート１６３、フラッシュメモリ１６４、デジタルＩ／Ｏ、アナログＩ／Ｏ１６５を含む。また、サーバ３１またはパソコン３３はストレージ１６６、チップセット１６７、通信カード１６８、ＣＦ（Compact Flash（登録商標））インターフェースカード１６９、リアルタイムクロック１７０を含む。 FIG. 15 shows an example of the hardware configuration of the server 31 or the personal computer 33 according to this embodiment. The server 31 or the personal computer 33 includes a CPU (Central Processing Unit) 161, an SDRAM (Synchronous Dynamic Random Access Memory) 162, a serial port 163, a flash memory 164, a digital I / O, and an analog I / O 165. The server 31 or the personal computer 33 includes a storage 166, a chip set 167, a communication card 168, a CF (Compact Flash (registered trademark)) interface card 169, and a real time clock 170.

The CPU 161 uses the SDRAM 162 or the flash memory 164 to execute a program that is stored in the storage 166 and describes the procedures of the flowcharts described above. The CPU 161 also exchanges data with the communication card 168, the CF interface card 169, and the real-time clock 170 via the chipset 167. The server 31 or the personal computer 33 inputs and outputs moving image data through the serial port 163 or the digital I/O and analog I/O 165 via the communication card. The CPU 161 provides some or all of the functions of the monitoring unit 3, the deletion unit 4, the frame information generation unit 5, the transmission unit 6, the reception unit 8, the complementary image generation unit 9, and the video data generation unit 10. The CPU 161 also encodes and decodes moving image data, performs lip sync, performs the authentication operation for establishing a connection, and plays back moving image data. The storage 166 provides some or all of the functions of the storage unit 2. The CPU 161 can use the SDRAM 162 as a temporary data storage area (working buffer) for the frame deletion processing and the restoration processing of moving image data. The SDRAM 162 is only an example, and various other types of RAM (Random Access Memory) may be used.

The flash memory 164 stores the kernel, the applications of the server 31, configuration files, and the like. The flash memory 164 has an extension area, which can also be used as a temporary data storage area (working buffer) for the frame deletion processing and the restoration processing of moving image data.

The CF interface card 169 is used as an auxiliary function for maintenance and similar work on the server 31. Because it has built-in storage, it is used for data processing between many personal computers 33 and mobile terminals 35.

The real-time clock 170 is a dedicated chip that serves as the computer's clock. The time stamp of each frame is set according to the real-time clock 170.

Part of the first information processing apparatus 1 and the second information processing apparatus 7 of the embodiment may be implemented in hardware. Alternatively, the first information processing apparatus 1 and the second information processing apparatus 7 of the embodiment may be implemented by a combination of software and hardware.

FIG. 16 shows an example of the hardware configuration of the mobile terminal 35 according to the present embodiment. As shown in FIG. 16, the mobile terminal 35 includes a CPU 201, a memory 202, a storage unit 203, a reading unit 204, a communication interface 206, an input/output unit 207, and a display unit 208. The CPU 201, the memory 202, the storage unit 203, the reading unit 204, the communication interface 206, the input/output unit 207, and the display unit 208 are connected to one another via, for example, a bus 209.

The CPU 201 uses the memory 202 to execute a program that describes the procedures of the flowcharts described above. The CPU 201 provides some or all of the functions of the reception unit 8, the complementary image generation unit 9, and the video data generation unit 10. The CPU 201 also decodes moving image data, performs lip sync, and plays back moving image data. The memory 202 is, for example, a semiconductor memory and includes a RAM area and a ROM area. The storage unit 203 is, for example, a hard disk, but may instead be a semiconductor memory such as a flash memory. The application for playing back moving image data is stored in the memory 202 or the storage unit 203 and is executed by the CPU 201. The mobile terminal 35 may also be configured without the storage unit 203.
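The restoration handled by the complementary image generation and video data generation functions can be sketched as follows. The sketch uses the simplest complementing method named in the claims, duplicating the frame immediately before a deleted frame. Representing the frame information as a list of deleted frame indices plus the original frame count, and the name `restore_frames`, are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def restore_frames(
    received: Dict[int, Sequence[int]],   # original index -> frame pixels
    deleted_indices: Sequence[int],       # assumed form of the frame information
    total: int,                           # frame count at the first frame rate
) -> List[Sequence[int]]:
    """Rebuild a sequence of `total` frames from the thinned video data.

    Each deleted position is filled with a copy of the frame immediately
    before it. For a run of deleted frames, the copy is taken from the
    already restored predecessor, so the last received frame is repeated.
    """
    deleted = set(deleted_indices)
    restored: List[Sequence[int]] = []
    for i in range(total):
        if i in received:
            restored.append(received[i])
        elif i in deleted and restored:
            # Complementary image: duplicate the immediately preceding frame.
            restored.append(list(restored[-1]))
        else:
            raise ValueError(f"frame {i} is neither received nor recoverable")
    return restored
```

Because the restored sequence has the original number of frames, each frame keeps its original position, which is what allows the audio data to stay aligned with the video through the synchronization information.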

The reading unit 204 accesses the removable recording medium 205 according to instructions from the CPU 201. The removable recording medium 205 is implemented by, for example, a semiconductor device (such as a USB memory), a medium to and from which information is input and output by magnetic action (such as a magnetic disk), or a medium to and from which information is input and output by optical action (such as a CD-ROM or DVD).

The communication interface 206 transmits and receives data via the network according to instructions from the CPU 201. The communication interface 206 also receives the moving image data. The input/output unit 207 corresponds to, for example, a device that accepts instructions from the user. Using the input/output unit 207, the user can specify the moving image data to be played back and the resolution of the moving image data. The display unit 208 displays the played-back moving image data.

The information processing program for realizing the embodiment is provided to the mobile terminal 35 in, for example, one of the following forms.
(1) Installed in advance in the storage unit 203.
(2) Provided on the removable recording medium 205.
(3) Provided via a network.

(Modification)
FIG. 17 shows an example of the configuration of the information processing system in a modification of the present embodiment. The modification differs from the embodiment in that the receiving terminal is a terminal device 37 such as the personal computer 33. Even between personal computers 33, image quality degradation may be observed depending on the bandwidth of the Internet, and in such a case the configuration described in the embodiment is applied. The modification is also an example in which the personal computer 33 performs the frame deletion processing.

The network bandwidth monitoring, distribution quality monitoring, and frame deletion processing performed by the server 31 in the present embodiment, and the processing equivalent to them, may instead be performed by the personal computer 33. Network bandwidth monitoring and distribution quality monitoring may also be performed between the server 31 and the personal computer 33. In the present embodiment the moving image data is stored in the personal computer 33, but it may instead be stored in the server 31 and provided from the server 31 to the mobile terminal 35. The mobile terminal 35 may be a thin client terminal. The decoding in the present embodiment may be performed in accordance with a standard such as MPEG2.

In the present embodiment, moving images are distributed from the server 31 to the mobile terminal 35 in a streaming format, but the distribution method is not limited to streaming.

The present embodiment is not limited to the embodiments described above, and various configurations or embodiments can be adopted without departing from the gist of the present embodiment.

DESCRIPTION OF SYMBOLS
1 First information processing apparatus
2 Storage unit
3 Monitoring unit
4 Deletion unit
5 Frame information generation unit
6 Transmission unit
7 Second information processing apparatus
8 Reception unit
9 Complementary image generation unit
10 Video data generation unit
11 Communication network

Claims

An information processing apparatus comprising:
a storage unit that stores first video data and audio data including synchronization information associated with the first video data;
a monitoring unit that monitors the state of a communication network;
a deletion unit that, in accordance with the result of the monitoring, deletes any one of consecutive video frames of the first video data for which the similarity between the consecutive video frames is equal to or higher than a predetermined threshold and the audio level of the corresponding audio data is equal to or lower than a predetermined threshold, thereby generating, from the first video data having a first frame rate indicating a first number of frames per unit time, second video data having a second frame rate lower than the first frame rate;
a frame information generation unit that generates frame information about the deleted frame; and
a transmission unit that transmits the second video data and the frame information.

The information processing apparatus according to claim 1, wherein the transmission unit further transmits the audio data.

An information processing apparatus comprising:
a receiving unit that receives second video data, frame information about a deleted frame, and audio data including synchronization information associated with first video data, the second video data having a second frame rate lower than a first frame rate indicating a first number of frames per unit time of the first video data and being obtained by deleting any one of consecutive video frames of the first video data for which the similarity between the consecutive video frames is equal to or higher than a predetermined threshold and the audio level of the audio data is equal to or lower than a predetermined threshold;
a complementary image generation unit that generates, using the frame information, a complementary image that complements the image of the deleted frame; and
a video data generation unit that generates, using the complementary image, the second video data, and the synchronization information, video data of the first frame rate in which the complementary image corresponding to the deleted frame is synchronized with the audio data corresponding to the deleted frame.

The information processing apparatus according to claim 3, wherein the video data generation unit inserts the complementary image at the position of the deleted frame in the second video data using the frame information, thereby generating the video data of the first frame rate.

The information processing apparatus according to claim 3, wherein the complementary image generation unit generates the complementary image by duplicating a frame immediately before the deleted frame.

A video data transmission/reception method, wherein
a first information processing apparatus executes a process of:
monitoring the state of a communication network;
generating, in accordance with the result of the monitoring, second video data having a second frame rate lower than a first frame rate indicating a first number of frames per unit time of first video data, by deleting any one of consecutive video frames of the first video data for which the similarity between the consecutive video frames is equal to or higher than a predetermined threshold and the audio level of the corresponding audio data is equal to or lower than a predetermined threshold;
generating frame information about the deleted frame; and
transmitting the second video data, the frame information, and audio data including synchronization information associated with the first video data; and
a second information processing apparatus executes a process of:
receiving the transmitted second video data, the frame information, and the audio data including the synchronization information;
generating, using the frame information, a complementary image that complements the image of the deleted frame; and
generating, using the complementary image, the second video data, and the synchronization information, video data of the first frame rate in which the complementary image corresponding to the deleted frame is synchronized with the audio data corresponding to the deleted frame.

An information processing apparatus comprising:
a receiving unit that receives second video data and frame information about a deleted frame, the second video data having a second frame rate lower than a first frame rate indicating a first number of frames per unit time of first video data and being obtained by deleting, from the first video data, consecutive video frames for which the audio level of the corresponding audio data is equal to or lower than a predetermined threshold;
a complementary image generation unit that generates, using the frame information, a complementary image that complements the image of the deleted frame, wherein the complementary image generation unit determines, using the frames before and after the deleted frame, an object included in the deleted frame whose amount of movement is equal to or greater than a predetermined threshold, and generates the complementary image by duplicating, for the area in which the object is displayed, the area showing the object in the undeleted frame immediately after the deleted frame; and
a video data generation unit that generates video data of the first frame rate using the complementary image and the second video data.

A video data transmission/reception method, wherein
a first information processing apparatus executes a process of:
monitoring the state of a communication network;
generating, in accordance with the result of the monitoring, second video data having a second frame rate lower than a first frame rate indicating a first number of frames per unit time of first video data, by deleting, from the first video data, consecutive video frames for which the audio level of the corresponding audio data is equal to or lower than a predetermined threshold;
generating frame information about the deleted frame; and
transmitting the second video data and the frame information; and
a second information processing apparatus executes a process of:
receiving the transmitted second video data and the frame information;
generating, using the frame information, a complementary image that complements the image of the deleted frame; and
generating video data of the first frame rate using the complementary image and the second video data,
wherein, in the generating of the complementary image, an object included in the deleted frame whose amount of movement is equal to or greater than a predetermined threshold is determined using the frames before and after the deleted frame, and the complementary image is generated by duplicating, for the area in which the object is displayed, the area showing the object in the undeleted frame immediately after the deleted frame.