JP5596473B2

JP5596473B2 - Video content playback apparatus, control method therefor, program, and recording medium

Info

Publication number: JP5596473B2
Application number: JP2010199227A
Authority: JP
Inventors: 一人大原; 禎三橋
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2010-09-06
Filing date: 2010-09-06
Publication date: 2014-09-24
Anticipated expiration: 2030-09-06
Also published as: JP2012060256A

Description

The present invention relates to a video content reproduction apparatus that enables a user to select desired video content from among a plurality of video contents by means of thumbnail display, and to a control method, a program, and a recording medium for the apparatus.

Conventionally, a recorder or similar apparatus that records video content on a recording medium displays a list of many video contents by showing, for each one, a reduced-size single frame of its moving image as a representative still image. Such a representative image is generally called a thumbnail. When a plurality of video contents are recorded, a plurality of thumbnails are listed on one screen so that the user can select the desired video content from the list.

Meanwhile, with the growing capacity of recording media and faster communication lines in recent years, the amount of video content recorded on recording media or accessible over a network is steadily increasing. When the number of video contents grows this large, still-image thumbnails are not sufficient for reaching the desired content. This is because, with many video contents recorded, the user does not necessarily know the details of each one before viewing, and often has no more than a vague impression of a given content. Examples include viewing video content that was recorded automatically according to the user's preferences without the user setting up each recording, and viewing video content created by individuals. To search efficiently for the target video content in such cases, it is effective to search while watching, in the list display, moving images that succinctly represent each content, that is, moving image thumbnails.

FIG. 10 shows an example of a list display using moving image thumbnails. Because such a display of a plurality of video contents is for searching, the audio of every video content need not be output; outputting the audio of the selected video content, however, improves user convenience. In that case, it is desirable to output the audio of the selected video content in synchronization with the moving image immediately after the user's operation. Such synchronization of moving image and audio (AV synchronization) is generally achieved by adjusting the processing timing of the two based on the time information added to the moving image data and to the audio data. Since the two only need to be synchronized at the time of output, synchronization may be established at any point in the processing pipeline. For example, Patent Document 1 achieves AV synchronization before the data is input to the decoders by switching, when the video content stream is read from the recording medium, between a method that outputs the stream at the same timing as it was input (a time-stamp-based reading method) and a reading method that adjusts the input according to the amount of data in the buffer.

Patent Document 1: JP 2006-115245 A

The time-stamp-based reading method requires random access to the recording medium based on the addresses at which the data is recorded. It is therefore easy to apply when data is read from a recording medium, but difficult to apply when a video stream on a network is played back while being received, because a mechanism for time-stamp-based reading must be built into not only the receiving terminal but also the sending server. Widely deployed servers use HTTP, for example, in which case the playback terminal requests the data and then receives whatever data the server sends. Unless a suitable protocol is implemented on the server side, the terminal can only read while monitoring the amount of data in the buffer, and it is difficult to align the time information of the moving image and the audio before decoding. In practice, therefore, AV synchronization has to be performed not when the video content data is read but at some point before output.

In a list display of moving image thumbnails, when the processing load for the plurality of video contents is large relative to the allocated resources, playback at the display rate defined by the content (the frame rate of the moving image) is difficult, and slow playback or skip playback may have to be performed instead. In skip playback in particular, the interval between moving image frames tends to widen. For example, as shown in FIG. 11, suppose that at the time of the user's operation (time tc seconds) there is a frame f1 being displayed, a frame f2 being decoded, and a frame f3 being demultiplexed from the video content stream, and that the time information of successive frames is 2 seconds apart. The display time of f3 is then tc + 4. The buffer used for reading the video content stream holds data around frame f3, so the audio data that can be read at time tc is from around tc + 4 seconds. Even if decoding of the audio data finishes early, the time from the user's operation until audio is output becomes long.
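The arithmetic in the FIG. 11 example can be sketched as follows. This is only an illustrative calculation and not part of the described apparatus; the function name `earliest_readable_audio_time` and the parameter `stages_ahead` (how many frames the stream reader is ahead of the displayed frame) are hypothetical.

```python
def earliest_readable_audio_time(tc: float, frame_interval: float, stages_ahead: int) -> float:
    """Approximate time stamp of the audio data that can be read at time tc.

    Hypothetical model: the stream buffer is positioned `stages_ahead` frames
    beyond the frame currently displayed, and successive frames are
    `frame_interval` seconds apart.
    """
    return tc + frame_interval * stages_ahead


# f1 is displayed at tc, f2 is being decoded, and f3 is being demultiplexed,
# so the reader is two frames ahead of the display.
tc = 10.0
audio_time = earliest_readable_audio_time(tc, frame_interval=2.0, stages_ahead=2)
gap = audio_time - tc  # seconds of content between the operation and the first readable audio
print(gap)
```

The wider the frame interval becomes in skip playback, the larger this gap grows, which is the delay the invention sets out to shorten.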

The present invention has been made in view of these circumstances. It provides a video content reproduction apparatus, and a control method, program, and recording medium therefor, that quickly synchronize the moving image and audio and start audio output after a user operation, regardless of how the video content is accessed (access to a recording device or network access) and how the moving image thumbnails are played back (normal playback, or low-load skip or slow playback), thereby improving responsiveness to the operation and user convenience.

The video content reproduction apparatus of the present invention includes: a data acquisition unit that receives input of a video content stream; a stream processing unit that acquires a moving image stream and an audio stream from the video content stream; a moving image decoding unit that decodes the moving image stream to obtain decoded moving image frames; an image output unit that executes a first operation of adjusting the output of the decoded moving image frames based on time information and a second operation of outputting the decoded moving image frames irrespective of the time information; an audio decoding unit that decodes the audio stream to obtain decoded audio data; and an audio output unit that outputs the decoded audio data. The stream processing unit acquires the audio stream from the video content stream in response to a user operation, and the image output unit switches its operation from the first operation to the second operation when the decoded audio data is input to the audio output unit.

The image output unit may switch its operation from the first operation to the second operation when the audio output unit starts outputting the decoded audio data.

The video output unit may store the time information of the decoded moving image frame in the second operation, and the audio output unit may determine when to start outputting the decoded audio data by referring to that time information.

The audio output unit may calculate the difference between the output start time of the decoded audio data and the time information of the decoded moving image frame, and the image output unit may correct the time information of the decoded moving image frames by that difference in the first operation.
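As a minimal sketch of this correction, assume that the difference is taken as the audio output start time minus the frame's time information and that it is added to the time information of each subsequent frame. The text does not fix the sign convention, so both choices are assumptions, and the function names below are hypothetical.

```python
def av_time_difference(audio_start_time: float, video_frame_time: float) -> float:
    """Difference between the audio output start time and the frame's time information.

    Assumed sign convention: positive when the audio starts later than the frame.
    """
    return audio_start_time - video_frame_time


def corrected_frame_time(frame_time: float, difference: float) -> float:
    """Time information of a frame after correction by the difference (first operation)."""
    return frame_time + difference


# Audio output starts at 14.0 s while the stored frame time information is 12.0 s.
difference = av_time_difference(14.0, 12.0)
later_frames = [12.0, 12.5, 13.0]
corrected = [corrected_frame_time(t, difference) for t in later_frames]
```

Under these assumptions every later frame is shifted by the same amount, so the spacing between frames is preserved while the video timeline is aligned with the audio.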

The switching from the first operation to the second operation may be performed only when the interval between the time information of the decoded moving image frames input to the image output unit exceeds a predetermined value.

When a plurality of video contents are displayed as a list, the image output unit may switch from the first operation to the second operation only for the video content whose audio is to be output.

The control method of the present invention is a method of controlling a video content reproduction apparatus that displays thumbnails of video content, and includes: a data acquisition step of receiving input of a video content stream; a stream processing step of acquiring a moving image stream and an audio stream from the video content stream; a moving image decoding step of decoding the moving image stream to obtain decoded moving image frames; an image output step of executing a first operation of adjusting the output of the decoded moving image frames based on time information and a second operation of outputting the decoded moving image frames irrespective of the time information; an audio decoding step of decoding the audio stream to obtain decoded audio data; and an audio output step of outputting the decoded audio data. In the stream processing step, the audio stream is acquired from the video content stream in response to a user operation, and in the image output step, the operation is switched from the first operation to the second operation when the decoded audio data is input in the audio output step.

In the image output step, the operation may be switched from the first operation to the second operation when output of the decoded audio data is started in the audio output step.

In the video output step, the time information of the decoded moving image frame may be stored in the second operation, and in the audio output step, the start of output of the decoded audio data may be determined by referring to that time information.

In the audio output step, the difference between the output start time of the decoded audio data and the time information of the decoded moving image frame may be calculated, and in the image output step, the time information of the decoded moving image frames may be corrected by that difference in the first operation.

The program of the present invention is a program for causing a computer to execute the above control method of the video content reproduction apparatus.

The recording medium of the present invention is a computer-readable recording medium on which the above program is recorded.

According to the present invention, the moving image and audio are quickly synchronized and audio output is started after a user operation, regardless of how the video content is accessed (access to a recording device or network access) and how the moving image thumbnails are played back (normal playback, or low-load skip or slow playback). This improves responsiveness to the operation and user convenience.

FIG. 1 is a block diagram showing a configuration example of a video content reproduction apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a system configuration to which the video content reproduction apparatus according to the embodiment is applied.
FIG. 3 is a diagram showing state transitions in the operation of the video content reproduction apparatus according to the embodiment.
FIG. 4 is a flowchart showing an operation example of the video content reproduction apparatus according to the embodiment.
FIG. 5 is a schematic diagram showing an operation example of the video content reproduction apparatus according to the embodiment.
FIG. 6 is a diagram showing changes in display time in the video content reproduction apparatus according to the embodiment.
FIG. 7 is a flowchart showing an operation example of the video content reproduction apparatus according to the embodiment.
FIG. 8 is a flowchart showing an operation example of the audio output unit of the video content reproduction apparatus according to the embodiment.
FIG. 9 is a diagram for explaining the operation of the video content reproduction apparatus according to the embodiment.
FIG. 10 is a diagram showing an example of a list display screen according to the embodiment.
FIG. 11 is a diagram showing an example of the time differences in processing timing among the processes.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of a video content reproduction apparatus 100 according to an embodiment of the present invention. Here, as an example, the number of video content streams processed simultaneously is four.

The video content reproduction apparatus 100 includes a data acquisition unit 101 that acquires video content streams, a stream processing unit 102 that analyzes and demultiplexes the stream data of the video content, a moving image decoding unit 103 that decodes moving image data, a decoding control unit 104 that controls the operation of the moving image decoding unit 103, an audio decoding unit 106 that decodes audio data, an image output unit 105 that adjusts the output timing of a plurality of moving images, an audio output unit 107 that outputs audio, and a control unit 108 that controls these units.

Each unit constituting the present invention is implemented by, for example, hardware including a microprocessor, memory, buses, interfaces, and peripheral devices, together with software executable on that hardware. Some or all of the hardware can be implemented as an integrated circuit (IC), in which case the software may be stored in the memory. Alternatively, all of the units may be implemented in hardware, and in that case too some or all of the hardware can be implemented as an integrated circuit or IC chipset.

Although not illustrated, each of the data acquisition unit 101, the stream processing unit 102, the moving image decoding unit 103, the image output unit 105, the audio decoding unit 106, and the audio output unit 107 is allocated a buffer area in memory for storing its input data to be processed. These buffer areas may physically be placed together in one location, but for convenience they are treated as residing inside each unit and are referred to as internal buffers.

The data acquisition unit 101 is a means for obtaining the stream data of video content from an external device. It acquires stream data by, for example, accessing a moving image file on a recording medium or sending a request to a server over a network.

The stream processing unit 102 reads stream data from the data acquisition unit 101, analyzes its header, and, based on the result of the analysis, acquires the moving image stream data and audio stream data multiplexed in the stream data. The stream processing unit 102 outputs the acquired moving image stream data to the moving image decoding unit 103 and the audio stream data to the audio decoding unit 106. Here, moving image streams are processed for every stream being handled, whereas audio stream data is processed for only one of them. Each moving image or audio frame is accompanied by time information indicating when it should be presented, and the stream processing unit 102 attaches this time information when it outputs moving image stream data or audio stream data. (A set of audio data consisting of a predetermined number of samples is referred to as a frame.)
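The routing rule described above (moving image data for every stream, audio data for only one) can be sketched as follows. This is an illustrative simplification and not the actual demultiplexer: the `Packet` type, its fields, and the function `route_packets` are hypothetical, and real container parsing is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    content_id: int   # which video content the packet belongs to (hypothetical field)
    kind: str         # "video" or "audio"
    time_info: float  # presentation time carried with the frame, in seconds
    payload: bytes


def route_packets(packets: List[Packet], audio_content_id: int) -> Tuple[List[Packet], List[Packet]]:
    """Split packets into those sent to the video decoders and to the audio decoder.

    Video packets of every content are forwarded; audio packets are forwarded
    only for the single content selected for audio output.
    """
    to_video_decoder: List[Packet] = []
    to_audio_decoder: List[Packet] = []
    for packet in packets:
        if packet.kind == "video":
            to_video_decoder.append(packet)
        elif packet.kind == "audio" and packet.content_id == audio_content_id:
            to_audio_decoder.append(packet)
    return to_video_decoder, to_audio_decoder


packets = [
    Packet(0, "video", 0.0, b"v0"),
    Packet(0, "audio", 0.0, b"a0"),
    Packet(1, "video", 0.0, b"v1"),
    Packet(1, "audio", 0.0, b"a1"),
]
video, audio = route_packets(packets, audio_content_id=1)
```

Each forwarded packet keeps its `time_info`, mirroring the statement that the time information is attached when the stream data is output.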

The moving image decoding unit 103 decodes one frame of the moving image from the input moving image stream data. The time information attached to the moving image stream data specifies the timing at which the decoded moving image frame is to be displayed (the display time) and is output to the image output unit 105 together with the decoded frame. The moving image decoding unit 103 also calculates a delay amount indicating how far the end of the decoding process lags behind the display time, and outputs it to the decoding control unit 104. Its decoding behavior can be changed under the control of the decoding control unit 104, and depending on the decoding behavior it rewrites the frame's time information before output.

The decoding control unit 104 determines, from the delay amount input from the moving image decoding unit 103, whether the decoding process is delayed, and when a delay occurs it controls the moving image decoding unit 103 so as to reduce the amount of processing per unit time. (The operation of the moving image decoding unit 103 is referred to here as the playback mode.) When the control unit 108 designates one of the plurality of video contents to be played back with priority over the others, the decoding control unit 104 controls that video content to be played back at the normal rate, regardless of the playback mode at that time. The normal rate is the playback rate (frame rate) of the video content's stream data.

The image output unit 105 performs output processing for displaying the input moving image frames on a display device. It performs either an operation that adjusts the display timing based on the time information attached to each moving image frame (the first operation) or an operation that processes moving image frames irrespective of the time information (the second operation). For external output, it converts the frame's data format, color format, and so on as needed. For example, when the frames are in the Y, U, V color space and the display uses the R, G, B color space, it converts from YUV to RGB.
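The difference between the first and second operations can be illustrated with the following sketch. It assumes, purely for illustration, that the first operation releases a frame once the system time has reached its display time and that the second operation releases every queued frame immediately; the frame representation and the function name `select_frames_for_output` are hypothetical.

```python
from typing import List, Tuple

# (display time in seconds, frame label); a hypothetical stand-in for a decoded frame
Frame = Tuple[float, str]


def select_frames_for_output(
    queue: List[Frame], system_time: float, use_time_info: bool
) -> Tuple[List[Frame], List[Frame]]:
    """Return (frames to output now, frames that keep waiting).

    use_time_info=True  models the first operation: timing follows the time information.
    use_time_info=False models the second operation: the time information is ignored.
    """
    if not use_time_info:
        return list(queue), []
    ready = [frame for frame in queue if frame[0] <= system_time]
    waiting = [frame for frame in queue if frame[0] > system_time]
    return ready, waiting


queue = [(1.0, "f1"), (2.0, "f2"), (5.0, "f3")]
first_ready, first_waiting = select_frames_for_output(queue, system_time=2.0, use_time_info=True)
second_ready, second_waiting = select_frames_for_output(queue, system_time=2.0, use_time_info=False)
```

In this model a frame whose display time lies well ahead of the system time, such as `f3`, is held back only in the first operation.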

The audio decoding unit 106 decodes one frame of audio from the input audio stream data. By repeating this, the audio decoding unit 106 obtains from the audio stream data a series of audio frames to be output together with the moving image thumbnail. The audio output unit 107 outputs the audio data input from the audio decoding unit 106. The image output unit 105 and the audio output unit 107 share their internal states and have a function of synchronizing audio output with the output of the corresponding moving image frames.

The control unit 108 designates one of the plurality of video contents as the priority content. The priority designation is made based on, for example, control information that the control unit 108 acquires from outside or operation information that an operation device generates in response to a user operation.

As described above, the video content reproduction apparatus 100 can process a plurality of video contents and display them simultaneously. For example, by connecting a recording device (a recording medium and its reader), a communication device, an operation device, a display, and a speaker as shown in FIG. 2, a plurality of video contents can be shown on the display as in FIG. 10, and the audio accompanying the one selected video content can be output from the speaker.

FIG. 3 shows an example of internal state transitions in the operation of the video content reproduction apparatus 100. In FIG. 3(a), the apparatus is in the initial state after startup. In this example, when the video contents to be processed are specified and display of the moving image thumbnails starts, the internal state transitions to the playback state without priority designation. In this state all video contents are treated as non-priority and no audio is output. FIG. 3(b) shows an example of the screen. The video content reproduction apparatus 100 maintains normal playback while the load is below its processing capacity; when the load exceeds the capacity, it performs slow playback or skip playback according to the delay that has arisen. When a priority designation is made in this state, the state transitions to the playback state with priority designation. In this state one video content is processed with priority and its audio is output. FIG. 3(c) shows an example of the screen. Canceling the priority designation in this state causes a transition back to the playback state without priority designation, while changing the priority-designated video content leaves the state unchanged and changes which video content is processed with priority. FIG. 3(d) shows an example of the screen.
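The state transitions of FIG. 3(a) can be written as a small transition table. The sketch below is an illustration only: the state and event names are hypothetical labels for the states and triggers described above, and leaving the state unchanged for an event not listed in the table is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL = auto()
    PLAYING_WITHOUT_PRIORITY = auto()
    PLAYING_WITH_PRIORITY = auto()


# (current state, event) -> next state
TRANSITIONS = {
    (State.INITIAL, "start_thumbnail_display"): State.PLAYING_WITHOUT_PRIORITY,
    (State.PLAYING_WITHOUT_PRIORITY, "designate_priority"): State.PLAYING_WITH_PRIORITY,
    (State.PLAYING_WITH_PRIORITY, "cancel_priority"): State.PLAYING_WITHOUT_PRIORITY,
    # Changing the prioritized content keeps the state; only the target content changes.
    (State.PLAYING_WITH_PRIORITY, "change_priority"): State.PLAYING_WITH_PRIORITY,
}


def next_state(state: State, event: str) -> State:
    """Look up the next state; unlisted events leave the state unchanged (assumption)."""
    return TRANSITIONS.get((state, event), state)


def audio_is_output(state: State) -> bool:
    """Audio is output only while one video content is designated as the priority content."""
    return state is State.PLAYING_WITH_PRIORITY


state = State.INITIAL
for event in ["start_thumbnail_display", "designate_priority", "change_priority"]:
    state = next_state(state, event)
```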

FIG. 4 is a flowchart showing operations related to playback of video content. It shows the operations of the data acquisition unit 101, the stream processing unit 102, and the moving image decoding unit 103 in the playback state without priority designation in FIG. 3.

In step ST401, the control unit 108 identifies the video content to be played back and specifies it to the stream processing unit 102. For example, it reads video content recorded as a file on a recording medium, or accesses a server on the network and obtains the names of the available video contents. It also initializes the value of the system timer (the system time), which is counted using an internal clock.

In step ST402, the stream processing unit 102 analyzes the video content stream and acquires information about the moving image and audio streams. For a moving image, the information that can be acquired here includes the resolution, the frame rate, the encoding method, an offset value indicating the position in the stream data from which one frame of the moving image can be obtained, and time information for display. (Hereinafter, the time information indicating the display timing of a moving image frame is referred to as the display time.)

In step ST403, the control unit 108 determines whether all the data of the video content to be played back has been processed. If so, the operation ends; if not, the process proceeds to step ST404.

In step ST404, the stream processing unit 102 extracts moving image stream data from the analyzed stream data of the video content, processes the time information attached to the moving image stream data as needed, attaches the result to the moving image stream data together with the original time information, and outputs it to the moving image decoding unit 103.

In step ST405, the moving image decoding unit 103 determines whether the display time needs to be rewritten. If so, the process proceeds to step ST406; if not, it proceeds to step ST407.

In step ST406, the moving image decoding unit 103 rewrites the input time information based on the playback mode specified by the decoding control unit 104. The playback mode is one of normal playback, slow playback, and skip playback. For example, when slow playback stretches the playback duration to twice that of normal playback, the unit doubles the difference between the previously input display time and the current display time, adds the result to the previously output display time, and outputs the sum.
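The rewriting rule for slow playback can be sketched as follows. The function name `rewrite_display_times` and the `stretch` parameter are hypothetical, and passing the first display time through unchanged is an assumption, since the text only describes how each later time is derived from the previous one.

```python
from typing import List


def rewrite_display_times(input_times: List[float], stretch: float = 2.0) -> List[float]:
    """Rewrite display times for slow playback.

    For each frame, the difference between the previously input display time
    and the current one is multiplied by `stretch` and added to the previously
    output display time.
    """
    if not input_times:
        return []
    output_times = [input_times[0]]  # assumption: the first time is output as-is
    for previous_input, current_input in zip(input_times, input_times[1:]):
        interval = current_input - previous_input
        output_times.append(output_times[-1] + interval * stretch)
    return output_times


# Frames originally 0.5 s apart are displayed 1.0 s apart.
slow_times = rewrite_display_times([0.0, 0.5, 1.0, 1.5])
```

Because each output time is built from the previously output time rather than from the input time, the stretched intervals accumulate correctly over the whole sequence.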

ステップＳＴ４０７において、動画像復号部１０３は、動画像データの復号処理を行うか否かを判定し、行わない場合はステップＳＴ４０３に戻り、行う場合はステップＳＴ４０８に進む。 In step ST407, the moving image decoding unit 103 determines whether or not to perform moving image data decoding processing. If not, the moving image decoding unit 103 returns to step ST403, and if yes, proceeds to step ST408.

ステップＳＴ４０８において、動画像復号部１０３は、動画像ストリームデータを復号する。動作モードが通常再生、およびスロー再生の場合、ストリーム処理部１０２から入力された順に復号する。スキップ再生の場合には、復号処理の前に動画像ストリームデータを調べ、その符号化タイプを取得する。ここで、符号化タイプとは、動画像１フレームのストリームデータが当該フレームのストリームデータのみで復号できるか（イントラフレーム）、それとも他のフレームの情報を利用するか（インターフレーム）を指す。スキップ再生では、例えば、符号化タイプがイントラフレームのストリームデータのみを復号する。または、他のフレームの復号時に参照されないフレームのストリームデータのみを復号する。 In step ST408, the moving image decoding unit 103 decodes moving image stream data. When the operation mode is normal reproduction and slow reproduction, decoding is performed in the order input from the stream processing unit 102. In the case of skip reproduction, the moving image stream data is examined before decoding processing, and the encoding type is acquired. Here, the encoding type indicates whether the stream data of one moving image frame can be decoded only with the stream data of the frame (intra frame) or whether information of another frame is used (inter frame). In skip reproduction, for example, only stream data with an encoding type of intra frame is decoded. Alternatively, only stream data of a frame that is not referred to when other frames are decoded is decoded.
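As a rough illustration of the frame selection in step ST408, the sketch below decodes every frame in normal and slow playback and only intra frames in skip playback. The `Frame` class and `select_frames` function are assumptions made for this example; the alternative policy mentioned above (decoding only frames that no other frame references) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    index: int
    is_intra: bool  # True if decodable from its own stream data alone


def select_frames(frames, mode):
    """Return the frames to decode in the given playback mode (hypothetical)."""
    if mode in ("normal", "slow"):
        return list(frames)  # decode everything, in input order
    if mode == "skip":
        return [f for f in frames if f.is_intra]  # intra frames only
    raise ValueError(f"unknown playback mode: {mode}")


stream = [Frame(0, True), Frame(1, False), Frame(2, False), Frame(3, True)]
print([f.index for f in select_frames(stream, "skip")])  # [0, 3]
```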

ステップＳＴ４０９において、動画像復号部１０３は、復号処理終了後、システム時刻を取得して表示時刻に対する遅延量を算出し、ステップＳＴ４０３に戻る。 In step ST409, the moving image decoding unit 103 acquires the system time after the decoding process is completed, calculates the delay amount with respect to the display time, and returns to step ST403.

復号制御部１０４は、図４のフローチャートで示される動作とは独立して動作し、動画像復号部１０３からの遅延量を監視する。遅延量から遅延が発生していると判定した場合、各動画像復号部１０３の動作モードを通常→スロー→スキップの順に負荷の軽い方向へ変更し、遅延が発生せず十分に処理に余裕があると判定した場合には、逆に負荷の大きい方向へ変更する。遅延の発生は、例えば、遅延量がしきい値を超えた場合や、所定時間において遅延量が増加し続ける場合に起きているとする。 The decoding control unit 104 operates independently of the operation shown in the flowchart of FIG. 4 and monitors the delay amounts from the moving image decoding units 103. When it determines from the delay amount that a delay is occurring, it changes the operation mode of each moving image decoding unit 103 toward a lighter load, in the order normal → slow → skip. Conversely, when it determines that no delay is occurring and there is ample processing headroom, it changes the mode toward a heavier load. A delay is deemed to be occurring when, for example, the delay amount exceeds a threshold value or the delay amount keeps increasing over a predetermined period.
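The mode-switching policy of the decoding control unit 104 might look like the sketch below. Several points are assumptions: the text does not fix a sign convention for the threshold test, so the sketch uses a "lateness" value that grows as decoding falls behind, and the names `next_mode`, `threshold`, `window`, and `margin` are hypothetical.

```python
MODES = ["normal", "slow", "skip"]  # ordered from heaviest to lightest load


def next_mode(current, lateness_history, threshold=0.1, window=3, margin=0.0):
    """Pick the next operation mode from recent lateness samples (hypothetical).

    A delay is assumed when the latest sample exceeds `threshold` or when the
    last `window` samples are strictly increasing. Headroom is assumed when
    the latest sample is at or below `margin`.
    """
    if not lateness_history:
        return current
    recent = lateness_history[-window:]
    latest = recent[-1]
    growing = len(recent) == window and all(
        a < b for a, b in zip(recent, recent[1:])
    )
    idx = MODES.index(current)
    if latest > threshold or growing:
        return MODES[min(idx + 1, len(MODES) - 1)]  # move toward lighter load
    if latest <= margin:
        return MODES[max(idx - 1, 0)]  # move toward heavier load
    return current


print(next_mode("normal", [0.0, 0.2]))  # slow
```

Stepping one mode at a time keeps the transitions in the stated order and avoids jumping straight from normal to skip playback on a single late frame.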

画像出力部１０５は、各動画像復号部１０３からの動画像フレームと時刻情報の出力に合わせて、表示のための出力タイミングを調整する。例えば、取得したシステム時刻と復号後の動画像フレームの表示時刻とを比較した結果、システム時刻と表示時刻の差が所定のしきい値以下の場合、または、表示時刻がシステム時刻に対して遅れている場合には、復号フレームを出力する。 The image output unit 105 adjusts the output timing for display in accordance with the moving image frames and time information output from each moving image decoding unit 103. For example, it compares the acquired system time with the display time of the decoded moving image frame, and outputs the decoded frame when the difference between the system time and the display time is at or below a predetermined threshold, or when the display time is behind the system time.

表示時刻とシステム時刻について、図５および図６を用いて説明する。ここで、再生処理の開始時に制御部１０８がシステム時刻をＴ０に初期化したとする。ストリーム処理部１０２は、ストリームデータに付けられた表示時刻ｔｖ（ｉ）からＴｖ（ｉ）を導出する。ここで、ｉは動画像のフレーム番号、ｄは表示時刻を一定時間だけ遅らせるためのオフセット値である。表示時刻Ｔｖ（ｉ）は、システム時刻Ｔ０を基準とした実際の表示時刻である。 The display time and the system time will be described with reference to FIGS. 5 and 6. Assume that the control unit 108 initializes the system time to T0 at the start of the reproduction process. The stream processing unit 102 derives Tv(i) from the display time tv(i) attached to the stream data. Here, i is the frame number of the moving image, and d is an offset value for delaying the display time by a fixed amount. The display time Tv(i) is the actual display time relative to the system time T0.

ストリーム処理部１０２は、２つの表示時刻ｔｖ（ｉ）、Ｔｖ（ｉ）を動画像ストリームに付加して動画像復号部１０３に出力する。動画像復号部１０３は、再生制御の実行により、Ｔｖ（ｉ）をＴｖ’（ｉ）に書き換える。復号済みの動画像フレームにｔｖ（ｉ）と書き換えたＴｖ’（ｉ）を付加して画像出力部１０５へ出力する。復号処理終了時のシステム時刻をＴ１とすると、動画像復号部１０３は、Ｔｖ’（ｉ）−Ｔ１を遅延量として出力する。動画像復号部１０３は、画像出力部１０５に表示時刻としてＴｖ’（ｉ）を出力する。 The stream processing unit 102 attaches the two display times tv(i) and Tv(i) to the moving image stream and outputs it to the moving image decoding unit 103. Under playback control, the moving image decoding unit 103 rewrites Tv(i) to Tv′(i). It attaches tv(i) and the rewritten Tv′(i) to the decoded moving image frame and outputs the frame to the image output unit 105. Letting T1 be the system time at the end of the decoding process, the moving image decoding unit 103 outputs Tv′(i) − T1 as the delay amount. The moving image decoding unit 103 outputs Tv′(i) to the image output unit 105 as the display time.

図６（ａ）の場合、Ｔ１＜Ｔｖ’（ｉ）であり、画像出力部１０５は、システム時刻がＴｖ’ （ｉ）になった時点で復号フレームを出力する。 In the case of FIG. 6A, T1 <Tv ′ (i), and the image output unit 105 outputs the decoded frame when the system time reaches Tv ′ (i).

図６（ｂ）は、復号処理終了時のシステム時刻Ｔ２＞Ｔｖ’（ｉ）で遅延が起きている場合の例を示す。この場合には、画像出力部１０５は、動画像復号部１０３から復号フレームが入力された直後に復号フレームを出力する。なお、ストリームデータに付けられた２つの表示時刻は、画像出力部１０５に記憶され、優先指定時に音声ストリームとの同期に利用される。 FIG. 6B shows an example in which a delay occurs at the system time T2> Tv ′ (i) at the end of the decoding process. In this case, the image output unit 105 outputs the decoded frame immediately after the decoded frame is input from the moving image decoding unit 103. The two display times attached to the stream data are stored in the image output unit 105, and are used for synchronization with the audio stream when priority is designated.
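The two cases of FIG. 6 can be summarized in a short sketch. The function names `delay_amount` and `frame_output_time` are hypothetical, and times are plain numbers on the system clock.

```python
def delay_amount(tv_prime, decode_done_time):
    """Delay amount as stated in the text: Tv'(i) minus the decode end time."""
    return tv_prime - decode_done_time


def frame_output_time(tv_prime, decode_done_time):
    """System time at which the decoded frame is output (hypothetical helper).

    FIG. 6(a): decoding finished early, so wait until Tv'(i).
    FIG. 6(b): decoding finished late, so output as soon as the frame arrives.
    """
    return max(tv_prime, decode_done_time)


print(frame_output_time(tv_prime=15, decode_done_time=10))  # 15, case (a)
print(frame_output_time(tv_prime=15, decode_done_time=20))  # 20, case (b)
```

Under this definition a positive delay amount means decoding finished ahead of the display time, and a negative one means the frame is already late.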

図３における優先指定ありの再生状態では、１つの映像コンテンツを優先として音声を出力する。この場合、ステップＳＴ４０４で動画像ストリームデータだけでなく音声ストリームデータを取得し、音声復号部１０６で復号処理を実行する。さらに、復号制御部１０４は、他の映像コンテンツの状態にかかわらず、優先指定された映像コンテンツを処理する動画像復号部１０３の動作モードを通常再生とする。 In the playback state with priority designation in FIG. 3, audio is output with priority given to one video content. In this case, not only the moving image stream data but also the audio stream data is acquired in step ST404, and the audio decoding unit 106 executes the decoding process. Furthermore, the decoding control unit 104 sets the operation mode of the moving image decoding unit 103 that processes the preferentially specified video content to normal playback regardless of the state of other video content.

図７は、優先指定なしから優先指定ありへの状態遷移時の処理の流れを示すフローチャートである。処理が開始すると、ステップＳＴ７０１において、制御部１０８は、優先指定された映像コンテンツのストリームを処理するストリーム処理部１０２（以下、優先指定されたストリーム処理部と呼ぶ）に対して音声ストリームデータの取得を指示すると共に、復号制御部１０４には、当該映像コンテンツの動画像ストリームの全フレームを処理するように指示する。 FIG. 7 is a flowchart showing the flow of processing at the time of state transition from no priority designation to priority designation. When the process starts, in step ST701, the control unit 108 acquires audio stream data from the stream processing unit 102 (hereinafter referred to as a priority-designated stream processing unit) that processes a stream of video content that has been designated by priority. And the decoding control unit 104 is instructed to process all frames of the moving image stream of the video content.

ステップＳＴ７０２において、優先指定されたストリーム処理部１０２は、すでに処理した動画像ストリームのうち最も近い過去に処理した動画像ストリームの表示時刻を取得する。 In step ST702, the priority-designated stream processing unit 102 acquires the display time of the most recently processed moving image stream among the moving image streams it has already processed.

ステップＳＴ７０３において、優先指定されたストリーム処理部１０２は、ステップＳＴ７０２で取得した表示時刻に基づき、取得すべき音声ストリームの表示時刻を決定する。ここで扱う表示時刻は、画像出力部１０５にて扱われる最終的な表示時刻Ｔｖ’（ｉ）とは異なり、動画像ストリームデータに付随して映像コンテンツのストリームデータに多重化されている値ｔｖ（ｉ）である。動画像ストリームの表示時刻ｔｖ（ｉ）に基づき、例えば、最も近い音声ストリームの表示時刻を算出し、これをｔａ（ｊ）とする。 In step ST703, the priority-designated stream processing unit 102 determines the display time of the audio stream to be acquired, based on the display time acquired in step ST702. Unlike the final display time Tv′(i) handled by the image output unit 105, the display time handled here is the value tv(i) that is multiplexed into the stream data of the video content along with the moving image stream data. Based on the display time tv(i) of the moving image stream, the display time of the nearest audio stream, for example, is calculated and taken as ta(j).
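One way to pick ta(j) in step ST703 is a nearest-timestamp search, sketched below. The function name `nearest_audio_time` and the idea that the candidate audio display times are available as a list are assumptions for illustration.

```python
def nearest_audio_time(tv_i, audio_times):
    """Return the audio display time closest to the video display time tv_i.

    Hypothetical helper: `audio_times` is the list of display times of the
    audio units that could be acquired.
    """
    return min(audio_times, key=lambda ta: abs(ta - tv_i))


ta_j = nearest_audio_time(66, [0, 21, 43, 64, 85])
print(ta_j)  # 64
```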

ステップＳＴ７０４において、ストリーム処理部１０２は、表示時刻ｔａ（ｊ）を持つ音声ストリームデータを内部バッファから探索し、該当する音声ストリームデータが内部バッファ内に見つかれば音声取り出し可能と判断してステップＳＴ７０５に進み、該当する音声ストリームデータが内部バッファ内に見つからなければ音声取り出し不可能と判断して、再度ステップＳＴ７０４において、内部バッファのデータが更新された時点で、音声ストリームデータを内部バッファから探索する。条件に適合した音声ストリームが探索できた時点では、必ずしも動画像と音声の間に同期がとれているとは限らない。ストリーム処理部１０２は、後にステップＳＴ７０５において音声ストリームデータを音声復号部１０６に対して出力するに際し、表示時刻ｔａ（ｊ）を添付する。 In step ST704, the stream processing unit 102 searches its internal buffer for audio stream data having the display time ta(j). If the corresponding audio stream data is found in the internal buffer, it determines that the audio can be extracted and proceeds to step ST705. If it is not found, it determines that the audio cannot be extracted and, once the data in the internal buffer has been updated, searches the internal buffer for the audio stream data again in step ST704. At the point when an audio stream matching the condition has been found, the moving image and the audio are not necessarily synchronized. When the stream processing unit 102 later outputs the audio stream data to the audio decoding unit 106 in step ST705, it attaches the display time ta(j).
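The search-and-retry behaviour of step ST704 can be pictured with the following sketch. The buffer layout (a list of dictionaries with a `display_time` key) and the function name `find_audio_data` are assumptions for this example only.

```python
def find_audio_data(buffer, ta_j):
    """Return the buffered audio unit whose display time equals ta_j, or None."""
    for unit in buffer:
        if unit["display_time"] == ta_j:
            return unit
    return None


buffer = [{"display_time": 21, "payload": b"a1"}]
first_try = find_audio_data(buffer, 64)  # not buffered yet: cannot extract

buffer.append({"display_time": 64, "payload": b"a3"})  # buffer is updated
second_try = find_audio_data(buffer, 64)  # found: proceed to step ST705
print(first_try, second_try["display_time"])  # None 64
```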

ステップＳＴ７０５において、優先指定されたストリーム処理部１０２は、動画像ストリームおよび音声ストリームデータの双方を内部バッファから取得する。一方、優先指定のないストリーム処理部１０２は、以前（優先指定なしの再生情報）と同じ動作を続ける。取得後の動画像ストリームは、対応する動画像復号部１０３の内部バッファに送られ、音声ストリームデータは、音声復号部１０６の内部バッファに送られ、適宜復号される。 In step ST705, the priority-designated stream processing unit 102 acquires both the moving image stream and the audio stream data from the internal buffer. On the other hand, the stream processing unit 102 without priority designation continues the same operation as before (reproduction information without priority designation). The acquired moving image stream is sent to the corresponding internal buffer of the moving image decoding unit 103, and the audio stream data is sent to the internal buffer of the audio decoding unit 106 and decoded as appropriate.

ステップＳＴ７０６において、音声出力部１０７は、復号後の動画像ストリームと音声ストリームの同期をとるために、優先指定後の最初の復号音声データを受け取った時点で、当該音声データが出力可能かどうかを判定し、出力可能な場合はステップＳＴ７０８に進み、出力可能でない場合はステップＳＴ７０７に進む。 In step ST706, the audio output unit 107 determines whether or not the audio data can be output when the first decoded audio data after priority designation is received in order to synchronize the decoded video stream and audio stream. If it is determined that output is possible, the process proceeds to step ST708, and if output is not possible, the process proceeds to step ST707.

ステップＳＴ７０８において、動画像と音声を同期して出力し、処理を終了する。 In step ST708, the moving image and the sound are output in synchronization, and the process ends.

ステップＳＴ７０７において、動画像のみの出力を継続し、ステップＳＴ７０５に戻る。 In step ST707, the output of only the moving image is continued, and the process returns to step ST705.

音声出力部１０７の具体的な動作を、図８のフローチャートを用いて説明する。 A specific operation of the audio output unit 107 will be described with reference to a flowchart of FIG.

ステップＳＴ８０１において、音声出力部１０７は、内部バッファを監視し、復号済みのデータが存在するか否かを調べ、存在する場合はステップＳＴ８０２に進む。 In step ST801, the audio output unit 107 monitors the internal buffer to check whether or not there is decoded data. If there is, the process proceeds to step ST802.

ステップＳＴ８０２において、音声出力部１０７は、優先指定後最初の復号音声データを受け取ったことを画像出力部１０５へ通知する。 In step ST802, the audio output unit 107 notifies the image output unit 105 that the first decoded audio data after priority designation has been received.

ステップＳＴ８０３において、音声出力部１０７は、当該音声データに対応する出力済み動画像フレームの表示時刻ｔｖ（ｉ）およびＴｖ’（ｉ）を画像出力部１０５から取得する。 In step ST803, the audio output unit 107 acquires the display times tv (i) and Tv '(i) of the output moving image frame corresponding to the audio data from the image output unit 105.

ステップＳＴ８０４において、音声出力部１０７は、取得した動画像フレームの表示時刻を参照し、音声出力の可否を判断する。すなわち、音声出力部１０７は、音声の表示時刻ｔａ（ｊ）と動画像の表示時刻ｔｖ（ｉ）を比較し、例えば、表示時刻ｔａ（ｊ）と表示時刻ｔｖ（ｉ）の差が所定のしきい値の範囲内であれば音声出力可能と判定する。音声出力可能な場合はステップＳＴ８０５に進み、音声出力が可能でない場合はステップＳＴ８０３に戻る。 In step ST804, the audio output unit 107 refers to the display time of the acquired moving image frame and determines whether audio output is possible. That is, the audio output unit 107 compares the audio display time ta (j) with the moving image display time tv (i). For example, the difference between the display time ta (j) and the display time tv (i) is a predetermined value. If it is within the threshold range, it is determined that audio output is possible. If audio output is possible, the process proceeds to step ST805, and if audio output is not possible, the process returns to step ST803.

ステップＳＴ８０５において、音声出力部１０７は、音声データの出力を開始する。 In step ST805, the audio output unit 107 starts outputting audio data.

ステップＳＴ８０６において、音声出力部１０７は、音声データの出力を開始した時のシステム時刻と、画像出力部１０５から取得した表示済み動画像フレームの表示時刻Ｔｖ’（ｉ）の差分を、表示時刻のオフセットとして算出する。 In step ST806, the audio output unit 107 calculates the difference between the system time when the output of the audio data is started and the display time Tv ′ (i) of the displayed moving image frame acquired from the image output unit 105 as the display time. Calculate as an offset.

ステップＳＴ８０７において、音声出力部１０７は、算出したオフセットを画像出力部１０５へ出力し、音声出力を通知し、動作を終了する。 In step ST807, the audio output unit 107 outputs the calculated offset to the image output unit 105, notifies the audio output, and ends the operation.
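Steps ST803 to ST807 can be condensed into the sketch below. It rests on several assumptions: the frames reported by the image output unit are given as `(tv, tv_prime, system_time)` tuples, audio output is taken to start at the system time at which the matching frame is reported, and the offset is computed as Tv′(i) minus the system time so that subtracting it from later display times moves them earlier. The function name `start_audio` is hypothetical.

```python
def start_audio(ta_j, reported_frames, threshold):
    """Decide when audio output may start and compute the display-time offset.

    reported_frames: iterable of (tv, tv_prime, system_time) tuples, one per
    moving image frame already output by the image output unit (assumed).
    Returns None while no frame is close enough to ta_j.
    """
    for tv, tv_prime, now in reported_frames:
        if abs(ta_j - tv) <= threshold:  # ST804: within the threshold range
            offset = tv_prime - now  # ST806: sign convention is an assumption
            return {"started_at": now, "offset": offset}  # ST805, ST807
    return None


frames = [(100, 1100, 1040), (133, 1133, 1060), (166, 1166, 1080)]
print(start_audio(165, frames, threshold=5))
# {'started_at': 1080, 'offset': 86}
```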

画像出力部１０５は、音声出力部１０７から音声データ取得と音声データ出力の２つの通知を受け取る。優先指定後は、前述のように、時刻情報に合わせて表示のための動画像フレーム出力タイミングを調整する動作を行い、最初に音声データ取得の通知を受けた時点で、優先ストリームの動画像については、その表示時刻Ｔｖ’（ｉ）にかかわらず、受け取った時点で即時出力を行うように動作を変更する。これ以降、音声出力部１０７から音声出力通知を受け取るまでこの動作を続ける。このときの動画像フレームの表示時刻ｔｖ（ｉ）およびＴｖ’（ｉ）を記憶しておき、図８のステップＳＴ８０３について説明したように、音声出力部１０７からの問い合わせに対して出力する。 The image output unit 105 receives two notifications from the audio output unit 107: audio data acquisition and audio data output. After priority designation, it initially continues, as described above, to adjust the moving image frame output timing for display according to the time information. When it first receives the audio data acquisition notification, it changes its operation for the moving image of the priority stream so that each frame is output immediately on receipt, regardless of its display time Tv′(i). It continues this operation until it receives the audio output notification from the audio output unit 107. It stores the display times tv(i) and Tv′(i) of the moving image frame at this time and outputs them in response to an inquiry from the audio output unit 107, as described for step ST803 in FIG. 8.
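The switching behaviour of the image output unit 105 for the priority stream can be modelled as a small state machine. This is a sketch under assumptions: the class `ImageOutput`, its method names, and the way the offset is subtracted from Tv′(i) after audio output starts are illustrative choices, not details confirmed by the text.

```python
class ImageOutput:
    """Toy model of the timed and immediate output operations (hypothetical)."""

    def __init__(self):
        self.immediate = False  # False: timed output, True: immediate output
        self.offset = 0  # display-time correction received with the notification
        self.last_times = None  # (tv, tv_prime) of the last immediately output frame

    def on_audio_data_acquired(self):
        """First notification: switch to immediate output."""
        self.immediate = True

    def on_audio_output_started(self, offset):
        """Second notification: return to timed output with a corrected clock."""
        self.offset = offset
        self.immediate = False

    def output_time(self, tv, tv_prime, now):
        """Return the system time at which the frame would be output."""
        if self.immediate:
            self.last_times = (tv, tv_prime)  # kept for the audio output unit
            return now
        return max(now, tv_prime - self.offset)


out = ImageOutput()
print(out.output_time(0, 1000, now=900))  # 1000: waits for the display time
out.on_audio_data_acquired()
print(out.output_time(33, 1033, now=950))  # 950: output immediately
out.on_audio_output_started(offset=83)
print(out.output_time(66, 1066, now=960))  # 983: display time minus offset
```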

優先指定以外の動画像については、これまで通り時刻情報に合わせた出力を行う。図９に処理例を示す。 For moving images other than priority designation, output is performed according to time information as before. FIG. 9 shows a processing example.

図９（ａ）では、動画像フレームＶ１を出力中にユーザにより優先指定の操作がなされ、このときストリーム処理部１０２では、動画像ストリームデータＶ７を処理中である。この状態でストリーム処理部１０２は、音声ストリームＡ５から取得を開始する。 In FIG. 9A, the user performs a priority designation operation while outputting the moving image frame V1, and at this time, the stream processing unit 102 is processing the moving image stream data V7. In this state, the stream processing unit 102 starts acquisition from the audio stream A5.

図９（ｂ）は、図９（ａ）からΔｔだけ経過後の状態を示しており、復号された音声データＡ５が音声出力部１０７に入力されたときで、画像出力部１０５は動画像フレームＶ６の出力待ちの状態である。ここで画像出力部１０５は、時刻情報に合わせて表示のための動画像フレーム出力タイミングを調整する動作から、時刻情報を考慮しない動作へと切り替わる。図９（ｂ）では、Ｖ６を即時に出力し、ｔｖ（６）およびＴｖ’（６）を記憶する。音声出力部１０７は、ｔｖ（６）を参照し、ｔａ（７）と比較する。もし、音声データＡ７が出力できるならば、Ｖ６の表示時刻Ｔｖ’（６）とシステム時刻ｔの差分Ｔｄを算出する。画像出力部１０５は、Ｖ７以降の動画像フレームについては、その時刻Ｔｖ’（ｉ）からＴｄを引いた時刻で出力タイミングを調整する。 FIG. 9(b) shows the state after Δt has elapsed from FIG. 9(a), at the moment the decoded audio data A5 is input to the audio output unit 107. The image output unit 105 is waiting to output the moving image frame V6. At this point the image output unit 105 switches from the operation of adjusting the moving image frame output timing for display according to the time information to an operation that does not consider the time information. In FIG. 9(b), V6 is output immediately, and tv(6) and Tv′(6) are stored. The audio output unit 107 refers to tv(6) and compares it with ta(7). If the audio data A7 can be output, the difference Td between the display time Tv′(6) of V6 and the system time t is calculated. For the moving image frames from V7 onward, the image output unit 105 adjusts the output timing using the time obtained by subtracting Td from Tv′(i).

動画像と音声の同期を図っている間は、動画像については時刻情報に関係なく出力がされるようになり、一般には出力タイミングが早まることになる。ただし、早まる時間の変動量は、動画像フレームの間隔に依存し、例えばスキップ再生のようにフレーム間隔が広がったときには大きく変動するが、通常の再生レートであればほとんど変動しない。スキップ再生では元々フレーム間隔が広いため、表示時刻を早めてもユーザが知覚する違和感は少なくて済む。ユーザにより音声出力が選択された時点で音声処理を開始しつつ、音声出力までの時間を常に短くすることができる。 While the moving image and the sound are synchronized, the moving image is output regardless of the time information, and generally the output timing is advanced. However, the amount of fluctuation of the time to be advanced depends on the interval of the moving image frames, and varies greatly when the frame interval is widened, for example, skip reproduction, but hardly varies at a normal reproduction rate. In skip reproduction, since the frame interval is originally wide, even if the display time is advanced, there is little discomfort perceived by the user. It is possible to always shorten the time until the voice output while starting the voice processing when the voice output is selected by the user.

なお、前述の実施例では、動画像と音声の同期を図っている間に実行する第２の動作は、時刻情報に関係なく即時に動画像フレームを出力するようにしているが、動画像の表示をせずに、単に動画像フレームのデータを内部バッファから読み捨てるようにしてもよいし、フレーム間の時刻の間隔が所定の値を超える場合にのみ、時刻情報に関係なく即時に出力する（または出力せずに読み捨てる）ようにしてもよい。 In the above-described embodiment, the second operation executed while synchronizing the moving image and the sound is to output the moving image frame immediately regardless of the time information. The video frame data may be simply discarded from the internal buffer without being displayed, or is output immediately regardless of the time information only when the time interval between frames exceeds a predetermined value. (Or discard it without outputting it).

本発明の実施形態の機能を実現するソフトウェアのプログラムコードを記録した記録媒体を装置に供給し、マイクロプロセッサまたはＤＳＰによりプログラムコードが実行されることによっても、本発明の目的が達成される。この場合、ソフトウェアのプログラムコード自体が本実施形態の機能を実現することになり、プログラムコードを記録した記録媒体は本発明を構成することになる。 The object of the present invention can also be achieved by supplying the apparatus with a recording medium on which software program code implementing the functions of the embodiment is recorded, and having a microprocessor or DSP execute the program code. In this case, the software program code itself implements the functions of this embodiment, and the recording medium on which the program code is recorded constitutes the present invention.

以上、本発明の実施形態について説明したが、本発明の映像コンテンツ再生装置およびその制御方法、プログラム、記録媒体は、上記の実施形態に限定されるものではなく、本発明の要旨を逸脱しない範囲内において、種々の変更を加えうることは勿論である。 Although the embodiment of the present invention has been described above, the video content reproduction apparatus, the control method, the program, and the recording medium of the present invention are not limited to the above-described embodiment, and do not depart from the gist of the present invention. Of course, various changes can be made.

本発明は、映像コンテンツ再生装置およびその制御方法、プログラム、記録媒体に利用可能である。 INDUSTRIAL APPLICABILITY The present invention can be used for a video content reproduction apparatus, a control method thereof, a program, and a recording medium.

DESCRIPTION OF SYMBOLS
１００ 映像コンテンツ再生装置 100 Video content reproduction apparatus
１０１ データ取得部 101 Data acquisition unit
１０２ ストリーム処理部 102 Stream processing unit
１０３ 動画像復号部 103 Moving image decoding unit
１０４ 復号制御部 104 Decoding control unit
１０５ 画像出力部 105 Image output unit
１０６ 音声復号部 106 Audio decoding unit
１０７ 音声出力部 107 Audio output unit
１０８ 制御部 108 Control unit

Claims

1. A video content reproduction apparatus comprising:
a data acquisition unit that receives an input of a stream of video content;
a stream processing unit that obtains a moving image stream and an audio stream from the video content stream;
a moving image decoding unit that decodes the moving image stream and obtains a decoded moving image frame;
an image output unit that executes a first operation of adjusting the output of the decoded moving image frame based on time information and a second operation of outputting the decoded moving image frame regardless of the time information;
an audio decoding unit that decodes the audio stream and obtains decoded audio data; and
an audio output unit that outputs the decoded audio data,
wherein the stream processing unit acquires the audio stream from the video content stream in response to a user operation, and
the image output unit switches from the first operation to the second operation when the decoded audio data is input to the audio output unit.

2. The video content reproduction apparatus according to claim 1, wherein the image output unit switches from the second operation to the first operation when the audio output unit starts outputting the decoded audio data.

3. The video content reproduction apparatus according to claim 1 or 2, wherein the image output unit stores time information of the decoded moving image frame in the second operation, and
the audio output unit determines the start of output of the decoded audio data with reference to the time information of the decoded moving image frame.

4. The video content reproduction apparatus according to claim 3, wherein the audio output unit calculates a difference between an output start time of the decoded audio data and the time information of the decoded moving image frame, and
the image output unit corrects the time information of the decoded moving image frame by the difference in the first operation.

5. The video content reproduction apparatus according to any one of claims 1 to 4, wherein the switching from the first operation to the second operation is performed only when an interval between the time information of the decoded moving image frames input to the image output unit exceeds a predetermined value.

The image output unit, when displaying a list of a plurality of video contents, performs switching from the first operation to the second operation only for the video content to be output. 5. The video content reproduction apparatus according to any one of 1 to 4.

7. A control method of a video content reproduction apparatus that displays a thumbnail of video content, the method comprising:
a data acquisition step of receiving an input of a video content stream;
a stream processing step of obtaining a moving image stream and an audio stream from the video content stream;
a moving image decoding step of decoding the moving image stream and obtaining a decoded moving image frame;
an image output step of executing a first operation of adjusting the output of the decoded moving image frame based on time information and a second operation of outputting the decoded moving image frame regardless of the time information;
an audio decoding step of decoding the audio stream to obtain decoded audio data; and
an audio output step of outputting the decoded audio data,
wherein, in the stream processing step, the audio stream is acquired from the video content stream in response to a user operation, and
in the image output step, the operation is switched from the first operation to the second operation when the decoded audio data is input in the audio output step.

8. The control method of a video content reproduction apparatus according to claim 7, wherein, in the image output step, the operation is switched from the second operation to the first operation when the output of the decoded audio data is started in the audio output step.

9. The control method of a video content reproduction apparatus according to claim 7, wherein, in the image output step, time information of the decoded moving image frame is stored in the second operation, and
in the audio output step, the start of output of the decoded audio data is determined with reference to the time information of the decoded moving image frame.

10. The control method of a video content reproduction apparatus according to claim 9, wherein, in the audio output step, a difference between the output start time of the decoded audio data and the time information of the decoded moving image frame is calculated, and
in the image output step, the time information of the decoded moving image frame is corrected by the difference in the first operation.

11. A program for causing a computer to execute the control method of a video content reproduction apparatus according to any one of claims 7 to 10.

12. A computer-readable recording medium on which the program according to claim 11 is recorded.