JP2012257078A

JP2012257078A - Image and sound data processor and data multiplexing method

Info

Publication number: JP2012257078A
Application number: JP2011129034A
Authority: JP
Inventors: Kyosuke Toda; 京佑戸田
Original assignee: Fujitsu Semiconductor Ltd
Current assignee: Fujitsu Semiconductor Ltd
Priority date: 2011-06-09
Filing date: 2011-06-09
Publication date: 2012-12-27
Anticipated expiration: 2031-06-09
Also published as: JP5678807B2

Abstract

PROBLEM TO BE SOLVED: To reduce overflow of image data and sound data. SOLUTION: An image and sound data processor includes a first storage part that stores encoded image data, a second storage part that stores encoded sound data, and a chunk determination part. The chunk determination part monitors the occupancy of each of the first and second storage parts, and makes the number of frames in a chunk smaller than the number of frames in a GOP (Group Of Pictures) when at least one of two conditions is satisfied: a first condition that the occupancy of the first storage part is equal to or greater than a predetermined first threshold, or a second condition that the occupancy of the second storage part is equal to or greater than a predetermined second threshold.

Description

The present invention relates to a video/audio data processing apparatus and a data multiplexing method.

In recent years, the MP4 file format has become widespread as a file format for handling video and audio data. The MP4 file format is defined in the MPEG-4 Systems standard (ISO/IEC 14496-1). Because MP4 files are easy to process on a personal computer (hereinafter also referred to as a PC), the format is widely used in PC applications. For example, using the MP4 file format makes it easy to edit video and audio stream data.

An MP4 file has a data portion, in which video and audio data are stored, and a header portion, in which attached information is stored. An MP4 file manages video and audio data in units of samples and chunks. A sample is the minimum unit for managing video and audio data: in video data, one sample is one video frame, and in audio data, one sample is one audio frame. A plurality of samples placed together in the data portion is managed as a chunk, and the samples belonging to the same chunk are arranged contiguously in the data portion.

The MP4 file format does not specify the number of samples in a chunk, so the user is free to choose it. A common method groups the frames of one GOP (Group of Pictures) of video data into one chunk; for example, the number of samples (frames) in each chunk is set so that the first frame of each chunk is an I picture (I frame). A GOP is a set of pictures that contains at least one I picture and may also contain P pictures (frames) and B pictures (frames). With this method, the number of samples per chunk grows as the interval between I pictures grows. A method has also been proposed that presets a maximum number of frames (samples) per chunk and prevents the number of frames in a chunk from exceeding that maximum (see, for example, Patent Document 1).

In general, a video/audio data processing apparatus such as a codec LSI has limited internal memory, so generating an MP4 file internally is not practical. Therefore, in a system that stores video and audio data encoded by a codec LSI in an MP4 file, the MP4 file is generated by a module outside the codec LSI. For example, the codec LSI separately outputs the data to be stored in the data portion of the MP4 file and the attached information to be stored in the header portion, and the external module combines the two to generate the MP4 file.

Patent Document 1: JP 2004-128938 A

A video/audio data processing apparatus such as a codec LSI outputs video data and audio data alternately, one chunk at a time. While one of the two is being output, the other must wait. If this wait is long, either the memory that temporarily stores video data or the memory that temporarily stores audio data may overflow, and when an overflow occurs, valid data is lost.

The waiting period becomes longer, for example, when chunks contain many samples. With the method that presets a maximum number of samples per chunk, however, chunks may be subdivided unnecessarily even when plenty of memory is free. Because the attached information stored in the header portion of an MP4 file is generated for each chunk, unnecessary subdivision can enlarge the header portion, which in turn enlarges the MP4 file as a whole. Large MP4 files are difficult to handle in PC applications.

An object of the present invention is to reduce overflow of video data and audio data.

In one aspect of the present invention, a video/audio data processing apparatus includes a first storage unit that stores encoded video data, a second storage unit that stores encoded audio data, and a chunk determination unit. The chunk determination unit monitors the occupancy of each of the first and second storage units, and makes the number of frames in a chunk smaller than the number of frames in a GOP when at least one of a first condition (the occupancy of the first storage unit is equal to or greater than a predetermined first threshold) and a second condition (the occupancy of the second storage unit is equal to or greater than a predetermined second threshold) is satisfied.

Overflow of video data and audio data can be reduced.

FIG. 1 shows an example of a video/audio data processing apparatus according to an embodiment.
FIG. 2 shows an overview of the MP4 file format.
FIG. 3 shows an example of data output from the video/audio data processing apparatus shown in FIG. 1.
FIG. 4 shows an example of the relationship between chunks and random access points.
FIG. 5 shows another example of the relationship between chunks and random access points.
FIG. 6 shows an example of chunks when the memory occupancy is equal to or greater than the threshold.
FIG. 7 shows another example of chunks when the memory occupancy is equal to or greater than the threshold.
FIG. 8 shows another example of chunks when the memory occupancy is equal to or greater than the threshold.
FIG. 9 shows another example of chunks when the memory occupancy is equal to or greater than the threshold.
FIG. 10 shows an example of a video/audio data processing apparatus according to another embodiment.
FIG. 11 shows an example of the operation of the video/audio data processing apparatus shown in FIG. 10.

Embodiments are described below with reference to the drawings.

FIG. 1 shows an example of a video/audio data processing apparatus 10 according to an embodiment.

The video/audio data processing apparatus 10 is mounted, for example, in a system that generates MP4 files. The MP4 file format is defined in the MPEG-4 Systems standard (ISO/IEC 14496-1). For example, the apparatus 10 outputs video data and audio data, encoded with a coding method compliant with MPEG-4 (ISO/IEC 14496) or the like, alternately to an external device in units of chunks. The external device then stores the video and audio data received from the apparatus 10 in an MP4 file, thereby generating the MP4 file.

The video/audio data processing apparatus 10 includes a chunk determination unit 20 and a memory 30. The memory 30 sequentially stores video data and audio data encoded with, for example, an MPEG-4-compliant coding method, and the stored data are output sequentially. In other words, the memory 30 temporarily stores the encoded video and audio data.

For example, the memory 30 has a video area 32 that temporarily stores encoded video data and an audio area 34 that temporarily stores encoded audio data. The video area 32 thus functions as a storage unit for encoded video data, and the audio area 34 as a storage unit for encoded audio data. The video area 32 and the audio area 34 may also be provided in separate memories.

The chunk determination unit 20 monitors the occupancy of the video area 32 and the occupancy of the audio area 34. For example, it calculates these occupancies from information about the encoding of the video and audio data. This encoding information, such as the size of the encoded video or audio data, is generated when the data are encoded. Alternatively, the chunk determination unit 20 may calculate the occupancies from the write and read addresses of the video area 32 and the audio area 34 in the memory 30.
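Both ways of measuring occupancy described above can be sketched as follows. This is a minimal illustration, assuming each area is managed as a ring buffer; the class `BufferArea`, its fields, and the helper `occupancy_from_sizes` are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class BufferArea:
    """Hypothetical ring-buffer model of a storage area such as video area 32."""
    capacity: int        # area size in bytes
    write_addr: int = 0  # offset where the next encoded byte is written
    read_addr: int = 0   # offset where the next byte is read for output
    full: bool = False   # distinguishes "full" from "empty" when the addresses match

    def occupancy(self) -> int:
        """Occupancy derived from the write and read addresses (handles wrap-around)."""
        if self.full:
            return self.capacity
        return (self.write_addr - self.read_addr) % self.capacity

    def write(self, nbytes: int) -> None:
        """Store nbytes of encoded data; refuse writes that would overwrite unread data."""
        if nbytes > self.capacity - self.occupancy():
            raise OverflowError("area would overflow and valid data would be lost")
        self.write_addr = (self.write_addr + nbytes) % self.capacity
        if nbytes > 0:
            self.full = self.write_addr == self.read_addr

    def read(self, nbytes: int) -> None:
        """Release nbytes that have been output."""
        if nbytes > self.occupancy():
            raise ValueError("cannot output more data than is stored")
        self.read_addr = (self.read_addr + nbytes) % self.capacity
        if nbytes > 0:
            self.full = False


def occupancy_from_sizes(encoded_sizes, output_sizes) -> int:
    """Alternative: total bytes produced by the encoder minus bytes already output."""
    return sum(encoded_sizes) - sum(output_sizes)


area = BufferArea(capacity=100)
area.write(60)
area.read(50)
area.write(70)             # wraps around the end of the area
print(area.occupancy())    # 80
```

The address-based variant needs the extra `full` flag because equal read and write addresses are otherwise ambiguous; the size-based variant avoids that but must be fed every encode and output event.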

The chunk determination unit 20 then, for example, compares the occupancy of the video area 32 with a predetermined first threshold and the occupancy of the audio area 34 with a predetermined second threshold. When at least one of a first condition (the occupancy of the video area 32 is equal to or greater than the first threshold) and a second condition (the occupancy of the audio area 34 is equal to or greater than the second threshold) is satisfied, the chunk determination unit 20 makes the number of frames (samples) in a chunk smaller than the number of frames in a GOP (Group of Pictures). A GOP is a data unit containing a plurality of pictures, including at least one I picture (I frame).

In this way, when the free space in at least one of the video area 32 and the audio area 34 is small, the chunk determination unit 20 makes the number of frames in a chunk smaller than the number of frames in a GOP. The video/audio data processing apparatus 10 then, for example, multiplexes the encoded video and audio data according to the chunk frame count set by the chunk determination unit 20.
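The threshold rule above reduces to a small decision function. This is a sketch only: the function name and the `reduced_frames` parameter (the shorter chunk length used when an area is filling up) are hypothetical, and returning a full GOP when neither condition holds is an assumption based on the common GOP-per-chunk method mentioned in the background.

```python
def chunk_frame_count(video_occupancy: int, audio_occupancy: int,
                      first_threshold: int, second_threshold: int,
                      gop_frames: int, reduced_frames: int) -> int:
    """Number of frames for the next chunk (hypothetical helper)."""
    if not 0 < reduced_frames < gop_frames:
        raise ValueError("reduced_frames must be smaller than the GOP length")
    first_condition = video_occupancy >= first_threshold    # video area 32 filling up
    second_condition = audio_occupancy >= second_threshold  # audio area 34 filling up
    if first_condition or second_condition:
        return reduced_frames  # fewer frames than one GOP
    return gop_frames          # enough free space: one GOP per chunk


print(chunk_frame_count(90, 10, 80, 50, gop_frames=15, reduced_frames=3))  # 3
print(chunk_frame_count(10, 10, 80, 50, gop_frames=15, reduced_frames=3))  # 15
```

Note that the comparison is "equal to or greater than", so an occupancy exactly at a threshold already triggers the shorter chunk.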

The video and audio data may be encoded either inside the video/audio data processing apparatus 10 or outside it, so the apparatus 10 may or may not have a function for encoding video and audio data. When it does not, the apparatus 10 sequentially receives video and audio data encoded by, for example, an external codec LSI, and sequentially stores them in the video area 32 and the audio area 34.

FIG. 2 shows an overview of the MP4 file format. An MP4 file FIL has, for example, boxes B10, B20, and B30. There is exactly one file type box (ftyp) B10, located at the head of the file; it stores a brand name indicating the structure of the MP4 file FIL. There is exactly one movie box (moov) B20 per file; it stores all header information related to mdat, such as the size and playback duration of each sample, the playback duration of the entire file, and the position of each chunk. The media data box (mdat) B30 stores, for example, the encoded video and audio data. A single file may contain more than one media data box (mdat) B30.
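The box layout of FIG. 2 can be checked by walking the top-level boxes. In the ISO base media file format each box begins with a 32-bit big-endian size (including the header) and a four-character type; a size of 1 means a 64-bit size follows, and a size of 0 means the box runs to the end of the file. The sketch below relies only on that header layout; the helper names and the toy file (an empty moov, which a real player would reject) are illustrative assumptions.

```python
import struct


def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box with a plain 32-bit size header (size includes the 8-byte header)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def iter_top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box of an MP4 byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:    # a 64-bit 'largesize' field follows the type
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - offset
        if size < header:
            raise ValueError(f"corrupt box header at offset {offset}")
        yield raw_type.decode("ascii"), offset, size
        offset += size


def layout_matches_fig2(data: bytes) -> bool:
    """ftyp first, exactly one moov, and at least one mdat."""
    types = [box_type for box_type, _, _ in iter_top_level_boxes(data)]
    return types[:1] == ["ftyp"] and types.count("moov") == 1 and types.count("mdat") >= 1


# Toy file: ftyp (major brand 'isom', minor version 0x200), empty moov, 10-byte mdat.
toy = (make_box("ftyp", b"isom" + struct.pack(">I", 0x200))
       + make_box("moov", b"")
       + make_box("mdat", bytes(10)))
print(list(iter_top_level_boxes(toy)))  # [('ftyp', 0, 16), ('moov', 16, 8), ('mdat', 24, 18)]
print(layout_matches_fig2(toy))         # True
```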

FIG. 3 shows an example of data output from the video/audio data processing apparatus 10 shown in FIG. 1. Shading in the figure marks the video-data chunks CNK and the video-data messages MES. The numbers in parentheses after each message MES and chunk CNK indicate the order of output.

The video/audio data processing apparatus 10 outputs the video data stored in the video area 32 and the audio data stored in the audio area 34 alternately to an external device 100, in units of chunks CNK. A chunk CNK contains a plurality of samples, a sample being the minimum unit for managing video and audio data. For example, one sample of a video chunk CNK is one video frame FRM, and one sample of an audio chunk CNK is one audio frame.

In other words, a chunk CNK contains a plurality of frames FRM. For example, video chunk CNK(1) contains frames FRM(1) to FRM(n), and audio chunk CNK(2) contains frames FRM(n+1) to FRM(m).

The video/audio data processing apparatus 10 also outputs a message MES corresponding to each chunk CNK to the external device 100, for example through an interface separate from the one used to output the video and audio data.

Because an MP4 file manages video and audio data in units of samples (frames FRM) and chunks CNK, the apparatus 10 generates, for example, one message MES for each chunk CNK. A message MES is, for example, header information about its chunk CNK, and the apparatus 10 outputs each message MES in association with its chunk CNK. The broken lines in the figure indicate this association between each message MES and its chunk CNK.
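The alternating output of chunks, each paired with its own message, might be modeled as below. The patent does not define the contents of a message MES beyond "header information about the chunk", so the `Message` fields (sample count, per-sample sizes, chunk size) are assumptions chosen because a file writer would need them; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    track: str               # "video" or "audio"
    sample_sizes: list[int]  # encoded size in bytes of each frame (sample)


@dataclass
class Message:
    """Hypothetical per-chunk header record standing in for a message MES."""
    order: int               # output order, like the parenthesized numbers in FIG. 3
    track: str
    sample_count: int
    sample_sizes: list[int]
    chunk_bytes: int


def interleave(video_chunks, audio_chunks):
    """Alternate video and audio chunks, pairing every chunk with its message.

    zip() stops at the shorter list, so unmatched trailing chunks are not emitted.
    """
    stream = []
    order = 1
    for video, audio in zip(video_chunks, audio_chunks):
        for chunk in (video, audio):
            message = Message(order, chunk.track, len(chunk.sample_sizes),
                              list(chunk.sample_sizes), sum(chunk.sample_sizes))
            stream.append((message, chunk))
            order += 1
    return stream


out = interleave([Chunk("video", [100, 40, 40]), Chunk("video", [90, 30])],
                 [Chunk("audio", [10, 10, 10]), Chunk("audio", [10, 10])])
for message, _ in out:
    print(message.order, message.track, message.sample_count, message.chunk_bytes)
```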

The external device 100 generates an MP4 file from the video and audio data received chunk by chunk from the apparatus 10 and from the messages MES. For example, the external device 100 stores the received messages MES in the movie box (moov) B20 shown in FIG. 2, and stores the received video and audio data in the media data box (mdat) B30 shown in FIG. 2. In an MP4 file, the samples (frames FRM) of the same chunk CNK are placed contiguously in the media data box (mdat) B30.
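Since each chunk's samples are contiguous in mdat, an external device that writes chunks back to back in the order it receives them can derive every chunk's file offset by accumulating chunk sizes. The sketch below assumes the mdat payload begins at a known file offset (`mdat_payload_offset`, for example the mdat box offset plus its 8-byte header); the function name is hypothetical.

```python
def chunk_offsets(chunk_tracks, chunk_sizes, mdat_payload_offset):
    """Per-track file offsets of each chunk when chunks are written back to back
    into mdat in arrival order (the values a chunk offset table would list)."""
    if len(chunk_tracks) != len(chunk_sizes):
        raise ValueError("one size is needed per chunk")
    offsets = {}
    position = mdat_payload_offset
    for track, size in zip(chunk_tracks, chunk_sizes):
        offsets.setdefault(track, []).append(position)
        position += size  # the next chunk starts right after this one
    return offsets


# Alternating video/audio chunks as in FIG. 3, with illustrative byte sizes.
print(chunk_offsets(["video", "audio", "video", "audio"], [180, 30, 120, 20], 1000))
# {'video': [1000, 1210], 'audio': [1180, 1330]}
```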

For this reason, the apparatus 10 outputs the video data and audio data stored in the video area 32 and the audio area 34 alternately, in units of chunks CNK. The output waiting time (the period spent waiting to be output) of the video and audio data can therefore be shortened, for example, by reducing the number of frames per chunk. As described with reference to FIG. 1, the apparatus 10 makes the number of frames per chunk smaller than the number of frames in a GOP when the free space in at least one of the video area 32 and the audio area 34 is small.

That is, in this embodiment, making the chunk frame count smaller than the GOP frame count when the free space in at least one of the video area 32 and the audio area 34 is small shortens the period during which video and audio data wait to be output. This reduces the risk that, while data is being output from one of the two areas (which is the waiting period for the other), the other area overflows. In this embodiment, overflow of video and audio data can thus be reduced.
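To get a feel for the effect, the following rough estimate helps. It assumes, purely for illustration, that the other stream waits for about one chunk's playback duration (chunk frames divided by frame rate), so its backlog grows linearly with the chunk length. The 30 fps frame rate and 128 kbit/s audio bit rate are example values, not figures from the patent.

```python
def backlog_bytes(chunk_frames: int, frame_rate: float, other_bitrate_bps: float) -> float:
    """Rough amount of the other stream's data that accumulates while one chunk
    is collected and output, assuming the wait equals the chunk's duration."""
    return chunk_frames * other_bitrate_bps / (8 * frame_rate)


for frames in (15, 3):  # one 15-frame GOP per chunk vs. a reduced 3-frame chunk
    print(frames, backlog_bytes(frames, 30, 128_000))
# 15 8000.0
# 3 1600.0
```

Under this simplified model, cutting a chunk from 15 frames to 3 reduces the audio backlog that must be buffered by a factor of five.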

FIG. 4 shows an example of the relationship between chunks CNK and random access points RAP, for the case in which the frames FRMv of two GOPs are combined into one chunk CNKv. Frames FRMv drawn with bold lines are IDR pictures and I pictures. FRMv denotes a frame FRM of video data and FRMa a frame FRM of audio data; CNKv denotes a chunk CNK of video data and CNKa a chunk CNK of audio data.

A frame FRMa corresponds to the frame FRMv bearing the same number in parentheses; for example, frame FRMa(1) is the audio frame FRM corresponding to frame FRMv(1). Likewise, a chunk CNKa corresponds to the chunk CNKv bearing the same number; for example, chunk CNKa(1) is the audio chunk CNK corresponding to chunk CNKv(1).

A random access point RAP is a position from which playback of an MP4 file can start or resume when playing from a specific point; for example, random access points RAP are searched for during fast-forward playback. The video frame FRMv (sample) at a random access point RAP is the frame FRM of an IDR picture or I picture, which needs no reference picture to be decoded. The audio frame FRMa (sample) at a random access point RAP is the audio frame FRM corresponding to that IDR picture or I picture.

For example, frames FRMv(1), FRMv(n+1), and FRMv(m+1) are random access points RAP of the video data, and frames FRMa(1), FRMa(n+1), and FRMa(m+1) are random access points RAP of the audio data. Random access point information is stored in a box called “stss” within the movie box (moov) B20 shown in FIG. 2.

The information stored in the “stss” box (the random access point information) is the sample number (the position counted from the head of the file) of each frame FRM corresponding to a random access point RAP. For video data, for example, the sample numbers of frames FRMv(1), FRMv(n+1), and so on (“1”, “n+1”, and so on) are stored in the “stss” box. For audio data, for example, the sample numbers of frames FRMa(1), FRMa(n+1), and so on (“number of frames in chunk CNKv(1) + 1”, “number of frames in chunk CNKv(1) + n + 1”, and so on) are stored in the “stss” box.
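For the video track, the sync-sample list is simply the 1-based positions of the I and IDR frames. This sketch numbers samples within a single video track, as in the per-track sample tables of the ISO base media file format. Representing frames as strings such as "I" or "P" is an assumption made for illustration.

```python
def sync_sample_numbers(frame_types):
    """1-based sample numbers of frames that need no reference picture (I or IDR)."""
    return [number for number, kind in enumerate(frame_types, start=1)
            if kind in ("I", "IDR")]


frames = ["IDR", "P", "P", "P", "P", "I", "P", "P", "P", "P"]  # two 5-frame GOPs
print(sync_sample_numbers(frames))  # [1, 6]
```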

FIG. 5 shows another example of the relationship between chunks CNK and random access points RAP, for the case in which the frames FRMv of one GOP are combined into one chunk CNKv. Bold lines, and the numbers in parentheses after frames FRM and chunks CNK, have the same meaning as in FIG. 4.

In the example of FIG. 5, because the frames FRMv of one GOP form one chunk CNKv, the first frame FRM of each chunk CNK is a random access point RAP. In an MP4 file, the position of the first frame FRM of each chunk CNK (its offset from the head of the file) is stored in a box called “stco” within the movie box (moov) B20 shown in FIG. 2. Therefore, in an MP4 file in which each chunk CNK corresponds to one GOP, random access points RAP can be found efficiently.
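Why GOP-aligned chunks make seeking cheap can be seen in the following sketch: when every chunk starts at a sync sample, finding where to start playback is a binary search over the chunks' first-sample numbers. A real file spreads this information over the sample-to-chunk (stsc) and chunk offset (stco) tables; flattening it into a list of per-chunk frame counts is a simplifying assumption, and the function names are hypothetical.

```python
import bisect


def chunk_first_samples(chunk_frame_counts):
    """1-based sample number of the first sample in each chunk."""
    firsts, next_sample = [], 1
    for count in chunk_frame_counts:
        firsts.append(next_sample)
        next_sample += count
    return firsts


def chunk_for_seek(chunk_frame_counts, target_sample):
    """0-based index of the last chunk starting at or before target_sample."""
    firsts = chunk_first_samples(chunk_frame_counts)
    if not firsts or target_sample < firsts[0]:
        raise ValueError("target sample precedes the first chunk")
    return bisect.bisect_right(firsts, target_sample) - 1


counts = [15, 15, 15]  # one 15-frame GOP per chunk, as in FIG. 5
i = chunk_for_seek(counts, 20)
print(i, chunk_first_samples(counts)[i])  # 1 16
```

Because each chunk here begins with an I picture, sample 16 is itself a random access point, so the player can start decoding at that chunk's stored offset without scanning individual samples.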

Accordingly, when the free space in the video area 32 and the audio area 34 is large, for example, the chunk determination unit 20 sets the number of frames FRM in each chunk CNK so that each chunk CNK corresponds to one GOP. When it makes the number of frames FRM in a chunk CNK smaller than the number of frames in a GOP, the chunk determination unit 20 sets the number of frames FRM in each chunk CNK so that each random access point RAP falls on the first frame FRM of a chunk CNK.

For example, the chunk determination unit 20 sets the number of frames FRM in each chunk CNK so that each I picture or IDR picture is the first frame FRM of a chunk CNK. This embodiment thereby makes random access points RAP faster to find; for example, a device that plays back the MP4 file can locate random access points RAP efficiently.
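The boundary rule can be sketched as a single pass over the frames in coding order: a new chunk always begins at an I or IDR frame, and an optional cap (`max_frames`, a hypothetical stand-in for the reduced chunk length) splits a GOP further when memory is filling up. This sketch ignores B-picture reordering, so for an IBBP GOP the reordering constraint described in the document would also have to be respected.

```python
def split_into_chunks(frame_types, max_frames=None):
    """Group frames (in coding order) into chunks whose boundaries include every RAP.

    A new chunk always starts at an I or IDR frame. If max_frames is given,
    a GOP is further split into chunks of at most max_frames frames.
    """
    chunks = []
    for kind in frame_types:
        at_rap = kind in ("I", "IDR")
        too_long = max_frames is not None and chunks and len(chunks[-1]) >= max_frames
        if not chunks or at_rap or too_long:
            chunks.append([])
        chunks[-1].append(kind)
    return chunks


frames = ["I", "P", "P", "P", "P", "I", "P", "P"]
print([len(c) for c in split_into_chunks(frames)])                # [5, 3]
print([len(c) for c in split_into_chunks(frames, max_frames=2)])  # [2, 2, 1, 2, 1]
```

With the cap, only the first chunk of each GOP starts at a random access point, which matches the requirement that every random access point be the head of some chunk.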

In addition, when it makes the number of frames FRM in a chunk CNK smaller than the number of frames in a GOP, the chunk determination unit 20 sets the number of frames FRM in each chunk CNK so that the relationship between the display order during playback of the MP4 file and the encoding order is closed within each chunk CNK. In other words, it chooses the chunk length so that the frames FRM can be rearranged into display order within each chunk CNK. This embodiment thereby makes the frames FRM to be reordered easier to find; for example, a device that plays back the MP4 file can efficiently locate the frames FRM whose order must be changed when arranging them in display order.
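The "closed within each chunk" condition can be stated precisely: the frames of each chunk, taken in coding order, must occupy exactly the next block of display positions. The sketch below checks that condition; the function name and the 6-frame permutation in the usage example are hypothetical.

```python
def reorder_closed(display_pos, chunk_sizes):
    """True if every chunk (a run of frames in coding order) can be put into
    display order on its own, i.e. it fills exactly the next block of display
    positions.

    display_pos[k] is the 0-based display position of the k-th coded frame.
    """
    start = 0
    for size in chunk_sizes:
        block = display_pos[start:start + size]
        if sorted(block) != list(range(start, start + size)):
            return False
        start += size
    return start == len(display_pos)


# Hypothetical 6-frame example: coded as I B B P B B, displayed as B B I B B P.
display_pos = [2, 0, 1, 5, 3, 4]
print(reorder_closed(display_pos, [3, 3]))  # True
print(reorder_closed(display_pos, [2, 4]))  # False: the I frame's display slot is split
```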

FIG. 6 shows an example of chunks CNK when the occupancy of the memory 30 is equal to or greater than the threshold, that is, when at least one of the first condition (the occupancy of the video area 32 is equal to or greater than the first threshold) and the second condition (the occupancy of the audio area 34 is equal to or greater than the second threshold) is satisfied. FIG. 6 shows video chunks CNKv for a GOP with an IBBP structure. Frames FRMv drawn with bold lines are I pictures, and the numbers in parentheses after frames FRMv indicate the encoding order.

In the example of FIG. 6, the GOP has 15 frames FRM. In an IBBP GOP, the I-picture frame FRMv(1) is encoded first, followed by the B-picture frames FRMv(2) and FRMv(3). The sequence of P picture, B picture, and B picture is then repeated through frames FRMv(4) to FRMv(15). For example, the P-picture frame FRMv(4) is encoded after frame FRMv(3), and the B-picture frames FRMv(5) and FRMv(6) are then encoded after frame FRMv(4).

In an IBBP GOP, moreover, when the MP4 file is played back, each B-picture frame FRMv is displayed before the I-picture or P-picture frame FRMv that was encoded before it. For example, the I-picture frame FRMv(1) is displayed after the B-picture frame FRMv(3), and the P-picture frame FRMv(4) is displayed after the B-picture frame FRMv(6).
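The ordering stated above (each I or P anchor is displayed after the two B pictures encoded right after it) can be written as a mapping from encoding order to display order. This sketch follows only that stated pattern; the function name is hypothetical, and other GOP structures would need a different mapping.

```python
def ibbp_display_order(num_frames):
    """Encoding-order indices (1-based) listed in display order, following the
    stated IBBP pattern: each anchor (I or P) is shown after the two B
    pictures encoded right after it."""
    order = []
    for g in range(0, num_frames, 3):
        group = list(range(g + 1, min(g + 3, num_frames) + 1))  # anchor, B, B
        order.extend(group[1:] + group[:1])                      # B, B, then anchor
    return order


print(ibbp_display_order(6))   # [2, 3, 1, 5, 6, 4]
print(ibbp_display_order(15))  # [2, 3, 1, 5, 6, 4, 8, 9, 7, 11, 12, 10, 14, 15, 13]
```

Each anchor trades places only with the two frames that follow it in encoding order, so every run of three frames can be reordered on its own.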

Therefore, for an IBBP GOP, the number of frames FRMv in a chunk CNKv is set to a multiple of 3. For example, the chunk determination unit 20 sets it to the smallest multiple of 3 that is at least 2 (“3” in FIG. 6). Frames FRMv(1), FRMv(2), and FRMv(3) are then managed as chunk CNKv(1), frames FRMv(4), FRMv(5), and FRMv(6) as chunk CNKv(2), and so on through frames FRMv(13), FRMv(14), and FRMv(15), which are managed as chunk CNKv(5).

In this way, the chunk determination unit 20 chooses the number of frames in a chunk CNKv from among the frame counts that allow the frames FRMv to be rearranged, entirely within the chunk CNKv, into the display order used when the MP4 file is played back. It uses the smallest such count that is 2 or more. The number of frames in a chunk CNKv is not limited to this example. For instance, one GOP may be managed as three chunks CNKv: frames FRMv(1) to FRMv(6), frames FRMv(7) to FRMv(12), and frames FRMv(13) to FRMv(15).
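The selection rule can be expressed as a search. Try chunk sizes from 2 upward, and keep the first one for which no frame has to cross a chunk boundary when it is reordered into display order. This is a hedged sketch with hypothetical helper names. The display orders below are built from the FIG. 6 (IBBP) and FIG. 9 (IBP) descriptions.

```python
def reorder_closed(display_order: list, chunk_frames: int) -> bool:
    """True if each chunk of consecutive encoding-order frames holds exactly the
    frames shown at the same positions in display order, i.e. reordering never
    crosses a chunk boundary. Frame numbers are 1-based in encoding order."""
    n = len(display_order)
    for start in range(0, n, chunk_frames):
        end = min(start + chunk_frames, n)
        encoded = set(range(start + 1, end + 1))
        shown = set(display_order[start:end])
        if encoded != shown:
            return False
    return True


def min_chunk_frames(display_order: list) -> int:
    """Smallest chunk size of 2 or more for which reordering stays inside chunks."""
    for k in range(2, len(display_order) + 1):
        if reorder_closed(display_order, k):
            return k
    return len(display_order)  # fall back to one chunk per GOP


# Display orders built from the FIG. 6 (IBBP) and FIG. 9 (IBP) descriptions.
ibbp_15 = [f for a in range(1, 16, 3) for f in (a + 1, a + 2, a)]
ibp_16 = [f for a in range(1, 17, 2) for f in (a + 1, a)]
print(min_chunk_frames(ibbp_15), min_chunk_frames(ibp_16))  # 3 2
```

In an IPPP GOP the display order equals the encoding order, so every size of 2 or more passes this check and the sketch returns 2. The choice of 3 for the 15-frame GOP in FIG. 7 comes from the preference for equal-sized chunks, not from reordering.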

The audio data chunk CNKa is set to correspond to the video data chunk CNKv. For example, the chunk determination unit 20 gives a chunk CNKa the same number of frames FRMa as the corresponding chunk CNKv has frames FRMv.

FIG. 7 shows another example of the chunks CNK used when the occupancy of the memory 30 is equal to or greater than the threshold. Specifically, FIG. 7 shows an example of the chunks CNKv when the GOP has the IPPP structure and an odd number of frames. Bold-outlined frames FRMv and the numbers in parentheses after FRMv mean the same as in FIG. 6.

In the example of FIG. 7, the GOP has 15 frames FRM. In a GOP with the IPPP structure, the I-picture frame FRMv(1) is encoded first, and then the P-picture frames FRMv(2) to FRMv(15) are encoded in sequence. In a GOP with the IPPP structure, the display order during MP4 playback is the same as the encoding order.

Accordingly, in a GOP with the IPPP structure, the chunk determination unit 20 sets the number of frames FRMv in a chunk CNKv to 3, for example. All chunks CNKv in one GOP then contain the same number of frames. Frames FRMv(1), FRMv(2), and FRMv(3) are managed as chunk CNKv(1), frames FRMv(4), FRMv(5), and FRMv(6) as chunk CNKv(2), and so on, up to frames FRMv(13), FRMv(14), and FRMv(15), which are managed as chunk CNKv(5).

The number of frames FRMv in a chunk CNKv may also be set to any value of 2 or more other than 3. For example, the chunk determination unit 20 may set it to 5. In this embodiment, keeping the number of frames FRMv per chunk CNKv at 2 or more keeps the MP4 file from growing in size.

This is because a message MES is generated for each chunk CNK, as described with reference to FIG. 3. If the number of frames FRMv in a chunk CNKv were set to 1, the number of messages MES would increase, and so would their total size. The messages MES are stored in the movie box (moov) B20 shown in FIG. 2, so the movie box (moov) B20, and therefore the MP4 file, would grow. Because this embodiment sets the number of frames FRMv per chunk CNKv to 2 or more, it limits this growth in the size of the MP4 file.
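The effect on the message count is simple arithmetic. The sketch below assumes one message MES per chunk, as stated above, and counts messages for a 15-frame GOP at several chunk sizes. The text does not give the byte size of each message in the movie box, so only counts are shown.

```python
import math


def messages_per_gop(gop_frames: int, chunk_frames: int) -> int:
    """One message MES is generated per chunk, so this is the chunk count per GOP."""
    return math.ceil(gop_frames / chunk_frames)


for k in (1, 3, 15):
    print(k, messages_per_gop(15, k))
# 1 15
# 3 5
# 15 1
```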

FIG. 8 shows another example of the chunks CNK used when the occupancy of the memory 30 is equal to or greater than the threshold. Specifically, FIG. 8 shows an example of the chunks CNKv when the GOP has the IPPP structure and an even number of frames. Bold-outlined frames FRMv and the numbers in parentheses after FRMv mean the same as in FIG. 6.

In the example of FIG. 8, the GOP has 16 frames FRM. In a GOP with the IPPP structure, the I-picture frame FRMv(1) is encoded first, and then the P-picture frames FRMv(2) to FRMv(16) are encoded in sequence. The display order during MP4 playback is the same as the encoding order.

Accordingly, in a GOP with the IPPP structure, the chunk determination unit 20 sets the number of frames FRMv in a chunk CNKv to 2, for example. All chunks CNKv in one GOP then contain the same number of frames. Frames FRMv(1) and FRMv(2) are managed as chunk CNKv(1), frames FRMv(3) and FRMv(4) as chunk CNKv(2), frames FRMv(5) and FRMv(6) as chunk CNKv(3), and so on, up to frames FRMv(15) and FRMv(16), which are managed as chunk CNKv(8).

The number of frames FRMv in a chunk CNKv may also be set to a value other than 2, as long as it is 2 or more. For example, the chunk determination unit 20 may set it to 4 or to 8. Because this embodiment keeps the number of frames FRMv per chunk CNKv at 2 or more, growth in the size of the MP4 file is suppressed.

FIG. 9 shows another example of the chunks CNK used when the occupancy of the memory 30 is equal to or greater than the threshold. Specifically, FIG. 9 shows an example of the chunks CNKv when the GOP has the IBP structure. Bold-outlined frames FRMv and the numbers in parentheses after FRMv mean the same as in FIG. 6.

In the example of FIG. 9, the GOP has 16 frames FRM. In a GOP with the IBP structure, the I-picture frame FRMv(1) is encoded first, followed by the B-picture frame FRMv(2). Encoding then repeats in the order P picture, B picture through frames FRMv(3) to FRMv(16). For example, the P-picture frame FRMv(3) is encoded after frame FRMv(2), and the B-picture frame FRMv(4) is encoded after frame FRMv(3).

In a GOP with the IBP structure, when the MP4 file is played back, each B-picture frame FRMv is displayed before the I-picture or P-picture frame FRMv that was encoded ahead of it. For example, the I-picture frame FRMv(1) is displayed immediately after the B-picture frame FRMv(2), and the P-picture frame FRMv(3) is displayed immediately after the B-picture frame FRMv(4).

Therefore, in a GOP with the IBP structure, the number of frames FRMv in a chunk CNKv is set to a multiple of 2. For example, the chunk determination unit 20 sets it to the smallest multiple of 2 that is 2 or more (“2” in FIG. 9). As a result, frames FRMv(1) and FRMv(2) are managed as chunk CNKv(1), frames FRMv(3) and FRMv(4) as chunk CNKv(2), frames FRMv(5) and FRMv(6) as chunk CNKv(3), and so on, up to frames FRMv(15) and FRMv(16), which are managed as chunk CNKv(8).

The number of frames FRMv in a chunk CNKv may also be set to a multiple of 2 other than 2. For example, the chunk determination unit 20 may set it to 4 or to 8. Because this embodiment keeps the number of frames FRMv per chunk CNKv at 2 or more, growth in the size of the MP4 file is suppressed.

As described with reference to FIGS. 6 to 9, the chunk determination unit 20 sometimes makes the number of frames FRM in a chunk CNK smaller than the number of frames in the GOP. In that case, it sets the number of frames per chunk CNK based on the GOP structure and the GOP size (number of frames).
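The FIG. 6 to 9 examples can be collected into one lookup. The sketch below is an assumption-laden summary, not the patent's algorithm. The structure labels are plain strings. The IPPP rule (use the smallest divisor of the GOP size that is 2 or more, falling back to 2) is inferred from the 15-frame and 16-frame examples.

```python
def chunk_frames_for(structure: str, gop_frames: int) -> int:
    """Hypothetical reduced chunk size for one GOP, following the FIG. 6 to 9 examples."""
    if structure == "IBBP":
        return 3  # smallest multiple of 3 that keeps B-frame reordering inside a chunk
    if structure == "IBP":
        return 2  # smallest multiple of 2
    if structure == "IPPP":
        # No reordering: pick the smallest divisor >= 2 so all chunks are equal in size
        for d in range(2, gop_frames):
            if gop_frames % d == 0:
                return d
        return 2  # assumption for prime GOP sizes: the last chunk is shorter
    raise ValueError(f"unsupported GOP structure: {structure}")


print(chunk_frames_for("IPPP", 15), chunk_frames_for("IPPP", 16))  # 3 2
```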

The operation of the video/audio data processing device 10 is not limited to this example. For example, the chunk determination unit 20 may change the number of frames in a chunk CNK in steps according to the free space in the video area 32 and the audio area 34. In that case, at least one of the first threshold for the video area 32 and the second threshold for the audio area 34 may have multiple values. The chunk determination unit 20 may then change the number of frames in a chunk CNK in steps according to which of those values the occupancy reaches.

For example, the chunk determination unit 20 may set multiple values for the threshold of the video area 32. It then changes the number of frames in a chunk CNK in steps according to how the occupancy of the video area 32 compares with those values. Alternatively, the chunk determination unit 20 may set multiple threshold values for the audio area 34 and change the number of frames in a chunk CNK in steps according to how the occupancy of the audio area 34 compares with them.
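One way to realize the stepwise variant is a table of threshold levels. The thresholds and chunk sizes below are hypothetical values chosen for a 15-frame IBBP GOP. They are all multiples of 3, so reordering stays within chunks. The patent does not specify these values.

```python
def stepwise_chunk_frames(occupancy: float, levels: list, gop_frames: int) -> int:
    """Pick a chunk size from a list of (threshold, chunk_frames) pairs.

    occupancy and thresholds are fractions of the area size; the highest threshold
    that the occupancy reaches wins. Below every threshold, chunks span a full GOP.
    """
    chosen = gop_frames
    for threshold, frames in sorted(levels):
        if occupancy >= threshold:
            chosen = frames
    return chosen


LEVELS = [(0.50, 6), (0.75, 3)]  # hypothetical thresholds for a 15-frame IBBP GOP
for occ in (0.3, 0.6, 0.8):
    print(occ, stepwise_chunk_frames(occ, LEVELS, 15))
# 0.3 15
# 0.6 6
# 0.8 3
```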

As described above, in this embodiment, the video/audio data processing device 10 makes the number of frames per chunk smaller than the number of frames per GOP when the free space in at least one of the video area 32 and the audio area 34 is small. For example, the number of frames in a chunk CNK is set to a number that satisfies two conditions. First, the mapping between the display order during MP4 playback and the encoding order is closed within the chunk CNK. Second, an I picture or IDR picture is the first frame FRM of the chunk CNK. This embodiment therefore makes it more efficient to search for random access points RAP and for the frames FRM whose order must be changed. As a result, a device that plays back the MP4 file, for example, can locate random access points RAP efficiently. When it rearranges frames FRM into display order, it can also efficiently find the frames FRM that need to be reordered.

FIG. 10 shows an example of a video/audio data processing device 12 according to another embodiment. Elements identical to those described in the embodiment above are given the same reference numerals, and their detailed description is omitted. The video/audio data processing device 12 is installed, for example, in a system that generates MP4 files.

The video/audio data processing device 12 includes, for example, a system unit 22, a memory 30, a data encoding unit 40, and a multiplexing unit 50. The memory 30 includes, for example, a video area 32, an audio area 34, a setting area 36, and an attached information area 38. The video area 32 and the audio area 34 are the same as in the embodiment described above. The setting area 36 stores, for example, the GOP structure specified by an external host or the like. The attached information area 38 stores header information and other data for the encoded video data VDA and audio data ADA. The attached information area 38 may be split into one part for the video data VDA and another for the audio data ADA.

The data encoding unit 40, for example, receives the video data VDA and audio data ADA sequentially and encodes them in sequence using an encoding scheme compliant with MPEG-4 or the like. For example, the data encoding unit 40 includes an encoding control unit 42, a video encoding unit 44, and an audio encoding unit 46.

The encoding control unit 42 reads the GOP structure and other settings specified by the external host or the like from the setting area 36 of the memory 30. It applies them to the encoding performed by the video encoding unit 44 and the audio encoding unit 46. The encoding control unit 42 also stores, for example, header information on the encoding of the video data VDA in the attached information area 38 of the memory 30, in synchronization with the encoding of each video frame by the video encoding unit 44. This header information includes, for example, the GOP structure, the picture type, and the size of the video frame. In addition, when encoding of one video frame is complete, the encoding control unit 42 notifies the system control unit 26 of the system unit 22, for example.

Similarly, the encoding control unit 42 stores, for example, header information on the encoding of the audio data ADA in the attached information area 38 of the memory 30, in synchronization with the encoding of each audio frame by the audio encoding unit 46. When encoding of one audio frame is complete, the encoding control unit 42 notifies the system control unit 26 of the system unit 22, for example.

The video encoding unit 44, for example, receives the video data VDA sequentially and encodes it using an encoding scheme compliant with MPEG-4 or the like. This produces encoded video data VED (the encoded form of the video data VDA). The video encoding unit 44 encodes the video data VDA based on, for example, the GOP structure and other settings that the encoding control unit 42 obtained from the setting area 36 of the memory 30. The video encoding unit 44 then stores the encoded video data VED in the video area 32 of the memory 30.

The audio encoding unit 46, for example, receives the audio data ADA sequentially and encodes it using an encoding scheme compliant with MPEG-4 or the like. This produces encoded audio data AED (the encoded form of the audio data ADA). The audio encoding unit 46 then stores the encoded audio data AED in the audio area 34 of the memory 30.

The system unit 22, for example, controls the multiplexing unit 50 and generates the messages MES. For example, the system unit 22 includes a chunk determination unit 24, a system control unit 26, and a message generation unit 28. The encoding control unit 42 notifies the system control unit 26 when encoding of one video frame is complete. On receiving that notification, the system control unit 26 reads the header information on the encoding of the video data VDA from the attached information area 38 of the memory 30. In other words, each time a video frame finishes encoding, the system control unit 26 obtains the GOP structure, the picture type, the size of that video frame, and so on. The system control unit 26 then passes this header information to the chunk determination unit 24 and the message generation unit 28, for example.

Likewise, the encoding control unit 42 notifies the system control unit 26 when encoding of one audio frame is complete. On receiving that notification, the system control unit 26 reads the header information on the encoding of the audio data ADA from the attached information area 38 of the memory 30. It then passes this header information to the chunk determination unit 24 and the message generation unit 28, for example.

The chunk determination unit 24 monitors the occupancy of the video area 32 and of the audio area 34. When the free space in at least one of them is small, the chunk determination unit 24 sets the number of frames per chunk based on the GOP structure, picture type, and similar information, as described with reference to FIGS. 6 to 9.

For example, the chunk determination unit 24 sets the number of frames per chunk based on the header information on the encoding of the video data VDA and of the audio data ADA received from the system control unit 26. The chunk determination unit 24 then reports the chunk frame count it has set (hereinafter also called the chunk setting notification) to the multiplexing control unit 52 of the multiplexing unit 50, via the system control unit 26.

The message generation unit 28 generates, for example, the message MES corresponding to each chunk based on the information received from the system control unit 26 (GOP structure, picture type, size of one video frame, etc.). It then outputs each message MES to an external destination (for example, the external device 100 shown in FIG. 3) in synchronization with the output of the corresponding chunk.

The multiplexing unit 50 multiplexes the encoded video data VED stored in the video area 32 with the encoded audio data AED stored in the audio area 34. It outputs the result to an external destination (for example, the external device 100 shown in FIG. 3). For example, the multiplexing unit 50 includes a multiplexing control unit 52, an input unit 54, a multiplexing buffer 56, an encryption unit 58, and an output buffer 60.

The multiplexing control unit 52 controls the input unit 54 and other blocks based on, for example, information from the system unit 22. For example, the multiplexing control unit 52 starts the input unit 54 in response to a chunk setting notification from the system control unit 26. The input unit 54, for example, alternates between reading one chunk of encoded video data VED from the video area 32 and reading one chunk of encoded audio data AED from the audio area 34. The encoded video data VED and encoded audio data AED read from these areas are output in sequence to the multiplexing buffer 56.
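The alternating read performed by the input unit 54 can be sketched as follows. Frames are represented by labels. Audio chunks are assumed to use the same frame count as video chunks, as in the earlier embodiment. Encryption and buffering are omitted, and `interleave_chunks` is hypothetical.

```python
from collections import deque


def interleave_chunks(video_frames: list, audio_frames: list, chunk_frames: int) -> list:
    """Alternately take one chunk of video frames and one chunk of audio frames."""
    video, audio = deque(video_frames), deque(audio_frames)
    stream = []
    while video or audio:
        for kind, queue in (("V", video), ("A", audio)):
            take = min(chunk_frames, len(queue))
            chunk = [queue.popleft() for _ in range(take)]
            if chunk:
                stream.append((kind, chunk))
    return stream


video = [f"v{i}" for i in range(1, 7)]
audio = [f"a{i}" for i in range(1, 7)]
for kind, chunk in interleave_chunks(video, audio, 3):
    print(kind, chunk)
# V ['v1', 'v2', 'v3']
# A ['a1', 'a2', 'a3']
# V ['v4', 'v5', 'v6']
# A ['a4', 'a5', 'a6']
```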

The multiplexing buffer 56 passes the encoded video data VED and encoded audio data AED received from the input unit 54 to the encryption unit 58 in sequence. The encryption unit 58 therefore alternately receives one chunk of encoded video data VED and one chunk of encoded audio data AED, and encrypts them in sequence. The encryption scheme and related settings are specified in advance by, for example, the system unit 22. The encryption unit 58 may receive this information from the system control unit 26 of the system unit 22 via the multiplexing control unit 52.

The encoded video data VED and encoded audio data AED encrypted by the encryption unit 58 are output to the output buffer 60 in sequence. That is, the output buffer 60 alternately receives one encrypted chunk of encoded video data VED and one encrypted chunk of encoded audio data AED, and outputs them in the same alternating order.

In other words, the output buffer 60 outputs stream data SDA, in which chunks of encoded video data VED and chunks of encoded audio data AED are multiplexed, to an external destination (for example, the external device 100 shown in FIG. 3). An external device that receives the stream data SDA and the messages MES, for example, stores them in an MP4 file, thereby generating the MP4 file.

In this way, the video/audio data processing device 12 functions as an encoder compliant with MPEG-4 or the like. The configuration of the video/audio data processing device 12 is not limited to this example. For example, it may also be able to decode encoded video and audio data, so that it functions as a codec or a transcoder.

FIG. 11 shows an example of the operation of the video/audio data processing device 12 shown in FIG. 10. The operation of FIG. 11 may be implemented in hardware alone, or by hardware controlled by software.

In process S100, the data encoding unit 40 encodes the input data (video data VDA and audio data ADA) and writes the encoded data to the memory 30. For example, the video encoding unit 44 of the data encoding unit 40 encodes the video data VDA and writes the resulting encoded video data VED to the video area 32 of the memory 30. Likewise, the audio encoding unit 46 of the data encoding unit 40 encodes the audio data ADA and writes the resulting encoded audio data AED to the audio area 34 of the memory 30.

In process S110, the chunk determination unit 24 determines whether the occupancy of the video area 32 or of the audio area 34 is equal to or greater than its threshold. For example, the chunk determination unit 24 calculates the occupancy of the video area 32 from the size of each encoded video frame reported by the system control unit 26. Alternatively, the chunk determination unit 24 may calculate the occupancy of the video area 32 and the audio area 34 from the write address and read address of each area.
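Suppose the video area 32 and audio area 34 are managed as circular buffers. This is an assumption: the text only says that occupancy may be derived from the write and read addresses. The occupancy can then be computed from the two addresses as below. The `full` flag is a hypothetical addition, needed because equal addresses can mean either an empty or a full buffer.

```python
def ring_occupancy(write_addr: int, read_addr: int, size: int, full: bool = False) -> int:
    """Bytes currently stored in a circular storage area of `size` bytes.

    Equal addresses are ambiguous (empty or full), so a separate `full` flag is assumed.
    """
    if write_addr == read_addr:
        return size if full else 0
    return (write_addr - read_addr) % size


print(ring_occupancy(300, 100, 1024))  # 200
print(ring_occupancy(100, 900, 1024))  # 224 (write pointer has wrapped around)
```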

The chunk determination unit 24 then compares the calculated occupancy with the threshold for the video area 32 (hereinafter also called the first threshold). The first threshold is preset based on, for example, the size of the video area 32. The occupancy of the audio area 34 is compared with its threshold (hereinafter also called the second threshold) in the same way. The second threshold is preset based on, for example, the size of the audio area 34.

The first condition is that the occupancy of the video area 32 is equal to or greater than the first threshold. The second condition is that the occupancy of the audio area 34 is equal to or greater than the second threshold. When at least one of these conditions is satisfied (Yes in process S110), the chunk determination unit 24 proceeds to process S130. That is, processing proceeds to process S130 when the free space in at least one of the video area 32 and the audio area 34 is small. When neither condition is satisfied (No in process S110), the chunk determination unit 24 proceeds to process S120. That is, processing proceeds to process S120 when both the video area 32 and the audio area 34 have ample free space.

In process S120, the chunk determination unit 24 sets the number of frames per chunk to the number of frames in one GOP. The chunk frame count set in process S120 (chunk setting notification) is sent to the multiplexing unit 50, for example. Thus, when both the video area 32 and the audio area 34 have ample free space, each chunk has as many frames as one GOP. When the free space in at least one of them is small, processes S130 and S140 set the number of frames per chunk to fewer than the number of frames in one GOP.
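Processes S110 to S140 reduce to a single decision. In this sketch the occupancies and thresholds are plain byte counts. `reduced_frames` stands for the value “j” chosen in process S130, for example from the GOP structure as in FIGS. 6 to 9. All names are illustrative.

```python
def decide_chunk_frames(video_occ: int, first_threshold: int,
                        audio_occ: int, second_threshold: int,
                        gop_frames: int, reduced_frames: int) -> int:
    """Sketch of processes S110 to S140 for one GOP."""
    first_condition = video_occ >= first_threshold    # video area 32 nearly full
    second_condition = audio_occ >= second_threshold  # audio area 34 nearly full
    if first_condition or second_condition:           # S110: Yes
        return reduced_frames                         # S130/S140: j < GOP frames
    return gop_frames                                 # S120: one chunk per GOP


print(decide_chunk_frames(700, 768, 50, 96, 15, 3))   # 15: neither threshold reached
print(decide_chunk_frames(800, 768, 50, 96, 15, 3))   # 3: first condition holds
print(decide_chunk_frames(100, 768, 100, 96, 15, 3))  # 3: second condition holds
```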

In process S130, the chunk determination unit 24 determines the number of frames per chunk (“j”) based on, for example, the information received from the system control unit 26 (GOP structure, picture type, size of one video frame, etc.). In process S140, the chunk determination unit 24 sets the number of frames per chunk to the value “j” determined in process S130. For example, the chunk determination unit 24 sends this chunk frame count (chunk setting notification) to the multiplexing unit 50.

In step S150, the multiplexing unit 50, for example, sequentially outputs encoded data (encoded video data VED or encoded audio data AED) according to the chunk set in step S120 or S140. The system control unit 26, for example, then determines whether the encoded data for one chunk has been completely output. If it has (Yes in step S150), the message generation unit 28 outputs a message MES corresponding to the output chunk in step S180. If it has not (No in step S150), the video/audio data processing device 12 proceeds to step S160.

In step S160, the system control unit 26, for example, determines whether either the video area 32 or the audio area 34 of the memory 30 has overflowed. This determination may instead be made by a module other than the system control unit 26, such as the chunk determination unit 24. If either area has overflowed (Yes in step S160), the data in the overflowed area (the video area 32 or the audio area 34) is deleted in step S170. If the memory 30 has not overflowed (No in step S160), the video/audio data processing device 12 returns to step S150.
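The loop of steps S150 to S180 can be simulated roughly as follows. This is a toy model under stated assumptions, not the device's implementation. In each iteration, the multiplexing unit 50 outputs one queued frame from each area, and the encoders deliver the frame sizes listed in `arrivals`. All names are hypothetical. The source does not say where the flow goes after S170, so this sketch simply continues with S150.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Region:
    """Hypothetical model of video area 32 or audio area 34 in memory 30."""
    name: str
    capacity: int                                 # bytes
    frames: deque = field(default_factory=deque)  # queued frame sizes (bytes)

    def occupied(self) -> int:
        return sum(self.frames)


def output_one_chunk(chunk_frames: int, video: Region, audio: Region,
                     arrivals: Iterator[tuple[list[int], list[int]]],
                     log: list[str]) -> None:
    """Sketch of the S150 -> S160 (-> S170) -> S150 loop, ending in S180."""
    output = 0
    while True:
        # S150: output the next piece of the chunk, then test completion
        for region in (video, audio):
            if region.frames:
                region.frames.popleft()
        output += 1
        if output >= chunk_frames:
            log.append(f"S180: MES for a {chunk_frames}-frame chunk")
            return
        # The encoders keep writing while the chunk is being output
        new_video, new_audio = next(arrivals)
        video.frames.extend(new_video)
        audio.frames.extend(new_audio)
        # S160: has either area overflowed?  S170: delete that area's data
        for region in (video, audio):
            if region.occupied() > region.capacity:
                log.append(f"S170: {region.name} overflow, data deleted")
                region.frames.clear()


if __name__ == "__main__":
    video = Region("video area 32", capacity=100)
    audio = Region("audio area 34", capacity=100)
    arrivals = iter([([30, 30], [10]), ([30, 30], [10]), ([30, 30], [10])])
    log: list[str] = []
    output_one_chunk(4, video, audio, arrivals, log)
    print(log)
```

With these made-up numbers, the video encoder writes faster than one frame per iteration is drained. The video area therefore overflows once (S170) before the 4-frame chunk completes (S180). A shorter chunk would reach S180 sooner, which is the motivation for steps S130 and S140.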

As described above, the video/audio data processing device 12 makes the number of frames per chunk smaller than the number of frames in a GOP when the free space in at least one of the video area 32 and the audio area 34 is small. The conditions on the number of frames per chunk are the same as in the embodiment described above. The operation of the video/audio data processing device 12 is not limited to this example. For instance, the chunk determination unit 24 may change the number of frames per chunk stepwise according to the free space in the video area 32 and the audio area 34.
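The stepwise variant (see also claim 4) might look like the following sketch. Several choices here are assumptions: the threshold levels, the use of occupancy ratios instead of free-space figures, and the function names. In practice, each chosen frame count would also have to satisfy the conditions on chunk frame counts mentioned above.

```python
def frames_for_region(ratio: float, levels: list[tuple[float, int]],
                      gop_frames: int) -> int:
    """Chunk frame count demanded by one area at a given occupancy ratio."""
    frames = gop_frames  # no threshold reached: one chunk per GOP
    for threshold, chunk_frames in levels:
        if ratio >= threshold:
            frames = min(frames, chunk_frames)  # fuller area -> shorter chunk
    return frames


def stepwise_chunk_frames(video_ratio: float, audio_ratio: float,
                          gop_frames: int,
                          video_levels: list[tuple[float, int]],
                          audio_levels: list[tuple[float, int]]) -> int:
    # Whichever area is under more pressure decides the chunk length
    return min(frames_for_region(video_ratio, video_levels, gop_frames),
               frames_for_region(audio_ratio, audio_levels, gop_frames))


if __name__ == "__main__":
    VIDEO_LEVELS = [(0.5, 8), (0.75, 4), (0.9, 2)]  # hypothetical levels
    AUDIO_LEVELS = [(0.6, 8), (0.8, 4)]             # hypothetical levels
    for v, a in [(0.3, 0.2), (0.8, 0.2), (0.3, 0.85), (0.95, 0.0)]:
        print(v, a, stepwise_chunk_frames(v, a, 15, VIDEO_LEVELS, AUDIO_LEVELS))
```

With a 15-frame GOP and the sample levels, the four cases give 15, 4, 4, and 2 frames per chunk. Taking the minimum over both areas mirrors the "at least one of the first and second conditions" rule of step S110.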

As described above, this embodiment also provides the same effects as the embodiment described above.

The features and advantages of the embodiments will be apparent from the detailed description above. The claims are intended to cover these features and advantages without departing from their spirit and scope. Further, a person having ordinary skill in the art could readily conceive of various improvements and modifications. The scope of the inventive embodiments is therefore not intended to be limited to those described above, and suitable improvements and equivalents within the scope disclosed in the embodiments may also be relied upon.

DESCRIPTION OF SYMBOLS: 10, 12: video/audio data processing device; 20, 24: chunk determination unit; 22: system unit; 26: system control unit; 28: message generation unit; 30: memory; 32: video area; 34: audio area; 36: setting area; 38: attached information area; 40: data encoding unit; 42: encoding control unit; 44: video encoding unit; 46: audio encoding unit; 50: multiplexing unit; 52: multiplexing control unit; 54: input unit; 56: multiplexing buffer; 58: encryption unit; 60: output buffer; 100: external device

Claims

1. A video/audio data processing device comprising:
a first storage unit that stores encoded video data;
a second storage unit that stores encoded audio data; and
a chunk determination unit that monitors the occupation amounts of the first storage unit and the second storage unit, and makes the number of frames of a chunk smaller than the number of frames of a GOP (Group Of Pictures) when at least one of the following conditions is satisfied: a first condition, that the occupation amount of the first storage unit is equal to or greater than a predetermined first threshold; and a second condition, that the occupation amount of the second storage unit is equal to or greater than a predetermined second threshold.

2. The video/audio data processing device according to claim 1, wherein the chunk determination unit sets the number of frames of the chunk such that, within the chunk, the frames can be rearranged into the display order used when the video data is reproduced.

3. The video/audio data processing device according to claim 2, wherein the chunk determination unit sets the number of frames of the chunk to the smallest value of 2 or more among the frame counts for which the frames within the chunk can be rearranged into the display order used when the video data is reproduced.

4. The video/audio data processing device according to claim 1, wherein:
at least one of the first threshold and the second threshold has a plurality of values; and
the chunk determination unit changes the number of frames of the chunk stepwise according to the result of comparing the occupation amount with the plurality of values.

5. A data multiplexing method for multiplexing encoded video data and encoded audio data, the method comprising:
comparing the occupation amount of a first storage unit that stores the encoded video data with a predetermined first threshold;
comparing the occupation amount of a second storage unit that stores the encoded audio data with a predetermined second threshold; and
making the number of frames of a chunk smaller than the number of frames of a GOP (Group Of Pictures) when at least one of the following conditions is satisfied: a first condition, that the occupation amount of the first storage unit is equal to or greater than the first threshold; and a second condition, that the occupation amount of the second storage unit is equal to or greater than the second threshold.