JP2010086615A

JP2010086615A - Multiplexing device, program, and multiplexing method

Info

Publication number: JP2010086615A
Application number: JP2008255616A
Authority: JP
Inventors: Takeshi Tateno; 剛舘野
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2008-09-30
Filing date: 2008-09-30
Publication date: 2010-04-15

Abstract

PROBLEM TO BE SOLVED: To provide a multiplexing device that generates a moving image file, in the non-fragment format of a moving image encoding file format, on which later editing can be performed efficiently.
SOLUTION: When video data and audio data are multiplexed to generate a non-fragment moving image encoding file, as many mdat boxes as there are scenes in the video data are generated, as in the file denoted 202. The video data and audio data are divided for each scene of the video data, the divided video data and audio data are multiplexed, and the media data are stored in order from the beginning, starting with the first mdat box. A moov box is also generated and the metadata are stored in it. The moving image encoding file can then be cut at each mdat box defined for a scene while remaining in non-fragment form.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a multiplexing device that creates media data, such as video data and audio data, suited to later editing, and more particularly to a technique for multiplexing media data in a non-fragment format.

In recent years, as communication networks have grown in capacity and transmission technology has advanced, video distribution services that deliver moving image files containing multimedia content such as video, audio, text, or still images to personal computers over the Internet have spread remarkably. These services are also expected to expand to mobile terminals such as mobile phones and PDAs.

Along with this, video and audio coding formats have diversified, and there is a growing need to handle the various formats interoperably within a uniform framework. Accordingly, ISO/IEC (International Organization for Standardization / International Electrotechnical Commission) JTC1/SC29/WG11 has standardized a general-purpose file format called the "ISO Base Media file format" for recording video and audio content data, such as MPEG, in files.

For example, in the case of MP4, which is an extension of this format, editing that cuts out moving images from different files and combines them into one MP4 file has been performed as follows: the multiplexed entity data in the mdat box of each MP4 file is demultiplexed (demux) and extracted as video data and audio data, the extracted data is cut out and rearranged, and the edited data is multiplexed again into an MP4 file. This procedure is roundabout, since content data that is already multiplexed is separated and then multiplexed again, and it imposes a heavy data processing load during editing.

To solve the above problem, a data file editing method has been proposed that allows efficient editing of a fragment-format MP4 file, in which the content data is divided in advance (see Patent Document 1).
Patent Document 1: JP2006-129081

When a given moving image is cut out of an MP4 file that has not been fragmented in advance, or when moving images from different MP4 files are cut out and joined into one MP4 file, the multiplexed content data must first be demultiplexed and then multiplexed again after editing. For the fragment format, an editing method that works on the divided units has been proposed, but it cannot be applied to non-fragment content data.

The present invention has been made in view of the above problem, and its object is to provide a multiplexing device for creating an MP4 file, in the non-fragment format of the MP4 file format, on which later editing can be performed efficiently.

A multiplexing device according to the present invention is a multiplexing device for multiplexing video data and audio data into a non-fragment moving image encoding file format having metadata and media data, and is characterized by comprising: means for acquiring the video data and the audio data; means for dividing the video data and the audio data, at randomly accessible points of the video data, for each scene included in the video data; means for defining as many mdat boxes as there are pieces of divided video data; means for multiplexing the divided video data and audio data; and means for storing the multiplexed media data, starting with the first data, in order from the first mdat box.

A program according to the present invention is a computer-executable program that multiplexes video data and audio data into a non-fragment moving image encoding file format having metadata and media data, and is characterized by causing a computer to execute: a procedure for acquiring media data consisting of video data and audio data; a procedure for dividing the video data and the audio data, at randomly accessible points of the video data, for each scene included in the video data; a procedure for defining as many mdat boxes as there are pieces of divided video data; a procedure for multiplexing the divided video data and audio data; and a procedure for storing the multiplexed video data and audio data, starting with the first data, in order from the first mdat box.

A multiplexing method according to the present invention is a multiplexing method for multiplexing video data and audio data into a non-fragment moving image encoding file format having metadata and media data, and is characterized by: acquiring media data consisting of video data and audio data; dividing the video data and the audio data, at randomly accessible points of the video data, for each scene included in the video data; defining as many mdat boxes as there are pieces of divided video data; and storing the encoded data obtained by multiplexing the divided video data and audio data, starting with the first data, in order from the first mdat box.

According to the present invention, when an MP4 file is multiplexed, the media data is divided for each scene, and each piece of media data is multiplexed and stored in a different mdat box. Scene-by-scene editing can then be carried out by cutting out and rearranging the desired mdat boxes. The MP4 file can therefore be edited efficiently while remaining in non-fragment form, without roundabout processing such as demultiplexing and remultiplexing the media data in the mdat boxes.

Embodiments of the present invention will be described below with reference to the drawings.

The general-purpose "ISO Base Media file format" mentioned above is defined as a base file format that does not presuppose any particular encoding format. It is adapted to a given encoding format or purpose by separately defining a standard that partially extends it. The MP4 file format is a representative example of such an extension.

When a media file is distributed in a distribution service, a multiplexing device must first take in media data such as moving images, still images, audio, and text, and multiplex the header information needed to play back the media data with the entity data of the media data to create the media file data. The MP4 file format is currently attracting attention as a moving image file format and is expected to come into wide use. The present embodiment is described using the MP4 file format as an example.

The data structure of an MP4 file is described here. FIG. 1 is a diagram showing an example of the data structure of the MP4 format. As shown in FIG. 1, file data in the MP4 format consists broadly of two data structures, the mdat box and the moov box.

Here, an area that stores one coherent piece of data is called a box. The first 4 bytes of a box hold its size in bytes (Size) and the following 4 bytes hold its type (Type); these 8 bytes form a header placed at the start of the data block. The size and type of the first box can therefore be read from the first 8 bytes of the MP4 file, and the following box can be reached by moving forward from the beginning of the file by the size of the first box (that is, to the position obtained by adding the size to the start position of the file).
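The header layout described above can be illustrated with a minimal Python sketch that walks the top-level boxes of a byte string. The helper names `make_box` and `iter_top_level_boxes` and the sample payloads are hypothetical; the big-endian byte order follows the ISO base media file format, and the special size values 0 and 1 defined there are deliberately not handled.

```python
import struct

def make_box(box_type: str, payload: bytes) -> bytes:
    """Build a box: 4-byte size (header included), 4-byte type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload

def iter_top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box found in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            # Sizes 0 and 1 have special meanings in the base format;
            # this sketch simply stops instead of handling them.
            break
        yield box_type.decode("ascii"), offset, size
        offset += size  # jump to the box that follows

# Hypothetical sample: a 10-byte mdat payload followed by a 4-byte moov payload.
sample = make_box("mdat", b"\x00" * 10) + make_box("moov", b"\x00" * 4)
boxes = list(iter_top_level_boxes(sample))
print(boxes)  # [('mdat', 0, 18), ('moov', 18, 12)]
```

Only the 8-byte headers are read, so the walk never has to interpret the media data itself.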

The mdat box is the area that stores the encoded video and audio data itself, and the moov box is the area that stores metadata such as the physical position and temporal position of the video and audio data and their characteristic information. Boxes can not only be recorded one after another in a file; a box can also contain other boxes, as with the track box inside the moov box shown in FIG. 1, so boxes can be nested. The track boxes are areas inside the moov box, each of which holds its share of the management information needed to play back audio, images, and so on.

The MP4 file format permits not only recording all the metadata in the moov box but also a form in which the metadata is divided in chronological order and recorded in a plurality of areas. This form is called a "Fragmented Movie".

When media data in the MP4 file format is created, the multiplexing device takes in the video and audio data and multiplexes them as described above, stores the entity data of the media data in the mdat box, and creates the metadata in the moov box.

FIG. 2 shows an example of a conventional MP4 file 201 multiplexed by a conventional multiplexing device, and an example of an MP4 file 202 multiplexed according to the present invention.

When multiplexing is performed with a conventional multiplexing device, a non-fragment MP4 file is output with exactly one moov box and one mdat box, like the MP4 file 201. In contrast, the MP4 file 202 output by the multiplexing device of the present embodiment consists of one moov box and one or more mdat boxes. The media data is stored in these mdat boxes in chronological order, starting with the first mdat box. During normal playback of the MP4 file 202, playback proceeds from the media data stored in the first mdat box to the media data in the next mdat box, and so on. With the format of the MP4 file 202, each mdat box can be cut out and edited without demultiplexing.

FIG. 3 is a block diagram showing an example of the configuration of an information processing apparatus that implements the multiplexing processing of the present embodiment. FIG. 3 shows a CPU 301, an M memory 302, an HDD 303, a multiplexing device 304, an ODD 305, a network I/F 306, an audio I/F 307, a video I/F 308, and a bus 309.

The CPU 301 is a central processing unit and controls the information processing apparatus as a whole. It also executes programs and carries out the processing that each program specifies.

The M memory 302 is a semiconductor memory and is used as a storage area for programs and data while the CPU 301 processes a program.

The HDD 303 is, for example, a magnetic disk device and is used as a non-volatile area for storing data. Stored programs and data can be read out on instruction from the CPU 301.

The multiplexing device 304 is a module that multiplexes video data and audio data in conformance with the MP4 file format and generates a file in the MP4 format. The multiplexing device 304 acquires the video data and audio data via the bus 309 and outputs the multiplexed MP4 data to the HDD 303. Although the present embodiment shows an example in which the multiplexed MP4 data is output to the HDD 303, it may instead be written to the ODD 305 or transmitted to another information processing apparatus via the network I/F 306.

The ODD 305 is, for example, an optical disk device, and writes data and programs to, and reads them from, an inserted optical disk such as a CD (Compact Disc) or DVD (Digital Versatile Disc).

The network I/F 306 exchanges data with other information processing apparatuses and the like over a network such as a LAN or WAN.

The audio I/F 307 is an interface for acquiring external audio, for example from a microphone, and converts the external audio into an electric signal and inputs it to the information processing apparatus.

The video I/F 308 is an interface for acquiring external video, for example from a digital camera, and converts the external video into an electric signal and inputs it to the information processing apparatus.

The modules are connected to the bus 309 and can communicate with one another through it.

Each part of the multiplexing device 304 may also be implemented in software. In that case, a program executable by the CPU 301 that implements the function of each part is stored in the HDD 303 in advance, and is read into the M memory 302 and executed at processing time. When implemented in software, the program need not be stored in the HDD 303; a program stored on an optical disk may be read directly from the ODD 305, or the program may be obtained through the network I/F 306.

In the present embodiment, the video data and audio data multiplexed by the multiplexing device 304 may be data acquired separately from the audio I/F 307 and the video I/F 308. In that case, each is input to the multiplexing device 304 independently and then multiplexed. Data that is already multiplexed is first demultiplexed (demux) into video data and audio data, and each is then input to the multiplexing device 304. The data input to the multiplexing device 304 may be data stored on the HDD 303 or the ODD 305, or may be acquired through the network I/F 306. The means of acquiring data is, of course, not limited to these examples.

FIG. 4 is a block diagram showing an example of the configuration of the multiplexing device 304 in the present embodiment. FIG. 4 shows the multiplexing device 304, a video data designation unit 401, a video data analysis unit 402, an audio data designation unit 403, an audio data analysis unit 404, an MP4 multiplexing unit 405, and an MP4 data output unit 406.

The video data designation unit 401 acquires video data from outside the multiplexing device 304 via the bus 309 and outputs it to the video data analysis unit 402.

The video data analysis unit 402 analyzes the video data received from the video data designation unit 401 and extracts the information needed for the subsequent multiplexing processing. It also extracts scene information, the encoding information of each scene, and the like from the video data, and outputs the extracted scene information and the information needed for multiplexing to the MP4 multiplexing unit 405. A scene here is a piece of the video data obtained by dividing it at randomly accessible points.

The audio data designation unit 403 acquires audio data from outside the multiplexing device 304 via the bus 309 and outputs it to the audio data analysis unit 404.

The audio data analysis unit 404 analyzes the audio data acquired from the audio data designation unit 403, extracts the information needed for the multiplexing processing, and outputs the extracted information to the MP4 multiplexing unit 405.

The MP4 multiplexing unit 405 multiplexes the video data and audio data into the MP4 file format and outputs the multiplexed MP4 data to the MP4 data output unit 406. When creating an MP4 file, the MP4 multiplexing unit 405 divides the video data and audio data into the scene units extracted from the video data. It then creates as many mdat boxes as there are scenes in the video data, multiplexes the video data and audio data in the divided units, and stores the multiplexed video and audio data of each scene in order, starting with the first mdat box. This processing generates the MP4 file 202.
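The layout produced by this step can be sketched as follows. This is an illustration under stated assumptions, not the device's actual implementation: `make_box` and `build_file` are hypothetical helpers, the scene payloads stand in for already multiplexed video and audio, the moov payload is an empty placeholder, and placing the mdat boxes before the moov box is an assumption made for the sketch. The result is not a playable MP4 file.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """4-byte big-endian size (header included) + 4-byte type + payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def build_file(scene_payloads, moov_payload: bytes = b"") -> bytes:
    """Store each multiplexed scene in its own mdat box, in time order,
    then append a single (placeholder) moov box."""
    mdat_boxes = [make_box(b"mdat", payload) for payload in scene_payloads]
    return b"".join(mdat_boxes) + make_box(b"moov", moov_payload)

# Hypothetical multiplexed scene data, earliest scene first.
scenes = [b"scene-1", b"scene-2-longer", b"s3"]
mp4_like = build_file(scenes)

print(len(mp4_like))            # 55
print(mp4_like.count(b"mdat"))  # 3
```

Because each scene carries its own 8-byte header, a later editing step can locate a scene by reading headers alone.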

The MP4 multiplexing unit 405 also defines, inside the stsd box in the moov box, as many Sample description boxes as there are mdat boxes (that is, as there are scenes). Each Sample description box stores a Sample description, which is the encoding information of the media data to be multiplexed. This is done because the encoding information of the input video data may differ from scene to scene.

The Sample description describing the scene stored in the first mdat box is stored in the first Sample description box, and the Sample description describing the scene stored in the Nth mdat box is stored in the Nth Sample description box. In other words, the Sample descriptions are stored one after another, beginning with the first Sample description in the first Sample description box. The Sample description boxes are stored in the stsd box inside the moov box. The Sample description box and the stsd box both belong to the trak (track) box described above, at different levels of its hierarchy.

Sample_description_index is likewise created in correspondence with the number of mdat boxes and stored in the stsc box inside the moov box. For example, if there are N mdat boxes and the Sample descriptions describing the scenes are not all identical, "Sample_description_index = N" is defined. If the Sample description is common to all scenes, "Sample_description_index = 1" is defined.

The MP4 data output unit 406 outputs the MP4 data multiplexed by the MP4 multiplexing unit 405 to the outside.

FIG. 5 is a schematic diagram showing an example of the structure of the moov box after multiplexing, and the information stored in it, when the number of scenes in the present embodiment is N. In the dref box stored in the moov box, url boxes indicating the location of each mdat box are created and stored, one for each mdat box (one for each scene). In FIG. 5, the hierarchy inside the moov box conforms to the MP4 file format, and the upper levels are omitted.

FIG. 6 is a diagram showing an example of how the scenes of the video data and audio data are divided before multiplexing. In encoded video data, the points at which random access is possible are generally limited by the unit of frame encoding, and the scene division described here is always performed at a randomly accessible point. The actual division method depends on the implementation. Short video data might be divided at every randomly accessible point; longer video data might be divided at a randomly accessible point each time a predetermined period elapses, or at randomly accessible points where the content of the video changes substantially. The present embodiment therefore does not limit the method. Whatever division method is used, however, a point at which the Sample description changes is treated as a scene boundary, and the data is divided there.

Audio data has no points at which random access is impossible and can be divided at any point, so it is divided at the positions corresponding to the scene division points of the video data, as shown in the figure.
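One of the division strategies mentioned above, dividing at a randomly accessible point each time a predetermined period elapses and then dividing the audio at the corresponding positions, can be sketched as follows. The function names, the timestamps, and the 5-second interval are all hypothetical, and the rule that a change of Sample description always starts a new scene is left out to keep the sketch short.

```python
from bisect import bisect_right

def choose_scene_starts(sync_times, interval):
    """Pick scene start times from the randomly accessible points, starting a
    new scene once at least `interval` seconds have passed since the last start."""
    starts = []
    for t in sync_times:
        if not starts or t - starts[-1] >= interval:
            starts.append(t)
    return starts

def split_audio(audio_times, scene_starts):
    """Group audio sample timestamps by the scene they fall into."""
    scenes = [[] for _ in scene_starts]
    for t in audio_times:
        index = bisect_right(scene_starts, t) - 1
        scenes[max(index, 0)].append(t)
    return scenes

# Hypothetical timestamps in seconds.
sync_times = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]  # randomly accessible video points
audio_times = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5, 9.0]

scene_starts = choose_scene_starts(sync_times, interval=5.0)
audio_scenes = split_audio(audio_times, scene_starts)
print(scene_starts)   # [0.0, 6.0]
print(audio_scenes)   # [[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0]]
```

Scene boundaries are chosen only from the video's randomly accessible points; the audio simply follows them.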

FIG. 7 is a flowchart showing an example of the processing flow of the multiplexing device 304 in the present embodiment.

First, the video data designation unit 401 and the audio data designation unit 403 acquire video data and audio data, as separate data, from outside the multiplexing device 304 (S701). Next, the video data analysis unit 402 and the audio data analysis unit 404 analyze the received video data and audio data respectively (S702), and send the scene information obtained by the analysis and the information needed for multiplexing to the MP4 multiplexing unit 405. Based on the received information, the MP4 multiplexing unit 405 first determines whether the number of scenes in the video data is two or more (S703).

If the number of scenes is 1 (No), no scene division is performed, so one mdat box is created and all the media data is stored in it. Since there is then only one Sample description, Sample_description_index = 1 is set (S704). If the number of scenes is 2 or more in S703 (Yes), it is next determined whether the Sample description is common to all scenes (S705). If it is common to all scenes (Yes), as many mdat boxes as there are scenes are created, the media data of each scene is stored in them in chronological order starting with the first mdat box, and the corresponding url boxes are created. Because the Sample description is common to all scenes, Sample_description_index = 1 is set (S706). If the media data in S705 contains scenes with different Sample descriptions (No), mdat boxes and url boxes are created for the number of scenes as in S706. In addition, a Sample description corresponding to the media data stored in each mdat box is prepared for each mdat box, and Sample_description_index = N is set (S707). These steps yield a format in which the correspondence between the media data stored in each mdat box and the Sample descriptions stored in the moov box can be determined.

Following S704, S706, or S707, the multiplexing device 304 constructs the MP4 data in conformance with the MP4 file format (S708). Finally, the constructed MP4 data is output from the MP4 data output unit 406 to the outside of the multiplexing device 304, and the multiplexing processing flow ends.
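The branching in steps S703 to S707 can be summarized in a short sketch. The function `plan_multiplexing`, its dictionary keys, and the codec strings are hypothetical; each scene is represented only by a value standing in for its Sample description, and the "Sample_description_index" values 1 and N are taken as the text states them.

```python
def plan_multiplexing(scene_descriptions):
    """Return the branch taken and the box counts for a list with one
    Sample description stand-in per scene."""
    n = len(scene_descriptions)
    if n < 1:
        raise ValueError("at least one scene is required")
    if n == 1:
        # S703: No -> S704, one mdat box holds all the media data
        return {"step": "S704", "mdat_boxes": 1, "sample_description_index": 1}
    if all(d == scene_descriptions[0] for d in scene_descriptions):
        # S705: Yes -> S706, one mdat box per scene, one shared description
        return {"step": "S706", "mdat_boxes": n, "sample_description_index": 1}
    # S705: No -> S707, one mdat box and one description per scene
    return {"step": "S707", "mdat_boxes": n, "sample_description_index": n}

single = plan_multiplexing(["avc-profile-a"])
shared = plan_multiplexing(["avc-profile-a"] * 3)
mixed = plan_multiplexing(["avc-profile-a", "avc-profile-b", "avc-profile-a"])
print(single["step"], shared["step"], mixed["step"])  # S704 S706 S707
```

All three branches then continue to S708, where the MP4 data is actually constructed.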

With the above method, MP4 data can be created that has as many mdat boxes as there are scenes, with one scene stored in each. The following describes how to edit when new MP4 data is created from MP4 data obtained by the present embodiment.

FIG. 8 is a diagram showing an example of a method of editing MP4 data obtained by the multiplexing processing of the present embodiment. Here, consider an editing operation in which mdat boxes are cut out of the obtained MP4 data (a), (b), and (c) and combined into one piece of MP4 data (d).

As shown in FIG. 8, suppose mdat box [A] and mdat box [C] are cut out of MP4 data (a), mdat box [B] out of MP4 data (b), and mdat box [D] out of MP4 data (c), and combined into one piece of MP4 data (d). Unlike MP4 data multiplexed by a conventional multiplexing device, this requires no demultiplexing followed by remultiplexing. For MP4 data obtained by the present embodiment, the editing of the mdat boxes that hold the entity data is finished simply by cutting mdat boxes [A], [B], [C], and [D] out of their respective MP4 data and arranging them. The edited MP4 data is then completed by creating metadata in the moov box that corresponds to the entity data. Because an mdat box, like any other box, stores its size at its head as header information, this size may be used when creating the metadata.
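The cut-and-arrange edit of FIG. 8 can be sketched with byte strings standing in for MP4 data (a), (b), and (c). The helpers `make_box` and `mdat_boxes`, the one-byte payloads, and the empty moov placeholder are hypothetical; building real metadata for the edited file is outside the scope of this sketch.

```python
import struct

def make_box(box_type: bytes, payload: bytes) -> bytes:
    """4-byte big-endian size (header included) + 4-byte type + payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

def mdat_boxes(data: bytes) -> list:
    """Return every top-level mdat box in `data` as raw bytes, header included."""
    found, offset = [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            break  # special size values are not handled in this sketch
        if box_type == b"mdat":
            found.append(data[offset:offset + size])
        offset += size
    return found

# Stand-ins for MP4 data (a), (b), (c); payloads name the scene they hold.
file_a = make_box(b"mdat", b"A") + make_box(b"mdat", b"C") + make_box(b"moov", b"")
file_b = make_box(b"mdat", b"B") + make_box(b"moov", b"")
file_c = make_box(b"mdat", b"D") + make_box(b"moov", b"")

a, b, c = mdat_boxes(file_a), mdat_boxes(file_b), mdat_boxes(file_c)
# Arrange the cut-out boxes as A, B, C, D; no demultiplexing takes place.
edited = a[0] + b[0] + a[1] + c[0] + make_box(b"moov", b"")

payloads = [box[8:] for box in mdat_boxes(edited)]
print(payloads)  # [b'A', b'B', b'C', b'D']
```

The media bytes are copied unchanged; only the metadata, represented here by the placeholder moov box, would need to be written anew.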

In this embodiment, the media data in the mdat boxes needs no remultiplexing or similar processing: editing consists only of arranging mdat boxes and creating the metadata in the moov box. The processing is therefore simple and places a light data-processing load on the information processing apparatus.
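To illustrate why creating the metadata is cheap, the sketch below derives where each scene's media data lands in the edited file, using nothing but the size field at the head of each mdat box. The patent does not state which metadata fields are rebuilt; this example assumes that the new metadata needs the absolute file offset and length of each mdat payload (ISO base media files record chunk positions as absolute file offsets). The function name `mdat_payload_offsets` and the `base_offset` parameter are hypothetical, and only 32-bit box sizes are handled.

```python
import struct


def mdat_payload_offsets(mdat_boxes, base_offset=0):
    """Return [(payload_offset, payload_size)] for mdat boxes laid out back to back.

    `base_offset` is the file position of the first mdat box, i.e. the total
    size of whatever precedes it in the edited file (an assumption of this sketch).
    """
    result = []
    pos = base_offset
    for box in mdat_boxes:
        size, box_type = struct.unpack(">I4s", box[:8])
        if box_type != b"mdat":
            raise ValueError("not an mdat box")
        if size != len(box):
            raise ValueError("this sketch handles only 32-bit box sizes")
        # The payload starts right after the 8-byte header.
        result.append((pos + 8, size - 8))
        pos += size
    return result
```

Each returned pair could then be used to fill in the offset tables of the new moov box, without reading or decoding the media data itself.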

Each mdat box cut out of the original MP4 data has a corresponding Sample description. When the metadata of the edited MP4 data is created, the Sample descriptions of the pre-edit MP4 data (a), (b), and (c) can therefore be reused as they are. FIG. 9 shows how an mdat box of the original MP4 data corresponds to the Sample description stored in the moov box when both are extracted during editing.

The correspondence between the mdat boxes of the original MP4 data and the Sample descriptions extracted during editing is described below with reference to FIG. 9. FIGS. 9-A to 9-C show the cases classified by the state of the source MP4 data.

FIG. 9-A shows the correspondence between the mdat box and the Sample description when the number of mdat boxes is 1. MP4 data with one mdat box (one scene) naturally has only one Sample description, so when the mdat box of this MP4 data is used, mdat 1 is associated with Sample description 1 for the editing work.

FIG. 9-B shows the case where the number of mdat boxes is N and the Sample description is common. The number of mdat boxes is N, so the number of scenes is N, but all scenes share one Sample description (Sample_description_index = 1). Whichever mdat box is cut out, it is associated with Sample description 1 for the editing work.

FIG. 9-C shows the case where the number of mdat boxes is N and the Sample descriptions are not all common. The number of mdat boxes is N, so the number of scenes is N, and the Sample description differs between scenes (Sample_description_index = N). In this case the Sample descriptions are stored in the moov box in the same order as the corresponding mdat boxes, so the mdat box and the Sample description at the same position are associated with each other for the editing work.
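The three cases of FIG. 9 reduce to a small lookup rule, sketched below in Python. This is an illustration under stated assumptions rather than the patent's implementation: the function name `sample_description_for` is hypothetical, mdat boxes are numbered from 1 as in the figures, and the Sample descriptions are represented as an ordered list taken from the moov box of the source file.

```python
def sample_description_for(mdat_index, mdat_count, sample_descriptions):
    """Return the Sample description that belongs to mdat box `mdat_index` (1-based).

    One Sample description covers every mdat box (FIG. 9-A and 9-B).
    N Sample descriptions for N mdat boxes are matched by position (FIG. 9-C).
    """
    if not 1 <= mdat_index <= mdat_count:
        raise IndexError("mdat_index out of range")
    count = len(sample_descriptions)
    if count == 1:
        # Cases 9-A and 9-B: Sample_description_index = 1.
        return sample_descriptions[0]
    if count == mdat_count:
        # Case 9-C: Sample_description_index = N, stored in mdat order.
        return sample_descriptions[mdat_index - 1]
    raise ValueError("Sample description count matches neither 1 nor the mdat count")
```

A source file whose Sample description count is neither 1 nor N falls outside the three cases described here, so the sketch rejects it instead of guessing a mapping.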

As described above, by carrying over from the original MP4 data the Sample description that corresponds to each mdat box, editing can be performed without concern for the encoding information of each mdat box. In this embodiment, only the Sample description and the like were created per mdat box at multiplexing time, but the invention is not limited to this: if other metadata is also created so as to correspond to each mdat box, it could presumably be reused during editing in the same way as the Sample description.

This embodiment illustrated an editing operation that cuts out parts of MP4 data multiplexed by the multiplexing device 304 and creates new MP4 data, but the invention is not limited to this. Because data can be moved freely in units of mdat boxes, the approach applies to various editing processes, such as deleting an mdat box from multiplexed MP4 data or appending an mdat box to other MP4 data.

The present invention makes it possible to create, as a non-fragmented file in the MP4 file format, an MP4 file on which later editing can be performed efficiently.

The present invention is not limited to the embodiment exactly as described above; at the implementation stage, the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be deleted from the full set shown in the embodiment, and constituent elements of different embodiments may be combined as appropriate.

A diagram showing an example of the data structure of the MP4 format.
A diagram showing an example of the data structures of a conventional MP4 file multiplexed by a conventional multiplexing device and of an MP4 file multiplexed in this embodiment.
A block diagram showing an example of the configuration of an information processing apparatus that implements the multiplexing process of this embodiment.
A block diagram showing an example of the configuration of the multiplexing device of this embodiment.
A schematic diagram showing an example of the structure of the moov box after multiplexing, and of the information stored in it, when the number of scenes is N.
A diagram showing an example of how the scenes of the video data and audio data are divided before the multiplexing process of this embodiment.
A flow diagram showing an example of the processing flow of the multiplexing device of this embodiment.
A diagram showing an example of a method for editing MP4 data obtained by the multiplexing process of this embodiment.
A diagram showing an example of the correspondence between mdat boxes and Sample descriptions during editing.

Explanation of symbols

201: MP4 file in the conventional format
202: MP4 file multiplexed in this embodiment
301: CPU
302: M memory
303: HDD
304: Multiplexing device
305: ODD
306: Network I/F
307: Audio I/F
308: Video I/F
309: Bus
401: Video data specifying unit
402: Video data analyzing unit
403: Audio data specifying unit
404: Audio data analyzing unit
405: MP4 multiplexing unit
406: MP4 data output unit

Claims

1. A multiplexing device that multiplexes video data and audio data into a non-fragmented moving image encoding file format having metadata and media data, the multiplexing device comprising:
means for acquiring the video data and the audio data;
means for dividing the video data and the audio data, for each scene included in the video data, at randomly accessible points of the video data;
means for defining as many mdat boxes as there are pieces of divided video data;
means for multiplexing the divided video data and audio data; and
means for storing the multiplexed media data in order, from the first data onward, into the mdat boxes starting from the first mdat box.

2. The multiplexing device according to claim 1, further comprising:
means for determining one Sample description corresponding to the media data if the encoding information of the media data stored in each of the mdat boxes is common, and for determining a plurality of Sample descriptions, one corresponding to each piece of media data, if the encoding information of the media data is not common; and
means for defining as many Sample description boxes, which store the Sample descriptions, as there are Sample descriptions.

3. The multiplexing device according to claim 2, further comprising means for defining the number of Sample description boxes as the value of Sample_description_index.

4. A computer-executable program that multiplexes video data and audio data into a non-fragmented moving image encoding file format having metadata and media data, the program causing a computer to execute:
a procedure for acquiring media data composed of the video data and the audio data;
a procedure for dividing the video data and the audio data, for each scene included in the video data, at randomly accessible points of the video data;
a procedure for defining as many mdat boxes as there are pieces of divided video data;
a procedure for multiplexing the divided video data and audio data; and
a procedure for storing the multiplexed video data and audio data in order, from the first data onward, into the mdat boxes starting from the first mdat box.

5. The program according to claim 4, further causing the computer to execute:
a procedure for determining one Sample description corresponding to the media data if the encoding information of the media data stored in each of the mdat boxes is common, and for determining a plurality of Sample descriptions, one corresponding to each piece of media data, if the encoding information of the media data is not common; and
a procedure for defining as many Sample description boxes, which store the Sample descriptions, as there are Sample descriptions.

6. The program according to claim 5, further causing the computer to execute a procedure for defining the number of Sample description boxes as the value of Sample_description_index.

7. A multiplexing method for multiplexing video data and audio data into a non-fragmented moving image encoding file format having metadata and media data, the method comprising:
acquiring media data composed of the video data and the audio data;
dividing the video data and the audio data, for each scene included in the video data, at randomly accessible points of the video data;
defining as many mdat boxes as there are pieces of divided video data; and
storing the encoded data obtained by multiplexing the divided video data and audio data in order, from the first data onward, into the mdat boxes starting from the first mdat box.

8. The multiplexing method according to claim 7, further comprising:
determining one Sample description corresponding to the media data if the encoding information of the media data stored in each of the mdat boxes is common, and determining a plurality of Sample descriptions, one corresponding to each piece of media data, if the encoding information of the media data is not common; and
thereafter defining as many Sample description boxes, which store the Sample descriptions, as there are Sample descriptions.

9. The multiplexing method according to claim 8, wherein the number of Sample description boxes is defined as the value of Sample_description_index.