JP7342520B2

JP7342520B2 - Video file processing methods and programs

Info

Publication number: JP7342520B2
Application number: JP2019153129A
Authority: JP
Inventors: 祐二小池
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2019-08-23
Filing date: 2019-08-23
Publication date: 2023-09-12
Anticipated expiration: 2039-08-23
Also published as: JP2021034081A

Description

The present invention relates to a method and a program for processing video files.

Video editing devices that edit a video according to editing instructions given by a user are known (see, for example, Patent Document 1). In video editing with this type of device, multiple video files, each containing compressed video frame data and compressed audio frame data, may be concatenated along the time axis, and the compressed audio frame data in the video files may be mixed with other audio data. The compressed video frame data in a video file may be CFR (Constant Frame Rate) or VFR (Variable Frame Rate). In the latter case, the VFR compressed frame data is converted to CFR compressed frame data when the file is concatenated with other video files (hereinafter referred to as VFR/CFR conversion). In this VFR/CFR conversion, the compressed audio frame data in the video file is also converted to compressed frame data at a frame rate corresponding to the CFR of the video, and is then mixed with the other audio data.

Patent Document 1: Japanese Patent Application Publication No. 2004-266721

In the conventional technology described above, the compressed audio frame data in a video file is, in the course of VFR/CFR conversion, converted to an uncompressed format and then compressed with a lossy compression algorithm. It is subsequently converted back to uncompressed audio data and mixed with, for example, audio data in an audio file (that is, other audio data). Because the audio data that is mixed with the audio data in the audio file has passed through a lossy compression algorithm, the sound is degraded by that processing.

This invention has been made in view of the circumstances described above, and its object is to provide a means for preventing sound degradation in the process of extracting compressed audio frame data from a video file and mixing it with other audio data.

This invention provides a method for processing video files, characterized by generating compressed frame data of fixed-frame-rate video from the compressed video frame data included in each of a plurality of video files, generating audio data in an uncompressed format or a lossless compression format from the compressed audio frame data included in each of the plurality of video files, and mixing that audio data with other audio data.

FIG. 1 is a diagram showing the configuration of a video editing system according to an embodiment of this invention.

FIG. 2 is a diagram showing an example of a video file to be edited in the embodiment.

FIG. 3 is a diagram showing an example of video editing in the embodiment.

FIG. 4 is a flowchart showing an example of video editing processing in the embodiment.

FIG. 5 is a diagram showing a comparative example of VFR/CFR conversion.

FIG. 6 is a diagram showing an example of VFR/CFR conversion in the embodiment.

FIG. 7 is a diagram showing another example of VFR/CFR conversion in the embodiment.

Embodiments of this invention are described below with reference to the drawings.

FIG. 1 is a diagram showing the configuration of a video editing system 1 according to an embodiment of this invention. As shown in FIG. 1, the video editing system 1 includes a video editing server 10, a DB (database) 20, and a terminal device 30, each connected to a network NW such as the Internet or a mobile communication network.

The terminal device 30 is operated by a user who wishes to edit a video and is, for example, a smartphone. The terminal device 30 may instead be a tablet terminal or a personal computer. Although FIG. 1 shows only one terminal device 30, the video editing system 1 may include a plurality of terminal devices 30.

The video editing server 10 is a server managed by a provider of a video editing service and is, for example, a server computer. In this embodiment, the video editing server 10 generates, as content, a video containing both sound and images in accordance with instructions from the terminal device 30. Such content is tailored to various themes, for example content for a company to recruit personnel or content introducing a company's products.

The terminal device 30 includes a control unit 31, an operation unit 32, a display unit 33, a communication unit 34, a sound emitting unit 35, a storage unit 36, an imaging unit 37, and a sound collection unit 38. The control unit 31 is a processor such as a CPU (Central Processing Unit) and controls each unit of the terminal device 30. The operation unit 32 consists of various controls such as a keyboard and a mouse. The display unit 33 is, for example, a liquid crystal display panel. The communication unit 34 is a device for communicating with other devices via the network NW. The sound emitting unit 35 is a device that outputs sound and consists of a speaker or the like. The imaging unit 37 is, for example, a camera, and is used to capture video scenes that serve as material for the content. The sound collection unit 38 is, for example, a microphone, and is used to collect audio that serves as material for the content. The storage unit 36 consists of a volatile storage unit used as a work area by the control unit 31 and a nonvolatile storage unit that stores programs and other data executed by the control unit 31.

In this embodiment, a terminal-side editing program PRG3 is stored in the nonvolatile storage unit of the storage unit 36. In accordance with the terminal-side editing program PRG3, the control unit 31 generates editing instructions for editing video content and uploads them to the DB 20. Also in accordance with the terminal-side editing program PRG3, the control unit 31 sends the video editing server 10 an instruction to generate the content, that is, the finished version of the video being edited, and receives the content.

The DB 20 stores various material data and various templates. The DB 20 can communicate with the video editing server 10 and the terminal device 30 via the network NW. Material data is data representing the various materials that can be used to make up video content, such as video data and audio data of each scene of the content captured by a smartphone. The user can upload material data to the DB 20 together with an editing instruction, or can upload it to the DB 20 in advance, before the editing instruction.

A template is data that functions as a work specification directing the video editing work. Templates are prepared for each type of theme, such as personnel recruitment or product introduction. A template includes information defining the period during which video material is displayed within the video and its display area on the screen. A template also includes information defining the period during which audio material is played back within the video. In addition, a template includes information representing guide images and guide audio that prompt the user to input material data.

The user can download a desired template from the DB 20 to the terminal device 30. In the terminal device 30, the control unit 31 executes the terminal-side editing program PRG3, presents guide images and guide audio according to the template, and prompts the user to input material data. When the user inputs video material data, for example by capturing a video scene with the imaging unit 37 of the terminal device 30, the control unit 31 uploads the material data to the DB 20, generates an editing instruction containing the material ID of the material data and information indicating the display area and playback period specified by the template, and uploads the editing instruction to the DB 20. Likewise, when the user inputs audio material data, for example by collecting sound with the sound collection unit 38, the control unit 31 uploads the material data to the DB 20, generates an editing instruction containing the material ID of the material data and information indicating the playback period specified by the template, and uploads the editing instruction to the DB 20. These editing instructions are stored in the DB 20 in association with the user ID of the user operating the terminal device 30. When prompted to input material data, the user can also input information specifying the display area and the playback period. In this case, the editing instruction is generated based on the display area and playback period specified by the user instead of those specified by the template.
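The rule that user-specified values take the place of template values can be sketched as follows. This is only an illustration: the class names, field names, and the tuple layouts for display area and playback period are hypothetical and do not come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Region = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)
Period = Tuple[float, float]        # start, end in seconds (assumed units)


@dataclass(frozen=True)
class TemplateSlot:
    """Defaults that a template defines for one piece of material."""
    region: Optional[Region]  # None for audio material, which has no display area
    period: Period


@dataclass(frozen=True)
class EditInstruction:
    """Editing instruction stored in the DB in association with a user ID."""
    user_id: str
    material_id: str
    region: Optional[Region]
    period: Period


def build_instruction(user_id: str, material_id: str, slot: TemplateSlot,
                      user_region: Optional[Region] = None,
                      user_period: Optional[Period] = None) -> EditInstruction:
    # Values specified by the user replace the values specified by the template.
    return EditInstruction(
        user_id=user_id,
        material_id=material_id,
        region=user_region if user_region is not None else slot.region,
        period=user_period if user_period is not None else slot.period,
    )
```

Calling `build_instruction` without user values yields an instruction that carries the template's display area and playback period unchanged.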

Like the terminal device 30, the video editing server 10 includes a control unit 11, a communication unit 14, and a storage unit 16. The storage unit 16 consists of a volatile storage unit used as a work area by the control unit 11 and a nonvolatile storage unit that stores programs and other data executed by the control unit 11. In this embodiment, a server-side editing program PRG1 is stored in the nonvolatile storage unit of the storage unit 16. By executing the server-side editing program PRG1, the control unit 11 generates a video in accordance with the editing instructions uploaded from the terminal device 30 to the DB 20. Specifically, the control unit 11 repeats the following processing: it reads from the DB 20 the material data corresponding to the material ID indicated by an editing instruction, plays back that material data during the playback period specified in the editing instruction, and, when multiple pieces of audio data are played back, mixes them. It then generates the video content by compressing the played-back and mixed data.

A typical example of the video editing performed in this embodiment is to concatenate, on the time axis, a plurality of video files each containing compressed video frame data and compressed audio frame data, and to mix an audio file consisting of audio data with the result. In this embodiment, the control unit 11 of the video editing server 10 performs this typical video editing in accordance with the server-side editing program PRG1 as follows: it generates compressed frame data of fixed-frame-rate video from the compressed video frame data included in each of the plurality of video files, generates audio data in an uncompressed format or a lossless compression format from the compressed audio frame data included in each of the plurality of video files, and mixes that audio data with the audio data of the audio file.
The above is the configuration of this embodiment.

Next, the operation of this embodiment is described. FIG. 2 shows an example of a video file used as material in the video editing executed by the control unit 11 of the video editing server 10. In this example, the video file is an MP4 file. As shown in FIG. 2, the MP4 file is a container file holding compressed video frame data in AVC (MPEG-4 Part 10 Advanced Video Coding) format and compressed audio frame data in AAC (Advanced Audio Coding) format.

FIG. 3 is a diagram showing an example of the video editing indicated by editing instructions in this embodiment. In FIG. 3, the horizontal axis is time, and times t1 to t7 are the playback start times or playback end times of the various materials specified in the editing instructions. In this example, t1 < t2 < t3 < t4 < t5 < t6 < t7. FIG. 3 also shows the materials specified in the editing instructions.

In FIG. 3, the MP4 files F11, F12, and F13 have playback start times t1, t2, and t5, respectively. The WAV (Waveform Audio File Format) file F21 is a BGM audio file whose playback start time is t1. The MP3 (MPEG-1 Audio Layer-3) file F22 is a narration audio file whose playback start time is t2 and whose playback end time is t4. The WAV file F23 is a sound-effect audio file whose playback start time is t2 and whose playback end time is t3. The WAV file F24 is a sound-effect audio file whose playback start time is t6 and whose playback end time is t7.
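The timeline of FIG. 3 can be written down as a small table and queried for the materials that are active at a given moment. The sketch below makes several assumptions that the description does not state: time t_k is represented by the integer k (only the ordering t1 < ... < t7 comes from the text), each MP4 clip is taken to end where the next one starts because the clips are concatenated, and end times that the text does not give (F13, F21) are left as `None`. The names `Material`, `TIMELINE`, and `active_at` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Material:
    name: str
    kind: str            # "video" for MP4 files, "audio" for WAV/MP3 files
    start: int           # k stands for time t_k
    end: Optional[int]   # None where the description gives no end time


TIMELINE = [
    Material("F11", "video", 1, 2),     # end assumed: next clip starts at t2
    Material("F12", "video", 2, 5),     # end assumed: next clip starts at t5
    Material("F13", "video", 5, None),
    Material("F21", "audio", 1, None),  # BGM
    Material("F22", "audio", 2, 4),     # narration
    Material("F23", "audio", 2, 3),     # sound effect
    Material("F24", "audio", 6, 7),     # sound effect
]


def active_at(t: float, timeline: List[Material] = TIMELINE) -> List[str]:
    """Names of materials whose playback period contains time t."""
    return [m.name for m in timeline
            if m.start <= t and (m.end is None or t < m.end)]
```

For a moment between t1 and t2, `active_at(1.5)` returns `["F11", "F21"]`, which matches the files the description reads in that interval.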

FIG. 4 is a flowchart showing an example of the operation of the control unit 11 when it executes the video editing indicated by editing instructions. The control unit 11 executes the process shown in FIG. 4 upon receiving a content generation instruction from the terminal device 30. In this embodiment, the control unit 11 advances the time in fixed increments along the time axis shown in FIG. 3 and repeats VFR/CFR conversion (step S1), mixing (step S2), and compression (step S3), one fixed increment at a time, until no material remains to be edited and the determination in step S4 becomes "YES".
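The control flow of FIG. 4 can be sketched as a loop over fixed time increments. The four callables stand in for steps S1 to S4 and are hypothetical placeholders; the description does not define their interfaces, so the signatures here are assumptions chosen for illustration.

```python
from typing import Any, Callable, List, Tuple


def run_editing(step: float,
                has_material: Callable[[float], bool],
                convert: Callable[[float, float], Tuple[Any, Any]],
                mix: Callable[[Any, float, float], Any],
                compress: Callable[[Any, Any], Any]) -> List[Any]:
    """Repeat S1 to S3 one time increment at a time until S4 reports no material."""
    chunks = []
    i = 0
    while True:
        t = i * step                           # current position on the time axis
        video, audio = convert(t, step)        # S1: VFR/CFR conversion (S11 + S12)
        mixed = mix(audio, t, step)            # S2: mix with the other audio data
        chunks.append(compress(video, mixed))  # S3: compress into the output
        i += 1
        if not has_material(i * step):         # S4: "YES" when nothing remains
            break
    return chunks
```

Indexing time as `i * step` rather than accumulating `t += step` avoids floating-point drift over long timelines.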

VFR/CFR conversion (step S1) consists of video processing (step S11) and audio processing (step S12). In VFR/CFR conversion (step S1), the control unit 11 reads from the DB 20, in accordance with the editing instructions, the video file to be played back at the execution timing of the process, and executes video processing (step S11) and audio processing (step S12). For example, if step S11 is executed at a time between time t1 and time t2, the control unit 11 reads the video file F11 from the DB 20, executes video processing (step S11) on the compressed video frame data in the video file F11 that corresponds to the execution timing, and executes audio processing (step S12) on the compressed audio frame data that corresponds to the execution timing.

In video processing (step S11), if the compressed video frame data to be played back at the execution timing is VFR compressed frame data, that data is converted to uncompressed frame data at a predetermined frame rate. The predetermined frame rate may be fixed in advance in the video editing server 10, or it may be determined based on the frame rates (including VFR) of the compressed frame data included in the video files F11 to F13. For example, the maximum of the frame rates of the compressed frame data may be used as the predetermined frame rate.
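The description does not say how frames are resampled onto the fixed rate, so the following is only one possible approach, stated as an assumption: for each output tick, hold the most recent source frame whose presentation time has been reached, which duplicates or drops frames as needed. `pick_target_fps` follows the maximum-rate option mentioned above. Both function names and the timestamp representation are hypothetical.

```python
from bisect import bisect_right
from fractions import Fraction
from typing import List, Sequence


def pick_target_fps(rates: Sequence[Fraction]) -> Fraction:
    """Use the highest frame rate among the source clips as the fixed rate."""
    return max(rates)


def vfr_to_cfr_indices(pts: Sequence[Fraction], duration: Fraction,
                       fps: Fraction) -> List[int]:
    """Map each CFR output tick to the index of the VFR frame shown at that tick.

    pts must be sorted presentation times in seconds. Frame hold is an assumed
    strategy, not one taken from the description.
    """
    n_out = int(duration * fps)            # whole output frames in the clip
    indices = []
    for k in range(n_out):
        tick = Fraction(k) / fps           # exact time of output frame k
        i = bisect_right(pts, tick) - 1    # last source frame with pts <= tick
        indices.append(max(i, 0))          # before the first frame, show frame 0
    return indices
```

Exact `Fraction` arithmetic keeps the output ticks free of rounding error, which matters for rates such as 30000/1001.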

In audio processing (step S12), audio data in an uncompressed format or a lossless compression format is generated from the compressed audio frame data to be played back at the execution timing.

In mixing (step S2), the audio data to be played back at the execution timing of the process is read from the audio file specified in the editing instructions and is mixed with the audio data obtained in step S12. For example, if step S2 is executed at a time between time t1 and time t2, the control unit 11 reads the WAV file F21 from the DB 20, reads the audio data in the WAV file F21 that corresponds to the execution timing of the process, and mixes it with the audio data obtained in step S12.
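Mixing uncompressed audio amounts to adding samples that share the same time position. The sketch below assumes 16-bit signed linear PCM at a common sample rate and clips the sum to the valid range; the sample format, the clipping policy, and the name `mix_pcm16` are assumptions, since the description only states that the audio data are mixed.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(*tracks: Sequence[int]) -> List[int]:
    """Sum 16-bit PCM tracks sample by sample, clipping to the int16 range.

    Shorter tracks are treated as silent after their last sample.
    """
    length = max((len(t) for t in tracks), default=0)
    mixed = []
    for i in range(length):
        total = sum(t[i] for t in tracks if i < len(t))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

For example, `mix_pcm16([1000, 30000, -5], [2000, 10000])` yields `[3000, 32767, -5]`: the second sample is clipped and the third passes through because the second track has ended.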

In compression (step S3), the CFR uncompressed video frame data obtained in step S11 up to the execution timing of the process and the audio data obtained by the mixing in step S2 are compressed and stored in a container file such as MP4. For example, if step S3 is executed immediately after time t2, the control unit 11 generates compressed frame data based on the frame data near the end of the video obtained from the MP4 file F11 and the frame data near the beginning of the video obtained from the MP4 file F12, thereby producing compressed frame data that is concatenated on the time axis.
The above is the operation of this embodiment.

In the conventional technology, when the compressed audio frame data extracted from a video file was processed (the counterpart of the audio processing (step S12) in FIG. 4), compressed audio frame data was generated with a lossy compression algorithm such as AAC, as illustrated in FIG. 5. This degraded the sound quality of the audio data extracted from the video file.

In contrast, in this embodiment, the audio processing (step S12) in FIG. 4 generates audio data in an uncompressed format or a lossless compression format from the compressed audio frame data extracted from the video file, and that audio data is mixed with the other audio data. In the example shown in FIG. 6, audio data in FLAC format, a lossless compression format, is generated from compressed frame data in AAC format. In the example shown in FIG. 7, audio data in linear PCM format, an uncompressed format, is generated from compressed frame data in AAC format. Accordingly, in this embodiment the audio data extracted from a video file undergoes lossy compression one fewer time than in the conventional technology, which prevents degradation of the sound quality of the audio data that is mixed with the other audio data.
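The "one fewer" comparison can be made concrete by listing the encode steps that the extracted audio passes through and counting the lossy ones. In this sketch the intermediate formats follow FIGS. 5 to 7 as described above, while the final output codec is an assumption (AAC is used as a stand-in, since the description only says the mixed audio is compressed into a container such as MP4). The names are hypothetical.

```python
LOSSY = {"AAC", "MP3"}  # lossy codecs named in the description

FINAL_CODEC = "AAC"     # assumption: codec used by the final compression (step S3)


def count_lossy_encodes(encode_steps):
    """Number of lossy encodes applied to the audio after extraction."""
    return sum(1 for codec in encode_steps if codec in LOSSY)


# Encode steps after the audio is extracted from the source video file.
conventional = ["AAC", FINAL_CODEC]      # FIG. 5: lossy intermediate, then final
embodiment_fig6 = ["FLAC", FINAL_CODEC]  # FIG. 6: lossless intermediate
embodiment_fig7 = ["LPCM", FINAL_CODEC]  # FIG. 7: uncompressed intermediate
```

Under these assumptions the conventional path counts two lossy encodes and each embodiment path counts one, which is the one-step reduction described above.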

<Other embodiments>
One embodiment of this invention has been described above, but other embodiments are also conceivable. For example:

(1) In the above embodiment, the video content is generated through cooperation between the terminal device 30 and the video editing server 10, but a stand-alone device may generate the video content instead.

(2) In the above embodiment, the audio files F21 to F24 are played back in parallel with the playback of the video files F11 to F13, and the audio data obtained from each file is mixed. Instead, VFR/CFR conversion of the video files F11 to F13 may be executed sequentially to build a single video file consisting of CFR compressed frame data for the video and audio data in an uncompressed or lossless compression format, and the uncompressed or losslessly compressed audio data in that video file may then be mixed with the audio data in the audio files F21 to F24.

(3) In the above embodiment, audio data in a video file is mixed with audio data in an audio file, but the present invention may also be applied when audio data in a video file is mixed with audio data in another video file.

10 ... video editing server, 20 ... DB, 30 ... terminal device, NW ... network, 11, 31 ... control unit, 32 ... operation unit, 33 ... display unit, 14, 34 ... communication unit, 35 ... sound emitting unit, 16, 36 ... storage unit, 37 ... imaging unit, 38 ... sound collection unit.

Claims

A method for processing video files, comprising:
generating compressed frame data of fixed-frame-rate video from compressed frame data of variable-frame-rate video included in each of a plurality of video files;
when generating the compressed frame data of the fixed-frame-rate video, generating audio data in an uncompressed format or a lossless compression format from the compressed audio frame data included in each of the plurality of video files; and
mixing the audio data with other audio data.

A program that causes a computer to execute:
generating compressed frame data of fixed-frame-rate video from compressed frame data of variable-frame-rate video included in each of a plurality of video files;
when generating the compressed frame data of the fixed-frame-rate video, generating audio data in an uncompressed format or a lossless compression format from the compressed audio frame data included in each of the plurality of video files; and
mixing the audio data with other audio data.