JP2005318210A

JP2005318210A - Reproducing device for picture with sound

Info

Publication number: JP2005318210A
Application number: JP2004132969A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2004-04-28
Filing date: 2004-04-28
Publication date: 2005-11-10
Anticipated expiration: 2024-04-28
Also published as: JP4422538B2

Abstract

PROBLEM TO BE SOLVED: To provide a playback device for video with sound that can reproduce in synchronization at accurate timing even when a plurality of sound-added video files, in a format in which audio data blocks are attached to video frames in synchronization, are reproduced.

SOLUTION: The device reads a plurality of sound-added video files in which audio data blocks are attached to video frames in synchronization, extracts the video frames from each file and combines them into a composite video frame, and combines the audio data blocks corresponding to those video frames into a composite audio data block. It then displays the composite video frame and outputs the composite audio data block as sound. As a result, the videos and the audio data contained in the original files are output without any timing shift among them.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a technique for reproducing audio and video signals that is suitable for packaged video and music playback for viewing in consumer and business use with CDs, DVDs and the like, and for environmental video and environmental music distributed for commercial purposes by broadcasters, operators of public facilities and the like.

Techniques for displaying a plurality of videos on the same screen have long been in use. They are common in surveillance cameras, multi-angle video, multiplexed television broadcasting and the like (see, for example, Patent Documents 1 to 3). However, these conventional techniques are intended to save the space needed for multiple monitors by letting a single monitor show several videos at once. They do not play back the multiple pictures in synchronization, and the audio signal is either not reproduced at all or reproduced for only one of the channels.
Patent Documents 1 to 3: JP-A-6-6804, JP-A-8-125973, JP-A-11-341443

Meanwhile, advances in computing have made it possible to handle video signals on general-purpose personal computers. In recent years, file formats that integrate video data and audio data have been developed, so that video and sound can both be reproduced by playing a single file.

However, when video signals in such a file format are combined and reproduced, the correct sound cannot be obtained unless the reproduction timing of the audio signals is matched accurately. In general-purpose video file formats, each video frame of the video data is recorded in association with an audio data block, so matching the reproduction timing of the audio data requires matching the reproduction timing of the video data just as accurately. Video data, however, is large. If the CPU on the processing side is not powerful enough, it cannot keep up, the video reproduction timing drifts, and the audio reproduction timing drifts with it.

An object of the present invention is therefore to provide a playback device for video with sound that can reproduce in synchronization at accurate timing even when a plurality of sound-added video files, in a format in which audio data blocks are attached to video frames in synchronization, are reproduced.

To solve the above problem, the playback device for video with sound according to the present invention comprises: sound-added video file storage means that stores a plurality of sound-added video files, each having a plurality of video frames with an audio data block attached to and associated with each video frame; file selection means for selecting, from the sound-added video file storage means, a plurality of sound-added video files to be combined and reproduced; video frame synthesizing means for extracting the corresponding video frame from each of the selected sound-added video files and combining them into one composite video frame; audio data block synthesizing means for extracting, from each of the selected sound-added video files, the audio data block attached to the corresponding video frame and combining them into one composite audio data block; video frame display means for writing the composite video frame into the display memory so that it is shown on screen; sound output means for writing the composite audio data block to a sound device so that it is reproduced as sound; and reproduction timing control means for controlling the video frame synthesizing means, the audio data block synthesizing means, the video frame display means, and the sound output means so that their processing is executed sequentially, at predetermined time intervals, for all the video frames stored in the selected sound-added video files.

According to the present invention, when a plurality of sound-added video files are reproduced simultaneously, video frames are extracted from the files and combined into a composite video frame, the audio data blocks corresponding to those video frames are combined into a composite audio data block, and the composite video frame is displayed while the composite audio data block is reproduced. Synchronized reproduction at accurate timing is therefore possible even when the CPU of the playback equipment has low processing power.

(1. Structure of the sound-added video file)
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. First, the structure of the sound-added moving image file that the playback device according to the present invention reproduces will be described. FIG. 1 schematically shows the structure of a general-purpose sound-added video file. In FIG. 1, V denotes a video frame, A denotes an audio data block, and F1 to FK denote the frame numbers corresponding to the video frames. The general-purpose format uses 30 frames per second, so 3 minutes of moving image data, for example, consists of 5400 frames. The video frames V are normally compressed. Depending on the compression method, a still image can in some cases be restored from a single video frame V, and in other cases cannot be restored without using other video frames V. The audio data is divided frame by frame, that is, in units of 1/30 second, and each unit is recorded as an audio data block A. For example, when a stereo audio signal is sampled at 48 kHz, one audio data block holds the 3200 samples that correspond to 1/30 second (1600 samples for each of the two channels).
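The frame and sample counts quoted above follow from simple arithmetic. The sketch below reproduces them; the function names are hypothetical and chosen only for illustration, and the check that the sampling rate divides evenly by the frame rate is an assumption, not something the description states.

```python
from fractions import Fraction


def frames_in_clip(duration_seconds: int, fps: int = 30) -> int:
    """Number of video frames in a clip of the given length."""
    return duration_seconds * fps


def samples_per_block(sample_rate_hz: int, channels: int, fps: int = 30) -> int:
    """Number of samples (all channels together) in one audio data block."""
    per_channel = Fraction(sample_rate_hz, fps)
    if per_channel.denominator != 1:
        # Assumption: this sketch only handles rates that divide evenly.
        raise ValueError("sample rate is not a whole multiple of the frame rate")
    return int(per_channel) * channels


three_minutes = frames_in_clip(3 * 60)        # 180 s at 30 frames per second
stereo_block = samples_per_block(48_000, 2)   # 1600 per channel, two channels
print(three_minutes, stereo_block)
```

With the values from the text, this gives 5400 frames for a 3-minute clip and 3200 samples per block for 48 kHz stereo.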

A sound-added video file such as the one shown in FIG. 1 can be created by known methods. Video data and audio data that are already associated with each other, as obtained by shooting with a video camera or the like, may be converted into the sound-added video file format, or video data and audio data obtained separately may be associated with each other and recorded. In this embodiment, the files are created by associating and recording separately obtained video data and audio data.

(2. Creation of the sound-added video files)
Next, the creation of the sound-added video files in this embodiment will be described. The description assumes that the system is used as a device that combines and reproduces audio data recorded as material in sound-added video files. The number of sound-added video files to be combined can be changed as appropriate; this embodiment describes the case where up to five can be selected. In this case the audio data is assigned to five tracks for reproduction, and if, for example, five pieces of music are to be selectable for each track, 25 pieces of audio data are needed in total. First, therefore, analog audio data obtained by recording or the like is digitized to obtain 25 pieces of digital audio data. The analog audio signal is digitized with the conventional, general PCM method. Specifically, the analog audio signal is sampled at a predetermined sampling frequency, and the amplitude is converted into digital data using a predetermined number of quantization bits. The digitized audio data is a time series of samples whose values depend on the number of quantization bits. For example, with a sampling frequency of 48 kHz and 16 quantization bits, one second of analog audio signal is converted into digital audio data consisting of 48000 samples with values from -32768 to 32767.
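As a rough illustration of the PCM step described above, the following sketch samples a test tone at 48 kHz and quantizes it to 16 bits. The sine-wave source, the scaling by 32767, and the clamping are assumptions made for the example; the description only specifies the sampling frequency, the bit depth, and the resulting value range.

```python
import math
from typing import List


def quantize_16bit(amplitude: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 16-bit sample value."""
    scaled = round(amplitude * 32767)
    # Clamp so that out-of-range input cannot leave the 16-bit range.
    return max(-32768, min(32767, scaled))


def sample_tone(freq_hz: float, duration_s: float, sample_rate_hz: int = 48_000) -> List[int]:
    """Sample a sine tone (a stand-in for the analog signal) and quantize it."""
    count = int(duration_s * sample_rate_hz)
    return [
        quantize_16bit(math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
        for i in range(count)
    ]


samples = sample_tone(440.0, 1.0)
print(len(samples), min(samples), max(samples))
```

One second of input yields 48000 samples, each within the range -32768 to 32767 given in the text.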

To combine several pieces of audio data and reproduce them as a single piece of reproduced audio data, the pieces to be combined must be processed so that their reproduction times are identical. Taking one piece of audio data as the reference, the samples of the other pieces (the values quantized with the predetermined number of bits at each instant) are adjusted so that they are synchronized, both temporally and musically, with the reference material. In this embodiment, each piece of audio data is also created as a separate part, such as melody, chords, or rhythm, so that the user can freely change the musical arrangement at playback. Each piece of audio data is an analog audio signal converted into digital data by PCM or a similar method, as described above.

The audio data assigned to the five tracks will now be described. FIG. 2 schematically shows the signal waveform of the audio data of each track. In the example of FIG. 2, each piece of audio data is stereo audio data with two channels, left and right (L and R). To simplify the explanation, FIG. 2 draws the portions where the signal amplitude is above a certain level, that is, the non-silent portions, as waveforms of uniform amplitude, and draws the silent portions with no waveform. When the music of several tracks is combined and reproduced, and the music for each track can be chosen from several candidates, each piece of audio data must be created according to predetermined rules so that the combined music is acceptable whatever combination is chosen. Each track is therefore arranged so that, in principle, the non-silent and silent portions fall at the same positions in time whichever piece of music is selected. For example, the five pieces of audio data prepared for track 1 have, in principle, non-silent (sounding) and silent portions at the same positions. To avoid a lack of musical variation, however, the lengths of the non-silent and silent portions may be varied somewhat, as long as this does not conflict with musical rules.

When audio data with the waveforms shown in FIG. 2 is combined and reproduced, the sound of tracks 1 and 5 is heard first. Next, track 5 falls silent and track 3 is heard. Then tracks 1 and 3 fall silent and tracks 2 and 4 are heard. Then tracks 2 and 4 fall silent and tracks 1 and 3 are heard. Then track 3 falls silent and track 5 is heard. Finally, tracks 1 and 5 fall silent.

The video data, on the other hand, is shot to match the content of the sound recorded in the audio data, and the captured data is compression-encoded by a predetermined method. For example, if each track carries audio data of a musical instrument representative of a particular country, footage of that country's scenery is shot as the video data. The separately obtained video data and audio data are then integrated into one sound-added video file by recording, for each frame of the video data, the audio data block that covers the corresponding time. For example, when the video data is 30 frames per second and the audio data is sampled at 48 kHz in stereo, 3200 samples are recorded as one audio data block in association with one video frame. The result is recorded in the form shown in FIG. 1.

In this embodiment, data is additionally processed and recorded for the menu screen. Specifically, the first 10 seconds are taken from each of the 25 sound-added video files, and a menu video file and demo playback audio files are created from them. For the menu video file, the 10 seconds of video data taken from the 25 sound-added video files are combined frame by frame, 25 video frames at a time, to produce menu composite frames. This yields a menu video file consisting of 300 menu composite frames. A demo playback audio file is obtained by recording consecutively the 10 seconds of audio data blocks taken from a sound-added video file. When a background is to be used for combined playback, a background video file containing video data for the background is prepared separately. Structurally, the background video file is an ordinary video file at 30 frames per second.

(3. System configuration)
FIG. 3 is a system configuration diagram showing an embodiment of the video signal playback device according to the present invention. In FIG. 3, 10 is the sound-added video file storage means, 11 is the menu video file storage means, 12 is the demo playback audio file storage means, 13 is the background video file storage means, 20 is the file selection means, 30 is the video frame synthesizing means, 40 is the audio data block synthesizing means, 50 is the reproduction timing control means, 60 is the video frame display means, and 70 is the sound output means.

In FIG. 3, the sound-added video file storage means 10 is a storage device that stores the sound-added video files. The menu video file storage means 11 is a storage device that holds the menu video file. The demo playback audio file storage means 12 is a storage device that holds the demo playback audio files. The background video file storage means 13 is a storage device that holds the background video file. The video frame synthesizing means 30 extracts video frames from the sound-added video files and combines them. The audio data block synthesizing means 40 extracts audio data blocks from the sound-added video files and combines them. The reproduction timing control means 50 controls the reproduction timing of the combined video frames and audio data blocks. The video frame display means 60 writes the combined video frame into the display memory, at the timing indicated by the reproduction timing control means 50, so that it is shown on screen. The sound output means 70 outputs the combined audio data block as sound at the timing indicated by the reproduction timing control means 50. Specifically, it writes the composite audio data block to a sound device installed in the computer so that it is reproduced as sound. In practice, the system shown in FIG. 3 is realized by hardware such as a computer and its peripherals, together with dedicated software installed on the computer.

(4. Processing operation)
Next, the processing operation of the system shown in FIG. 3 will be described. When the system is started, a menu screen such as that shown in FIG. 4 is displayed. In FIG. 4, E is the video display area, which is divided into 25 video sections, E11 to E55, each showing a video. C1 to C5 are cursors that indicate which video section is selected, one cursor per row. In other words, one video can be selected in each row, and in the initial state the leftmost video section of each row is selected, as shown in FIG. 4. Although 25 videos appear to be displayed on this menu screen, the system is merely playing the menu video file; a single video is in fact being shown on the display at 30 frames per second. Because each frame was originally composed from 25 different video frames, the viewer sees different content in each video section. Specifically, this display is performed by the video frame display means 60, which reads the menu video file from the menu video file storage means 11 and plays it.

On this menu screen, the user selects the sound-added video files to be combined and reproduced by clicking the displayed video sections. The user selects one video section in each row, so five video sections are selected in total. When the user clicks a different video section, the cursor moves onto the clicked video, and the demo playback audio file corresponding to that video section is read from the demo playback audio file storage means 12 and reproduced by the sound output means 70. The user can thus hear, on the spot, the sound that corresponds to the clicked video section. As noted above, each demo playback audio file is about 10 seconds long and lets the user check the content, such as the performance, that is to be combined and reproduced. In this way the user selects one video section in each row. FIG. 5 shows the state of the screen after the user has made the selections. In the example of FIG. 5, video section E12 is selected in the first row, E23 in the second row, E35 in the third row, E41 in the fourth row, and E53 in the fifth row.

The structure of this menu screen is as follows. The menu screen consists of three layers, shown in FIG. 6: a group of material selection buttons, the menu video frame, and an overlay window for the cursors. The menu video frame is laid over the material selection buttons, and the cursor overlay window is laid over that, which produces the display in the video display area shown in FIGS. 4 and 5. When the user clicks on a video section, the material selection button beneath it responds, and the cursor for the corresponding row in the cursor layer moves. For example, when the user clicks the video section in the third row and third column of the menu screen, as shown in FIG. 6(d), the cursor in the third row moves from the first column to the third column, as shown in FIG. 6(e). At the same time, the demo playback audio file corresponding to the selected video section is reproduced. When the play button is clicked with five video sections selected, the sound-added video files corresponding to the selected video sections are combined and reproduced.
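The selection behaviour described above (one cursor per row, initially on the leftmost section, moving to whichever section in that row is clicked) can be modelled with very little state. The sketch below is a hypothetical model for illustration only: the class name `MenuSelection` and its methods are assumptions, and the layered buttons, overlay window, and demo audio playback are not modelled.

```python
from typing import List


class MenuSelection:
    """One cursor per row; each cursor remembers the selected column."""

    def __init__(self, rows: int = 5, cols: int = 5) -> None:
        self.rows = rows
        self.cols = cols
        # Initial state: the leftmost section of every row is selected.
        self.cursor = [1] * rows

    def click(self, row: int, col: int) -> str:
        """Move the cursor of `row` to `col` and return the section label."""
        if not (1 <= row <= self.rows and 1 <= col <= self.cols):
            raise ValueError("click is outside the video display area")
        self.cursor[row - 1] = col
        return f"E{row}{col}"

    def selected(self) -> List[str]:
        """Labels of the selected sections, one per row."""
        return [f"E{row}{col}" for row, col in enumerate(self.cursor, start=1)]


menu = MenuSelection()
initial = menu.selected()
for row, col in [(1, 2), (2, 3), (3, 5), (4, 1), (5, 3)]:
    menu.click(row, col)
print(initial)
print(menu.selected())
```

The click sequence in the usage lines reproduces the selection shown in FIG. 5 (E12, E23, E35, E41, E53), starting from the initial state of FIG. 4.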

The combined reproduction of the selected sound-added video files will now be described. When combined playback is requested, the video frame synthesizing means 30 extracts a background video frame from the background video file stored in the background video file storage means 13, extracts the first video frame from each of the five selected sound-added video files, and overlays these on the background video frame to produce a composite video frame. If the video frames are compressed, each is decoded after extraction and then combined into the composite video frame. The composite video frame is displayed on screen by the video frame display means 60. FIG. 7 shows the resulting composite video display screen, in which the selected videos are combined and displayed. In doing so, the video frame synthesizing means 30 applies a predetermined pixel address offset to the video frame of each selected sound-added video file when combining them into one composite video frame, so that the five selected video frames do not overlap one another.
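The compositing step above amounts to copying each decoded frame into a copy of the background frame at its own pixel offset. The following is a minimal sketch under stated assumptions: frames are represented as nested lists of integer pixel values rather than real image buffers, the helper names `paste` and `compose_frame` are hypothetical, and the bounds checks are an addition not mentioned in the description.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # rows of pixel values; a stand-in for decoded image data


def paste(dest: Frame, src: Frame, x_off: int, y_off: int) -> None:
    """Copy src into dest with its top-left corner at (x_off, y_off)."""
    if x_off < 0 or y_off < 0 or y_off + len(src) > len(dest):
        raise ValueError("tile does not fit vertically")
    for y, row in enumerate(src):
        if x_off + len(row) > len(dest[y_off + y]):
            raise ValueError("tile does not fit horizontally")
        dest[y_off + y][x_off:x_off + len(row)] = row


def compose_frame(
    background: Frame,
    tiles: Sequence[Frame],
    offsets: Sequence[Tuple[int, int]],
) -> Frame:
    """Overlay each tile on a copy of the background at its own offset."""
    out = [row[:] for row in background]
    for tile, (x_off, y_off) in zip(tiles, offsets):
        paste(out, tile, x_off, y_off)
    return out


background = [[0] * 6 for _ in range(2)]
tiles = [[[1, 1], [1, 1]], [[2, 2], [2, 2]]]
composite = compose_frame(background, tiles, [(0, 0), (3, 0)])
for row in composite:
    print(row)
```

Choosing offsets that are at least one tile width or height apart keeps the tiles from overlapping, which is the condition the description places on the five selected frames.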

Meanwhile, the audio data block synthesizing means 40 extracts, from each of the five selected sound-added video files, the audio data block corresponding to the first video frame, and combines them. Specifically, it creates one composite audio data block by summing, across the audio data blocks attached to the video frames of the sound-added video files, the values that correspond to the same instant. The sound output means 70 then reproduces the composite audio data block as sound as instructed by the reproduction timing control means 50. The processing by the video frame synthesizing means 30, the video frame display means 60, the audio data block synthesizing means 40, and the sound output means 70 is repeated under the direction of the reproduction timing control means 50 until every frame of the sound-added video files has been processed. As a result, the composite video display screen shown in FIG. 7 shows the changing composite video, and the corresponding audio data is reproduced as sound. The reproduction timing control means 50 controls the system so that video frame combining and display and audio data block combining and output are executed at intervals of 1/30 second, matching the video frame interval. Because the videos of the selected sound-added video files are combined before being displayed, no one file's display processing lags behind that of another, even when the CPU has low processing power. The audio data blocks recorded in association with the frames of each sound-added video file are therefore also reproduced simultaneously, with no shift between files. This is particularly effective when each sound-added video file contains the performance of one of the instruments that make up an ensemble.
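The per-frame loop described above can be sketched as follows. This is a simplified illustration, not the patented implementation: each file is modelled as a list of (video frame, audio block) pairs, `show` and `emit` are hypothetical callbacks standing in for the display and the sound device, clamping the sum to the 16-bit range is an assumption (the description only says the values are summed), and pacing against a monotonic clock is one possible way to realize the 1/30 second interval.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple

AudioBlock = List[int]
FrameRecord = Tuple[Any, AudioBlock]  # (video frame, attached audio data block)


def mix_blocks(blocks: Sequence[AudioBlock]) -> AudioBlock:
    """Sum the samples that share the same instant across all blocks."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("audio data blocks must have the same length")
    # Assumption: clamp to signed 16 bits so the sum cannot overflow.
    return [max(-32768, min(32767, sum(group))) for group in zip(*blocks)]


def play(
    files: Sequence[Sequence[FrameRecord]],
    show: Callable[[List[Any]], None],
    emit: Callable[[AudioBlock], None],
    fps: int = 30,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Step through all files in lockstep, one frame index at a time."""
    start = clock()
    for index, records in enumerate(zip(*files)):
        frames = [frame for frame, _ in records]
        mixed = mix_blocks([block for _, block in records])
        # Wait until the scheduled time of this frame index.
        delay = start + index / fps - clock()
        if delay > 0:
            sleep(delay)
        show(frames)   # a real system would composite and display here
        emit(mixed)    # a real system would write to the sound device here


shown: List[List[Any]] = []
heard: List[AudioBlock] = []
file_a = [("a0", [1, 2]), ("a1", [3, 4])]
file_b = [("b0", [10, 20]), ("b1", [30, 40])]
play([file_a, file_b], shown.append, heard.append)
print(shown, heard)
```

Because video frames and audio blocks of all files are taken at the same frame index in a single loop, no file can run ahead of or behind another, which is the property the description relies on.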

In addition, because this system processes sound-added video files in which audio data blocks are recorded in association with video frames, the user only has to prepare sound-added video files in a format commonly handled by computers and store them in the sound-added video file storage means 10 to use them with the system. Simply by preparing sound-added video files as material and replacing, as needed, the files to be combined and reproduced, the user can vary the video and music content that is played, and can also provide environmental video and background music that change over a long period of time.

A preferred embodiment of the present invention has been described above, but the present invention is not limited to that embodiment, and various modifications are possible. For example, in the above embodiment the selected videos are combined together with a background video for display, but the background video may be omitted. Also, in the above embodiment the menu screen for selecting sound-added video files is displayed by preparing in advance a menu video file that combines the videos and then playing it, but the individual video frames may instead be displayed as they are. In that case, taking the example above, the 25 videos are processed and displayed independently, so depending on the processing power of the CPU, the videos processed last may not be ready in time and may lag behind those processed first. This is acceptable, however, because during menu display the demo playback audio is reproduced independently, without being synchronized with the video frames, so a lag between the displayed videos causes no problem.

In the above embodiment, the sound data is not compressed, but compressed sound data may instead be recorded in the sound-added video file in association with each video frame. In this case, the sound output means 70 needs a function to decode and reproduce the data in accordance with the compression encoding method. However, because sound data is much smaller in volume than video data, its volume is not much of a concern in this system, which handles a large number of video frames.
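As an illustration of the compressed-sound variation, the following minimal Python sketch decodes the sound data blocks attached to corresponding video frames and then sums them sample by sample into one composite block. Several points are assumptions made only for this sketch and are not specified in the description: zlib stands in for the unspecified compression encoding method, samples are treated as 16-bit signed PCM, the sum is clipped to the 16-bit range, and the function names `decode_block` and `mix_blocks` are hypothetical.

```python
import zlib
from array import array


def decode_block(compressed: bytes) -> array:
    """Decode one compressed sound data block into 16-bit PCM samples.

    zlib is only a placeholder codec for this sketch; the description
    does not name a compression encoding method.
    """
    samples = array("h")
    samples.frombytes(zlib.decompress(compressed))
    return samples


def mix_blocks(compressed_blocks: list[bytes]) -> array:
    """Decode the blocks for one frame position and sum them sample by sample."""
    decoded = [decode_block(block) for block in compressed_blocks]
    length = min(len(d) for d in decoded)
    mixed = array("h", [0] * length)
    for i in range(length):
        total = sum(d[i] for d in decoded)
        # Clipping to the 16-bit range is an assumption of this sketch.
        mixed[i] = max(-32768, min(32767, total))
    return mixed


if __name__ == "__main__":
    block_a = zlib.compress(array("h", [1000, -2000, 30000]).tobytes())
    block_b = zlib.compress(array("h", [500, 500, 10000]).tobytes())
    print(list(mix_blocks([block_a, block_b])))  # [1500, -1500, 32767]
```

Decoding happens before the summation here because compressed blocks cannot in general be added directly; where exactly the decoding step sits in a real implementation is left open by this sketch.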

In the above embodiment, when the video frames are synthesized and reproduced, they undergo no processing other than synthesis and are displayed side by side as they are, as shown in FIG. 7. However, each video may also be processed, for example by masking. In this case, mask data identifying, for each pixel, whether or not to overwrite is attached to the video frames of the sound-added video files. Then, when synthesizing the composite video frame, the video frame synthesizing means 30 uses the mask data to exclude some pixels of the video frames of each selected sound-added video file from the composite video frame.
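The masked synthesis described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the video frame synthesizing means 30: frames are modeled as nested lists of integer pixel values, a mask value of 1 means "overwrite" and 0 means "leave the existing pixel", and the function name `composite_with_mask` and its offset parameters are hypothetical.

```python
Frame = list[list[int]]  # rows of pixel values (representation assumed for this sketch)


def composite_with_mask(canvas: Frame, frame: Frame, mask: Frame,
                        x_off: int, y_off: int) -> Frame:
    """Write `frame` into `canvas` at the given pixel offset.

    Only pixels whose mask value is truthy are written; the others keep
    whatever the canvas already holds (for example, a background frame).
    """
    for y, (row, mask_row) in enumerate(zip(frame, mask)):
        for x, (pixel, overwrite) in enumerate(zip(row, mask_row)):
            if overwrite:
                canvas[y_off + y][x_off + x] = pixel
    return canvas


if __name__ == "__main__":
    canvas = [[0] * 4 for _ in range(2)]   # 4x2 composite frame, all background
    frame = [[5, 6], [7, 8]]               # 2x2 video frame
    mask = [[1, 0], [1, 1]]                # the top-right pixel is not overwritten
    print(composite_with_mask(canvas, frame, mask, x_off=2, y_off=0))
    # [[0, 0, 5, 0], [0, 0, 7, 8]]
```

Calling the function once per selected file, each with its own offset, would place the frames side by side while letting masked pixels show the background through.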

FIG. 1 is a diagram showing the structure of a sound-added video file to be synthesized and reproduced in the present invention.
FIG. 2 is a diagram showing the signal waveform of each sound signal to be synthesized.
FIG. 3 is a functional block diagram showing one embodiment of the sound-added video reproduction system according to the present invention.
FIG. 4 is a diagram showing the initial state of the menu screen used in this system to select the files to be synthesized and reproduced.
FIG. 5 is a diagram showing the menu screen after the user has selected the files to be synthesized and reproduced.
FIG. 6 is a diagram showing the structure of the layers that make up the video display area of the menu screen.
FIG. 7 is a diagram showing the selected videos synthesized and displayed.

Explanation of symbols

10 ... Sound-added video file storage means
11 ... Menu video file storage means
12 ... Demo-reproduction sound file storage means
13 ... Background video file storage means
20 ... File selection means
30 ... Video frame synthesizing means
40 ... Sound data block synthesizing means
50 ... Reproduction timing control means
60 ... Video frame display means
70 ... Sound output means

Claims

A sound-added video reproducing apparatus comprising:
sound-added video file storage means for storing a plurality of sound-added video files, each having a plurality of video frames and a sound data block associated with each video frame;
file selection means for selecting, from the sound-added video file storage means, a plurality of sound-added video files to be synthesized and reproduced;
video frame synthesizing means for extracting corresponding video frames from the selected plurality of sound-added video files and synthesizing them into one composite video frame;
sound data block synthesizing means for extracting, from the selected plurality of sound-added video files, the sound data blocks attached to the corresponding video frames and synthesizing them into one composite sound data block;
video frame display means for writing the composite video frame to a display memory and displaying it on a screen;
sound output means for writing the composite sound data block to a sound device and reproducing it as sound; and
reproduction timing control means for controlling the processes of the video frame synthesizing means, the sound data block synthesizing means, the video frame display means, and the sound output means so that they are executed sequentially at time intervals for all video frames stored in the selected sound-added video files.

The sound-added video reproducing apparatus according to claim 1, wherein the file selection means writes to the display memory, and displays on the screen, a menu composite frame created by combining video frames extracted from predetermined sound-added video files among the sound-added video files stored in the sound-added video file storage means, and selects the sound-added video files to be synthesized and reproduced in response to selection of video sections on the screen.

The sound-added video reproducing apparatus according to claim 2, having a function of reproducing, from demo-reproduction sound files prepared in advance by extracting only the sound data from the sound-added video files, the demo-reproduction sound file corresponding to the video section selected on the screen.

The sound-added video reproducing apparatus according to claim 1, wherein the image size of each video frame of the selected sound-added video files and the number of video frames to be combined are set so that the combined result fits in the display memory, and the video frame synthesizing means offsets the video frame of each selected sound-added video file by a predetermined pixel address and synthesizes the video frames into one composite video frame.

The sound-added video reproducing apparatus according to claim 1, wherein, when synthesizing the composite video frame, the video frame synthesizing means takes in, as an initial image, a background video frame stored in a background video file, and then sequentially superimposes the video frames of the selected sound-added video files on it to synthesize one composite video frame.

The sound-added video reproducing apparatus according to claim 1, wherein mask data identifying, for each pixel, whether or not to overwrite is attached to the video frames of the selected sound-added video files, and the video frame synthesizing means, when synthesizing the composite video frame, performs control based on the mask data so that some pixels of the video frames of the selected sound-added video files are not combined into the composite video frame.

The sound-added video reproducing apparatus according to claim 1, wherein the sound data block synthesizing means calculates, for the sound data blocks attached to the video frames of the selected plurality of sound-added video files, the sum of the corresponding values of the sound data blocks, and thereby creates the composite sound data block.

The sound-added video reproducing apparatus according to claim 1, wherein the video frames of the sound-added video files stored in the sound-added video file storage means are recorded in a state compressed and encoded by a predetermined method, and the video frame synthesizing means extracts the video frames from the selected sound-added video files, decodes them, and then synthesizes them into the composite video frame.

The sound-added video reproducing apparatus according to claim 1, wherein the sound data blocks recorded in the sound-added video files stored in the sound-added video file storage means are recorded in a state compressed and encoded by a predetermined method, and the sound data block synthesizing means extracts the sound data blocks attached to the video frames of the selected plurality of sound-added video files, decodes them, and then creates one composite sound data block.

A program for causing a computer to function as the sound-added video reproducing apparatus according to claim 1.