JP2000078531A

JP2000078531A - Method and system for editing audio data

Info

Publication number: JP2000078531A
Application number: JP11599199A
Authority: JP
Inventors: Eriko Koda; 恵理子幸田; Takashi Kudo; 敬工藤
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1998-04-28
Filing date: 1999-04-23
Publication date: 2000-03-14

Abstract

PROBLEM TO BE SOLVED: To adjust the playback time length of audio data and eliminate a synchronization deviation between video data and audio data by generating dummy audio data composed of dummy data that is ignored at the time of decoding. SOLUTION: The data part 34 of the dummy audio data consists of Error Check 31, Audio Data 32, and Ancillary Data 33, whose sizes differ according to layer and sampling frequency. Audio Data 32 is variable-length data, and data '0' is stored in Audio Data 32 when the size of the audio data is less than the size of the AAU. By configuring the dummy audio data as an AAU header followed by a data part in which '0' is stored, audio data that is consequently played back as silence is generated without compression processing.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an audio data editing process and an editing apparatus.

[0002]

[Description of the Related Art] In recent years, the improved compression ratio achieved by the Moving Picture Experts Group (MPEG) international standard for video compression, together with the falling price of secondary storage devices, has made it possible to handle video on home computers.

[0003] MPEG is an international standard for moving picture compression established by the ISO. After the first standard, MPEG1, was published, a compression standard for broadcasting called MPEG2 was established. MPEG1 plays back images transferred at a rate of about 1.5 Mbps at a resolution of about 352 × 240 pixels, at about 30 frames per second (NTSC) or about 24 frames per second (PAL). Decoded MPEG1 data is widely known to have image quality equivalent to VHS. In contrast, MPEG2 plays back images of about 720 × 480 pixels at a transfer rate of about 4.0 to 8.0 Mbps. Compared with MPEG1, the image quality of MPEG2 is said to be on par with LD.

[0004] Normally, MPEG data is generated by compressing (encoding) an analog moving image, input from a camera, a capture board, or the like, into MPEG format. The captured MPEG data can be played back on a PC on which an MPEG decoder (software or hardware) is installed.

[0005] MPEG data forms an MPEG system stream by multiplexing an MPEG video stream, which is compressed video data, and an MPEG audio stream, which is compressed audio data. What is usually called MPEG data is the MPEG system stream, but an MPEG video stream or an MPEG audio stream alone can also be played back as MPEG data by a software decoder or the like.

[0006] In MPEG, the picture rate (the number of frames used per second) is normally 30, in which case a 900-frame video has a playback time length of 30 seconds. Thus, when 30 frames are used per second, the playback time length per frame is about 33 ms. MPEG audio data, by contrast, comes in three types, Layer 1, Layer 2, and Layer 3, each with sampling frequencies of 32 kHz, 44.1 kHz, and 48 kHz. The Audio Access Unit (AAU), which is the compression unit of MPEG audio data, comprises 384 samples in Layer 1 and 1152 samples in Layer 2 and Layer 3.
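The figures in this paragraph can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the disclosure; the function names and the use of exact fractions are assumptions made for the example. It shows that the duration of one video frame and the duration of one AAU are generally different, which is why video and audio lengths rarely match exactly.

```python
from fractions import Fraction

# Samples per Audio Access Unit (AAU) for each layer, as stated in the text.
SAMPLES_PER_AAU = {1: 384, 2: 1152, 3: 1152}


def frame_duration(picture_rate: int) -> Fraction:
    """Playback time of one video frame in seconds."""
    return Fraction(1, picture_rate)


def aau_duration(layer: int, sampling_rate_hz: int) -> Fraction:
    """Playback time of one AAU in seconds."""
    return Fraction(SAMPLES_PER_AAU[layer], sampling_rate_hz)


if __name__ == "__main__":
    print("frame at 30 fps [ms]:", float(frame_duration(30) * 1000))
    for layer in (1, 2, 3):
        for rate in (32000, 44100, 48000):
            ms = float(aau_duration(layer, rate) * 1000)
            print(f"Layer {layer}, {rate} Hz: {ms:.3f} ms per AAU")
```

Exact fractions are used here only to avoid floating-point noise when durations are later compared.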

[0007] As with ordinary uncompressed data, captured MPEG data is often not used as it is: parts may be deleted, or pieces of data may be pasted together for effect. When video data and audio data are pasted together, the two must be synchronized. However, their lengths are often not the same, which causes a deviation between the video data and the audio data.

[0008] This problem will be described with reference to FIG. 2. First, a 20-second BGM (audio) B is pasted onto video A. When a 900-frame (30-second) video C and a 30-second audio D are then pasted after them, the start of video C and the start of audio D are out of alignment. If video E and audio F, which have the same playback time length, are further pasted after video C and audio D, this synchronization deviation persists.

[0009] One means of resolving this synchronization deviation is the technique described in Japanese Patent Application Laid-Open No. 9-37204. This technique separates compressed moving image data and audio data and compares their playback times. If the playback time of the audio data is shorter than that of the moving image data, silent PCM data prepared in advance is compressed to generate silent compressed audio data for the required time, which is concatenated to the audio data. The moving image data and the audio data are then multiplexed again.

[0010]

[Problems to be Solved by the Invention] However, the above technique requires processing to compress silent PCM data. Consequently, when the silent section to be added is long, compressing the PCM data takes a long time.

[0011] In addition, while an MPEG system stream is being created with an encoder, moving image data is input continuously, but the audio data may be interrupted, for example when the sound drops out or a mute function is used. In such cases, the encoder has conventionally compressed silent audio data to create the moving image data. Here too, when the silent period is long, the compression processing takes a long time.

[0012] An object of the present invention is to generate dummy audio data that requires no compression processing.

[0013] A second object of the present invention is to adjust the playback time length of audio data by means of dummy audio data that requires no compression processing, and thereby to eliminate the synchronization deviation between video data and audio data.

[0014]

[Means for Solving the Problems] The present invention discloses an editing method and apparatus that, without performing compression, generate dummy audio data consisting of a header containing at least information indicating the start of an audio decoding unit, and dummy data that is ignored during decoding. At decoding time, this dummy audio data is not decoded and the search for the next header begins, so that a silent section continues for the time taken to search for the next data.

[0015] In one embodiment, in the operation of pasting video data and audio data together, if the playback time length of the audio data is shorter than that of the video data, a header portion extracted from the header information of the audio data and a data portion storing dummy data that the playback device ignores are combined as audio data and appended to the MPEG audio stream for the missing time length.

[0016] Furthermore, in the process of creating a single piece of moving image and audio data while capturing video data and audio data, when the audio data is silent, dummy audio data consisting of a compressed audio data header and dummy data is generated and combined with the video data to produce the moving image and audio data.

[0017] The present invention provides a number of effects. A particularly significant one is that no audio data compression processing is required when generating dummy audio data. In addition, when audio and video are pasted together and the audio is shorter, the playback time length of the audio data can be adjusted with brief processing. Furthermore, the processing time for creating moving image and audio data in which the audio portion is silent can be shortened.

[0018]

[Embodiments of the Invention] Embodiments of the present invention will be described below in detail with reference to the drawings.

[0019] FIG. 1 is a block diagram showing a hardware configuration according to an embodiment of the present invention.

[0020] In FIG. 1, the system comprises a processing device 10 for controlling each device, a main memory 11 into which the program implementing this embodiment is loaded, a frame memory 12 for temporarily storing image data to be displayed, a display device 13 for displaying image data, a decoder 14 for decompressing image data and audio data, an encoder 15 for compressing data, an A/D converter 16 for converting analog image data and audio data into digital form, an image input device 17 for inputting analog video data, an audio input device 18 for inputting analog audio data, a secondary storage device 19 for storing decoded data and programs, and a speaker 101 serving as an audio output device.

[0021] Analog signals input from the image input device 17 and the audio input device 18 are converted into digital signals, separately for video and audio, by the A/D converter 16 and input to the encoder 15. The encoder 15 compresses these digital signals and outputs them as MPEG-format data. The MPEG data generated by the encoder 15 is stored in the secondary storage device 19 or the main memory 11. When the user requests playback, the data stored in the main memory 11 or the secondary storage device 19 is decompressed by the decoder 14. The decompressed video data is written to the frame memory 12 and displayed on the display 13. The audio data decompressed by the decoder 14 is played back through the speaker 101.

[0022] The program of this embodiment is launched by an editing engine capable of performing many editing operations. Such operations include a cut operation that cuts material from an input file or input stream for use in another file, a paste operation, a fade operation, a blend operation, a morphing (shaping) operation, a tilting operation, and an operation that pastes audio data and moving image data together. In general, an editing engine manages many editing operations, which differ according to the types of operators provided by the application requesting the editing. The editing engine, the application, and the program of this embodiment are stored in the secondary storage device 19 and loaded into the main memory 11 by a start command, and the control device 10 executes each process as an editing processing device in accordance with each command of the program.

[0023] FIG. 3 shows the data structure of the dummy audio data of the present invention. The header portion 30 contains information such as syncword, ID, layer, protection_bit, bitrate_index, sampling_frequency, padding_bit, private_bit, mode, mode_extension, copyright, original/home, and emphasis (see ISO 11172-3 for details of these fields), and is 4 bytes in size.
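As an illustration of the 4-byte header described above, the following Python sketch splits a header into the named fields and reassembles it. The bit widths follow the ISO 11172-3 audio header layout (12 + 1 + 2 + 1 + 4 + 2 + 1 + 1 + 2 + 2 + 1 + 1 + 2 = 32 bits). The function names, the dictionary representation, and the sample bytes are assumptions made for this example and do not come from the patent text.

```python
# (field name, width in bits) in transmission order; widths total 32 bits.
HEADER_FIELDS = [
    ("syncword", 12),
    ("ID", 1),
    ("layer", 2),
    ("protection_bit", 1),
    ("bitrate_index", 4),
    ("sampling_frequency", 2),
    ("padding_bit", 1),
    ("private_bit", 1),
    ("mode", 2),
    ("mode_extension", 2),
    ("copyright", 1),
    ("original_home", 1),
    ("emphasis", 2),
]


def unpack_header(header: bytes) -> dict:
    """Split a 4-byte AAU header into its named bit fields."""
    if len(header) != 4:
        raise ValueError("an AAU header is exactly 4 bytes")
    value = int.from_bytes(header, "big")
    fields = {}
    shift = 32
    for name, width in HEADER_FIELDS:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


def pack_header(fields: dict) -> bytes:
    """Reassemble the 4-byte header from its named bit fields."""
    value = 0
    for name, width in HEADER_FIELDS:
        field = fields[name]
        if not 0 <= field < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        value = (value << width) | field
    return value.to_bytes(4, "big")


if __name__ == "__main__":
    sample = bytes([0xFF, 0xFD, 0x90, 0x04])  # arbitrary illustrative bytes
    print(unpack_header(sample))
```

Because the dummy audio data reuses the header of the neighboring audio data unchanged, an editor would normally copy the 4 bytes as they are; unpacking is only needed to read fields such as layer and sampling_frequency.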

[0024] The data portion 34 consists of Error Check 31, Audio Data 32, and Ancillary Data (external data) 33, and its size differs depending on the layer and the sampling frequency. Audio Data 32 is variable-length data. When the audio data is smaller than the AAU, the remainder becomes Ancillary Data 33, into which arbitrary data other than MPEG audio can be inserted. In the present invention, data "0" is stored in Audio Data 32. When such data is present in the audio data, the MPEG decoder does not decode it and instead searches for the syncword of the header that marks the start of the next AAU. By configuring the dummy audio data as an AAU header followed by a data portion in which "0" is stored, audio data that is consequently played back as silence can be created without compression processing.

[0025] Next, with reference to FIG. 4, an outline is given of the processing steps for generating the dummy audio data shown in FIG. 3 when video data and audio data from input files are pasted together. In general, a paste operation starts when the application requests it in response to an instruction from the command input device 102.

[0026] First, when the video data and audio data to be synchronized are specified via the command input device 102, the editing device, in step 40, accesses the specified video data and audio data and calculates the playback time lengths of the video and audio to be synchronized.

[0027] The video playback time length Lv can be calculated by the following equation (1).

[0028]
Lv = number of picture headers / picture rate ... (1)

The playback time length La of the audio data can be calculated by the following equation (2).

[0029]
La = number of AAUs × X ... (2)

Here, X is the playback time length per AAU and is obtained by the following equation (3) according to the number of samples in each layer.

[0030]
Layer 1: X = 384 / sampling rate
Layer 2: X = 1152 / sampling rate ... (3)

Accordingly, in step 40, the video playback time length is calculated from the picture rate in the sequence header of the video data and the number of pictures within the editing range specified via the command input device 102. The audio playback time length is calculated from the layer information and sampling rate in the audio header and the number of AAUs contained in the editing range.
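Equations (1) to (3) can be sketched in Python as follows. This is a minimal illustration, not the implementation used by the editing device; the function names are hypothetical, and treating Layer 3 like Layer 2 (1152 samples) is an assumption based on the sample counts given in paragraph [0006], since equation (3) itself lists only Layers 1 and 2.

```python
# Samples per AAU: Layer 1 and Layer 2 from equation (3); Layer 3 assumed
# equal to Layer 2 on the basis of paragraph [0006].
SAMPLES_PER_AAU = {1: 384, 2: 1152, 3: 1152}


def video_length(picture_count: int, picture_rate: float) -> float:
    """Equation (1): Lv = number of pictures / picture rate, in seconds."""
    return picture_count / picture_rate


def audio_length(aau_count: int, layer: int, sampling_rate_hz: int) -> float:
    """Equations (2) and (3): La = number of AAUs x X, in seconds."""
    return aau_count * SAMPLES_PER_AAU[layer] / sampling_rate_hz


if __name__ == "__main__":
    lv = video_length(900, 30)
    la = audio_length(1250, 2, 48000)
    print("Lv =", lv, "s, La =", la, "s")
```

In practice the picture count and AAU count would come from counting headers in the selected editing range, as the paragraph above describes.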
The video playback time length is calculated based on the number of pictures in the editing range designated by 2. Also, the audio playback time length is calculated from the layer information in the audio header, the sampling rate, and the number of AAUs included in the editing range.

[0031] The number of pictures and the number of AAUs can be obtained by counting picture headers and audio sequence headers, respectively, but they can also be derived from the PTS (Presentation Time Stamp) and the TR (Temporal Reference).

[0032] Next, in steps 41 and 46, the video playback time length and the audio playback time length are compared. If the result of step 41 is Yes, that is, if the video playback time length is longer than the audio playback time length, dummy audio data must be created, so processing moves to step 42. If the result of step 41 is No, processing moves to step 46. If the result of step 46 is Yes, that is, if the audio playback time length is longer than the video playback time length, blank video data must be created, so processing proceeds to step 47. If the result of step 46 is No, the two are equal, and the video and audio are pasted together as they are in step 45.
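The branching in steps 41 and 46 can be summarized in a few lines of Python. This is only a sketch of the decision described above; the function name and the convention of returning the number of the next step are assumptions made for illustration.

```python
def next_step(video_length: float, audio_length: float) -> int:
    """Return the number of the step that follows the comparisons in FIG. 4."""
    if video_length > audio_length:
        # Step 41 is Yes: dummy audio data is needed (go to step 42).
        return 42
    if audio_length > video_length:
        # Step 46 is Yes: blank video data is needed (go to step 47).
        return 47
    # Both lengths are equal: paste video and audio as they are (step 45).
    return 45
```

For example, `next_step(30.0, 20.0)` selects step 42, the dummy audio path that the following paragraphs describe.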

[0033] In step 42, the required number of AAUs, N, is obtained. Letting Y be the difference between the video and audio playback time lengths, the number N of AAUs in the dummy audio data portion is obtained by the following equation (4).

[0034]
N = Y / X ... (4)

where Y = Lv − La. If the result is not an integer, N is obtained by rounding to the nearest integer, with halves rounded up.
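A minimal Python sketch of equation (4) follows. The function name is hypothetical, and exact fractions are an implementation choice made here so that the half-up rounding described in the text is not disturbed by floating-point error; note that Python's built-in `round` rounds halves to the nearest even number, which is why it is not used.

```python
import math
from fractions import Fraction


def dummy_aau_count(lv, la, x) -> int:
    """Equation (4): N = Y / X with Y = Lv - La, rounded half up.

    lv, la and x are durations in seconds, given as int or Fraction.
    Returns 0 when the audio is not shorter than the video.
    """
    y = Fraction(lv) - Fraction(la)
    if y <= 0:
        return 0
    return math.floor(y / Fraction(x) + Fraction(1, 2))


if __name__ == "__main__":
    # 30 s of video, 20 s of Layer 2 audio sampled at 48 kHz.
    x = Fraction(1152, 48000)
    print(dummy_aau_count(30, 20, x))
```

Because N is rounded, the appended dummy audio can differ from the exact gap by up to half an AAU duration.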

[0035] Next, in step 43, the header information for the dummy audio data is taken from the header information of the audio data to be pasted. The header information of the dummy audio data must be the same as that of the data preceding it. This information is acquired, and the dummy audio data shown in FIG. 3 is created in step 44. The number of bytes S per AAU is obtained by the following equation (5).

[0036]
Layer 1: S = audio bit rate / sampling rate × 12
Layers 2 and 3: S = audio bit rate / sampling rate × 144 ... (5)

Since the header information of one AAU is 4 bytes and the error check is 16 bytes, the number of bytes B in which 0 is stored is:

Without error check: B = S − 4
With error check: B = S − 20

One AAU of dummy audio data is created by appending this number of 0 bytes after the header portion.
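Steps 43 and 44 can be sketched in Python as follows. The code applies equation (5) and the byte counts exactly as stated in the text; readers who need byte-exact Layer 1 output should check the multiplier against ISO 11172-3, where Layer 1 frame lengths are counted in 4-byte slots. The function names are hypothetical, integer division is an assumption for bit rates that do not divide evenly, and `make_dummy_aau` covers only the case without error check, because the text does not specify what the error check bytes should contain.

```python
def aau_size_bytes(layer: int, bitrate_bps: int, sampling_rate_hz: int) -> int:
    """Equation (5) as stated in the text: bytes S per AAU."""
    factor = 12 if layer == 1 else 144
    return factor * bitrate_bps // sampling_rate_hz


def zero_byte_count(aau_size: int, has_error_check: bool) -> int:
    """B = S - 4 without error check, B = S - 20 with error check."""
    return aau_size - (20 if has_error_check else 4)


def make_dummy_aau(header: bytes, layer: int, bitrate_bps: int,
                   sampling_rate_hz: int) -> bytes:
    """One dummy AAU: the copied 4-byte header followed by B zero bytes."""
    if len(header) != 4:
        raise ValueError("an AAU header is exactly 4 bytes")
    size = aau_size_bytes(layer, bitrate_bps, sampling_rate_hz)
    return header + bytes(zero_byte_count(size, has_error_check=False))


def make_dummy_audio(header: bytes, n: int, layer: int, bitrate_bps: int,
                     sampling_rate_hz: int) -> bytes:
    """N consecutive dummy AAUs, ready to append to the audio stream."""
    return make_dummy_aau(header, layer, bitrate_bps, sampling_rate_hz) * n
```

Here `header` stands for the 4 bytes copied from the preceding AAU of the audio data being pasted, so that the dummy AAUs carry the same layer, bit rate, and sampling frequency as their neighbors.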

[0037] Finally, in step 45, the video and audio are pasted together.

[0038] By generating N units of dummy audio data in this way and pasting them in, data with no deviation between audio and video, as shown by data 22 in FIG. 2, can be created in a short time.

[0039] The process of creating the blank video shown in block 47 of FIG. 4 is described in detail in a patent application by the same applicant (Japanese Patent Application No. 9-339642, filed December 10, 1997, titled "Method and apparatus for controlling code amount of image data").

[0040] Next, another embodiment will be described. This embodiment discloses a method of creating dummy audio data having the data structure shown in FIG. 3 when, for example, analog video data and audio data are captured from the image input device 17 and the audio input device 18 in FIG. 1, respectively, to create compressed moving image and audio data.

[0041] FIG. 5 is a block diagram of the encoder 15 in this embodiment. The encoder in this embodiment comprises moving image compression means 51 for compressing video data, audio compression means 52 for compressing audio data, dummy audio data generation means 53 for generating the dummy audio data of the present invention, switching means 54, and control means 55 for controlling the audio compression means 52, the dummy audio data generation means 53, and the switching means 54.

[0042] The video data and audio data converted into digital signals by the A/D conversion circuit 16 are input to the encoder 15, and the video data is compressed by the moving image compression means 51. The audio data is input to the audio compression means 52 and the control means 55. When the output of the audio data is less than a predetermined value, the control means 55 causes the dummy audio data generation means 53 to generate dummy audio data. The dummy audio data generation means generates the header portion of ordinary compressed audio data and then generates dummy audio data containing the data portion shown in FIG. 3. At this time, the control means 55 suspends encoding in the audio compression means 52. When the output of the audio data becomes equal to or greater than the predetermined value, the control means 55 instructs the audio compression means 52 to resume processing. The control means 55 also controls the switching means 54 so that compressed audio data produced by the ordinary audio compression means 52 is output when the output of the audio data is equal to or greater than the predetermined value, and dummy audio data is output when it is below the predetermined value. The compressed video data and the compressed audio data or dummy audio data are synchronized and stored in storage means such as the secondary storage device or the main memory.
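The role of the control means 55 and the switching means 54 can be sketched as follows in Python. This is an illustrative assumption-laden sketch, not the encoder's actual design: the text does not say how the "output" of the audio data is measured, so peak absolute sample value per block is assumed here, and `compress` and `make_dummy` are hypothetical callables standing in for the audio compression means 52 and the dummy audio data generation means 53.

```python
from typing import Callable, Iterable, List, Sequence


def select_audio_output(
    blocks: Iterable[Sequence[int]],
    threshold: int,
    compress: Callable[[Sequence[int]], bytes],
    make_dummy: Callable[[], bytes],
) -> List[bytes]:
    """For each block of PCM samples, emit compressed or dummy audio.

    A block whose peak level is below `threshold` is treated as silent:
    the compressor is skipped and dummy audio data is emitted instead.
    """
    output: List[bytes] = []
    for block in blocks:
        # Assumed level measure: peak absolute sample value of the block.
        level = max((abs(sample) for sample in block), default=0)
        if level < threshold:
            output.append(make_dummy())      # switching means selects dummy
        else:
            output.append(compress(block))   # switching means selects encoder
    return output
```

Skipping the call to `compress` for silent blocks is what saves time, which corresponds to the control means suspending the audio compression means while dummy data is being generated.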

[0043] In this way, when the output of the captured audio data is at or below a certain value, the audio is judged to be silent and dummy audio data is generated. The compression of the audio data can thus be omitted, and the processing time of the overall compression can be shortened.

[0044] This embodiment describes an encoder that includes control means for controlling the dummy audio data of the present invention when data captured by the audio input means is compressed. Alternatively, the control means may operate under the direction of a processor in a host device.

[0045] Furthermore, although this embodiment describes a local architecture, the invention can of course be used in various other cases where silent data must be compressed, for example when moving image data and silent data must be compressed in order to send video only, using a mute function, to another client connected over a network.

[0046]

[Effects of the Invention] As described above, according to the present invention, silent data is created using an AAU header that matches the format of the audio data being encoded, together with dummy data. MPEG audio data that is silent when decoded can therefore be generated freely without compression processing, and the processing time for generating and editing video and audio with different playback time lengths can be shortened.

[Brief description of the drawings]

FIG. 1 is a diagram of a system configuration for realizing an embodiment of the present invention.

FIG. 2 is an example of editing data for explaining the problem addressed by the present invention and a known example.

FIG. 3 shows the data structure of the dummy audio data used in the present invention.

FIG. 4 is a flowchart showing a method of pasting video and audio together according to the present invention.

FIG. 5 is a block diagram schematically showing an encoder for realizing an embodiment of the present invention.

[Explanation of symbols]

30: header portion in one AAU of an MPEG audio stream; 31: error check portion in one AAU of an MPEG audio stream; 32: data portion in one AAU of an MPEG audio stream; 33: ancillary data portion in one AAU of an MPEG audio stream; 33: data portion in one AAU of an MPEG audio stream; 51: moving image compression means; 52: audio compression means; 53: dummy audio data generation means; 54: switching means; 55: control means

Claims

[Claims]

1. An audio data editing method comprising: a header creating step of creating a header portion that stores at least information indicating the start of an audio decoding unit and whose components have the same values as those of the header of the audio data to which dummy data is to be added; and a dummy data creating step of creating audio data composed of dummy data that is ignored during decoding.

2. The audio data editing method according to claim 1, further comprising: a step of calculating the playback time of compression-encoded audio data and the playback time of compression-encoded video data; and a step of determining the number of minimum audio decoding units from the playback time difference between the audio data and the video data, wherein the header creating step creates the header portion by extracting header information of the audio data, and the dummy data creating step creates the dummy data in an amount corresponding to the number of minimum audio decoding units.

3. The audio data editing method according to claim 1, wherein the audio decoding unit is the minimum decodable unit of audio data, the same as that of the original audio data, and the dummy data consists of bits having the value 0 in an amount corresponding to the playback time difference.

4. An audio data editing method comprising: a step of capturing audio data by audio input; a step of detecting the output of the audio data; and a step of creating, when the output of the audio data is less than a predetermined value, dummy audio data comprising at least a header portion that stores information indicating the start of an audio decoding unit and whose components have the same values as those of the audio data, and dummy data that is ignored during decoding.

5. The audio data editing method according to claim 4, further comprising: a step of capturing video data from a video input device; a step of compressing the video data; a step of issuing an instruction to start creating the dummy audio data; and a step of issuing an instruction to end creating the dummy audio data.

6. The audio data editing method according to claim 4, wherein the audio decoding unit is the minimum decodable unit of audio data, the same as that of the input audio data, and the dummy data is zero.

7. An audio data editing system comprising: a storage device that stores compression-encoded audio data; and an editing processing device that accesses the compression-encoded audio data to acquire header information and generates dummy audio data composed of a header whose components have the same values as those of the acquired header information, and dummy data that is ignored during decoding.

8. The audio data editing system according to claim 7, wherein the storage device further stores compression-encoded video data, and the editing processing device calculates the number of minimum audio decoding units corresponding to the playback time difference between the compression-encoded video data and the compression-encoded audio data, and creates dummy audio data amounting to that number of minimum audio decoding units.

9. A computer-readable recording medium storing a program comprising: a step of calculating the playback time of compression-encoded audio data and the playback time of compression-encoded video data; a step of determining the number of minimum audio decoding units corresponding to the playback time difference between the audio data and the video data; a header creating step of creating a header portion that stores at least information indicating the start of the audio decoding unit and whose components have the same values as those of the audio data; and a dummy data creating step of creating dummy audio data in the number of minimum units corresponding to the playback time difference.