JP2011013620A

JP2011013620A - Voice editing method and device

Info

Publication number: JP2011013620A
Application number: JP2009159850A
Authority: JP
Inventors: Hideki Taniguchi; 秀樹谷口
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2009-07-06
Filing date: 2009-07-06
Publication date: 2011-01-20

Abstract

PROBLEM TO BE SOLVED: When two lossily compressed audio streams are edited and joined, a difference in volume between the two streams creates a level step at the join position that produces high-frequency noise. In addition, when compressed audio data is decompressed, its volume changed, and the data compressed again, sound quality is unavoidably degraded.
SOLUTION: Depending on the codec of the input stream, only the numerical values of binary data such as the dynamic range or the scale factor are changed, which gives an effect equivalent to a pseudo fade-out and fade-in. As a result, two audio streams can be joined without noise caused by the volume difference, without the loss of sound quality caused by decompression and recompression, and without the processing load of decompression and recompression.

Description

The present invention relates to an audio editing method and apparatus that, when editing compressed audio streams in the Dolby Digital or MPEG format, change the volume at the edit position in order to join or split the streams.

Conventionally, a method of this kind has been described in, for example, Patent Document 1.
In general, when two audio files with different volumes are joined, the difference in volume level generates noise at the join position. To prevent this, the usual approach is to fade out and lower the volume before the join and to fade in and raise the volume after it.
In Patent Document 1, after recording starts, an audio level measurement period begins a time Tw before the time Te at which the recording file size reaches the recording limit. Once measurement has started, an audio fade-out lasting a predetermined period Tout is performed from the time Tb1 at which the audio level falls below a predetermined level, file F1 is closed, and file F2 is newly created. An audio fade-in lasting a predetermined period Tin is then performed on the following file F2. Images are recorded continuously across the two files F1 and F2.

Patent Document 1: JP 2008-99125 A (page 10, FIG. 3, FIG. 4, FIG. 5)

The conventional example is a method of real-time control during recording: when recording to a file, the audio is faded out before the end of the file and faded in at the start of the next.
It also assumes simple continuous playback, so it provides no benefit in cases such as video editing, where a middle portion of a file is deleted and the remaining parts are joined.

Alternatively, in other joining methods intended for video editing and the like, compressed audio data must first be decompressed and then recompressed, and with a lossy (irreversible) compressed audio codec this has the further problem that degradation of sound quality is unavoidable.

To solve the above problems, the present invention makes it possible to edit an audio stream in its compressed form, without first decompressing it.
When audio stream 1 and audio stream 2 are joined, the last few packets of audio stream 1 are not decompressed. Instead, only the numerical values of binary data such as the dynamic range or the scale factor are changed, according to the characteristics of the codec, to obtain an effect equivalent to a pseudo fade-out. Likewise, an effect equivalent to a fade-in is obtained for the first few packets of audio stream 2 without decompressing them. This prevents the noise caused by the difference in volume level when the two audio streams are joined, without the heavy processing load of decompression and compression.
Furthermore, because the audio data need not be decompressed or recompressed, sound quality does not degrade even with a lossy compressed audio codec.
As a result, a method can be provided for joining two streams without degrading sound quality or generating noise, even on a device with low processing capability.

To solve the conventional problems, the audio editing method of the present invention provides a second determination step that determines the codec type of the input stream and outputs the stream to the volume change step suited to that codec.
When the input stream is in the Dolby Digital format, a dynamic range change step is provided as the change step. It identifies the bit position of the dynamic range value from the data structure of the input stream and changes only that value so as to produce a volume change equivalent to a fade-out or fade-in.
When the input stream is in the MPEG format, a scale factor change step is provided as the change step. It identifies the bit position of the scale factor value from the data structure of the input stream and changes only that value so as to produce a volume change equivalent to a fade-out or fade-in.

With this configuration, in addition to the conventional raising and lowering of volume by changing a gain value, a volume changing means suited to the codec of the input stream can be selected, so that streams can be joined or split without generating noise, even on a device with low processing capability.

Compressed audio data has until now required decompression and compression, which in turn required means with very high processing capability. Because the present invention makes decompression and compression unnecessary, it can be implemented even on a device with low processing capability.
In addition, with lossy compressed audio data, decompressing and recompressing the data has until now degraded sound quality. The present invention does not degrade sound quality.
Furthermore, when two or more pieces of audio data are joined or played back seamlessly, the difference in volume level between them produces spurious high-frequency components, and these components are heard as noise.
The present invention achieves fade-in and fade-out without decompression and compression, so these high-frequency components are not produced and the noise is prevented.

FIG. 1: Flowchart of the audio editing method of Embodiment 1
FIG. 2: Flowchart of the audio editing method of Embodiment 2
FIG. 3: Flowchart of the audio editing method of Embodiment 3
FIG. 4: Flowchart of the audio editing method of Embodiment 4
FIG. 5: Flowchart of the audio editing method of Embodiment 5
FIG. 6: Block diagram of the audio editing apparatus of Embodiment 6

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a flowchart of the audio editing method according to Embodiment 1. First audio data and second audio data are input to stream input step 1. Examples of the first and second audio data include Dolby Digital audio data separated from video data shot with a camcorder, LPCM audio data created by ripping a CD, and MPEG-1 Layer III (hereinafter MP3) audio data obtained from the Internet. In stream input step 1, the input audio data is rearranged into the order in which it will finally be output as the edited audio stream 160, and is output as the input stream 100.
In stream analysis step 2, the input stream 100 from the stream input step is analyzed, and additional information about the stream is output as analysis information 101 together with the input stream 100. This information includes the boundaries of the smallest playback data units, called access units (hereinafter AAUs), and switching position information indicating the point at which the first audio data switches to the second audio data.
In change start position determination step 3, the analysis information 101 and the input stream 100 are input, and the input stream 100 is output unchanged to stream output step 5 until switching position information is detected in the analysis information. When a switching position is detected, only the several AAUs of audio data 1 immediately before the switching position and the several AAUs of audio data 2 immediately after it are output as the changed input stream 102 to numerical value changing step 4. After the switching position has been passed, the stream is again output to stream output step 5 until the next switching position or the end of the input stream.
In numerical value changing step 4, when the changed input stream 102 is input from change start position determination step 3, specific bits counted from the head of the AAUs of audio data 1 are changed successively to 5, 4, 3, 2, 0 to fade out, and the same bits of the AAUs of audio data 2 are changed in the reverse order, 0, 2, 3, 4, 5, to fade in. The stream with its volume changed in this way is output as the changed stream 110.
In stream output step 5, the input stream 100 from change start position determination step 3 and the changed stream 110 from numerical value changing step 4 are joined in order and output as the edited audio stream 160.
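The pass-through, change, and join flow of steps 3 to 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: each AAU is modelled as a small Python object whose hypothetical `level` field stands in for the codec-specific bits, and the names `Aau`, `FADE_OUT_LEVELS`, and `join_with_fades` are invented for the example. Only the level sequence 5, 4, 3, 2, 0 and its reverse come from the text.

```python
from dataclasses import dataclass, replace
from typing import List

# Level values from the embodiment: written to the AAUs just before the
# switching position (falling) and, reversed, to the AAUs just after it.
FADE_OUT_LEVELS = [5, 4, 3, 2, 0]


@dataclass(frozen=True)
class Aau:
    """Hypothetical stand-in for one access unit (AAU)."""
    source: int          # 1 = first audio data, 2 = second audio data
    level: int           # stand-in for the codec-specific volume bits
    payload: bytes = b""  # compressed audio, never touched here


def join_with_fades(stream: List[Aau], switch: int) -> List[Aau]:
    """Return the edited stream; `switch` is the index of the first AAU
    of the second audio data (the switching position)."""
    n = len(FADE_OUT_LEVELS)
    start = max(0, switch - n)
    end = min(len(stream), switch + n)

    out = list(stream[:start])               # passed through unchanged

    # AAUs just before the switching position: levels fall toward the join.
    tail = stream[start:switch]
    for aau, lvl in zip(tail, FADE_OUT_LEVELS[n - len(tail):]):
        out.append(replace(aau, level=lvl))

    # AAUs just after the switching position: levels rise away from the join.
    head = stream[switch:end]
    for aau, lvl in zip(head, reversed(FADE_OUT_LEVELS)):
        out.append(replace(aau, level=lvl))

    out.extend(stream[end:])                  # passed through unchanged
    return out
```

Only the `level` field of the ten AAUs around the switching position is rewritten; every payload is copied through untouched, which is the property the embodiment relies on to avoid decoding and re-encoding.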

Embodiment 2 is a version of Embodiment 1 in which LPCM data is input as the first and second audio data and the numerical value changing step changes the gain. An example is described below with reference to FIG. 2, which is a flowchart of Embodiment 2.
In gain changing step 10, when the input stream is in the LPCM format, the bit position of the gain value is identified from the data structure of the input stream, and only that value is changed so as to produce a volume change equivalent to a fade-out or fade-in. Providing a change step specialized for a particular format in this way means that the fade-in and fade-out effects are obtained simply by mechanically skipping a fixed offset from the head of the AAU and overwriting a fixed number of bits of binary data from there. The volume can therefore be controlled without using a codec, that is, a decoder or an encoder.
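The "skip a fixed offset, overwrite a fixed number of bits" operation can be sketched as a small helper. The function name `write_bits` is hypothetical, and the offset and width passed to it are placeholders, since the text does not state where the gain value sits or how wide it is.

```python
def write_bits(data: bytes, bit_offset: int, width: int, value: int) -> bytes:
    """Return a copy of `data` with the `width`-bit field that starts
    `bit_offset` bits from the head of the AAU set to `value`.

    Bits are counted MSB-first. All other bits are left unchanged.
    """
    if width <= 0 or not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    total_bits = len(data) * 8
    if bit_offset < 0 or bit_offset + width > total_bits:
        raise ValueError("field lies outside the AAU")

    as_int = int.from_bytes(data, "big")
    shift = total_bits - bit_offset - width   # distance of the field from the LSB
    mask = ((1 << width) - 1) << shift
    as_int = (as_int & ~mask) | (value << shift)
    return as_int.to_bytes(len(data), "big")
```

For example, `write_bits(aau, 12, 4, 3)` would set a 4-bit field starting 12 bits into the AAU to 3. Both numbers are illustrative only; a real change step would take them from the format's data structure.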

Embodiment 3 is a version of Embodiment 1 in which Dolby Digital data is input as the first and second audio data and the numerical value changing step changes the dynamic range. An example is described below with reference to FIG. 3, which is a flowchart of Embodiment 3.
In dynamic range changing step 11, when the input stream is in the Dolby Digital format, the bit position of the dynamic range value is identified from the data structure of the input stream, and only that value is changed so as to produce a volume change equivalent to a fade-out or fade-in.
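To give a sense of what changing the dynamic range value does to the volume, the sketch below converts an 8-bit dynamic range word to a gain in dB. It assumes the field layout given for `dynrng` in the ATSC A/52 specification (a 3-bit two's-complement exponent followed by a 5-bit fraction); the text itself does not describe the encoding, and the function name `dynrng_to_db` is hypothetical. Locating the field inside a frame is not shown.

```python
import math

DB_PER_EXPONENT_STEP = 20.0 * math.log10(2.0)   # about 6.02 dB


def dynrng_to_db(word: int) -> float:
    """Gain in dB represented by an 8-bit dynamic range word.

    Assumed layout (ATSC A/52 `dynrng`): bits 7..5 are a two's-complement
    integer X in -4..3, bits 4..0 are Y, read as the binary fraction 0.1YYYYY.
    """
    if not 0 <= word <= 0xFF:
        raise ValueError("dynamic range word must fit in 8 bits")
    x = word >> 5
    if x >= 4:                 # sign-extend the 3-bit exponent
        x -= 8
    y = word & 0x1F
    fraction = (32 + y) / 64.0  # 0.1YYYYY in binary, between 0.5 and 63/64
    return (x + 1) * DB_PER_EXPONENT_STEP + 20.0 * math.log10(fraction)
```

Under this layout a word of `0x00` corresponds to no gain change and `0x80` to roughly 24 dB of attenuation, so stepping the word toward `0x80` over successive AAUs would lower the volume without touching the coded audio samples.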

Embodiment 4 is a version of Embodiment 1 in which MP3 data is input as the first and second audio data and the numerical value changing step changes the scale factor. An example is described below with reference to FIG. 4, which is a flowchart of Embodiment 4.
In scale factor changing step 12, when the input stream is in the MPEG format, the bit position of the scale factor value is identified from the data structure of the input stream, and only that value is changed so as to produce a volume change equivalent to a fade-out or fade-in. With this configuration, in addition to the conventional raising and lowering of volume by changing a gain value, a volume changing means suited to the codec of the input stream can be selected, so that streams can be joined or split without generating noise, even on a device with low processing capability.
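How a scale factor change translates into a volume change can be illustrated with the MPEG-1 Audio Layer I/II scale factor table, in which a 6-bit index selects a multiplier of 2 × 2^(-index/3), so each index step is about 2 dB. Layer III (MP3) codes its scale factors differently, so the sketch below is an assumption-laden illustration of the arithmetic only, not of the MP3 case in this embodiment. The function names are hypothetical.

```python
from typing import List

MAX_INDEX = 62  # highest scale factor index defined for Layer I/II


def index_to_multiplier(index: int) -> float:
    """Multiplier selected by a Layer I/II scale factor index."""
    if not 0 <= index <= MAX_INDEX:
        raise ValueError("scale factor index out of range")
    return 2.0 * 2.0 ** (-index / 3.0)


def attenuate_index(index: int, steps: int) -> int:
    """Raise the index by `steps` (each step is about 2 dB quieter),
    clamped to the largest defined index."""
    if not 0 <= index <= MAX_INDEX:
        raise ValueError("scale factor index out of range")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return min(MAX_INDEX, index + steps)


def fade_out_indices(index: int, n_aau: int, steps_per_aau: int) -> List[int]:
    """Indices to write into successive AAUs for a stepwise fade-out."""
    return [attenuate_index(index, steps_per_aau * (k + 1)) for k in range(n_aau)]
```

Writing the returned indices into successive AAUs, and the reversed list for a fade-in, changes only the stored index values; the quantized samples themselves are left as encoded.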

Embodiment 5 is a version of Embodiment 1 in which, whichever of Dolby Digital, MP3, or LPCM data is input as the first and second audio data, the numerical value changing step automatically determines the codec and changes the volume adaptively. An example is described below with reference to FIG. 5, which is a flowchart of Embodiment 5.
In codec determination step 6, the changed input stream 102 input from the change start position determination step is automatically identified as Dolby Digital, MP3, or LPCM. A Dolby Digital stream is output to dynamic range changing step 11, an MP3 stream to scale factor changing step 12, and an LPCM stream to gain changing step 10. In each case only the relevant value is changed so as to produce a volume change equivalent to a fade-out or fade-in.
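The routing performed by the codec determination step can be sketched as below. The text does not say how the codec is identified; this example assumes detection from the first two bytes of an AAU, using the AC-3 syncword `0x0B77` and the MPEG audio frame sync (eleven leading 1 bits), and treats anything else as LPCM. The names `detect_codec`, `STEP_FOR_CODEC`, and `route` are hypothetical.

```python
from typing import Dict


def detect_codec(aau: bytes) -> str:
    """Guess the codec of one AAU from its first two bytes (an assumption;
    container metadata would normally be a more reliable source)."""
    if len(aau) >= 2:
        if aau[0] == 0x0B and aau[1] == 0x77:            # AC-3 syncword
            return "dolby_digital"
        if aau[0] == 0xFF and (aau[1] & 0xE0) == 0xE0:   # MPEG audio frame sync
            return "mpeg"
    return "lpcm"                                         # everything else


# Which change step each codec is routed to, as in Embodiment 5.
STEP_FOR_CODEC: Dict[str, str] = {
    "dolby_digital": "dynamic range changing step 11",
    "mpeg": "scale factor changing step 12",
    "lpcm": "gain changing step 10",
}


def route(aau: bytes) -> str:
    """Name of the change step that should process this AAU."""
    return STEP_FOR_CODEC[detect_codec(aau)]
```

In a fuller implementation the dictionary values would be the change functions themselves rather than their names. Raw LPCM samples can coincidentally begin with either byte pattern, so the two-byte check is a simplification.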

FIG. 6 is a block diagram of the audio editing apparatus according to Embodiment 6. First audio data and second audio data are input to input means 21. Examples of the first and second audio data include Dolby Digital audio data separated from video data shot with a camcorder, LPCM audio data created by ripping a CD, and MPEG-1 Layer III (hereinafter MP3) audio data obtained from the Internet. The input means 21 rearranges the input audio data into the order in which it will finally be output as the edited audio stream 160, and outputs it as the input stream 100.
The first determination means 22 analyzes the input stream 100 from the input means and outputs additional information about the stream as analysis information 101 together with the input stream 100. This information includes the boundaries of the smallest playback data units, called access units (hereinafter AAUs), and switching position information indicating the point at which the first audio data switches to the second audio data.
The second determination means 23 receives the analysis information 101 and the input stream 100, and outputs the input stream 100 unchanged to combining means 25 until switching position information is detected in the analysis information. When a switching position is detected, only the several AAUs of audio data 1 immediately before the switching position and the several AAUs of audio data 2 immediately after it are output as the changed input stream 102 to changing means 24. After the switching position has been passed, the stream is again output to the combining means 25 until the next switching position or the end of the input stream.
When the changed input stream 102 is input from the second determination means 23, the changing means 24 changes specific bits counted from the head of the AAUs of audio data 1 successively to 5, 4, 3, 2, 0 to fade out, and changes the same bits of the AAUs of audio data 2 in the reverse order, 0, 2, 3, 4, 5, to fade in. The stream with its volume changed in this way is output as the changed stream 110.
The combining means 25 joins the input stream 100 from the second determination means 23 and the changed stream 110 from the changing means 24 in order and outputs the result as the edited audio stream 160.

The present invention applies to apparatus that edits audio streams in lossy compression formats such as MPEG, AAC, and Dolby Digital. It prevents noise caused by a difference in volume level when streams are joined, does not degrade sound quality through decompression and recompression of a lossy stream, and requires no means with the high processing capability that decompression and compression demand. It is therefore useful for software and devices that edit and organize audio, still images, and video on mobile terminals, mobile phones, PCs with low processing capability, and the like.

100 Input stream
101 Analysis information
102 Changed input stream
110 Audio change stream
120 Gain change stream
130 Dynamic range change stream
140 Scale factor change stream
150 Adaptive volume change stream
160 Edited audio stream

Claims

An input step for inputting the first and second audio data;
A first determination step of determining a volume change section of the first and second audio data input in the input step;
A second determination step of determining a codec method of the volume change section determined in the first determination step;
A change step of changing the content of the binary data at a specific position according to the codec determined in the second determination step;
A voice editing method comprising: a combining step of combining the first voice data and the second voice data changed in the changing step.

The changing step includes:
A first changing step of changing the dynamic range of the volume change section determined in the first determination step when the second determination step determines that the codec is AC3; and
A second changing step of changing the scale factor of the volume change section determined in the first determination step when the second determination step determines that the codec is MPEG audio. The voice editing method according to claim 1.

A stream input step for inputting the first audio data and the second audio data;
A stream analysis step for extracting the structure and position information of the audio data;
A change start position determination step of determining, from the position information extracted in the stream analysis step, whether data conversion is necessary;
A numerical value changing step of changing the stream data input in the stream input step when the change start position determination step determines that data conversion is necessary;
A stream output step of outputting the stream changed in the numerical value changing step or, when the change start position determination step determines that data conversion is unnecessary, the stream input in the stream input step. A voice editing method characterized by the above.

When it is determined in the change start position determination step that the gain value of the stream needs to be changed, the gain of the stream input in the stream input step is changed in the numerical value changing step. The voice editing method according to claim 3.

When it is determined in the change start position determination step that the gain value of the stream needs to be changed, the dynamic range of the stream input in the stream input step is changed in the numerical value changing step. The voice editing method according to claim 3.

When it is determined in the change start position determination step that the gain value of the stream needs to be changed, the scale factor of the stream input in the stream input step is changed in the numerical value changing step. The voice editing method according to claim 3.

As the numerical value changing step,
A codec determination step for determining the codec type of the input stream;
A dynamic range changing step of changing the value of the dynamic range of the input stream of the DolbyDigital audio stream and outputting it as an adaptive audio change stream;
A scale factor changing step for changing the value of the scale factor of the input stream of the MPEG audio stream and outputting it as an adaptive audio change stream;
A gain changing step of changing the value of the gain of the input stream of the other stream and outputting as an adaptive audio change stream;
In the codec determination step, a plurality of types of input streams can be determined. When the input stream is a DolbyDigital audio stream, the dynamic range change step is executed.
If the input stream is an MPEG audio stream, execute the scale factor changing step;
The audio editing method according to claim 3, wherein the gain changing step is executed when the input stream is other input stream.

Input means for inputting the first and second audio data;
First determination means for determining a volume change interval of the first and second audio data input by the input means;
Second determination means for determining the codec method of the volume change section determined by the first determination means;
Changing means for changing the content of the binary data at a specific position according to the codec determined by the second determining means;
An audio editing apparatus comprising: a combining unit that combines the first audio data and the second audio data changed by the changing unit.

The changing means includes:
First changing means for changing the dynamic range of the volume change section determined by the first determination means when the second determination means determines that the codec is AC3; and
Second changing means for changing the scale factor of the volume change section determined by the first determination means when the second determination means determines that the codec is MPEG audio. Editing device.