JP2000267699A

JP2000267699A - Acoustic signal coding method and device therefor, program recording medium therefor, and acoustic signal decoding device

Info

Publication number: JP2000267699A
Application number: JP11075557A
Authority: JP
Inventors: Kenichi Minami; 憲一南; Akito Akutsu; 明人阿久津; Yoshinobu Tonomura; 佳伸外村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-03-19
Filing date: 1999-03-19
Publication date: 2000-09-29

Abstract

PROBLEM TO BE SOLVED: To enable high-quality, high-efficiency coding suited to the type of audio signal. SOLUTION: An audio classification unit 102 analyzes the frequency content of the input audio signal and classifies it as music, voice, or the like, and a selection unit 103 selects a suitable coding method according to this classification and generates classification information indicating the selection. When the signal is classified as music, it is coded by coding unit 104₁ using a method such as TwinVQ (transform-domain weighted interleave vector quantization) or MPEG-1/2; when it is classified as voice, it is coded by coding unit 104₂ using a method such as PSI-CELP (pitch synchronous innovation code excited linear prediction) or CS-ACELP. The coded audio signal and the classification information are filed together for each frame, or the classification information is gathered in suitable units and filed, and the result is output as coded data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] This invention relates to a variable bit rate encoding method and apparatus that divide an audio signal such as voice or music into frames and encode it by selecting an encoding method for each frame or for each group of consecutive frames, to a recording medium on which the corresponding program is recorded, and to a decoding apparatus. It is applied to the storage and transmission of audio signals.

[0002]

[Description of the Related Art] In communications and multimedia, highly efficient voice and music encoding methods are used when storing or transmitting audio signals such as voice and music, so that recording media and transmission channels are used efficiently. For voice encoding, for example, there is Pitch Synchronous Innovation Code Excited Linear Predictive Coding (PSI-CELP), which is used as the Japanese standard for car phones and mobile phones. Details of this method are given in "Study of Half-rate Voice Codec for Mobile Phone" (Mano et al., IEICE Technical Report, SP-92-133, 1993). This method divides the voice signal into frames of fixed length and assigns a fixed bit rate to each frame (fixed bit rate coding). Variable bit rate coding, which selects the bit rate for each frame or for each group of frames sharing the same characteristics, has also been proposed. Variable bit rate PSI-CELP speech coding is one example; details are given in "Study on Variable Bit Rate PSI-CELP Speech Coding" (Omuro et al., IEICE Technical Report, SP-93-139, 1994-02).

[0003] For music encoding, on the other hand, there is Transform-domain Weighted Interleave Vector Quantization (TwinVQ), which can encode wideband audio signals at low bit rates. TwinVQ has been adopted in MPEG, an ISO coding standard, as the quantization method for scalable coding. Details of this method are given in "Musical Sound Coding by Transform-domain Weighted Interleave Vector Quantization (TwinVQ)" (Iwakami et al., IEICE Transactions A, Vol. 80, No. 5, pp. 830-837).

[0004] The bit rates used for encoding audio signals span a wide range, from 2 kbps to 64 kbps, so an encoding method suited to the bit rate is used. In general, however, voice changes faster over time and requires time resolution, whereas music requires frequency resolution. Selecting the encoding method according to the type of audio signal, and not only the bit rate, therefore gives higher quality after decoding. Until now, a person had to judge whether the target audio signal was mainly voice or mainly music and select a single encoding method accordingly.

[0005]

[Problem to Be Solved by the Invention] In conventional encoding, a person judges whether the audio signal to be encoded is voice or music, and a single encoding method is selected according to that judgment and the bit rate. In practice, however, much material contains voice and music interleaved over time, and in such cases sufficient sound quality may not be obtained.

[0006] The object of this invention is to provide a high-quality, efficient encoding method and a corresponding decoding apparatus by automatically identifying the type of sound within a sequence of audio signals and switching the encoding method according to that type.

[0007]

[Means for Solving the Problem] To achieve this object, the invention takes an audio signal as input and divides it into frames, performs frequency analysis on each frame, and classifies each frame into at least two types of sound from among music, voice, and other sounds. For each frame, or for each run of consecutive frames of the same type, it selects an encoding method suited to the type of sound and generates classification information indicating which encoding method was selected. It then encodes the audio signal with the selected method and generates encoded data from the classification information and the encoded audio signal.
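The choice between selecting a method per frame and per run of consecutive frames of the same type can be illustrated with a short run-length grouping sketch. The function name `group_runs` and the string labels are hypothetical and do not come from the patent; the sketch only shows how per-frame classifications might be collapsed into runs.

```python
from itertools import groupby


def group_runs(labels):
    """Collapse per-frame labels into (label, start_frame, frame_count) runs."""
    runs = []
    start = 0
    for label, group in groupby(labels):
        count = len(list(group))
        runs.append((label, start, count))
        start += count
    return runs


if __name__ == "__main__":
    frames = ["music", "music", "voice", "other", "other", "other"]
    for label, start, count in group_runs(frames):
        print(f"{label}: frames {start}..{start + count - 1}")
```

With runs in hand, an encoder could store one classification entry per run instead of one per frame, which is one possible reading of gathering classification information in suitable units.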

[0008] When classifying an audio signal as music, the invention uses the strength of time-direction edges in the frequency spectrum of music, which is stable in the frequency direction and persists for at least a certain time. When classifying an audio signal as voice, it uses the output of a comb filter that detects the harmonic components of voice. Furthermore, the assigned bit rate is lowered in the order music, voice, other sounds.

[0009]

[Embodiments of the Invention] Embodiments of this invention are described next with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of an audio signal encoding apparatus according to one embodiment of this invention. An audio signal such as voice or music is input from the audio input unit 101 and classified by the audio classification unit 102 as voice, music, or other sound. The classification method is described later. The encoding method selection unit 103 then selects an encoding method suited to the type of the classified audio signal. Examples of usable encoding methods are TwinVQ, MPEG-1/2, and AC-3 for music; PSI-CELP and CS-ACELP for voice; and, for other sounds, the same kind of method as for voice, for example CELP. The audio data is encoded and error-correction coded by one of the encoding units 104₁ to 104₃, according to the selected encoding method. The encoded file construction unit 105 files the classification information indicating the selected encoding method together with the actually encoded audio data, either frame by frame or with the classification information gathered in suitable units, and outputs the result as encoded data.
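The per-frame filing of classification information and encoded audio data can be sketched as a simple tagged byte stream. The patent does not specify a file layout, so the class codes, the 1-byte tag, and the 2-byte length field below are assumptions made only for illustration; the codec names are labels taken from the text, and no real encoder is invoked.

```python
import struct

# Hypothetical class codes; the patent only says the classification
# information identifies the selected encoding method.
MUSIC, VOICE, OTHER = 0, 1, 2

# Codec families named in the text, keyed by sound class (labels only).
CODEC_FOR_CLASS = {MUSIC: "TwinVQ", VOICE: "PSI-CELP", OTHER: "CELP"}


def pack_frame(sound_class: int, payload: bytes) -> bytes:
    """Prefix one encoded frame with its class tag and payload length."""
    if sound_class not in CODEC_FOR_CLASS:
        raise ValueError(f"unknown sound class: {sound_class}")
    # ">BH": 1-byte class tag, 2-byte big-endian length (assumed layout).
    return struct.pack(">BH", sound_class, len(payload)) + payload


def build_stream(frames) -> bytes:
    """Concatenate (sound_class, payload) pairs into one coded byte stream."""
    return b"".join(pack_frame(c, p) for c, p in frames)


if __name__ == "__main__":
    stream = build_stream([(MUSIC, b"ab"), (VOICE, b"c")])
    print(stream.hex())
```

A tag per frame keeps every frame independently decodable at the cost of three extra bytes per frame in this layout; gathering the tags in larger units would reduce that overhead.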

[0010] FIG. 2 is a block diagram showing the schematic configuration when a variable bit rate encoding method is used in the audio signal encoding apparatus according to one embodiment of this invention. It differs from FIG. 1 in that an audio analysis unit 204₃₁ and a mode selection unit 204₃₂ are provided within the audio encoding units 104₁ to 104₃ to vary the bit rate according to the characteristics of the audio signal in each frame. The audio encoding units 104₁ and 104₂ can also be given the same configuration. For voice, for example, the audio analysis unit 204₃₁ calculates feature quantities such as the power, pitch, and correlation of the voice signal and identifies silent, consonant, and vowel portions. Here, power is the intensity of the voice signal itself, pitch is frequency information characteristic of the so-called voice band, and correlation is the correlation of the signal waveform that appears when the voice waveform is observed over short periods. This correlation is especially high for vowels, so the positions of vowels can be detected.

[0011] The mode selection unit 204₃₂ selects a bit rate mode, for example lowering the bit rate in silent portions and raising it in vowel portions. According to the selected mode, the audio signal is encoded and error-correction coded by one of the encoding units 204₃₃ to 204₃₅. The mode information, together with the classification information indicating the encoding method selected by the encoding method selection unit 103, is filed by the encoded file construction unit 105 for each frame or for each suitable unit, and is output as encoded data together with the coded information produced by the encoding units 204₃₃ to 204₃₅.
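A minimal sketch of how power and short-term correlation might drive the bit rate mode is given below. The thresholds, the mode names, and the use of a single fixed lag are assumptions for illustration; the patent names the features (power, pitch, correlation) but gives no formulas or values.

```python
def frame_power(samples):
    """Mean squared amplitude of one frame."""
    if not samples:
        raise ValueError("empty frame")
    return sum(s * s for s in samples) / len(samples)


def normalized_autocorr(samples, lag):
    """Autocorrelation at the given lag, normalized by the frame energy."""
    energy = sum(s * s for s in samples)
    if energy == 0:
        return 0.0
    num = sum(samples[n] * samples[n - lag] for n in range(lag, len(samples)))
    return num / energy


def select_mode(samples, lag, power_floor=1e-4, vowel_corr=0.5):
    """Pick a bit rate mode: low for silence, high for vowel-like frames."""
    if frame_power(samples) < power_floor:
        return "low"      # silent portion
    if normalized_autocorr(samples, lag) >= vowel_corr:
        return "high"     # strongly periodic, vowel-like portion
    return "medium"       # everything else, e.g. consonants


if __name__ == "__main__":
    print(select_mode([0.0] * 80, lag=2))
    print(select_mode([1.0, -1.0] * 40, lag=2))
```

In a real analyzer the lag would normally come from a pitch search over a range of candidate lags; a fixed lag is used here only to keep the sketch short.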

[0012] When the audio signal classified by the audio classification unit 102 is music, variable bit rate encoding is not normally used, but even then the bit rate may be lowered in rest intervals where there is no sound. When the classified audio signal is some other sound, neither music nor voice, it is expected to be noise, applause, laughter, silence, and the like, and using variable bit rate encoding achieves efficient encoding. The analysis in the audio analysis unit 204₃₁ in this case can be done by the same technique as for voice: signal power is used as the feature quantity to distinguish silent intervals from sounded intervals, and the bit rate is lowered for silent and noise intervals.

[0013] FIG. 3 is a flowchart showing the flow of processing in the audio classification unit of the audio signal encoding apparatus according to one embodiment of this invention. First, the audio signal, input in frames of several tens of milliseconds, is fast Fourier transformed (FFT) at 301, and at 302 a sound spectrogram with a length of about several seconds is generated. Next, the frequency i is initialized to 0, and at 303 the time-direction edge strength I_i of the sound spectrogram at frequency i is calculated. The edge strength can be calculated using, for example, the derivative of the spectrum values in the frequency direction. At 304 the edge strength I_i is added to the running total edge strength ED, and at 305 it is checked whether i has exceeded the highest frequency BW in the bandwidth of the audio signal; if not, i is incremented by 1 and processing returns to step 303. In this way the edge strength I_i is calculated for each frequency i over the whole bandwidth BW while the sum ED is accumulated at 304. When i > BW at 305 and processing of all bands is judged complete, ED is compared with a preset threshold at 306. If ED is at or above the threshold, the frame is classified as music. That is, if the strength of the time-direction edges of a frequency spectrum that is stable in the frequency direction and persists for at least a certain time is at or above a predetermined value, the audio signal is judged to be music.
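Steps 303 to 306 can be sketched as follows, taking the spectrogram as a list of magnitude rows (one row per time frame, one column per frequency bin) that would in practice come from an FFT of each frame. The exact edge operator is not given in the patent; this sketch assumes a first difference in the frequency direction, summed over time with its sign before the absolute value is taken, so that only edges that persist at the same frequency accumulate. The function names and threshold are hypothetical.

```python
def edge_strength(spectrogram, i):
    """Time-direction edge strength I_i between bins i and i+1.

    Signed frequency-direction differences are summed over time first,
    so an edge that stays at the same frequency adds up coherently.
    """
    return abs(sum(row[i + 1] - row[i] for row in spectrogram))


def total_edge_strength(spectrogram):
    """Accumulate ED over all frequency bins (the loop of steps 303-305)."""
    n_bins = len(spectrogram[0])
    return sum(edge_strength(spectrogram, i) for i in range(n_bins - 1))


def is_music(spectrogram, threshold):
    """Step 306: classify as music when ED reaches the threshold."""
    return total_edge_strength(spectrogram) >= threshold


if __name__ == "__main__":
    sustained_tone = [[0, 0, 1, 0]] * 5           # energy fixed in bin 2
    sweep = [[1, 0, 0, 0], [0, 1, 0, 0],
             [0, 0, 1, 0], [0, 0, 0, 1]]          # energy drifts across bins
    print(total_edge_strength(sustained_tone), total_edge_strength(sweep))
```

In this toy input the sustained tone produces a large ED while the drifting component cancels out, which matches the idea that music shows spectral lines that are stable in frequency and persist over time.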

[0014] If ED is below the threshold at 306, i = 0, j = 1, and maxCR = 0 are initialized. At 308 the frequency spectrum of the audio signal is comb filtered and the result, the comb filter output CR_ij, is calculated. At 309 it is checked whether CR_ij is larger than the maximum so far, maxCR; if it is, maxCR is set to CR_ij at 310. Next, at 311 it is checked whether the spacing j between adjacent passbands of the comb filter exceeds BW/2. If not, j is incremented by 1 and processing returns to step 308. If it does, j is reset to 1 and at 312 it is checked whether i exceeds BW; if not, i is incremented by 1 and processing returns to step 308. In this way the comb filter processing is repeated for each line spectrum. The comb filter shifts its center frequency by i while varying the comb spacing j for each i, the comparison with maxCR is made at 309, and the largest value maxCR over one line spectrum is taken as the evaluation value.
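The search over offset i and spacing j can be sketched on a single magnitude spectrum. The patent does not define how CR_ij is computed, so this sketch assumes it is the mean magnitude at the comb teeth (bins i, i+j, i+2j, ...) divided by the mean magnitude of the whole spectrum, which makes the value independent of signal level. The requirement of at least two teeth is also an assumption.

```python
def comb_output(spectrum, i, j):
    """Ratio of the mean magnitude at teeth i, i+j, ... to the overall mean."""
    total = sum(spectrum)
    if total == 0:
        return 0.0  # silent spectrum: no harmonic evidence
    teeth = spectrum[i::j]
    return (sum(teeth) * len(spectrum)) / (len(teeth) * total)


def max_comb_output(spectrum):
    """Search offsets i and spacings j (1..N/2) for the largest comb output."""
    n = len(spectrum)
    max_cr = 0.0
    for i in range(n):
        for j in range(1, n // 2 + 1):
            if i + j >= n:
                break  # fewer than two teeth would fit (assumed constraint)
            max_cr = max(max_cr, comb_output(spectrum, i, j))
    return max_cr


if __name__ == "__main__":
    harmonic = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # peaks every 3 bins
    flat = [1] * 12                                   # noise-like spectrum
    print(max_comb_output(harmonic), max_comb_output(flat))
```

A spectrum with evenly spaced harmonic peaks scores well above a flat one under this measure, which is the property the voice decision relies on.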

[0015] maxCR is compared with a preset threshold at 313; if it is at or above the threshold, the frame is classified as voice. If it is below the threshold, the frame is classified as other sound, neither music nor voice. Frames classified as other sound can be classified further using sound/silence detection or specific filters. By classifying audio signals into music, voice, and other sounds in this way and selecting an encoding method suited to each type, high-quality, efficient encoding is possible.
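The two threshold tests form a simple cascade, sketched below together with a bit rate table that decreases in the order music, voice, other sounds. The threshold arguments and the kbps values are hypothetical; the patent states only the ordering and mentions a general range of 2 kbps to 64 kbps.

```python
# Hypothetical bit rates in kbps; only the ordering comes from the text.
BITRATE_KBPS = {"music": 64, "voice": 8, "other": 2}


def classify_frame(ed, max_cr, music_threshold, voice_threshold):
    """Cascade of steps 306 and 313: music first, then voice, else other."""
    if ed >= music_threshold:
        return "music"
    if max_cr >= voice_threshold:
        return "voice"
    return "other"


if __name__ == "__main__":
    label = classify_frame(ed=1.0, max_cr=3.0,
                           music_threshold=5.0, voice_threshold=2.0)
    print(label, BITRATE_KBPS[label])
```

Because the music test runs first, the comb filter search is needed only for frames that fail it, which mirrors the order of the flowchart.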

[0016] An audio signal encoded as described above is decoded, for example, as shown in FIG. 4. The input encoded data is separated by the separation unit 401 into the encoded audio signal and the classification information. The selection unit 402 is controlled according to the classification information, and the encoded audio signal is input to one of the decoding units 403, 404, and 405. The decoding unit 403 decodes using the decoding method that corresponds to the encoding method applied on the encoding side to music signals. The decoding unit 404 decodes using the decoding method that corresponds to the encoding method applied to voice signals. The decoding unit 405 decodes using the decoding method that corresponds to the encoding method applied to sounds that are neither music nor voice. If the classification information indicates music, the encoded audio signal is input to the decoding unit 403; if it indicates voice, it is input to the decoding unit 404.
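The separation and selection steps of FIG. 4 can be sketched as a demultiplexer that reads a class tag per frame and dispatches the payload to the matching decoder. The stream layout (1-byte class tag, 2-byte big-endian length, payload) and the class codes are assumptions for illustration, and the decoders are placeholder callables, not real codec implementations.

```python
import struct

# Hypothetical class codes carried as classification information.
MUSIC, VOICE, OTHER = 0, 1, 2


def split_stream(data: bytes):
    """Separation step: yield (sound_class, payload) pairs from the stream."""
    offset = 0
    while offset < len(data):
        sound_class, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated frame")
        offset += length
        yield sound_class, payload


def decode_stream(data: bytes, decoders):
    """Selection step: route each payload to the decoder for its class."""
    return [decoders[c](p) for c, p in split_stream(data)]


if __name__ == "__main__":
    decoders = {
        MUSIC: lambda p: ("music", p),
        VOICE: lambda p: ("voice", p),
        OTHER: lambda p: ("other", p),
    }
    print(decode_stream(b"\x00\x00\x02ab\x01\x00\x01c", decoders))
```

Keeping the decoders in a table keyed by class code means a new sound class needs only a new table entry, with no change to the dispatch logic.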

[0017]

[Effects of the Invention] As described above, the inventions of claims 1, 2, 6, 7, 11, and 12 divide the audio signal into frames, perform frequency analysis, classify each frame into at least two types of sound from among music, voice, and other sounds, and select an encoding method suited to the type of sound, which enables high-quality, efficient encoding.

[0018] The inventions of claims 3, 8, and 13 can classify an audio signal as music by using, in the sound spectrogram obtained from the audio signal, the strength of time-direction edges that are stable in the frequency direction and persist for at least a certain time. The inventions of claims 4, 9, and 14 can classify an audio signal as voice by applying a comb filter to the sound spectrogram obtained from the audio signal and using its output.

[0019] The inventions of claims 5, 10, and 15 enable high-quality, efficient encoding by lowering the bit rate assigned to the audio signal in the order music, voice, other sounds. According to the invention of claim 16, decoding suited to music, voice, and other sounds is performed, giving high-quality, efficient decoding.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic functional configuration of an audio signal encoding apparatus according to one embodiment of this invention.

FIG. 2 is a block diagram showing the schematic functional configuration when a variable bit rate method is applied to the encoding units of the audio signal encoding apparatus according to one embodiment of this invention.

FIG. 3 is a flowchart showing the flow of processing in the audio classification unit of the audio signal encoding apparatus according to one embodiment of this invention.

FIG. 4 is a block diagram showing the schematic functional configuration of the audio signal decoding apparatus of this invention.

Continuation of front page: (72) Inventor: Yoshinobu Tonomura, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. F-terms (reference): 5D045 CA01 DA11 DA20; 5J064 AA01 AA02 BB10 BC02 BC25 BD03; 9A001 BB04 EE04 GG03 HH16 KK31 KK43 KK54 LL02

Claims


1. An audio signal encoding method, being a variable bit rate encoding method that divides an audio signal into frames, selects an encoding method for each divided frame, and encodes the signal, the method comprising: performing frequency analysis on the audio signal for each frame and classifying each frame into at least two types of sound from among music, voice, and other sounds; selecting, for each frame or for each run of consecutive frames of the same type, an encoding method according to the type of sound, and generating classification information indicating which encoding method was selected; encoding the audio signal by the selected encoding method; and generating encoded data from the classification information and the encoded audio signal.

2. An audio signal encoding method, being a variable bit rate encoding method that divides an audio signal into frames, selects an encoding method for each divided frame, and encodes the signal, the method comprising: performing frequency analysis on the audio signal for each frame and classifying each frame into at least two types of sound from among music, voice, and other sounds; selecting, for each frame or for each run of consecutive frames of the same type, an encoding method according to the type of sound; performing acoustic analysis according to the classification of the signal and selecting a bit rate mode according to the analysis result; encoding the audio signal according to the selected encoding method and the selected bit rate; generating classification information indicating the selected encoding method; generating mode information indicating the selected bit rate; and generating encoded data from the classification information, the mode information, and the encoded audio signal.

3. The audio signal encoding method according to claim 1 or 2, wherein, when the audio signal is classified as music, the strength of time-direction edges of the frequency spectrum of music, which is stable in the frequency direction and persists for at least a predetermined time, is used.

4. The audio signal encoding method according to claim 1, wherein, when the audio signal is classified as voice, the output of a comb filter that detects harmonic components of voice is used.

5. The audio signal encoding method according to claim 1, wherein the audio signal is encoded with the assigned bit rate lowered in the order music, voice, other sounds.

6. An audio signal encoding apparatus, being a variable bit rate encoding apparatus that divides an audio signal into frames, selects an encoding method for each divided frame, and encodes the signal, the apparatus comprising: a plurality of encoding units that differ from one another in encoding method; an audio input unit that inputs the audio signal; an audio classification unit that performs frequency analysis on the audio signal for each frame and classifies each frame into at least two types of sound from among music, voice, and other sounds; an encoding method selection unit that, for each frame or for each run of consecutive frames of the same type, selects the encoding unit whose encoding method suits the type of sound, supplies the audio signal to that encoding unit, and generates classification information indicating which encoding method was selected; and an encoded file construction unit that generates encoded data from the audio signal encoded by the selected encoding unit and the classification information.

7. An audio signal encoding apparatus, being a variable bit rate encoding apparatus that divides an audio signal into frames, selects an encoding method for each divided frame, and encodes the signal, the apparatus comprising: a plurality of encoding units that differ from one another in at least one of encoding method and bit rate; an audio input unit that inputs the audio signal; an audio classification unit that performs frequency analysis on the audio signal for each frame and classifies each frame into at least two types of sound from among music, voice, and other sounds; an audio analysis unit that performs acoustic analysis on the classified audio signal according to its classification; a selection unit that supplies the audio signal to the encoding unit having the encoding method selected according to the classified type of sound and the bit rate selected according to the analysis result, and that generates classification information indicating which encoding method was selected and mode information indicating which bit rate was selected; and an encoded file construction unit that generates encoded data from the audio signal encoded by the encoding unit, the classification information, and the mode information.

8. The audio signal encoding apparatus according to claim 6 or 7, wherein the audio classification unit, when classifying the audio signal as music, uses the strength of time-direction edges of the frequency spectrum of music, which is stable in the frequency direction and persists for at least a predetermined time.

9. The audio signal encoding apparatus according to any one of claims 6 to 8, wherein the audio classification unit, when classifying the audio signal as voice, uses the output of a comb filter that detects harmonic components of voice.

10. The audio signal encoding apparatus according to claim 6, wherein an encoding unit with a lower bit rate is selected in the order music, voice, other sounds of the classification result.

11. A recording medium on which is recorded an audio signal encoding program, being a variable bit rate encoding program that divides an audio signal into frames, selects an encoding method for each divided frame, and encodes the signal, the program causing a computer to execute: audio input processing that inputs the audio signal; audio classification processing that performs frequency analysis on the audio signal for each frame and classifies each frame into at least two types of sound from among music, voice, and other sounds; encoding method selection processing that, for each frame or for each run of consecutive frames of the same type, selects an encoding method according to the type of sound and generates classification information indicating which encoding method was selected; encoding processing that encodes the audio signal by the selected encoding method; and encoded file construction processing that generates encoded data from the classification information and the encoded audio signal.

12. A recording medium on which is recorded an audio signal encoding program, being a variable bit rate encoding program that divides an audio signal into frames, selects an encoding method for each divided frame, and encodes the signal, the program causing a computer to execute: audio input processing that inputs the audio signal; audio classification processing that performs frequency analysis on the audio signal for each frame and classifies each frame into at least two types of sound from among music, voice, and other sounds; encoding method selection processing that, for each frame or for each run of consecutive frames of the same type, selects an encoding method according to the type of sound and generates classification information indicating which encoding method was selected; acoustic analysis processing that performs acoustic analysis on the classified audio signal according to its classification; mode selection processing that selects a bit rate mode according to the analysis result and generates mode information indicating the selected bit rate mode; encoding processing that encodes the audio signal according to the selected encoding method and the selected bit rate mode; and encoded file construction processing that generates encoded data from the classification information, the mode information, and the encoded audio signal.

13. The recording medium according to claim 11 or 12, wherein the audio classification processing, when classifying the audio signal as music, uses the strength of time-direction edges of the frequency spectrum of music, which is stable in the frequency direction and persists for at least a predetermined time.

14. The recording medium according to any one of claims 11 to 13, wherein the audio classification processing, when classifying the audio signal as voice, uses the output of a comb filter that detects harmonic components of voice.

15. The recording medium according to claim 11, wherein the encoding processing lowers the assigned bit rate in the order music, voice, other sounds.

16. An audio signal decoding apparatus comprising: a separation unit that separates input encoded data into classification information and an encoded audio signal; a plurality of decoding units, differing from one another in decoding method, each of which decodes an input encoded audio signal into an audio signal; and a decoding selection unit that selects one of the decoding units according to the separated classification information and inputs the encoded audio signal to the selected decoding unit.