JPH01312600A - Reproducing method for voice waveform by power adaptive window - Google Patents

Reproducing method for voice waveform by power adaptive window

Info

Publication number
JPH01312600A
JPH01312600A JP14312188A JP14312188A JPH01312600A JP H01312600 A JPH01312600 A JP H01312600A JP 14312188 A JP14312188 A JP 14312188A JP 14312188 A JP14312188 A JP 14312188A JP H01312600 A JPH01312600 A JP H01312600A
Authority
JP
Japan
Prior art keywords
power
window
audio
voice
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP14312188A
Other languages
Japanese (ja)
Inventor
Yasumi Matsuyuki
松雪 康巳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP14312188A
Publication of JPH01312600A
Legal status: Pending

Landscapes

  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

PURPOSE: To obtain a speech waveform close to the original sound by using, at synthesis time, a power adaptive window whose shape (the weighting of the overlapping portion) varies with the ratio of the speech power in the frame sections before and after the overlapping portion. CONSTITUTION: For the overlapping portion at synthesis time, the shape (weighting) of the window over that portion is varied adaptively according to the ratio of the speech power in the analysis sections before and after it. By exploiting the change in speech power, the power adaptive window reproduces the encoded speech waveform in accordance with that power even when the power changes abruptly between frames. As a result, transient portions of the speech waveform, such as rises and falls, become clear, and a waveform close to the original sound is obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application] The present invention relates to the processing of the overlapping portions between frames at decoding time in a speech coding system that processes a speech signal frame by frame.

[Prior Art] In speech coding, when a speech signal is analyzed and synthesized, the sample values of the input signal waveform are divided into intervals short enough to be regarded as quasi-stationary (usually 20 to 30 ms), and the analysis intervals (frames) are partially overlapped so that the results remain continuous. To reduce the influence of the frame edges, a Hamming window, Hanning window, rectangular window, or the like is used at analysis time, and a trapezoidal window is often used at synthesis time. However, when each frame is converted to the frequency domain for processing, a trapezoidal window at synthesis time can produce large distortion in transient portions, such as the rises and falls of the speech waveform near frame boundaries, and this degrades the coding quality.
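The conventional scheme described above (overlapping frames recombined with a trapezoidal synthesis window) can be sketched in Python as follows. This is only an illustration: the function names, the frame length, and the overlap length are assumptions, and the analysis window and the coding step between analysis and synthesis are left out.

```python
def trapezoid_window(frame_len, overlap):
    """Trapezoidal synthesis window with linear ramps of `overlap` samples
    at both ends (assumes 2 * overlap <= frame_len)."""
    w = [1.0] * frame_len
    for n in range(overlap):
        ramp = n / overlap                       # rises from 0 toward 1
        w[n] = ramp                              # fade-in at the frame start
        w[frame_len - overlap + n] = 1.0 - ramp  # fade-out at the frame end
    return w


def split_frames(signal, frame_len, overlap):
    """Cut the signal into frames that share `overlap` samples with their neighbour."""
    hop = frame_len - overlap
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def overlap_add(frames, overlap):
    """Weight each frame with the trapezoidal window and add the overlaps."""
    frame_len = len(frames[0])
    hop = frame_len - overlap
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    w = trapezoid_window(frame_len, overlap)
    for k, frame in enumerate(frames):
        for n, x in enumerate(frame):
            out[k * hop + n] += w[n] * x
    return out
```

Because the fade-out of one frame and the fade-in of the next always sum to 1, an unmodified signal is reproduced exactly inside the overlapped regions; the weights are fixed and do not depend on the content of either frame.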

[Means for Solving the Problem] To solve the above problem, in the present invention the shape (weighting) of the window over the overlapping portion at synthesis time is changed adaptively according to the ratio of the speech power in the analysis intervals before and after that portion. This makes the transient portions of the speech waveform, such as rises and falls, clear and makes it possible to improve the S/N ratio.

When a speech signal is analyzed and synthesized, the sample values of the input signal waveform are divided into intervals short enough to be regarded as quasi-stationary (usually 20 to 30 ms), and the analysis intervals are partially overlapped so that the results remain continuous. This method is shown in Fig. 1. To reduce the influence of the edges of the analysis frame, a Hamming window, Hanning window, rectangular window, or the like is applied to the waveform to be analyzed, as shown in Fig. 1. When the analyzed signal waveforms are synthesized, the present invention uses a power adaptive window that makes use of the speech power of each of two consecutive frames. Figs. 2(a) and 2(b) show the shape of the power adaptive window in comparison with the commonly used trapezoidal window.

In Fig. 2, time is taken along the x-axis and amplitude along the y-axis, and the points A, B, C, and D that delimit the overlapping portion have the (x, y) coordinates (0, 0), (0, 1), (N, 0), and (N, 1), respectively.

In the trapezoidal window shown in Fig. 2(a), the segments B-C and A-D over the overlapping portion are expressed as

$$\text{B-C}: \; y = -\frac{1}{N}x + 1, \qquad \text{A-D}: \; y = \frac{1}{N}x .$$

In the power adaptive window of the present invention, on the other hand, points E and F are determined in the time and amplitude directions according to the ratio of $P_1^{\alpha}$ to $P_2^{\alpha}$, the α-th powers (α is a positive real number) of the speech power in each frame section, as shown in Fig. 2(b). That is, point E is set to

$$\left( \frac{N P_2^{\alpha}}{P_1^{\alpha} + P_2^{\alpha}}, \; \frac{P_2^{\alpha}}{P_1^{\alpha} + P_2^{\alpha}} \right)$$

and point F to

$$\left( \frac{N P_2^{\alpha}}{P_1^{\alpha} + P_2^{\alpha}}, \; \frac{P_1^{\alpha}}{P_1^{\alpha} + P_2^{\alpha}} \right),$$

and the window is defined over the segments B-E, E-C, A-F, and F-D. The expressions for these four segments and the defining expression for the frame speech power P are missing from this text. In the definition of P, $x_i$ is the value of the sampled speech signal within the frame and L is the number of samples in one frame.
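One possible reading of the power adaptive window is sketched below in Python. Several points are assumptions rather than statements of the patent: that E and F share the x-coordinate N·P2^α/(P1^α+P2^α) and have the y-coordinates P2^α/(P1^α+P2^α) and P1^α/(P1^α+P2^α); that the segments B-E, E-C, A-F, and F-D are straight lines; and that the frame power is the mean of the squared samples. The function names are hypothetical.

```python
def frame_power(frame):
    """Mean-square power of one frame (assumed definition, not from the source)."""
    return sum(x * x for x in frame) / len(frame)


def _piecewise(x, xb, y0, yb, y1, n):
    """Straight lines (0, y0) -> (xb, yb) -> (n, y1), evaluated at x."""
    if xb > 0 and x <= xb:
        return y0 + (yb - y0) * x / xb
    return yb + (y1 - yb) * (x - xb) / (n - xb)


def power_adaptive_weights(p1, p2, n, alpha=0.5):
    """Weights of the previous and the next frame for x = 0..n over the overlap.

    p1, p2: speech power of the previous and the next frame.
    """
    a1, a2 = p1 ** alpha, p2 ** alpha
    r = 0.5 if a1 + a2 == 0 else a2 / (a1 + a2)   # equal split for two silent frames
    xb = n * r                                     # shared x-coordinate of E and F
    prev_w = [_piecewise(x, xb, 1.0, r, 0.0, n) for x in range(n + 1)]        # B-E-C
    next_w = [_piecewise(x, xb, 0.0, 1.0 - r, 1.0, n) for x in range(n + 1)]  # A-F-D
    return prev_w, next_w
```

Under these assumptions the two weights sum to 1 at every sample, and for equal powers the window reduces to the trapezoidal window of Fig. 2(a). When the powers differ, the breakpoint moves toward the louder frame, so the quieter frame keeps a larger weight over most of the overlap.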

By exploiting the change in speech power, this power adaptive window reproduces the encoded speech waveform in accordance with the power even when the power changes abruptly between frames.

[Embodiments] An embodiment of the present invention is described below with reference to the drawings. Fig. 3 shows an embodiment in which the invention is applied to a speech coding system (Japanese Patent Application No. 63-30198).

The input speech signal from input terminal 1 is converted frame by frame into a frequency-domain signal, that is, a spectrum, by the frequency domain transform unit 2, for example by a discrete Fourier transform (DFT). Next, the auxiliary information extraction unit 3 extracts the speech power and the power of each spectral band (for example 0 to 1, 1 to 2, and 2 to 4 kHz) as auxiliary information, which is then quantized; the auxiliary information local decoding unit 4 decodes this auxiliary information in advance. The converted spectrum is separated into an amplitude component and a phase component by the amplitude/phase component extraction unit 6, and the peak extraction unit 7 extracts each peak of the amplitude component, which has a harmonic structure.
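The band-power part of the auxiliary information can be illustrated with a short Python sketch. It assumes an unnormalized DFT, the example band edges 0 to 1, 1 to 2, and 2 to 4 kHz at a sampling frequency of 8 kHz, and hypothetical function names; the source does not specify the normalization or how bins on a band edge are assigned.

```python
import cmath
import math


def dft(frame):
    """Direct O(n^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
            for k in range(n)]


def band_powers(frame, fs=8000, edges_hz=(0, 1000, 2000, 4000)):
    """Sum |X[k]|^2 over the bins that fall into each band (non-negative frequencies only)."""
    n = len(frame)
    spectrum = dft(frame)
    powers = [0.0] * (len(edges_hz) - 1)
    for k in range(n // 2 + 1):
        f = k * fs / n                        # centre frequency of bin k in Hz
        for b in range(len(powers)):
            lo, hi = edges_hz[b], edges_hz[b + 1]
            is_last = b == len(powers) - 1    # the top band also includes its upper edge
            if lo <= f < hi or (is_last and f == hi):
                powers[b] += abs(spectrum[k]) ** 2
    return powers
```

For a 1500 Hz tone, for instance, almost all of the power lands in the second band, which is the kind of per-band figure the adaptive bit allocation described later relies on.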

Next, the frequency information interpolation unit 8 determines the actual peak frequency precisely by quadratic interpolation over three points: each extracted peak of the amplitude component and the two points on either side of it. The peak frequency positions determined in this way are quantized by the frequency information quantization unit 9 with the number of bits assigned by the adaptive information allocation unit 5, which allocates bits according to the power of each band using the previously extracted auxiliary information. The local decoding unit 10 decodes only this frequency information, and the amplitude and phase at each decoded frequency position $\hat{f}_i$ are newly determined: the amplitude information interpolation unit 11 determines the amplitude corresponding to frequency $\hat{f}_i$, and the phase information interpolation unit 12 determines the phase corresponding to frequency $\hat{f}_i$. These amplitude and phase parameters are given an adaptive information allocation by the adaptive information allocation unit 5 on the basis of the previously obtained auxiliary information, and are quantized by the amplitude information quantization unit 13 and the phase information quantization unit 14, respectively.
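The three-point quadratic interpolation of a spectral peak can be sketched in Python as follows. This is the standard parabolic fit through a local maximum and its two neighbours; whether the patent fits the parabola to linear or logarithmic amplitudes is not stated, and the function name and arguments are hypothetical.

```python
def refine_peak(mags, k, fs, n_fft):
    """Fit a parabola through bins k-1, k, k+1 and return (frequency_hz, amplitude).

    mags: amplitude spectrum; k: index of a local maximum (0 < k < len(mags) - 1).
    """
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2.0 * b + c
    # Offset of the vertex from bin k, in bins; zero if the three points are collinear.
    delta = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    freq = (k + delta) * fs / n_fft
    amp = b - 0.25 * (a - c) * delta          # height of the parabola at its vertex
    return freq, amp
```

For a true local maximum the offset stays within half a bin, so the refined frequency lies between the neighbouring bin centres.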

On the receiving side, using the auxiliary information decoded by the auxiliary information decoding unit 17, the frequency, amplitude, and phase information are decoded by the decoding units 18 to 20 with the numbers of bits assigned by an adaptive information allocation unit 5 identical to the one used in the encoder. From this information, the speech waveform reproduction unit 21 obtains the reproduced waveform using the power adaptive window.

In speech coding, the power of each frame is generally transmitted as auxiliary information, so the power adaptive window can be realized without increasing the number of transmitted parameters, that is, without increasing the bit rate.

Next, an experiment with this embodiment comparing the ordinary trapezoidal window with the power adaptive window is described. Fig. 4 shows the segmental S/N ratio as a function of the amount of information (bit rate) for a sampling frequency of 8 kHz, an analysis frame length of 32 ms, and α = 1/2. The length of the overlapping portion is illegible in the source text (it reads "l(5 mse").
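The segmental S/N ratio used as the evaluation measure can be sketched in Python as follows. The segment length and the handling of silent or error-free segments are assumptions, since the source does not give them, and the function name is hypothetical.

```python
import math


def segmental_snr_db(original, decoded, seg_len):
    """Average over segments of 10*log10(signal energy / error energy), in dB.

    Segments with zero signal energy or zero error are skipped (an assumption).
    """
    snrs = []
    for s in range(0, len(original) - seg_len + 1, seg_len):
        ref = original[s:s + seg_len]
        dec = decoded[s:s + seg_len]
        sig = sum(x * x for x in ref)
        err = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if sig > 0 and err > 0:
            snrs.append(10.0 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("nan")
```

Because each segment contributes equally to the average regardless of its energy, this measure is sensitive to distortion in low-level and transient segments, which is where the two synthesis windows differ.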

In Fig. 4, curve 23 is the result with the trapezoidal window and curve 24 the result with the power adaptive window of the present invention; all other conditions are identical. Fig. 4 shows that the power adaptive window of the present invention outperforms the conventional trapezoidal window.

Fig. 5 shows an example of encoded speech waveforms together with the original speech waveform. Fig. 5(a) is the original speech waveform, Fig. 5(b) the result with the trapezoidal window, and Fig. 5(c) the result with the power adaptive window. The coding conditions are the same as above; the bit rate is garbled in the source text (it reads "F3 kbps"). In this example, the portion where the speech waveform changes sharply falls on a frame boundary. Fig. 5 shows that the power adaptive window of the present invention yields a large improvement.

Only an example applied to the speech coding system of Japanese Patent Application No. 63-30198 has been described here as an embodiment, but the invention can also be applied to other speech coding systems that process the signal frame by frame.

[Effects of the Invention] As described above, according to the present invention, when speech is analyzed and synthesized frame by frame, a speech waveform closer to the original sound can be obtained by making effective use of the speech power information, even when the speech waveform changes steeply.

[Brief Description of the Drawings]

Fig. 1 is an explanatory diagram of analysis using a Hamming window. Fig. 2 shows synthesis windows: (a) a trapezoidal window and (b) the power adaptive window of the present invention. Fig. 3 is a block diagram showing an embodiment of the present invention. Fig. 4 shows the relationship between the amount of information and the S/N ratio with the power adaptive window and with the trapezoidal window. Fig. 5 shows the effect of the power adaptive window: (a) the original speech waveform, (b) the encoded speech waveform with the trapezoidal window, and (c) the encoded speech waveform with the power adaptive window.

Patent applicant: Nippon Telegraph and Telephone Corporation
Agent: 草野 卓 (Kusano)

Claims (1)

[Claims]

(1) A power adaptive window speech waveform reproduction method for a speech coding system in which an input signal is sampled at fixed time intervals, a fixed number of the sample values are stored to form one frame, and analysis and synthesis are performed frame by frame with the analysis frame sections partially overlapped, characterized in that, at synthesis time, a power adaptive window is used whose shape (the weighting of the overlapping portion) is changed according to the ratio of the speech power in the frame sections before and after the overlapping portion.
JP14312188A 1988-06-10 1988-06-10 Reproducing method for voice waveform by power adaptive window Pending JPH01312600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP14312188A JPH01312600A (en) 1988-06-10 1988-06-10 Reproducing method for voice waveform by power adaptive window

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP14312188A JPH01312600A (en) 1988-06-10 1988-06-10 Reproducing method for voice waveform by power adaptive window

Publications (1)

Publication Number Publication Date
JPH01312600A true JPH01312600A (en) 1989-12-18

Family

ID=15331405

Family Applications (1)

Application Number Title Priority Date Filing Date
JP14312188A Pending JPH01312600A (en) 1988-06-10 1988-06-10 Reproducing method for voice waveform by power adaptive window

Country Status (1)

Country Link
JP (1) JPH01312600A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02258279A (en) * 1989-11-30 1990-10-19 Victor Co Of Japan Ltd Transfer paper cartridge
JPH02258281A (en) * 1989-11-30 1990-10-19 Victor Co Of Japan Ltd Transfer paper cartridge
JPH02258280A (en) * 1989-11-30 1990-10-19 Victor Co Of Japan Ltd Transfer paper cartridge

Similar Documents

Publication Publication Date Title
KR100427753B1 (en) Method and apparatus for reproducing voice signal, method and apparatus for voice decoding, method and apparatus for voice synthesis and portable wireless terminal apparatus
US5953696A (en) Detecting transients to emphasize formant peaks
CN1270292C (en) Speech bandwidth extension and speech bandwidth extension method
EP0698876B1 (en) Method of decoding encoded speech signals
JPH0516599B2 (en)
CA2399253C (en) Speech decoder and method of decoding speech involving frequency expansion
US5899966A (en) Speech decoding method and apparatus to control the reproduction speed by changing the number of transform coefficients
WO1998006090A1 (en) Speech/audio coding with non-linear spectral-amplitude transformation
KR20060036724A (en) Method and apparatus for encoding/decoding audio signal
JPH01312600A (en) Reproducing method for voice waveform by power adaptive window
JP2003157100A (en) Voice communication method and equipment, and voice communication program
JP3559485B2 (en) Post-processing method and device for audio signal and recording medium recording program
JP3472974B2 (en) Acoustic signal encoding method and acoustic signal decoding method
JPH05303399A (en) Audio time axis companding device
JPS60262200A (en) Expolation of spectrum parameter
US6606591B1 (en) Speech coding employing hybrid linear prediction coding
JPH04249300A (en) Method and device for voice encoding and decoding
JP3038755B2 (en) Sound source data generation method for speech synthesizer
JP2000259197A (en) Method for detecting and correcting attack/release signal in audio encoding
KR0171004B1 (en) Basic frequency using samdf and ratio technique of the first format frequency
JPH07261796A (en) Voice encoding and decoding device
JPH0481199B2 (en)
JPH11251918A (en) Sound signal waveform encoding transmission system
JPH05265488A (en) Pitch extracting method
KR100554165B1 (en) CELP-based Speech Codec capable of eliminating of pitch-multiple effect and method of the same