JPS6170599A

JPS6170599A - Voice reproduction system

Info

Publication number: JPS6170599A
Application number: JP59192113A
Authority: JP
Inventors: 国澤　寛治; 糸山　博
Original assignee: Matsushita Electric Works Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 1984-09-13
Filing date: 1984-09-13
Publication date: 1986-04-11
Also published as: JPH0576639B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Technical Field] The present invention relates to a speech reproduction system based on a linear prediction method such as the PARCOR method or the LPC method.

[Background Art] Various speech synthesis LSIs using linear prediction methods such as the PARCOR method are now commercially available. Speech synthesis by this kind of linear prediction expresses the speech signal Sn by a linear prediction equation and finds the prediction coefficients Ak that minimize the mean-square value of the residual Zn. When the prediction order p is sufficiently large (p ≥ 8), the residual Zn approaches an impulse train corresponding to the source pitch in voiced intervals and white noise in unvoiced intervals. Exploiting this, the residual waveform is replaced in voiced intervals by an impulse train having the pitch period and the mean residual intensity (in some cases a triangular wave or a one-pitch residual waveform is also used), and in unvoiced intervals by white noise having the mean residual intensity. Information is thereby compressed while good sound quality is maintained.
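The equation relating Sn, Ak and Zn is not legible in the source text. For orientation only, linear prediction is conventionally written as below; the sign convention, the index ranges and the frame length N are assumptions and are not taken from the patent.

```latex
S_n \;=\; \sum_{k=1}^{p} A_k\, S_{n-k} \;+\; Z_n ,
\qquad
\overline{Z^{2}} \;=\; \frac{1}{N}\sum_{n=0}^{N-1} Z_n^{2}
\;\longrightarrow\; \min_{A_1,\dots,A_p}
```

Here the first relation defines the residual Zn as the part of Sn that the p past samples fail to predict, and the second is the mean-square criterion used to choose the coefficients.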

Fig. 2 schematically shows the residual waveform. In the unvoiced interval of (a), the speech parameters are determined so as to minimize the mean-square value of the white-noise-like residual. In the voiced interval of (b), however, an impulse-like residual is added, so the speech parameters are determined so as to minimize the mean-square value of the whole residual, including both the impulses and the white noise. Because the impulses have a large amplitude, they are considered to account for a considerable part of the mean-square value. Consequently, in (b) the speech parameters are determined such that the zero level is raised, relative to (a), by the impulse share D of the mean-square value and comes to the solid-line position, and the mean-square value of the white-noise-like residual alone is no longer minimized.
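One way to read the quantity D is as the impulse contribution to the mean-square residual. The decomposition below is an interpretive sketch, not a formula from the patent: the symbols I_n (impulse part), W_n (white-noise-like part), g (impulse amplitude) and P (pitch period in samples) are introduced here for illustration, and the cross term is assumed to be negligible.

```latex
Z_n = I_n + W_n
\quad\Longrightarrow\quad
\overline{Z^{2}}
  = \overline{I^{2}} + \overline{W^{2}} + 2\,\overline{I\,W}
  \;\approx\; D + \overline{W^{2}},
\qquad
D = \overline{I^{2}} = \frac{g^{2}}{P}
\ \text{ for an ideal impulse train.}
```

Under these assumptions, minimizing the total mean-square value is not the same as minimizing the noise-like term alone whenever D depends on the choice of coefficients, which is the difficulty the passage describes.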

However, in the process of extracting the speech parameters, correlation remains in the white-noise-like portion that occupies most of each period, and the characteristics of the vocal tract are extracted from this portion by minimizing the white-noise-like residual. It is therefore unreasonable for the white-noise-like residual to cease to be minimal because of the impulses, and the fidelity of the restored sound is thought to deteriorate accordingly. Moreover, when speech is synthesized, only an impulse train is used as the excitation waveform in voiced intervals. From that standpoint as well, it is desirable to determine the speech parameters at the analysis stage so that the white-noise residual becomes zero.

[Object of the Invention] The present invention has been made in view of the above problems. In the prediction analysis of a speech reproduction system that approximates the residual waveform obtained by linear prediction with an impulse train and white random noise, its object is to determine the speech parameters so as to minimize the mean-square value of only the white-noise-like residual in voiced intervals, and thereby to improve the fidelity of the restored speech.

[Disclosure of the Invention] In a speech reproduction system that approximates the residual waveform obtained by linear prediction with an impulse train and white noise, the present invention first performs linear prediction analysis on the original speech waveform and generates a residual waveform from the residual waveform information thus obtained. It then performs linear prediction analysis again on the waveform obtained by subtracting this residual waveform from the original speech waveform to obtain the speech parameters, and synthesizes a speech waveform using these speech parameters and the above residual waveform information. In other words, the mean-square value of the impulses alone, that is, D in Fig. 2(b), is cancelled before the prediction is made.
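The two-pass analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the source does not say how P and U are estimated, how the impulses are aligned with the speech, or how voiced and unvoiced frames are distinguished. Here the frame is assumed voiced, P is taken from the peak of the residual autocorrelation, U is taken as the RMS of the residual, and the impulses are placed at samples 0, P, 2P, ... of the frame. All function names are hypothetical.

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (autocorrelation method)."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns a[0..order-1] for
    the predictor s[n] ~ sum_k a[k] * s[n-1-k]."""
    a, err = [0.0] * (order + 1), float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(frame, a):
    """Prediction residual z[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    return [frame[n] - sum(a[k] * frame[n - 1 - k]
                           for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(frame))]

def pitch_and_amplitude(z, min_lag, max_lag):
    """Assumed estimator: P = autocorrelation peak lag, U = RMS of residual."""
    r = autocorr(z, max_lag)
    pitch = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return pitch, (r[0] / len(z)) ** 0.5

def impulse_train(n, pitch, amplitude):
    """Impulse train whose mean-square value equals amplitude**2 (assumed scaling)."""
    g = amplitude * pitch ** 0.5
    return [g if i % pitch == 0 else 0.0 for i in range(n)]

def two_pass_analysis(s0, order, min_lag, max_lag):
    a_first = levinson(autocorr(s0, order), order)   # first prediction
    z0 = inverse_filter(s0, a_first)                 # Z0, used only for P and U
    p, u = pitch_and_amplitude(z0, min_lag, max_lag)
    z1 = impulse_train(len(s0), p, u)                # generated residual Z1
    s1 = [x - z for x, z in zip(s0, z1)]             # S1 = S0 - Z1
    a_second = levinson(autocorr(s1, order), order)  # second prediction
    return a_second, p, u                            # a_first and Z0 are discarded

def demo_frame(n=400, period=50, pole=0.9, seed=0):
    """Synthetic voiced frame: one-pole filter driven by impulses plus weak noise."""
    rng = random.Random(seed)
    s, prev = [], 0.0
    for i in range(n):
        e = (1.0 if i % period == 0 else 0.0) + rng.gauss(0.0, 0.05)
        prev = pole * prev + e
        s.append(prev)
    return s

if __name__ == "__main__":
    ak, p, u = two_pass_analysis(demo_frame(), order=2, min_lag=20, max_lag=100)
    print("P =", p, "U =", u, "Ak =", ak)
```

The demo uses order 2 only to keep the example small; the text above calls for p ≥ 8 on real speech.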

Fig. 1 is a block diagram showing one embodiment of the present invention. In the figure, the original speech passes through a low-pass filter 1, undergoes A/D conversion 2, and is then stored in a memory 3. Next, this original speech waveform S0 is read out from the memory 3, and linear prediction analysis is performed by a microcomputer 4. The first prediction 5 extracts the speech parameters Ak and also yields the pitch parameter P and the amplitude parameter U as residual information. The speech parameters Ak and the residual waveform Z0 obtained at this stage are discarded, and only the residual information P and U is used. P and U are applied to an excitation generating circuit 6, which generates a residual waveform Z1 consisting only of a pure impulse train and white noise. This residual waveform is subtracted from the original speech waveform S0 read out from the memory 3, and the second prediction 7 is performed on the waveform S1 thus obtained. The speech parameters Ak obtained in the second prediction 7, together with the P and U parameters (the same as the P and U obtained in the first prediction), are stored or transmitted 8 as speech information. On the reproduction side, P and U are applied to an excitation generating circuit 9 to generate an excitation waveform Z2 identical to Z1. This is applied to a digital filter 10, which synthesizes the speech waveform S2 using the speech parameters Ak, and the result is passed through a D/A converter 11 and a low-pass filter 12 to give the synthesized speech.
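The reproduction side described above can be sketched as an excitation generator followed by an all-pole filter. This is an illustrative sketch under assumptions: the source does not specify the filter structure (a PARCOR system would typically use a lattice form), the scaling of the impulse amplitude from U, or how the voiced/unvoiced decision is conveyed. The function names, the `voiced` flag and the `seed` parameter are hypothetical.

```python
import random

def excitation(n, pitch, amplitude, voiced, seed=0):
    """Excitation Z2: impulse train (voiced) or white noise (unvoiced).

    Assumption: both forms are scaled so their mean-square value is
    about amplitude**2, with impulses at samples 0, P, 2P, ...
    """
    if voiced:
        g = amplitude * pitch ** 0.5
        return [g if i % pitch == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, amplitude) for _ in range(n)]

def synthesize(z2, a):
    """Direct-form all-pole filter: s2[n] = z2[n] + sum_k a[k] * s2[n-1-k]."""
    s2 = []
    for n, z in enumerate(z2):
        feedback = sum(a[k] * s2[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
        s2.append(z + feedback)
    return s2

if __name__ == "__main__":
    # P, U and Ak would come from storage or transmission; values here are made up.
    z2 = excitation(200, pitch=50, amplitude=0.15, voiced=True)
    s2 = synthesize(z2, [0.9])
    print(len(s2), max(s2))
```

Because the generator is deterministic for voiced frames, the analysis side and the reproduction side produce the same waveform from the same P and U, which is the property the text relies on when it says Z2 is identical to Z1.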

The reason why the mean-square value of the impulses, that is, D in Fig. 2(b), is cancelled when the second prediction is performed on the waveform S1 obtained by subtracting the residual waveform Z1 from the original speech waveform S0 is as follows. Suppose that prediction is performed with only the excitation (residual) waveform Z1 as input. In voiced intervals Z1 consists only of an impulse train, and since each impulse occupies only a small fraction of one period, it hardly coincides with the sampling instants. All the speech parameters therefore become substantially zero, and the residual is the same impulse train as the excitation. Consequently, when prediction is performed on the waveform S1 (= S0 − Z1), obtained by superimposing the inverted excitation waveform (−Z1) on the original speech waveform S0, the inverted impulse train is superimposed on the residual, and the mean-square value of the original impulses is cancelled. By minimizing the mean-square value of only the white-noise-like residual, which is the pure error component, the correlation remaining in this portion can be removed almost completely.
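The claim that predicting from a pure impulse train gives substantially zero speech parameters can be checked numerically. The sketch below assumes the autocorrelation method with a Levinson-Durbin recursion, which the patent does not specify, and an impulse period longer than the prediction order. Under those assumptions every autocorrelation lag from 1 to the order is zero, so every coefficient is zero and the residual equals the input. Function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (autocorrelation method)."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns a[0..order-1] for
    the predictor s[n] ~ sum_k a[k] * s[n-1-k]."""
    a, err = [0.0] * (order + 1), float(r[0])
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def inverse_filter(frame, a):
    """Prediction residual z[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    return [frame[n] - sum(a[k] * frame[n - 1 - k]
                           for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(frame))]

PERIOD, ORDER, LENGTH = 40, 10, 400   # illustrative values; PERIOD must exceed ORDER

# Pure impulse train standing in for Z1 in a voiced interval.
z1 = [1.0 if n % PERIOD == 0 else 0.0 for n in range(LENGTH)]

r = autocorr(z1, ORDER)               # r[1..ORDER] are all zero
coeffs = levinson(r, ORDER)           # hence every coefficient is zero
residual = inverse_filter(z1, coeffs) # and the residual is Z1 itself

if __name__ == "__main__":
    print(coeffs)
    print(residual == z1)
```

If the period were shorter than or equal to the order, some lags would be nonzero and the coefficients would no longer vanish, so the argument in the text depends on the pitch period being long relative to p.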

[Effects of the Invention] As described above, the present invention first generates a residual waveform using the residual waveform information obtained by predicting the original speech waveform, performs prediction again on the waveform obtained by subtracting this residual waveform from the original speech waveform to obtain the speech parameters, and synthesizes a speech waveform using these speech parameters and the above residual waveform information. The speech parameters can therefore be extracted so as to minimize the mean-square value of the white-noise-like residual in voiced and unvoiced intervals alike. This has the advantage that, when speech is reproduced by approximating the excitation waveform with an impulse train and white noise, the fidelity to the original sound can be improved with a very simple configuration.

[Brief Description of the Drawings]

Fig. 1 is a block circuit diagram showing one embodiment of the present invention, and Figs. 2(a) and 2(b) are waveform diagrams showing its operation.

1 is a low-pass filter, 2 is an A/D converter, 3 is a memory, 4 is a microcomputer, 5 is a first prediction circuit, 6 is an excitation generating circuit, 7 is a second prediction circuit, 8 is a storage or transmission circuit, 9 is an excitation generating circuit, 10 is a digital filter, 11 is a D/A converter, 12 is a low-pass filter, S0 is the original speech waveform, S1 is the superimposed waveform, S2 is the reproduced speech waveform, Z0 is the residual waveform, and Z1 and Z2 are the approximated excitation (residual) waveforms.

Agent: Patent Attorney 石田 長七

Written amendment (voluntary), December 29, 1984 (Showa 59)

Claims

[Claims]

(1) A speech reproduction system in which the residual waveform obtained by linear prediction is approximated by an impulse train and white noise, characterized in that the original speech waveform is first subjected to linear prediction analysis, a residual waveform is generated using the residual waveform information thus obtained, linear prediction analysis is performed again on the waveform obtained by subtracting this residual waveform from the original speech waveform to obtain the speech parameters, and a speech waveform is synthesized using these speech parameters and the residual waveform information.