
JPH07219597A - Pitch converting device

Info

Publication number: JPH07219597A
Application number: JP6009613A
Authority: JP
Inventors: Hiroko Yoshida (吉田博子)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-01-31
Filing date: 1994-01-31
Publication date: 1995-08-18

Abstract

PURPOSE: To reduce degradation of sound quality even when the pitch of a voice is changed. CONSTITUTION: The device comprises speech input means 1; pitch conversion rate input means 2; linear prediction analysis means 3, which performs linear prediction analysis of the input speech to obtain linear prediction coefficients; an inverse filter 4, which computes a residual signal from the obtained linear prediction coefficients and the input speech signal; residual signal resampling means 5, which interpolates the residual signal with an interpolation polynomial and resamples it while adjusting the sampling frequency according to the pitch conversion rate entered through the pitch conversion rate input means; a synthesis filter 6, which synthesizes speech from the resampled residual signal and the linear prediction coefficients; and synthesized speech output means 7, which outputs the synthesized speech at the same sampling frequency as at input.

Description

Detailed Description of the Invention

[0001]

【Industrial Field of Application】 The present invention relates to a pitch conversion device for changing the pitch or accent of speech, for use in speech editing/synthesis devices, voice quality conversion devices, and the like.

[0002]

【Prior Art】 Conventionally, pitch conversion devices of this type adjust the pitch of a voice by changing the rotation speed of a tape recorder or the sampling frequency used for D/A conversion.

[0003] Thus, even these conventional devices can convert speech to a desired pitch by adjusting the playback speed.

[0004]

【Problems to be Solved by the Invention】 However, because conventional pitch conversion devices change the playback speed of the speech itself, the spectral structure of the speech is distorted: the speaker's individuality is lost, and even intelligibility is reduced.
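The prior-art problem can be made concrete with a small numerical sketch (an illustration only, not part of the patent). Reading a sampled signal back at a different rate, as a tape-speed or D/A clock change does, scales every frequency component by the same factor: below, a 120 Hz tone standing in for the pitch and a 700 Hz tone standing in for a formant both move up by a factor of 1.5, and the duration shrinks to two thirds. The function names, the tone frequencies, and the zero-crossing frequency estimate are assumptions chosen for illustration.

```python
import math


def make_tone(freq_hz, fs_hz, n):
    """n samples of a unit sine at freq_hz, sampled at fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / fs_hz) for k in range(n)]


def change_speed(x, speed):
    """Simulate faster/slower playback: read x at `speed` input samples per
    output sample (linear interpolation), then play back at the original rate."""
    n_out = int((len(x) - 1) / speed) + 1
    out = []
    for m in range(n_out):
        t = m * speed
        i = int(t)
        frac = t - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1.0 - frac) * x[i] + frac * nxt)
    return out


def zero_crossing_freq(x, fs_hz):
    """Rough frequency estimate: upward zero crossings per second."""
    ups = sum(1 for a, b in zip(x, x[1:]) if a < 0.0 <= b)
    return ups * fs_hz / len(x)


fs = 8000
for label, f in (("pitch", 120.0), ("formant", 700.0)):
    y = change_speed(make_tone(f, fs, fs), 1.5)  # 1 s of input, played 1.5x fast
    print(label, round(zero_crossing_freq(y, fs)), "Hz,", len(y) / fs, "s")
```

Because the formant moves together with the pitch, the spectral envelope that carries vowel identity and speaker character is shifted as well, which is the degradation described above.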

[0005] The present invention solves these problems, and its object is to provide a pitch conversion device in which sound quality degrades little even when the pitch of a voice is changed.

[0006]

【Means for Solving the Problems】 To achieve this object, the present invention comprises: speech input means; pitch conversion rate input means; linear prediction analysis means that performs linear prediction analysis of the input speech to obtain linear prediction coefficients; an inverse filter that computes a residual signal from the obtained linear prediction coefficients and the input speech signal; residual signal resampling means that interpolates the residual signal with an interpolation polynomial and resamples it while adjusting the sampling frequency according to the pitch conversion rate entered through the pitch conversion rate input means; a synthesis filter that synthesizes speech from the resampled residual signal and the linear prediction coefficients; and synthesized speech output means that outputs the synthesized speech at the same sampling frequency as the input.

[0007] In addition to the above configuration, the present invention further provides means for adjusting the time length of the speech.

[0008]

【Operation】 With the above configuration, the present invention operates as follows. In general, linear prediction analysis separates speech into linear prediction coefficients, which carry the spectral information of the speech, and a residual signal, which contains the pitch information that determines how high the voice sounds. The speech signal entered through the speech input means is therefore analyzed by the linear prediction analysis means to compute linear prediction coefficients, and the inverse filter computes a residual signal from the input speech signal and those coefficients. Based on the pitch conversion rate entered through the pitch conversion rate input means, the residual resampling means resamples this residual signal at a sampling frequency higher than the actual sampling frequency when the pitch is to be lowered, and at a lower one when the pitch is to be raised. This changes the number of samples (more samples when lowering the pitch, fewer when raising it) without changing the shape of the residual waveform. The synthesis filter then synthesizes speech from the resampled residual and the linear prediction coefficients computed by the linear prediction analysis means, and outputting the result at the same sampling frequency as the input speech yields pitch-converted speech.
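The separation described in [0008] can be sketched with the autocorrelation method of linear prediction. The code below is a minimal pure-Python sketch, assuming the Levinson-Durbin recursion for solving the normal equations (the patent does not say how the coefficients are computed); `autocorrelation`, `levinson_durbin`, and `inverse_filter` are hypothetical names. The test signal is an all-pole filter with known coefficients driven by a pulse train, so the residual produced by the inverse filter should come out close to that pulse train, that is, the pitch information left once the spectral envelope is removed.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum_k frame[k] * frame[k + lag] for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[k] * frame[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[0..order-1],
    where x[n] is predicted as sum_i a[i] * x[n-1-i]."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:                       # silent or perfectly predicted frame
            break
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                        # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        err *= 1.0 - k * k                   # remaining prediction error
    return a


def inverse_filter(x, a, history=None):
    """Residual e[n] = x[n] - sum_i a[i] * x[n-1-i] (the FIR filter A(z)).
    `history` holds earlier input samples, newest first (zeros if omitted)."""
    past = list(history) if history is not None else [0.0] * len(a)
    e = []
    for s in x:
        e.append(s - sum(c * h for c, h in zip(a, past)))
        past = [s] + past[:-1]
    return e


# Hypothetical test signal: an all-pole "vocal tract" (a1 = 1.3, a2 = -0.8)
# excited by a 100 Hz pulse train at 8 kHz (one pulse every 80 samples).
x, y1, y2 = [], 0.0, 0.0
for n in range(800):
    y = (1.0 if n % 80 == 0 else 0.0) + 1.3 * y1 - 0.8 * y2
    x.append(y)
    y1, y2 = y, y1

a = levinson_durbin(autocorrelation(x, 2), 2)
residual = inverse_filter(x, a)
print([round(c, 3) for c in a])
```

Practical LPC front ends usually window each frame before the autocorrelation and use an order of roughly 10 for 8 kHz speech (a common rule of thumb); this sketch omits the window and uses order 2 only to keep the check simple.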

[0009] That is, resampling the residual signal, which carries the pitch information, changes its number of samples, so that when it is played back at the original sampling frequency the pitch period appears longer or shorter. Synthesizing the resampled residual signal through the synthesis filter with the linear prediction coefficients computed from the input speech therefore produces speech whose pitch has been converted.
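Why resampling the residual changes the pitch can be checked directly. The sketch below assumes linear interpolation (a first-degree interpolation polynomial) and an idealized residual, a 100 Hz pulse train at 8 kHz; it reads the residual at a fractional step equal to the original sampling frequency divided by the resampling frequency and measures the pitch period of the result from its autocorrelation. A step of 1.25 shortens the period from 80 to 64 samples (125 Hz at the original rate); a step of 0.8 lengthens it to 100 samples (80 Hz). All names and values are hypothetical.

```python
def resample_linear(x, step):
    """Read x at positions 0, step, 2*step, ... using linear interpolation.
    step > 1 yields fewer samples (higher pitch), step < 1 more (lower pitch)."""
    n_out = int((len(x) - 1) / step) + 1
    out = []
    for m in range(n_out):
        t = m * step
        i = int(t)
        frac = t - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1.0 - frac) * x[i] + frac * nxt)
    return out


def dominant_period(x, min_lag, max_lag):
    """Lag (in samples) with the largest autocorrelation in [min_lag, max_lag]."""
    def r(lag):
        return sum(x[k] * x[k + lag] for k in range(len(x) - lag))
    return max(range(min_lag, max_lag + 1), key=r)


fs = 8000
residual = [1.0 if n % 80 == 0 else 0.0 for n in range(800)]  # 100 Hz pulse train
for step in (1.25, 0.8):  # step = fs / resampling frequency
    period = dominant_period(resample_linear(residual, step), 20, 150)
    print(step, period, fs / period)
```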

[0010] The synthesized speech signal, however, becomes shorter when the pitch is raised and longer when it is lowered. If the device is combined with a means such as the TDHS or PICOLA method, which compresses or expands speech along the time axis without changing its pitch, so that the input and output signals have the same time length, the pitch can be changed without changing the duration of the speech.

[0011]

【Embodiments】

(Embodiment 1) FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention. In FIG. 1, 1 is speech input means including an A/D converter and the like; 2 is pitch conversion rate input means; 3 is linear prediction analysis means that performs linear prediction analysis of the input speech to obtain linear prediction coefficients; 4 is an inverse filter that computes a residual signal from the linear prediction coefficients and the input speech signal; 5 is residual signal resampling means that resamples the residual signal while adjusting the sampling frequency with reference to the pitch conversion rate entered through the pitch conversion rate input means 2; 6 is a synthesis filter that synthesizes a speech signal from the resampled residual signal and the linear prediction coefficients; and 7 is synthesized speech output means, including a D/A converter and the like, that outputs the synthesized speech at the same sampling frequency as at input.

[0012] Next, the operation of the above embodiment will be described with reference to the waveform diagram of FIG. 6. The speech input means 1 receives a speech signal (FIG. 6 (1)) and converts it into a digital signal, and the pitch conversion rate input means 2 receives a pitch conversion rate indicating how much the pitch of the input speech is to be changed.

[0013] The speech signal entered through the speech input means 1 is analyzed by the linear prediction analysis means 3, one fixed-length section (hereinafter a frame) at a time, to obtain linear prediction coefficients. The inverse filter 4 then computes a residual signal (FIG. 6 (2)) from these coefficients and the speech signal entered through the speech input means 1. The residual resampling means 5 resamples the computed residual signal (FIG. 6 (3)). Based on the pitch conversion rate entered through the pitch conversion rate input means 2, the resampling frequency is set lower than the sampling frequency of the input speech when the pitch is to be raised and higher when it is to be lowered, and its exact value is determined by the pitch conversion rate.
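[0013] fixes only the direction of the change. A minimal sketch of the resampling means 5 follows, assuming that the pitch conversion rate is the ratio of output pitch to input pitch (so the resampling frequency is the input sampling frequency divided by that rate) and that the interpolation polynomial is a 4-point cubic Lagrange polynomial. The patent claims only "an interpolation polynomial", and `resampling_frequency`, `lagrange_weights`, and `resample_frame` are hypothetical names. This sketch does no low-pass filtering, so when the pitch is raised (step greater than 1) residual components above the new Nyquist frequency can alias.

```python
def resampling_frequency(fs_in, pitch_rate):
    """pitch_rate > 1 raises the pitch -> resample below fs_in;
    pitch_rate < 1 lowers it -> resample above fs_in."""
    if pitch_rate <= 0:
        raise ValueError("pitch_rate must be positive")
    return fs_in / pitch_rate


def lagrange_weights(u):
    """Cubic Lagrange weights for nodes -1, 0, 1, 2 at fractional offset 0 <= u < 1."""
    return (
        -u * (u - 1.0) * (u - 2.0) / 6.0,
        (u + 1.0) * (u - 1.0) * (u - 2.0) / 2.0,
        -(u + 1.0) * u * (u - 2.0) / 2.0,
        (u + 1.0) * u * (u - 1.0) / 6.0,
    )


def resample_frame(frame, fs_in, fs_re):
    """Resample one residual frame with cubic interpolation. Output samples are
    taken every fs_in / fs_re input samples; edge samples are repeated."""
    step = fs_in / fs_re
    n = len(frame)
    n_out = int((n - 1) / step) + 1
    out = []
    for m in range(n_out):
        t = m * step
        i = int(t)
        w = lagrange_weights(t - i)
        taps = [frame[min(max(i + d, 0), n - 1)] for d in (-1, 0, 1, 2)]
        out.append(sum(wk * xk for wk, xk in zip(w, taps)))
    return out


# A 160-sample frame at 8 kHz, pitch raised by 1.25 -> resampled at 6.4 kHz.
frame = [0.0] * 160
print(len(resample_frame(frame, 8000, resampling_frequency(8000, 1.25))))  # -> 128
```

The new frame length returned here is what [0014] calls the "new frame length" handed to the synthesis filter.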

[0014] Resampling naturally increases or decreases the number of samples in each frame of the residual signal. Taking this new number of samples as the new frame length, the synthesis filter 6 synthesizes a speech signal (FIG. 6 (4)) from the resampled residual signal and the linear prediction coefficients computed by the linear prediction analysis means 3. The synthesized speech output means 7 outputs this signal at the same sampling frequency as the speech entered through the speech input means 1, producing pitch-converted speech.
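The synthesis filter 6 is the all-pole inverse of the inverse filter 4, so with no resampling the two should reconstruct the input exactly. The sketch below, assuming direct-form filters whose state (the last p samples, newest first) is carried from frame to frame, checks that round trip across two frames of different length and different coefficients. Because the state is carried explicitly, frames whose length has been changed by resampling can be chained the same way. Function names, coefficients, and test signals are hypothetical.

```python
import math


def inverse_filter(x, a, history):
    """A(z): e[n] = x[n] - sum_i a[i] * x[n-1-i].
    history = previous *input* samples, newest first; returns (e, new history)."""
    past = list(history)
    e = []
    for s in x:
        e.append(s - sum(c * h for c, h in zip(a, past)))
        past = [s] + past[:-1]
    return e, past


def synthesis_filter(e, a, history):
    """1/A(z): y[n] = e[n] + sum_i a[i] * y[n-1-i].
    history = previous *output* samples, newest first; returns (y, new history)."""
    past = list(history)
    y = []
    for s in e:
        out = s + sum(c * h for c, h in zip(a, past))
        y.append(out)
        past = [out] + past[:-1]
    return y, past


# Two frames of different length, each with its own (hypothetical) coefficients.
frames = [[math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(160)],
          [math.cos(0.2 * n) for n in range(128)]]
coeff_sets = [[1.3, -0.8], [0.9, -0.2]]

hist_in, hist_out, rebuilt = [0.0, 0.0], [0.0, 0.0], []
for frame, a in zip(frames, coeff_sets):
    residual, hist_in = inverse_filter(frame, a, hist_in)
    # A pitch converter would resample `residual` here, changing its length.
    y, hist_out = synthesis_filter(residual, a, hist_out)
    rebuilt.extend(y)

original = frames[0] + frames[1]
print(max(abs(p - q) for p, q in zip(original, rebuilt)))
```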

[0015] As described above, in the first embodiment, resampling the residual signal, which carries the pitch information, changes its number of samples, so that at the original sampling frequency the pitch period becomes longer or shorter. Synthesizing the resampled residual signal through the synthesis filter with the linear prediction coefficients computed from the input speech therefore yields speech whose pitch has been converted.

[0016] (Embodiment 2) Next, a second embodiment of the present invention will be described. In the first embodiment, the output speech signal is synthesized with the post-resampling frame length, so the output becomes shorter when the voice is raised and longer when it is lowered (FIG. 6 (4)). This embodiment therefore adds means for adjusting the time length of the speech to the configuration of the first embodiment.
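The length change in [0016] follows from simple arithmetic. Assume (an interpretation, not stated in the patent) that the pitch conversion rate ρ is output pitch divided by input pitch, with input sampling frequency f_s, resampling frequency f_r, frame length N, resampled frame length N', and input and output durations T_in and T_out:

```latex
\begin{aligned}
f_r &= \frac{f_s}{\rho} \\
N' &\approx N\,\frac{f_r}{f_s} = \frac{N}{\rho} \\
T_{\mathrm{out}} &\approx \frac{T_{\mathrm{in}}}{\rho}
\end{aligned}
```

For example, at f_s = 8 kHz with 20 ms frames (N = 160) and ρ = 1.25, f_r = 6.4 kHz and each frame shrinks to about 128 samples (16 ms), so 1.0 s of input becomes about 0.8 s. With ρ = 0.8, f_r = 10 kHz, frames grow to about 200 samples, and 1.0 s becomes about 1.25 s. Restoring the original duration therefore calls for a time-axis stretch by a factor of ρ.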

[0017] In the configurations shown in FIGS. 2 and 3, speech time-axis expansion/compression means 8, such as the TDHS or PICOLA method, which changes the time length (speed) of speech without changing its pitch, is provided so that the time lengths of the input and output signals match (FIG. 6 (5)). In the configurations shown in FIGS. 4 and 5, residual time-axis expansion/compression means 9 compresses or expands the residual signal along the time axis by a similar method, so that its frame length matches that of the input signal before speech is synthesized; the time length of the output speech thus equals that of the input speech.
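A time-axis stretch of this kind can be sketched with plain overlap-add (OLA). This is a swapped-in, deliberately simple technique, not TDHS or PICOLA: those methods work in units of the pitch period, whereas plain OLA ignores the waveform and can introduce phase discontinuities (audible roughness) on voiced speech. The function name, frame size, hop, and Hann window below are assumptions for illustration.

```python
import math


def ola_time_stretch(x, stretch, frame=256, hop_out=64):
    """Plain overlap-add time-scale modification (NOT TDHS or PICOLA).
    Output length is round(len(x) * stretch); no pitch-synchronous alignment."""
    if stretch <= 0:
        raise ValueError("stretch must be positive")
    hop_in = hop_out / stretch                   # analysis hop in input samples
    win = [0.5 - 0.5 * math.cos(2 * math.pi * k / frame) for k in range(frame)]
    n_out = int(round(len(x) * stretch))
    acc = [0.0] * (n_out + frame)                # sum of windowed input segments
    norm = [0.0] * (n_out + frame)               # sum of window weights
    m = 0
    while m * hop_out < n_out:
        start_in = int(round(m * hop_in))
        start_out = m * hop_out
        for k in range(frame):
            idx = start_in + k
            sample = x[idx] if idx < len(x) else 0.0   # zero-pad past the end
            acc[start_out + k] += win[k] * sample
            norm[start_out + k] += win[k]
        m += 1
    # Normalize by the accumulated window weight (0 where no window overlaps).
    return [acc[i] / norm[i] if norm[i] > 1e-8 else 0.0 for i in range(n_out)]
```

To undo the duration change of a pitch conversion by ρ in the FIG. 2 arrangement, the synthesized speech would be passed through `ola_time_stretch(y, rho)`; a production system would substitute a pitch-synchronous method such as those named in the patent.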

[0018] Specifically, the pitch conversion device shown in FIG. 2 compresses or expands the pitch-converted synthesized speech, whose time length has changed, along the time axis to match the time length of the input speech before outputting it. The device shown in FIG. 3 compresses or expands the input speech signal along the time axis according to the pitch conversion rate before pitch conversion, so that the synthesized speech after pitch conversion has the same time length as the input speech signal. The device shown in FIG. 4 compresses or expands the resampled residual signal along the time axis so that its frame length matches that of the input speech signal before synthesis by the synthesis filter; the time length of the synthesized speech thus remains equal to that of the input speech. Finally, the device shown in FIG. 5 compresses or expands the residual signal before resampling according to the pitch conversion rate, so that the resampled residual signal has the same length as a frame of the input speech signal and the synthesized speech therefore has the same time length as the input speech.

[0019] The second embodiment thus makes it possible to convert the pitch of speech without changing its time length (speed).

[0020]

【Effects of the Invention】 As is apparent from the above embodiments, the present invention resamples the residual signal, which carries the pitch information, at a resampling frequency set lower than the sampling frequency of the speech input when the pitch is to be raised and higher when it is to be lowered, with its value adjusted according to the pitch conversion rate. Changing the number of samples in this way makes the residual signal appear stretched or compressed in time at the original sampling frequency, so the pitch period can be lengthened or shortened. Synthesizing the resampled residual signal through the synthesis filter with the linear prediction coefficients computed from the input speech therefore yields speech whose pitch has been converted.

[0021] Furthermore, as is apparent from the second embodiment, the synthesized speech signal becomes shorter when the pitch is raised and longer when it is lowered. By compressing or expanding the synthesized speech signal, the input speech before pitch conversion, the residual signal after resampling, or the residual signal before resampling along the time axis according to the pitch conversion rate, so that the input and output signals have the same time length, the present invention can change the pitch without changing the duration of the speech.

[Brief description of drawings]

FIG. 1 is a schematic block diagram showing the configuration of a pitch conversion device according to the first embodiment of the present invention.

FIG. 2 is a schematic block diagram showing the configuration of a pitch conversion device according to the second embodiment of the present invention.

FIG. 3 is a schematic block diagram showing a modification of the second embodiment of the present invention.

FIG. 4 is a schematic block diagram showing a modification of the second embodiment of the present invention.

FIG. 5 is a schematic block diagram showing a modification of the second embodiment of the present invention.

FIG. 6 is a waveform diagram for explaining the operation of the embodiments of the present invention.

[Explanation of symbols]

1 Speech input means
2 Pitch conversion rate input means
3 Linear prediction analysis means
4 Inverse filter
5 Residual signal resampling means
6 Synthesis filter
7 Synthesized speech output means
8 Speech time-axis expansion/compression means
9 Residual time-axis expansion/compression means

Claims


1. A pitch conversion device comprising: speech input means; pitch conversion rate input means; linear prediction analysis means for performing linear prediction analysis of input speech to obtain linear prediction coefficients; an inverse filter for calculating a residual signal from the obtained linear prediction coefficients and the input speech signal; residual signal resampling means for interpolating the residual signal with an interpolation polynomial and resampling it while adjusting the sampling frequency according to the pitch conversion rate entered through the pitch conversion rate input means; a synthesis filter for synthesizing speech from the resampled residual signal and the linear prediction coefficients; and synthesized speech output means for outputting the synthesized speech at the same sampling frequency as at input.

2. The pitch conversion device according to claim 1, further comprising means for adjusting the time length of the voice.