JP2005062442A

JP2005062442A - Waveform connection apparatus, waveform connection method and program

Info

Publication number: JP2005062442A
Application number: JP2003291958A
Authority: JP
Inventors: Kunihiro Suga; 邦博須賀
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2003-08-12
Filing date: 2003-08-12
Publication date: 2005-03-10

Abstract

PROBLEM TO BE SOLVED: To provide a waveform connecting apparatus and the like for generating natural synthesized speech with little noise.
SOLUTION: A cepstrum analysis part 1 identifies the frequency of the pitch component of the speech represented by a waveform signal supplied from an input terminal IN. A variable BPF 2 generates a pitch signal by filtering the waveform signal with a band-pass characteristic whose center frequency is the frequency identified by the cepstrum analysis part 1. A zero-cross detection part 3 identifies the timing at which the pitch signal rises through zero, and a pitch mark addition part 5 adds a pitch mark indicating that timing to the waveform signal. A waveform connection processing part 6 cuts the pitch-marked waveform signals near the timings at which the pitch signal crosses zero, so that the end point of the preceding waveform signal and the start point of the following waveform signal have nearly equal instantaneous values, and then connects them to generate a waveform signal representing synthesized speech.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a waveform connecting device, a waveform connecting method, and a program.

In recent years, speech produced by speech synthesis technology has come into wide use, for example in text-to-speech software, telephone directory assistance, and stock, travel, store, and traffic information services.

Speech synthesis methods fall roughly into two classes: the rule-based synthesis method and the waveform editing method (corpus method).
The rule-based synthesis method performs morphological analysis on the text to be synthesized and generates speech by applying phonological processing to the text based on the analysis result. It places few restrictions on the content of the input text, so texts with widely varying content can be synthesized. However, its output speech quality is inferior to that of the waveform editing method.

The corpus method, on the other hand, records speech actually uttered by a person, subdivides the recording into constituent parts, and joins those parts to obtain the target speech. The waveform editing method is superior to the rule-based method in speech quality and yields speech that sounds like a real human voice.

However, when constituent parts of recorded speech are simply joined, the waveform at the junction is generally discontinuous. This discontinuity is a source of noise and degrades the quality of the synthesized speech. One technique proposed to reduce this noise works as follows: when joining two waveforms, it searches the part of the preceding waveform as close as possible to its rear end, and the part of the following waveform as close as possible to its front end, for one point each at which the instantaneous value and the tangent slope approximately match, and connects these two points (see, for example, Patent Document 1). Another conceivable technique, shown in FIG. 5, searches the part of the preceding waveform (labeled "S1" in FIG. 5) as close as possible to its rear end and the part of the following waveform (labeled "S2" in FIG. 5) as close as possible to its front end for one point each at which the instantaneous value becomes 0 (a zero crossing), and connects these points.
Patent Document 1: JP 2003-15681 A

In general, however, a speech waveform is periodic over a time interval called the pitch period. To obtain natural synthesized speech, the end point of a pitch period of the preceding unit speech and the start point of a pitch period of the following unit speech must be chosen appropriately (that is, at the points that would actually be end and start points if a person uttered these unit speeches), and the speech must be joined by connecting that end point to that start point.

With the zero-crossing technique described above, however, the cut ends are often not appropriate start or end points of a pitch period, as shown in FIG. 5, and the resulting synthesized speech (whose waveform is labeled "S0" in FIG. 5) often sounds unnatural. (In FIG. 5, the interval labeled "D1" is the pitch period of waveform S1, and the interval labeled "D2" is the pitch period of waveform S2.) The same problem occurs with the technique that connects points whose instantaneous values and tangent slopes approximately match.

The present invention was made in view of the above circumstances, and its object is to provide a waveform connecting device, a waveform connecting method, and a program for generating synthesized speech that has little noise or sounds natural.

To achieve the above object, a waveform connecting device according to a first aspect of the present invention comprises:
pitch component frequency identifying means for acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and for identifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
variable filter means for changing, for each of the first and second waveform signals, its own passband characteristic so that all components other than those near the pitch component frequency identified by the pitch component frequency identifying means are blocked, and for extracting the pitch component by filtering that waveform signal;
phase detection means for identifying, for each of the first and second waveform signals, the timing at which the pitch component extracted by the variable filter means reaches a predetermined phase; and
connecting means for determining the end point of the first waveform signal and the start point of the second waveform signal by cutting each of the first and second waveform signals near the location corresponding to the timing detected by the phase detection means, and for connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal,
wherein the connecting means cuts the first and second waveform signals such that the end point of the first waveform signal and the start point of the second waveform signal have substantially equal instantaneous values.

A waveform connecting device according to a second aspect of the present invention comprises:
pitch component frequency identifying means for acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and for identifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
variable filter means for changing, for each of the first and second waveform signals, its own passband characteristic to that of a band-pass filter whose center frequency substantially coincides with the pitch component frequency identified by the pitch component frequency identifying means, and for extracting the pitch component by filtering that waveform signal;
phase detection means for identifying, for each of the first and second waveform signals, the timing at which the pitch component extracted by the variable filter means reaches a predetermined phase; and
connecting means for determining the end point of the first waveform signal and the start point of the second waveform signal by cutting each of the first and second waveform signals at the location corresponding to the timing detected by the phase detection means, and for connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal.

The pitch component frequency identifying means may identify, for each of the first and second waveform signals, the frequency at which the cepstrum of that waveform signal takes a maximum as the frequency of its pitch component.

The phase detection means may identify, for each of the first and second waveform signals, the timing at which the value of the pitch component extracted by the variable filter means crosses zero with a change of sign in a predetermined direction.

A waveform connecting method according to a third aspect of the present invention comprises:
a pitch component frequency identifying step of acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and identifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
a variable filtering step of extracting, for each of the first and second waveform signals, the pitch component by filtering with a passband characteristic that blocks all components other than those near the pitch component frequency identified in the pitch component frequency identifying step;
a phase detection step of identifying, for each of the first and second waveform signals, the timing at which the pitch component extracted in the variable filtering step reaches a predetermined phase; and
a connecting step of determining the end point of the first waveform signal and the start point of the second waveform signal by cutting each of the first and second waveform signals near the location corresponding to the timing detected in the phase detection step, and connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal,
wherein, in the connecting step, the first and second waveform signals are cut such that the end point of the first waveform signal and the start point of the second waveform signal have substantially equal instantaneous values.

A program according to a fourth aspect of the present invention causes a computer to function as:
pitch component frequency identifying means for acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and for identifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
variable filter means for changing, for each of the first and second waveform signals, its own passband characteristic so that all components other than those near the pitch component frequency identified by the pitch component frequency identifying means are blocked, and for extracting the pitch component by filtering that waveform signal;
phase detection means for identifying, for each of the first and second waveform signals, the timing at which the pitch component extracted by the variable filter means reaches a predetermined phase; and
connecting means for determining the end point of the first waveform signal and the start point of the second waveform signal by cutting each of the first and second waveform signals near the location corresponding to the timing detected by the phase detection means, and for connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal,
wherein the connecting means cuts the first and second waveform signals such that the end point of the first waveform signal and the start point of the second waveform signal have substantially equal instantaneous values.

As described above, the present invention realizes a waveform connecting device, a waveform connecting method, and a program for generating synthesized speech that has little noise or sounds natural.

An embodiment of the present invention is described below with reference to the drawings, taking a speech synthesizer as an example.
FIG. 1 is a block diagram showing the configuration of this speech synthesizer. As shown, the speech synthesizer comprises an input terminal IN, a cepstrum analysis unit 1, a variable BPF (band-pass filter) 2, a delay unit 4, a zero-cross detection unit 3, a pitch mark addition unit 5, a waveform connection processing unit 6, and an output terminal OUT.
When waveform signals obtained by subdividing prerecorded speech into individual unit speeches (for example, speech representing a single word or a short sentence) are supplied from the input terminal IN, the speech synthesizer outputs from the output terminal OUT a synthesized speech signal obtained by joining the supplied waveform signals.

The cepstrum analysis unit 1, variable BPF 2, zero-cross detection unit 3, delay unit 4, pitch mark addition unit 5, and waveform connection processing unit 6 each consist of a processor such as a DSP (Digital Signal Processor) or CPU (Central Processing Unit) together with memory such as RAM (Random Access Memory) or a hard disk drive, or alternatively of dedicated circuitry. The delay unit 4 may instead consist of a delay circuit such as a shift register or a delay line.
A single processor may perform some or all of the functions of the cepstrum analysis unit 1, variable BPF 2, zero-cross detection unit 3, delay unit 4, pitch mark addition unit 5, and waveform connection processing unit 6.

Each time a waveform signal is supplied from the input terminal IN, the cepstrum analysis unit 1 performs cepstrum analysis on it to identify the frequency of the pitch component of the speech the waveform signal represents. (The data format of the waveform signal is arbitrary; for example, PCM (Pulse Code Modulation) format may be used.)

As a concrete example of the cepstrum analysis, when a waveform signal is supplied from the input terminal IN, the cepstrum analysis unit 1 first converts the intensity of the waveform signal into a value substantially equal to the logarithm of the original value. (The base of the logarithm is arbitrary; the common logarithm, for example, may be used.)
Next, the cepstrum analysis unit 1 obtains the spectrum of the converted waveform signal (that is, the cepstrum) by the fast Fourier transform (or by any other technique that produces data representing the Fourier transform of a discrete variable). It then identifies the smallest of the frequencies at which the cepstrum takes a local maximum as the frequency of the pitch component (the fundamental frequency). Finally, the cepstrum analysis unit 1 controls the passband characteristic of the variable BPF 2 so that the identified pitch component frequency becomes the center frequency of the variable BPF 2 (the frequency at the center of its passband).
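The cepstral pitch estimate can be sketched as follows. This is a minimal sketch, not the embodiment's implementation. The description above is terse about the order of operations, so the sketch assumes the conventional real cepstrum (the inverse FFT of the log magnitude spectrum), and it takes the largest cepstral peak inside an assumed pitch range of 70 to 400 Hz rather than the smallest local maximum. The function names, the Hann window, the spectral floor, and the search limits are illustrative assumptions. The peak is found at a quefrency q (a lag in samples), which converts to a frequency as fs / q.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in spectrum])]


def cepstral_pitch(frame, fs, f_min=70.0, f_max=400.0, floor_ratio=1e-3):
    """Estimate the pitch frequency (Hz) of one frame from its real cepstrum."""
    n = len(frame)
    if n & (n - 1):
        raise ValueError("frame length must be a power of two")
    # Hann window to limit spectral leakage between harmonics.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    mags = [abs(v) for v in fft([s * w for s, w in zip(frame, window)])]
    # Floor the magnitudes so near-empty bins do not dominate the log.
    floor = max(floor_ratio * max(mags), 1e-12)
    log_mag = [math.log(max(m, floor)) for m in mags]
    cepstrum = [v.real for v in ifft(log_mag)]
    # Quefrency search range (in samples) for pitches f_max .. f_min.
    q_lo = max(1, int(fs / f_max))
    q_hi = min(int(fs / f_min), n // 2)
    best_q = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return fs / best_q
```

For a 512-sample frame at 8 kHz built from harmonics of 200 Hz, the cepstral peak is expected at a quefrency of 40 samples, giving an estimate of about 200 Hz. The simple FFT above requires a power-of-two frame length; a production version would use an optimized FFT library instead.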

The variable BPF 2 functions as an FIR (Finite Impulse Response) or IIR (Infinite Impulse Response) band-pass filter with a variable center frequency.
Specifically, the variable BPF 2 sets its center frequency to the value specified by the cepstrum analysis unit 1. It receives the waveform signal from the input terminal IN, filters the same waveform signal that the cepstrum analysis unit 1 used to determine the center frequency, and supplies the filtered waveform signal (the pitch signal) to the zero-cross detection unit 3.

The pitch signal may consist, for example, of digital data with substantially the same sampling interval as the waveform signal. The bandwidth of the variable BPF 2 should preferably always stay within twice the pitch component frequency identified by the cepstrum analysis unit 1.
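A variable band-pass stage of this kind can be sketched as a second-order IIR section whose coefficients are recomputed from each pitch estimate. The sketch assumes the widely used "constant 0 dB peak gain" band-pass biquad from Robert Bristow-Johnson's Audio EQ Cookbook; the patent allows FIR or IIR filters and does not specify a design. Its -3 dB bandwidth is approximately f0 / q, so any q of at least 0.5 keeps the bandwidth within twice the pitch frequency as preferred above. The function names and the default q = 2.0 are arbitrary, illustrative choices.

```python
import math


def bandpass_coefficients(f0, fs, q):
    """Band-pass biquad (RBJ 'constant 0 dB peak gain'), normalized so a[0] == 1.

    The gain is exactly 1 and the phase exactly 0 at f0; bandwidth is about f0 / q.
    """
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a


def extract_pitch_signal(x, f0, fs, q=2.0):
    """Filter waveform samples x around the estimated pitch frequency f0 (Hz)."""
    b, a = bandpass_coefficients(f0, fs, q)
    y = []
    x1 = x2 = y1 = y2 = 0.0  # direct form I state
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

Because the gain is 1 and the phase is 0 at the center frequency, a sinusoid at f0 passes through unchanged once the start-up transient has decayed, while a tone at 4 x f0 is attenuated to roughly 13% of its amplitude with q = 2.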

The zero-cross detection unit 3 identifies the timing at which the instantaneous value of the pitch signal supplied from the variable BPF 2 changes from negative to positive (the timing at which it rises through zero), generates a signal representing that timing (the zero-cross signal), and supplies it to the pitch mark addition unit 5. In other words, the timing at which the pitch signal rises through zero is the timing at which the phase of the pitch signal, regarded as a sine wave, is 0.
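Rising zero-crossing detection on the sampled pitch signal can be sketched as a sign test between consecutive samples. This is an illustrative sketch; placing the mark at the first sample at or after the crossing, rather than at an interpolated sub-sample position, is an assumption. The rising=False option covers the falling-crossing (phase π) variant also described in this section.

```python
def zero_crossings(signal, rising=True):
    """Return sample indices n at which the signal crosses zero between n-1 and n.

    rising=True  -> signal[n-1] < 0 <= signal[n]  (phase ~0 of a sine-like signal)
    rising=False -> signal[n-1] > 0 >= signal[n]  (phase ~pi)
    """
    marks = []
    for n in range(1, len(signal)):
        prev, cur = signal[n - 1], signal[n]
        if rising and prev < 0 <= cur:
            marks.append(n)
        elif not rising and prev > 0 >= cur:
            marks.append(n)
    return marks
```

For example, zero_crossings([-1.0, 1.0, 2.0, -0.5, 0.0, 3.0]) returns [1, 4], and the same input with rising=False returns [3].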

Each time a waveform signal is supplied from the input terminal IN, the delay unit 4 delays it by a fixed time and supplies it to the pitch mark addition unit 5. The delay time is chosen so that the portion of the waveform signal corresponding to the timing identified by the zero-cross signal (that is, the timing at which the pitch signal rises through zero) reaches the pitch mark addition unit 5 earlier than, or at the same time as, the zero-cross signal supplied by the zero-cross detection unit 3.
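The fixed delay can be sketched in software as a FIFO buffer, the counterpart of the shift register or delay line mentioned above. The class name and the zero-filled start are assumptions. The delay length itself is not fixed by the text; it has to be chosen from the latency of the actual analysis path (cepstrum analysis, filtering, and zero-cross detection) so that the timing condition stated above holds.

```python
from collections import deque


class FixedDelay:
    """Delay a sample stream by a fixed number of samples (zero-filled at start)."""

    def __init__(self, delay):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        # Pre-filled with zeros: the first `delay` outputs are silence.
        self._buffer = deque([0.0] * delay)

    def process(self, sample):
        """Push one input sample and return the sample from `delay` steps ago."""
        self._buffer.append(sample)
        return self._buffer.popleft()
```

With a delay of 3, feeding the samples 1, 2, 3, 4, 5 yields 0.0, 0.0, 0.0, 1, 2.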

Each time the pitch mark addition unit 5 receives a waveform signal from the delay unit 4 and a zero-cross signal from the zero-cross detection unit 3, it adds to the waveform signal a pitch mark indicating the boundary of a one-pitch section of the waveform, and supplies the marked signals in turn to the waveform connection processing unit 6.
Specifically, as shown schematically in FIG. 2, each pitch mark added by the pitch mark addition unit 5 indicates the location (labeled "PM" in FIG. 2) corresponding to the timing identified by the zero-cross signal from the zero-cross detection unit 3 (that is, the timing at which the pitch signal rises through zero). FIG. 2 illustrates pitch marks added to waveform signals representing the waveforms labeled "S1" and "S2".

In speech actually uttered by a person, the pitch component of the preceding unit speech often crosses zero at the boundary between two adjacent unit speeches, and the pitch component of the following unit speech often crosses zero at that boundary as well. The pitch marks added by the pitch mark addition unit 5 therefore indicate positions that could be unit-speech boundaries if the waveform were actually uttered by a person; in other words, the pitch marks indicate appropriate end and start points of pitch periods.

As pitch-marked waveform signals are supplied in turn from the pitch mark addition unit 5, the waveform connection processing unit 6 stores them temporarily. It then connects these waveform signals to one another and outputs the resulting waveform signal from the output terminal OUT as the waveform signal of the synthesized speech.

Specifically, as shown for example in FIG. 3, the waveform connection processing unit 6 searches the part of the preceding waveform signal (in FIG. 3, the signal representing the waveform labeled "S1") as close as possible to its last pitch mark, and the part of the following waveform signal (the signal representing the waveform labeled "S2") as close as possible to its first pitch mark, for one point each at which the instantaneous value and the tangent slope approximately match. It then connects the waveform signals so that these points (the end point of the preceding waveform signal and the start point of the following waveform signal) coincide, generating the waveform signal of the synthesized speech (in FIG. 3, the signal representing the waveform labeled "S3"). The part of the preceding waveform signal after its end point and the part of the following waveform signal before its start point are discarded.
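The endpoint search can be sketched as an exhaustive search over a small window around each pitch mark. This is a sketch under stated assumptions: the window size, the central-difference slope estimate, and the weighted cost that combines value mismatch, slope mismatch, and a small penalty for distance from the marks are all hypothetical choices, since the text only requires the values and slopes to match approximately at points as close to the marks as possible.

```python
def _slope(s, i):
    """Central-difference estimate of the tangent slope at sample i."""
    return (s[i + 1] - s[i - 1]) / 2.0


def connect_at_pitch_marks(s1, mark1, s2, mark2, window=8,
                           slope_weight=1.0, dist_weight=1e-3):
    """Join s1 (preceding) and s2 (following) near their pitch marks.

    mark1 is the last pitch mark of s1 and mark2 the first pitch mark of s2.
    Candidates i (in s1) and j (in s2) lie within +-window of those marks.
    Returns (joined, i, j): s1 is kept before i and s2 from j on, so s2[j]
    takes the place of the nearly equal s1[i] and the tails are discarded.
    """
    best = None
    for i in range(max(1, mark1 - window), min(len(s1) - 1, mark1 + window + 1)):
        for j in range(max(1, mark2 - window), min(len(s2) - 1, mark2 + window + 1)):
            cost = (abs(s1[i] - s2[j])
                    + slope_weight * abs(_slope(s1, i) - _slope(s2, j))
                    + dist_weight * (abs(i - mark1) + abs(j - mark2)))
            if best is None or cost < best[0]:
                best = (cost, i, j)
    if best is None:
        raise ValueError("no candidate points near the pitch marks")
    _, i, j = best
    return s1[:i] + s2[j:], i, j
```

Setting slope_weight=0.0 gives the value-only variant in which only the instantaneous values need to match approximately. The search is O(window^2) per junction, which is negligible for windows of a few samples.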

The order in which the waveform connection processing unit 6 connects the pitch-marked waveform signals may be specified by any method. For example, the unit may connect the signals in the order in which they are supplied to it, or it may acquire data specifying the connection order from outside (for example, in response to a user operation) and connect the signals in the order that data indicates.

Through the operation described above, the speech synthesizer cuts the waveform signals at the timings at which the pitch signal rises through zero (that is, at positions that could be unit-speech boundaries if the speech were actually uttered by a person) and connects the cut waveform signals so that they join smoothly. The waveform signal of the synthesized speech therefore represents natural speech.

The configuration of this speech synthesizer is not limited to the one described above.
For example, instead of adding pitch marks to the waveform signal, the pitch mark addition unit 5 may supply the waveform connection processing unit 6 with data indicating the timings identified by the zero-cross signal from the zero-cross detection unit 3. In this case, the waveform connection processing unit 6 performs the above processing treating the positions indicated by this data as pitch mark positions. Alternatively, the zero-cross detection unit 3 may supply the zero-cross signal directly to the waveform connection processing unit 6, which then performs the above processing treating the positions on the waveform signal that correspond to the timings identified by the zero-cross signal as pitch mark positions.

The zero-cross detection unit 3 may also identify the timing at which the instantaneous value of the pitch signal from the variable BPF 2 changes from positive to negative (the timing at which it falls through zero; in other words, the timing at which the phase of the pitch signal, regarded as a sine wave, is π radians) and generate a signal representing that timing as the zero-cross signal.

Further, when pitch marks are to be added to unit speeches whose pitch component, when actually uttered by a person, takes a predetermined phase (not necessarily 0) at the boundary with other unit speeches, the zero-cross detection unit 3 may identify the timing at which the pitch signal approximately reaches that phase and supply a signal representing that timing to the pitch mark addition unit 5 in place of the zero-cross signal.

However, detecting whether the pitch signal crosses zero, and the change of sign around the crossing, is easier and needs a simpler configuration than identifying an arbitrary phase of the pitch signal. Therefore, if every unit speech handled by this speech synthesizer can be regarded as having a pitch component whose phase is approximately 0 at its boundaries with other unit speeches when actually uttered by a person, treating them that way simplifies the configuration of the zero-cross detection unit 3.

Further, when searching for the two points to be connected, the waveform connection processing unit 6 need not require that the tangent slopes approximately match. It may instead search the portion of the preceding waveform signal as close as possible to its last pitch mark, and the portion of the following waveform signal as close as possible to its first pitch mark, for one point each at which the instantaneous values approximately match.
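
The relaxed search (instantaneous values only) can be sketched as follows, assuming segments are sequences of floats, pitch marks are sample indices, and "as close as possible to the pitch mark" is read as a search window of ±`window` samples in which, among all pairs whose values differ by at most `tol`, the pair nearest the marks wins. `find_splice_points`, `connect`, `window` and `tol` are hypothetical names and parameters.

```python
from typing import List, Sequence, Tuple


def find_splice_points(prev: Sequence[float], nxt: Sequence[float],
                       prev_mark: int, next_mark: int,
                       window: int, tol: float) -> Tuple[int, int]:
    """Pick (i, j) near prev's last mark and nxt's first mark with |prev[i] - nxt[j]| <= tol.

    Among qualifying pairs, the one closest to the marks wins; ties go to the smaller difference.
    """
    best = None
    for i in range(max(0, prev_mark - window), min(len(prev), prev_mark + window + 1)):
        for j in range(max(0, next_mark - window), min(len(nxt), next_mark + window + 1)):
            diff = abs(prev[i] - nxt[j])
            if diff > tol:
                continue  # instantaneous values do not match closely enough
            dist = abs(i - prev_mark) + abs(j - next_mark)
            if best is None or (dist, diff) < best[0]:
                best = ((dist, diff), i, j)
    if best is None:
        raise ValueError("no matching pair in the window; widen `window` or `tol`")
    return best[1], best[2]


def connect(prev: Sequence[float], nxt: Sequence[float], i: int, j: int) -> List[float]:
    """Let prev[i] and nxt[j] coincide: keep prev before i, then nxt from j onward."""
    return list(prev[:i]) + list(nxt[j:])
```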

Further, the cepstrum analysis unit 1 need not make the center frequency of the variable BPF 2 coincide with the frequency of the pitch component; it suffices to control the passband characteristic of the variable BPF 2 so that components other than those near the pitch frequency are blocked. However, a signal whose frequency substantially matches the center frequency of the variable BPF 2 passes through the variable BPF 2 with substantially zero phase delay. For this reason, it is desirable that the center frequency of the variable BPF 2 substantially match the frequency of the pitch component.
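
The zero-phase property at the center frequency can be checked numerically. The sketch below assumes that the variable BPF 2 is realized as a second-order IIR band-pass section designed with the widely used RBJ Audio EQ Cookbook formulas (constant 0 dB peak gain form); this specification does not prescribe a design, and `design_bandpass`, `response` and the Q value are illustrative.

```python
import cmath
import math
from typing import Tuple

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def design_bandpass(f0: float, fs: float, q: float) -> Coeffs:
    """Second-order band-pass (RBJ cookbook, 0 dB peak gain), normalized so a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def response(coeffs: Coeffs, f: float, fs: float) -> complex:
    """Evaluate H(e^{jw}) = B(z)/A(z) at frequency f (Hz)."""
    b, a = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return num / den
```

Tracking a new pitch frequency amounts to recomputing the coefficients with the new `f0`. At the center frequency this response evaluates to 1 (up to rounding), i.e. unity gain and zero phase, while a component an octave above is both attenuated and phase-shifted.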

Moreover, because a signal whose frequency substantially matches the center frequency of the variable BPF 2 passes through it with substantially zero phase delay, as described above, if the center frequency of the variable BPF 2 substantially matches the pitch frequency of the waveform signal supplied to the input terminal IN, the instantaneous value of that waveform signal at the positions where pitch marks are added can be expected to be approximately zero. Therefore, in that case, the waveform connection processing unit 6 may simply connect the two waveform signals supplied to it at their pitch-marked positions.
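
Under that assumption the connection reduces to cutting each signal at a pitch mark. A minimal sketch, assuming pitch marks are stored as ascending lists of sample indices (a hypothetical representation) and that the preceding signal is cut at its last mark and the following one at its first:

```python
from typing import List, Sequence


def connect_at_pitch_marks(prev: Sequence[float], prev_marks: Sequence[int],
                           nxt: Sequence[float], next_marks: Sequence[int]) -> List[float]:
    """Cut `prev` at its last pitch mark and `nxt` at its first, then concatenate.

    Assumes the BPF was centred on the pitch frequency, so both cut points are
    expected to lie near zero and no further search is performed.
    """
    end = prev_marks[-1]   # end point of the preceding signal (exclusive)
    start = next_marks[0]  # start point of the following signal (inclusive)
    return list(prev[:end]) + list(nxt[start:])
```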

The delay unit 4 is not always necessary; the pitch mark adding unit 5 may receive the waveform signal directly from the input terminal IN. In this case, for example, the pitch mark adding unit 5 may temporarily store the waveform signal for at least a fixed time. Alternatively, if the portion of the waveform signal supplied from the input terminal IN that corresponds to the timing specified by the zero-cross signal reaches the pitch mark adding unit 5 at the same time as (or later than) the zero-cross signal, the pitch mark adding unit 5 need not temporarily store the waveform signal.
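
The buffering alternative can be sketched as a bounded history of input samples: the zero-cross signal may arrive late (for example, because of filter latency), and the mark is attached as long as the corresponding sample is still held. `BufferedPitchMarker` and its `capacity` parameter are hypothetical; under this assumption, the capacity must be at least the latency of the zero-cross signal in samples.

```python
from collections import deque
from typing import Deque, List, Tuple


class BufferedPitchMarker:
    """Keeps the most recent `capacity` samples so late zero-cross events can still be applied."""

    def __init__(self, capacity: int) -> None:
        self.buffer: Deque[Tuple[int, float]] = deque(maxlen=capacity)
        self.next_index = 0          # index the next pushed sample will receive
        self.marks: List[int] = []   # sample indices that received a pitch mark

    def push_sample(self, value: float) -> None:
        """Store one input sample; the oldest sample is evicted when the buffer is full."""
        self.buffer.append((self.next_index, value))
        self.next_index += 1

    def on_zero_cross(self, sample_index: int) -> bool:
        """Mark `sample_index` if it is still buffered; return False if it was already evicted."""
        if self.buffer and self.buffer[0][0] <= sample_index < self.next_index:
            self.marks.append(sample_index)
            return True
        return False
```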

In general, an IIR filter has the property that the delay of a signal passing through it does not increase even when the number of stages is increased. For this reason, when the variable BPF 2 is composed of an IIR filter, the delay unit 4 can often be omitted.

The speech synthesizer may also include a plurality of units, each consisting of an input terminal IN, a cepstrum analysis unit 1, a variable BPF 2, a zero-cross detection unit 3, a delay unit 4 and a pitch mark adding unit 5. Each unit may generate a pitch-marked waveform signal from the waveform signal supplied to its own input terminal IN and supply it to the waveform connection processing unit 6. In this case, the waveform connection processing unit 6 generates the waveform signal of the synthesized speech from the pitch-marked waveform signals supplied by the units.

The waveform signal supplied to the input terminal IN may also represent silence. Connecting a waveform signal representing sound to one representing silence in this way prevents noise in portions that include the edge of the sound signal (for example, the beginning or end of an utterance, or a breath pause) and makes those portions sound natural.

Instead of the input terminal IN, the speech synthesizer may include a recording medium drive device (for example, a floppy (registered trademark) disk drive, a CD-ROM drive or an MO drive) that reads waveform signals from a recording medium on which they are recorded (for example, a floppy (registered trademark) disk, a CD (Compact Disc) or an MO (Magneto-Optical Disk)) and supplies them to the delay unit 4, the variable BPF 2 and the cepstrum analysis unit 1.
Likewise, instead of the output terminal OUT, the speech synthesizer may include a recording medium drive device that writes the waveform signal generated by the waveform connection processing unit 6 to a recording medium.
A single recording medium drive device may perform both functions: reading waveform signals from the recording medium and writing the waveform signal generated by the waveform connection processing unit 6 to it.

Although an embodiment of the present invention has been described above, the waveform connecting device according to the present invention can be realized with an ordinary computer system rather than a dedicated system.
For example, a speech synthesizer that performs the processing described above can be configured by installing on a personal computer, from a medium (a CD, MO, floppy (registered trademark) disk or the like) that stores it, a program that causes the computer to execute the operations of the cepstrum analysis unit 1, variable BPF 2, zero-cross detection unit 3, delay unit 4, pitch mark adding unit 5 and waveform connection processing unit 6 described above.

そして、このプログラムを実行するパーソナルコンピュータが、図１の音声合成装置の動作に相当する処理として、例えば、図４に示す処理を行うものとする。図４は、このパーソナルコンピュータが実行する処理を示すフローチャートである。 The personal computer that executes this program performs, for example, the process shown in FIG. 4 as a process corresponding to the operation of the speech synthesizer of FIG. FIG. 4 is a flowchart showing processing executed by the personal computer.

That is, when the personal computer acquires from outside data representing the waveform signal of a unit speech segment described above (FIG. 4, step S101), it first performs cepstrum analysis on the data to specify the frequency of the pitch component of the speech that the data represents (step S102).
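
Step S102 can be sketched with the real cepstrum: a voiced segment with a pitch period of T samples produces a cepstral peak at quefrency T, and fs/T gives the pitch frequency (claim 3 describes this as the frequency at which the cepstrum is maximal). The Hann window, the 60 to 500 Hz search range and the name `cepstral_pitch` are assumptions for illustration, not values from this specification.

```python
import numpy as np


def cepstral_pitch(x: np.ndarray, fs: float,
                   fmin: float = 60.0, fmax: float = 500.0) -> float:
    """Estimate the pitch frequency of `x` (Hz) from the peak of its real cepstrum.

    The peak search is limited to quefrencies corresponding to [fmin, fmax] Hz.
    """
    windowed = x * np.hanning(len(x))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)  # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=len(x))
    q_lo = int(fs / fmax)  # shortest pitch period considered, in samples
    q_hi = int(fs / fmin)  # longest pitch period considered, in samples
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi + 1]))
    return fs / peak
```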

Next, the personal computer generates data representing the pitch signal by filtering the data acquired in step S101 with the characteristic of a band-pass filter whose center frequency is the frequency specified in step S102 (step S103).
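
When the whole segment is available in memory, as in step S103 on a personal computer, one simple stand-in for the band-pass filter (a substitution, not the filter described in this specification) is a brick-wall mask applied in the frequency domain. Because the mask is real-valued, it leaves the phase of every retained component unchanged, so the pitch signal has zero phase delay at all passed frequencies, not only at the center. `pitch_signal` and `half_bw` are hypothetical names.

```python
import numpy as np


def pitch_signal(x: np.ndarray, fs: float, f0: float, half_bw: float) -> np.ndarray:
    """Keep only spectral components within f0 +/- half_bw Hz (zero-phase brick-wall band-pass)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # A real 0/1 mask changes magnitudes only, so retained components keep their phase.
    spectrum[np.abs(freqs - f0) > half_bw] = 0.0
    return np.fft.irfft(spectrum, n=len(x))
```

A brick-wall mask rings in time for non-stationary input; a smoother mask (for example a raised-cosine taper) trades selectivity for less ringing.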

Next, the personal computer specifies the timings at which the pitch signal represented by the data generated in step S103 rises through zero (step S104), and adds pitch marks, i.e. data indicating the positions on the waveform signal that correspond to the specified timings, to the data acquired in step S101 (step S105).
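
Steps S104 and S105 can be sketched with a small container that carries the samples together with their pitch marks. The `MarkedSegment` type and the `add_pitch_marks` function are hypothetical, and the pitch signal is assumed to be time-aligned with the waveform (for example, produced by a zero-phase filter).

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MarkedSegment:
    """Waveform samples plus pitch marks (sample indices), as produced by step S105."""
    samples: List[float]
    pitch_marks: List[int] = field(default_factory=list)


def add_pitch_marks(samples: Sequence[float], pitch: Sequence[float]) -> MarkedSegment:
    """Steps S104-S105: mark every rising zero crossing of `pitch` on `samples`."""
    marks = [n for n in range(1, len(pitch)) if pitch[n - 1] < 0.0 <= pitch[n]]
    return MarkedSegment(list(samples), marks)
```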

When the personal computer acquires a plurality of data items representing unit speech waveform signals in step S101, it performs steps S102 to S105 on each of them, producing as many pitch-marked data items as were acquired in step S101.

The personal computer then creates and outputs data representing the waveform signal of the synthesized speech based on the plurality of pitch-marked data items (step S106). The data output in step S106 represents the synthesized speech waveform obtained as follows: for two waveform signals to be connected, represented by the data generated in steps S102 to S105, one point is found in the portion of the preceding waveform signal as close as possible to its last pitch mark, and one point in the portion of the following waveform signal as close as possible to its first pitch mark, such that the instantaneous values and tangent slopes at the two points approximately match; the waveform signals are then connected so that these two points coincide.
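
A sketch of the search in step S106, assuming the tangent slope is estimated by a central difference and the two criteria are combined into a single mismatch score; `best_join`, `join`, `window` and `slope_weight` are hypothetical, and the weighting is one possible reading of "approximately match", not a rule from this specification. Ties in the score are broken in favour of the pair nearest the pitch marks.

```python
from typing import List, Sequence, Tuple


def slope(x: Sequence[float], n: int) -> float:
    """Central-difference estimate of the tangent slope at sample n (one-sided at the ends)."""
    lo, hi = max(n - 1, 0), min(n + 1, len(x) - 1)
    if hi == lo:
        return 0.0  # single-sample signal: no slope information
    return (x[hi] - x[lo]) / (hi - lo)


def best_join(prev: Sequence[float], nxt: Sequence[float], prev_mark: int,
              next_mark: int, window: int, slope_weight: float = 1.0) -> Tuple[int, int]:
    """Search +/-window samples around both marks for the pair (i, j) minimising
    |prev[i] - nxt[j]| + slope_weight * |slope(prev, i) - slope(nxt, j)|."""
    best = None
    for i in range(max(0, prev_mark - window), min(len(prev), prev_mark + window + 1)):
        for j in range(max(0, next_mark - window), min(len(nxt), next_mark + window + 1)):
            mismatch = abs(prev[i] - nxt[j]) + slope_weight * abs(slope(prev, i) - slope(nxt, j))
            dist = abs(i - prev_mark) + abs(j - next_mark)
            if best is None or (mismatch, dist) < best[0]:
                best = ((mismatch, dist), i, j)
    if best is None:
        raise ValueError("empty search window")
    return best[1], best[2]


def join(prev: Sequence[float], nxt: Sequence[float], i: int, j: int) -> List[float]:
    """Make prev[i] and nxt[j] coincide: keep prev before i, then nxt from j onward."""
    return list(prev[:i]) + list(nxt[j:])
```

With `slope_weight=0.0` only instantaneous values are compared, so a point on a falling slope can be joined to one on a rising slope, which the slope term is meant to avoid.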

The program that causes a personal computer to perform the functions of the speech synthesizer described above may, for example, be uploaded to a bulletin board system (BBS) on a communication line and distributed via that line. Alternatively, a carrier wave may be modulated by a signal representing the program and the resulting modulated wave transmitted, and a device receiving the modulated wave may demodulate it to restore the program. The processing described above can then be executed by starting the program and running it under the control of the OS in the same way as other application programs.

When the OS takes on part of the processing, or when the OS constitutes part of a component of the present invention, the recording medium may store the program excluding that part. Even in this case, in the present invention, the recording medium is regarded as storing a program for executing each function or step performed by the computer.

Brief description of drawings

FIG. 1 is a diagram showing a speech synthesizer according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing a waveform signal to which pitch marks have been added.
FIG. 3 is a diagram explaining how waveform signals are connected.
FIG. 4 is a flowchart showing the processing executed by a personal computer that performs the functions of the speech synthesizer according to the embodiment of the present invention.
FIG. 5 is a diagram explaining how waveform signals are connected at inappropriate points.

Explanation of symbols

1 Cepstrum analysis unit
2 Variable BPF
3 Zero-cross detection unit
4 Delay unit
5 Pitch mark adding unit
6 Waveform connection processing unit
IN Input terminal
OUT Output terminal

Claims

A waveform connecting device comprising:
pitch component frequency specifying means for acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and for specifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
variable filter means for changing, for each of the first and second waveform signals, its own passband characteristic so that components other than those near the frequency of the pitch component specified by the pitch component frequency specifying means are blocked, and for filtering that waveform signal to extract the pitch component;
phase detection means for specifying, for each of the first and second waveform signals, the timing at which the pitch component extracted by the variable filter means reaches a predetermined phase; and
connecting means for determining the end point of the first waveform signal and the start point of the second waveform signal by delimiting the first and second waveform signals near the portions corresponding to the timings detected by the phase detection means, and for connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal,
wherein the connecting means delimits the first and second waveform signals such that the instantaneous values at the end point of the first waveform signal and at the start point of the second waveform signal are substantially equal to each other.

A waveform connecting device comprising:
pitch component frequency specifying means for acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and for specifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
variable filter means for changing, for each of the first and second waveform signals, its own passband characteristic to that of a band-pass filter whose center frequency substantially coincides with the frequency of the pitch component specified by the pitch component frequency specifying means, and for filtering that waveform signal to extract the pitch component;
phase detection means for specifying, for each of the first and second waveform signals, the timing at which the pitch component extracted by the variable filter means reaches a predetermined phase; and
connecting means for determining the end point of the first waveform signal and the start point of the second waveform signal by delimiting the first and second waveform signals at the portions corresponding to the timings detected by the phase detection means, and for connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal.

The waveform connecting device according to claim 1 or 2, wherein the pitch component frequency specifying means specifies, for each of the first and second waveform signals, the frequency at which the cepstrum of the waveform signal takes its maximum value as the frequency of the pitch component of that waveform signal.

The waveform connecting device according to claim 1, 2 or 3, wherein the phase detection means specifies, for each of the first and second waveform signals, the timing at which the value of the pitch component extracted by the variable filter means crosses zero with a change of sign in a predetermined direction.

A waveform connection method comprising:
a pitch component frequency specifying step of acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and specifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
a variable filtering step of extracting, for each of the first and second waveform signals, the pitch component by filtering with a passband characteristic that blocks components other than those near the frequency of the pitch component specified in the pitch component frequency specifying step;
a phase detection step of specifying, for each of the first and second waveform signals, the timing at which the pitch component extracted in the variable filtering step reaches a phase of a predetermined value; and
a connecting step of determining the end point of the first waveform signal and the start point of the second waveform signal by delimiting the first and second waveform signals near the portions corresponding to the timings detected in the phase detection step, and connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal,
wherein, in the connecting step, the first and second waveform signals are delimited such that the instantaneous values at the end point of the first waveform signal and at the start point of the second waveform signal are substantially equal to each other.

A program for causing a computer to function as:
pitch component frequency specifying means for acquiring a first waveform signal representing a speech waveform and a second waveform signal that is to follow the first waveform signal, and for specifying, for each acquired waveform signal, the frequency of the pitch component of the speech represented by that waveform signal;
variable filter means for changing, for each of the first and second waveform signals, its own passband characteristic so that components other than those near the frequency of the pitch component specified by the pitch component frequency specifying means are blocked, and for filtering that waveform signal to extract the pitch component;
phase detection means for specifying, for each of the first and second waveform signals, the timing at which the pitch component extracted by the variable filter means reaches a predetermined phase; and
connecting means for determining the end point of the first waveform signal and the start point of the second waveform signal by delimiting the first and second waveform signals near the portions corresponding to the timings detected by the phase detection means, and for connecting the first and second waveform signals to each other at the end point of the first waveform signal and the start point of the second waveform signal,
wherein the connecting means delimits the first and second waveform signals such that the instantaneous values at the end point of the first waveform signal and at the start point of the second waveform signal are substantially equal to each other.