JPH07129196A

JPH07129196A - Sound waveform segmenting device, sound waveform shaping device, and sound synthesizing device

Info

Publication number: JPH07129196A
Application number: JP5278266A
Authority: JP
Inventors: Toshimitsu Minowa; 利光蓑輪
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1993-11-08
Filing date: 1993-11-08
Publication date: 1995-05-19

Abstract

PURPOSE: To provide a speech synthesizing device that reduces the labor of segmenting pitch waveforms in speech synthesis and suppresses noise at the end of each segmented pitch waveform. CONSTITUTION: Within a certain time before and after the pitch period estimated by a pitch period estimator 12, the zero-cross point immediately before the speech waveform exceeds the envelope created by a pitch waveform envelope generator 13 is detected, and then the next such zero-cross point is detected. A pitch segmenter 15 cuts out the speech waveform, treating this zero-cross interval as one pitch of the speech waveform. A waveform shaper 16 sets the window required for any given pitch period, a waveform connector 17 joins the shaped pitch waveforms, and the resulting synthesized speech is output from an output section 18.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech waveform segmenting device, a speech waveform shaping device, and a speech synthesizing device used for automatic guide broadcasting and similar applications.

[0002]

[Prior Art] In recent years, when the speech output by an automatic guide broadcasting apparatus has to be changed, the same announcer re-records it. This takes considerable labor and cost, and a reduction of both is strongly demanded. For this reason, when only the pitch period changes and the sequence of spoken characters does not, as in a change of accent, the accent has been changed by cutting out one-pitch waveforms of the speech and either truncating them on the time axis before connecting them or, conversely, appending silence to the end of each one-pitch waveform before connecting the pitch waveforms.

[0003] A conventional speech synthesizing device is described below. FIG. 4 shows the configuration of a conventional speech synthesizing device. In FIG. 4, 41 is a speech analyzer that performs linear prediction analysis of the input speech, 42 is an approximate pitch period estimator that estimates the approximate pitch period by a method such as the modified autocorrelation method, 43 is a segmenter for one-pitch waveforms, 44 is a pitch shortener, 45 is a pitch expander, 46 is a pitch connector, and 47 is an output section for the synthesized waveform.

[0004] The operation of the speech synthesizing device configured as above is described below. First, the digitized speech signal is analyzed by the speech analyzer 41, and based on the result the approximate pitch period estimator 42 estimates the approximate pitch period. Next, referring to the approximate pitch period thus obtained, an operator uses the segmenter 43 to cut out, by visual inspection of the speech waveform, a one-pitch waveform corresponding to the approximate pitch period. To raise the pitch, the shortener 44 discards the latter part of the one-pitch waveform so that it fits the target pitch period, and the connector 46 connects the pitch waveforms. Conversely, to lower the pitch, the expander 45 appends silence to the end of the cut-out one-pitch waveform so that it fills the target pitch period, and the connector 46 then connects the pitch waveforms. The conventional speech synthesizing device changed the pitch period through these operations.
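The conventional shortening and expansion steps can be pictured with a minimal Python sketch. The function names `change_pitch_period` and `connect`, the use of sample counts as the unit of pitch period, and the sample values are illustrative assumptions and do not come from the patent.

```python
def change_pitch_period(pitch_waveform, target_len):
    """Conventional method: truncate the tail or pad it with silence.

    pitch_waveform: samples of one cut-out pitch waveform.
    target_len: desired pitch period in samples (hypothetical unit choice).
    """
    if target_len <= len(pitch_waveform):
        # Raising the pitch: discard the latter part (role of shortener 44).
        return list(pitch_waveform[:target_len])
    # Lowering the pitch: append silence (role of expander 45).
    return list(pitch_waveform) + [0.0] * (target_len - len(pitch_waveform))


def connect(pitch_waveforms):
    """Join pitch waveforms end to end (role of connector 46)."""
    out = []
    for waveform in pitch_waveforms:
        out.extend(waveform)
    return out


one_pitch = [0.0, 0.8, 0.5, -0.3, -0.6, -0.2, 0.1, 0.05]  # made-up samples
shorter = change_pitch_period(one_pitch, 6)    # higher pitch
longer = change_pitch_period(one_pitch, 10)    # lower pitch
joined = connect([shorter, longer])
```

Note that `shorter` ends on the sample value -0.2 rather than 0, which is the kind of abrupt end the rectangular cut produces.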

[0005]

[Problems to Be Solved by the Invention] However, in the conventional speech synthesizing device described above, cutting out pitch waveforms requires skill, so the labor of speech synthesis is not sufficiently reduced. In addition, because the one-pitch waveform is cut out with a rectangular window, an abnormal sound may be generated at the end of the cut-out one-pitch waveform.

[0006] The present invention solves these conventional problems. Its object is to provide a speech waveform segmenting device that automates the cutting out of pitch waveforms, a speech waveform shaping device that shapes a cut-out one-pitch waveform, and an excellent speech synthesizing device that automates pitch waveform segmentation and prevents the generation of abnormal sounds.

[0007]

[Means for Solving the Problems] To achieve the above object, the speech waveform segmenting device of the present invention comprises means for creating an envelope corresponding to the pitch waveform of speech, and detecting means for detecting the zero-cross point of the speech waveform immediately before the speech waveform exceeds this envelope, and outputs the speech waveform for each interval between the zero-cross points thus detected.

[0008] The device may further comprise estimating means for estimating the approximate pitch period of the speech, and output, as a one-pitch speech waveform, the interval between a first zero-cross point detected by the zero-cross point detecting means and a second zero-cross point detected within a certain time before and after the estimated approximate pitch period, measured from the detection of the first zero-cross point.

[0009] Alternatively, the device may comprise estimating means for estimating the approximate pitch period of the speech, and comparing means for comparing the interval between the zero-cross points of the speech waveform detected by the zero-cross point detecting means with the approximate pitch period, and output the interval as a one-pitch waveform when the comparison result is at or below a certain value.

[0010] The speech waveform shaping device of the present invention comprises output means for outputting a speech waveform, and window creating means for creating a window that shapes the speech waveform according to an arbitrary pitch period. The speech waveform output from the output means is shaped by the window created by the window creating means.

[0011] Further, the speech synthesizing device of the present invention comprises estimating means for estimating the approximate pitch period of speech, means for creating an envelope corresponding to the pitch waveform of the speech, detecting means for detecting the zero-cross point of the speech waveform immediately before the speech waveform exceeds this envelope, window creating means for creating a window that shapes the speech waveform according to an arbitrary pitch period, and connecting means for connecting pitch waveforms. The interval between a first zero-cross point detected by the detecting means and a second zero-cross point detected within a certain time before and after the estimated approximate pitch period, measured from the detection of the first zero-cross point, is output as a one-pitch speech waveform. This output speech waveform is shaped by the window created by the window creating means, and the shaped pitch waveforms are connected by the connecting means, so that speech is synthesized at an arbitrary pitch period.

[0012] Alternatively, the speech synthesizing device comprises estimating means for estimating the approximate pitch period of speech, means for creating an envelope corresponding to the pitch waveform of the speech, detecting means for detecting the zero-cross point of the speech waveform immediately before the speech waveform exceeds this envelope, comparing means for comparing the interval between the zero-cross points detected by the detecting means with the approximate pitch period estimated by the estimating means, window creating means for creating a window that shapes the speech waveform according to an arbitrary pitch period, and connecting means for connecting pitch waveforms. When the comparison result of the comparing means is at or below a certain value, the output means outputs each interval between the zero-cross points detected by the detecting means as a one-pitch speech waveform. This output speech waveform is shaped by the window created by the window creating means, and the shaped pitch waveforms are connected by the connecting means, so that speech is synthesized at an arbitrary pitch period.

[0013]

[Operation] With this configuration, the zero-cross point of each one-pitch waveform of the speech waveform is detected, the one-pitch waveform can be cut out automatically on that basis, and the end of the pitch waveform can be brought smoothly to 0 for any pitch period.

[0014]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0015] In FIG. 1, 11 is a speech analyzer and 12 is an approximate pitch period estimator; they have the same functions as the speech analyzer 41 and the approximate pitch period estimator 42 of the prior art. 13 is a pitch waveform envelope generator composed of a counter 21, a memory 22, an envelope creating section 23, and a switch G2. 14 is a comparator that compares the waveform envelope with the speech, 15 is a pitch segmenter that cuts out one pitch of the speech, 16 is a waveform shaper for the one-pitch waveform thus obtained, 17 is a connector that joins the shaped pitch waveforms, and 18 is an output section for the synthesized waveform.

[0016] The operation is described with reference to FIG. 1, FIG. 2, FIG. 3(a), and FIG. 3(b). First, the method of automatically cutting out a one-pitch waveform is described. As in the conventional device, a digitized speech signal is input, and the speech analyzer 11 and the approximate pitch period estimator 12 estimate the approximate pitch period f. The estimated approximate pitch period f is updated every 5 pitches. Meanwhile, the pitch waveform envelope generator 13 detects the maximum amplitude and creates the envelope of the pitch waveform by the method described below. FIG. 2 shows the input signals to the comparator 14 and the ON-OFF timing of switches G1 and G2.

[0017] In FIG. 2, when the power is turned on at time T1, the maximum amplitude has its initial value of 0, and until speech arrives at time T2 both the maximum amplitude and the envelope input to the comparator 14 remain 0. When speech input exceeding the envelope (= 0) arrives at time T2, the comparator 14 outputs a signal that turns switch G1 on momentarily (for several μs) and a signal that resets the counter 21 in the pitch waveform envelope generator 13. When the counter 21 is reset, it returns to its initial value of 0. Switch G2 is also turned on. As a result of these operations, the maximum amplitude of the speech signal appearing after time T2 is written into the memory 22, and at the same time that amplitude value is input to the envelope creating section 23, which creates the envelope. The creation of the envelope is described later.

[0018] At the same time as the counter 21 is reset, the approximate pitch period f estimated by the approximate pitch period estimator 12 is loaded into the counter 21 and counting starts. To eliminate false detections, such as the speech signal momentarily exceeding the envelope because of unwanted input such as noise, the comparator 14 needs to compare the envelope with the speech signal only during a certain period before and after the approximate pitch period f. A fixed value, for example 5 ms, is therefore set in advance, and the counter 21 outputs a signal that turns switch G2 on while its value is within the approximate pitch period f ± 5 ms.
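The gating condition of counter 21 amounts to a simple interval test. The sketch below is an illustration under assumptions: the function name `gate_g2_open` and the choice of milliseconds as the unit are hypothetical, while the 5 ms tolerance is the example value given in the text.

```python
def gate_g2_open(elapsed_ms, approx_period_ms, tolerance_ms=5.0):
    """Return True while switch G2 would be on.

    elapsed_ms: time counted since the counter was last reset.
    approx_period_ms: approximate pitch period f from the estimator.
    tolerance_ms: the preset fixed value (5 ms in the text's example).
    """
    return abs(elapsed_ms - approx_period_ms) <= tolerance_ms


# With f = 10 ms the comparison is enabled only from 5 ms to 15 ms
# after the previous detection.
early_noise_accepted = gate_g2_open(3.0, 10.0)     # too early: gate closed
next_pitch_accepted = gate_g2_open(9.6, 10.0)      # near f: gate open
```

A crossing that arrives 3 ms after the previous detection is therefore ignored, while one arriving 9.6 ms after it is accepted.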

[0019] When the speech signal exceeds the envelope (time T3), switch G1 is turned on momentarily and the speech signal is input to the memory 22. The maximum amplitude during that time is written into the memory 22, and at the same time that maximum amplitude value is input to the envelope creating section 23. A new envelope is created on this basis, and the counter 21 is reset. Conversely, if speech exceeding the envelope is input to the comparator 14 while the value of the counter 21 has not yet come within the fixed value (5 ms) of the approximate pitch period f (time T4), switch G2 is off, as described above, even though switch G1 is turned on, so the envelope simply continues to decay. False detections, such as the speech signal momentarily exceeding the envelope because of unwanted input such as noise, are thereby eliminated.

[0020] As described above, the amplitude of the speech waveform that first exceeds the envelope within the fixed time before and after the approximate period is detected as the maximum amplitude, and the envelope is created from this maximum amplitude.
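The detection loop described so far can be sketched in software as follows. This is a simplified illustration, not the circuit of FIG. 1: the function name `detect_pitch_marks`, the sample-index bookkeeping, the synthetic test signal, and the choice to take the sample value at the crossing as the maximum amplitude are all assumptions. The decaying envelope uses the exponential form and the roughly 50 Hz bandwidth stated in the description, and the first detection is left ungated, as at time T2.

```python
import math


def detect_pitch_marks(samples, sample_rate_hz, approx_period_s,
                       tolerance_s=0.005, bandwidth_hz=50.0):
    """Return sample indices where the signal exceeds the decaying envelope."""
    marks = []
    peak = 0.0        # stands in for memory 22; 0 at power-on
    last_mark = None  # stands in for the reset time of counter 21
    for n, x in enumerate(samples):
        if last_mark is None:
            env, gate = 0.0, True  # assumption: first detection is ungated
        else:
            t = (n - last_mark) / sample_rate_hz
            env = peak * math.exp(-math.pi * bandwidth_hz * t)
            # Switch G2: compare only within f +/- tolerance.
            gate = abs(t - approx_period_s) <= tolerance_s
        if gate and x > env:
            marks.append(n)
            peak = x       # simplification: amplitude at the crossing
            last_mark = n  # counter reset
    return marks


# Synthetic test signal (made up): a damped sinusoid restarted every 10 ms.
fs = 8000
period = 80  # samples
signal = [math.exp(-(n % period) / 20.0)
          * math.sin(2.0 * math.pi * (n % period) / 16.0)
          for n in range(3 * period)]
marks = detect_pitch_marks(signal, fs, approx_period_s=0.010)
```

In this test signal the second oscillation of each damped burst rises above the envelope about 2.4 ms after a detection, but the gate is closed at that point, so only one mark per 80-sample period is produced.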

[0021] The waveform envelope E(t) at time t, measured from the point of maximum amplitude of the speech waveform, is calculated by the following equation.

[0022] E(t) = A exp(−πBt)

Here, A is the maximum amplitude value and B is the average bandwidth of the first formant, about 50 Hz.
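As a numerical illustration of this envelope, the short Python sketch below evaluates E(t) directly. The function name `envelope` and the example amplitude of 1.0 are assumptions; the formula and the 50 Hz default come from the text.

```python
import math


def envelope(t_seconds, max_amplitude, bandwidth_hz=50.0):
    """E(t) = A * exp(-pi * B * t), with t measured from the amplitude peak."""
    return max_amplitude * math.exp(-math.pi * bandwidth_hz * t_seconds)


# With A = 1.0 and B = 50 Hz, the exponent after 10 ms is -pi/2,
# so the envelope has decayed to roughly a fifth of the peak.
e_start = envelope(0.0, 1.0)
e_10ms = envelope(0.010, 1.0)
```

With a pitch period near 10 ms, the start of the next pitch waveform therefore only has to exceed about 21% of the previous peak to be detected.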

[0023] In FIG. 3(a), 31, 32, and 33 are the maximum amplitudes detected in this way, and 34, 35, and 36 are the pitch waveform envelopes created from the above equation with the maximum amplitudes 31, 32, and 33, respectively, as their starting points.

[0024] When the speech signal exceeds the pitch waveform envelope created in this way during the fixed period before and after the approximate pitch period f, a segmentation signal is output to the pitch segmenter 15. The pitch segmenter 15 can buffer about 10 ms of the input speech signal. When the segmentation signal arrives, the pitch segmenter traces the buffered input signal back in time from the moment the signal arrived and detects the first point at which the speech signal takes a negative value (37, 38, 39 in FIG. 3(a)). This point is called the zero-cross point. Once the zero-cross points have been detected in this way, each interval between them (37-38, 38-39) is treated as one pitch, and the pitch segmenter 15 cuts the input speech at these points, so that the pitch waveforms are cut out automatically.
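The backward search for the zero-cross point and the subsequent cutting can be sketched as follows. The function names, the sample values, and the trigger indices are made up for illustration, and the roughly 10 ms buffer limit of the pitch segmenter is not modeled.

```python
def find_zero_cross_before(samples, trigger_index):
    """Walk back in time from the trigger and return the index of the first
    sample with a negative value, or None if no such sample is found."""
    for i in range(trigger_index, -1, -1):
        if samples[i] < 0:
            return i
    return None


def cut_pitch_waveforms(samples, zero_cross_indices):
    """Cut the signal into one-pitch waveforms between successive zero-cross points."""
    return [samples[a:b]
            for a, b in zip(zero_cross_indices, zero_cross_indices[1:])]


# Made-up samples; the envelope is assumed to be exceeded at indices 2, 7, 12.
speech = [-0.2, 0.1, 0.9, 0.3, -0.4, -0.1, 0.2, 0.8, 0.2, -0.3, -0.1, 0.3, 0.7]
triggers = [2, 7, 12]
zero_crosses = [find_zero_cross_before(speech, t) for t in triggers]
pitch_waveforms = cut_pitch_waveforms(speech, zero_crosses)
```

Each cut-out waveform here starts at the last negative sample before the rise that exceeded the envelope, which corresponds to points 37, 38, and 39 in FIG. 3(a).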

[0025] To eliminate false detections, it is also possible to cut out the pitch waveforms automatically without using the method described above, in which switch G2 is turned on and the envelope is created only within the approximate pitch period f ± 5 ms. Instead, the intervals between the detected zero-cross points (37-38, 38-39) are measured, each interval is compared directly with the approximate pitch period estimated by the approximate period estimator 12, and if the difference is at or below a certain value, for example within about ±5 ms, the points are treated as zero-cross points delimiting one pitch.
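A minimal sketch of this alternative check is given below. It assumes that adjacent zero-cross times are compared pairwise and that times are expressed in milliseconds; the function name `select_pitch_intervals` and the example times are hypothetical.

```python
def select_pitch_intervals(zero_cross_times_ms, approx_period_ms, tolerance_ms=5.0):
    """Keep only the zero-cross intervals whose length is within the tolerance
    of the approximate pitch period."""
    kept = []
    for start, end in zip(zero_cross_times_ms, zero_cross_times_ms[1:]):
        if abs((end - start) - approx_period_ms) <= tolerance_ms:
            kept.append((start, end))
    return kept


# Made-up zero-cross times; the last interval (2 ms) is far shorter than
# the 10 ms approximate period and is rejected.
times_ms = [0.0, 9.5, 20.0, 22.0]
accepted = select_pitch_intervals(times_ms, approx_period_ms=10.0)
```

How a rejected point should be handled afterwards (for example, whether it is dropped before the next interval is measured) is not specified in the text and is left out of this sketch.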

[0026] Next, the shaping of the speech waveform cut out in this way is described. The waveform shaper 16 applies a window to the one-pitch speech waveform cut out by the pitch segmenter 15 so that the end of the speech waveform becomes 0. This window is created by the following equation.

[0027]

[Equation 1]

[0028] Here, T takes the following values depending on whether the pitch is to be raised or lowered.

[0029] When the pitch is to be raised,

[0030]

[Equation 2]

[0031] holds. When the pitch is to be lowered,

[0032]

[Equation 3]

[0033] holds. Such a window is shown at 51 in FIG. 3(b), and an example of a waveform shaped by this window is shown at 52.

[0034] The end of a waveform shaped in this way goes smoothly to 0. By connecting these pitch waveforms with the connector 17 and outputting the result from the output section 18, speech can be synthesized for an arbitrary pitch period without abnormal sounds at the ends of the pitch waveforms.
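The effect of shaping before connection can be illustrated with a stand-in window. The window equations of the patent are not reproduced in this text, so the half-cosine taper below is an assumption chosen only because it reaches 0 at the end of the kept samples; it is not the patent's window, and the function name `taper_to_zero` and the sample values are likewise hypothetical.

```python
import math


def taper_to_zero(waveform, target_len):
    """Shape one pitch waveform to target_len samples with a stand-in taper.

    The first half of the kept samples passes unchanged; the second half is
    faded with a half-cosine that reaches 0 at the last kept sample.
    Any remaining length is filled with silence.
    """
    n = min(len(waveform), target_len)  # samples kept from the segment
    fade_start = n // 2
    span = n - 1 - fade_start
    shaped = []
    for i in range(n):
        if i < fade_start:
            w = 1.0
        elif span == 0:
            w = 0.0
        else:
            w = 0.5 * (1.0 + math.cos(math.pi * (i - fade_start) / span))
        shaped.append(waveform[i] * w)
    shaped.extend([0.0] * (target_len - n))
    return shaped


one_pitch = [0.0, 0.8, 0.5, -0.3, -0.6, -0.2, 0.1, 0.05]  # made-up samples
raised = taper_to_zero(one_pitch, 6)     # shorter period, higher pitch
lowered = taper_to_zero(one_pitch, 10)   # longer period, lower pitch
synthesized = raised + lowered           # role of connector 17
```

Unlike plain truncation, the shortened waveform `raised` ends at 0, so joining it to the next pitch waveform introduces no step at the boundary.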

[0035] As described above, according to this embodiment the pitch waveforms are cut out automatically, which reduces the labor of speech synthesis. In addition, because the pitch period is changed using waveform-shaped pitch waveforms, abnormal sounds at the ends of the cut-out pitch waveforms are prevented.

[0036]

[Effects of the Invention] As described above, the present invention provides a speech pitch waveform segmenting device that automates the cutting out of pitch waveforms by using the detected zero-cross points and by comparison with the approximate pitch period, together with a device that shapes each one-pitch waveform output from the pitch waveform segmenting device according to the pitch period to be produced and connects the shaped pitch waveforms to synthesize speech of an arbitrary pitch period. This realizes an excellent speech synthesizing device that prevents abnormal sounds at the ends of the cut-out pitch waveforms.

[Brief description of drawings]

FIG. 1 is a block diagram of a speech synthesizing device according to an embodiment of the present invention.

FIG. 2 is a waveform diagram showing the input signals to the comparator 14 and the ON-OFF timing of switches G1 and G2.

FIG. 3 shows (a) a speech waveform and (b) a shaped speech waveform.

FIG. 4 is a block diagram of a conventional speech synthesizing device.

[Explanation of symbols]

11 Speech analyzer
12 Approximate pitch period estimator
13 Pitch waveform envelope generator
14 Comparator
15 Pitch segmenter
16 Waveform shaper
17 Connector
18 Output section
21 Counter
22 Memory
23 Envelope creating section

Claims

[Claims]

1. A speech waveform segmenting device comprising: envelope creating means for creating an envelope corresponding to a speech waveform; detecting means for detecting the zero-cross point of the speech waveform immediately before the speech waveform exceeds the envelope; and output means for outputting the interval between the zero-cross points thus detected as a one-pitch speech waveform.

2. The speech waveform segmenting device according to claim 1, further comprising estimating means for estimating an approximate pitch period of speech, wherein the interval between a first zero-cross point detected by the zero-cross point detecting means and a second zero-cross point detected within a certain time before and after the estimated approximate pitch period, measured from the detection of the first zero-cross point, is output as a one-pitch speech waveform.

3. The speech waveform segmenting device according to claim 1, further comprising: estimating means for estimating an approximate pitch period of speech; and comparing means for comparing the interval between the zero-cross points of the speech waveform detected by the zero-cross point detecting means with the approximate pitch period estimated by the estimating means, wherein the zero-cross point interval is output as a one-pitch speech waveform when the comparison result is at or below a certain value.

4. A speech waveform shaping device comprising: output means for outputting a speech waveform; and window creating means for creating a window that shapes the speech waveform according to an arbitrary pitch period, wherein the speech waveform output from the output means is shaped by the window created by the window creating means.

5. A speech synthesizing device comprising: estimating means for estimating an approximate pitch period of speech; envelope creating means for creating an envelope corresponding to a speech waveform; detecting means for detecting the zero-cross point of the speech waveform immediately before the speech waveform exceeds the envelope; window creating means for creating a window that shapes the speech waveform according to an arbitrary pitch period; and connecting means for connecting pitch waveforms, wherein the interval between a first zero-cross point detected by the detecting means and a second zero-cross point detected within a certain time before and after the estimated approximate pitch period, measured from the detection of the first zero-cross point, is output as a one-pitch speech waveform, the output speech waveform is shaped by the window created by the window creating means, and the shaped pitch waveforms are connected by the connecting means to synthesize speech of an arbitrary pitch period.

6. A speech synthesizing device comprising: estimating means for estimating an approximate pitch period of speech; envelope creating means for creating an envelope corresponding to a speech waveform; detecting means for detecting the zero-cross point of the speech waveform immediately before the speech waveform exceeds the envelope; comparing means for comparing the interval between the zero-cross points of the speech waveform detected by the detecting means with the approximate pitch period estimated by the estimating means; window creating means for creating a window that shapes the speech waveform according to an arbitrary pitch period; and connecting means for connecting pitch waveforms, wherein, when the comparison result of the comparing means is at or below a certain value, each interval between the zero-cross points detected by the detecting means is cut out by segmenting means as a one-pitch speech waveform, the cut-out speech waveform is shaped by the window created by the window creating means, and the shaped pitch waveforms are connected by the connecting means to synthesize speech of an arbitrary pitch period.