JP2002366177A

JP2002366177A - Node extracting device for natural voice

Info

Publication number: JP2002366177A
Application number: JP2001169140A
Authority: JP
Inventors: Kazufumi Serio; 一史芹生
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-06-05
Filing date: 2001-06-05
Publication date: 2002-12-20
Anticipated expiration: 2021-06-05
Also published as: JP4639532B2

Abstract

PROBLEM TO BE SOLVED: To provide a node extraction device for natural speech that stably and efficiently extracts the nodes needed to generate a pitch pattern by spline approximation. SOLUTION: A pattern extraction unit 1 extracts a fundamental frequency pattern from input natural speech 102. An input unit 2 receives language information 101. A pattern division unit 3 divides the fundamental frequency pattern into accent phrases according to the language information 101. An unvoiced control unit 4 corrects unvoiced sections of the accent phrase pattern curve into a smooth curve by interpolation. A differential operation unit 5 computes the first derivative curve and second derivative curve of the accent phrase pattern curve. A node extraction unit 6 extracts nodes from the accent phrase pattern curve according to the first derivative curve, the second derivative curve, and the language information 101. A node information output unit 7 outputs the result to the outside as final nodes 103.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a node extraction device for a speech synthesizer and, more particularly, to a natural speech node extraction device that extracts the nodes needed to generate a pitch pattern by spline approximation.

[0002]

[Prior art] Recent speech synthesizers synthesize speech by the rule-based synthesis method. In this method, nodes are given to the rule-based synthesis engine as parameters to generate a pitch pattern, which represents how the fundamental frequency of the speech changes over time. The pitch pattern is used as one of the rules for synthesizing speech.

[0003] A speech synthesizer stores in advance the nodes that a node extraction device has extracted from natural speech, performs spline approximation based on those nodes, and generates a pitch pattern. Spline approximation connects discrete points called nodes in order and uses a spline function to approximate them with a curve that is smooth as a whole.
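The spline step described above can be illustrated with a minimal sketch. The source does not say which spline function the synthesizer uses, so a natural cubic spline (zero second derivative at both ends) is assumed here; the function name `natural_cubic_spline` and the sample node values are hypothetical.

```python
from bisect import bisect_right

def natural_cubic_spline(ts, fs):
    """Return a function evaluating a natural cubic spline through the
    nodes (ts[i], fs[i]). Assumes ts is strictly increasing, len(ts) >= 2."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n]; the first and
    # last rows encode the natural end conditions m[0] = m[n] = 0.
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        lower[i] = h[i - 1]
        diag[i] = 2.0 * (h[i - 1] + h[i])
        upper[i] = h[i]
        rhs[i] = 6.0 * ((fs[i + 1] - fs[i]) / h[i]
                        - (fs[i] - fs[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * (n + 1)
    m[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        m[i] = (rhs[i] - upper[i] * m[i + 1]) / diag[i]

    def evaluate(t):
        # Locate the interval [ts[i], ts[i+1]] containing t (clamped).
        i = min(max(bisect_right(ts, t) - 1, 0), n - 1)
        a = ts[i + 1] - t
        b = t - ts[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (fs[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (fs[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

# Hypothetical nodes: (time in seconds, fundamental frequency in Hz).
pitch = natural_cubic_spline([0.0, 0.1, 0.3, 0.6], [120.0, 180.0, 150.0, 100.0])
```

Evaluating `pitch(t)` on a fine time grid would give one possible pitch pattern through the stored nodes; a real synthesizer may use a different spline family or a smoothing fit.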

[0004] A rule-based speech synthesizer uses the generated pitch pattern as one of its synthesis rules and directly synthesizes continuous speech of arbitrary vocabulary from phonetic symbols or characters that are input separately. It is important that the node extraction device can extract nodes without being affected by conditions such as the speaker's gender or speech rate, and several proposals have been made.

[0005] IEICE Technical Report SP2000-29 describes a node extraction method used in a natural speech node extraction device (IEICE Technical Report, Institute of Electronics, Information and Communication Engineers, July 2000, p. 20; authors: Hiroyoshi Morikawa, Naohiro Tsuboi, and Yuichiro Yanagi; title: "Modeling and Analysis of Speech Pitch Patterns Using Smoothing Spline Functions"). In this node extraction method (node selection method), the fundamental frequency is extracted from natural speech, the extracted values are plotted against time to obtain a set of data points, and nodes are selected from the data points using the two methods described below.

[0006] FIG. 6 is a flowchart of the first node selection method. The start point and end point of the data points are taken as two nodes, the span from start to end is divided equally at a time interval dt, and the data point closest to each division point is extracted as a node candidate (step S81). The slope between adjacent node candidates is computed, and a candidate is deleted if the magnitude of the slope is smaller than a threshold TH2 (step S82).

[0007] A smoothing spline function is computed from the nodes and node candidates, and its error against the fundamental frequency pattern curve is calculated (step S83). Here, the fundamental frequency pattern curve is the set of data points treated as a curve. If the error from step S83 is smaller than a threshold TH3, the node candidate with the smallest slope magnitude is deleted and processing repeats from step S83 (step S86). If the error from step S83 is larger than the threshold TH3, the node candidates that finally remain are adopted as nodes (step S85).

[0008] FIG. 7 is a flowchart of the second node selection method. The start point and end point of the data points are taken as two nodes, the two nodes are connected by a straight line, and the data point farthest from the line is taken as a node candidate (step S91). A smoothing spline function is computed from the nodes and node candidates, and its error against the fundamental frequency pattern curve is calculated (step S92).

[0009] If the error from step S92 is larger than the threshold TH3, the data point farthest from the spline function is added as a new node candidate and processing repeats from step S92 (step S95). If the error from step S92 is smaller than the threshold TH3, the node candidates that finally remain are adopted as nodes (step S94).
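The loop structure of this second prior-art method (steps S91 to S95) can be sketched as follows. This is only an illustration under stated assumptions: a piecewise-linear interpolant stands in for the smoothing spline of the cited report, the "error" is taken to be the maximum absolute deviation, and the names `interpolate`, `select_nodes`, and `th3` are hypothetical.

```python
def interpolate(nodes, t):
    """Piecewise-linear stand-in for the smoothing spline (assumption).
    nodes is a time-sorted list of (time, frequency) pairs."""
    for (t0, f0), (t1, f1) in zip(nodes, nodes[1:]):
        if t0 <= t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the node range")

def select_nodes(points, th3):
    """Greedy node selection: start from the two end points and keep adding
    the worst-fitting data point until the maximum error is within th3.
    Assumes th3 >= 0 and strictly increasing times."""
    nodes = [points[0], points[-1]]
    while True:
        # Data point farthest from the current fit.
        worst = max(points, key=lambda p: abs(p[1] - interpolate(nodes, p[0])))
        error = abs(worst[1] - interpolate(nodes, worst[0]))
        if error <= th3:
            return nodes
        nodes = sorted(nodes + [worst])

# Hypothetical data points: (frame time, fundamental frequency in Hz).
points = [(0, 100.0), (1, 140.0), (2, 180.0), (3, 150.0), (4, 120.0)]
print(select_nodes(points, th3=5.0))
```

The result depends directly on `th3`, which is the dependence on an empirically tuned threshold that the following paragraphs identify as the weakness of this approach.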

[0010]

[Problem to be solved by the invention] In the conventional natural speech node extraction device described above, the first node selection method divides the data at a time interval dt that depends on speech rate, so dt must be determined empirically. The thresholds TH2 and TH3 against which the slope and the error are compared must also be changed according to speech rate and determined empirically, so nodes cannot be obtained stably. The second node selection method, like the first, uses the threshold TH2 or TH3, which depends on speech rate and must be determined empirically.

[0011] In general, if nodes are extracted from the fundamental frequency pattern curve of natural speech without considering its shape, the pitch pattern generated by spline approximation from those nodes can show artifacts such as rippling, and a shape or pattern different from the fundamental frequency pattern of the natural speech may be generated.

[0012] The conventional natural speech node extraction device described above selects nodes by checking whether the error based on the nodes and node candidates falls within a threshold. A pitch pattern that differs from the fundamental frequency pattern of the natural speech may therefore be generated, and the shape of the fundamental frequency pattern curve cannot be said to be sufficiently considered.

[0013] The present invention was made to solve the problems of the prior art described above, and provides a natural speech node extraction device that stably and efficiently extracts the nodes needed to generate a pitch pattern by spline approximation.

[0014]

[Means for solving the problem] To achieve the above object, the natural speech node extraction device of the present invention comprises: a pattern extraction unit that extracts a fundamental frequency pattern of natural speech; a pattern division unit that divides the fundamental frequency pattern into accent phrases; a differential operation unit that computes the first derivative curve and second derivative curve of the divided fundamental frequency pattern; and a node extraction unit that extracts nodes of the fundamental frequency pattern based on the divided fundamental frequency pattern, the first derivative curve, and the second derivative curve.

[0015] The natural speech node extraction device of the present invention extracts nodes based on the fundamental frequency pattern curve, obtained by extracting the fundamental frequency of natural speech divided into accent phrases, and on its first and second derivative curves. Change points and similar features that characterize the shape of the fundamental frequency pattern curve are thereby extracted as nodes. Because these nodes are extracted independently of speech rate, node extraction is stable and efficient.

[0016] In the natural speech node extraction device of the present invention, the node extraction unit preferably extracts the zero point of the first derivative curve as a node, and preferably extracts the highest point and the lowest point of the second derivative curve as nodes. In this case, nodes are reliably extracted at the change points that characterize the shape of the fundamental frequency pattern curve.

[0017] The natural speech node extraction device of the present invention preferably further comprises an input unit for specifying whether the natural speech is a question and whether it includes an accent position. In this case, when the natural speech is a question, the node extraction unit can extract the lowest-frequency point immediately before the end point of the divided fundamental frequency pattern as a node; when the natural speech includes an accent position, it can extract the highest point and the lowest point of the second derivative curve after the zero point of the first derivative curve as nodes. Because linguistic information such as the accent position allows the shape of the fundamental frequency pattern curve to be fully taken into account, nodes are extracted more reliably at the change points that characterize the shape of the curve.

[0018] In another preferred aspect of the present invention, the node extraction unit extracts the midpoint between two adjacent, previously obtained nodes as a new node. In this case, the pitch pattern can be generated reliably from the nodes.

[0019]

[Embodiments of the invention] The natural speech node extraction device of the present invention is described below with reference to the drawings, based on an example embodiment. FIG. 1 is a block diagram of a natural speech node extraction device according to an embodiment of the present invention. The device consists of a pattern extraction unit 1, a pattern division unit 3, an input unit 2, an unvoiced control unit 4, a differential operation unit 5, a node extraction unit 6, and a node information output unit 7.

[0020] The pattern extraction unit 1 extracts a fundamental frequency pattern from the input natural speech 102 and inputs it to the pattern division unit 3. The fundamental frequency pattern is a set of data points obtained by extracting the fundamental frequency at extraction times spaced at short intervals. Each data point consists of an extraction time and a fundamental frequency.

[0021] The input unit 2 passes the input language information 101 to the pattern division unit 3. The language information 101 consists of the start and end times of each accent phrase, the accent position time, the start and end times of the consonants and vowels contained in the accent phrase, a sentence type indicating whether the sentence is a question or a declarative sentence, and similar items. The pattern division unit 3 divides the fundamental frequency pattern into accent phrases based on the language information 101 and inputs the result to the unvoiced control unit 4.
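As an illustration of the inputs described above, the following sketch models part of the language information and the per-phrase division. The source does not define a concrete data layout, so the `AccentPhrase` fields and the `split_by_phrase` function are hypothetical, and the consonant and vowel timings are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class AccentPhrase:
    """Hypothetical container for part of language information 101."""
    start: float                  # accent phrase start time (s)
    end: float                    # accent phrase end time (s)
    accent_time: Optional[float]  # accent position time, None if no accent
    is_question: bool             # sentence type: question or declarative

def split_by_phrase(f0: List[Tuple[float, float]],
                    phrases: List[AccentPhrase]) -> List[List[Tuple[float, float]]]:
    """Divide (time, frequency) data points into one list per accent phrase."""
    return [[(t, f) for t, f in f0 if p.start <= t <= p.end] for p in phrases]

# Hypothetical fundamental frequency pattern and two accent phrases.
f0 = [(0.00, 210.0), (0.05, 230.0), (0.10, 220.0), (0.20, 200.0), (0.25, 260.0)]
phrases = [AccentPhrase(0.00, 0.10, 0.05, False),
           AccentPhrase(0.20, 0.25, None, True)]
segments = split_by_phrase(f0, phrases)
```

Each element of `segments` would then be handed to the unvoiced-section correction as a separate curve.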

[0022] FIG. 2 shows information about the natural speech utterance "yoroshii desu ka" ("Is that all right?"). For natural speech, the frequencies produced at each time are plotted as points. The dark shaded areas in the figure show the frequency characteristics (spectral display) of the natural speech. As shown in FIG. 2(a), the fundamental frequency of the natural speech appears as a white line (*) within the dark shaded area between 200 Hz and 400 Hz.

[0023] The unvoiced control unit 4 checks the fundamental frequency pattern curve for unvoiced sections based on the language information 101. Because unvoiced sections make errors likely when extracting the nodes needed for spline approximation, the fundamental frequency pattern curve is interpolated and corrected into a smooth accent phrase pattern curve.

[0024] As shown in FIG. 2(b), when there is an unvoiced section containing a consonant ("sh"), the curve is extended from the neighboring voiced sections ("o" or "i-") and interpolated with a straight line or a curve. When the start point or end point of the accent phrase is unvoiced, a value several Hz lower than that of the neighboring voiced section is interpolated as the start point or end point. The unvoiced control unit 4 makes the accent phrase pattern curve continuous and smooth so that it passes through the nodes shown as white circles (B1, P1, E1, E2), and inputs it to the node extraction unit 6.
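A minimal sketch of this unvoiced-section correction follows. It assumes linear interpolation (the source allows a straight line or a curve), represents unvoiced frames as `None`, and uses a hypothetical `edge_drop_hz` of 5 Hz for the "several Hz lower" rule; the function name `fill_unvoiced` is also an assumption.

```python
def fill_unvoiced(points, edge_drop_hz=5.0):
    """Fill unvoiced frames (frequency None) in a list of (time, frequency)
    pairs. Interior gaps are bridged linearly between the neighboring voiced
    frames; an unvoiced phrase start or end is set edge_drop_hz below the
    nearest voiced value (assumed value for 'several Hz')."""
    times = [t for t, _ in points]
    values = [f for _, f in points]
    voiced = [i for i, f in enumerate(values) if f is not None]
    if not voiced:
        raise ValueError("accent phrase contains no voiced frame")
    if values[0] is None:
        values[0] = values[voiced[0]] - edge_drop_hz
    if values[-1] is None:
        values[-1] = values[voiced[-1]] - edge_drop_hz
    known = [i for i, f in enumerate(values) if f is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (times[i] - times[left]) / (times[right] - times[left])
            values[i] = values[left] + w * (values[right] - values[left])
    return list(zip(times, values))

# Hypothetical phrase: frame index as time, None marks unvoiced frames.
curve = fill_unvoiced([(0, None), (1, 200.0), (2, None), (3, 220.0), (4, None)])
print(curve)
```

A curve-based interpolation would give a smoother result near the gap boundaries; linear filling is used here only to keep the sketch short.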

[0025] FIG. 3 is a flowchart of the node extraction method performed by the natural speech node extraction device of FIG. 1. The differential operation unit 5 computes the first derivative curve and second derivative curve of the accent phrase pattern curve and inputs them to the node extraction unit 6. The node extraction unit 6 extracts nodes based on the first derivative curve, the second derivative curve, and the language information 101.
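The source does not state how the differential operation unit differentiates the sampled curve. One common choice, assumed here, is finite differences: central differences at interior samples and one-sided differences at the two ends. The function name `derivative` is hypothetical.

```python
def derivative(times, values):
    """Finite-difference derivative of a sampled curve (assumed method).
    Central differences inside, one-sided differences at both ends.
    Requires at least two samples with strictly increasing times."""
    n = len(values)
    result = [0.0] * n
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        result[i] = (values[hi] - values[lo]) / (times[hi] - times[lo])
    return result

# Applying the function twice gives an approximate second derivative.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [0.0, 1.0, 4.0, 9.0, 16.0]   # samples of t squared
d1 = derivative(times, values)
d2 = derivative(times, d1)
```

On real pitch data the curve would usually be smoothed first, because differentiating twice amplifies frame-to-frame noise.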

[0026] FIGS. 4(a), 4(b), and 4(c) show the accent phrase pattern curve of a declarative sentence, its first derivative curve, and its second derivative curve, respectively. The accent phrase pattern curve has a start point B1 and an end point E1 of the accent phrase pattern. The first derivative curve has a zero crossing P1 at which its sign changes from positive to negative. The second derivative curve has a highest point A1 before zero crossing P1 and a highest point C2 after it, as well as a lowest point A2 before zero crossing P1 and a lowest point C1 after it.

[0027] Data point B1, the start point of the accent phrase pattern curve, is extracted as node B1, and data point E1, the end point of the accent phrase pattern curve, is extracted as node E1 (step S11). The accent phrase pattern curve is differentiated once to obtain the first derivative curve (step S12).

[0028] The zero crossing P1 at which the sign of the first derivative curve changes from positive to negative is found, and node P1, the data point on the accent phrase pattern curve corresponding to zero crossing P1, is extracted. If there are several zero crossings, the one closest to the highest-frequency point of the accent phrase pattern curve is taken as zero crossing P1 (step S13). If the sentence type in the language information 101 is not a question (step S14), the process proceeds to step S16.

[0029] FIGS. 5(a), 5(b), and 5(c) show the accent phrase pattern curve of a question, its first derivative curve, and its second derivative curve, respectively. The accent phrase pattern curve has a start point B1 of the accent phrase pattern, a lowest-frequency point E1, and an end point E2 of the accent phrase pattern. The first derivative curve has a zero crossing P1. The second derivative curve has a highest point A1 before zero crossing P1, a highest point C2 after zero crossing P1, and a lowest point C1 after zero crossing P1.

[0030] If the sentence is a question in step S14, data point E1, the lowest-frequency point of the accent phrase pattern curve, is extracted as node E1, and data point E2, the end point of the accent phrase pattern curve, is extracted as node E2 (step S15). Alternatively, the zero crossing E1 at which the sign of the first derivative curve changes from negative to positive may be found, and the data point E1 on the accent phrase pattern curve corresponding to that zero crossing may be taken as the lowest-frequency point E1.
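The alternative described above (locating E1 from a negative-to-positive zero crossing) might look like the sketch below. It assumes slopes between consecutive samples as the first derivative and scans backward from the phrase end so that the dip immediately before the final rise is found; the function name `find_question_nodes` is hypothetical.

```python
def find_question_nodes(times, values):
    """Return (e1, e2) sample indices for a question phrase. e2 is the phrase
    end; e1 is the last sample where the slope changes from negative to
    non-negative, or None if there is no such sample (assumed handling)."""
    slopes = [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
              for i in range(len(values) - 1)]
    e2 = len(values) - 1
    for k in range(len(values) - 2, 0, -1):
        if slopes[k - 1] < 0 and slopes[k] >= 0:
            return k, e2
    return None, e2

# Hypothetical question phrase: rise, fall, then a final rise.
times = [0, 1, 2, 3, 4, 5, 6]
values = [150.0, 180.0, 160.0, 130.0, 120.0, 170.0, 210.0]
print(find_question_nodes(times, values))
```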

[0031] As shown in FIG. 4(c), the accent phrase pattern curve is differentiated twice to obtain the second derivative curve (step S16). The extrema of the second derivative curve in the section from node B1 to node P1 of the accent phrase pattern curve are examined, and node A1, the data point on the accent phrase pattern curve corresponding to the highest point A1 before zero crossing P1, is extracted. Then, in the section from node A1 to node P1, node A2, the data point on the accent phrase pattern curve corresponding to the lowest point A2 before zero crossing P1, is extracted (step S17).

[0032] Next, the accent position time in the language information 101 is checked; if no accent position is included (step S18), the process proceeds to step S20. The accent position indicates where the accent falls. For example, in "ankēto" ("questionnaire"), the pitch falls at the sound following "a", so the accent is on "a" and the end position of the "a" sound is the accent position.

[0033] If an accent position is included, the extrema of the second derivative curve in the section from node P1 to node E1 of the accent phrase pattern curve are examined, and node C2, the data point on the accent phrase pattern curve corresponding to the highest point C2 after zero crossing P1, is extracted. Then, in the section from node P1 to node C2, node C1, the data point on the accent phrase pattern curve corresponding to the lowest point C1 after zero crossing P1, is extracted (step S19).

[0034] However, if the second derivative curve has no highest point or lowest point in the specified section in step S17 or S19, no corresponding node on the accent phrase pattern curve is extracted. FIG. 5(c) shows an example in which the second derivative curve has no lowest point A2 before zero crossing P1, so node A2 of the accent phrase pattern curve is not extracted.

[0035] When finding the extrema of the second derivative curve in a specified section, the third derivative curve may additionally be computed; the zero crossings at which the sign of the third derivative curve changes to positive or negative are then found, and the extrema of the second derivative curve are taken at the points corresponding to those zero crossings.

[0036] If the nodes B1, A1, A2, P1, C1, C2, and E1 extracted on the accent phrase pattern curve are not sufficient to generate the pitch pattern, the midpoint on the accent phrase pattern curve between two adjacent, previously obtained nodes may be extracted as a new node. The final nodes are input to the node information output unit 7, and the process ends (step S20).
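The midpoint refinement can be sketched as below. Nodes are represented by their sample indices on the accent phrase pattern curve, and the midpoint is taken as the middle index, which corresponds to the time midpoint only if the samples are evenly spaced (an assumption). The function name `add_midpoints` is hypothetical.

```python
def add_midpoints(node_indices):
    """Insert the middle sample between every pair of adjacent nodes.
    Expects a non-empty collection of sample indices; adjacent nodes with
    no sample between them are left unchanged."""
    nodes = sorted(set(node_indices))
    result = []
    for left, right in zip(nodes, nodes[1:]):
        result.append(left)
        mid = (left + right) // 2
        if mid != left:
            result.append(mid)
    result.append(nodes[-1])
    return result

print(add_midpoints([0, 4, 10, 11]))
```

Whether one refinement pass is enough is not specified in the source; the function could be applied again if the resulting pitch pattern still deviates too much from the curve.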

[0037] The node information output unit 7 outputs the final nodes from the node extraction unit 6 to the outside as nodes 103. The processing corresponding to step S20 above, in which midpoints are added to obtain the final nodes, may instead be performed by the node information output unit 7.

[0038] According to the above embodiment, nodes are extracted based on the fundamental frequency pattern curve, obtained by extracting the fundamental frequency of natural speech divided into accent phrases, and on its first and second derivative curves. Change points and similar features that characterize the shape of the fundamental frequency pattern curve are thereby extracted as nodes. Because these nodes are extracted independently of speech rate, node extraction is stable and efficient.

[0039] The speech synthesizer generates a pitch pattern from the nodes, uses it as one of the rules of speech synthesis (rule-based synthesis method), and synthesizes speech from phonetic symbols or character strings that are input separately. The pitch pattern is most closely related to accent and intonation; it not only gives speech a natural, easy-to-listen-to tone but also marks the grouping of words and phrases, making sentences easier to understand. If the generated pitch pattern faithfully reproduces the actual fundamental frequency pattern, the speech synthesizer can synthesize natural, easy-to-listen-to speech.

[0040] Because the node extraction device of the present invention extracts as nodes the change points and similar features that characterize the shape of the fundamental frequency pattern curve, the generated pitch pattern can faithfully reproduce the actual fundamental frequency pattern. The device is therefore well suited not only to speech synthesizers but to any device that adopts the rule-based synthesis method.

[0041] The present invention has been described above based on its preferred embodiment, but the node extraction method of the present invention is not limited to the configuration of that embodiment. Natural speech node extraction devices obtained by applying various modifications and changes to the configuration of the above embodiment are also included in the scope of the present invention.

[0042]

[Effects of the invention] As described above, the natural speech node extraction device of the present invention extracts nodes based on the fundamental frequency pattern curve, obtained by extracting the fundamental frequency of natural speech divided into accent phrases, and on its first and second derivative curves. Change points and similar features that characterize the shape of the fundamental frequency pattern curve are thereby extracted as nodes. Because these nodes are extracted independently of speech rate, node extraction is stable and efficient.

[Brief description of the drawings]

FIG. 1 is a block diagram of a natural speech node extraction device according to an embodiment of the present invention.

FIG. 2 shows information about the natural speech utterance "yoroshii desu ka" ("Is that all right?").

FIG. 3 is a flowchart of the node extraction method performed by the natural speech node extraction device of FIG. 1.

FIG. 4: parts (a), (b), and (c) show the accent phrase pattern curve of a declarative sentence, its first derivative curve, and its second derivative curve, respectively.

FIG. 5: parts (a), (b), and (c) show the accent phrase pattern curve of a question, its first derivative curve, and its second derivative curve, respectively.

FIG. 6 is a flowchart of the first node selection method.

FIG. 7 is a flowchart of the second node selection method.

[Explanation of symbols]

1: Pattern extraction unit
2: Input unit
3: Pattern division unit
4: Unvoiced control unit
5: Differential operation unit
6: Node extraction unit
7: Node information output unit
101: Language information
102: Natural speech
103: Node

Claims

[Claims]

1. A natural speech node extraction device comprising: a pattern extraction unit that extracts a fundamental frequency pattern of natural speech; a pattern division unit that divides the fundamental frequency pattern into accent phrases; a differential operation unit that computes a first derivative curve and a second derivative curve of the divided fundamental frequency pattern; and a node extraction unit that extracts nodes of the fundamental frequency pattern based on the divided fundamental frequency pattern, the first derivative curve, and the second derivative curve.

2. The natural speech node extraction device according to claim 1, wherein the node extraction unit extracts a zero point of the first derivative curve as a node.

3. The natural speech node extraction device according to claim 1 or 2, wherein the node extraction unit extracts the highest point and the lowest point of the second derivative curve as nodes.

4. The natural speech node extraction device according to any one of claims 1 to 3, further comprising an input unit for specifying whether the natural speech is a question and whether it includes an accent position.

5. The natural speech node extraction device according to claim 4, wherein, when the natural speech is a question, the node extraction unit extracts the lowest-frequency point immediately before the end point of the divided fundamental frequency pattern as a node.

6. The natural speech node extraction device according to claim 4 or 5, wherein, when the natural speech includes an accent position, the node extraction unit extracts the highest point and the lowest point of the second derivative curve after the zero point of the first derivative curve as nodes.

7. The natural speech node extraction device according to any one of claims 3 to 6, wherein the node extraction unit extracts the midpoint between two adjacent, previously obtained nodes as a new node.