JPH07134598A

JPH07134598A - System for outputting basic frequency of voice

Info

Publication number: JPH07134598A
Application number: JP5283483A
Authority: JP
Inventors: Takahiko Niimura; 貴彦新村
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1993-11-12
Filing date: 1993-11-12
Publication date: 1995-05-23

Abstract

PURPOSE: To provide a system that adjusts how a fundamental frequency pattern is output according to changes in the size of the output surface and in the number of data points, and that presents a visually clear result. CONSTITUTION: A speech waveform analysis unit 12 extracts a fundamental frequency and an amplitude value for every prescribed phoneme-sequence section of a speech waveform and stores them temporarily in an analysis data holding unit 16. A syllable time length normalization unit 13 normalizes the time lengths of the other syllables against the analyzed time length of one syllable, so that several fundamental frequency patterns can be compared on the same time axis. The normalized time lengths are passed to a display position calculation unit 14, which decides the display positions on an output device 2 while taking the size of the output surface into account. Only the points that satisfy the output conditions, such as a lower limit on the display interval, are then extracted to form the display data. Redundant data are thereby eliminated and a clear output is obtained.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to speech prosody analysis in the fields of speech synthesis and speech recognition. In particular, it relates to a fundamental frequency output system for speech that allows the fundamental frequency patterns of syllables whose utterance time lengths (hereinafter "time lengths") differ, even for the same character string, to be compared clearly on the same time axis.

[0002]

[Description of the Related Art] Accent, intonation, stress, rhythm and similar properties of speech are elements other than phonemic content. They are expressed through prosodic features such as the fundamental frequency and amplitude of the speech and the connection time between syllables. These features are important for conveying not only linguistic information, such as meaning, but also non-linguistic information, such as emotion.

[0003] Of these, accent is used to distinguish words (syllables) with the same phoneme composition by meaning or by part of speech, but which prosodic feature chiefly carries it depends on the language. Japanese accent, for example, is called a pitch accent and is expressed exclusively by the pattern of change of the fundamental frequency over time (hereinafter "fundamental frequency pattern"). Intonation, which indicates for example syntactic structure or the distinction between declarative and interrogative sentences, is likewise expressed by the fundamental frequency pattern in many languages, including Japanese.

[0004] Because the fundamental frequency pattern is an important prosodic feature underlying the analysis of many kinds of information, speech synthesis and recognition require the fundamental frequency pattern of an input speech waveform to be extracted quickly and accurately. It is therefore common practice to use an information processing device to compute the fundamental frequency of the speech for each predetermined analysis time length and to output it to a printer, display or similar device at predetermined intervals, thereby obtaining the fundamental frequency pattern.

[0005] It is well known that the time length of the same character string varies with the speaker and with the speaker's state at the moment of utterance, and this is especially marked for syllables. Conventionally, when the fundamental frequency patterns of syllables with different time lengths are compared, for example on a graph of time versus frequency, the fundamental frequency of each syllable section is held temporarily and the time axes of the patterns are adjusted to be identical, so that only the differences in prosodic features appear.

[0006]

[Problems to Be Solved by the Invention] In the conventional method, however, when several fundamental frequency patterns are drawn on the same graph, the printing interval of each pattern (here and below, "printing" also covers display only) and the number of printed points (number of data) are fixed. Consequently, when the size of the output surface on which the graph is printed (for example, the printing paper) or the number of data to be printed changes, the printed marks pile up on top of one another and the differences in prosodic features may not be clearly visible. In addition, noise components may be mixed in depending on the speaker or the utterance state. Because the fundamental frequency was conventionally detected for all of the input speech, this noise also appeared on the graph and could impair the clarity of the fundamental frequency pattern.

[0007] The present invention was conceived against this background. Its object is to provide a fundamental frequency output system that quickly yields a visually clear fundamental frequency pattern regardless of changes in the size of the output surface or in the utterance state.

[0008]

[Means for Solving the Problems] The present invention achieves the above object by exploiting two properties: the display position of each fundamental frequency is constrained by the dimensions of the output surface, and noise components other than voiced sound have a lower amplitude than voiced sound and show no periodicity correlation.

[0009] That is, the first invention provides a system for displaying the fundamental frequency pattern of the speech waveform of a syllable or phoneme sequence on a two-dimensional output surface that includes a time axis. The system comprises: speech waveform analysis means for extracting, from the speech waveform, a fundamental frequency and an amplitude value for each predetermined phoneme-sequence section; analysis data holding means for holding each extracted fundamental frequency, together with its corresponding amplitude value, as analysis data for that phoneme section; output surface information holding means for holding dimension information of the output surface at least in the time-axis direction; display position determining means for determining, on the basis of the time-axis dimension information, the display position of each fundamental frequency on the output surface; and display data creating means for holding output conditions for the analysis data on the output surface, extracting from the analysis data holding means and the display position determining means the fundamental frequencies and display positions that satisfy those conditions, and creating display data for the output surface from the extracted fundamental frequencies and display positions.

[0010] The second invention provides a system for simultaneously outputting the fundamental frequency patterns of the speech waveforms of several syllables or phoneme sequences with different utterance time lengths on a two-dimensional output surface that includes a time axis. The system comprises: speech waveform analysis means for extracting, from each speech waveform, a fundamental frequency and an amplitude value for each predetermined phoneme-sequence section; analysis data holding means for holding each extracted fundamental frequency, together with its corresponding amplitude value, as analysis data for that phoneme section; utterance time length normalizing means for normalizing the utterance time lengths of the other syllables or phoneme sequences with respect to the utterance time length of a particular one of them; output surface information holding means for holding dimension information of the output surface at least in the time-axis direction; display position determining means for determining, on the basis of the time-axis dimension information, the display position of each fundamental frequency on the output surface; and display data creating means for holding output conditions for the analysis data on the output surface, extracting from the analysis data holding means and the display position determining means the fundamental frequencies and display positions that satisfy those conditions, and creating display data for the output surface from the extracted fundamental frequencies and display positions.

[0011] Various output conditions for the display data creating means can be set according to the application. To address the conventional problems more effectively, however, the present invention makes them include at least one of the following: a lower limit on the display interval between individual data on the output surface, a lower limit on the amplitude value corresponding to a fundamental frequency, and a threshold on the autocorrelation coefficient of a fundamental frequency. A fundamental frequency or display position that falls below these limits is judged to be a nonconforming element.

[0012] A component with no periodicity on the time axis (a noise component) has a smaller autocorrelation coefficient than a valid speech component. By setting a threshold in advance and comparing the autocorrelation coefficient against it, the presence or absence of periodicity can be detected, and a component without periodicity can be judged to be noise.
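
The periodicity test described above ([0012]) can be sketched as follows. This is a minimal illustration, not the patent's method: it assumes the "autocorrelation coefficient" is the normalized autocorrelation of one analysis frame at the lag of the estimated pitch period, and the threshold of 0.5 and the function names are hypothetical, since the patent gives no numerical values.

```python
import math
import random


def normalized_autocorrelation(frame, lag):
    """Autocorrelation of `frame` at `lag`, divided by the zero-lag energy.

    Values near 1 indicate strong periodicity at that lag; values near 0
    indicate none (an assumed definition, see the lead-in).
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0 or lag <= 0 or lag >= len(frame):
        return 0.0
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy


def is_noise(frame, pitch_lag, threshold=0.5):
    """Judge a frame to be noise when its autocorrelation is below `threshold`.

    The default of 0.5 is a hypothetical value chosen for illustration.
    """
    return normalized_autocorrelation(frame, pitch_lag) < threshold


if __name__ == "__main__":
    rate = 8000  # samples per second (assumed)
    voiced = [math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]  # 100 Hz tone
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]  # aperiodic stand-in for noise
    print(is_noise(voiced, pitch_lag=80))  # 80 samples is one period of 100 Hz at 8 kHz
    print(is_noise(noise, pitch_lag=80))
```

A periodic frame keeps most of its energy when shifted by one period, so its coefficient stays high, whereas random samples decorrelate and fall well below the threshold.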

[0013]

[Operation] In the first invention, the fundamental frequency and amplitude value of each phoneme-sequence section, extracted from the speech waveform by the speech waveform analysis means, are stored temporarily in the analysis data holding means. The output of the speech waveform analysis means is also passed to the display position determining means, which determines the display positions on the output surface while taking the dimension information of the output surface into account; for example, the larger the time-axis dimension, the larger the individual display intervals. The display data creating means holds output conditions that include at least one of a lower limit on the display interval between individual data on the output surface, a lower limit on the amplitude value corresponding to a fundamental frequency, and a threshold on the autocorrelation coefficient of a fundamental frequency. If a determined display interval is below its lower limit, or if the amplitude value of a fundamental frequency held in the analysis data holding means is below its lower limit, the fundamental frequency or display position is judged to be a nonconforming element and is ignored. Likewise, if the autocorrelation coefficient of a fundamental frequency is below the threshold, there is no periodicity correlation between adjacent frames, which identifies the component as noise; that fundamental frequency is also judged to be nonconforming and is ignored.

[0014] Conversely, when the amplitude value of a fundamental frequency and its display interval exceed their respective lower limits, it is judged to be a conforming element and display data are created for it. This reliably prevents overlapping printed marks and also prevents noise components from being output.

[0015] The second invention is a configuration for simultaneously outputting the fundamental frequency patterns of the speech waveforms of several syllables or phoneme sequences with different utterance time lengths on a two-dimensional output surface that includes a time axis; it adds utterance time length normalizing means to the first invention. This means normalizes the utterance time lengths of the other syllables or phoneme sequences with respect to that of a particular syllable or phoneme sequence. The analysis unit time lengths of the speech waveforms thereby become equal, so that several sets of prosodic features can be shown on the same time axis. As in the first invention, the display data for the individual fundamental frequencies of each speech waveform are created taking the output surface information and the output conditions into account.

[0016]

[Embodiments] An embodiment of the present invention will now be described with reference to the drawings.

[0017] FIG. 1 is a block diagram of a speech fundamental frequency output system according to an embodiment of the present invention, showing an example configuration in which the fundamental frequencies of several syllables or phoneme sequences are printed on the same output surface. In FIG. 1, reference numeral 1 denotes a fundamental frequency analyzer and 2 denotes an output device such as a printer or display.

[0018] The fundamental frequency analyzer 1 receives a speech waveform and outputs its fundamental frequency pattern to the output device 2. It comprises: a speech waveform holding unit 11 that stores the speech waveform; a speech waveform analysis unit 12 that retrieves and analyzes the stored waveform as needed to obtain fundamental frequencies and their amplitude values; a syllable time length normalization unit 13 that normalizes the time lengths of several syllables; a display position calculation unit 14 that calculates position information on the output surface of the output device 2; a display data creation unit 15 that creates the coordinate data actually displayed on the output surface; an analysis data holding unit 16 that stores the fundamental frequencies and amplitude values obtained by the speech waveform analysis unit 12 in association with each other; a syllable section information holding unit 17 that holds syllable section information for the speech, created from the phoneme labeling of the speech waveform; an output surface information holding unit 18 that holds information on the dimensions of the output surface; and a print condition holding unit 19 that sets and holds the conditions for actual printing.

[0019] The processing performed by the fundamental frequency analyzer 1 configured as above is now described in detail.

[0020] The input speech waveform is stored temporarily in the speech waveform holding unit 11 and is sent to the speech waveform analysis unit 12 as needed for extraction of the fundamental frequency and amplitude value. For fundamental frequency extraction, a very short time is defined as the analysis unit time length (hereinafter a "frame"), and the waveform segment cut out for each frame is analyzed to determine one fundamental frequency per frame. Consequently, even for the same syllable "fune", utterances of different time lengths yield different numbers of frames and hence different numbers of fundamental frequencies.
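
The frame-based extraction in [0020] can be illustrated with the sketch below. The patent does not say how each frame's fundamental frequency is computed; this sketch assumes a simple autocorrelation peak search, and the sample rate, frame length, hop size and F0 search range are all hypothetical values. It also shows why two utterances of the same syllable with different durations produce different numbers of frames.

```python
import math


def split_frames(signal, frame_len, hop):
    """Cut a waveform into fixed-length analysis frames, one every `hop` samples."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def estimate_f0(frame, rate, f0_min=60.0, f0_max=400.0):
    """Return (f0 in Hz, peak normalized autocorrelation) for one frame.

    Searches lags between rate/f0_max and rate/f0_min for the largest
    autocorrelation (an assumed method); returns (0.0, 0.0) for silence.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0, 0.0
    lag_min = max(1, int(rate / f0_max))
    lag_max = min(int(rate / f0_min), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return (rate / best_lag if best_lag else 0.0), best_r


if __name__ == "__main__":
    rate, frame_len, hop = 8000, 400, 80  # 50 ms frames every 10 ms (hypothetical)
    for ms in (200, 300):  # the same "syllable" uttered at two durations
        n = rate * ms // 1000
        tone = [math.sin(2 * math.pi * 100 * k / rate) for k in range(n)]
        frames = split_frames(tone, frame_len, hop)
        f0s = [estimate_f0(f, rate)[0] for f in frames]
        print(f"{ms} ms -> {len(frames)} frames, first F0 = {f0s[0]:.1f} Hz")
```

With these settings a 200 ms tone yields 16 frames and a 300 ms tone yields 26, so the two patterns cannot be overlaid point by point without the normalization described in [0022].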

[0021] The speech waveform from the speech waveform holding unit 11 is analyzed by the speech waveform analysis unit 12 as described above. In doing so, the target syllable is extracted on the basis of the syllable section information for that waveform supplied by the syllable section information holding unit 17. The fundamental frequency of each frame of the extracted syllable, together with the amplitude value at that frame, is sent as analysis data to the analysis data holding unit 16 and stored there. Although not shown, a known coefficient calculating means that calculates an autocorrelation coefficient for each fundamental frequency may also be provided, with its results likewise stored in the analysis data holding unit 16 as analysis data.

[0022] To compare syllables of various time lengths on the same time axis, this embodiment takes the time length of one syllable as the reference and calculates, for each other syllable, the ratio between its time length and the reference. This ratio calculation is performed by the syllable time length normalization unit 13.
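
A minimal sketch of the ratio calculation in [0022], assuming that each syllable's frame times (in ms from the syllable's start) are rescaled by the ratio of the reference duration to that syllable's own duration, so that every syllable spans the reference duration. The function names, the example durations and the choice of the first syllable as reference are illustrative assumptions.

```python
def normalization_ratios(durations_ms, reference_index=0):
    """Ratio of the reference syllable's duration to each syllable's duration."""
    reference = durations_ms[reference_index]
    return [reference / d for d in durations_ms]


def normalize_times(frame_times_ms, ratio):
    """Rescale frame times so that the syllable spans the reference duration."""
    return [t * ratio for t in frame_times_ms]


if __name__ == "__main__":
    durations = [200.0, 250.0, 160.0]  # same syllable, three utterances (hypothetical)
    ratios = normalization_ratios(durations)
    print(ratios)
    # Frames every 10 ms across the 250 ms utterance now end at about 200 ms.
    times = [10.0 * k for k in range(26)]
    print(normalize_times(times, ratios[1])[-1])
```

Scaling times rather than resampling the fundamental frequency values keeps every analyzed frame intact; only its horizontal position changes.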

[0023] The display position calculation unit 14 uses the ratios obtained by the syllable time length normalization unit 13 to calculate the intervals actually printed along the time axis, referring to the output surface information stored in the output surface information holding unit 18. This information is, specifically, setting information that gives the dimensions of the output surface of the output device 2, for example the height and width of a two-dimensional sheet of printing paper, and its values can be changed freely according to the application. If the horizontal length (the length in the time-axis direction) is L, the display position calculation unit 14 aligns the fundamental frequency patterns of the several syllables so that each reaches the same maximum time-axis value, within the length L. The printing interval is also changed automatically according to the output surface settings.
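
The mapping from normalized time to horizontal position in [0023] might look like the sketch below. It assumes positions are linear in time and that the longest normalized pattern is stretched to exactly the usable width, that is L minus an optional margin; the names, the margin parameter and the millimetre unit are illustrative, not taken from the patent.

```python
def display_positions(patterns_ms, width, margin=0.0):
    """Map normalized frame times of several patterns to x positions on the output surface.

    All patterns share one scale, so the largest time value lands at width - margin.
    The `margin` parameter is a hypothetical addition.
    """
    usable = width - margin
    t_max = max((max(p) for p in patterns_ms if p), default=0.0)
    if t_max <= 0.0:
        raise ValueError("patterns must contain a positive time value")
    return [[t / t_max * usable for t in p] for p in patterns_ms]


if __name__ == "__main__":
    patterns = [[0.0, 100.0, 200.0], [0.0, 50.0, 100.0, 150.0, 200.0]]
    print(display_positions(patterns, width=180.0))  # 180 mm wide sheet (assumed)
    print(display_positions(patterns, width=90.0))   # narrower sheet: intervals halve
```

Because the scale is derived from L, changing the paper width automatically changes the printed interval, as the paragraph describes.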

[0024] The display data creation unit 15 creates display data for each frame of the analysis data on the basis of the output of the display position calculation unit 14 and the printing conditions stored in the print condition holding unit 19.

[0025] The printing conditions are, for example, a lower limit on the printing interval corresponding to the dimensions of the output surface and a lower limit on the amplitude value of each fundamental frequency. Display data are created only for fundamental frequencies that exceed these lower limits.

[0026] As noted above, the dimensions of the output surface can be set freely, and the printing interval of the fundamental frequencies changes automatically with the setting. Because the printed fundamental frequencies of several syllables normally overlap in the time-axis direction, the lower limit of the printing interval is set by fixing the maximum number of printed points with the horizontal length L in mind. For example, if at most 20 points may be printed per 200 ms of the time axis, the lower limit of the printing interval is uniquely determined (one point per 10 ms). Frames closer together than this limit would produce overlapping marks, so their fundamental frequencies are ignored as nonconforming elements and are not printed.
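
The worked example in [0026] can be written out as arithmetic plus a thinning pass. Assuming the time axis of length L covers a total normalized duration T (in ms), 200 ms occupies L·200/T on the paper, so a limit of 20 points per 200 ms gives a minimum spacing of L·10/T. The greedy rule below, which keeps a frame only if it lies at least that far from the last kept frame, is one plausible reading of "ignored as a nonconforming element"; the function names and example dimensions are hypothetical.

```python
def min_print_interval(width, total_ms, max_points=20, window_ms=200.0):
    """Smallest allowed spacing on the output surface for max_points per window_ms."""
    return width * (window_ms / max_points) / total_ms


def thin_by_interval(xs, min_dx):
    """Return indices of positions to print, skipping any closer than min_dx to the last kept one."""
    kept, last = [], None
    for i, x in enumerate(xs):
        if last is None or x - last >= min_dx:
            kept.append(i)
            last = x
    return kept


if __name__ == "__main__":
    width, total_ms = 100.0, 500.0  # 100 mm axis covering 500 ms (assumed)
    min_dx = min_print_interval(width, total_ms)
    xs = [width * t / total_ms for t in range(0, 501, 5)]  # one frame every 5 ms
    print(min_dx, len(xs), len(thin_by_interval(xs, min_dx)))
```

Here the minimum spacing is 2 mm while frames arrive every 1 mm, so every other frame is dropped and 51 of the 101 frames are printed.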

[0027] The amplitude of noise components, for example in recorded speech or in speech uttered into a microphone, is smaller than that of voiced sound. In particular, noise arising during utterance has an extremely small autocorrelation coefficient and shows no periodicity correlation between adjacent frames, so a fundamental frequency whose autocorrelation coefficient is below a predetermined threshold can be judged to be noise on that basis alone. In this embodiment, therefore, a lower limit on the amplitude value and a threshold on the autocorrelation coefficient are set for each fundamental frequency, and any fundamental frequency below the lower limit or the threshold is judged to be noise and is not printed.
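
A minimal sketch of the noise rule in [0027], assuming each record in the analysis data holding unit carries a frame time, a fundamental frequency, an amplitude value and an autocorrelation coefficient (the optional coefficient mentioned in [0021]). The field names, the sample records and both limits are hypothetical; the patent gives no numerical values.

```python
from dataclasses import dataclass


@dataclass
class AnalysisFrame:
    """One hypothetical analysis-data record per frame."""
    time_ms: float
    f0_hz: float
    amplitude: float
    autocorr: float


def printable_frames(frames, amp_min, autocorr_min):
    """Drop frames judged to be noise: amplitude below amp_min or autocorrelation below autocorr_min."""
    return [f for f in frames if f.amplitude >= amp_min and f.autocorr >= autocorr_min]


if __name__ == "__main__":
    frames = [  # illustrative values only
        AnalysisFrame(0.0, 118.0, 0.42, 0.91),   # voiced
        AnalysisFrame(10.0, 240.0, 0.03, 0.88),  # too quiet
        AnalysisFrame(20.0, 95.0, 0.35, 0.12),   # aperiodic
        AnalysisFrame(30.0, 121.0, 0.40, 0.87),  # voiced
    ]
    for f in printable_frames(frames, amp_min=0.05, autocorr_min=0.5):
        print(f.time_ms, f.f0_hz)
```

Either test alone rejects a frame, matching the paragraph's statement that a value below the lower limit or below the threshold is enough to treat the frame as noise.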

[0028] The display data created by the display data creation unit 15 are output to the output device 2 and printed on the output surface.

[0029] FIG. 2 is a graph in which the fundamental frequency analysis data of several phoneme sequences are plotted on a time-versus-frequency printing surface by the conventional method, which does not take the output surface information or the printing conditions into account. The sentence "funemademukaeniikimasuyo" was divided into ten phoneme sequences, "fu" to "yo", and the same person uttered it in different manners (states a to d); the resulting patterns are compared on the same time axis. The reference time length for normalization is the time length of the character string in state a.

[0030] In this figure, state a is normal speech, state b is speech uttered while laughing, state c is speech uttered as if crying, and state d is speech uttered in anger. As FIG. 2 shows, normalizing the time lengths makes it possible to compare the fundamental frequency patterns of syllables (phoneme sequences) with different time lengths. However, the printed marks overlap heavily, and the noise mixed in under each of states a to d is plotted through exactly the same processing, so the result is visually unclear.

[0031] By contrast, FIG. 3 shows the graph obtained with this embodiment.

[0032] As FIG. 3 makes clear, the fundamental frequency output system of this embodiment omits redundant or unnecessary analysis data and produces a visually clear output. It does so by controlling the printing of the fundamental frequency for each syllable on the basis of the fundamental frequencies and amplitude values held in the analysis data holding unit 16, the output surface size set in the output surface information holding unit 18, and the output conditions stored in the print condition holding unit 19.

[0033] Although FIGS. 2 and 3 show speech waveforms uttered by the same person in various states, the same processing can of course be applied to a single speech waveform or to speech waveforms uttered by different speakers.

[0034] Thus, according to this embodiment, linguistic or non-linguistic information can be analyzed more easily and accurately. Moreover, because whether the lower limit of the display interval has been reached is judged against the value calculated by the display position calculation unit 14, the optimum printing interval follows any change in the size of the output surface.

[0035] Although the embodiment has been described above, the present invention is not limited to it, and other embodiments are possible without departing from the gist of the invention.

[0036]

[Effects of the Invention] As described above, with the speech fundamental frequency output system of the present invention, the interval between display data is redetermined according to the dimension information of the output surface. When this interval fails a predetermined output condition, for example when it does not reach the lower limit on the interval, no display data are created until the lower limit is exceeded. As a result, even if the number of fundamental frequencies to be output or the dimensions of the output surface change, the data are displayed on the output surface at intervals exceeding the lower limit.

[0037] Further, by including a lower limit on the amplitude value of the fundamental frequency or a threshold on its autocorrelation coefficient among the output conditions, no display data are created when the amplitude value is below the lower limit or the autocorrelation coefficient is below the threshold, so noise components are no longer displayed on the output surface.

[0038] Thus, according to the present invention, the manner of display can be adjusted easily through the output conditions provided in the display data creating means, and a visually clear fundamental frequency pattern can be obtained quickly.

[Brief description of drawings]

FIG. 1 is a block diagram of a speech fundamental frequency output system according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram showing an example of fundamental frequency output when the output surface information and output conditions are not taken into account.

FIG. 3 is an explanatory diagram showing an example of fundamental frequency output produced by the configuration of this embodiment, taking into account the output surface information and the lower limit of the printing interval.

[Explanation of symbols]

1 Fundamental frequency analyzer
11 Voice waveform holding unit
12 Voice waveform analyzing unit
13 Syllable time length normalizing unit
14 Display position calculation unit
15 Display data creating unit
17 Syllable section information holding unit
18 Output surface information holding unit
19 Printing condition holding unit
2 Output device having a two-dimensional output surface

Claims


1. A fundamental frequency output system for speech, being a system for displaying a fundamental frequency pattern of the speech waveform of a syllable or phoneme string on a two-dimensional output surface that includes a time axis, the system comprising: speech waveform analyzing means for extracting, from the speech waveform, a fundamental frequency and an amplitude value for each prescribed phoneme string section; analysis data holding means for holding each extracted fundamental frequency and its corresponding amplitude value as analysis data of the phoneme section; output surface information holding means for holding dimension information of at least the time-axis direction of the output surface; display position determining means for determining the display position of each fundamental frequency on the output surface based on the time-axis dimension information; and display data creating means for holding an output condition for outputting the analysis data to the output surface, extracting from the analysis data holding means and the display position determining means the fundamental frequencies and display positions that satisfy the output condition, and creating display data for the output surface based on the extracted fundamental frequencies and display positions.

2. A fundamental frequency output system for speech, being a system for simultaneously outputting the fundamental frequency patterns of the speech waveforms of a plurality of syllables or phoneme strings having different utterance time lengths to a two-dimensional output surface that includes a time axis, the system comprising: speech waveform analyzing means for extracting, from each of the speech waveforms, a fundamental frequency and an amplitude value for each predetermined phoneme string section; analysis data holding means for holding each extracted fundamental frequency and its corresponding amplitude value as analysis data of the phoneme section; utterance time length normalizing means for normalizing the utterance time lengths of the other syllables or phoneme strings based on the utterance time length of a particular one of the plurality of syllables or phoneme strings; output surface information holding means for holding dimension information of at least the time-axis direction of the output surface; display position determining means for determining the display position of each fundamental frequency on the output surface based on the time-axis dimension information; and display data creating means for holding an output condition for outputting the analysis data to the output surface, extracting from the analysis data holding means and the display position determining means the fundamental frequencies and display positions that satisfy the output condition, and creating display data for the output surface based on the extracted fundamental frequencies and display positions.

3. The fundamental frequency output system for speech according to claim 1, wherein the display data creating means includes, as the output condition, at least one of a lower limit on the display interval of individual data on the output surface, a lower limit on the amplitude value corresponding to the fundamental frequency, and a threshold on the autocorrelation coefficient value of the fundamental frequency, and treats any fundamental frequency or display position falling below that lower limit or threshold as a nonconforming element.
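The normalization of claim 2 and the display position determination of claims 1 and 2 can be sketched as follows. The sketch assumes a simple linear time scaling onto the utterance time length of the reference syllable and a linear mapping of that reference length onto the time-axis dimension of the output surface; the claims do not specify the mapping, and the function and type names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One syllable: (frame times in seconds from onset, utterance time length in seconds)
Syllable = Tuple[Sequence[float], float]

def normalize_times(times: Sequence[float], duration: float,
                    ref_duration: float) -> List[float]:
    """Linearly rescale one syllable's frame times onto the reference duration."""
    scale = ref_duration / duration
    return [t * scale for t in times]

def display_positions(syllables: Sequence[Syllable], ref_index: int,
                      surface_width: float) -> List[List[float]]:
    """Return time-axis positions on the output surface for every syllable.

    The syllable at ref_index supplies the reference utterance time length;
    every syllable is normalized to it, so all patterns share one time axis.
    """
    ref_duration = syllables[ref_index][1]
    positions = []
    for times, duration in syllables:
        normalized = normalize_times(times, duration, ref_duration)
        # Map the reference span [0, ref_duration] onto [0, surface_width].
        positions.append([t / ref_duration * surface_width for t in normalized])
    return positions
```

Because scaling by `ref_duration / duration` and then dividing by `ref_duration` reduces to `t / duration * surface_width`, syllables of different lengths end up spanning the same width. For example, frames at 0.0, 0.1 and 0.2 s of a 0.2 s syllable and frames at 0.0, 0.2 and 0.4 s of a 0.4 s syllable both map to positions 0, 50 and 100 on a width of 100.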