JPH09230884A

JPH09230884A - Text voice conversion method and method for determining breathing position of text voice converter

Info

Publication number: JPH09230884A
Application number: JP8041184A
Authority: JP
Inventors: Masanori Miyatake; 正典宮武; Hiroki Onishi; 宏樹大西
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1996-02-28
Filing date: 1996-02-28
Publication date: 1997-09-05
Anticipated expiration: 2016-02-28
Also published as: JP3519852B2

Abstract

PROBLEM TO BE SOLVED: To convert input text into natural speech by determining breathing positions from the input text and inserting breath sounds at the determined positions. SOLUTION: The speech signal generated by a speech generation unit 2 is sent to a breath sound and silence insertion unit 3 and to a breathing position candidate detection unit 5. The breath sound and silence insertion unit 3 inserts a breath sound or a silent signal into the speech signal generated by the speech generation unit 2 in accordance with control commands from a breath sound and silence insertion control unit 6. The resulting speech signal is output as speech via a speech output unit 4. The breathing position candidate detection unit 5 accumulates the power of the speech signal generated by the speech generation unit 2 and, when the cumulative value reaches a predetermined value, sends breathing position information to the breath sound and silence insertion control unit 6. The detection unit 5 then clears the cumulative value and starts a new accumulation.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of the Invention] The present invention relates to a text-to-speech conversion method and to a method for determining breathing positions in a text-to-speech conversion apparatus.

【０００２】[0002]

[Description of the Related Art] In a text-to-speech conversion apparatus that generates speech corresponding to input text, it is important to generate natural speech that suits the input text. To this end, pauses are inserted into the generated speech. The pause insertion positions are determined based on the punctuation of the text, semantic boundaries, the time elapsed since the last breathing position, and the like. However, physiological phenomena are not reflected, such as the fact that a speaker who speaks loudly runs out of breath sooner and therefore takes breaths more often.

【０００３】[0003]

[Problem to Be Solved by the Invention] An object of the present invention is to provide a text-to-speech conversion method capable of converting input text into more natural speech, and a method for determining breathing positions in a text-to-speech conversion apparatus.

【０００４】[0004]

[Means for Solving the Problem] The text-to-speech conversion method according to the present invention is a method for converting input text into speech, characterized in that a breathing position is determined based on the input text and a breath sound is inserted at the determined breathing position.

[0005] The breathing position is determined, for example, as follows. (1) The breathing position is determined based on the cumulative value of the speech power of the speech signal generated from the input text.

[0006] (2) A breathing position candidate is determined based on the cumulative value of the speech power of the speech signal generated from the input text, and the first break position in the grammatical/semantic analysis that appears after the candidate is determined as the breathing position.

[0007] In the text-to-speech conversion method according to the present invention, a breath sound rather than silence is inserted at the breathing position, so more natural speech is obtained.

[0008] The first method for determining a breathing position according to the present invention is a method for use in a text-to-speech conversion apparatus that converts input text into speech, characterized in that the amount of air exhaled when the input text is read aloud is predicted and the breathing position is determined based on the predicted amount of exhaled air.

[0009] In the first method for determining a breathing position according to the present invention, the amount of air exhaled when the input text is read aloud is predicted and the breathing position is determined based on that prediction. The positions at which a reader would actually take a breath are therefore more likely to be chosen as breathing positions, and inserting a breath sound or a silent pause at the determined positions yields more natural speech.

[0010] The second method for determining a breathing position according to the present invention is a method for use in a text-to-speech conversion apparatus that converts input text into speech, characterized in that a speech signal corresponding to the input text is generated from the input text and the breathing position is determined based on the cumulative value of the speech power of the generated speech signal.

[0011] Specifically, for example, a breathing position candidate is determined based on the cumulative value of the speech power of the speech signal, and the first break position in the grammatical/semantic analysis that appears after the determined candidate is determined as the breathing position.

[0012] In the second method for determining a breathing position according to the present invention, the breathing position is determined based on the cumulative value of the speech power of the speech signal corresponding to the input text. The positions at which a reader would actually take a breath are therefore more likely to be chosen as breathing positions, and inserting a breath sound or a silent pause at the determined positions yields more natural speech.

【００１３】[0013]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

[0014] [1] Description of First Embodiment

FIG. 1 shows a text-to-speech conversion apparatus according to a first embodiment of the present invention.

[0015] Text input to the text-to-speech conversion apparatus is subjected to grammatical analysis, semantic analysis, and the like by the language processing unit 1 and is then sent to the speech generation unit 2. The speech generation unit 2 generates a speech signal corresponding to the input text, and the loudness of that signal varies according to the content of the text. There are two ways to vary the loudness: the language processing unit 1 may use semantic analysis of the input text to separate the parts to be spoken loudly from the parts to be spoken softly, or codes marking the loud and soft parts may be embedded in the input text in advance.

[0016] The speech signal generated by the speech generation unit 2 is sent to the breath sound and silence insertion unit 3 and also to the breathing position candidate detection unit 5. Based on control commands from the breath sound and silence insertion control unit 6, the breath sound and silence insertion unit 3 inserts a breath sound or a silent signal into the speech signal generated by the speech generation unit 2. The speech signal produced by the breath sound and silence insertion unit 3 is output as speech via the speech output unit 4.

[0017] The breathing position candidate detection unit 5 accumulates the speech power of the speech signal generated by the speech generation unit 2 and, when the cumulative value reaches a predetermined value, sends breathing position information to the breath sound and silence insertion control unit 6. When the cumulative value reaches the predetermined value, the breathing position candidate detection unit 5 also clears the cumulative value and starts a new accumulation.
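The accumulate-and-clear behaviour of the breathing position candidate detection unit 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `breathing_candidates`, the use of discrete samples, and the choice of squared amplitude as the power measure are all assumptions.

```python
from typing import Iterable, List


def breathing_candidates(samples: Iterable[float], threshold: float) -> List[int]:
    """Return the sample indices at which the accumulated power reaches `threshold`.

    Hypothetical helper: the power of one sample is assumed to be its squared
    amplitude, and `threshold` stands in for the "predetermined value".
    """
    candidates: List[int] = []
    accumulated = 0.0
    for index, sample in enumerate(samples):
        accumulated += sample * sample
        if accumulated >= threshold:
            candidates.append(index)  # breathing position information is emitted here
            accumulated = 0.0         # clear the cumulative value and start again
    return candidates


# Louder speech reaches the threshold sooner, so candidates appear more often.
quiet = breathing_candidates([1.0] * 6, threshold=3.0)
loud = breathing_candidates([2.0] * 4, threshold=8.0)
```

With the same threshold, a louder signal produces candidates at shorter intervals, which is the physiological effect the description aims to reproduce.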

[0018] Letting v(t) denote the speech waveform, the cumulative value P(t1, t2) of the speech power from time t1 to time t2 is given by Equation 1 below.

【００１９】[0019]

【数１】 [Equation 1]
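The image of Equation 1 is not reproduced in this text. A plausible form, assuming that the instantaneous speech power is the squared amplitude of the waveform v(t), would be the following; the exact expression in the original publication may differ (for example in normalization).

```latex
P(t_1, t_2) = \int_{t_1}^{t_2} v(t)^2 \, dt
```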

[0020] The breath sound and silence insertion control unit 6 receives the breathing position information from the breathing position candidate detection unit 5, together with the punctuation position information and semantic boundary position information that the language processing unit 1 extracts from the input text. Hereinafter, the punctuation position information and the semantic boundary position information are collectively referred to as break position information in the grammatical/semantic analysis.

[0021] Based on the break position information in the grammatical/semantic analysis (the punctuation position information and semantic boundary position information) and the breathing position information, the breath sound and silence insertion control unit 6 performs the following control.

[0022] (1) The control unit determines a breath sound insertion position based on the breathing position indicated by the breathing position information and the break positions indicated by the break position information in the grammatical/semantic analysis, and controls the breath sound and silence insertion unit 3 so that a breath sound is inserted at the determined position.

[0023] That is, the breath sound insertion position is set to the first break position in the grammatical/semantic analysis, among the punctuation positions and semantic boundary positions, that appears after the breathing position. Here, the breathing position means the position in the speech signal corresponding to the moment at which the breathing position candidate detection unit 5 outputs the breathing position information.
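The rule for choosing the breath sound insertion position amounts to a "first boundary after the candidate" lookup. The sketch below assumes that positions are integers on a common time axis, that `boundaries` is sorted, and that a boundary coinciding exactly with the candidate does not count as "after"; the function name is hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def breath_insertion_position(candidate: int, boundaries: Sequence[int]) -> Optional[int]:
    """Return the first break position strictly after `candidate`.

    `boundaries` holds the sorted punctuation and semantic boundary positions.
    Returns None when no break position remains after the candidate.
    """
    i = bisect_right(boundaries, candidate)
    return boundaries[i] if i < len(boundaries) else None


# A candidate raised at position 12 is deferred to the next break position, 18.
position = breath_insertion_position(12, [5, 10, 18, 30])
```

Deferring the breath to a grammatical or semantic break keeps the breath sound from landing in the middle of a phrase.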

[0024] (2) The control unit determines a silence insertion position based on the breathing position indicated by the breathing position information and the break positions indicated by the break position information in the grammatical/semantic analysis, and controls the breath sound and silence insertion unit 3 so that silence is inserted at the determined position.

[0025] In principle, a silence insertion position is set at each break position in the grammatical/semantic analysis. However, if a breath sound insertion position lies near a silence insertion position determined in this way, that silence insertion position is cancelled. In addition, if another break position in the grammatical/semantic analysis lies near a silence insertion position determined in this way, one of those positions is selected as the silence insertion position.
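The silence placement rules can be sketched as a single filtering pass. The description does not quantify "near" or say which of several neighbouring break positions is selected, so the `min_gap` parameter and the choice of the earliest position in a cluster are assumptions made only for illustration.

```python
from typing import Iterable, List, Sequence


def silence_positions(
    boundaries: Iterable[int],
    breath_positions: Sequence[int],
    min_gap: int,
) -> List[int]:
    """Choose silence insertion positions from the break positions.

    Assumptions: two positions are "near" when they are less than `min_gap`
    apart, and the earliest break position of a cluster is the one kept.
    """
    kept: List[int] = []
    for boundary in sorted(boundaries):
        if any(abs(boundary - breath) < min_gap for breath in breath_positions):
            continue  # cancelled: a breath sound is already inserted nearby
        if kept and boundary - kept[-1] < min_gap:
            continue  # another break position nearby was already selected
        kept.append(boundary)
    return kept


# Break positions 18 and 20 are dropped because a breath sound is inserted at 18.
pauses = silence_positions([5, 10, 18, 20, 30], breath_positions=[18], min_gap=3)
```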

[0026] When text is read aloud, some pauses include a breath and others do not. Pauses with a breath are relatively long and are important as semantic breaks, whereas pauses without a breath are relatively short and not necessarily important.

[0027] The timing of breaths during reading aloud is related to the amount of air remaining in the lungs (the residual breathing volume), that is, to the amount of air exhaled. In the above embodiment, the amount of exhaled air is calculated from the cumulative power of the speech generated by the speech generation unit 2. When the cumulative power reaches the predetermined value, the amount of air remaining in the lungs is judged to have fallen to or below a predetermined level, and the first break position in the grammatical/semantic analysis after that point is determined as the breathing position. The positions at which a reader would take a breath when reading the input text aloud are therefore more likely to be chosen as breathing positions. Furthermore, because a breath sound rather than silence is inserted at the detected breathing position, the input text can be converted into more natural speech.

[0028] Alternatively, a table giving the relationship between the amount of exhaled air and loudness may be prepared for each phoneme. The amount of exhaled air is then calculated for each phoneme of the input text from the loudness of its speech, and the breathing position information is generated based on the cumulative value of the calculated amounts.
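The table-based variant can be sketched as follows. The phoneme set, the table values, and the linear relation between loudness and exhaled air are invented for illustration; the description only states that a per-phoneme table relating the two quantities is prepared.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical table: air exhaled per unit of loudness for each phoneme.
AIR_PER_LOUDNESS: Dict[str, float] = {"a": 1.0, "s": 2.5, "k": 1.5}


def candidates_from_phonemes(
    phonemes: Sequence[Tuple[str, float]],
    capacity: float,
    table: Dict[str, float] = AIR_PER_LOUDNESS,
) -> List[int]:
    """Return the phoneme indices at which the accumulated exhaled air reaches `capacity`.

    Each item of `phonemes` is (phoneme, loudness). The exhaled air of one
    phoneme is assumed to be table[phoneme] * loudness.
    """
    candidates: List[int] = []
    exhaled = 0.0
    for index, (phoneme, loudness) in enumerate(phonemes):
        exhaled += table[phoneme] * loudness
        if exhaled >= capacity:
            candidates.append(index)
            exhaled = 0.0  # a breath is assumed to refill the lungs
    return candidates


utterance = [("k", 1.0), ("a", 1.0), ("s", 1.0), ("a", 2.0)]
marks = candidates_from_phonemes(utterance, capacity=4.0)
```

Compared with accumulating signal power, a table of this kind lets phonemes that consume more air, such as fricatives, bring the next breath forward even at the same loudness.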

[0029] [2] Description of Second Embodiment

[0030] FIG. 2 shows a text-to-speech conversion apparatus according to a second embodiment of the present invention. In FIG. 2, components identical to those in FIG. 1 are given the same reference numerals and their description is omitted.

[0031] In this example, the breath sound and silence insertion control unit 6 includes a timer 7 that measures a preset time, and the method of determining the breath sound insertion position differs from that of the text-to-speech conversion apparatus of FIG. 1.

[0032] The breathing position information output from the breathing position candidate detection unit 5 is referred to as first breathing position information. The breathing position candidate detection unit 5 integrates the speech power of the speech signal generated by the speech generation unit 2 and, when the integrated value reaches a predetermined value, sends the first breathing position information to the breath sound and silence insertion control unit 6. The breathing position candidate detection unit 5 clears the integrated value and starts a new integration in two cases: when the integrated value reaches the predetermined value, and when the speech generation unit 2 outputs the speech signal immediately following a breath sound insertion position determined by the breath sound and silence insertion control unit 6.

[0033] The timer 7 is reset when the speech generation unit 2 outputs the speech signal immediately following a breath sound insertion position determined by the breath sound and silence insertion control unit 6. If the timer measures the predetermined time between one reset and the next, it outputs second breathing position information.

[0034] In principle, as in the first embodiment, the breath sound insertion position is set to the first break position in the grammatical/semantic analysis that appears after the breathing position indicated by the first breathing position information (referred to as the first breath sound insertion position).

[0035] However, if the second breathing position information is output before the first breathing position information, the breath sound insertion position is set to the first break position in the grammatical/semantic analysis that appears after the breathing position indicated by the second breathing position information (referred to as the second breath sound insertion position).
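The interplay of the power integrator and timer 7 in the second embodiment can be sketched as a frame-by-frame simulation. The frame-based processing, the function name, and the convention that a break position at frame f lies just before frame f are assumptions; the sketch also folds the first and second breathing position information into one pending flag, since whichever arrives first decides the insertion position.

```python
from typing import Iterable, List, Sequence


def breath_insertions(
    frame_powers: Sequence[float],
    frame_seconds: float,
    power_limit: float,
    time_limit: float,
    boundaries: Iterable[int],
) -> List[int]:
    """Return the frame indices before which a breath sound is inserted."""
    boundary_set = set(boundaries)
    insertions: List[int] = []
    power_sum = 0.0   # integrated speech power since the last breath
    elapsed = 0.0     # time measured by the timer since the last breath
    pending = False   # breathing position information has been issued
    for frame, power in enumerate(frame_powers):
        if pending and frame in boundary_set:
            insertions.append(frame)
            # The speech right after the insertion resets integrator and timer.
            power_sum = 0.0
            elapsed = 0.0
            pending = False
        power_sum += power * frame_seconds
        elapsed += frame_seconds
        if not pending and (power_sum >= power_limit or elapsed >= time_limit):
            pending = True  # first (power) or second (timer) information
    return insertions


# Quiet speech: the timer, not the power integrator, triggers each breath.
by_timer = breath_insertions([1.0] * 10, 1.0, power_limit=100.0, time_limit=3.0,
                             boundaries=[2, 5, 9])
```

The timer guarantees a breath after a fixed interval even when the speech is quiet enough that the integrated power would take much longer to reach its limit.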

【００３６】[0036]

[Effect of the Invention] According to the present invention, input text can be converted into more natural speech.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a text-to-speech conversion apparatus according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of a text-to-speech conversion apparatus according to a second embodiment of the present invention.

[Explanation of symbols]

1 Language processing unit
2 Speech generation unit
3 Breath sound and silence insertion unit
4 Speech output unit
5 Breathing position candidate detection unit
6 Breath sound and silence insertion control unit
7 Timer

Claims

[Claims]

1. A text-to-speech conversion method for converting input text into speech, characterized in that a breathing position is determined based on the input text and a breath sound is inserted at the determined breathing position.

2. The text-to-speech conversion method according to claim 1, wherein the breathing position is determined based on a cumulative value of the speech power of a speech signal generated from the input text.

3. The text-to-speech conversion method according to claim 1, wherein the first break position in the grammatical/semantic analysis that appears after a breathing position candidate, the candidate being determined based on a cumulative value of the speech power of a speech signal generated from the input text, is determined as the breathing position.

4. A method for determining a breathing position in a text-to-speech conversion apparatus that converts input text into speech, characterized in that the amount of air exhaled when the input text is read aloud is predicted and the breathing position is determined based on the predicted amount of exhaled air.

5. A method for determining a breathing position in a text-to-speech conversion apparatus that converts input text into speech, characterized in that a speech signal corresponding to the input text is generated based on the input text and the breathing position is determined based on a cumulative value of the speech power of the generated speech signal.

6. The method for determining a breathing position in a text-to-speech conversion apparatus according to claim 5, wherein a breathing position candidate is determined based on the cumulative value of the speech power of the speech signal, and the first break position in the grammatical/semantic analysis that appears after the determined candidate is determined as the breathing position.