JPH03141399A

JPH03141399A - Voice parameter coupling system

Info

Publication number: JPH03141399A
Application number: JP1280816A
Authority: JP
Inventors: Tetsuya Sakayori; 哲也酒寄
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1989-10-27
Filing date: 1989-10-27
Publication date: 1991-06-17

Abstract

PURPOSE: To synthesize speech that meets the requirements for naturalness and clarity by selectively using a plurality of different tables to couple the speech parameters between speech segments. CONSTITUTION: The connection point position of the preceding segment A is expressed as the number of frames (a) counted from its final frame, and the connection point position of the succeeding segment B is expressed as the number of frames (b) counted from its first frame. The connection method is expressed by an interpolation function (f). The most appropriate connection parameters are determined for each condition and characteristic, and a corresponding table is prepared for each, for example tables (a) and (b). The speech parameter in the i-th frame (0<=i<=N1) of the connecting part is then calculated. The connection parameters, such as the connection method and the connection point position, are stored in a memory device as tables for each characteristic, such as the purpose of use of the synthesized speech, the required tone, and the speaker characteristics, and are selected by the table number (t). The clarity and smoothness of the synthesized speech are thereby obtained in a balance suited to the conditions.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field: The present invention relates to a speech parameter coupling method for a rule-based speech synthesizer.

Prior Art: As computer processing of Japanese in word processors, translation devices, and the like has become widespread, the need for devices that can output arbitrary speech has grown.

A speech output device is required to produce synthesized speech that is clear, fluent and easy to listen to, and rich in sound quality. However, the technology for synthesizing arbitrary speech has many unsolved problems, and synthesized speech has suffered from shortcomings such as a lack of clarity and a poor sense of the rhythm characteristic of Japanese.

Known speech parameter coupling methods for rule-based speech synthesis include, for example: ① Hakode and Ishikawa, "A study of parameter coupling methods in the LSP-CV method," Proceedings of the Acoustical Society of Japan, 1-2-18 (October 1983); ② Arai, "LSP-CV rule-based speech synthesis using CV syllable arrangement rules," Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J70-A, No. 5, pp. 836-843; and ③ Saito and Oshima, "Rule-based speech synthesis using applied deformation of segments," Proceedings of the Acoustical Society of Japan, 3-4-11 (October 1986). In ①, the parameters between segments are connected by a straight line (with the position of the connection point set appropriately according to the type of phoneme). In ②, a bias frame is set near the beginning of the succeeding segment, and the parameters of the frames before it are replaced with the parameters of the bias frame to make the connection. In ③, smooth parameter changes are achieved by an applied deformation function rather than by straight-line connection.

There are two main challenges in connecting speech parameters between segments: one is to express the phonemic character of the speech clearly, and the other is to connect the segments smoothly.

In many cases these two requirements conflict, and it is difficult to satisfy both completely at the same time. Each of the conventional techniques described above has no choice but to fix a connection method and parameters that satisfy both requirements only moderately. However, the right balance between the two normally differs depending on the purpose of use of the synthesized speech, the tone required of it, the speaker characteristics required of it, and so on. For example, when synthesized speech is used to proofread a manuscript by reading it back, the phonemes must be pronounced distinctly even at the expense of smoothness. Conversely, when the purpose is to listen to a long text and grasp its general meaning, phonemic clarity is less important, but listening to speech that is not smooth for a long time tires the listener.

Purpose: The present invention has been made in view of the circumstances described above, and in particular aims to achieve the clarity and smoothness of synthesized speech in a balance suited to the situation.

Constitution: To achieve the above object, the present invention concerns a rule-based speech synthesizer that reads out parameter sequences of speech segments prepared in advance according to an input symbol string expressing phonemes and prosody, connects the speech segment parameter sequences according to speech parameter coupling rules, and adds prosody according to prosody control rules to synthesize speech. The invention is characterized in that connection parameters used when coupling speech parameters, such as the connection method and the connection point position, are stored in tables, and that, from among a plurality of stored tables, different tables are selectively used to couple the speech parameters depending on the speech rate of the synthesized speech, the purpose of use of the synthesized speech, the tone required of the synthesized speech, the speaker characteristics required of the synthesized speech, and so on.

The present invention is described below on the basis of an embodiment.

FIG. 1 is a flowchart for explaining one embodiment of the present invention. The specific flow of processing is described below with reference to FIG. 1.

First, the table number t is set according to the purpose of use of the synthesized speech, the tone required of the synthesized speech, the speaker characteristics required of the synthesized speech, and so on. Here, the purpose of use of the synthesized speech is, for example, "some kind of education or instruction for the listener," "accurate transmission of information such as news or database contents," "warning or guidance in an emergency," or "dialogue with a human." The tone required of the synthesized speech is, for example, a "commanding tone," a "questioning tone," a "recitation tone," a "tearful way of speaking," a "coaxing way of speaking," or an "angry way of speaking." The speaker characteristics required of the synthesized speech are, for example, "gender," "age," "place of origin," "a way of speaking peculiar to a specific occupation (bus guide, rakugo storyteller, announcer, etc.)," or "a specific individual."
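The first step above, setting the table number t from the usage conditions, can be pictured as a simple lookup. The following is only a minimal sketch: the condition labels, the numbers assigned to them, the default value, and the function name `select_table_number` are all hypothetical and do not come from the patent, which does not say how t is derived.

```python
# Hypothetical mapping from usage conditions to a table number t.
# The keys and the numbers are illustrative assumptions only.
TABLE_FOR_CONDITION = {
    ("proofreading", "recitation"): 0,    # favor distinct phonemes
    ("long_listening", "recitation"): 1,  # favor smooth transitions
}
DEFAULT_TABLE = 0  # assumed fallback when no entry matches


def select_table_number(purpose: str, tone: str) -> int:
    """Return the table number t for the given purpose of use and tone."""
    return TABLE_FOR_CONDITION.get((purpose, tone), DEFAULT_TABLE)


print(select_table_number("long_listening", "recitation"))
```

A real system would presumably also take the speaker characteristics and the speech rate into account; they are left out here to keep the sketch short.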

A table stores the connection parameters used when connecting segments, such as the connection point positions and the connection method.

FIG. 2 shows an example of these connection parameters.

In FIG. 2, A denotes the preceding segment, B the succeeding segment, and I the connecting part. N_I denotes the number of frames in part I.

FIGS. 3(a) and 3(b) each show an example of the plurality of tables prepared for different situations and characteristics. The connection parameters stored in a table are as follows: a is the connection point position of the preceding segment, expressed as the number of frames counted from its final frame; b is the connection point position of the succeeding segment, expressed as the number of frames counted from its first frame; and f is the connection method, expressed as an interpolation function.
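One way to picture such tables in code is shown below. This is a sketch under assumptions, not the layout of FIG. 3: the class name `ConnectionParams`, the row key `"default"`, the two interpolation functions, and all numeric values are hypothetical. Only the three fields a, b, and f are taken from the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ConnectionParams:
    a: int  # connection point of the preceding segment, counted back from its final frame
    b: int  # connection point of the succeeding segment, counted from its first frame
    f: Callable[[float], float]  # interpolation function, assumed to map [0, 1] to [0, 1]


def linear(u: float) -> float:
    """Straight-line interpolation weight."""
    return u


def smoothstep(u: float) -> float:
    """A smoother S-shaped weight, used here purely as an illustration."""
    return u * u * (3.0 - 2.0 * u)


# Two hypothetical tables, indexed by the table number t.
TABLES: List[Dict[str, ConnectionParams]] = [
    {"default": ConnectionParams(a=1, b=1, f=linear)},      # t = 0: short, sharp joins
    {"default": ConnectionParams(a=3, b=3, f=smoothstep)},  # t = 1: longer, smoother joins
]

params = TABLES[1]["default"]
print(params.a, params.b, params.f(0.5))
```

Keying rows by a string is a placeholder; rows might instead be indexed by phoneme type, as in the prior art discussed earlier, but the text does not say.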

The most appropriate connection parameters are determined for each of the situations and characteristics described above, and a corresponding table is prepared for each.

The speech parameter in the i-th frame (0 ≦ i < N_I) of the connecting part is then calculated by the following formula.

[Formula not reproduced in this text]

Here, X[i] denotes the speech parameter of X in the i-th frame. The connection parameters used at this point, such as the connection method and the connection point positions, are stored in the storage device as a plurality of tables, one for each characteristic such as the purpose of use of the synthesized speech, the tone required of it, and the speaker characteristics required of it. The table to use is selected by the table number t.
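Because the formula itself does not survive in this text, the following sketch only illustrates one plausible reading of the connection step and should not be taken as the patent's equation. It assumes that each frame of the connecting part is a weighted blend of the frame at the connection point of A and the frame at the connection point of B, with the weight given by f. The function name `connect_segments`, the weight argument `(i + 1) / (n_i + 1)`, and the treatment of N_I as a free parameter `n_i` are all assumptions.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one frame of speech parameters


def connect_segments(
    seg_a: Sequence[Frame],
    seg_b: Sequence[Frame],
    a: int,
    b: int,
    n_i: int,
    f: Callable[[float], float],
) -> List[Frame]:
    """Join seg_a and seg_b with n_i interpolated frames (assumed blend form)."""
    cut_a = len(seg_a) - 1 - a   # index of the connection point in A
    end_a = seg_a[cut_a]         # frame a frames back from A's final frame
    start_b = seg_b[b]           # frame b frames in from B's first frame
    bridge: List[Frame] = []
    for i in range(n_i):         # 0 <= i < N_I
        w = f((i + 1) / (n_i + 1))  # assumed weight, strictly between the endpoints
        bridge.append([(1.0 - w) * x + w * y for x, y in zip(end_a, start_b)])
    # Keep A up to its connection point and B from its connection point onward.
    return list(seg_a[: cut_a + 1]) + bridge + list(seg_b[b:])


seg_a = [[0.0], [1.0], [2.0]]
seg_b = [[10.0], [11.0], [12.0]]
print(connect_segments(seg_a, seg_b, a=1, b=1, n_i=1, f=lambda u: u))
```

Under this reading, a table with small a and b and a steep f keeps more of each original segment and so favors distinct phonemes, while larger a and b with a gentle f favors smoothness.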

A prosody pattern is then set, a speech signal is synthesized by a digital filter or the like, and the synthesized speech is obtained through a D/A converter. These methods are well known, so a detailed description is omitted.

Effect: As is clear from the above description, according to the present invention, a plurality of different tables are selectively used to couple the speech parameters between segments according to the purpose of use of the synthesized speech, the tone required of it, the speaker characteristics required of it, and so on. This makes it possible to synthesize speech that meets the requirements for naturalness and clarity, which differ with the purpose of the utterance, the tone, the speaker characteristics, and so on.

[Brief description of the drawings]

FIG. 1 is a flowchart for explaining one embodiment of the present invention, FIG. 2 is a diagram showing an example of connection parameters, and FIGS. 3(a) and 3(b) are diagrams each showing a table in which connection parameters are stored.

Claims

[Claims]

1. A speech parameter coupling method for a rule-based speech synthesizer that reads out parameter sequences of speech segments prepared in advance according to an input symbol string expressing phonemes and prosody, connects the speech segment parameter sequences according to speech parameter coupling rules, and adds prosody according to prosody control rules to synthesize speech, characterized in that connection parameters used when coupling speech parameters, such as the connection method and the connection point position, are stored in tables, and that, from among a plurality of stored tables, different tables are selectively used to couple the speech parameters depending on the speech rate of the synthesized speech, the purpose of use of the synthesized speech, the tone required of the synthesized speech, the speaker characteristics required of the synthesized speech, and so on.