JP2005321631A - Speech synthesizing method and its system - Google Patents

Speech synthesizing method and its system

Info

Publication number
JP2005321631A
Authority
JP
Japan
Prior art keywords
speech
unit
units
synthesized
speaker
Prior art date
Legal status
Granted
Application number
JP2004139861A
Other languages
Japanese (ja)
Other versions
JP4297433B2 (en)
Inventor
Miki Hasebe (未来 長谷部)
Hideyuki Mizuno (秀之 水野)
Current Assignee
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP2004139861A
Publication of JP2005321631A
Application granted
Publication of JP4297433B2
Anticipated expiration
Status: Expired - Fee Related

Abstract

PROBLEM TO BE SOLVED: To provide a speech synthesis method, and a system for it, that can deliver synthesized speech of higher quality than before according to the situation in which the synthesized speech is reproduced.
SOLUTION: A DB search unit 2 searches a database 1, which stores a speech corpus containing speech units of the various units that make up a language, for the speech units usable for synthesis for each unit of the input text. A speech unit selection unit 3 selects the optimal combination of speech units from the retrieved candidates for each unit of the text. A situation determination unit 4 determines whether the acoustic device that reproduces the synthesized speech is headphones or a speaker: with headphones, each speech unit is output to a connection unit 6 as it is; with a speaker, each speech unit is routed through a signal processing unit 5, which applies signal processing that corrects the prosody, before being output to the connection unit 6.

Description

The present invention relates to a text-to-speech synthesis technique that converts text into speech and reproduces (outputs) it, where the text may be entered directly from a keyboard or the like, read from a storage medium in which it was stored in advance, or received from another device via a communication medium.

Text-to-speech synthesis is currently used in many situations, such as telephone-based information services (stock price guidance, for example) and the reading aloud of e-mail and Web pages.

One conventional speech synthesis method (1) searches a speech corpus, which contains speech units of the various units that make up a language (phonemes, phonological units, words, and the like), for units that correspond to the units in the input text and are usable for synthesis, selects the optimal speech unit from the retrieved candidates, repeats this for every unit in the text, and connects the selected speech units as they are to produce synthesized speech (see Patent Document 1).

In another method (2), each selected speech unit is first subjected to signal processing that matches it to the target prosody of the synthesis, and the processed units are then connected into synthesized speech (see Non-Patent Document 1).

Patent Document 1: Japanese Patent No. 2761552
Non-Patent Document 1: Satoshi Takano and Masanobu Abe, "A New F0 Modification Algorithm by Manipulating Harmonics of Magnitude Spectrum", Eurospeech '99
Non-Patent Document 2: Takano and Abe, "Evaluation of the Effect of Prosody Modification on Sound Quality by a Partial Substitution Experiment", Proceedings of the Acoustical Society of Japan, 1-7-11, 2000(3), pp. 217-218

However, while method (1) can synthesize speech with the natural quality of a human voice, the prosody of the synthesized speech may differ from the target prosody; and while method (2) yields exactly the target prosody, the signal processing may impair the natural, human quality of the voice.

In other words, in conventional text-to-speech synthesis the naturalness of the synthesized speech and the accuracy of its prosody are in a trade-off relationship, and the two cannot be satisfied at the same time.

Thus, current speech synthesis technology has not achieved quality comparable to human speech, and there has been strong demand for improvements in the quality of synthesized speech.

An object of the present invention is to realize a speech synthesis method and apparatus that can provide synthesized speech of higher quality than before according to the situation in which the speech synthesis technology is used, in particular the situation in which the synthesized speech is reproduced.

As described above, the naturalness of synthesized speech and the accuracy of its prosody are in a trade-off relationship: attempting to synthesize the prosody exactly degrades the sound quality.

The degree of sound-quality degradation caused by the signal processing used to synthesize the prosody accurately depends on the amount and direction of the prosody modification, the signal processing method, and other factors (see Non-Patent Documents 1 and 2).

Moreover, this degradation is perceived differently depending on whether the synthesized speech is heard through headphones or from a speaker; naturally, a listener who concentrates on the synthesized speech alone through headphones notices the degradation more easily.

The situations in which synthesized speech is reproduced are thus varied: it may be heard through headphones or from a speaker, and when heard from a speaker, the place may be quiet, such as a room, or noisy, such as a station concourse. How easily the degradation of the synthesized speech is perceived differs with each situation and environment.

The present invention therefore takes the reproduction situation into account: in situations where the degradation caused by signal processing is not conspicuous, signal processing is performed to correct the prosody. By presenting synthesized speech with correct prosody while keeping the user from perceiving the degradation, the invention resolves the problem described above of reconciling the sound quality of synthesized speech with prosodic correctness.

According to the present invention, synthesized speech of higher quality than before can be provided in accordance with the situation in which the synthesized speech is reproduced.

In this application, "headphones" includes all acoustic devices that are used in direct contact with the body of a single person and transmit an audio signal to that person's ears, and "speaker" includes all acoustic devices that are used without contact with the human body and transmit an audio signal, mainly through the air, to the ears of one or more people.

A feature of the present invention is that it considers the usage situation of the speech synthesis technology and decides whether to perform prosody modification by signal processing according to the situation in which the synthesized speech is reproduced. In a noisy environment where the degradation is not noticeable, the prosody can be corrected before the speech is reproduced, so that speech of higher quality than before can be synthesized.

As a result, higher-quality synthesized speech can be provided in services such as information guidance, and speech synthesis becomes usable in fields where, because of quality problems, it could not be used before.

FIG. 1 shows an example of an embodiment of the speech synthesizer of the present invention. In the figure, 1 is a database (DB), 2 is a database search unit (DB search unit), 3 is a speech unit selection unit, 4 is a situation determination unit, 5 is a signal processing unit, and 6 is a connection unit. The operation of each unit is described below together with its configuration.

DB 1 stores a speech corpus containing speech units of the various units that make up a language (phonemes, phonological units, words, and the like); specifically, it holds the information needed for synthesis, such as speech waveforms, prosodic information, phoneme label sequences corresponding to the utterance content, and label data indicating the boundaries of the speech units.
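
For concreteness, one might represent a corpus entry as follows. This is a minimal sketch; the field names and types are assumptions rather than anything specified in the patent:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SpeechUnit:
    """One entry of the speech corpus stored in DB 1 (hypothetical layout)."""
    phoneme_label: str         # phoneme label for this unit
    waveform: np.ndarray       # raw speech samples of the unit
    f0_contour: np.ndarray     # fundamental-frequency contour (prosodic information)
    duration: float            # unit duration in seconds
    boundary: tuple[int, int]  # sample indices of the unit boundary in the source utterance
```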

The DB search unit 2 receives input information 101, which contains the input text together with control information for speech synthesis obtained by a text analysis unit (not shown): the phoneme sequence of the text, the target prosody of the synthesis, the designation of the database and signal processing method to be used, and so on. For each of the above units in the text, it searches database 1 for speech units usable for synthesis and passes the retrieved speech units 102 to the speech unit selection unit 3 as the search result.
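
Under the assumption that the corpus is indexed by phoneme label, the search step reduces to a lookup per unit of the analyzed text. The patent does not specify the retrieval mechanism, so the following is only a sketch:

```python
from collections import defaultdict

def build_index(corpus):
    """Index corpus entries (SpeechUnit objects) by phoneme label."""
    index = defaultdict(list)
    for unit in corpus:
        index[unit.phoneme_label].append(unit)
    return index

def search_units(index, phoneme_sequence):
    """For each unit of the text, return all candidates usable for synthesis."""
    return [index[p] for p in phoneme_sequence]
```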

The speech unit selection unit 3 selects the optimal combination of speech units from the units 102 retrieved for each unit of the text by the DB search unit 2. Specifically, it computes, as costs, factors related to the quality of the synthesized speech, such as prosody, phonetic environment, connectivity, and linguistic information, searches for the combination of speech units that minimizes the cost, selects this optimal combination 103, and passes it to the situation determination unit 4 as the selection result.
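
Cost-minimizing unit selection of this kind is commonly implemented as a dynamic-programming (Viterbi-style) search over the candidate lattice. The sketch below illustrates the idea; the cost functions target_cost and join_cost are hypothetical placeholders for the prosody, phonetic-environment, connectivity, and linguistic costs the patent mentions:

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per text position so that the sum of target and
    concatenation costs is minimal (Viterbi search over the lattice).

    candidates: list of lists; candidates[i] holds the units retrieved
    for position i of the text.
    """
    # best[i][j] = (cost of cheapest path ending at candidates[i][j], back-pointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, u) + target_cost(i, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost, prev))
        best.append(row)
    # trace the cheapest path back from the last position
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```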

The situation determination unit 4 passes the optimal combination of speech units 103 received from the speech unit selection unit 3 either to the signal processing unit 5 or to the connection unit 6, as described later, according to the situation in which the synthesized speech is reproduced.

When the combination of speech units 103 is passed from the situation determination unit 4, the signal processing unit 5 performs signal processing on each speech unit in the combination to correct its prosody so that it matches the target prosody of the synthesis, and passes the result to the connection unit 6 as the combination of corrected speech units 104.
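
Purely for illustration, a crude duration correction can be done by resampling a unit to its target length. This naive sketch is only a stand-in for a proper prosody-modification algorithm (such as the F0-modification method of Non-Patent Document 1 or PSOLA-style processing), which the patent leaves to known techniques:

```python
import numpy as np

def correct_duration(waveform, target_len):
    """Naively stretch or compress a unit to target_len samples by linear
    interpolation. Note: this shifts pitch along with duration, so it only
    stands in for a real prosody-modification algorithm."""
    src = np.arange(len(waveform))
    dst = np.linspace(0, len(waveform) - 1, target_len)
    return np.interp(dst, src, waveform)
```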

The connection unit 6 connects the combination of speech units 103 passed from the situation determination unit 4, or the combination of corrected speech units 104 passed from the signal processing unit 5, and outputs the result as synthesized speech (data) 105 to a reproduction device (not shown).
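
A minimal sketch of the connection step follows. The short linear crossfade at each joint is an assumption added to smooth the splices; the patent simply says the units are connected:

```python
import numpy as np

def concatenate(units, fade=32):
    """Join unit waveforms end to end, blending `fade` samples at each joint."""
    out = units[0].astype(float)
    ramp = np.linspace(0.0, 1.0, fade)
    for u in units[1:]:
        u = u.astype(float)
        out[-fade:] = out[-fade:] * (1 - ramp) + u[:fade] * ramp
        out = np.concatenate([out, u[fade:]])
    return out
```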

The configuration and operation of DB 1, the DB search unit 2, the speech unit selection unit 3, the signal processing unit 5, the connection unit 6, and the text analysis unit (not shown) are no different from those of existing speech synthesizers, so their details are omitted.

FIG. 2 shows the flow of processing in the situation determination unit 4; the operation of the situation determination unit 4 is described below following this flow.

The situation determination unit 4 receives the optimal combination of speech units 103 from the speech unit selection unit 3, together with device identification information 106 indicating whether the acoustic device that reproduces the synthesized speech is headphones or a speaker and, when the device is a speaker, the ambient noise level (information) 107 at the place where the speaker is installed. From the device identification information 106 it determines whether the reproducing device is headphones or a speaker (s1); if it is headphones, the influence of sound-quality degradation caused by signal processing is judged to be large, and the optimal combination of speech units 103 is passed to the connection unit 6 as it is (s2).

If, on the other hand, the device identification information 106 indicates that the reproducing device is a speaker, the unit determines whether the noise level 107 is at or below a predetermined threshold (s3). If it is at or below the threshold, the influence of sound-quality degradation caused by signal processing is judged to be large, and the optimal combination of speech units 103 is passed to the connection unit 6 (s2); if it is above the threshold, the influence is judged to be small, and the optimal combination of speech units 103 is passed to the signal processing unit 5 (s4).

Alternatively, step s3 may be omitted: whenever the device that reproduces the synthesized speech is a speaker, the influence of sound-quality degradation is always judged to be small, and the optimal combination of speech units 103 is passed to the signal processing unit 5.
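
The routing decision of FIG. 2 reduces to a few lines of code. The sketch below mirrors steps s1 to s4; the function name, the threshold value, and correct_prosody (a stand-in for the processing done by signal processing unit 5) are assumptions, and concatenate is the sketch shown earlier:

```python
def route_units(units, device, noise_level=None, threshold=40.0):
    """Steps s1-s4 of FIG. 2: decide whether the selected units go straight
    to concatenation or through prosody correction first.

    device: "headphones" or "speaker" (device identification information 106)
    noise_level: ambient level around the speaker (information 107); if None,
    the variant that omits step s3 results (speaker -> always process)
    threshold: assumed value; the patent only says "predetermined"
    """
    if device == "headphones":         # s1: degradation would be conspicuous
        return concatenate(units)      # s2: connect the units as they are
    if noise_level is not None and noise_level <= threshold:
        return concatenate(units)      # s3 -> s2: quiet place, skip processing
    corrected = [correct_prosody(u) for u in units]  # s4: signal processing unit 5
    return concatenate(corrected)
```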

As the device identification information 106, one can use designation information entered by the user indicating whether the acoustic device is headphones or a speaker, or a signal indicating whether a headphone plug is inserted in the headphone jack of a device such as a personal computer or television. As the noise level 107, one can use a signal averaged to some extent on the time axis, obtained by passing the output of a microphone or the like placed near the speaker through a suitable integrating circuit.
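
Such time-axis averaging can be approximated in software by an exponential moving average of the rectified microphone signal. This is a minimal sketch; the smoothing constant alpha is an assumption:

```python
import numpy as np

def noise_level(mic_samples, alpha=0.01):
    """Approximate the integrating circuit described above: an exponential
    moving average of the rectified microphone signal."""
    level = 0.0
    for x in np.abs(mic_samples):
        level = (1 - alpha) * level + alpha * x
    return level
```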

FIG. 1 is a block diagram showing an example of an embodiment of the speech synthesizer of the present invention.
FIG. 2 is a flowchart of the processing in the situation determination unit in FIG. 1.

Explanation of symbols

1: database (DB); 2: database search unit (DB search unit); 3: speech unit selection unit; 4: situation determination unit; 5: signal processing unit; 6: connection unit.

Claims (4)

1. A speech synthesis method in which, using a database storing a speech corpus that contains speech units of the units making up a language, a computer searches the database for speech units that correspond to the units in an input text and are usable for synthesis, selects the optimal speech unit from the retrieved speech units, repeats this for all the units in the text, and connects the selected speech units to produce synthesized speech, wherein the computer:
determines whether the acoustic device that reproduces the synthesized speech is headphones or a speaker;
if the acoustic device is headphones, connects the selected speech units as they are to produce the synthesized speech; and
if the acoustic device is a speaker, performs signal processing that corrects the prosody of each selected speech unit and then connects the units to produce the synthesized speech.

2. The speech synthesis method according to claim 1, wherein the computer:
when the acoustic device is a speaker, further determines whether the ambient noise level at the place where the speaker is installed is at or below a predetermined threshold;
if the noise level is at or below the threshold, connects the selected speech units as they are to produce the synthesized speech; and
if the noise level is above the threshold, performs signal processing that corrects the prosody of each selected speech unit and then connects the units to produce the synthesized speech.

3. A speech synthesizer comprising: a database storing a speech corpus that contains speech units of the units making up a language; a database search unit that searches the database for speech units usable for synthesis for each unit in an input text; a speech unit selection unit that selects the optimal speech unit from the retrieved speech units for each unit of the text; and a connection unit that connects the speech units corresponding to the units of the text to produce synthesized speech, the speech synthesizer further comprising:
a signal processing unit that performs signal processing to correct the prosody of a speech unit; and
a situation determination unit that determines whether the acoustic device that reproduces the synthesized speech is headphones or a speaker and, if it is headphones, outputs each speech unit selected by the speech unit selection unit to the connection unit as it is, and, if it is a speaker, outputs each speech unit selected by the speech unit selection unit to the connection unit via the signal processing unit.

4. The speech synthesizer according to claim 3, comprising a situation determination unit that determines whether the acoustic device that reproduces the synthesized speech is headphones or a speaker and, if it is headphones, outputs each speech unit selected by the speech unit selection unit to the connection unit as it is, and, if it is a speaker, further determines whether the ambient noise level at the place where the speaker is installed is at or below a predetermined threshold and, if the noise level is at or below the threshold, outputs each selected speech unit to the connection unit as it is, and, if it is above the threshold, outputs each selected speech unit to the connection unit via the signal processing unit.
JP2004139861A 2004-05-10 2004-05-10 Speech synthesis method and apparatus Expired - Fee Related JP4297433B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2004139861A JP4297433B2 (en) 2004-05-10 2004-05-10 Speech synthesis method and apparatus

Publications (2)

Publication Number Publication Date
JP2005321631A (en) 2005-11-17
JP4297433B2 JP4297433B2 (en) 2009-07-15

Family

ID=35468968

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2004139861A Expired - Fee Related JP4297433B2 (en) 2004-05-10 2004-05-10 Speech synthesis method and apparatus

Country Status (1)

Country Link
JP (1) JP4297433B2 (en)

Also Published As

Publication number Publication date
JP4297433B2 (en) 2009-07-15

Legal Events

2006-07-18  A621: Written request for application examination
2009-01-19  A977: Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2009-01-22  A131: Notification of reasons for refusal
2009-03-03  A521: Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
2009-04-08  TRDD/A01: Written decision to grant a patent or to grant a registration (utility model)
2009-04-10  A61: First payment of annual fees (during grant procedure)
R150: Certificate of patent or registration of utility model (ref document number: 4297433; country of ref document: JP)
FPAY: Renewal fee payment, payment until 2012-04-24 (year of fee payment: 3)
FPAY: Renewal fee payment, payment until 2013-04-24 (year of fee payment: 4)
FPAY: Renewal fee payment, payment until 2014-04-24 (year of fee payment: 5)
S531: Written request for registration of change of domicile (JAPANESE INTERMEDIATE CODE: R313531)
R350: Written notification of registration of transfer
LAPS: Cancellation because of no payment of annual fees