JP5935831B2 - Speech synthesis apparatus, speech synthesis method and program

Info

Publication number
JP5935831B2
Authority
JP
Japan
Prior art keywords
phoneme
sound
pronunciation
indicator
vowel
Prior art date
Legal status
Expired - Fee Related
Application number
JP2014128317A
Other languages
Japanese (ja)
Other versions
JP2014170251A (en)
Inventor
久湊 裕司
嘉山 啓
慶二郎 才野
隼人 大下
Current Assignee
Yamaha Corp
Original Assignee
Yamaha Corp
Application filed by Yamaha Corp
Priority to JP2014128317A
Publication of JP2014170251A
Application granted
Publication of JP5935831B2

Landscapes

  • Auxiliary Devices For Music (AREA)

Description

The present invention relates to a technique for synthesizing speech.

Techniques for synthesizing a desired voice have been proposed. For example, Patent Document 1 discloses a technique for assigning desired lyrics to each note designated as a target of speech synthesis (hereinafter referred to as a "designated note"). When lyrics consisting of a plurality of syllables, each containing a vowel, are assigned to a single designated note, the designated note is divided syllable by syllable (vowel by vowel), and the ratio of the time lengths of the syllables is variably set in response to an instruction from the user (operation of a movable knob). For example, when the lyric "きみ (kimi)" is assigned to a single designated note, the ratio of the time lengths of the syllable "き (ki)" and the syllable "み (mi)" obtained by dividing that note is adjusted.

Patent Document 1: JP 2004-258562 A

However, with the technique of Patent Document 1, a single designated note is divided only syllable by syllable (vowel by vowel), so it is difficult to generate a synthesized sound with the subtle expression intended by the user. In view of these circumstances, an object of the present invention is to generate a synthesized sound that precisely reflects the user's intention.

To solve the above problem, a speech synthesis apparatus according to a first aspect of the present invention includes display control means for displaying, on a display device, a phoneme symbol and the start point of a pronunciation period for each of a plurality of phonemes in time series and for moving the start point of the pronunciation period of each phoneme on the time axis, phoneme by phoneme, in response to an instruction from a user, and speech synthesis means for generating a synthesized sound of each phoneme over its pronunciation period. With this configuration, the start point of the pronunciation period of each phoneme is moved phoneme by phoneme in response to an instruction from the user, so that, compared with the technique of Patent Document 1 in which the pronunciation period is adjusted in units of syllables, a synthesized sound that precisely reflects the user's intention can be generated.

In a preferred example of the speech synthesis apparatus according to the first aspect, the display control means displays, in different areas and in time series, the pronunciation period of each designated note for which a pitch and a pronunciation character have been specified, and the phoneme symbol and the start point of the pronunciation period of each phoneme corresponding to the pronunciation character of each designated note. In a preferred example of this aspect, when the user instructs movement of the start point of the pronunciation period of a vowel phoneme, the display control means moves that start point in accordance with the instruction and, in conjunction with it, moves the start point of the pronunciation period of the designated note corresponding to the vowel phoneme; when the user instructs movement of the start point of the pronunciation period of a consonant phoneme, the display control means moves that start point in accordance with the instruction while keeping the start point of the pronunciation period of the designated note corresponding to the consonant phoneme unchanged. A configuration in which the start points of the pronunciation periods of vowel phonemes and those of consonant phonemes are displayed in different manners is also preferable.

A speech synthesis apparatus according to a second aspect of the present invention includes display control means for displaying a pronunciation period and a phoneme symbol for each of a plurality of phonemes on a display device in time series and for moving the phoneme symbol in response to an instruction from a user, storage means for storing segment data for each speech segment, and speech synthesis means for generating a synthesized sound of each phoneme over its pronunciation period from the segment data. When the speech synthesis means generates a synthesized sound of one phoneme by using a first section of the speech segment indicated by first segment data (for example, section SA in FIGS. 3 to 5) and a second section of the speech segment indicated by second segment data (for example, section SB in FIGS. 3 to 5), it sets the ratio of the time lengths of the first section and the second section to a ratio corresponding to the position of the phoneme symbol of that phoneme. In this configuration, the ratio of the time lengths of the first and second sections is variably set, so that, compared with the configuration of Patent Document 1 in which the time length is controlled syllable by syllable, a synthesized sound that precisely reflects the user's intention can be generated. Moreover, since this ratio is controlled according to the position of the phoneme symbol specified by the user, the user can intuitively grasp the ratio of the time lengths of the first section and the second section.

For example, the speech synthesis means generates the synthesized sound of the one phoneme by using first segment data of a phoneme chain in which a consonant phoneme corresponding to the first section follows a vowel phoneme, and second segment data of a phoneme chain in which a vowel phoneme follows a consonant phoneme corresponding to the second section.

In a preferred example of the speech synthesis apparatus according to the second aspect, the display control means displays, on the display device in time series, a note indicator corresponding to each of a plurality of designated notes for which pitches have been specified, and displays the pronunciation period and the phoneme symbol of each phoneme superimposed on the note indicators. In this aspect, the note indicators of the designated notes are also used to display the pronunciation period and the phoneme symbol of each phoneme, so that, compared with a configuration in which these are displayed separately from the note indicators, the user can easily grasp the relationship on the time axis between each designated note and each phoneme.

In a preferred example of the speech synthesis apparatus according to the second aspect, the display control means causes the display device to display a connecting portion that connects the note indicators of designated notes for which continuous pronunciation has been instructed, and moves the phoneme symbol along the connecting portion in response to an instruction from the user. In this aspect, the phoneme symbol moves along the connecting portion that connects the designated notes, so that, even in a portion for which continuous pronunciation (legato) has been instructed, the user can intuitively adjust the ratio of the time lengths of the first section and the second section.

A speech synthesis apparatus according to a third aspect of the present invention includes display control means for displaying, on a display device under a common time axis, note indicators that are arranged in time series in correspondence with a plurality of designated notes for which pitches have been specified and whose lengths are selected according to the pronunciation periods of the designated notes, and phoneme indicators that are arranged in time series in correspondence with the phonemes constituting the pronunciation of each designated note and whose lengths are selected according to the pronunciation periods of those phonemes, such that the start point of the note indicator of each designated note coincides on the time axis with the start point of the phoneme indicator of the vowel phoneme constituting the pronunciation of that designated note; storage means for storing segment data for each speech segment; and speech synthesis means for generating a synthesized sound of each phoneme over its pronunciation period from the segment data. The display control means moves the start point of the phoneme indicator of a consonant phoneme in response to an instruction from the user and, when the user instructs movement of the start point of the phoneme indicator of a vowel phoneme, moves that start point and the start point of the note indicator of the designated note corresponding to that vowel phoneme in conjunction with each other. In this aspect, the pronunciation period is adjusted in units of the phonemes of each designated note, so that, compared with the technique of Patent Document 1 in which the pronunciation period is adjusted in units of syllables, a synthesized sound that precisely reflects the user's intention can be generated. Furthermore, because the start point of the phoneme indicator of a vowel phoneme and the start point of the note indicator of the corresponding designated note move in conjunction with each other, the relationship in which the pronunciation of the vowel starts at the start point of the pronunciation period of the designated note is maintained regardless of the pronunciation periods of the individual phonemes.

The music information processing apparatus according to each of the above aspects may be realized by hardware (an electronic circuit) such as a DSP (Digital Signal Processor) dedicated to processing music information, or by cooperation between a general-purpose arithmetic processing unit such as a CPU (Central Processing Unit) and a program. A program according to the first aspect of the present invention causes a computer to function as display control means for displaying, on a display device, a phoneme symbol and the start point of a pronunciation period for each of a plurality of phonemes in time series and for moving the start point of the pronunciation period of each phoneme on the time axis, phoneme by phoneme, in response to an instruction from a user, and as speech synthesis means for generating a synthesized sound of each phoneme over its pronunciation period.

A program according to a second aspect of the present invention causes a computer equipped with storage means for storing segment data for each speech segment to execute a display control process of displaying a pronunciation period and a phoneme symbol for each of a plurality of phonemes on a display device in time series and moving the phoneme symbol in response to an instruction from a user, and a speech synthesis process of generating a synthesized sound of each phoneme over its pronunciation period from the segment data, in which, when a synthesized sound of one phoneme is generated by using a first section of the speech segment indicated by first segment data and a second section of the speech segment indicated by second segment data, the ratio of the time lengths of the first section and the second section is set to a ratio corresponding to the position of the phoneme symbol of that phoneme. This program achieves the same operation and effects as the speech synthesis apparatus according to the first aspect of the present invention.

A program according to a third aspect of the present invention causes a computer equipped with storage means for storing segment data for each speech segment to execute a speech synthesis process of generating a synthesized sound of each phoneme over its pronunciation period from the segment data, and a display control process of displaying, on a display device under a common time axis, note indicators that are arranged in time series in correspondence with a plurality of designated notes for which pitches have been specified and whose lengths are selected according to the pronunciation periods of the designated notes, and phoneme indicators that are arranged in time series in correspondence with the phonemes constituting the pronunciation of each designated note and whose lengths are selected according to the pronunciation periods of those phonemes, such that the start point of the note indicator of each designated note coincides on the time axis with the start point of the phoneme indicator of the vowel phoneme constituting the pronunciation of that designated note, in which the start point of the phoneme indicator of a consonant phoneme is moved in response to an instruction from the user and, when the user instructs movement of the start point of the phoneme indicator of a vowel phoneme, that start point and the start point of the note indicator of the designated note corresponding to that vowel phoneme are moved in conjunction with each other. This program achieves the same operation and effects as the speech synthesis apparatus according to the second aspect of the present invention.

The program of each of the above aspects may be provided to the user in a form stored on a computer-readable recording medium and installed on a computer, or may be provided from a server apparatus in a form distributed via a communication network and installed on a computer.

FIG. 1 is a block diagram of a speech synthesis apparatus according to the first embodiment.
FIG. 2 is a schematic diagram of an editing image.
FIG. 3 is a schematic diagram showing the relationship between note indicators, phoneme indicators, and segment data.
FIG. 4 is a schematic diagram showing the relationship between note indicators, phoneme indicators, and segment data when a phoneme symbol is moved.
FIG. 5 is a schematic diagram showing the relationship between note indicators, phoneme indicators, and segment data when a phoneme symbol is moved.
FIG. 6 is a block diagram of the speech synthesis unit.
FIG. 7 is a schematic diagram showing the relationship between note indicators and phoneme indicators in the second embodiment.
FIG. 8 is a schematic diagram of note indicators in the third embodiment.
FIG. 9 is a schematic diagram of note indicators in the fourth embodiment.
FIG. 10 is a schematic diagram of segment data (VCV type) in a modification.
FIG. 11 is a schematic diagram for explaining expansion of segment data in a modification.

<A: First Embodiment>
FIG. 1 is a block diagram of a speech synthesis apparatus 100 according to the first embodiment of the present invention. The speech synthesis apparatus 100 is an apparatus that synthesizes various voices such as singing voices (hereinafter referred to as "synthesized sounds") and, as shown in FIG. 1, is realized by a computer system including a control device 10, a storage device 12, an input device 14, a display device 16, and a sound emitting device 18. The following description assumes that the speech synthesis apparatus 100 is used to synthesize the singing voice of a musical piece.

The control device (CPU) 10 executes a program PG stored in the storage device 12 to realize a plurality of functions (a display control unit 22, an information generation unit 24, and a speech synthesis unit 26) required to generate an audio signal SOUT. The audio signal SOUT is a signal representing the waveform of the synthesized sound. A configuration in which each function of the control device 10 is realized by a dedicated electronic circuit (DSP), or a configuration in which the functions of the control device 10 are distributed over a plurality of integrated circuits, may also be adopted.

The input device 14 is a device (for example, a mouse or a keyboard) that accepts instructions from the user. The display device 16 (for example, a liquid crystal display) displays images instructed by the control device 10. The sound emitting device 18 (for example, a loudspeaker or headphones) emits sound waves corresponding to the audio signal SOUT generated by the control device 10.

The storage device 12 stores the program PG executed by the control device 10 and various data used by the control device 10 (segment information DV and music information DS). A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of a plurality of types of recording media, may be employed as the storage device 12. A configuration in which the program PG and the data (DV, DS) are distributed over and stored on a plurality of recording media may also be adopted.

The segment information DV is a group of data used as the material of synthesized sounds and, as shown in FIG. 1, includes a plurality of pieces of segment data P corresponding to different speech segments ([a_s], [s_a], [a], ...). A speech segment is a single phoneme (a vowel or a consonant), corresponding to the smallest aurally distinguishable unit of speech, or a phoneme chain in which a plurality of phonemes (typically two or three) are concatenated. For example, a sample sequence of the time waveform of a speech segment is used as the segment data P of that speech segment.

The music information DS is information (score data) indicating the time series of the designated notes constituting a musical piece. Specifically, the music information DS specifies, for each designated note in the piece, its pitch (note number), its pronunciation period, and its pronunciation character. The pronunciation period is defined, for example, by the time at which pronunciation starts and the length of time over which it continues. A pronunciation character is a character (syllabic character) indicating the content of the pronunciation in units of syllables. One or more pronunciation characters can be specified for one designated note.
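A minimal sketch of how the score data described above could be modelled; the record and field names below are assumptions for illustration, not terms from the patent.

```python
from dataclasses import dataclass

@dataclass
class Note:
    """One designated note in the music information DS (illustrative sketch)."""
    pitch: int        # note number (e.g. a MIDI pitch)
    onset: float      # start time of the pronunciation period, in seconds
    duration: float   # length of the pronunciation period, in seconds
    lyric: str        # pronunciation character(s) assigned to the note, e.g. "さ"
```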

The display control unit 22 in FIG. 1 causes the display device 16 to display the editing image 40 of FIG. 2, which the user views in order to create and edit the music information DS. As shown in FIG. 2, the editing image 40 includes a score area 42 that displays the time series of the designated notes and a phoneme area 44 that displays the time series of the phonemes constituting the pronunciation characters of the designated notes.

The score area 42 is a piano-roll-style image area in which a vertical axis corresponding to pitch (pitch axis) and a horizontal axis corresponding to time (time axis) are defined. The user specifies the pitch and pronunciation period of each designated note by operating the input device 14 as appropriate while viewing the score area 42. The display control unit 22 arranges images 51 representing the designated notes specified by the user (hereinafter referred to as "note indicators") in the score area 42 in time series. The position of a note indicator 51 in the direction of the pitch axis is determined according to the pitch specified by the user, and its end points (start point and end point) in the direction of the time axis correspond to the start point and end point of the pronunciation period specified by the user. Accordingly, the length of the note indicator 51 in the direction of the time axis represents the time length of the pronunciation period of the designated note.
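Continuing the sketch, the piano-roll mapping just described (pitch axis to vertical position, pronunciation period to horizontal position and length) might look as follows; the pixel scales and the reuse of the hypothetical Note record from the previous sketch are assumptions.

```python
def note_rect(note, px_per_semitone, px_per_second, lowest_pitch=0):
    """Map a designated note onto the score area (illustrative sketch).

    Returns (x, y, width): x and width follow the pronunciation period on the
    time axis, y follows the pitch axis.
    """
    x = note.onset * px_per_second
    width = note.duration * px_per_second
    y = (note.pitch - lowest_pitch) * px_per_semitone
    return x, y, width
```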

The user can also specify the pronunciation characters (lyrics) of each designated note by operating the input device 14 as appropriate. As shown in FIG. 2, the display control unit 22 causes the display device 16 to display the pronunciation character 53 specified by the user together with the note indicator 51 of the designated note (for example, superimposed on the note indicator 51 as illustrated in FIG. 2).

The information generation unit 24 in FIG. 1 stores, in the music information DS of the storage device 12, the pitch, pronunciation period, and pronunciation character of each designated note specified by the user in the score area 42, in association with one another. By repeating this process, music information DS indicating the time series of the designated notes specified by the user is generated in the storage device 12, and the time series of note indicators 51 of the designated notes is displayed in the score area 42 as illustrated in FIG. 2.

The display control unit 22 arranges, in the phoneme area 44 in time series, images 61 representing the phonemes constituting the pronunciation characters specified for the designated notes (hereinafter referred to as "phoneme indicators"). The display control unit 22 causes the display device 16 to display the time series of note indicators 51 in the score area 42 and the time series of phoneme indicators 61 in the phoneme area 44 under a common time axis.

As shown in FIG. 2, a phoneme indicator 61 is set for each phoneme constituting a pronunciation character (syllable) of a designated note. The end points (start point and end point) of a phoneme indicator 61 in the direction of the time axis represent the start point and end point of the pronunciation period of the phoneme corresponding to that phoneme indicator 61. Accordingly, the length of the phoneme indicator 61 in the direction of the time axis corresponds to the time length of the pronunciation period of the phoneme. The display control unit 22 also places a symbol 63 representing the phoneme (hereinafter referred to as a "phoneme symbol") superimposed on the phoneme indicator 61 (that is, inside the outline of the phoneme indicator 61). As shown in FIG. 2, the phoneme indicators 61 of vowel phonemes ([a], [i]) and the phoneme indicators 61 of consonant phonemes ([s], [n]) are displayed on the display device 16 in different manners (colors or patterns).

FIG. 3 is an enlarged schematic view of the score area 42 and the phoneme area 44. FIG. 3 assumes a case in which the pronunciation character "あ (a)" and the pronunciation character "さ (sa)" constituting the word "あさ (morning)" have been assigned to separate designated notes (N1, N2). As shown in FIG. 3, the display control unit 22 arranges, in time series in the phoneme area 44, a phoneme indicator 61 corresponding to the phoneme [a] of the pronunciation character "あ" (a single phoneme), a phoneme indicator 61 corresponding to the leading consonant phoneme [s] of the pronunciation character "さ (sa)" (a phoneme chain), and a phoneme indicator 61 corresponding to the trailing vowel phoneme [a] of the pronunciation character "さ (sa)".

As illustrated in FIG. 3, the display control unit 22 selects the end points of each phoneme indicator 61 so that the start point of the pronunciation period of the vowel phoneme constituting the pronunciation character of each designated note (the start point of its phoneme indicator 61) coincides on the time axis with the start point of the pronunciation period of that designated note (the start point of its note indicator 51). That is, the pronunciation of a vowel phoneme starts at the start point of the pronunciation period of the designated note. For example, as illustrated in FIG. 3, when the pronunciation character "さ (sa)", which concatenates the consonant phoneme [s] and the vowel phoneme [a], is assigned to the designated note N2, the start point of the phoneme indicator 61 of the trailing phoneme [a] coincides with the start point of the note indicator 51 of the designated note N2.

On the other hand, for a consonant phoneme constituting the pronunciation character of a designated note (for example, the phoneme [s]), the display control unit 22 selects the end points of its phoneme indicator 61 so that the end point of the pronunciation period of the consonant phoneme (the end point of its phoneme indicator 61) coincides with the start point of the pronunciation period of that designated note (the start point of the vowel immediately following the consonant). For example, the end point of the phoneme indicator 61 of the consonant phoneme [s] constituting the pronunciation character "さ (sa)" assigned to the designated note N2 coincides with the start point of the phoneme indicator 61 of the immediately following vowel phoneme [a] (the start point of the note indicator 51 of the designated note N2). That is, the pronunciation of the phoneme [s] starts before the start of the pronunciation period of the designated note N2. The start points of consonant and vowel phonemes are set according to this rule because listeners tend to perceive the timing of singing as appropriate when the start point of a vowel coincides with the start point of a note (designated note).
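The alignment rule described above (a vowel starts exactly at the note's start point, while a preceding consonant ends at that point and therefore starts earlier) could be sketched as follows; the default consonant length is an assumed placeholder, not a value given in the patent.

```python
def initial_phoneme_spans(note_onset, note_end, phonemes, consonant_len=0.06):
    """Assign provisional pronunciation periods to the phonemes of one note.

    `phonemes` is a list of (symbol, kind) pairs in pronunciation order, with
    kind 'C' for a consonant and 'V' for a vowel, e.g. [('s', 'C'), ('a', 'V')].
    """
    spans = []
    lead = consonant_len * sum(1 for _, kind in phonemes if kind == 'C')
    t = note_onset - lead                      # consonants sound before the note starts
    for symbol, kind in phonemes:
        if kind == 'C':
            spans.append((symbol, t, t + consonant_len))
            t += consonant_len                 # consonant ends where the vowel begins
        else:
            spans.append((symbol, note_onset, note_end))  # vowel starts at the note start
    return spans
```

For the note N2 of FIG. 3, `initial_phoneme_spans(onset, end, [('s', 'C'), ('a', 'V')])` places [s] entirely before the note's start point and [a] over the note's pronunciation period, as described in the text.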

The display control unit 22 moves the phoneme symbol 63 placed on a phoneme indicator 61 in the direction of the time axis (left or right), within the range between the end points of that phoneme indicator 61, in response to an instruction given by the user via the input device 14. FIG. 4 assumes a case in which the phoneme symbol 63 of the phoneme [s] has been moved backward on the time axis (in the direction in which time advances), and FIG. 5 assumes a case in which the phoneme symbol 63 of the phoneme [s] has been moved forward on the time axis (in the direction in which time goes back). In the first embodiment, the position (time point) of the boundary between the pieces of segment data P applied to the synthesis of a phoneme is variably controlled according to the position of its phoneme symbol 63 on the time axis (details are described later).

The speech synthesis unit 26 in FIG. 1 synthesizes the designated notes indicated by the music information DS stored in the storage device 12 to generate the audio signal SOUT. As shown in FIG. 6, the speech synthesis unit 26 includes a segment selection unit 262, a segment adjustment unit 264, and a synthesis processing unit 266. The segment selection unit 262 selects, from the segment information DV of the storage device 12, the segment data P of each speech segment corresponding to the pronunciation characters specified for each designated note in the music information DS. For example, when "あさ (morning)" is specified as the pronunciation characters as in the example above, the segment selection unit 262 acquires from the storage device 12 the segment data P corresponding to each of the speech segments [#_a] ("#" denotes silence), [a], [a_s], [s_a], [a], and [a_#], as shown in FIG. 3.
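For the "あさ" example above, the selection of speech segments amounts to walking the phoneme sequence bracketed by silence symbols and picking a transition unit for every adjacent pair plus a stationary unit for each vowel. A rough sketch under that reading (the helper below is hypothetical, not an API of the apparatus):

```python
VOWELS = {'a', 'i', 'u', 'e', 'o'}

def select_units(phonemes):
    """Return the speech-segment names to fetch from the segment information DV.

    >>> select_units(['a', 's', 'a'])
    ['#_a', 'a', 'a_s', 's_a', 'a', 'a_#']
    """
    seq = ['#'] + list(phonemes) + ['#']       # '#' denotes silence
    units = []
    for prev, cur in zip(seq, seq[1:]):
        units.append(f'{prev}_{cur}')          # transition (phoneme-chain) unit
        if cur in VOWELS:
            units.append(cur)                  # stationary unit for a vowel
    return units
```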

The segment adjustment unit 264 in FIG. 6 adjusts the pitch and time length of each piece of segment data P selected by the segment selection unit 262. The synthesis processing unit 266 generates the audio signal SOUT by concatenating the pieces of segment data P adjusted by the segment adjustment unit 264. Any known technique may be employed for generating the audio signal SOUT from the segment data P.

Specific processing by the segment adjustment unit 264 is described below. The segment adjustment unit 264 executes a pitch adjustment process and a time adjustment process (expansion/contraction process). The pitch adjustment process adjusts the pitch of the speech segment of each piece of segment data P to the pitch indicated by the music information DS for the corresponding designated note. Any known technique may be employed for adjusting the pitch of the segment data P.

The time adjustment process adjusts the time length of the speech segment of each piece of segment data P according to the pronunciation period indicated by the music information DS for each designated note (the pronunciation period of each phoneme indicated by its phoneme indicator 61 in the phoneme area 44). That is, the segment adjustment unit 264 expands or contracts each piece of segment data P on the time axis so that each phoneme constituting the pronunciation characters specified for the designated notes in the music information DS is pronounced during the pronunciation period indicated by the phoneme indicator 61 of that phoneme in the phoneme area 44.

For example, as illustrated in FIG. 3, assume that the phoneme [s] is synthesized by using segment data PA of the speech segment [a_s] and segment data PB of the speech segment [s_a]. The segment adjustment unit 264 expands or contracts the segment data PA and the segment data PB so that the start point tA of the section SA corresponding to the trailing consonant phoneme [s] in the speech segment [a_s] indicated by the segment data PA (the end point of the section corresponding to the phoneme [a]) coincides with the start point pA of the pronunciation period (phoneme indicator 61) of the phoneme [s] in the phoneme area 44, and so that the end point tB of the section SB corresponding to the leading consonant phoneme [s] in the speech segment [s_a] indicated by the segment data PB (the start point of the section corresponding to the phoneme [a]) coincides with the end point pB of the pronunciation period (phoneme indicator 61) of the phoneme [s] in the phoneme area 44. Any known technique (time-axis companding) may be employed for expanding or contracting the segment data P.
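As a minimal stand-in for the time adjustment, a waveform section can be resampled to a target number of samples. Real systems would use a pitch-preserving time-axis companding technique, which the patent leaves to known methods; the function below is only a simplified sketch under that assumption.

```python
import numpy as np

def stretch_section(samples, target_len):
    """Resample a 1-D section of segment data to `target_len` samples (naive sketch)."""
    samples = np.asarray(samples, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(samples))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, samples)
```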

Furthermore, the segment adjustment unit 264 variably sets the position of the boundary between the two pieces of segment data P (speech segments) used for synthesizing each phoneme, according to the position of the phoneme symbol 63 of that phoneme in the phoneme area 44. Control of the boundary between the pieces of segment data is described in detail below, focusing on the case in which the segment data PA of the speech segment [a_s] and the segment data PB of the speech segment [s_a] are used to synthesize the phoneme [s], as in the example above.

When the phoneme symbol 63 of the phoneme indicator 61 corresponding to the phoneme [s] in the phoneme area 44 has not been moved from its initial position, the segment adjustment unit 264 sets the boundary between the segment data PA of the speech segment [a_s] and the segment data PB of the speech segment [s_a] (the boundary between the section SA of the segment data PA and the section SB of the segment data PB) to a predetermined position tC (hereinafter referred to as the "reference position") within the section from the start point pA to the end point pB of the phoneme indicator 61, as shown in FIG. 3.

As illustrated in FIG. 4, when the display control unit 22 moves the phoneme symbol 63 of the phoneme [s] to the right (in the direction in which time advances) in response to an instruction from the user, the segment adjustment unit 264 expands or contracts the segment data PA and the segment data PB so that the boundary between the section SA of the segment data PA and the section SB of the segment data PB comes to a position tC_1 to the right of the reference position tC. The start point tA of the section SA is kept at the start point pA of the pronunciation period of the phoneme [s], and the end point tB of the section SB is kept at the end point pB of that pronunciation period. That is, the segment adjustment unit 264 lengthens the section SA and shortens the section SB while keeping the total time length of the sections SA and SB equal to the time length of the pronunciation period of the phoneme [s]. The amount by which the boundary between the sections SA and SB changes (the distance between the reference position tC and the position tC_1) is set variably according to the amount of movement of the phoneme symbol 63 (for example, in proportion to it).

On the other hand, as illustrated in FIG. 5, when the display control unit 22 moves the phoneme symbol 63 of the phoneme [s] to the left (in the direction in which time goes back), the segment adjustment unit 264 expands or contracts the segment data PA and the segment data PB so that the boundary between the sections SA and SB moves to a position tC_2 to the left of the reference position tC. That is, the segment adjustment unit 264 shortens the section SA and lengthens the section SB while keeping the total time length of the sections SA and SB equal to the time length of the pronunciation period of the phoneme [s]. As described above, the amount by which the boundary between the sections SA and SB changes is set variably according to the amount of movement of the phoneme symbol 63.
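Putting FIGS. 4 and 5 together: the boundary between the sections SA and SB starts at the reference position tC and shifts with the phoneme symbol while the total length pA to pB stays fixed. A sketch, assuming a simple proportional mapping (the gain factor is an invented parameter):

```python
def section_lengths(p_a, p_b, t_c, symbol_offset, gain=1.0):
    """Return the time lengths of sections SA and SB for one phoneme.

    `symbol_offset` is the displacement of the phoneme symbol 63 from its initial
    position (positive = later in time); the boundary moves proportionally but is
    clamped inside the pronunciation period [p_a, p_b].
    """
    boundary = min(max(t_c + gain * symbol_offset, p_a), p_b)
    return boundary - p_a, p_b - boundary   # lengths of SA and SB; their sum stays p_b - p_a
```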

The synthesis processing unit 266 generates the audio signal SOUT by using the pieces of segment data P adjusted according to the above procedure. The section SA of the segment data PA and the section SB of the segment data PB share the same pronunciation content (the phoneme [s]), but their acoustic characteristics (the temporal variation of spectrum and intensity) differ according to the type and position (preceding or following) of the phoneme (vowel) adjacent to each. Therefore, the characteristics of the phoneme [s] represented by the audio signal SOUT change according to the ratio of the time lengths of the sections SA and SB (the position of the boundary between the two sections). That is, the first embodiment has the advantage that a synthesized sound with subtle expression that precisely reflects the user's intention can be generated. Moreover, because the position of the boundary between successive pieces of segment data P is variably controlled by manipulating the phoneme symbol 63 in the phoneme area 44, the notable effect that the user can intuitively adjust the boundary between the pieces of segment data P is also achieved.

<B: Second Embodiment>
Next, a second embodiment of the present invention is described. In the following examples, elements whose operation and function are equivalent to those of the first embodiment are denoted by the same reference signs as above, and detailed description of each is omitted as appropriate.

As in the first embodiment, the display control unit 22 displays the time series of note indicators 51 for the designated notes and the time series of phoneme indicators 61 for the phonemes on the display device 16 under a common time axis so that, as illustrated in part (A) of FIG. 7, the start point pV of the phoneme indicator 61 of a vowel phoneme coincides with the start point p0 of the note indicator 51 of the designated note corresponding to that phoneme. The display control unit 22 displays the straight lines representing the start points (pC, pV) of the phoneme indicators 61 (the boundary lines between successive phoneme indicators 61) in different manners for the phoneme indicators 61 of vowel phonemes and those of consonant phonemes. Specifically, the start point pC of a consonant phoneme indicator 61 is displayed as a double line, and the start point pV of a vowel phoneme indicator 61 is displayed as a single straight line.

The user can instruct movement of the start point of a desired phoneme indicator 61 via the input device 14. When the user instructs movement of the start point pC of a consonant phoneme indicator 61, the display control unit 22 moves that start point pC in the direction of the time axis in accordance with the instruction, as shown by the arrow M1 in part (B) of FIG. 7. The start point p0 of the note indicator 51 and the start point pV of the vowel phoneme indicator 61 are maintained regardless of the movement of the start point pC of the consonant phoneme indicator 61.

On the other hand, when the user instructs movement of the start point pV of a vowel phoneme indicator 61, the display control unit 22 moves the start point pV of the vowel phoneme indicator 61 in the direction of the time axis in accordance with the instruction, as shown by the arrow M2 in part (C) of FIG. 7, and moves the start point p0 of the note indicator 51 of the designated note corresponding to that vowel in conjunction with the start point pV of the phoneme indicator 61, as shown by the arrow m. That is, the coincidence between the start point pV of the vowel phoneme indicator 61 and the start point p0 of the note indicator 51 of the designated note is maintained before and after the movement of the start point pV.
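The drag behaviour of the second embodiment reduces to a small rule: a consonant's start point moves alone, while a vowel's start point drags the note's start point with it. A sketch using hypothetical record types (not names from the patent):

```python
def move_phoneme_start(phoneme, note, new_start):
    """Move a phoneme indicator's start point; keep vowel and note starts coincident.

    `phoneme` is assumed to have `start` and `kind` ('C' or 'V') attributes and
    `note` an `onset` attribute; both are illustrative structures.
    """
    phoneme.start = new_start
    if phoneme.kind == 'V':
        note.onset = new_start     # the note indicator's start point follows (arrow m)
    # for a consonant, note.onset is left untouched (part (B) of FIG. 7)
```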

The speech synthesis unit 26 generates the audio signal SOUT by expanding or contracting each piece of segment data P, as in the first embodiment, so that each phoneme is pronounced over a time length (pronunciation period) corresponding to the length of its phoneme indicator 61 adjusted by the above procedure. In this embodiment, the pronunciation period is adjusted in units of the phonemes of each designated note, so that, compared with the technique of Patent Document 1 in which the pronunciation period is adjusted in units of syllables, a synthesized sound with subtle expression that precisely reflects the user's intention can be generated.

Although the second embodiment has been described above on the basis of the first embodiment, the configuration of the first embodiment, in which the ratio of the time lengths of successive pieces of segment data P is variably controlled according to the position of the phoneme symbol 63, may be omitted in the second embodiment. Also, in the above example, the start point p0 of the note indicator 51 is moved in conjunction with the start point pV of the phoneme indicator 61 when movement of the latter is instructed; a configuration in which, when the user instructs movement of the start point p0 of the note indicator 51, the start point pV of the phoneme indicator 61 is moved in conjunction with that start point p0 is also preferable.

<C: Third Embodiment>
Next, a third embodiment of the present invention is described. In the first and second embodiments, the pronunciation periods and phoneme symbols 63 of the phonemes are displayed by using the phoneme indicators 61 in the phoneme area 44, which is independent of the score area 42. In the third embodiment, on the other hand, the note indicators 51 in the score area 42 are also used to display the pronunciation periods and phoneme symbols 63 of the phonemes. Accordingly, in the third embodiment, an editing image 40 in which the phoneme area 44 of the first embodiment is omitted is displayed on the display device 16.

FIG. 8 is a schematic diagram of the score area 42 in the third embodiment. As shown in FIG. 8, note indicators 51 corresponding to the designated notes specified by the user are arranged in time series in the score area 42. As in the first embodiment, the position of a note indicator 51 in the direction of the pitch axis is determined according to the pitch of the designated note, and the length of the note indicator 51 in the direction of the time axis is determined according to the pronunciation period of the designated note. The pronunciation character 53 is placed outside the note indicator 51.

The display control unit 22 displays the pronunciation period and phoneme symbol 63 of each phoneme constituting the pronunciation characters specified for each designated note, superimposed on the note indicator 51. As shown in FIG. 8, a note indicator 51 is divided on the time axis into sections 55, one for each phoneme (hereinafter referred to as "phoneme sections"). The length of the phoneme section 55 corresponding to each phoneme is variably selected according to the pronunciation period of that phoneme, and the phoneme symbol 63 of each phoneme is placed so as to overlap its phoneme section 55. When only one phoneme corresponds to one designated note (for example, the later designated note in FIG. 8), the whole of the note indicator 51 corresponds to the phoneme section 55. The display control unit 22 moves the phoneme symbol 63 within the range of its phoneme section 55 in response to instructions from the user. As in the first embodiment, the ratio of the time lengths of successive pieces of segment data P is variably controlled according to the position of the phoneme symbol 63.

In this embodiment, the note indicators 51, which indicate the pitches and pronunciation periods of the designated notes, are also used to display the pronunciation periods and phoneme symbols 63 of the phonemes. Therefore, compared with the first embodiment, in which the score area 42 and the phoneme area 44 are displayed separately, the content of the editing image 40 is simplified, which makes it easier for the user to check. For example, according to the third embodiment, the user can easily confirm the relationship between the pitch of each designated note and the pronunciation period of each phoneme.

<D: Fourth Embodiment>
Next, a fourth embodiment of the present invention is described. By operating the input device 14, the user can instruct the control device 10 to add a legato between successive designated notes. A legato is a musical expression in which two designated notes with different pitches are pronounced smoothly and continuously.

FIG. 9 is a schematic diagram of the score area 42 in the fourth embodiment. As shown in FIG. 9, the display control unit 22 places, in the score area 42 together with the note indicators 51, a connecting portion 57 shaped so as to connect the note indicators 51 of the designated notes (the preceding note and the following note) for which a legato has been instructed. The connecting portion 57 is an image that connects the end (rear end) of the note indicator 51 of the preceding note and the end (front end) of the note indicator 51 of the following note with a curve.

As in the third embodiment, the display control unit 22 divides the note indicator 51 of each designated note into phoneme sections 55, one per phoneme, thereby displaying the pronunciation period and phoneme symbol 63 of each phoneme in the score area 42. When a legato is added to designated notes, the display control unit 22 divides the band-shaped region consisting of the note indicators 51 and the connecting portion 57 into phoneme sections 55 whose time lengths correspond to the pronunciation periods of the phonemes, and adds the phoneme symbols 63. FIG. 9 illustrates a case in which the connecting portion 57 corresponds to the phoneme section 55 of the phoneme [s].

また、表示制御部22は、第3実施形態と同様に、音素区間55の範囲内で利用者からの指示に応じて音素記号63を移動させる。音素記号63が連結部57に重なる場合、表示制御部22は、連結部57に沿うように音素記号63を移動させる。例えば、音素記号63の中心が連結部57の中心線（曲線）Lの線上に位置するように、表示制御部22は音素記号63を移動させる。音素記号63の時間軸上の位置に応じて各素片データPの時間長の比率を可変に制御する構成は第1実施形態と同様である。   As in the third embodiment, the display control unit 22 moves the phoneme symbol 63 within the range of its phoneme section 55 in accordance with an instruction from the user. When the phoneme symbol 63 overlaps the connecting portion 57, the display control unit 22 moves the phoneme symbol 63 along the connecting portion 57. For example, the display control unit 22 moves the phoneme symbol 63 so that its center is positioned on the center line (curve) L of the connecting portion 57. The configuration in which the ratio of the time lengths of the pieces of segment data P is variably controlled according to the position of the phoneme symbol 63 on the time axis is the same as in the first embodiment.
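
As an illustrative aid (not part of the patent disclosure), the sketch below models the constraint described above: the connecting portion's center line is approximated here by a quadratic Bezier curve from the rear end of the preceding sound indicator to the front end of the subsequent one, and the phoneme symbol's center is snapped to the curve point closest in x to the dragged position. The curve model and all names are assumptions for the example.

```python
def bezier_point(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1]."""
    x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
    y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
    return x, y

def snap_to_connector(p_end, p_start, drag_x, steps=200):
    """Snap a dragged x position onto the connector's center line and return the
    point where the phoneme symbol's center should be drawn."""
    ctrl = ((p_end[0] + p_start[0]) / 2.0, p_start[1])   # bend the curve toward the next pitch
    drag_x = min(max(drag_x, p_end[0]), p_start[0])      # keep the symbol inside the connector span
    return min(
        (bezier_point(p_end, ctrl, p_start, i / steps) for i in range(steps + 1)),
        key=lambda pt: abs(pt[0] - drag_x),
    )

# Example: the preceding indicator ends at (120, 300), the subsequent one starts at (180, 260)
x, y = snap_to_connector((120.0, 300.0), (180.0, 260.0), drag_x=150.0)
print(round(x, 1), round(y, 1))
```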

以上の形態においては、レガートが指示された各指定音の音指示子51を連結するように連結部57が表示されるから、利用者は、連続的に発音される各指定音を楽譜領域42から直感的に把握することが可能である。また、音素記号63が連結部57に重なる場合には連結部57に沿うように音素記号63が移動するから、レガートが付加された指定音についても、利用者が各素片データPの境界を直感的に調整できるという利点がある。   In the above embodiment, the connecting portion 57 is displayed so as to connect the sound indicators 51 of the designated sounds for which a legato is instructed, so the user can intuitively grasp from the score area 42 which designated sounds are pronounced continuously. Furthermore, when the phoneme symbol 63 overlaps the connecting portion 57, the phoneme symbol 63 moves along the connecting portion 57, so there is the advantage that the user can intuitively adjust the boundary between the pieces of segment data P even for designated sounds to which a legato is added.

<E:変形例>
以上の各形態は多様に変形され得る。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された2以上の態様は適宜に併合され得る。
<E: Modification>
Each of the above embodiments can be modified in various ways. Specific modifications are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate.

(1)変形例1
第1実施形態では子音の音素[s]を例示したが、各素片データPの時間長の比率が制御される音素の種類は任意である。すなわち、音素[s]以外の摩擦音の音素や摩擦音以外の音素(例えば破裂音の音素や母音の音素)についても同様に、素片データPの時間長の比率を制御する第1実施形態の構成が適用される。
(1) Modification 1
In the first embodiment, the consonant phoneme [s] was used as an example, but the type of phoneme for which the time-length ratio of the pieces of segment data P is controlled is arbitrary. That is, the configuration of the first embodiment for controlling the time-length ratio of the segment data P is similarly applicable to fricative phonemes other than the phoneme [s] and to phonemes other than fricatives (for example, plosive phonemes and vowel phonemes).

(2)変形例2
第1実施形態においては、素片データPの表す音声素片が1個または2個の音素で構成される場合を例示したが、3個以上の音素で構成される音声素片(音素連鎖)の素片データPを利用する場合にも以上の各形態が同様に適用される。
(2) Modification 2
In the first embodiment, the case where the speech unit represented by the segment data P is composed of one or two phonemes was exemplified, but each of the above embodiments is similarly applicable when segment data P of a speech unit composed of three or more phonemes (a phoneme chain) is used.

例えば、図10の例示のように子音の音素cが2個の母音の音素（v1,v2）に挟まれた音素連鎖（VCV型）の素片データPを音声合成に利用する構成では、中央の音素（図10では子音の音素c）を素片データPAと素片データPBとに区分することで以上の各形態を同様に適用できる。素片データPAは、子音の音素cの前半の区間に相当し、素片データPBは、子音の音素cの後半の区間に相当する。第1実施形態と同様に、子音の音素cの音素記号63の位置に応じて素片データPAと素片データPBとの時間長の比率が可変に制御される。   For example, as illustrated in FIG. 10, in a configuration in which segment data P of a phoneme chain (VCV type) in which a consonant phoneme c is sandwiched between two vowel phonemes (v1, v2) is used for speech synthesis, each of the above embodiments can be similarly applied by dividing the central phoneme (the consonant phoneme c in FIG. 10) into segment data PA and segment data PB. The segment data PA corresponds to the first half of the consonant phoneme c, and the segment data PB corresponds to the second half of the consonant phoneme c. As in the first embodiment, the ratio of the time lengths of the segment data PA and the segment data PB is variably controlled according to the position of the phoneme symbol 63 of the consonant phoneme c.
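
As an illustrative aid (not part of the patent disclosure), the following sketch shows a hypothetical way to split the central consonant of a V-C-V unit into segment data PA and segment data PB at a boundary derived from the phoneme symbol's position; the array-based representation and all names are assumptions.

```python
import numpy as np

def split_vcv(samples: np.ndarray, c_start: int, c_end: int, ratio: float):
    """Return (PA, PB): PA ends, and PB begins, at a boundary placed inside the
    consonant; ratio = 0 puts it at the consonant's start, ratio = 1 at its end."""
    boundary = c_start + int(round((c_end - c_start) * ratio))
    pa = samples[:boundary]      # v1 plus the first half of the consonant
    pb = samples[boundary:]      # the second half of the consonant plus v2
    return pa, pb

# Example with a dummy 1000-sample waveform whose consonant occupies samples 300..700
wave = np.random.default_rng(0).standard_normal(1000)
pa, pb = split_vcv(wave, c_start=300, c_end=700, ratio=0.25)
print(len(pa), len(pb))   # 400 600
```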

(3)変形例3
以上の各形態においては、素片データPAと素片データPBとの境界を音素記号63の位置に応じて変化させたが、素片データPAと素片データPBとの時間長の比率を変化させる方法は以上の例示に限定されない。例えば、図11に示すように、素片データPAと素片データPBとの重複の程度を音素記号63の位置に応じて変化させる構成も採用され得る。すなわち、音素[s]の音素記号63が右方に移動すると、素片調整部264は、図11に示すように、素片データPBのうち音素[s]の区間SBを維持したまま素片データPAの区間SAを伸長する(すなわち両者の時間長の比率を変化させる)。したがって、区間SAと区間SBとは部分的に重複する。音声合成部26は、素片データPAと素片データPBとが重複する部分については両者を加算(例えば加重和)することで音声信号SOUTを生成する。
(3) Modification 3
In each of the above embodiments, the boundary between the segment data PA and the segment data PB is changed according to the position of the phoneme symbol 63, but the method of changing the ratio of the time lengths of the segment data PA and the segment data PB is not limited to the above example. For example, as shown in FIG. 11, a configuration in which the degree of overlap between the segment data PA and the segment data PB is changed according to the position of the phoneme symbol 63 may also be adopted. That is, when the phoneme symbol 63 of the phoneme [s] is moved to the right, the segment adjustment unit 264 extends the section SA of the segment data PA while maintaining the section SB of the phoneme [s] in the segment data PB, as shown in FIG. 11 (that is, the ratio of their time lengths is changed). The section SA and the section SB therefore partially overlap. The speech synthesis unit 26 generates the speech signal SOUT by adding the overlapping portions of the segment data PA and the segment data PB (for example, as a weighted sum).
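
As an illustrative aid (not part of the patent disclosure), the sketch below shows one plausible weighted-sum combination for Modification 3: the overlapping samples of the two segments are mixed with a linear crossfade. The text above only specifies a weighted sum; the linear fade, the array representation, and all names are assumptions.

```python
import numpy as np

def overlap_weighted_sum(pa: np.ndarray, pb: np.ndarray, overlap: int) -> np.ndarray:
    """Concatenate pa and pb, crossfading the last `overlap` samples of pa
    into the first `overlap` samples of pb."""
    overlap = min(overlap, len(pa), len(pb))
    if overlap == 0:
        return np.concatenate([pa, pb])
    fade = np.linspace(1.0, 0.0, overlap)           # weights for pa; pb gets (1 - fade)
    mixed = pa[-overlap:] * fade + pb[:overlap] * (1.0 - fade)
    return np.concatenate([pa[:-overlap], mixed, pb[overlap:]])

# Example: 100 samples of overlap between two dummy segments
rng = np.random.default_rng(1)
out = overlap_weighted_sum(rng.standard_normal(400), rng.standard_normal(600), overlap=100)
print(out.shape)   # (900,)
```

A linear crossfade is only one choice of weights; any weighting that sums the two segments over the overlap would satisfy the weighted-sum description above.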

(4)変形例4
第2実施形態においては音指示子51の始点と音素指示子61の始点とを連動させたが、音指示子51と音素指示子61とを連動させるか否かを(例えば利用者からの指示に応じて)切替える構成も採用され得る。例えば、特定の操作子を押下しながら利用者が音指示子51および音素指示子61の一方の始点を移動させた場合、表示制御部22は他方の始点を連動して移動させ、利用者がその操作子を押下せずに音指示子51および音素指示子61の一方の始点を移動させた場合、表示制御部22は他方の始点を連動させない。
(4) Modification 4
In the second embodiment, the start point of the sound indicator 51 and the start point of the phoneme indicator 61 are moved in conjunction with each other, but a configuration in which whether or not the sound indicator 51 and the phoneme indicator 61 are linked is switched (for example, in accordance with an instruction from the user) may also be adopted. For example, when the user moves the start point of one of the sound indicator 51 and the phoneme indicator 61 while pressing a specific operation element, the display control unit 22 moves the other start point in conjunction with it; when the user moves the start point of one of them without pressing that operation element, the display control unit 22 does not move the other start point in conjunction.
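
As an illustrative aid (not part of the patent disclosure), the following sketch shows one way the switchable linking of Modification 4 could be handled: the partner indicator's start point is moved only when a link modifier key is held during the drag. The event model, the key, and all names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    start: float   # start point on the time axis, in seconds

def drag_start_point(dragged: Indicator, partner: Indicator,
                     new_start: float, link_key_held: bool) -> None:
    """Move the dragged indicator's start point; move the partner too
    only when the link modifier key is held."""
    delta = new_start - dragged.start
    dragged.start = new_start
    if link_key_held:
        partner.start += delta

sound, phoneme = Indicator(1.00), Indicator(1.00)
drag_start_point(sound, phoneme, new_start=1.25, link_key_held=True)
print(sound.start, phoneme.start)   # 1.25 1.25
drag_start_point(sound, phoneme, new_start=1.10, link_key_held=False)
print(sound.start, phoneme.start)   # 1.1 1.25
```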

(5)変形例5
各素片データPの時間長の比率を可変に制御する構成にとって、音指示子51を時系列に表示する機能は必須ではない。例えば、第1実施形態では編集画像40から楽譜領域42が省略され得る。すなわち、表示制御部22は、各音素の発音期間と音素記号63とを表示装置16に時系列に表示させる要素として包括される。音指示子51の時系列とは別個の音素指示子61が音素の発音期間や音素記号63の表示に利用されるか(第1実施形態,第2実施形態)、音指示子51の時系列(音素区間55)が音素の発音期間や音素記号63の表示に利用されるか(第3実施形態,第4実施形態)は本発明において不問である。また、音楽情報DSを利用者が編集する構成(情報生成部24)も省略され得る。
(5) Modification 5
For the configuration that variably controls the ratio of the time lengths of the pieces of segment data P, the function of displaying the sound indicators 51 in time series is not essential. For example, in the first embodiment, the score area 42 may be omitted from the edited image 40. That is, the display control unit 22 is encompassed as an element that causes the display device 16 to display the pronunciation period and the phoneme symbol 63 of each phoneme in time series. Whether a phoneme indicator 61 separate from the time series of the sound indicators 51 is used to display the pronunciation periods of the phonemes and the phoneme symbols 63 (first and second embodiments), or the time series of the sound indicators 51 (the phoneme sections 55) is used for that display (third and fourth embodiments), does not matter in the present invention. The configuration in which the user edits the music information DS (the information generating unit 24) may also be omitted.

100……音声合成装置、10……制御装置、12……記憶装置、14……入力装置、16……表示装置、18……放音装置、22……表示制御部、24……情報生成部、26……音声合成部、262……素片選択部、264……素片調整部、266……合成処理部、40……編集画像、42……楽譜領域、44……音素領域、51……音指示子、53……発音文字、55……音素区間、57……連結部、61……音素指示子、63……音素記号。


100 ... speech synthesis apparatus, 10 ... control device, 12 ... storage device, 14 ... input device, 16 ... display device, 18 ... sound emitting device, 22 ... display control unit, 24 ... information generating unit, 26 ... speech synthesis unit, 262 ... segment selection unit, 264 ... segment adjustment unit, 266 ... synthesis processing unit, 40 ... edited image, 42 ... score area, 44 ... phoneme area, 51 ... sound indicator, 53 ... pronunciation character, 55 ... phoneme section, 57 ... connecting portion, 61 ... phoneme indicator, 63 ... phoneme symbol.


Claims (3)

音高および発音文字が指示された各指定音の発音期間と、前記各指定音の発音文字に対応する複数の音素の各々音素記号および発音期間の始点とを、表示装置の相異なる領域に時系列に表示させ、前記各音素の発音期間の始点を利用者からの指示に応じて音素毎に時間軸上で移動させる表示制御手段と、
前記発音期間にわたる各音素の合成音を生成する音声合成手段とを具備し、
前記表示制御手段は、
母音の音素の発音期間の始点の移動が利用者から指示された場合に、前記母音の音素の発音期間の始点を当該指示に応じて移動させるとともに、前記母音の音素の発音期間の始点に連動して、当該母音の音素に対応する前記指定音の発音期間の始点を移動させ、
子音の音素の発音期間の始点の移動が利用者から指示された場合に、前記子音の音素に対応する前記指定音の発音期間の始点と、当該子音に隣接する母音の音素の発音期間の始点とを維持したまま、前記子音の音素の発音期間の始点を当該指示に応じて移動させる
音声合成装置。
A speech synthesis apparatus comprising:
display control means for displaying, in time series and in different areas of a display device, the pronunciation period of each designated sound for which a pitch and a pronunciation character are designated, and the phoneme symbol and the start point of the pronunciation period of each of a plurality of phonemes corresponding to the pronunciation character of each designated sound, and for moving the start point of the pronunciation period of each phoneme on the time axis for each phoneme in accordance with an instruction from a user; and
speech synthesis means for generating a synthesized sound of each phoneme over the pronunciation period,
wherein, when movement of the start point of the pronunciation period of a vowel phoneme is instructed by the user, the display control means moves the start point of the pronunciation period of the vowel phoneme in accordance with the instruction and, in conjunction with the start point of the pronunciation period of the vowel phoneme, moves the start point of the pronunciation period of the designated sound corresponding to the vowel phoneme, and
wherein, when movement of the start point of the pronunciation period of a consonant phoneme is instructed by the user, the display control means moves the start point of the pronunciation period of the consonant phoneme in accordance with the instruction while maintaining the start point of the pronunciation period of the designated sound corresponding to the consonant phoneme and the start point of the pronunciation period of the vowel phoneme adjacent to the consonant.
音高および発音文字が指示された各指定音の発音期間と、前記各指定音の発音文字に対応する複数の音素の各々音素記号および発音期間の始点とを、表示装置の相異なる領域に時系列に表示させ、前記各音素の発音期間の始点を利用者からの指示に応じて音素毎に時間軸上で移動させ、
前記発音期間にわたる各音素の合成音を生成する音声合成方法であって、
母音の音素の発音期間の始点の移動が利用者から指示された場合に、前記母音の音素の発音期間の始点を当該指示に応じて移動させるとともに、前記母音の音素の発音期間の始点に連動して、当該母音の音素に対応する前記指定音の発音期間の始点を移動させ、
子音の音素の発音期間の始点の移動が利用者から指示された場合に、前記子音の音素に対応する前記指定音の発音期間の始点と、当該子音に隣接する母音の音素の発音期間の始点とを維持したまま、前記子音の音素の発音期間の始点を当該指示に応じて移動させる
音声合成方法。
A speech synthesis method comprising:
displaying, in time series and in different areas of a display device, the pronunciation period of each designated sound for which a pitch and a pronunciation character are designated, and the phoneme symbol and the start point of the pronunciation period of each of a plurality of phonemes corresponding to the pronunciation character of each designated sound, and moving the start point of the pronunciation period of each phoneme on the time axis for each phoneme in accordance with an instruction from a user; and
generating a synthesized sound of each phoneme over the pronunciation period,
wherein, when movement of the start point of the pronunciation period of a vowel phoneme is instructed by the user, the start point of the pronunciation period of the vowel phoneme is moved in accordance with the instruction and, in conjunction with the start point of the pronunciation period of the vowel phoneme, the start point of the pronunciation period of the designated sound corresponding to the vowel phoneme is moved, and
wherein, when movement of the start point of the pronunciation period of a consonant phoneme is instructed by the user, the start point of the pronunciation period of the consonant phoneme is moved in accordance with the instruction while maintaining the start point of the pronunciation period of the designated sound corresponding to the consonant phoneme and the start point of the pronunciation period of the vowel phoneme adjacent to the consonant.
音高および発音文字が指示された各指定音の発音期間と、前記各指定音の発音文字に対応する複数の音素の各々音素記号および発音期間の始点とを、表示装置の相異なる領域に時系列に表示させ、前記各音素の発音期間の始点を利用者からの指示に応じて音素毎に時間軸上で移動させる表示制御手段、および、
前記発音期間にわたる各音素の合成音を生成する音声合成手段
としてコンピュータを機能させるプログラムであって、
前記表示制御手段は、
母音の音素の発音期間の始点の移動が利用者から指示された場合に、前記母音の音素の発音期間の始点を当該指示に応じて移動させるとともに、前記母音の音素の発音期間の始点に連動して、当該母音の音素に対応する前記指定音の発音期間の始点を移動させ、
子音の音素の発音期間の始点の移動が利用者から指示された場合に、前記子音の音素に対応する前記指定音の発音期間の始点と、当該子音に隣接する母音の音素の発音期間の始点とを維持したまま、前記子音の音素の発音期間の始点を当該指示に応じて移動させる
プログラム。
A program for causing a computer to function as:
display control means for displaying, in time series and in different areas of a display device, the pronunciation period of each designated sound for which a pitch and a pronunciation character are designated, and the phoneme symbol and the start point of the pronunciation period of each of a plurality of phonemes corresponding to the pronunciation character of each designated sound, and for moving the start point of the pronunciation period of each phoneme on the time axis for each phoneme in accordance with an instruction from a user; and
speech synthesis means for generating a synthesized sound of each phoneme over the pronunciation period,
wherein, when movement of the start point of the pronunciation period of a vowel phoneme is instructed by the user, the display control means moves the start point of the pronunciation period of the vowel phoneme in accordance with the instruction and, in conjunction with the start point of the pronunciation period of the vowel phoneme, moves the start point of the pronunciation period of the designated sound corresponding to the vowel phoneme, and
wherein, when movement of the start point of the pronunciation period of a consonant phoneme is instructed by the user, the display control means moves the start point of the pronunciation period of the consonant phoneme in accordance with the instruction while maintaining the start point of the pronunciation period of the designated sound corresponding to the consonant phoneme and the start point of the pronunciation period of the vowel phoneme adjacent to the consonant.
JP2014128317A 2014-06-23 2014-06-23 Speech synthesis apparatus, speech synthesis method and program Expired - Fee Related JP5935831B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014128317A JP5935831B2 (en) 2014-06-23 2014-06-23 Speech synthesis apparatus, speech synthesis method and program


Related Parent Applications (1)

Application Number Title Priority Date Filing Date
JP2013210108A Division JP5641266B2 (en) 2013-10-07 2013-10-07 Speech synthesis apparatus, speech synthesis method and program

Publications (2)

Publication Number Publication Date
JP2014170251A JP2014170251A (en) 2014-09-18
JP5935831B2 true JP5935831B2 (en) 2016-06-15

Family

ID=51692629

Legal Events

Date Code Title Description
2014-06-23 A621 Written request for application examination (JAPANESE INTERMEDIATE CODE: A621)
2015-04-10 RD04 Notification of resignation of power of attorney (JAPANESE INTERMEDIATE CODE: A7424)
2015-10-06 A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2015-11-20 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
TRDD Decision of grant or rejection written
2016-04-12 A01 Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2016-04-25 A61 First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61)
R151 Written notification of patent or utility model registration (Ref document number: 5935831; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R151)
LAPS Cancellation because of no payment of annual fees