JP6888351B2

JP6888351B2 - Input device, speech synthesizer, input method, and program

Info

Publication number: JP6888351B2
Application number: JP2017052950A
Authority: JP
Inventors: 潮岡部; 亮佑石浦; 航平大竹; 悠真竹内; 俊文八木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2021-06-16
Anticipated expiration: 2037-03-17
Also published as: JP2018156417A

Description

The present invention relates to a technique for synthesizing a singing voice in real time in response to a user's operations.

Techniques are known for synthesizing and playing back a singing voice in real time in response to a user's performance and lyric input. For example, Non-Patent Document 1 describes a singing voice synthesizer having keys for inputting vowels and keys for inputting a performance.

Non-Patent Document 1: "Singing Keyboard Pocket Miku", [online], April 3, 2014, [retrieved March 6, 2017], Internet <URL: http://otonanokagaku.net/nsx39/>

The technique described in Non-Patent Document 1 has the problem that only vowels can be input as lyrics, so the synthesized singing voice is monotonous.
In view of this, an object of the present invention is to provide an input device that allows vowels and consonants to be input to a speech synthesizer with simple operations.

The present invention provides an input device having: a first designation unit that designates one of the vowel and the consonant of the lyrics of a singing voice to be synthesized by a singing synthesis control device, in response to an operation on a control; a second designation unit that designates the other of the vowel and the consonant in response to the movement of the input device itself; and a transmission unit that transmits the designated vowel and consonant to the singing synthesis control device.

The input device may have a gripped portion with a contact surface that comes into contact with the user's fingers in use, and the control may be provided on that contact surface of the gripped portion.

The second designation unit may designate the other of the vowel and the consonant according to the direction in which the input device is moved.

The present invention also provides a speech synthesizer comprising the input device of any of the above configurations and a singing synthesis control device, wherein the singing synthesis control device has: a receiving unit that receives the designated vowel and consonant from the input device; one or more controls; an operation detection unit that detects an operation on the one or more controls; a determination unit that determines a pitch according to the control on which the operation was detected by the operation detection unit; and a voice synthesis unit that generates a synthesized voice having the vowel and consonant received by the receiving unit and the pitch determined by the determination unit.

According to the present invention, it is possible to provide an input device that allows vowels and consonants to be input to a speech synthesizer with simple operations.

A diagram illustrating the schematic configuration of a speech synthesizer according to an embodiment of the present invention.
A diagram illustrating the configuration of the gripped portion 11.
A diagram illustrating the relationship between the movement of the input device 10 and the designated consonant.
A diagram illustrating the functional configurations of the input device 10 and the singing synthesis control device 20.
A flowchart showing the operation of the input device 10 and the singing synthesis control device 20.
A diagram illustrating the structure of the gripped portion 11 according to a modification.
A diagram illustrating the relationship between the movement of the gripped portion 11 and the designated consonant according to the modification.
A diagram illustrating the structure of the gripped portion 11 according to another modification.
A diagram showing the relationship between the movement of the gripped portion 11 and the designated consonant according to the other modification.
A flowchart showing the operation of the input device and the singing synthesis control device according to a modification.

1. Configuration
FIG. 1 is a diagram illustrating the schematic configuration of a speech synthesizer 1 according to an embodiment of the present invention. The speech synthesizer 1 synthesizes a singing voice in real time. It includes an input device 10 and a singing synthesis control device 20. Synthesizing a singing voice requires at least lyric and pitch information. In this example, the lyrics are input on the input device 10 and the pitch is input on the singing synthesis control device 20. To convey the lyrics input on the input device 10, the input device 10 and the singing synthesis control device 20 are connected by a cable 30 for transmitting and receiving information. However, the two devices may instead be connected wirelessly.

The singing synthesis control device 20 is a device that performs singing synthesis. In this example, it has an appearance modeled on a keyboard instrument such as an electronic piano. The singing synthesis control device 20 has an operation unit 21 on its front surface. The operation unit 21 has a plurality of controls 211 modeled on keys. The singing synthesis control device 20 controls the synthesis of the singing voice based on the lyrics input from the input device 10 and the pitch determined in response to an operation of pressing any of the controls 211.

The input device 10 is a device for inputting lyrics. Lyrics are composed of combinations of vowels and consonants. The input device 10 is rod-shaped and includes a gripped portion 11 and a light emitting unit 12. The gripped portion 11 is the part held by the user. The light emitting unit 12 is the part that emits light. The input device 10 thus also functions as a lighting device such as a chemical light. Known techniques are used for the light emitting unit 12 and its control.

FIG. 2 is a diagram illustrating the configuration of the gripped portion 11. The gripped portion 11 has a contact surface 11A that comes into contact with the user's fingers in use. A plurality of switches 111 to 116 are provided on the contact surface 11A. Each switch is, for example, a momentary push switch: on is input while the switch is pressed, and off is input while it is not pressed. The switches need not be push switches as long as they allow on/off input.

In this embodiment, of the vowel and consonant that make up the lyrics, the vowel is designated by operating the switches 111 to 114. For example, [a] (あ) is designated while only the switch 111 is pressed, [i] (い) while only the switch 112 is pressed, [u] (う) while only the switch 113 is pressed, [e] (え) while only the switches 111 and 112 are pressed, and [o] (お) while only the switches 111 and 113 are pressed.

While the switch 114 is pressed, use of a yoon (a contracted sound formed with a semivowel) is designated. For example, the yoon of [ka] (か) is [kja] (きゃ). Thus, while the switch 114 is pressed, insertion of the semivowel [j] immediately before the vowel [a] is designated in order to express the yoon.
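The switch-to-vowel assignment described above can be sketched as a small lookup table. This is only an illustrative sketch: the function name `designate_vowel`, the boolean arguments, and the choice to return `None` for combinations the text does not define are assumptions, as is applying the semivowel [j] before vowels other than [a].

```python
def designate_vowel(sw111, sw112, sw113, sw114=False):
    """Return the vowel designated by switches 111-114, or None.

    The table follows the combinations listed in the description;
    any other combination is treated as "no vowel" (an assumption).
    """
    table = {
        (True, False, False): "a",   # only switch 111
        (False, True, False): "i",   # only switch 112
        (False, False, True): "u",   # only switch 113
        (True, True, False): "e",    # switches 111 and 112
        (True, False, True): "o",    # switches 111 and 113
    }
    vowel = table.get((sw111, sw112, sw113))
    if vowel is None:
        return None
    # Switch 114 requests a yoon: insert the semivowel [j] before the vowel.
    return "j" + vowel if sw114 else vowel
```

For example, `designate_vowel(True, False, False, sw114=True)` yields `"ja"`, which combines with a consonant [k] to give [kja].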

Of the vowel and consonant of the lyrics, the consonant is designated by operating the switches 115 and 116 and by the movement of the input device 10. In this example, the "movement" of the input device 10 is a change in its position (that is, a displacement) caused by the input device 10 being swung.

In this embodiment, a plain sound (seion) is designated by the movement of the input device 10, whether a voiced sound (dakuon) is used is designated by operating the switch 115, and whether a semi-voiced sound (handakuon) is used is designated by operating the switch 116. For example, when [k] (the ka row) is designated as the consonant and use of a voiced sound is designated, [g] (the ga row) is designated. When [h] (the ha row) is designated as the consonant and use of a semi-voiced sound is designated, [p] (the pa row) is designated.
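As a rough sketch of how the switches 115 and 116 could modify a plain consonant, the following uses two lookup tables. The description gives only the [k] to [g] and [h] to [p] examples; the remaining pairs follow the usual Japanese dakuon correspondences. The name `apply_voicing` and the rule that switch 116 takes priority when both switches are on are assumptions.

```python
# Plain consonant -> voiced consonant (dakuon), selected by switch 115.
DAKUON = {"k": "g", "s": "z", "t": "d", "h": "b"}
# Plain consonant -> semi-voiced consonant (handakuon), selected by switch 116.
HANDAKUON = {"h": "p"}

def apply_voicing(consonant, sw115, sw116):
    """Return the consonant after applying the voicing switches."""
    if sw116 and consonant in HANDAKUON:
        return HANDAKUON[consonant]
    if sw115 and consonant in DAKUON:
        return DAKUON[consonant]
    # Consonants without a voiced or semi-voiced form are left unchanged.
    return consonant
```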

FIG. 3 is a diagram illustrating the relationship between the movement of the input device 10 and the designated consonant. Here, the central axis extending from the bottom to the top of the input device 10 is defined as "L". While the input device 10 is swung in the axial direction of the central axis L, [k] is designated; while it is swung in the opposite direction, [h] is designated. While it is swung in the direction rotated 45 degrees clockwise from the central axis L, [s] (the sa row) is designated; while it is swung in the opposite direction, [m] (the ma row) is designated. While it is swung in the direction rotated 90 degrees clockwise from the central axis L, [t] (the ta row) is designated; while it is swung in the opposite direction, [y] (the ya row) is designated. While it is swung in the direction rotated 135 degrees clockwise from the central axis L, [n] (the na row) is designated; while it is swung in the opposite direction, [r] (the ra row) is designated. When the input device 10 is not swung in any direction, [a] (the a row) is designated. When the input device 10 is swung in a direction other than those indicated by the arrows in FIG. 3, the consonant corresponding to the closest direction is designated.
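The direction-to-consonant relationship of FIG. 3, including the "closest direction" rule, can be sketched as follows. The function name `designate_consonant`, the use of `None` for "not swung", and the empty string standing for the a row (no consonant) are assumptions made for illustration. Angles are measured clockwise from the central axis L, in degrees.

```python
# Swing direction (degrees clockwise from axis L) -> designated consonant.
DIRECTION_TO_CONSONANT = {
    0: "k", 45: "s", 90: "t", 135: "n",
    180: "h", 225: "m", 270: "y", 315: "r",
}

def designate_consonant(angle_deg):
    """Return the consonant for a swing direction, snapping to the
    closest of the eight defined directions."""
    if angle_deg is None:
        return ""  # not swung: a row, i.e. no consonant
    # Round to the nearest multiple of 45 degrees, wrapping 360 back to 0.
    nearest = round((angle_deg % 360) / 45) % 8 * 45
    return DIRECTION_TO_CONSONANT[nearest]
```

A swing at 50 degrees is closest to the 45 degree arrow and so gives [s]. A direction exactly halfway between two arrows is not covered by the description; this sketch simply takes whatever `round` returns.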

The relationship between the movement of the input device 10 and the designated consonant is not limited to the example of FIG. 3. In the example of FIG. 3, when the input device 10 is held upright, perpendicular to the ground, consonants are defined according to the movement of the input device 10 in a plane substantially perpendicular to the ground. However, the central axis L in the example of FIG. 3 may instead be set sideways on the input device 10 (specifically, for example, in the direction perpendicular to the surface of the gripped portion 11 on which the switches are provided). In that case, consonants are defined according to the movement of the input device 10 in a plane substantially parallel to the ground.

The sounds [わ] (wa), [を] (wo), and [ん] (n) are designated, for example, by turning on the switch 115 while not moving the input device 10. To express the small kana [ゃ], [ゅ], and [ょ], a separate switch for designating them may be provided.

FIG. 4 is a diagram illustrating the functional configurations of the input device 10 and the singing synthesis control device 20. The input device 10 includes an operation detection unit 101, a first designation unit 102, a motion detection unit 103, a second designation unit 104, and a transmission unit 105. The operation detection unit 101 detects the operation state of the switches 111 to 116 based on the signals input from each of them. The first designation unit 102 designates the vowel, of the vowel and consonant of the lyrics, according to the operation state of the switches 111 to 114 detected by the operation detection unit 101. The motion detection unit 103 detects the movement of the input device 10. In this embodiment, the motion detection unit 103 detects at least the direction in which the input device 10 was moved (swung), based on information from a sensor (not shown). The sensor includes, for example, a two-axis or three-axis acceleration sensor. The motion detection unit 103 detects the movement of the input device 10 based on, for example, the acceleration measured by the acceleration sensor, the velocity obtained from the acceleration, and the magnitude of the displacement. The motion detection unit 103 may detect the movement of the input device 10 using a sensor other than an acceleration sensor. The second designation unit 104 designates the consonant, of the vowel and consonant of the lyrics, according to the movement of the input device 10 detected by the motion detection unit 103 and the operation state of the switches 115 and 116 detected by the operation detection unit 101. The transmission unit 105 transmits the vowel designated by the first designation unit 102 and the consonant designated by the second designation unit 104 to the singing synthesis control device 20.
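One way the motion detection described above might derive a swing direction from acceleration samples is sketched below. The description does not give an algorithm, so everything here is an assumption: the function name `detect_swing_direction`, the simple Euler integration, the displacement threshold, the axis convention (y along the central axis L, x perpendicular to it), and the premise that gravity has already been removed from the samples.

```python
import math

def detect_swing_direction(samples, dt, min_displacement=0.05):
    """Integrate 2-axis acceleration samples (m/s^2) taken every dt seconds.

    Returns the swing direction in degrees clockwise from axis L,
    or None if the resulting displacement is below min_displacement (m).
    """
    vx = vy = px = py = 0.0
    for ax, ay in samples:
        vx += ax * dt      # velocity obtained from acceleration
        vy += ay * dt
        px += vx * dt      # displacement obtained from velocity
        py += vy * dt
    if math.hypot(px, py) < min_displacement:
        return None        # treated as "not swung"
    # atan2(x, y) gives the clockwise angle from the +y axis (axis L).
    return math.degrees(math.atan2(px, py)) % 360
```

A real device would also need drift compensation and swing segmentation, which this sketch leaves out.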

The functions of the input device 10 are implemented by a processor equipped with an arithmetic processing unit such as a CPU (Central Processing Unit), memory such as ROM (Read Only Memory) and RAM (Random Access Memory), a communication module, and the like. Each function of the input device 10 is implemented, for example, by the processor and a program executed by the processor. The functions of the input device 10 may also be implemented by two or more processors or programs.

The singing synthesis control device 20 includes a receiving unit 201, an operation detection unit 202, a determination unit 203, a synthesis instruction unit 204, a voice synthesis unit 205, and a voice output unit 206. The receiving unit 201 receives the vowel and consonant of the lyrics from the input device 10 (transmission unit 105). The operation detection unit 202 detects the operation state of each control 211 based on the signal input from each control 211 of the operation unit 21. The determination unit 203 determines, based on the detection result of the operation detection unit 202, the pitch corresponding to the control 211 pressed by the user. The synthesis instruction unit 204 instructs the voice synthesis unit 205 to synthesize a singing voice based on the consonant and vowel received by the receiving unit 201 and the pitch determined by the determination unit 203. The voice synthesis unit 205 synthesizes the singing voice according to the synthesis instruction from the synthesis instruction unit 204, generating a singing voice (synthesized voice). The voice synthesis unit 205 outputs a sound signal representing the synthesized singing voice to the voice output unit 206. The voice output unit 206 outputs sound according to the sound signal output from the voice synthesis unit 205.

The functions of the receiving unit 201, the operation detection unit 202, the determination unit 203, the synthesis instruction unit 204, and the voice synthesis unit 205 are implemented by a processor equipped with an arithmetic processing unit such as a CPU, memory such as ROM and RAM, a communication module, and the like. Each function of the singing synthesis control device 20 is implemented, for example, by the processor and a program executed by the processor. The functions of the singing synthesis control device 20 may also be implemented by two or more processors or programs. The voice output unit 206 includes, for example, a signal processing circuit, an amplifier, and a speaker.

2. Operation
FIG. 5 is a flowchart showing the operation of the input device 10 and the singing synthesis control device 20. The flow of FIG. 5 is executed, for example, while the input device 10 and the singing synthesis control device 20 are powered on.

In the input device 10, the first designation unit 102 determines, based on the detection result of the operation detection unit 101, whether at least one of the switches 111 to 113 has been pressed (step S11). If it determines that none of the switches is pressed (step S11; NO), the first designation unit 102 waits. If it determines that at least one of the switches 111 to 113 has been pressed (step S11; YES), the first designation unit 102 designates a vowel (step S12). The first designation unit 102 designates one of the vowels [a], [i], [u], [e], and [o] according to the operation state of the switches 111 to 113, and designates the semivowel for expressing a yoon according to the operation state of the switch 114.

Next, the motion detection unit 103 detects the movement of the input device 10 (step S13). The second designation unit 104 designates a consonant according to the direction in which the input device 10 was moved, as detected by the motion detection unit 103, and the operation states of the switches 115 and 116 detected by the operation detection unit 101 (step S14). The second designation unit 104 designates one of the consonants [a], [k], [s], [t], [n], [h], [m], [g], [z] (the za row), [d] (the da row), [b] (the ba row), and [p] (the pa row).

Next, the transmission unit 105 transmits the designated vowel and consonant to the singing synthesis control device 20 (step S15). After this transmission, the processing of the input device 10 returns to step S11. That is, while at least one of the switches 111 to 113 is pressed, the transmission unit 105 transmits the vowel and consonant to the singing synthesis control device 20.

In the singing synthesis control device 20, the receiving unit 201 determines whether a vowel and consonant have been received from the input device 10 (step S21). If it determines that no vowel and consonant have been received (step S21; NO), the receiving unit 201 waits. If it determines that a vowel and consonant have been received (step S21; YES), the determination unit 203 determines, based on the detection result of the operation detection unit 202, whether at least one of the controls 211 has been pressed (step S22). If it determines that none of the controls 211 is pressed (step S22; NO), the processing of the singing synthesis control device 20 returns to step S21.

When the determination unit 203 determines that at least one of the controls 211 has been pressed (step S22; YES), it determines the pitch corresponding to the pressed control 211 (step S23). The determination unit 203 determines the pitch unique to that control 211. Because the controls 211 are modeled on keys, the determination unit 203 preferably determines a higher pitch when a control 211 corresponding to a key of higher pitch is pressed.
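The pitch determination in step S23 can be illustrated with a simple key-to-frequency mapping. The description only states that each control has its own pitch and that higher keys give higher pitches; numbering the controls by semitone, anchoring the lowest control at MIDI note 48, and using twelve-tone equal temperament with A4 = 440 Hz are all assumptions of this sketch, as is the name `key_to_frequency`.

```python
def key_to_frequency(key_index, lowest_midi_note=48):
    """Return the frequency in Hz for the control at key_index.

    key_index counts controls from the lowest key (0) in semitone steps.
    """
    midi_note = lowest_midi_note + key_index
    # Equal temperament: MIDI note 69 (A4) is 440 Hz, 12 semitones per octave.
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)
```

With these defaults, the control at index 21 corresponds to A4, and each control to its right sounds one semitone higher.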

The synthesis instruction unit 204 instructs the voice synthesis unit 205 to synthesize a singing voice based on the received consonant and vowel and the determined pitch (step S24). Specifically, the synthesis instruction unit 204 converts the lyrics determined from the consonant and vowel into phonetic symbols, generates an instruction to synthesize a voice with those phonetic symbols at the determined pitch, and outputs the instruction to the voice synthesis unit 205. The voice synthesis unit 205 synthesizes the singing voice according to the input synthesis instruction (step S25). Because known techniques can be used for singing voice synthesis, only an outline is given here. The voice synthesis unit 205 has a fragment library. The fragment library is a database containing voice fragments (pieces of singing voice) sampled from the voice of a particular singer. It contains a plurality of items of fragment data taken from that singer's singing voice waveforms. Fragment data is voice data obtained by cutting out and encoding phonetically characteristic portions of a singing voice waveform.

Fragment data is explained here using the example of synthesizing a singing voice for the lyrics [さいた] (saita). The lyrics [さいた] are written in phonetic symbols as [saita]. Analyzing the waveform of the voice represented by the phonetic symbols [saita] by its features gives the onset of the [s] sound, the [s] sound, the transition from the [s] sound to the [a] sound, the [a] sound, and so on, ending with the decay of the [a] sound. Each item of fragment data is voice data corresponding to one of these phonetically characteristic portions. The fragment library stores fragment data for every sound and every combination of sounds. In the following description, fragment data corresponding to the onset of the sound represented by a phonetic symbol is written with [#] before the symbol, as in [#s]. Fragment data corresponding to the decay of the sound represented by a phonetic symbol is written with [#] after the symbol, as in [a#]. Fragment data corresponding to the transition from the sound represented by one phonetic symbol to the sound represented by another is written with [-] between the symbols, as in [s-a].

For example, the voice [ぱ] (pa) is synthesized by arranging and concatenating the fragment data [#p], [p], [p-a], and [a] in that order. After combining these items of fragment data, the voice synthesis unit 205 adjusts the pitch and outputs a sound signal of the pitch-adjusted synthesized voice. The voice output unit 206 outputs the synthesized voice according to the sound signal output from the voice synthesis unit 205 (step S26).
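The fragment notation can be made concrete with a short sketch that lists the fragment names needed for a sequence of phonetic symbols. It only builds the names in the [#s], [s-a], [a#] notation used above; the function name `fragment_sequence` and the `release` flag (whether to append the decay fragment) are assumptions, and no actual audio is looked up or concatenated.

```python
def fragment_sequence(phonemes, release=True):
    """Return the ordered fragment names for a list of phonetic symbols."""
    if not phonemes:
        return []
    sequence = ["#" + phonemes[0]]            # onset of the first sound
    for i, symbol in enumerate(phonemes):
        if i > 0:
            # transition from the previous sound to this one
            sequence.append(phonemes[i - 1] + "-" + symbol)
        sequence.append(symbol)               # the sound itself
    if release:
        sequence.append(phonemes[-1] + "#")   # decay of the last sound
    return sequence
```

Calling `fragment_sequence(["p", "a"], release=False)` gives `["#p", "p", "p-a", "a"]`, the sequence named in the text for [ぱ], with the final vowel left open so that it can be sustained.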

Next, the synthesis instruction unit 204 determines whether the vowel or consonant received from the input device 10 has changed (step S27). Specifically, it determines whether at least one of the vowel and consonant has changed and whether the vowel and consonant are no longer being received. If it determines that the vowel and consonant have not changed (step S27; NO), the synthesis instruction unit 204 determines whether the pitch has changed (step S28). Specifically, it determines whether the control 211 is no longer pressed (the finger has been released from the control 211) and whether another control 211 has been pressed. If it determines that the pitch has not changed (step S28; NO), the synthesis instruction unit 204 does not instruct synthesis of a new singing voice. Specifically, the processing of the singing synthesis control device 20 returns to step S25, and the voice synthesis unit 205 continues to output the synthesized voice for the same lyrics (character) using the voice output unit 206 (steps S25 and S26). The voice synthesis unit 205 outputs a sound signal that keeps sustaining the final vowel ([a] in the preceding example).

On the other hand, if the synthesis instruction unit 204 determines that the vowel or consonant received from the input device 10 has changed (step S27; YES), or that the pitch has changed (step S28; YES), the processing of the singing synthesis control device 20 returns to step S21.
Then, when a vowel and consonant are received from the input device 10 (step S21; YES) and a pitch is designated by operating a control 211 (step S22; YES), the synthesis instruction unit 204 instructs the voice synthesis unit 205 to synthesize a new singing voice, causing the singing voice to be synthesized and the synthesized voice to be output (steps S23 to S26).
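The branching in steps S21, S22, S27, and S28 can be summarized as a single decision function. This is a simplified sketch, not the flowchart itself: the `NoteState` tuple, the action names, and the use of `None` for "no lyrics received" and "no control pressed" are assumptions introduced for illustration.

```python
from typing import NamedTuple, Optional

class NoteState(NamedTuple):
    consonant: Optional[str]   # consonant received from the input device
    vowel: Optional[str]       # None when no lyrics are being received
    pitch: Optional[int]       # None when no control 211 is pressed

def next_action(previous, current):
    """Decide what the control device does for the current state."""
    if current.vowel is None or current.pitch is None:
        return "wait"          # S21; NO or S22; NO
    if previous is None or current != previous:
        return "synthesize"    # S27; YES or S28; YES, then S23 to S26
    return "sustain"           # S27; NO and S28; NO, back to S25
```

Polling the received lyrics and the pressed control, then calling `next_action` with the last and current state, reproduces the behavior that a held note keeps sustaining its final vowel until the lyrics or the pitch change.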

With the speech synthesizer 1 described above, the user holds the input device 10 in one hand and can designate the vowel and consonant of the lyrics by pressing switches and moving the input device 10. The user can also designate the pitch of the lyrics by operating the singing synthesis control device 20 with the other hand. The user can therefore easily designate the vowel, consonant, and pitch of the lyrics and have the singing synthesis control device 20 output the synthesized voice.

3. Modifications
The present invention is not limited to the embodiment described above, and various modifications are possible. Some modifications are described below. Two or more of the following modifications may be used in combination.

3-1. Gripped portion 11
FIG. 6 illustrates the structure of the gripped portion 11 according to a modification, and FIG. 7 illustrates the relationship between the movement of the gripped portion 11 and the designated consonant. The type and number of switches provided on the contact surface 11A of the gripped portion 11 are not limited to the example of FIG. 2. In this modification, as shown in FIG. 6, the contact surface 11A has neither the switch 115 for designating voiced sounds (dakuon) nor the switch 116 for designating semi-voiced sounds (handakuon); instead, a switch 117 for switching modes is provided. While the switch 117 is off, the first designation unit 102 allows [a], [k], [s], [t], [n], [h], [m], [y], and [r] to be designated as consonants, as shown on the left side of FIG. 7. While the switch 117 is on, the first designation unit 102 allows [y], [w], [g], [z], [d], [b], and [p] to be designated as consonants, as shown on the right side of FIG. 7.
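The mode switching described above amounts to choosing between two lookup tables. The following is a minimal sketch under stated assumptions: the two consonant sets are taken from the text, but the ordering within each list and the integer `motion_index` used to pick an entry are hypothetical, since the actual assignment of movements to consonants is given only in FIG. 7.

```python
from typing import List, Optional

# Consonant sets listed in the text; the order within each list is an assumption.
CONSONANTS_SWITCH_117_OFF = ["a", "k", "s", "t", "n", "h", "m", "y", "r"]
CONSONANTS_SWITCH_117_ON = ["y", "w", "g", "z", "d", "b", "p"]


def selectable_consonants(switch_117_on: bool) -> List[str]:
    """Return the consonant set that is selectable in the current mode."""
    return CONSONANTS_SWITCH_117_ON if switch_117_on else CONSONANTS_SWITCH_117_OFF


def designate_consonant(switch_117_on: bool, motion_index: int) -> Optional[str]:
    """Pick a consonant from the active set.

    motion_index is a hypothetical integer code for the detected movement.
    An index outside the active set designates nothing.
    """
    table = selectable_consonants(switch_117_on)
    if 0 <= motion_index < len(table):
        return table[motion_index]
    return None
```

With this arrangement the same movement can designate different consonants depending on whether the switch 117 is on, which is what lets a single mode switch stand in for the separate switches 115 and 116.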

FIG. 8 illustrates the structure of the gripped portion 11 according to another modification, and FIG. 9 illustrates the relationship between the movement of the gripped portion 11 and the designated consonant. The input device 10 may designate the consonant according to the operation of switches provided on the contact surface 11A and the vowel according to the movement of the input device 10. In this example, as shown in FIG. 8, switches 111 to 114 and 118 are provided on the contact surface 11A as switches for designating consonants. The on/off combinations of the four switches 111 to 113 and 118 make it possible to designate a total of 16 consonants, including unvoiced (seion), voiced (dakuon), and semi-voiced (handakuon) sounds. As in the embodiment described above, the switch 114 designates whether a contracted sound (yoon) is used. As shown in FIG. 9, in this modification a first designation unit 106 and a second designation unit 107 are provided in place of the first designation unit 102 and the second designation unit 104. The first designation unit 106 designates the consonant, out of the vowel and consonant of the lyrics, according to the operation state of the switches 111 to 113 and 118 detected by the operation detection unit 101. The second designation unit 107 designates the vowel, out of the vowel and consonant of the lyrics of the singing voice, according to the operation state of the switch 114 and the movement of the input device 10 detected by the motion detection unit 103. The transmission unit 105 transmits the consonant designated by the first designation unit 106 and the vowel designated by the second designation unit 107 to the singing synthesis control device 20. The configuration of the singing synthesis control device 20 may be the same as in the embodiment described above.
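The count of 16 follows from treating the four switches as four bits (2^4 = 16 on/off combinations). The sketch below shows one way such a combination could be turned into a table index. The bit order in `SWITCH_ORDER` and the function names are assumptions for illustration; the specification does not say which combination corresponds to which consonant, so no consonant table is given here.

```python
from itertools import combinations
from typing import Iterable, Set

# Switches used for the consonant; assigning them to bits 0..3 in this order is an assumption.
SWITCH_ORDER = (111, 112, 113, 118)


def switch_code(pressed: Iterable[int]) -> int:
    """Encode the on/off state of the four switches as an integer from 0 to 15."""
    pressed_set = set(pressed)
    code = 0
    for bit, switch in enumerate(SWITCH_ORDER):
        if switch in pressed_set:
            code |= 1 << bit
    return code


def all_codes() -> Set[int]:
    """Enumerate the codes of every on/off combination of the four switches."""
    codes = set()
    for size in range(len(SWITCH_ORDER) + 1):
        for subset in combinations(SWITCH_ORDER, size):
            codes.add(switch_code(subset))
    return codes
```

Switches outside `SWITCH_ORDER`, such as the switch 114, are ignored by `switch_code`, which matches the text in that the switch 114 affects the contracted-sound setting rather than the consonant.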

FIG. 10 is a flowchart showing the operation of the input device 10 and the singing synthesis control device 20 according to this modification. The flow of FIG. 10 is executed, for example, while the input device 10 and the singing synthesis control device 20 are powered on. In the input device 10, the first designation unit 106 determines, based on the detection result of the operation detection unit 101, whether at least one of the switches 111 to 113 and 118 has been pressed (step S31). If it determines that none of the switches is pressed (step S31; NO), the first designation unit 106 waits. If it determines that at least one of the switches 111 to 113 and 118 has been pressed (step S31; YES), the first designation unit 106 designates a consonant (step S32).

Next, the motion detection unit 103 detects the movement of the input device 10 (step S33). The second designation unit 107 designates a vowel according to the direction in which the input device 10 was moved, as detected by the motion detection unit 103 (step S34).

Next, the transmission unit 105 transmits the designated consonant and vowel to the singing synthesis control device 20 (step S35). After this transmission, the processing of the input device 10 returns to step S31. That is, while at least one of the switches 111 to 113 and 118 is pressed, the transmission unit 105 transmits the consonant and vowel to the singing synthesis control device 20.
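One pass through steps S31 to S35 on the input device side can be sketched as follows. This is a sketch only: the function name and the three callables passed in (`detect_motion`, `designate_consonant`, `designate_vowel`) are hypothetical stand-ins for the motion detection unit 103, the first designation unit 106, and the second designation unit 107.

```python
from typing import Callable, Iterable, Optional, Tuple

# Switches whose state is checked in step S31.
CONSONANT_SWITCHES = {111, 112, 113, 118}


def input_device_step(
    pressed: Iterable[int],
    detect_motion: Callable[[], object],
    designate_consonant: Callable[[set], str],
    designate_vowel: Callable[[object], str],
) -> Optional[Tuple[str, str]]:
    """Run steps S31 to S35 once and return the (consonant, vowel) to transmit, or None."""
    # Step S31: is at least one of the switches 111 to 113 and 118 pressed?
    active = CONSONANT_SWITCHES & set(pressed)
    if not active:
        return None  # S31; NO: wait, nothing is transmitted
    consonant = designate_consonant(active)  # step S32
    motion = detect_motion()                 # step S33
    vowel = designate_vowel(motion)          # step S34
    return (consonant, vowel)                # payload transmitted in step S35
```

Calling this function repeatedly corresponds to returning to step S31 after each transmission, so a pair is produced on every pass for as long as one of the switches stays pressed.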

3-2. Relationship between the movement of the input device 10 and consonants
The relationship between the direction of movement of the input device 10 and the designated consonant described in the above embodiment is only an example. For example, a three-axis orthogonal coordinate system may be defined, and a different consonant may be associated with each axis direction. The movement of the input device 10 is also not limited to shaking of the input device 10 and may be, for example, a change in the posture of the input device 10 (rotation or twisting). It suffices that the input device 10 is configured so that a consonant or vowel is designated according to its movement.
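As an illustration of the three-axis variant, the sketch below picks the axis with the largest movement component and looks up a consonant for that signed direction. Everything specific in it is an assumption: the table `AXIS_TO_CONSONANT`, the use of signed directions (six entries rather than three), the threshold value, and the idea that the movement arrives as three numeric components.

```python
from typing import Optional

# Hypothetical assignment of signed axis directions to consonants.
AXIS_TO_CONSONANT = {
    "+x": "k", "-x": "s",
    "+y": "t", "-y": "n",
    "+z": "h", "-z": "m",
}


def dominant_direction(ax: float, ay: float, az: float,
                       threshold: float = 0.5) -> Optional[str]:
    """Return the signed axis with the largest magnitude, or None if the motion is too small."""
    components = {"x": ax, "y": ay, "z": az}
    axis = max(components, key=lambda name: abs(components[name]))
    if abs(components[axis]) < threshold:
        return None
    sign = "+" if components[axis] > 0 else "-"
    return sign + axis


def consonant_from_motion(ax: float, ay: float, az: float) -> Optional[str]:
    """Map a movement vector to a consonant, or None when no direction is recognized."""
    direction = dominant_direction(ax, ay, az)
    if direction is None:
        return None
    return AXIS_TO_CONSONANT[direction]
```

The threshold keeps small unintended movements from designating a consonant; a posture-based variant (rotation or twisting) could use the same lookup with angular components in place of linear ones.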

3-3. Other modifications
The specific shape of the input device 10 is not limited to the one illustrated in the embodiment. For example, the input device 10 may be a rod-shaped device such as a guide light used for traffic control. The input device 10 also need not have a lighting function and may be, for example, a cane or a conductor's baton. The shape of the input device 10 is not limited to a rod shape, and the input device 10 may be a non-rod-shaped device such as a dumbbell or a device worn on a part of the user's body (for example, a glove-type device). The input device 10 may also be a portable device (for example, a smartphone). In this case, the input device 10 may detect the movement of the user's finger tracing the surface of the touch screen and designate a vowel or consonant according to that movement. It suffices that the direction in which the finger moves on the touch screen is associated with a vowel or consonant.
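For the touch screen variant, one possible way to associate finger direction with a vowel is to divide the full circle into equal angular sectors. The sketch below does this for five vowels. The assignment of sectors to vowels, the minimum swipe distance, and the coordinate convention (dx to the right, dy counterclockwise-positive) are all arbitrary choices made for illustration, not details from the specification.

```python
import math
from typing import Optional

# Hypothetical order of vowels around the circle, starting at the positive x direction.
VOWELS = ["a", "i", "u", "e", "o"]


def vowel_from_swipe(dx: float, dy: float, min_distance: float = 10.0) -> Optional[str]:
    """Map a finger displacement (dx, dy) to a vowel by angular sector.

    Returns None when the finger moved less than min_distance, so that
    a touch without a clear direction designates nothing.
    """
    if math.hypot(dx, dy) < min_distance:
        return None
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    width = 360.0 / len(VOWELS)  # 72 degrees per vowel
    # Shift by half a sector so that each sector is centered on its direction.
    sector = int(((angle + width / 2) % 360.0) // width) % len(VOWELS)
    return VOWELS[sector]
```

The same function could designate consonants instead by replacing the `VOWELS` list, since the text only requires that finger direction be associated with a vowel or a consonant.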

The input device 10 may detect its own movement only while a switch used for designating a vowel or consonant is pressed. This can be expected to reduce the power consumption of the input device 10 compared with the case where its movement is detected at all times.

The operators used for designating a vowel or consonant in the input device are not limited to momentary switches. Alternate-action (latching) switches may be used instead of, or in addition to, momentary switches. Alternatively, a lever, a slider, a dial, or the like may be used instead of, or in addition to, the switches.

The singing synthesis control device 20 need not have an appearance imitating an electronic keyboard instrument. It may have an appearance imitating another musical instrument such as a stringed instrument, a wind instrument, or another blown instrument, or it need not imitate a musical instrument at all. It suffices that the singing synthesis control device 20 has at least a function of controlling the synthesis of the singing voice. The number of operators included in the operation unit 21 may be any number equal to or greater than one.

Part of the configuration or operation of the input device 10 and the singing synthesis control device 20 described in the above embodiment may be omitted. For example, the input device 10 may be configured not to designate at least one of contracted sounds (yoon), voiced sounds (dakuon), and semi-voiced sounds (handakuon).

1 ... speech synthesizer, 10 ... input device, 101 ... operation detection unit, 102 ... first designation unit, 103 ... motion detection unit, 104 ... second designation unit, 105 ... transmission unit, 106 ... first designation unit, 107 ... second designation unit, 11 ... gripped portion, 11A ... contact surface, 111 to 118 ... switches, 12 ... light emitting unit, 20 ... singing synthesis control device, 201 ... receiving unit, 202 ... operation detection unit, 203 ... determination unit, 204 ... synthesis instruction unit, 205 ... voice synthesis unit, 206 ... voice output unit, 21 ... operation unit, 211 ... operator, 30 ... cable.

Claims

An input device comprising:
a first designation unit that designates one of a vowel and a consonant of lyrics of a singing voice to be synthesized by a singing synthesis control device, according to an operation on an operator;
a second designation unit that designates the other of the vowel and the consonant according to movement of the input device itself; and
a transmission unit that transmits the designated vowel and consonant to the singing synthesis control device.

The input device according to claim 1, further comprising a gripped portion having a contact surface that comes into contact with the user's fingers in use,
wherein the operator is provided on the contact surface of the gripped portion.

The input device according to claim 1 or 2, wherein the second designation unit designates the other of the vowel and the consonant according to the direction in which the input device is moved.

The input device according to claim 1 or 2, wherein the input device is a device separate from the singing synthesis control device, the singing synthesis control device having an operator for inputting a pitch.

A speech synthesizer comprising:
the input device according to any one of claims 1 to 4; and
a singing synthesis control device,
wherein the singing synthesis control device includes:
a receiving unit that receives the designated vowel and consonant from the input device;
one or more operators;
an operation detection unit that detects an operation on the one or more operators;
a determination unit that determines a pitch according to the operator whose operation is detected by the operation detection unit; and
a voice synthesis unit that generates a synthesized voice having the vowel and consonant received by the receiving unit and the pitch determined by the determination unit.

An input method comprising:
a step in which an input device designates one of a vowel and a consonant of lyrics of a singing voice to be synthesized by a singing synthesis control device, according to an operation on an operator;
a step in which the input device designates the other of the vowel and the consonant according to movement of the input device itself; and
a step in which the input device transmits the designated vowel and consonant to the singing synthesis control device.

A program for causing a computer to function as:
a first designation unit that designates one of a vowel and a consonant of lyrics of a singing voice to be synthesized by a singing synthesis control device, according to an operation on an operator;
a second designation unit that designates the other of the vowel and the consonant according to movement of the computer's own device; and
a transmission unit that transmits the designated vowel and consonant to the singing synthesis control device.