JP6089496B2 - Sound generation apparatus and sound generation method

Info

Publication number: JP6089496B2
Application number: JP2012184938A
Authority: JP
Inventors: 英治赤澤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2012-08-24
Filing date: 2012-08-24
Publication date: 2017-03-08
Anticipated expiration: 2032-08-24
Also published as: JP2014044230A

Description

The present invention relates to a technique for generating an acoustic signal in accordance with an instruction from a user.

Techniques for generating an acoustic signal in accordance with an instruction from a user have been proposed. For example, Patent Document 1 discloses a technique that provides the user with low-quality trial audio data synthesized in response to the user's instruction, and provides high-quality audio data for purchase on condition that a predetermined billing process is executed.

Patent Document 1: JP 2004-295645 A

In the technique of Patent Document 1, the audio data provided to the user before the billing process is executed is limited to low-quality audio data, and the limitation is lifted on condition that the billing process is executed. However, for a user who does not need high-quality audio data, for example, the trial audio data is sufficient, so in practice it is difficult to give the user a sufficient incentive to execute the billing process (and thus to achieve effective monetization). On the other hand, providing extremely low-quality audio data for trial listening may instead reduce the user's willingness to carry out the purchase procedure. In view of these circumstances, an object of the present invention is to restrict the acoustic signal generation function effectively while maintaining an effective incentive to lift that restriction.

To solve the above problem, a sound generation device according to a first aspect of the present invention includes control means that selects a first operation mode or a second operation mode, and sound generation means that, in the first operation mode, generates an acoustic signal whose pronunciation content follows an instruction from a user and, in the second operation mode, generates an acoustic signal whose pronunciation content is set with a lower degree of freedom than in the first operation mode. With this configuration, the pronunciation content of the acoustic signal in the second operation mode is restricted relative to the first operation mode. Compared with the technique of Patent Document 1, which provides the user with low-quality audio data for trial listening, it is therefore possible to restrict the acoustic signal generation function effectively in the second operation mode while giving the user an incentive to move to the first operation mode.

The degree of freedom in setting the pronunciation content means the degree to which the pronunciation content can be changed in accordance with an instruction from the user (the degree to which the user can freely set the pronunciation content). The higher the degree of freedom, the more strongly the user's instruction is reflected in the pronunciation content; the lower the degree of freedom, the less it is reflected. A state in which the pronunciation content is set regardless of the user's instruction (a state in which the pronunciation content does not depend on the user's instruction) corresponds to the lowest degree of freedom. As understood from the above, the second operation mode encompasses both an operation mode in which the user's instruction is reflected in the pronunciation content to a smaller degree than in the first operation mode and an operation mode in which the user's instruction is not reflected in the pronunciation content at all.

In a preferred aspect of the present invention, the sound generation means generates, in the second operation mode, an acoustic signal whose pronunciation content has fewer kinds of phonetic codes (phonetic characters or phoneme symbols) than in the first operation mode. Specific examples of this aspect are described later as, for example, the first to third embodiments. For example, a configuration that generates an acoustic signal whose pronunciation content is restricted to phonetic codes corresponding to the user's attribute information (e.g. the second embodiment described later) and a configuration that generates an acoustic signal whose pronunciation content is restricted to phonetic codes corresponding to the user's position information (e.g. the third embodiment described later) are suitable.

Other configurations may also be adopted. In one, the sound generation means generates, in the second operation mode, an acoustic signal whose pronunciation content is set independently of the user's instruction, that is, selected regardless of the instruction (e.g. the fourth embodiment described later). In another, the sound generation means generates, in the second operation mode, an acoustic signal whose pronunciation content is obtained by replacing the phonetic code indicated by the user's instruction with a specific phonetic code (alternative code) (e.g. the first and fifth embodiments described later). The sound generation means may also generate, in the second operation mode, an acoustic signal whose pronunciation content is restricted in the total number of phonetic codes compared with the first operation mode.

A sound generation device according to a second aspect of the present invention includes control means that selects a first operation mode or a second operation mode, and sound generation means that, in the first operation mode, generates an acoustic signal whose pitch follows an instruction from a user and, in the second operation mode, generates an acoustic signal whose pitch is set with a lower degree of freedom than in the first operation mode. With this configuration, the pitch of the acoustic signal in the second operation mode is restricted relative to the first operation mode. Compared with the technique of Patent Document 1, which provides the user with low-quality audio data for trial listening, it is therefore possible to restrict the acoustic signal generation function effectively in the second operation mode while giving the user an incentive to move to the first operation mode.

The degree of freedom in setting the pitch means the degree to which the pitch can be changed in accordance with an instruction from the user (the degree to which the user can freely set the pitch). The higher the degree of freedom, the more strongly the user's instruction is reflected in the pitch; the lower the degree of freedom, the less it is reflected. A state in which the pitch is set regardless of the user's instruction (a state in which the pitch does not depend on the user's instruction) corresponds to the lowest degree of freedom. As understood from the above, the second operation mode encompasses both an operation mode in which the user's instruction is reflected in the pitch to a smaller degree than in the first operation mode and an operation mode in which the user's instruction is not reflected in the pitch at all.

Specifically, suitable configurations in the second operation mode include one in which the sound generation means generates an acoustic signal with fewer kinds of pitches than in the first operation mode (e.g. the seventh embodiment described later) and one in which it generates an acoustic signal whose pitches are set independently of the user's instruction (e.g. the eighth embodiment described later). For example, a configuration that generates an acoustic signal of a melody produced by automatic composition processing, or one that generates an acoustic signal of a melody prepared in advance, may be adopted.

In a preferred example of the sound generation device according to the first or second aspect, the sound generation means makes the degree of restriction on acoustic signal generation in the second operation mode differ between a first period and a second period that are different from each other. With this aspect, the user can be given an effective incentive to move to the first operation mode.

The sound generation device according to each of the above aspects may be realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to generating acoustic signals, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. The program of the present invention may be provided stored on a computer-readable recording medium and installed on a computer, or provided by distribution over a communication network and installed on a computer.

A program according to the first aspect of the present invention causes a computer to execute a control process that selects a first operation mode or a second operation mode, and a sound generation process that, in the first operation mode, generates an acoustic signal whose pronunciation content follows an instruction from a user and, in the second operation mode, generates an acoustic signal whose pronunciation content is set with a lower degree of freedom than in the first operation mode. A program according to the second aspect of the present invention causes a computer to execute a control process that selects a first operation mode or a second operation mode, and a sound generation process that, in the first operation mode, generates an acoustic signal whose pitch follows an instruction from a user and, in the second operation mode, generates an acoustic signal whose pitch is set with a lower degree of freedom than in the first operation mode.

FIG. 1 is a block diagram of a sound generation device according to a first embodiment of the present invention.
FIG. 2 is a schematic diagram of control information.
FIG. 3 is a block diagram of a sound generation unit.
FIG. 4 is a block diagram of the sound generation device of a third embodiment.

<First Embodiment>
FIG. 1 is a block diagram of a sound generation device 100 according to the first embodiment of the present invention. The sound generation device 100 of the first embodiment is a speech synthesizer that generates an acoustic signal V of a singing voice by concatenative (segment-connecting) speech synthesis. As shown in FIG. 1, it is realized by a computer system including an arithmetic processing device 10, a storage device 12, a communication device 14, an input device 16, and a sound emitting device 18. The sound generation device 100 is realized by, for example, a stationary information processing device (personal computer) or a portable information processing device (e.g. a mobile phone or smartphone).

The communication device 14 communicates via a communication network (e.g. the Internet). Billing for the user is executed through communication between the communication device 14 and a billing processing device (not shown) over the communication network. The input device 16 accepts instructions from the user and includes, for example, a plurality of controls operated by the user. The sound emitting device 18 (e.g. headphones or a speaker) emits sound waves corresponding to the acoustic signal V generated by the arithmetic processing device 10.

The storage device 12 stores a sound generation program (application software) PGM executed by the arithmetic processing device 10 and various data used by the arithmetic processing device 10 (a speech segment group G and control information C). A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of multiple recording media, is employed as the storage device 12.

The speech segment group G is a set of speech segments (a speech synthesis library) used as material for the acoustic signal V. A speech segment is either a phoneme (e.g. a vowel or a consonant), which is the minimum unit for distinguishing linguistic meaning, or a phoneme chain (e.g. a diphone or a triphone) in which multiple phonemes are connected.

The control information C designates a melody expressed as a time series of notes. As shown in FIG. 2, the control information C of the first embodiment is a time series of note information N, one item for each note in the piece of music. Each item of note information N includes pitch information X1, period information X2, and pronunciation information X3. The pitch information X1 designates the pitch of the note (a number corresponding to the pitch). The period information X2 designates the sounding period of the note, for example by the start time and duration of the note. The sounding period may instead be defined by the start time and end time of the note. The pronunciation information X3 designates the phonetic code of the note, for example a phonetic character (grapheme) or a phoneme symbol.
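As a non-limiting illustration, the control information C could be represented in software as follows. This is a minimal sketch, not the implementation of the embodiment: the names NoteInfo, pitch, start, duration and phonetic, the MIDI-style note numbering, and the use of seconds as the time unit are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NoteInfo:
    """One item of note information N (field names are illustrative assumptions)."""
    pitch: int        # X1: number identifying the pitch (MIDI-style numbering assumed)
    start: float      # X2: start time of the note, in seconds (unit assumed)
    duration: float   # X2: duration of the note, in seconds (unit assumed)
    phonetic: str     # X3: phonetic code (phonetic character or phoneme symbol)

    @property
    def end(self) -> float:
        """Alternative form of X2: end time derived from start time and duration."""
        return self.start + self.duration


# Control information C: a time series of note information N.
ControlInfo = List[NoteInfo]

control_info: ControlInfo = [
    NoteInfo(pitch=60, start=0.0, duration=0.5, phonetic="do"),
    NoteInfo(pitch=62, start=0.5, duration=0.5, phonetic="re"),
    NoteInfo(pitch=64, start=1.0, duration=1.0, phonetic="mi"),
]
```

The `end` property reflects the remark that the sounding period can equally be defined by a start time and an end time.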

By executing the sound generation program PGM stored in the storage device 12, the arithmetic processing device 10 realizes a plurality of functions (an operation control unit 22 and a sound generation unit 24) for generating the acoustic signal V. A configuration in which the functions of the arithmetic processing device 10 are distributed over multiple integrated circuits, or one in which a dedicated electronic circuit (DSP) realizes some of the functions of the arithmetic processing device 10, may also be adopted.

The operation control unit 22 controls the operation mode of the sound generation device 100. Specifically, the operation control unit 22 can select either the first operation mode or the second operation mode. The first operation mode is the mode in which the user makes regular use of the sound generation device 100 (the sound generation program PGM). The second operation mode is the mode in which the user tries out the sound generation device 100 (trial mode, experience mode). In the initial state immediately after the sound generation program PGM is stored in the storage device 12, the operation control unit 22 selects the second operation mode. Then, triggered by an instruction from the user to the input device 16, a predetermined purchase process such as an authentication process and a billing process is executed between the communication device 14 and the billing processing device (billing server device). When the purchase process completes properly, the operation control unit 22 shifts the operation mode from the second operation mode to the first operation mode. The content of the purchase process is arbitrary. For example, the sound generation device 100 may execute the purchase process by itself without communicating with the billing processing device.
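The mode transition described above can be sketched as a small state holder. This is only an illustrative sketch: the class OperationController, the method request_purchase, and the idea of passing the purchase process as a callable that returns True on proper completion are assumptions made here, since the description leaves the content of the purchase process arbitrary.

```python
from enum import Enum
from typing import Callable


class OperationMode(Enum):
    FIRST = 1    # regular use
    SECOND = 2   # trial use


class OperationController:
    """Sketch of the operation control unit 22 (names are hypothetical)."""

    def __init__(self) -> None:
        # Initial state right after the program is stored: second (trial) mode.
        self.mode = OperationMode.SECOND

    def request_purchase(self, purchase_process: Callable[[], bool]) -> OperationMode:
        """Run the purchase process; shift to the first mode only if it completes properly."""
        if self.mode is OperationMode.SECOND and purchase_process():
            self.mode = OperationMode.FIRST
        return self.mode


controller = OperationController()
controller.request_purchase(lambda: False)  # failed purchase: stays in the second mode
controller.request_purchase(lambda: True)   # completed purchase: shifts to the first mode
```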

The sound generation unit 24 generates the acoustic signal V using the speech segment group G and the control information C stored in the storage device 12. FIG. 3 is a block diagram of the sound generation unit 24 of the first embodiment. As shown in FIG. 3, the sound generation unit 24 includes an information editing unit 32 and a synthesis processing unit 34.

By operating the input device 16 as appropriate, the user can specify the pitch information X1, the period information X2, and the pronunciation information X3 for each note. The information editing unit 32 in FIG. 3 generates and edits the control information C (pitch information X1, period information X2, and pronunciation information X3) in response to instructions from the user to the input device 16.

The information editing unit 32 of the first embodiment makes the range of phonetic codes (phonetic characters or phoneme symbols) that can be specified in the pronunciation information X3 differ between the first operation mode and the second operation mode. Specifically, the information editing unit 32 restricts the phonetic codes that can be set in the pronunciation information X3 in the second operation mode relative to those that can be set in the first operation mode. That is, in the second operation mode, the user is permitted to choose as candidates for the pronunciation information X3 only phonetic codes within a range QA (hereinafter "pronunciation restriction range") corresponding to a specific subset of the speech segments in the speech segment group G. In the first operation mode, the candidates the user can specify as the pronunciation information X3 are expanded to the phonetic codes corresponding to all speech segments in the speech segment group G. For example, the range of phonetic codes corresponding to pitch names (do, re, mi, fa, so, la, si, do) is suitable as the pronunciation restriction range QA. The range within which the user can specify the pitch information X1 and the period information X2 is common to the first and second operation modes.

Specifically, the information editing unit 32 determines whether the phonetic code specified by the user falls within the pronunciation restriction range QA. If it does, the information editing unit 32 sets that phonetic code in the pronunciation information X3. If it does not, the information editing unit 32 replaces the phonetic code specified by the user with a specific phonetic code (hereinafter "alternative code"). The alternative code is a phonetic code selected in advance, independently of any instruction from the user. The speech segment corresponding to the alternative code is included in the speech segment group G. For example, a phonetic code common in song lyrics such as "la", or an onomatopoeic phonetic code such as "pii", is suitable as the alternative code. A silent phonetic code may also be used as the alternative code.
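The determination and replacement described above can be sketched as a single function. This is a minimal sketch under stated assumptions: the romanized syllables in RESTRICTION_RANGE_QA, the choice of "la" as the alternative code, and the function name edit_phonetic_code are illustrative and are not taken from the embodiment.

```python
from typing import FrozenSet

# Hypothetical pronunciation restriction range QA: romanized pitch-name syllables.
RESTRICTION_RANGE_QA: FrozenSet[str] = frozenset(
    {"do", "re", "mi", "fa", "so", "la", "si"}
)

# Hypothetical alternative code, chosen in advance and not dependent on the user.
ALTERNATIVE_CODE = "la"


def edit_phonetic_code(
    requested: str,
    first_mode: bool,
    allowed: FrozenSet[str] = RESTRICTION_RANGE_QA,
    alternative: str = ALTERNATIVE_CODE,
) -> str:
    """Return the phonetic code to store in the pronunciation information X3.

    In the first operation mode the restriction is lifted and the requested code
    is used as is. In the second operation mode a code outside the restriction
    range is replaced with the alternative code.
    """
    if first_mode or requested in allowed:
        return requested
    return alternative


lyrics = ["mi", "ka", "so"]
trial = [edit_phonetic_code(code, first_mode=False) for code in lyrics]
full = [edit_phonetic_code(code, first_mode=True) for code in lyrics]
```

With these assumed values, `trial` keeps "mi" and "so" but substitutes "la" for "ka", while `full` leaves the lyrics unchanged.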

The synthesis processing unit 34 generates the acoustic signal V using the speech segment group G and the control information C in the storage device 12. Specifically, the synthesis processing unit 34 generates the acoustic signal V of a voice (singing voice) that utters each note, defined by the pitch designated by the pitch information X1 and the sounding period designated by the period information X2, with the phonetic code corresponding to the pronunciation information X3 of the control information C. More specifically, the synthesis processing unit 34 sequentially selects from the speech segment group G the speech segments corresponding to the pronunciation information X3 of each item of note information N in the control information C, adjusts them to the pitch of the pitch information X1 and the sounding period of the period information X2, and connects the adjusted speech segments to one another to generate the acoustic signal V. The acoustic signal V generated by the synthesis processing unit 34 is supplied to the sound emitting device 18 and reproduced as sound waves. Any known speech synthesis technique may be employed to generate the acoustic signal V from the control information C.

In the first operation mode, the pronunciation information X3 is specified from candidates consisting of the phonetic codes corresponding to all speech segments in the speech segment group G. In the second operation mode, the phonetic codes that are candidates for the pronunciation information X3 are restricted to the pronunciation restriction range QA. The sound generation unit 24 of the first embodiment therefore functions as an element that generates, in the second operation mode, an acoustic signal V with fewer kinds of phonetic codes (phonetic symbols or phoneme symbols) than in the first operation mode. In other words, the restriction on phonetic codes in the second operation mode (the pronunciation restriction range QA) is lifted in the first operation mode.

As described above, in the first embodiment the pronunciation content of the acoustic signal V in the second operation mode (the number of kinds of phonetic codes) is restricted relative to the first operation mode. Compared with the technique of Patent Document 1, which provides the user with low-quality audio data for trial listening, the function of the sound generation device 100 is therefore restricted effectively, and the user can be given a sufficient incentive to shift from the second operation mode to the first operation mode by executing the purchase process (that is, to lift the restriction on the function of generating the acoustic signal V).

<Second Embodiment>
A second embodiment of the present invention is described below. In the first embodiment, the pronunciation restriction range QA of the phonetic codes that are candidates for the pronunciation information X3 in the second operation mode is set in advance. In the second embodiment, the pronunciation restriction range QA is set variably. In each of the embodiments illustrated below, elements whose operation and function are equivalent to those of the first embodiment are denoted by the reference signs used in the first embodiment, and their detailed description is omitted as appropriate.

The storage device 12 of the second embodiment stores attribute information of the user. The attribute information includes, for example, personal information such as the user's name and information specifying the user's hobbies and preferences. The information editing unit 32 variably sets the pronunciation restriction range QA in accordance with the user's attribute information stored in the storage device 12. For example, a configuration that sets the pronunciation restriction range QA so as to include the phonetic codes (phonetic characters or phoneme symbols) of the user's name indicated by the attribute information, or one that sets it so as to include the phonetic codes of words related to the user's hobbies and preferences indicated by the attribute information, is suitably adopted. In the first operation mode, as in the first embodiment, the restriction of the pronunciation restriction range QA is lifted.
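One way to picture the variable setting of the pronunciation restriction range QA from attribute information is shown below. This is a sketch under stated assumptions: the helper to_phonetic_codes, which simply treats each whitespace-separated token as one phonetic code, and the function restriction_range_from_attributes are hypothetical, and the name and hobby words are made-up sample data.

```python
from typing import FrozenSet, Iterable, List


def to_phonetic_codes(word: str) -> List[str]:
    """Hypothetical splitter: each whitespace-separated token is one phonetic code."""
    return word.split()


def restriction_range_from_attributes(
    name: str, interest_words: Iterable[str]
) -> FrozenSet[str]:
    """Build a restriction range QA covering the user's name and interest words."""
    codes = set(to_phonetic_codes(name))
    for word in interest_words:
        codes.update(to_phonetic_codes(word))
    return frozenset(codes)


# Made-up attribute information: a name and one hobby-related word.
qa = restriction_range_from_attributes("ya ma da", ["gi ta a"])
```

The resulting set can then be used as the permitted range when a phonetic code specified in the second operation mode is checked.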

Generation of the acoustic signal V using the speech segment group G and the control information C is the same as in the first embodiment. The sound generation unit 24 of the second embodiment therefore functions as an element that generates, in the second operation mode, an acoustic signal V whose pronunciation content is restricted to phonetic codes corresponding to the user's attribute information. As understood from the above, in the second embodiment as well, the pronunciation content of the acoustic signal V in the second operation mode is restricted relative to the first operation mode, so the same effect as in the first embodiment is achieved. In addition, because the pronunciation restriction range QA is set variably in accordance with the user's attribute information, the second embodiment has the advantage that an acoustic signal V whose pronunciation content reflects the user's attribute information can be generated within the limits of the pronunciation restriction range QA.

<Third Embodiment>
FIG. 4 is a block diagram of the sound generation device 100 of the third embodiment. As shown in FIG. 4, the sound generation device 100 of the third embodiment has a configuration in which a position detection device 40 is added to the first embodiment. The position detection device 40 detects the position of the sound generation device 100 (the position of the user). For example, the position detection device 40 receives radio waves from a plurality of GPS (Global Positioning System) satellites and generates position information.

As in the second embodiment, the information editing unit 32 of the third embodiment variably sets the pronunciation restriction range QA of the phonetic codes that are candidates for the pronunciation information X3 in the second operation mode. Specifically, the information editing unit 32 variably sets the pronunciation restriction range QA in accordance with the position information generated by the position detection device 40. For example, a configuration that sets the pronunciation restriction range QA so as to include the phonetic codes of words related to the position indicated by the position information (e.g. place names or facility names) is suitable. In the first operation mode, as in the first embodiment, the restriction of the pronunciation restriction range QA is lifted.
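The position-dependent setting of the pronunciation restriction range QA can be sketched as a nearest-place lookup. The following is only an illustration: the table PLACES, its approximate coordinates and romanized syllables, the function name, and the crude planar distance in degrees are all assumptions made here, and the embodiment does not prescribe how words are associated with a position.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical lookup table: place-name codes keyed by approximate (latitude, longitude).
PLACES: Dict[Tuple[float, float], List[str]] = {
    (34.71, 137.73): ["ha", "ma", "ma", "tsu"],
    (35.68, 139.77): ["to", "o", "kyo", "o"],
}


def restriction_range_from_position(lat: float, lon: float) -> FrozenSet[str]:
    """Return a restriction range QA holding the codes of the nearest listed place name."""

    def squared_distance(place: Tuple[float, float]) -> float:
        # Crude planar distance in degrees; adequate only for this toy table.
        return (place[0] - lat) ** 2 + (place[1] - lon) ** 2

    nearest = min(PLACES, key=squared_distance)
    return frozenset(PLACES[nearest])


qa_here = restriction_range_from_position(34.7, 137.7)
```

Moving the device to a different position changes `qa_here`, which matches the point that different pronunciation content becomes available depending on where the user is.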

音声素片群Ｇと制御情報Ｃとを利用した音響信号Ｖの生成は第１実施形態と同様である。したがって、第３実施形態の音響生成部２４は、利用者の位置情報に対応する音声符号に制限された発音内容の音響信号Ｖを第２動作モードにて生成する要素として機能する。以上の説明から理解されるように、第３実施形態においても、第２動作モードでの音響信号Ｖの発音内容が第１動作モードと比較して制限されるから、第１実施形態と同様の効果が実現される。また、第３実施形態では、利用者の位置情報に応じて発音制限範囲ＱAが可変に設定されるから、発音制限範囲ＱAの制限の範囲内で利用者の位置情報を反映した発音内容の音響信号Ｖを生成できるという利点がある。また、利用者の位置に応じて相異なる発音内容の音響信号Ｖを生成できるという興趣性を利用者に提供することも可能である。 The acoustic signal V is generated from the speech element group G and the control information C in the same manner as in the first embodiment. Accordingly, the sound generation unit 24 of the third embodiment functions as an element that generates, in the second operation mode, an acoustic signal V whose pronunciation content is limited to the speech codes corresponding to the user's position information. As understood from the above description, in the third embodiment as well, the pronunciation content of the acoustic signal V in the second operation mode is restricted compared with the first operation mode, so the same effects as in the first embodiment are achieved. In addition, because the pronunciation restriction range QA is set variably according to the user's position information, the third embodiment has the advantage that an acoustic signal V whose pronunciation content reflects the user's position information can be generated within the limits of the pronunciation restriction range QA. It is also possible to offer the user the added interest of generating acoustic signals V with different pronunciation content depending on the user's position.
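The position-dependent restriction of the third embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the gazetteer entries, the 50 km radius, and the function names are all assumptions introduced here, and the coordinates are approximate.

```python
import math

# Hypothetical gazetteer: (place name, latitude, longitude, kana speech codes).
GAZETTEER = [
    ("Tokyo", 35.6812, 139.7671, ["ト", "ウ", "キョ", "ウ"]),
    ("Osaka", 34.6937, 135.5023, ["オ", "オ", "サ", "カ"]),
    ("Hamamatsu", 34.7108, 137.7261, ["ハ", "マ", "マ", "ツ"]),
]


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula, mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def restriction_range_qa(lat, lon, radius_km=50.0):
    """Collect the speech codes of every listed place within radius_km of the position."""
    qa = set()
    for _name, plat, plon, codes in GAZETTEER:
        if distance_km(lat, lon, plat, plon) <= radius_km:
            qa.update(codes)
    return qa


def code_is_selectable(code, qa, first_mode):
    """The first operation mode lifts the restriction; the second enforces QA."""
    return first_mode or code in qa
```

A device located in central Tokyo would then offer only the codes of nearby place names in the second operation mode, while the first operation mode accepts any code.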

＜第４実施形態＞ <Fourth Embodiment>
第１実施形態から第３実施形態では、発音制限範囲ＱA内で利用者からの指示に応じて発音情報Ｘ3を設定した。第４実施形態の情報編集部３２は、第２動作モードにおいて、利用者からの指示に対して非依存な音声符号（発音文字や音素記号）を発音情報Ｘ3として指定する。 In the first to third embodiments, the pronunciation information X3 is set within the pronunciation restriction range QA in response to an instruction from the user. In the second operation mode, the information editing unit 32 of the fourth embodiment instead specifies, as the pronunciation information X3, speech codes (phonetic characters or phoneme symbols) that do not depend on any instruction from the user.

具体的には、情報編集部３２は、制御情報Ｃ内の各音符情報Ｎの発音情報Ｘ3をランダムに設定する。例えば、情報編集部３２は、事前に用意された複数の候補から単語を順次にランダムに選択し、各単語に対応する音声符号を順番に各発音情報Ｘ3に割当てる。発音情報Ｘ3の候補を発音制限範囲ＱA内の音声符号に制限するか否かは不問である。また、事前に用意された文章に対応する音声符号を順番に各発音情報Ｘ3に割当てる構成も採用され得る。第１動作モードでは、第１実施形態と同様に、利用者が指示した任意の音声符号が発音情報Ｘ3として指定され得る。 Specifically, the information editing unit 32 randomly sets the pronunciation information X3 of each note information N in the control information C. For example, the information editing unit 32 sequentially selects a word from a plurality of candidates prepared in advance at random, and assigns a speech code corresponding to each word to each pronunciation information X3 in order. It does not matter whether or not the candidates for the pronunciation information X3 are limited to speech codes within the pronunciation restriction range QA. A configuration may also be employed in which speech codes corresponding to sentences prepared in advance are assigned to each pronunciation information X3 in order. In the first operation mode, as in the first embodiment, any voice code designated by the user can be designated as the pronunciation information X3.

音声素片群Ｇと制御情報Ｃとを利用した音響信号Ｖの生成は第１実施形態と同様である。したがって、第４実施形態の音響生成部２４は、利用者からの指示に対して非依存に設定された発音内容（すなわち利用者からの指示とは無関係に選定された発音内容）の音響信号Ｖを生成する要素として機能する。以上の説明から理解される通り、第４実施形態においても、第２動作モードでの音響信号Ｖの発音内容が第１動作モードと比較して制限されるから、第１実施形態と同様の効果が実現される。 The acoustic signal V is generated from the speech element group G and the control information C in the same manner as in the first embodiment. Accordingly, the sound generation unit 24 of the fourth embodiment functions as an element that generates an acoustic signal V whose pronunciation content is set independently of the user's instruction (that is, pronunciation content selected regardless of what the user instructs). As understood from the above description, in the fourth embodiment as well, the pronunciation content of the acoustic signal V in the second operation mode is restricted compared with the first operation mode, so the same effects as in the first embodiment are achieved.
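The random assignment described for the fourth embodiment might look like the sketch below. The candidate word list and the function name are hypothetical, and truncating the last word when the notes run out is a choice made here, not something the description specifies.

```python
import random

# Hypothetical candidate words, each given as a list of kana speech codes.
CANDIDATE_WORDS = [["サ", "ク", "ラ"], ["ウ", "ミ"], ["ソ", "ラ"]]


def random_pronunciations(num_notes, rng):
    """Return one speech code per note, drawn word by word at random.

    The user's input is never consulted. The last word is cut off if it
    overruns the number of notes.
    """
    codes = []
    while len(codes) < num_notes:
        codes.extend(rng.choice(CANDIDATE_WORDS))
    return codes[:num_notes]
```

Passing a seeded `random.Random` instance keeps the sketch reproducible; a product would more likely draw from an unseeded source.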

＜第５実施形態＞ <Fifth Embodiment>
第５実施形態の情報編集部３２は、第１動作モードおよび第２動作モードの双方において、利用者が任意の音声符号を発音情報Ｘ3として指定することが可能である。ただし、第２動作モードでは、情報編集部３２は、利用者からの指示に応じて設定された発音情報Ｘ3の一部を、利用者からの指示に非依存な代替符号（例えば第１実施形態で例示した「ラ」や「ピー」等の音声符号）に置換する。例えば、利用者が指示した複数の発音情報Ｘ3の時系列からランダムに選択された各発音情報Ｘ3が代替符号に置換される。第１動作モードでは、第１実施形態と同様に、利用者が指示した任意の音声符号が発音情報Ｘ3として指定される。 In the fifth embodiment, the information editing unit 32 allows the user to specify an arbitrary speech code as the pronunciation information X3 in both the first operation mode and the second operation mode. In the second operation mode, however, the information editing unit 32 replaces part of the pronunciation information X3 set according to the user's instruction with a substitute code that does not depend on the user's instruction (for example, a speech code such as 「ラ」 ("la") or 「ピー」 ("pii"), as given by way of example in the first embodiment). For example, pieces of pronunciation information X3 selected at random from the time series of pronunciation information X3 specified by the user are each replaced with the substitute code. In the first operation mode, as in the first embodiment, any speech code specified by the user is used as the pronunciation information X3.

音声素片群Ｇと制御情報Ｃとを利用した音響信号Ｖの生成は第１実施形態と同様である。したがって、第５実施形態の音響生成部２４は、第２動作モードにおいて、利用者が指示した音声符号を特定の音声符号（代替符号）に置換した発音内容の音響信号Ｖを生成する要素として機能する。以上の説明から理解される通り、第５実施形態においても、第２動作モードでの音響信号Ｖの発音内容が第１動作モードと比較して制限されるから、第１実施形態と同様の効果が実現される。 The acoustic signal V is generated from the speech element group G and the control information C in the same manner as in the first embodiment. Accordingly, the sound generation unit 24 of the fifth embodiment functions as an element that generates, in the second operation mode, an acoustic signal V whose pronunciation content is obtained by replacing speech codes specified by the user with a specific speech code (the substitute code). As understood from the above description, in the fifth embodiment as well, the pronunciation content of the acoustic signal V in the second operation mode is restricted compared with the first operation mode, so the same effects as in the first embodiment are achieved.
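One way to picture the substitution of the fifth embodiment is the sketch below. The replacement ratio, the default substitute code, and the function name are assumptions for illustration; the description only states that randomly selected pieces of pronunciation information are replaced.

```python
import random


def mask_pronunciations(codes, ratio, rng, substitute="ラ"):
    """Replace a random subset of the user's speech codes with a substitute code.

    ratio is the fraction of notes to replace and must lie between 0 and 1
    (a hypothetical parameter; the fraction is rounded down to whole notes).
    """
    count = int(len(codes) * ratio)
    masked = set(rng.sample(range(len(codes)), count))
    return [substitute if i in masked else code for i, code in enumerate(codes)]
```

In the first operation mode the user's codes would simply be passed through unchanged, which corresponds to a ratio of 0 here.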

＜第６実施形態＞ <Sixth Embodiment>
第６実施形態の情報編集部３２は、制御情報Ｃで指定可能な音符数の最大値（以下「最大音符数」という）Ｍを第１動作モードと第２動作モードとで相違させる。具体的には、第２動作モードでの最大音符数Ｍ2は第１動作モードでの最大音符数Ｍ1を下回る。すなわち、第２動作モードでは第１動作モードと比較して音響信号Ｖの最大音符数Ｍが制限される。音高情報Ｘ1と期間情報Ｘ2と発音情報Ｘ3とは音符毎に設定されるから、音高情報Ｘ1で指定される音高の総数や発音情報Ｘ3で指定される音声符号の総数も第２動作モードでは第１動作モードと比較して制限される。 The information editing unit 32 of the sixth embodiment makes the maximum number of notes that can be specified in the control information C (hereinafter the "maximum note count") M differ between the first operation mode and the second operation mode. Specifically, the maximum note count M2 in the second operation mode is lower than the maximum note count M1 in the first operation mode. That is, the maximum note count M of the acoustic signal V is restricted in the second operation mode compared with the first operation mode. Because the pitch information X1, the period information X2, and the pronunciation information X3 are set for each note, the total number of pitches specified by the pitch information X1 and the total number of speech codes specified by the pronunciation information X3 are likewise restricted in the second operation mode compared with the first operation mode.

音声素片群Ｇと制御情報Ｃとを利用した音響信号Ｖの生成は第１実施形態と同様である。したがって、第６実施形態の音響生成部２４は、第２動作モードにおいて、音声符号の総数（最大音符数Ｍ）が第１動作モードと比較して制限された発音内容の音響信号Ｖを生成する要素として機能する。以上の説明から理解される通り、第６実施形態においても、第２動作モードでの音響信号Ｖの発音内容が第１動作モードと比較して制限されるから、第１実施形態と同様の効果が実現される。 The acoustic signal V is generated from the speech element group G and the control information C in the same manner as in the first embodiment. Accordingly, the sound generation unit 24 of the sixth embodiment functions as an element that generates, in the second operation mode, an acoustic signal V whose pronunciation content is restricted in the total number of speech codes (the maximum note count M) compared with the first operation mode. As understood from the above description, in the sixth embodiment as well, the pronunciation content of the acoustic signal V in the second operation mode is restricted compared with the first operation mode, so the same effects as in the first embodiment are achieved.
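The note-count limit of the sixth embodiment reduces to a simple check when a note is added to the control information C. In this sketch the values chosen for M1 and M2, the `Note` fields, and the mode names are placeholders; the description gives no concrete numbers.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int        # pitch information X1, here a MIDI note number (assumption)
    duration: float   # period information X2, here in seconds (assumption)
    code: str         # pronunciation information X3


# Hypothetical maximum note counts: M1 for the first mode, M2 for the second.
MAX_NOTES = {"first": 1000, "second": 16}


def add_note(control_info, note, mode):
    """Append a note unless the mode's maximum note count is already reached."""
    if len(control_info) >= MAX_NOTES[mode]:
        return False
    control_info.append(note)
    return True
```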

＜第７実施形態＞ <Seventh Embodiment>
第１実施形態から第６実施形態では、制御情報Ｃ内の各音符情報Ｎの音高情報Ｘ1を利用者が任意に指示できる構成を例示した。第７実施形態では、音高情報Ｘ1に設定可能な音高の範囲を第１動作モードと第２動作モードとで相違させる。 The first to sixth embodiments illustrated configurations in which the user can freely specify the pitch information X1 of each piece of note information N in the control information C. In the seventh embodiment, the range of pitches that can be set in the pitch information X1 differs between the first operation mode and the second operation mode.

具体的には、情報編集部３２は、第２動作モードで音高情報Ｘ1に設定可能な音高の範囲を、第１動作モードで音高情報Ｘ1に設定可能な音高の範囲と比較して制限する。すなわち、第２動作モードでは、第１動作モードで設定可能な複数の音高から選択された特定の範囲（以下「音高制限範囲」という）ＱB内の音高のみが音高情報Ｘ1の候補として利用者に許可され、第１動作モードでは、利用者が音高情報Ｘ1として指示可能な候補が全種類の音高に拡張される。例えば、鍵盤楽器の白鍵に対応する７個の幹音（黒鍵に対応する派生音以外の音高）を音高制限範囲ＱBに設定した構成や、所定の３個の音高（例えばドレミ）のみを音高制限範囲ＱBに設定した構成が採用される。また、特定のスケール（例えばアイオニアンスケールや琉球スケール）や特定の音域（オクターブ）に属する各音高を音高制限範囲ＱBに設定することも可能である。 Specifically, the information editing unit 32 restricts the range of pitches that can be set in the pitch information X1 in the second operation mode compared with the range that can be set in the first operation mode. That is, in the second operation mode, only pitches within a specific range QB (hereinafter the "pitch restriction range") selected from the pitches that can be set in the first operation mode are offered to the user as candidates for the pitch information X1, whereas in the first operation mode the candidates that the user can specify as the pitch information X1 are extended to all pitches. For example, the pitch restriction range QB may be set to the seven natural notes corresponding to the white keys of a keyboard instrument (the pitches other than the derived notes corresponding to the black keys), or to only three predetermined pitches (for example, do, re, and mi). It is also possible to set the pitch restriction range QB to the pitches belonging to a specific scale (for example, the Ionian scale or the Ryukyu scale) or to a specific register (octave).
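The pitch restriction range QB can be represented compactly as a set of pitch classes. The sketch below assumes MIDI note numbers for pitches and a C-based spelling of the Ryukyu scale (C, E, F, G, B); both are assumptions made for illustration, as are the names.

```python
# Pitch classes (MIDI note number mod 12), with C = 0.
WHITE_KEYS = frozenset({0, 2, 4, 5, 7, 9, 11})  # C D E F G A B
RYUKYU_ON_C = frozenset({0, 4, 5, 7, 11})       # C E F G B (assumed spelling)


def pitch_is_selectable(midi_note, first_mode, qb=WHITE_KEYS):
    """The first operation mode allows every pitch; the second only those in QB."""
    return first_mode or midi_note % 12 in qb
```

With the default range, middle C (60) is selectable in the second operation mode but C sharp (61) is not; switching `qb` to `RYUKYU_ON_C` additionally excludes D (62) and A (69).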

音声素片群Ｇと制御情報Ｃとを利用した音響信号Ｖの生成は第１実施形態と同様である。したがって、第７実施形態の音響生成部２４は、第１動作モードと比較して音高の種類数が少ない音響信号Ｖを第２動作モードにて生成する要素として機能する。第２動作モードにおける音高の制限（音高制限範囲ＱB）が第１動作モードでは解除されると換言することも可能である。なお、以上の説明では発音情報Ｘ3が指定する音高の制限のみに着目したが、発音情報Ｘ3が指定する音声符号の制限については第１実施形態から第６実施形態の何れかの構成が任意に採用される。 The acoustic signal V is generated from the speech element group G and the control information C in the same manner as in the first embodiment. Accordingly, the sound generation unit 24 of the seventh embodiment functions as an element that generates, in the second operation mode, an acoustic signal V with fewer distinct pitches than in the first operation mode. Put differently, the pitch restriction of the second operation mode (the pitch restriction range QB) is lifted in the first operation mode. Although the above description focuses only on the restriction of pitch, any of the configurations of the first to sixth embodiments may be adopted for restricting the speech codes specified by the pronunciation information X3.

以上に説明した通り、第７実施形態では、第２動作モードでの音響信号Ｖの音高が第１動作モードと比較して制限されるから、低音質の音声データを試聴用に利用者に提供する特許文献１の技術と比較して音響生成装置１００の機能が有効に制限され、購入処理の実行で第２動作モードを第１動作モードに移行させる（音響信号Ｖの生成機能の制限を解除する）充分な誘因を利用者に付与することが可能である。 As described above, in the seventh embodiment the pitch of the acoustic signal V in the second operation mode is restricted compared with the first operation mode. Compared with the technique of Patent Document 1, which provides the user with low-quality audio data for trial listening, the functions of the sound generation device 100 are therefore restricted effectively, and the user can be given a sufficient incentive to execute the purchase process and thereby move from the second operation mode to the first operation mode (that is, to lift the restriction on the function of generating the acoustic signal V).

＜第８実施形態＞ <Eighth Embodiment>
第７実施形態では、音高制限範囲ＱB内で利用者からの指示に応じて音高情報Ｘ1を設定した。第８実施形態の情報編集部３２は、第２動作モードにおいて、利用者からの指示に対して非依存な音高（すなわち、利用者からの指示とは無関係に選定された音高）を制御情報Ｃ内の各音高情報Ｘ1として指定する。 In the seventh embodiment, the pitch information X1 is set within the pitch restriction range QB in response to an instruction from the user. In the second operation mode, the information editing unit 32 of the eighth embodiment instead specifies, as each piece of pitch information X1 in the control information C, a pitch that does not depend on the user's instruction (that is, a pitch selected regardless of what the user instructs).

具体的には、情報編集部３２は、第２動作モードにおいて、自動作曲処理で生成した旋律の各音高を順番に各音高情報Ｘ1に割当てる。自動作曲処理には公知の技術が任意に採用される。なお、利用者が指示した単語や文章（例えば利用者が発音情報Ｘ3として指示した各音声符号の配列）に応じた旋律を生成することも可能である。また、事前に用意された旋律の各音高を各音高情報Ｘ1に割当てる構成も採用され得る。第１動作モードでは、音高の制限が解除され、利用者からの指示に応じた音高の音響信号Ｖが生成される。 Specifically, in the second operation mode, the information editing unit 32 assigns the pitches of a melody generated by automatic composition processing, in order, to the respective pieces of pitch information X1. Any known technique may be used for the automatic composition processing. It is also possible to generate a melody according to a word or sentence specified by the user (for example, the sequence of speech codes that the user specified as the pronunciation information X3). A configuration in which the pitches of a melody prepared in advance are assigned to the respective pieces of pitch information X1 may also be adopted. In the first operation mode, the pitch restriction is lifted, and an acoustic signal V with pitches according to the user's instructions is generated.

音声素片群Ｇと制御情報Ｃとを利用した音響信号Ｖの生成は第１実施形態と同様である。したがって、第８実施形態の音響生成部２４は、利用者からの指示に対して非依存に設定された音高の音響信号Ｖを生成する要素として機能する。以上の説明から理解される通り、第７実施形態においても、第２動作モードでの音響信号Ｖの音高が第１動作モードと比較して制限されるから、第７実施形態と同様の効果が実現される。 The acoustic signal V is generated from the speech element group G and the control information C in the same manner as in the first embodiment. Accordingly, the sound generation unit 24 of the eighth embodiment functions as an element that generates an acoustic signal V whose pitches are set independently of the user's instruction. As understood from the above description, in the eighth embodiment as well, the pitch of the acoustic signal V in the second operation mode is restricted compared with the first operation mode, so the same effects as in the seventh embodiment are achieved.

＜第９実施形態＞ <Ninth Embodiment>
第９実施形態の音響生成部２４は、第２動作モードでの音響信号Ｖの生成に対する制限の度合を、購入処理の実行前の複数の期間の各々で個別に設定する。例えば、音響生成プログラムＰGMが記憶装置１２に記憶された直後の所定長（例えば３日間）にわたる第１期間Ａ1では、発音制限範囲ＱAがサ行（サシスセソ）に設定され、音高制限範囲ＱBがＣメジャースケールに設定される。第１期間Ａ1の経過後の所定長（例えば７日間）にわたる第２期間Ａ2では、発音制限範囲ＱAが音名に対応する音声符号（ドレミファソラシド）に設定され、音高情報Ｘ1が第８実施形態における自動作曲処理の旋律に制限される。第２期間Ａ2の経過から購入処理の完了（第１動作モードへの移行）までの第３期間Ａ3では、発音制限範囲ＱAが「ラ」に設定され、音高情報Ｘ1が所定の旋律の各音高に制限される。なお、音響信号Ｖの生成機能の制限（発音内容や音高の制限）を経時的に強化する構成と生成機能の制限を経時的に緩和する構成の双方が想定され得る。 The sound generation unit 24 of the ninth embodiment sets the degree of restriction on the generation of the acoustic signal V in the second operation mode individually for each of a plurality of periods before the purchase process is executed. For example, in a first period A1 of predetermined length (for example, three days) immediately after the sound generation program PGM is stored in the storage device 12, the pronunciation restriction range QA is set to the sa-row of the kana syllabary (サ, シ, ス, セ, ソ) and the pitch restriction range QB is set to the C major scale. In a second period A2 of predetermined length (for example, seven days) following the first period A1, the pronunciation restriction range QA is set to the speech codes corresponding to the note names (ド, レ, ミ, ファ, ソ, ラ, シ, ド), and the pitch information X1 is restricted to the melody produced by the automatic composition processing of the eighth embodiment. In a third period A3, from the end of the second period A2 until the purchase process is completed (the transition to the first operation mode), the pronunciation restriction range QA is set to 「ラ」 ("la"), and the pitch information X1 is restricted to the pitches of a predetermined melody. Both a configuration in which the restriction on the function of generating the acoustic signal V (the restriction on pronunciation content and pitch) is tightened over time and a configuration in which it is relaxed over time are conceivable.

第９実施形態においても前述の各形態と同様の効果が実現される。また、第９実施形態では、音響信号Ｖの生成に対する制限が可変に制御されるから、購入処理の実行で第１動作モードに移行する誘因を利用者に対して効果的に付与することが可能である。なお、以上の例示では、発音情報Ｘ3に指定される音声符号の制限と音高情報Ｘ1に指定される音高の制限との双方を経時的に変化させたが、音声符号および音高の一方の制限のみを経時的に変化させることも可能である。 The ninth embodiment achieves the same effects as the embodiments described above. In addition, because the restriction on the generation of the acoustic signal V is controlled variably in the ninth embodiment, the user can be given an effective incentive to move to the first operation mode by executing the purchase process. In the above example, both the restriction on the speech codes specified in the pronunciation information X3 and the restriction on the pitches specified in the pitch information X1 change over time, but it is also possible to change only one of the two restrictions over time.
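The three periods of the ninth embodiment can be sketched as a schedule keyed by the time elapsed since installation. The 3-day and 7-day lengths and the kana sets follow the example in the text; the data layout, the pitch-rule labels, and the function name are assumptions introduced here.

```python
from datetime import datetime, timedelta

# (end of period measured from installation, restriction applied in that period)
SCHEDULE = [
    (timedelta(days=3),   # first period A1: 3 days
     {"codes": {"サ", "シ", "ス", "セ", "ソ"}, "pitch": "C major scale"}),
    (timedelta(days=10),  # second period A2: the following 7 days
     {"codes": {"ド", "レ", "ミ", "ファ", "ソ", "ラ", "シ"}, "pitch": "auto-composed melody"}),
]
# Third period A3: from the end of A2 until the purchase is completed.
FINAL = {"codes": {"ラ"}, "pitch": "predetermined melody"}


def restriction_at(installed_at, now, purchased):
    """Return the restriction in force, or None once the purchase lifts it."""
    if purchased:
        return None  # first operation mode: no restriction
    elapsed = now - installed_at
    for end, restriction in SCHEDULE:
        if elapsed < end:
            return restriction
    return FINAL
```

This example tightens the restriction over time; reversing the order of the entries would give the relaxing variant that the text also mentions.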

＜変形例＞ <Modifications>
以上の各形態は多様に変形され得る。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２以上の態様を適宜に併合することも可能である。 Each of the above embodiments can be modified in various ways. Specific modifications are illustrated below. Two or more modes freely selected from the following examples may be combined as appropriate.

（１）第２動作モードで発音情報Ｘ3に指定される音声符号（発音文字や音素記号）を第１動作モードと比較して制限する方法は前述の例示に限定されない。第１実施形態から第６実施形態の音響生成部２４は、第１動作モードでは利用者からの指示に応じた発音内容（発音情報Ｘ3）の音響信号Ｖを生成し、第２動作モードでは、第１動作モードと比較して低い自由度で発音内容が設定された音響信号Ｖを生成する要素として包括される。 (1) The method of restricting the phonetic code (phonetic character or phoneme symbol) specified in the pronunciation information X3 in the second operation mode as compared with the first operation mode is not limited to the above example. In the first operation mode, the sound generation unit 24 of the first to sixth embodiments generates the sound signal V of the pronunciation content (pronunciation information X3) according to the instruction from the user in the first operation mode, and in the second operation mode, It is included as an element that generates the acoustic signal V in which the sound generation content is set with a lower degree of freedom compared to the first operation mode.

また、前述の各形態では、歌唱音の音響信号Ｖを生成する場合を想定したため、制御情報Ｃが音高情報Ｘ1および期間情報Ｘ2を含む構成を例示したが、音高情報Ｘ1や期間情報Ｘ2は第１実施形態から第６実施形態から省略され得る。すなわち、音高の経時的な変化を要件としない会話音等の音声の音響信号Ｖの生成にも第１実施形態から第６実施形態は適用され得る。 Further, in each of the above embodiments, since it is assumed that the acoustic signal V of the singing sound is generated, the configuration in which the control information C includes the pitch information X1 and the period information X2 is exemplified. However, the pitch information X1 and the period information X2 are exemplified. Can be omitted from the first to sixth embodiments. That is, the first to sixth embodiments can be applied to the generation of a sound signal V of a voice such as a conversation sound that does not require a change in pitch over time.

（２）第２動作モードで音高情報Ｘ1に指定される音高を第１動作モードと比較して制限する方法は前述の例示に限定されない。第７実施形態および第８実施形態の音響生成部２４は、第１動作モードでは利用者からの指示に応じた音高の音響信号Ｖを生成し、第２動作モードでは、第１動作モードと比較して低い自由度で音高が設定された音響信号Ｖを生成する要素として包括される。また、発音情報Ｘ3に指定される音声符号を第２動作モードで制限する構成は第７実施形態や第８実施形態から省略され得る。 (2) The method of limiting the pitch specified in the pitch information X1 in the second operation mode as compared with the first operation mode is not limited to the above example. The sound generation unit 24 of the seventh embodiment and the eighth embodiment generates a sound signal V having a pitch according to an instruction from the user in the first operation mode, and the first operation mode and the second operation mode. It is included as an element that generates the acoustic signal V in which the pitch is set with a lower degree of freedom. Further, the configuration for restricting the voice code specified in the pronunciation information X3 in the second operation mode can be omitted from the seventh embodiment and the eighth embodiment.

また、第７実施形態および第８実施形態では、歌唱音の音響信号Ｖを生成する場合を想定したため、制御情報Ｃが発音情報Ｘ3を含む構成を例示したが、発音情報Ｘ3は第７実施形態や第８実施形態から省略され得る。すなわち、音高情報Ｘ1が指定する音高と期間情報Ｘ2が指定する発音期間とで規定される音符の楽音（例えば楽器の演奏音）の音響信号Ｖを生成する場合にも第７実施形態や第８実施形態は適用され得る。 In the seventh embodiment and the eighth embodiment, since it is assumed that the acoustic signal V of the singing sound is generated, the configuration in which the control information C includes the pronunciation information X3 is exemplified. However, the pronunciation information X3 is the seventh embodiment. And may be omitted from the eighth embodiment. That is, the seventh embodiment also applies to the case where the acoustic signal V of the musical tone of a note (for example, the performance sound of a musical instrument) defined by the pitch specified by the pitch information X1 and the sound generation period specified by the period information X2 is generated. The eighth embodiment can be applied.

（３）前述の各形態では、記憶装置１２に記憶された音声素片群Ｇのうち発音制限範囲ＱA内の音声符号に対応する音声素片を第２動作モードで選択的に利用して音響信号Ｖを生成したが、第１動作モードで利用される音声素片群Ｇ1と第２動作モードで利用される音声素片群Ｇ2とを個別に用意することも可能である。音声素片群Ｇ1は、利用者が音響生成装置１００を正規に利用する（音響生成装置１００の全部の機能を利用する）ための音声合成ライブラリであり、音声素片群Ｇ2は、利用者が音響生成装置１００を試用するための試用版（特定の機能を制限して音響生成装置１００の利用を許可する体験版）の音声合成ライブラリである。各音声素片の発声者は音声素片群Ｇ1と音声素片群Ｇ2とで共通するが、音声素片群Ｇ1内の音声素片の種類数は音声素片群Ｇ2内の音声素片の種類数を上回る。すなわち、音声素片群Ｇ2は音声素片群Ｇ1のうち発音制限範囲ＱA内の音声符号に対応する音声素片の部分集合に相当する。したがって、音声素片群Ｇ2のデータ量は音声素片群Ｇ1のデータ量を下回る。 (3) In each of the above-described embodiments, the speech unit corresponding to the speech code within the pronunciation restriction range QA out of the speech unit group G stored in the storage device 12 is selectively used in the second operation mode for sound. Although the signal V is generated, the speech element group G1 used in the first operation mode and the speech element group G2 used in the second operation mode can be prepared separately. The speech unit group G1 is a speech synthesis library that allows the user to use the acoustic generation device 100 in a regular manner (uses all the functions of the acoustic generation device 100), and the speech unit group G2 is used by the user. It is a speech synthesis library of a trial version (trial version that restricts specific functions and permits use of the sound generation apparatus 100) for using the sound generation apparatus 100 as a trial. The speaker of each speech unit is common to the speech unit group G1 and the speech unit group G2, but the number of speech units in the speech unit group G1 is the number of speech units in the speech unit group G2. It exceeds the number of types. That is, the speech unit group G2 corresponds to a subset of speech units corresponding to speech codes within the pronunciation restriction range QA in the speech unit group G1. Therefore, the data amount of the speech unit group G2 is less than the data amount of the speech unit group G1.

音響生成プログラムＰGMの導入直後の初期状態では音声素片群Ｇ2が記憶装置１２に記憶され、購入処理が適正に完了することを条件に音声素片群Ｇ1が配信装置から通信装置１４に配信されて記憶装置１２に記憶される。以上の構成によれば、購入処理を実行しない場合には音声素片群Ｇ1を記憶装置１２に格納する必要がないから、音声素片群Ｇ1の送受信時の通信量や音声素片群Ｇ1の記憶容量が削減されるという利点がある。 In the initial state immediately after the introduction of the sound generation program PGM, the speech unit group G2 is stored in the storage device 12, and the speech unit group G1 is distributed from the distribution device to the communication device 14 on the condition that the purchase process is properly completed. And stored in the storage device 12. According to the above configuration, since the speech unit group G1 does not need to be stored in the storage device 12 when the purchase process is not executed, the communication amount at the time of transmission / reception of the speech unit group G1 and the speech unit group G1 There is an advantage that the storage capacity is reduced.
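The relationship between the two libraries in modification (3) can be illustrated by deriving the trial library G2 from the full library G1. Representing a library as a dictionary from speech code to raw bytes is an assumption made for this sketch, as are the function names.

```python
def build_trial_library(g1, qa):
    """Keep only the speech elements whose codes lie in the restriction range QA."""
    return {code: unit for code, unit in g1.items() if code in qa}


def data_size(library):
    """Total size in bytes of all speech elements in a library."""
    return sum(len(unit) for unit in library.values())
```

Because G2 is a subset of G1, its data size cannot exceed that of G1, which is the source of the storage and traffic savings described above.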

（４）第２動作モードでは音響信号Ｖの生成に利用される音声素片の種類数が第１動作モードと比較して少ないという事情を考慮すると、第２動作モードにおいて音響信号Ｖの生成に必要な音声素片を動作毎に配信装置から通信装置１４が取得する構成も好適である。すなわち、動作モードが第２動作モードから第１動作モードに移行するまでは記憶装置１２に音声素片（音声素片群Ｇ）が記憶されない。 (4) Considering the fact that the number of types of speech elements used for generating the acoustic signal V in the second operation mode is smaller than that in the first operation mode, the acoustic signal V is generated in the second operation mode. A configuration in which the communication device 14 acquires necessary speech segments from the distribution device for each operation is also suitable. That is, the speech unit (speech unit group G) is not stored in the storage device 12 until the operation mode shifts from the second operation mode to the first operation mode.

（５）第７実施形態における音高制限範囲ＱBの設定に第２実施形態の属性情報や第３実施形態の位置情報を利用することも可能である。すなわち、情報編集部３２は、利用者の属性情報や位置情報に応じて音高制限範囲ＱBを可変に設定する。 (5) It is also possible to use the attribute information of the second embodiment and the position information of the third embodiment for setting the pitch limit range QB in the seventh embodiment. That is, the information editing unit 32 variably sets the pitch limit range QB according to the user's attribute information and position information.

（６）前述の各形態では、購入処理の実行を条件として第２動作モードから第１動作モードに移行する場合を例示したが、動作モードの移行の方向や条件は適宜に変更される。例えば、音響生成プログラムＰGMの導入の直後の初期状態では第１動作モードを選択し、購入処理が実行されずに所定の期間が経過した場合に第１動作モードから第２動作モードに移行する構成も採用され得る。また、音響信号Ｖの生成に対する制限の度合が相違する３種類以上の動作モードから何れかの動作モードを選択する構成も採用され得る。例えば、利用者が支払う金額に応じて段階的に動作モード（音響信号Ｖの生成に対する制限の度合）を変更することも可能である。音響信号Ｖの生成に対する制限の度合を利用者に通知する構成も採用され得る。例えば、動作モードを利用者に通知する画像（ポップアップ方式のダイアログ）を表示装置に表示する構成や、動作モードを通知する電子メールを利用者の登録アドレスに送信する構成が好適である。 (6) In each of the above-described embodiments, the case of shifting from the second operation mode to the first operation mode on the condition that the purchase process is executed has been illustrated, but the direction and conditions of the transition of the operation mode are appropriately changed. For example, a configuration in which the first operation mode is selected in the initial state immediately after the introduction of the sound generation program PGM and the first operation mode is shifted to the second operation mode when a predetermined period elapses without performing the purchase process. Can also be employed. In addition, a configuration in which any one operation mode is selected from three or more operation modes with different degrees of restriction on the generation of the acoustic signal V may be employed. For example, it is possible to change the operation mode (the degree of restriction on the generation of the acoustic signal V) step by step according to the amount paid by the user. A configuration for notifying the user of the degree of restriction on the generation of the acoustic signal V can also be employed. For example, a configuration in which an image (pop-up dialog) for notifying the user of the operation mode is displayed on the display device or a configuration in which an e-mail for notifying the operation mode is transmitted to the registered address of the user is suitable.

また、例えば利用者がプレイするゲームで特定の条件（例えば所定のステージのクリア）を充足することや音響生成プログラムＰGMに関するアンケートに回答すること等の各種の条件を、動作モード（音響信号Ｖの生成に対する制限の度合）の変更の条件とすることも可能である。 Also, for example, various conditions such as satisfying a specific condition (for example, clearing a predetermined stage) in a game played by the user or answering a questionnaire regarding the sound generation program PGM are set in the operation mode (acoustic signal V). It is also possible to set a condition for changing the degree of restriction on generation.

以上の説明から理解される通り、前述の各形態における動作制御部２２は、音響信号Ｖの生成に対する制限の度合が相違する第１動作モードまたは第２動作モードを選択する要素として包括され、動作モードの移行の方向や条件および動作モードの総数は本発明において任意である。 As understood from the above description, the operation control unit 22 in each embodiment described above is included as an element for selecting the first operation mode or the second operation mode with different degrees of restriction on the generation of the acoustic signal V. The direction and condition of mode transition and the total number of operation modes are arbitrary in the present invention.
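Modification (6) allows three or more operation modes and conditions other than payment. The sketch below shows one possible selection rule; the price thresholds, the mode names, and the effect of the non-monetary condition are all hypothetical.

```python
# Hypothetical price tiers, highest first: (minimum amount paid, mode name).
TIERS = [(1000, "first"), (300, "intermediate"), (0, "second")]


def select_mode(amount_paid, bonus_condition_met=False):
    """Pick the least restricted mode the user qualifies for.

    bonus_condition_met stands for a non-monetary condition such as
    clearing a game stage or answering a questionnaire; here it lifts
    a non-paying user from "second" to "intermediate" (an assumption).
    """
    for minimum, mode in TIERS:
        if amount_paid >= minimum:
            break
    if mode == "second" and bonus_condition_met:
        mode = "intermediate"
    return mode
```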

１００……音響生成装置、１０……演算処理装置、１２……記憶装置、１４……通信装置、１６……入力装置、１８……放音装置、２２……動作制御部、２４……音響生成部、３２……情報編集部、３４……合成処理部、４０……位置検出装置。
DESCRIPTION OF SYMBOLS 100 ... sound generation device, 10 ... arithmetic processing device, 12 ... storage device, 14 ... communication device, 16 ... input device, 18 ... sound emitting device, 22 ... operation control unit, 24 ... sound generation unit, 32 ... information editing unit, 34 ... synthesis processing unit, 40 ... position detection device.

Claims

1. A sound generation device comprising:
operation control means for selecting a first operation mode or a second operation mode; and
sound generation means for generating, in the first operation mode, an acoustic signal having pronunciation content corresponding to an instruction from a user, and for generating, in the second operation mode, an acoustic signal whose pronunciation content is set with a lower degree of freedom than in the first operation mode,
wherein the sound generation means generates, in the first operation mode, an acoustic signal having a pitch corresponding to an instruction from the user, and generates, in the second operation mode, an acoustic signal whose pitch is set with a lower degree of freedom than in the first operation mode.

2. The sound generation device according to claim 1, wherein the sound generation means generates, in the second operation mode, an acoustic signal having fewer types of pitches than in the first operation mode.

3. The sound generation device according to claim 1 or claim 2, wherein the sound generation means makes the degree of restriction on the generation of the acoustic signal in the second operation mode differ between a first period and a second period that are different from each other.

4. A sound generation method in which a computer:
selects a first operation mode or a second operation mode;
generates, in the first operation mode, an acoustic signal having pronunciation content corresponding to an instruction from a user, and generates, in the second operation mode, an acoustic signal whose pronunciation content is set with a lower degree of freedom than in the first operation mode; and
in generating the acoustic signal, generates, in the first operation mode, an acoustic signal having a pitch corresponding to an instruction from the user, and generates, in the second operation mode, an acoustic signal whose pitch is set with a lower degree of freedom than in the first operation mode.