JP7355165B2

JP7355165B2 - Music playback system, control method and program for music playback system

Info

Publication number: JP7355165B2
Application number: JP2022098691A
Authority: JP
Inventors: 秀樹高野
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2018-08-28
Filing date: 2022-06-20
Publication date: 2023-10-03
Anticipated expiration: 2039-08-27
Also published as: JP2022120188A; JP7095742B2; JPWO2020045398A1; WO2020045398A1

Description

The present disclosure relates to technology for playing back music.

Techniques for playing back music in response to input from a user have been proposed. For example, Patent Document 1 discloses a karaoke device that operates in response to speech that a user inputs into a microphone. The user speaks into the microphone a phrase stating the desired action, such as "ensōon wo ōkiku" ("make the performance louder") or "onkai wo ageru" ("raise the pitch").

Patent Document 1: Japanese Unexamined Patent Application Publication No. H11-296182

In the technique of Patent Document 1, the speech the user uses to instruct the karaoke device is limited to speech that directly expresses the desired operation. In view of these circumstances, the present disclosure aims to diversify the methods of voice input.

To solve the above problem, a music playback system according to a preferred aspect of the present disclosure includes a determination unit and an operation control unit. The determination unit determines whether input voice is singing voice or instruction voice other than singing voice. When the input voice is determined to be singing voice, the operation control unit instructs a playback control unit, which controls music playback, to perform a first operation related to playback of a song corresponding to the input voice. When the input voice is determined to be instruction voice, the operation control unit instructs the playback control unit to perform a second operation represented by the input voice.

A control method for a music playback system according to a preferred aspect of the present disclosure determines whether input voice is singing voice or instruction voice other than singing voice. When the input voice is determined to be singing voice, the method instructs a playback control unit, which controls music playback, to perform a first operation related to playback of a song corresponding to the input voice. When the input voice is determined to be instruction voice, the method instructs the playback control unit to perform a second operation represented by the input voice.

A music playback control method according to another aspect of the present disclosure instructs a playback control unit to present the title of a song corresponding to a first input voice that requests playback of a song. When a second input voice is received indicating that the song with the presented title is the desired song, the method instructs the playback control unit to play back the song corresponding to the first input voice.

A music playback control method according to another aspect of the present disclosure determines whether input voice is singing voice of the song currently being played back by a playback control unit, which controls music playback, or singing voice of a different song. If the input voice is determined to be singing voice of the song being played back, the method instructs a singing evaluation unit to evaluate the input voice. If it is determined to be singing voice of a different song, the method instructs the playback control unit to play back the song corresponding to the input voice.

FIG. 1 is a block diagram illustrating the configuration of a music playback system according to a first embodiment.
FIG. 2 is a schematic diagram of an operation table.
FIG. 3 is a flowchart illustrating processing executed by a terminal device.
FIG. 4 is a flowchart illustrating processing for determining whether input voice is singing voice.
FIG. 5 is a block diagram illustrating the configuration of a music playback system according to a second embodiment.
FIG. 6 is a flowchart illustrating processing executed by a terminal device.
FIG. 7 is a block diagram illustrating the configuration of a music playback system according to a third embodiment.
FIG. 8 is a block diagram illustrating the configuration of a terminal device.
FIG. 9 is a block diagram illustrating the configuration of a terminal device.
FIG. 10 is a block diagram illustrating the configuration of a processing device.
FIG. 11 is a flowchart of processing by a control device according to a modification.

<First embodiment>
FIG. 1 is a block diagram illustrating the configuration of a music playback system 10 according to the first embodiment. The music playback system 10 is a computer system that plays back music in response to operations by a user U. It plays songs that include accompaniment (that is, karaoke songs), and user U sings along. The music playback system 10 is installed, for example, inside a car driven by user U. In that setting, it is difficult for user U to instruct operations by hand using multiple controls, so the music playback system 10 accepts operation instructions by voice input from user U. User U can therefore operate the music playback system 10 without interfering with driving. An information terminal such as a mobile phone or smartphone, for example, may serve as the music playback system 10.

As illustrated in FIG. 1, the music playback system 10 includes a sound collection device 11, a control device 12, a storage device 13, and a playback device 14. The sound collection device 11 is an audio device (a microphone) that picks up surrounding sound. In the first embodiment, it receives the voice uttered by user U (input voice V), and the music playback system 10 operates according to that input voice V. Input voice V is either singing voice or instruction voice other than singing voice. Singing voice is user U singing any song, that is, voice that carries a melody made up of multiple notes. Instruction voice carries no melody. It instructs the music playback system 10 to perform an operation, such as playing or stopping a song, changing the key, or changing the volume. The sound collection device 11 receives input voice V from user U and generates an acoustic signal X representing the waveform of input voice V. Acoustic signal X therefore represents either singing voice or instruction voice.
In practice, the sound collection device 11 also picks up voice that is neither singing voice nor instruction voice (hereinafter "spoken voice"), such as conversation.

The control device 12 (an example of a computer) consists of a processing circuit such as a CPU (Central Processing Unit) and centrally controls each element of the music playback system 10. By executing a program stored in the storage device 13, the control device 12 implements several functions: a determination unit 121, an operation control unit 123, and a playback control unit 125. Some functions of the control device 12 may be implemented by dedicated electronic circuitry, and its functions may be distributed across multiple devices.

The storage device 13 stores the program executed by the control device 12 and the various data the control device 12 uses. Any known recording medium may serve as the storage device 13, such as a semiconductor or magnetic recording medium, or a combination of several types of recording media. As illustrated in FIG. 1, the storage device 13 stores an operation table and multiple pieces of music data M, each representing a different song. A file conforming to the MIDI (Musical Instrument Digital Interface) standard, an SMF (Standard MIDI File), is suitable as music data M. An audio file representing the waveform of the song's performance may be used instead. In the first embodiment, music data M includes a song title, performance data, and reference data. The performance data is time-series data specifying a note sequence (the performance content) for each of several performance parts. The reference data is time-series data specifying the note sequence of the vocal part (the guide melody). The performance data and the reference data occupy different channels of the same music data M. Music data M may also be stored on a web server that can communicate with the music playback system 10.
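The minimal Python sketch below makes the three components of music data M concrete. The class and field names (`Note`, `MusicData`, `performance`, `reference`) are hypothetical. An actual SMF would store the performance parts and the guide melody as separate MIDI channels, not as Python lists, and the beat-based timing is likewise an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Note:
    start: float     # onset time in beats (assumed unit)
    duration: float  # length in beats
    pitch: int       # MIDI note number


@dataclass
class MusicData:
    """Hypothetical in-memory form of one piece of music data M."""
    title: str
    # Performance data: part name -> note sequence (separate channels in an SMF).
    performance: dict[str, list[Note]] = field(default_factory=dict)
    # Reference data: note sequence of the vocal part (the guide melody).
    reference: list[Note] = field(default_factory=list)

    def reference_pitches(self) -> list[int]:
        """Guide-melody pitches in time order, as used for similarity matching."""
        return [n.pitch for n in sorted(self.reference, key=lambda n: n.start)]
```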

The playback device 14 plays back songs under the control of the control device 12 (specifically, the playback control unit 125). It includes a sound emitting device (a speaker) that outputs the song represented by music data M stored in the storage device 13. The playback device 14 may also include a display device.

FIG. 2 is a schematic diagram of the operation table. The operation table is a data table in which multiple different operations of the music playback system 10 are registered. As illustrated in FIG. 2, each operation is associated with a character string representing that operation (hereinafter a "registered character string"). Each registered character string corresponds to (for example, resembles or matches) the character string of an instruction voice that user U is expected to utter. For example, the registered character string "stop" corresponds to an instruction voice requesting that playback stop, such as an utterance of "please stop".
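The sketch below shows one possible representation of such an operation table. It assumes English registered strings and a "[song name]" slot for the song title. Only "stop" and "play [song name]" come from the text. The other entries, the `Operation` names, and the regex-based template matching are illustrative assumptions. This exact-template lookup is deliberately strict. The similarity-based comparison described for step Sa4 would tolerate small wording differences.

```python
import re
from enum import Enum, auto
from typing import Optional, Tuple


class Operation(Enum):
    PLAY_SONG = auto()
    STOP = auto()
    KEY_UP = auto()     # hypothetical entry
    VOLUME_UP = auto()  # hypothetical entry


SLOT = "[song name]"

# Registered character string -> operation (hypothetical English wording).
OPERATION_TABLE = {
    "play [song name]": Operation.PLAY_SONG,
    "stop": Operation.STOP,
    "raise the key": Operation.KEY_UP,
    "turn up the volume": Operation.VOLUME_UP,
}


def _template_to_regex(template: str):
    """Escape the literal text and turn the slot into a capture group."""
    head, sep, tail = template.partition(SLOT)
    if not sep:
        return re.compile(re.escape(template) + r"\W*", re.IGNORECASE)
    return re.compile(re.escape(head) + r"(.+?)" + re.escape(tail) + r"\W*", re.IGNORECASE)


def lookup(text: str) -> Optional[Tuple[Operation, Optional[str]]]:
    """Return (operation, slot value) if the text fits a registered string, else None."""
    normalized = " ".join(text.split())
    for template, op in OPERATION_TABLE.items():
        m = _template_to_regex(template).fullmatch(normalized)
        if m:
            return op, (m.group(1) if m.groups() else None)
    return None
```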

The determination unit 121 in FIG. 1 uses acoustic signal X, generated by the sound collection device 11, to determine whether input voice V from user U is singing voice or instruction voice. The operation control unit 123 instructs the playback control unit 125 to perform operations related to music playback, for example playing a song, stopping it, or changing the key. In the first embodiment, the operation that the operation control unit 123 instructs depends on the result from the determination unit 121.

The playback control unit 125 controls music playback. It carries out the instructions from the operation control unit 123 by controlling the playback device 14, which plays the song. In the first embodiment, the playback control unit 125 includes a data processing section and a sound source section. The data processing section instructs note-on and note-off for each note of the song, based on the performance data in music data M. The sound source section responds to these instructions by generating an acoustic signal representing the performance sound of the song and supplying it to the playback device 14. The playback device 14 then plays back the acoustic signal supplied by the playback control unit 125.
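The work is split between the data processing section and the sound source section. The hypothetical Python example below covers only the data processing side. It converts a note list into time-ordered note-on/note-off events that start at a given playback position, and a sound source would then render those events into an acoustic signal. Timing in seconds, the event tuple layout, and sounding a note already in progress at the start position are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Event = Tuple[float, str, int]  # (time relative to start position, "on"/"off", MIDI pitch)


@dataclass
class Note:
    start: float     # onset in seconds (assumed unit)
    duration: float  # length in seconds
    pitch: int       # MIDI note number


def schedule_events(notes: Sequence[Note], start_at: float = 0.0) -> List[Event]:
    """Turn notes into note-on/note-off events, beginning playback at `start_at`."""
    events: List[Event] = []
    for n in notes:
        end = n.start + n.duration
        if end <= start_at:
            continue  # the note ends before the playback position: skip it
        # A note already sounding at `start_at` is started immediately (assumption).
        on_time = max(n.start, start_at) - start_at
        events.append((on_time, "on", n.pitch))
        events.append((end - start_at, "off", n.pitch))
    # Sort by time; at equal times, note-off comes before note-on.
    events.sort(key=lambda e: (e[0], e[1] != "off"))
    return events
```

Starting from position 0.0 corresponds to playing from the beginning. A later `start_at` corresponds to resuming partway through a song.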

FIG. 3 is a flowchart illustrating the processing executed by the control device 12. The following description assumes that user U instructs the music playback system 10 by voice while no song is playing (hereinafter the "standby state"). In the first embodiment, user U asks the music playback system 10 to play a desired song. User U can do this either by singing the desired song or by uttering an instruction voice that requests its playback. A suitable instruction voice is an utterance of a character string containing the song title or identification information for the song (for example, a number). To request the song "ABC", for example, input voice V may be singing voice of "ABC" or an instruction voice uttering a string such as "Play [ABC]". In other words, in the first embodiment, both singing voice and instruction voice serve to specify the song "ABC" that user U wants.

The processing in FIG. 3 runs, for example, when the sound collection device 11 receives input voice V. When the processing starts, the determination unit 121 determines whether input voice V is singing voice or some other voice, that is, instruction voice or spoken voice (Sa1).

FIG. 4 is a flowchart illustrating the processing of step Sa1. For each piece of reference data stored in the storage device 13, the determination unit 121 calculates an index of how similar that reference data is to acoustic signal X (hereinafter a "similarity index") (Sa11). The similarity index may, for example, reflect how closely the pitch that the reference data specifies for each note matches the pitches detected from acoustic signal X. One such index is derived from the per-note pitch differences between the reference data and acoustic signal X, summed over the note sequence, where a smaller total difference corresponds to a higher similarity. Any known pitch detection technique may be used to detect the pitch of acoustic signal X. The determination unit 121 then determines whether the largest similarity index across all the reference data exceeds a predetermined threshold (Sa12). If it does (Sa12: YES), the determination unit 121 determines that input voice V is singing voice of the song represented by the reference data with that maximum value (Sa13). Step Sa13 thus identifies the reference data of the song user U is singing.
Any known technique may be used to calculate the similarity index, such as dynamic time warping (DTW) or query by singing/humming. When the similarity index is calculated by dynamic time warping, the differences in tempo and key between acoustic signal X and the reference data can also be estimated.
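Steps Sa11-Sa14 can be sketched as follows, assuming that the pitch track of acoustic signal X has already been extracted as a list of MIDI note numbers. The sketch uses subsequence DTW, which lets a sung fragment match any part of the reference melody, with absolute pitch difference as the local cost. Several details are hypothetical choices, not values from the disclosure. These are the conversion of the mean cost with 1/(1+cost), so that a higher value means more similar, the search over key shifts of up to ±6 semitones, and the threshold of 0.5.

```python
import math
from typing import Dict, Optional, Sequence


def subsequence_dtw_cost(query: Sequence[float], ref: Sequence[float]) -> float:
    """Mean per-note cost of aligning all of `query` to the best contiguous part of `ref`."""
    n, m = len(query), len(ref)
    # Row 0 is all zeros, so the alignment may start at any reference note.
    D = [[0.0] * (m + 1)] + [[math.inf] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - ref[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # The alignment may also end at any reference note.
    return min(D[n][1:]) / n


def similarity_index(query: Sequence[float], ref: Sequence[float], max_shift: int = 6) -> float:
    """Similarity in (0, 1]; higher is more similar. Tries each key shift (assumption)."""
    best = min(
        subsequence_dtw_cost([p + k for p in query], ref)
        for k in range(-max_shift, max_shift + 1)
    )
    return 1.0 / (1.0 + best)


def identify_song(
    query: Sequence[float],
    references: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical threshold
) -> Optional[str]:
    """Sa11-Sa14: return the sung song's title, or None if the input is not singing."""
    if not query or not references:
        return None
    scores = {title: similarity_index(query, ref) for title, ref in references.items()}  # Sa11
    best_title = max(scores, key=scores.get)
    return best_title if scores[best_title] > threshold else None  # Sa12 -> Sa13 / Sa14
```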

If instead the maximum value falls below the predetermined threshold (Sa12: NO), the determination unit 121 determines that input voice V is not singing voice (Sa14). Step Sa1 therefore does two things. It determines whether input voice V is singing voice, and if so, it also identifies the corresponding song (the song user U is singing).

When input voice V is determined to be singing voice (Sa1: YES), the operation control unit 123 instructs the playback control unit 125 to perform the first operation (Sa2). The first operation relates to playback of the song corresponding to input voice V (the singing voice). In the first embodiment, the first operation is playback of that song. Specifically, the operation control unit 123 instructs the playback control unit 125 to play the song represented by the reference data identified in step Sa13 (that is, the song represented by input voice V). Playback starts from a position corresponding to input voice V, for example immediately after the part of the song that user U sang. User U can therefore keep singing the song seamlessly after the singing voice that requested its playback.

The playback control unit 125 executes the first operation (Sa3) by causing the playback device 14 to play the song corresponding to input voice V, starting from the position corresponding to input voice V. In concrete terms, it supplies the playback device 14, in time order, with an acoustic signal generated from that song's performance data, beginning at the part corresponding to input voice V. Thus, when user U sings, the song corresponding to input voice V is identified and played. As noted above, identifying the song by dynamic time warping also allows the differences in tempo and key between acoustic signal X and the reference data to be estimated. The song can therefore be played at a tempo and key that match input voice V.

If input voice V is determined not to be singing voice (Sa1: NO), the determination unit 121 determines whether it is instruction voice or some other voice, that is, spoken voice (Sa4). The decision depends on the character string representing input voice V (hereinafter the "input character string"). If a registered character string similar to the input character string is registered in the operation table, the determination unit 121 determines that input voice V is instruction voice. If no such registered character string exists, it determines that input voice V is not instruction voice. Any known technique, such as edit distance, may be used to compare the input character string with the registered character strings. The input character string is obtained, for example, by speech recognition on acoustic signal X.
For example, when the input character string "Play [ABC]" is obtained, the registered character string "Play [song title]" in the operation table of FIG. 2 is identified. The reference data corresponding to the [song title] in the input character string is also identified. To do this, natural language processing such as morphological analysis is applied to the input character string to extract a proper noun (e.g., "ABC"). The song to be played is then identified by comparing that proper noun with the song titles in music data M. If one of the song titles is similar to the extracted proper noun, the performance data for that title is identified. If no performance data corresponds to the [song title] in the input character string, user U may be notified of this. The notification could, for example, be a voice or image presenting the message "[song title] does not exist." Thus, when step Sa4 determines that input voice V is instruction voice, it also identifies the performance data of the song that input voice V names. The operation table may instead be stored on a server device that can communicate with the music playback system 10. In that configuration, the music playback system 10 sends input voice V or the input character string to the server device, which determines whether input voice V is instruction voice.
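Step Sa4 can be sketched with a plain Levenshtein edit distance, as below. The sketch splits off the first word as the command and treats the remainder as the song title. This is a crude, hypothetical stand-in for the morphological analysis mentioned in the text. The operation names, the English registered strings, and the 0.34 normalized-distance threshold are likewise assumptions.

```python
from typing import Dict, List, Optional, Tuple

SLOT = "[song name]"
# Hypothetical operation table: registered character string -> operation name.
REGISTERED: Dict[str, str] = {"play [song name]": "play_song", "stop": "stop"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    return edit_distance(a, b) / max(len(a), len(b), 1)


def closest_title(name: str, titles: List[str], max_dist: float) -> Optional[str]:
    """Song title most similar to the extracted name, or None if none is close enough."""
    if not titles:
        return None
    dist, title = min((normalized_distance(name, t.lower()), t) for t in titles)
    return title if dist <= max_dist else None


def match_instruction(text: str, titles: List[str],
                      max_dist: float = 0.34) -> Optional[Tuple[str, Optional[str]]]:
    """Return (operation, song title or None), or None if the text is spoken voice (Sa7)."""
    words = text.lower().split()
    if not words:
        return None
    best = None
    for template, op in REGISTERED.items():
        literal = template.replace(SLOT, "").strip()
        if SLOT in template:
            if len(words) < 2:
                continue
            dist, arg = normalized_distance(words[0], literal), " ".join(words[1:])
        else:
            dist, arg = normalized_distance(" ".join(words), literal), None
        if dist <= max_dist and (best is None or dist < best[0]):
            best = (dist, op, arg)
    if best is None:
        return None
    _, op, arg = best
    if arg is None:
        return op, None
    # A None title means "[song title] does not exist." should be reported to user U.
    return op, closest_title(arg, titles, max_dist)
```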

When input voice V is determined to be instruction voice (Sa4: YES), the operation control unit 123 instructs the playback control unit 125 to perform the second operation represented by input voice V (Sa5). The second operation is the operation in the operation table associated with the registered character string that resembles the input character string. In this example, it is playback of the song "ABC" specified by input voice V (the instruction voice). In the first embodiment, the second operation plays the song specified by the instruction voice from the beginning. The first operation is based on an instruction given by singing voice, whereas the second operation is based on instruction voice. In the first embodiment, the two operations differ. The playback control unit 125 executes the second operation (Sa6) by causing the playback device 14 to play the song specified by the instruction voice. It supplies the playback device 14 with an acoustic signal generated from the performance data identified in step Sa4, starting from the beginning of the song.
The playback control unit 125 may also supply the playback device 14 with an acoustic signal generated from both the performance data and the reference data. Thus, when user U utters an instruction voice, the song that the instruction voice specifies is identified and played.

If input voice V is determined to be neither singing voice nor instruction voice, that is, spoken voice such as conversation (Sa4: NO), the operation control unit 123 gives no instruction to the playback control unit 125 (Sa7). Steps Sa1 and Sa4 together therefore determine whether input voice V is singing voice or instruction voice. The processing of steps Sa1-Sa3 and that of steps Sa4-Sa6 may also run in the reverse order, or in parallel.
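The overall branching of FIG. 3 (Sa1 to Sa7) can be summarized in a short sketch. The two recognizers are passed in as callables so that the control flow stands on its own. Their return conventions, the `Instruction` type, and the operation names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Instruction:
    kind: str                       # "first" (from singing) or "second" (from instruction voice)
    operation: str                  # hypothetical operation name
    song: Optional[str] = None
    position: Optional[int] = None  # note index to start from; 0 = beginning of the song


# identify_song(x) -> (title, resume note index) or None; match_command(x) -> (op, title) or None
SongMatcher = Callable[[object], Optional[Tuple[str, int]]]
CommandMatcher = Callable[[object], Optional[Tuple[str, Optional[str]]]]


def handle_input_voice(x, identify_song: SongMatcher,
                       match_command: CommandMatcher) -> Optional[Instruction]:
    sung = identify_song(x)                                # Sa1: singing voice?
    if sung is not None:
        title, resume_at = sung
        # Sa2: first operation, resume the song right after the part that was sung.
        return Instruction("first", "play_from_position", title, resume_at)
    command = match_command(x)                             # Sa4: instruction voice?
    if command is not None:
        op, title = command
        # Sa5: second operation, e.g. play the named song from the beginning.
        return Instruction("second", op, title, 0 if op == "play_song" else None)
    return None                                            # Sa7: spoken voice, no instruction
```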

As described above, the first embodiment responds to singing voice and instruction voice differently. When input voice V is determined to be singing voice, the playback control unit 125 is instructed to perform the first operation, which relates to playback of the song corresponding to input voice V. When input voice V is determined to be instruction voice, the playback control unit 125 is instructed to perform the second operation represented by input voice V. Voice input can therefore take varied forms, namely singing voice and instruction voice. Because the first and second operations differ in the first embodiment, user U can select the desired operation by choosing the type of input voice V (singing voice or instruction voice). Singing voice instructs the first operation, which plays the corresponding song from the position corresponding to the singing voice. An instruction voice requesting playback instructs the second operation, which plays the corresponding song from the beginning.

<Second embodiment>
A second embodiment of the present disclosure will now be described. In each of the following examples, elements whose functions are the same as in the first embodiment reuse the reference numerals from the description of the first embodiment, and their detailed descriptions are omitted as appropriate.

The first embodiment assumed a standby state in which no music is being played. The second embodiment assumes that music is already being played (hereinafter referred to as the "playing state") and that the user instructs the music playback system 10 to perform an operation by voice input.

FIG. 5 is a configuration diagram of the music playback system 10 according to the second embodiment. As illustrated in FIG. 5, the music playback system 10 of the second embodiment adds a singing evaluation unit 127 to the music playback system 10 of the first embodiment. As in the first embodiment, the sound collection device 11 receives the input voice V from the user U, and the determination unit 121 determines whether the input voice V from the user U is a singing voice or an instruction voice.

The determination unit 121 of the second embodiment also determines whether the input voice V is a singing voice of the music being played by the playback control unit 125 or a singing voice of other music. The singing evaluation unit 127 evaluates the singing voice of the user U. Specifically, the singing evaluation unit 127 compares the singing voice of the user U with the reference data of the music corresponding to that singing voice (input voice V). From this comparison it generates an evaluation value (for example, a score) for the singing voice. Any known technique may be used to generate the evaluation value. The evaluation value generated by the singing evaluation unit 127 is reproduced (emitted as sound or displayed) by, for example, the playback device 14.
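Because the text leaves the scoring technique open, the following is only one possible sketch of an evaluation value. It assumes that the sung pitch contour and the reference pitch contour are already time-aligned frame by frame, in Hz, with 0.0 marking unvoiced frames. The score is the percentage of voiced reference frames sung within a semitone tolerance. The function names and the 0.5-semitone default are illustrative assumptions.

```python
import math
from typing import Sequence


def semitone_error(f_sung: float, f_ref: float) -> float:
    # Absolute pitch distance in semitones between two frequencies in Hz.
    return abs(12.0 * math.log2(f_sung / f_ref))


def evaluate_singing(sung_hz: Sequence[float], ref_hz: Sequence[float],
                     tolerance_semitones: float = 0.5) -> float:
    """Return a 0-100 score: share of voiced reference frames sung in tune.

    Assumes both contours are time-aligned and 0.0 marks an unvoiced frame.
    """
    voiced = [(s, r) for s, r in zip(sung_hz, ref_hz) if r > 0.0]
    if not voiced:
        return 0.0
    hits = sum(1 for s, r in voiced
               if s > 0.0 and semitone_error(s, r) <= tolerance_semitones)
    return 100.0 * hits / len(voiced)
```

Measuring errors in semitones rather than Hz keeps the tolerance equally strict in low and high registers.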

FIG. 6 is a flowchart illustrating processing executed by the control device 12 according to the second embodiment. The processing of FIG. 6 is executed, for example, when the sound collection device 11 receives the input voice V. In the flowchart of FIG. 6, steps Sa8 to Sa10 are executed in addition to steps Sa1 to Sa7 illustrated in FIG. 3.

When the processing of FIG. 6 starts, the determination unit 121 determines whether the input voice V is a singing voice or a voice other than a singing voice (Sa1). As in the first embodiment, this is decided by comparing a threshold with the maximum of the similarity indices calculated for the plurality of pieces of reference data.
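The maximum-versus-threshold test in step Sa1 can be sketched as follows. The disclosure does not specify the similarity index. Here, purely as an assumption, it is the cosine similarity between equal-length feature vectors derived from the input voice and from each piece of reference data. The function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed similarity index: cosine of the angle between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_singing(features: Sequence[float],
                  references: Dict[str, Sequence[float]],
                  threshold: float) -> Optional[str]:
    """Return the best-matching song if its similarity exceeds the threshold."""
    if not references:
        return None
    scores = {song: cosine_similarity(features, ref)
              for song, ref in references.items()}
    best_song = max(scores, key=scores.get)
    # Singing voice only when the maximum similarity index exceeds the threshold.
    return best_song if scores[best_song] > threshold else None
```

A `None` result corresponds to Sa1: NO. A song name corresponds to the reference data identified in step Sa13.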

When the input voice V is determined to be a singing voice (Sa1: YES), the determination unit 121 determines whether the input voice V is a singing voice of the music being played by the playback control unit 125 or a singing voice of other music (Sa8). Step Sa13 of FIG. 4 identifies the reference data whose similarity index is the maximum and exceeds the threshold. If the music of that reference data is being played, the input voice V is determined to be a singing voice of the music being played. If the music of that reference data is not being played, the input voice V is determined to be a singing voice of music other than the music being played.

When the input voice V is determined to be a singing voice of the music being played (Sa8: YES), the operation control unit 123 instructs the singing evaluation unit 127 to perform a third operation of evaluating the input voice V (Sa9). The singing evaluation unit 127 executes the third operation (Sa10); specifically, it generates an evaluation value of the singing voice. When the input voice V is determined to be a singing voice of music other than the music being played (Sa8: NO), the operation control unit 123 instructs the playback control unit 125 to perform the first operation (Sa2). As in the first embodiment, the first operation plays the music corresponding to the input voice V. In the second embodiment, however, the first operation is exemplified by playing that music after the music currently being played (that is, reserved playback). Alternatively, the first operation may stop the music being played and play the music corresponding to the input voice V (that is, immediate playback). The playback control unit 125 executes the first operation (Sa3). In the second embodiment, this means causing the playback device 14 to play the music corresponding to the input voice V after the music being played.

When the input voice V is determined to be a voice other than a singing voice (Sa1: NO), the determination unit 121 determines, as in the first embodiment, whether the input voice V is an instruction voice or some other voice (Sa4). When the input voice V is determined to be an instruction voice (Sa4: YES), the operation control unit 123 instructs the playback control unit 125 to perform the second operation represented by that input voice V (Sa5), as in the first embodiment. As illustrated in FIG. 2, examples of the second operation include stopping the music being played, changing the key, and increasing the volume. When the user U utters an instruction voice requesting playback of a desired music piece, a second operation is instructed that plays the music specified by the instruction voice (reserved playback or immediate playback).
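Step Sa4 relies on the operation table of FIG. 2, which is not reproduced in this section. The sketch below assumes a simple table that maps an already-recognized phrase to a second operation. A phrase that is not in the table counts as "not an instruction voice". All table entries and operation names are hypothetical placeholders.

```python
from typing import Dict, Optional

# Hypothetical operation table: recognized phrase -> second operation.
# The real contents of FIG. 2 are not given here; these entries are illustrative.
OPERATION_TABLE: Dict[str, str] = {
    "stop": "stop_playback",
    "key up": "raise_key",
    "key down": "lower_key",
    "volume up": "increase_volume",
}


def lookup_second_operation(recognized_text: str) -> Optional[str]:
    """Return the second operation for an instruction voice, or None (Sa4: NO)."""
    phrase = recognized_text.strip().lower()
    return OPERATION_TABLE.get(phrase)
```

An exact-match lookup is the simplest possible reading. The modification examples note that speech recognition or a trained model could replace the table entirely.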

As in the first embodiment, the playback control unit 125 executes the second operation (Sa6). For example, when a second operation of stopping playback is instructed, playback of the music is stopped. When the input voice V is determined to be a voice other than an instruction voice (Sa4: NO), the operation control unit 123 gives no instruction to the playback control unit 125 (Sa7).
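The playing-state flow of FIG. 6 (steps Sa1 to Sa10) can be summarized as a single routing function. This is a simplified sketch that assumes the results of Sa1 and Sa4 are already available. Those results are the matched song for a singing voice and the recognized operation for an instruction voice. The returned tuples and target names are hypothetical stand-ins for instructions to the singing evaluation unit 127 or the playback control unit 125.

```python
from typing import Optional, Tuple


def dispatch_playing(sung_song: Optional[str], second_op: Optional[str],
                     now_playing: str) -> Tuple[str, ...]:
    """Route one input voice V while music is playing (FIG. 6, simplified).

    sung_song: song matched at Sa1 (None if not a singing voice).
    second_op: operation recognized at Sa4 (None if not an instruction voice).
    """
    if sung_song is not None:                                 # Sa1: YES
        if sung_song == now_playing:                          # Sa8: YES
            # Sa9/Sa10: third operation, evaluate the singing.
            return ("singing_evaluation_unit", "evaluate")
        # Sa2/Sa3: first operation, reserved playback after the current song.
        return ("playback_control_unit", "reserve", sung_song)
    if second_op is not None:                                 # Sa4: YES
        # Sa5/Sa6: second operation, e.g. stop, key change, volume.
        return ("playback_control_unit", second_op)
    return ()                                                 # Sa7: no instruction
```

Singing along with the current song is therefore scored, while singing a different song queues it. Reserved playback was chosen here to match the example in the text; immediate playback is the stated alternative.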

The second embodiment achieves the same effects as the first embodiment. In the second embodiment, when the input voice V is a singing voice of the music being played, the singing evaluation unit 127 is instructed to perform the third operation of evaluating the input voice V. When the input voice V is a singing voice of other music, the playback control unit 125 is instructed to perform the first operation of playing the music corresponding to the input voice V. The instruction can therefore be switched between the first operation and the third operation according to whether the input voice V is a singing voice of the music being played. In the second embodiment as well, the processing exemplified in the first embodiment is executed in the standby state.

As described in the first and second embodiments, the first operation relating to playback of the music corresponding to the input voice V (singing voice) is, for example, playing that music (immediate playback or reserved playback). The second operation represented by the input voice V (instruction voice) is, for example, playing the music specified by that input voice V (immediate playback or reserved playback). It may also be controlling the music being played (for example, changing the key, volume, or playback speed). The first and second operations are not limited to these examples. Suppose a singing voice of the same music is received in the standby state and in the playing state. In that case, a configuration that instructs different operations in the two states is preferable. In the standby state, the first operation of playing the music corresponding to the input voice V is instructed. In the playing state, the third operation of evaluating the input voice V is instructed. However, the same operation (for example, the first operation of playing the music) may instead be instructed in both states.

<Third embodiment>
In the first embodiment, the functions of the music playback system 10 are realized by a single terminal device; in the third embodiment, they are realized by a plurality of devices. FIG. 7 is a block diagram illustrating the configuration of the music playback system 10 according to the third embodiment. As illustrated in FIG. 7, the music playback system 10 of the third embodiment includes a terminal device 20, a terminal device 30, and a processing device 40.

The processing device 40 is a playback device that plays music desired by the user U. For example, a car navigation device or a car audio device installed in a vehicle is suitable as the processing device 40. The user U can instruct the processing device 40 to perform operations by voice input to the terminal device 20 and the terminal device 30. The terminal device 20 and the terminal device 30 are information terminals that both receive the input voice V from the user U. Each transmits an instruction corresponding to the input voice V to the processing device 40. The terminal devices 20 and 30 are installed in the vehicle in which the processing device 40 is mounted. The terminal device 20 receives a singing voice and transmits an instruction P1 for the first operation to the processing device 40. An information terminal such as a mobile phone or smartphone is suitable as the terminal device 20. The terminal device 30, on the other hand, receives an instruction voice and transmits an instruction P2 for the second operation to the processing device 40. A voice interaction device such as a smart speaker is suitable as the terminal device 30. Each of the terminal devices 20 and 30 can communicate with the processing device 40 by wire or wirelessly.

FIG. 8 is a block diagram illustrating the configuration of the terminal device 20. As illustrated in FIG. 8, the terminal device 20 includes a sound collection device 21, a communication device 22, a control device 23, and a storage device 24. The sound collection device 21 is an audio device (microphone) that picks up surrounding sound. Specifically, it receives the input voice V from the user U and generates an acoustic signal X representing the input voice V.

The control device 23 (an example of a computer) consists of a processing circuit such as a CPU and centrally controls each element of the music playback system 10. By executing a program stored in the storage device 24, the control device 23 implements a plurality of functions (a first processing unit 231 and a first operation control unit 233). Some functions of the control device 23 may be realized by a dedicated electronic circuit, and the functions of the control device 23 may be distributed across a plurality of devices.

The storage device 24 stores the program executed by the control device 23 and various data used by the control device 23. Any known recording medium, such as a semiconductor or magnetic recording medium, or a combination of several types of recording media, may be used as the storage device 24. As illustrated in FIG. 8, the storage device 24 of the third embodiment stores a plurality of pieces of reference data respectively corresponding to a plurality of different music pieces.

From the acoustic signal X generated by the sound collection device 21, the first processing unit 231 determines whether the input voice V from the user U is a singing voice or a voice other than a singing voice (that is, an instruction voice or a spoken voice). This determination uses processing similar to step Sa1 of FIG. 3 (steps Sa11 to Sa14 of FIG. 4). Specifically, a similarity index is calculated between the acoustic signal X and each piece of reference data stored in the storage device 24. These indices are used to determine whether the input voice V is a singing voice.

When the input voice V is determined to be a singing voice, the first operation control unit 233 causes the communication device 22 to transmit the instruction P1 for the first operation to the processing device 40. As in the first embodiment, the first operation indicated by P1 is, for example, playing the music corresponding to the input voice V. The communication device 22 transmits P1 to the processing device 40 under the control of the first operation control unit 233. When the input voice V is determined not to be a singing voice (that is, to be some other voice), P1 is not transmitted. The processing device 40 receives the instruction P1 transmitted from the terminal device 20.

FIG. 9 is a block diagram illustrating the configuration of the terminal device 30. As illustrated in FIG. 9, the terminal device 30 includes a sound collection device 31, a communication device 32, a control device 33, and a storage device 34. The sound collection device 31 is an audio device (microphone) that picks up surrounding sound. Specifically, like the sound collection device 21 of the terminal device 20, it receives the input voice V from the user U and generates an acoustic signal X representing the input voice V.

The control device 33 (an example of a computer) consists of a processing circuit such as a CPU and centrally controls each element of the music playback system 10. By executing a program stored in the storage device 34, the control device 33 implements a plurality of functions (a second processing unit 331 and a second operation control unit 333). Some functions of the control device 33 may be realized by a dedicated electronic circuit, and the functions of the control device 33 may be distributed across a plurality of devices.

The storage device 34 stores the program executed by the control device 33 and various data used by the control device 33. Any known recording medium, such as a semiconductor or magnetic recording medium, or a combination of several types of recording media, may be used as the storage device 34. As illustrated in FIG. 9, the storage device 34 of the third embodiment stores an operation table similar to that of the first embodiment. It also stores a plurality of song names respectively corresponding to the plurality of pieces of music data M.

From the acoustic signal X generated by the sound collection device 31, the second processing unit 331 determines whether the input voice V from the user U is an instruction voice or some other voice (that is, a singing voice or a spoken voice). As in the first embodiment, the operation table is used for this determination. When the instruction voice requests playback of a music piece, the song name it specifies is identified from among the song names stored in the storage device 34. As in the first embodiment, natural language processing such as morphological analysis of the input character string is used to identify the song name. The processing of the second processing unit 331 runs in parallel with the processing of the first processing unit 231.

When the input voice V is determined to be an instruction voice, the second operation control unit 333 causes the communication device 32 to transmit the instruction P2 for the second operation to the processing device 40. As in the first embodiment, the second operation is, for example, playing the music specified by the instruction voice. Specifically, P2 instructs playback of the performance data corresponding to the song name identified by the second processing unit 331. The communication device 32 transmits P2 to the processing device 40 under the control of the second operation control unit 333. When the input voice V is determined not to be an instruction voice (that is, to be some other voice), P2 is not transmitted. The processing device 40 receives the instruction P2 transmitted from the terminal device 30.

FIG. 10 is a block diagram illustrating the configuration of the processing device 40. As illustrated in FIG. 10, the processing device 40 includes a playback device 41, a communication device 42, a control device 43, and a storage device 44. The control device 43 (an example of a computer) consists of a processing circuit such as a CPU and centrally controls each element of the music playback system 10. By executing a program stored in the storage device 44, the control device 43 implements a playback control unit 431. Some functions of the control device 43 may be realized by a dedicated electronic circuit, and the functions of the control device 43 may be distributed across a plurality of devices.

The storage device 44 stores the program executed by the control device 43 and various data used by the control device 43. Any known recording medium, such as a semiconductor or magnetic recording medium, or a combination of several types of recording media, may be used as the storage device 44. As illustrated in FIG. 10, the storage device 44 of the third embodiment stores a plurality of pieces of music data M, as in the first embodiment.

The playback device 41 plays music under the instructions of the control device 43. The communication device 42 receives the instruction P1 for the first operation from the terminal device 20 and the instruction P2 for the second operation from the terminal device 30.

The playback control unit 431 controls the playback device 41 by executing the instruction P1 or P2 received by the communication device 42. That is, it executes the first operation instructed by the first operation control unit 233 of the terminal device 20, or the second operation instructed by the second operation control unit 333 of the terminal device 30. The playback control unit 431 of the third embodiment includes a data processing unit and a sound source unit similar to those of the first embodiment. Following the instruction from the first operation control unit 233 or the second operation control unit 333, it generates an acoustic signal from the performance data and supplies it to the playback device 41. The playback device 41 plays music according to the acoustic signal supplied from the playback control unit 431.
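The division of labor among the terminal device 20, the terminal device 30, and the processing device 40 can be simulated in a few lines. In this sketch a `queue.Queue` stands in for the wired or wireless link, and each terminal receives its own classification result for the same utterance. The `Message` type and the function names are hypothetical. Only playback instructions are modeled.

```python
import queue
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    kind: str   # "P1" (first operation) or "P2" (second operation)
    song: str


def terminal_20(matched_song: Optional[str], link: queue.Queue) -> None:
    # Units 231/233: send P1 only if the input voice was judged a singing voice.
    if matched_song is not None:
        link.put(Message("P1", matched_song))


def terminal_30(requested_song: Optional[str], link: queue.Queue) -> None:
    # Units 331/333: send P2 only if the input voice was judged an instruction voice.
    if requested_song is not None:
        link.put(Message("P2", requested_song))


def processing_device_40(link: queue.Queue) -> List[str]:
    # Playback control unit 431: execute each received instruction in arrival order.
    played = []
    while not link.empty():
        msg = link.get()
        played.append(f"{msg.kind}:play:{msg.song}")
    return played
```

Both terminals hear every utterance, but at most one of them normally forwards an instruction. The processing device therefore needs no knowledge of how the voice was classified.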

As the above explanation shows, the first processing unit 231 of the terminal device 20 and the second processing unit 331 of the terminal device 30 together function as a determination unit. This unit determines whether the input voice V is a singing voice or an instruction voice other than a singing voice. In other words, the function of the determination unit may be realized by a plurality of devices. Likewise, the first operation control unit 233 of the terminal device 20 and the second operation control unit 333 of the terminal device 30 together function as an operation control unit. When the input voice V is determined to be a singing voice, this unit instructs the playback control unit 431 to perform the first operation relating to playback of the corresponding music. When the input voice V is determined to be an instruction voice, it instructs the playback control unit 431 to perform the second operation represented by the input voice V. In other words, the function of the operation control unit may also be realized by a plurality of devices.

As the above description shows, the functions of the music playback system 10 may be realized by either a single device or a plurality of devices. A multi-device configuration is not limited to the one illustrated in the third embodiment. For example, the first processing unit 231 and the first operation control unit 233 of the terminal device 20 may be installed in a server device that can communicate with the terminal device 20. In that case, the terminal device 20 transmits the acoustic signal X generated by the sound collection device 21 to the server device. The server device identifies the first operation from the received acoustic signal X and transmits an instruction P1 for that operation back to the terminal device 20. The terminal device 20 then forwards the instruction P1 received from the server device to the processing device 40. Similarly, either the second processing unit 331 or the second operation control unit 333 of the terminal device 30 may be installed in the server device. The configuration of the third embodiment may also be applied to the second embodiment.
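The server-offload variant of the terminal device 20 reduces to a relay. In this minimal sketch the server is modeled as a callable that takes the acoustic signal X and returns an instruction P1 or `None`, and a list stands in for the link to the processing device 40. The `Server` alias, the function name, and the string encoding of P1 are hypothetical; no particular network protocol is implied.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical server interface: acoustic signal X -> instruction P1 (or None).
Server = Callable[[Sequence[float]], Optional[str]]


def terminal_20_with_server(signal_x: Sequence[float], server: Server,
                            outbox: List[str]) -> None:
    """Forward X to the server and relay any returned P1 to the processing device 40."""
    p1 = server(signal_x)      # units 231 and 233 now run on the server
    if p1 is not None:
        outbox.append(p1)      # relay instruction P1 unchanged
```

The terminal keeps only sound pickup and forwarding. The reference data and the singing-voice decision can then live on the server.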

<Modified examples>
Specific modifications that may be added to each of the embodiments exemplified above are illustrated below. Two or more aspects selected from the following examples may be combined as appropriate, provided they do not contradict each other.

(1) In each of the above embodiments, the sound collection device of the music playback system 10 receives the input voice V. A sound collection device separate from the music playback system 10 may be used instead. For example, a sound collection device installed in the vehicle or a detachable sound collection device may receive the input voice V. As the above description shows, the sound collection device may be either integrated with or separate from the music playback system 10.

(2) In each of the above embodiments, the music data M includes performance data and reference data. The music data M may also include data other than the song name, performance data, and reference data. For example, it may include lyrics data representing the lyrics. The lyrics data is used, for example, to present the lyrics, either by displaying them or by emitting sound representing them.

(3) In each of the above embodiments, any specific method may be used to determine whether the input voice V is a singing voice or an instruction voice. For example, when the music data M includes lyrics data, speech recognition can identify the character string represented by the input voice V. Comparing that string with the lyrics data of each piece of music data M then determines whether the input voice V is a singing voice. Whether the input voice V is an instruction voice may also be determined using known techniques such as speech recognition of the input voice V or a trained model (artificial intelligence), such as a neural network obtained by machine learning. These configurations also identify the content of the instruction represented by the instruction voice. As this explanation shows, the operation table is not essential for determining whether the input voice V is an instruction voice.
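The lyrics-comparison approach in modification (3) can be sketched with Python's standard `difflib`. It assumes the speech recognizer has already produced a transcript. The score used here is an illustrative choice, not the disclosed method: the fraction of the transcript found as one contiguous run in a song's lyrics. The 0.6 threshold is likewise an assumption.

```python
import difflib
from typing import Dict, Optional


def match_by_lyrics(transcript: str, lyrics: Dict[str, str],
                    threshold: float = 0.6) -> Optional[str]:
    """Return the song whose lyrics best contain the recognized text, or None."""
    query = transcript.lower()
    if not query:
        return None
    best_song, best_score = None, 0.0
    for song, text in lyrics.items():
        target = text.lower()
        matcher = difflib.SequenceMatcher(None, query, target, autojunk=False)
        # Longest contiguous block shared by the transcript and the lyrics.
        block = matcher.find_longest_match(0, len(query), 0, len(target))
        score = block.size / len(query)
        if score > best_score:
            best_song, best_score = song, score
    return best_song if best_score >= threshold else None
```

A returned song name means the input is treated as a singing voice of that song. `None` means it is not a singing voice. A real system would need tolerance for recognition errors that break a phrase into several shorter matches.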

(4) In each of the above embodiments, the music playback system 10 may present the music to the user U before playing it in response to a first or second operation (hereinafter referred to as the "music presentation process"). In the music presentation process, for example, the song name is presented to the user U. FIG. 11 is a flowchart of the music presentation process. The process of FIG. 11 starts when the sound collection device 11 receives an input voice V requesting playback of a music piece (hereinafter referred to as the "first input voice"). The first input voice may be either a singing voice or an instruction voice. For example, to request playback of the music "ABC", the first input voice may be a singing voice singing "ABC" or an instruction voice uttering the string "Play [ABC]".

The control device 12 identifies the song title corresponding to the first input voice (Sb1); that is, it identifies the title of the song whose playback the first input voice requested. The operation control unit 123 instructs the playback control unit 125 to present the song title corresponding to the first input voice (Sb2). The playback device 14 presents the song title as instructed by the playback control unit 125. For example, the playback device 14 emits a sound representing the song title (for example, a voice saying "Is it [song title]?"). Alternatively, the playback device 14 may display a character string representing the song title.

If the song whose title the playback device 14 presented is the desired song, the user U utters an input voice V indicating so (hereinafter the "second input voice"). The second input voice is, for example, an utterance of "yes". If the presented song is not the desired song, the user U utters a voice indicating that it is not (for example, an utterance of "no").

The control device 12 determines whether the sound collection device 11 has received the second input voice (Sb3). If the second input voice has been received (Sb3: YES), the operation control unit 123 instructs the playback control unit 125 to play the song corresponding to the first input voice (Sb4). That is, the song whose title the playback device 14 presented is played.

On the other hand, if an input voice V other than the second input voice has been received (Sb3: NO), the song corresponding to the first input voice is not played. When the music playback system 10 receives an input voice V indicating that the song is not the desired one, it may identify the song to be played anew. As this explanation shows, the song presentation process has the advantage that the user U can confirm, before playback, whether the song identified from the first input voice is the desired song.
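Steps Sb1 to Sb4 can be sketched as a single confirm-then-play routine. This is a minimal sketch under stated assumptions: song identification, title presentation, listening, and playback are passed in as callables standing in for the control device 12, the playback device 14, the sound collection device 11, and the playback control unit 125. The function name `present_then_play` and the set of accepted replies are hypothetical.

```python
from typing import Callable, Optional

# Assumption: replies accepted as the "second input voice" (e.g. "yes").
AFFIRMATIVE_REPLIES = {"yes", "yeah", "ok"}


def present_then_play(first_input: str,
                      identify_title: Callable[[str], Optional[str]],
                      present_title: Callable[[str], None],
                      listen: Callable[[], str],
                      play: Callable[[str], None]) -> bool:
    """Confirm the song title with the user before playback; return True if played."""
    title = identify_title(first_input)        # Sb1: song requested by the first input voice
    if title is None:
        return False
    present_title(title)                       # Sb2: e.g. announce "Is it [title]?"
    reply = listen().strip().lower()           # Sb3: wait for the next input voice
    if reply in AFFIRMATIVE_REPLIES:
        play(title)                            # Sb4: play the confirmed song
        return True
    return False                               # Sb3: NO, the song is not played
```

Injecting the device-facing steps as callables keeps the control flow testable without audio hardware. Re-identifying a different song after a "no" reply, which the text allows, is left out.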

(5) In each of the embodiments described above, the music playback system 10 is used in a vehicle, but it may be used at any location.

(6) In each of the embodiments described above, the music playback system 10 plays karaoke songs, but the music it plays is not limited to this example. For example, it may play a song that includes a singer's vocals.

(7) The music playback system 10 of each embodiment described above may also be used by a plurality of users U. In that case, the sound collection device receives an input voice V containing the voices uttered by each of the users U. The music playback system 10 separates each user U's voice from the input voice V and determines, for each separated voice, whether it is a singing voice or an instruction voice. Thus, even when several users U vocalize at the same time, each user U can instruct the music playback system 10 to perform the operation corresponding to his or her own utterance.

(8) In the first embodiment, the first operation and a second operation that differs from it were illustrated, but the first operation and the second operation may be the same operation. However, when the two operations differ, the user U can instruct the playback control unit 125 to perform the desired operation by choosing the type of input voice V (singing voice or instruction voice) accordingly.

(9) In each of the embodiments described above, playing the song corresponding to the input voice V was illustrated as the first operation, but the first operation is not limited to this example. For example, the first operation may change the playback mode according to the input voice V (singing voice). Suitable examples are a first operation that changes the tempo of the song being played, or of the song about to start, according to the tempo of the input voice V, and a first operation that changes the key of the song being played, or of the song about to start, according to the key of the input voice V.
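The tempo and key adjustments above reduce to simple arithmetic once the input voice V has been analyzed. The sketch below assumes that pitch and tempo estimation happen elsewhere: the key of the input voice V and of the song are each summarized by one reference frequency (for example an estimated tonic), and each tempo by a BPM value. The function names and the octave-folding rule are illustrative choices, not taken from the disclosure. The transposition uses the standard relation n = 12 log2(f_sung / f_reference).

```python
import math


def key_shift_semitones(sung_hz: float, reference_hz: float) -> int:
    """Semitone shift that moves the song's reference pitch toward the sung pitch.

    The result is folded by whole octaves into the range [-6, 5], so a singer who is
    simply an octave lower or higher gets no transposition.
    """
    if sung_hz <= 0 or reference_hz <= 0:
        raise ValueError("frequencies must be positive")
    raw = round(12 * math.log2(sung_hz / reference_hz))
    return (raw + 6) % 12 - 6


def tempo_ratio(sung_bpm: float, song_bpm: float) -> float:
    """Playback-rate factor that brings the song's tempo to the sung tempo."""
    if sung_bpm <= 0 or song_bpm <= 0:
        raise ValueError("tempi must be positive")
    return sung_bpm / song_bpm
```

For example, a singer two semitones above the song's reference pitch yields a shift of +2, and singing at 120 BPM against a 100 BPM song yields a rate factor of 1.2.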

(10) In the first embodiment, playing the song from immediately after the part that the user U sang was illustrated as the first operation of playing from a position corresponding to the input voice V, but that first operation is not limited to this example. For example, the song corresponding to the input voice V may be divided into a plurality of sections (hereinafter "unit sections"), and the first operation may play the song from the beginning of the unit section that contains the part represented by the input voice V. In this configuration, the music data M includes section data that defines the unit sections; the section data specifies the start point and end point of each unit section. A unit section is, for example, a phrase (a unit of musical expression) or a structural section such as the A-melody (verse), B-melody (pre-chorus), or chorus. Unit sections are not limited to these examples. The unit section containing the part represented by the input voice V is identified by a known music analysis technique. Because the song is then played from the beginning of the unit section containing the part represented by the input voice V, the user can sing the song from the part corresponding to the input voice V.

Alternatively, the first operation may play the song corresponding to the input voice V from the beginning of the unit section immediately before or immediately after the unit section that contains the part represented by the input voice V. A first operation that plays the song from its beginning is also suitable. As this explanation shows, in the first operation the position from which the song corresponding to the input voice V (singing voice) is played can be set variably.
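Choosing the playback start from section data can be sketched with a binary search. This is a minimal sketch assuming that section data is a sorted, non-overlapping list of (start, end) times in seconds, and that music analysis has already located the sung part at a single time position in the song. `playback_start` and its `offset` parameter are hypothetical; clamping at the first and last sections is an illustrative choice.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Section = Tuple[float, float]  # (start, end) in seconds, as section data might define them


def playback_start(sections: Sequence[Section], sung_position: float, offset: int = 0) -> float:
    """Start time of the unit section containing sung_position, shifted by offset sections.

    offset = 0: the containing section; -1: the section immediately before;
    +1: the section immediately after. Offsets past either end are clamped.
    """
    starts = [start for start, _ in sections]
    index = bisect_right(starts, sung_position) - 1
    if index < 0 or sung_position >= sections[index][1]:
        raise ValueError("position is not inside any unit section")
    target = min(max(index + offset, 0), len(sections) - 1)
    return sections[target][0]
```

Playing from the beginning of the song, the remaining variant mentioned above, corresponds to a start time of 0.0 and needs no lookup.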

(11) The second embodiment presupposes the configuration that determines whether the input voice V is a singing voice or an instruction voice; when the input voice V is a singing voice, it then determines whether the input voice V is the singing voice of the song being played. However, the second embodiment may also determine whether the input voice V is the singing voice of the song being played without presupposing that singing/instruction determination. That is, the following configuration stands independently of the configuration that determines whether the input voice V is a singing voice or an instruction voice. It determines whether the input voice V is the singing voice of the song being played by the playback control unit 125 or the singing voice of some other song. If the input voice V is the singing voice of the song being played by the playback control unit 125, it instructs the singing evaluation unit 127 to perform the third operation. If the input voice V is the singing voice of a song other than the one being played, it instructs the playback control unit 125 to perform the first operation.
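The routing rule above can be sketched as a small decision function. This assumes that an upstream step has already identified which song, if any, the singing voice belongs to, and that songs are compared by title. `Action` and `route_singing` are hypothetical names; the returned action would be forwarded to the singing evaluation unit 127 or the playback control unit 125.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    EVALUATE = "third operation: evaluate the singing"    # singing evaluation unit 127
    PLAY = "first operation: play the identified song"    # playback control unit 125
    NONE = "no song identified"


def route_singing(identified: Optional[str], now_playing: Optional[str]) -> Tuple[Action, Optional[str]]:
    """Decide what a singing voice should trigger, given the song currently playing."""
    if identified is None:
        return Action.NONE, None
    if identified == now_playing:
        # The user is singing along with the current song: score the performance.
        return Action.EVALUATE, identified
    # The user is singing a different song (or nothing is playing): start that song.
    return Action.PLAY, identified
```

Treating "nothing is playing" the same as "a different song is playing" is an assumption of this sketch.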

(12) In the third embodiment, the terminal device 20 and the terminal device 30 execute their processing independently, but they may also execute processing in cooperation. For example, when the terminal device 30 determines that the input voice V of the user U is not an instruction voice, it may send the terminal device 20 an instruction to determine whether the input voice V is a singing voice. On receiving that instruction from the terminal device 30, the terminal device 20 determines whether the input voice V is a singing voice.

(13) In each of the embodiments described above, a configuration is also suitable in which the music playback system 10 starts up when, for example, it receives a singing voice or an instruction voice.

(14) As illustrated in each embodiment, the functions of the music playback system 10 according to each of the embodiments described above are realized by cooperation between a processing circuit such as a CPU and a program. The program according to each embodiment may be provided stored in a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium. An optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor or magnetic recording medium, is also included. A non-transitory recording medium includes any recording medium other than a transitory, propagating signal, and volatile recording media are not excluded. The program may also be provided to a computer by distribution over a communication network.

<Additional Notes>
From the embodiments illustrated above, the following configurations, for example, can be understood.

A method for controlling a music playback system according to a preferred aspect (first aspect) of the present disclosure includes three steps. It determines whether an input voice is a singing voice or an instruction voice other than a singing voice. When the input voice is determined to be a singing voice, it instructs a playback control unit that controls playback of music to perform a first operation relating to playback of the song corresponding to the input voice. When the input voice is determined to be an instruction voice, it instructs the playback control unit to perform a second operation represented by the input voice. Because a singing voice triggers the first operation and an instruction voice triggers the second operation, diverse voice input using both singing voices and instruction voices is possible. For example, the first operation and the second operation are different operations. In that case, the user can instruct the playback control unit to perform a desired operation by choosing the type of input voice (singing voice or instruction voice) accordingly.

In a preferred example of the first aspect (second aspect), the first operation plays the song corresponding to the input voice from a position corresponding to the singing voice, and the second operation plays the song corresponding to the input voice from its beginning. Therefore, by choosing the type of input voice (singing voice or instruction voice) accordingly, the user can have the song played in the desired way.
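The dispatch in the first and second aspects can be sketched as follows. This is a minimal sketch assuming that the determination step yields the kind of voice, the song title, and (for singing) the position in the song where the sung phrase ended. `PlaybackControlStub`, `Recognition`, and `dispatch` are hypothetical stand-ins; the stub merely records commands instead of controlling real playback.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PlaybackControlStub:
    """Hypothetical stand-in for the playback control unit; it only logs commands."""
    log: List[Tuple[str, float]] = field(default_factory=list)

    def play(self, title: str, start_seconds: float) -> None:
        self.log.append((title, start_seconds))


@dataclass
class Recognition:
    """Assumed output of the determination step."""
    kind: str                          # "singing" or "instruction"
    title: str
    position: Optional[float] = None   # end of the sung phrase, for singing voices


def dispatch(result: Recognition, player: PlaybackControlStub) -> None:
    if result.kind == "singing":
        if result.position is None:
            raise ValueError("a singing voice needs a position in the song")
        # First operation: play from the position corresponding to the singing voice.
        player.play(result.title, result.position)
    elif result.kind == "instruction":
        # Second operation: play the requested song from its beginning.
        player.play(result.title, 0.0)
    else:
        raise ValueError(f"unknown voice kind: {result.kind}")
```

Singing the same song therefore resumes mid-song, while asking for it by name starts it over, which is the user-selectable behavior the second aspect describes.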

In a preferred example of the second aspect (third aspect), the song corresponding to the input voice is divided into a plurality of sections, and the first operation plays the song from the beginning of the section that contains the part represented by the input voice. In this aspect, when the input voice is a singing voice, the song is played from the beginning of the section containing the part that the input voice represents. Therefore, the user can sing the song from the part of the song corresponding to the singing voice.

In a preferred example of any of the first to third aspects (fourth aspect), the method determines whether the input voice is the singing voice of the song being played by the playback control unit or the singing voice of some other song. When the input voice is the singing voice of the song being played by the playback control unit, a singing evaluation unit is instructed to perform a third operation of evaluating the input voice. When the input voice is the singing voice of a song other than the one being played, the playback control unit is instructed, as the first operation, to play the song corresponding to the input voice. Therefore, whether the first operation or the third operation is instructed can be switched according to whether the input voice is the singing voice of the song being played.

A method for controlling a music playback system according to another aspect (fifth aspect) of the present disclosure instructs a playback control unit to present the song title corresponding to a first input voice requesting playback of a song. When a second input voice indicating that the song with the presented title is the desired song is received, the method instructs the playback control unit to play the song corresponding to the first input voice. That is, before a song is played, the user can confirm from the presented song title whether it is the desired song.

A method for controlling a music playback system according to another aspect (sixth aspect) of the present disclosure determines whether an input voice is the singing voice of the song being played by a playback control unit that controls playback of music, or the singing voice of a song other than the one that playback control unit is playing. When the input voice is the singing voice of the song being played, a singing evaluation unit is instructed to evaluate the input voice. When the input voice is the singing voice of a song other than the one being played, the playback control unit is instructed to play the song corresponding to the input voice. Therefore, the music playback system can be instructed to perform different operations depending on whether the input voice is the singing voice of the song being played.

A music playback system according to a preferred aspect (seventh aspect) of the present disclosure comprises a determination unit and an operation control unit. The determination unit determines whether an input voice is a singing voice or an instruction voice other than a singing voice. When the input voice is determined to be a singing voice, the operation control unit instructs a playback control unit that controls playback of music to perform a first operation relating to playback of the song corresponding to the input voice. When the input voice is determined to be an instruction voice, the operation control unit instructs the playback control unit to perform a second operation represented by the input voice. That is, diverse voice input using both singing voices and instruction voices is possible.

In a preferred example of the seventh aspect (eighth aspect), the first operation plays the song corresponding to the input voice from a position corresponding to the input voice, and the second operation plays the song corresponding to the input voice from its beginning. Therefore, by choosing the type of input voice (singing voice or instruction voice) accordingly, the user can have the song played in the desired way.

In a preferred example of the eighth aspect (ninth aspect), the song corresponding to the input voice is divided into a plurality of sections, and the first operation plays the song from the beginning of the section that contains the part represented by the input voice. In this aspect, when the input voice is a singing voice, the song is played from the beginning of the section containing the part that the input voice represents. Therefore, the user can continue singing the song seamlessly from the singing voice.

In a preferred example of any of the seventh to ninth aspects (tenth aspect), the determination unit determines whether the input voice is the singing voice of the song being played by the playback control unit or the singing voice of some other song. When the input voice is the singing voice of the song being played by the playback control unit, the operation control unit instructs a singing evaluation unit to perform a third operation of evaluating the input voice. When the input voice is the singing voice of a song other than the one being played, the operation control unit instructs the playback control unit, as the first operation, to play the song corresponding to the input voice. Therefore, whether the first operation or the third operation is instructed can be switched according to whether the input voice is the singing voice of the song being played.

A program according to a preferred aspect (eleventh aspect) of the present disclosure causes one or more processors to function as a determination unit and an operation control unit. The determination unit determines whether an input voice is a singing voice or an instruction voice other than a singing voice. When the input voice is determined to be a singing voice, the operation control unit instructs a playback control unit that controls playback of music to perform a first operation relating to playback of the song corresponding to the input voice. When the input voice is determined to be an instruction voice, the operation control unit instructs the playback control unit to perform a second operation represented by the input voice. That is, diverse voice input using both singing voices and instruction voices is possible.

DESCRIPTION OF REFERENCE SIGNS: 10…music playback system, 11…sound collection device, 12…control device, 121…determination unit, 123…operation control unit, 125…playback control unit, 127…singing evaluation unit, 13…storage device, 14…playback device, 20…terminal device, 21…sound collection device, 22…communication device, 23…control device, 24…storage device, 27…operation control unit, 231…first processing unit, 233…first control unit, 30…terminal device, 31…sound collection device, 32…communication device, 33…control device, 331…second processing unit, 333…second operation control unit, 34…storage device, 40…processing device, 41…playback device, 42…communication device, 43…control device, 431…playback control unit, 44…storage device.

Claims

A computer-implemented method for controlling a music playback system, comprising: determining whether an input voice is a singing voice or an instruction voice other than a singing voice;
when the input voice is determined to be a singing voice, instructing a playback control unit that controls playback of music to perform a first operation of playing the song corresponding to the input voice from a position corresponding to the input voice; and, when the input voice is determined to be an instruction voice, instructing the playback control unit to perform a second operation represented by the input voice.

The method for controlling a music playback system according to claim 1, wherein the second operation is, among a plurality of operations corresponding to different registered character strings, the operation corresponding to the registered character string that is similar to an input character string representing the input voice.

The method for controlling a music playback system according to claim 1, wherein the second operation is an operation of playing the song specified by the input voice.

The method for controlling a music playback system according to claim 1, wherein the second operation is an operation of stopping playback of the song specified by the input voice.

The method for controlling a music playback system according to claim 1, wherein the second operation is an operation of changing the key of the song specified by the input voice.

The method for controlling a music playback system according to claim 1, wherein the second operation is an operation of changing the volume of the song specified by the input voice.

A music playback system comprising: a determination unit that determines whether an input voice is a singing voice or an instruction voice other than a singing voice; and
an operation control unit that, when the input voice is determined to be a singing voice, instructs a playback control unit that controls playback of music to perform a first operation of playing the song corresponding to the input voice from a position corresponding to the input voice and, when the input voice is determined to be an instruction voice, instructs the playback control unit to perform a second operation represented by the input voice.

A program that causes one or more processors to function as:
a determination unit that determines whether an input voice is a singing voice or an instruction voice other than a singing voice; and
an operation control unit that, when the input voice is determined to be a singing voice, instructs a playback control unit that controls playback of music to perform a first operation of playing the song corresponding to the input voice from a position corresponding to the input voice and, when the input voice is determined to be an instruction voice, instructs the playback control unit to perform a second operation represented by the input voice.