JP2019159099A

JP2019159099A - Music reproduction system

Info

Publication number: JP2019159099A
Application number: JP2018045524A
Authority: JP
Inventors: 達也北郷; Tatsuya Kitago
Original assignee: Onkyo Corp
Current assignee: Onkyo Corp
Priority date: 2018-03-13
Filing date: 2018-03-13
Publication date: 2019-09-19

Abstract

To reduce a user's stress when a song other than the one selected is played in a case where songs are selected by voice only. SOLUTION: The music reproduction system includes a sound collection unit which collects sound, a voice recognition unit which recognizes voice collected by the sound collection unit, a candidate song extraction unit which extracts a plurality of candidate songs based on music selection information included in the information recognized by the voice recognition unit, a candidate song part reproduction unit which sequentially reproduces a part of each of the plurality of candidate songs extracted by the candidate song extraction unit, and a selected song reproduction unit which, in response to the reproduction by the candidate song part reproduction unit, reproduces a song selected from the reproduced candidate songs based on a song selection instruction input included in the information collected by the sound collection unit and recognized by the voice recognition unit. SELECTED DRAWING: Figure 1

Description

The present invention relates to a music playback system.

In recent years, music distribution services delivered over networks have become widespread. In such a service, the user connects an information terminal capable of playing music (a smartphone, PC, audio device, etc.) to the network and either downloads the music information sent from the music distribution server to the terminal before playing it, or plays it while it downloads (streaming) (see, for example, Patent Document 1 below).

More recently, information terminals with a voice-interactive interface have come into use on the user side, and systems that retrieve information through dialogue have become widespread. Fully interactive information terminals known as voice-interactive speakers (smart speakers, AI speakers, etc.), which offer only voice input and output and have neither manual input functions nor a display screen, are also becoming common. A voice-interactive speaker provides the information the user wants through dialogue, for example by using a voice recognition information retrieval system that incorporates AI (artificial intelligence); it eliminates cumbersome operation input and offers the convenience of hands-free use.

Patent Document 1: JP 2004-191701 A

When such a voice-interactive speaker is used as the user-side information terminal of a music distribution service, the user specifies the desired song by voice, and the user can confirm the music information sent from the music distribution server only once the voice-interactive speaker plays it. With a conventional information terminal that has a display screen, the user can check the received music information on the screen, but a voice-interactive speaker without a display screen allows no such check.

This poses no problem as long as the music information sent always matches the user's selection. When music information that differs from the selection is sent, however, the user must either keep listening to the wrong song or stop playback and repeat the voice input. If repeated input still fails to reach the desired song, the user, who in some cases cannot fall back on another input means, experiences considerable stress.

The above description uses the example of a voice-interactive speaker connected to an online music distribution service, but the same problem arises in an offline music playback system that is intended for hands-free use with voice input only.

The present invention has been proposed to address this problem. That is, an object of the present invention is, among others, to reduce the user's stress when a song other than the one selected is played during song selection by voice alone.

To solve this problem, the present invention has the following configuration.
A music playback system comprising: a sound collection unit that collects sound; a voice recognition unit that recognizes the voice collected by the sound collection unit; a candidate song extraction unit that extracts a plurality of candidate songs based on song selection information included in the information recognized by the voice recognition unit; a candidate song partial playback unit that sequentially plays a part of each of the plurality of candidate songs extracted by the candidate song extraction unit; and a selected song playback unit that, in response to the playback by the candidate song partial playback unit, plays a song selected from the played candidate songs based on a song selection instruction input included in the information collected by the sound collection unit and recognized by the voice recognition unit.

FIG. 1 is an explanatory diagram showing a music playback system according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing a music playback system according to another embodiment of the present invention.
FIG. 3 is an explanatory diagram showing an example of the processing flow for voice-recognized information in the music playback system.
FIG. 4 is an explanatory diagram showing example methods of selecting a song from the candidate songs ((a) shows a first example and (b) shows a second example).
FIG. 5 is an explanatory diagram showing a configuration example of the music playback system.

Embodiments of the present invention are described below with reference to the drawings. In FIG. 1, the music playback system 1 includes a sound collection unit 10, a voice recognition unit 11, a candidate song extraction unit 12, a candidate song partial playback unit 13, and a selected song playback unit 14. The music playback system 1 selects and plays songs by voice input alone, and aims to reduce the stress the user experiences when selecting songs by voice.

In the music playback system 1, the voice recognition unit 11 first recognizes the voice collected by the sound collection unit 10. Next, the candidate song extraction unit 12 extracts a plurality of candidate songs based on the song selection information included in the recognized information. The candidate song partial playback unit 13 then plays the extracted candidate songs one after another. In doing so, it plays only a part of each song; that is, it takes a portion such as the chorus or the intro from each song and plays it. If, in response to this playback, the information collected by the sound collection unit 10 and recognized by the voice recognition unit 11 contains a song selection instruction input, the selected song playback unit 14 plays the song that the instruction selects from the candidate songs.
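
As a non-limiting illustration, the flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `Song` type, the substring matching rule, the `SELECT_WORDS` set, and the `play_part` and `hear` callbacks are all hypothetical stand-ins for the units 10 to 14.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical selection phrases (assumption), in the spirit of "That one!" or "OK!".
SELECT_WORDS = {"that one", "that song", "ok"}


@dataclass(frozen=True)
class Song:
    title: str
    artist: str


def extract_candidates(catalog: List[Song], selection_info: str) -> List[Song]:
    """Stand-in for candidate song extraction unit 12.

    Assumption: a case-insensitive substring match on the title.
    """
    key = selection_info.strip().lower()
    return [s for s in catalog if key in s.title.lower()]


def preview_and_select(
    candidates: List[Song],
    play_part: Callable[[Song], None],
    hear: Callable[[], str],
) -> Optional[Song]:
    """Stand-in for units 13 and 14.

    Previews each candidate in turn and returns the first one the user
    accepts, or None when no candidate is accepted.
    """
    for song in candidates:
        play_part(song)  # partial playback (intro, chorus, ...)
        if hear().strip().lower() in SELECT_WORDS:
            return song  # would be handed to selected song playback
    return None
```

In this sketch `hear` returns the recognized text of whatever the user said after a preview; an empty string models silence.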

With the music playback system 1, when the user selects a song by voice input to the sound collection unit 10, a plurality of candidate songs are extracted based on the song selection information included in the recognized information, and the intro, chorus, or other part of each candidate is played in turn. If the candidates include the song the user intended, the user gives a song selection instruction by voice, and the song selected by that instruction is played. Compared with extracting a single song for the user's selection, this raises the probability that the intended song is among the candidates and thus reduces the user's stress.

FIG. 2 shows another embodiment of the present invention. The music playback system 1A of this example has a normal playback mode 15 in which a specific song is extracted and played based on the song selection information included in the information recognized by the voice recognition unit 11. It also includes a mismatch playback determination unit 16 that determines, based on the information recognized by the voice recognition unit 11, that the played song does not match the user's selection. When the mismatch playback determination unit 16 determines a mismatch, the candidate song extraction unit 12 described above extracts candidate songs.

The mismatch playback determination unit 16 operates when the voice-recognized input indicates that the played song does not match the user's selection. Examples include cases where a song selection operation by voice recognition in the normal playback mode 15 is repeated a predetermined number of times or more within a predetermined time, where the normal playback mode 15 plays the same song repeatedly and the user interrupts it by voice each time, and where a preset word that prompts reselection is detected a predetermined number of times or more within a predetermined time.
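
One way to picture the "predetermined number of times within a predetermined time" criterion is a sliding time window over the relevant events. The sketch below is only an assumption about how such a check could be written; the class name `MismatchDetector` and the default thresholds are hypothetical, and the description does not specify concrete values.

```python
from collections import deque


class MismatchDetector:
    """Sliding-window counter for reselection or interruption events.

    Assumption: a mismatch is judged when at least `max_events` events
    fall within the last `window_s` seconds.
    """

    def __init__(self, max_events: int = 3, window_s: float = 60.0) -> None:
        self.max_events = max_events
        self.window_s = window_s
        self._times = deque()

    def record(self, t: float) -> bool:
        """Record an event at time t (seconds) and report whether a mismatch is judged."""
        self._times.append(t)
        # Drop events that have fallen out of the time window.
        while self._times and t - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.max_events
```

Each repeated selection, voice interruption, or detected reselection word would call `record` with the current time; a `True` result corresponds to the mismatch determination.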

In the music playback system 1A, when the user selects a song by voice input to the sound collection unit 10, a song is first extracted in the normal playback mode 15 based on the song selection information included in the information recognized by the voice recognition unit 11, and that song is played. If the user wants the played song to continue, the normal playback mode 15 simply continues.

If, on the other hand, the user gives an interruption instruction by voice, for example, the mismatch playback determination unit 16 determines a mismatch. In that case, a plurality of candidate songs are extracted based on the song selection information included in the recognized information, and the intro, chorus, or other part of each candidate is played in turn. If the candidates include the intended song, the user gives a song selection instruction by voice, and that song is selected and played.

FIG. 3 shows an example of the processing flow of the units that process voice-recognized information in the music playback system 1A. In step S1, the voice recognition unit 11 determines whether song selection information is present. Song selection information here covers various kinds of information such as song title, artist name, album name, and compilation. If there is no song selection information (step S1: NO), the system continues to wait for a selection.

When the voice recognition unit 11 determines that song selection information is present (step S1: YES), the song corresponding to the selection is played in the normal playback mode 15 (step S2). Along with playback, the mismatch playback determination unit 16 performs the mismatch determination (step S3). In step S3, the unit checks for situations such as song selection being repeated a predetermined number of times or more in a short time, playback and interruption of the same song being repeated a predetermined number of times or more, or the preset word used when selecting a song (the wake word) being repeated a predetermined number of times or more in a short time, and thereby determines whether the song played in the normal playback mode matches the user's selection. If they match (step S3: NO), the normal playback mode (step S2) continues.

If step S3 determines that the user's selection and the played song do not match (step S3: YES), step S4 asks a question such as "Can't find the song you want?" and determines whether to extract candidate songs. If the answer "No" is recognized (step S4: NO), normal playback continues (step S2); if an answer such as "Yes" is recognized (step S4: YES), the candidate song extraction unit 12 extracts candidate songs (step S5).

In step S5, the candidate song extraction unit 12 extracts a plurality of related candidate songs based on the song selection information included in the most recently recognized information, or on the song selection information included in the information recognized from the repeated voice inputs.

When the song selection information is a song title, the candidates extracted include, for example, all or some of the different songs that share that title, and songs with similar titles that allow for the accuracy of voice recognition. When it is an artist name, the candidates include songs by the same-named artist other than the song played in the normal playback mode, and songs by artists whose names sound the same or similar. When it is an album name or the like, the candidates include other songs on the same album and songs on same-named albums by different artists.
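
For the song-title case, the idea of collecting both identical titles and similar titles can be illustrated with a simple string-similarity filter. The sketch below uses Python's standard `difflib.SequenceMatcher` purely as a stand-in for whatever similarity measure a real system would use; the function name, the `(title, artist)` tuple layout, and the `cutoff` of 0.6 are assumptions, not values from the description.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similar_title_candidates(
    catalog: List[Tuple[str, str]],
    recognized_title: str,
    cutoff: float = 0.6,
) -> List[Tuple[str, str]]:
    """Return (title, artist) entries whose title resembles the recognized title.

    Identical titles by different artists all score 1.0 and are all kept;
    near matches are kept when their similarity ratio reaches `cutoff`.
    Results are ordered from most to least similar.
    """
    query = recognized_title.lower()
    scored = []
    for title, artist in catalog:
        score = SequenceMatcher(None, query, title.lower()).ratio()
        if score >= cutoff:
            scored.append((score, title, artist))
    # Stable sort: entries with equal scores keep their catalog order.
    scored.sort(key=lambda item: -item[0])
    return [(title, artist) for _, title, artist in scored]
```

A character-level ratio only approximates recognition errors; a phonetic comparison may fit misheard titles better, but that choice is outside what the description specifies.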

Once candidate songs have been extracted in step S5, a playback order is set for all or some of them. The order is set, for example, by the likelihood from voice recognition, by information specified in advance by the user, or by the user's stored past selection tendencies (history).
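
The ordering criteria listed above can be combined in many ways. As one hypothetical combination, the sketch below ranks candidates first by how often the user chose them in the past and then by recognition likelihood as a tie-breaker. The function name, the identifiers used for songs, and this particular priority are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List


def order_candidates(
    candidates: List[str],
    likelihood: Dict[str, float],
    history: List[str],
) -> List[str]:
    """Order candidate song IDs for partial playback.

    Assumption: past selections (history) take priority, and the
    recognition likelihood breaks ties. Missing likelihoods count as 0.0.
    """
    past = Counter(history)
    return sorted(
        candidates,
        key=lambda song: (-past[song], -likelihood.get(song, 0.0)),
    )
```

With an empty history the result reduces to ordering by recognition likelihood alone.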

Once the playback order among the extracted candidate songs has been set, the candidates are partially played in that order (step S6). Which part of a song to play (intro, chorus, characteristic part, etc.) is set in advance, and partial playback follows that setting. When several candidate songs are played in sequence, they are separated by a pause, crossfaded, or otherwise played so that the boundary between songs is easy to recognize.
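
The two ways of separating previews mentioned above, a pause between songs or a crossfade, differ only in how the start time of the next preview relates to the end of the previous one. The following sketch computes preview start times under that reading; the function name and parameters are hypothetical.

```python
from typing import List


def preview_start_times(
    durations: List[float],
    gap_s: float = 0.0,
    crossfade_s: float = 0.0,
) -> List[float]:
    """Compute the start time in seconds of each preview clip.

    With a gap, the next clip starts `gap_s` after the previous one ends.
    With a crossfade, it starts `crossfade_s` before the previous one ends.
    """
    if gap_s and crossfade_s:
        raise ValueError("use either a gap or a crossfade, not both")
    if any(crossfade_s >= d for d in durations):
        raise ValueError("crossfade must be shorter than every preview")
    starts = []
    t = 0.0
    for d in durations:
        starts.append(t)
        t += d + gap_s - crossfade_s
    return starts
```

For three 10-second previews, a 1-second gap gives start times of 0, 11, and 22 seconds, while a 2-second crossfade gives 0, 8, and 16 seconds.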

If the user inputs a song selection instruction by voice during or after the partial playback of a candidate song (step S7: YES), the selected song is identified from among the candidates, the selected song playback unit 14 plays it (step S8), and the system moves directly to the normal playback mode (step S2). The song selection instruction may be anything that can be judged to be a selection instruction, such as "That one!", "That song!", or "OK!".

If no song selection instruction is input (step S7: NO), the system determines whether the extracted candidates include another candidate song (step S9). If so (step S9: YES), it switches to that candidate (step S10) and performs partial playback again (step S6). If there is no other candidate (step S9: NO), it asks a question such as "Do you want to continue?" to confirm continuation (step S11). If the user wants to continue (step S11: YES), candidate songs are extracted again (step S5); at this point the system may ask for an additional keyword that serves as a hint for the selection, prompting voice input of a new extraction criterion. If the user does not want to continue (step S11: NO), the selection process ends.
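
Steps S5 to S11 form a loop with an inner pass over the current candidates and an outer pass that re-extracts candidates when the user chooses to continue. A minimal sketch of that control flow is shown below; the four callbacks (`extract`, `play_part`, `selected`, `ask_continue`) are hypothetical placeholders for the extraction, partial playback, selection recognition, and continuation prompt.

```python
from typing import Callable, List, Optional


def select_with_retry(
    extract: Callable[[], List[str]],
    play_part: Callable[[str], None],
    selected: Callable[[], bool],
    ask_continue: Callable[[], bool],
) -> Optional[str]:
    """Run the S5 to S11 loop and return the selected song ID, or None."""
    while True:
        candidates = extract()      # S5: extract (or re-extract) candidates
        for song in candidates:     # S9/S10: move on to the next candidate
            play_part(song)         # S6: partial playback
            if selected():          # S7: selection instruction recognized?
                return song         # S8: this song is played in full
        if not ask_continue():      # S11: "Do you want to continue?"
            return None             # selection process ends
```

In this sketch, asking for an additional keyword before re-extraction would happen inside the `extract` callback.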

FIG. 4 shows examples of methods for selecting a song from a plurality of candidate songs. In the example of FIG. 4(a), the candidate songs are partially played one after another in order (step S20); after all have been played, a question such as "Which number was the song you wanted?" is asked (step S21), and the selected song is played according to the answer (step S22). If there is no answer to the question in step S21, the system checks whether candidate songs remain (step S23). If they do (step S23: YES), the candidate songs are again played in order (step S20). If they do not (step S23: NO), the system outputs a voice message such as "The song you want is not available" (step S24) and then moves to the normal playback mode (step S2).
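
In the FIG. 4(a) method, the answer to "Which number was the song you wanted?" has to be mapped back to a position in the preview order. The sketch below shows one hypothetical way to do this for English answers; the `ORDINALS` table and the parsing rule are assumptions, and a real system would rely on its own language understanding.

```python
import re
from typing import List, Optional

# Hypothetical mapping of spoken ordinals to positions (assumption).
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def pick_by_position(candidates: List[str], answer: str) -> Optional[str]:
    """Map an answer such as "the second one" or "number 3" to a candidate.

    Returns None when no position is found or it is out of range, which
    corresponds to treating the question in step S21 as unanswered.
    """
    text = answer.lower()
    match = re.search(r"\d+", text)
    if match:
        position = int(match.group())
    else:
        position = next((n for word, n in ORDINALS.items() if word in text), None)
    if position is None or not 1 <= position <= len(candidates):
        return None
    return candidates[position - 1]
```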

In the example of FIG. 4(b), the first candidate song is played for a set time only (step S30), a question such as "Is this the song you want?" is asked (step S31), and if the user confirms that it is, the selected song is played (step S32). If the answer to the question in step S31 is that it is not the desired song, the system checks whether candidate songs remain (step S33). If they do (step S33: YES), the next candidate song is played for a set time only (step S30). If they do not (step S33: NO), the system outputs a voice message such as "The song you want is not available" (step S34) and then moves to the normal playback mode (step S2).

The music playback systems 1 and 1A can be used for voice input in offline systems such as home audio and in-vehicle audio, and they can also be used effectively in online music distribution services.

FIG. 5 shows a system example in which the music playback system 1 or 1A is used in an online music distribution service. This example includes a music distribution server 21 that can communicate via a network 20, and a voice-interactive speaker device 30, which is a user terminal device that can exchange information with the music distribution server 21 via the network 20.

The speaker device 30 includes a microphone 31, which constitutes the sound collection unit 10 described above, and a speaker unit 32, which converts the audio signal of the played candidate song or selected song into sound. The speaker device 30 also includes an information processing unit 33, which receives the audio signal picked up by the microphone 31 and outputs an audio signal to the speaker unit 32, and a communication unit 34, which transmits information output from the information processing unit 33 to the music distribution server 21 via the network 20, receives information transmitted from the music distribution server 21 via the network 20, and passes it to the information processing unit 33.

In a system in which the speaker device 30 and the music distribution server 21 are connected to the network 20 in this way, some or all of the components of the music playback systems 1 and 1A other than the sound collection unit 10 can be placed, as appropriate, on the computer of the music distribution server 21, on the computer of the information processing unit 33 in the speaker device 30, or on both. In that case, a music playback program is installed on the music distribution server 21 or the information processing unit 33. The program causes the computer to execute the voice recognition process performed by the voice recognition unit 11, the candidate song extraction process performed by the candidate song extraction unit 12, the candidate song partial playback process performed by the candidate song partial playback unit 13, the selected song playback process performed by the selected song playback unit 14, and the mismatch playback determination process performed by the mismatch playback determination unit 16. The installed program is recorded in a computer-readable memory or other recording medium.

With the music playback systems 1 and 1A, when songs are selected by voice alone, the user's stress when an unselected song is played can be reduced.

Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments; design changes and the like that do not depart from the gist of the present invention are included in the present invention. The embodiments described above can also be combined by borrowing one another's techniques, provided there is no particular contradiction or problem in their purposes, configurations, and so on.

1, 1A: Music playback system,
10: Sound collection unit, 11: Voice recognition unit, 12: Candidate song extraction unit,
13: Candidate song partial playback unit, 14: Selected song playback unit, 15: Normal playback mode,
16: Mismatch playback determination unit,
20: Network, 21: Music distribution server,
30: Speaker device (user terminal device), 31: Microphone (sound collection unit),
32: Speaker unit, 33: Information processing unit, 34: Communication unit

Claims

A music playback system comprising:
A sound collection unit that collects sound;
A voice recognition unit that recognizes the voice collected by the sound collection unit;
A candidate song extraction unit that extracts a plurality of candidate songs based on song selection information included in the information recognized by the voice recognition unit;
A candidate song partial playback unit that sequentially plays a part of each of the plurality of candidate songs extracted by the candidate song extraction unit; and
A selected song playback unit that, in response to the playback by the candidate song partial playback unit, plays a song selected from the played candidate songs based on a song selection instruction input included in the information collected by the sound collection unit and recognized by the voice recognition unit.

The music playback system according to claim 1, further comprising:
A normal playback mode in which a specific song is extracted and played based on the song selection information included in the information recognized by the voice recognition unit; and
A mismatch playback determination unit that determines, based on the information recognized by the voice recognition unit, that the played song does not match the user's selection,
Wherein the candidate song extraction unit extracts the candidate songs when the mismatch playback determination unit determines a mismatch.

A music playback method for selecting and playing a song based on a user's voice selection input, comprising:
A voice recognition step of recognizing input voice;
A candidate song extraction step of extracting a plurality of candidate songs based on song selection information included in the voice-recognized information;
A candidate song partial playback step of sequentially playing a part of each of the plurality of candidate songs extracted in the candidate song extraction step; and
A selected song playback step of playing, in response to the playback in the candidate song partial playback step, a song selected from the played candidate songs based on a song selection instruction input included in information obtained by voice recognition of the user's voice.

A music playback program for causing a computer to execute:
A voice recognition process for recognizing collected voice;
A candidate song extraction process for extracting a plurality of candidate songs based on song selection information included in the information recognized by the voice recognition process;
A candidate song partial playback process for sequentially playing a part of each of the plurality of candidate songs extracted by the candidate song extraction process; and
A selected song playback process for playing, in response to the playback by the candidate song partial playback process, a song selected from the played candidate songs based on a song selection instruction input included in the information collected and recognized by the voice recognition process.

A computer-readable recording medium on which the music playback program according to claim 4 is recorded.

The music playback system according to claim 1, comprising:
A music distribution server capable of communicating via a network; and
A user terminal device capable of transmitting and receiving information to and from the music distribution server via the network,
Wherein the user terminal device includes the sound collection unit and a speaker unit that converts the played candidate song or the selected song into sound, and
Each unit of the music playback system is provided on one or both of the computer of the user terminal device and the computer of the music distribution server.

A speaker device used in the music playback system according to claim 1, comprising:
The sound collection unit; and a speaker unit that converts the played candidate song or the selected song into sound.