JP2004347943A

JP2004347943A - Data processor, musical piece reproducing apparatus, control program for data processor, and control program for musical piece reproducing apparatus

Info

Publication number: JP2004347943A
Application number: JP2003146099A
Authority: JP
Inventors: Yoshihisa Takeda; 能久武田; Naoya Koga; 直哉古賀; Akira Inoue; 明井上; Kazuyoshi Sukai; 和義須貝
Original assignee: Clarion Co Ltd
Current assignee: Faurecia Clarion Electronics Co Ltd
Priority date: 2003-05-23
Filing date: 2003-05-23
Publication date: 2004-12-09

Abstract

PROBLEM TO BE SOLVED: To provide a data processor with better operability, a musical piece reproducing apparatus, a control program for the data processor, and a control program for the musical piece reproducing apparatus.

SOLUTION: The on-vehicle musical piece reproducing apparatus 100, which selects and reproduces a musical piece based on speech input from a user, is equipped with a memory section 130 that stores questions, divided into a plurality of layers 300 to 304, for obtaining speech input from the user as responses in order to specify the musical piece to be selected. The apparatus reproduces by speech a question belonging to a certain layer and prompts the user to answer. If the apparatus then fails to obtain an answer to the question within a specified time, or obtains only a prescribed ambiguous expression as the speech input, it reproduces by speech a question belonging to that layer or another layer and again prompts the user to answer. Operability is thereby improved.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、ユーザの音声入力に基づいてデータ処理を実行するデータ処理装置、楽曲再生装置、および、これらの制御プログラムに関する。
【０００２】
【従来の技術】
従来から、装置操作を、より簡便なものとすべく、ユーザの手操作に代えて音声入力による操作を可能にしたデータ処理装置が知られている。また、この種の装置としては、ユーザとの複数回の対話に基づいて、実行すべきデータ処理を絞り込むようになされたものが提案されている（例えば、特許文献１参照）。
【０００３】
このような装置にあっては、対話中にユーザが、黙り込むなどして、ユーザからの応答が得られなくなった場合、対話を進めずに待機するのが一般的である。
【０００４】
【特許文献１】
特開２００３−１０８１７５号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、上記のように、対話の進行が止まってしまうと、操作性が悪くなるといった問題がある。具体的には、装置が対話を止めたままにしてしまうと、ユーザが操作を再開したい場合には、どこまで対話が進んでいたかを覚えておくか、その都度確認する必要がある。
【０００６】
本発明は、上述した事情に鑑みてなされたものであり、より操作性の良いデータ処理装置、楽曲再生装置、データ処理装置の制御プログラムおよび楽曲再生装置の制御プログラムを提供することを目的とする。
【０００７】
【課題を解決するための手段】
上記課題を解決するために、請求項１に記載の発明は、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行するデータ処理装置において、データ処理ごとに、ユーザに音声入力を促すための音声データを記憶する記憶手段とを備え、これらの音声データの中から１つを再生しユーザに音声入力を促した後、所定時間が経過しても、この促しに対してユーザから音声入力が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、前記音声データの中の他の音声データを再生し、ユーザに音声入力を促すことを特徴とする。
【０００８】
請求項２に記載の発明は、請求項１に記載のデータ処理装置において、ユーザからの音声入力があるまで、先に再生した音声データと異なる音声データを再生し、ユーザに音声入力を促すことを特徴とする。
【０００９】
請求項３に記載の発明は、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行するデータ処理装置において、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する記憶手段を備え、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、当該階層あるいは他の階層に属する質問を音声により再生し、ユーザに応答を促すことを特徴とする。
【００１０】
請求項４に記載の発明は、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行するデータ処理装置において、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する記憶手段を備え、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、所定のあいまい言葉しか得られなかった場合に、当該階層あるいは当該階層から下に属する質問から特定され得るデータ処理を順次実行することを特徴とする。
【００１１】
請求項５に記載の発明は、ユーザからの音声入力に基づいて楽曲を選曲し再生する楽曲再生装置において、楽曲ごとに、当該楽曲の再生を指示する旨の音声入力をユーザに促すための音声データを記憶する記憶手段を備え、これらの音声データの中の１つを再生して、ユーザに音声入力を促した後、所定時間が経過しても、この促しに対してユーザから音声入力が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、前記音声データの中の他の音声データを再生し、ユーザに音声入力を促すことを特徴とする。
【００１２】
請求項６に記載の発明は、請求項５に記載の楽曲再生装置において、ユーザからの音声入力があるまで、先に再生した音声データと異なる音声データを再生することを特徴とする。
【００１３】
請求項７に記載の発明は、ユーザからの音声入力に基づいて楽曲を選曲し再生する楽曲再生装置において、選曲すべき楽曲を特定すべく、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する記憶手段を備え、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、当該階層あるいは他の階層に属する質問を音声により再生してユーザに応答を促すことを特徴とする。
【００１４】
請求項８に記載の発明は、ユーザからの音声入力に基づいて楽曲を選曲し再生する楽曲再生装置において、選曲すべき楽曲を特定すべく、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する記憶手段を備え、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、当該階層あるいは当該階層から下に属する質問から特定され得る楽曲を順次選曲し再生することを特徴とする。
【００１５】
請求項９に記載の発明は、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行するデータ処理装置を、データ処理ごとに、ユーザに音声入力を促すための音声データを記憶する手段、および、これらの音声データの中から１つを再生しユーザに音声入力を促した後、所定時間が経過しても、この促しに対してユーザから音声入力が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、前記音声データの中の他の音声データを再生し、ユーザに音声入力を促す手段として機能させることを特徴とするデータ処理装置の制御プログラムを提供する。
【００１６】
請求項１０に記載の発明は、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行するデータ処理装置において、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する手段、および、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、当該階層あるいは他の階層に属する質問を音声により再生し、ユーザに応答を促す手段として機能させることを特徴とするデータ処理装置の制御プログラムを提供する。
【００１７】
請求項１１に記載の発明は、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行するデータ処理装置を、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する手段、および、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、所定のあいまい言葉しか得られなかった場合に、当該階層あるいは当該階層から下に属する質問から特定され得るデータ処理を順次実行する手段として機能させることを特徴とするデータ処理装置の制御プログラムを提供する。
【００１８】
請求項１２に記載の発明は、ユーザからの音声入力に基づいて楽曲を選曲し再生する楽曲再生装置を、楽曲ごとに、当該楽曲の再生を指示する旨の音声入力をユーザに促すための音声データを記憶する手段、および、これらの音声データの中の１つを再生して、ユーザに音声入力を促した後、所定時間が経過しても、この促しに対してユーザから音声入力が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、前記音声データの中の他の音声データを再生し、ユーザに音声入力を促す手段として機能させることを特徴とする楽曲再生装置の制御プログラムを提供する。
【００１９】
請求項１３に記載の発明は、ユーザからの音声入力に基づいて楽曲を選曲し再生する楽曲再生装置を、選曲すべき楽曲を特定すべく、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する手段、および、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、当該階層あるいは他の階層に属する質問を音声により再生してユーザに応答を促す手段として機能させることを特徴とする楽曲再生装置の制御プログラムを提供する。
【００２０】
請求項１４に記載の発明は、ユーザからの音声入力に基づいて楽曲を選曲し再生する楽曲再生装置を、選曲すべき楽曲を特定すべく、ユーザからの音声入力を応答として得るための質問を複数の階層に分けて記憶する手段、および、ある階層に属する質問を音声により再生して、ユーザに応答を促した後、所定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、当該階層あるいは当該階層から下に属する質問から特定され得る楽曲を順次選曲し再生する手段として機能させることを特徴とする楽曲再生装置の制御プログラムを提供する。
【００２１】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態について説明する。本実施形態では、楽曲再生装置として自動車などの車両に搭載される車載用楽曲再生装置を例示する。
【００２２】
＜第１実施形態＞
図１は、本実施形態にかかる車載用楽曲再生装置１００の機能的構成を、この車載用楽曲再生装置１００に楽曲データなどのマルチメディアデータを配信するための配信システム１と共に示す図である。この図に示すように、配信システム１は、配信サーバ１０と、車載用楽曲再生装置１００とを備え、これらがインターネット２および無線通信網３からなるネットワーク４を介して互いにデータ通信可能に接続されている。なお、同図には、配信サーバ１０と車載用楽曲再生装置１００とを、各々１台ずつ例示しているが、その台数は任意である。
【００２３】
配信サーバ１０は、一般的なコンピュータシステムから構成され、楽曲データや映像データ、テキストデータなどのマルチメディアデータをネットワーク４を介して車載用楽曲再生装置１００に配信するものであり、多数のマルチメディアデータが格納されたデータベース１１を備えている。
【００２４】
車載用楽曲再生装置１００は、自動車などの車両に搭載され、配信サーバ１０から配信されたマルチメディアデータを再生するものである。図示のように、車載用楽曲再生装置１００は、制御部１１０と、記憶部１３０とを備えている。この制御部１１０は、ＣＰＵや、ＲＯＭ、ＲＡＭなどを備え、車載用楽曲再生装置１００の各部を制御する。
【００２５】
通信装置１２０は、制御部１１０の制御の下、ネットワーク４に接続された各種端末装置とデータ通信するものである。より具体的には、通信装置１２０は、例えば携帯電話機あるいは無線ＬＡＮ通信装置（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）などの移動通信装置に相当し、無線通信網３を介して当該無線通信網３あるいはインターネット２に接続された各種端末とデータ通信する。本実施形態では、この通信装置１２０は、特に配信サーバ１０とデータ通信することで、この配信サーバ１０から楽曲データや映像データなどのマルチメディアデータを受信する。
【００２６】
記憶部１３０は、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）などの主記憶装置を備え、制御部１１０により実行される各種制御プログラムや、配信サーバ１０から受信したマルチメディアデータ（楽曲データや映像データなど）、合成音声により出力されるテキストデータなどの各種データを記憶するものである。また、記憶部１３０は、マルチメディアの種類（楽曲、テキスト、映像など）ごとに、当該記憶部１３０に格納されているデータを管理するためのテーブルデータを記憶している。
【００２７】
図２は、マルチメディアデータの１つである楽曲データを管理するための楽曲テーブル２００の構成を模式的に示す図である。この図に示すように、楽曲テーブル２００の１件のレコードには、楽曲ＩＤと、ジャンル情報と、アーティスト情報と、曲名情報と、新曲フラグとが含まれている。
【００２８】
楽曲ＩＤは、楽曲データの識別子であり、ジャンル情報は、楽曲が属するジャンルを示すものであり、このジャンルとしては、例えばＪＰＯＰ（日本のポピュラー音楽）、演歌、ロック、クラシックなどがある。アーティスト情報は、楽曲が歌であれば歌手の情報、クラシックのような演奏のみの楽曲であれば指揮者や演奏楽団の情報を示すものである。例えば楽曲が歌である場合には、アーティスト情報として、グループかソロかを示す情報、ボーカルが男性か女性かを示す情報、および、アーティスト名（もしくはグループ名）が含まれている。また、同図に示す曲名情報は、楽曲の曲目を示すものであり、新曲フラグは、楽曲を新曲として扱うか否かを示す情報であり、新曲であれば「ＹＥＳ」、新曲でなければ「ＮＯ」が示される。
【００２９】
この楽曲テーブル２００は、車載用楽曲再生装置１００が配信サーバ１０から楽曲データを受信し、記憶部１３０に格納するごとに、この楽曲データに対応するレコードを生成しレコードを追加することで更新される。楽曲データ以外のマルチメディアデータ（例えば、映像データなど）を管理するためのテーブルデータについても、楽曲テーブル２００と同様に、配信サーバ１０からデータを受信するごとに更新されるが、その詳細な説明については割愛することにする。
【００３０】
なお、楽曲テーブル２００には、上記の情報の他にも、例えばアーティスト情報として、作詞者あるいは作曲者名を含めるようにしても良いし、また、レコードに視聴人気ランキング情報や販売実績ランキング情報といった情報を含める構成としても良い。また、配信サーバ１０が楽曲テーブル２００などのマルチメディアデータを管理するためのテーブルデータを生成し、これらを車載用楽曲再生装置１００がネットワーク４を介して受信する構成としても良い。
【００３１】
さて、再び図１に戻り、楽曲再生部１４０は、制御部１１０の制御の下、マルチメディアデータの１つである楽曲データに基づいてアナログ信号を生成し、ミキサ１４１を介してアンプ１４２に出力するものである。アンプ１４２は、ミキサ１４１からのアナログ信号を増幅してスピーカ１４３に出力する。スピーカ１４３は、アンプ１４２から入力されたアナログ信号に応じて放音するものである。この構成の下、制御部１１０が記憶部１３０に記憶された楽曲データを楽曲再生部１４０に出力することで、スピーカ１４３から楽曲音が出力される。
【００３２】
マイク１５０は、収音装置であり、本実施形態では、ユーザが発した音声を収音し、アナログ信号をアンプ１５１に出力する。アンプ１５１は、入力されたアナログ信号を増幅してＡ／Ｄ変換器１５２に出力するものである。Ａ／Ｄ変換器１５２は、入力されたアナログ信号を所定ビットに量子化してデジタル信号に変換し、音声入力信号としてＶＲ１５３に出力するものである。
【００３３】
ＶＲ（ＶｏｉｃｅＲｅｃｏｇｎｉｔｉｏｎ：音声認識部）１５３は、音声入力信号に基づいて音声認識処理を実行し、その認識結果を制御部１１０に出力するものであり、音声認識処理用の回路（例えばＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ））を備え、制御部１１０が音声認識処理を実行するよりも、高速処理が可能となっている。
【００３４】
ＴＴＳ（ＴｅｘｔＴｏＳｐｅｅｃｈ：音声変換部）１６０は、制御部１１０から入力されたテキストデータに基づいて、テキスト内容に即した合成音声を生成すべく、デジタル信号である合成音声データを生成し、Ｄ／Ａ変換器１６１に出力するものである。Ｄ／Ａ変換器１６１は、合成音声データをアナログ信号に変換し、ミキサ１４１を介してアンプ１４２に出力する。これにより、スピーカ１４３から合成音声が出力される。なお、上記のように、ミキサ１４１には、楽曲データに基づくアナログ信号と、合成音声データに基づくアナログ信号とが入力されており、両者が同時に入力されている場合には、楽曲と合成音声との音量比率が調整されスピーカ１４３から出力される。
【００３５】
操作部１５４は、電源のオン／オフなどに用いられるものであり、押下式ボタンなどの複数の操作子を備え、ユーザによる操作子の操作を検出し、制御部１１０に出力する。また、車載用楽曲再生装置１００は、この他にも、各種情報が表示される表示部（例えば液晶ディスプレイ）を備え、再生中の楽曲に関する情報や映像などが表示される。また、ユーザが音声により車載用楽曲再生装置１００を操作している間、あたかも自然人と対話しているかの印象をユーザに与えることができるように、この表示部には、ＣＧ合成映像あるいは実写映像からなる人物映像が表示されるようになっている。
【００３６】
さて上記のように、車載用楽曲再生装置１００は、ユーザの音声を認識する構成を備え、音声指示による装置の各種操作が可能となっている。音声により操作されるものとしては、例えば、車載用楽曲再生装置１００の初期設定（時刻設定など）や、楽曲選択操作などがある。以下では、説明が煩雑になるのを避けるべく、ユーザが音声を入力することによって楽曲を選択する際の操作について詳述する。
【００３７】
車載用楽曲再生装置１００にあっては、音声入力による楽曲選択は、車載用楽曲再生装置１００から出力される複数の質問音声に対して、ユーザが順次応答を音声により入力することで行われる構成となっている。この車載用楽曲再生装置１００から出力される質問は、ユーザが所望する楽曲を特定するための質問であり、楽曲を絞り込むために、大項目から小項目の順にレイヤ（階層）に分けられている。
【００３８】
図３は、楽曲選択操作の際に用いられる質問レイヤ構造の一例を示す図である。この図に示すように、質問は、その内容に応じて、第１レイヤ３００〜第５レイヤ３０４に分けられている。具体的には、第１レイヤ３００には、再生すべき楽曲のジャンルを特定するための質問が含まれ、これらの質問に対してユーザが応答することで所望する楽曲のジャンルが絞り込まれる。第２レイヤ３０１および第３レイヤ３０２には、アーティストを絞り込むための質問が含まれている。例えば第２レイヤ３０１には、ユーザが所望するアーティストがソロかグループかを特定するための質問が含まれ、第３レイヤ３０２には、ボーカルの性別を特定するための質問が含まれている。第４レイヤ３０３には、第１レイヤ３００〜第３レイヤ３０２に含まれる質問に対する応答から特定されるアーティスト候補を順次ユーザに提示し、これらの候補の中から所望のアーティストをユーザに選択させるための質問が含まれる。また、第５レイヤ３０４には、第４レイヤ３０３の質問にてユーザが選択したアーティストに属する曲目を順次ユーザに提示し、これらの候補の中から所望の曲をユーザに選択させるための質問が含まれている。
【００３９】
これら各レイヤ３００〜３０４に属する質問は、楽曲選択用の質問テーブル（以下、単に「質問テーブル」と称する）４００に予め登録されている。具体的には、質問テーブル４００は、記憶部１３０に予め記憶され、図４に示すように、質問文を示すテキストデータを、当該質問が属するレイヤごとに記録するものである。例えば、この図に一例を示すように、ユーザが所望する楽曲のジャンルを特定するための質問が属する第１レイヤ３００には、質問文として、「新曲を紹介しますか」や、「ＪＰＯＰを聴きますか」、「演歌を聴きますか」といった内容のテキストデータが登録されている。
【００４０】
また、質問に対してユーザが応答に使用するであろうフレーズは、応答テーブル５００として登録されている。この応答テーブル５００は、記憶部１３０に予め記憶されており、図５に示すように、応答フレーズが意味ごとに登録されている。本実施形態では、フレーズがとり得る意味として、肯定表現、否定表現、および、あいまい表現（あいまい言葉）が予め設定されている。肯定表現は、車載用楽曲再生装置１００が出力した質問に対するユーザの同意を意味するフレーズであり、例えば、「はい」、「うん」、「そう」などがある。否定表現は、肯定表現とは逆に、ユーザの拒否を意味するフレーズであり、例えば、「いいえ」、「ちがう」、「だめ」などがある。また、あいまい表現は、質問に対して否定とは限らないが、肯定ではないということが明らかな応答を示すものであり、例えば「えーと」、「うーん」、「あれ」などがある。
【００４１】
このような構成の下、車載用楽曲再生装置１００は、ユーザに対して質問を、合成音声にて出力した後、ユーザの応答（音声）をマイク１５０から収音し、音声入力信号に対して音声認識を施して、ユーザの応答が肯定表現、否定表現、および、あいまい表現のいずれかを判断することとなる。
【００４２】
次いで、このような車載用楽曲再生装置１００からの質問と、この質問に対するユーザの応答とからなる対話によって楽曲が選択される際の動作について説明する。
【００４３】
図６は、車載用楽曲再生装置１００の制御部１１０が、楽曲選曲・再生のために実行する選曲・再生処理の処理手順を示すフローチャートである。この図に示すように、制御部１１０は、先ず、レイヤ変数Ｎを「１」に初期化する（ステップＳａ１）。レイヤ変数Ｎは、現在の質問がどのレイヤ（図３参照）に属しているかを識別するためのものである。次いで、制御部１１０は、レイヤ変数Ｎにて示される第Ｎレイヤに属する質問の各々をユーザに合成音声にて順次出力し、これらの質問に対するユーザの応答を識別する質問処理を実行する（ステップＳａ２）。なお、この質問処理の具体的な処理内容については、後に詳述する。
【００４４】
次いで、制御部１１０は、ステップＳａ２にて、ユーザの応答として肯定表現の応答があったかを判別し（ステップＳａ３）、この判別結果がＮＯであれば、再度、同一のレイヤに属する質問をユーザに与えるべく、処理手順をステップＳａ２に戻す。
【００４５】
一方、ステップＳａ３における判別結果がＹＥＳである場合には、制御部１１０は、レイヤ変数Ｎを「１」だけインクリメントし（ステップＳａ４）、レイヤ変数Ｎが全レイヤ数（本実施形態では「４」）より大であるかを判別する（ステップＳａ５）。この判別結果がＮＯであれば、制御部１１０は、レイヤ変数Ｎで指定された第Ｎレイヤに属する質問をユーザに与えるべく、処理手順をステップＳａ２に戻す。また、この判別結果がＹＥＳであれば、レイヤごとに、肯定表現の応答が得られたこととなり、ユーザが所望する楽曲データが特定されるから、制御部１１０は、この特定された楽曲データを再生する（ステップＳａ６）。
【００４６】
次いで、上記ステップＳａ２における質問処理について図７に示すフローチャートを参照して、より詳細に説明する。
【００４７】
この図に示すように、質問処理にあっては、第Ｎレイヤに属する質問が順番にユーザに対して合成音声出力される。具体的には、制御部１１０は、先ず、質問変数Ｑを「１」に初期化する（ステップＳｂ１）。この質問変数Ｑは、レイヤに属する質問のうち、現在、どの質問まで出力が完了したかを示すものである。次いで、制御部１１０は、質問変数Ｑにて指定された質問をユーザに与えるべく、合成音声出力する（ステップＳｂ２）。そして、制御部１１０は、質問を合成音声出力してから一定時間内にユーザから応答があったか（音声入力があったか）を判別する（ステップＳｂ３）。
【００４８】
具体的には、ステップＳｂ３において、制御部１１０は、ステップＳｂ２を実行した後に、タイマカウントを開始すると共に、音声入力を受付け続ける。そして、一定時間が経過してタイマカウントがタイムアウトする前に、制御部１１０がユーザからの音声入力を取得した場合、ステップＳｂ３の判別結果がＹＥＳとなり、また、音声入力を取得せずにタイムアウトした場合には、ステップＳｂ３の判別結果がＮＯとなる。なお、より詳細には、制御部１１０は、タイマカウントがタイムアウトする前に、音声入力があったとしても、その音声入力に対して音声識別処理を施した結果、その音声が応答テーブル５００（図５参照）に登録された各フレーズのいずれにも該当しない場合には、音声入力がなかったものとする。すなわち、このステップＳｂ３にあっては、質問の応答として取り得る音声入力（応答テーブル５００に登録されているフレーズ）があった場合にだけ、判別結果がＹＥＳとなる。
【００４９】
ステップＳｂ３の判別結果がＹＥＳである場合には、制御部１１０は、音声入力として得られた応答が肯定表現フレーズを含むものであるかを判別し（ステップＳｂ４）、この判別結果がＹＥＳであれば、現在のレイヤに属する他の質問をユーザに与える必要が無いため、質問処理を終了し、処理手順をステップＳａ３に進める。また、制御部１１０は、音声入力として得られた応答が否定表現、あるいは、あいまい表現であれば（ステップＳｂ４：ＮＯ）、現在のレイヤに属する次の質問をユーザに与えるべく、次の処理を実行する。すなわち、制御部１１０は、次の質問を指定すべく質問変数Ｑを「１」だけインクリメントした後（ステップＳｂ５）、質問変数Ｑが全質問数より大きいかを判別し（ステップＳｂ６）、この判別結果がＮＯである場合に、処理手順をステップＳｂ２に戻し、次の質問をユーザに与える。また、ステップＳｂ６の判別結果がＹＥＳである場合、すなわち、現在のレイヤに属する全ての質問がユーザに与えられている場合には、質問処理を終了し、処理手順をステップＳａ３に進める。そして、現在のレイヤにおいて肯定表現の応答が得られていないため、ステップＳａ３における判別結果がＮＯとなり、現在のレイヤに属する質問を最初からユーザに与えるべく、処理手順がステップＳａ２に戻る。
【００５０】
さて、上記ステップＳｂ３の判別結果がＮＯの場合、すなわち、質問をユーザに与えた後、一定時間が経過しても、ユーザから、肯定表現、否定表現およびあいまい表現のいずれかに属する音声入力が応答として得られなかった場合には、制御部１１０は、処理手順をステップＳｂ５に進め、現在のレイヤに属する他の質問をユーザに与え、応答を促す。
【００５１】
つまり、制御部１１０は、ユーザに質問を与えてから一定時間が経過した後、応答が得られなかった場合（ステップＳｂ３：ＮＯ）、および、応答として、あいまい表現が得られた場合（ステップＳｂ４：ＮＯ）には、現在のレイヤに属する他の質問をユーザに順次与えることとなる。
【００５２】
これにより、質問に対してユーザが肯定的な応答をしなかった場合、車載用楽曲再生装置１００が対話の進行をとめるのではなく、質問に応答することで明らかにすべき事項（例えば、所望する楽曲のジャンルやアーティストなど）に関した他の質問がユーザに与えられることで、あたかも、自然人と対話しているかのような自然な対話が実現される。
【００５３】
また、一般的に、ユーザが質問の応答に思案する場合、すなわち、応答に対して黙り込んでしまうか、あるいは、あいまいな応答しかできない場合、ユーザが、その質問に同意していないことが大半である。そこで、上記のように、他の質問を順次与える構成とすることで、ユーザは、所望する質問に対してのみ応答すれば良く、操作性を向上させることができる。
【００５４】
さらにまた、ユーザが車両の運転者であるような場合、車載用楽曲再生装置１００の対話操作を、運転している最中に行うことがある。従って、運転の状況によっては、運転者は、対話操作の途中で、運転に集中することが多々あり、質問に対して一定時間応答しない場合がある。このような場合、対話の進行が止まったままであると、運転者は、再度、対話を再開する場合に、どこまで進行していたかを覚えておく必要があり、これを忘れてしまった場合には、結局最初から対話をやり直さなければならなくなる。これに対して、本実施形態によれば、車載用楽曲再生装置１００は、ユーザからの応答が質問を出力してから一定時間得られなかった場合、質問を順次出力するから、ユーザは、対話操作を再開したい場合に、対話の進行を覚えておかなくとも、所望の質問が出力されたときに応答するだけで良く、操作が容易となる。
【００５５】
＜第２実施形態＞
上述した実施形態では、車載用楽曲再生装置１００は、質問を出力した後、一定時間が経過しても、ユーザから応答が得られなかった場合、あるいは、あいまい表現しか得られなかった場合に、同一のレイヤに属する他の質問を出力する構成について説明した。
【００５６】
しかしながら、ユーザが質問に対する応答する際に、車載用楽曲再生装置１００に現在のレイヤに属する処理ではなく、他のレイヤに属する処理させるべく、応答に思案して黙り込んでしまう、あるいは、あいまい表現を使用する場合がある。この場合の具体例としては、ユーザが、図３に示す第１レイヤ３００に属する質問に応答することで所望の楽曲ジャンルとして「演歌」を選択した上で、第２レイヤ３０１、第３レイヤ３０２と対話を進めたものの、第３レイヤ３０２にて楽曲ジャンルを「ＪＰＯＰ」に変更したいと思った場合などがある。
【００５７】
そこで、本実施形態では、図８に示すように、第１実施形態にて説明した質問処理において、車載用楽曲再生装置１００の制御部１１０は、質問が出力されてから一定時間内に応答がなかった場合（ステップＳｂ３：ＮＯ）、次に出力する質問を上位のレイヤに属する質問（すなわち、先の対話で確定した事項に関する質問）とすべく、レイヤ変数Ｎが「１」より大きいかを判別した後に（ステップＳｃ１）、この判別結果がＹＥＳであれば、レイヤ変数Ｎを「１」だけデクリメントし（ステップＳｃ２）、処理手順をステップＳａ３に進める。これにより、ユーザから一定時間応答がなかった場合の次の質問が、上位のレイヤに属する質問となる。
【００５８】
なお、ステップＳｃ１における判別結果がＮＯである場合には、それ以上上位のレイヤが無いことを示すため、現在のレイヤに属する質問を最初から繰り返すべく、処理手順をステップＳａ３に進める。
【００５９】
また、質問から一定時間内に応答があった場合であっても、制御部１１０は、その応答が、あいまい表現であるかを判別し（ステップＳｂ４）、この判別結果がＹＥＳであれば、一定時間応答が無かった場合と同様に、上位のレイヤに属する質問を次に出力すべく、処理手順をステップＳｃ１に進める。
【００６０】
このように、本実施形態によれば、他のレイヤに属する処理を所望するためにユーザが質問に対して沈黙、あるいは、あいまい表現を使用した場合に、次の質問が自動的に上位のレイヤに属するものとなるから、ユーザは、はじめから対話をやり直しするなどの操作をしなくとも、所望するレイヤの質問が出力されたときに応答すれば、所望の操作を装置に実行させることができる。
【００６１】
なお、本実施形態では、質問を上位のレイヤに属するものにする構成について例示したが、下位のレイヤに属する質問としても良い。上位および下位のレイヤのどちらに移行するかは、質問のレイヤ構造に応じて適宜に選択可能である。
【００６２】
＜第３実施形態＞
上述した第１あるいは第２実施形態にあっては、車載用楽曲再生装置１００は、質問を出力してから一定時間が経過しても、この質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合に、他の質問を出力する構成について説明した。
【００６３】
しかしながら、楽曲の選曲・再生操作のように、楽曲を特定するための対話がある程度進行しているような場合には、ユーザの所望する楽曲候補は、大まかに特定される。
【００６４】
そこで、本実施形態にあっては、図９に示すように、車載用楽曲再生装置１００の制御部１１０は、質問を出力してから一定時間が経過しても、この質問に対してユーザから応答が得られなかった場合（ステップＳｂ３：ＮＯ）、あるいは、音声入力による応答として、あいまい言葉しか得られなかった場合に（ステップＳｃ３：ＹＥＳ）、現在のレイヤから特定され得る全ての楽曲を順次再生する（ステップＳｄ２）。
【００６５】
このように、本実施形態では、車載用楽曲再生装置１００は、質問に対してユーザから応答が得られなかった場合、あるいは、音声入力として所定のあいまい言葉しか得られなかった場合、対話の進行を止めるのではなく、現在までの対話にて確定した事項の中の処理（すなわち、特定され得る楽曲の再生）を順次実行するため、少なくとも、ユーザが所望する処理を含む各種処理が実行されることとなる。
【００６６】
これにより、例えば、ユーザが楽曲のジャンルは特定するものの、その他はランダムに再生させたいといった場合に、質問にあえて一定時間応答しないようにするか、または、あいまいな表現をするといった操作の態様も可能となり、操作性が向上することとなる。
【００６７】
＜変形例＞
上述した各実施形態は、あくまでも本発明の一態様にすぎず、本発明の範囲内で任意に変形可能である。
【００６８】
例えば、上述した各実施形態では、本発明が車載用の楽曲再生装置を適用する場合について例示したが、車載用に限らず、家庭用のものであっても良いし、携帯用のものであっても良い。さらには、ユーザからの音声入力に基づいて複数のデータ処理のいずれか一を実行する装置であれば、任意の装置に適用することが可能である。
【００６９】
また例えば、上述した各実施形態において、質問に対するユーザからの応答として、あいまい表現が得られた場合に、実際にユーザが、どのような処理を所望しているかを学習する構成としても良い。
【００７０】
具体的には、第１および第２実施形態にて説明したように、車載用楽曲再生装置１００は、質問に対するユーザの応答があいまい表現であった場合には、一定時間応答が得られなかった場合と同様に、他の質問を順次出力する構成となっている。そこで、車載用楽曲再生装置１００が、ある質問に対する応答として、あいまい表現を得た場合、順次出力する質問のうち、ユーザが、どの質問に対して肯定したかを学習（対応付け）しておき、再度、同じ質問であいまい表現が使われた場合に、学習した質問を次に出力する構成としても良い。これにより、ユーザごとの趣向に合った質問を出力することが可能となる。
【００７１】
【発明の効果】
以上説明したように、本発明によれば、より操作性の良いデータ処理装置、楽曲再生装置、データ処理装置の制御プログラムおよび楽曲再生装置の制御プログラムが提供される。
【図面の簡単な説明】
【図１】本発明の第１実施形態にかかる車載用楽曲再生装置の機能的構成を、当該車載用楽曲再生装置に楽曲データなどのマルチメディアデータを配信するための配信システムと共に示す図である。
【図２】楽曲テーブルの構成を模式的に示す図である。
【図３】楽曲選択操作の際に用いられる質問レイヤ構造の一例を示す図である。
【図４】質問テーブルの構成を模式的に示す図である。
【図５】応答テーブルの構成を模式的に示す図である。
【図６】第１実施形態にかかる選曲・再生処理の処理手順を示すフローチャートである。
【図７】第１実施形態にかかる質問処理の処理手順を示すフローチャートである。
【図８】本発明の第２実施形態にかかる質問処理の処理手順を示すフローチャートである。
【図９】本発明の第３実施形態にかかる質問処理の処理手順を示すフローチャートである。
【符号の説明】
１００車載用楽曲再生装置
１１０制御部
１３０記憶部
１４０楽曲再生部
１４３スピーカ
１５０マイク
２００楽曲テーブル
３００〜３０４レイヤ
４００質問テーブル
５００応答テーブル

[0001]
[Technical field of the invention]
The present invention relates to a data processing device that executes data processing based on a user's voice input, to a music reproducing apparatus, and to control programs for these.
[0002]
[Prior art]
Data processing devices that can be operated by voice input instead of manual operation by the user, in order to make device operation simpler, have long been known. As devices of this type, devices have been proposed that narrow down the data processing to be executed through multiple rounds of dialogue with the user (see, for example, Patent Document 1).
[0003]
In such a device, when no response is obtained from the user because, for example, the user falls silent during the dialogue, the device generally waits without advancing the dialogue.
[0004]
[Patent Document 1]
JP 2003-108175 A
[0005]
[Problems to be solved by the invention]
However, when the dialogue stops progressing as described above, operability suffers. Specifically, if the device leaves the dialogue halted, a user who wants to resume the operation must either remember how far the dialogue had progressed or check each time.
[0006]
The present invention has been made in view of the circumstances described above, and its object is to provide a data processing device, a music reproducing apparatus, a control program for a data processing device, and a control program for a music reproducing apparatus that have better operability.
[0007]
[Means for Solving the Problems]
To solve the above problem, the invention according to claim 1 is a data processing device that executes one of a plurality of data processes based on voice input from a user, comprising storage means for storing, for each data process, voice data for prompting the user for voice input. After one item of this voice data is reproduced to prompt the user for voice input, if no voice input is obtained from the user in response to the prompt even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, another item of the voice data is reproduced to prompt the user for voice input.
[0008]
The invention according to claim 2 is the data processing device according to claim 1, wherein voice data different from the previously reproduced voice data is reproduced to prompt the user for voice input until voice input is received from the user.
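The re-prompting behavior of claims 1 and 2 can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the `play` and `listen` callbacks, the `AMBIGUOUS_WORDS` set, the timeout value, and the `max_rounds` safety limit are hypothetical and do not come from the specification, which requires only that a different prompt be reproduced when the user stays silent or answers ambiguously.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical stand-ins for the "predetermined ambiguous words".
AMBIGUOUS_WORDS = {"um", "hmm", "well"}


def prompt_until_answer(
    prompts: Sequence[str],
    play: Callable[[str], None],
    listen: Callable[[float], Optional[str]],
    timeout_s: float = 5.0,
    max_rounds: int = 3,
) -> Optional[Tuple[int, str]]:
    """Play prompts in turn until the user gives a usable answer.

    `listen` is assumed to return None when the timeout expires.
    Returns (index of the answered prompt, answer), or None if every
    round ends without a usable answer.
    """
    for _ in range(max_rounds):  # assumed bound so the sketch terminates
        for index, prompt in enumerate(prompts):
            play(prompt)
            reply = listen(timeout_s)
            # Silence or an ambiguous word: move on to a different prompt.
            if reply is None or reply.strip().lower() in AMBIGUOUS_WORDS:
                continue
            return index, reply
    return None
```

In this sketch the caller decides how to interpret the returned answer; the loop itself only separates usable replies from silence and ambiguous words.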
[0009]
The invention according to claim 3 is a data processing device that executes one of a plurality of data processes based on voice input from a user, comprising storage means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses. After a question belonging to a certain layer is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, a question belonging to that layer or another layer is reproduced by voice to prompt the user to respond.
[0010]
The invention according to claim 4 is a data processing device that executes one of a plurality of data processes based on voice input from a user, comprising storage means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses. After a question belonging to a certain layer is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained, the data processes that can be specified from the questions belonging to that layer or to the layers below it are executed in sequence.
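One way to read the fallback of claim 4 is as a walk over a question tree: when the dialogue stalls at some node, every data process still reachable beneath that node is executed in turn. The Python sketch below illustrates that reading only. The `Node` class, its fields, and the example labels are hypothetical, and the specification does not state that the layers are stored as a tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A question (inner node) or a data process (leaf); names are illustrative."""
    label: str
    children: List["Node"] = field(default_factory=list)


def reachable_processes(node: Node) -> List[str]:
    """Collect, in order, the leaf processes at or below the current node."""
    if not node.children:
        return [node.label]
    found: List[str] = []
    for child in node.children:
        found.extend(reachable_processes(child))
    return found


# Hypothetical hierarchy: genre questions on top, processes as leaves.
root = Node("genre?", [
    Node("enka?", [Node("play enka 1"), Node("play enka 2")]),
    Node("JPOP?", [Node("play JPOP 1")]),
])
```

If the user has already confirmed "enka?" and then falls silent, `reachable_processes(root.children[0])` lists the two enka processes, which would then be executed one after another.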
[0011]
The invention according to claim 5 is a music reproducing apparatus that selects and reproduces a piece of music based on voice input from a user, comprising storage means for storing, for each piece of music, voice data for prompting the user for voice input instructing reproduction of that piece. After one item of this voice data is reproduced to prompt the user for voice input, if no voice input is obtained from the user in response to the prompt even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, another item of the voice data is reproduced to prompt the user for voice input.
[0012]
The invention according to claim 6 is the music reproducing apparatus according to claim 5, wherein voice data different from the previously reproduced voice data is reproduced until voice input is received from the user.
[0013]
The invention according to claim 7 is a music reproducing apparatus that selects and reproduces a piece of music based on voice input from a user, comprising storage means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses in order to specify the piece of music to be selected. After a question belonging to a certain layer is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, a question belonging to that layer or another layer is reproduced by voice to prompt the user to respond.
[0014]
The invention according to claim 8 is a music reproducing apparatus that selects and reproduces a piece of music based on voice input from a user, comprising storage means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses in order to specify the piece of music to be selected. After a question belonging to a certain layer is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, the pieces of music that can be specified from the questions belonging to that layer or to the layers below it are selected and reproduced in sequence.
[0015]
The invention according to claim 9 provides a control program for a data processing device that executes one of a plurality of data processes based on voice input from a user, the program causing the data processing device to function as: means for storing, for each data process, voice data for prompting the user for voice input; and means for reproducing one item of this voice data to prompt the user for voice input and then, if no voice input is obtained from the user in response to the prompt even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, reproducing another item of the voice data to prompt the user for voice input.
[0016]
The invention according to claim 10 provides a control program for a data processing device that executes one of a plurality of data processes based on voice input from a user, the program causing the data processing device to function as: means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses; and means for reproducing by voice a question belonging to a certain layer to prompt the user to respond and then, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, reproducing by voice a question belonging to that layer or another layer to prompt the user to respond.
[0017]
The invention according to claim 11 provides a control program for a data processing device that executes one of a plurality of data processes based on voice input from a user, the program causing the data processing device to function as: means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses; and means for reproducing by voice a question belonging to a certain layer to prompt the user to respond and then, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained, executing in sequence the data processes that can be specified from the questions belonging to that layer or to the layers below it.
[0018]
The invention according to claim 12 provides a control program for a music reproducing apparatus that selects and reproduces a piece of music based on voice input from a user, the program causing the music reproducing apparatus to function as: means for storing, for each piece of music, voice data for prompting the user for voice input instructing reproduction of that piece; and means for reproducing one item of this voice data to prompt the user for voice input and then, if no voice input is obtained from the user in response to the prompt even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, reproducing another item of the voice data to prompt the user for voice input.
[0019]
The invention according to claim 13 provides a control program for a music reproducing apparatus that selects and reproduces a piece of music based on voice input from a user, the program causing the music reproducing apparatus to function as: means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses in order to specify the piece of music to be selected; and means for reproducing by voice a question belonging to a certain layer to prompt the user to respond and then, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, reproducing by voice a question belonging to that layer or another layer to prompt the user to respond.
[0020]
The invention according to claim 14 provides a control program for a music reproducing apparatus that selects and reproduces a piece of music based on voice input from a user, the program causing the music reproducing apparatus to function as: means for storing questions, divided into a plurality of layers, for obtaining voice input from the user as responses in order to specify the piece of music to be selected; and means for reproducing by voice a question belonging to a certain layer to prompt the user to respond and then, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, selecting and reproducing in sequence the pieces of music that can be specified from the questions belonging to that layer or to the layers below it.
[0021]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the present embodiment, an in-vehicle music reproducing device mounted on a vehicle such as an automobile will be exemplified as the music reproducing device.
[0022]
<First embodiment>
FIG. 1 is a diagram showing the functional configuration of an in-vehicle music reproducing apparatus 100 according to the present embodiment, together with a distribution system 1 for distributing multimedia data such as music data to the in-vehicle music reproducing apparatus 100. As shown in FIG. 1, the distribution system 1 includes a distribution server 10 and the in-vehicle music reproducing apparatus 100, which are connected so that they can exchange data with each other via a network 4 consisting of the Internet 2 and a wireless communication network 3. Although FIG. 1 shows one distribution server 10 and one in-vehicle music reproducing apparatus 100, any number of each may be used.
[0023]
The distribution server 10 is a general-purpose computer system that distributes multimedia data such as music data, video data, and text data to the in-vehicle music reproducing apparatus 100 via the network 4, and it includes a database 11 in which a large number of multimedia data items are stored.
[0024]
The in-vehicle music reproducing apparatus 100 is mounted on a vehicle such as an automobile, and reproduces multimedia data distributed from the distribution server 10. As illustrated, the in-vehicle music reproducing apparatus 100 includes a control unit 110 and a storage unit 130. The control unit 110 includes a CPU, a ROM, a RAM, and the like, and controls each unit of the in-vehicle music reproducing apparatus 100.
[0025]
The communication device 120 performs data communication with various terminal devices connected to the network 4 under the control of the control unit 110. More specifically, the communication device 120 corresponds to a mobile communication device such as a mobile phone or a wireless LAN (Local Area Network) communication device, and performs data communication via the wireless communication network 3 with various terminals connected to the wireless communication network 3 or the Internet 2. In the present embodiment, the communication device 120 communicates in particular with the distribution server 10 and thereby receives multimedia data such as music data and video data from the distribution server 10.
[0026]
The storage unit 130 includes a main storage device such as an HDD (Hard Disk Drive) and stores various control programs executed by the control unit 110, multimedia data (music data, video data, and the like) received from the distribution server 10, and various other data such as text data to be output as synthesized speech. The storage unit 130 also stores, for each type of multimedia (music, text, video, and the like), table data for managing the data stored in the storage unit 130.
[0027]
FIG. 2 is a diagram schematically showing the configuration of a music table 200 for managing music data, which is one type of multimedia data. As shown in this figure, each record of the music table 200 includes a music ID, genre information, artist information, song title information, and a new song flag.
[0028]
The music ID is an identifier of the music data. The genre information indicates the genre to which the piece belongs; examples of genres include JPOP (Japanese popular music), enka, rock, and classical music. The artist information indicates the singer if the piece is a vocal song, or the conductor or orchestra if the piece is purely instrumental, such as a classical work. For example, when the piece is a vocal song, the artist information includes information indicating whether the artist is a group or a solo act, information indicating whether the vocalist is male or female, and the artist name (or group name). The song title information shown in the figure indicates the title of the piece, and the new song flag indicates whether the piece is treated as a new song: "YES" if it is a new song and "NO" if it is not.
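As a rough illustration of the record layout just described, the following Python sketch models one record of the music table 200. The class and field names, the string encoding of the vocalist's gender, and the use of `None` for instrumental works are assumptions made for this sketch; the specification lists the items of information but not their representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtistInfo:
    """Artist information; all field names are illustrative."""
    name: str                            # artist name or group name
    is_group: Optional[bool] = None      # True = group, False = solo
    vocal_gender: Optional[str] = None   # e.g. "male" or "female"


@dataclass(frozen=True)
class MusicRecord:
    """One record of the music table (assumed layout)."""
    music_id: str        # identifier of the music data
    genre: str           # e.g. "JPOP", "enka", "rock", "classical"
    artist: ArtistInfo
    title: str           # song title information
    is_new: bool         # new song flag: True for "YES", False for "NO"


example = MusicRecord(
    music_id="M001",
    genre="enka",
    artist=ArtistInfo(name="Example Singer", is_group=False, vocal_gender="female"),
    title="Example Title",
    is_new=True,
)
```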
[0029]
The music table 200 is updated each time the in-vehicle music reproducing apparatus 100 receives music data from the distribution server 10 and stores it in the storage unit 130: a record corresponding to the music data is generated and added. Table data for managing multimedia data other than music data (for example, video data) is likewise updated every time data is received from the distribution server 10, in the same manner as the music table 200, so a detailed description is omitted.
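For illustration only, the record layout of FIG. 2 and the add-on-reception update could be modeled as in the following Python sketch. The class names, field names, and the `register_music` helper are hypothetical and do not appear in the embodiment; the sample artist and title are placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArtistInfo:
    name: str            # artist name or group name
    is_group: bool       # True for a group, False for a solo artist
    vocal_gender: str    # "male" or "female"


@dataclass
class MusicRecord:
    music_id: int        # identifier of the music data
    genre: str           # e.g. "JPOP", "enka", "rock", "classical"
    artist: ArtistInfo
    title: str           # music name information
    is_new: bool         # new music flag


def register_music(table: List[MusicRecord], record: MusicRecord) -> None:
    """Add one record when newly received music data is stored (hypothetical helper)."""
    table.append(record)


music_table: List[MusicRecord] = []
register_music(
    music_table,
    MusicRecord(1, "enka", ArtistInfo("Artist A", False, "female"), "Song A", False),
)
```

Keeping the artist attributes in their own structure mirrors the way the description groups solo/group, vocal gender, and name under the artist information.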
[0030]
In addition to the above-mentioned information, the music table 200 may include, for example, a songwriter or composer name as artist information, or each record may be configured to include information such as viewing popularity ranking information or sales performance ranking information. Alternatively, the distribution server 10 may generate the table data for managing multimedia data, such as the music table 200, and the in-vehicle music reproducing apparatus 100 may receive that data via the network 4.
[0031]
Returning to FIG. 1, under the control of the control unit 110, the music reproduction unit 140 generates an analog signal based on music data, which is one type of multimedia data, and outputs the analog signal to the amplifier 142 via the mixer 141. The amplifier 142 amplifies the analog signal from the mixer 141 and outputs the amplified signal to the speaker 143. The speaker 143 emits sound according to the analog signal input from the amplifier 142. Under this configuration, the control unit 110 outputs the music data stored in the storage unit 130 to the music reproduction unit 140, so that the music is output from the speaker 143.
[0032]
The microphone 150 is a sound collection device. In the present embodiment, the microphone 150 collects the voice uttered by the user and outputs an analog signal to the amplifier 151. The amplifier 151 amplifies the input analog signal and outputs it to the A/D converter 152. The A/D converter 152 quantizes the input analog signal with a predetermined number of bits, converts it into a digital signal, and outputs the digital signal to the VR 153 as a voice input signal.
[0033]
The VR (Voice Recognition: voice recognition unit) 153 executes voice recognition processing based on the voice input signal and outputs the recognition result to the control unit 110. The VR 153 includes a circuit dedicated to voice recognition processing (for example, a DSP (Digital Signal Processor)), so the processing can be performed at a higher speed than when the control unit 110 itself executes the voice recognition processing.
[0034]
A TTS (Text To Speech: voice conversion unit) 160 generates synthesized voice data, which is a digital signal, based on the text data input from the control unit 110 so as to produce a synthesized voice in accordance with the text content, and outputs the data to the D/A converter 161. The D/A converter 161 converts the synthesized voice data into an analog signal and outputs the analog signal to the amplifier 142 via the mixer 141. As a result, a synthesized voice is output from the speaker 143. As described above, an analog signal based on music data and an analog signal based on synthesized voice data are both input to the mixer 141; when both are input simultaneously, the music and the synthesized voice are adjusted and then output from the speaker 143.
[0035]
The operation unit 154 has a plurality of operators, such as push buttons, used for turning the power on and off and the like; it detects the user's operation of an operator and outputs it to the control unit 110. In addition, the in-vehicle music reproducing apparatus 100 further includes a display unit (for example, a liquid crystal display) that displays various information, such as information and video relating to the music being reproduced. Also, while the user is operating the in-vehicle music reproducing apparatus 100 by voice, the display unit displays a CG composite image or a live-action image so as to give the user the impression of interacting with a natural person.
[0036]
As described above, the in-vehicle music reproducing apparatus 100 has a configuration for recognizing a user's voice and enables various operations of the apparatus by voice instructions. The operations performed by voice include, for example, initial setting (time setting, etc.) of the in-vehicle music reproducing apparatus 100 and a music selecting operation. In the following, to avoid complicating the description, only the operation in which the user selects a piece of music by voice input is described in detail.
[0037]
In the in-vehicle music reproducing apparatus 100, music selection by voice input is performed by the user responding by voice, in sequence, to a plurality of spoken questions output from the in-vehicle music reproducing apparatus 100. The questions output from the in-vehicle music reproducing apparatus 100 are questions for specifying the music desired by the user, and are divided into layers (hierarchies) in order from broad items to narrow items so as to narrow down the music.
[0038]
FIG. 3 is a diagram illustrating an example of a question layer structure used in a music selection operation. As shown in this figure, the questions are divided into first to fifth layers 300 to 304 according to their content. Specifically, the first layer 300 includes questions for specifying the genre of the music to be reproduced, and the genre of the desired music is narrowed down by the user responding to these questions. The second layer 301 and the third layer 302 include questions for narrowing down artists. For example, the second layer 301 includes a question for specifying whether the artist desired by the user is a solo artist or a group, and the third layer 302 includes a question for specifying the gender of the vocalist. The fourth layer 303 includes questions that sequentially present to the user the artist candidates identified from the responses to the questions in the first to third layers 300 to 302 and have the user select a desired artist from these candidates. Further, the fifth layer 304 includes questions that sequentially present to the user the songs belonging to the artist selected in the fourth layer 303 and have the user select a desired song from these candidates.
[0039]
The questions belonging to each of the layers 300 to 304 are registered in advance in a music selection question table (hereinafter simply referred to as “question table”) 400. Specifically, the question table 400 is stored in the storage unit 130 in advance and, as shown in FIG. 4, records text data indicating a question sentence for each layer to which the question belongs. For example, as shown in this figure, for the first layer 300, to which the questions for specifying the genre of the music desired by the user belong, text data of question sentences such as "Do you introduce a new song?", "Do you listen?", and "Do you listen to enka?" is registered.
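As a rough sketch, a question table keyed by layer could be held as a simple mapping. Only "Do you listen to enka?" below is taken from the description; the other sentences, the names `QUESTION_TABLE` and `questions_for`, and the omission of the fourth and fifth layers are assumptions made for brevity.

```python
from typing import Dict, List

# Layer number -> question sentences. Only "Do you listen to enka?" comes from
# the description; every other sentence is a made-up placeholder.
QUESTION_TABLE: Dict[int, List[str]] = {
    1: ["Do you listen to enka?", "Do you listen to rock?"],          # genre
    2: ["Do you want a solo artist?", "Do you want a group?"],        # solo or group
    3: ["Do you want a male vocal?", "Do you want a female vocal?"],  # vocal gender
}


def questions_for(layer: int) -> List[str]:
    """Return a copy of the question sentences registered for a layer (empty if none)."""
    return list(QUESTION_TABLE.get(layer, []))
```

For example, `questions_for(1)` yields the genre questions in the order in which they would be read out.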
[0040]
In addition, the phrases that the user is expected to use in answering the questions are registered in a response table 500. This response table 500 is stored in the storage unit 130 in advance, and as shown in FIG. 5, response phrases are registered for each meaning. In the present embodiment, positive expressions, negative expressions, and ambiguous expressions (ambiguous words) are set in advance as the possible meanings of a phrase. A positive expression is a phrase meaning the user's consent to the question output by the in-vehicle music reproducing device 100, and includes, for example, “Yes”, “Yeah”, “So”. A negative expression is, contrary to a positive expression, a phrase meaning the user's rejection, and includes, for example, “No”, “No”, and “No”. Further, an ambiguous expression indicates a response that is not necessarily negative but is not affirmative either, and includes, for example, "um", "um", and "that".
[0041]
Under such a configuration, the in-vehicle music reproducing apparatus 100 outputs a question to the user as a synthesized voice, then picks up the user's response (voice) with the microphone 150 and performs voice recognition on the voice input signal to determine whether the user's response is a positive expression, a negative expression, or an ambiguous expression.
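The lookup against the response table can be pictured with the following minimal sketch. The phrase sets are illustrative English stand-ins (the translated examples above are not distinct enough to reuse verbatim), and `Meaning`, `RESPONSE_TABLE`, and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Meaning(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    AMBIGUOUS = "ambiguous"


# Illustrative phrases only; the actual contents of the response table 500
# are whatever is registered in advance.
RESPONSE_TABLE: Dict[Meaning, Set[str]] = {
    Meaning.POSITIVE: {"yes", "yeah", "so"},
    Meaning.NEGATIVE: {"no", "nope"},
    Meaning.AMBIGUOUS: {"um", "well", "that"},
}


def classify(recognized: str) -> Optional[Meaning]:
    """Map a recognized phrase to its meaning, or None if it is not registered."""
    phrase = recognized.strip().lower()
    for meaning, phrases in RESPONSE_TABLE.items():
        if phrase in phrases:
            return meaning
    return None
```

Returning `None` for an unregistered phrase lets the caller treat it the same way as silence.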
[0042]
Next, a description will be given of an operation when a music piece is selected by a dialogue including a question from the in-vehicle music reproduction apparatus 100 and a response of the user to the question.
[0043]
FIG. 6 is a flowchart showing the procedure of the music selection / playback processing executed by the control unit 110 of the in-vehicle music reproducing apparatus 100. As shown in the figure, the control unit 110 first initializes a layer variable N to “1” (step Sa1). The layer variable N identifies which layer (see FIG. 3) the current question belongs to. Next, the control unit 110 sequentially outputs each of the questions belonging to the Nth layer indicated by the layer variable N to the user as a synthesized voice, and executes a question process for identifying the user's response to these questions (step Sa2). The specific content of this question processing will be described later in detail.
[0044]
Next, the control unit 110 determines whether or not a positive-expression response was received from the user in step Sa2 (step Sa3). If the determination result is NO, the processing procedure returns to step Sa2 in order to give the user the questions belonging to the same layer again.
[0045]
On the other hand, if the determination result in step Sa3 is YES, the control unit 110 increments the layer variable N by “1” (step Sa4) and determines whether the layer variable N has exceeded the total number of layers (“4” in the present embodiment) (step Sa5). If the determination result is NO, the control unit 110 returns the processing procedure to step Sa2 in order to give the user a question belonging to the Nth layer specified by the layer variable N. If the determination result is YES, a positive-expression response has been obtained for every layer and the music data desired by the user has been specified, so the music data is reproduced (step Sa6).
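The outer loop of FIG. 6 can be sketched as follows. This is an interpretation rather than the flowchart itself: `ask_layer` stands in for the question process of step Sa2 and returns True when a positive response was obtained, `play` stands in for step Sa6, and the comparison in step Sa5 is assumed to be "N exceeds the total number of layers". All names are hypothetical.

```python
from typing import Callable


def select_and_play(
    total_layers: int,
    ask_layer: Callable[[int], bool],
    play: Callable[[], None],
) -> None:
    n = 1                             # step Sa1: start at the first layer
    while True:
        affirmed = ask_layer(n)       # step Sa2: question process for layer n
        if not affirmed:              # step Sa3: NO
            continue                  # ask the same layer again from the start
        n += 1                        # step Sa4
        if n > total_layers:          # step Sa5 (assumed comparison)
            play()                    # step Sa6: the music has been specified
            return
```

As in the flowchart, the loop has no exit other than playback: a layer is repeated until some question in it is affirmed.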
[0046]
Next, the question processing in step Sa2 will be described in more detail with reference to the flowchart shown in FIG.
[0047]
As shown in this figure, in the question processing, the questions belonging to the N-th layer are sequentially output to the user as synthesized speech. Specifically, the control unit 110 first initializes the question variable Q to “1” (Step Sb1). The question variable Q indicates which question among the questions belonging to the layer has been output. Next, the control unit 110 outputs synthesized speech so as to give the user the question specified by the question variable Q (step Sb2). Then, the control unit 110 determines whether or not there is a response from the user (whether or not there is a voice input) within a certain period of time after outputting the synthesized voice of the question (step Sb3).
[0048]
Specifically, in step Sb3, after executing step Sb2, the control unit 110 starts a timer and continues to accept voice input. If the control unit 110 obtains a voice input from the user before the timer times out after the certain period of time, the determination result in step Sb3 is YES; if the timer times out without a voice input being obtained, the determination result in step Sb3 is NO. More specifically, even if there is a voice input before the timer times out, the control unit 110 performs voice recognition processing on the voice input and, if the recognized voice matches none of the phrases registered in the response table 500 (FIG. 5), determines that there is no voice input. That is, in step Sb3, the determination result is YES only when there is a voice input that can be taken as a response to the question (a phrase registered in the response table 500).
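One way to picture step Sb3 is a wait with a deadline in which unregistered phrases are ignored. The sketch below assumes that recognized phrases arrive on a queue and that listening continues after an unregistered phrase until the timer expires; the description does not say whether listening continues, so that is an interpretation. The function name, the queue, and `timeout_s` are hypothetical.

```python
import queue
import time
from typing import Optional, Set


def wait_for_response(
    recognized: "queue.Queue[str]",
    registered: Set[str],
    timeout_s: float,
) -> Optional[str]:
    """Return the first registered phrase heard before the deadline, else None."""
    deadline = time.monotonic() + timeout_s      # start the timer
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None                          # timed out: step Sb3 is NO
        try:
            phrase = recognized.get(timeout=remaining)
        except queue.Empty:
            return None                          # nothing heard in time
        if phrase in registered:
            return phrase                        # usable response: step Sb3 is YES
        # Unregistered phrase: treated as no voice input, keep listening.
```

A monotonic clock is used so that the deadline is unaffected by wall-clock adjustments.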
[0049]
When the determination result in step Sb3 is YES, the control unit 110 determines whether or not the response obtained as the voice input includes a positive-expression phrase (step Sb4). If it does (step Sb4: YES), there is no need to give the user another question belonging to the current layer, so the question processing is terminated and the procedure proceeds to step Sa3. If the response obtained as the voice input is a negative expression or an ambiguous expression (step Sb4: NO), the control unit 110 executes the following processing to give the user the next question belonging to the current layer. That is, after incrementing the question variable Q by “1” to specify the next question (step Sb5), the control unit 110 determines whether the question variable Q is larger than the total number of questions (step Sb6). If the result is NO, the process returns to step Sb2 and the next question is given to the user. If the result of the determination in step Sb6 is YES, that is, if all the questions belonging to the current layer have been given to the user, the question processing ends and the procedure proceeds to step Sa3. In that case, since a positive-expression response has not been obtained in the current layer, the determination result in step Sa3 is NO, and the processing procedure returns to step Sa2 in order to give the user the questions belonging to the current layer again from the beginning.
[0050]
If the result of the determination in step Sb3 is NO, that is, if no voice input belonging to any of the positive, negative, or ambiguous expressions has been received from the user even after a certain period of time has elapsed since the question was given, the control unit 110 advances the processing procedure to step Sb5, gives the user another question belonging to the current layer, and prompts a response.
[0051]
That is, when the control unit 110 determines that no response has been obtained after a certain period of time has passed since the question was given to the user (step Sb3: NO), or that an ambiguous expression was obtained as a response (step Sb4: NO), it sequentially gives the user the other questions belonging to the current layer.
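The question processing of FIG. 7 described above might be sketched as the loop below. Here `speak` stands in for synthesized voice output and `listen` for step Sb3, returning "positive", "negative", "ambiguous", or None when no usable response arrived in time; these callbacks and the return convention (the 1-based number of the affirmed question, or None) are assumptions of the sketch.

```python
from typing import Callable, List, Optional


def question_process(
    questions: List[str],
    speak: Callable[[str], None],
    listen: Callable[[], Optional[str]],
) -> Optional[int]:
    q = 1                                   # step Sb1
    while q <= len(questions):              # loop while step Sb6 would be NO
        speak(questions[q - 1])             # step Sb2: output question Q
        meaning = listen()                  # step Sb3: None means no usable response
        if meaning == "positive":           # step Sb4: YES
            return q                        # end the question processing
        q += 1                              # step Sb5: silence, negative, ambiguous
    return None                             # all questions asked, none affirmed
```

Silence, a negative expression, and an ambiguous expression all take the same branch, which is what lets the dialogue keep moving without waiting on the user.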
[0052]
Accordingly, when the user does not respond positively to a question, the in-vehicle music reproducing apparatus 100 does not stop the progress of the dialogue but gives the user another question about the item to be clarified (for example, the genre or artist of the music to be played), thereby realizing a natural conversation as if interacting with a natural person.
[0053]
Also, in general, when a user hesitates in answering a question, that is, stays silent or gives only an ambiguous response, the user often does not agree with the question. Therefore, as described above, by adopting a configuration in which other questions are sequentially given, the user only has to respond to a desired question, and operability can be improved.
[0054]
Furthermore, when the user is the driver of the vehicle, the interactive operation of the in-vehicle music reproducing device 100 may be performed while driving. Depending on the driving situation, the driver often has to concentrate on driving during the interactive operation and may not respond to a question for a certain period of time. In such a case, if the progress of the dialogue were stopped, the driver would need to remember how far the dialogue had progressed when resuming it and, having forgotten, would end up starting over from the beginning. In contrast, according to the present embodiment, the in-vehicle music reproducing apparatus 100 sequentially outputs questions when no response is obtained from the user for a certain period of time after a question is output. When the user wants to resume the operation, he or she need only respond when a desired question is output, without having to remember the progress of the dialogue, and the operation becomes easy.
[0055]
<Second embodiment>
In the above-described embodiment, a configuration has been described in which the in-vehicle music reproducing device 100 outputs another question belonging to the same layer if, after a certain period of time has elapsed since a question was output, no response is obtained from the user or only an ambiguous expression is obtained.
[0056]
However, when the user hesitates to respond to a question, the user may want processing belonging to another layer rather than processing belonging to the current layer. As a specific example, the user selects “Enka” as the desired music genre by responding to a question belonging to the first layer 300 shown in FIG. 3 and then, after the dialogue has progressed through the second layer 301 to the third layer 302, wishes to change the music genre to “JPOP” while in the third layer 302.
[0057]
Therefore, in the present embodiment, as shown in FIG. 8, in the question processing described in the first embodiment, if no response is made within a certain time after the question is output (step Sb3: NO), the control unit 110 of the in-vehicle music reproducing device 100 determines whether or not the layer variable N is greater than “1” (step Sc1), so that the next question to be output is a question belonging to a higher layer (that is, a question regarding a matter determined in the previous dialogue). If this determination result is YES, the layer variable N is decremented by “1” (step Sc2), and the processing procedure proceeds to step Sa3. Thereby, the next question when there is no response from the user for a certain period of time becomes a question belonging to a higher layer.
[0058]
If the determination result in step Sc1 is NO, it indicates that there is no higher layer, and the process proceeds to step Sa3 to repeat the question belonging to the current layer from the beginning.
[0059]
Further, even if there is a response within a certain period of time after the question, the control unit 110 determines whether the response is an ambiguous expression (step Sb4). If it is, then as in the case where no response is obtained within the certain time, the process proceeds to step Sc1 so that a question belonging to the higher layer is output next.
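The fallback of steps Sc1 and Sc2 reduces to a small rule on the layer variable N, sketched below. The function name and the string encoding of the response meaning are hypothetical; a positive response is handled elsewhere (step Sa4), so this sketch leaves N unchanged for it.

```python
from typing import Optional


def layer_after_fallback(n: int, meaning: Optional[str]) -> int:
    """Return the layer to ask next after silence (None) or an ambiguous reply."""
    if meaning is None or meaning == "ambiguous":
        if n > 1:          # step Sc1: is there a higher layer?
            return n - 1   # step Sc2: move up one layer
        return n           # already at the first layer: repeat it
    return n               # other responses do not trigger the fallback
```

For instance, silence while in the third layer returns 2, so the next question again concerns the solo/group choice decided earlier.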
[0060]
As described above, according to the present embodiment, when the user stays silent or uses an ambiguous expression in response to a question because the user desires a process belonging to another layer, the next question automatically shifts to the upper layer. Therefore, the user can cause the apparatus to execute a desired operation simply by responding when a question of the desired layer is output, without performing an operation such as starting the dialogue again from the beginning.
[0061]
In the present embodiment, the configuration in which the question belongs to the upper layer has been described as an example, but the question may belong to the lower layer. Whether to shift to the upper layer or the lower layer can be appropriately selected according to the layer structure of the question.
[0062]
<Third embodiment>
In the above-described first or second embodiment, a configuration has been described in which the in-vehicle music reproducing apparatus 100 outputs another question when it does not receive a response to a question even after a certain time has elapsed since the question was output, or when only a predetermined ambiguous word is obtained as a voice input.
[0063]
However, when the dialogue for specifying the music has progressed to some extent, as in a music selection / playback operation, the music candidates desired by the user have already been roughly specified.
[0064]
Therefore, in the present embodiment, as shown in FIG. 9, if no response is obtained from the user even after a certain period of time has passed since the question was output (step Sb3: NO), or if only ambiguous words are obtained as a response by voice input (step Sc3: YES), the control unit 110 of the in-vehicle music reproducing apparatus 100 sequentially plays back all songs that can be specified from the current layer (step Sd2).
[0065]
As described above, in the present embodiment, when a response to the question is not obtained from the user or when only a predetermined ambiguous word is obtained as a voice input, the in-vehicle music reproducing apparatus 100 does not stop the progress of the dialogue but sequentially executes the processes falling within the items determined in the dialogue so far (that is, reproduction of the music that can be specified), so that at least the various processes that include the process desired by the user are executed.
[0066]
Thereby, for example, when the user has specified the genre of the music but wants the rest to be played at random, an operation mode such as not responding to the question for a certain period of time or giving an ambiguous expression also becomes possible, and operability is improved.
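The idea of playing every song that the dialogue so far can identify might look like the filter below. The dictionary-based records, the sample data, and the name `playable_songs` are assumptions for illustration; the embodiment only states that all songs specifiable from the current layer are played in sequence.

```python
from typing import Dict, List


def playable_songs(music_table: List[Dict[str, str]], decided: Dict[str, str]) -> List[str]:
    """Titles of all songs consistent with every item decided so far."""
    return [
        rec["title"]
        for rec in music_table
        if all(rec.get(key) == value for key, value in decided.items())
    ]


# Placeholder records, not taken from the description.
table = [
    {"title": "Song A", "genre": "enka", "artist": "Artist A"},
    {"title": "Song B", "genre": "enka", "artist": "Artist B"},
    {"title": "Song C", "genre": "rock", "artist": "Artist C"},
]
```

With only the genre decided, `playable_songs(table, {"genre": "enka"})` selects both enka songs, which corresponds to the "genre chosen, rest left open" usage mentioned above.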
[0067]
<Modification>
Each of the above-described embodiments is merely an embodiment of the present invention, and can be arbitrarily modified within the scope of the present invention.
[0068]
For example, in each of the above-described embodiments, the case where the present invention is applied to an in-vehicle music reproducing device is illustrated. However, the present invention is not limited to in-vehicle use and may be applied to a home-use or portable device. Furthermore, the present invention can be applied to any device as long as it performs any one of a plurality of data processes based on a voice input from a user.
[0069]
Further, for example, in each of the above-described embodiments, a configuration may be employed in which, when an ambiguous expression is obtained as the user's response to a question, the apparatus learns what kind of processing the user actually desires.
[0070]
Specifically, as described in the first and second embodiments, when the user's response to a question is an ambiguous expression, the in-vehicle music reproducing device 100 sequentially outputs other questions, as in the case where no response is obtained for a certain period of time. Therefore, when the in-vehicle music reproducing apparatus 100 obtains an ambiguous expression as a response to a certain question, it learns (associates) which of the sequentially output questions the user affirmed. Then, when an ambiguous expression is used again for the same question, the learned question may be output next. As a result, it is possible to output questions that match the taste of each user.
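A minimal sketch of such an association, assuming it is kept as a plain mapping from the question that drew an ambiguous reply to the question the user later affirmed, is shown below. The class and method names are hypothetical, and the question strings in the usage note are placeholders.

```python
from typing import Dict


class AmbiguityLearner:
    """Remember which question a user affirmed after an ambiguous reply."""

    def __init__(self) -> None:
        self._learned: Dict[str, str] = {}

    def record(self, ambiguous_question: str, affirmed_question: str) -> None:
        # Associate the question that drew an ambiguous reply with the one
        # the user eventually affirmed.
        self._learned[ambiguous_question] = affirmed_question

    def next_question(self, ambiguous_question: str, default: str) -> str:
        # Use the learned follow-up if one exists; otherwise fall back to the
        # question that would normally come next.
        return self._learned.get(ambiguous_question, default)
```

After `record("Q-enka", "Q-rock")`, a later ambiguous reply to "Q-enka" makes `next_question("Q-enka", "Q-jpop")` return "Q-rock" instead of the default.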
[0071]
[Effect of the invention]
As described above, according to the present invention, a data processing device, a music reproduction device, a control program for a data processing device, and a control program for a music reproduction device having better operability are provided.
[Brief description of the drawings]
FIG. 1 is a diagram showing a functional configuration of an in-vehicle music reproducing apparatus according to a first embodiment of the present invention, together with a distribution system for distributing multimedia data such as music data to the in-vehicle music reproducing apparatus.
FIG. 2 is a diagram schematically showing a configuration of a music table.
FIG. 3 is a diagram showing an example of a question layer structure used in a music selection operation.
FIG. 4 is a diagram schematically showing a configuration of a question table.
FIG. 5 is a diagram schematically illustrating a configuration of a response table.
FIG. 6 is a flowchart illustrating a processing procedure of music selection / playback processing according to the first embodiment.
FIG. 7 is a flowchart illustrating a procedure of a question process according to the first embodiment.
FIG. 8 is a flowchart illustrating a procedure of a question process according to the second embodiment of the present invention.
FIG. 9 is a flowchart illustrating a processing procedure of a question process according to the third embodiment of the present invention.
[Explanation of symbols]
100 In-vehicle music playback device
110 control unit
130 storage unit
140 Music playback unit
143 Speaker
150 microphone
200 music table
300-304 layers
400 question table
500 response table

Claims

A data processing device that executes any one of a plurality of data processes based on a voice input from a user, comprising:
storage means for storing, for each data process, voice data for prompting the user to input voice,
wherein, after one of the voice data is reproduced to prompt the user to input voice, if no voice input is obtained from the user in response to the prompt even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, other voice data among the voice data is reproduced to prompt the user to input voice.

2. The data processing apparatus according to claim 1, wherein until data is input from the user, audio data different from the previously reproduced audio data is reproduced to prompt the user to input a voice.

A data processing device that executes any one of a plurality of data processes based on a voice input from a user, comprising:
storage means for storing questions, each for obtaining a voice input from the user as a response, divided into a plurality of hierarchies,
wherein, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, a question belonging to that hierarchy or to another hierarchy is reproduced by voice to prompt the user to respond.

A data processing device that executes any one of a plurality of data processes based on a voice input from a user, comprising:
storage means for storing questions, each for obtaining a voice input from the user as a response, divided into a plurality of hierarchies,
wherein, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained, the data processes that can be specified from the questions belonging to that hierarchy or to the hierarchies below it are sequentially executed.

A music playback device that selects and plays music based on a voice input from a user, comprising:
storage means for storing, for each piece of music, voice data for prompting the user to input voice to instruct reproduction of that music,
wherein, after one of the voice data is reproduced to prompt the user to input voice, if no voice input is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, other voice data among the voice data is reproduced to prompt the user to input voice.

6. The music reproducing apparatus according to claim 5, wherein audio data different from the previously reproduced audio data is reproduced until a user inputs a voice.

A music playback device that selects and plays music based on a voice input from a user, comprising:
storage means for storing questions, each for obtaining a voice input from the user as a response in order to specify the music to be selected, divided into a plurality of hierarchies,
wherein, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, a question belonging to that hierarchy or to another hierarchy is reproduced by voice to prompt the user to respond.

A music playback device that selects and plays music based on a voice input from a user, comprising:
storage means for storing questions, each for obtaining a voice input from the user as a response in order to specify the music to be selected, divided into a plurality of hierarchies,
wherein, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input, the music that can be specified from the questions belonging to that hierarchy or to the hierarchies below it is sequentially selected and reproduced.

A control program for a data processing device that executes any one of a plurality of data processes based on a voice input from a user, the control program causing the data processing device to function as:
means for storing, for each data process, voice data for prompting the user to input voice; and
means for, after one of the voice data is reproduced to prompt the user to input voice, reproducing other voice data among the voice data to prompt the user to input voice if no voice input is obtained from the user in response to the prompt even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input.

A control program for a data processing device that executes any one of a plurality of data processes based on a voice input from a user, the control program causing the data processing device to function as:
means for storing questions, each for obtaining a voice input from the user as a response, divided into a plurality of hierarchies; and
means for, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, reproducing by voice a question belonging to that hierarchy or to another hierarchy to prompt the user to respond if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input.

A control program for a data processing device that executes any one of a plurality of data processes based on a voice input from a user, the control program causing the data processing device to function as:
means for storing questions, each for obtaining a voice input from the user as a response, divided into a plurality of hierarchies; and
means for, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, sequentially executing the data processes that can be specified from the questions belonging to that hierarchy or to the hierarchies below it if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained.

A control program for a music playback device that selects and plays music based on a voice input from a user, the control program causing the music playback device to function as:
means for storing, for each piece of music, voice data for prompting the user to input voice to instruct reproduction of that music; and
means for, after one of the voice data is reproduced to prompt the user to input voice, reproducing other voice data among the voice data to prompt the user to input voice if no voice input is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input.

A control program for a music playback device that selects and plays music based on a voice input from a user, the control program causing the music playback device to function as:
means for storing questions, each for obtaining a voice input from the user as a response in order to specify the music to be selected, divided into a plurality of hierarchies; and
means for, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, reproducing by voice a question belonging to that hierarchy or to another hierarchy to prompt the user to respond if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input.

A control program for a music playback device that selects and plays music based on a voice input from a user, the control program causing the music playback device to function as:
means for storing questions, each for obtaining a voice input from the user as a response in order to specify the music to be selected, divided into a plurality of hierarchies; and
means for, after a question belonging to a certain hierarchy is reproduced by voice to prompt the user to respond, sequentially selecting and reproducing the music that can be specified from the questions belonging to that hierarchy or to the hierarchies below it if no response to the question is obtained from the user even after a predetermined time has elapsed, or if only a predetermined ambiguous word is obtained as the voice input.