JP6917821B2

JP6917821B2 - Playback device, program and playback method

Info

Publication number: JP6917821B2
Application number: JP2017147691A
Authority: JP
Inventors: 修弥和田
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2021-08-11
Anticipated expiration: 2037-07-31
Also published as: JP2019029828A

Description

The present invention relates to a playback device, a program, and a playback method.

Patent Document 1 discloses an acoustic system that switches the device used to output a piece of music according to the content of the music data representing that piece. With this acoustic system, the music can be output by a device suited to its frequency characteristics.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-82456

When a user views or listens to content, the viewing situation changes with, for example, ambient noise, the position of the user, and the number of viewers. Depending on the situation after the change, the device that has been outputting the sound may no longer be appropriate for the user; for example, the sound from that device may become hard to hear.

An object of the present invention is to enable a user to view content in a manner appropriate to the content even when the viewing situation changes.

The present invention provides a playback device comprising: acquisition means for acquiring content; playback means for playing back the content acquired by the acquisition means; identifying means for identifying the situation around the device itself; selection means for selecting, from among a plurality of output destinations, the output destination of the audio of the content played back by the playback means, based on an attribute of the content and the situation identified by the identifying means; and output means for outputting the audio of the content played back by the playback means to the output destination selected by the selection means.

In the present invention, the identifying means may identify the noise level around the device itself, and the selection means may select the output destination of the audio of the content played back by the playback means based on the attribute of the content and the noise level identified by the identifying means.

Further, in the present invention, the identifying means may identify the position of the user of the device itself, and the selection means may select the output destination of the audio of the content played back by the playback means based on the attribute of the content and the position identified by the identifying means.

Further, in the present invention, the identifying means may identify the number of people around the device itself, and the selection means may select the output destination of the audio of the content played back by the playback means based on the attribute of the content and the number identified by the identifying means.

The present invention also provides a program for causing a computer to function as: acquisition means for acquiring content; playback means for playing back the content acquired by the acquisition means; identifying means for identifying the situation around the device itself; selection means for selecting, from among a plurality of output destinations, the output destination of the audio of the content played back by the playback means, based on an attribute of the content and the situation identified by the identifying means; and output means for outputting the audio of the content played back by the playback means to the output destination selected by the selection means.

The present invention also provides a playback method comprising: an acquisition step of acquiring content; a playback step of playing back the content acquired in the acquisition step; an identifying step of identifying the situation around the device itself; a selection step of selecting, from among a plurality of output destinations, the output destination of the audio of the content played back in the playback step, based on an attribute of the content and the situation identified in the identifying step; and an output step of outputting the audio of the content played back in the playback step to the output destination selected in the selection step.

According to the present invention, a user can view content in a manner appropriate to the content even when the viewing situation changes.

FIG. 1 is a diagram showing the devices included in a playback system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the content playback device 10.
FIG. 3 is a functional block diagram showing the functions realized in the content playback device 10.
FIG. 4 is a flowchart showing the flow of processing performed by the content playback device 10.
FIG. 5 is a flowchart showing the flow of processing performed by the content playback device 10.

[Embodiment]
FIG. 1 is a diagram showing the devices included in a playback system 1 according to an embodiment of the present invention. The content playback device 10 is a device that acquires content and plays back the acquired content. The content playback device 10 plays back the content and outputs the audio of the content from its internal (built-in) speaker or from an external device. The content playback device 10 may function as a computer that performs the processing of the present invention.

The external speaker 20 is a speaker having an enclosure and one or more speaker units. The external speaker 20 is connected to the content playback device 10 by wire and outputs the sound represented by the audio signal supplied from the content playback device 10. The external speaker 20 may instead be configured to receive the audio signal wirelessly. The headphones 40 receive the audio signal output wirelessly from the content playback device 10 and output the sound represented by the received audio signal. The headphones 40 may instead be configured to receive the audio signal by wire.

(Configuration of content playback device 10)
FIG. 2 is a diagram showing an example of the hardware configuration of the content playback device 10. The communication unit 105 is hardware (a transmission/reception device) for communication between computers via a wired and/or wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module. The communication unit 105 performs voice communication and data communication via the network. For example, the communication unit 105 acquires the data of a radio broadcast distributed over the Internet. The communication unit 105 also acquires the data of content such as music and video from a server accessible via the network. These data are examples of the content according to the present invention. The short-range communication unit 106 functions as a communication interface for short-range wireless communication such as Bluetooth (registered trademark) and wireless LAN. The short-range communication unit 106 is an example of the output means according to the present invention. The output unit 103 has an output terminal for outputting an audio signal, and outputs the audio signal to the external speaker 20 connected to the output terminal by a cable. The output unit 103 is also an example of the output means according to the present invention. The audio processing unit 107 has a microphone and a speaker (the built-in speaker). The audio processing unit 107 outputs the audio signal supplied to it to the built-in speaker, and the built-in speaker outputs the sound represented by that audio signal. The audio processing unit 107 also converts the sound picked up by the microphone into a digital signal and supplies this digital signal to the control unit 101.

The storage unit 102 has a non-volatile memory and stores the program executed by the control unit 101. The storage unit 102 also stores the content data that the communication unit 105 has acquired from the server. The program may be transmitted from a network via a telecommunication line. The storage unit 102 is a computer-readable recording medium and may be composed of at least one of, for example, an optical disc such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, a magneto-optical disk (for example, a compact disc, a digital versatile disc, or a Blu-ray (registered trademark) disc), a smart card, a flash memory (for example, a card, a stick, or a key drive), a floppy (registered trademark) disk, and a magnetic strip. The storage unit 102 may be called an auxiliary storage device.

The control unit 101 has a CPU (Central Processing Unit) and a memory. The functions of the content playback device 10 are realized by loading the program (software) stored in the storage unit 102 onto hardware such as the CPU and the memory, so that the CPU performs computation and controls the output unit 103, the communication unit 105, the short-range communication unit 106, and the audio processing unit 107, as well as the reading and/or writing of data in the memory and the storage unit 102. The various processes described below are not limited to execution by a single CPU and may be executed simultaneously or sequentially by two or more CPUs. The CPU may be implemented with one or more chips. The control unit 101 may be composed of a central processing unit (CPU) that includes an interface with peripheral devices, a control device, an arithmetic device, registers, and the like. The memory is a computer-readable recording medium and may be composed of at least one of, for example, a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), and a RAM (Random Access Memory). The memory may be called a register, a cache, a main memory (main storage device), or the like. The memory can store programs (program code), software modules, and the like that are executable to carry out the processing according to the embodiment of the present invention.

FIG. 3 is a functional block diagram showing, among the functions realized when the control unit 101 executes the program stored in the storage unit 102, the configuration of the functions related to the present invention. The acquisition unit 1001 acquires content data from the storage unit 102 or the communication unit 105. The playback unit 1002 plays back the content acquired by the acquisition unit 1001. The identification unit 1003 acquires the digital signal representing the sound picked up by the microphone of the audio processing unit 107 and analyzes it to identify the situation around the content playback device. The voice recognition unit 1005 acquires the digital signal representing the sound picked up by the microphone of the audio processing unit 107 and analyzes it to recognize the user's voice. The selection unit 1004 selects the output destination of the audio of the content based on the situation identified by the identification unit 1003 and the attribute of the content played back by the playback unit 1002. Based on the result of the voice recognition performed by the voice recognition unit 1005, the selection unit 1004 controls the output unit 103, the short-range communication unit 106, and the audio processing unit 107 so that the audio of the content played back by the playback unit 1002 is output to the selected output destination.

(Operation example of the embodiment)
Next, an example of the operation of this embodiment will be described. The content playback device 10 performs voice recognition on the user's voice picked up by the microphone. When the recognized speech is an instruction to play back content, the content playback device 10 outputs, from the built-in speaker, a voice prompt asking which content to play back. When the user specifies the content by voice, the content playback device 10 performs voice recognition and acquires the recognized content.

For example, to play back content stored in the storage unit 102, the user speaks the title of the content; to play back a radio broadcast distributed over the Internet, the user speaks the name of the broadcasting station to listen to. When the content playback device 10 (acquisition unit 1001) recognizes the title of content stored in the storage unit 102, it acquires the content with the recognized title from the storage unit 102. When the content playback device 10 (acquisition unit 1001) recognizes the name of a radio broadcasting station, it controls the communication unit 105 to acquire the data of the radio broadcast distributed by the recognized station.
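The acquisition behavior described above can be sketched as a simple lookup on the recognized text. This is only an illustrative sketch, not the patent's implementation: the function name `resolve_request`, the two lookup tables, and the sample path and stream address are all hypothetical.

```python
from typing import Mapping, Optional, Tuple


def resolve_request(
    recognized_text: str,
    stored_titles: Mapping[str, str],
    stations: Mapping[str, str],
) -> Optional[Tuple[str, str]]:
    """Decide where to acquire content from, given the recognized speech.

    stored_titles maps a content title to its location in the storage unit
    (hypothetical layout); stations maps a broadcasting station name to the
    address of its Internet radio stream (also hypothetical).
    """
    if recognized_text in stored_titles:
        # A stored title was recognized: acquire from the storage unit 102.
        return ("storage", stored_titles[recognized_text])
    if recognized_text in stations:
        # A station name was recognized: acquire via the communication unit 105.
        return ("network", stations[recognized_text])
    # Neither a known title nor a known station name.
    return None


# Hypothetical sample data used only for illustration.
titles = {"Morning Song": "/music/morning_song.mp3"}
radio = {"Station A": "http://radio.example.invalid/station-a"}
print(resolve_request("Morning Song", titles, radio))
print(resolve_request("Station A", titles, radio))
```

Checking stored titles first is an arbitrary choice for the sketch; the source does not say how a name that matches both a title and a station would be resolved.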

Next, before outputting the audio signal of the content, the content playback device 10 performs processing to select the output destination of the audio signal. FIG. 4 is a flowchart showing the flow of this processing. First, the content playback device 10 (identification unit 1003) analyzes the digital signal supplied from the microphone of the audio processing unit 107 and detects the noise level (step SA1).

Next, the content playback device 10 (selection unit 1004) identifies the attribute of the content to be played back (step SA2). When playing back the audio of a radio broadcast, the content playback device 10 identifies the attribute of the content as radio broadcast; when playing back music content acquired from the storage unit 102, it identifies the attribute of the content as music.

Next, the content playback device 10 (selection unit 1004) selects the output destination of the audio signal of the content (step SA3). Here, the content playback device 10 selects the output destination according to the level detected in step SA1 and the attribute of the content identified in step SA2. Specifically, when the attribute of the content is music and the detected noise level is below a predetermined threshold, the content playback device 10 keeps, as the output destination of the audio signal, the output destination that is selected at the start of step SA3. For example, when the attribute of the content is music, the detected noise level is below the predetermined threshold, and the output destination of the audio signal at the start of step SA3 is the external speaker 20, the content playback device 10 selects the external speaker 20 as the output destination. When the attribute of the content is music and the detected noise level is at or above the predetermined threshold, the content playback device 10 selects the headphones 40 as the output destination. When the attribute of the content is radio broadcast and the detected noise level is below the predetermined threshold, the content playback device 10 selects the built-in speaker as the output destination. When the attribute of the content is radio broadcast and the detected noise level is at or above the predetermined threshold, the content playback device 10 selects the headphones 40 as the output destination. The content playback device 10 (selection unit 1004) controls the audio processing unit 107, the short-range communication unit 106, and the output unit 103 so that the audio of the content is output from the selected output destination.
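The four cases of step SA3 amount to a small decision rule. The following is a minimal sketch of that rule, assuming the noise level and threshold are plain numbers on the same scale; the names `Attribute`, `Output`, and `select_output` are hypothetical and do not appear in the source.

```python
from enum import Enum


class Attribute(Enum):
    MUSIC = "music"
    RADIO = "radio broadcast"


class Output(Enum):
    BUILT_IN_SPEAKER = "built-in speaker"
    EXTERNAL_SPEAKER = "external speaker 20"
    HEADPHONES = "headphones 40"


def select_output(attribute: Attribute, noise_level: float,
                  current: Output, threshold: float) -> Output:
    """Select the audio output destination as described for step SA3."""
    if attribute not in (Attribute.MUSIC, Attribute.RADIO):
        raise ValueError(f"unsupported content attribute: {attribute!r}")
    # Noise at or above the threshold: headphones, for both attributes.
    if noise_level >= threshold:
        return Output.HEADPHONES
    # Quiet surroundings and music: keep the destination already selected.
    if attribute is Attribute.MUSIC:
        return current
    # Quiet surroundings and radio broadcast: use the built-in speaker.
    return Output.BUILT_IN_SPEAKER


# Example with made-up numbers: quiet room, music, external speaker selected.
print(select_output(Attribute.MUSIC, 30.0, Output.EXTERNAL_SPEAKER, 50.0))
```

The comparison uses "at or above" for the noisy case and "below" for the quiet case, matching the wording of the text. The numeric values in the example are placeholders; the source does not give a threshold.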

Next, the content playback device 10 (selection unit 1004) notifies the user of the selected output destination (step SA4). After giving this notification, the content playback device 10 (playback unit 1002) plays back the acquired content (step SA5). For example, when the selected output destination is the external speaker 20, the content playback device 10 outputs the message "Audio will be output from the external speaker" from the built-in speaker. The content playback device 10 then starts playing back the content and outputs an audio signal representing the audio of the content from the output unit 103 to the external speaker 20. The external speaker 20 outputs the sound represented by the audio signal supplied from the content playback device 10. When the selected output destination is the headphones 40, the content playback device 10 outputs the message "Audio will be output from the headphones" from the built-in speaker. The content playback device 10 then starts playing back the content and outputs an audio signal representing the audio of the content from the short-range communication unit 106. The headphones 40 receive the audio signal supplied from the content playback device 10 and output the sound represented by that signal. When the selected output destination is the built-in speaker, the content playback device 10 outputs the message "Audio will be output from the built-in speaker" from the built-in speaker. The content playback device 10 then starts playing back the content and outputs the audio of the content from the built-in speaker.

Once playback of the content has started, the content playback device 10 performs the processing of FIG. 5. The content playback device 10 (identification unit 1003) analyzes the digital signal supplied from the microphone of the audio processing unit 107 and detects the noise level (step SB1). For example, the content playback device 10 cancels, from the digital signal supplied from the microphone, the user's voice and the sound represented by the audio signal being output, and takes the level of the sound remaining after cancellation as the noise level.
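The source does not say how the cancellation or the level measurement is carried out. As one possible illustration only, the sketch below subtracts the known playback signal from the microphone signal sample by sample and reports the RMS level of the residual in decibels relative to full scale. It assumes the two signals are already time-aligned and equally scaled, and it omits cancellation of the user's voice; the function name `noise_level_dbfs` is hypothetical.

```python
import math
from typing import Sequence


def noise_level_dbfs(mic: Sequence[float], playback: Sequence[float]) -> float:
    """Estimate the noise level as the RMS of (mic - playback), in dBFS.

    Assumes samples are floats in [-1.0, 1.0] and that `playback` is the
    device's own output as it appears at the microphone (an idealization).
    """
    if len(mic) != len(playback):
        raise ValueError("mic and playback must have the same length")
    if not mic:
        raise ValueError("at least one sample is required")
    # Remove the device's own audio; what remains is treated as noise.
    residual = [m - p for m, p in zip(mic, playback)]
    rms = math.sqrt(sum(r * r for r in residual) / len(residual))
    # Clamp to avoid log10(0) when the residual is perfectly silent.
    return 20.0 * math.log10(max(rms, 1e-12))


# Example: no playback, constant-amplitude noise of 0.1 at the microphone.
print(round(noise_level_dbfs([0.1, -0.1, 0.1, -0.1], [0.0, 0.0, 0.0, 0.0]), 6))
```

In a real device the playback signal reaches the microphone delayed and filtered by the room, so a practical system would more likely use an adaptive echo canceller than direct subtraction.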

Next, the content playback device 10 (selection unit 1004) selects the output destination of the audio signal of the content (step SB2). Here, the content playback device 10 selects the output destination according to the level detected in step SB1 and the attribute of the content identified in step SA2. The noise level detected in step SB1 is an example of the situation around the content playback device 10. That is, the content playback device 10 selects the output destination according to the situation around the content playback device 10 and the attribute of the content.

Specifically, when the attribute of the content is music and the detected noise level is below the predetermined threshold, the content playback device 10 keeps, as the output destination of the audio signal, the output destination that is selected at the start of step SB2. For example, when the attribute of the content is music and the output destination of the audio signal at the start of step SB2 is the external speaker 20, the content playback device 10 selects the external speaker 20 as the output destination. When the attribute of the content is music and the detected noise level is at or above the predetermined threshold, the content playback device 10 selects the headphones 40 as the output destination.

When the attribute of the content is radio broadcast and the detected noise level is below the predetermined threshold, the content playback device 10 selects the built-in speaker as the output destination of the audio signal. When the attribute of the content is radio broadcast and the detected noise level is at or above the predetermined threshold, the content playback device 10 selects the headphones 40 as the output destination.

Next, the content playback device 10 (selection unit 1004) determines whether the output destination of the audio signal at the start of step SB2 is the same as the output destination selected in step SB2 (step SB3). If they are the same (YES in step SB3), the content playback device 10 proceeds to step SB8.

If the output destination of the audio signal at the start of step SB2 differs from the output destination selected in step SB2 (NO in step SB3), the content playback device 10 (selection unit 1004) outputs a voice prompt asking whether to change the output destination of the audio signal (step SB4).

For example, suppose the attribute of the content is music, the output destination of the audio signal is the external speaker 20, and the noise level rises from below the threshold to at or above the threshold. The headphones 40 are then selected as the output destination in step SB2, so the output destination at the start of step SB2 differs from the output destination selected in step SB2. In this case, the content playback device 10 determines NO in step SB3 and outputs, for example, the message "Change the audio output destination to the headphones?" from the built-in speaker, asking the user whether to change the audio output destination.

Likewise, suppose the attribute of the content is radio broadcast, the output destination of the audio signal is the built-in speaker, and the noise level rises from below the threshold to at or above the threshold. The headphones 40 are then selected as the output destination in step SB2, so the output destination at the start of step SB2 differs from the output destination selected in step SB2. In this case, the content playback device 10 determines NO in step SB3 and outputs, for example, the message "Change the audio output destination to the headphones?" from the built-in speaker, asking the user whether to change the audio output destination.

Conversely, suppose the attribute of the content is radio broadcast, the output destination of the audio signal is the headphones 40, and the noise level falls from at or above the threshold to below the threshold. The built-in speaker is then selected as the output destination in step SB2, so the output destination at the start of step SB2 differs from the output destination selected in step SB2. In this case, the content playback device 10 determines NO in step SB3 and outputs to the headphones 40 an audio signal representing, for example, the message "Change the audio output destination to the built-in speaker?", asking the user whether to change the audio output destination.

The user speaks in response to the voice message output in step SB4. When the user speaks, the content playback device 10 (speech recognition unit 1005) performs speech recognition (step SB5). Next, the content playback device 10 (selection unit 1004) determines whether the speech recognition result is an instruction to change the output destination of the audio signal (step SB6). If the recognized utterance does not instruct a change of the output destination (NO in step SB6), the content playback device 10 leaves the audio output destination unchanged and proceeds to step SB8. The content playback device 10 may also proceed to step SB8 when it has not recognized any speech from the user within a predetermined time after completing step SB4.

If the user's utterance instructs a change of the output destination (YES in step SB6), the content playback device 10 (selection unit 1004) changes the audio output destination to the one selected in step SB2 (step SB7). For example, when changing the output destination of the audio signal from the external speaker 20 to the headphones 40, the content playback device 10 stops outputting the audio signal to the external speaker 20 and outputs the audio signal to the headphones 40 by wireless communication from the short-range communication unit 106. As a result, the content audio is no longer output from the external speaker 20 and is output from the headphones 40 instead. When the noise level is high, the user listens to the sound output from the headphones 40 and can therefore hear the content audio without it being drowned out by the noise.

Similarly, when changing the output destination of the audio signal from the built-in speaker to the headphones 40, the content playback device 10 stops supplying the audio signal to the built-in speaker and outputs the audio signal to the headphones 40 by wireless communication from the short-range communication unit 106. As a result, the content audio is no longer output from the built-in speaker and is output from the headphones 40 instead. When the noise level is high, the user listens to the sound output from the headphones 40 and can therefore hear the content audio without it being drowned out by the noise.

When changing the output destination of the audio signal from the headphones 40 to the built-in speaker, the content playback device 10 stops outputting the audio signal from the short-range communication unit 106 and supplies the audio signal to the built-in speaker. As a result, the content audio is no longer output from the headphones 40 and is output from the built-in speaker instead. When the noise level is low, the content audio is output from the built-in speaker, so the user can hear it without it being drowned out by noise.
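Steps SB5 to SB7 and the stop-then-start switching described above might be sketched like this. The class `AudioRouter`, the function `handle_reply`, and the keyword set `AFFIRMATIVE` are hypothetical names introduced for illustration; real speech recognition is replaced by an already-recognized string, with `None` standing for the case where no speech is recognized within the predetermined time.

```python
from typing import List, Optional, Tuple


class AudioRouter:
    """Tracks the current output destination and records each switch."""

    def __init__(self, current: str) -> None:
        self.current = current
        self.log: List[Tuple[str, str]] = []  # ("stop" | "start", destination)

    def switch_to(self, selected: str) -> None:
        # Step SB7: stop feeding the old destination, then feed the new one.
        self.log.append(("stop", self.current))
        self.log.append(("start", selected))
        self.current = selected


# Hypothetical vocabulary treated as "change the output destination".
AFFIRMATIVE = {"yes", "ok", "change"}


def handle_reply(router: AudioRouter, selected: str,
                 recognized: Optional[str]) -> bool:
    """Steps SB5 to SB7. Returns True only if the destination was changed."""
    if recognized is None:  # no speech recognized in time: go on to SB8
        return False
    if recognized.strip().lower() not in AFFIRMATIVE:  # NO in step SB6
        return False
    router.switch_to(selected)  # YES in step SB6
    return True
```

Keeping the switch as a single method makes the ordering explicit: output to the old destination stops before output to the new one starts, which matches the three switching cases described above.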

After completing step SB7, or when it determines YES in step SB3 or NO in step SB6, the content playback device 10 determines whether playback of the content has ended (step SB8). For example, when playing back content acquired from the storage unit 102, the content playback device 10 determines that playback has not ended if the content has not yet been played to the end, and returns to step SB1. The content playback device 10 also returns to step SB1 if it has not recognized a user utterance instructing it to stop playback. When playback of the content acquired from the storage unit 102 finishes, the content playback device 10 determines in step SB8 that playback has ended and ends the process of FIG. 5. Likewise, when it recognizes a user utterance instructing it to stop playback, the content playback device 10 determines in step SB8 that playback has ended and ends the process of FIG. 5.
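The termination check of step SB8 and the return to step SB1 can be illustrated with a small loop. This is a sketch under stated assumptions: `playback_ended` and `run` are hypothetical names, time advances in fixed ticks, and the stop instruction is modeled as a playback position at which the user asks to stop.

```python
from typing import Optional


def playback_ended(position_s: float, duration_s: float,
                   stop_requested: bool) -> bool:
    """Step SB8: playback ends at the end of the content or on a stop instruction."""
    return stop_requested or position_s >= duration_s


def run(duration_s: float, stop_at_s: Optional[float] = None,
        tick_s: float = 1.0) -> int:
    """Repeat steps SB1 to SB8 and return how many passes were made."""
    position_s = 0.0
    passes = 0
    while True:
        passes += 1            # one pass through steps SB1 to SB7 (omitted here)
        position_s += tick_s   # playback advances during the pass
        stop_requested = stop_at_s is not None and position_s >= stop_at_s
        if playback_ended(position_s, duration_s, stop_requested):
            return passes      # end of the FIG. 5 process
        # otherwise the flow returns to step SB1
```

For instance, `run(3.0)` makes three passes before the content ends, while `run(10.0, stop_at_s=2.0)` ends after two passes because of the stop instruction.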

According to the present invention, the output destination of the audio signal is changed according to the content attribute and the situation around the content playback device 10, so the user can listen to the audio in a manner appropriate to the content even when the viewing situation changes.

[Modifications]
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. For example, the present invention may be carried out by modifying the above embodiment as follows. The above embodiment and the following modifications may be combined with one another.

In the present invention, the output destination of the audio signal may be changed not only when the noise level changes but also in response to the user's speech.

In the above embodiment, the content playback device 10 recognizes the user's speech. Alternatively, the content playback device 10 may transmit information representing the user's speech to an external server device, which recognizes the speech using the transmitted information and sends the recognition result back to the content playback device 10.

In the above embodiment, the content playback device 10 selects the output destination of the audio signal according to the surrounding situation and the content attribute, and the noise level is used as an example of the situation around the content playback device 10. The surrounding situation is not, however, limited to the noise level. For example, the content playback device 10 may have an image sensor, identify the distance to and/or the number of people nearby from the image obtained by the image sensor, and select the output destination of the audio signal according to the identified result and the content attribute. For example, the content playback device 10 may identify the number of people nearby and select the external speaker 20 as the output destination of the audio signal when the content is rock music and two or more people are nearby, and select the headphones 40 when the content is rock music and one person is nearby. The content playback device 10 may also select the built-in speaker as the output destination of the audio signal when the distance from the content playback device 10 to the user is greater than the distance from the external speaker 20 to the user and the content attribute is classical, and select the external speaker 20 when the distance from the external speaker 20 to the user is greater than the distance from the content playback device 10 to the user and the content attribute is classical.
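The head-count rule and the relative-distance rule in this modification might look as follows in code. This is a hedged sketch: the function names and the string labels for genres and destinations are assumptions, and cases the description does not cover (other genres, zero people, equal distances) return `None` rather than a guessed destination.

```python
from typing import Optional


def select_by_headcount(genre: str, people: int) -> Optional[str]:
    """Rock music: two or more people -> external speaker, one person -> headphones."""
    if genre == "rock":
        if people >= 2:
            return "external speaker"
        if people == 1:
            return "headphones"
    return None  # not covered by the description


def select_by_relative_distance(genre: str, device_to_user_m: float,
                                speaker_to_user_m: float) -> Optional[str]:
    """Classical: compare the user's distance to the device and to the external speaker."""
    if genre != "classical":
        return None
    if device_to_user_m > speaker_to_user_m:
        return "built-in speaker"   # device is farther from the user, as stated above
    if speaker_to_user_m > device_to_user_m:
        return "external speaker"   # external speaker is farther from the user
    return None  # equal distances are not specified
```

The head count and distances would come from analysis of the image sensor output, which is outside the scope of this sketch.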

The content playback device 10 may also identify the position of a predetermined user from the image obtained by the image sensor and select the output destination of the audio signal according to the identified position and the content attribute. For example, the content playback device 10 may identify the position of the predetermined user and select the external speaker 20 as the output destination of the audio signal when the distance from the content playback device 10 to the identified position is equal to or greater than a predetermined threshold and the content attribute is classical, and select the built-in speaker when that distance is less than the predetermined threshold and the content attribute is classical.
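A threshold on the user's distance could be applied as in the following sketch. The 3.0 m default and the name `select_by_user_distance` are assumptions for illustration; the description says only that the threshold is predetermined.

```python
from typing import Optional

# Hypothetical default; the description does not give a concrete distance.
DISTANCE_THRESHOLD_M = 3.0


def select_by_user_distance(genre: str, distance_m: float,
                            threshold_m: float = DISTANCE_THRESHOLD_M) -> Optional[str]:
    """Classical content: at or beyond the threshold use the external speaker."""
    if genre != "classical":
        return None  # other attributes are not covered by this rule
    if distance_m >= threshold_m:
        return "external speaker"
    return "built-in speaker"
```

Note that the boundary case is inclusive: a user exactly at the threshold distance gets the external speaker, matching the "equal to or greater than" wording.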

When the content playback device 10 recognizes a predetermined user from the image obtained by the image sensor, it may change the output destination of the audio signal depending on whether anyone other than the predetermined user is present. For example, the content playback device 10 may select the headphones 40 as the output destination of the audio signal when a person other than the predetermined user is present around the content playback device 10 and the content attribute is classical, and select the external speaker 20 or the built-in speaker when no person other than the predetermined user is present around the content playback device 10 and the content attribute is classical.
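The presence check for people other than the predetermined user can be sketched as below. The person identifiers, the function names, and the `speaker` parameter are hypothetical; because the description allows either speaker when nobody else is present, the caller chooses which one.

```python
from typing import Iterable, Optional


def others_present(detected_ids: Iterable[str], registered_user_id: str) -> bool:
    """True if any detected person is not the predetermined user."""
    return any(pid != registered_user_id for pid in detected_ids)


def select_by_bystanders(genre: str, detected_ids: Iterable[str],
                         registered_user_id: str,
                         speaker: str = "external speaker") -> Optional[str]:
    """Classical content: headphones if someone else is nearby, else a speaker."""
    if genre != "classical":
        return None  # other attributes are not covered by this rule
    if others_present(detected_ids, registered_user_id):
        return "headphones"
    return speaker  # either the external or the built-in speaker is permitted
```

Recognizing who is in the image is assumed to happen elsewhere; this sketch only consumes the resulting identifiers.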

In the present invention, the content playback device 10 may also take the time of day into account when selecting the output destination of the audio signal. For example, in the above embodiment, the content playback device 10 may set the output destination of the audio signal to the built-in speaker when the content attribute is radio broadcast, the noise level is below the threshold, and the time is in the daytime, and may set it to the headphones 40 when the content attribute is radio broadcast, the noise level is below the threshold, and the time is at night.
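Adding the time of day to the radio-broadcast rule might be done as follows. The daytime window of 07:00 to 21:00 is purely an assumption, as are the names `is_daytime` and `select_for_quiet_radio`; the description does not define where day ends and night begins.

```python
from datetime import time

# Hypothetical boundaries of the daytime period.
DAY_START = time(7, 0)
NIGHT_START = time(21, 0)


def is_daytime(now: time) -> bool:
    """True from DAY_START (inclusive) up to NIGHT_START (exclusive)."""
    return DAY_START <= now < NIGHT_START


def select_for_quiet_radio(now: time) -> str:
    """Radio broadcast with noise below the threshold: speaker by day, headphones at night."""
    return "built-in speaker" if is_daytime(now) else "headphones"
```

The function covers only the quiet case described above; the noisy case would still be handled by the noise-level rule of the embodiment.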

In the present invention, the content playback device 10 may recognize each of a plurality of users from the voice acquired by the microphone. When recognizing each of a plurality of users, the content playback device 10 may be configured so that the output destination of the audio signal is set for each user, even for the same situation and the same content.
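Per-user output settings could be held in a simple lookup table, as in this sketch. The user identifiers, the situation labels, the table contents, and the fallback default are all hypothetical; the description states only that the destination may be set per user for the same situation and content.

```python
from typing import Dict, Tuple

# Hypothetical table: (user, content attribute, situation) -> output destination.
PREFERENCES: Dict[Tuple[str, str, str], str] = {
    ("user-a", "radio", "quiet"): "built-in speaker",
    ("user-b", "radio", "quiet"): "headphones",
}


def select_for_user(user_id: str, attribute: str, situation: str,
                    default: str = "built-in speaker") -> str:
    """Look up the destination set for this user, falling back to a default."""
    return PREFERENCES.get((user_id, attribute, situation), default)
```

Here `select_for_user("user-a", "radio", "quiet")` and `select_for_user("user-b", "radio", "quiet")` return different destinations for the same content and situation, which is the behavior this modification describes.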

In the present invention, the content audio may include the voice of a voice assistant. When outputting the voice of the voice assistant, the audio output destination may be changed in the same manner as when the content is a musical piece in the above embodiment.

In the above embodiment, the content playback device 10 acquires content via a network, but it may instead acquire content from another device by short-range communication using the short-range communication unit 106. The content playback device 10 may also be provided with an interface for accessing external memory such as a USB memory or a memory card, and acquire content from external memory attached to that interface.

In the above embodiment, the content playback device 10 changes the output destination of the audio signal when the recognition result of the user's speech is an instruction to change the output destination, but the configuration for changing the output destination is not limited to that of the embodiment. For example, the content playback device 10 may be configured to output, after step SB3, a voice message announcing that the output destination of the audio signal will be changed to the one selected in step SB2, without performing the processing of steps SB3 to SB6.

The block diagrams used in describing the above embodiment show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software. The means of realizing each functional block is not particularly limited. That is, each functional block may be realized by a single device that is physically and/or logically integrated, or by two or more physically and/or logically separate devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly).

The content playback device 10 may include hardware such as a microprocessor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a programmable logic device (PLD), or a field programmable gate array (FPGA), and some or all of the functional blocks may be realized by that hardware. For example, the control unit 101 may be implemented with at least one of these types of hardware.

The order of the processing procedures, flowcharts, and the like of the embodiment and modifications described in this specification may be changed as long as no contradiction arises. For example, the methods described in this specification present the elements of the various steps in an exemplary order and are not limited to the specific order presented.

Input and output information and the like may be stored in a specific location (for example, a memory) or managed in a management table. Input and output information and the like may be overwritten, updated, or appended. Output information and the like may be deleted. Input information and the like may be transmitted to another device.

A determination may be made using a value represented by one bit (0 or 1), using a Boolean value (true or false), or by comparing numerical values (for example, comparison with a predetermined value).

The aspects and embodiments described in this specification may be used alone, in combination, or switched during execution. Notification of predetermined information (for example, notification of "being X") is not limited to explicit notification and may be performed implicitly (for example, by not sending notification of the predetermined information).

Software, whether called software, firmware, middleware, microcode, or hardware description language, or by any other name, should be interpreted broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like. Software, instructions, and the like may be transmitted and received via a transmission medium. For example, when software is transmitted from a website, server, or other remote source using wired technologies such as coaxial cable, fiber optic cable, twisted pair, and digital subscriber line (DSL) and/or wireless technologies such as infrared, radio, and microwave, these wired and/or wireless technologies are included within the definition of a transmission medium.

The information, signals, and the like described in this specification may be represented using any of a variety of different technologies. For example, data, instructions, commands, information, signals, bits, symbols, chips, and the like that may be referred to throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or magnetic particles, optical fields or photons, or any combination of these. The terms described in this specification and/or the terms necessary for understanding this specification may be replaced with terms having the same or similar meanings.

The terms "system" and "network" as used in this specification are used interchangeably.

The terms "judging" and "deciding" (both rendered as "determining") as used in this specification may encompass a wide variety of actions. "Judging" and "deciding" may include, for example, regarding an act of judging, calculating, computing, processing, deriving, investigating, looking up (for example, looking up in a table, a database, or another data structure), or ascertaining as an act of "judging" or "deciding". They may also include regarding an act of receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as an act of "judging" or "deciding". They may further include regarding an act of resolving, selecting, choosing, establishing, comparing, or the like as an act of "judging" or "deciding". In other words, "judging" and "deciding" may include regarding some action as an act of "judging" or "deciding".

The terms "connected" and "coupled", and any variations of them, mean any direct or indirect connection or coupling between two or more elements, and can include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination of these. As used in this specification, two elements can be considered to be "connected" or "coupled" to each other by using one or more wires, cables, and/or printed electrical connections and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and the optical (both visible and invisible) region.

The phrase "based on" as used in this specification does not mean "based only on" unless otherwise specified. In other words, the phrase "based on" means both "based only on" and "based at least on".

The term "means" in the configuration of each of the above devices may be replaced with "unit", "circuit", "device", or the like.

Insofar as the terms "including", "comprising", and variations of them are used in this specification or the claims, these terms are intended to be inclusive in the same way as the term "provided with". Furthermore, the term "or" as used in this specification or the claims is not intended to be an exclusive OR.

Throughout this disclosure, where articles such as a, an, and the in English are added by translation, these articles are intended to include the plural unless the context clearly indicates otherwise.

Although the present invention has been described in detail above, it is clear to those skilled in the art that the present invention is not limited to the embodiments described in this specification. The present invention can be implemented with modifications and changes without departing from the spirit and scope of the present invention as defined by the claims. Accordingly, the description in this specification is for illustrative purposes and has no restrictive meaning with respect to the present invention.

10: content playback device, 20: external speaker, 40: headphones, 101: control unit, 102: storage unit, 103: output unit, 105: communication unit, 106: short-range communication unit, 107: audio processing unit, 1001: acquisition unit, 1002: playback unit, 1003: identification unit, 1004: selection unit, 1005: speech recognition unit.

Claims

A playback device comprising:
acquisition means for acquiring content;
playback means for playing back the content acquired by the acquisition means;
identification means for identifying the number of people around the device itself;
selection means for selecting, from among a plurality of output destinations, an output destination for the audio of the content played back by the playback means, based on an attribute of the content and the number of people identified by the identification means; and
output means for outputting the audio of the content played back by the playback means to the output destination selected by the selection means,
wherein, of two cases in which the attribute of the content is the same, the selection means selects a speaker as the output destination in a first case in which the number of people is two or more, and selects headphones as the output destination in a second case in which the number of people is one.

A program for causing a computer to function as:
acquisition means for acquiring content;
playback means for playing back the content acquired by the acquisition means;
identification means for identifying the number of people around the device itself;
selection means for selecting, from among a plurality of output destinations, an output destination for the audio of the content played back by the playback means, based on an attribute of the content and the number of people identified by the identification means; and
output means for outputting the audio of the content played back by the playback means to the output destination selected by the selection means,
wherein, of two cases in which the attribute of the content is the same, the selection means selects a speaker as the output destination in a first case in which the number of people is two or more, and selects headphones as the output destination in a second case in which the number of people is one.

A playback method comprising:
an acquisition step of acquiring content;
a playback step of playing back the content acquired in the acquisition step;
an identification step of identifying the number of people around the device itself;
a selection step of selecting, from among a plurality of output destinations, an output destination for the audio of the content played back in the playback step, based on an attribute of the content and the number of people identified in the identification step; and
an output step of outputting the audio of the content played back in the playback step to the output destination selected in the selection step,
wherein, of two cases in which the attribute of the content is the same, a speaker is selected as the output destination in the selection step in a first case in which the number of people is two or more, and headphones are selected as the output destination in a second case in which the number of people is one.