JP2019174665A

JP2019174665A - Information processor, control method and program operable by voice

Info

Publication number: JP2019174665A
Application number: JP2018063184A
Authority: JP
Inventors: 達郎五十嵐; Tatsuro Igarashi
Original assignee: SoftBank Corp
Current assignee: SoftBank Corp
Priority date: 2018-03-28
Filing date: 2018-03-28
Publication date: 2019-10-10
Anticipated expiration: 2038-03-28
Also published as: JP6728261B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processing apparatus that, at low cost and without sacrificing convenience, can reduce erroneous activation caused by a specific voice command from a television or the like. SOLUTION: An information processing apparatus according to an embodiment of the present invention is operable by voice and includes a recognition unit that recognizes a specific voice command, an activation unit that activates the information processing apparatus in response to the specific voice command, and a detection unit that detects a predetermined sound. When the predetermined sound is detected, the activation unit stops, for a predetermined period, the process of activating the information processing apparatus in response to the specific voice command. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing apparatus operable by voice, a control method, and a program.

Speakers that activate in response to a specific voice command (wake word) from a user are known. Techniques have been developed to prevent such a speaker from being erroneously activated when the specific voice command comes from someone other than the user. For example, Patent Document 1 discloses using speaker identification to determine whether the person (talker) addressing the speaker is a previously authorized user or an impostor. The speaker described in Patent Document 1 activates only in response to voice commands from registrants registered in advance, thereby preventing erroneous activation by voice commands from anyone else.

Patent Document 1: JP 2017-068243 A

The specific voice command may come from a source other than the user, for example from a television, a radio, or a PC. Erroneous activation of the speaker by the specific voice command from such a television or the like needs to be prevented.

However, because audio from televisions and the like has improved in sound quality, distinguishing it by the speaker identification technique described in Patent Document 1 may be costly.

In addition, the speaker described in Patent Document 1 can be operated only by registrants registered in advance. A person who is not registered, such as a visitor, cannot operate the speaker at all, which makes the speaker inconvenient.

In view of the above problems, an object of the present invention is to provide an information processing apparatus, a control method, and a program that, at low cost and without sacrificing convenience, can reduce erroneous activation caused by a specific voice command from a television or the like.

An information processing apparatus according to an embodiment of the present invention is an information processing apparatus operable by voice, comprising: a recognition unit that recognizes a specific voice command; an activation unit that activates the information processing apparatus in response to the specific voice command; and a detection unit that detects a predetermined sound, wherein, when the predetermined sound is detected, the activation unit stops, for a predetermined period, the process of activating the information processing apparatus in response to the specific voice command.

In the information processing apparatus according to an embodiment of the present invention, the predetermined sound may be a predetermined sound logo included in an advertisement broadcast, the detection unit may detect the predetermined sound logo included in the advertisement broadcast, and, when the predetermined sound logo is detected, the activation unit may stop, for a predetermined period, the process of activating the information processing apparatus in response to the specific voice command.

In the information processing apparatus according to an embodiment of the present invention, the recognition unit may recognize a specific voice command uttered by a user registered in advance, and, during the predetermined period, the activation unit may stop the activation process for a specific voice command uttered by anyone other than the user registered in advance and may activate the information processing apparatus for a specific voice command uttered by the user registered in advance.

In the information processing apparatus according to an embodiment of the present invention, the detection unit may detect, in addition to the predetermined sound, the direction from which the predetermined sound was emitted, and, during the predetermined period, the activation unit may stop the activation process for a specific voice command coming from the direction from which the predetermined sound was emitted and may activate the information processing apparatus for a specific voice command coming from any other direction.
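
The direction-dependent behavior described above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the source does not specify how directions are represented, how close two directions must be to count as "the same direction", or how long the predetermined period is. The sketch assumes directions are azimuth angles in degrees, a hypothetical tolerance of 20 degrees, and a hypothetical period of 30 seconds.

```python
from typing import Optional


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def should_activate(
    command_dir_deg: float,
    command_time_s: float,
    sound_dir_deg: Optional[float],
    sound_time_s: Optional[float],
    dead_period_s: float = 30.0,   # assumed value; the source says only "predetermined period"
    tolerance_deg: float = 20.0,   # assumed value; not specified in the source
) -> bool:
    """Decide whether a recognized wake command should activate the apparatus."""
    # No predetermined sound has been detected, or the period has elapsed:
    # the wake command is handled normally.
    in_dead_period = (
        sound_time_s is not None
        and 0.0 <= command_time_s - sound_time_s < dead_period_s
    )
    if not in_dead_period:
        return True
    # Within the period, only commands from the sound's direction are ignored.
    return angular_difference(command_dir_deg, sound_dir_deg) > tolerance_deg
```

With these assumed values, a command arriving 5 seconds after the sound from nearly the same azimuth is ignored, while a command from the opposite side of the room still activates the apparatus.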

In the information processing apparatus according to an embodiment of the present invention, a predetermined period may be set for each of a plurality of types of predetermined sounds, the detection unit may be capable of detecting at least one of the plurality of types of predetermined sounds, and the activation unit may stop the process of activating the information processing apparatus in response to the specific voice command for the predetermined period set for the predetermined sound that the detection unit detected.
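
One way to picture a period set per type of sound is a lookup table from a sound identifier to a duration. The sketch below is illustrative only: the identifiers `sound_logo_a` and `sound_logo_b`, the durations, and the function name `dead_period_end` are hypothetical and do not come from the source.

```python
from typing import Mapping, Optional

# Hypothetical sound identifiers and periods in seconds; not taken from the source.
DEAD_PERIODS_S = {"sound_logo_a": 15.0, "sound_logo_b": 60.0}


def dead_period_end(
    sound_id: str,
    detected_at_s: float,
    periods: Mapping[str, float] = DEAD_PERIODS_S,
) -> Optional[float]:
    """Return the time at which wake-command handling resumes, or None.

    None means the detected sound is not one of the registered predetermined
    sounds, so no period applies.
    """
    period = periods.get(sound_id)
    if period is None:
        return None
    return detected_at_s + period
```

A short period could suit a brief advertisement and a longer one a longer advertisement, although the source does not say how the periods are chosen.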

A control method according to an embodiment of the present invention is a method for controlling an information processing apparatus operable by voice, comprising: a recognition step of recognizing a specific voice command; an activation step of activating the information processing apparatus in response to the specific voice command; and a detection step of detecting a predetermined sound, wherein, in the activation step, when the predetermined sound is detected, the process of activating the information processing apparatus in response to the specific voice command is stopped for a predetermined period.

A program according to an embodiment of the present invention causes an information processing apparatus operable by voice to function as: recognition means for recognizing a specific voice command; activation means for activating the information processing apparatus in response to the specific voice command; and detection means for detecting a predetermined sound, wherein, when the predetermined sound is detected, the activation means stops, for a predetermined period, the process of activating the information processing apparatus in response to the specific voice command.

An information processing apparatus according to an embodiment of the present invention is an information processing apparatus operable by voice, comprising: a recognition unit that recognizes a specific voice command; an activation unit that activates the information processing apparatus in response to the specific voice command; and a detection unit that detects the direction from which a predetermined sound was emitted, wherein the activation unit does not execute the process of activating the information processing apparatus in response to a specific voice command coming from that direction.

In the information processing apparatus according to an embodiment of the present invention, the predetermined sound may be a predetermined sound logo included in an advertisement broadcast, the detection unit may detect the direction from which the predetermined sound logo included in the advertisement broadcast was emitted, and the activation unit may refrain from executing the process of activating the information processing apparatus in response to a specific voice command coming from that direction.

In the information processing apparatus according to an embodiment of the present invention, the recognition unit may recognize a specific voice command uttered from that direction by a user registered in advance, and, when a specific voice command is recognized from that direction, the activation unit may stop the activation process for a specific voice command uttered by anyone other than the user registered in advance and may activate the information processing apparatus for a specific voice command uttered by the user registered in advance.

The information processing apparatus according to an embodiment of the present invention may further comprise a confirmation unit that, when a specific voice command is recognized from that direction, asks the user to confirm whether to activate the information processing apparatus.

In the information processing apparatus according to an embodiment of the present invention, when the user inputs an answer indicating that the information processing apparatus is to be activated, the activation unit may, from the time the answer is input onward, activate the information processing apparatus in response to recognition of a specific voice command from that direction.

The information processing apparatus according to an embodiment of the present invention may further comprise a setting unit that sets a parameter that depends on the installation location of the information processing apparatus, wherein the detection unit detects that the installation location of the information processing apparatus has changed when the direction from which a predetermined command is emitted, as seen from the apparatus itself, has changed, and the setting unit resets the parameter based on the installation location after the change.

A control method according to an embodiment of the present invention is a method for controlling an information processing apparatus operable by voice, comprising: a recognition step of recognizing a specific voice command; an activation step of activating the information processing apparatus in response to the specific voice command; and a detection step of detecting the direction from which a predetermined sound was emitted, wherein, in the activation step, the process of activating the information processing apparatus in response to a specific voice command coming from that direction is not executed.

A program according to an embodiment of the present invention causes an information processing apparatus operable by voice to function as: recognition means for recognizing a specific voice command; activation means for activating the information processing apparatus in response to the specific voice command; and detection means for detecting the direction from which a predetermined sound was emitted, wherein, in the detection means, the process of activating the information processing apparatus in response to a specific voice command coming from that direction is not executed.

According to the present invention, it is possible to provide an information processing apparatus, a control method, and a program that, at low cost and without sacrificing convenience, can reduce erroneous activation caused by a specific voice command from a television or the like.

FIG. 1 is a diagram for explaining the state of a conventional information processing apparatus.
FIG. 2 is a diagram for explaining the state of the information processing apparatus in the first embodiment of the present invention.
FIG. 3 is a diagram showing a configuration example of the information processing system in the first embodiment of the present invention.
FIG. 4 is a diagram showing a configuration example of the information processing apparatus in the first embodiment of the present invention.
FIG. 5 is a flowchart showing an operation example of the information processing apparatus in the first embodiment of the present invention.
FIG. 6 is a diagram for explaining another state of the information processing apparatus in the first embodiment of the present invention.
FIG. 7 is a diagram for explaining the state of the information processing apparatus in the second embodiment of the present invention.
FIG. 8 is a diagram for explaining the operation by which the detection unit detects the direction from which the predetermined sound was emitted in the second embodiment of the present invention.
FIG. 9 is a diagram for explaining another state in which the detection unit detects the direction from which the predetermined sound was emitted in the second embodiment of the present invention.
FIG. 10 is a flowchart showing an operation example of the information processing apparatus in the second embodiment of the present invention.
FIG. 11 is a diagram for explaining another state of the information processing apparatus in the second embodiment of the present invention.
FIG. 12 is a diagram showing a configuration example of the information processing apparatus in Modification 2 of the second embodiment of the present invention.
FIG. 13 is a diagram showing a configuration example of the information processing apparatus in Modification 4 of the second embodiment of the present invention.

<First Embodiment>
A first embodiment of the present invention will be described with reference to the drawings.

In the first embodiment of the present invention, an information processing apparatus such as a smart speaker (AI (Artificial Intelligence) speaker) can recognize speech and execute various operations corresponding to that speech. For example, the information processing apparatus can recognize a specific voice command from a user (for example, "Hello!") and activate in response to it. Activating the information processing apparatus means causing it to transition from a sleep state to an active state. The specific voice command is described in detail later.

Such an information processing apparatus may recognize not only live human voices but also audio output by a television, a radio, a PC, or the like. The information processing apparatus may therefore be activated by a specific voice command output by a television or the like.

In particular, when an advertisement broadcast (a so-called CM (Commercial Message)) for the information processing apparatus (smart speaker) is carried on television, radio, or streamed video, the advertisement may have no choice but to include the specific voice command in order to show viewers how the apparatus operates. In such a case, the risk increases that the information processing apparatus will be activated by the specific voice command uttered within an advertisement broadcast on a television or the like.

FIG. 1 is a diagram for explaining the state of a conventional information processing apparatus. In FIG. 1, the information processing apparatus 10 is a smart speaker actually installed in a room or the like. In the example of FIG. 1, an advertisement broadcast (CM) for the information processing apparatus (smart speaker) is being shown on the television 20. Specifically, as the advertisement broadcast, the television 20 shows a scene in which a character 30 utters the specific voice command "Hello!" to an information processing apparatus 10A. In FIG. 1, the information processing apparatus 10A is the apparatus shown within the advertisement broadcast and is virtual.

When such an advertisement broadcast is shown on the television 20, the information processing apparatus 10 actually installed in the room or the like recognizes the specific voice command "Hello!" output from the television 20 and is activated. In other words, the information processing apparatus 10 transitions from the sleep state to the active state (the "ON" state in FIG. 1).

Once activated, the information processing apparatus 10 is in the active state and therefore accepts spoken instructions to execute various processes. As a result, the information processing apparatus 10 reacts to the various sounds output by the television or the like and executes processes that the user did not intend.

In the first embodiment of the present invention, therefore, a predetermined sound is included in advertisement broadcasts (CMs) and the like shown on a television or the like. For example, a company that sells the information processing apparatus 10 includes the predetermined sound in its advertisement broadcast (CM) for the information processing apparatus 10. The information processing apparatus 10 is configured so that, once it has detected the predetermined sound, it stops the activation process for a predetermined period even if it subsequently recognizes the specific voice command. The predetermined sound and the predetermined period are described in detail later.

FIG. 2 is a diagram for explaining the state of the information processing apparatus 10 in the first embodiment of the present invention. In FIG. 2, the television 20 is showing an advertisement broadcast that includes a predetermined sound 40. The content of the advertisement broadcast is otherwise the same as that illustrated in FIG. 1.

When such an advertisement broadcast is shown on the television 20, the information processing apparatus 10 actually installed in the room or the like detects the predetermined sound 40. Having detected the predetermined sound, the information processing apparatus 10 stops its activation process for a predetermined period, even if it subsequently recognizes the specific voice command "Hello!". In other words, the information processing apparatus 10 treats the predetermined period following detection of the predetermined sound as a dead period during which the apparatus is not activated even if the specific voice command is recognized. For example, in FIG. 2, even if the specific voice command is output from the television 20 within the predetermined period after the predetermined sound is detected, the information processing apparatus 10 remains in the sleep state (the "OFF" state in FIG. 2) and does not transition to the active state.
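
The dead-period behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `WakeGate`, its method names, and the 30-second default are hypothetical, and timestamps are passed in explicitly (in seconds) so that the logic can be followed without a real clock or audio pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeGate:
    """Gates wake-command handling with a dead period (hypothetical sketch)."""

    dead_period_s: float = 30.0  # assumed value; the source says only "predetermined period"
    active: bool = False         # False corresponds to the sleep state
    _suppressed_until: Optional[float] = None

    def on_predetermined_sound(self, now: float) -> None:
        # Detection of the predetermined sound starts (or restarts) the dead period.
        self._suppressed_until = now + self.dead_period_s

    def on_wake_command(self, now: float) -> bool:
        # During the dead period the command is recognized but does not activate.
        if self._suppressed_until is not None and now < self._suppressed_until:
            return False
        self.active = True
        return True
```

For example, if the predetermined sound is detected at t = 100 s, a wake command at t = 110 s leaves the apparatus asleep, while one at t = 131 s activates it.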

As a result, the information processing apparatus 10 in the first embodiment of the present invention remains in the sleep state, so it no longer reacts to the various sounds output by a television or the like, and execution of processes unintended by the user can be prevented.

(System configuration)
FIG. 3 is a diagram showing a configuration example of the information processing system in the first embodiment of the present invention. As shown in FIG. 3, the information processing system includes the information processing apparatus 10, a server apparatus 200, and a network 300. The number of information processing apparatuses 10 and server apparatuses 200 is not limited to one each and may be any number.

The information processing apparatus 10 is an apparatus operable by voice and has a function of recognizing speech and executing predetermined processing. The information processing apparatus 10 is, for example, a smart speaker (AI speaker). A smart speaker has a voice-interaction function and can execute predetermined processing when, for example, the user gives a spoken instruction. The predetermined processing is, for example, activating the information processing apparatus 10. Activating the information processing apparatus 10 is a process of causing the information processing apparatus 10 in the sleep state to transition to the active state. The sleep state is a state in which the processing of the information processing apparatus 10 is restricted. For example, in the sleep state, the processing of the information processing apparatus 10 is limited to accepting voice input. In the sleep state, the information processing apparatus 10 does not transition to the active state until it has recognized the specific voice command in the voice input.

The active state, on the other hand, is a state in which the information processing apparatus 10 can execute various processes. The user can instruct the information processing apparatus 10 in the active state by voice to execute various processes. The various processes include, for example, playing music, searching with an Internet search engine, and purchasing products on various websites. For example, the information processing apparatus 10 plays music in response to the user's spoken instruction "play music". The various processes may also include issuing instructions to home appliances (such as turning an appliance on or off), converting voice data into text data, sending data to another apparatus (not shown) by e-mail or the like, and simple conversation. The processes that the information processing apparatus 10 can execute are not limited to these examples and may be any processes.
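
The relationship between the sleep state and the active state can be sketched as a two-state machine. The Python below is an illustration under stated assumptions: it operates on already-transcribed text rather than audio, the names `State`, `Device`, and `on_utterance` are hypothetical, and "hello" is used as the wake command only because the source gives "Hello!" as an example.

```python
from enum import Enum


class State(Enum):
    SLEEP = "sleep"
    ACTIVE = "active"


WAKE_COMMAND = "hello"  # example wake command taken from the source text


class Device:
    """Two-state sketch: only the wake command is acted on while asleep."""

    def __init__(self) -> None:
        self.state = State.SLEEP
        self.executed: list[str] = []  # instructions accepted in the active state

    def on_utterance(self, text: str) -> None:
        normalized = text.strip().lower().rstrip("!")
        if self.state is State.SLEEP:
            # In the sleep state, processing is limited to accepting voice input
            # and checking it against the wake command.
            if normalized == WAKE_COMMAND:
                self.state = State.ACTIVE
            return
        # In the active state, spoken instructions are accepted for execution.
        self.executed.append(normalized)
```

In this sketch, "play music" spoken to a sleeping device is ignored, whereas the same instruction after "Hello!" is recorded for execution. The sketch also shows why an erroneous wake-up matters: once active, the device accepts whatever it hears next.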

The server apparatus 200 is an apparatus capable of providing a predetermined service, for example a search engine or a web server. The server apparatus 200 can accept access from the information processing apparatus 10 and provide the predetermined service. For example, the server apparatus 200 provides the information processing apparatus 10 with a website on which products can be purchased.

The network 300 is a network for interconnecting the information processing apparatus 10 and the server apparatus 200 and is, for example, a wireless network or a wired network. Specifically, the network 300 is a wireless LAN (WLAN), a wide area network (WAN), ISDN (integrated services digital network), LTE (Long Term Evolution), LTE-Advanced, fourth generation (4G), fifth generation (5G), CDMA (code division multiple access), WCDMA (registered trademark), or the like.

The network 300 is not limited to these examples. It may be, for example, the public switched telephone network (PSTN), Bluetooth (registered trademark), an optical line, an ADSL (Asymmetric Digital Subscriber Line) line, or a satellite communication network, and may be any kind of network.

The network 300 may also be, for example, NB-IoT (Narrow Band IoT) or eMTC (enhanced Machine Type Communication). NB-IoT and eMTC are wireless communication schemes for IoT that enable long-distance communication at low cost and with low power consumption. The network 300 may also be a communication network used for V2X (Vehicle to Everything: vehicle-to-vehicle or road-to-vehicle communication). V2X is a communication scheme in which, for example, automobiles communicate directly with each other (vehicle-to-vehicle), or an automobile communicates directly with infrastructure such as traffic lights or road signs (road-to-vehicle).

The network 300 may also be a combination of these, and may include a plurality of different networks combining these examples. For example, the network 300 may include an LTE wireless network and a wired network such as an intranet, which is a closed network.

(Configuration example of the information processing apparatus)
FIG. 4 is a diagram showing a configuration example of the information processing apparatus 10 in the first embodiment of the present invention. As illustrated in FIG. 4, the information processing apparatus 10 includes, for example, a control unit 101, a communication unit 102, an input/output unit 103, a display unit 104, and a storage unit 105.

The communication unit 102 is a communication interface that transmits and receives various data, information, and signals via the network 300. The communication unit 102 has a function of communicating with the server device 200 via the network 300. The communication unit 102 may also exchange signals and the like for executing various processes with other devices (not shown) located near the information processing apparatus 10 via short-range wireless communication such as Bluetooth. For example, the communication unit 102 may transmit, to a home appliance, a control signal instructing activation of that home appliance.

The input/output unit 103 is realized by devices for inputting various operations to the information processing apparatus 10, such as a keyboard, a mouse, a touch panel, a microphone, and various sensors. The input/output unit 103 includes, for example, a microphone and accepts voice input. The voice input is, for example, a specific voice command (a spoken instruction) for activating the information processing apparatus 10. The specific voice command may also be called a wake word, a hot word, a call, or the like. As described above, activating the information processing apparatus 10 means causing the information processing apparatus 10 in the sleep state to transition to the active state.

特定の音声コマンドは、予め定められた語句であり、情報処理装置１０を起動するためにユーザが呼びかける語句である。例えば、特定の音声コマンドは、「Ｈｅｌｌｏ」や「ＯｋＣｏｍｐｕｔｅｒ」などであり、どのような語句であってもよい。また、特定の音声コマンドは、複数の語句の組み合わせ（例えば、「ＯｋＣｏｍｐｕｔｅｒ」）であってもよい。また、特定の音声コマンドは、語句を複数回繰り返すもの（例えば、「Ｈｅｌｌｏ」を３回繰り返すなど）であってもよい。また、特定の音声コマンドは、ユーザが適宜変更可能であってもよい。 The specific voice command is a predetermined phrase and is a phrase that the user calls to activate the information processing apparatus 10. For example, the specific voice command is “Hello”, “Ok Computer”, or the like, and may be any word or phrase. Further, the specific voice command may be a combination of a plurality of words (for example, “Ok Computer”). Further, the specific voice command may be a command that repeats a phrase multiple times (for example, “Hello” is repeated three times). Further, the specific voice command may be appropriately changed by the user.

また、音声による入力は、アクティブ状態の情報処理装置１０に対して、各種処理の実行を指示するものであってもよい。例えば、入出力部１０３は、「音楽を再生して」や「今日の天気は？」などの音声による指示を受け付けることができる。なお、音声による入力は、これらの例に限られず、どのようなものであってもよい。 Further, the voice input may instruct the information processing apparatus 10 in the active state to execute various processes. For example, the input / output unit 103 can accept voice instructions such as “Play music” and “What is the weather today?”. The input by voice is not limited to these examples, and any input may be used.

さらに、入出力部１０３は、所定のサウンド（音データを含む）の入力を受け付け可能である。所定のサウンドは、例えば、広告放送に含まれる所定のサウンドロゴである。例えば、複数のサウンドロゴのうち、予め定められた所定のサウンドロゴが、所定のサウンドとして設定される。 Furthermore, the input / output unit 103 can accept input of a predetermined sound (including sound data). The predetermined sound is, for example, a predetermined sound logo included in the advertisement broadcast. For example, a predetermined sound logo determined in advance among the plurality of sound logos is set as the predetermined sound.

サウンドロゴは、企業などが、テレビやラジオ、配信動画などの広告放送（ＣＭ）において、当該企業又は当該企業の商品に対して付されるメロディーや効果音、曲、音声などの音響である。サウンドロゴは、例えば数秒間などの短い音響であってもよいし、広告放送の開始から終了まで流れる数十秒程度の長い音響であってもよく、どのような長さであってもよい。 The sound logo is a sound such as a melody, sound effect, song, or voice attached to a company or a product of the company in an advertisement broadcast (CM) such as television, radio, or distribution video. The sound logo may be a short sound such as several seconds, for example, may be a long sound of about several tens of seconds flowing from the start to the end of advertisement broadcasting, and may be any length.

The predetermined sound is not limited to a sound logo and may be a predetermined melody, sound effect, song, or voice. The predetermined sound does not need to be audible to humans. As long as it is sound information that the information processing apparatus 10 can detect, it may be anything, for example a high-frequency sound such as a mosquito tone. The predetermined sound may also have any length.

Note that when the information processing apparatus 10 detects the predetermined sound, it does not activate itself for a predetermined period, even if it subsequently receives the specific voice command. Specifically, when the information processing apparatus 10 in the sleep state detects the predetermined sound, it does not transition to the active state for the predetermined period even if it receives the predetermined voice command "Hello!".

The display unit 104 is, for example, a liquid crystal display or an OELD (organic electroluminescent display). The display unit 104 is not limited to these examples and may be a head mounted display (HMD) or the like. The display unit 104 can display display data such as images, text information, and 3D (three-dimensional) content in accordance with the display data written to a frame buffer.

記憶部１０５は、情報処理装置１０が動作するうえで必要とする各種プログラムや各種データを記憶する機能を有する。記憶部１０５は、例えば、ＨＤＤ、ＳＳＤ、フラッシュメモリなど各種の記憶媒体により実現される。なお、情報処理装置１０は、プログラムを記憶部１０５に記憶し、当該プログラムを実行して、制御部１０１が、当該制御部１０１に含まれる各部としての処理を実行してもよい。当該プログラムは、情報処理装置１０に、制御部１０１が実行する各機能を実現させる。 The storage unit 105 has a function of storing various programs and various data necessary for the information processing apparatus 10 to operate. The storage unit 105 is realized by various storage media such as an HDD, an SSD, and a flash memory. The information processing apparatus 10 may store the program in the storage unit 105, execute the program, and the control unit 101 may execute processing as each unit included in the control unit 101. The program causes the information processing apparatus 10 to realize each function executed by the control unit 101.

The control unit 101 may be, for example, a central processing unit (CPU), a microprocessor, an ASIC, or an FPGA. The control unit 101 is not limited to these examples and may be of any kind.

図４に例示するように、制御部１０１は、認識部１１０と、起動部１１１と、検出部１１２とを含む。 As illustrated in FIG. 4, the control unit 101 includes a recognition unit 110, an activation unit 111, and a detection unit 112.

The recognition unit 110 has a function of recognizing the specific voice command. While the information processing apparatus 10 is in the sleep state or the active state, the recognition unit 110 recognizes the specific voice command contained in the voice input from the input/output unit 103. Specifically, the recognition unit 110 recognizes a specific voice command such as "Hello!" or "Ok Computer" contained in the voice input from the input/output unit 103.

検出部１１２は、所定のサウンドを検出する機能を備える。例えば、検出部１１２は、テレビやラジオ、動画配信サービスなどにおける広告放送に含まれる所定のサウンドロゴを検出する。なお、検出部１１２は、認識部１１０による特定の音声コマンドの認識とは無関係に、所定のサウンドを検出可能である。 The detection unit 112 has a function of detecting a predetermined sound. For example, the detection unit 112 detects a predetermined sound logo included in an advertisement broadcast on a television, radio, video distribution service, or the like. The detection unit 112 can detect a predetermined sound regardless of recognition of a specific voice command by the recognition unit 110.

The activation unit 111 has a function of activating the information processing apparatus 10 in response to the recognition unit 110 recognizing the specific voice command. Specifically, in response to the recognition unit 110 recognizing the specific voice command, the activation unit 111 causes the information processing apparatus 10 in the sleep state to transition to the active state. As described above, the sleep state is a state in which the processing of the information processing apparatus 10 is restricted, for example to only the processing that accepts voice input. The active state, on the other hand, is a state in which the information processing apparatus 10 can execute various processes, for example a state in which it can accept spoken instructions from the user to execute various processes.

When the predetermined sound is detected, the activation unit 111 stops, for a predetermined period, the process of activating the information processing apparatus 10 in response to the specific voice command (the activation process). That is, when the predetermined sound is detected, the activation unit 111 does not execute the process of causing the information processing apparatus 10 in the sleep state to transition to the active state for the predetermined period, even if the predetermined voice command "Hello!" is received.

The predetermined period is, for example, the length of the advertisement broadcast (CM) that contains the predetermined sound, such as 30 seconds or 1 minute. A television or the like is most likely to emit the specific voice command during that advertisement broadcast. The activation unit 111 therefore sets the length of the advertisement broadcast on the television or the like as the predetermined period (that is, the dead period) and does not activate the information processing apparatus 10 even if the specific voice command is recognized during that predetermined period (dead period). The predetermined period is not limited to these examples and may be set to any length, for example 5 minutes.

(Operation example of information processing apparatus)
FIG. 5 is a flowchart illustrating an operation example of the information processing apparatus 10 according to the first embodiment of the present invention. The operation example illustrated in FIG. 5 is merely an example, and the operation of the information processing apparatus 10 is not limited to it.

The detection unit 112 of the information processing apparatus 10 detects the predetermined sound (S100). For example, the detection unit 112 detects a predetermined sound logo.

その後、認識部１１０が、特定の音声コマンドを認識する（Ｓ１０１）。例えば、認識部１１０は、「Ｈｅｌｌｏ！」という所定の音声コマンドを認識する。 Thereafter, the recognition unit 110 recognizes a specific voice command (S101). For example, the recognition unit 110 recognizes a predetermined voice command “Hello!”.

起動部１１１は、認識部１１０が所定の音声コマンドを認識したことに応答して、検出部１１２が所定のサウンドを検出してから所定の期間経過したか否かを判定する（Ｓ１０２）。 In response to the recognition unit 110 recognizing a predetermined voice command, the activation unit 111 determines whether or not a predetermined period has elapsed since the detection unit 112 detected a predetermined sound (S102).

起動部１１１は、所定の期間経過していた場合（Ｓ１０２のＹＥＳ）、情報処理装置１０を起動する（Ｓ１０３）。具体的には、起動部１１１は、スリープ状態の情報処理装置１０を、アクティブ状態に遷移させる。一方、起動部１１１は、所定の期間経過していない場合（Ｓ１０２のＮＯ）、情報処理装置１０の起動処理を停止する（Ｓ１０４）。具体的には、起動部１１１は、スリープ状態の情報処理装置１０を、アクティブ状態に遷移させる処理を実行しない。 When the predetermined period has elapsed (YES in S102), the activation unit 111 activates the information processing apparatus 10 (S103). Specifically, the activation unit 111 causes the information processing apparatus 10 in the sleep state to transition to the active state. On the other hand, when the predetermined period has not elapsed (NO in S102), the activation unit 111 stops the activation process of the information processing apparatus 10 (S104). Specifically, the activation unit 111 does not execute the process of causing the information processing apparatus 10 in the sleep state to transition to the active state.
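The flow of steps S100 to S104 can be sketched as a small event loop. This is a minimal illustration, not the implementation described in the patent: the function name `run_flow`, the event labels `"sound"` and `"wake"`, and the timestamps in seconds are all assumptions introduced here. The sketch also assumes that a voice command recognized before any predetermined sound has been detected activates the apparatus normally.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical event format: (timestamp in seconds, "sound" or "wake").
Event = Tuple[float, str]


def run_flow(events: Iterable[Event], dead_period_s: float) -> List[Tuple[float, str]]:
    """Return the decision taken for each recognized voice command."""
    sound_at: Optional[float] = None  # time of the last detected predetermined sound
    decisions: List[Tuple[float, str]] = []
    for t, kind in events:
        if kind == "sound":
            # S100: the detection unit detects the predetermined sound.
            sound_at = t
        elif kind == "wake":
            # S101: the recognition unit recognizes the specific voice command.
            # S102: has the predetermined period elapsed since the sound?
            elapsed = sound_at is None or (t - sound_at) >= dead_period_s
            # S103 (activate) on YES, S104 (stop the activation process) on NO.
            decisions.append((t, "activate" if elapsed else "suppress"))
    return decisions


decisions = run_flow(
    [(0.0, "sound"), (10.0, "wake"), (45.0, "wake")],
    dead_period_s=30.0,
)
```

With a 30-second period, the command at 10 seconds is suppressed and the command at 45 seconds activates the apparatus.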

As described above, in the first embodiment of the present invention, when the information processing apparatus 10 detects the predetermined sound, it stops its own activation process for the predetermined period even if it subsequently recognizes the specific voice command. As a result, the information processing apparatus 10 can be prevented from being erroneously activated by the specific voice command emitted from a television or the like. In addition, because the information processing apparatus 10 remains in the sleep state, it no longer reacts to the various sounds emitted from the television or the like, which prevents processes the user did not intend from being executed.

In this way, the information processing apparatus 10 can reduce its own erroneous activation by detecting the predetermined sound, and detecting the predetermined sound can be realized at lower cost than identifying the speaker with speaker recognition technology. The information processing apparatus 10 according to the first embodiment of the present invention can therefore reduce, at low cost, erroneous activation of the speaker caused by the specific voice command from a television or the like. Furthermore, the information processing apparatus 10 according to the first embodiment can decide whether to activate based on the presence or absence of the predetermined sound, and users who operate the information processing apparatus 10 do not need to be registered in advance. A person other than a registrant, such as a visitor, can therefore operate the information processing apparatus 10, which also improves convenience.

(Modification 1)
Modification 1 is a form in which, even when the information processing apparatus 10 has detected the predetermined sound, the information processing apparatus 10 is activated in response to recognizing the specific voice command issued by a user registered in advance.

変形例１における情報処理装置１０の記憶部１０５は、ユーザの音声データを予め記憶する。ユーザの音声データは、例えば、ユーザから、特定の音声コマンドなどを含む所定のフレーズを予め入力させることにより、記憶することができる。所定のフレーズは、例えば、「Ｈｅｌｌｏ！」や「Ｍｅｓｓａｇｅ」など複数種類の語句であり、情報処理装置１０は、ユーザから予め入力された音声に基づいて、ユーザの音声データを作成する。 The storage unit 105 of the information processing apparatus 10 in Modification 1 stores user voice data in advance. The user's voice data can be stored, for example, by inputting a predetermined phrase including a specific voice command from the user in advance. The predetermined phrase is, for example, a plurality of types of phrases such as “Hello!” And “Message”, and the information processing apparatus 10 creates the user's voice data based on the voice input in advance by the user.

Based on the user's voice data stored in the storage unit 105, the recognition unit 110 of the control unit 101 determines whether the recognized specific voice command was issued by a user registered in advance. This determination can be realized, for example, by comparing characteristic portions of the voice. Determining whether the specific voice command was issued by a pre-registered user can also be realized at lower cost than, for example, using a speaker identification technique to distinguish a voice emitted from a television or the like from a live human voice.

Even when the detection unit 112 has detected the predetermined sound and the predetermined period from that detection has not yet elapsed, the activation unit 111 activates the information processing apparatus 10 if the recognition unit 110 recognizes the specific voice command issued by a user registered in advance. That is, when the recognition unit 110 recognizes the specific voice command issued by a pre-registered user, the activation unit 111 causes the information processing apparatus 10 in the sleep state to transition to the active state.
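The decision rule of Modification 1 can be illustrated as follows. This is a sketch under stated assumptions: the function `should_activate` and its parameters are hypothetical, and the flag `from_registered_user` stands in for the result of comparing the recognized command against the stored voice data, which is not modeled here.

```python
from typing import Optional


def should_activate(
    now: float,
    sound_at: Optional[float],
    dead_period_s: float,
    from_registered_user: bool,
) -> bool:
    """Decide whether a recognized voice command activates the apparatus.

    `from_registered_user` is assumed to come from an upstream comparison
    with the registered user's voice data (hypothetical, not shown).
    """
    in_dead_period = sound_at is not None and (now - sound_at) < dead_period_s
    # A registered user overrides the dead period; anyone else must wait it out.
    return from_registered_user or not in_dead_period
```

For example, a command arriving 10 seconds after the predetermined sound with a 30-second period is ignored when it comes from the television, but accepted when it matches the registered user.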

図６は、本発明の第１の実施形態における情報処理装置１０の他の状態を説明するための図である。図６において、ユーザ５０は予め登録されたユーザであり、情報処理装置１０の記憶部１０５には、ユーザ５０の音声データが予め記憶されている。 FIG. 6 is a diagram for explaining another state of the information processing apparatus 10 according to the first embodiment of the present invention. In FIG. 6, a user 50 is a registered user, and the voice data of the user 50 is stored in advance in the storage unit 105 of the information processing apparatus 10.

図６において、テレビ２０は、所定のサウンド４０を含む広告放送を放送している。なお、広告放送の内容については、図１に例示する広告放送と同様である。図６において、実際に部屋などに設置されている情報処理装置１０は、テレビ２０から発せられる所定のサウンド４０を検出することになる。そうすると、情報処理装置１０は、所定のサウンドを検出したことに応答して、その後特定の音声コマンド「Ｈｅｌｌｏ！」を認識しても、所定の期間、当該情報処理装置１０の起動処理を停止する。すなわち、図６において、テレビ２０から特定の音声コマンドが発せられても、情報処理装置１０は、スリープ状態のままとなり、アクティブ状態に遷移しない。 In FIG. 6, the television 20 broadcasts an advertisement broadcast including a predetermined sound 40. The content of the advertisement broadcast is the same as that of the advertisement broadcast illustrated in FIG. In FIG. 6, the information processing apparatus 10 actually installed in a room or the like detects a predetermined sound 40 emitted from the television 20. Then, in response to the detection of the predetermined sound, the information processing apparatus 10 stops the activation process of the information processing apparatus 10 for a predetermined period even after recognizing a specific voice command “Hello!”. . That is, in FIG. 6, even when a specific voice command is issued from the television 20, the information processing apparatus 10 remains in the sleep state and does not transition to the active state.

However, in FIG. 6, when the user 50 issues the specific voice command "Hello!", the information processing apparatus 10 is activated by identifying that the specific voice command "Hello!" came from the user 50. That is, the information processing apparatus 10 in the sleep state transitions to the active state (the "ON" state in FIG. 6) in response to the specific voice command "Hello!" from the user 50.

As described above, in Modification 1 of the first embodiment of the present invention, even when the information processing apparatus 10 has detected the predetermined sound and the predetermined period from that detection has not yet elapsed, the information processing apparatus 10 is activated in response to the specific voice command issued by a user registered in advance. The information processing apparatus 10 therefore does not become entirely unable to start during the predetermined period after detection of the predetermined sound, and a pre-registered user can still activate it. As a result, a pre-registered user can activate the information processing apparatus 10 at any time, which improves convenience.

(Modification 2)
Modification 2 is a form in which the information processing apparatus 10 detects, in addition to the predetermined sound, the direction from which the predetermined sound was emitted, and is activated when it recognizes the specific voice command issued from any other direction.

The detection unit 112 of the information processing apparatus 10 in Modification 2 detects, in addition to the predetermined sound, the direction from which the predetermined sound was emitted. That is, it detects the direction of the sound source of the predetermined sound. The direction of the sound source can be detected using, for example, a method based on detecting the time difference of a sound signal, a method of scanning a directional beam (beamforming), or a method of obtaining the direction as a spatial frequency. The process by which the detection unit 112 detects the direction from which the predetermined sound was emitted is the same as the process of the information processing apparatus 10 in the second embodiment of the present invention described below.
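As one illustration of the time-difference approach, the arrival angle for a pair of microphones can be estimated from the delay between them using the standard far-field relation sin θ = c·Δt / d. The sketch below is not taken from the patent: the two-microphone arrangement, the function name `arrival_angle_deg`, and the speed-of-sound constant are assumptions made here for illustration.

```python
import math

# Approximate speed of sound in air at about 20 degrees Celsius (assumption).
SPEED_OF_SOUND_M_S = 343.0


def arrival_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field angle of arrival, measured from broadside of a two-microphone pair.

    delay_s is the arrival-time difference between the two microphones and
    mic_spacing_m is the distance between them.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to the valid asin domain in case measurement noise pushes it outside.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A zero delay corresponds to a source directly in front of the pair, and the largest possible delay (spacing divided by the speed of sound) corresponds to a source in line with the two microphones.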

The activation unit 111 does not perform the activation process of the information processing apparatus 10 even if the specific voice command is received from the direction, detected by the detection unit 112, from which the predetermined sound was emitted (that is, even if the specific voice command from that direction is recognized). In other words, the activation unit 111 sets the direction from which the predetermined sound was emitted as the dead direction, and does not perform the activation process of the information processing apparatus 10 even if the specific voice command is received from the dead direction.

所定のサウンドは、テレビ等において放送される広告放送に含まれるものであるところ、所定のサウンドが発せられる方向は、テレビ等が設置されている方向になる。そして、テレビ等が設置されている方向から発せられる特定の音声コマンドは、テレビ等から発せられたものである可能性が高い。そこで、変形例２において、起動部１１１は、テレビ等が設置されている方向を不感方向として設定し、当該方向から発せられた特定の音声コマンドに対しては情報処理装置１０を起動しない。 The predetermined sound is included in the advertisement broadcast broadcast on the television or the like, and the direction in which the predetermined sound is emitted is the direction in which the television or the like is installed. A specific voice command issued from the direction in which the television or the like is installed is highly likely to be issued from the television or the like. Therefore, in the second modification, the activation unit 111 sets the direction in which the television or the like is installed as the insensitive direction, and does not activate the information processing apparatus 10 for a specific voice command issued from the direction.

On the other hand, when the specific voice command is received from a direction other than the one, detected by the detection unit 112, from which the predetermined sound was emitted (that is, when the specific voice command from another direction is recognized), the activation unit 111 executes the activation process of the information processing apparatus 10. That is, the activation unit 111 activates the information processing apparatus 10 in response to the specific voice command issued from a direction other than the dead direction.
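The dead-direction check can be sketched as a comparison of two bearings. The description does not say how close two directions must be to count as the same, so the angular tolerance below is an assumption, as are the function names and the convention of expressing directions as angles in degrees. The sketch covers only the direction test and leaves the timing of the predetermined period aside.

```python
from typing import Optional


def angular_difference_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def should_activate_by_direction(
    command_dir_deg: float,
    dead_dir_deg: Optional[float],
    tolerance_deg: float = 15.0,  # assumed margin, not specified in the description
) -> bool:
    """Activate unless the command arrives from (roughly) the dead direction."""
    if dead_dir_deg is None:
        # No predetermined sound detected yet, so no dead direction is set.
        return True
    return angular_difference_deg(command_dir_deg, dead_dir_deg) > tolerance_deg
```

With the television at a bearing of 90 degrees, a command from 90 degrees is ignored, while a command from a user standing at 200 degrees activates the apparatus.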

上記のように、本発明の第１の実施形態の変形例２において、情報処理装置１０は、テレビ等が設置されている方向から発せられた特定の音声コマンドに対して情報処理装置１０を起動しないことにより、テレビ等から発せられる特定の音声コマンドによって、当該情報処理装置１０が誤起動されることを防止することができる。また、情報処理装置１０は、テレビ等が設置されている方向以外から発せられた特定の音声コマンドに対しては情報処理装置１０を起動するため、所定のサウンドの検出から所定の期間内であっても、全く起動できなくなるわけではなく、利便性を向上させることができる。 As described above, in the second modification of the first embodiment of the present invention, the information processing apparatus 10 activates the information processing apparatus 10 in response to a specific voice command issued from the direction in which the television or the like is installed. By not doing so, it is possible to prevent the information processing apparatus 10 from being erroneously activated by a specific voice command issued from a television or the like. In addition, the information processing apparatus 10 activates the information processing apparatus 10 in response to a specific voice command issued from a direction other than the direction in which the television or the like is installed, and therefore, within a predetermined period from detection of a predetermined sound. However, it does not become impossible to start up at all, and convenience can be improved.

(Modification 3)
Modification 3 is a form in which a plurality of types of predetermined sounds exist and, for each of them, a predetermined period during which the activation process of the information processing apparatus 10 is stopped is set.

The predetermined sound is, for example, a predetermined sound logo contained in an advertisement broadcast on a television or the like, and the predetermined period is, for example, the length of the advertisement broadcast. The length of an advertisement broadcast varies depending on the time slot in which it is broadcast (for example, morning, noon, or night) and the medium that broadcasts it (for example, television, radio, or video streaming). For example, an advertisement broadcast at noon or late at night may be longer than one broadcast in the early morning or evening, and an advertisement in video streaming may be longer than one on television or radio. The length of an advertisement broadcast may also vary for other reasons. The predetermined period therefore needs to be changed according to the length of the advertisement broadcast. Otherwise, the predetermined period may be shorter than the advertisement broadcast. In that case the predetermined period may end while the advertisement broadcast is still continuing, and the information processing apparatus 10 may be erroneously activated by the specific voice command broadcast in the remainder of the advertisement.

Therefore, in Modification 3 of the first embodiment of the present invention, a predetermined period during which the activation process of the information processing apparatus 10 is stopped is set for each of the plurality of types of predetermined sounds. The predetermined sound to be included in an advertisement broadcast is then chosen based on the length of that advertisement broadcast. Specifically, the company that sells or otherwise provides the information processing apparatus 10 includes in the advertisement broadcast a predetermined sound whose associated predetermined period is at least as long as the advertisement broadcast. The activation unit 111 of the information processing apparatus 10 can thereby stop the activation process of the information processing apparatus 10 for the predetermined period corresponding to that predetermined sound, that is, for a period at least as long as the advertisement.

The storage unit 105 stores information that associates each of the plurality of types of predetermined sounds with a predetermined period. For example, the storage unit 105 stores information associating one predetermined sound with a predetermined period of 15 seconds and another predetermined sound with a predetermined period of 30 seconds. Note that 15 seconds and 30 seconds are merely examples, and the predetermined period (dead period) may be of any length.

The activation unit 111 refers to the storage unit 105 to identify the predetermined period corresponding to the predetermined sound detected by the detection unit 112. The activation unit 111 then stops the activation process of the information processing apparatus 10 for the identified predetermined period, even if the recognition unit 110 recognizes the specific voice command.
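The association held in the storage unit 105 and the lookup performed by the activation unit 111 can be sketched as a simple mapping. The identifiers `sound_logo_a` and `sound_logo_b`, the table name, and the function names are hypothetical. The 15-second and 30-second values follow the examples given in the description.

```python
# Hypothetical table: predetermined sound identifier -> dead period in seconds.
DEAD_PERIOD_BY_SOUND_S = {
    "sound_logo_a": 15.0,
    "sound_logo_b": 30.0,
}


def dead_period_for(sound_id: str) -> float:
    """Look up the dead period associated with a detected predetermined sound."""
    return DEAD_PERIOD_BY_SOUND_S[sound_id]


def is_suppressed(sound_id: str, detected_at: float, now: float) -> bool:
    """True while the dead period for the detected sound is still running."""
    return (now - detected_at) < dead_period_for(sound_id)
```

A voice command arriving 20 seconds after detection would then be accepted if the 15-second sound was detected, and ignored if the 30-second sound was detected.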

As described above, in Modification 3 of the first embodiment of the present invention, a predetermined period during which the activation process is stopped is set for each of the plurality of types of predetermined sounds. Therefore, by choosing the predetermined sound to be included in an advertisement broadcast based on, for example, the length of that advertisement broadcast, the activation process of the information processing apparatus 10 can be stopped for the predetermined period corresponding to that predetermined sound, that is, for a period at least as long as the advertisement. As a result, the predetermined period can be prevented from being shorter than the advertisement broadcast, and erroneous activation of the information processing apparatus 10 by the specific voice command emitted from a television or the like can be reduced.

（変形例４） (Modification 4)
変形例４は、所定のサウンド（例えば、所定のサウンドロゴ）を検出するタイミングと、当該所定のサウンドを検出したことに応答して情報処理装置１０の起動の停止を開始するタイミングとが、互いに異なる（連続していない）場合の形態である。 Modification 4 is a form in which the timing at which a predetermined sound (for example, a predetermined sound logo) is detected and the timing at which suspension of activation of the information processing apparatus 10 begins in response to that detection differ from each other (are not contiguous).

情報処理装置１０の不感期間（すなわち、所定の期間）が長いと、当該情報処理装置１０の起動処理が停止される時間が長くなってしまい、利便性が低下する可能性がある。一方、所定のサウンドは、例えば、所定のサウンドロゴであり、情報処理装置１０を販売等する企業が、当該情報処理装置１０の広告放送（ＣＭ）などに含ませるものである。そのため、情報処理装置１０を販売等する企業は、広告放送（ＣＭ）の内容を把握している可能性が高い。すなわち、情報処理装置１０を販売等する企業は、当該広告放送（ＣＭ）内のいずれのタイミングで、特定の音声コマンドが発せられるのか把握できる。この場合、情報処理装置１０を販売等する企業は、情報処理装置１０の不感期間（所定の期間）を、広告放送（ＣＭ）において特定の音声コマンドが発せられるタイミングに合わせることで、不感期間を短くすることができる。その結果、情報処理装置１０の利便性を向上させることができる。 If the dead period (that is, the predetermined period) of the information processing apparatus 10 is long, the activation process of the information processing apparatus 10 remains suspended for a long time, which may reduce convenience. On the other hand, the predetermined sound is, for example, a predetermined sound logo that a company selling the information processing apparatus 10 includes in an advertisement broadcast (commercial, CM) for the apparatus. Such a company is therefore likely to know the content of the advertisement broadcast (CM); that is, it can know at which point in the advertisement broadcast (CM) the specific voice command is uttered. In this case, the company can shorten the dead period by aligning the dead period (predetermined period) of the information processing apparatus 10 with the timing at which the specific voice command is uttered in the advertisement broadcast (CM). As a result, the convenience of the information processing apparatus 10 can be improved.

具体的には、複数種類の所定のサウンドの各々に対して、情報処理装置１０の起動処理を停止する所定の期間（不感期間）の開始のタイミングが設定される。そして、情報処理装置１０を販売等する企業は、複数種類の所定のサウンドのうち、広告放送（ＣＭ）において特定の音声コマンドが発せられるタイミングに合わせて、当該情報処理装置１０の不感期間が開始される所定のサウンドを選択し、当該選択した所定のサウンドを当該広告放送に含める。なお、この場合において、所定の期間（不感期間）は、特定の音声のコマンドが発せられる長さに合わせて設定してもよい。その場合、広告放送（ＣＭ）において特定の音声コマンドが発せられる時間帯だけを、所定の期間（不感期間）とすることが可能になる。 Specifically, the start timing of the predetermined period (dead period) during which the activation process of the information processing apparatus 10 is suspended is set for each of a plurality of types of predetermined sounds. A company selling the information processing apparatus 10 then selects, from among the plurality of types of predetermined sounds, the one whose dead period starts at the timing at which the specific voice command is uttered in the advertisement broadcast (CM), and includes the selected sound in the advertisement broadcast. In this case, the predetermined period (dead period) may also be set to match the length of time over which the specific voice command is uttered. Then only the time span in which the specific voice command is uttered in the advertisement broadcast (CM) becomes the predetermined period (dead period).

また、テレビ等において、同じ企業からの広告放送（ＣＭ）が、連続して放送される場合がある。例えば、情報処理装置１０を販売等する企業が、当該情報処理装置１０についての広告放送（ＣＭ）を複数パターン作成しており、あるパターンの広告放送（ＣＭ）に続いて、別のパターンの広告放送（ＣＭ）を放送することにより、広告効果を高める場合が想定される。このような場合には、初めに放送された広告放送（ＣＭ）において特定の音声コマンドが発せられるタイミングから、連続して（又は、他企業の広告放送（ＣＭ）を挟んで）放送される別のパターンの広告放送（ＣＭ）において特定の音声コマンドが発せられるタイミングまで、情報処理装置１０の不感期間（所定の期間）を継続すべき場合が想定される。 In addition, advertisement broadcasts (CMs) from the same company may be broadcast in succession on a television or the like. For example, a company selling the information processing apparatus 10 may have produced several versions of an advertisement broadcast (CM) for the apparatus and may air one version followed by another to increase the advertising effect. In such a case, it may be necessary to continue the dead period (predetermined period) of the information processing apparatus 10 from the timing at which the specific voice command is uttered in the first advertisement broadcast (CM) until the timing at which the specific voice command is uttered in the other version of the advertisement broadcast (CM), which is aired immediately afterward (or after an intervening advertisement broadcast (CM) from another company).

この場合、情報処理装置１０を販売等する企業は、情報処理装置１０の不感期間（所定の期間）を、最初の広告放送（ＣＭ）において特定の音声コマンドが発せられるタイミング（時間）から、別パターンの広告放送（ＣＭ）において特定の音声コマンドが発せられる時点まで継続させる。これによって、情報処理装置１０は、最初の広告放送（ＣＭ）において検出した所定のサウンドによって、その後放送される別パターンの広告放送（ＣＭ）において発せられる特定の音声コマンドに対しても不感となり、情報処理装置１０の起動処理を停止することが可能となる。 In this case, the company selling the information processing apparatus 10 makes the dead period (predetermined period) of the information processing apparatus 10 continue from the timing (time) at which the specific voice command is uttered in the first advertisement broadcast (CM) until the point at which the specific voice command is uttered in the other version of the advertisement broadcast (CM). As a result, the predetermined sound detected in the first advertisement broadcast (CM) also makes the information processing apparatus 10 insensitive to the specific voice command uttered in the subsequently aired version, so that the activation process of the information processing apparatus 10 can be suspended.

上記の例のように、所定のサウンド（例えば、所定のサウンドロゴ）を検出するタイミングと、当該所定のサウンドを検出したことに応答して情報処理装置１０の起動を停止するタイミングとが、互いに異なることが望ましい場合がある。なお、上記の例はあくまでも例示であって、所定のサウンドを検出するタイミングと、情報処理装置１０の起動を停止するタイミングとを互いに異ならせることが望ましい場合は、様々なケースが想定される。 As in the above examples, it may be desirable for the timing at which a predetermined sound (for example, a predetermined sound logo) is detected and the timing at which activation of the information processing apparatus 10 is suspended in response to that detection to differ from each other. Note that the above examples are merely illustrative, and various other cases are conceivable in which it is desirable to make these two timings differ.

記憶部１０５は、複数種類の所定のサウンドの各々と、所定の期間（不感期間）の開始のタイミングとを対応付けた情報を記憶する。例えば、記憶部１０５は、一の所定のサウンドに対して、所定の期間（不感期間）の開始のタイミングとして、当該一の所定のサウンドを検出してから１０秒後である旨の情報が記憶される。すなわち、一の所定のサウンドは、当該一の所定のサウンドが検出されてから、１０秒経過するまでは、不感期間（所定の期間）とならない。なお、１０秒後はあくまでも例示であって、所定の期間（不感期間）の開始のタイミングは、所定のサウンドを検出してから何秒後（何分後など単位は任意）であってもよい。 The storage unit 105 stores information that associates each of a plurality of types of predetermined sounds with the start timing of the predetermined period (dead period). For example, the storage unit 105 stores, for one predetermined sound, information indicating that the predetermined period (dead period) starts 10 seconds after that sound is detected. That is, for this predetermined sound, the dead period (predetermined period) does not begin until 10 seconds have elapsed since the sound was detected. Note that 10 seconds is merely an example, and the start timing of the predetermined period (dead period) may be any number of seconds after the predetermined sound is detected (the unit is arbitrary, for example minutes).

また、記憶部１０５は、変形例３と同様に、複数種類の所定のサウンドの各々に対して、情報処理装置１０の起動処理を停止する所定の期間を設定してもよい。例えば、記憶部１０５は、一の所定のサウンドに対しては、１５秒の所定の期間を対応付けた情報を記憶する。その結果、一の所定のサウンドを検出した情報処理装置１０は、当該一の所定のサウンドが検出されてから１０秒後に所定の期間（不感期間）が開始され、その後、当該所定の期間（不感期間）が１５秒間継続する。 As in Modification 3, the storage unit 105 may also hold, for each of a plurality of types of predetermined sounds, the predetermined period during which the activation process of the information processing apparatus 10 is suspended. For example, the storage unit 105 stores information associating a predetermined period of 15 seconds with one predetermined sound. As a result, in the information processing apparatus 10 that has detected that sound, the predetermined period (dead period) starts 10 seconds after the sound is detected and then continues for 15 seconds.

なお、１５秒後はあくまでも例示であって、所定の期間（不感期間）は、どのような長さであってもよい。また、所定の期間は、例えば、広告放送（ＣＭ）において特定の音声コマンドが発せられる長さに設定されてもよい。また、所定の期間は、例えば、広告放送（ＣＭ）において特定の音声コマンドが複数回発せられる場合、最後に発せられる特定の音声コマンドが終了するタイミングを含む長さに設定されてもよい。 Note that 15 seconds is merely an example, and the predetermined period (dead period) may be of any length. The predetermined period may also be set, for example, to the length of time over which the specific voice command is uttered in the advertisement broadcast (CM). Further, when the specific voice command is uttered multiple times in the advertisement broadcast (CM), the predetermined period may be set to a length that includes the point at which the last utterance of the specific voice command ends.

なお、記憶部１０５は、複数種類の所定のサウンドの各々に対して、所定の期間（不感期間）の開始のタイミングに加えて、終了のタイミングが設定されていてもよい。例えば、記憶部１０５は、他の所定のサウンドに対して、所定の期間（不感期間）の開始が当該他の所定のサウンドの検出から１０秒後であり、当該所定の期間（不感期間）の終了が当該他の所定のサウンドの検出から３０秒後であることを示す情報を記憶する。この場合、他の所定のサウンドを検出した情報処理装置１０は、当該他の所定のサウンドの検出から１０秒後に所定の期間（不感期間）を開始し、当該他の所定のサウンドの検出から３０秒後に所定の期間（不感期間）を終了する。 Note that, in the storage unit 105, an end timing may be set in addition to the start timing of the predetermined period (dead period) for each of a plurality of types of predetermined sounds. For example, the storage unit 105 stores, for another predetermined sound, information indicating that the predetermined period (dead period) starts 10 seconds after that sound is detected and ends 30 seconds after that sound is detected. In this case, the information processing apparatus 10 that has detected this other predetermined sound starts the predetermined period (dead period) 10 seconds after the detection and ends it 30 seconds after the detection.

なお、３０秒後はあくまでも例示であって、所定の期間（不感期間）の終了のタイミングは、所定のサウンドを検出してから何秒後（何分後など単位は任意）であってもよい。また、所定の期間（不感期間）の終了のタイミングは、広告放送（ＣＭ）において特定の音声コマンドの発生が終了するタイミングに設定されてもよい。また、所定の期間（不感期間）の終了のタイミングは、例えば、広告放送（ＣＭ）において特定の音声コマンドが複数回発せられる場合、最後に発せられる特定の音声コマンドが終了するタイミングに設定されてもよい。 Note that 30 seconds is merely an example, and the end timing of the predetermined period (dead period) may be any number of seconds after the predetermined sound is detected (the unit is arbitrary, for example minutes). The end timing of the predetermined period (dead period) may also be set to the point at which utterance of the specific voice command ends in the advertisement broadcast (CM). Further, when the specific voice command is uttered multiple times in the advertisement broadcast (CM), the end timing of the predetermined period (dead period) may be set to the point at which the last utterance of the specific voice command ends.

起動部１１１は、検出部１１２が検出した所定のサウンドに対応する所定の期間（不感期間）の開始のタイミングを、記憶部１０５を参照して特定する。そして、起動部１１１は、所定の期間（不感期間）の開始のタイミング以降、認識部１１０が特定の音声コマンドを認識しても、情報処理装置１０の起動処理を停止する。 The activation unit 111 refers to the storage unit 105 to identify the start timing of the predetermined period (dead period) corresponding to the predetermined sound detected by the detection unit 112. From that start timing onward, the activation unit 111 suspends the activation process of the information processing apparatus 10 even if the recognition unit 110 recognizes the specific voice command.

また、起動部１１１は、検出部１１２が検出した所定のサウンドに対応する所定の期間を、記憶部１０５を参照して特定する。そして、起動部１１１は、所定の期間（不感期間）の開始のタイミング以降であって、所定の期間内において、認識部１１０が特定の音声コマンドを認識しても、情報処理装置１０の起動処理を停止する。 The activation unit 111 also refers to the storage unit 105 to identify the predetermined period corresponding to the predetermined sound detected by the detection unit 112. After the start timing of the predetermined period (dead period) and within that period, the activation unit 111 suspends the activation process of the information processing apparatus 10 even if the recognition unit 110 recognizes the specific voice command.

なお、起動部１１１は、検出部１１２が検出した所定のサウンドに対応する所定の期間（不感期間）の終了のタイミングを、記憶部１０５を参照して特定してもよい。そして、起動部１１１は、所定の期間（不感期間）の開始のタイミング以降、所定の期間（不感期間）の終了のタイミングまで、認識部１１０が特定の音声コマンドを認識しても、情報処理装置１０の起動処理を停止する。 Note that the activation unit 111 may refer to the storage unit 105 to identify the end timing of the predetermined period (dead period) corresponding to the predetermined sound detected by the detection unit 112. In that case, from the start timing of the predetermined period (dead period) until its end timing, the activation unit 111 suspends the activation process of the information processing apparatus 10 even if the recognition unit 110 recognizes the specific voice command.
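The three ways of specifying the dead period in Modification 4 (start timing only, start timing plus duration, start timing plus end timing) might be captured as in the following Python sketch. The `DeadWindowRule` type, the `RULES` table and the sound identifiers are hypothetical; treating a rule with only a start timing as open-ended is an assumption based on the wording "from the start timing onward".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DeadWindowRule:
    """Hypothetical per-sound record in storage unit 105 (all times in seconds)."""
    start_offset: float                  # delay from detection to start of dead period
    duration: Optional[float] = None     # length of the dead period, if stored
    end_offset: Optional[float] = None   # delay from detection to end, if stored

    def window(self, detected_at: float) -> Tuple[float, float]:
        """Return (start, end) of the dead period for a sound detected at `detected_at`."""
        start = detected_at + self.start_offset
        if self.end_offset is not None:
            end = detected_at + self.end_offset
        elif self.duration is not None:
            end = start + self.duration
        else:
            end = float("inf")  # assumption: no end stored means open-ended
        return start, end


# Example values from the text: start after 10 s, then 15 s long, or end after 30 s.
RULES: Dict[str, DeadWindowRule] = {
    "sound_logo_a": DeadWindowRule(start_offset=10.0, duration=15.0),
    "sound_logo_b": DeadWindowRule(start_offset=10.0, end_offset=30.0),
}


def is_suppressed(sound_id: str, detected_at: float, command_at: float,
                  rules: Dict[str, DeadWindowRule] = RULES) -> bool:
    """True if a wake word recognized at `command_at` falls inside the dead period."""
    rule = rules.get(sound_id)
    if rule is None:
        return False
    start, end = rule.window(detected_at)
    return start <= command_at < end
```

For `sound_logo_a` detected at time 0, the dead period under these assumptions runs from 10 s to 25 s, so a wake word at 5 s still activates the device while one at 12 s does not.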

上記のように、本発明の第１の実施形態の変形例４において、複数種類の所定のサウンドの各々に対して、所定のサウンドを検出したことに応答して情報処理装置１０の起動の停止を開始するタイミングが設定される。所定のサウンドを検出するタイミングと、当該所定のサウンドを検出したことに応答して情報処理装置１０の起動を停止するタイミングとを互いに異ならせることができる。その結果、広告放送（ＣＭ）において特定の音声コマンドが発せられるタイミング（時間）に合わせて、情報処理装置１０の起動処理を停止することによって、不感期間（所定の期間）を短くすることができる。また、広告放送（ＣＭ）の放送の態様に応じて、異なる広告放送（ＣＭ）をまたいで不感期間（所定の期間）を設定することもできる。このように、変形例４では、情報処理装置１０の起動処理を停止する所定の期間（不感期間）の開始のタイミングや終了のタイミングを自由に設定でき、利便性を向上させることができる。 As described above, in Modification 4 of the first embodiment of the present invention, the timing at which suspension of activation of the information processing apparatus 10 begins in response to detection of a predetermined sound is set for each of a plurality of types of predetermined sounds. This allows the timing at which the predetermined sound is detected and the timing at which activation of the information processing apparatus 10 is suspended in response to that detection to differ from each other. As a result, the dead period (predetermined period) can be shortened by suspending the activation process of the information processing apparatus 10 in step with the timing (time) at which the specific voice command is uttered in the advertisement broadcast (CM). In addition, depending on how the advertisement broadcasts (CMs) are aired, a dead period (predetermined period) can be set that spans different advertisement broadcasts (CMs). In this way, in Modification 4, the start timing and end timing of the predetermined period (dead period) during which the activation process of the information processing apparatus 10 is suspended can be set freely, which improves convenience.

＜第２の実施形態＞ <Second Embodiment>
本発明の第２の実施形態について、図面を参照して説明する。 A second embodiment of the present invention will be described with reference to the drawings.

本発明の第２の実施形態では、テレビ等において放送される広告放送（ＣＭ）などに、所定のサウンドを含ませ、情報処理装置１０が、所定のサウンドとともに、当該所定のサウンドが発せられた方向を検出する場合の実施形態である。そして、情報処理装置１０は、所定のサウンドが発せられた方向からの特定の音声コマンドを認識しても、当該情報処理装置の起動処理を実行しない。 The second embodiment of the present invention is an embodiment in which a predetermined sound is included in an advertisement broadcast (CM) or the like aired on a television or the like, and the information processing apparatus 10 detects both the predetermined sound and the direction from which it was emitted. Even if the information processing apparatus 10 recognizes a specific voice command coming from the direction in which the predetermined sound was emitted, it does not execute its activation process.

図７は、本発明の第２の実施形態における情報処理装置１０の状態を説明するための図である。図７において、テレビ２０は、所定のサウンド４０を含む広告放送を放送している。なお、広告放送の内容については、図１に例示する広告放送と同様である。 FIG. 7 is a diagram for explaining the state of the information processing apparatus 10 according to the second embodiment of the present invention. In FIG. 7, the television 20 is airing an advertisement broadcast that includes a predetermined sound 40. The content of the advertisement broadcast is the same as that of the advertisement broadcast illustrated in FIG. 1.

このような広告放送がテレビ２０により放送されると、実際に部屋などに設置されている情報処理装置１０は、所定のサウンド４０と、当該所定のサウンド４０が発せられた方向を検出する。図７の例では、所定のサウンドが発せられた方向として、方向６０を検出する。図２に示すように、所定のサウンド４０が発せられる方向は、ある程度の範囲（方向６０）として検出されてもよいし、ある一方向（すなわち、ある一点）として検出されてもよい。 When such an advertisement broadcast is broadcast by the television 20, the information processing apparatus 10 actually installed in a room or the like detects a predetermined sound 40 and a direction in which the predetermined sound 40 is emitted. In the example of FIG. 7, the direction 60 is detected as the direction in which the predetermined sound is emitted. As shown in FIG. 2, the direction in which the predetermined sound 40 is emitted may be detected as a certain range (direction 60), or may be detected as a certain direction (that is, a certain point).

この場合において、情報処理装置１０は、所定のサウンドが発せられた方向からの特定の音声コマンドを認識しても、当該情報処理装置１０の起動処理を停止する。すなわち、情報処理装置１０は、所定のサウンドが発せられた方向を、特定の音声コマンドを認識しても当該情報処理装置１０を起動しない不感方向として設定し、当該不感方向からの特定の音声コマンドを認識しても情報処理装置１０を起動しない。例えば、図２において、所定のサウンドが発せられた方向、すなわちテレビ２０の方向から特定の音声コマンドが発せられても、情報処理装置１０は、スリープ状態（図７の「ＯＦＦ」の状態）のままとなり、アクティブ状態に遷移しない。 In this case, even if the information processing apparatus 10 recognizes a specific voice command coming from the direction in which the predetermined sound was emitted, it suspends its activation process. That is, the information processing apparatus 10 sets the direction in which the predetermined sound was emitted as a dead direction, from which recognition of a specific voice command does not activate the apparatus, and does not activate itself even when it recognizes a specific voice command from that dead direction. For example, in FIG. 2, even if a specific voice command is uttered from the direction in which the predetermined sound was emitted, that is, from the direction of the television 20, the information processing apparatus 10 remains in the sleep state (the "OFF" state in FIG. 7) and does not transition to the active state.

その結果、本発明の第１の実施形態における情報処理装置１０は、テレビ２０等から発せられる特定の音声コマンドを認識してもスリープ状態を維持するため、テレビ等から発せられる様々な音声に反応しなくなり、ユーザの意図しない処理が実行されることを防止できる。 As a result, the information processing apparatus 10 according to the first exemplary embodiment of the present invention maintains a sleep state even when a specific voice command issued from the television 20 or the like is recognized, and thus reacts to various voices emitted from the television or the like. Thus, it is possible to prevent a process unintended by the user from being executed.

（システム構成） (System configuration)
本発明の第２の実施形態における情報処理システムの構成例は、図３に示す本発明の第１の実施形態における情報処理システムの構成例と同様であるため、詳細な説明は省略する。 The configuration example of the information processing system in the second embodiment of the present invention is the same as the configuration example of the information processing system in the first embodiment of the present invention shown in FIG. 3, and a detailed description is therefore omitted.

（情報処理装置の構成例） (Configuration example of information processing apparatus)
本発明の第２の実施形態における情報処理装置の構成例は、図３に示す本発明の第１の実施形態における情報処理装置の構成例と同様である。 The configuration example of the information processing apparatus in the second embodiment of the present invention is the same as the configuration example of the information processing apparatus in the first embodiment of the present invention shown in FIG. 3.

本発明の第２の実施形態において、検出部１１２は、所定のサウンドに加えて、当該所定のサウンドが発せられた方向を検出する。すなわち、所定のサウンドの音源の方向を検出する。音源の方向の検出は、例えば、音信号の時間差検出に基づく方法や、指向性のビームを走査する方法（ビームフォーミング技術）、空間周波数として求める方法などを用いることができる。 In the second embodiment of the present invention, the detection unit 112 detects the direction in which the predetermined sound is emitted in addition to the predetermined sound. That is, the direction of the sound source of a predetermined sound is detected. The direction of the sound source can be detected using, for example, a method based on time difference detection of a sound signal, a method of scanning a directional beam (beam forming technique), a method of obtaining a spatial frequency, or the like.
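Of the direction-finding methods listed above, the time-difference approach is the simplest to illustrate. The following Python sketch uses the standard far-field relation for two microphones, sin θ = c·Δt / d; the two-microphone geometry, the function name and the speed-of-sound constant are assumptions and do not describe how the detection unit 112 is actually implemented.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value in air at about 20 degrees C


def arrival_angle_deg(delay_s: float, mic_spacing_m: float,
                      speed: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Estimate the arrival angle from the inter-microphone time difference.

    Assumes a distant source (plane wavefront) and two microphones separated by
    `mic_spacing_m`. The angle is measured from the direction perpendicular to
    the line joining the microphones; the sign follows the sign of `delay_s`.
    """
    ratio = speed * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A single pair of microphones resolves only one angle; estimating both a vertical range such as α/β and a horizontal range such as γ/δ would require additional microphones arranged along another axis.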

図８及び図９は、検出部１１２によって所定のサウンドが発せられた方向を検出する動作を説明するための図である。図８及び図９において、テレビ２０は、所定のサウンドを含む広告放送を放送している。 FIGS. 8 and 9 are diagrams for explaining the operation by which the detection unit 112 detects the direction in which the predetermined sound is emitted. In FIGS. 8 and 9, the television 20 is airing an advertisement broadcast that includes the predetermined sound.

図８に例示するように、検出部１１２は、所定のサウンドが発せられた方向について、情報処理装置１０のある一点１１を通る地面１２に水平な面１３を基準として、上方向にα［度］、下方向にβ［度］の範囲として検出される。図８の例では、検出部１１２は、上方向α［度］から下方向β［度］の範囲６０Ａを、所定のサウンドが発せられた方向として検出する。なお、所定のサウンドが発せられた方向は、面１３に対して上方向の角度のみで示されることもあれば、面１３に対して下方向の角度のみで示されることもある。 As illustrated in FIG. 8, the detection unit 112 detects the direction in which the predetermined sound is emitted as a range extending α [degrees] upward and β [degrees] downward from a plane 13 that passes through a point 11 of the information processing apparatus 10 and is parallel to the ground 12. In the example of FIG. 8, the detection unit 112 detects the range 60A, from α [degrees] upward to β [degrees] downward, as the direction in which the predetermined sound is emitted. Note that the direction in which the predetermined sound is emitted may be expressed only by an upward angle with respect to the plane 13, or only by a downward angle with respect to the plane 13.

また、図９に例示するように、検出部１１２は、所定のサウンドが発せられた方向について、情報処理装置のある一点１１を通る地面１２に垂直な面１４を基準として、右方向にγ［度］、左方向にδ［度］の範囲として検出される。図９の例では、検出部１１２は、右方向γ［度］から左方向δ［度］の範囲６０Ｂが、所定のサウンドが発せられた方向と検出する。なお、所定のサウンドが発せられた方向は、面１４に対して左方向の角度のみで示されることもあれば、面１４に対して右方向の角度のみで示されることもある。 In addition, as illustrated in FIG. 9, the detection unit 112 detects the direction in which the predetermined sound is emitted as a range extending γ [degrees] to the right and δ [degrees] to the left of a plane 14 that passes through the point 11 of the information processing apparatus and is perpendicular to the ground 12. In the example of FIG. 9, the detection unit 112 detects the range 60B, from γ [degrees] to the right to δ [degrees] to the left, as the direction in which the predetermined sound is emitted. Note that the direction in which the predetermined sound is emitted may be expressed only by a leftward angle with respect to the plane 14, or only by a rightward angle with respect to the plane 14.

上記のように、検出部１１２は、所定のサウンドが発せられる方向を、ある程度の範囲として検出可能である。なお、検出部１１２は、所定のサウンドが発せられる方向を、ある一方向（すなわち、ある一点）として検出してもよい。 As described above, the detection unit 112 can detect the direction in which a predetermined sound is emitted as a certain range. Note that the detection unit 112 may detect a direction in which a predetermined sound is emitted as a certain direction (that is, a certain point).

また、検出部１１２は、所定のサウンドを発した物体の位置を検出してもよい。図８の例において、検出部１１２は、例えば、所定のサウンドを発した物体の地面１２からの垂直方向の距離の範囲（図８の例では、高さｈ及び高さＨの範囲）を検出可能である。また、図９の例において、検出部１１２は、例えば、所定のサウンドを発した物体の地面１２に水平方向の距離の範囲（図９の例では、距離ｌ及び距離Ｌの範囲）を検出可能である。すなわち、検出部１１２は、所定のサウンドを発した物体の位置を、自装置（情報処理装置１０）からの距離として検出することができる。 The detection unit 112 may also detect the position of the object that emitted the predetermined sound. In the example of FIG. 8, the detection unit 112 can detect, for example, the range of the vertical distance of that object from the ground 12 (in the example of FIG. 8, the range between height h and height H). In the example of FIG. 9, the detection unit 112 can detect, for example, the range of the distance of that object in the direction parallel to the ground 12 (in the example of FIG. 9, the range between distance l and distance L). That is, the detection unit 112 can detect the position of the object that emitted the predetermined sound as a distance from its own apparatus (the information processing apparatus 10).

なお、検出部１１２は、所定のサウンドを発した物体の位置を、当該情報処理装置１０が設置された空間（例えば、部屋）における相対的な位置として検出してもよい。 Note that the detection unit 112 may detect the position of an object that emits a predetermined sound as a relative position in a space (for example, a room) in which the information processing apparatus 10 is installed.

起動部１１１は、検出部１１２が検出した所定のサウンドが発せられた方向から、特定の音声コマンドを受信しても（すなわち、所定のサウンドが発せられた方向からの特定の音声コマンドを認識しても）、情報処理装置１０の起動処理を行わない。すなわち、起動部１１１は、検出部１１２が検出した所定のサウンドが発せられた方向を、不感方向として設定し、当該不感方向から特定の音声コマンドを受信しても、情報処理装置１０の起動処理を行わない。 Even if a specific voice command is received from the direction, detected by the detection unit 112, in which the predetermined sound was emitted (that is, even if a specific voice command from that direction is recognized), the activation unit 111 does not perform the activation process of the information processing apparatus 10. That is, the activation unit 111 sets the direction detected by the detection unit 112 as a dead direction and does not perform the activation process of the information processing apparatus 10 even if a specific voice command is received from that dead direction.

具体的には、起動部１１１は、認識部１１０によって認識された所定の音声コマンドが発せられた方向が、検出部１１２によって検出された所定のサウンドが発せられた方向に合致する場合には、情報処理装置１０の起動処理を実行しない。なお、合致するとは、認識部１１０によって認識された所定の音声コマンドが発せられた方向が、検出部１１２によって検出された所定のサウンドが発せられた方向と一致する又は含まれる場合である。 Specifically, the activation unit 111, when the direction in which the predetermined voice command recognized by the recognition unit 110 is issued matches the direction in which the predetermined sound detected by the detection unit 112 is emitted, The activation process of the information processing apparatus 10 is not executed. Note that “match” means that the direction in which the predetermined voice command recognized by the recognition unit 110 is issued matches or is included in the direction in which the predetermined sound detected by the detection unit 112 is emitted.
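The "coincides with or is included in" test can be sketched in Python by treating every direction as a pair of angular ranges, with a single direction represented as a range of zero width. The `AngularRange` and `Direction` types and the sign conventions are assumptions for illustration, and the sketch ignores wrap-around at ±180 degrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AngularRange:
    """Closed range of angles in degrees; low == high models a single direction."""
    low_deg: float
    high_deg: float

    def contains(self, other: "AngularRange") -> bool:
        # True when `other` coincides with this range or lies entirely inside it.
        return self.low_deg <= other.low_deg and other.high_deg <= self.high_deg


@dataclass(frozen=True)
class Direction:
    """Hypothetical direction record: left(-)/right(+) and down(-)/up(+) ranges."""
    horizontal: AngularRange  # relative to plane 14 (perpendicular to the ground)
    vertical: AngularRange    # relative to plane 13 (parallel to the ground)


def matches(command: Direction, dead: Direction) -> bool:
    """True if the wake-word direction coincides with or lies inside the dead direction."""
    return (dead.horizontal.contains(command.horizontal)
            and dead.vertical.contains(command.vertical))
```

For instance, a dead direction spanning δ = 20 degrees to the left and γ = 30 degrees to the right would be written under this convention as `AngularRange(-20.0, 30.0)`, and a wake word arriving from 5 degrees to the right would match it.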

なお、起動部１１１は、一度、所定のサウンドが発せられた方向を検出すると、以後、その方向から特定の音声コマンドを受信しても（すなわち、以後、所定のサウンドが発せられた方向からの特定の音声コマンドを認識しても）、情報処理装置１０の起動処理を行わない。すなわち、実施形態１とは異なり、情報処理装置１０の起動処理を停止する期間は、所定の期間に限られず、継続して当該起動処理を停止することになる。スマートスピーカーなどの情報処理装置１０の設置位置や、テレビ等の設置位置は、固定される可能性が高い。そこで、一度所定のサウンドが発せられた方向を検出すると、継続して情報処理装置１０の起動処理を行わないようにすることで、情報処理装置１０が当該方向を検出する処理が頻繁に実行されることを防止し、当該情報処理装置１０の処理負荷を低減することが可能となる。 Note that once the direction in which the predetermined sound was emitted has been detected, the activation unit 111 thereafter does not perform the activation process of the information processing apparatus 10 even if a specific voice command is received from that direction (that is, even if a specific voice command from that direction is subsequently recognized). In other words, unlike in the first embodiment, the suspension of the activation process is not limited to a predetermined period but continues indefinitely. The installation positions of the information processing apparatus 10, such as a smart speaker, and of a television or the like are likely to be fixed. Therefore, by continuing to withhold the activation process once the direction of the predetermined sound has been detected, the information processing apparatus 10 avoids frequently executing the direction-detection process, which reduces its processing load.

所定のサウンドは、テレビ等において放送される広告放送に含まれるものであるところ、所定のサウンドが発せられる方向は、テレビ等が設置されている方向になる。そして、テレビ等が設置されている方向から発せられる特定の音声コマンドは、テレビ等から発せられたものである可能性が高い。そこで、起動部１１１は、テレビ等が設置されている方向を不感方向として設定し、当該方向から発せられた特定の音声コマンドに対しては情報処理装置１０を起動しない。 The predetermined sound is included in the advertisement broadcast broadcast on the television or the like, and the direction in which the predetermined sound is emitted is the direction in which the television or the like is installed. A specific voice command issued from the direction in which the television or the like is installed is highly likely to be issued from the television or the like. Therefore, the activation unit 111 sets the direction in which the television or the like is installed as the insensitive direction, and does not activate the information processing apparatus 10 for a specific voice command issued from the direction.

また、起動部１１１は、検出部１１２が所定のサウンドを発した物体の位置を検出する場合には、当該物体の位置から発せられた特定の音声コマンドを認識しても、情報処理装置１０の起動処理を行わない。一方、起動部１１１は、検出部１１２によって検出された物体の位置以外の位置から発せられた特定の音声コマンドを認識した場合には、情報処理装置１０を起動する。 In addition, when the detection unit 112 detects the position of the object that emitted the predetermined sound, the activation unit 111 does not perform the activation process of the information processing apparatus 10 even if a specific voice command uttered from that position is recognized. On the other hand, when a specific voice command uttered from a position other than the position of the object detected by the detection unit 112 is recognized, the activation unit 111 activates the information processing apparatus 10.

なお、本発明の第２の実施形態において、所定のサウンドは、例えば、広告放送に含まれる所定のサウンドロゴであってもよい。例えば、複数のサウンドロゴのうち、予め定められた所定のサウンドロゴが、所定のサウンドとして設定される。また、所定のサウンドは、サウンドロゴに限られず、所定のメロディーや効果音、曲、音声であってもよい。また、所定のサウンドは、人間が聞こえる必要はなく、情報処理装置１０が検出可能な音情報であれば、例えばモスキート音等の高周波など、どのようなものであってもよい。また、所定のサウンドは、どのような長さであってもよい。 In the second embodiment of the present invention, the predetermined sound may be, for example, a predetermined sound logo included in advertisement broadcasting. For example, a predetermined sound logo determined in advance among the plurality of sound logos is set as the predetermined sound. The predetermined sound is not limited to the sound logo, and may be a predetermined melody, sound effect, song, or voice. The predetermined sound does not need to be heard by humans, and may be any sound information such as a high frequency such as a mosquito sound as long as the information can be detected by the information processing apparatus 10. Further, the predetermined sound may have any length.

（情報処理装置の動作例） (Operation example of information processing apparatus)
図１０は、本発明の第２の実施形態における情報処理装置１０の動作例を示すフローチャートである。なお、図１０に示す動作例はあくまでも一例であって、情報処理装置１０の動作は図１０に示す動作例に限定されない。 FIG. 10 is a flowchart illustrating an operation example of the information processing apparatus 10 according to the second embodiment of the present invention. Note that the operation example shown in FIG. 10 is merely an example, and the operation of the information processing apparatus 10 is not limited to it.

情報処理装置１０の検出部１１２が、所定のサウンド、及び、当該所定のサウンドが発せられた方向を検出する（Ｓ２００）。例えば、認識部１１０は、所定のサウンドロゴ、及び、当該所定のサウンドロゴが発せられた方向を検出する。 The detection unit 112 of the information processing apparatus 10 detects a predetermined sound and a direction in which the predetermined sound is emitted (S200). For example, the recognition unit 110 detects a predetermined sound logo and a direction in which the predetermined sound logo is emitted.

その後、認識部１１０が、特定の音声コマンド、及び、当該特定の音声コマンドが発せられた方向を認識する（Ｓ２０１）。例えば、認識部１１０は、「Ｈｅｌｌｏ！」という所定の音声コマンドと、当該所定の音声コマンドが発せられた方向を認識する。 Thereafter, the recognition unit 110 recognizes the specific voice command and the direction from which it was uttered (S201). For example, the recognition unit 110 recognizes the predetermined voice command "Hello!" and the direction from which that command was uttered.

次に、起動部１１１は、認識部１１０によって認識された所定の音声コマンドが発せられた方向が、検出部１１２によって検出された所定のサウンドが発せられた方向に合致するか否かを判定する（Ｓ２０２）。 Next, the activation unit 111 determines whether the direction in which the predetermined voice command recognized by the recognition unit 110 is emitted matches the direction in which the predetermined sound detected by the detection unit 112 is emitted. (S202).

そして、起動部１１１は、認識部１１０によって認識された所定の音声コマンドが発せられた方向が、検出部１１２によって検出された所定のサウンドが発せられた方向に合致する場合（Ｓ２０２のＹＥＳ）、情報処理装置１０を起動しない（Ｓ２０３）。 If the direction from which the predetermined voice command recognized by the recognition unit 110 was uttered matches the direction, detected by the detection unit 112, in which the predetermined sound was emitted (YES in S202), the activation unit 111 does not activate the information processing apparatus 10 (S203).

一方、起動部１１１は、認識部１１０によって認識された所定の音声コマンドが発せられた方向が、検出部１１２によって検出された所定のサウンドが発せられた方向に合致しない場合（Ｓ２０２のＮＯ）、情報処理装置１０を起動する（Ｓ２０４）。 On the other hand, if the direction from which the predetermined voice command recognized by the recognition unit 110 was uttered does not match the direction, detected by the detection unit 112, in which the predetermined sound was emitted (NO in S202), the activation unit 111 activates the information processing apparatus 10 (S204).
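Steps S200 to S204 can be summarized in a short Python sketch. The `Device` class, its method names and the use of a single horizontal angular range for each direction are simplifying assumptions for illustration and are not the patent's implementation.

```python
from typing import Optional, Tuple

# Hypothetical representation: a direction is a (low, high) range of degrees.
AngleRange = Tuple[float, float]


class Device:
    """Illustrative model of information processing apparatus 10 in the second embodiment."""

    def __init__(self) -> None:
        self.dead_direction: Optional[AngleRange] = None
        self.active = False  # False corresponds to the sleep ("OFF") state

    def on_predetermined_sound(self, direction: AngleRange) -> None:
        # S200: the predetermined sound and its direction have been detected.
        # The direction is kept as the dead direction from now on.
        self.dead_direction = direction

    def on_wake_word(self, direction: AngleRange) -> bool:
        # S201: the wake word and its direction have been recognized.
        dead = self.dead_direction
        # S202: does the wake-word direction coincide with or lie inside the dead direction?
        if dead is not None and dead[0] <= direction[0] and direction[1] <= dead[1]:
            return False  # S203: do not activate; the state is left unchanged
        self.active = True  # S204: activate
        return True
```

Because the dead direction is stored rather than timed out, a wake word arriving from the television's direction keeps being ignored, in line with the indefinite suspension described for this embodiment.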

As described above, in the second embodiment of the present invention, the information processing apparatus 10 does not execute its activation process even when it recognizes a specific voice command coming from the direction from which the predetermined sound was emitted. As a result, the information processing apparatus 10 can be prevented from being erroneously activated by a specific voice command emitted from a television or the like. In addition, because the information processing apparatus 10 remains in the sleep state, it does not react to the various sounds emitted from a television or the like, which prevents processing the user did not intend from being executed.

In this way, the information processing apparatus 10 can prevent its own erroneous activation by detecting the direction from which the predetermined sound was emitted, and detecting that direction can be realized at lower cost than identification by speaker recognition technology. Therefore, the information processing apparatus 10 according to the second embodiment of the present invention can reduce, at low cost, erroneous activation of the speaker caused by a specific voice command from a television or the like. Furthermore, the information processing apparatus 10 can decide whether activation is required based on whether the specific voice command was issued from the direction from which the predetermined sound was emitted, so users who operate the information processing apparatus 10 need not be registered in advance. A person other than a registrant, such as a visitor, can therefore operate the information processing apparatus 10, which also improves convenience.

(Modification 1)
Modification 1 is an example in which the information processing apparatus 10, even when it recognizes a specific voice command from the direction from which the predetermined sound was emitted, activates itself in response to that command having been issued by a user registered in advance.

The storage unit 105 of the information processing apparatus 10 in Modification 1 stores the user's voice data in advance. The voice data can be stored, for example, by having the user input in advance predetermined phrases that include the specific voice command. The predetermined phrases include, for example, “Hello!” and “Message”, and the information processing apparatus 10 creates the user's voice data from the voice the user input in advance.

Based on the user's voice data stored in the storage unit 105, the recognition unit 110 of the control unit 101 determines whether the recognized specific voice command was issued by a user registered in advance. This determination can be realized, for example, by comparing characteristic portions of the voice, and it costs less than using speaker identification technology to distinguish sound emitted from a television or the like from a live human voice.

Even when the detection unit 112 has detected the direction of the predetermined sound and a specific voice command from that direction is recognized, the activation unit 111 activates the information processing apparatus 10 in response to the recognition unit 110 recognizing that the specific voice command was issued by a user registered in advance. That is, when the recognition unit 110 recognizes a specific voice command issued by a user registered in advance, the activation unit 111 causes the information processing apparatus 10 in the sleep state to transition to the active state.

FIG. 11 is a diagram for explaining another state of the information processing apparatus 10 according to the second embodiment of the present invention. In FIG. 11, the user 50 is a user registered in advance, and the voice data of the user 50 is stored in advance in the storage unit 105 of the information processing apparatus 10.

In FIG. 11, the television 20 is broadcasting an advertisement broadcast that includes the predetermined sound. The content of the advertisement broadcast is the same as that of the advertisement broadcast illustrated in FIG. 1. The information processing apparatus 10 actually installed in a room or the like therefore detects the predetermined sound. Having detected the predetermined sound, the information processing apparatus 10 does not activate itself even if it receives the specific voice command “Hello!” from the direction from which the predetermined sound was emitted. That is, in FIG. 11, even if the specific voice command is emitted from the television 20, the information processing apparatus 10 remains in the sleep state and does not transition to the active state.

However, in FIG. 11, when the user 50 issues the specific voice command “Hello!”, the information processing apparatus 10 executes the activation process by recognizing that the command comes from a user registered in advance, even if the direction from which the user 50 issued the command matches the direction from which the predetermined sound was emitted. That is, the information processing apparatus 10 in the sleep state transitions to the active state (the “ON” state in FIG. 6) in response to the specific voice command “Hello!” from the user 50.

As described above, in Modification 1 of the second embodiment of the present invention, the information processing apparatus 10 activates itself even when it has detected the predetermined sound and received a specific voice command from the direction from which that sound was emitted (that is, even when it has recognized a specific voice command from the direction of the predetermined sound), provided that the command was issued by a user registered in advance. The information processing apparatus 10 therefore does not become entirely impossible to activate from the direction of the predetermined sound; a user registered in advance can still activate it. As a result, a user registered in advance can activate the information processing apparatus 10 at any time, which improves convenience.
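
The override in Modification 1 can be illustrated with the sketch below. The source only says that the registered-user check can be done "by comparing characteristic portions"; the use of feature vectors, cosine similarity, the 0.9 threshold, and every function name here are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_registered_speaker(features: Sequence[float],
                          enrolled: Sequence[Sequence[float]],
                          threshold: float = 0.9) -> bool:
    """Assumed check: the command matches if it is close to any enrolled voice sample."""
    return any(cosine_similarity(features, sample) >= threshold for sample in enrolled)


def should_activate_with_registered_user(direction_matches_sound: bool,
                                         features: Sequence[float],
                                         enrolled: Sequence[Sequence[float]]) -> bool:
    """Modification 1: a registered user can activate even from the sound's direction."""
    if not direction_matches_sound:
        # Command comes from somewhere other than the predetermined sound: activate.
        return True
    # Command comes from the sound's direction: activate only for a registered user.
    return is_registered_speaker(features, enrolled)
```

In this sketch the comparatively cheap direction test runs first, and the voice comparison is consulted only when the command arrives from the direction of the predetermined sound.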

(Modification 2)
Modification 2 is a form in which, when the information processing apparatus 10 receives a specific voice command from the direction from which the predetermined sound was emitted (that is, when it recognizes a specific voice command from that direction), it confirms with the user whether to activate the information processing apparatus 10. In Modification 2, the information processing apparatus 10 asks the user this question every time it receives a specific voice command from the direction from which the predetermined sound was emitted.

FIG. 12 is a diagram illustrating a configuration example of the information processing apparatus 10 according to Modification 2 of the second embodiment of the present invention. As illustrated in FIG. 12, the information processing apparatus 10 includes, for example, a control unit 101, a communication unit 102, an input/output unit 103, a display unit 104, and a storage unit 105. The configurations of the communication unit 102, the input/output unit 103, the display unit 104, and the storage unit 105 are the same as in the configuration example of the information processing apparatus 10 according to the first embodiment of the present invention illustrated in FIG. 4, so a detailed description is omitted.

As illustrated in FIG. 12, the control unit 101 includes a recognition unit 110, an activation unit 111, a detection unit 112, and a confirmation unit 113.

When a specific voice command is received from the direction from which the predetermined sound was emitted (that is, when a specific voice command from that direction is recognized), the confirmation unit 113 executes a process of confirming with the user whether to activate the information processing apparatus 10. Specifically, the confirmation unit 113 causes the input/output unit 103 to output a voice prompt such as “Do you want to start?” or “Did you call?” and thereby asks the user whether to activate the information processing apparatus 10.

The activation unit 111 activates the information processing apparatus 10 in response to the user inputting an answer indicating that the information processing apparatus 10 should be activated. For example, the activation unit 111 activates the information processing apparatus 10 in response to the user inputting an answer such as “activate” or “Hello!”.

On the other hand, the activation unit 111 does not activate the information processing apparatus 10 in response to the user inputting an answer indicating that the information processing apparatus 10 should not be activated, or in response to the user giving no answer. For example, the activation unit 111 does not activate the information processing apparatus 10 in response to the user answering “do not activate”. Likewise, the activation unit 111 does not activate the information processing apparatus 10 when no answer is input from the user for a predetermined time.

As described above, in Modification 2 of the second embodiment of the present invention, when the information processing apparatus 10 receives a specific voice command from the direction from which the predetermined sound was emitted, it confirms with the user whether to activate the information processing apparatus 10. Because the information processing apparatus 10 is then activated or not activated according to the user's answer, erroneous activation is prevented and processing the user did not intend is not executed.
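
The per-command confirmation of Modification 2 might look like the sketch below. The `ask` callback stands in for the confirmation unit 113 prompting through the input/output unit 103; it, the reply normalization, and the set of affirmative replies are hypothetical names and choices, with `None` assumed to mean that no answer arrived within the predetermined time.

```python
from typing import Callable, Optional

# Example affirmative answers taken from the description; matching is case-insensitive here.
AFFIRMATIVE_REPLIES = {"activate", "hello!"}


def decide_after_confirmation(ask: Callable[[str], Optional[str]]) -> bool:
    """Ask the user once and return True only for an affirmative reply.

    `ask` plays the prompt and returns the user's reply, or None on timeout (assumed).
    """
    reply = ask("Do you want to start?")
    if reply is None:
        # No answer within the predetermined time: do not activate.
        return False
    # Any reply outside the affirmative set, such as "do not activate", means no activation.
    return reply.strip().lower() in AFFIRMATIVE_REPLIES
```

This function would be called each time a specific voice command arrives from the direction of the predetermined sound, so the user is prompted on every such command.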

(Modification 3)
Modification 3 is a form in which, when the information processing apparatus 10 receives a specific voice command from the direction from which the predetermined sound was emitted (that is, when it recognizes a specific voice command from that direction), it confirms with the user whether to activate the information processing apparatus 10 from then on. In Modification 3, the information processing apparatus 10 executes the confirmation process once. Once the user has answered, it does not execute the confirmation process again and, for specific voice commands recognized thereafter, activates or does not activate itself according to that single answer.

The configuration example of the information processing apparatus 10 in Modification 3 is the same as the configuration example of the information processing apparatus 10 in Modification 2 of the second embodiment of the present invention illustrated in FIG. 12, so a detailed description is omitted.

When a specific voice command is received from the direction from which the predetermined sound was emitted (that is, when a specific voice command from that direction is recognized), the confirmation unit 113 executes a process of confirming with the user whether to activate the information processing apparatus 10. Specifically, the confirmation unit 113 causes the input/output unit 103 to output a voice prompt such as “Do you want to start?” or “Did you call?” and thereby asks the user whether to activate the information processing apparatus 10.

Once the user has input an answer indicating that the information processing apparatus 10 should be activated, the activation unit 111 thereafter activates the information processing apparatus 10 whenever a specific voice command is received from the direction from which the predetermined sound was emitted (that is, whenever a specific voice command from that direction is recognized). For example, after the user has answered “activate” or “Hello!”, the activation unit 111 activates the information processing apparatus 10 whenever a specific voice command is subsequently received from the direction from which the predetermined sound was emitted.

On the other hand, once the user has input an answer indicating that the information processing apparatus 10 should not be activated, or has given no answer, the activation unit 111 thereafter does not activate the information processing apparatus 10 when a specific voice command is received from the direction from which the predetermined sound was emitted (that is, when a specific voice command from that direction is recognized). For example, after the user has answered “do not activate”, the activation unit 111 does not activate the information processing apparatus 10 when a specific voice command is subsequently received from that direction. Likewise, after no answer has been input from the user for a predetermined time, the activation unit 111 does not activate the information processing apparatus 10 when a specific voice command is subsequently received from that direction.

As described above, in Modification 3 of the second embodiment of the present invention, when the information processing apparatus 10 receives a specific voice command from the direction from which the predetermined sound was emitted (that is, when it recognizes a specific voice command from that direction), it confirms with the user whether to activate the information processing apparatus 10. Because the information processing apparatus 10 is then activated or not activated according to the user's answer, erroneous activation is prevented and processing the user did not intend is not executed. In addition, in Modification 3, once the confirmation process has been executed, the information processing apparatus 10 is activated or not activated according to the answer the user gave that one time. The information processing apparatus 10 therefore no longer executes the confirmation process every time it receives a specific voice command from the direction from which the predetermined sound was emitted, which also improves convenience.
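
The difference from Modification 2 is that the first outcome is remembered. A minimal sketch, assuming a hypothetical `ask` callback that returns the user's reply or `None` when no answer arrives within the predetermined time; the class name and the set of affirmative replies are likewise illustrative.

```python
from typing import Callable, Optional

# Example affirmative answers taken from the description; matching is case-insensitive here.
AFFIRMATIVE_REPLIES = {"activate", "hello!"}


class RememberedConfirmation:
    """Modification 3: confirm once, then reuse that outcome for later commands."""

    def __init__(self, ask: Callable[[str], Optional[str]]) -> None:
        self._ask = ask
        self._decision: Optional[bool] = None  # None until the first confirmation has run

    def decide(self) -> bool:
        """Return whether to activate for a command from the sound's direction."""
        if self._decision is None:
            reply = self._ask("Do you want to start?")
            # A missing reply is stored as "do not activate", as in the description.
            self._decision = (reply is not None
                              and reply.strip().lower() in AFFIRMATIVE_REPLIES)
        return self._decision
```

Calling `decide()` repeatedly prompts the user only on the first call; later calls return the stored outcome without prompting.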

(Modification 4)
Modification 4 is a form in which, when the direction from which the predetermined sound is emitted changes, the information processing apparatus 10 detects that its installation location has changed and, based on the new installation location, resets the parameters that depend on the installation location of the information processing apparatus 10.

FIG. 13 is a diagram illustrating a configuration example of the information processing apparatus 10 according to Modification 4 of the second embodiment of the present invention. As illustrated in FIG. 13, the information processing apparatus 10 includes, for example, a control unit 101, a communication unit 102, an input/output unit 103, a display unit 104, and a storage unit 105. The configurations of the communication unit 102, the input/output unit 103, the display unit 104, and the storage unit 105 are the same as in the configuration example of the information processing apparatus 10 according to the first embodiment of the present invention illustrated in FIG. 4, so a detailed description is omitted.

As illustrated in FIG. 13, the control unit 101 includes a recognition unit 110, an activation unit 111, a detection unit 112, and a setting unit 114.

In addition to the predetermined sound, the detection unit 112 detects the direction from which the predetermined sound was emitted. To detect that direction, the detection unit 112 can use, for example, the methods illustrated in FIGS. 8 and 9.

The detection unit 112 also detects that the direction from which the predetermined sound is emitted, as seen from the apparatus itself, has changed. The detection unit 112 compares the direction of the predetermined sound detected the previous time with the direction detected this time, and determines that the direction as seen from the apparatus has changed when the two differ by a predetermined amount or more. The predetermined amount is the amount of change in each of the angles α to δ illustrated in FIGS. 8 and 9, and is, for example, 5 degrees. The predetermined amount may be any value.

When the detection unit 112 detects that the direction from which the predetermined sound is emitted, as seen from the apparatus itself, has changed, it detects that the installation location of the information processing apparatus 10 has changed. The detection unit 112 may also calculate, from the amount by which that direction has changed, the position of the apparatus relative to the object emitting the predetermined sound, and estimate the installation location of the information processing apparatus 10 from that relative position.

The setting unit 114 executes a process of setting parameters that depend on the installation location of the information processing apparatus 10. Such parameters are, for example, the voice reception sensitivity of the input/output unit 103 and the volume of the voice output from the input/output unit 103. The parameters that depend on the installation location are not limited to these examples and may be of any kind. It is desirable to vary the voice reception sensitivity and the output volume according to where the information processing apparatus 10 is installed. The setting unit 114 therefore executes a process of resetting the parameters that depend on the installation location, based on the new installation location of the information processing apparatus 10.

As described above, in Modification 4 of the second embodiment of the present invention, when the direction from which the predetermined sound is emitted, as seen from the apparatus itself, changes, the information processing apparatus 10 detects that its installation location has changed and, based on the new installation location, resets the parameters that depend on the installation location. The information processing apparatus 10 can thereby change the installation-dependent parameters automatically, which improves convenience.
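
The change detection in Modification 4 can be sketched as follows. The sketch assumes that a direction is represented by the four angles α to δ of FIGS. 8 and 9, that a change of the predetermined amount (5 degrees in the example) in any one angle counts as a change, and that angles do not wrap around 360 degrees. The `InstallationMonitor` class and its `on_relocated` callback, which stands in for the setting unit 114 resetting parameters, are hypothetical.

```python
from typing import Callable, Optional, Tuple

# (alpha, beta, gamma, delta) in degrees; the 4-tuple representation is an assumption.
Angles = Tuple[float, float, float, float]


def direction_changed(previous: Angles, current: Angles,
                      threshold_deg: float = 5.0) -> bool:
    """True when any angle differs by the predetermined amount or more."""
    return any(abs(c - p) >= threshold_deg for p, c in zip(previous, current))


class InstallationMonitor:
    """Tracks the last detected direction and reports apparent relocation."""

    def __init__(self, on_relocated: Callable[[Angles], None]) -> None:
        self._last: Optional[Angles] = None
        self._on_relocated = on_relocated  # e.g. re-tune microphone sensitivity and volume

    def observe(self, current: Angles) -> bool:
        """Record a newly detected direction; return True if the apparatus seems to have moved."""
        moved = self._last is not None and direction_changed(self._last, current)
        self._last = current
        if moved:
            # Hand the new direction to the parameter-resetting step.
            self._on_relocated(current)
        return moved
```

The first observation only records a baseline. Each later observation is compared with the one immediately before it, matching the "previous time versus this time" comparison in the description.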

The program according to each embodiment of the present disclosure may be provided stored in a computer-readable storage medium. The storage medium can store the program in a “non-transitory tangible medium”. The storage medium can include any suitable storage medium such as an HDD or an SSD, or a suitable combination of two or more of these. The storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile. The storage medium is not limited to these examples and may be any device or medium capable of storing the program.

The information processing apparatus 10 can realize the functions of the plurality of functional units described in each embodiment, for example, by reading the program stored in the storage medium and executing it. The program may also be provided to the information processing apparatus 10 via any transmission medium (a communication network, broadcast waves, or the like). For example, the information processing apparatus 10 realizes the functions of the plurality of functional units described in each embodiment by executing a program downloaded via the Internet or the like.

The program can be implemented using, for example, a scripting language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.

At least part of the processing in the information processing apparatus 10 may be realized by cloud computing made up of one or more computers.

Although the embodiments of the present disclosure have been described with reference to the drawings and examples, it should be noted that those skilled in the art can easily make various changes and modifications based on the present disclosure. Such changes and modifications are therefore included in the scope of the present disclosure. For example, the functions included in each means, each step, and the like can be rearranged so long as no logical contradiction results, and a plurality of means, steps, and the like can be combined into one or divided. The configurations described in the embodiments may also be combined as appropriate.

10 Information processing apparatus (smart speaker)
10A Information processing apparatus in the advertisement broadcast
101 Control unit, 102 Communication unit, 103 Input/output unit, 104 Display unit, 105 Storage unit, 110 Recognition unit, 111 Activation unit, 112 Detection unit, 113 Confirmation unit, 114 Setting unit
11 A certain point, 12 Ground, 13 Plane, 14 Plane
20 Television
30 Character
40 Predetermined sound
50 User
60, 60A, 60B Predetermined direction
200 Server apparatus
300 Network

Claims

An information processing apparatus operable by voice, comprising:
a recognition unit that recognizes a specific voice command;
an activation unit that activates the information processing apparatus in response to the specific voice command; and
a detection unit that detects a predetermined sound,
wherein, when the predetermined sound is detected, the activation unit stops, for a predetermined period, the activation process of the information processing apparatus in response to the specific voice command.

The information processing apparatus according to claim 1, wherein
the predetermined sound is a predetermined sound logo included in an advertisement broadcast,
the detection unit detects the predetermined sound logo included in the advertisement broadcast, and
when the predetermined sound logo is detected, the activation unit stops, for a predetermined period, the activation process of the information processing apparatus in response to the specific voice command.

The information processing apparatus according to claim 1, wherein
the recognition unit recognizes the specific voice command issued by a user registered in advance, and
during the predetermined period, the activation unit stops the activation process of the information processing apparatus for the specific voice command issued by a user other than the user registered in advance, and activates the information processing apparatus for the specific voice command issued by the user registered in advance.

The information processing apparatus according to claim 1, wherein
the detection unit detects, in addition to the predetermined sound, a direction from which the predetermined sound is emitted, and
during the predetermined period, the activation unit stops the activation process of the information processing apparatus in response to the specific voice command issued from the direction from which the predetermined sound is emitted, and activates the information processing apparatus in response to the specific voice command issued from a direction other than the direction from which the predetermined sound is emitted.

The information processing apparatus according to any one of claims 1 to 4, wherein
the predetermined period is set for each of a plurality of types of the predetermined sound,
the detection unit is capable of detecting at least one of the plurality of types of the predetermined sound, and
the activation unit stops the activation process of the information processing apparatus in response to the specific voice command for the predetermined period set for the predetermined sound detected by the detection unit.

A method of controlling an information processing apparatus operable by voice,
A recognition step for recognizing a specific voice command;
An activation step of activating the information processing apparatus in response to the specific voice command;
A detection step of detecting a predetermined sound,
In the activation step, when the predetermined sound is detected, the activation process of the information processing apparatus in response to the specific voice command is stopped for a predetermined period.
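
The control method above, together with the per-sound periods of claim 5, can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class name `WakeWordGate`, the sound-type labels, and the period values are hypothetical, and the sound detector and wake-word recognizer are assumed to exist elsewhere and to call these methods with a timestamp in seconds.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WakeWordGate:
    """Stops wake-word activation for a period after a predetermined sound."""

    # Hypothetical mapping: predetermined sound type -> period in seconds.
    periods: Dict[str, float] = field(
        default_factory=lambda: {"sound_logo_a": 15.0, "sound_logo_b": 30.0}
    )
    # Time at which the current stop period ends; None if none has started.
    suppressed_until: Optional[float] = None

    def on_sound_detected(self, sound_type: str, now: float) -> None:
        """Detection step: start the stop period for a known sound type."""
        period = self.periods.get(sound_type)
        if period is None:
            return  # not one of the predetermined sounds
        end = now + period
        # Assumption: overlapping periods keep whichever one ends later.
        if self.suppressed_until is None or end > self.suppressed_until:
            self.suppressed_until = end

    def on_wake_word(self, now: float) -> bool:
        """Activation step: True if the apparatus should activate."""
        if self.suppressed_until is not None and now < self.suppressed_until:
            return False  # activation process is stopped during the period
        return True


gate = WakeWordGate()
gate.on_sound_detected("sound_logo_a", now=100.0)
print(gate.on_wake_word(now=105.0))  # False: inside the 15-second period
print(gate.on_wake_word(now=115.0))  # True: the period has ended
```

Keeping the later end time when periods overlap is a design choice of this sketch; the claims do not state how overlapping detections are handled.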

A program that causes an information processing apparatus operable by voice to function as:
A recognition means for recognizing a specific voice command;
An activation means for activating the information processing apparatus in response to the specific voice command; and
A detection means for detecting a predetermined sound,
wherein, when the predetermined sound is detected, the activation means stops the activation process of the information processing apparatus in response to the specific voice command for a predetermined period.

An information processing apparatus operable by voice,
A recognition unit that recognizes a specific voice command;
An activation unit that activates the information processing apparatus in response to the specific voice command;
A detection unit for detecting a direction in which a predetermined sound is emitted,
The information processing apparatus, wherein the activation unit does not execute activation processing of the information processing apparatus in response to the specific voice command from the direction.

The predetermined sound is a predetermined sound logo included in the advertisement broadcast,
The detection unit detects a direction in which the predetermined sound logo included in the advertisement broadcast is emitted,
The information processing apparatus according to claim 8, wherein the activation unit does not execute activation processing of the information processing apparatus in response to the specific voice command from the direction.

The recognizing unit recognizes the specific voice command issued by a user registered in advance from the direction,
The information processing apparatus according to claim 8, wherein, when the specific voice command from the direction is recognized, the activation unit stops the activation process of the information processing apparatus for the specific voice command issued by a user other than the user registered in advance, and activates the information processing apparatus for the specific voice command issued by the user registered in advance.

The information processing apparatus according to claim 8, further comprising a confirmation unit that confirms with the user whether or not to activate the information processing apparatus when the specific voice command is recognized from the direction.

The information processing apparatus according to claim 11, wherein, when a response indicating that the information processing apparatus is to be activated is input from the user, the activation unit activates the information processing apparatus in response to recognition of the specific voice command from the direction after the response is input.
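
The handling of a wake word arriving from the direction of the predetermined sound (registered-user exception, confirmation, and later approval) can be summarized as one decision function. The following Python sketch is illustrative only: the names `Decision` and `decide`, the boolean inputs, and the order in which the checks are combined are assumptions, since the claims present these features as separate dependent claims rather than as a single procedure.

```python
from enum import Enum


class Decision(Enum):
    ACTIVATE = "activate"
    IGNORE = "ignore"
    ASK_USER = "ask_user"


def decide(
    from_sound_direction: bool,
    speaker_is_registered: bool,
    direction_approved: bool,
    has_confirmation_unit: bool,
) -> Decision:
    """Decide how to treat a recognized wake word.

    from_sound_direction: the command came from the direction in which the
        predetermined sound was emitted.
    speaker_is_registered: the speaker was identified as a pre-registered user.
    direction_approved: the user previously answered that the apparatus
        should be activated for this direction.
    has_confirmation_unit: the apparatus is able to ask the user.
    """
    if not from_sound_direction:
        return Decision.ACTIVATE  # other directions are unaffected
    if speaker_is_registered:
        return Decision.ACTIVATE  # registered users may still activate
    if direction_approved:
        return Decision.ACTIVATE  # the user already approved activation
    if has_confirmation_unit:
        return Decision.ASK_USER  # confirm before activating
    return Decision.IGNORE  # default: do not execute activation


print(decide(True, False, False, True))  # Decision.ASK_USER
```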

Further comprising a setting unit that sets a parameter depending on an installation location of the information processing apparatus,
The detection unit detects that the installation location of the information processing apparatus has changed when the direction in which the predetermined command is issued, as viewed from the apparatus itself, changes,
The information processing apparatus according to any one of claims 8 to 12, wherein the setting unit resets the parameter based on the installation location after the change.

A method of controlling an information processing apparatus operable by voice,
A recognition step for recognizing a specific voice command;
An activation step of activating the information processing apparatus in response to the specific voice command;
A detection step of detecting a direction in which a predetermined sound is emitted,
In the activation step, the activation process of the information processing apparatus in response to the specific voice command from the direction is not executed.
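
The direction-based method above can be sketched as follows. This is a hedged illustration, not the disclosed implementation: it assumes directions are reported as bearings in degrees by some microphone-array front end, and the class name `DirectionalGate` and the 20-degree tolerance are hypothetical values chosen for the example.

```python
from typing import Optional


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


class DirectionalGate:
    """Does not activate for wake words coming from the sound's direction."""

    def __init__(self, tolerance_deg: float = 20.0) -> None:
        # Hypothetical tolerance: bearings this close count as the same direction.
        self.tolerance_deg = tolerance_deg
        self.sound_direction_deg: Optional[float] = None

    def on_sound_detected(self, direction_deg: float) -> None:
        """Detection step: remember where the predetermined sound came from."""
        self.sound_direction_deg = direction_deg

    def on_wake_word(self, direction_deg: float) -> bool:
        """Activation step: True if the apparatus should activate."""
        if self.sound_direction_deg is None:
            return True  # no predetermined sound has been located yet
        diff = angular_difference(direction_deg, self.sound_direction_deg)
        return diff > self.tolerance_deg


gate = DirectionalGate()
gate.on_sound_detected(direction_deg=350.0)  # e.g. a television at 350 degrees
print(gate.on_wake_word(direction_deg=5.0))   # False: only 15 degrees away
print(gate.on_wake_word(direction_deg=90.0))  # True: a different direction
```

The wrap-around handling in `angular_difference` matters because a source at 350 degrees and a command at 5 degrees are only 15 degrees apart.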

A program that causes an information processing apparatus operable by voice to function as:
A recognition means for recognizing a specific voice command;
An activation means for activating the information processing apparatus in response to the specific voice command; and
A detection means for detecting a direction in which a predetermined sound is emitted,
wherein the detection means does not execute the activation process of the information processing apparatus in response to the specific voice command from the direction.