JP2019219886A - Information processing device, information processing method and information processing program

Info

Publication number: JP2019219886A
Application number: JP2018116467A
Authority: JP
Inventors: 祐介松下; Yusuke Matsushita; 亮大河村; Ryota Kawamura; 悦子坂本; Etsuko Sakamoto; 愛絵広沢; Manae Hirosawa; 裕人中井; Hiroto Nakai
Original assignee: SoftBank Corp
Current assignee: SoftBank Corp
Priority date: 2018-06-19
Filing date: 2018-06-19
Publication date: 2019-12-26
Anticipated expiration: 2038-06-19
Also published as: JP6700338B2

Abstract

To provide an information processing device that executes control corresponding to a spoken instruction from a user. SOLUTION: An information processing device includes: a receiving unit that receives an instruction input by speech from a user; an environmental information obtaining unit that obtains environmental information capable of identifying an environment around the user; an estimating unit that estimates, in accordance with the environmental information, control details to be executed in response to the instruction input; and an executing unit that executes the control details estimated by the estimating unit. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing device, an information processing method, and an information processing program that perform control based on a voice instruction from a user.

In recent years, the development of devices that use artificial intelligence has been remarkable. Some of these devices carry out the content of a voice instruction from a user. For example, there are smart speakers that play music, set off alarms, perform calculations, and control other devices (for example, a lighting device) in accordance with voice instructions from a user. Patent Literature 1 discloses a terminal device that displays a sentence prompting the user for voice input based on the user's current position and the current time, which serve as the environment in which the user is placed.

Patent Literature 1: Japanese Patent No. 6154489

A voice instruction from a user may, however, be intended to mean something different depending on the situation at the time or the user's state of mind. The terminal device described in Patent Literature 1 can perform only uniform processing in response to an instruction from the user, and therefore has low flexibility.

The present invention has been made in view of the above problem, and an object thereof is to provide an information processing device, an information processing method, and an information processing program that perform processing that takes into account the intention behind a user's voice input.

To solve the above problem, an information processing device according to one aspect of the present invention includes: a receiving unit that receives an instruction input by voice from a user; an environment information acquisition unit that acquires environment information from which the environment around the user can be identified; an estimating unit that estimates, in accordance with the environment information, the control content to be executed in response to the instruction input; and an execution unit that executes the control content estimated by the estimating unit.

To solve the above problem, in an information processing method according to one aspect of the present invention, a computer executes: a receiving step of receiving an instruction input by voice from a user; an environment information acquisition step of acquiring environment information from which the environment around the user can be identified; an estimation step of estimating, in accordance with the environment information, the control content to be executed in response to the instruction input; and an execution step of executing the control content estimated in the estimation step.

To solve the above problem, an information processing program according to one aspect of the present invention causes a computer to realize: a receiving function of receiving an instruction input by voice from a user; an environment information acquisition function of acquiring environment information from which the environment around the user can be identified; an estimation function of estimating, in accordance with the environment information, the control content to be executed in response to the instruction input; and an execution function of executing the control content estimated by the estimation function.
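The receive, acquire, estimate, and execute sequence common to the device, method, and program can be pictured as a single function that chains four callables. The following is only a minimal sketch: the function names, the string-valued instructions and environments, and the rule inside `example_estimate` are all assumptions made for illustration, not part of the disclosure.

```python
from typing import Callable, List


def process_instruction(
    receive: Callable[[], str],
    acquire_environment: Callable[[], str],
    estimate: Callable[[str, str], str],
    execute: Callable[[str], None],
) -> str:
    """Run one receive -> acquire -> estimate -> execute cycle and return the control."""
    instruction = receive()                       # receiving unit
    environment = acquire_environment()           # environment information acquisition unit
    control = estimate(instruction, environment)  # estimating unit
    execute(control)                              # execution unit
    return control


def example_estimate(instruction: str, environment: str) -> str:
    # Hypothetical rule: the same instruction yields a different control
    # depending on the environment.
    if instruction == "play music" and environment == "music detected":
        return "ask before playing"
    if instruction == "play music":
        return "play music"
    return "no-op"


executed: List[str] = []
result = process_instruction(
    receive=lambda: "play music",
    acquire_environment=lambda: "music detected",
    estimate=example_estimate,
    execute=executed.append,
)
```

Passing the four stages in as callables keeps the sketch independent of how audio is actually captured or how devices are actually driven.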

The information processing device may further include a storage unit that stores a control model from which the control content to be executed can be derived from information indicating the content of the instruction input and information indicating the environment, and the estimating unit may estimate the control content using the control model.

In the information processing device, the environment information acquisition unit may acquire, as the environment information, sound collected by a microphone that collects sound around the user, and the estimating unit may estimate the environment around the user based on the sound and estimate, in accordance with the estimated environment, the control content to be executed in response to the instruction input.

In the information processing device, the microphone may be a directional microphone with a fixed sound collection direction, a plurality of the directional microphones may be arranged at a predetermined position so that their directivity points outward toward the surroundings, and the estimating unit may estimate the position of the source of each sound based on the sound collected by each directional microphone and estimate the environment around the user based on the positions of the sources of the sounds around the user.
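The text does not say how the source position is computed from the directional microphones. One simple possibility, shown here purely as an assumption, is to estimate only the direction of a source as the energy-weighted circular mean of the directions in which the microphones point. The function name, the angle convention, and the use of per-microphone energy are all hypothetical.

```python
import math
from typing import Sequence


def estimate_source_direction(
    mic_angles_deg: Sequence[float], energies: Sequence[float]
) -> float:
    """Return the energy-weighted mean direction in degrees, in the range [0, 360)."""
    if len(mic_angles_deg) != len(energies):
        raise ValueError("one energy value is required per microphone")
    # Sum unit vectors along each microphone's pointing direction,
    # weighted by the energy that microphone picked up.
    x = sum(e * math.cos(math.radians(a)) for a, e in zip(mic_angles_deg, energies))
    y = sum(e * math.sin(math.radians(a)) for a, e in zip(mic_angles_deg, energies))
    if math.hypot(x, y) < 1e-12:
        raise ValueError("no dominant direction (energies cancel out)")
    return math.degrees(math.atan2(y, x)) % 360.0


# Four microphones facing 0, 90, 180, and 270 degrees; the first two hear the sound equally.
direction = estimate_source_direction([0.0, 90.0, 180.0, 270.0], [1.0, 1.0, 0.0, 0.0])
```

This yields a bearing only; estimating an actual position, as the text describes, would need additional information such as distance cues or several microphone arrays.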

In the information processing device, the receiving unit may receive the instruction input by voice from the user via the microphone, and the estimating unit may estimate the position of the user and estimate, in accordance with the estimated position of the user, the control content to be executed in response to the instruction input.

In the information processing device, the estimating unit may estimate control content to be executed in response to the instruction input that differs depending on the environment information.

In the information processing device, the information processing device may include a speaker, and the execution unit may control the speaker so as to output, in accordance with the environment information, sound suited to the estimated environment.

In the information processing device, the execution unit may control another device in accordance with the environment information.

The information processing device according to one aspect of the present invention acquires information from which the environment around the user can be identified, uses that information to interpret the user's instruction input, and performs control accordingly, so that it can perform processing that takes the user's intention into account. It is therefore possible to provide an information processing device that can respond flexibly to voice instruction input from the user.

FIG. 1 is a diagram illustrating a configuration example of a communication system.
FIG. 2 is a block diagram illustrating a configuration example of an information processing device.
FIG. 3 is a block diagram illustrating a configuration example of a speaker.
FIG. 4 is a data conceptual diagram illustrating a configuration example of a control model.
FIG. 5 is a flowchart illustrating an operation of the information processing device during device control.

<Embodiment>
A first embodiment of the present invention will be described with reference to the drawings.

The information processing device 100 according to the present invention acquires and analyzes sound around the user 10 to determine the situation of the user 10, interprets a voice instruction input from the user 10 in light of that situation, and executes control corresponding to the content of the instruction input.

The information processing device 100 acquires environment information from which the environment around the user 10 can be identified, and also receives a voice instruction input from the user 10. The information processing device 100 analyzes the content of the instruction, estimates the control content to be executed in response to the instruction input in accordance with the environment information, and executes it. Because the information processing device 100 thus interprets an instruction from the user 10 in light of the environment before performing control, it can respond flexibly even to the same command. The information processing device 100 may be realized in any form: it may be a speaker such as a smart speaker, an ordinary computer system or server device, or a robot. That is, the information processing device 100 may be a stand-alone computer system or may be built into a device such as a smart speaker or a robot. The information processing device 100 may also be a control device for controlling a smart speaker or a robot.

The information processing device 100 is described below.

(System configuration)
FIG. 1 shows an overview of a communication system 1 that includes the information processing device 100. The communication system 1 uses sound around the user 10 as the information from which the environment around the user 10 can be identified. The communication system 1 includes a speaker device 200 as a device that collects voice instructions from the user 10 and sound around the user 10. The speaker device 200 has a built-in microphone, continuously collects surrounding sound, and transmits the resulting audio data to the information processing device 100. The information processing device 100 receives the audio data, extracts the instruction of the user 10 from it, estimates the environment around the user 10, and performs the control designated by the user 10 in a manner suited to the estimated environment.

The communication system 1 may also include various devices that can be controlled based on instructions from the user 10. Various home appliances can serve as such devices, for example, a lighting device, an air conditioner, a speaker, a television, a water heater, electric blinds, and electric curtains. FIG. 1 shows a mini component system 30 and a lighting device 40 as examples.

As shown in FIG. 1, the information processing device 100 is connected to the speaker device 200 via a network 300. The information processing device 100 may also be connected to various devices (home appliances) and may be configured to be able to control each of them. Being configured to be able to control a device means that the information processing device 100 holds control rights over the device so that it can control the device remotely.

The information processing device 100 receives the audio data acquired by the speaker device 200 via the network 300. Based on the received audio data, the information processing device 100 estimates the situation (environment) of the user 10 and interprets the content of the user's voice instruction. It then performs control suited to the intention and situation of the user 10 in accordance with the estimated situation (environment). The control executed by the information processing device 100 may include, in addition to control of the device itself, outputting a signal for controlling another device. FIG. 1 shows an example in which the user 10 gives the instruction "play music". Conventionally, a device that receives such an instruction simply plays some music. In FIG. 1, by contrast, the information processing device 100 estimates that music is already playing from the mini component system 30 and, before starting playback, causes the speaker device 200 to ask whether playback is acceptable: "Other music seems to be playing. Do you want to play music?"

The network 300 is a network that interconnects the information processing device 100 and the various devices, and is, for example, a wireless network or a wired network. Specifically, the network 300 is a wireless LAN (WLAN), a wide area network (WAN), ISDNs (integrated services digital networks), wireless LANs, LTE (long term evolution), LTE-Advanced, fourth generation (4G), fifth generation (5G), CDMA (code division multiple access), WCDMA (registered trademark), Ethernet (registered trademark), or the like.

The network 300 is not limited to these examples and may be any network, for example, the public switched telephone network (PSTN), Bluetooth (registered trademark), Bluetooth Low Energy, an optical line, an ADSL (Asymmetric Digital Subscriber Line) line, or a satellite communication network. When the network 300 is installed in the residence of the user 10, it may be called a home network.

The network 300 may also be, for example, NB-IoT (Narrow Band IoT) or eMTC (enhanced Machine Type Communication). NB-IoT and eMTC are wireless communication schemes for IoT that allow long-distance communication at low cost and with low power consumption.

The network 300 may be a combination of the above, and may include a plurality of different networks that combine these examples. For example, the network 300 may include an LTE wireless network and a wired network such as an intranet that forms a closed network.

(Configuration example of information processing device)
FIG. 2 is a block diagram illustrating a configuration example of the information processing device 100. As illustrated in FIG. 2, the information processing device 100 includes, for example, a receiving unit 110, a storage unit 120, a control unit 130, and a transmitting unit 140.

The receiving unit 110 is a communication interface that receives audio data from the speaker device 200 via the network 300. The receiving unit 110 receives audio data representing a voice instruction input from the user and audio data serving as information from which the environment around the user can be identified. On receiving audio data, the receiving unit 110 passes it to the control unit 130. The receiving unit 110 may also be configured to receive information from which the user's environment can be identified from devices other than the speaker device 200. For example, the receiving unit 110 may receive information indicating the operating status of home appliances in the house where the user is located, or sensing data from various sensors. The receiving unit 110 may pass this information to the environment estimation unit 132, which may use it to estimate the situation of the user.

The storage unit 120 has a function of storing the various programs and data that the information processing device 100 needs in order to operate. The storage unit 120 is realized by a storage medium such as an HDD, an SSD, or a flash memory. The information processing device 100 may store a program in the storage unit 120 and execute it so that the control unit 130 performs the processing of each functional unit included in the control unit 130. This program causes the information processing device 100 to realize each function executed by the control unit 130. The storage unit 120 stores a voice analysis program that analyzes received audio data to estimate the content of the user's instruction, and an environment estimation program that estimates the situation (environment) of the user 10 based on audio data. The storage unit 120 also stores a control model 121 used to estimate the control to be executed from the estimated situation of the user 10 and the voice instruction input from the user. Details of the control model 121 are described later. In addition, the storage unit 120 stores an environment estimation model for estimating the environment in which the user is placed from the type of sound. The environment estimation model is information that associates sound samples with the situations those sounds indicate.

The control unit 130 controls each unit of the information processing device 100 and may be, for example, a central processing unit (CPU), a microprocessor, an ASIC, or an FPGA. The control unit 130 is not limited to these examples and may be of any type.

The control unit 130 includes an audio processing unit 131, an environment estimation unit 132, an instruction estimation unit 133, a control estimation unit 134, and an execution unit 135.

The audio processing unit 131 has a function of analyzing the audio data passed from the receiving unit 110. The audio processing unit 131 separates the audio data into audio data that contains a user instruction and audio data that does not. Here, the audio data has a predetermined length, and the audio processing unit 131 may divide it into portions that contain a human voice and portions that do not; alternatively, when it has received multiple pieces of audio data recording sound from the same situation, it may divide them into those that contain the user's voice and those that do not. The audio processing unit 131 then passes the audio data that contains the user's instruction to the instruction estimation unit 133 and passes the audio data that does not to the environment estimation unit 132.
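The routing performed by the audio processing unit 131 can be sketched as a partition of audio segments by a voice detector. The detector itself is outside the scope of this sketch: `contains_instruction` is a hypothetical predicate supplied by the caller, and the dictionary-based segments in the example are placeholders rather than real audio.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Segment = TypeVar("Segment")


def split_segments(
    segments: Iterable[Segment],
    contains_instruction: Callable[[Segment], bool],
) -> Tuple[List[Segment], List[Segment]]:
    """Partition segments into (instruction segments, ambient segments)."""
    instruction_segments: List[Segment] = []
    ambient_segments: List[Segment] = []
    for segment in segments:
        if contains_instruction(segment):
            instruction_segments.append(segment)  # would go to instruction estimation
        else:
            ambient_segments.append(segment)      # would go to environment estimation
    return instruction_segments, ambient_segments


# Placeholder segments; "voice" stands in for the result of a real voice detector.
clips = [
    {"id": 1, "voice": False},
    {"id": 2, "voice": True},
    {"id": 3, "voice": False},
]
to_instruction, to_environment = split_segments(clips, lambda clip: clip["voice"])
```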

The environment estimation unit 132 estimates the environment around the user from the audio data passed to it, using the environment estimation model stored in the storage unit 120. As an example, the environment estimation unit 132 identifies the user environment that the environment estimation model associates with stored audio data whose correlation with the passed audio data is at or above a certain level, and takes it as the estimate of the environment around the user. The environment estimation unit 132 passes information indicating the estimated environment to the control estimation unit 134.
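The correlation-based matching described above can be sketched as follows. The text does not specify the correlation measure, the audio representation, or the threshold, so this sketch assumes Pearson correlation over equal-length feature vectors and a hypothetical threshold of 0.8; the model contents are invented for the example.

```python
import math
from typing import Dict, Optional, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_environment(
    observed: Sequence[float],
    model: Dict[str, Sequence[float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the label of the best-correlated sample at or above the threshold."""
    best_label: Optional[str] = None
    best_score = threshold
    for label, sample in model.items():
        score = pearson(observed, sample)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical environment estimation model: label -> stored feature vector.
environment_model = {
    "music playing": [0.1, 0.9, 0.2, 0.8],
    "quiet": [0.05, 0.04, 0.05, 0.04],
}
estimated = estimate_environment([0.2, 1.0, 0.3, 0.9], environment_model)
```

Pearson correlation ignores overall loudness, so a practical matcher would likely combine it with level or spectral features; that choice is left open here.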

The instruction estimation unit 133 estimates the user's instruction from the audio data passed to it. For example, the instruction estimation unit 133 uses existing speech recognition technology to convert the user's voice instruction into text data and passes the text data to the control estimation unit 134.

The control estimation unit 134 estimates the control content to be executed based on the content of the user's instruction input passed from the instruction estimation unit 133 and the information indicating the environment around the user passed from the environment estimation unit 132. Taking these two items as input, the control estimation unit 134 uses the control model 121 to estimate the control content to be executed. The control estimation unit 134 passes the estimated control content to the execution unit 135.

The execution unit 135 executes the control content passed from the control estimation unit 134. That is, based on that control content, it generates a control signal indicating the processing to be executed by the device to be controlled, and causes the transmitting unit 140 to transmit the signal.

The transmitting unit 140 is a communication interface having a function of transmitting control signals to various devices (such as the speaker device 200 and home appliances) in accordance with instructions from the control unit 130 (execution unit 135).

The above is a configuration example of the information processing device 100.

(Configuration example of speaker)
FIG. 3 is a block diagram illustrating a configuration example of the speaker device 200. As shown in FIG. 3, the speaker device 200 includes a receiving unit 210, a storage unit 220, a speaker 230, a microphone 240, and a transmitting unit 250.

The receiving unit 210 is a communication interface that receives a control signal (audio data) from the information processing device 100. The receiving unit 210 passes the received control signal (audio data) to the speaker 230.

The storage unit 220 has a function of storing the various programs and data that the speaker device 200 needs in order to operate. The storage unit 220 is realized by a storage medium such as an HDD, an SSD, or a flash memory. The speaker device 200 may store a program in the storage unit 220 and execute it so that a control unit (not shown) realizes the functions of the speaker device 200. The storage unit 220 stores the audio data collected by the microphone 240.

The speaker 230 has a function of playing back the control signal (audio data) received from the information processing device 100.

The microphone 240 has a function of collecting sound around the speaker device 200. The microphone 240 may consist of a single microphone or of a plurality of microphones. Each microphone may be a directional microphone whose sound collection direction is limited. The microphone 240 stores audio data representing the collected sound in the storage unit 220.

The transmitting unit 250 is a communication interface having a function of transmitting the audio data stored in the storage unit 220 to the information processing device 100. The transmitting unit 250 may transmit the stored audio data to the information processing device 100 continuously or, when it detects that the user has given a voice instruction input, may transmit audio data of a predetermined length before and after that input.
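Sending a fixed amount of audio before and after a detected instruction implies that recent audio is buffered. The sketch below shows one way this might be done with a bounded queue; the class name, the frame-based granularity, and the detection flag passed in by the caller are assumptions for illustration only.

```python
from collections import deque
from typing import Deque, List, Optional


class InstructionWindow:
    """Collects pre_frames before and post_frames after a detected instruction frame."""

    def __init__(self, pre_frames: int, post_frames: int) -> None:
        if pre_frames < 0 or post_frames < 0:
            raise ValueError("frame counts must be non-negative")
        self._pre: Deque[bytes] = deque(maxlen=pre_frames)  # most recent frames only
        self._post_frames = post_frames
        self._pending: Optional[List[bytes]] = None
        self._remaining = 0

    def push(self, frame: bytes, instruction_detected: bool = False) -> Optional[List[bytes]]:
        """Add one frame; return the full window once enough trailing frames have arrived."""
        if self._pending is not None:
            # Still collecting the frames that follow the detected instruction.
            self._pending.append(frame)
            self._remaining -= 1
        elif instruction_detected:
            # Start a window: buffered earlier frames plus the triggering frame.
            self._pending = list(self._pre) + [frame]
            self._remaining = self._post_frames
        self._pre.append(frame)
        if self._pending is not None and self._remaining == 0:
            window, self._pending = self._pending, None
            return window
        return None


window_buffer = InstructionWindow(pre_frames=2, post_frames=1)
outputs = [window_buffer.push(f) for f in (b"a", b"b", b"c")]
outputs.append(window_buffer.push(b"d", instruction_detected=True))
outputs.append(window_buffer.push(b"e"))
```

In this example only the last call returns a window, containing the two frames before the trigger, the trigger frame, and one frame after it.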

The above is a configuration example of the speaker device 200.

(Configuration example of control model 121)
Next, an example of the control model 121 will be described with reference to FIG. 4. FIG. 4 is a data conceptual diagram of the control model 121. As shown in FIG. 4, the control model 121 is information in which an environmental condition 401, a control device 402, and a control content 403 are associated with one another.

The environmental condition 401 is information indicating the content of an instruction from the user 10 and the state of the environment around the user at that time; it represents the condition under which the corresponding control content is executed.

The control device 402 is information indicating the device to be controlled when the corresponding environmental condition 401 is satisfied. The control device 402 may include a plurality of devices to be controlled.

The control content 403 is information indicating the control to be executed by the corresponding control device 402 when the corresponding environmental condition 401 is satisfied. The control content 403 may include a plurality of control contents. When a plurality of devices are set in the corresponding control device 402 and a plurality of control contents are listed in the control content 403, the control content 403 specifies which device executes which control content.

In the example of FIG. 4, when the user's instruction estimated by the instruction estimation unit 133 is "play music" and the user's environment estimated by the environment estimation unit 132 is "the surroundings are quiet", the control estimation unit 134 selects the "speaker device 200" as the control device 402 and estimates "play quiet music" as the control to be executed. Alternatively, when the user's instruction estimated by the instruction estimation unit 133 is "play music" and the user's environment estimated by the environment estimation unit 132 is "music is detected", the control estimation unit 134 selects the "speaker device 200" as the control device 402 and estimates, as the control to be executed, outputting a voice prompt that asks "Other music seems to be playing. Do you want to play music?"
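The two rows of FIG. 4 discussed above can be written as a small lookup table. This is only a sketch of the tabular form of the control model 121: the class and function names are hypothetical, the strings are paraphrases of the example rows, and exact string matching stands in for whatever matching a real implementation would use.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ControlRule:
    instruction: str   # part of environmental condition 401
    environment: str   # part of environmental condition 401
    device: str        # control device 402
    content: str       # control content 403


# Rows paraphrased from the FIG. 4 example in the text.
CONTROL_MODEL = (
    ControlRule("play music", "the surroundings are quiet",
                "speaker device 200", "play quiet music"),
    ControlRule("play music", "music is detected",
                "speaker device 200",
                "ask: Other music seems to be playing. Do you want to play music?"),
)


def estimate_control(
    instruction: str,
    environment: str,
    model: Sequence[ControlRule] = CONTROL_MODEL,
) -> Optional[ControlRule]:
    """Return the first rule whose instruction and environment both match, else None."""
    for rule in model:
        if rule.instruction == instruction and rule.environment == environment:
            return rule
    return None


quiet_rule = estimate_control("play music", "the surroundings are quiet")
busy_rule = estimate_control("play music", "music is detected")
```

The same instruction resolves to different control content depending on the environment, which is the behavior the table is meant to capture.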

Note that FIG. 4 is merely one example of the control model 121. The control model 121 may be any data from which the device to be controlled and the control content for that device can be derived, taking as input various environmental conditions (combinations of the content of the user's instruction input and the environment around the user). As another example, an estimation model obtained by machine learning (including deep learning) can be used. The environmental condition 401, the control device 402, and the control content 403 shown in FIG. 4 are likewise merely examples, and it goes without saying that any device may perform any control based on conditions other than those shown in FIG. 4.
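A table-style control model like the one in FIG. 4 can be sketched as a simple rule lookup. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Rule` class, the `CONTROL_MODEL` tuple, the function `estimate_control`, and the exact-match string comparison are all hypothetical, and the two rules only mirror the "play music" examples described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One row of a hypothetical table-style control model."""
    instruction: str   # instruction part of environmental condition 401
    environment: str   # environment part of environmental condition 401
    devices: tuple     # control device 402 (one or more devices)
    actions: tuple     # control content 403 (one entry per device)


# Assumed rows, modeled on the two "play music" cases in the text.
CONTROL_MODEL = (
    Rule("play music", "surroundings are quiet",
         ("speaker device 200",), ("play quiet music",)),
    Rule("play music", "music is detected",
         ("speaker device 200",), ("ask whether to play music",)),
)


def estimate_control(instruction, environment, model=CONTROL_MODEL):
    """Return [(device, action), ...] for the first matching rule, else []."""
    for rule in model:
        if rule.instruction == instruction and rule.environment == environment:
            # Pair each device with its own control content.
            return list(zip(rule.devices, rule.actions))
    return []
```

Pairing `devices` and `actions` by position is one way to express "which device executes which control content"; a learned estimation model could replace the lookup without changing the function's inputs and outputs.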

(Example of operation of information processing apparatus 100)
FIG. 5 is a flowchart showing the operation of the information processing apparatus 100 when it controls a device.

As shown in FIG. 5, the receiving unit 110 of the information processing apparatus 100 receives audio data from the speaker device 200 (step S501). The receiving unit 110 passes the received audio data to the audio processing unit 131. The audio processing unit 131 separates the audio data into audio data estimated to contain the user's instruction input and audio data estimated not to contain it. The audio processing unit 131 then passes the former to the instruction estimating unit 133 and the latter to the environment estimating unit 132.

The instruction estimating unit 133 analyzes the audio data passed to it and estimates the content of the user's instruction input (step S502). The instruction estimating unit 133 passes the estimated content of the user's instruction input to the control estimating unit 134.

The environment estimating unit 132 analyzes the audio data passed to it and estimates the environment around the user (step S503). The environment estimating unit 132 passes information indicating the estimated environment of the user to the control estimating unit 134. Note that steps S502 and S503 may be executed in either order, or in parallel.

The control estimating unit 134 takes as input the information indicating the content of the user's instruction input and the information indicating the user's environment, and estimates the control content to be executed using the control model 121 (step S504). Specifically, the control estimating unit 134 searches the control model 121 for an environmental condition 401 that matches both pieces of information. If a matching environmental condition 401 is found, the control estimating unit 134 estimates the corresponding control device 402 and control content 403 as the control content to be executed. The control estimating unit 134 passes information indicating the estimated control content to the execution unit 135.

Based on the information indicating the control content passed from the control estimating unit 134, the execution unit 135 generates a control signal that causes the device to be controlled to execute the control content. The execution unit 135 then causes the transmission unit 140 to transmit the generated control signal (step S505), and the process ends.

As a result, when the user gives an instruction by voice, the information processing apparatus 100 can interpret the instruction differently depending on the environment in which the user is placed and perform control accordingly. This makes it possible to provide an information processing apparatus 100 capable of flexible responses; in other words, an information processing apparatus 100 that adapts well to the situation.
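The flow of steps S501 to S505 can be sketched as a small pipeline. This is only an illustration under strong assumptions: audio analysis is replaced by pre-labeled segments, and the field names (`is_instruction`, `text`, `label`), the dictionary-based control model, and all function names are hypothetical rather than taken from the embodiment.

```python
def separate(segments):
    """Split received audio (S501) into instruction speech and ambient sound."""
    speech = [s for s in segments if s["is_instruction"]]
    ambient = [s for s in segments if not s["is_instruction"]]
    return speech, ambient


def estimate_instruction(speech):
    """S502: stand-in for speech analysis; returns the instruction text."""
    return speech[0]["text"] if speech else None


def estimate_environment(ambient):
    """S503: stand-in for environment analysis; returns an environment label."""
    return ambient[0]["label"] if ambient else None


def handle_audio(segments, control_model, send):
    """Run S501 to S505 once and return the control signal, or None."""
    speech, ambient = separate(segments)
    instruction = estimate_instruction(speech)    # S502
    environment = estimate_environment(ambient)   # S503 (order is interchangeable)
    control = control_model.get((instruction, environment))  # S504
    if control is None:
        return None  # no matching environmental condition
    device, action = control
    signal = {"device": device, "action": action}
    send(signal)  # S505: hand the signal to the transmitter
    return signal
```

Passing `send` as a callable keeps the sketch independent of any particular transport between the apparatus and the controlled device.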

(Specific examples of control)
The following describes concretely the processing that the information processing apparatus 100 executes based on the audio data collected by the speaker device 200.

(Example 1)
First, assume that the information processing apparatus 100 has obtained the following information (a1) and (a2) by analyzing the audio data.
(a1) The user has given the instruction "play music".
(a2) The user's surroundings are quiet.

In this case, the information processing apparatus 100 refers to the environmental condition 401 of the control model 121 and identifies the corresponding control device 402 and control content 403. For these conditions, the control estimating unit 134, for example, executes the process of playing quiet music as the control content to be executed. That is, the information processing apparatus 100 can judge that quiet music suits a quiet environment and act on that judgment. The information processing apparatus 100 can therefore realize control that both suits the user's situation and follows the user's instruction.

(Example 2)
First, assume that the information processing apparatus 100 has obtained the following information (b1) and (b2) by analyzing the audio data.
(b1) The user has given the instruction "play music".
(b2) Music is detected in the user's surroundings.

In this case, the information processing apparatus 100 refers to the environmental condition 401 of the control model 121 and identifies the corresponding control device 402 and control content 403. For these conditions, the control estimating unit 134, for example, executes the process of asking whether music may be played as the control content to be executed. Playing different music while music is already playing would be jarring (it may produce dissonance), so by asking for confirmation first, the information processing apparatus 100 can reduce the possibility of causing the user discomfort.

As a comparison of (Example 1) and (Example 2) shows, the information processing apparatus 100 can respond differently to the same user instruction, "play music". That is, the information processing apparatus 100 can realize control that follows the user's instruction in a manner suited to the user's environment. It is therefore possible to provide an information processing apparatus 100 that responds flexibly to user instructions.

(Example 3)
First, assume that the information processing apparatus 100 has obtained the following information (c1), (c2), and (c3) by analyzing the audio data.
(c1) The user has given the instruction "turn off the lights".
(c2) The lights are on in the room where the user is.
(c3) The lights are on in a room where no one is present.

In this case, the information processing apparatus 100 refers to the environmental condition 401 of the control model 121 and identifies the corresponding control device 402 and control content 403. For these conditions, the control estimating unit 134 may, for example, perform the process of turning off the lighting device in the room where no user is present as the control content to be executed. In this way, unneeded lights can be turned off in accordance with the user's instruction, while the lights in the space where the user is present stay on.
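The selection in (Example 3) can be sketched as a filter over per-room state. This is a minimal illustration, not the method of the embodiment: the room names, the `light_on` and `occupied` fields, and the function name `rooms_to_switch_off` are hypothetical, and the sketch assumes that the occupancy of each room has already been estimated.

```python
def rooms_to_switch_off(rooms):
    """Return the rooms whose lights are on although nobody is present.

    `rooms` maps a room name to a dict with two assumed boolean fields:
    `light_on` and `occupied`.
    """
    return [
        name
        for name, state in rooms.items()
        if state["light_on"] and not state["occupied"]
    ]


# Hypothetical state corresponding to (c2) and (c3).
rooms = {
    "living room": {"light_on": True, "occupied": True},   # user is here
    "bedroom": {"light_on": True, "occupied": False},      # nobody here
    "hallway": {"light_on": False, "occupied": False},     # already off
}
targets = rooms_to_switch_off(rooms)
```

Here only the bedroom is selected, so the light in the room where the user is stays on.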

(Example 4)
First, assume that the information processing apparatus 100 has obtained the following information (d1), (d2), and (d3) by analyzing the audio data.
(d1) The user has given the instruction "mute the TV".
(d2) A telephone sound is detected.
(d3) Another user is watching the television (the voice of a user other than the one who gave the instruction can be heard).

In this case, the information processing apparatus 100 refers to the environmental condition 401 of the control model 121 and identifies the corresponding control device 402 and control content 403. For these conditions, the control estimating unit 134, for example, performs the process of lowering the television volume as the control content to be executed. The user who gave the instruction asked to mute the television because its sound would interfere with the telephone call. When the information processing apparatus 100 determines that another user is watching the television, however, it lowers the volume instead of muting the sound, so that the other user can still follow the program and the instructing user's telephone call is not disturbed. This realizes processing that is preferable for both users.

As described above, the information processing apparatus 100 can execute processing that suits the user's environment in response to the user's instruction. That is, even for the same instruction, the information processing apparatus 100 can execute different control when the user's environment differs, which makes it possible to provide an information processing apparatus 100 capable of flexible responses.

(Supplement)
It goes without saying that the apparatus according to the above embodiment is not limited to that embodiment and may be realized by other methods. Various modifications are described below.

(1) The above embodiment describes an example in which the information processing apparatus 100, which executes the control content, and the speaker device 200, which acquires information about the environment around the user 10, are separate devices. However, the two may be realized as a single device. That is, the speaker device 200 may also have the functions of the information processing apparatus 100. In this case, no communication between the speaker device 200 and the information processing apparatus 100 is needed, and control delays that could arise from communication delays can be suppressed.

The speaker device 200 may also include only some of the functional units of the information processing apparatus 100, so that it can execute only some of the functions of the information processing apparatus 100. For example, the speaker device 200 may have the function of the audio processing unit 131: it may identify (filter), from the audio data acquired by a plurality of directional microphones, the audio that contains the voice of the user's instruction, and transmit the data to the information processing apparatus 100 in such a way that the audio data containing the user's instruction can be distinguished from the audio data that does not contain it (the ambient sound around the user).

(2) In the above embodiment, the speaker device 200 transmits audio data continuously, but this is not a limitation. The speaker device 200 may transmit the audio data of the instruction and the audio data of the surrounding sound only when the user gives an instruction. To realize this, the speaker device 200 itself collects sound continuously and may include a detection unit that detects whether the user has given an instruction by voice. For example, the detection unit detects whether there is an instruction input from the user based on whether there is sound in the frequency range of the human voice; when it determines that there is, the speaker device 200 may transmit audio data of a predetermined length before and after that point to the information processing apparatus 100.
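One way to check "whether there is sound in the frequency range of the human voice" is to measure how much of a frame's spectral energy falls inside a voice band. The sketch below is an assumption-laden illustration, not the detection unit of the embodiment: the 300 to 3400 Hz band, the 0.5 ratio threshold, and the function names are hypothetical choices, and a plain discrete Fourier transform is used only to keep the example dependency-free.

```python
import cmath
import math


def band_energy_ratio(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Fraction of spectral energy (DC excluded) inside [low_hz, high_hz]."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    # Evaluate DFT bins 1 .. n/2 (positive frequencies up to Nyquist).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        power = abs(coeff) ** 2
        total += power
        freq = k * sample_rate / n  # center frequency of bin k in Hz
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def has_voice_band_sound(samples, sample_rate, ratio_threshold=0.5):
    """True if most of the frame's energy lies in the assumed voice band."""
    return band_energy_ratio(samples, sample_rate) >= ratio_threshold
```

A ratio test alone would also fire on very quiet in-band noise, so a practical detector would likely combine it with a minimum energy level; that refinement is left out here.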

As the audio data used to identify the user's environment for an instruction input, the information processing apparatus 100 may use audio data of a predetermined length that includes the period in which the instruction was input, audio data of a predetermined length ending a predetermined time before the instruction input, or audio data of a predetermined length starting a predetermined time after the instruction input. Which of these is used may be set in advance by the user in the information processing apparatus 100 and the speaker device 200. Using audio data that includes the period of the instruction input enables just-in-time control. Using audio data from before the instruction input allows the user's environment to be identified in advance, so the instruction can be executed immediately without spending time on identifying the environment. Using audio data from after the instruction input accommodates cases where analyzing the instruction input or estimating the environment takes time, or where transmission of the audio data is delayed; the control then takes that delay into account, that is, it reflects the situation the user is in when the control is actually performed.
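The three choices of audio window can be sketched as a function that returns a time interval relative to the instruction. This is an illustration under assumptions, not the embodiment's logic: the mode names, the use of seconds, the centering of the "including" window on the instruction, and the function names are all hypothetical.

```python
def environment_window(mode, start, end, length, offset=0.0):
    """Return (window_start, window_end) in seconds for the chosen mode.

    start, end: when the instruction input began and finished.
    length:     the predetermined length of audio to analyze.
    offset:     the predetermined time before or after the instruction.
    """
    if mode == "including":
        # Assumed: center the window on the instruction period.
        mid = (start + end) / 2
        return (mid - length / 2, mid + length / 2)
    if mode == "before":
        # Window ends `offset` seconds before the instruction starts.
        return (start - offset - length, start - offset)
    if mode == "after":
        # Window starts `offset` seconds after the instruction ends.
        return (end + offset, end + offset + length)
    raise ValueError(f"unknown mode: {mode}")


def select_frames(frames, window):
    """Keep frames whose timestamp falls in the half-open window."""
    lo, hi = window
    return [frame for timestamp, frame in frames if lo <= timestamp < hi]
```

With an instruction spoken from 10.0 s to 12.0 s, for example, the "before" mode with a 1.0 s offset and a 3.0 s length selects the audio between 6.0 s and 9.0 s.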

(3) Although not described in detail in the above embodiment, the information processing apparatus 100 may take the positions of the various sound sources in the audio data into account when estimating the user's situation. Here, a position means the position of a sound source as seen from the place where the speaker device 200, which collects the sound, is installed. Taking sound source positions into account allows the user's environment to be estimated in more detail. For example, suppose that the user's instruction comes from the direction of X degrees as seen from the speaker device 200, and that television sound is detected from the direction of Y degrees. If X degrees and Y degrees differ by a predetermined threshold or more, the environment estimating unit 132 can estimate that the user who gave the instruction is away from the television, that is, the television is on but the user is not watching it. Conversely, if the difference between X degrees and Y degrees is within the predetermined threshold, it can estimate that the user is near the television and watching it.

Alternatively, when the user's instruction comes from the direction of X degrees and information is obtained that the sound of running water is heard from the direction of Y degrees, or that the operating sound of a vacuum cleaner is heard from the direction of Z degrees, the environment can be estimated as one in which users other than the one who gave the instruction are present. In this way, the user's environment can be estimated based on the positions of sound sources. The position of the user at the time of the instruction can also be identified and used for control. That is, identifying the direction of the user makes it possible to identify where in the house the user is and, conversely, where the user is not.

As a result, the information processing apparatus 100 can base its control on more fine-grained information when estimating the user's situation, which allows a greater variety of control.
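The comparison of the user's direction (X degrees) with the television's direction (Y degrees) in (3) can be sketched as follows. The 30-degree threshold and the function names are hypothetical; the text only speaks of a "predetermined threshold". The wrap-around handling is an added assumption so that, for instance, 350 degrees and 10 degrees count as 20 degrees apart.

```python
def angular_gap(x_deg, y_deg):
    """Smallest angle in degrees between two directions (0 to 180)."""
    diff = abs(x_deg - y_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_watching_tv(user_deg, tv_deg, threshold_deg=30.0):
    """Estimate that the user is watching TV if both directions are close."""
    return angular_gap(user_deg, tv_deg) <= threshold_deg
```

For example, `is_watching_tv(350, 10)` is true with the assumed threshold, while `is_watching_tv(90, 270)` is false because the two directions are opposite.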

(4) The program of each embodiment of the present disclosure may be provided stored in a computer-readable storage medium. The storage medium can store the program in a "non-transitory tangible medium". The storage medium may include any suitable storage medium such as an HDD or an SSD, or any suitable combination of two or more of these. The storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile. The storage medium is not limited to these examples and may be any device or medium capable of storing the program.

The information processing apparatus 100 can realize the functions of the plurality of functional units described in each embodiment by, for example, reading a program stored in a storage medium and executing it. The program may also be provided to the information processing apparatus 100 via any transmission medium (such as a communication network or broadcast waves). For example, the information processing apparatus 100 realizes the functions of the plurality of functional units described in each embodiment by executing a program downloaded via the Internet or the like.

The program can be implemented using, for example, a scripting language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.

At least part of the processing in the information processing apparatus 100 may be realized by cloud computing made up of one or more computers. Each functional unit of the information processing apparatus 100 may be realized by one or more circuits that realize the functions described in the above embodiment, and a single circuit may realize the functions of a plurality of functional units.

(5) Although the embodiments of the present disclosure have been described based on the drawings and examples, it should be noted that a person skilled in the art can easily make various changes and modifications based on the present disclosure. Such changes and modifications are therefore included in the scope of the present disclosure. For example, the functions included in each means, each step, and the like can be rearranged as long as no logical inconsistency arises, and a plurality of means, steps, and the like can be combined into one or divided. The configurations shown in the embodiments may also be combined as appropriate.

REFERENCE SIGNS LIST
100 information processing apparatus
110 receiving unit
120 storage unit
130 control unit
131 audio processing unit
132 environment estimating unit
133 instruction estimating unit
134 control estimating unit
135 execution unit
140 transmission unit

Claims

An information processing apparatus comprising:
a receiving unit that receives an instruction input by voice from a user;
an environment information acquisition unit that acquires environment information capable of identifying the environment around the user;
an estimating unit that estimates, according to the environment information, control content to be executed for the instruction input; and
an execution unit that executes the control content estimated by the estimating unit.

The information processing apparatus according to claim 1, further comprising a storage unit that stores a control model capable of deriving control content to be executed from information indicating the content of the instruction input and information indicating an environment,
wherein the estimating unit estimates the control content using the control model.

The information processing apparatus according to claim 2, wherein
the environment information acquisition unit acquires, as the environment information, audio collected by a microphone that collects sound around the user, and
the estimating unit estimates the environment around the user based on the audio and estimates the control content to be executed for the instruction input according to the estimated environment.

The information processing apparatus according to claim 3, wherein
the microphone is a directional microphone whose sound collecting direction is fixed,
a plurality of the directional microphones are arranged at a predetermined position so that their directivity is directed toward the surroundings, and
the estimating unit estimates the position of the sound source of each sound based on the audio collected by each of the directional microphones, and estimates the environment around the user based on the positions of the sound sources of the sounds around the user.

The information processing apparatus according to claim 4, wherein
the receiving unit receives the instruction input by voice from the user via the microphone, and
the estimating unit estimates the position of the user and estimates the control content to be executed for the instruction input according to the estimated position of the user.

The information processing apparatus according to any one of claims 1 to 5, wherein the estimating unit estimates, according to the environment information, different control content as the control content to be executed for the instruction input.

The information processing apparatus according to any one of claims 1 to 6, wherein
the information processing apparatus includes a speaker, and
the execution unit controls the speaker to output, according to the environment information, audio corresponding to the estimated environment.

The information processing apparatus according to claim 1, wherein the execution unit controls another device according to the environment information.

An information processing method in which a computer executes:
a receiving step of receiving an instruction input by voice from a user;
an environment information acquisition step of acquiring environment information capable of identifying the environment around the user;
an estimation step of estimating, according to the environment information, control content to be executed for the instruction input; and
an execution step of executing the control content estimated in the estimation step.

An information processing program that causes a computer to realize:
a receiving function of receiving an instruction input by voice from a user;
an environment information acquisition function of acquiring environment information capable of identifying the environment around the user;
an estimating function of estimating, according to the environment information, control content to be executed for the instruction input; and
an execution function of executing the control content estimated by the estimating function.